Bilingual Summarization of English and Arabic Genetic Diseases Texts
Keywords: NLP, medical texts, RNN
Health literacy aims to empower patients to make better decisions about their health. For patients with genetic diseases, health literacy can be improved by making bilingual summaries of genetic disease texts retrievable from the internet. This paper proposes a system that helps translators achieve this by using NLP and Recurrent Neural Network (RNN) techniques for two tasks: generating abstractive summaries and performing Arabic-English translation. Both tasks require training sets, which can be built from an English summaries corpus and Arabic-English parallel corpora. The English summaries corpus is built from Orphadata, while the parallel corpora are built from Wiki articles. The summaries corpus is used to generate English summaries from the Wiki articles, and the parallel corpora are used to translate these summaries into Arabic. The paper first defines the research problem and then sets out a set of objectives for solving it. It next presents a literature review of the tasks named in those objectives. Finally, it discusses the proposed solution from the following aspects: the required corpora, the system architecture, the architectures of the RNN memory cell components, the proposed software for the implementation, and the system evaluation.
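As an illustration of what "RNN memory cell components" can mean in practice, the following is a minimal sketch of a single time step of an LSTM-style cell in plain Python. It assumes an LSTM cell with forget, input and output gates plus a candidate state; the abstract does not state which cell type the paper adopts, and the names `lstm_step`, `affine` and `params` are hypothetical and not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * x_j for w, x_j in zip(W_row, x))
        + sum(u * h_j for u, h_j in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step (assumed cell type, for illustration only).

    params maps each component name to a tuple (W, U, b):
    W acts on the input x, U acts on the previous hidden state h_prev.
    """
    f = [sigmoid(v) for v in affine(*params["forget"], x, h_prev)]
    i = [sigmoid(v) for v in affine(*params["input"], x, h_prev)]
    o = [sigmoid(v) for v in affine(*params["output"], x, h_prev)]
    g = [math.tanh(v) for v in affine(*params["candidate"], x, h_prev)]
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state: output gate applied to the squashed cell state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Toy usage: input size 2, hidden size 1, all weights and biases zero.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
toy_params = {name: zero for name in ("forget", "input", "output", "candidate")}
h_new, c_new = lstm_step([1.0, -1.0], [0.0], [2.0], toy_params)
```

In an encoder-decoder setting, a step like this would be applied once per token, with the encoder's final state initialising the decoder. In practice the weights would be learned by a deep learning framework rather than written by hand.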
Copyright (c) 2021 Zainab Almugbel
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Copyrights for articles published in IJIER journals are retained by the authors, with first publication rights granted to the journal. The journal/publisher is not responsible for subsequent uses of the work. It is the author's responsibility to bring an infringement action if so desired by the author.