Bilingual Summarization of English and Arabic Genetic Diseases Texts




NLP, medical texts, RNN


Health literacy aims to empower patients to make better decisions about their health. For patients with genetic diseases, health literacy can be improved by facilitating bilingual retrieval of summaries of online texts about genetic diseases. This paper proposes a system that helps translators achieve this task by using Natural Language Processing (NLP) and Recurrent Neural Network (RNN) techniques for two tasks: generating abstractive summaries and performing Arabic-English translation. Both tasks require training sets, which can be built from an English summaries corpus and Arabic-English parallel corpora. The English summaries corpus is built from Orphadata, while the parallel corpora are built from Wiki articles. The summaries corpus is used to generate English summaries of the Wiki articles, and the parallel corpora are used to translate these summaries into Arabic. The paper first defines the research problem and then investigates a set of objectives for solving it. After that, it presents a literature review of the tasks involved in those objectives. Finally, it discusses the proposed solution from the following aspects: the required corpora, the system architecture, the architectures of the RNN memory cell components, the proposed implementation software, and the system evaluation.
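The abstract names the RNN memory cell architecture as one aspect of the proposed solution but does not specify it here. As a rough illustration only, the sketch below assumes an LSTM-style cell, which is one common choice of memory cell, and reduces it to a single scalar unit so the gate arithmetic is easy to follow. The names `lstm_step`, `run_sequence`, and `params` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (input size 1, hidden size 1).

    Assumption: params maps each gate name ("f", "i", "o", "g") to a
    tuple (w_x, w_h, b) of input weight, recurrent weight, and bias.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much new candidate to admit
    o = sigmoid(pre("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre("g"))  # candidate cell value
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from zero states."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states
```

In an encoder-decoder setting, many such units would be stacked into vectors and the weights learned from the training corpora; the scalar form above only shows how the gates combine the previous state with the current input.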



Author Biography

Zainab Almugbel, Imam Abdulrahman Bin Faisal University

 Lecturer, Computer Science Department, Community College






How to Cite

Almugbel, Z. (2021). Bilingual Summarization of English and Arabic Genetic Diseases Texts. International Journal for Innovation Education and Research, 9(9), 342–373.