Automatic Arabic Text Categorization Using Efficient Classification Techniques

 

Mouhammd Al-awadi

With the rapid growth of Arabic content on the internet, the need to organize this content quickly and automatically is also growing. This calls for efficient automatic categorization methods that assign texts to categories and groups. Categorization systems can contribute significantly to monitoring the huge quantity of Arabic text that exists in many forms, such as news, journal articles, and web pages.

Generally, people need easy-to-use tools and retrieval engines that classify files and texts according to their interests. Classifying documents manually is a difficult task, and few automatic tools provide such services for Arabic documents. However, many studies have focused on methods with more effective morphological analysis and have examined their effect on Arabic text categorization (Saad, 2010), (Al-Nashashibi, 2014), (Arabic text preprocessing for the natural language processing applications, 2007). More research is needed to check and improve the accuracy of these methods and, moreover, to improve their effect on text categorization (TC) performance.

Arabic text categorization needs further study and attention because Arabic differs from languages such as English and other European languages, for which categorization has already been well developed. The way the Arabic language is structured raises some problems, such as the mismatch problem: different words used in different texts can convey the same meaning. In addition, Arabic words take different meanings and usages depending on their context. These problems have been studied intensively over time, and many methods have been examined to solve them (Saad, 2010).

Text categorization is one of the applications of artificial intelligence (Information Systems Design and Intelligent Applications, 2015). It allows a large collection of unstructured texts to be grouped into classes based on a specific criterion, such as topic, author, or language. Text categorization (TC) is also called "text classification" or "document categorization". Interest in Arabic text classification has increased recently.
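To make the idea of grouping texts into classes concrete, the following is a minimal sketch of a topic classifier. It assumes a multinomial Naive Bayes model with add-one smoothing, chosen here only for illustration and not necessarily one of the techniques studied in this work. The function names, the toy English training sentences, and the category labels are all hypothetical, and the Arabic-specific preprocessing discussed above is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Arabic text would
    # need normalization and morphological analysis first.
    return text.lower().split()


def train(labeled_docs):
    """Count word frequencies per category and documents per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest smoothed log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior of the category.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set: (text, category) pairs.
training = [
    ("the team won the football match", "sports"),
    ("the player scored a late goal", "sports"),
    ("the bank raised interest rates", "economy"),
    ("the market and stock prices fell", "economy"),
]
word_counts, doc_counts = train(training)
sports_guess = classify("the goal decided the match", word_counts, doc_counts)
economy_guess = classify("stock prices and interest rates", word_counts, doc_counts)
```

In this sketch the criterion for grouping is the topic, but the same structure would apply to other criteria mentioned above, such as author or language, by changing the labels in the training pairs.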

Text categorization lies at the intersection of Machine Learning (ML) and Information Retrieval (IR) and is related to many important areas of scientific research, such as data mining, text retrieval, and natural language processing. Figure 1.1 illustrates the intersection between these fields.