Advancing the Arabic Language Online
01 Jan 07:37 AM Sector: ICT Country: Qatar
By Majd Abbar
The Qatar National Vision 2030, published in 2008, sets out a framework to diversify the nation’s wealth and secure its long-term future by becoming a knowledge economy, through a focused commitment to social, economic, human and environmental development. Education, science and culture are key components of the foundation for this development.
Language plays a crucial role in the dissemination of knowledge and information, leading to the development and sustainability of civilizations. It is the expression of thoughts, ideas and emotions. Knowledge is based on the language used by the community to exchange ideas and build on cultural roots. The Arabic language is no exception.
It is essential that language does not become a barrier to the adoption of modern technologies and science. Unfortunately, the Arabic language and its use have not evolved sufficiently in recent decades to keep up with innovations in technology and scientific advancements. This barrier has led to a disproportionately small amount of digital Arabic content compared with other languages, given the large number of Arabic speakers worldwide.
Under the patronage of H.H. Sheikha Moza bint Nasser al Missned, Chairperson of Qatar Foundation for Education, Science and Community Development, the Renaissance of the Arabic Language Forum was held in May of 2012, promoting the advancement of the Arabic language.
At Qatar Computing Research Institute (QCRI) we are dedicated to promoting the Arabic language and its renaissance, a major initiative for Qatar and for Qatar Foundation. Our research addresses not only the lack of content, but also the challenges of retrieving this content when it exists, making it accessible and enabling information flow across language barriers.
QCRI was honored to take part in the Renaissance of the Arabic Language Forum and present its vision of the challenges facing Arabic-speaking Internet users and the opportunities that exist. The challenges were summarized in two main aspects: 1) the lack of valuable digital Arabic content and 2) the inability to retrieve that content when it is available.
Raising the bar
Arabic content is commonly estimated at less than 1 percent of all online content. Different sources give different figures, ranging from 0.5 percent to 3 percent. Regardless of which figure you choose to believe, the Arabic language is severely underrepresented on the web. This is especially startling when you consider that Arabic is the fifth most spoken language in the world and yet ranks 11th on the web.
It is important to note that this is not the result of a lack of desire or need for Arabic content. Arabic is the fastest-growing language on the internet, with the number of Arabic-speaking internet users increasing 2,298 percent from 2000 to 2009, according to the Internet World Statistics Report, and 228 percent over the previous two years. The number of internet users in the Middle East and North Africa (MENA) region has jumped from 3.3 million in 2000 to over 90 million in 2012, and it is expected to continue growing. Arabic is also the fastest-growing language on the most popular social media sites, such as Facebook and Twitter; use of Arabic on Twitter grew by over 2,000 percent in the last year. This growth is not confined to social media: Arabic is the second fastest-growing language on Wikipedia, after Portuguese, and the number of Arabic articles on Wikipedia grew by 29 percent over the past year. The demand is certainly there, yet the supply continues to be lacking.
Enabling a renaissance in digital Arabic content
At QCRI, we choose to see this as an opportunity rather than an obstacle, and as such we have launched our “Ethraa” initiative to promote the renaissance of the Arabic language through the enrichment of digital Arabic content. We strive to achieve this through original content creation, translation of existing content and the digitization of non-digital Arabic content.
Some of the challenges of creating original content arise from the lack of editing tools that are intuitive, easy to use and highly productive. QCRI has partnered with a technology development company to create a tool that will help editors create or translate articles online for the Arabic Wikipedia. Further, the Ethraa initiative encourages the creation of Arabic content through grassroots efforts with local and regional partners, and will be looking to integrate these efforts with academic programs.
Translation of existing content is one of the simpler ways to create content, albeit neither very economical nor sustainable in the long run. As an initial effort to boost the presence of Arabic content on Wikipedia, QCRI partnered with Wikimedia in a program to provide translations of 10,000 articles chosen for their relevance to Arab users. Future programs to translate important works into Arabic, in partnership with national libraries around the world, are in the pipeline.
Similarly, many non-digital items of great cultural value already exist in the Arabic language, but they are not easily accessible to Arab users. Hence, QCRI is undertaking major research into Optical Character Recognition (OCR) and is partnering with the Qatar National Library to digitize the most important of these items.
While these challenges can be addressed by QCRI, other factors that prevent greater use of the Arabic language on the Internet are outside the scope of the Institute, such as:
- High cost of bandwidth and equipment
- Inadequate telecommunications infrastructure
- Digital illiteracy due to outdated education systems
- Censorship and internet monitoring
The Arabic Language Technologies team at QCRI is also pursuing research in many areas within the Arabic language domain, such as natural language processing, the optimization of search for the Arabic language, and the creation and incubation of innovative technologies dealing with the Arabic language, among others. Several of these research areas have already yielded products that are in the process of being commercialized, such as an electronic book reader with true native Arabic support, a speech-to-text engine and morphological search.