THE KNOTTY PROBLEM OF USING AFRICAN LANGUAGES FOR E-MAIL AND INTERNET
27 April 2001
As the web and e-mail spread, Africans will increasingly want to access information and communicate in their own languages. For the purposes of generating and transmitting text electronically, Africa has three categories of languages: those that use basically the same characters one finds in the major languages of West European origin; those that use basically that same Latin alphabet but with some added letters; and those that use non-Latin alphabets. The last two pose considerable problems for those who see digital advances as a way of improving communication. Bisharat’s Don Osborn looks at how these obstacles can be tackled.
As the information revolution worldwide becomes increasingly multilingual, and as the new technologies in Africa gradually move beyond the capital cities, what are the barriers to greater use of the indigenous languages of the continent?
There are of course a number of interrelated issues to consider in a comprehensive discussion of this question, which one might broadly characterize as including: structural issues (e.g., basic physical access to the technology, technical problems), socio-linguistic factors (issues relating to orthographies, literacy, multiplicity of languages and dialect variation within languages, and attitudes about languages), economic considerations (lack of resources, other priorities in using IT for development), and even political concerns (what effect would validating linguistic diversity in the new technologies have on divisions in a society).
This short article highlights a fairly narrow but significant technical matter: the current possibilities for generating and transmitting text in African languages. Text of course means characters, and the more characters a language needs beyond the set used in the main languages of IT, the more complicated the problem becomes. Since African languages differ in their orthographies, it is useful to group them into three categories when considering what is involved and what is actually being done:
- those that use basically the same characters one finds in the major languages of West European origin;
- those which use basically that same Latin alphabet but with some added letters;
- and those which use non-Latin alphabets.
- For African languages of the first category, which use the Latin alphabet of European languages, there are no special technical problems in working with text, producing web content, or even localizing software. This is especially the case for languages like Swahili, Somali, and many in Southern Africa that use only ASCII characters (i.e., no accents). Even languages such as Sango that use several accented characters common to major European languages can readily be used in word processing and on the web (see for example http://sango.free.fr/).
- However, many African languages, and most West African ones, use in their officially adopted orthographies the Latin alphabet with a few extra or different letters, or less common digraphs, to represent sounds not found in the major European languages. The extended alphabet adopted by many countries for their mother tongues had its genesis at a conference of African language experts held in Bamako in 1966. There are several approaches to using these special characters on computers and the internet:
(a) The "correct" one: using, in a word processor, a font that includes these characters. There actually seems to be a growing number of such fonts, often created to meet specific local needs or as part of a commercial line of multilingual software. Unfortunately, these fonts and their keyboard layouts are generally incompatible with one another.
For the web, this means encoding the added characters with a standard code for each character, drawn from a single code set that includes them all, and having on the receiving end a standard set of glyphs that a browser can call up to display them. Unicode is proposed as a solution to this (as well as to the lack of standardization in word processing). However, it is not there yet, as you may discover, depending on how your browser handles Unicode (UTF-8), when looking at the Fula (Peulh, Pulaar), Ewe, Kabye, and Maninka versions of the Universal Declaration of Human Rights at http://www.unhchr.ch/udhr/navigate/region.htm. If these texts show many empty boxes, it is clear why people still use workarounds such as those below to create and share text in these languages.
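To make "a standard code for each character" concrete, the short Python sketch below (Python is only an illustrative choice; the dictionary labels are this article's informal letter names) prints the Unicode code point, official character name, and UTF-8 bytes for several of the special letters discussed. It also checks that none of them exist in ISO-8859-1 (Latin-1), the code set covering the major West European languages, which is why a page in that encoding cannot carry them at all.

```python
import unicodedata

# Special letters of the extended Latin alphabets, keyed by the informal
# names used in this article (the labels are illustrative only).
SPECIAL_LETTERS = {
    "open e": "\u025b",
    "open o": "\u0254",
    "eng": "\u014b",
    "n with left hook": "\u0272",
    "hooked b": "\u0253",
    "hooked d": "\u0257",
    "hooked k": "\u0199",
}


def describe(ch: str) -> str:
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> UTF-8 {utf8}"


def in_latin1(ch: str) -> bool:
    """True if the character can be encoded in ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for label, ch in SPECIAL_LETTERS.items():
        print(f"{label:18} {describe(ch)}  Latin-1: {in_latin1(ch)}")
```

Having the codes is only half the problem: as the empty boxes show, the reader's system must also have a font with glyphs for these code points.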
(b) The "old-correct" or obsolete one. For some languages, such as Bambara, Ewe, or Fula (Pular/Fuuta Jalon), certain digraphs or accented characters used in European languages were employed before the special characters of the extended alphabet were officially adopted (e.g., "ny" or "n with tilde" for the "n with left hook"; "o with grave accent" or "underlined o" for the "open o"; "dh" for the "hooked d"). This is the approach I used when typing Bambara for a class years ago, and in e-mail more recently. It lets one produce and present text, but it is not satisfactory to those who learned in, or are used to, the current orthography. Also, the accents might be confused with the tone marks used in texts for some tonal languages (Bambara, Yoruba). A site with the "old-correct" transcription of Fula (Pular/Fuuta Jalon) is http://www.fuuta-jalon.net/Pular/pular.html
(c) The substitute solution: use something that stands in for each special character. For instance, use capital letters in place of the special characters (e.g., "E" for the "open e"). An example in Bambara can be seen at http://callisto.si.usherb.ca/~malinet/index_ba.html
Another example is the use of digraphs for modified consonants, such as "’d" or "’k" for the "hooked d" and "hooked k", the approach used for text on a Hausa page (see especially the part named "Mawallafan Littattafan Hausa" at http://www.gumel.com/Littattafan-Hausa.htm).
Yet another is to substitute similar-looking characters from other alphabets, such as "ß" (the German sharp s, which resembles the Greek letter beta) for the "hooked b" used in Fula and Hausa. Some of the texts on the Fula (Pular/Fuuta Jalon) site mentioned in (b) above show this.
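A substitute scheme of type (c) amounts to a lookup table. The Python sketch below is hypothetical: it combines the conventions mentioned above (capital "E" for the open e, apostrophe digraphs for the hooked d and hooked k, "ß" for the hooked b) and adds capital "O" for the open o as an assumption by analogy. Because each special letter gets its own stand-in, the conversion can be reversed, but only when the stand-ins never occur with their ordinary meaning, which capital letters and apostrophes inevitably do.

```python
# Hypothetical substitute scheme in the spirit of approach (c).
# Keys are the Unicode special letters; values are their stand-ins.
SUBSTITUTES = {
    "\u025b": "E",       # open e   -> capital E (as in the Bambara example)
    "\u0254": "O",       # open o   -> capital O (assumed by analogy)
    "\u0257": "'d",      # hooked d -> apostrophe + d (ASCII apostrophe assumed)
    "\u0199": "'k",      # hooked k -> apostrophe + k (ASCII apostrophe assumed)
    "\u0253": "\u00df",  # hooked b -> sharp s, a look-alike of Greek beta
}


def to_substitute(text: str) -> str:
    """Replace every special letter with its stand-in."""
    return "".join(SUBSTITUTES.get(ch, ch) for ch in text)


def from_substitute(text: str) -> str:
    """Undo to_substitute, restoring the longer (digraph) stand-ins first."""
    for special, stand_in in sorted(SUBSTITUTES.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(stand_in, special)
    return text


if __name__ == "__main__":
    sample = "\u0257an \u0199asa"             # a sample Hausa phrase
    encoded = to_substitute(sample)
    print(encoded)                            # 'dan 'kasa
    print(from_substitute(encoded) == sample) # True
    # The weakness: an ordinary capital E is also "restored" as an open e.
    print(from_substitute("Ecole"))           # ɛcole
```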
(d) The "little image file" solution, where small image files for the special characters are inserted into the text as needed. This is very cumbersome except for short texts. A site where this was done, for Bambara, is http://www.djembe.com/bambara_1.cfm. A Wolof learning site uses a small image file to help readers determine whether their browser can display the letter "eng".
(e) The "big image file" solution, where text in the proper orthography is turned into image files (.jpg or .pdf), usually for the web. One example for Fula (Pulaar) is at the bottom of the page at http://africandl.org/fuuta_lib/aan_pulaar-eng.html; another is the Universal Declaration of Human Rights in Bambara at http://www.unhchr.ch/udhr/lang/bra.htm. This solution is sometimes also used for languages written in non-Latin alphabets.
(f) The "whatever works easiest" (or "fast & dirty") solution: just use the closest standard Latin letter for each special character (e.g., "e" for the "open e"). This has been done with Bambara. Examples for Hausa include http://www.unhchr.ch/udhr/lang/gej.htm and most of the site http://www.gumel.com/. The advantage is that it gets material out quickly in readable form, without waiting for technical solutions or settling on a substitute scheme. Consequently, it is apparently the method most used for e-mail in African languages (and sometimes even for French text, which some e-mail lists and at least one e-newsletter disseminate without accents). The disadvantage, of course, is that many words can be misread as a result.
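The "fast & dirty" approach is easy to automate, which partly explains its popularity, and a sketch also makes its cost visible. The Python function below is an illustration under stated assumptions: the table of closest Latin letters is hypothetical (for "eng" it uses plain "n", though "ng" would be an equally plausible choice), and accents and tone marks are removed by Unicode decomposition, the same loss that unaccented French e-mail suffers. Spellings that differ only in a special letter or a mark then collapse into one.

```python
import unicodedata

# Hypothetical "closest Latin letter" table for approach (f).
CLOSEST_LATIN = {
    "\u025b": "e", "\u0190": "E",  # open e
    "\u0254": "o", "\u0186": "O",  # open o
    "\u014b": "n", "\u014a": "N",  # eng ("ng" would be another choice)
    "\u0272": "n", "\u019d": "N",  # n with left hook
    "\u0253": "b", "\u0181": "B",  # hooked b
    "\u0257": "d", "\u018a": "D",  # hooked d
    "\u0199": "k", "\u0198": "K",  # hooked k
}


def fold(text: str) -> str:
    """Reduce text to plain Latin letters, discarding special letters and marks."""
    mapped = "".join(CLOSEST_LATIN.get(ch, ch) for ch in text)
    # NFD splits accented letters into a base letter plus combining marks;
    # dropping the combining marks (category "Mn") removes accents and tone marks.
    decomposed = unicodedata.normalize("NFD", mapped)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(fold("\u00e9t\u00e9"))            # ete   (French text without accents)
    print(fold("b\u025b\u0301"))            # be    (open e and its tone mark lost)
    print(fold("s\u025bn") == fold("sen"))  # True  (two spellings now collide)
```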
(g) "Hybrid" solutions mix two or more of the above. For example, the Wolof text at http://www.bok.net/pajol/index.wo.html uses accented characters but not the letter "eng". And two sites with Fula (Pular/Fuuta Jalon) handle the transition from the old transcription to the new in different ways; one of them is the site cited in (b) above.
- For African languages with their own scripts, such as Ge’ez, used for Ethiopian and Eritrean languages, or Tifinagh, used for Tamasheq and other Berber languages, special encoding is necessary. This process is already well advanced for Arabic, and is not unlike what has been, or is being, done for the several non-Latin alphabets used across much of Eurasia. In the case of Ge’ez, several font-and-keyboard packages are apparently available, which raises the problem of mutually incompatible systems and the desirability of some sort of standardization, as explored in http://www.punchdown.org/rvb/email/UniGeez.html. For Tifinagh, some fonts are available. Other less widely used alphabets, such as N’ko and Vai, are virtually non-existent in information technology. In any event, unlike languages using extended Latin alphabets, these languages have no shortcut solutions: either you have the full orthography in text (or in an image file), or you substitute a transcription or transliteration in Latin characters.
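Why there is no shortcut can be shown with a small sketch. Accent stripping, the mechanism behind shortcut (f), relies on a letter decomposing into a plain Latin base letter plus marks; Ge’ez syllables have no such decomposition, so the same operation leaves nothing behind. The Python example below assumes the Unicode "Ethiopic" block (U+1200 to U+137F); the sample, intended to spell the Amharic greeting "selam", and the helper names are illustrative.

```python
import unicodedata

# Unicode "Ethiopic" block: U+1200..U+137F (end of range is exclusive here).
ETHIOPIC_BLOCK = range(0x1200, 0x1380)


def is_ethiopic(ch: str) -> bool:
    """True if the character lies in the basic Ethiopic block."""
    return ord(ch) in ETHIOPIC_BLOCK


def ascii_shortcut(text: str) -> str:
    """Decompose, then keep only the ASCII part: the 'strip the accents' shortcut."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    latin = "caf\u00e9"                       # accented Latin text
    geez = "\u1230\u120b\u121d"               # intended: Amharic "selam"
    print(ascii_shortcut(latin))              # cafe
    print(all(is_ethiopic(c) for c in geez))  # True
    print(repr(ascii_shortcut(geez)))         # '' (nothing survives)
```

The only fallback is therefore a transcription or transliteration, which needs its own syllable-by-syllable mapping rather than a generic operation.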
The technical issues of producing and sharing text in extended-Latin or non-Latin characters are not the only, or even the most significant, impediments to wider use of African languages on computers and the internet. And these problems can be worked around one way or another when people have the will (and the means) to do so. However, as increasing numbers of people in Africa encounter the new technologies, a multiplicity of incompatible and often ad hoc systems for processing their languages in software, together with the various alternative solutions for displaying text with special or non-Latin characters on the internet, will not serve them well.
A shorter version of this article was included in the Open University Development and Environment Society Update (July 2001) <http://www.geocities.com/oudesociety>; it was derived from a note originally posted on the <africa_web_content_owner@YahooGroups.com> list (Feb. 2000).
Don Osborn is Associate Director for Agriculture with Peace Corps in Niger (the views expressed are his own). He is also the founder of the Bisharat Language, Technology & Development Initiative (www.kabissa.org/bisharat).
METROCOMIA SURVEY CLAIMS 0.5 MILLION UGANDANS USE INTERNET?
We reported the above from Uganda’s New Vision newspaper in our last issue. Several sceptical readers challenged the figure. One spoke to Metrocomia: "Their sample size was 200-300; it was not a thorough 'scientific survey', in their opinion. By their own extrapolations they believe it is more like 250,000 people who are using the Internet in Kampala."