Browsing in Foreign Languages
and Non-Latin Scripts

©1999, 2003 Kathleen Hartford
All rights reserved

NOTE: This page is a supplement to American Politics and International Relations on the Internet: The Smart Student's Guide, by Kathleen Hartford (with Bonnie Jeanne Duval assisting), published by McGraw-Hill. The main web site for the book is at http://www.mhhe.com/hartford/.

International though the Internet may seem, it has a good long way to go yet, and one of the longest ways of all is in dealing with foreign languages and scripts other than those we use for plain old English. The words you see in English are rendered with a character set known as ASCII (American Standard Code for Information Interchange). Software that has been developed for an English-language environment will automatically "translate" the strings of "7-bit characters" (combinations of seven 0s and 1s) in text files into the appropriate characters for human consumption. There are only 128 possible combinations of these -- which is a serious limitation when we start getting into:

  1. characters in other languages that use the Latin alphabet, but with diacritical markings over some letters (è, é, ê, and ë, for example);
  2. languages that use non-Latin alphabets, such as Arabic, Cyrillic (Russian and some other Slavic languages), Greek, Hebrew, and Hindi; and
  3. languages that use ideographic writing systems, which may use characters running into the thousands or tens of thousands, such as Korean, Japanese, and Chinese.
Arriving at a way of representing these different characters for computer systems is no longer a major technical problem. The major problem is getting everyone using the characters to agree upon using the same encoding system for representing them![1] Chinese characters, for example, have two main encoding systems, one of which is primarily used in mainland China, and the other of which is mostly used in Taiwan. Russian has several encoding systems.[2]
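
To make the two problems above a little more concrete, here is a minimal Python sketch. It checks whether a string fits within ASCII's 128 codes, and then encodes the same Chinese and Russian words under several encodings to show that the bytes come out differently. The encoding names (gb2312, big5, koi8_r, cp1251, iso8859_5) are not named in the text above; they are Python's codec names for some commonly used encodings, picked here as illustrative assumptions.

```python
# 7 bits give 2**7 = 128 distinct codes, which is all the room ASCII has.
ASCII_CODES = 2 ** 7


def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < ASCII_CODES for ch in text)


def encode_variants(text: str, encodings: list) -> dict:
    """Encode the same text under several encodings so the bytes can be compared."""
    return {name: text.encode(name) for name in encodings}


# Assumed example encodings; the choice is illustrative, not exhaustive.
chinese = encode_variants("中文", ["gb2312", "big5"])
russian = encode_variants("Русский", ["koi8_r", "cp1251", "iso8859_5"])

print(fits_in_ascii("plain old English"))    # True
print(fits_in_ascii("é"))                    # False
print(chinese["gb2312"] != chinese["big5"])  # True: same word, different bytes
for name, raw in russian.items():
    print(name, raw.hex())
```

Because the bytes differ from one encoding to the next, a page written in one encoding and read as another comes out as gibberish, which is why agreement matters so much.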

Now, before you get too worried, let me hasten to assure you that the problem has solutions. For the major Western European languages (type 1), the agreement is consistent enough that the more current versions of the major browsers are equipped to deal with them pretty expeditiously. And there are ways to get around any problems with the others without too many tears. Just hang on, and this page will answer most of your questions or point you towards the pages that will.

The key points for you to keep in mind for foreign-script browsing are basically only two:

  1. your browser needs to be told which encoding system is used for the pages you are viewing, and
  2. it may need to have some special fonts for displaying the characters once it interprets the code for them.
Point one relates to your setting your browser so it can "understand" what it is "seeing," and point two relates to the browser's showing you something that makes sense to you.
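
The two points can be sketched as two separate steps in Python. This is only an illustration under stated assumptions, not how any particular browser works: the function names, the sample text, and the "font" (modeled as nothing more than a set of characters it can draw) are all hypothetical.

```python
import string


def decode_page(raw: bytes, encoding: str) -> str:
    """Point one: turn bytes into characters using the stated encoding."""
    return raw.decode(encoding, errors="replace")


def missing_glyphs(text: str, font_coverage: set) -> set:
    """Point two: collect the characters the (hypothetical) font cannot draw."""
    return {ch for ch in text if not ch.isspace() and ch not in font_coverage}


# Hypothetical font that covers only printable ASCII characters.
ascii_only_font = set(string.printable)

raw = "Привет, world".encode("koi8_r")  # stand-in for a downloaded page
text = decode_page(raw, "koi8_r")       # the decoder is told the right encoding
undrawable = missing_glyphs(text, ascii_only_font)

print(text)
print(sorted(undrawable))  # the Cyrillic letters this font has no shapes for
```

Even with the encoding right, the second step can still fail: the characters are understood correctly, but there is nothing to draw them with until a suitable font is installed.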

If you are using Win2000 or later, along with a later version of the browser software, you can make your system recognize non-Latin scripts by first adjusting the settings in the operating system. (See some nice clear instructions provided by Prof. Barrett McCormick for doing this for Chinese characters, and do the analogous moves for any other script you might want.) Then, if your browser is set to detect the proper script codes, it may automatically adjust its readings as it arrives at non-Latin-alphabet pages. If not, you can easily instruct it to do so (in MS-IE: View | Encoding; in Netscape: View | Character set).
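
One way a program might "detect" the encoding is to look for a charset declaration near the top of the page, and to let the user override it by hand (the rough equivalent of the View | Encoding menu). The Python sketch below assumes the page carries such a declaration in a meta tag; I am not claiming this is how MS-IE or Netscape actually do it, and the function names and the iso-8859-1 fallback are my own assumptions.

```python
import re

# Matches charset=... as it appears in an HTML <meta> declaration.
CHARSET_PATTERN = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def declared_charset(raw: bytes, default: str = "iso-8859-1") -> str:
    """Return the charset named near the top of the page, else the default."""
    match = CHARSET_PATTERN.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else default


def decode_page(raw: bytes, override: str = "") -> str:
    """Decode with a manual override if given, else with the declared charset."""
    encoding = override or declared_charset(raw)
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # Unknown encoding name: fall back rather than fail.
        return raw.decode("iso-8859-1", errors="replace")


# Stand-in for a downloaded page that declares its own encoding.
sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=gb2312"></head><body>'
    + "中文".encode("gb2312")
    + b"</body></html>"
)

print(declared_charset(sample))                       # gb2312
print("中文" in decode_page(sample))                   # True
print("中文" in decode_page(sample, override="big5"))  # False: wrong manual choice
```

The override argument is there because declarations can be missing or wrong, in which case a person has to make the choice.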

Some people actually prefer to use an older OS or browser version. If you are one of those folks, you probably need to work through the steps outlined here. And yes, alas, they are different for the different browsers, so the discussion is divided accordingly, for MS IE and Netscape, for use on Windows 95/98 systems. There may be some slight differences for different releases of these two browsers, so if yours doesn't look like the pictures provided, check the help information provided on the company's support pages.

For other systems: I don't have direct experience with Macs in this area, but you can find some links on this page for informational sites that discuss them. In general, I hear from some friends that the Macintosh is somewhat kinder to the foreign language user than is the PC. If you're using Linux or some other variety of UNIX, I applaud your independence but you're on your own. But then, you probably expect to be, and like it that way.

How do you know when you need to adjust your encoding settings or get new fonts? If you arrive at a web page that is showing bizarre strings of (mostly) alphabetic characters surrounded by strange markings, you are probably on a page for which your browser needs help. The country code in the URL should give you a pretty good idea of which language you're looking at. For some samples, you can take a look at the gibberish representations of Chinese, Japanese, Hebrew, and Hindi.
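
Here is a small Python sketch of both ideas: what the gibberish looks like when bytes are read with the wrong (Western European) encoding, and how the country code in a URL can suggest which encodings to try. The lookup table is hypothetical and far from complete, and the example host www.example.cn is made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical, incomplete table: country code -> encodings worth trying.
LIKELY_ENCODINGS = {
    "cn": ["gb2312"],
    "tw": ["big5"],
    "jp": ["shift_jis", "euc_jp", "iso2022_jp"],
    "ru": ["koi8_r", "cp1251"],
    "il": ["iso8859_8", "cp1255"],
}


def country_code(url: str) -> str:
    """Return the last label of the host name, e.g. 'cn' for www.example.cn."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def candidate_encodings(url: str) -> list:
    """Suggest encodings to try, based only on the URL's country code."""
    return LIKELY_ENCODINGS.get(country_code(url), [])


def as_gibberish(text: str, real_encoding: str) -> str:
    """Show what text looks like when its bytes are misread as Latin-1."""
    return text.encode(real_encoding).decode("latin-1")


print(as_gibberish("中文", "gb2312"))                              # ÖÐÎÄ
print(candidate_encodings("http://www.example.cn/index.html"))    # ['gb2312']
print(candidate_encodings("http://www.mhhe.com/hartford/"))       # []
```

A .com or .org address carries no country code, so the guess only helps some of the time; treat it as a hint, not an answer.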

Microsoft Internet Explorer (version 5.x)

Internet Explorer makes the adjustments pretty easy to handle.
  1. Click on "Help" and then on "Contents and index" to get to the IE Help Window. (If the contents and index options aren't actually displayed in that window initially, click on the "Show" button to make them display on the left side.)
  2. Select the "Index" tab, and type the word "foreign" (without the quotation marks) into the little box provided. You should see an option for "foreign languages" appear. (For this and the next two steps, see the picture linked here.)
  3. Double-click on "foreign languages," and you will get a little box offering you two choices.
  4. Click on the choice for "correctly display Web pages encoded in any language" and then click on the "display" button.
  5. Carefully read through the instructions on that page, and follow them in order to enable the proper reading of encoding by your browser.
  6. Remember that when you go back to English-language pages, you may need to change the encoding back to its original setting.
If you also need new font sets to see foreign-language pages properly, IE will probably prompt you to download them from the Microsoft site the first time you visit a web page in that language. If you choose not to download then, you may not get another prompt. However, it's easy to locate the download page:
  1. Click on the Windows Start button, choose Settings, then select Windows Update. This will take you to Microsoft's main update page.
  2. Click on the "Product Updates" button and wait (for what may seem a long time) while your system decides what it already has. Eventually you should see the main frame displaying a linked list of items, with detailed descriptions.
  3. Scroll almost all the way to the bottom and you will see a listing for "International Language Support." Read the descriptions carefully, and click in the little box next to any (or all) of the support packages you want to download; then click on the "download" button.
Installation on your system should be automatic, except that you may need to agree to restarting to make some of the changes take effect.
 

Netscape Communicator/Navigator (version 4.x)

Setting up Netscape's browser for non-Latin scripts is a bit trickier, largely because you will have to figure out for yourself which fonts to get from where. But this is not an insuperable obstacle.

For the encoding settings, you should follow the step-by-step instructions, with pictures, that Netscape provides on its page "International Users Basics." These are pretty straightforward, and Mac and UNIX users will find that Netscape has not forgotten them either. I strongly urge you to read Netscape's own instructions before you try using the fonts information given below.

For the fonts, you will want to have the font set for the appropriate language/s installed on your system before you try changing settings. Netscape leaves it up to you to find other providers for the fonts. Keep in mind that these are specific to the operating system; you can't use Mac fonts on a PC, or Windows fonts on a UNIX machine. Here I provide a list of languages with some sites for good font sets for download. (Some are free, some are shareware, and some you have to purchase before you can use them.) You do need to keep in mind that because these packages are provided by third parties, they are not always compatible with Netscape's browser, or with all versions of it.

  1. General sites (with or pointing to fonts for many different languages):
  2. East Asian (ideographic) languages:  These present a real challenge, both because of the number of characters involved (font sets generally are very large compared to those for alphabetic languages) and because of the multiplicity of encoding systems. You can use an application that is designed as a "passive" one just for browsing, but if you want to do any text input in the language (e.g., using a search engine), you need a somewhat fancier package.
  3. Arabic
  4. Hebrew
  5. Russian/Cyrillic. I doubt that you really need to get additional fonts for Cyrillic alphabet display on the latest versions of Netscape Communicator; mine seems to be able to load the pages automatically with correct display of the characters.
  6. Other Languages. If you are still looking, you could try the following approaches:
For Chinese, at least, once you've installed the fonts that work with Netscape's browser, they work pretty well with the Opera browser too, and they may work with others.

Some things to watch out for: If you're running MS-IE and another browser at the same time, activating two different font packages at once may cause interference. You may have to turn one off, or close one of the browsers, to prevent crashes or other strange occurrences. Some packages, Netscape warns, may be incompatible with its browser. If you experience frequent system crashes with a particular font package, try uninstalling it and installing a different one. (I have also run into some problems with crashes when I'm using Chinese or Japanese font "platform" applications with other programs running, so if I really need to be sure the system will stay stable, I close all unnecessary applications before using the font applications.) Things still may not operate perfectly. With ostensibly the same operating system and same software applications, I have had no trouble reading some pages on a desktop system that have utterly confounded my laptop system. Think of such situations as growth opportunities....

For More Information

If you want to know even more, try using the language or area-related pages in a good subject guide (see Chapter 2 of the Smart Student's Guide, or pick your own favorite).


Notes:

[1] Some of the preceding information is based on Daniel R. Tobias, "Dan's Web Tips: Characters and Fonts," http://www.softdisk.com/comp/dan/webtips/char.html (1997-1999; last accessed 14 December 1999).
[2] Information on the Russian font encoding comes from the Cornell University Russian web pages; that on the Chinese encoding, I've known about for so long that I can't recall where it came from. You'll find it mentioned on any number of web sites that deal with reading Chinese characters online.

Comments? Suggestions? Please send them to: weblady@pollycyber.com.