Browsing in Foreign Languages
and Non-Latin Scripts
©1999, 2003 Kathleen Hartford
All rights reserved
NOTE: This page is a supplement to American Politics
and International Relations on the Internet: The Smart Student's Guide,
by Kathleen Hartford (with Bonnie Jeanne Duval assisting), published by
McGraw-Hill. The main web site for the book is at http://www.mhhe.com/hartford/.
International though the Internet may seem, it has a good long way to
go yet, and one of the longest ways of all is in dealing with foreign languages
and scripts other than those we use for plain old English. The words you
see in English are rendered with a character set known as ASCII (American
Standard Code for Information Interchange). Software that has been developed
for an English-language environment will automatically "translate" the
strings of "7-bit characters" (combinations of seven 0s and 1s) in text
files into the appropriate characters for human consumption. There are
only 128 possible combinations of these -- which is a serious limitation
when we start getting into:
-
characters in other languages that use the Latin alphabet, but with diacritical
markings over some letters (è, é, ê, and ë, for
example);
-
languages that use non-Latin alphabets, such as Arabic, Cyrillic (Russian
and some other Slavic languages), Greek, Hebrew, and Hindi; and
-
languages whose writing systems may use characters running into the
thousands or tens of thousands, such as Chinese and Japanese (ideographic)
and Korean (which mixes an alphabetic script with borrowed Chinese characters).
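To make the 128-character ceiling concrete, here is a minimal sketch in Python (my choice of language for illustration; the helper name `fits_in_ascii` is hypothetical). It simply asks whether a character has a 7-bit ASCII code at all.

```python
# 7 bits give 2**7 distinct patterns, so ASCII tops out at 128 characters.
ASCII_SIZE = 2 ** 7


def fits_in_ascii(ch: str) -> bool:
    """Return True if the character has a 7-bit ASCII code."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Plain "e", then one sample from each of the three categories above.
    for ch in ["e", "é", "Я", "中"]:
        # ascii() keeps the printout safe on terminals limited to ASCII.
        print(ascii(ch), fits_in_ascii(ch))
```

Only the plain "e" passes; the accented letter, the Cyrillic letter, and the Chinese character all fall outside the 128 available codes.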
Arriving at a way of representing these different characters for computer
systems is no longer a major technical problem. The major problem is getting
everyone using the characters to agree upon using the same encoding system
for representing them![1] Chinese characters, for
example, have two main encoding systems: one used primarily in mainland
China, the other mostly in Taiwan.
Russian has several encoding systems.[2]
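A short Python sketch may help show what "different encoding systems" means in practice. The codec names are assumptions on my part, since the text above does not name the encodings: I use GB2312 and Big5 for Chinese, and KOI8-R, Windows-1251, and ISO 8859-5 for Russian, all of which ship with Python. The same text comes out as different bytes under each one.

```python
# Codec names are Python's own; the page itself does not name the encodings.
chinese = "中文"
mainland = chinese.encode("gb2312")  # GB2312: common in mainland China
taiwan = chinese.encode("big5")      # Big5: common in Taiwan

russian = "Привет"
russian_variants = {
    name: russian.encode(name)
    for name in ("koi8_r", "cp1251", "iso8859_5")
}

if __name__ == "__main__":
    # Print the raw bytes in hexadecimal so the differences are visible.
    print("gb2312:", mainland.hex())
    print("big5:  ", taiwan.hex())
    for name, raw in russian_variants.items():
        print(f"{name}:", raw.hex())
```

A browser that receives one of these byte strings has no way of showing the right characters unless it knows which system produced it.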
Now, before you get too worried, let me hasten to assure you that the
problem has solutions. For the major Western European languages (the first
category above), agreement is consistent enough that current versions of
the major browsers handle them pretty expeditiously. There are also ways
to get around problems with the other languages without too many tears.
Just hang on, and this page will answer most of your questions
or point you towards the pages that will.
The key points for you to keep in mind for foreign-script browsing are
basically only two:
-
your browser needs to be told which encoding system is used for the pages
you are viewing, and
-
it may need to have some special fonts for displaying the characters once
it interprets the code for them.
Point one relates to your setting your browser so it can "understand" what
it is "seeing," and point two relates to the browser's showing you
something that makes sense to you.
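Point one can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the page is imagined to be written in GB2312, and the "wrong" guess is a Western European encoding (Latin-1). Point two, the fonts, is a matter for your operating system and cannot be shown in code like this.

```python
# Bytes as a server might send them for a page written in GB2312
# (the encoding choice is an assumption made for this illustration).
raw = "中文".encode("gb2312")

# Point one done right: the browser is told the page is GB2312.
understood = raw.decode("gb2312")

# Point one done wrong: the browser assumes a Western European encoding.
# Latin-1 maps every byte to some character, so no error is raised;
# the result is simply the wrong characters.
misread = raw.decode("latin-1")

if __name__ == "__main__":
    print(ascii(understood))
    print(ascii(misread))
```

The misread version is a run of accented Latin letters, one per byte, which is typical of the gibberish you get when the encoding setting does not match the page.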
If you are using Windows 2000 or later, along with a recent version
of the browser software, you can make your system recognize non-Latin scripts
by first adjusting the settings in the operating system. (See the nice
clear instructions provided by Prof. Barrett
McCormick for doing this for Chinese characters, and make the analogous
moves for any other script you might want.) Then, if your browser is set
to detect the encoding automatically, it may adjust itself
as it arrives at non-Latin-alphabet pages. If not, you can easily instruct
it to do so (in MS-IE: View | Encoding; in Netscape: View | Character set).
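One common way a page announces its encoding is a `charset=` declaration near the top of the HTML. The following Python sketch shows roughly how such a declaration could be read. It is an assumption-laden illustration, not a description of what any particular browser does: the function name `declared_encoding`, the 2048-byte search window, and the Latin-1 fallback are all my own choices.

```python
import codecs
import re

# Looks for charset=... near the top of a page, for example inside
# <meta http-equiv="Content-Type" content="text/html; charset=big5">.
_CHARSET_RE = re.compile(
    rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:\-]+)", re.IGNORECASE
)


def declared_encoding(page: bytes, default: str = "latin-1") -> str:
    """Return the encoding a page declares, or `default` if none is usable."""
    match = _CHARSET_RE.search(page[:2048])
    if not match:
        return default
    name = match.group(1).decode("ascii")
    try:
        # codecs.lookup() raises LookupError for names Python does not know.
        return codecs.lookup(name).name
    except LookupError:
        return default
```

When a page carries no declaration, or a wrong one, automatic detection has nothing reliable to go on, which is when the manual menu choice mentioned above becomes necessary.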
Some people actually prefer to use an older OS or browser version. If
you are one of those folks, you probably need to work through the steps
outlined here. And yes, alas, they are different for the different browsers,
so the discussion is divided accordingly, for MS IE
and Netscape, for use on Windows 95/98 systems.
There may be some slight differences for different releases of these two
browsers, so if yours doesn't look like the pictures provided, check the
help information provided on the company's support pages.
For other systems: I don't have direct experience with Macs
in this area, but you can find some links on this page for informational
sites that discuss them. In general, I hear from some friends that the
Macintosh is somewhat kinder to the foreign language user than is the PC.
If you're using Linux or some other variety of UNIX, I applaud your independence
but you're on your own. But then, you probably expect to be, and like it
that way.
How do you know when you need to adjust your encoding settings or get new
fonts? If you arrive at a web page that is showing bizarre strings of (mostly)
alphabetic characters surrounded by strange markings, you are probably
on a page for which your browser needs help. The country code in the URL
should give you a pretty good idea of which language you're looking at.
For some samples, you can take a look at the gibberish representations
of Chinese, Japanese, Hebrew, and Hindi.
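The country-code hint can be turned into a rough lookup. The Python sketch below is purely illustrative: the table `LIKELY_ENCODINGS` and the function `likely_encodings` are hypothetical, the pairings are a few common legacy encodings I am assuming for each country code, and the list is far from complete.

```python
from urllib.parse import urlparse

# Hypothetical lookup table: country code -> encodings worth trying first.
# The pairings are illustrative, not a complete or authoritative list.
LIKELY_ENCODINGS = {
    "cn": ["gb2312"],
    "tw": ["big5"],
    "jp": ["shift_jis", "euc_jp", "iso2022_jp"],
    "il": ["iso8859_8", "cp1255"],
    "ru": ["koi8_r", "cp1251"],
}


def likely_encodings(url: str) -> list:
    """Return encodings to try for a URL, judged only by its country code."""
    host = urlparse(url).hostname or ""
    country = host.rsplit(".", 1)[-1].lower()
    return LIKELY_ENCODINGS.get(country, [])
```

A URL ending in a generic domain such as .com gives no hint at all, so treat the result as a starting point for the encoding menu rather than an answer.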
Microsoft Internet Explorer (version 5.x)
Internet Explorer makes the adjustments pretty easy to handle.
-
Click on "Help" and then on "Contents and index" to get to the IE Help
Window. (If the contents and index options aren't actually displayed in
that window initially, click on the "Show"
button to make them display on the left side.)
-
Select the "Index" tab, and type the word "foreign" (without the quotation
marks) into the little box provided. You should see an option for "foreign
languages" appear. (For this and the next
two steps, see the picture linked here.)
-
Double-click on "foreign languages," and you will get a little box offering
you two choices.
-
Click on the choice for "correctly display Web pages encoded in any language"
and then click on the "display" button.
-
Carefully read through the instructions on that page, and follow them in
order to enable the proper reading of encoding by your browser.
-
Remember that when you go back to English-language pages, you may need
to change the encoding back to its original setting.
If you also need new font sets to see foreign-language pages properly,
IE will probably prompt you to download them from the Microsoft site the
first time you visit a web page in that language. If you choose not to
download then, you may not get another prompt. However, it's easy to locate
the download page:
-
Click on the Windows Start button, choose Settings, then select Windows
Update. This will take you to Microsoft's main update page.
-
Click on the "Product Updates" button and wait
(for what may seem a long time) while your system decides what it already
has. Eventually you should see the main frame displaying a linked list
of items, with detailed descriptions.
-
Scroll almost all the way to the bottom and you will see a listing for
"International Language Support." Read the descriptions carefully, and
click in the little box next to any (or all) of the support packages you
want to download; then click on the "download" button.
Installation on your system should be automatic, except that you may need
to agree to restarting to make some of the changes take effect.
Netscape Communicator/Navigator (version 4.x)
Setting up Netscape's browser for non-Latin scripts is a bit trickier,
largely because you will have to figure out for yourself which fonts to
get from where. But this is not an insuperable obstacle.
For the encoding settings, you should follow the step-by-step instructions,
with pictures, that Netscape provides on its page "International
Users Basics." These are pretty straightforward, and Mac and UNIX users
will find that Netscape has not forgotten them either. I
strongly urge you to read Netscape's own instructions before you try using
the fonts information given below.
For the fonts, you will want to have the font set for the appropriate
language(s) installed on your system before you try changing settings. Netscape
leaves it up to you to find other providers for the fonts. Keep in mind
that these are specific to the operating system; you can't use Mac fonts
on a PC, or Windows fonts on a UNIX machine. Here I provide a list of languages
with some sites for good font sets for download. (Some are free, some are
shareware, and some you have to purchase before you can use them.)
Because these packages are provided by third
parties, they are not always compatible with Netscape's browser, or with
all versions of it.
-
General sites (with or pointing to fonts for many different languages):
-
Yamada Language Center
at University of Oregon has nicely arranged its fonts archive for your
convenience, indicating with icons the operating system for which each
package is designed. You will find uneven results here, however; the "Chinese"
fonts offered are not for Chinese characters at all. The resources listed
for some other languages are quite numerous, though, so browse around here
to see whether you find anything of use for your needs.
-
Fonts in Cyberspace
-
Font Pages (by Luc
Devroye)
-
East Asian (ideographic) languages: These present a real challenge,
both because of the number of characters involved (font sets generally
are very large compared to those for alphabetic languages) and because
of the multiplicity of encoding systems. You can use an application that
is designed as a "passive" one just for browsing, but if you want to do
any text input in the language (e.g., using a search engine), you need
a somewhat fancier package.
-
Arabic
-
Hebrew
-
Russian/Cyrillic. I doubt that you really need to get additional
fonts for Cyrillic alphabet display on the latest versions of Netscape
Communicator; mine seems to be able to load the pages automatically with
correct display of the characters.
-
Other Languages. If you are still looking, you could try the following
approaches:
-
Use AltaVista, Fast, or Northern Light to locate likely font spots, using
a search string like the following:
+font +browser +language-name
where "language-name" is the name of the language you're looking for.
-
Browse the site of a university language department or a university library
at a school that you know has strong offerings in the geographic region
whose language you want to be able to read online. They may have a page
or two providing good tips and further links.
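The search-string pattern suggested above (+font +browser +language-name) is simple enough to generate for any language. Here is a tiny Python sketch; the function name `font_search_string` is hypothetical, and lower-casing the language name is my own assumption rather than anything the search engines require.

```python
def font_search_string(language_name: str) -> str:
    """Build the '+font +browser +language-name' query for one language."""
    # Lower-casing and trimming are assumptions made for tidiness only.
    term = language_name.strip().lower()
    return f"+font +browser +{term}"


if __name__ == "__main__":
    for language in ["Hebrew", "Arabic", "Hindi"]:
        print(font_search_string(language))
```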
For Chinese, at least, once you've installed the fonts that work with Netscape's
browser, they work pretty well with the Opera browser too, and they
may work with others.
Some things to watch out for: If you're
using MS-IE and another browser at the same time, you may get interference
by activating two different font packages at the same time. You may have
to turn one off, or close one of the browsers, to prevent crashes or other
strange occurrences. Some packages, Netscape warns, may be incompatible
with its browser. If you experience frequent system crashes with a particular
font package, try uninstalling it and installing a different one. (I have
also run into some problems with crashes when I'm using Chinese or Japanese
font "platform" applications with other programs running, so if I really
need to be sure the system will stay stable, I close all unnecessary applications
before using the font applications.) Things still may not operate perfectly.
With ostensibly the same operating system and same software applications,
I have had no trouble reading some pages on a desktop system that have
utterly confounded my laptop system. Think of such situations as growth
opportunities....
For More Information
If you want to know even more, try using the language or area-related pages
in a good subject guide (see Chapter 2 of the Smart
Student's Guide, or pick your own favorite).
Notes:
[1] Some of the preceding information is based on Daniel
R. Tobias, "Dan's Web Tips: Characters and Fonts," http://www.softdisk.com/comp/dan/webtips/char.html
(1997-1999; last accessed 14 December 1999)
[2] Information on the Russian font encoding comes
from the Cornell
University Russian web pages; that on the Chinese encoding, I've known
about for so long that I can't recall where it came from. You'll find it
mentioned on any number of web sites that deal with reading Chinese characters
online.
Comments? Suggestions? Please send them to: weblady@pollycyber.com.