Software Internationalization

Anonymous Coward writes "It seems that the folks over at O'Reilly have quietly released a book entitled "Java Internationalization". The website for the book can be reached from the O'Reilly Java site. The authors also have a website dedicated to the book. I'm curious as to how developers are treating software internationalization, not just in Java, but in other programming languages like C#, C++, and Perl. For software designers out there today, are internationalization and localization a forethought or an afterthought? Is Java the only viable language for writing truly multi-lingual applications?"

  • Non sequitur (Score:4, Informative)

    by babbage ( 61057 ) <cdeversNO@SPAMcis.usouthal.edu> on Thursday November 29, 2001 @05:58PM (#2633690) Homepage Journal
    Why should Java be the only internationalizable -- ugh, that's too long -- the only I18N-able language? If you put in a bit of forethought -- abstract all your strings out into language-specific resource files / db entries / whatever, ditto for images -- then a basic level of I18N should be, and in fact is, achievable in any programming language.

    The tricky part has nothing to do with which programming language you prefer; it lies in the overall design of the application itself. Provided that you can come up with acceptable translations of all your output strings -- which itself can be tricky -- that still doesn't really address the more subtle interface issues you might face, depending on what you're trying to do.

    For web design, it could be worthwhile to have drastically different versions of your content for different locales -- IKEA and the BBC are interesting case studies for this. For other applications, one interface framework might be fine, but really this involves a lot of work and study of your target audience, and it goes far beyond (and is much more interesting than) the question of what language you code in.

    That said, Unicode is a truly terrifying thing, and any language that makes it easier to work with is a welcome thing. Java supposedly uses Unicode internally, and if that helps as much as it seems like it should then great. Otherwise, or maybe even still, you face a much gentler slope in going to other Latinish languages (most of the European ones, plus any others that have adopted that alphabet or at least have a cultural standard for & acceptance of it; thus Japanese counts, Chinese doesn't) than in going to anything with a much different character set (Russian, Arabic, Hebrew) and beyond (the CJKV languages -- Chinese, Japanese, Korean, Vietnamese).

    I can deal with the prospect of planning for French, German, Spanish, and Italian versions of work that I do, but having to go beyond that is a very daunting prospect. And, of course, an interesting one... :)

    • Ummm...Vietnamese uses the Roman alphabet, albeit with some wacky diacritical marks. I suppose historically they have used Chinese characters, but I don't think that's what you mean. ;)
      • Yeah, I know -- my fiance was born in Vietnam :)

        Still, that's just the term I'm aware of -- CJKV, referring to those four languages. I'm assuming that it's because those accent marks are used so heavily that it might as well be a different alphabet, albeit one that looks a lot like the Latin alphabet. Something like how the Cyrillic [Russian etc.] alphabet is an evolutionary descendant of Greek & Latin, Greek grew out of Phoenician [? I think that was the ancestor alphabet...?], etc.

        But hey, don't take my word for it, check out the obligatory O'Reilly book [oreilly.com]... :)

        • Re:Non sequitur (Score:2, Interesting)

          by Moridineas ( 213502 )
          Right--sort of. Latin evolved further from Greek, basically. Russian is directly from the Greek (that is, from the Greek without the Latin intermediary) and English is pretty much the Latin script.

          Arabic is also right-to-left, which can be trouble, though there are still a small number of characters (compared to, say, Chinese).

          The problem with an alphabet like Arabic is not only the storage, but the display. Different letters have different shapes, depending on where they occur. Vowels aren't usually written, but it's probably a good idea to store them, so that you can display them or not. And so on. And of course it's right-to-left, when 99% of computer design is oriented left-to-right.

          Scott
      • Ever tried looking at a Vietnamese website? Their written language isn't covered by the standard ASCII or EBCDIC character sets, as far as I have seen. But nearly all libraries and interface drivers pass characters through without mangling them. The biggest problems I have seen are in implementing the UI correctly. Japanese alone has two or three encodings in wide use (one is almost dead now, and EUC is becoming the de facto standard). And even beyond THAT, at least in the non-shrinkwrap software business, is the localization of functionality. Japanese businesses do a lot of things the way they are done in America and in Europe, but they have a lot of their own touches that allow them to provide services to their customers the way their culture dictates. Don't worry too much about translation and encodings; worry more about targeting your customers correctly! That really is what will help determine the success and acceptance of your product, whatever it may be. People throughout the world like things American and European, but they still have their own way of doing things! :)
        • Not really a direct reply to what you're saying, but this is also an interesting issue in coding (as opposed to finished software -- I just mean the code here).

          <anecdote>When I was a beginning computer science student, one of our assignments was to find an implementation of the insertion sort algorithm & reimplement it in C++. The only copy of it I could find was from a Venezuelan web site, so all the variables & functions were named in Spanish. This really confused me at first, trying to decipher both C++ and Spanish at the same time, but it made me realize that everything that I'm writing is "in" English, even if I'm writing English Perl or English C++ or whatever. </anecdote>

          A few weeks ago, I read an article that made pretty much the same point, but in reverse -- an American programmer was asking a Panamanian programmer if it was annoying to have to look at more or less all reference material in American English. The reply was very interesting to me -- he basically replied by asking if you've ever read sheet music, and been annoyed by all the Italian on there: allegro, sotto voce, con vivissimo, etc. Usually this is seen as charming, and just part of the learning experience when you learn to read music -- and not as any kind of cultural imperialism on the part of the Italians.

          It seems like reading & writing software might be on track to be the same way. If for some reason people are still manually writing programs 500 years from now, they might be making software to run in whatever their vernacular is, but maybe the written code will itself use American English notation from the Digital Age, just as musicians today use Italian notation from the Renaissance. I like that idea... :)

          Even if the variables & functions/subs are in the vernacular, the builtins -- for, if, while and so on -- are in English, so the issue isn't really avoidable unless you're using a language that was designed from scratch to use some other [human] reference language. The only really non-American/English one I can think of at the moment is Ruby, and even though it's Japanese it still uses the English conventions. I think this is a sign of how deeply embedded this has become already, and we're only 50 years into the age of computers -- a digital renaissance :)

  • by DeadSea ( 69598 ) on Thursday November 29, 2001 @06:00PM (#2633703) Homepage Journal
    I have a couple of internationalized Java programs I've written. i18n works well in Java, but it's not so much a language feature as a couple of simple libraries.

    To internationalize, put all of your translatable strings, images, and formats into a resource. Your resource can be a text file, or an image, or whatever. Your program must then get all of that information from resources.

    The basic idea is that you have a resource that needs to be translated: resource.txt. Your program determines the locale (say en_US) and then fetches resource.txt.en.US. It then merges that with resource.txt.en and resource.txt. The nice thing is that this works even if you can't list your files (they may be on a web server, for example). Also, because you are merging files, if something is the same for the USA and Great Britain, it can go in resource.txt.en and you don't have to duplicate work in .US and .GB.

    Besides having the libraries to handle this stuff, the only thing that Java makes easy is determining the current locale. But the concept is simple, and with a couple of weeks of work you could have similar libraries up for any language.
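
    For illustration, the lookup-and-merge scheme described in this comment could be sketched roughly as follows. This is not the poster's code: the class name ResourceFallback, the methods candidates and load, and the in-memory map standing in for files on disk or on a web server are all assumptions made for the example. Only the resource.txt / resource.txt.en / resource.txt.en.US naming comes from the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: names and structure are assumptions for illustration.
class ResourceFallback {
    // File names to consult, least specific first:
    // base, base.<language>, base.<language>.<COUNTRY>
    static List<String> candidates(String base, Locale locale) {
        List<String> names = new ArrayList<>();
        names.add(base);
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        if (!lang.isEmpty()) {
            names.add(base + "." + lang);
            if (!country.isEmpty()) {
                names.add(base + "." + lang + "." + country);
            }
        }
        return names;
    }

    // Merge whatever candidates exist; later (more specific) entries win.
    // "files" stands in for the real storage: file name -> key/value pairs.
    static Map<String, String> load(Map<String, Map<String, String>> files,
                                    String base, Locale locale) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String name : candidates(base, locale)) {
            Map<String, String> entries = files.get(name);
            if (entries != null) {   // a missing file is simply skipped
                merged.putAll(entries);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> files = Map.of(
            "resource.txt", Map.of("greeting", "Hello", "color.label", "Color"),
            "resource.txt.en.GB", Map.of("color.label", "Colour"));
        System.out.println(load(files, "resource.txt", Locale.UK));
        System.out.println(load(files, "resource.txt", Locale.US));
    }
}
```

    Because the candidate names are computed rather than discovered, nothing here needs a directory listing, which matches the point about resources living on a web server. Java's own ResourceBundle class does a comparable locale fallback for you.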

  • Just Java? No way. (Score:3, Informative)

    by imrdkl ( 302224 ) on Thursday November 29, 2001 @06:14PM (#2633759) Homepage Journal
    "Is Java the only viable language for writing truly multi-lingual applications?"

    Disclaimer - I've been out of the shrinkwrap game for a while. The following may be out of date.

    Most commercial apps I've worked with have a core in C or C++, then port the UI to whatever is available. Nearly all Adobe apps, for example, have a cross-platform core and a platform-specific localization. Macs get MPW code (and a lot of ResEdit), Windows gets VC + properties files (or whatever Windows gets these days), and Unix gets X (or your favorite UI API).

    Nowadays, string localizations may be done more and more in the specific country, but this is possible in Java as well.

    Sigh, most real client app companies (in my limited experience) which are truly shipping to more than a very few countries are still willing to trade off the pain of porting the UI for the stability of the shared core in C or C++.

    The great part about Java is still that it can be a dynamically configurable server app for many languages and people at the same time. That could be the way of the future, or not. I ain't gonna wax philosophical in Developers.

    • er, I'd love to know where to get Unix versions of those Adobe apps.... (Photoshop and the SVG viewer to name just two...)
      • Like I said, it's been a while. I guess PS never made it; no big money in it, natch. But I have Illustrator for Unix, anyway. There are really two issues here: one is localization to country/language, and the other is cross-platform support. The country/language part is perhaps easier with Java, but the core is the core, and porting the UI is easier than facing the cost (and bloat, alright? I said it, so slay me) of a Java core.
      • Adobe used to sell Photoshop for Unix; it only cost like 3,000 USD.
  • Microsoft (Score:3, Informative)

    by Karma 50 ( 538274 ) on Thursday November 29, 2001 @06:17PM (#2633773) Homepage
    You may not like this advice but Microsoft have lots of information on i18n and l10n [microsoft.com]

    Some is Windows based, obviously, but some isn't.

    It's a good reference if you're not ideologically opposed to visiting some sites [microsoft.com]
  • Most languages that I program in have an i18n module or similar library that lets you abstract i18n issues out of your code and centralize them in one place, so that you can easily add new supported languages to your code base.

    I never really thought about i18n until the last client I worked at, where we had 8 different languages to cater to. Some of them could be handled with the various ISO-8859-x sets, but some of them required Unicode, which caused real issues for us.

    Now I think about i18n while doing the initial design rather than afterwards, because it makes things a whole lot easier -- even if you never really need it.

    If you're doing any kind of Open Source project, I would seriously consider i18n issues at the beginning of the project, since there are lots of people who speak lots of languages out there. It doesn't mean you have to do all the translations -- you just have to be able to support them. If it's important, you'll find volunteers to do translations for you. If not, you can always use babelfish or some other translator and attempt the translation yourself, and someone will get irked enough to "fix" it for you.
  • Is Java the only viable language for writing truly multi-lingual applications?

    As someone who is unfamiliar with Java, let me turn this question around -- what features does Java have that make you think it's such a great choice for internationalization? Are they unavailable in other languages in common use?

  • Java not only handles Unicode but also does code page translation.

    For example, I had to port an ASP app that used an Access database with Big5 Chinese data. The web pages it output were also Big5. I used Java to convert the data to UTF-8 and loaded it into Postgres. A servlet grabs the UTF-8 data from the database, Java stores the data as UTF-16 internally, and the servlet produces either Big5 or UTF-8 web pages, depending on the user's preference. It only took a couple of lines of code to make this happen, because Java can convert from its internal Unicode format to other code pages. I believe that the same applies to the encodings used for other languages (e.g., KOI8 for Cyrillic).
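
    As a rough sketch of the kind of conversion described above (not the poster's actual code), the standard String and Charset classes are enough. The class name Transcode and the method transcode are made up for this example. Big5 is not among the charsets every Java runtime is required to ship, so the demo checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the names are assumptions for illustration.
class Transcode {
    // Decode bytes from one charset into a String (UTF-16 in memory),
    // then encode that String into the target charset.
    // Note: new String(bytes, charset) silently substitutes a replacement
    // character for malformed input instead of throwing.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // Only the standard charsets (UTF-8, UTF-16, ISO-8859-1, US-ASCII)
        // are guaranteed, so check for Big5 before using it.
        if (Charset.isSupported("Big5")) {
            Charset big5 = Charset.forName("Big5");
            byte[] big5Bytes = "\u4E2D\u6587".getBytes(big5);
            byte[] utf8Bytes = transcode(big5Bytes, big5, StandardCharsets.UTF_8);
            System.out.println(big5Bytes.length + " Big5 bytes -> "
                + utf8Bytes.length + " UTF-8 bytes");
        } else {
            System.out.println("Big5 is not available in this runtime");
        }
    }
}
```

    The same two steps work in the other direction, which is how a servlet could hand out either Big5 or UTF-8 from one internal representation.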

    Unicode is definitely the standard of the future, and it also allows for easier transfer of data between applications that can't handle CJKV multibyte character sets.

    Of course, I don't know Chinese, which made this a fun project :)
    • This is a nice feature of the Java API, but you can achieve the same result using the UNIX (tm) libiconv implementation. If your UNIX (tm) doesn't have one (or you use Linux, BSD, etc), then there is a Free Version [gnu.org]. It will do all the conversions for you, for many character sets. Most current *nix distributions include this as a package.
  • xml / xsl ? (Score:3, Insightful)

    by spike666 ( 170947 ) on Thursday November 29, 2001 @11:35PM (#2634788) Journal
    I would imagine that by designing and implementing an application that allows you to use externalized strings, you could easily switch presentation languages. XML/XSL is just the current sexy way to do that easily.

    Apple's Mac OS X does that - they have localizations by language that install with every OS X Carbon or Cocoa application. (Though they don't use XML for the actual string table - most likely they are using some sort of hash table for speedier access.)

    There are tonnes of ways to do it, and none of them require you to go with one language or another. Java is just a nice way and the personal preference of many.
  • I18n (Score:2, Interesting)

    by Anonymous Coward
    There is so much more to i18n than translating text messages. The biggest problem is character encoding.

    I work in Japan, only work in Japanese and we only have products for the Japanese market. Yet, most of our time is spent dealing with i18n issues - converting to and from different encodings (Shift-JIS for MS and EUC-JP for Linux). There are several other encodings used in other areas as well.

    The reason Java is so good in this environment is that the internal encoding is all Unicode. Therefore we just have to translate encodings at input and output and everything else works with very few problems. (Having said that, even though Java support for multibyte character sets is very good, there are still a few gotchas to watch out for.) The whole API and 3rd party software is then available for use without limitation. I don't think this can be said for many other programming environments.

    Slightly off-topic, but the uptake of Linux in places such as China and Japan will be greatly accelerated if flagship software, such as Nautilus, would work in a multibyte character environment at version 1.0.
    • I agree with this assessment. The reason that Java is so nice for I18N is that the internal representation is Unicode. It makes it so easy to have output in UTF-8 (currently one of the most popular Unicode encodings). This is really nice for web browsers, because the current Netscape [netscape.com], Mozilla [mozilla.org], and IE [microsoft.com] all have very good UTF-8 support. Although I put down Microsoft all the time, I give them credit for a very good implementation of UTF-8 and font support for multi-lingual applications. The Mozilla team is right on their heels, however, to the point of now supporting Arabic glyph shaping. If you don't know what that is, Arabic text changes the shape of the characters depending on the context. Therefore, you can't use a simple font encoding where code 0xblahblahblah uses font glyph 0xblahblahblah. You have to analyze the data to produce a proper representation.
    • Re:I18n (Score:2, Informative)

      by BdosError ( 261714 )
      And another issue that Java deals with is text direction. At the simplest level, this can just be left-to-right or right-to-left, but Java also handles mixing different languages and thus text directions. Think about the hassles of embedding r-t-l text in l-t-r text, e.g. a Hebrew quote/name inside English text. Especially, consider text selection as you select from the English text into the Hebrew! Java's I18N package can handle this. There was a good discussion of this a couple of years back in Java Report [javareport.com]. This article [ibm.com] at IBM's developerWorks looks to be similar to what I remember, and discusses the Arabic lettering issues.
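
      To make the mixed-direction case concrete, here is a small sketch using the standard java.text.Bidi class, which splits a string into directional runs. The class name BidiRuns and the describe helper are invented for this example, and the sample string (an English sentence with a Hebrew word in the middle) is just an assumed test input. It shows only the run analysis, not selection or rendering.

```java
import java.text.Bidi;

// Hypothetical helper: the names are assumptions for illustration.
class BidiRuns {
    // Describe each directional run as "start-limit:level".
    // Even levels are left-to-right, odd levels are right-to-left.
    static String describe(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(bidi.getRunStart(i)).append('-')
              .append(bidi.getRunLimit(i)).append(':')
              .append(bidi.getRunLevel(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Hello", a four-letter Hebrew word, then "world"
        String mixed = "Hello \u05E9\u05DC\u05D5\u05DD world";
        System.out.println(describe(mixed));
        System.out.println("mixed directions: "
            + new Bidi(mixed, Bidi.DIRECTION_LEFT_TO_RIGHT).isMixed());
    }
}
```

      A text component can use run boundaries like these to decide which way the caret and the selection highlight should move inside each stretch of text.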
  • Part of the problem is that there is no agreed-upon implementation. The POSIX group [ieee.org] could not choose between X/Open's [xopen.org] catgets implementation and GNU's [gnu.org] gettext, and as such, left it out of the standard entirely. Another problem with both toolsets is that neither presents a truly extensible strings database format. If you need to add additional storage fields to the strings database for a language other than C, you're out of luck if you plan to use the library and tools on the same files. Very short-sighted IMHO.
  • Java has outstanding support for writing internationalized applications. It's not so much the language itself as the standard libraries.

    I've done some work in this area. Here's what Java supplies:

    • Unicode support. It's everywhere. Right down to the String constructor [sun.com]. The standard I/O classes have built-in Unicode support too, and they make a clear distinction between "reading bytes from a file" and "reading characters". I don't know any other language whose standard library is so hardcore about this.
    • Locale-specific formatting of dates [sun.com] and numbers [sun.com]. (That's right - "multi-lingual" is only part of the problem.)
    • Functions for handling time zones [sun.com], because you'll need to handle time zone differences.
    • Support for looking up localized strings. In Java you use ResourceBundles [sun.com].
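
    A brief sketch of the formatting and time zone items in the list above. It uses NumberFormat, which the comment refers to, together with the newer java.time classes, which postdate this discussion and are an assumption of the example. The class name LocaleFormatting, its two methods, and the sample values are likewise invented for illustration.

```java
import java.text.NumberFormat;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Locale;

// Hypothetical helper: the names are assumptions for illustration.
class LocaleFormatting {
    // Format a number using the grouping and decimal separators of a locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Interpret a wall-clock time in one zone and express the same instant in another.
    static ZonedDateTime convert(LocalDateTime local, String fromZone, String toZone) {
        return local.atZone(ZoneId.of(fromZone))
                    .withZoneSameInstant(ZoneId.of(toZone));
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.5, Locale.US));
        System.out.println(formatNumber(1234567.5, Locale.GERMANY));
        LocalDateTime posted = LocalDateTime.of(2001, 11, 29, 17, 58);
        System.out.println(convert(posted, "America/New_York", "Asia/Tokyo"));
    }
}
```

    The same value comes out with different separators per locale, and the same instant lands on a different calendar day in Tokyo, which is why neither can be hard-coded into translated strings.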

    "Library design is language design." Believe it. All languages have strengths and weaknesses. I18N support is a major strength for Java.

    Java.sun.com has an I18N tutorial [sun.com].