Mr. Pike, Tear Down This ASCII Wall!
theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"
Yes, Unicode is "the new black" (Score:2, Informative)
Yes, it's the next fad that just _everyone_ has to wear this season. Within 5 years, it will be something else, and given the ability of major vendors like Microsoft to get Unicode _wrong_, it's not stable for mission-critical applications. If you want your code to remain parseable, cross-platform compatible, and stable in both large and small tools, write it in flat, 7-bit ASCII. You also avoid significant decoding, localization, and most especially _testing_ costs for multiple regions.
Look up "microsoft unicode error" on Google for hundreds if not thousands of examples. ASCII for code is like flat text for email. It assures that you're not simply publishing coding spam, and actually wrote what you meant.
We've tried this before (Score:5, Informative)
Everyone who tried to do something useful in APL, put up your hand.
Re:The thing with ASCII (Score:4, Informative)
No we don't (Score:5, Informative)
Because I don't want to have to own a 2000 key keyboard, or alternatively learn a shitload of special key combos to produce all sorts of symbols. The usefulness of ASCII, and just of the English/Germanic/Latin character set and Arabic numerals in general is that it is fairly small. You don't need many individual glyphs to represent what you are talking about. A normal 101 key keyboard is enough to type it out and have enough extra keys for controls that we need.
To see the real absurdity of it, apply the same logic to the numerals of the character set. Let's stop using Arabic numerals; let's use something richer. Let's have special symbols to denote commonly used values (like 20, 25, 100, 1000). Let's have different number sets for different bases so that a 3 can be told what base it's in just by the way it looks! ...
Or maybe not. Maybe we should stick with the Arabic numerals. There's a reason they are so widely used: the Indians/Arabs got it right. The system is simple, direct, and can represent any number we need easily. Combining the digits with a simple character indicator, like an H suffix to indicate hex, works just fine for other bases as well.
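A quick sketch of the same idea in Go (Go happens to use prefixes rather than an H suffix, but the principle is identical; the variable names are just for illustration):

    package main

    import "fmt"

    func main() {
    	// The same ten ASCII digits (plus a-f) cover every base; a short
    	// prefix tells both the reader and the compiler which base is meant.
    	dec := 255
    	hex := 0xFF       // hexadecimal
    	bin := 0b11111111 // binary (Go 1.13 or later)
    	oct := 0o377      // octal (Go 1.13 or later)
    	fmt.Println(dec, hex, bin, oct) // prints: 255 255 255 255
    }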
You might notice that even languages that don't use the English/ASCII character set tend to use keyboards that do. Japanese and Chinese users enter transliterated expressions that the computer then interprets as glyphs. It doesn't have to be that way; they could use different keyboards, some of them rather large depending on the character set being used, but they don't. It is easier and more convenient to just use the smaller, widely used character set.
Now none of this means that you can't use Unicode in code, that strings can't be stored in it, or that programs can't display it. Indeed, most programs these days handle it just fine. But to start coding in it? To design languages around interpreting it? To make things more complex for their own sake? Why?
I am just trying to figure out what he thinks would be gained here. Remember, too, that the programming languages and the compilers would need to be changed at a low level. Compilers do not tolerate ambiguity: if a construct is going to change from a string of ASCII characters to a single Unicode one, that has to be changed in the compiler, made clear in the language spec, and so on.
What about Sun's Fortress language (Score:5, Informative)
Re:Learn2code (Score:3, Informative)
Re:The thing with ASCII (Score:5, Informative)
Japanese is typed using a more-or-less standard QWERTY keyboard.
Tediously.
Re:The thing with ASCII (Score:4, Informative)
Not something as simple as writing ASCII by a long shot.
Re:The thing with ASCII (Score:3, Informative)
I recommend that everyone GOAT SEe the parent video ASAP
Re:Project Gutenberg (Score:5, Informative)
This is untrue.
First off, Simplified and Traditional characters are separated in Unicode.
Second off, Cyrillic characters and Latin characters have always been considered two different scripts, while Chinese logographs are considered to be the same script, used in different contexts.
See http://unicode.org/notes/tn26/ [unicode.org].
In any event, it would make good sense for programming environments to be able to handle Unicode source.
Re:The thing with ASCII (Score:5, Informative)
Re:Would it be less tedious to have 10,000+ keys? (Score:3, Informative)
Re:The thing with ASCII (Score:3, Informative)
If you want to test and/or frustrate a newbie, replace one of those in their program and see how long it takes them to fix it.
The first time I ran into something like that it took me a good while. I ended up comparing hex dumps to find it. I should have just retyped the suspect code sections from scratch instead, but I was determined to get to the bottom of it and find out exactly why it crashed.
It certainly turned me back into an ASCII fan.
Re:The thing with ASCII (Score:3, Informative)
Japanese characters are mostly sound-based rather than meaning-based, though a single Japanese character will generally map to two latin characters.
I assume you're referring to the katakana, here... So, yes, using a phonetic set of approximately 50 characters, your writing will be sound-based.
Unfortunately, you are also underinformed, as there are actually three writing systems in use in Japanese writing [wikipedia.org].
Part of the problem, here, would be that the same (spoken) word can refer to many different concepts, and the (non-phonetic) written language reflects the meanings, rather than the pronunciation. For example:
Some Japanese words are written with different kanji depending on the specific usage of the word—for instance, the word naosu (to fix, or to cure) is written as "" when it refers to curing a person, and "" when it refers to fixing an object.
Bah, slashdot apparently doesn't like my attempt to use the characters. Whatever, the quoted text is from the linked article.
Re:limiting? (Score:3, Informative)
the chinese have problems to learn his own language, because have all that signs, it make it unncesary complex.
26 letter lets you write anything, you dont need more letters, really. ask any novelist.
also, programming languages are something international, and not all keyboards have all keys, even keys like { or } are not on all keyboards, so tryiing to use funny characters like ñ would make programming for some people really hard.
all in all, this is not a very smart idea , imho
Judging by your post, it appears that you have problems learning your own language. It certainly appears that simple spelling, capitalization, punctuation and correct grammar in the English language are apparently beyond your abilities.
Re:Yes, Unicode is "the new black" (Score:0, Informative)
Windows Find/Search cannot find matches in Unicode text files, surely one of the simplest file formats of all, even though the command line FIND tool can (unless you install/enable Windows Indexing Service which then cripples the system with its stupid default indexing policies). This has been broken since Windows NT 4.0.
You cannot search in files at all in Windows 7's Explorer without indexing enabled: it's 100% broken. All it shows is how much Microsoft cared about fixing the non-default configuration, which is to say, they didn't care. You've only shown the responsible MS team's ineptitude, not some greater impossibility of proper Unicode handling.
Article author didn't read spec (Score:2, Informative)
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
The Go spec [golang.org] is defined in terms of Unicode, and specifically gives non-ASCII characters as example identifiers. Go source code is defined to be UTF-8.
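A minimal sketch of what that allows in practice (the variable names here are made up for illustration):

    package main

    import (
    	"fmt"
    	"math"
    )

    func main() {
    	// Go source is UTF-8, and identifiers may use any Unicode letters,
    	// so Greek names compile with the stock toolchain.
    	π := math.Pi
    	Ω := 2 * π
    	fmt.Println("Ω =", Ω) // Unicode in string literals works as well
    	// Note, though, that as far as I can tell the article's Ω₀ would
    	// still be rejected as an identifier: U+2080 is a subscript digit,
    	// not a Unicode letter or decimal digit, which is all the
    	// identifier grammar admits.
    }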
Re:Go Cry at the Romans (Score:3, Informative)
I've read that story before, and it's very neat. It's just too bad there's so little truth to it. Here's an example where it really falls apart: "As the railroads were built they were built using the same standard width of all the wagons since the tools had been standardized to that width." Anybody with casual knowledge of railway history should remember the crazy profusion of different -- widely varying -- gauge standards in the early days.
Re:The thing with ASCII (Score:2, Informative)
Millions upon millions of Japanese (and some non-Japanese, like myself) have found the IMEs to be more than satisfactorily efficient and easy to use. Not only that, but they sometimes have predictive input as well (especially on cell phones), which makes typing in Japanese even faster and easier.
French and English are quite different (Score:2, Informative)
I worked for a Canada-based company and one of the magazines in the break room was Forces Quebec. It was something about packaging technology and had the articles written in both English and French, as is standard in Canada.
The bilingual nature isn't what caught my eye, though. What caught my eye was the fact that the typeface for the French articles was just plain smaller in order to fit more text in a certain space. It looked to me like the same page real estate was dedicated to each language, but the typeface for the French text was set to a smaller point size with tight kerning and spacing.
No wonder French people talk so fast. They have to!
In fact, when I mentioned the same thing to one of my coworkers, a Mexico native, he wasn't surprised at all. He said the same is true for Spanish as well.
When he told me that, I remembered Cheech Marin's "Born in East L.A." where he sings about being deported to Mexico despite being a US citizen "Next thing I know I'm in a foreign land. People talkin so fast I could not understand."
Re:The thing with ASCII (Score:2, Informative)
You know, this was tried. It was called APL. It sucked, and I mean, like the environment outside the ISS.
We like our set of alphanumerics because it's easy to recognize, easy to compound into much more complex entities that are *also* easy to recognize, and it leverages an entire lifetime of familiarity with text.
So please. Go away. Go away yelling about glyphs, or go away quietly, but just... go away.
Re:The thing with ASCII (Score:2, Informative)
I don't know the kanji for "bara", but I've definitely seen "kani" any number of times---not in texts, but definitely on signs and labels.
"Arigatou" is certainly not something you'd see in kanji in texts, but I've been mailed with the kanji any number of times (and you'll certainly see it in the form "arigatai"). I doubt there's a junior high school graduate in this country who doesn't know the kanji for that.
Re:Would it be less tedious to have 10,000+ keys? (Score:4, Informative)
It'll be interesting when you go to write some Perl code with your pen+tablet. The text recognition assumes you're writing in a natural language, so braces and punctuation are often tedious to get right. Write some basic Perl (with hashes, arrays, and some scalars) on your local handwriting-recognizing device, and let us know how amusing it is.
Re:The thing with ASCII (Score:4, Informative)
You know, this was tried. It was called APL. It sucked, and I mean, like the environment outside the ISS.
I thought it sucked. You thought it sucked. A load of guys from the maths department who wanted to do quick mathematical computations loved it. APL [wikipedia.org] was not meaningless symbols to everyone.
Bob Bemer birthed backslash (Score:1, Informative)
Wikipedia claims that ASCII grew the backslash [\] specifically to support ALGOL's /\ and \/ Boolean operators. No source is provided for the claim. (From TFA.)
Here's one of the two sources that Wikipedia cites, straight from the inventor of the backslash: HOW ASCII GOT ITS BACKSLASH [bobbemer.com] citing his book [ R.W.Bemer, "A view of the history of the ISO character code", Honeywell Computer J. 6, No. 4, 274-286, 1972 ]
"I had called a joint meeting of IBM, SHARE, and GUIDE, to regularize the IBM 6-bit set to become the standard BCD Interchange Code [76]. Frequency studies of symbol occurrence had been prepared, particularly from ALGOL programs. The meeting of 1961 July 6 produced general agreement on a basic 60-64-character set, which included the two square brackets and the reverse slant, which was chosen in conjunction with "/" to yield 2-character representations for the AND and OR of early ALGOL. This is reflected in the set I proposed to ANSI X3.2 on 1961 September 18."
(Note: I had put the backslash in position 5/15. It enabled the ALGOL "and" to be "/\" and the "or" to be "\/".)
Apparently [thocp.net] he also invented ten other ASCII codepoints (called himself the father of ASCII), timesharing, escape sequences, the Y2K bug, word processors... and COBOL.
Unicode in C, C++ and Perl (Score:2, Informative)
One thing many people aren't aware of is that for several years now (since GCC 3), GCC and G++ accept UTF-8 as their default input encoding, and internally store narrow and wide strings as UTF-8 and UTF-32, respectively. They're recoded to the output stream's locale when you do any output. This means you can write your source code in Unicode (in strings and comments at least) and it all works perfectly, with full support in the C and C++ standard libraries. I've been using it for years. It would be nice to get support for UTF-8 symbols in the linker, so we can have UTF-8 variable names as well. The same applies to Perl, though Perl 6 even gives you the ability to have Unicode operators, and possibly variable names.
I do routinely use UTF-8 symbols in R (example: "deltaCt" can be replaced with the actual Delta symbol [Slashdot ate the Unicode--seriously poor!]). It makes the code more readable, and entry isn't the massive issue people make it out to be. AltGr/compose keys handle the common symbols, and you can look up the few odd ones that aren't in the compose tables.
Having the ability to use Unicode does not in any way detract from the ability to use ASCII. Since ASCII is a strict subset of Unicode, allowing Unicode imposes zero overhead on those who wish to stick with ASCII, so the extent of the hate directed at a bit of progress is a bit shocking. People have pointed out how unreadable code could be made, but the reality is that, used sensibly and judiciously, Unicode can make code more concise and readable.
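As a rough illustration of what I mean by "sensibly and judiciously" (a made-up sketch in Go rather than C or R, but the point carries over):

    package main

    import (
    	"fmt"
    	"math"
    )

    // stddev is a made-up helper: the population standard deviation of a
    // slice. Writing the intermediate values as μ and σ² keeps the body
    // close to the textbook formula, while callers who prefer plain ASCII
    // are completely unaffected.
    func stddev(xs []float64) float64 {
    	var μ float64
    	for _, x := range xs {
    		μ += x
    	}
    	μ /= float64(len(xs))
    	var σ2 float64
    	for _, x := range xs {
    		σ2 += (x - μ) * (x - μ)
    	}
    	return math.Sqrt(σ2 / float64(len(xs)))
    }

    func main() {
    	fmt.Println(stddev([]float64{1, 2, 3, 4})) // ≈ 1.118
    }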
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776 [debian.org] for information about some of the issues.
Having native Unicode support end-to-end by default is still a goal we want to achieve; the ASCII C locale is the last holdout, and getting a UTF-8 C locale will take a few years yet.
Regarding editing Unicode sources, both Emacs and vim have pretty decent Unicode support, and Linux distributions have had Unicode support for a decade now, with really good support for at least six years. Broken tools are no longer an excuse for not using Unicode.
Regards,
Roger
Re:The thing with ASCII (Score:1, Informative)
So, does this mean you've got monochromacy? If so, how good is your vision overall? Most people with monochromacy typically can't see well enough to use a computer without a text-to-speech converter. If you don't have monochromacy, then perhaps you could shift the colors so that you can see them. In its stock configuration, if you've got red-green colorblindness, you'd have a difficult time using the language without modifying the color rules. As it stands, I'd still have to agree that it causes its own set of problems, but there's an "expanded" version of ColorForth that's not as terse and doesn't rely on color for hints, which you could probably still use if color is a problem.
Re:The thing with ASCII (Score:3, Informative)
They have a point, though. Presumably, if you were typing something up, you would have to look back and forth between the source text and the screen, as opposed to English, where you can stare at the source text and be sure that when you press the "a" key you get an "a".
Re:The thing with ASCII (Score:3, Informative)
My experience is with Japanese, but Japanese shares the Chinese writing system (as well as having its own).
While there are a large number of symbols most of them are made up of two or more other, simpler symbols. If you find a symbol you don't know you can often guess the general meaning just from the simpler ones it is made up from.
That is not totally unlike how words in English work. Often they are made up of smaller parts or derived from other words.
To bring this back to programming I'm not sure there is much to be gained by extending the available symbols. I don't feel any great desire to type the greater-than-or-equal-to symbol instead of >=.
Re:The thing with ASCII (Score:3, Informative)
I've also thought it would be good to be able to make use of mathematical symbols for, you know, mathematics. The same could be said of word processor-like formatting for comments. I'm dubious about using it for actual code, but I'm open to having my mind changed about that.
Yeah, I like the idea of TeX-style typing that autoparses to a "nice" display. You can edit the display or drop to TeX (or Maple or whatever) input if you need more specificity.
I'm not sure the benefit conveyed is sufficient to overcome the awkwardness (if you've ever used a Maple worksheet for programming, you'll understand what I mean), but I would like to see an editor take advantage of the beauty, even if the code itself is ASCII.
-l