Programming

Mr. Pike, Tear Down This ASCII Wall!

theodp writes "To move forward with programming languages, argues Poul-Henning Kamp, we need to break free from the tyranny of ASCII. While Kamp admires programming language designers like the Father-of-Go Rob Pike, he simply can't forgive Pike for 'trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade.' Kamp adds: 'For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.' So, should the new Hello World look more like this?"
  • by thisisauniqueid ( 825395 ) on Sunday October 31, 2010 @08:24PM (#34084074)
    Fortress [wikipedia.org] allows you to code in UTF-8. However, for every Unicode mathematical symbol it also accepts a multi-character ASCII equivalent, so there is a bijective map between the Unicode and ASCII versions of the source, and you can view/edit in either. That is the only acceptable way to advocate using Unicode anywhere in programming source other than string constants. Programming languages that use ASCII have done well over those that don't, for the same reason that Unicode has done well over binary formats.
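
    A minimal sketch of that round-trip idea (hypothetical symbol table, not the actual Fortress one), just to show why a bijective map lets you view or edit in either spelling without losing anything:

        # Hypothetical two-way table between Unicode operators and ASCII spellings
        # (illustrative only; not the real Fortress symbol set).
        UNICODE_TO_ASCII = {"≤": "<=", "≥": ">=", "≠": "=/="}
        ASCII_TO_UNICODE = {a: u for u, a in UNICODE_TO_ASCII.items()}

        def to_ascii(src: str) -> str:
            # Render the source in its pure-ASCII spelling.
            for u, a in UNICODE_TO_ASCII.items():
                src = src.replace(u, a)
            return src

        def to_unicode(src: str) -> str:
            # Round-trip back; the map is one-to-one, so nothing is lost.
            for a, u in ASCII_TO_UNICODE.items():
                src = src.replace(a, u)
            return src

        line = "if x ≤ y and y ≠ 0:"
        assert to_unicode(to_ascii(line)) == line
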
  • Re:This is nonsense (Score:3, Interesting)

    by Twinbee ( 767046 ) on Sunday October 31, 2010 @08:26PM (#34084096)
    One day, I think we'll have a universal language that everyone uses (yes, English would suit me, but I don't care which language it is, as long as everyone uses it). Efficiency would rocket through the roof, and hence we'd save billions or trillions of pounds.

    In the same way, we'll all be using a single programming language too (even if that language combines more than one paradigm). Yes, competition is good in the meantime, but I mean ultimately. It'll be as fast as C or machine code, but as readable as a much higher-level language. It won't have baggage such as headers or be unnecessarily verbose either.

    Until that point, we need to do a lot more to improve languages, and it won't just be deckchair arranging.
  • Haskell (Score:3, Interesting)

    by kshade ( 914666 ) * on Sunday October 31, 2010 @08:28PM (#34084114)
    Haskell supports various Unicode characters as operators and it makes me want to puke. http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource [haskell.org] IMO one of the great things about programming nowadays is that you can use descriptive names without feeling bad. Single-character identifiers from different alphabets are something that rubs me the wrong way in mathematics. Keep 'em out of my programming languages!

    Bullshit from the article:

    Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?

    OmegaZero is at least something everybody will recognize. And why would you name a variable like that anyway? It's programming, not math; use descriptive names.

    But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?

    Because we're not using the same IDE?

    And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?

    ... what?

    For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.

    ... WHAT? If you don't express your intentions clearly in a program it won't work!

    And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.

    vim does Unicode just fine. And from the Wikipedia entry on the author (http://en.wikipedia.org/wiki/Poul-Henning_Kamp):

    A post by Poul-Henning is responsible for the widespread use of the term bikeshed colour to describe contentious but otherwise meaningless technical debates over trivialities in open source projects.

    Irony? Why does this guy come off as an idiot who got annoyed by VB in this article when he clearly should know better?

  • Re:limiting? (Score:4, Interesting)

    by Sycraft-fu ( 314770 ) on Sunday October 31, 2010 @08:29PM (#34084116)

    For that matter, we could probably even get away with fewer letters. Some of them are redundant when you get down to it. What you need are enough letters that you can easily denote all the different sounds that are valid in a language. You don't have to have a dedicated letter for each of them either; it can be done through combination (for example the oo in soothe) or through context sensitivity (such as the o in some in context with the e on the end). We could probably knock off a few characters if we tried. Whether that is worth it or not I don't know, but we sure as hell shouldn't be looking at adding MORE.

    Also, in terms of programming, a big problem is that of ambiguity. Compilers can't handle it; their syntax and grammar are rigidly defined, as they must be. That's the reason we have programming languages rather than simply programming in a natural language: natural language is too imprecise for a computer to parse. We need a more rigidly defined language.

    Well, as applied to Unicode programming, that means languages are going to get far more complex if you want to provide an "English" version of C and then a "Chinese" version and a "French" version and so on, where the commands, and possibly the grammar, differ slightly. It would probably get complex to the point of impossibility if you then wanted them to be blendable, so that you could use different ones in the same function, or maybe on the same line.

  • Re:Perl 6 (Score:3, Interesting)

    by russotto ( 537200 ) on Sunday October 31, 2010 @08:35PM (#34084168) Journal
    Sure, but Perl is often derided as a "write-only language", and Perl 6 is simply continuing the tradition.
  • by theodp ( 442580 ) on Sunday October 31, 2010 @08:36PM (#34084182)

    From Idiocracy: Keyboard for hospital admissions [flickr.com]

  • Re:huh (Score:3, Interesting)

    by jonbryce ( 703250 ) on Sunday October 31, 2010 @08:44PM (#34084238) Homepage

    No, but I think the idea of being able to draw flowcharts on the screen and attach code to each of the boxes could be an idea that has mileage.

  • Re:Examples? (Score:2, Interesting)

    by izomiac ( 815208 ) on Sunday October 31, 2010 @08:53PM (#34084292) Homepage
    From TFA, apparently he wants to be able to use Ω (Omega) to name a variable, and ÷ (Division Sign) as an operator. My interpretation of his opinion is that a descriptive name for a variable is inferior to a Greek letter, and that using mathematical operators that take an extra five or so keystrokes is superior to the standard +-*/^ set that people have become accustomed to.

    IMHO, if you use more than 26 single-letter variables something is seriously wrong, and trying to make mathematical formulas pretty in code isn't practical without a whole lot of unneeded complexity. Sure, having an eight-line formula with fractions within fractions and tiny exponent numbers might be (slightly) better than five layers of parentheses, but you aren't going to get that with just Unicode (AFAIK), and the pain of dealing with a slightly misplaced term confounding the Unicode-to-math converter isn't one I'd like to experience. Unicode or even LaTeX code for comments might be useful, though.
  • Re:huh (Score:5, Interesting)

    by CensorshipDonkey ( 1108755 ) on Sunday October 31, 2010 @09:07PM (#34084366)
    Have you ever used a visual diagrammatic code language before, such as LabView? Every scientist I've ever met who had any experience writing code vastly prefers the C-based LabWindows to the diagrammatic LabView - diagrammatic is simply a fucking pain in the ass. Reading someone else's program is an exercise in pain, and such programs are impossible to debug. Black-and-white, unambiguous plain-text coding may not be pretty to look at, but it is damn functional. Coding requires expressing yourself in an explicitly clear fashion, and that's what the current languages offer.
  • by SimonInOz ( 579741 ) on Sunday October 31, 2010 @09:08PM (#34084368)

    Incredibly, I worked for a major investment company that had, indeed, done something useful in APL. In fact, they had written their entire set of analysis routines in it, and deeply interwoven it with SQL. I had to untangle it all. (Would you believe they had 6-page SQL stored procedures? No, nor did I - but they did.)
    APL is great sometimes - especially if you happen to be a maths whizz and good at weird scripts. Not exactly easy to debug, though. Sort of a write-only language.

    For the last ten-plus years, we have been steadily moving in the direction of more human-readable data - the move to XML was supposed to be a huge improvement. It meant you could - sort of - read what was going on at every level. It also meant we had a common interchange format between multiple platforms.

    So you want to chuck all that away to get better symbols for programming? No, I don't think so.
    I must point out that the entire canon of English Literature is written in - surprise - English, and that's definitely ASCII text. I don't think it has suffered due to lack of expressive capability.

    What does surprise me, though, is how fundamentally weak our editors are. Programs, to me, are a collection of parts - objects, methods, etc. - all with internal structure. We seem very poor at further abstracting that. Why, oh tell me why, when I write a simple - trivial - bit of Java code, do I need to write functions for getters and setters all over the place? Dammit, just declare them as gettable and settable - or (to keep full source code compatibility) the editor could do it. Simply, easily, transparently (a rough sketch of the idea follows below). And why can't the editor hide everything except what I am concerned with?
    Microsoft does a better job of this in C#, but we could go much, much further. We seem stuck in the third-generation language paradigm.
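
    A rough sketch of the "declare them as gettable and settable" idea, in Python rather than Java (the Point class below is hypothetical); a dataclass gives roughly the declarative form being asked for:

        from dataclasses import dataclass

        class PointVerbose:                 # Java-style: hand-written accessors for every field
            def __init__(self, x, y):
                self._x = x
                self._y = y
            def get_x(self): return self._x
            def set_x(self, v): self._x = v
            def get_y(self): return self._y
            def set_y(self, v): self._y = v

        @dataclass
        class Point:                        # declarative: fields are simply gettable and settable
            x: float
            y: float

        p = Point(1.0, 2.0)
        p.x = 3.0                           # no accessor boilerplate needed
        print(p)                            # Point(x=3.0, y=2.0)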

  • by MichaelSmith ( 789609 ) on Sunday October 31, 2010 @09:23PM (#34084488) Homepage Journal

    But few people really look at keyboards. Our fingers know where the button will be. I don't want to hunt and peck for special characters.

  • by Tablizer ( 95088 ) on Sunday October 31, 2010 @09:37PM (#34084604) Journal

    This proposal isn't about giving programmers more power to code; it's about making it easier for non-English speakers who aren't coders to read the code that their programmers write.

    COBOL was originally designed so that managers and customers could read it. But in practice they rarely did, because programming logic is typically too low-level for a non-programmer and/or non-team member to understand without knowing the technical context anyway. Being "English-like" or grammatically proper didn't really help that goal in practice. This is why the idea was abandoned in later languages.

    Perhaps it's comparable to legalese. Making it proper English doesn't necessarily improve readability by non-lawyers. It's still gibberish to most of us without a legal background.

    It's not worthwhile to slow down production programmers in a trade-off for the rare case where non-programmers will want to read code out of actual need (not just curiosity). Thus, it's an uneconomical requirement as long as there is such a trade-off.

  • by angus77 ( 1520151 ) on Sunday October 31, 2010 @10:02PM (#34084774)

    Japanese is typed using a more-or-less standard QWERTY keyboard.

    Tediously.

    Not in the least. I do it every day at work. It takes little more effort than writing in English. Unless, of course, your Japanese reading skills are not up to the job---but that won't be the fault of the keyboard.

    Please let me emphasize that typing with a QWERTY keyboard is the standard way of typing in Japan. In fact, despite the existence of other methods, I don't know a single person who actually uses those methods.

  • by shutdown -p now ( 807394 ) on Sunday October 31, 2010 @10:24PM (#34084948) Journal

    From your reference to Latin-1, I suspect you're from Western Europe, then. If so, then you guys didn't have it all that bad - most non-Unicode-aware apps are not truly ASCII (since we don't have 7-bit bytes around), and so the default encoding more often than not is Latin-1. Even if Americans mostly use it for "funny chars" like special quotation marks etc, you end up with a bunch of useful symbols as well. And your text doesn't end up all garbled.

    For folks from Eastern European countries, especially those with non-Latin-based alphabets - like mine - it's a rather different story. Extrapolating that, it must really suck for people with more "exotic" requirements, like Arabic or Chinese...
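
    A small illustration of the garbling being described (hypothetical strings; any non-Latin-1 alphabet shows the same effect):

        # Cyrillic text stored under a legacy single-byte Russian code page,
        # then read back by a tool that assumes Latin-1: every byte still
        # "decodes", so the text is silently garbled instead of rejected.
        original = "Привет"                    # "Hello" in Russian
        stored = original.encode("cp1251")     # legacy Russian encoding
        misread = stored.decode("latin-1")     # what a Latin-1-only app shows
        print(misread)                         # Ïðèâåò

        # UTF-8 round-trips any script without loss.
        assert original.encode("utf-8").decode("utf-8") == original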

  • by BrokenHalo ( 565198 ) on Sunday October 31, 2010 @11:14PM (#34085244)
    Seriously. This programmer (I use the term loosely) has problems with expression? If this is the case, he needs to go back to school and try learning assembly or Fortran programming. Any program worth writing can be coded in Fortran, and if it can't be coded in assembler, then it can't be done at all.

    If he really wants to go into creative writing, we might remind him that the 26 letters of the alphabet were good enough for Shakespeare.
  • by Animats ( 122034 ) on Monday November 01, 2010 @12:21AM (#34085700) Homepage

    This has come up in the context of domain names, where a long, painful set of rules has been devised to try to prevent the registration of two domain names that look alike to a human but are different to DNS. If exact equality of text matters, it's helpful to have a limited character set for identifiers.

    There's currently a debate underway on Wikipedia over whether user names with unusual characters should be allowed. This isn't a language question; the issue is willful obfuscation by users who choose names with hard-to-type characters.

    As for having more operators, it's probably not worth it. It's been tried; both MIT and Stanford had, at one time, custom character sets with most of the standard mathematical operators on the keys. This never caught on. In fact, operator overloading is usually a lose. Python ran into this: "+" was overloaded for concatenation, and then somebody decided that "*" should be overloaded as well, so that 2*"a" was equivalent to "a" + "a", both giving "aa". This leads to results like 2*"10" being "1010" (see the short example at the end of this comment). The big mistake was defining a mixed-mode overload.

    In C++, mixed-mode overloads are fully supported by the template system and a nightmare when reading code.

    In Mathematica, the standard representation for math uses long names for functions, completely avoiding the macho terseness the math community has historically embraced.
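
    For reference, the Python behaviour being described (standard Python semantics):

        # "+" is overloaded for string concatenation, "*" for repetition.
        print("a" + "a")    # aa
        print(2 * "a")      # aa    (same result via the other operator)

        # The mixed-mode trap: a numeric-looking string stays a string.
        print(2 * "10")     # 1010  (repetition, not arithmetic)
        print(2 * 10)       # 20    (actual multiplication)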

  • by robbak ( 775424 ) on Monday November 01, 2010 @12:22AM (#34085714) Homepage

    I wonder if those with a non-alphabetic language, like the various Chinese languages or Japanese, would have chosen a keyboard at all. It seems to me that the keyboard is really designed around a language that uses a limited number of glyphs. Even the addition of dïaçrìtîçs is really a hack on the keyboard.

  • by Chrisq ( 894406 ) on Monday November 01, 2010 @05:00AM (#34086736)

    Plus the fact that a spoken language changes - there is a good chance you would not be able to understand English as it was spoken, say, 500 years ago. They would not only have used different words, but also a different pronunciation.

    That depends on what accents you are used to. Many Northern British and lowland Scots accents were not changed by the "Great Vowel Shift" nearly as much as Southern English, Received Pronunciation, or General American.

    Being your slave, what should I do but tend
    Upon the hours and times of your desire?

    Will have an immediately obvious meaning when read in a lowland Scots accent

  • Re:Learn2code (Score:3, Interesting)

    by isorox ( 205688 ) on Monday November 01, 2010 @05:29AM (#34086846) Homepage Journal

    I don't know about you, but I have a pile-of-shit key on my keyboard, right between the left Ctrl and Alt.

    It's a very useful "meta" key. Aside from controlling my music from amarok, I have a variety of mappings set up, Meta-s shades the window I'm using, Meta-R pops up a run dialog, Meta-CapsLock pops up an rxvt terminal window, Meta-F4 runs xrandr --auto and reconfigures when I plug in an external monitor.

    (Capslock itself is mapped to Escape, which I find a lot easier on the wrists on my laptop than using the real escape key -- I rebound it about 5 years ago when my escape key broke and haven't looked back)

  • Re:Project Gutenberg (Score:1, Interesting)

    by Anonymous Coward on Monday November 01, 2010 @06:00AM (#34086924)

    In any event, it would make good sense for programming environments to be able to handle Unicode source.

    I program in C++ and C# and all the tools I use can handle Unicode source. It's 2010. What can't?

  • by TheRaven64 ( 641858 ) on Monday November 01, 2010 @06:28AM (#34087008) Journal
    Apple's documentation in HTML form has a few of the standard ASCII characters replaced with other Unicode characters. If you copy and paste into a text editor, you get compiler warnings that seem to be complaining about expecting exactly the character that is already there, because the replacement is a look-alike glyph. The pages also sometimes contain ligatures, which you don't notice unless you look one character at a time. One of the most irritating problems I found was on the Nouveau wiki: a load of constants have 0x prefixes where the x is actually a Unicode multiplication sign. Copy them into the code and it looks right, but the compiler rejects it as an invalid constant.
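
    A quick way to spot what is being described: scan the pasted text for anything outside ASCII and print its code point (the constant below is made up for illustration):

        import unicodedata

        # A pasted line whose "x" is actually U+00D7 MULTIPLICATION SIGN.
        pasted = "#define REG_BASE 0×1000"

        for col, ch in enumerate(pasted):
            if ord(ch) > 127:  # anything outside 7-bit ASCII
                print(f"column {col}: {ch!r} is U+{ord(ch):04X} {unicodedata.name(ch)}")
        # column 18: '×' is U+00D7 MULTIPLICATION SIGN
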
  • by Yetihehe ( 971185 ) on Monday November 01, 2010 @06:31AM (#34087018)

    1. Little triangles that hide blocks of code unless you explicitly open and investigate them.

    Netbeans. (view > code folds > collapse all)

    2. Dynamic error detection. Give me a little underline when I write out a variable that hasn't been defined yet. Give a soft red background to lines of code that wouldn't compile. That sort of thing.

    Netbeans.

    3. While we're at-it, "warning" colors. When "=" is used in a conditional, for example, that's an unusual situation that should be underlined in Yellow.

    Netbeans, though not as a background; it gives you little yellow icons on the left side of the code and yellow lines near the scrollbar (to track errors in the whole document).

    4. Hard auto-indent. It may be two spaces in the source code, but accidentally copying the indentation, and putting it in the wrong places, etc, should just be taken care of. That shouldn't even be an issue any more.

    Netbeans. (ctrl+shift+v - paste formatted).

    5. Code-hint hover. When you hover over a function name, bring up a window with the first few lines of that function. Maybe open it in a "related code" pane?

    Netbeans. If you use comments before functions, it will show those.

    6. Right-click to jump to anything. Right-click a variable to jump to the declaration, or goto other places it is used. Right-click a class name to bring up that class definition.

    Netbeans. But with ctrl+click.

    7. Start typing out a function, and get a menu of variable-specific functions that can be called. Flash actually does this surprisingly well, or did before CS5.

    Netbeans. Also, Flash did this surprisingly badly compared to Netbeans.

    Another nice feature: ctrl+shift+arrow down copies the current line or selection and inserts it below (arrow up inserts it above). It's a surprisingly good idea, one I miss in many other editors.
