Falsehoods Programmers Believe About Names
Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.
Re:Sounds like people need to fix thier names (Score:5, Informative)
Mr. Ochocinco [wikipedia.org]
For those who aren't privy to American football: apparently some guy with the number 85 renamed himself 85.
Slashdotted already? (Score:5, Informative)
After just 15 minutes of the story being posted?
Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)
Here's the Google cache in the meanwhile: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ [googleusercontent.com]
Re:Sounds like people need to fix thier names (Score:5, Informative)
Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.
Text only cache (Score:3, Informative)
Even the cache needs tweaking to load.
Text only version. [googleusercontent.com]
Article text (Score:5, Informative)
John Graham-Cumming wrote an article [jgc.org] today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.
I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.
So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.
Re:Sounds like people need to fix thier names (Score:3, Informative)
OCHOCINCO!!!!
Re:Sounds like people need to fix thier names (Score:1, Informative)
No, he didn't, he renamed himself Chad Ochocinco, which any standard name field would handle just fine. Incidentally, despite legally changing his name, he claims to still primarily use Chad Johnson.
Thanks, Prince (Score:5, Informative)
Thanks, Prince
Re:Sounds like people need to fix thier names (Score:5, Informative)
Bo3b? Presumably, the 3 is silent because he wants to point out how individual he is (ironically, by rehashing a joke made over 50 years ago.)
From Tom Lehrer's introduction to "We will all go together when we go":
I am reminded at this point of a fellow I used to know whose name was Henry, only to give you an idea of what an individualist he was he spelt it H-E-N-3-R-Y. The 3 was silent, you see.
Re:Sounds like people need to fix thier names (Score:5, Informative)
You are a little confused. Please reread the Wikipedia article on Hanyu Pinyin. It normally uses diacritics - namely macron, acute, hacek ("caron"), and grave - to represent the Mandarin tones other than neutral tone. Numbers have been used by people who lack diacritics on their typewriter or input system, but using numbers is not standard in Hanyu Pinyin; it's a kludge.
That said, if your input form doesn't allow some guy to type in his name with tone number suffixes on a US Windows keyboard layout where he lacks access to diacritics, then you're not a very thoughtful programmer.
Also, people who make software with input fields that accept Unicode but specify a particular font that has a tiny character repertoire suck.
Oh, and Slashdot sucks even more for only supporting ASCII and stripping everything else.
Re:Sounds like people need to fix thier names (Score:4, Informative)
He legally changed his name because fans refer to him as "Ochocinco" and he wanted to put it on his jersey, but because the NFL hates both fans and lulz, they only allow a person's legal surname to appear there. Rather than lay down and take it, he gave them a massive middle finger by changing his name.
The NFL actually has a surprising number of players that behave like btards, it's rather amusing.
Re:I've been dealing with this for years. (Score:1, Informative)
It's also fun when a parent has the same first name and a different middle name, but that middle name starts with the same letter. All the damn computer databases that insist on reducing the middle name to an initial are then a pain in the ass. And no, I'm not interested in all this senior citizen stuff I'm not qualified for. (Give it another 30 years, maybe.) I also wonder if the ol' fart is getting junk mail relating to video games and electronics that he likely has no interest in. The real problem comes up in billing and city stickers and things like that.
The only solution so far is that I put both my first and middle name in the "first name" field in cases where a space is allowed as a valid character. It's something I'll have to keep doing until enough people get a clue and change their database conventions.
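To make the collision concrete, here is a minimal sketch of what happens when a system keeps only the middle initial. The names, the `Person` class and both key functions are made up for illustration; no real system is being quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first: str
    middle: str
    last: str


def lossy_key(p: Person) -> str:
    """Key built by a hypothetical system that keeps only the middle initial."""
    return f"{p.first} {p.middle[:1]}. {p.last}"


def full_key(p: Person) -> str:
    """Key that keeps the middle name exactly as entered."""
    return f"{p.first} {p.middle} {p.last}"


# Hypothetical parent and child: same first and last name,
# different middle names that share a first letter.
father = Person("John", "Michael", "Doe")
son = Person("John", "Matthew", "Doe")

print(lossy_key(father) == lossy_key(son))  # the two records collide
print(full_key(father) == full_key(son))    # the two records stay distinct
```

Even the full key is not a safe identifier for a person, of course; the point is only that truncating to an initial throws away the one thing that told the two records apart.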
Re:Yeah, article is kind of asinine (Score:3, Informative)
I'm going to throw in my agreement here. Yes, there are people who put numerals in their names, or non-unicode point characters, or various other things, but there just isn't a reason to foist that on other people.
There is frustration about things like, "people have N number of names", and "names don't change" which are good and valid points... but some of the things are just like "dude... seriously..."
Re:Sounds like people need to fix thier names (Score:4, Informative)
Who the hell has numbers in their name?
Former New York Times writer Jennifer 8 Lee [wikipedia.org] does.
Re:Sounds like people need to fix thier names (Score:5, Informative)
Pinyin is how Chinese is typed. The numbers represent tones...
No it isn't. Pinyin is how Chinese is romanized. Chinese is typed using an IME to produce Han characters. Pinyin is typically only used to represent pronunciation, for example in dictionaries, and to represent names in contexts where romanization is necessary (such as international contexts, like Western media), as well as a few other limited contexts. Writing Chinese in Pinyin, even with tone marks, is often inadequate because each syllable/tone combination corresponds to several characters, and the distinction between them is easily lost in romanization. For example, Zhang Zilin [wikipedia.org] and Zhang Ziyi [wikipedia.org] do not have the same surname, even though both are Zhang1 in pinyin.
Re:I don't know what the complaint is about? (Score:4, Informative)
You'd think that e-mail addresses by comparison would be simpler, but I have a hard time trying to register my e-mail address with sites that won't allow even simple things like "+", "-" or "." characters in the local part.
Proper email validation is not trivial
Check out the huge regex at the bottom of the RFC 5322 compliant validator from CPAN:
http://cpansearch.perl.org/src/RJBS/Email-Valid-0.184/lib/Email/Valid.pm
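If you don't want to carry a regex like that around, the opposite approach is to check almost nothing. The sketch below is an assumption of mine, not what Email::Valid does and not RFC 5322 compliant: it only looks for a non-empty local part, an "@", and a domain containing a dot, and it deliberately lets "+", "-" and "." through in the local part. The function name `looks_like_email` is hypothetical.

```python
def looks_like_email(address: str) -> bool:
    """Deliberately permissive sanity check, NOT an RFC 5322 validator.

    Splits on the last "@" and only requires a non-empty local part
    and a domain that contains a dot somewhere in the middle.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False
    if domain.startswith(".") or domain.endswith("."):
        return False
    return "." in domain


for candidate in (
    "user+tag@example.com",
    "first.last-name@example.org",
    "no-at-sign",
    "@example.com",
):
    print(candidate, looks_like_email(candidate))
```

A check this loose will accept some addresses that don't exist and reject a few unusual but legal ones (a dotless domain, for instance), so it is a typo filter at best. Whether the address really works still has to be found out some other way, such as sending it a message.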
Re:I don't know what the complaint is about? (Score:4, Informative)
A database MUST treat all of these names the same: McClean, MacClean, MCLean, Mc Clean, Mac Clean, McCleen, ...
I assume you left out a "not" in that sentence? I think there are quite a few people that will kindly (or maybe not-so-kindly) explain why "Mc" and "Mac" are not the same.
Read between the lines a bit. Treat them the same means: treat them as all potentially valid, not that all the names would match in a string comparison.
Re:Article makes wrong assumption about software. (Score:5, Informative)
Is it so hard for you to just use Unicode
Unicode doesn't cover the full set of CJK characters used for names, nor does it cover all writing systems in actual use.
Re:Article makes wrong assumption about software. (Score:3, Informative)
That's true also. However, Unicode covers much more ground immediately with practically no effort required from the programmer - but once you go beyond that, the complexity increases very rapidly (since you have to start dealing with multiple different encodings simultaneously etc).
As well, new Unicode versions come out regularly which expand its reach, and new frameworks/databases update their Unicode support every now and then, so if you start using it today, it'll be much easier (in many cases, completely free) for you to expand coverage in the future in a backwards-compatible way.
In contrast, if you, say, use Latin-1 today, you'll either have to start dealing with multiple encodings much sooner, or to recode the database eventually.
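A small sketch of the difference, using only Python's built-in `str.encode`. The helper `can_encode` and the sample names are my own choices for illustration; 章子怡 is the Han-character form of Zhang Ziyi's name mentioned elsewhere in this thread.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample names chosen for illustration only.
names = ["Patrick McKenzie", "Zoë", "章子怡"]

for name in names:
    print(name, can_encode(name, "latin-1"), can_encode(name, "utf-8"))
```

Latin-1 copes with the first two names and fails on the third, while UTF-8 takes all three. That doesn't contradict the parent's point: a character that has no Unicode code point at all can't be put into a Python string in the first place, so this test can't even express that case.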
Re:Sounds like people need to fix thier names (Score:3, Informative)
The Queen of England
God save her from programmers!
Re:Dumbfuck summary (Score:3, Informative)
Many of the systems that handle names the worst are the ones that try to be "clever": insisting on first (and only first) letter capitalized, rejecting digits, refusing to allow the middle name (or initial) to be blank, always reducing the middle name to its first letter followed by a period, refusing to accept a single character as a name, and many more sins. The "dumb" systems are actually more graceful about it.
The best policy is to accept what is entered. Even that tends to fail if someone has more than 3 names. Then there's the Spanish naming conventions.
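Roughly what the two approaches look like side by side. Both functions are hypothetical sketches written for this comment: `clever_validate` bundles a few of the rules listed above, and `accept_as_entered` assumes a single free-text name field that only trims surrounding whitespace and rejects empty input. The test names all come from this thread.

```python
def clever_validate(name: str) -> bool:
    """A 'clever' rule set: letters only, first letter (and only the
    first) capitalized, and no single-character parts."""
    parts = name.split(" ")
    return all(
        p.isalpha() and p == p.capitalize() and len(p) > 1
        for p in parts
    )


def accept_as_entered(name: str) -> str:
    """Store the name as typed; trim outer whitespace, reject empty input."""
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("name must not be empty")
    return cleaned


for name in ("Jennifer 8 Lee", "John Graham-Cumming", "Patrick McKenzie", "Bo3b"):
    print(name, clever_validate(name), accept_as_entered(name) == name)
```

Every one of those names fails the clever check (a digit, a hyphen, a second capital letter, a digit again) and passes through the dumb one unchanged.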
Re:I don't know what the complaint is about? (Score:5, Informative)
This problem also very often bites in name fields that won't accept the "-" and the two capital letters in my first name.
And I used to live near a border of two cities, where my postal address was from one city while my real city of residence was the other one. I have had a lot of problems with that, when the guys who made the systems were trying to deduce my city of residence from my postal address. Which is also impossible in my country, because the national post office also permits addresses that have postalnumber + company (instead of city) for large companies who take their mail in one place and deliver it themselves the rest of the way.
Re:I don't know what the complaint is about? (Score:5, Informative)
The Mc's and the Mac's consider the correct usage as a matter of extreme pride. You could end up with one or more bruises if you get it wrong and then insist that "well, they're the same anyway".
Re:First hand experience. (Score:5, Informative)
Hence Butcher, Baker, Smith, Brewer, Tanner, Farmer, etc became "family names".
*Even if the system did a conversion to a Latin representation of an Asian name, most people can't pronounce them because they are based on different sound primitives.
Such a "translation" can easily be one to many, dependent on various factors.
Which is why Asians tend to adopt westernised versions of their real names.
Or they adopt a regular English, German, French, Spanish, etc name to be known by.
Re:I don't know what the complaint is about? (Score:4, Informative)
Just looked it up. I'm Scottish, live in Scotland and always hear people say that the difference in Mac/Mc is important because of the Scots/Irish thing, but according to this article, that's bollocks:
http://www.scottishhistory.com/articles/misc/macvsmc.html [scottishhistory.com]
Re:I've been dealing with this for years. (Score:1, Informative)
I do as well, and it's hilarious or maddening depending on what mood I'm in. I mean, seriously, surnames with apostrophes date back hundreds of years.