Databases Programming

Falsehoods Programmers Believe About Names 773

Posted by timothy
from the can't-we-stick-to-slashdot-user-ids? dept.
Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.

  • by dogdick (1290032) on Thursday June 17, 2010 @10:07PM (#32609098)
    Andre3000
  • by ChipMonk (711367) on Thursday June 17, 2010 @10:09PM (#32609112) Journal
    Chad 8 5, for another.
  • by 0100010001010011 (652467) on Thursday June 17, 2010 @10:14PM (#32609134)

    Mr. Ochocinco [wikipedia.org]

    For those who aren't familiar with American football: apparently some guy with the number 85 renamed himself 85.

  • Slashdotted already? (Score:5, Informative)

    by RenQuanta (3274) on Thursday June 17, 2010 @10:16PM (#32609142) Homepage

    After just 15 minutes of the story being posted?

    Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)

    Here's the Google cache in the meanwhile: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ [googleusercontent.com]

  • by spitzig (73300) on Thursday June 17, 2010 @10:17PM (#32609156)

    Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.

  • Text only cache (Score:3, Informative)

    by SuperKendall (25149) on Thursday June 17, 2010 @10:19PM (#32609166)

    Even the cache needs tweaking to load.

    Text only version. [googleusercontent.com]

  • Article text (Score:5, Informative)

    by Anonymous Coward on Thursday June 17, 2010 @10:19PM (#32609174)

    John Graham-Cumming wrote an article [jgc.org] today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

    I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.

    So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make fewer of them next time you write a system which touches names.

    1. People have exactly one canonical full name.
    2. People have exactly one full name which they go by.
    3. People have, at this point in time, exactly one canonical full name.
    4. People have, at this point in time, one full name which they go by.
    5. People have exactly N names, for any value of N.
    6. People's names fit within a certain defined amount of space.
    7. People's names do not change.
    8. People's names change, but only at a certain enumerated set of events.
    9. People's names are written in ASCII.
    10. People's names are written in any single character set.
    11. People's names are all mapped in Unicode code points.
    12. People's names are case sensitive.
    13. People's names are case insensitive.
    14. People's names sometimes have prefixes or suffixes, but you can safely ignore those.
    15. People's names do not contain numbers.
    16. People's names are not written in ALL CAPS.
    17. People's names are not written in all lower case letters.
    18. People's names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
    19. People's first names and last names are, by necessity, different.
    20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
    21. People's names are globally unique.
    22. People's names are almost globally unique.
    23. Alright alright but surely people's names are diverse enough such that no million people share the same name.
    24. My system will never have to deal with names from China.
    25. Or Japan.
    26. Or Korea.
    27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have "weird" naming schemes in common use.
    28. That Klingon Empire thing was a joke, right?
    29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
    30. There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
    31. I can safely assume that this dictionary of bad words contains no people's names in it.
    32. People's names are assigned at birth.
    33. OK, maybe not at birth, but at least pretty close to birth.
    34. Alright, alright, within a year or so of birth.
    35. Five years?
    36. You're kidding me, right?
    37. Two different systems containing data about the same person will use the same name for
  • by Anonymous Coward on Thursday June 17, 2010 @10:29PM (#32609228)

    No, he didn't, he renamed himself Chad Ochocinco, which any standard name field would handle just fine. Incidentally, despite legally changing his name, he claims to still primarily use Chad Johnson.

  • Re:Dumbfuck summary (Score:5, Informative)

    by bigstrat2003 (1058574) * on Thursday June 17, 2010 @10:39PM (#32609270)
    Yeah, TFS is very ambiguous about that. Turns out that TFA is talking about names of people, and the pitfalls you can run into when allowing someone to enter their name into a system.
  • Thanks, Prince (Score:5, Informative)

    by BlueBoxSW.com (745855) on Thursday June 17, 2010 @10:45PM (#32609314) Homepage

    Thanks, Prince

  • by BluBrick (1924) <blubrick@@@gmail...com> on Thursday June 17, 2010 @10:57PM (#32609368) Homepage

    Bo3b? Presumably, the 3 is silent because he wants to point out how individual he is (ironically, by rehashing a joke made over 50 years ago.)

    From Tom Lehrer's introduction to "We will all go together when we go":

    I am reminded at this point of a fellow I used to know whose name was Henry, only to give you an idea of what an individualist he was he spelt it H-E-N-3-R-Y. The 3 was silent, you see.

  • by Fnordulicious (85996) on Thursday June 17, 2010 @10:59PM (#32609378) Homepage

    You are a little confused. Please reread the Wikipedia article on Hanyu Pinyin. It normally uses diacritics - namely macron, acute, hacek ("caron"), and grave - to represent the Mandarin tones other than neutral tone. Numbers have been used by people who lack diacritics on their typewriter or input system, but using numbers is not standard in Hanyu Pinyin; it's a kludge.

    That said, if your input form doesn't allow some guy to type in his name with tone number suffixes on a US Windows keyboard layout where he lacks access to diacritics, then you're not a very thoughtful programmer.

    Also, people who make software with input fields that accept Unicode but specify a particular font that has a tiny character repertoire suck.

    Oh, and Slashdot sucks even more for only supporting ASCII and stripping everything else.

  • by Miseph (979059) on Thursday June 17, 2010 @10:59PM (#32609380) Journal

    He legally changed his name because fans refer to him as "Ochocinco" and he wanted to put it on his jersey, but because the NFL hates both fans and lulz, they only allow a person's legal surname to appear there. Rather than lay down and take it, he gave them a massive middle finger by changing his name.

    The NFL actually has a surprising number of players that behave like btards, it's rather amusing.

  • by Anonymous Coward on Thursday June 17, 2010 @11:12PM (#32609440)

    It's also fun when a parent has the same first name and a different middle name, but the middle name starts with the same letter. All the damn computer databases that insist on reducing the middle name to an initial are a pain in the ass. And no, I'm not interested in all this senior citizen stuff I'm not qualified for. (Give it another 30 years, maybe.) I also wonder if the ol' fart is getting junk mail relating to video games and electronics that he likely has no interest in. The real problem comes up in billing and city stickers and things like that.

    The only solution so far is that I put both my first and middle name in the "first name" field in cases where a space is allowed as a valid character. It's something I'll have to keep doing until enough people get a clue and change their database conventions.
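The collision described in this comment can be sketched in a few lines of Python. This is only an illustration: the two names are hypothetical, and the tuple keys stand in for whatever record-matching a real system does.

```python
def key_with_initial(first: str, middle: str, last: str) -> tuple:
    # Reduces the middle name to its first letter, as the databases
    # complained about in the comment above do.
    return (first, middle[:1], last)


def key_with_full_middle(first: str, middle: str, last: str) -> tuple:
    # Keeps the middle name exactly as entered.
    return (first, middle, last)


# Hypothetical parent and child who share a first and last name.
parent = ("John", "Andrew", "Smith")
child = ("John", "Alan", "Smith")

# With initials the two people become indistinguishable.
collides = key_with_initial(*parent) == key_with_initial(*child)

# With the full middle name they stay distinct.
distinct = key_with_full_middle(*parent) != key_with_full_middle(*child)
```

Even the full-name key is not a safe identifier (names are not unique), so a system that needs to tell people apart would still want its own ID.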

  • by snowgirl (978879) on Thursday June 17, 2010 @11:13PM (#32609446) Journal

    I'm going to throw in my agreement here. Yes, there are people who put numerals in their names, or non-unicode point characters, or various other things, but there just isn't a reason to foist that on other people.

    There is frustration about things like, "people have N number of names", and "names don't change" which are good and valid points... but some of the things are just like "dude... seriously..."

  • by fishexe (168879) on Thursday June 17, 2010 @11:26PM (#32609522) Homepage

    Who the hell has numbers in their name?

    Former New York Times writer Jennifer 8 Lee [wikipedia.org] does.

  • by fishexe (168879) on Thursday June 17, 2010 @11:49PM (#32609622) Homepage

    Pinyin is how Chinese is typed. The numbers represent tones...

    No it isn't. Pinyin is how Chinese is romanized. Chinese is typed using an IME to produce Han characters. Pinyin is typically only used to represent pronunciation, for example in dictionaries, and to represent names in contexts where romanization is necessary (such as international contexts, like Western media), as well as a few other limited contexts. Writing Chinese in Pinyin, even with tone marks, is often inadequate because each syllable/tone combination corresponds to several characters, and the distinction between them is easily lost in romanization. For example, Zhang Zilin [wikipedia.org] and Zhang Ziyi [wikipedia.org] do not have the same surname, even though both are Zhang1 in pinyin.
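The point about lost distinctions can be shown with a tiny Python sketch. The two characters below are an assumption on my part: 张 for Zhang Zilin's surname and 章 for Zhang Ziyi's, both written here with the tone-number form "Zhang1" used in the comment. The table is illustrative, not a real romanization library.

```python
# Hypothetical two-entry table; a real romanizer covers thousands of characters.
SURNAME_TO_PINYIN = {
    "张": "Zhang1",  # assumed surname character for Zhang Zilin
    "章": "Zhang1",  # assumed surname character for Zhang Ziyi
}


def romanize(surname: str) -> str:
    """Map a Han surname to tone-numbered pinyin (lossy)."""
    return SURNAME_TO_PINYIN[surname]


def candidates(pinyin: str) -> list:
    """Return every known surname that romanizes to the given pinyin."""
    return sorted(han for han, p in SURNAME_TO_PINYIN.items() if p == pinyin)


# Two different surnames collapse to one romanized string, so the
# mapping cannot be reversed without extra information.
same_romanization = romanize("张") == romanize("章")
ambiguous = candidates("Zhang1")
```

Storing only the romanized form would therefore discard which surname the person actually has.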

  • by arekq (651007) on Thursday June 17, 2010 @11:57PM (#32609658)

    Pinyin is just one way Chinese is typed.
    There are other ways to type Chinese characters, for example the Cangjie input method, which is based on the graphological aspect of the characters instead of their sound.

  • by paeanblack (191171) on Friday June 18, 2010 @12:04AM (#32609694)

    You'd think that e-mail addresses by comparison would be simpler, but I have a hard time trying to register my e-mail address with sites that won't allow even simple things like "+", "-" or "." characters in the local part.

    Proper email validation is not trivial

    Check out the huge regex at the bottom of the RFC 5322 compliant validator from CPAN:

    http://cpansearch.perl.org/src/RJBS/Email-Valid-0.184/lib/Email/Valid.pm
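A common alternative to a huge regex is a deliberately permissive check followed by a confirmation message. The Python sketch below is not RFC 5322 validation and is not what Email::Valid does; `looks_like_email` is a hypothetical helper, and it assumes the real test of an address is whether mail sent to it arrives.

```python
def looks_like_email(address: str) -> bool:
    """Permissive sanity check: something, then '@', then something."""
    # Split on the LAST '@' so unusual local parts are left alone.
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and bool(domain)


# Characters such as '+', '-' and '.' in the local part are accepted,
# which is exactly what stricter home-grown checks tend to reject.
ok = looks_like_email("ext-first-name.last+tag@example.com")
missing_at = looks_like_email("no-at-sign")
```

The check rejects only input that cannot possibly be an address; everything else is left for the confirmation step to prove or disprove.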

  • by nacturation (646836) * <nacturation.gmail@com> on Friday June 18, 2010 @12:22AM (#32609788) Journal

    A database MUST treat all of these names the same: McClean, MacClean, MCLean, Mc Clean, Mac Clean. McCleen, ...

    I assume you left out a "not" in that sentence? I think there are quite a few people that will kindly (or maybe not-so-kindly) explain why "Mc" and "Mac" are not the same.

    Read between the lines a bit. Treat them the same means: treat them as all potentially valid, not that all the names would match in a string comparison.

  • by SEE (7681) on Friday June 18, 2010 @12:44AM (#32609894) Homepage

    Is it so hard for you to just use Unicode

    Unicode doesn't cover the full set of CJK characters used for names, nor does it cover all writing systems in actual use.

  • by shutdown -p now (807394) on Friday June 18, 2010 @01:05AM (#32609986) Journal

    That's true also. However, Unicode covers much more ground immediately with practically no effort required from the programmer - but once you go beyond that, the complexity increases very rapidly (since you have to start dealing with multiple different encodings simultaneously etc).

    As well, new Unicode versions come out regularly which expand its reach, and new frameworks/databases update their Unicode support every now and then, so if you start using it today, it'll be much easier (in many cases, completely free) for you to expand coverage in the future in a backwards-compatible way.

    In contrast, if you, say, use Latin-1 today, you'll either have to start dealing with multiple encodings much sooner, or to recode the database eventually.
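The Latin-1 versus Unicode trade-off is easy to see in Python. This is a minimal sketch; `can_store` is a hypothetical helper that just asks whether a name survives encoding, standing in for a database column with a fixed character set.

```python
def can_store(name: str, encoding: str) -> bool:
    """Return True if every character of the name exists in the encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Latin-1 copes with Western European names...
latin1_western = can_store("José", "latin-1")
# ...but has no code points for Han characters.
latin1_han = can_store("张", "latin-1")
# UTF-8 can encode both.
utf8_han = can_store("张", "utf-8")
```

As the comments above note, this only covers characters that Unicode has already assigned; a name using an unencoded character is still out of reach.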

  • by droopycom (470921) on Friday June 18, 2010 @01:07AM (#32610002)

    The Queen of England

    God save her from programmers!

  • Re:Dumbfuck summary (Score:3, Informative)

    by sjames (1099) on Friday June 18, 2010 @01:46AM (#32610158) Homepage

    Many of the systems that handle names the worst are the ones that try to be "clever", doing things like insisting on first (and only first) letter capitalized, rejecting digits, refusing to allow middle name (or initial) to be blank, always using the first letter of the Middle name and adding a period after or refusing to accept a single character as a name, and many more sins. The "dumb" systems are actually more graceful about it.

    The best policy is to accept what is entered. Even that tends to fail if someone has more than 3 names. Then there's the Spanish naming conventions.
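One way to read "accept what is entered" is a single free-form field with almost no validation. The Python sketch below assumes that design; `clean_name` is a hypothetical helper, and the choice to trim whitespace and apply NFC normalization is my assumption, not something the comment prescribes.

```python
import unicodedata


def clean_name(raw: str) -> str:
    """Return the name as entered, changed as little as possible."""
    # NFC only unifies equivalent encodings of the same text (for example
    # 'e' + combining acute vs. precomposed 'é'); it does not change case,
    # strip digits, or split the name into parts.
    name = unicodedata.normalize("NFC", raw).strip()
    if not name:
        raise ValueError("name must not be empty")
    return name


# Digits, apostrophes, hyphens and unusual capitalisation all pass through.
with_digit = clean_name("  Jennifer 8 Lee ")
lower_case = clean_name("o'neil")
```

Nothing here caps the number of name parts or forces a first/middle/last split, which avoids the "more than 3 names" failure mentioned above.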

  • by TedRiot (899157) on Friday June 18, 2010 @01:58AM (#32610218)
    True. I run into email validation problems constantly. I have a two-part first name that has "-" in the middle, so my firstname.lastname email addresses (usually work addresses) always have a "-". In addition at the moment I'm a consultant in a large company, where they put "ext-" in front of everyone who is not employed by them but works for them and has an email account from them. I also often run into problems with length, because my name is 19 characters and the last place I worked for had a 15 character company name and when you add TLD to that, you get an email address that is 39 characters long, which for some seems to be too much. I really don't get why you would use only 32 characters to store an email address.

    This problem very often bites in name fields, too, that don't accept "-" and two capital letters in my first name.

    And I used to live near a border of two cities, where my postal address was from one city while my real city of residence was the other one. I have had a lot of problems with that, when the guys who made the systems were trying to deduce my city of residence from my postal address. Which is also impossible in my country, because the national post office also permits addresses that have postalnumber + company (instead of city) for large companies who take their mail in one place and deliver it themselves the rest of the way.
  • Wow, if you consider McLean and MacLean the same, I suggest you never visit Scotland.

    The Mc's and the Mac's consider the correct usage as a matter of extreme pride. You could end up with one or more bruises if you get it wrong and then insist that "well, they're the same anyway".
  • by mpe (36238) on Friday June 18, 2010 @03:35AM (#32610508)
    The author must have missed his history lesson explaining that family names only became popular in Western European culture when governments started tabulating people. In a rural village everyone knows that Jack the butcher is different from Jack the baker.

    Hence Butcher, Baker, Smith, Brewer, Tanner, Farmer, etc became "family names".

    *Even if the system did a conversion to a latin representation of an asian name most people can't pronounce them because they are based on different sound primitives.

    Such a "translation" can easily be one to many, dependent on various factors.

    Which is why Asians tend to adopt westernised versions of their real names.

    Or they adopt a regular English, German, French, Spanish, etc name to be known by.
  • by somersault (912633) on Friday June 18, 2010 @05:51AM (#32610976) Homepage Journal

    Just looked it up. I'm Scottish, live in Scotland and always hear people say that the difference in Mac/Mc is important because of the Scots/Irish thing, but according to this article, that's bollocks:

    http://www.scottishhistory.com/articles/misc/macvsmc.html [scottishhistory.com]

  • by Anonymous Coward on Friday June 18, 2010 @08:39AM (#32611700)

    I do as well, and it's hilarious or maddening depending on what mood I'm in. I mean, seriously, surnames with apostrophes date back hundreds of years.
