Automatic Spelling Corrections On Github
An anonymous reader writes "Github projects may be seeing a different kind of contributor than normal: a small bot is now crawling through projects, contributing spelling corrections. It builds on top of the github API and existing documentation style-checking code. Future directions for the project look beyond spelling mistakes and at automated bug fixing on a large scale."
#!/user/bin/pearl (Score:5, Funny)
I wonder if this bot will do as well as every HR department out there posting "pearl" and "unique admin" positions
Re: (Score:1)
Or correcting Referer to Referrer.
Re: (Score:2)
That's not the bot's fault. Blame Hallam-Baker. :P
Re:Yeah, I'm sure... (Score:5, Interesting)
Re: (Score:3)
Re:Yeah, I'm sure... (Score:5, Funny)
Re: (Score:1)
Err, I meant to type Github, not Google. *headdesk*
Seems appropriate, considering the topic...
Re: (Score:2)
Cluebrick: apply directly to forehead! Cluebrick: apply directly to forehead!
Re: (Score:3)
Exactly, we've all seen how well it works on Wikipedia.
Worse, we've seen what it does to texting [damnyouautocorrect.com].
Erasing Fingerprints (Score:4, Interesting)
Eventually someone will contribute software that guesses contributors' identities from their distinctive patterns of spelling mistakes. I hope it will be able to find them in the archives. I wouldn't be surprised to read on Slashdot about a copyright lawsuit that depends on both apps, perhaps on opposing sides of the claim.
Typos! (Score:1)
What Could Possibly Go Wrong (Score:3)
But at least it's just sticking to READMEs.
Re:What Could Possibly Go Wrong (Score:5, Informative)
It's not like it can autocommit - the original project owner has to accept the patch.
Re:What Could Possibly Go Wrong (Score:4, Funny)
Yeah, I did notice that (it's a pull request), but I secretly love the idea of a brain-dead iPhone-style spell corrector running around automatically changing 'strcpy' to 'stripy', or 'unlk' to 'unlink'. And then someone thinking they can fix it with even more complex regexps.
Variables (Score:1, Insightful)
I hope it leaves alone variable names. Even if the spelling is incorrect, I don't like people fucking with my variable names.
Re: (Score:3)
Re:Variables (Score:4)
More annoying would be if it runs around autocorrecting spelling of documents written in a language it doesn't understand. Or worse, if it tries to mangle everything into that hideous American patois by removing the letter "U" from words like "colour".
Re: (Score:3)
We're just sticking to the original Latin, rather than that hideous Anglo-Norman patois.
Re: (Score:1)
True. I work with a dyslexic hardware engineer, and the spelling of the register names he passes to me is somewhat random. But I am not going to dig into his VHDL (from which my .h files are autogenerated) to fix them. I take what I am given, and use cut and paste as a last resort.
(He may be dyslexic, but he is a damned good designer. And I have to type "dyslexic" with considerable care.)
Wikipedia has similar bots (Score:5, Informative)
Re: (Score:1)
Re: (Score:2)
My READMEs on GitHub usually include some example code. I wonder if it'll be able to detect that.
Oppressive autocorrection. (Score:5, Funny)
Clbuttic overaction, in my opinion. This buttbuttination of our writing by computers is out of hand. I don't know if my consbreastution can take it...
Re: (Score:1)
Re: (Score:2)
You should be okay as long as they practice safe Hex...
spellcheck != predictive text (Score:5, Informative)
Don't confuse what a spell checker does when auto-correcting with what something like T9 or smartphone predictive text does. The latter is the cause of the cell phone headaches.
While a spellchecker will check a string of characters against a dictionary and attempt to correct misspellings (like "misspell" with only one s or one l), predictive-text auto-correct is both more clever and more stupid.
Predictive text makes certain assumptions about the keyboard layout and tries to fit typos to words the user might have intended had they not been smashing three tiny buttons at once on a cell phone or on-screen keyboard. While a spellchecker would recognize "danm" as a typo for "damn" with two letters transposed, it would never try to correct it to "calm" on the basis that c is close to d on the keyboard and n and m are nearby, or some such nonsense.
A plain old spellchecker, like the one under discussion here, makes no attempt to guess the intended word by assuming a typo is the result of accidentally pressing keys near the intended ones. It just looks at which words could have been intended based on close matches in the dictionary.
By the way, auto-correct will frequently fail to guess a replacement when the misspelling involves letters that are not nearby on the keyboard.
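To make the distinction concrete, here is a minimal Python sketch of a dictionary-based suggester. It is an illustration under assumptions, not the bot's code: the word list is hypothetical and tiny, and it uses optimal string alignment distance (a restricted Damerau-Levenshtein variant) so that a transposition like "danm" for "damn" counts as a single edit. Nothing in it knows about keyboard layout.

```python
# Minimal dictionary-based spelling suggester (a sketch, not the bot's code).
# Uses optimal string alignment (restricted Damerau-Levenshtein) distance,
# so swapping two adjacent letters ("danm" -> "damn") counts as one edit.
# Keyboard layout plays no part, unlike predictive-text autocorrect.


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of adjacent characters as one edit each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def suggest(word: str, dictionary: set[str], max_distance: int = 1) -> list[str]:
    """Return dictionary words within max_distance edits, closest first."""
    if word in dictionary:
        return [word]
    scored = [(osa_distance(word, w), w) for w in dictionary]
    return [w for dist, w in sorted(scored) if dist <= max_distance]


# Tiny hypothetical word list, for illustration only.
WORDS = {"damn", "calm", "misspell", "spell"}

print(suggest("danm", WORDS))     # ['damn']  (transposed letters)
print(suggest("mispell", WORDS))  # ['misspell']  (missing letter)
```

With `max_distance=1`, "calm" (two substitutions away from "danm") is never offered. A real checker would use a full dictionary and an index structure rather than scanning every word.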
Re: (Score:1)
That's a shame; so it won't know to change "spell checker" to the correct form, "spelling checker"?
A "spell checker" would be a program that ensures that a witch's spells are valid.
Re:spellcheck != predictive text (Score:4)
Hex editors?
Re: (Score:2)
A plain old spellchecker, like the one under discussion here, makes no attempt to guess what word was meant and assume a typo is a result of accidentally pressing keys near the intended ones.
Actually, no. It's not using either a plain spellchecker, or predictive text. It's just using a small fixed list [github.com] of common errors.
Re: (Score:2)
Thanks for the link. While it's true that it's not a full-featured spellchecker, it does have a list of just over 500 common misspellings and their correct equivalents, so I'd argue it *is* a primitive spellchecker.
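A fixed-list corrector of that kind is easy to picture. The Python sketch below is an assumption-laden illustration, not the bot's implementation: the three table entries are hypothetical stand-ins for the real list, and skipping fenced code blocks is one way such a tool *could* avoid touching example code in a README; whether the actual bot does this isn't stated.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Hypothetical excerpt of a misspelling -> correction table; these entries are
# illustrative and not taken from the bot's actual list.
COMMON_ERRORS = {
    "recieve": "receive",
    "seperate": "separate",
    "occured": "occurred",
}

# Match any listed misspelling as a whole word (case-sensitive).
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, COMMON_ERRORS)) + r")\b")


def fix_readme(text: str) -> str:
    """Replace known misspellings in prose, leaving fenced code blocks untouched."""
    out, in_code = [], False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # opening or closing fence
            out.append(line)
        elif in_code:
            out.append(line)  # leave example code alone
        else:
            out.append(PATTERN.sub(lambda m: COMMON_ERRORS[m.group(1)], line))
    return "".join(out)


doc = f"We recieve data.\n{FENCE}\nrecieve = 1\n{FENCE}\n"
print(fix_readme(doc), end="")  # only the prose line is corrected
```

A lookup table like this can't tell a deliberate misspelling from an accidental one, so a README that intentionally lists misspellings would still get "fixed". It is also case-sensitive as written, so "Recieve" would slip through.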
In any case, it's definitely NOT a predictive text auto-correction tool, and there's no danger of the results showing up on DYAC [damnyouautocorrect.com].
How many people thought this was a good idea? (Score:2)
I honestly wouldn't expect a lot of developers to cupertino with this decision.
Re: (Score:2)
I honestly wouldn't expect a lot of developers to cooperate with this decision.
There, reverted that Mac-bot correction for you.
Correct to what? (Score:2)
I hope it's optional, because some of us write British English rather than American English. This tool won't do us much good if it starts correcting project names, for instance. Third-party KDE developers would be even worse off ;)
Re: (Score:2)
I don't know what to believe. You don't have a browser with built in spell check yet?
When autocorrect goes wrong (Score:2)
Not that anyone cares, but here is a real life example of auto-spelling where it is not wanted:
Manager comes across a previously unseen (misspelled) error message in a database field. Database is accessed by several applications.
Manager copies and pastes error-message into email and sends it to colleague. Email client auto-corrects misspelled error message.
The colleague greps for the full, spelling-corrected error message text, can't find any occurrence of it in his code, and points the finger at my code.
Gre
Re: (Score:2)
Re: (Score:2)
It gets complicated, doesn't it? Did the 'typewriter' make a mistake in the movie Brazil [youtube.com]?
There you have it (Score:2)
What is needed just as much as a spell checker is a grammar checker. It seems like younger people today simply can't figure out the difference between "their", "there", and "they're".
http://www.wikihow.com/Use-There,-Their-and-They're [wikihow.com]
Re: (Score:3)
What is needed just as much as a spell checker is a grammar checker.
Yes! I occasionally need to proofread OCR'd text that has been converted into an HTML file. I've written some code that extracts the text and flags misspellings. That catches a lot of things for me, but it still misses many errors that a grammar checker *would* find.
Back in the late '80s or early '90s, I purchased an add-on for Microsoft Word 5.0 called something like Grammatik IV. It did a wonderful job of finding and flagging pos
missing tags? (Score:2)
I don't see the sarcasm tags in there.
Re: (Score:2)
I worked on spelling and grammar checking, and I can assure you it's far from easy. The errors most grammar checkers can find reliably will not interfere with general understanding. Other errors are very hard to reliably detect and correct; illiterate language is almost impossible to correct.
E.g., let's take an example from that page (whose link has mysteriously disappeared when I clicked on Reply): "There is an antique store on Camden Avenue." Suppose I made a mistake and wrote: "*Their is an antique store
Re: (Score:2)
Re: (Score:2)
Markov models have the same problem as any other metric: they don't take context into account. And that includes competence and performance of the author: a French coder will make different mistakes than an English theatre journalist, and there are no corpora on which to train the model. There's no money for that. This is just going to be another cheap hack with very little benefit and potentially huge costs.
Re: (Score:2)
We used to use Grammatik, which was a stand-alone Unix package and was later integrated into Unix WordPerfect. It was a bit annoying at times (PASSIVE VOICE! PASSIVE VOICE!) but helpful.
Since our move to OpenOffice for Linux, many years ago, it is something we sorely miss. So I can understand your frustration. The grammar addon for OpenOffice is incomplete and slow. I don't have any recommendations, unfortunately.
Re: (Score:2)
What grammar checking tools have you found useful? (Currently using an old Win/XP SP3 system.)
I recall that there was a special checker in Google Wave that took into account the context of surrounding words when suggesting replacements. For example, if you typed "I have bean to the shops", while "bean" (the food) is spelt correctly, it is the wrong word in this context, and "been" would be suggested instead. Unfortunately Google Wave was discontinued, so I don't know if this functionality is available anywhere.
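A rough idea of how such context checking can work, sketched in Python: keep sets of easily confused real words and pick the one whose neighbouring word pairs are most common. This is a generic bigram (n-gram) approach, not a description of what Google Wave actually did, and the counts below are invented rather than drawn from any corpus; as pointed out elsewhere in the thread, getting good training data is the hard part.

```python
# Sketch of a context-sensitive check for real-word errors such as
# "I have bean to the shops". Bigram counts are invented for illustration;
# a real checker would estimate them from a large corpus.
BIGRAM_COUNTS = {
    ("have", "been"): 900,
    ("have", "bean"): 1,
    ("bean", "to"): 2,
    ("been", "to"): 400,
    ("green", "bean"): 300,
    ("green", "been"): 0,
}

# Groups of correctly spelled words that are easily confused with each other.
CONFUSION_SETS = [{"bean", "been"}]


def score(prev: str, word: str, nxt: str) -> int:
    """Crude plausibility: bigram counts with the surrounding words."""
    return BIGRAM_COUNTS.get((prev, word), 0) + BIGRAM_COUNTS.get((word, nxt), 0)


def check(sentence: str) -> list[tuple[str, str]]:
    """Return (found, suggested) pairs where a confusable word fits worse
    than one of its alternatives in this context."""
    words = sentence.lower().split()
    suggestions = []
    for i, word in enumerate(words):
        for group in CONFUSION_SETS:
            if word not in group:
                continue
            prev = words[i - 1] if i > 0 else ""
            nxt = words[i + 1] if i + 1 < len(words) else ""
            best = max(group, key=lambda w: score(prev, w, nxt))
            if score(prev, best, nxt) > score(prev, word, nxt):
                suggestions.append((word, best))
    return suggestions


print(check("I have bean to the shops"))  # [('bean', 'been')]
print(check("I like green bean soup"))    # []
```

Because only words inside a confusion set are ever questioned, a checker like this stays quiet about everything else, which limits both its usefulness and its capacity for damage.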
Re: (Score:2)
The purpose of language is communication. If the idea is clear the grammar ain't important.
But, more often than not, making mistakes muddles the idea. Especially grammar and subtle spelling mistakes. Just learn to write decent and proofread.
Re: (Score:2)
... learn to write decently and proofread
FTFY :)
Re: (Score:2)
Re: (Score:2)
Don't complain about grammer if you can't properly: punctuate.
Re: (Score:2)
does there need to be a difference between spelling those on paper as there's no difference in vocalizing them in your head? if they sound the same in your head, what does it matter how it's written as long as its use is obvious from the other words around it? maybe it would be better to replace those with "ther". the meaning depends on the context anyhow. nowadays you have to be able to read and understand many dialects, not just your local colour.
you see, the current correct spelling was just pulled out
IBM Watson to help correct your coding errors? (Score:1)
I was watching a show on SkyTV about IBM's Watson supercomputer competing on Jeopardy. Perhaps GitHub could rent time on IBM's Watson and redirect that AI from recognising and learning from correct human answers on Jeopardy to recognising errors and the human-contributed corrections, learning from them, and then correcting other code that contains the same or similar errors?
Who knows? We might finally get rid of those annoying memory leaks in just about every piece of software I've had the pleasure of using.
After the spellcheck bot did its magic (Score:2)
The CunningLinguist project got a little more than they bargained for.
README (Score:2)
What if the README is like this:
This program is a spell checker. It will find mistakes like recieve and conveneince
Yank spelling (Score:1)
It had better not change everything to the incorrect, US way of spelling.
Re: (Score:2)
[font colour=red]speling![/font]
Re: (Score:2)
Why sign your comment if you're posting anonymously? Hint: defeats the object of anonymous posting ;-)
Use Levenshtein distance + dictionary (Score:1)
Best regards,
Bernard Hoffman IV,
Computer store salesman, and proud beach house owner.
Taylor Mali already showed us... (Score:3)
Taylor Mali already showed us that spell-checking is not safe. [youtube.com]
The the impotence of proofreading
By Taylor Mali
www.taylormali.com [taylormali.com]
Has this ever happened to you?
You work very horde on a paper for English clash
And then get a very glow raid (like a D or even a D=)
and all because you are the words liverwurst spoiler.
Proofreading your peppers is a matter of the the utmost impotence.
This is a problem that affects manly, manly students.
I myself was such a bed spiller once upon a term
that my English teacher in my sophomoric year,
Mrs. Myth, said I would never get into a good colleague.
And thats all I wanted, just to get into a good colleague.
Not just anal community colleague,
because I wouldnt be happy at anal community colleague.
I needed a place that would offer me intellectual simulation,
I really need to be challenged, challenged menstrually.
I know this makes me sound like a stereo,
but I really wanted to go to an ivory legal colleague.
So I needed to improvement
or gone would be my dream of going to Harvard, Jail, or Prison
(in Prison, New Jersey).
So I got myself a spell checker
and figured I was on Sleazy Street.
But there are several missed aches
that a spell chukker cant cant catch catch.
For instant, if you accidentally leave a word
your spell exchequer wont put it in you.
And God for billing purposes only
you should have serial problems with Tori Spelling
your spell Chekhov might replace a word
with one you had absolutely no detention of using.
Because what do you want it to douch?
It only does what you tell it to douche.
Youre the one with your hand on the mouth going clit, clit, clit.
It just goes to show you how embargo
one careless clit of the mouth can be.
Which reminds me of this one time during my Junior Mint.
The teacher read my entire paper on A Sale of Two Titties
out loud to all of my assmates.
Im not joking, Im totally cereal.
It was the most humidifying experience of my life,
being laughed at pubically.
So do yourself a flavor and follow these two Pisces of advice:
One: There is no prostitute for careful editing.
And three: When it comes to proofreading,
the red penis your friend.
Incredibly stupid (Score:2)
Automated correction of spelling without human plausibility checking is already a serious risk. Automated "correction" of coding errors is a disaster waiting to happen. There are far too many things that seem to be an error but may be in fact critical. Case in point: Reading uninitialized memory. Usually that is an error. But when gathering entropy it is not. The Debian OpenSSL disaster was caused by this type of correction, suggested by Valgrind. Although there was a human without understanding of the code
Re: (Score:2)
Reading uninitialized memory when gathering entropy is harmless and potentially beneficial, but can't properly be considered not-an-error. More to the point, your entropy-gathering system should be no worse for the wear if you *don't* read the uninitialized memory, since on many systems, uninitialized memory could have zero entropy (thus, you can't rely on it as a source of entropy).
The problem is that a person broke the code while in the process of fixing the Valgrind warning -- not that fixing the warning
Along with many... (Score:1)
I have to say I think this is a bad idea, but I also want to add that it's something no one asked for or wanted...a pointless feature that will probably cause more harm than good. I expect more out of the Github crew.
Which version of English (Score:2)
I find it increasingly frustrating that many applications default to US English, regardless of my machine's locale or the IP address I'm coming from.
It's especially annoying when they tell me words ending in -our or -ise are spelled wrong and want to correct them.
So what will this bot do? Should I expect to see it, over and over again, submitting what I would consider incorrect changes because, like so many things, it knows only about American English (and to
Re: (Score:2)
Well, IP addresses don't exactly have a language associated with them. Hell, it's been hard enough to try and get the IETF to add a "language" field to TCP packets.
Re: (Score:3)
I'm willing to say that this idea is totally rediculous even!
Re: (Score:2)
Yeah, it stopped being diculous long ago. Now it's just repeats.
I wonder, how this will effect us?
Re: (Score:1, Troll)
AC saying "fucking terrible idea" w/o saying *why* it's a terrible idea is grounds for -1, Troll.
Re: (Score:2)
It's a terrible idea cuz sometimes you WANT to spell things wrong, dig?
Where did you see some mandate that the patches be accepted?
Re:This (Score:4, Insightful)
Excuse me, it's "capisce".
In my experience, "freedom of language" almost always means "ignorance of language" and is akin to "keeping it real".
The handful of people whose grasp of language is so good that they can purposely misspell or use poor grammar for effect will almost certainly not be hindered by anything "github" does, unless this new spellchecker is going to be clumsily used on code and the hilarity that ensues breaks software that said linguistic maestro uses. See what I just did there? I purposely used clumsy sentence structure because I'm Just. That. Good.
Capisce?
Re: (Score:2)
You just made all that up.
And what educated person would want to spell it as an "Anglicized" word?
Re: (Score:2)
And what educated person would want to spell it as an "Anglicized" word?
All of them...
Some examples:
There are *lots* more.
Re: (Score:1)
OK, so: colour or color? -ise or -ize? If I, a Brit, submit my code written in the language I use, am I going to be bombarded with patches trying to Americanize my code?
And I actually use a hybrid system. I use the "British" colour, but the "American" -ize, just because they feel better to me. Neither colour nor color is the way I pronounce the word - cullur would be a better transliteration (Chaucer pronounced it colour, to rhyme with flower, which is why we spell it that way). But -ize is the way I pronounce
Re: (Score:2)
This is not something that github is enforcing. It is merely some third party application that interfaces with projects hosted on github. It can be given commit privileges and subsequently run by a project by their own admins, on their own systems.
The summary seems to indicate this is a service github is going to start offering, but that is far from the truth.