IBM to Open Voice Recognition Software
phug writes "According to the NY Times, IBM is donating code that it estimates cost the company $10 million to develop. One collection of speech software for handling basic words for dates, time and locations, like cities and states, will go to the Apache Software Foundation. The company is also contributing speech-editing tools to a second open-source group, the Eclipse Foundation." There's not much information out there yet - e.g. no word on licenses etc. It is worth pointing out that the Eclipse Foundation was started by IBM.
Great news (Score:5, Interesting)
Re:Great news (Score:5, Funny)
"Computer?.....commmm-PU-terrrrr?"
Now hopefully my co-workers will stop giving me strange looks...well, one can dream can't they? No, I'm asking...can one dream?
Flamebait? (Score:1)
Oh well, I suppose the moderators are wiser than I.
Re:Great news (Score:1)
Seen in a discount bin (Score:2)
Re:Great news (Score:5, Interesting)
No need to outsource to India, opensource it to Linux & ViaVoice!
Woohoo! +1 for IBM again!
Re:Great news (Score:3, Insightful)
The Alameda County (AC) Transit information number here in the Bay Area uses voice recognition software to address customer inquiries. The system is very buggy and impractical:
1. Voice recognition is far from perfect. Try getting a computer to recognize the name of a destination or complicated query while you yell it over ambient noise (ex. traffic noise around a bus stop) on a cell phone.
2. The software can
Good news indeed (Score:2)
Re:Great news (Score:1)
Regards,
Steve
Not what you think - here's something that is (Score:4, Informative)
If you're interested in open-source voice recognition check out OSSRI [ossri.org] - an effort to bring together some sort of practical large vocab speech recog to linux. They're just starting up, but the mailing list archives hold a fair amount of discussion about the current state of the open-source SR world. (Which, to sum up, isn't that great
ViaVoice (Score:5, Interesting)
Re:ViaVoice (Score:5, Insightful)
ViaVoice is a wide-vocabulary speech recognition system. The article hints at a more focused set of target words (times, dates, locations) for the donated package. That sounds much more like the software behind the airlines' voice recognition systems that help you request flight information.
The strategies are quite different.
ViaVoice encourages you to invest some of your time reading training scripts so it can learn your voice and thus recognize a wide variety of words from your specific voice.
The time/date/city system is likely to be speaker independent (no training scripts to read) but much smaller vocabulary.
Re:ViaVoice (Score:3, Informative)
Re:ViaVoice (Score:3, Interesting)
Re:ViaVoice (Score:2)
Aren't there problems with eclipse's licensing that prevent it running with qt?
IBM also has a grammar based system. (Score:5, Interesting)
IBM Hursley labs had a name dialler 5 years ago that let you phone the computer, say the name of the person you wanted to speak with, and get put through. They also had a system that provided weather forecasts based on the name of the city or country you said. I was pleased to name the latter "Global Weather Information System" or GWIS, pronounced Gee-whizz. Both ran on the machine under my desk. Both worked reasonably well, especially given that a lot of the acoustic models for names and places were automagically generated.
Re:IBM also has a grammar based system. (Score:1)
Obligatory quote (Score:4, Funny)
Code-by-voice (Score:5, Interesting)
I don't know about everyone else, but the concept of coding by voice does fascinate me. There are obvious issues (like eliminating having to say every single control character (if at all possible)), but with a background of RSI I think it's at least worth a shot.
Thoughts?
Re:Code-by-voice (Score:4, Interesting)
Given the fact that most languages have a rather limited vocabulary, and the fact that class libraries and defined functions/variables can be extracted from existing code, software like this could make educated guesses on what you were trying to say.
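To make the "educated guess" idea concrete, here is a minimal sketch in Python. It assumes the recognizer hands back a rough text token, and it snaps that token to the closest symbol already present in the code base. The function names (`build_vocabulary`, `guess_symbol`) and the sample source are made up for illustration; a real system would work on acoustic scores rather than spelling similarity.

```python
import difflib
import keyword
import re
from typing import Optional


def build_vocabulary(source: str) -> set:
    """Collect language keywords plus every identifier found in existing code."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    return identifiers | set(keyword.kwlist)


def guess_symbol(heard: str, vocabulary: set) -> Optional[str]:
    """Return the known symbol closest to what the recognizer heard, if any."""
    matches = difflib.get_close_matches(heard, sorted(vocabulary), n=1, cutoff=0.6)
    return matches[0] if matches else None


# Hypothetical existing code that the vocabulary is extracted from.
existing_code = "def parse_header(raw_bytes):\n    return raw_bytes.split(b':')\n"
vocab = build_vocabulary(existing_code)

# A slightly misrecognized token is mapped back to a symbol that really exists.
print(guess_symbol("parse_heder", vocab))
```

The small vocabulary is what makes this workable: the fewer candidates there are, the more a noisy guess can be trusted.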
Re:Code-by-voice (Score:2, Interesting)
Re:Code-by-voice (Score:1)
Re:Code-by-voice (Score:3, Interesting)
the only way to make voice commands work is to integrate them into your GUI
so your OK-button object does not only have a textlabel-value but also an audiolabel.
this works both ways, one way for accessibility ('hear' what button you will click) and the other way is using your own voice to 'click' it (by saying 'Ok')
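A rough sketch of that two-way idea in Python, not tied to any real GUI toolkit: the `Button` and `Dialog` classes and the `audio_label` attribute are hypothetical names invented for this example, and the "speech" arrives as plain text.

```python
class Button:
    """A button carrying both a visible text label and a spoken audio label."""

    def __init__(self, text_label, audio_label, on_click):
        self.text_label = text_label
        self.audio_label = audio_label.lower()
        self.on_click = on_click

    def describe(self):
        # Accessibility direction: what would be read out for this button.
        return "Button: " + self.audio_label


class Dialog:
    """Routes a recognized word to the button with the matching audio label."""

    def __init__(self, buttons):
        self._by_audio = {b.audio_label: b for b in buttons}

    def hear(self, spoken_word):
        # Voice-control direction: saying the label 'clicks' the button.
        button = self._by_audio.get(spoken_word.strip().lower())
        if button is None:
            return False
        button.on_click()
        return True


clicked = []
ok_button = Button("OK", "ok", lambda: clicked.append("ok"))
cancel_button = Button("Cancel", "cancel", lambda: clicked.append("cancel"))
dialog = Dialog([ok_button, cancel_button])

dialog.hear("Ok")
print(ok_button.describe(), clicked)
```

Because the dialog only has to choose among the labels currently on screen, the recognizer's job is far smaller than open dictation.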
Re:Code-by-voice (Score:5, Insightful)
Re:Code-by-voice (Score:4, Insightful)
The parent even mentioned RSI. Not many coders with bad RSI can type faster than speaking it
Re:Code-by-voice (Score:2, Insightful)
I hope this ends up meaning open-source voice-recognition. I do hope it does.
(says a double-crush sufferer under major physical therapy)
Re:Code-by-voice (Score:2)
Re:Code-by-voice (Score:2)
Re:Code-by-voice (Score:2)
Re:Code-by-voice (Score:2)
Exactly! And as the sibling post stated; using higher level languages is the key.
I see an application like this as a first step towards having a computer with the capability of writing the actual code. The task of the programmer is provide the creative energy: describing what the program should do, and how.
Python would be a prime candidate to start with on a project like this. It hasn't been called "executable pseudocode" fo
Re:Code-by-voice (Score:2)
most coders write code much faster than speaking it
This is probably true, but that is only about the actual writing of source code. What about some of the other things you do while coding? For example, I think a voice-activated debugger could be nice. I could use voice to command the debugger while I use another window (i.e. a web browser open to the web app I'm debugging) to manipulate/control the program.
Re:Code-by-voice (Score:4, Funny)
I don't quite see myself sitting at my office going,(read out loud)
if parenthese parenthese invar bitwiseor zero x three parentheseend equals
zero or i less than zero parentheseend curlybrace
No btw erase from first parenthese on last line to second parentheseend
At any rate I'd guess I can write code faster than I can talk it.
Re:Code-by-voice (Score:1)
Re:Code-by-voice (Score:2)
and "assign" or something for =. And or would be ||
Still, seems like a pain.
Re:Code-by-voice (Score:2, Interesting)
"If-block."
if( [condition] ){
[body]
}
"Condition or."
if( [left side] or [right side] ){
"Right side I lessthan zero. Left side parens equals zero."
if( ([number])==0 or i < 0 ){
"Number invar bit-or hex three."
if ( ( inVar | 0x3 ) == 0 or i < 0 ) {
"Body."
I might not switch, but I'm sure it could be made usable with some good design.
-Joahcim.
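For what it's worth, the slot-filling dialogue above could be sketched roughly like this in Python. The `TEMPLATES` table, the `fill` and `open_slots` helpers, and the exact command names are assumptions made up for the example, not an existing tool; spoken commands are represented as plain strings.

```python
import re

# Hypothetical templates keyed by the spoken command that inserts them.
TEMPLATES = {
    "if-block": "if( [condition] ){\n    [body]\n}",
    "or": "[left side] or [right side]",
    "parens equals zero": "([number])==0",
}


def fill(code, slot, value):
    """Replace the first open [slot] in the code with the given value."""
    placeholder = "[" + slot + "]"
    if placeholder not in code:
        raise KeyError("no open slot named " + repr(slot))
    return code.replace(placeholder, value, 1)


def open_slots(code):
    """List the slots that still need to be dictated."""
    return re.findall(r"\[([^\]]+)\]", code)


code = TEMPLATES["if-block"]                                     # "If-block."
code = fill(code, "condition", TEMPLATES["or"])                  # "Condition or."
code = fill(code, "right side", "i < 0")                         # "Right side I lessthan zero."
code = fill(code, "left side", TEMPLATES["parens equals zero"])  # "Left side parens equals zero."
code = fill(code, "number", "inVar | 0x3")                       # "Number invar bit-or hex three."

print(code)
print(open_slots(code))
```

A real editor would need a slot syntax that cannot collide with the language's own square brackets (array indexing, for instance); the bracket notation here simply mirrors the example above.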
Re:Code-by-voice (Score:3, Interesting)
Re:Code-by-voice (Score:2)
Yeah, but can you write it faster than you can think it?
Re:Code-by-voice (Score:2)
Of course, the latter would be more efficient, but only if you don't wear your tinfoil hat
Re:Code-by-voice (Score:2)
Also, while a language may not use many phrases, identifiers do. It would be a real nuisance having to spell out all my portmanteau camel-case identifiers.
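One possible way around spelling identifiers out, sketched in Python under the assumption that the recognizer returns ordinary space-separated words: say the words and let the tool assemble the camel-case name. Both helper names are invented for this example.

```python
import re


def to_camel_case(spoken):
    """Join dictated words into a lowerCamelCase identifier."""
    words = spoken.lower().split()
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


def to_spoken(identifier):
    """Split a camelCase identifier back into words, e.g. for reading it aloud."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier).lower()


print(to_camel_case("max retry count"))
print(to_spoken("maxRetryCount"))
```

This does nothing for true portmanteaus or abbreviations, which would still need spelling or a lookup against identifiers already in the project.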
Re:Code-by-voice (Score:1)
Re:Code-by-voice (Score:2)
Although your comment is partially relevant to the article, it does not describe what I was referring to.
From the link you posted:
"This project proposal serves to take Eclipse into the voice application space. These tools can be used to develop interactive voice response (IVR) systems based on VoiceXML standards, such as speech-driven applications for performing bank transfers, retrieving e-mail, or querying
Re:Code-by-voice (Score:2)
But would that eliminate RSI or just relocate it? Do you really want to be the first in your cube farm with carpal tongue?
IBM is great! (Score:4, Funny)
Re:IBM is great! (Score:1)
BTW, you might want to fix up your name on the site... I understand that you jumbled it up at the top of the page, but why at the bottom too?
Re:IBM is great! (Score:1, Funny)
One microdrive [amazon.com] suppository coming right up!
Why? (Score:5, Interesting)
Corporations don't usually give away stuff for nothing; in fact, their mission by law is to maximize profit.
Re:Why? (Score:1, Insightful)
Re:Why? (Score:5, Insightful)
Re:Why? (Score:2)
Actually, no. Speech rec is a pretty big business in call centers, as in "Speak the name of the city which you want to fly to..." That sort of thing is hard to do with touch-tone, and expensive to do with live agents. IBM's has a
Re:Why? (Score:3, Insightful)
The same scenario applies here.
Re:Why? (Score:1)
Re:Why? (Score:5, Insightful)
They don't make money on software like other companies. The software they develop is used to provide solutions to other people's problems.
Problems they pay IBM to fix. A large portion of the world is now using Linux for stuff. It's free, it's stable, it's as good a midrange server OS as anything else out there.
They want to use Linux, IBM wants to get their money. So IBM supports Linux.
There are also other aspects that IBM likes. IBM needed a new OS for everything. They have Mainframes, Unix servers, database servers. S/390, Power series, AS/400, etc etc etc.
For a long time IBM dumped money into proprietary software. Once the platform was antiquated, so was their software, and so the millions of dollars of money they put into their own closed source software is a dead end in just a few years. For all the mainframes, database software, development software, power series, x86, etc etc etc. All these can be fulfilled by Linux. An open source software OS can provide all the functionality that they NEED.
Of course something like OS/400 is better than Linux at running databases, but IBM has the capabilities of making Linux nearly as good. Also this development also benefits other platforms they support, that OS/400 won't run on.
By using Linux they reduce the duplication of effort. No more OS/400, then AIX, then this, then that. All of it can be Linux, on nearly all their hardware. They just have to make it work.
That's just one of the reasons. They make money from solutions, not software. People buy IBM to make things work, they don't care HOW or WHY, but they want things to work. With Linux they can get things working, cheaper, and eventually cheaper.
No more dumping billions of lines of code into various bits of software that don't integrate and will be obsolete in 3 years. Linux has the potential, through its system design and openness and flexibility, to never go obsolete. It'll just change with the times.
Plus IBM would like to see Linux on the desktop, so they can basically tell Microsoft to fuck themselves when time comes.
With this particular bit of software it ties into their WebSphere and database efforts. Receptionists can just talk into the computer, people can just talk into the phone, and the computer understands.
But it's worthless without the database and the infrastructure to back it up. If most of the rest of the infrastructure is open source to their customers, why make this little bit of it closed source? It just doesn't make sense.
Sensationalist headlines like "cost IBM 10 million dollars to produce" are misleading.
IBM doesn't give a flying fuck how much money it cost to make it.
There is a well-known thing called "sunk cost". It basically means that money that is spent, is spent. You're not going to get it back. You don't survive long in business if you don't "get" this concept.
An extreme example:
Say you spent 100,000 dollars on a Windows solution. You have found out now that a Linux solution costing 2000 dollars can do what you want, and better.
Your potential to make money on the new system is very high. Your potential to make money on the old system is very low.
Which is smarter? To dump the old software and go with the new to make lots and lots of money? Or to keep the old software just because "you don't want to waste the 100,000 dollars"?
An intelligent person will go with the money making scheme and dump the money pit. A stupid person will be blinded by the sacrifice and stick with the old solution because they can't think clearly.
IBM is all about making money. If they figure they can save money by using Linux vs AIX they will. They do recommend it to some of their existing AIX customers...
Think about it this way:
Linux is cheaper and almost as good. IBM saves money, their customers save money. More saved money by IBM customers means that they are more likely to grow and make even more money.
Re:Why? (Score:3, Insightful)
The other thing is that they like the idea of having one very common OS, like windows is. But with windows someone else controls it. With Linux IBM can go where they want, even if Linus wants to go somewhere else. Now I imagine I
Re:Why? (Score:4, Insightful)
They don't make money on software like other companies. The software they develop is used to provide solutions to other people's problems.
No. IBM makes lots of money off software and patents for software processes. WebSphere Application Server, WebSphere Portal Server, Lotus Notes, and of course DB2 make up over a billion dollars in revenue last I heard. Granted, that's less than 5% of IBM's total revenue but it's still income.
They want to use Linux, IBM wants to get their money. So IBM supports Linux.
For IBM Global Services, yes. For Server Group's blade series, yes. For Software Group, hell no. Where is the Lotus Notes client that runs on anything but Windows?
For a long time IBM dumped money into proprietary software. Once the platform was antiquated, so was their software, and so the millions of dollars of money they put into their own closed source software is a dead end in just a few years. For all the mainframes, database software, development software, power series, x86, etc etc etc. All these can be fulfilled by Linux. An open source software OS can provide all the functionality that they NEED.
No. z/OS has far more capabilities in the traditional business-oriented mainframe space than Linux at present, and it's stupid for IBM to try to push a Unix-like OS into a tightly-controlled mainframe environment. IBM *is* pushing Linux-on-mainframe as a consolidated web hosting environment, but IBM has no plans to kill z/OS.
No more dumping billions of lines of code into various bits of software that don't integrate and will be obsolete in 3 years. Linux has the potential, through its system design and openness and flexibility, to never go obsolete. It'll just change with the times.
Not really. First, *lots* of IBM's software never exits the lab, and much that does dies a nasty death in the market. (See Tivoli for dozens of examples.) Second, IBM is riding the Linux bandwagon simply because *it has to* in order to survive.
Plus IBM would like to see Linux on the desktop, so they can basically tell Microsoft to fuck themselves when time comes.
No they don't. If they did they would port Lotus Notes (IBM's flagship desktop application) to Linux.
Sensationalist headlines like "cost IBM 10 million dollars to produce" are misleading.
IBM doesn't give a flying fuck how much money it cost to make it.
IBM does care, a lot, about how much it costs to build something. Let me tell you an IBM internal secret: Eclipse was meant to take down *MS Visual Studio* back in *2000*. Yes, IBM was hoping that Eclipse would *outsell* VS, and when that obviously couldn't happen IBM turned it into a marketing win. And lest we forget history already: it took several months of open-source activity before Eclipse was usable by the masses.
Say you spent 100,000 dollars on a Windows solution. You have found out now that a Linux solution costing 2000 dollars can do what you want, and better...
An intelligent person will go with the money making scheme and dump the money pit. A stupid person will be blinded by the sacrifice and stick with the old solution because they can't think clearly.
An intelligent person will evaluate the total business cost of that solution, and ask themselves if they have enough in-house experience to run the Linux solution with the same apparent reliability as the Windows solution. If you've got some *nix talent in-house, the switch is worth it. If you don't have that talent, then the *one-time* cost of $98,000 is more than offset by the continual cost of a new full-time salary.
Think about this: I could go with a cheapo MS SQL setup for my company or an expensive IBM database.
Or you could look at the "free" open-source database and cut both Microsoft and IBM out of the picture.
Because it works 99.99995% of the time, an
Re:Why? (Score:2)
That might not be their motivation, but it might be the *exact* reason why God created IBM.
Good will and a tax deduction (Score:3, Insightful)
The code was donated to 2 non-profit organizations (Score:1)
The Apache Software Foundation (ASF) is a non-profit 501(c)(3) corporation, incorporated in Delaware, USA, in June of 1999.
From http://www.eclipse.org/org/documents/Eclipse%20BYLAWS%202003_11_10%20Final.pdf [eclipse.org]
The Eclipse Foundation is formed exclusively as a non-profit trade association, as set out in section 501 (c) (6) of the Internal Revenue Code (the "Code").
Re:Why? (Score:2)
I'd love to get my house wired star trek style, and now (hopefully) this is one less issue I need worry about... where to find reliable open source voice recognition
Re:Why? (Score:2)
I'm sure that this post will get modded as "flamebait" or "troll" but, from my own experience using ViaVoice, this looks more like ye olde "instead of admitting that our product is a failure, we'll just turn it into a marketing coup by releasing it as open source" strategy.
I purchased ViaVoice about a year or so ago as an add on to a digital recorder for my wife. My wife followed all the instructions very carefully. She went through the training phase and would always speak very s
That means one more thing missing in linux gone? (Score:5, Interesting)
Hooray for IBM and as Ali said in the Linux ad "don't back down"!!
Re: (Score:2)
Viable (Score:3, Insightful)
Re:Viable (Score:1)
>you take into account the different accents,
>dialect and slang, is it just a pipe dream? Is it a
>software or hardware related issue?
The real problem is probably the different voices. The others I can live without... (besides, it's good for my mind to be able to say some *()$#% dialect that the computer won't understand)
Re:Viable (Score:2, Informative)
Re:Viable (Score:2)
Verizon Business Internet unit uses voice recognition for their first level of the help system. It asks for your account number (say or key it in) and then attempts to ask you abou
Re:Viable (Score:3, Interesting)
I used to work for MacSpeech, we also did large vocabulary dictation systems like ViaVoice.
Back when I was there it really wasn't viable for most people.
However, not all people can type, this includes both the "Hands Free" market (disabilities) and the "Hands Busy" market. Surprisingly, many people also don't want to type, this includes medical and legal professionals. They have an interesting problem, they often need to generate large amounts of boilerplate text quickly. Doctors, Radiolo
Re:Viable (Score:2)
Re:Viable (Score:1)
More than viable. Call Amtrak and get a few train schedules.
Around 2 decades late... (Score:5, Funny)
Re:Around 2 decades late... (Score:2)
Human-Centered Computing! (Score:5, Interesting)
My brother (who works for IBM) recently sent me an article on USA Today [usatoday.com] about the system IBM and Honda have developed for speech-interface with a GPS-enabled navigation computer. Really cool stuff.
For those of you who haven't read it, check out The Unfinished Revolution [harpercollins.com] by Michael Dertouzos. I don't agree with all of his analysis (he was a little lacking in pragmatism on some points), but overall this book was very insightful. This book, along with Weaving the Web [w3.org] by Tim Berners-Lee, caused a big paradigm shift in my thinking about computer technology.
Code or training? (Score:5, Insightful)
So my question is: will the code released include the training needed to make it work, or will someone be able to put together the necessary resources to train the system?
Re:Code or training? (Score:4, Interesting)
Actually it should be quite easy: The client reads your keyboard and the microphone, and you are supposed to speak loudly whatever you type. The training results are regularly exchanged with the central server.
Re:Code or training? (Score:1)
And they give us a free lunch coupon afterwards. Will read script for food. :-)
Re:Code or training? (Score:2)
Anyone who's ever worked with a neural network can tell you that real training is a sort of half skill half art. Even assuming you can get all the people to read the script (a big undertaking but certainly doable), you'd have to know how to train
psh (Score:1, Funny)
Oh...this is voice recognition...umm...let me revise.
They'll never be able to understand Microsoft Sam!
HTK is already available as open source (Score:5, Informative)
This is not earth-shattering news, since HTK has been available for some years. HTK was owned by a company called Entropic and was released as open source when it was bought by Microsoft. HTK can be found at http://htk.eng.cam.ac.uk/ [cam.ac.uk] and can handle network grammars. This lessens the impact of IBM's news.
Re:HTK is NOT available as open source (Score:5, Informative)
2.1 The Licensor hereby grants the Licensee a non-exclusive license to a) make copies of the Licensed Software in source and object code form for use within the Licensee's organisation; b) modify copies of the Licensed Software to create derivative works thereof for use within the Licensee's organisation.
2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.
This license is in no way Open Source [opensource.org]. Yes, you can play with the source, but you cannot build something useful with it and redistribute under the same license.
Re:HTK is NOT available as open source (Score:4, Informative)
Re:HTK is already available as open source (Score:2, Informative)
FreeTTS [sourceforge.net] is a speech synthesizer written entirely in the Java programming language.
Re:HTK is already available as open source (Score:1)
I don't know about that. Checking out the HTK license shows the following:
Can I build & sell products based on HTK3? You may build a product but you are not allowed to redistribute (parts of) HTK3, i.e. you can't ship shrink-wrap boxes with products that contain HTK3 code.
That's nowhere close to a license providing freedom. It'll be interesting to see what license IBM picks.
Eclipse licensing (Score:2, Informative)
All new contributions will be under the EPL, so if IBM wants to donate anything to the Eclipse project it will be under this license.
Nice M$-Comment at the end (Score:5, Interesting)
Speech code from IBM to become open source
And even better.. the comment from Microsoft, quoted at the end of the article
"IBM has not executed in bringing this technology to a broad market as Microsoft has."
Jokes aside, the article also states that Microsoft introduced their Speech Server 2004 last March, and that 100,000 software programmers have downloaded Microsoft's free software developers' kit for building speech applications on its Windows
Re:Nice M$-Comment at the end (Score:2)
You could build 10,000 boxes and sell them around the world without any licensing fees.
That is somewhat different from a solution developed with Microsoft Speech Server 2004.
Re:Nice M$-Comment at the end (Score:2, Interesting)
You could build 10,000 boxes and sell them around the world without any licensing fees.
That is somewhat different from a solution developed with Microsoft Speech Server 2004.
Afraid not. IBM is open sourcing 2 things, neither of which is their speech recognition engine. One is just a JSP library, with some tags for generating voicexml f
Sphinx (Score:5, Informative)
Reed
Re:Sphinx (Score:3, Interesting)
Thanks.
Re:Sphinx (Score:2)
voice vs speech recognition (Score:1)
This is an example of speech recognition not voice.
Voice recognition is identifying an individual by their voice. Example: the movie Sneakers, which any good geek should have seen. "My voice is my passport, verify."
Speech recognition is simply trying to identify the words being spoken. Like the lackluster system used by United when you call up to get flight times.
Re:voice vs speech recognition (Score:2)
Voice recognition is in fact a _lot_ simpler than speech recognition. I worked on a research project about 5 years back which involved voice recognition - it was based on a relatively simple mathematical model, required very little training, and was quite accurate. I wanted to combine it with a speech recognition system (so while someone was using the speech
Voice software (Score:5, Funny)
Re:Voice software (Score:2)
I'd seen it as a cartoon printed out on an office wall once.. I'd have given credit if I knew where it came from. I guess I should've at least said it wasn't mine, but fwiw now: it wasn't mine.
Beer? (Score:5, Funny)
If it's given to Apache, it'll be the Apache lic. (Score:2)
(Duh.)
Let me know when they start giving away... (Score:3, Funny)
VoiceXML IDE (Score:2, Interesting)
It's a product based on the Eclipse platform (not a plugin, more a standalone application).
It's a VoiceXML-oriented IDE. In a nutshell, VoiceXML is a specification that defines how to make a speech recognition (or DTMF) application for the *phone* (not the desktop) using a Web model (that is, exchanging documents over HTTP). The toolkit developed by IBM allows programmers to build call flows graphically, to edit VoiceXML and grammar documents
Either way... (Score:2, Insightful)
Maybe. When did IBM come out with ViaVoice? It's been a number of years. They even offered it for Linux for a while. When did Microsoft jump on board? Maybe Mr. Mastan's statement is just bull too.
Either way, I'm glad to see IBM doing this. Voice recognition enabled programs open a whole new and exciting frontier for software developers both on the desktop and in embedded projec
Re:Either way... (Score:2, Interesting)
They're talking about their voicexml tools. They're open sourcing some tools for developing voicexml-based speech applications that run in a call center somewhere, replacing "press 1 for this, press 2 for that" with "say the name of a city and state, an
In other news, (Score:4, Funny)
Re:Obligatory.... (Score:1)