Google Open-Sources SyntaxNet Natural-Language Understanding Library, Parsey McParseface Training Model

Google announced on Thursday that it is open sourcing its new language parsing model called SyntaxNet. It's a piece of natural-language understanding software, Google says, that you can use automatically parse sentences, as part of its TensorFlow open source machine learning library. The company also announced that it is releasing something called Parsey McParseface (Google has a sense of humor), which is a pre-trained model for parsing English-language text. Nate Swanner of The Next Web attempts to explain it: Combining machine learning and search techniques, Parsey McParseface is 94 percent accurate, according to Google. It also leans on SyntaxNet's neural-network framework for analyzing the linguistic structure of a sentence or statement, which parses the functional role of each word in a sentence. If you're confused, here's the short version: Parsey and SyntaxNet are basically like five-year-old humans who are learning the nuances of language. In Google's simple example, 'saw' is the root word (verb) for the sentence, while 'Alice' and 'Bob' are subjects (nouns). Parsey's scope can get a bit broader, too.
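The dependency structure the summary describes (the root verb 'saw' with 'Alice' and 'Bob' attached to it) can be sketched as plain data. This is an illustrative representation only, not SyntaxNet's actual API or output format; the tuple layout and helper functions are invented for the example:

```python
# Each token: (word, part_of_speech, head_index, relation); head -1 = root.
# This encodes the parse of "Alice saw Bob" from the summary.
parse = [
    ("Alice", "NOUN", 1, "nsubj"),   # subject, attached to "saw"
    ("saw",   "VERB", -1, "root"),   # root of the sentence
    ("Bob",   "NOUN", 1, "dobj"),    # direct object, attached to "saw"
]

def root_word(tree):
    """Return the word marked as the sentence root."""
    return next(word for word, _, head, _ in tree if head == -1)

def dependents(tree, head_idx):
    """Return words whose head is the token at head_idx."""
    return [word for word, _, head, _ in tree if head == head_idx]

print(root_word(parse))      # saw
print(dependents(parse, 1))  # ['Alice', 'Bob']
```

A real parser's job is to produce the head indices and relation labels from raw text; once you have them, queries like "what is the verb's subject?" are simple lookups like these.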


  • by Anonymous Coward

    It's a piece of natural-language understanding software, Google says, that you can use automatically parse sentences, as part of its TensorFlow open source machine learning library.

    YOU CAN USE AUTOMATICALLY PARSE SENTENCES

  • So, can Parsey McParseface make sense of what manishs posts? Because I generally can't. I assume that the example sentence from the summary probably came from the article, but for some reason the "editor" didn't think to read his summary to make sure that it actually made sense out of context.

    • by HiThere ( 15173 )

      The claim was parse, not make sense of. And it's not clear that it can parse all sentences. Some sentences can't be unambiguously parsed even when you know the context and each included word.

    • It all read just fine to me. The only mistake I noticed was that

      natural-language understanding software

      should have been

      natural-language-understanding software

      since it is the software doing the understanding, not the language. The quote itself is clear and concise. If you didn't understand it that probably just means you lack the technical vocabulary to even make use of the tool.

  • "Parsey McParseface (Google has a sense of humor)"
    more like dour corporate peons at Google trying hard, very hard, to appear humorous.
    even Tay had better humor

  • James while John had had had had had had had had had had had a better effect on the teacher.

The company also announced that it is releasing something called Parsey McParseface (Google has a sense of humor)...

    If by 'sense of humor' you mean 'a repeat of something that was humorous a while ago under a different context'.

    • You parsed it wrong. "Sense of humor" here does not indicate that the words are funny; it indicates that the words are goofy or foolish, and that Google was willing to let a thing be named that way.

      I recommend checking a dictionary. There are about a dozen meanings of the word humor, and probably half of them cover this particular usage. One advantage of a computer parser is that it is unlikely to reject a valid statement merely because it didn't consider all of the known patterns.

    • 'a repeat of something that was humorous a while ago under a different context'

      like your sig?

      I'm here all night, try the veal.

  • by jeffb (2.718) ( 1189693 ) on Thursday May 12, 2016 @07:20PM (#52101929)

    Fruit flies like a banana.

  • How large is the set of all parsable sentences?

    A concise version of the Library of Babel [wikipedia.org] expressing every idea of a language?

    • The set of all parsable sentences is trivially unbounded, at least in English.

      A sentence can go on, {and on,}* and on.
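That {and on,}* pattern is literally a generator of grammatical sentences of any length; a trivial sketch (the function name is made up for the example):

```python
def sentence(n):
    """Build 'A sentence can go on, and on, ..., and on.'
    with n extra ', and on' repetitions -- grammatical for any n >= 0,
    so sentence length has no upper bound."""
    return "A sentence can go on" + ", and on" * n + "."

print(sentence(0))  # A sentence can go on.
print(sentence(2))  # A sentence can go on, and on, and on.
```

Since every output is a well-formed English sentence and n is unbounded, the set of parsable sentences is infinite, exactly as the comment claims.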

      • I once started to write some software to analyze books and find all sentence structures in a book, but got too lazy and quit. Also could not find any data sets.

        While the set of all parsable sentences is unbounded, the ones limited to human understanding are bounded.

        • I can't see why they would be. More rigorously, I don't think you can establish a bound on the length of sentences that are humanly understandable. The sentences generated by my little example are all humanly understandable, for example, even though they're of unbounded length.

  • Parsey McParseface (Google has a sense of humor)

    Not really, because Xy McXface is not funny for any value of X.

  • 94% syntactic accuracy is definitely good, for a machine-learning parser. Now if you were to come to the land of rule-based parsers, 94% is the norm.

    Google loves machine learning, and it's easy to see why. That's how they made their whole stack. They have the huge amounts of data to train on, and the hardware to do so. It's so seductive to just throw a mathematical model at huge amounts of data and let it run for a few weeks.

    Rule-based systems don't need any data to work with - they just need a computational linguist to spend a year writing down the few thousand rules. But the end result is vastly better, fully debuggable, easily updatable, understandable, and domain independent. That last bit is really important. A system trained for legalese won't work on newspapers, but a rule-based system usually works equally well for all domains.

    In 2006, VISL [visl.sdu.dk] had a rule-based parser doing 96% syntax for Spanish (PDF) [visl.sdu.dk] - our other parsers are also in that range, and naturally improved since then. Google is hopelessly behind the state of the art.

    • You kinda alluded to the reason yourself...

      > Rule-based systems don't need any data to work with - they just need a computational linguist to spend a year writing down the few thousand rules

      which seems much more expensive than

      > ... just throw a mathematical model at huge amounts of data and let it run for a few weeks.

      but can now yield nearly equal results. "Machine Learning" sounds cooler than a bunch of if statements, too.

      • by Jezral ( 449476 )

        which seems much more expensive than

        It'd seem that way, but it's really not if you factor in the whole chain.

        Machine learning needs high-quality annotated treebanks to train from. Creating those treebanks takes many, many years. It is newsworthy when a new treebank of a mere 50k words is published. Add to that the fact that each treebank likely uses different annotations, and you need to adjust your machine learner for that, or add a filter. Plus each treebank is for a specific domain, so your finished parser is domain-specific. If you want to

    • I have not read the original article, so take my comments with some grains of salt.

      But speaking as one who once wrote a syntactic grammar for a parser of English (still in use by a large manufacturer 30 years later, albeit in modified form), the problem with rule-based grammars that lack any statistical weights is that they come up with an unbelievably large number of parses for many real-world sentences. The problem is then to find which of those parses is the correct one, and that's what statistical weig

      • by Jezral ( 449476 )

        ...the problem with rule-based grammars that lack any statistical weights is that they come up with an unbelievably large number of parses for many real-world sentences.

        Generative grammars suffer from that problem and scale very poorly, and may indeed be impractical to use for real-world text. Our constraint grammars [wikipedia.org] and finite-state analysers [github.io] do not have that problem. With CG, we inject all the possible ambiguity into the very first analysis phase, then use contextual constraints to whittle them down, where context is the whole sentence or even multiple sentences. This means performance scales linearly with the number of rules.
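The inject-then-whittle idea can be sketched in a few lines. This is a toy illustration of the approach, not the CG-3 formalism or VISL's actual rule syntax; the tags, the lexicon, and the single rule are all invented for the example:

```python
# Every token starts with ALL readings its lexicon entry allows;
# contextual rules then REMOVE readings, never guessing blindly,
# and never deleting a token's last remaining reading.
sentence = [
    ("the",   {"DET"}),
    ("fruit", {"NOUN", "VERB"}),   # ambiguous out of the lexicon
    ("flies", {"NOUN", "VERB"}),   # still ambiguous after our one rule
]

def remove_if(sent, tag, condition):
    """Remove `tag` from a token's readings when condition(previous
    token's readings) holds, keeping at least one reading per token."""
    out, prev = [], None
    for word, tags in sent:
        if prev is not None and tag in tags and len(tags) > 1 and condition(prev):
            tags = tags - {tag}
        out.append((word, tags))
        prev = tags
    return out

# Rule: remove a VERB reading immediately after an unambiguous determiner.
disambiguated = remove_if(sentence, "VERB", lambda prev: prev == {"DET"})
print(disambiguated)  # "fruit" is now NOUN only; "flies" stays ambiguous
```

Each rule is a single pass over the sentence, so total work grows linearly with the number of rules, which is the scaling property the comment describes.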

        So the 96% accuracy claim is suspect, not to mention that a comparison of the Google system is already difficult because Spanish =/= English. (Spanish has more morphology on verbs, it's pro-drop, it has relatively free word order compared to English,...)

        The paper is for Spanish, because that's what I

  • A two-year-old gelding destined to race in Australia has been saddled with the name Horsey McHorseface. (pun intended by editors)

    http://www.bbc.com/news/world-... [bbc.com]
