Google Raises Word Limit

Philipp Lenssen writes "Google quietly raised their web search limit to 32 words. Previously, only up to 10 words were allowed per query, with any further words being ignored. This is not only important for certain advanced search techniques (for example, when you need to exclude many different keywords using the minus operator), but it's also of great help to certain tools using the Google API. While there doesn't seem to be any official statement from Google yet, some more details can be found at my Google blog."
  • ...what the first 32 word google bomb will be.
  • Finally. (Score:3, Insightful)

    by Phantombantam ( 850513 ) on Saturday January 22, 2005 @08:18PM (#11444944) Homepage
    About time. I always thought of the 10 word limit as Google's biggest shortcoming.
  • Great (Score:5, Interesting)

    by lastninja ( 237588 ) on Saturday January 22, 2005 @08:27PM (#11444983)
    Now you can search for quotes without having to strip half of the words away. Just cut and paste the quote into the browser. I guess this will also make it easier to search for source code; as it is now, you'll likely end up at a documentation site when what you want is some source file from a SourceForge project.
  • very complex (Score:3, Interesting)

    by St. Arbirix ( 218306 ) <matthew.townsendNO@SPAMgmail.com> on Saturday January 22, 2005 @08:37PM (#11445031) Homepage Journal
    32 word searching increases the complexity of the search many times over. For a ten word search you're usually talking about finding all documents with all ten words, ordering them by how many of the searched terms were found, and then by their linked-to values. With 32 you're finding ~3.2x as many documents, comparing 3.2x as many words in each document, and then finding how popular they were.

    So, um, wow.
    • Re:very complex (Score:5, Insightful)

      by damiangerous ( 218679 ) <1ndt7174ekq80001@sneakemail.com> on Saturday January 22, 2005 @08:47PM (#11445059)
      How are you finding 3.2x as many documents? You should be finding fewer documents, not more.
      • Does google limit search results to documents that contain every single word you've queried for? ...

        Oh, I see. In that case the complexity only increases with the number of times the document is passed over for each word, or 3.2x, which is probably over twice the average number of times a document needs to be scanned before finding that it doesn't contain a word...

        That's not too hard. Why is there a limit?
      • Yes, you do find fewer documents, but you have many more potential documents. That is, when a search is run, you must find the documents that contain all of the keywords (in most search engines, anyway).

        So say you have search terms a, b, and c. The documents that contain these are found using the index that they construct, and the documents that contain a are set A, the documents that contain b are set B, and the documents that contain c are set C. You must then find the intersection of A, B, and C. A

    • Re:very complex (Score:2, Insightful)

      by Anonymous Coward
      False.

      With 32 words you will be able to find theoretically almost any page. The difference is much more than 3.2x.

      With 10 words you can search for about <NumberOfWords> ^ 10 (the number of words raised to the 10th power), but with 32 words this will be <NumberOfWords> ^ 32.

      Now think about number of words in all languages Google can support.
      Fewer than a thousand of the world's 6,800 languages have writing systems (http://www.ethnologue.com/language_index.asp).
      Let's assume that all languages ha
      • Re:very complex (Score:1, Insightful)

        by Anonymous Coward
        The above post is utter nonsense. (insightful?)

        First of all, all Google search words are required to be present on a webpage, so adding more words lowers the number of hits.

        Besides that, the reasoning above is absurd. Why should the number of possible searches correspond to the number of hits?

        And the number of languages in the world appears in the equations above? Even when probably 90% of all webpages are written in English?

        There's nothing interesting about the sheer number of possible searches. Af
    • 32 word searching increases the complexity of the search many times over.

      Are you sure about that?

      house - 294,000,000
      house car - 24,700,000
      house car boat - 6,250,000
      house car boat dog - 1,570,000
      house car boat dog smoke - 412,000
      house car boat dog smoke funny - 163,000
      house car boat dog smoke funny slashdot - 2,200
  • by fluor2 ( 242824 ) on Saturday January 22, 2005 @08:56PM (#11445098)
    Characters like !,.'$ are pretty much not supported by Google. I would like those to be included in the future.
  • Matching MSN Search? (Score:5, Interesting)

    by Utopia ( 149375 ) on Saturday January 22, 2005 @10:07PM (#11445482)
    Looks like the limit was raised to match
    MSN's new search [msn.com], which has sported a bigger word limit for quite some time.

  • Great! (Score:1, Funny)

    by Anonymous Coward
    Now when I do really specific searches I can get truly relevant Google ads!
  • by prostoalex ( 308614 ) * on Saturday January 22, 2005 @10:13PM (#11445510) Homepage Journal
    I discovered how to make a Firefox plugin for limiting Google searches [moskalyuk.com] to a select few sites, but the problem before was that each site:domainname.com directive was treated as a term. So if you wanted to search 7 sites at once, Google would only let you enter a maximum of 3 keywords to span that search across multiple sites. With this increased limit, you can do things like 5-word searches across 10 domain names, for example.
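The site: trick above can be sketched as a small query builder (the function name and domains here are hypothetical; the point is that each site: directive counts as one term against the word limit):

```python
def multi_site_query(keywords, domains):
    """Build a Google query restricted to several domains.

    Each `site:` directive counts as one term, so with a 32-term cap,
    len(keywords) + len(domains) must stay at or below 32.
    """
    site_part = " OR ".join(f"site:{d}" for d in domains)
    return f"{' '.join(keywords)} ({site_part})"

q = multi_site_query(["firefox", "extension", "tutorial"],
                     ["example.org", "example.net"])
print(q)
# firefox extension tutorial (site:example.org OR site:example.net)
```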
    • by gl4ss ( 559668 ) on Sunday January 23, 2005 @01:15AM (#11446282) Homepage Journal
      though.. it's still not good enough.

      what I would hope for them to introduce would be a word blacklist that would be personal, and that you could include at least a thousand terms in it.

      why? TO AVOID THOSE FUCKING LINKFARMS. they usually have the same advert links in them, so just adding the referral id of the owner of a certain farm will get a lot of meaningless sites out of the search. it's doable now if you make your own program that does the filtering (using the Google API; there are two ways: either go to the sites yourself or request the cache from Google. massive traffic in either case, and the search will take ages to complete).
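The client-side filtering described above might look like this sketch (the blacklist entries and the result format are made up for illustration, not the real Google API response shape):

```python
# Hypothetical personal blacklist of linkfarm hosts / referral strings.
BLACKLIST = {"spammy-linkfarm.example", "referral-farm.example"}

def filter_results(results, blacklist):
    """Drop results whose host or snippet matches a blacklisted entry."""
    kept = []
    for r in results:
        host = r["url"].split("/")[2]  # naive host extraction from the URL
        if host in blacklist:
            continue
        if any(term in r.get("snippet", "") for term in blacklist):
            continue
        kept.append(r)
    return kept

results = [
    {"url": "http://good.example/page", "snippet": "useful content"},
    {"url": "http://spammy-linkfarm.example/x", "snippet": "buy now"},
]
print([r["url"] for r in filter_results(results, BLACKLIST)])
# ['http://good.example/page']
```

As the poster notes, doing this after the fact costs extra traffic; a blacklist applied server-side would be far cheaper.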
      • A personal blacklist is a pretty good idea. Google is already working on personalized search [google.com] based on a profile which contains a list of interests. They should try out more personalization like that.

        However, I don't think that's a good solution for getting rid of link farms. Google should deal with those itself because they mess things up for everybody. They should keep tweaking their algorithms to detect link farms better and encourage people to report them [google.com].

  • The problem with getting good search results is synonyms (different words that mean the same thing) and homonyms (the same word meaning different things). With the 32 word limit, you can avoid both of these problems by following a few simple steps. Let's say, for instance, that you live in New York City and are looking for a moving company that specializes in fragile antiques... typically, the vagueness of such a query makes it hard to find good results, but not if you follow these steps:

    1. Break your search into 2-4 principal, independent concepts. In my example, the concepts are NYC (the location), moving company (the company type), and antiques (the specialty).

    2. For each concept, come up with as many terms as you can that are descriptions or examples of the concept, are very specific, and won't trigger homonyms. For instance, you wouldn't want to use the term "New York" because it is too vague and could refer to the state (a company in Albany, NY won't help you). However, "NYC", "Long Island", "Brooklyn", "Queens", and "New York City" are great, even if they seem overly specific. You just need one of them to cause a hit on a relevant page.

    3. Put parentheses around the terms for each concept (be sure to put quotes around each compound term) and OR together the items inside the parentheses.

    This is what the entire search might look like:

    ("NYC" OR "Long Island" OR "Brooklyn" OR "Queens" OR "Manhattan" OR "Bronx" OR "New York City" OR "Big Apple") ("moving company" OR "moving companies" OR "specialty movers" OR "professional movers" OR "u-haul" OR "apartment movers") ("fragile" OR "antiques" OR "china" OR "difficult to move")

    It takes a bit of time to put together (and google will run slooooow because this kind of logic is very difficult for the search engine), but a search like this will give you the best possible results on hard queries.
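The three steps above can be sketched as a small query builder (a hypothetical helper, not an official tool):

```python
def build_query(concepts):
    """Combine concept synonym lists into one query string.

    Terms within a concept are OR'd together; the parenthesized groups
    are implicitly AND'd by the search engine.
    """
    groups = []
    for terms in concepts:
        quoted = " OR ".join(f'"{t}"' for t in terms)
        groups.append(f"({quoted})")
    return " ".join(groups)

query = build_query([
    ["NYC", "Brooklyn", "New York City"],          # location
    ["moving company", "professional movers"],     # company type
    ["fragile", "antiques"],                       # specialty
])
print(query)
# ("NYC" OR "Brooklyn" OR "New York City") ("moving company" OR "professional movers") ("fragile" OR "antiques")
```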
  • I was searching last night for Warez^H^H^H^H^HOpen Source Software downloads and it wasn't giving me any grief about what seemed to be a fairly long search string.

    [/curiosity]
  • Regexp (Score:4, Insightful)

    by John Hasler ( 414242 ) on Saturday January 22, 2005 @11:19PM (#11445840) Homepage
    Now, if they will just accept regular expressions.
    • They probably heard this statement [granger.free.fr].

    • Now, if they will just accept regular expressions.

      Classic newbie mistake. The biggest problem with search engines is that they return too many answers not too few. Adding regular expressions or stemming makes your answer set even bigger.

      What we need is ways to make the answer set smaller, not larger. Hence the benefit of clustering, for example (see http://vivisimo.com/search?query=search+trees&v%3Asources=Web [vivisimo.com]).
      • Re:Regexp (Score:2, Insightful)

        by vladd_rom ( 809133 )
        >> The biggest problem with search engines is that they return too many answers not too few. [...] What we need is ways to make the answer set smaller, not larger.

        The problem that annoys you is not the size of the answer set, but the lack of a proper sorting function (by relevance) to satisfy you. The fact that you find your desired answer at the 10th or the 30th position is a sign that sorting doesn't work like you'd expect it to. It has nothing to do with the size of the answer set.

        I don't want a
        • I don't want a smaller answer set, I want a bigger one.

          Maybe you do, but most users don't. Less than 30% click next page.

          Reading on, I think what you mean to say is that you would like the answer to be selected from a larger set expanded perhaps to include stemming. In principle that sounds fine, in practice a decent answer is almost always contained in the 31 million+ pages that google returned.

          The problem was that google didn't understand that in the search for "tree" the user meant binary search tree
          • Maybe you do, but most users don't. Less than 30% click next page.

            And what percentage of users can write regular expressions? Probably less than 0.3%, so what's the problem?

            Anyway, your thesis that regexps will lead to longer result lists is incorrect. If I really want to search for "Windows (95|98)", today my only recourse is to enter "Microsoft Windows" and then manually skip the (majority) of irrelevant hits, or to search for both "Windows 95" and "Windows 98", then manually unify the two returned l
            • I actually have first hand data on this, this isn't just speculation. They are rarely used, they generally increase the size of the result set, and they increase the workload substantially.

              • They are rarely used, they generally increase the size of the result set, and they increase the workload substantially.

                Wow, self-contradiction within the scope of a single sentence. If they're "rarely used", then they can't possibly increase workload very much,
                • If they're "rarely used", then they can't possibly increase workload very much

                  Easy: if an operation is sufficiently expensive, the actual cost is noticeable even when it's rare. This is known as a heavy tail, meaning that "a relatively small number of very high cost events skews a mean calculation".

                  No contradiction there.
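The heavy-tail point can be checked with a toy calculation (the costs are invented for illustration): 999 cheap queries at 10 ms plus a single expensive one at 60 s.

```python
# 999 ordinary queries at 10 ms each, plus one rare 60-second regex query.
costs = [0.010] * 999 + [60.0]

mean = sum(costs) / len(costs)
cheap_only = sum(costs[:999]) / 999

print(round(mean, 4))        # ~0.07 s: one rare event inflates the mean ~7x
print(round(cheap_only, 4))  # 0.01 s
```

So even at a 0.1% frequency, the expensive operation dominates the average cost, which is the poster's point.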

            • In this example I believe "Windows 95" OR "Windows 98" [google.com] would do the trick.

              Of course regular expressions would be nice, but I just don't see them happening any time soon due to their inherent resource requirements.
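In the meantime, one way to get the alternation the earlier poster wanted is to filter fetched results client-side with a regex instead of waiting for the engine to support it (the snippets below are invented):

```python
import re

# The alternation "Windows (95|98)", applied locally to result snippets.
pattern = re.compile(r"Windows (95|98)")

snippets = [
    "Upgrading from Microsoft Windows 95 to 98",
    "Windows 2000 deployment guide",
    "Tips for Windows 98 SE users",
]
matches = [s for s in snippets if pattern.search(s)]
print(matches)  # the first and third snippets match; "Windows 2000" does not
```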

  • by Guspaz ( 556486 ) on Saturday January 22, 2005 @11:57PM (#11445989)
    "it's also of great help to certain tools using the Google API"

    Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application. About the only thing I can think of that it would be useful for is a desktop program in which the user would only perform a limited number of searches.
    • A java/flash, etc. applet would work, wouldn't it? Or do they limit the daily use with some sort of developer account embedded in the code?
      • It's a developer account. So each search each user did would contribute towards your total.
          The solution is obvious. You set up a web app and get your visitors to sign up for a Google API account and copy the details into a profile. They log in, do their searches, it deducts from their own totals, and the web app just plods merrily onward.
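The per-user-key scheme described above might be sketched like this (the names and in-memory counter are illustrative; the actual Google SOAP API call is omitted):

```python
from collections import defaultdict

DAILY_LIMIT = 1000
usage = defaultdict(int)  # api_key -> queries used today (reset daily)

def run_search(api_key, query):
    """Deduct one query from the supplied user's key, not a shared one."""
    if usage[api_key] >= DAILY_LIMIT:
        raise RuntimeError("daily quota exhausted for this key")
    usage[api_key] += 1
    # ... a real app would call the Google SOAP API here with api_key ...
    return f"results for {query!r}"

print(run_search("user-key-1", "slashdot"))
```

Since each visitor's key carries its own 1,000-query allowance, the app's total capacity scales with its user count instead of being capped at 1,000.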
    • Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application.

      Well, it appears to be useless for your web application. In my opinion, 1,000 queries a day seems like a lot for a non-commercial product. Google may add a commercial program that allows more than 1,000 queries per day (Google's answer: http://www.google.com/apis/api_faq.html#gen15 [google.com]).

      Lastly, I always like to mention the API is a new, free, and beta service. My gut says that if you need more than 1,000

      • That doesn't make any sense. Are you saying that a non-profit web app couldn't attract more than 300 to 500 users per day? That's nothing.

        They've been saying they may open a commercial program for it for years now; it's not going to change anytime soon.
        • Are you saying that a non-profit web app couldn't attract more than 300 to 500 users per day? That's nothing.

          Sure it could, and if all the web app did was search, the 300 to 500 users (or more) would exceed the 1,000 queries per day. On the other hand, if all the web app did was search, why would Google want you to freely take people away from their search engine?

          IMO, this API could be put to a good, supplemental use in an application (one where searching could happen, but is not the primary focus).

          • Even with an application, you'd be artificially limiting the possible growth. It wouldn't take very many users before you'd have to remove any such feature from the application because you'd have hit your 1000 search cap.

            And who said anything about taking people away from google freely? The problem is they don't allow you to purchase more searches.
    • Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application.

      Perhaps for a pure non-profit web app, but if you're collecting advertising revenue you might be able to slide some of this Google's way for a higher limit.

      Has anybody actually talked to someone at Google about licensing? (i.e. not just what's on the FAQ)
  • Seems like another step in the evolution towards Google Grid / EPIC [broom.org]
  • The 32 word thing is cool... But adding the ability to add distribution lists to my contacts in GMail would be WAY more useful
  • Now I'll be able to search for the exact error message my windows boxes toss at me. Woo hoo!

    If Google had raised its limits earlier, I could have skipped that school diploma and gone right into I.T. support.

  • In no particular order:

    * A better query language, with wildcards ("Word*") or stemming, proximity operators, parentheses, and complex boolean expressions (something like what Dejanews and the pre-Yahoo AltaVista used to offer).

    * Filtering out linkfarms and search-pages.
