Databases

Web Analytics Databases Get Even Larger (62 comments)

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum (user data) to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous: 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all, Yahoo has over a petabyte (on a homegrown system), and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."
This discussion has been archived. No new comments can be posted.

  • by coryking ( 104614 ) * on Thursday April 30, 2009 @09:38AM (#27771865) Homepage Journal

    These little puppies [postgresql.org], i.e. recursive queries, look pretty cool too. Sounds like a good tool for threaded comment systems or finding related items in a table:


    Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:

    WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
            SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
        UNION ALL
            SELECT p.sub_part, p.part, p.quantity
            FROM included_parts pr, parts p
            WHERE p.part = pr.sub_part
        )
    SELECT sub_part, SUM(quantity) as total_quantity
    FROM included_parts
    GROUP BY sub_part

    ... It will take a while to wrap my brain around this new concept though. That doesn't look like the kind of query I'm used to reading!
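For anyone who wants to poke at the query above without a PostgreSQL install, here is a minimal sketch that runs the same statement through Python's built-in sqlite3 module. The assumptions are mine, not from the docs: SQLite stands in for PostgreSQL (it needs version 3.8.3 or later for WITH RECURSIVE), the rows in the parts table are made up, and I added an ORDER BY so the output order is stable.

```python
import sqlite3

# Hypothetical sample data: each row is (part, sub_part, quantity) and
# records an immediate inclusion only, as the quoted docs describe.
ROWS = [
    ("our_product", "frame", 1),
    ("our_product", "wheel", 2),
    ("wheel", "spoke", 30),
    ("wheel", "tire", 1),
    ("frame", "bolt", 4),
    ("other_product", "bolt", 9),  # unrelated product, must not be picked up
]

# Same query as quoted above, plus an ORDER BY (my addition) for stable output.
QUERY = """
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    -- Seed: the direct sub-parts of the product we care about
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    -- Recursive step: sub-parts of anything found so far
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) AS total_quantity
FROM included_parts
GROUP BY sub_part
ORDER BY sub_part
"""


def sub_parts():
    """Return (sub_part, total_quantity) for every direct or indirect sub-part."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part TEXT, sub_part TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, total in sub_parts():
        print(name, total)
```

Note that the query as written sums the per-row quantities and does not multiply them down the tree, so the spokes are counted per wheel and not per product. The same seed-plus-recursive-step shape should carry over to a threaded comment table keyed on a parent id, though I have not tried that here.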

    They'll get replication some day soon. But there is a lot of cool, very useful stuff with every new release. I usually feel like a kid in a candy store wondering what's new that I can exploit.

  • by TooMuchToDo ( 882796 ) on Thursday April 30, 2009 @10:17AM (#27772363)
    I have to say, I love postgresql. We use it to store hundreds of gigabytes of metadata for our 17 petabyte disk/tape storage system at my day gig.
  • Google? (Score:2, Interesting)

    by wiedzmin ( 1269816 ) on Thursday April 30, 2009 @11:29AM (#27773499)
    Who cares about eBay and MySpace... tell me about the major players! What is Google running?
