Web Analytics Databases Get Even Larger

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum (user data) to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous: 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all; Yahoo has over a petabyte (on a homegrown system); and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."
  • by coryking ( 104614 ) * on Thursday April 30, 2009 @09:38AM (#27771865) Homepage Journal

    These little puppies, i.e. recursive queries, look pretty cool too. Sounds like a good tool for threaded comment systems or finding related items in a table:

    Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:

    WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
            SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
        UNION ALL
            SELECT p.sub_part, p.part, p.quantity
            FROM included_parts pr, parts p
            WHERE p.part = pr.sub_part
    )
    SELECT sub_part, SUM(quantity) as total_quantity
    FROM included_parts
    GROUP BY sub_part

    ... It will take a while to wrap my brain around this new concept though. That doesn't look like a normal query I'm used to reading!
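For anyone who wants to poke at the query above, here is a minimal sketch that runs it from Python against an in-memory SQLite database (SQLite also accepts WITH RECURSIVE, though the thread is about PostgreSQL). The `parts` table layout, the sample rows in `ROWS`, and the helper name `sub_part_totals` are all assumptions made up for illustration, not anything from the original post.

```python
import sqlite3

# Hypothetical sample data: each tuple is (part, sub_part, quantity),
# meaning `part` directly contains `quantity` units of `sub_part`.
ROWS = [
    ("our_product", "frame", 1),
    ("our_product", "wheel", 2),
    ("wheel", "spoke", 32),
    ("wheel", "tire", 1),
    ("frame", "bolt", 4),
    ("wheel", "bolt", 2),
]

# Same shape as the query quoted in the thread, with the product name
# passed as a parameter and an ORDER BY added for stable output.
QUERY = """
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    -- Non-recursive term: the direct sub-parts of the product.
    SELECT sub_part, part, quantity FROM parts WHERE part = ?
  UNION ALL
    -- Recursive term: sub-parts of anything found so far.
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) AS total_quantity
FROM included_parts
GROUP BY sub_part
ORDER BY sub_part
"""


def sub_part_totals(rows, product):
    """Load `rows` into a throwaway table and return {sub_part: total_quantity}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE parts (part TEXT, sub_part TEXT, quantity INTEGER)"
        )
        conn.executemany(
            "INSERT INTO parts (part, sub_part, quantity) VALUES (?, ?, ?)", rows
        )
        return dict(conn.execute(QUERY, (product,)).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for sub_part, total in sub_part_totals(ROWS, "our_product").items():
        print(sub_part, total)
```

One thing the sketch makes visible: the query sums the `quantity` column of every row it reaches, and does not multiply by the parent's quantity along the way. It also assumes the data has no cycles; a part that contained itself would make the recursion run forever.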

    They'll get replication some day soon. But there is a lot of cool, very useful stuff with every new release. I usually feel like a kid in a candy store wondering what's new that I can exploit.

  • by TooMuchToDo ( 882796 ) on Thursday April 30, 2009 @10:17AM (#27772363)
    I have to say, I love PostgreSQL. We use it to store hundreds of gigabytes of metadata for our 17 petabyte disk/tape storage system at my day gig.
  • Google? (Score:2, Interesting)

    by wiedzmin ( 1269816 ) on Thursday April 30, 2009 @11:29AM (#27773499)
    Who cares about eBay and MySpace... tell me about the major players! What is Google running?

