Web Analytics Databases Get Even Larger 62

Posted by CmdrTaco on Thursday April 30, 2009 @08:43AM from the who-watches-the-oh-never-mind dept.

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum — user data — to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous — 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all, Yahoo has over a petabyte (on a homegrown system), and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote in about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."

Web Analytics Databases Get Even Larger

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 62 Comments Log In/Create an Account

Comments Filter:

Web Analytics Databases Get Every Larger? (Score:1, Redundant)

by eldavojohn ( 898314 ) * writes:

"Web analytics databases are getting every larger. eBay now has a 6 1/2 petabyte ...

Um, was there a major development in the English language while I was sleeping last night?
- Re: (Score:3, Funny)
  
  by jez9999 ( 618189 ) writes:
  
  Yesy. It mighty take a whiley to get used to, but I thinky it's quite a plusy overall.
  - Re: (Score:1)
    
    by koutbo6 ( 1134545 ) writes:
    
    Yesy. It mighty take a whiley to get used to, but I thinky it's quite a plusy overally.
    I sure hopey that people checky their grammary more ofteny in the future.
    There, fixedy that for you.
    - Re: (Score:1)
      
      by ivucica ( 1001089 ) writes:
      
      yay
"Every larger"? (Score:1, Redundant)

by dtmos ( 447842 ) * writes:

What's "every larger"? Can I get one, too?
- Re: (Score:2)
  
  by thePowerOfGrayskull ( 905905 ) writes:
  
  I love that some poor sot used up all of his mod points except one to mod down this entire conversation.
  Wait... one left. Oh, shit...
The good news... (Score:5, Funny)

by Yoozer ( 1055188 ) writes: on Thursday April 30, 2009 @08:49AM (#27771323) Homepage

At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.

- Re: (Score:3, Funny)
  
  by Jurily ( 900488 ) writes:
  
  At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.
  Imagine a Beowulf cluste- OW! OW!
- Re: (Score:2)
  
  by jollyreaper ( 513215 ) writes:
  
  At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.
  No, that's what firewall holes are for.
Looks like grammar is getting every worse... (Score:1, Insightful)

by ActusReus ( 1162583 ) writes:

For shame, Taco...
Sure, they get every larger... (Score:2, Funny)

by lunchlady55 ( 471982 ) writes:

...but do they move every zig?
- Re: (Score:1)
  
  by ivucica ( 1001089 ) writes:
  
  they no have to!!!!1 they have chance to survive, they make their time!!!1
I accidentally the every larger database... (Score:1, Offtopic)

by magic_fyodor ( 1453365 ) writes:

is this ok?
- Re: (Score:1)
  
  by LordKane ( 582228 ) writes:
  
  Not the every larger database!
- Re: (Score:1)
  
  by ZERO1ZERO ( 948669 ) writes:
  
  What is the origin of this 'I accidentally the ....' ? I see it around a lot.
  - Re: (Score:2)
    
    by f()rK()_Bomb ( 612162 ) writes:
    
    it started on 4chan. http://encyclopediadramatica.com/I_accidentally_X [encycloped...matica.com]
    - Re: (Score:1)
      
      by ZERO1ZERO ( 948669 ) writes:
      
      sweet, dude. thanks.
Another win for PostgreSQL... (Score:4, Insightful)

by tcopeland ( 32225 ) writes: <tom@thomasleecopela n d .com> on Thursday April 30, 2009 @09:31AM (#27771769) Homepage

...since that's that database on which Greenplum is based. PostgreSQL 8.4 is coming out soon and looks like it's got a lot of improvements [postgresql.org]. Too bad replication didn't make it in... hopefully in 8.5.
One of the improvements that looks good is the parallelized restore; RubyForge's upgrade from PostgreSQL 8.2 to 8.3 [blogs.com] took 30 minutes to restore the db and it seems like this feature will speed that up considerably.

- Re: (Score:1)
  
  by koutbo6 ( 1134545 ) writes:
  
  fp after first on topic post
- Recursive queries too (Score:4, Interesting)
  
  by coryking ( 104614 ) * writes: on Thursday April 30, 2009 @09:38AM (#27771865) Homepage Journal
  
  These little puppies [postgresql.org], i.e. recursive queries, look pretty cool too. Sounds like a good tool for threaded comment systems or finding related items in a table:
  Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:
  WITH RECURSIVE included_parts(sub_part, part, quantity) AS ( SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product' UNION ALL SELECT p.sub_part, p.part, p.quantity FROM included_parts pr, parts p WHERE p.part = pr.sub_part ) SELECT sub_part, SUM(quantity) as total_quantity FROM included_parts GROUP BY sub_part
  ... It will take a while to wrap my brain around this new concept though. That doesn't look like a normal query I'm used to reading!
  They'll get replication some day soon. But there is a lot of cool, very useful stuff with every new release. I usually feel like kid in a candy store wondering what's new that I can exploit.
  
- Re: (Score:3, Interesting)
  
  by TooMuchToDo ( 882796 ) writes:
  
  I have to say, I love postgresql. We use it to store hundreds of gigabytes of metadata for our 17 petabyte disk/tape storage system at my day gig.
  - Re: (Score:1)
    
    by InsurrctionConsltant ( 1305287 ) writes:
    
    17 petabyte! Good grief, who are you working for?!
    - Re: (Score:2)
      
      by TooMuchToDo ( 882796 ) writes:
      
      Someplace in the US handling data from the Large Hadron Collider =)
      - Re: (Score:3, Funny)
        
        by pfleming ( 683342 ) writes:
        
        Does the black hole effect help with compression?
        
        Re: (Score:1)
        
        by ivucica ( 1001089 ) writes:
        
        CREATE TABLE lhc_data (i INT, c CHAR(10)) ENGINE = BLACKHOLE; INSERT INTO lhc_data(1,"whoosh");
        Oops, wrong DBMS.
- Re: (Score:3, Informative)
  
  by greg1104 ( 461138 ) writes:
  
  And Aster nCluster is PostgreSQL based [intelligen...rprise.com]. Yahoo's "homegrown system" also started with PostgreSQL [toolbox.com].
2/12? (Score:2)

by N3Roaster ( 888781 ) writes:

2/12? Most people would just write that as 1/6, but I guess that doesn't sound as impressive?
- Re: (Score:1)
  
  by Exawatt ( 1463719 ) writes:
  
  I was wondering what that number was supposed to be. Perhaps 2 1/2? This is why I prefer decimal points.
Google? (Score:2, Interesting)

by wiedzmin ( 1269816 ) writes:

Who cares about eBay and MySpace... tell me about the major players! What is Google running?
- MySQL and Bigtable (Score:2)
  
  by Wee ( 17189 ) writes:
  
  They use MySQL for storing adwords data [typepad.com] and Google Analytics for web site metrics (which itself stores data in Bigtable [google.com]).
  
  Bigtable holds a mind-bogglingly huge amount of information. The amount of stuff in their MySQL clusters is merely "absurdly large" by comparison.
  
  -B
  - Re: (Score:1)
    
    by bami ( 1376931 ) writes:
    
    Google Analytics is dog slow. It usually takes up to 70% of the time to load a page here (might be some shoddy ISP routing issues, but most of Google's stuff loads fast, so I doubt that), so I adblocked/point it to 127.0.0.1 for the whole domain. Same for most analytics websites.
    Sorry, analytics is fun and all, but if you insist doing everything in javascript, at least make sure the page behind it is capable of giving enough bandwidth or something.
Storing the atoms of a human body (Score:2)

by MBoffin ( 259181 ) writes:

This astounds me. These numbers only represent a few companies. Consider that it would take about 5,790 yottabytes* to store a 150lb human body (at a byte per atom). Now consider that people keep in their pocket more storage than existed on the planet 30 years ago. So in another 30 years.... wow. Just think about that for a minute.
* giga tera peta exa zetta yotta
Still get lame recomendations (Score:4, Funny)

by se7en11 ( 833841 ) writes: on Thursday April 30, 2009 @12:45PM (#27774717) Homepage

With all that user data, you'd think they would know me better by now. But I still get these lame recommendations.

"You might be interested in action DVDs because you bought one in the past" - BRILLIANT!!

Greenplum? Really? (Score:1)

by reginaldo ( 1412879 ) writes:

These articles make me believe that Greenplum has some good PR working, because in all the analytics I have done, people tend to scoff at Greenplum.

Hadoop clusters are more scaleable, more flexible, and strangely more supportable than Greenplum. When I worked with Greenplum, we would be able to bring down the server easily by executing simple 'select * from table' queries.

Netezza, which is strangely not mentioned, is much better for doing distincts, which is used quite often in analytics. Greenplum

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Web Analytics Databases Get Even Larger 62

Web Analytics Databases Get Even Larger More Login

Web Analytics Databases Get Even Larger

Web Analytics Databases Get Every Larger? (Score:1, Redundant)

Re: (Score:3, Funny)

Re: (Score:1)

Re: (Score:1)

"Every larger"? (Score:1, Redundant)

Re: (Score:2)

The good news... (Score:5, Funny)

Re: (Score:3, Funny)

Re: (Score:2)

Looks like grammar is getting every worse... (Score:1, Insightful)

Sure, they get every larger... (Score:2, Funny)

Re: (Score:1)

I accidentally the every larger database... (Score:1, Offtopic)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Another win for PostgreSQL... (Score:4, Insightful)

Re: (Score:1)

Recursive queries too (Score:4, Interesting)

Re: (Score:3, Interesting)

Re: (Score:1)

Re: (Score:2)

Re: (Score:3, Funny)

Re: (Score:1)

Re: (Score:3, Informative)

2/12? (Score:2)

Re: (Score:1)

Google? (Score:2, Interesting)

MySQL and Bigtable (Score:2)

Re: (Score:1)

Storing the atoms of a human body (Score:2)

Still get lame recomendations (Score:4, Funny)

Greenplum? Really? (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot