Zvents Releases Open Source Cluster Database Based on Google
An anonymous reader writes "Local search engine company Zvents has released an open source distributed data storage system based on Google's released design specs. 'The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs [...] The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data" a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.'"
Column Orientated DBMS (Score:5, Informative)
IE:
a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
instead of:
a, 1, a;
b, 2, b;
c, 3, c;
d, 4, d;
e, 5, e;
A cube using the time dimension would look like:
01:01:01; a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
01:01:02; a,b,c,d,e; 1,2,6,4,5; a,b,c,d,e;
It's pretty difficult to do the same thing with a row-based DBMS. However, you can see that doing an insert is going to be costly. This looks like a pretty good attempt; I know there were some other projects trying to replicate what BigTable does. And after hearing that IBM story the other day about one computer running the entire internet, I started thinking about Google.
More interesting is their distributed file system, which is what makes this really work well.
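To make the row versus column layouts above concrete, here is a minimal Python sketch. It is only an illustration of the two layouts using the sample values from this comment; the function names (to_columns, append_row) are made up for the example and have nothing to do with how Hypertable or Bigtable actually store data.

```python
# Row-oriented layout: one tuple per record, as in the second listing above.
rows = [
    ("a", 1, "a"),
    ("b", 2, "b"),
    ("c", 3, "c"),
    ("d", 4, "d"),
    ("e", 5, "e"),
]


def to_columns(records):
    """Transpose row tuples into one list per column (hypothetical helper)."""
    return [list(col) for col in zip(*records)]


def append_row(cols, record):
    """Insert one record: every column list has to be touched (hypothetical helper)."""
    for col, value in zip(cols, record):
        col.append(value)


# Column-oriented layout: a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
columns = to_columns(rows)

# Aggregating one column only reads that column's list.
total = sum(columns[1])
```

The append_row helper shows why inserts are the expensive side of this trade: one new record means one write per column, while a scan over a single column never has to look at the others.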
Re:Column Orientated DBMS (Score:3, Informative)
http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/dimen.htm#i1006266 [oracle.com]
Distributed filesystem - Oracle RAC (Real Application Clusters) fits the bill.
Re:how useful is DHT? (Score:3, Informative)
Relational databases don't implement the relational model correctly anyway. As for transactional consistency, you can get that on top of many different kinds of stores (including file systems); relational databases have no monopoly on that.
Re:You mean, like really, this time (Score:3, Informative)
What?
Wikipedia lists no fewer than eight Linux distributions designed specifically for building Beowulf clusters.
Using OpenMosix, a single-system-image cluster can be created by booting cluster nodes with LiveCDs and with very little configuration. It's even been done with Xboxes, although they have very poor performance per watt consumed by modern standards.
Google 'Forms' (Score:3, Informative)
Re:Kitten Nipples (Score:3, Informative)
Hadoop keeps all of its file system metadata in memory on a machine called the name node. This includes information about block placement and which files are allocated which blocks. Therefore, the big crunch we've seen is the total amount of memory available to the JVM's heap. With a 16G machine (with ~14.5G heap) for the name node and ~2000 machines acting as data nodes, we're scaling to somewhere between 12-18 million or so files [it's been a while since I've looked...
We're working on making it scale better, of course. But we've come a long way in a really short time. [We've doubled capacity in less than... six months? Something like that.]
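As a rough back-of-the-envelope check on the figures quoted above, the sketch below divides the ~14.5G heap by the 12-18 million file range. It assumes "G" means GiB and that the entire heap is spent on file metadata, so the result is only an upper bound on heap per file, not a measured Hadoop number. The function name is made up for the example.

```python
GIB = 2**30  # assumption: the "G" in the comment is read as GiB


def max_heap_bytes_per_file(heap_gib, file_count):
    """Upper bound on heap per file if the whole heap held metadata (hypothetical helper)."""
    return heap_gib * GIB / file_count


heap_gib = 14.5  # heap size quoted in the comment
for files in (12_000_000, 18_000_000):  # file-count range quoted in the comment
    per_file = max_heap_bytes_per_file(heap_gib, files)
    print(f"{files:,} files: at most about {per_file:.0f} bytes of heap per file")
```

Under those assumptions the budget comes out at somewhere around a kilobyte of heap per file, which is why the file count, rather than raw disk capacity, is the limit being described.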
Re:Wheel: reinvented (Score:3, Informative)
So, unless SQL is particularly important to you, this is a useless project. There's a reason Google's moving to Erlang so fast - they're discovering that a lot of the tools they've half-assed reinvented in Python already exist in Erlang in far more flexible fashions. This is nothing more than another map/reduce fiasco - a first generation solution to a problem that the internet adores because it's never seen any solution to the problem, but something which has been far better addressed in real industry for thirty or so years. If Google would just quit stealing people from Microsoft, which makes application and system software, and start stealing people from AT&T and Ericsson, which make hard realtime system software, they'd find they wouldn't have to spend so much time poorly re-walking what's already been pathed.
If Google would just buy Bluetail already, things would start changing for the better, fast. Metaphors are only useful when they elucidate something specific. Mnesia is radically more powerful than Hypertable; I suggest you spend less time at the altar and more at the library. Or, to put it in terms that apparently you will understand, you just tried to rub in my face how much more powerful your Geo is than my Technodrome.
You have done such a spectacularly poor job of making your case that all I can imagine as your reason to say something like that is:
Unbelievable.