Zvents Releases Open Source Cluster Database Based on Google's Bigtable
An anonymous reader writes "Local search engine company Zvents has released an open source distributed data storage system based on Google's published design specs. 'The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs [...] The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.'"
Looks promising (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Kitten Nipples (Score:2, Insightful)
Re: (Score:1)
Re: (Score:2, Insightful)
Re: (Score:1)
As for scaling, it would scale at the same rate as non-commodity computers: if you have 999 computers of equal performance and then add another one, you could expect about a 0.1% change overall. However, it's largely based on what sort of controllers you use, the same as h
Re: (Score:2)
Re: (Score:3, Informative)
Hadoop keeps all of its file system metadata in memory on a machine called the name node. This includes information about block placement and which files are allocated which blocks. Therefore, the big crunch we've seen is th
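A back-of-the-envelope sketch of why that in-memory metadata becomes the crunch (the ~150 bytes of name-node heap per block record is an assumption, a commonly quoted rough figure, not a spec):

```python
# Rough estimate of HDFS name node heap needed just for block metadata.
# The 150 bytes/block figure is an assumed ballpark, not an HDFS constant.
BYTES_PER_BLOCK_META = 150

def namenode_heap_bytes(total_data_bytes, block_size=64 * 1024 ** 2):
    """Approximate heap the name node needs to track every block."""
    num_blocks = total_data_bytes // block_size
    return num_blocks * BYTES_PER_BLOCK_META

# One petabyte of data in 64 MB blocks:
heap = namenode_heap_bytes(10 ** 15)
print(heap // 1024 ** 2, "MB of heap for block records alone")
```

The point is that metadata grows linearly with data, so a single machine's RAM caps the whole filesystem.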
Re: (Score:1)
Re: (Score:1)
You mean, like really, this time (Score:2)
Alright, I know it's only storage and not processing power, but that was inevitable.
Re: (Score:3, Informative)
Really, this time, a full fucking beowulf cluster (that runs linux!) is available to /.ers. No. Fucking. Way.
What?
Re: (Score:2)
Links for anyone interested (Score:1, Redundant)
Zvents: http://www.zvents.com/ [zvents.com]
how useful is DHT? (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
I kind of like dehydrating/serializing objects to a simpler representation when persisting them. This uncomfortable step is nice because it shoehorns the data into a brand new instance.
But that may be just me.
Re: (Score:1)
I have been using ZODB for a couple of years now, and one thing that bothers me with systems that store objects directly instead of "dehydrated" representations of them is that when the underlying code for the object changes significantly, all sorts of weird things occur.
But that may be just me.
It is not just you. What you described is THE problem for all Object Oriented Database Management Systems.
That's why ORM (object-relational mapping) is so popular. People want a way to just use objects and not have to manually "dehydrate" data to disk. Unfortunately, most ORM isn't smart enough to execute the underlying SQL in an optimal way (depending of course on the relationship between your entities/objects).
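A minimal sketch of the "dehydrate" step being described (the `User` class and `schema` tag are made up for illustration): persisting a plain, versioned representation instead of the live object means a later change to the class doesn't break old records the way unpickling a stale class layout does.

```python
import json

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def dehydrate(self):
        # Persist a plain, versioned representation, not the object itself.
        return json.dumps({"schema": 1, "name": self.name, "email": self.email})

    @classmethod
    def rehydrate(cls, blob):
        data = json.loads(blob)
        # A schema tag lets new code migrate old records explicitly.
        assert data["schema"] == 1
        return cls(data["name"], data["email"])

u = User.rehydrate(User("ada", "ada@example.com").dehydrate())
print(u.name, u.email)
```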
Re:how useful is DHT? (Score:5, Interesting)
Re: (Score:1)
Re:how useful is DHT? (Score:4, Insightful)
In the 7 years I've been working in the industry, I've never delivered a single project that I would trust to a non-ACID database. Ever. And I doubt I ever will. If you want something that will generate some marketing material at high speed, and if it fails, who cares, well, use MySQL. If you want to do something that can handle a million pithy comments and if some of them get lost in the shuffle, who cares, well, that's fine too. Use whatever serves fast. If you're running Google, and it doesn't matter if a node drops out because there is no "right" answer to get wrong in the first place as long as you spit out a bunch of links, well, these sorts of non-resilient systems are fine.
Personally, I've never done projects like that. In my projects, if the data isn't perfect, always and forever, it's worse than if it had never been written. Its very existence is a liability, because people will rely on it when they shouldn't, for things that can't get by with "close".
So yes. Transactional consistency and a solid relational model are pretty much mandatory, and not going anywhere soon. The idea that they might be replaced by technology such as this is laughable.
Re: (Score:3, Informative)
Relational databases don't implement the relational model correctly anyway. As for transactional consistency, you can get that on top of many different kinds of stores (including file systems); relational databases have no monopoly on that.
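To illustrate the point that transactional guarantees can be built on top of a plain filesystem, here's a minimal sketch of an all-or-nothing commit using write-to-temp-then-rename (relying on `rename()` being atomic within a single POSIX filesystem; the function name is mine):

```python
import os
import tempfile

def atomic_write(path, data):
    """Commit data to path all-or-nothing: readers see either the old
    file or the complete new one, never a partial write."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the data durable before committing
        os.replace(tmp, path)     # the commit point: an atomic rename
    except BaseException:
        os.unlink(tmp)            # abort: the old file is untouched
        raise

atomic_write("state.txt", b"v2")
print(open("state.txt", "rb").read())
```

Real databases layer logging and recovery on the same primitive, but the atomic commit point is the essential idea.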
Re: (Score:1, Offtopic)
already, we have the Dick-Shrub using such databases to terrorize the populace with expansion planned.
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Pairing with RDBMS (Score:1)
Column Orientated DBMS (Score:5, Informative)
I.e.:
a,b,c,d,e; 1,2,3,4,5,6; a,b,c,d,e;
instead of:
a, 1, a;
b, 2, b;
c, 3, c;
d, 4, d;
e, 5, e;
A cube using the time dimension would look like:
01:01:01; a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
01:01:02; a,b,c,d,e; 1,2,6,4,5; a,b,c,d,e;
It's pretty difficult to do the same thing with a row-based DBMS. However, you can see that doing an insert is going to be costly. This looks like a pretty good try; I know there were some other projects trying to replicate what BigTable does. And after hearing that IBM story the other day about one computer running the entire internet, I started thinking about Google.
More interesting is their distributed file system, which is what makes this really work well.
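The row-vs-column trade-off above can be sketched in a few lines (toy in-memory lists standing in for on-disk segments): a scan over one column reads a single contiguous run, but inserting one row has to touch every column.

```python
# Toy contrast of row- vs column-oriented storage for the table above.
rows = [("a", 1, "a"), ("b", 2, "b"), ("c", 3, "c"), ("d", 4, "d"), ("e", 5, "e")]

# Column orientation: each column stored contiguously.
columns = [list(col) for col in zip(*rows)]
# columns == [['a','b','c','d','e'], [1,2,3,4,5], ['a','b','c','d','e']]

# A scan over one column is one contiguous read:
total = sum(columns[1])

# ...but inserting a single row must append to every column:
for col, value in zip(columns, ("f", 6, "f")):
    col.append(value)
```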
Re: (Score:3, Informative)
http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/dimen.htm#i1006266 [oracle.com]
Distributed filesystem - Oracle RAC (Real Application Clusters) fits the bill.
Re: (Score:2)
Re: (Score:1)
No. (Score:2)
Oracle Dimensions are a logical overlay, they have no impact on how the data is physically organized in segments.
Neither does Oracle RAC -- it uses the same underlying storage format as regular Oracle.
You *could* do column-orientation in Oracle with a data cartridge, but that would likely be third party.
I could see Oracle offering this natively in a future release, maybe 11g r2...
I don't think so... (Score:3, Funny)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
You must be new here.
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
The correct word is "Asiantated".
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
The Google systems group guys are definitely not kids. Neither are the Hypertable developers, who have actually built and deployed web-scale search engines themselves. How many people on Slashdot can say that? We're well aware of the literature in the area (RTFP to find out) and are continuously learning from peers. Both Bigtable and Hypertable built upon previous solutions to solve real-world web-scale data problems. Many algorithms used in Bigtable/Hypertable appeared in the literature only in the late '90s+, clai
Yay! (Score:1)
Re: (Score:1)
Re: (Score:1)
"Please wait while the Index is updated"
"Please wait while we Upload new entries"
"Please wait for the FBI to knock on your door"
Re: (Score:1)
They do that already, it is called Google Desktop.
Re: (Score:1)
Been there, done that; not exciting (Score:1)
Can we do a distributed search engine with it? Google@home would be sooo cool.
I'm afraid that's been done before [exciteathome.com], and it didn't work out [news.com] so well, and may have always been a bad idea [forbes.com] in the first place.
Google 'Forms' (Score:3, Informative)
Re: (Score:1)
Don't forget HBase (Score:1)
Re: (Score:1)
Hypertable can run on a variety of DFSes that support a global namespace. Currently it can run on HDFS, KFS (a GFS clone from Kosmix), and any DFS with a POSIX-compliant mount point, including GlusterFS, Lustre, Parallel NFS, GPFS, etc. An S3 DfsBroker could be made easily as well.
Besides DFS flexibility and not being Java, Hypertable supports access groups (locality groups in Bigtable), unlike HBase, where you have to resort to column-family hacks for read performance tuning. Hypertable also has more block compression
Wheel: reinvented (Score:3, Insightful)
Yawn.
Re: (Score:1)
Sorry, you couldn't be more wrong. Mnesia, KDB, Coral8, and Hypertable/Bigtable are completely different beasts for different purposes. Mnesia is mostly a DHT for key-value pair lookups, while Hypertable/Bigtable support efficient primary-key-sorted range scans. For concurrent read/write/update, Mnesia requires explicit locking. Hypertable/Bigtable doesn't need explicit locking for that; consistency and isolation are achieved through data versioning. The most interesting feature here is time/history versioning.
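A toy sketch of the versioning idea (my own illustration, not Hypertable's API): each write appends a new timestamped version of a cell, so a reader at an older snapshot timestamp still sees a consistent value with no locks taken.

```python
import bisect
import itertools

class VersionedCell:
    """Toy timestamp-versioned cell: writes append versions; readers
    pick the newest version at or before their snapshot, lock-free."""
    def __init__(self):
        self._ts = []                  # sorted write timestamps
        self._vals = []                # value written at each timestamp
        self._clock = itertools.count(1)

    def write(self, value):
        ts = next(self._clock)
        self._ts.append(ts)
        self._vals.append(value)
        return ts

    def read(self, as_of=None):
        """Snapshot read: latest version with timestamp <= as_of."""
        if as_of is None:
            return self._vals[-1]
        i = bisect.bisect_right(self._ts, as_of)
        return self._vals[i - 1] if i else None

cell = VersionedCell()
t1 = cell.write("old")
cell.write("new")
print(cell.read(as_of=t1), cell.read())  # old snapshot still sees "old"
```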
Re: (Score:3, Informative)
Pretty much every database on earth has key-sorted ranges. Please be less of a noob. Go look up index_match_object.
No, it doesn't. It offers explicit locking because it's been proven for decades that without it, you cannot have hard realtime queries, something that Mnesia wanted to offer. You don't have to use tha
Re: (Score:1)
Re: (Score:2)
I had thought Bluetail was bought many years ago and absorbed into Nortel
Re: (Score:2)
Re: (Score:1)
Citations needed.
All my cited data can be found at http://research.google.com/pubs/papers.html [google.com]
Tell me how to store petabytes in a 4GB table when Erlang's dets uses 32-bit file offsets?
Storing petabytes of small key-value pairs on a DHT à la Mnesia is trivial. Sustained ordered on-disk scanning at hundreds of MB/s per core is not.
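The distinction can be shown with a few lines of Python (a dict standing in for a DHT, a sorted key list for a Bigtable-style tablet; the `row*` keys are made up): a hash table answers point lookups but preserves no key order, so range scans need the sorted structure.

```python
import bisect

# A DHT answers point lookups: hash(key) -> node; key order is lost.
kv = {"row9": 1, "row2": 2, "row5": 3}
point = kv["row5"]  # O(1) lookup, the only access pattern a DHT gives you

# A primary-key-sorted table also answers efficient range scans:
keys = sorted(kv)                       # ['row2', 'row5', 'row9']
lo = bisect.bisect_left(keys, "row3")   # first key >= 'row3'
hi = bisect.bisect_right(keys, "row9")  # just past the last key <= 'row9'
scan = [(k, kv[k]) for k in keys[lo:hi]]
# scan == [('row5', 3), ('row9', 1)]
```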
Is this the correct place to ask... (Score:1)
Re: (Score:1)
But it's not the correct place to get an answer.
Hbase -- Apache's BigTable (Score:2)
Re: (Score:2)