Object Prevalence: Get Rid of Your Database? 676
A reader writes: "Persistence for object-oriented systems is an incredibly cumbersome task when building many kinds of applications: mapping objects to tables, XML, flat files, or some other non-OO representation destroys encapsulation completely, and is generally slow, both at development time and at runtime. The Object Prevalence concept, developed by the Prevayler team and implemented in Java, C#, Smalltalk, Python, Perl, PHP, Ruby, and Delphi, can be a great solution to this mess. The concept is pretty simple: keep all the objects in RAM and serialize the commands that change those objects, optionally saving the whole system to disk every now and then (late at night, for example). This architecture results in query speeds that many people won't believe until they see them for themselves: some benchmarks suggest it's 9,000 times faster than a fully-cached-in-RAM Oracle database, for example. The good thing is: they can see it for themselves. Here's an article about it, in case you want to learn more."
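The write-up's core loop (mutate in RAM, serialize the command, replay the log on restart) can be sketched in a few lines of plain Java. This is only an illustration of the pattern; the class and method names are invented and are not the actual Prevayler API.

```java
import java.io.*;
import java.util.*;

public class PrevalenceSketch {
    // A command is just a serializable object that knows how to mutate the system.
    interface Command extends Serializable {
        void executeOn(Map<String, Long> system);
    }

    static class Deposit implements Command {
        final String account;
        final long amount;
        Deposit(String account, long amount) { this.account = account; this.amount = amount; }
        public void executeOn(Map<String, Long> system) {
            system.merge(account, amount, Long::sum);
        }
    }

    // Log the command to "disk" first, then apply it to the in-RAM system.
    static void execute(Command c, Map<String, Long> system, ObjectOutputStream log)
            throws IOException {
        log.writeObject(c);
        log.flush();              // the command must be durable before we report success
        c.executeOn(system);
    }

    // Crash recovery: rebuild the whole system by re-executing the logged commands.
    static Map<String, Long> replay(byte[] logBytes) throws Exception {
        Map<String, Long> system = new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(logBytes))) {
            while (true) {
                try {
                    ((Command) in.readObject()).executeOn(system);
                } catch (EOFException endOfLog) {
                    return system;    // end of log reached
                }
            }
        }
    }

    static long demo() throws Exception {
        Map<String, Long> live = new HashMap<>();
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for the log file
        try (ObjectOutputStream log = new ObjectOutputStream(disk)) {
            execute(new Deposit("alice", 100), live, log);
            execute(new Deposit("alice", 50), live, log);
        }
        // Pretend the process died: recover purely from the serialized command log.
        return replay(disk.toByteArray()).get("alice");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 150
    }
}
```

The key property is that the in-memory state is always reproducible from the log alone, which is why the periodic full-system snapshot is optional (it only shortens replay time).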
Re:RAM ? (Score:5, Informative)
Two words... (Score:4, Informative)
Here's the definition of an EJB from the http://java.sun.com [sun.com] site.
And more specifically, here's the definition of an Entity EJB:
Ever looked at object-oriented databases? (Score:5, Informative)
Re:Neat concept... (Score:5, Informative)
Object-oriented programming and data persistence is about a lot more than public web sites. Private, corporate data warehouses with terabytes of persisted objects squeeze every bit of processing power available. For example, I used to work on Mastercard's Oracle data warehouse. An average of 14 million Mastercard transactions occur per day. That's 14 million new records to one table each day, with reporting needing hundreds of other related tables to look up other information. Getting something of that scale to run efficiently for a client app (internal to the company) costs millions of dollars. Object persistence on a large scale is tough to get right and far from perfected, and there's a lot more going on than public web site development. Every new idea helps. Consider the article written on IBM's developerWorks: its readers are mostly corporate developers.
Sourceforge Link (Score:4, Informative)
Re:RAM ? (Score:1, Informative)
2^32 = 4.2B
Re:RAM ? (Score:4, Informative)
It's not how much you need that he's talking about. Only with 64-bit computing can you have more than the current RAM limit (which I believe is 2 GB right now). It has to do with the maximum number of distinct addresses a 32-bit pointer can express. So with a 64-bit processor, you can have enough RAM to hold that database all at once.
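The arithmetic behind this, for the curious: a 32-bit pointer can name 2^32 distinct byte addresses, i.e. 4 GiB, which caps how much of a database a single process can keep in RAM. A quick sketch (ignoring how any particular OS actually splits that address space):

```java
public class AddressSpace {
    public static void main(String[] args) {
        long addresses32 = 1L << 32;                 // distinct addresses a 32-bit pointer can form
        System.out.println(addresses32);             // 4294967296 -- the "4.2B" quoted above
        System.out.println(addresses32 / (1 << 30)); // 4, i.e. 4 GiB of addressable bytes
        // A 64-bit pointer can form 2^64 addresses, so physical RAM (not pointer
        // width) becomes the limit on holding a whole database in memory.
    }
}
```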
Doesn't help object serialization issues (Score:2, Informative)
Object prevalence does nothing to change that. You still have to deal with serialization of all of your business objects, unless you're planning on reloading and re-executing all transactions since the beginning of time every time you restart the server. You can do it less frequently at runtime, but that doesn't save you any development time.
One serious drawback, or even two :-/ (Score:2, Informative)
The second drawback is: the prevalence system is built on two OO design patterns (Singleton and a variant of Command), so in a few more minutes all the OO haters ("argh!! Java sucks" etc.) will start bashing here as well.
OK, let's get serious. You might ask why something like this came up only now. Well, traditional databases are disk-based. All the research went into how to organize, store/cache, and query data on disk. As data piles exploded even faster than RAM and disk prices dropped, there is no fear that prevalent systems will replace relational databases soon.
OTOH, prevalent systems tend to lead to very clean architectures and scale very well as long as your data is relatively small. Well, today one gig of RAM is affordable. Relational database management systems are from the age when not even one gig of disk space was affordable.
I'm really glad that the authors released their work. Less than 20 KB of Java code for a complete database-like prevalent system. It's cool!!
angel'o'sphere
P.S. I use the PircBot framework for an Java IRC bot and the org.prevayler prevalence system for writing an IRC based interface for a MMORPG.
Re:RAM ? (Score:5, Informative)
1) The last full image dump
2) all successful transactions (the DB meaning) serialized in the log, from the last dump to the power failure.
Since your transaction (in both the DB and business meanings) hasn't been successful, it has not yet been written to the log, so the money stays in the ordering party's account. Of course, the power failure could have occurred just after a transaction was written to the log and before the client software got the message that it was successful, but traditional DBs have this problem too. To sum it up: the synchronization problems are there, but they are no worse than in traditional DBMSes.
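A toy version of the recovery rule described above (last full image dump, plus every transaction serialized to the log after it) might look like this. All names are invented for illustration, not Prevayler's real API; the point is that a transfer which never reached the log simply never happened:

```java
import java.io.*;
import java.util.*;

public class RecoverySketch {
    // Nightly image dump: serialize the whole in-RAM system.
    static byte[] dump(Map<String, Long> system) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(new HashMap<>(system));
        }
        return out.toByteArray();
    }

    // Recovery = deserialize the last dump, then replay the log written since.
    @SuppressWarnings("unchecked")
    static Map<String, Long> recover(byte[] lastDump, List<Object[]> logSinceDump)
            throws Exception {
        Map<String, Long> system;
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(lastDump))) {
            system = (Map<String, Long>) ois.readObject();
        }
        for (Object[] entry : logSinceDump) {          // each entry: {account, delta}
            system.merge((String) entry[0], (Long) entry[1], Long::sum);
        }
        return system;
    }

    static long demo() throws Exception {
        Map<String, Long> live = new HashMap<>();
        live.put("orderingParty", 100L);
        byte[] nightly = dump(live);                   // the last full image dump

        List<Object[]> log = new ArrayList<>();
        log.add(new Object[]{"orderingParty", 50L});   // committed: reached the log
        // A transfer interrupted by the power failure was never logged, so after
        // recovery the money is still in the ordering party's account.
        return recover(nightly, log).get("orderingParty");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 150
    }
}
```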
This concept is not new (Score:5, Informative)
Re:RAM ? (Score:3, Informative)
As for the 2 GB limit, there seems to be a feature in the Windows memory architecture - the upper 2 GB of a process' virtual address space is reserved for shared memory. Or something - I kind of stopped thinking at that point.
Finally OO? I think and hope not! (Score:5, Informative)
The problems with OODBMSes can be summarized so (OTOMHRN - on top of my head right now):
1) Proper relational technology can model OO-hierarchies, but the other way around is unnatural and cumbersome, if not impossible. Proper relational technology is a step up on the ladder in generalization from OO-technology. It's simply a generation or two ahead, while OODBMS is several steps backwards.
2) Proper relational technology is proven concepts from mathematics and logics, while OODBMSes are just a hack to store application data "quick'n dirty". Everything can be modelled as general relations, while OO-technology lacks the fundamentals to model *ANYTHING* and is limited and impeded by having an obligatory and *meaningless* top-to-bottom hierarchy. (You cannot have *meaning* without relations of differing types to other entities.)
3) Proper relational technology allows you to extract, convert and manipulate data in standardized methods (using query languages like SQL), in ways not thought of at the time of design. OODBMSes can only be used properly in the context of the OO-application layer, often relying on runtime data. If you need flexible solutions, you will have to spend extra time programming a specialized solution, instead of having the benefit of a fully relational query language (which unlike SQL, can express almost any problem to be solved).
4) The future is relational. Current RDBMSes do not implement true relational technology; if they did, nothing else would be needed. The mathematics in the theories behind it would be at the programmer's disposal during programming, reducing time and potential errors. Yes, it requires understanding the theory, but wouldn't you like a true DBA to do that anyway?
Don't buy into the hype; look into true relational technology and educate yourself. As for storing everything in RAM and "saving it for the night", I wouldn't risk having my bank account in such a DB. Such solutions are only usable for storing non-volatile data. For non-commercial game servers, it may be perfect.
3 More Issues for the Do-It-Yourself Database (Score:5, Informative)
4) Concurrency - If you haven't implemented locks for an object model, then you haven't lived. Seriously, I can see a lot of people screwing this up with deadlocks galore. Locking up concurrent systems can be a nightmare.
5) Ad Hoc Support - Goodbye Crystal Reports, goodbye English Query, goodbye ANY ad hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate through your objects. Have fun.
6) Indexing - I hope you have a good B-tree library and are familiar with indexing/searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Nothing says flexible like HARDCODING! Yay!
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error-prone, and time-consuming to implement.
(sarcasm) All this just to avoid the "cumbersome" process of mapping objects to tables?
Seriously, people, it's not that hard (three orders of magnitude easier than this), and there are a lot of tools that help you do it.
If you're REALLY hung up on not using a relational database, try an object database, an XML database, or an Associative Model database.
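On point 4, here is a taste of what hand-rolled locking over an object model involves. Locking the two objects in argument order deadlocks as soon as two threads transfer in opposite directions; the classic fix, sketched below with invented classes, is to always acquire locks in a single global order:

```java
public class LockOrdering {
    static class Account {
        final long id;
        long balance;
        Account(long id, long balance) { this.id = id; this.balance = balance; }
    }

    // Naively synchronizing on (from, to) in argument order is deadlock-prone:
    // thread A runs transfer(x, y) while thread B runs transfer(y, x), each
    // holding one lock and waiting forever for the other. The fix: always lock
    // the account with the lower id first, whatever the transfer direction.
    static void transfer(Account from, Account to, long amount) {
        Account first  = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    static long[] demo() {
        Account a = new Account(1, 100);
        Account b = new Account(2, 100);
        transfer(a, b, 30);
        transfer(b, a, 10);   // opposite direction, same lock order: no deadlock
        return new long[]{a.balance, b.balance};
    }

    public static void main(String[] args) {
        long[] balances = demo();
        System.out.println(balances[0] + " " + balances[1]); // prints 80 120
    }
}
```

And this is just the simplest hazard; a real RDBMS also handles lock granularity, timeouts, and deadlock detection for you.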
Not New, Not Effective (Score:1, Informative)
1) Doesn't deal with schema changes. If you change an object then your database is broken the next time you try to restore.
2) Restoring from logs is prohibitively slow in a production environment.
3) No transaction support, although it would be easy to group log entries into transactions.
4) Is a Java Map object the most efficient access path for a large database? Tuning insertion/deletion/lookup costs for associative access can be complicated.
5) No security.
Modern relational and OO databases provide other features, but those are the biggies.
It's easy to ignore decades of CS research. Try looking up papers on CiteSeer: http://citeseer.nj.nec.com/cs.
Database System vs Database Management System (Score:4, Informative)
Gadfly, a Python package, gives you an in-memory DB and SQL. If you want to trade SQL for extra speed and do more programming, you can run the ISAM-like engines of Btrieve or Berkeley DB without the SQL layer on top. We have SQL RDBMS's because the conventional wisdom is that such a trade is not a good idea.
Re:Two words... (Score:5, Informative)
Re:OO databases are an evolutionary step...backwar (Score:2, Informative)
Of course, there's nothing *relational* about the need to do this; it doesn't have anything to do with the mis-application of methodologies.
These are simple use-cases. And to reject them is to limit the functionality that the solution will offer. That's fine.
But, almost everyone needs the ability to identify all objects with attribute X. It's called a report, and it provides you with the information needed to manage the process. Without this ability you're driving in the dark without headlights.
What about ZODB? (Score:2, Informative)
Gemstone (Score:1, Informative)
-Concurrent transaction management
-The command object is not necessary, all changes to objects logged at transaction commit
-Optimized object storage (java serialization is notoriously slow)
-Transparent caching (not all objects may fit in memory)
-Clustering/shared remote cache
I'm curious if anyone else is using this.
Fine granularity for writes. (Score:3, Informative)
Every time you issue a 'change command', it first makes the change in memory, then records just that command to disk, very much like a journaling file system as I understand it.
Then, presumably, you also change your object in memory to match. If the whole system comes down, then when you start again, it loads its 'starting point', probably from yesterday, and then executes those recorded commands.
Furthermore, future 'reads' on that data aren't blocked by the disk i/o. They wait for the object in memory to change (quick) and pretty much ignore the disk write.
Where I think you're getting confused is that periodically, it goes through the system and makes a new 'starting point', presumably during a period of low utilization (like at night).
I don't know what your comment about never joining tables would mean, this wouldn't have 'tables', but would have objects as you've designed them. Presumably you've designed your objects so that they're accessible in some natural and convenient way. If you haven't, you ought to fix that...
-Zipwow
Re:Don't go there (Score:3, Informative)
With respect to cost, there's JBoss [jboss.org], which is free, and there are many other vendors at a variety of prices and performance levels. I use JBoss - I did, for a time, see some nasty performance problems, but after reading some documentation I quickly realized that it was my mistake.
Re:RAM ? (Score:2, Informative)
Java Persistence (Score:1, Informative)
http://www-ccsl.cs.umass.edu/pj2-abs.html
"Saving to disk every now and then"? What is that, like 50% uptime? Just wait 'til the cosmic rays from the Perseids hit that Solaris server...
Re:The Electric Database ACID Test (Score:3, Informative)
(product is still on the market)
We had a product (we'll call it "X") which tracked all its information in a "database" we built in-house. The primary architect, of course, was a pretty sharp guy. He had written a whitepaper for the company stating why he thought "unix was dead", why we should not waste our time, as a company, developing "portable" products, and that we should take full advantage of Microsoft's technologies on Windows.
As far as ACID test goes, NONE of those elements existed in this "database" we used. Nor were there any verification, export, import, or repair tools initially available.
As soon as this product scaled to a reasonable level, (the field was always one step ahead of our test lab, as far as scaling the application goes), we started seeing weird crashes and corruption that we just could not reproduce or isolate in the lab. When the term "database corruption" was used, the architect would throw a fit, and blame some other component, denying that database corruption was even possible.
The absence of tools meant that we could not troubleshoot in the field. Developing tools was the equivalent of admitting that there was a problem. As we scaled our lab in response, we started to uncover these problems. This was when our architect resigned. His job had suddenly changed from "Technical Primadonna" to "beleaguered fixer of uncounted bugs".
That's when we REALLY started to get into trouble.
At some point, there was serious talk about ripping the whole database out and going to a "real" commercial database solution. Some third-party thing. That was shortly before I left that job. But in the end there was much suffering and pain, and the product lost a great deal of ground to its competitors, all due to a lack of Respect For Those Who Have Gone Before.
Hasn't the K programming lang done this long ago? (Score:2, Informative)
Here is the link that explains the language: http://www.kuro5hin.org/story/2002/11/14/22741/ - follow the links to find KDB, a database system that sits in RAM and kills Oracle dead. It runs some large mission-critical databases (like the Swiss bank), so you can't argue that it doesn't work.
-paul
Re:Database System vs Database Management System (Score:2, Informative)
BDB is a key-value DB -- it doesn't do SQL. Each DB is a simple "table" of keys to blocks of data. Tell BDB how to compare keys and augment BDB to convert the blocks of data into whatever you like (e.g. - un/serialize your objects). If you won't be doing too much convoluted SQL-like logic and you need an ACID datastore, then BDB is the way to go.
It has most everything you could want:
-- fully ACID (trnxns/rollback/snapshots)
-- single or multi-threaded/process access
-- fine grain locking
-- multi-node failover/balancing
-- multiple indexes per DB
-- SUPER flexible (trade features for speed)
-- AWESOME performance
-- little to no DB administration
-- under development for more than 10 years (debugged)
-- can handle DB sizes up to 256TB
-- can handle record sizes up to 4GB
-- supports most popular languages
-- is maintained by a world class company
-- you can buy support from the maintainers
-- open source + FREE for a lot of uses
I don't think anything out there with a similar feature set can consistently outperform BDB.
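For a feel of the programming model the parent describes (a sorted key-to-bytes table, where you supply the comparison of keys and the un/serialization of your objects), here is a rough stand-in written against a plain in-memory `TreeMap`. It mimics only the access pattern; the real BDB API, with its transactions and on-disk storage, looks quite different:

```java
import java.io.*;
import java.util.*;

public class KeyValueSketch {
    // The "table": sorted keys mapping to opaque blocks of data.
    static final NavigableMap<String, byte[]> table = new TreeMap<>();

    // You decide how objects become bytes; here, plain Java serialization.
    static void put(String key, Serializable value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        }
        table.put(key, out.toByteArray());
    }

    static Object get(String key) throws Exception {
        byte[] block = table.get(key);
        if (block == null) return null;
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(block))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        put("user:42", new ArrayList<>(List.of("alice", "admin")));
        System.out.println(get("user:42")); // prints [alice, admin]
        // Sorted keys also give cheap range scans, e.g. all "user:" entries
        // (';' is the character right after ':', so this bounds the prefix):
        System.out.println(table.subMap("user:", "user;").size());
    }
}
```

Everything "SQL-like" (joins, ad hoc filters) is then your code's problem, which is exactly the trade the parent comment describes.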
Congratulations! (Score:3, Informative)
Only one index? (Score:2, Informative)
The article referenced says:
This makes the assumption that the data table is indexed only on the primary key. However, Oracle and other RDBMS systems allow indexing of any field in the table, not just the primary key. This system has no way of retrieving information based on other criteria of a record.
I suppose the alternative would be to create a class for each criterion you look up by, but that means a lot of objects containing the same data, merely to let it be accessed in different ways. I have a table of 2.8 million records that I need to access based on at least three criteria. This just won't cut it.
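For concreteness, here is roughly what that alternative degenerates into: every extra lookup criterion becomes another hand-maintained map that every insert (and update, and delete) must keep in sync, which is exactly what an RDBMS's `CREATE INDEX` does for you. The `Customer` type and its fields are invented for illustration:

```java
import java.util.*;

public class SecondaryIndexes {
    static class Customer {
        final long id;
        final String city;
        final String plan;
        Customer(long id, String city, String plan) {
            this.id = id; this.city = city; this.plan = plan;
        }
    }

    // Primary access path, keyed on the primary key...
    static final Map<Long, Customer> byId = new HashMap<>();
    // ...plus one hand-maintained structure per extra lookup criterion.
    static final Map<String, Set<Long>> byCity = new HashMap<>();
    static final Map<String, Set<Long>> byPlan = new HashMap<>();

    // Every write must update every index, consistently, forever.
    static void insert(Customer c) {
        byId.put(c.id, c);
        byCity.computeIfAbsent(c.city, k -> new HashSet<>()).add(c.id);
        byPlan.computeIfAbsent(c.plan, k -> new HashSet<>()).add(c.id);
    }

    static Set<Long> idsInCity(String city) {
        return byCity.getOrDefault(city, Collections.emptySet());
    }

    public static void main(String[] args) {
        insert(new Customer(1, "Oslo", "gold"));
        insert(new Customer(2, "Oslo", "basic"));
        System.out.println(idsInCity("Oslo").size()); // prints 2
    }
}
```

With millions of records and three or more criteria, each index also multiplies memory use and write cost, which is the parent's objection in a nutshell.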
Re:Ever looked at object-oriented databases? (Score:3, Informative)
What do you mean by a "dump"? Sounds like you were using the OO db inappropriately, e.g. by querying for "SELECT * FROM extent1" and then linearly searching it, or something.
3. People would ask us questions about the data we were storing that would have been absolutely trivial to answer in an RDBMS (like "how many of these events occurred last month when this device was in this state?") but that we'd have to write long, slow-performing pieces of code to retrieve.
Doesn't ObjectStore have a query language similar to SQL? The Object Data Standard defines OQL, but I know that the Object Data Standard is not exactly "industry standard" yet. As for speed, it's still possible to create indexes, optimise data structures and algorithms, etc.
Other people wanted to write applications that used our data. That wasn't too easy, because they wanted slightly different objects. We would have had to agree on an object for everything we shared, or store things twice. With an RDBMS we could use views, or generate the objects differently from the same tables.
This is an odd complaint. In any decent OODBMS, creating views (even manually) should be fairly simple. Yes, you might have to write some repetitive code - there's room for improvement there. This is where aspect-oriented programming (specifically, composition filters) comes in, I think.