Object Prevalence: Get Rid of Your Database?
A reader writes: "Persistence for object-oriented systems is an incredibly cumbersome task to
deal with when building many kinds of applications: mapping objects to tables,
XML, flat files, or some other non-OO representation destroys encapsulation
completely, and is generally slow, both at development time and at runtime. The Object
Prevalence concept, developed by the Prevayler team, and implemented in Java,
C#, Smalltalk,
Python, Perl,
PHP, Ruby
and Delphi, can be a great solution
to this mess. The concept is pretty simple: keep all the objects in RAM and
serialize the commands that change those objects, optionally saving the whole
system to disk every now and then (late at night, for example). This architecture
results in query speeds that many people won't believe until they see for themselves:
some benchmarks point out that it's 9000 times faster than a fully-cached-in-RAM
Oracle database, for example. Good thing is: they
can see it for themselves. Here's an
article about it, in case you want to learn more."
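The mechanism the submitter describes — mutate objects in RAM, journal the command that changed them, replay the journal after a crash — can be sketched in a few lines of Java. This is an illustrative toy, not Prevayler's actual API; every class and method name below is made up:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal prevalence sketch: the "prevalent system" lives entirely in RAM,
// and every state change goes through a serializable Command object that is
// appended to a journal as it is applied.
interface Command {
    void executeOn(Bank system);
}

class Bank {
    long balance = 0;  // the whole "database" is just object state in RAM
}

class Deposit implements Command {
    final long amount;
    Deposit(long amount) { this.amount = amount; }
    public void executeOn(Bank system) { system.balance += amount; }
}

class Prevayler {
    private final Bank system = new Bank();
    private final List<Command> log = new ArrayList<>();  // stands in for the on-disk journal

    void execute(Command c) {
        c.executeOn(system);  // mutate the in-RAM system
        log.add(c);           // journal the command so it can be replayed
    }

    // After a crash, rebuild the state by replaying the journal against a fresh system.
    Bank recover() {
        Bank fresh = new Bank();
        for (Command c : log) c.executeOn(fresh);
        return fresh;
    }

    Bank system() { return system; }
}
```

In a real system you would deserialize the most recent snapshot and replay only the commands journaled after it; the sketch replays the whole log for simplicity.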
gigabytes? (Score:5, Insightful)
Who uses a database small enough to fit in RAM?
Re:gigabytes? (Score:2)
I think many small and mid-sized e-commerce vendors
would benefit from this.
Re:gigabytes? (Score:5, Insightful)
Not every solution is for every problem. This isn't for huge data warehousing systems. My impression is that this is for smaller databases where there is a lot of interactions with fewer objects.
I have also seen object databases used as the data entry point for huge projects, where the database is then periodically dumped into a large relational database for warehousing and reports.
Re:gigabytes? (Score:2, Insightful)
Offtopic though, I'd love to see a solid-state revolution. With the amounts of RAM and flash memory available these days, I don't see why we couldn't run an OS off one. I'm not generally one to be anxious to jump into new technologies (I used to hate games that used polygons instead of sprites), but I think moving to solid state in an intelligent manner would be the biggest thing that could happen in the industry in the near future. I.e., along with Serial ATA, introduce fast, ~2GB boot drives that run your OS and favorite programs, and store everything else on a conventional magnetic hard drive.
Re:gigabytes? (Score:5, Insightful)
And that goes for OO as well. Not every database (or collection of data) needs to be accessed in an object-oriented way. Most (or should I say all) of the data I store in small tables would not benefit from being objects.
And how does this differ from storing non-object-oriented data structures in RAM? You'd still need to implement searches, and how do you search a collection of objects without falling back on relational-style structure?
Re:gigabytes? (Score:3, Insightful)
What irks me to no end are database freaks who have to do everything with a database, OO freaks who have to do everything with OO, and GP freaks who have to do everything as pure GP. They're like guys who only know how to use a screwdriver, so they end up using the screwdriver to hammer in nails and chisel wood.
Re:gigabytes? (Score:2, Funny)
The Museum of 20th Century French Military Victories in Paris could make use of this technology on my old 8086 system.
Re:gigabytes? (Score:3, Funny)
Here's a quick history lesson for the morons here (yeah, that's you Daniel Dvorkin):
Gallic Wars: The French not only lost ... they lost to an Italian.
Hundred Years' War: Although they kinda/sorta mostly lost, they were saved by Joan of Arc (a female schizophrenic), who by accident created the First Rule of French Warfare: "France's armies are victorious only when not led by a Frenchman."
Italian Wars: France became the first and only country in history to lose not just one but TWO wars against Italians.
Wars of Religion: France was 0-5-4 against the Huguenots.
Thirty Years' War: Although not technically a principal, they did manage to get invaded anyway. Amusingly, they claim a tie on the basis that eventually the other participants started ignoring them.
War of Devolution: Tied.
Dutch War: Tied.
War of the Augsburg League: Lost, claimed tie.
King William's War: Lost, claimed tie.
French and Indian War: Lost, claimed tie.
Three ties in a row caused some deluded folks to label the period as the height of French military power.
War of the Spanish Succession: Lost.
American Revolution: In a foreshadowing of the future, France claims a win even though the English colonists saw way more action.
This is eventually known as "de Gaulle Syndrome."
It also establishes the Second Rule of French Warfare: "France only wins when America does most of the fighting."
French Revolution: Won, primarily due to the fact that the opponent was also French.
Napoleonic Wars: Lost.
Franco-Prussian War: Lost.
World War I: Tied and on the way to losing. France was saved by the United States.
World War II: Lost. Conquered French liberated by the United States and Britain.
War in Indochina: Lost.
Algerian Rebellion: Lost. The first defeat of a Western army by a Non-Turkic Muslim force since the Crusades. It gave birth to the First Rule of Muslim Warfare: "We can always beat the French." This rule is identical to the First Rules of Italian, Russian, German, English, Dutch, Spanish, Vietnamese and Esquimaux Warfare.
War on Terrorism: France has surrendered to Germans and Muslims just to be safe.
So France's only military victory was against... the French. How quaint.
Re:gigabytes? (Score:3, Insightful)
Even if your database doesn't fit in affordable RAM today, it probably will in a few years. RAM prices fall faster than database sizes increase. Already a couple of gigabytes of storage is more than enough for a big class of applications.
Re:gigabytes? (Score:5, Insightful)
I do, but I'll thank my SQL server for doing it for me. Most database servers aggressively cache data - if Database A is used constantly, it'll be kept in RAM, whereas less-frequently-used databases will either stay on the hard disk, or certain tables of those databases will be put in memory. It lets you make the most of your RAM.
Slashdotted (Score:5, Funny)
It's about 9000 times slower right now
Neat concept... (Score:3, Interesting)
You can always have a caching system as the author states, but even then what systems use this? The countless PHP/MySQL sites out there seem to perform just fine. This may be desirable for some very strict real time communications systems, but for just about every other form of app, I don't see it.
What are you going to tell your 3rd party integrators? Drop their XML/ODBC report and surf on over to prevayler.org?
Re:Neat concept... (Score:5, Informative)
Object-oriented programming and data persistence is about a lot more than public web sites. Private, corporate data warehouses with terabytes of persisted objects squeeze every bit of processing power available. For example, I used to work on Mastercard's Oracle data warehouse. An average of 14 million Mastercard transactions occur per day. That's 14 million new records to one table each day, with reporting needing hundreds of other related tables to look up other information. To get something of that scale to run efficiently for a client app (internal to the company) costs millions of dollars. Object persistence on a large scale is tough to get right and is far from perfected, and there's a lot more going on than public web site development. Every new idea helps. Consider the article written on IBM's developerWorks. Its readers are mostly corporate developers.
What about existing data ? (Score:4, Interesting)
That said, I wonder what their position is towards the import of existing data. Many projects would only benefit from the solution if the existing data (usually object-oriented but saved in a roughly flat database, as the article points out) can be ported seamlessly to the new environment.
My point is, this solution solves a known problem by introducing a new technology; however, this new technology will have to be bent towards the older systems in order to retrieve what was already saved. Same old story: in the database world, existing data is paramount.
Finally OO? I think and hope not! (Score:5, Informative)
The problems with OODBMSes can be summarized as follows (off the top of my head right now):
1) Proper relational technology can model OO-hierarchies, but the other way around is unnatural and cumbersome, if not impossible. Proper relational technology is a step up on the ladder in generalization from OO-technology. It's simply a generation or two ahead, while OODBMS is several steps backwards.
2) Proper relational technology is built on proven concepts from mathematics and logic, while OODBMSes are just a hack to store application data quick 'n' dirty. Everything can be modelled as general relations, while OO-technology lacks the fundamentals to model *ANYTHING* and is limited and impeded by having an obligatory and *meaningless* top-to-bottom hierarchy. (You cannot have *meaning* without relations of differing types to other entities.)
3) Proper relational technology allows you to extract, convert and manipulate data in standardized methods (using query languages like SQL), in ways not thought of at the time of design. OODBMSes can only be used properly in the context of the OO-application layer, often relying on runtime data. If you need flexible solutions, you will have to spend extra time programming a specialized solution, instead of having the benefit of a fully relational query language (which unlike SQL, can express almost any problem to be solved).
4) The future is relational. Current RDBMSes do not implement true relational technology; if they did, nothing else would be needed. The mathematics in the theories behind it would be at the programmer's disposal during programming, reducing time and potential errors. Yes, it requires understanding the theory, but wouldn't you like a true DBA to do that anyway?
Don't buy into the hype; look into true relational technology and educate yourself. As for storing everything in RAM and "saving it for the night", I wouldn't risk having my bank account in such a DB. Such solutions are only usable for storing non-volatile data. For non-commercial game servers, it may be perfect.
Re:Finally OO? I think and hope not! (Score:3, Interesting)
* the separation between _logical_ and _physical_ layers of the database - the DBA controls physical record layout and indices, while the database designer and applications have access to the logical layers. This way they can do their roles independently of each other.
* the ability for the data model to change without affecting the applications. Using VIEWS - you can do quite a bit of modification to the underlying data model, but applications using the older one will still run if the DBA sets up a view.
* the ability to do arbitrary queries on the data
* The ability to set up views to handle more complex interactions. For example, in a mail system I've written, we have a table for campaigns with a sent/not-sent flag, a list of addresses, and three layers of do-not-send lists. We then have a single view which puts all of this together and gets the list of addresses which need to be sent to. This is a view on top of several views.
I'm sure I'm missing some others, too. Basically, a relational database system is a gigantic inference engine when designed appropriately.
Re:What about existing data ? (Score:4, Insightful)
Not likely. The REAL problem with OO databases isn't that RDBMSes might be more mature or whatever else you might read; it is that the data is almost always more important to companies than the behaviors that operate on that data. For example, if a company has a database of customers, they might want to use that database in dozens of different ways, and they might want to grow it for years, if not decades. The OO-database view tends to look at things too much from the perspective of one single application of the data, and the data gets entangled with code behavior based on that specific application. With a clean RDBMS you can hit the same database from many different applications (assuming the database has a well thought-out schema to begin with)... the data isn't so tightly wound up with a specific bit of application code.
This 'solution' doesn't fix that aspect of OO databases. In fact, it makes it worse. I will grant that it is a neat technology, but I wouldn't expect to see it take over the place of RDBMS systems any more than the OO databases of the past have.
OOP (Score:2, Interesting)
1) You COULD use an object-relational database if you wanted to keep an OOD aspect.
2) You COULD load non-object oriented data into RAM with lower overhead.
3) A couple of gigs of data in RAM... not really a deployable solution for the enterprise, don't you think?
Other than that, nifty idea and all.
Two words... (Score:4, Informative)
Here's the definition of an EJB from the http://java.sun.com [sun.com] site.
And more specifically, here's the definition of an Entity EJB:
Re:Don't go there (Score:3, Informative)
With respect to cost, there's JBoss [jboss.org], which is free, and there are many other vendors at a variety of prices and performance levels. I use JBoss - I did, for a time, see some nasty performance problems, but after reading some documentation I quickly realized that it was my mistake.
Re:Don't go there (Score:3, Interesting)
http://www.onjava.com/pub/a/onjava/2003/02/26/ejb
A comparison of Tomcat, Orion, Resin and Weblogic is here:
http://radio.weblogs.com/0107789/stories/2002/05/
Ever looked at object-oriented databases? (Score:5, Informative)
Re:Ever looked at object-oriented databases? (Score:3, Interesting)
Here is why:
1. We realized that, like _most_ projects, there really wasn't anything that object-oriented about the data. The code, yes. But the data was just as easily represented with typical RDBMS relationships, and it was much faster to do basic operations. We saw a several-thousand-fold increase in performance when trying to query the database for a particular object and its associated data. A join or ten wasn't nearly as expensive as getting data out of what was essentially just a dump.
2. Objectstore, at the time, had no concept of administration. It was up to the developer to handle things like when files got too big, or creating the OODBMS equivalent of indexes, or what have you. The "DBA" could stop and start it, and that's about it. So if we grew, or got new hardware, or changed platforms, it was time to dump the old data (because migrating it was a programming project in itself) and start over.
3. People would ask us questions about the data we were storing that would have been absolutely trivial to answer in an RDBMS (like "how many of these events occurred last month when this device was in this state?") that we'd have to write long, slow-performing pieces of code to retrieve.
4. Other people wanted to write applications that used our data. That wasn't too easy, because they wanted slightly different objects. We would have had to agree on an object for everything we shared, or store things twice. With an RDBMS we could use views, or generate the objects differently from the same tables.
5. There was no way to get a read-consistent hot backup across a couple of hundred files. Maybe there is now. This was just foolish.
Re:Ever looked at object-oriented databases? (Score:3, Informative)
What do you mean by a "dump"? Sounds like you were using the OO db inappropriately, e.g. by querying for "SELECT * FROM extent1" and then linearly searching it, or something.
3. People would ask us questions about the data we were storing that would have been absolutely trivial to answer in an RDBMS (like "how many of these events occurred last month when this device was in this state?") that we'd have to write long, slow-performing pieces of code to retrieve.
Doesn't ObjectStore have a query language similar to SQL? The Object Data Standard defines OQL, but I know that the Object Data Standard is not exactly "industry standard" yet. As for speed, it's still possible to create indexes, optimise data structures and algorithms, etc.
Other people wanted to write applications that used our data. That wasn't too easy, because they wanted slightly different objects. We would have had to agree on an object for everything we shared, or store things twice. With an RDBMS we could use views, or generate the objects differently from the same tables.
This is an odd complaint. In any decent OODBMS, creating views (even manually) should be fairly simple. Yes, you might have to write some repetitive code - there's room for improvement there. This is where aspect-oriented programming (specifically, composition filters) comes in, I think.
3 issues I see (Score:4, Interesting)
1) You're limited by how much RAM you have on your server, not how much disk space you have
2) If you're making a lot of data changes and have a crash or power outage, I'd imagine that it can take a while to replay the log to get things back to the most recent point in time (you can have the same problem with Oracle, but your checkpoints would be a lot closer together than "once a day")
3) There are millions of people that already know SQL and can write a decent query with it. How does this help them? Never underestimate the power of SQL.
On the other hand, for projects dealing with small amounts of data I can see how implementing this would be far easier than integrating with MySQL, PostgreSQL or Oracle.
Re:3 issues I see (Score:5, Interesting)
2) You can probably set up your own checkpoints to be more than once a day.
3) I agree. Lack of SQL would cause people to.... GASP.... learn a new system. SQL is very cool. And I admit that I have a system I am thinking of porting away from JDBC and into Prevalence just to see how it goes (No, it isn't mission critical) and one of the first things I realized is that I would have to design a new method of querying. But you know what... That can lead to new thinking and more powerful software in the future.
Buggy whips (Score:5, Insightful)
There are millions of people that already know how to saddle and ride a horse. How do these newfangled automobiles help them? Never underestimate the power of a horse.
While I agree with your other points... number 3 is never a reason to keep from embracing something new. People are surprisingly trainable.
3 More Issues for the Do-It-Yourself Database (Score:5, Informative)
4) Concurrency - If you haven't implemented locks for an object model, then you haven't lived. Seriously, I can see a lot of people screwing this up with deadlocks galore. Locking up concurrent systems can be a nightmare.
5) Ad Hoc Support - Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY ad hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate through your objects. Have fun.
6) Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Nothing says flexible like HARDCODING! Yay!
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error-prone, and time-consuming to implement.
(sarcasm) All this just to avoid the "cumbersome" process of mapping objects to tables?
Seriously people, it's not that hard (3 magnitudes easier than this) and there are a lot of tools that help doing it.
If you're REALLY hung up on not using a relational database, try an Object Database, XML Database, or an Associative Model Database.
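For what it's worth, the "hardcoded indexing" objection is less dire in a language with decent collections: a sorted secondary index over in-memory objects is a TreeMap away in Java. The User and UserStore classes below are invented for illustration, not taken from any of the products discussed:

```java
import java.util.TreeMap;

// A hand-rolled secondary index over in-memory objects: a TreeMap keyed on
// the indexed attribute gives O(log n) lookups and cheap range scans.
class User {
    final int uid;
    final String name;
    User(int uid, String name) { this.uid = uid; this.name = name; }
}

class UserStore {
    private final TreeMap<String, User> byName = new TreeMap<>();

    void add(User u) {
        byName.put(u.name, u);  // the index must be maintained on every write
    }

    User findByName(String name) {
        return byName.get(name);  // indexed lookup instead of a linear scan
    }

    // Range query: everyone whose name falls in [from, to) -- something a
    // B-tree (or here, a red-black tree) gives you essentially for free.
    Iterable<User> nameRange(String from, String to) {
        return byName.subMap(from, to).values();
    }
}
```

The real cost the parent identifies remains, of course: every query shape you want indexed has to be anticipated and maintained by hand in code like this.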
Re:3 More Issues for the Do-It-Yourself Database (Score:3, Interesting)
Then just wrap all of your lock-sensitive stuff in Prevayler command objects. They've got that working fine, and it guarantees isolation.
Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY ad hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate through your objects. Have fun.
Oh, please. If you really need SQL compatibility, then dump the data occasionally to a data warehouse, which is where you should be doing unconstrained ad-hoc queries anyhow.
Or if it's so the programmers can peek at the live system, then put in something like BeanShell, which will let you see a lot more than just the persistent data.
Or you could drop an SQL interpreter into your system and present your objects as tables. Many of the pieces are already open sourced, so it would be pretty easy.
Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Can you really not think of ways to write these things in flexible ways? If that's the case, you could learn something about being a programmer. Pick up Martin Fowler's Patterns of Enterprise Application Architecture [martinfowler.com].
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error-prone, and time-consuming to implement.
Perhaps you should try it before knocking it. As you are, in order, wrong, mostly wrong, wrong, and confused. It's no magic bullet, but it's a useful approach for some systems.
Interfacing (Score:3, Interesting)
A SOAP interface could go some way towards accomplishing this, but what about the traditional ACID properties of a DBMS? Durability is obviously guaranteed... Consistency? That would depend on programmers following the practices... Atomicity? Not sure about that one. For simple commands it seems to work. What about compound commands? If no rollback occurs, how can I assert that I changed both objects, not just one? Isolation? Not sure about this one either.
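On the atomicity question, one common answer (a sketch of the general technique, not a claim about what Prevayler actually guarantees) is to validate before mutating, and journal a compound command only after it has executed successfully, so a failed command never produces a partial log entry:

```java
import java.util.ArrayList;
import java.util.List;

// Atomicity via compound commands: both mutations happen inside one command,
// with all validation done before the first mutation, and the command is
// journaled only after it succeeds.
class Account {
    long balance;
    Account(long balance) { this.balance = balance; }
}

class Transfer {
    final Account from, to;
    final long amount;
    Transfer(Account from, Account to, long amount) {
        this.from = from; this.to = to; this.amount = amount;
    }
    void execute() {
        if (from.balance < amount)
            throw new IllegalStateException("insufficient funds");
        from.balance -= amount;  // both changes, or neither: the check
        to.balance += amount;    // above runs before any mutation
    }
}

class Journal {
    final List<Transfer> log = new ArrayList<>();
    void execute(Transfer t) {
        t.execute();  // may throw -- in that case nothing is logged,
        log.add(t);   // so replay never sees a half-applied transfer
    }
}
```

The flip side is durability: a crash between execute and the log write loses the change, which is exactly the trade-off the "saving to disk every now and then" design accepts.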
Looks like journaling filesystem (Score:2)
Now they're doing the same for in-memory object data structures. Might be a nice idea.
On a different note: the object database behind Zope has perhaps the same net effect. To the programmer, everything is in memory. The object database reads stuff from disk when needed and keeps frequently requested things in memory. It also keeps a list of transactions which can be replayed or rolled back.
So: it looks nice, but I'm curious about the net results!
Something about this doesn't sit right with me (Score:3, Insightful)
Since the benchmark page was slashdotted I might be speaking out of my ass. But I never trust "9000 times faster!". It sounds too much like "2 extra inches to your penis, guaranteed!"
Blazing fast (Score:4, Funny)
This architecture results in query speeds that many people won't believe until they see for themselves: some benchmarks point out that it's 9000 times faster than a fully-cached-in-RAM Oracle database, for example. Good thing is: they can see it for themselves.
Yes, I've seen it. The page on www.prevayler.org only took about 30 seconds to load. Does that mean that a fully-cached-in-RAM Oracle database would spend 75 hours loading that page...?
no queries (Score:5, Insightful)
In other words, "it doesn't have queries". What real project doesn't (eventually) need queries? And even if writing your queries "by hand" in Java is good enough for now, what real project doesn't eventually need indices, transactions, or other features of a real database system?
Re:no queries (Score:4, Insightful)
Indeed. It looks like a high-level, language-neutral API for traversing linked lists of structs. Yes, you can rip through such a structure far faster than Oracle can process a relational table, but they are two different solutions to two different problems. I wouldn't use an RDBMS for storing vertex data for a scene rendering application, and I wouldn't use an in-memory linked list for storing bank transactions!
Fine granularity for writes. (Score:3, Informative)
Every time you issue a 'change command', it first makes the change in memory, then records just that command to disk, very much like a journaling file system as I understand it.
Then, presumably, you also change your object in memory to match. If the whole system comes down, then when you start again, it loads its 'starting point', probably from yesterday, and then executes those recorded commands.
Furthermore, future 'reads' on that data aren't blocked by the disk i/o. They wait for the object in memory to change (quick) and pretty much ignore the disk write.
Where I think you're getting confused is that periodically, it goes through the system and makes a new 'starting point', presumably during a period of low utilization (like at night).
I don't know what your comment about never joining tables means; this wouldn't have 'tables', but would have objects as you've designed them. Presumably you've designed your objects so that they're accessible in some natural and convenient way. If you haven't, you ought to fix that...
-Zipwow
Get best of both worlds... (Score:5, Interesting)
Of course, you can always write your own persistence layer. I've done this a few times - very easy in Java. Map a row in the DB to an object, and cache the object in memory. If you need to fetch that data again, check the cache first. When doing a write, write to the DB and update/flush your cache as necessary.
That's just the basics - what's most optimal depends on how your data is accessed and changed (and also your programming language and capability as a programmer). Java has really nice stuff for caching built in, like SoftReference wrapper objects, and of course threading and shared memory that you can use in production.
I'm currently working on a super optimised threaded message board system. Almost all pages (data fetch/change + HTML generation) complete in about 0.001s.
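A minimal version of the read-through cache the parent describes, using the SoftReference the poster mentions so the garbage collector can evict entries under memory pressure. The RowCache class and its loader callback are made up for the example:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through cache: check the cache first, fall back to the "database"
// loader on a miss. SoftReference lets the GC reclaim cached rows when
// memory gets tight, so the cache degrades instead of causing an OOM.
class RowCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    private final Function<K, V> loader;  // stands in for the actual DB fetch

    RowCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {                      // miss, or collected by the GC
            value = loader.apply(key);            // hit the database
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }

    void invalidate(K key) { cache.remove(key); }  // call after a write
}
```

For production use you would also want synchronization (or a ConcurrentHashMap) around the map, as the parent's mention of threading implies.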
You're thinking too general case (Score:3, Insightful)
It's not a general-purpose DBMS solution, nor should you interpret it as such.
OO databases are an evolutionary step...backward (Score:5, Interesting)
In 1999, I worked for a company that used an OO database (ObjectStore) to develop an e-commerce shopping portal. It was a disaster.
OO advocates point to extremely fast (extremely special-case, in practice) queries and natural persistent object mapping as reasons why OO is superior.
However, this is very misleading.
Some of the MAJOR problems we ran into in using ObjectStore were:
When developers first consider OO databases, their first assumption is that OODBMS is to RDBMS as OOP is to Procedural Programming. This is a FALSE analogy! Migrating to OODBMS offers precious little to support better software design while introducing significant maintenance and design issues that should be considered prior to using this technology.
Unless I had a product that had an extremely specialized use case that matched OODB strengths, I would NEVER develop on this kind of platform again.
OODB are very different from RDBMS (Score:4, Interesting)
If you are thinking of accessing your objects like you do with SQL, then you haven't understood how OODBs work. As for accessing your objects and doing your queries, there are tools (like Inspector for ObjectStore) that enable you to do just that.
In terms of performance, Oracle and co. are nowhere near what you can reach with ObjectStore, provided you designed your application well.
The 2 main problems with OODB, are:
- schema evolution
- reporting
But these can easily be solved by a good design of your application.
OODB is a skill that takes time to master. After 4 years of seeing ObjectStore applications from various companies, I can tell the difference between the ones where people knew what they were doing and those from people who didn't have a clue...
Re:OODB are very different from RDBMS (Score:3, Insightful)
And, like in every programming project, your requirements are incomplete, so your model will be incomplete, so you need to allow for flexibility. OO DBMS that I have used don't allow for that flexibility (schema evolution), so we build layers on top of the OODB, just the same as we do for relational DBs. I don't see the advantage. By the time we are done optimizing a relational DB, it has all the same indexes that the OODB would have, but we were able to evolve the system, instead of designing it all up front.
I suppose I could argue for an OO DBMS if the number of transactions was high enough and the application had a static set of requirements (general ledger, trade system, etc.).
Joe
Re:OODB are very different from RDBMS (Score:5, Funny)
In other words, OODB technology is doomed.
Re:OO databases are an evolutionary step...backwar (Score:5, Interesting)
Although you certainly have a point, there are some remarks I have to make here:
There's no "SELECT * FROM USERS".
That's just like saying Latin is a bad language because it does not have equivalents for 'the', 'le/la', 'de/het', 'der/die/das', whatever. An rdbms is *fundamentally* different from an oodbms.
DB Performance when querying outside the normal object hierarchy (...) is orders of magnitude SLOWER on an OODB!
That's right: you are trying to use an oodbms as an rdbms. Ever tried to drive a car like you ride a bicycle?
Oodbms are relatively new, and they have their 'problems', just like rdbms-es have theirs. But the biggest problems arise when one approaches an oodbms like one would an rdbms. Just like you run into problems using an oo language when you have only used a proc. language
Nods head (Score:3, Interesting)
> is orders of magnitude SLOWER on an OODB!
That's right: you are trying to use an oodbms as an rdbms. Ever tried to drive a car like you ride a bicycle?
I've still never heard a good answer to this problem, only that I'm using the wrong hammer.
When performing activities against pure OO storage in which selectively collecting data from a (potentially large) number of objects is required, what is the OC (object-correct) way to do so? Asking each one via a method call is horrendously slow in comparison to a RDBMS. For instance, contrast "select last_activity, uid from users" to
my %blarg;
# assuming an iterator-style API; the point is one method call per user
while ( my $user = $users->next() ) {
    $blarg{ $user->{uid} } = $user->{last_activity};
}
I suppose if one is building a product instead of managing an ongoing project, lazy access to the hash might save a little time. I still don't see the performance win, and for ad hoc access, building the methods and accessors just takes too much time to be reasonable.
Use the right tool for the right job, I say. And usually, for managing data, a RDBMS is the right tool. For interacting with that data, OO is frequently nice.
Please correct my incorrect notions.
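For comparison, the parent's loop in Java: one pass over the collection builds the uid-to-last_activity map, which is about as close as an in-memory object store gets to the "select last_activity, uid from users" report. The User shape here is assumed from the parent's example:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// The parent's "select last_activity, uid from users" over in-memory objects:
// one pass builds the uid -> last_activity map, rather than hand-writing a
// fresh traversal for every ad hoc question.
class User {
    final int uid;
    final long lastActivity;
    User(int uid, long lastActivity) { this.uid = uid; this.lastActivity = lastActivity; }
}

class Report {
    static Map<Integer, Long> lastActivityByUid(List<User> users) {
        return users.stream()
                    .collect(Collectors.toMap(u -> u.uid, u -> u.lastActivity));
    }
}
```

This is still a linear scan per query, which is the parent's point: without a maintained index or a query planner, every report is O(n) code you wrote yourself.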
Re:OO databases are an evolutionary step...backwar (Score:3, Insightful)
Some of the MAJOR problems we ran into in using ObjectStore were:
No, the MAJOR problem you ran into was trying to get RDBMS guys to understand OODBMSs, and you clearly failed.
It is very difficult to "see" an OO database. By nature, the data isn't tabular. It's a persistent object heap. There's no "SELECT * FROM USERS". So tracking down data-related problems involves exporting data to an XML file and sifting through it.
Well, that would be the hard way to do it. I suppose the easy way would be to take two minutes and write a small program to scour through the DB looking for the problems, but my experience with Objectstore and other OODBMSs would lead me to ask a different question -- How did the "data-related problems" get created? Write your classes with strong invariants and tightly encapsulate your data and you won't really have many such issues.
Reporting tools don't exist for OODB.
Actually this isn't really true, but the point is still worth addressing because the available reporting tools aren't very good. This isn't the fault of the tools, it's just a fact that it's impossible to write a general-purpose tool that can intelligently traverse arbitrarily-structured data.
Again, the solution is: write a small program to extract the data you want to report on.
If you need to do lots of ad-hoc queries against the database, such that writing a program each time isn't reasonable, then your usage pattern suggests an RDBMS is more appropriate.
DB Performance when querying outside the normal object hierarchy (aggregate queries grouping on object attributes, etc.) is orders of magnitude SLOWER on an OODB!
Unless you create indexes for those queries, of course. Ad-hoc querying is a real weakness of OODBMSs. OTOH, queries that are planned for and for which good indexes exist are orders of magnitude FASTER on an OODB! Like, three orders of magnitude faster than an RDBMS.
32-bit memory limited our max customer size dramatically
That is a problem if you design your database badly, but ObjectStore allows you to segment your DB so that the size of your address space isn't an issue. The segmentation is completely transparent to the programmer using the objects.
Migrating to OODBMS offers precious little to support better software design while introducing significant maintenance and design issues that should be considered prior to using this technology.
OODBMSs have advantages and disadvantages. The advantages are:
The disadvantages vs. RDBMSs are:
Overall, OODBMSs shine when your primary need is for an "application working store" more than a "database", and when you need maximum performance and minimum time to market (assuming you have staff that knows the tool). If you need ad-hoc queries you can still use an OODBMS, but you will want to export the data to a relational DB for query purposes.
Actually, that's a very nice solution to many problems, IMO. Use an OODBMS as your high-performance working store, and periodically export the data to a relational "data warehouse" for ad-hoc queries and data mining. This means that you still have to implement and maintain an object-relational mapping, but it's much easier to manage a one-way mapping than a bi-directional mapping.
The system described in the article is fine for some environments, I'm sure, but a high-quality OODBMS would be just as fast, more robust and would allow you to use databases that won't fit in RAM.
Speed is not the only factor (Score:4, Interesting)
Here's the issue they are trying to solve: mapping objects to records. That's it. Now the problem with removing the records / database is that you lose all of the searching power that is inherent in relational databases. The author states that the codebase is 350 lines of code. How can any complex search engine be implemented in 350 lines of code that also covers the persistence?
Re:Speed is not the only factor (Score:3, Insightful)
Maybe I don't understand this well enough (the Prevayler site is down), but if this is really a database based upon objects, and you can access them as normal objects, then any good programmer can make a "powerful and flexible querying method." You can write your own hashtables, searching functions, or whatever.
Because they probably didn't put any searching routines into Prevayler. From the SourceForge page: "Ridiculously simple, Prevayler provides transparent persistence for PLAIN Java objects." You write the searching routines.
Ever hear of hash codes and hash tables? You write the code yourself. How do you think MySQL and Oracle do it? They have code which does the searches. With this system you cut out the middleman. It'll have its own weaknesses and strengths, so every manager will have to decide if this system will fill their needs.
At first glance, I see two weaknesses and two strengths to this system. Weaknesses: a) you'll have to be more of a programmer to implement a database. b) the database has to be small enough to fit in memory. Strengths: a) infinitely flexible. b) really fast for anything which will fit in RAM.
Web hosting services won't want this. (they usually have many customers, and all their databases won't fit in RAM at once.) Big e-commerce sites won't want this for their customer databases. (again, probably won't fit in RAM) They may be able to use it for their product data, unless it's really huge--such as Barnes and Noble. I'm sure it'll be quite usable for most small businesses. The need for a programmer may seem like a huge obstacle, but I'm sure if Object Prevalence gets big, there'll be a book called "Object Prevalence in Java for Dummies" in no time.
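For the curious, the do-it-yourself searching described above can be as simple as keeping a map keyed on the attribute you query. A minimal Java sketch (the User class and its field names are our own invention for illustration, not Prevayler's API):

```java
import java.util.HashMap;
import java.util.Map;

// Hand-rolled "index" over in-memory objects: a map keyed on the queried
// attribute. The User class and its fields are assumptions for illustration.
public class UserIndex {
    static class User {
        final String uid;
        final long lastActivity;
        User(String uid, long lastActivity) { this.uid = uid; this.lastActivity = lastActivity; }
    }

    private final Map<String, User> byUid = new HashMap<>();

    void add(User u) { byUid.put(u.uid, u); }            // O(1) insert
    User lookup(String uid) { return byUid.get(uid); }   // O(1) lookup, no SQL parsing

    public static void main(String[] args) {
        UserIndex idx = new UserIndex();
        idx.add(new User("alice", 1000L));
        idx.add(new User("bob", 2000L));
        System.out.println(idx.lookup("alice").lastActivity); // prints 1000
    }
}
```

Each additional query pattern needs its own index or a linear scan, which is exactly the trade-off against an RDBMS's general-purpose query planner.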
Memory is CHEAP? (Score:3, Interesting)
This concept is not new (Score:5, Informative)
Old News: Main Memory Databases (Score:3, Insightful)
TimesTen [timesten.com]
Polyhedra [ployhedra.com]
DataBlitz [bell-labs.com]
etc..
The idea is to have enough RAM to be able to store the whole database in memory. This gives higher performance than a fully cached Oracle for two primary reasons:
- there is no buffer manager so data can be directly accessed.
- the index structures use smart pointers to access the data in memory.
Typically the data is mapped using mmap or shared memory. Each application can have the database directly mapped into its memory space.
To provide persistence, main memory databases typically provide transaction logging and checkpointing to be able to recover the data. Various techniques have been developed to do this without affecting performance.
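As a rough illustration of the memory-mapping idea, here is a minimal Java sketch (not any particular product's code) using `FileChannel.map` to put a data file directly into the process address space:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: map a data file into memory so records are read and written in
// place, with no buffer manager between the application and the data.
public class MappedStore {
    static int roundTrip(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putInt(0, 42);        // write a record directly in the mapping
            return buf.getInt(0);     // read it back, pointer-style
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("store", ".dat");
        System.out.println(roundTrip(file)); // prints 42
    }
}
```

A real main-memory DB would lay out its index structures inside such a region and log changes separately for recovery.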
The Electric Database ACID Test (Score:5, Insightful)
- Atomicity of transactions (commit/rollback),
- Consistency in the enforcement of my data integrity rules,
- Isolation of each transaction from other competing transactions (locking)
- Durable storage that can survive a crash without losing transactions (e.g., journaling)
My experience with RAM-centric disk-backed object storage is that you, the developer, often have to implement the ACID features yourself, from scratch. And from-scratch implementations of complex data-integrity mechanisms tend to be time-consuming to develop and test and often take much, much longer than you think to "get right".
Call me old-fashioned, but I really like using data storage (database) engines that pass the ACID test and have already been debugged and debugged and debugged and debugged and debugged.
-Mark
Re:The Electric Database ACID Test (Score:3, Informative)
(product is still on the market)
We had a product (we'll call it "X") which tracked all its information in a "database" we built in-house. The primary architect, of course, was a pretty sharp guy. He had written a whitepaper for the company stating why he thought "unix was dead" and why we should not waste our time, as a company, developing "portable" products, and that we should take full advantage of Microsoft's technologies on Windows.
As far as ACID test goes, NONE of those elements existed in this "database" we used. Nor were there any verification, export, import, or repair tools initially available.
As soon as this product scaled to a reasonable level, (the field was always one step ahead of our test lab, as far as scaling the application goes), we started seeing weird crashes and corruption that we just could not reproduce or isolate in the lab. When the term "database corruption" was used, the architect would throw a fit, and blame some other component, denying that database corruption was even possible.
The absence of tools meant that we could not troubleshoot in the field. Developing tools was the equivalent of admitting that there was a problem. As we scaled our lab, in response, we started to uncover these problems. This was when our architect resigned. His job had suddenly changed from "Technical Primadonna" to "beleaguered fixer of uncounted bugs".
That's when we REALLY started to get into trouble.
At some point, there was serious talk about ripping the whole database out and going to a "real" commercial database solution. Some third party thing. That was shortly before I left that job. But in the end - there was much suffering and pain, and the product lost a great deal of ground to its competitors, all due to a lack of Respect For Those Who Have Gone Before.
Not a "database" but a persistence mechanism (Score:3, Insightful)
Some people seem to be missing the point: this is not a "database", it is a persistence mechanism. What they are saying is that persisting objects is difficult (er, I tend to disagree but I'll bite) and so they are solving this. Whether an RDBMS offers better searching is completely irrelevant as this, in their architecture, is handled by the application.
What they seemed to gloss over is that you need to take snapshots of the actual data. If you didn't, you'd have to keep every single "log" in order to safely play back the actions and know you have the same data in the same state. Lose one log, say the very first one, and you're pretty much screwed.
Problems with Object databases (Score:3, Insightful)
The main problems I see with object databases:
1) SQL is incredibly powerful. You give up *a lot* of power when you go from sql semantics to object semantics. Sub-selects, group bys and optimized stored procedures, to name just a few things. All the object language query constructs I've seen fall far short of these. (As a side note, most O/R tools make a hash of it as well.)
2) You immediately make a massive reduction in the number of database administrators who will be willing and/or capable of helping you out in your project.
3) Scaling is always a question. With Oracle, it just isn't.
4) Backup, redundancy, monitoring, management, etc. Most mature relational databases have very good tools for doing these infrastructure activities. Developers often forget about banal things like this, but they are crucial for the long term health of IT systems.
Don't get me wrong. Every time I construct some nasty query and go through the mind-numbing process of moving the results into an object, I think to myself "There has to be a better way!", but I've looked at the O/R tools and the object database out there and, sadly, I don't feel they are worth the trade off.
Just my opinion,
prat
Interoperability, Scalability (Score:3, Interesting)
Both of these issues make this solution unusable in an enterprise environment. The RAM size issue has already been mentioned by others and is another very real limitation.
In general, object caching mechanisms are not terribly difficult to create. This generic solution proves the point by only requiring 350 lines of Java code.
I am sure that there is something worthy in this project, I just cannot see it used for anything other than very small-scale development efforts.
Database System vs Database Management System (Score:4, Informative)
Gadfly, a Python package, gives you an in-memory DB and SQL. If you want to trade SQL for extra speed and do more programming, you can run the ISAM-like engines of Btrieve or Berkeley DB without the SQL layer on top. We have SQL RDBMS's because the conventional wisdom is that such a trade is not a good idea.
BS (Score:5, Insightful)
1) Doesn't scale. Most enterprise databases don't fit in RAM. Data volumes grow with the capacities of hard disks which outpace RAM. If your database fits in memory now and you use this architecture, what do you do when it grows larger than your RAM capacity? You fire the guy that proposed this and switch to an RDBMS.
2) Performance claims are BS. Good databases already serialize net changes to redo logs via a sort of binary diff of the data block. Redo logs are usually the limiting factor on transaction throughput, since they require IO to disk. Serializing the actual commands is less efficient than using a data block diff. You simply cannot minimize the space any better than an RDBMS does, therefore you cannot minimize the IO for this serialization any better, and therefore you cannot do it faster without sacrificing ACIDity. If your performance is too good to be true, then you gave up an essential feature of the RDBMS.
3) Consistency. If there is only one object in memory for each record, then you'll be writing a tremendous amount of custom thread-safety code, and even then, either A) writers block readers and readers block writers, or B) read consistency isn't guaranteed. Either is usually unacceptable. One alternative is to clone objects at every write (sounds slow and horribly inefficient). Of course, this too has to be serialized, or you don't have ACIDity. If you are serializing these, then you aren't really different than an RDBMS which uses rollback/undo, except you are wasting disk IO and are slower.
4) Reliability. A hardware failure, software hang/crash, or system administration mistake would force recovery from the last full backup. Replaying a full day's transactions could take hours. Sure, you could be continually making a disk image, except for read consistency issues like the above. It's not clear what you do even for a daily backup. Are all sessions simply blocked during backup? Ouch.
Every few years object fanatics try to come up with some way to get rid of RDBMSs. The methods invariably rely on sacrificing some of the core capabilities of the RDBMS: data integrity, performance, consistency, ACID, reliability, etc. These "innovations" are really only of interest to OO fanatics. In the real world, OO gets sacrificed way before RDBMSs do. This is not going to change.
OO is a tool that is good for writing maintainable code. It is not good for performance critical uses like OS's, device drivers, and real time systems. It is not good for data intensive systems. These things are not likely to change. If all you can accept is OO, then you are a niche player.
Re:BS (Score:3, Insightful)
OO is not all about classes and jump tables. For example, you can get polymorphism in C++ without using any virtual methods at all. If you disagree, then I think your view of what constitutes OO is quite limited, and I'm not surprised you think it's a "niche player".
This Won't Replace A Database (Score:5, Insightful)
I've read a few posts that say that the performance claims (vs. a relational database) are not true. I think this will be much faster than a database. This is an in-memory cache. It will be very fast. Our Oracle databases have cache-hit ratios of 98 to 99+ percent, but they will still be slower. Why?
First, databases (especially Oracle) do a lot of stuff behind the scenes, logging all sorts of things from a user connecting to the SQL being run.
Second, this sort of thing offers nearly direct access to the data. SQL usually needs to be parsed before it is executed. The database needs to come up with the optimal query plan before it actually executes the statement. A database offers different ways of joining data, and accessing data. Find me all managers that make more than $50,000 per year and have a last name that start with K. You will have to decide the best way to get the data yourself. A database will do all the work for you.
This is a great idea, though, for a middle-tier cache. Say you want to do some fast searching on a small amount of data. You can use this in the middle tier to save yourself the trip to the database.
A good object oriented database that has not been mentioned yet is Matisse [fresher.com]
WOW! After twenty years... (Score:3, Funny)
After twenty years, we finally get to...
the in-memory database!
Oh wait, didn't my Atari ST have that?
OK (Score:4, Interesting)
MOO (Score:4, Interesting)
race conditions? (Score:4, Insightful)
Meanwhile someone else can run an AddUser Command with the same username. Guess what happens when ChangeUser gets to that 2nd line?
Maybe when this radical new concept in databases can be presented in a way that avoids race conditions I'll pay a little more attention...
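For what it's worth, the command-serialization designs discussed in this thread avoid that particular race by running every mutating command one at a time. A toy sketch (all class and method names invented for illustration, not Prevayler's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// One monitor serializes every command, so an AddUser and a ChangeUser can
// never interleave halfway through each other. Names are invented here.
public class CommandExecutor {
    interface Command { void executeOn(Map<String, String> users); }

    private final Map<String, String> users = new HashMap<>();

    synchronized void execute(Command c) { c.executeOn(users); }
    synchronized int userCount() { return users.size(); }

    public static void main(String[] args) throws InterruptedException {
        CommandExecutor ex = new CommandExecutor();
        Runnable addUser = () -> ex.execute(u -> u.putIfAbsent("carol", "carol@example.com"));
        Thread t1 = new Thread(addUser), t2 = new Thread(addUser);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(ex.userCount()); // 1: the duplicate add was a no-op
    }
}
```

Serial execution trades concurrency for simplicity; since every command runs against RAM, it can still be very fast.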
Congratulations! (Score:3, Informative)
The bad old days (Score:3, Insightful)
Before DBMSs, applications stored their data in very efficient data stores designed just for that application, but those stores were worthless for anything else and hard to upgrade or extend without breaking or rewriting the existing application.
DBMS were developed so that data could be stored in an application independent store that could be used and extended for new applications without breaking everything that went before.
DBMS were never designed to be more efficient than the application specific data stores that they replaced, so that somebody saying that they can build a custom data store just for a particular application that is faster is missing the point entirely.
One word ...er..acronym: (Score:2)
Re:One word ...er..acronym: (Score:3)
Re:RAM ? (Score:4, Insightful)
Re:RAM ? (Score:3, Insightful)
Checkpointing once per day? Re-applying 15 MINUTES worth of Oracle transaction logs takes too long for some failover requirements; you force a log switch every 2 minutes if you have to. Or you eat the performance hit of synchronous replication and spec your hardware to compensate.
I'm guessing this DB was written by a bunch of smart CS graduates who overdosed on OO theory and haven't spent much time in the hard core of OLTP: banks, telcos, airlines, retail, etc.
Re:RAM ? (Score:3, Insightful)
The moderators should read the article, and so should the poster.
The time interval of the snapshot is a configurable option.
If you read the article you'd know that, and you'd also know that the writer is a 19-year-old CS graduate, indeed.
Probably you should get a small dose of OO as well before freaking out like you did.
angel'o'sphere
Re:RAM ? (Score:3, Interesting)
Nobody outside of a garage can lose 1 transaction; don't even imagine losing minutes of them in a failure.
"Carlos Eduardo Villela is a 19-year old Brazilian graduate in Information Systems."
Re:RAM ? (Score:4, Insightful)
Conversely, some data, such as a financial transaction, really needs to be committed straight away.
But committed means *you must* write it out to non-volatile storage (i.e. a disk), otherwise the transaction may be lost. So (I believe) most DBs write the update out to their transaction log very quickly and deal with updating the DB tables/indexes at a later stage. Obviously, this all depends on whether you need to allow other processes to access this data immediately or not.
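That "write the log, then acknowledge" discipline can be sketched in a few lines of Java NIO (the record format and file handling here are made up for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Durable commit sketch: append the record to a log and fsync (force) it
// before acknowledging. The record format is an invention for illustration.
public class DurableLog {
    static void commit(Path log, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
            ch.force(true); // block until the bytes and metadata reach the disk
        }
        // Only after force() returns is it safe to tell the client "committed".
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("txn", ".log");
        commit(log, "DEBIT acct=123 amount=100");
        System.out.println(Files.readAllLines(log).get(0));
    }
}
```

Updating the in-memory tables and indexes can then happen at leisure, which is exactly why the log write, not the data structure update, bounds transaction throughput.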
Personally, I don't think this represents anything new (**in true
What it might offer however is:
1). A nicer interface for managing object persistence; 'cos it is ugly managing mapping objects to DB columns.
2). A clear guide to help people manage which objects need persisting to disk and which are less important.
But thats about all.
---
I'll now go and read the article - you can catch me later contradicting myself!
Re:RAM ? (Score:4, Insightful)
A row in a table is an instance of an object
Foreign-keyed child tables map to collections within the parent object.
You illustrate my point perfectly about putting the cart before the horse. You don't build a database to store your objects -- you build objects to manipulate your database. A badly-designed system is one where the database was tacked on after the object model was complete. Your database schema should be the first thing you write, before you even start thinking about the classes. Unfortunately, Comp Sci curricula are heavy on OOP concepts but pathetically light on database theory, which is why you wind up with otherwise talented programmers who don't understand the basic fundamentals of designing solid client-server applications.
OO to RDBMS mapping IS hard (Score:3, Insightful)
One benefit of OO development is the abstraction away from the data store. I want to think about Widgets, and Customers, and Orders, not VARCHAR fields, foreign keys, arbitrary identifier INTEGERs, etc.
So I would argue that the goal IS to build a database to store your objects, instead of building objects to manipulate your database. And I imagine that's what every OO developer would want. But it's hard.
Your suggestion doesn't solve the problem, it just avoids it.
Re:RAM ? (Score:3, Insightful)
On the other hand, there are some cases where you want an object-oriented design, and limiting yourself to what you can fit in an RDBMS schema is a bad idea. There are cases in which you really want a base class extended by multiple subclasses, and now you can't write your foreign key constraints properly. Alternatively, you could duplicate your common code for each table, but that's even worse.
Using Oracle for storing objects is basically bad; using it to store relations is good. Good design requires you to determine first what sort of data model you have, and then choose your programming paradigm appropriately; deciding to use a relational database just because you need persistence is foolish. You've done something wrong if your queries are mostly "SELECT * FROM tbl WHERE id=?".
I do agree that comp sci doesn't teach enough relational design, because it's often an appropriate design. But sometimes OOP is the right tool for the job, and then you need an appropriate storage system. Relational databases are really their own thing, with a different set of efficient and simple operations, and are really not that much like objects.
Re:RAM ? (Score:3)
You don't use an RDBMS because it's fast. You use it because it's reliable. Does this new toy support record locking, transactional isolation and integrity, or any of the other key features that an enterprise RDBMS provides? If the answer is no, then it's not a replacement for an RDBMS. If you don't care about the integrity of your data, something like this is fine. When you absolutely can't have data errors, you take the time to make sure you do it right, which means using an enterprise-grade database server.
Re:RAM ? (Score:3, Insightful)
Prevayler can be just as reliable.
Does this new toy support record locking, transactional isolation and integrity, or any of the other key features that an enterprise RDBMS provides? If the answer is no, then it's not a replacement for an RDBMS.
Wrong.
The question isn't the checklist of features, it's whether you can build equivalently reliable systems with Prevayler. The answer: You can.
You'll recall that Prevayler uses the Command Pattern. Before data is changed, the Command object is serialized and written to disk, then executed. Naturally, this means the commands are run in strict order of arrival, yes?
That's all you need to get transactional integrity. All writes are isolated. If you need to isolate the reads, you can use the same mechanism.
The prevalent approach requires developers to do things a little differently, but you don't have to sacrifice reliability.
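A toy model of that write-ahead ordering (class and method names are ours, not Prevayler's actual API): the command object is journaled to disk before it is allowed to touch the in-RAM state.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Journal-then-execute sketch: serialize the Command before running it, so a
// crash after the write can be recovered by replaying the log. All names here
// are invented for illustration, not Prevayler's actual API.
public class LoggedCommands {
    interface Command extends Serializable { void execute(StringBuilder state); }

    static void apply(Command c, Path log, StringBuilder state) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                Files.newOutputStream(log, StandardOpenOption.APPEND))) {
            out.writeObject(c);   // 1) journal the command to disk
        }
        c.execute(state);         // 2) only then mutate the in-RAM state
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("cmd", ".log");
        StringBuilder state = new StringBuilder();
        apply(s -> s.append("hello"), log, state);
        System.out.println(state);                 // hello
        System.out.println(Files.size(log) > 0);   // true: the command is on disk
    }
}
```

A single writer is assumed here; a production journal would also need framing that lets the whole sequence of commands be read back for replay.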
Re:RAM ? (Score:2)
And that's regardless of where you store your data.
Re:RAM ? (Score:5, Informative)
Re:RAM ? (Score:4, Interesting)
Why not try reading the article? (Score:3, Interesting)
Re:RAM ? (Score:5, Insightful)
Blazing fast, and easy as hell to fuck up beyond repair: you could do both a read and a write to the same memory area at the same time, or something like that.
This sounds just as bad.
For example, let's say that we're doing a transaction of a few million dollars. In mid-process the power dies and the machine goes dark. Outside of shouting 'redundant this, that and the other', what state would the machine be in when it comes back online, where is the money, and could we back out of and rerun the transaction?
Re:RAM ? (Score:5, Informative)
1) The last full image dump
2) all successful transactions (the DB meaning) serialized in the log, from the last dump to the power failure.
Since your transaction (in both the DB and business sense) hasn't been successful, it has not yet been written into the log, so the money stays in the ordering party's account. Of course, the power failure could have occurred just after a transaction was written to the log and before the client software got the message that it was successful, but traditional DBs have this problem too. To sum it all up: the synchronization problems are there, but they are no worse than in traditional DBMSes.
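The recovery procedure described above can be modeled in a few lines: start from the last image dump, then replay the serialized transactions in order. A toy model (the balance and the "DEPOSIT n"/"WITHDRAW n" command format are our own invention):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Recovery sketch: state is restored from (1) the last full image dump and
// (2) replaying every logged transaction, in order, on top of it.
public class Recovery {
    static long recover(long snapshotBalance, List<String> log) {
        long balance = snapshotBalance;           // 1) last full image dump
        for (String line : log) {                 // 2) replay serialized commands
            String[] parts = line.split(" ");
            long amount = Long.parseLong(parts[1]);
            balance += parts[0].equals("DEPOSIT") ? amount : -amount;
        }
        return balance;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("replay", ".log");
        Files.write(log, List.of("DEPOSIT 500", "WITHDRAW 200"));
        System.out.println(recover(1000L, Files.readAllLines(log))); // 1300
    }
}
```

A transaction that crashed before reaching the log simply never happened, which is the point the parent is making.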
Re:RAM ? (Score:3, Interesting)
Re:RAM ? (Score:4, Informative)
It's not how much you need that he's talking about. Only with 64-bit computing can one have more than the current limit of RAM (which I believe is 2 GB right now). It has to do with the maximum possible number of 32-bit addresses that can exist in the RAM. So with a 64-bit processor, you can have enough RAM to hold that database all at one time.
Re:RAM ? (Score:3, Informative)
As for the 2 GB limit, there seems to be a feature in the Windows memory architecture - the upper 2 GB of a process' virtual address space is reserved for shared memory. Or something - I kind of stopped thinking at that point.
Re:RAM ? (Score:2)
Re:RAM ? (Score:5, Insightful)
Don't forget the price tag for all the extra hardware; since a prevalent system is thousands of times faster, you can get by with a lot less hardware. And add in all the programmer time spent dealing with SQL. Oh, and what about the DBA's salary?
How well does Prevalent do on 30TB+ datasets?
One doesn't use Prevayler for systems like that. Prevayler makes sense if your data can fit in RAM. If it doesn't, you should do something else.
But note that "something else" doesn't have to mean some SQL thingy. Google has a metric shitload of data, and you can bet they don't keep it in an Oracle server.
Re:Obligatory Slashdotted reference (Score:2)
Re:Data integrity? (Score:5, Insightful)
Someone's been reading DBDebunk.com [dbdebunk.com] again.
Yes, data integrity is one of the major considerations here. I'm willing to bet that by the time you implemented the equivalent of constraints, triggers, etc... in a system like this, you would be running no faster than a typical SQL DBMS, and you would have thousands of bugs as you reinvent the wheel. But there are even more considerations than integrity. This is language-specific, or application-specific. What do you do when you need to access your data from another application? Even if it is possible, that means you have to implement all your integrity checks again in that application.
Essentially, what this looks like is just another OO method of hierarchical (or perhaps "multi-valued") data storage. This is nothing new. It will suffer all of the historical problems the industry has had with hierarchical storage (there is a reason the relational data model was invented: the problems IBM had with hierarchical data). For example, what happens to existing data when you need to change your logical schema or business rules? The cost of re-ordering or reformatting _every_ single stored object since the beginning of your application would be ridiculous, and in some cases even impossible. How do you track dependencies? In theory, these kinds of systems will work fine, if your application stays exactly as created, and if the nature of the data doesn't change, and if no other applications are involved. In other words, NOT in the real world.
I have a nick-name for hierarchical data storage: "headache-ical".
Re:Data integrity? (Score:3, Interesting)
I don't believe I have missed the point at all. I am painfully aware that the technology in question is not a relational DBMS. But, normalization (with foreign key constraints) is only the beginning of the constraints needed to create a real database. There are constraints that can be placed on columns, on tables, even on specific types with some systems, and there are *database* constraints (when the system is in condition X, don't allow Y, etc...). Triggers are not the optimal solution for database-level constraints, but that is how most SQL systems are limited.
Yes, I agree that SQL systems are limited, by design. The world has not seen a truly relational DBMS, with the possible exception of Dataphor [alphora.com] (which I dislike for other reasons). The relational model is a logical model. It doesn't "care" which implementation you use. DBMS's are just applications. There is no reason you can't do one in Java. It's not about the language, but about the concept. The relational model offers logical advantages that cannot be found in ANY other system. If any other system achieved those advantages, it would become de facto relational. End of story.
No, the point is being missed by a great many people. This whole thread here really is about physical storage optimization, and running in memory. There is no reason a relational DBMS can't run in memory. That is simply an implementation detail. If the Prevayler application framework achieved the level of logical relational operators and constraints of an RDBMS, then it would become an RDBMS. But it seems pretty obvious that this is not its purpose.
I'm glad you like DBDebunk.com. It helped me see these concepts with much more clarity. Go back and read the site more carefully, and you will see every argument in this Slashdot thread examined and reexamined. Look for the whole logical-physical confusion thing.
Yes, it has many languages, but if you try to access your Java Prevayler application data from PHP or Perl, you are going to have to re-implement all your integrity constraints in those languages. That's my point. No longer will you have a single point of control for your business rules, or the logical model of your data.
Huh? This is the most obvious one of all. With SQL, my VB, Java and Perl applications could all interact with the same DBMS without needing to talk to each other. That's a perfect solution in the real world.
I understand that complex database design is always a difficult thing. And I recognize that there is nothing a relational DBMS can do that can't be re-implemented in application code. But that's my whole point. Separation of operations. Your data should have a single point of control for the logical constraints in the data, and that point of control should be a logical firewall between the data, and everything else that happens in the application.
Also, in the end there is also nothing that an OO DBMS can do that a truly relational DBMS can't. This is what many people don't understand about the true relational model. With the right kind of RDBMS (not the typical SQL ones), there are many more things that are possible than just messing around with base types and simple 3NF normalization. And with updateable views, functions, and logical RULEs (check out PostgreSQL), you can tailor the external access to the application in any way you want, without changing your base table design. This is power.
Re:OO and hierarhcial DB's (Score:3, Interesting)
We don't need a theoretical breakthrough. The relational data model is just straightforward logic, as applied to data storage. What we need is an implementation breakthrough. Even the best SQL systems only implement about 70% of the advantages of the relational model, and suffer from needless performance problems, due to being too tied to physical mapping of tables and views.