Object Prevalence: Get Rid of Your Database?
A reader writes: "Persistence for object-oriented systems is an incredibly cumbersome task to deal with when building many kinds of applications: mapping objects to tables, XML, flat files or some other non-OO representation destroys encapsulation completely, and is generally slow, both at development time and at runtime. The Object Prevalence concept, developed by the Prevayler team and implemented in Java, C#, Smalltalk, Python, Perl, PHP, Ruby and Delphi, can be a great solution to this mess. The concept is pretty simple: keep all the objects in RAM and serialize the commands that change those objects, optionally saving the whole system to disk every now and then (late at night, for example). This architecture results in query speeds that many people won't believe until they see for themselves: some benchmarks suggest it's 9000 times faster than a fully-cached-in-RAM Oracle database, for example. The good thing is: they can see it for themselves. Here's an article about it, in case you want to learn more."
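To give a feel for the idea, here is a minimal, hypothetical sketch of a prevalence layer in Java. This is not Prevayler's actual API; the Command, Bank, Deposit and Prevalence names are made up for illustration. The point is simply that every mutation is expressed as a serializable command that is appended to a journal before it touches the in-memory objects, so recovery is just replaying the journal against the last snapshot.

    import java.io.*;
    import java.util.*;

    // Hypothetical prevalence sketch (not Prevayler's real API).
    // Every change to the in-memory system is a serializable command,
    // written to a journal first and then executed.
    interface Command extends Serializable {
        void executeOn(Bank system);
    }

    class Bank implements Serializable {
        final Map<String, Long> balances = new HashMap<String, Long>();
    }

    class Deposit implements Command {
        private final String account;
        private final long amount;
        Deposit(String account, long amount) { this.account = account; this.amount = amount; }
        public void executeOn(Bank bank) {
            Long current = bank.balances.get(account);
            bank.balances.put(account, (current == null ? 0 : current) + amount);
        }
    }

    class Prevalence {
        private final Bank system = new Bank();
        private final ObjectOutputStream journal;

        Prevalence(File journalFile) throws IOException {
            // A real journal would also replay itself on startup and handle
            // re-opening across restarts; both are omitted here.
            journal = new ObjectOutputStream(new FileOutputStream(journalFile));
        }

        // Journal the command first, then apply it to the live objects.
        synchronized void execute(Command command) throws IOException {
            journal.writeObject(command);
            journal.flush();
            command.executeOn(system);
        }

        Bank system() { return system; }
    }

Queries are then plain method calls against the live object graph, which is where the claimed speed comes from; the trade-offs raised in the comments below (RAM limits, journal replay time, ad hoc queries) all follow from this structure.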
Neat concept... (Score:3, Interesting)
You can always have a caching system, as the author states, but even then, what systems actually use this? The countless PHP/MySQL sites out there seem to perform just fine. This may be desirable for some very strict real-time communications systems, but for just about every other kind of app, I don't see it.
What are you going to tell your 3rd party integrators? Drop their XML/ODBC report and surf on over to prevayler.org?
What about existing data ? (Score:4, Interesting)
That said, I wonder what their position is towards importing existing data. Many projects would only benefit from the solution if existing data (usually object-oriented but saved in a roughly flat database, as the article points out) can be ported seamlessly to the new environment.
My point is, this solution solves a known problem by introducing a new technology, but this new technology will have to be bent towards the older systems in order to retrieve what was already saved. Same old story: in the database world, existing data is paramount.
OOP (Score:2, Interesting)
1) You COULD use an object-relational database if you wanted to keep an OOD aspect.
2) You COULD load non-object-oriented data into RAM with lower overhead.
3) A couple of gigs of data in RAM... not really a deployable solution for the enterprise, don't you think?
Other than that, nifty idea and all.
Data integrity? (Score:1, Interesting)
Aside from which, this appears to be a physical implementation. In theory, Oracle should be able to do something similar to get better performance in those cases when the whole DB fits into memory.
What I'd rather see is better abstraction, such as a truly relational database. Currently most RDBMS vendors only support a (very large) subset of the relational operators and constraints that a true RDBMS would have.
3 issues I see (Score:4, Interesting)
1) You're limited by how much RAM you have on your server, not how much disk space you have
2) If you're making a lot of data changes and have a crash or power outage, I'd imagine that it can take a while to replay the log to get things back to the most recent point in time (you can have the same problem with Oracle, but your checkpoints would be a lot closer together than "once a day")
3) There are millions of people that already know SQL and can write a decent query with it. How does this help them? Never underestimate the power of SQL.
On the other hand, for projects dealing with small amounts of data I can see how implementing this would be far easier than integrating with MySQL, PostgreSQL or Oracle.
Interfacing (Score:3, Interesting)
A SOAP interface could go some way towards accomplishing this, but what about the traditional ACID properties of a DBMS? Durability is obviously guaranteed... Consistency? That would depend on programmers following the practices... Atomicity? Not sure about that one. For simple commands it seems to work, but what about compound commands? If no rollback occurs, how can I assert that I changed both objects and not just one? Isolation? Not sure about this one either.
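On atomicity and isolation, one answer (shown here as a hypothetical example building on the prevalence sketch earlier in this discussion, so again not Prevayler's real API) is to put all the related changes inside a single command: either the whole command made it into the journal or none of it did, and because the layer executes commands one at a time, other readers never observe a half-done state. The weak spot is an exception thrown after the first mutation, since there is no rollback, so the usual advice is to validate everything before touching anything.

    // Hypothetical compound command, reusing the Command/Bank types from the
    // earlier sketch. Both balance changes happen inside one journaled command.
    class Transfer implements Command {
        private final String from;
        private final String to;
        private final long amount;

        Transfer(String from, String to, long amount) {
            this.from = from;
            this.to = to;
            this.amount = amount;
        }

        public void executeOn(Bank bank) {
            Long fromBalance = bank.balances.get(from);
            // Validate before mutating: a command that throws halfway through
            // would leave the in-memory state half-changed, with no rollback.
            if (fromBalance == null || fromBalance < amount) {
                throw new IllegalArgumentException("insufficient funds");
            }
            Long toBalance = bank.balances.get(to);
            bank.balances.put(from, fromBalance - amount);
            bank.balances.put(to, (toBalance == null ? 0 : toBalance) + amount);
        }
    }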
Get best of both worlds... (Score:5, Interesting)
Of course, you can always write your own persistence layer. I've done this a few times - very easy in Java. Map a row in the DB to an object, and cache the object in memory. If you need to fetch that data again, check the cache first. When doing a write, write to the DB and update/flush your cache as necessary.
That's just the basics - what's most optimal depends on how your data is accessed and changed (and also your programming language and capability as a programmer). Java has really nice stuff for caching built in, like SoftReference wrapper objects, and of course threading and shared memory that you can use in production.
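Here is a rough, hypothetical sketch of that kind of read-through cache in Java. The loadFromDatabase() and writeToDatabase() methods are stand-ins for whatever JDBC code actually maps rows to objects; SoftReference lets the garbage collector reclaim cached entries under memory pressure, which is the built-in facility mentioned above.

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class Account {
        final long id;
        Account(long id) { this.id = id; }
    }

    // Read-through/write-through object cache in front of a database (sketch).
    class AccountCache {
        private final Map<Long, SoftReference<Account>> cache =
                new ConcurrentHashMap<Long, SoftReference<Account>>();

        // Check the cache first; fall back to the database on a miss.
        Account get(long id) {
            SoftReference<Account> ref = cache.get(id);
            Account account = (ref == null) ? null : ref.get();   // the GC may have reclaimed it
            if (account == null) {
                account = loadFromDatabase(id);
                cache.put(id, new SoftReference<Account>(account));
            }
            return account;
        }

        // On writes, hit the database first, then drop the stale cache entry.
        void update(Account account) {
            writeToDatabase(account);
            cache.remove(account.id);
        }

        private Account loadFromDatabase(long id) { /* JDBC SELECT goes here */ return new Account(id); }
        private void writeToDatabase(Account account) { /* JDBC UPDATE goes here */ }
    }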
I'm currently working on a super optimised threaded message board system. Almost all pages (data fetch/change + HTML generation) complete in about 0.001s.
Not quite (Score:2, Interesting)
This method looks good for handling large numbers of single-object equality lookups. I didn't see anything in the article, though, about how to support range queries or aggregates, let alone what happens when you need to be more expressive in your queries, like putting a join or two in there.
There are a lot of apps out there where this might work well (I saw Google mentioned above, and can think of things like weblogs, etc). Try doing something like an e-commerce site and you'll start to break down, especially when you start adding "other people bought" (a la Amazon) or any other queries that are cross-references and generally require joins or other sorts of data-massaging functionality (i.e. databases' bread and butter).
Re:gigabytes? (Score:2, Interesting)
A gigabyte is not a large database. At all. It's tiny! Anything approaching a terabyte, and you're going to start wanting serious fault tolerance on it... most likely BCV [google.com]. RAM is not going to support this. Performance on DBs on the order of a few GB is easy; just index lots of stuff :) Performance generally becomes an issue in a database around 500GB in size, and that is too big to put into RAM. So the performance gain of putting everything in RAM is moot in this case.
Plus, what business is going to sign off against all their data being stored in fragile RAM?
Getting fired for suggesting a production system do this sounds fair!
Re:3 issues I see (Score:5, Interesting)
2) You can probably set up your own checkpoints to run more often than once a day.
3) I agree. Lack of SQL would cause people to.... GASP.... learn a new system. SQL is very cool. And I admit that I have a system I am thinking of porting away from JDBC and into Prevalence just to see how it goes (No, it isn't mission critical) and one of the first things I realized is that I would have to design a new method of querying. But you know what... That can lead to new thinking and more powerful software in the future.
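For what it's worth, "designing a new method of querying" against a prevalent system mostly means writing ordinary code over the live collections. A hypothetical sketch (the User type and its fields are made up), roughly equivalent to SELECT uid, last_activity FROM users WHERE last_activity > cutoff ORDER BY last_activity DESC:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class User {
        long uid;
        long lastActivity;
    }

    class UserQueries {
        // A "query" is just a loop over the in-memory objects.
        static List<User> activeSince(Collection<User> users, long cutoff) {
            List<User> result = new ArrayList<User>();
            for (User u : users) {
                if (u.lastActivity > cutoff) {
                    result.add(u);
                }
            }
            Collections.sort(result, new Comparator<User>() {
                public int compare(User a, User b) {
                    return Long.compare(b.lastActivity, a.lastActivity);   // newest first
                }
            });
            return result;
        }
    }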
Objects do not fit into RDBMS (Score:2, Interesting)
If your objects map directly to relational database tables your classes most probably model some relational database and not your application domain.
The biggest problem with using an RDBMS to store objects is that it is designed to store data, not objects. There is a big impedance mismatch between the object world and the relational world. Designing object-oriented applications with restrictions borrowed from the RDB world creates totally un-object-oriented programming models like EJB, where you end up separating data and functionality into different components. This is just the opposite of the object paradigm, where data and functionality go together and are both variable from object to object ("record to record" to RDB people).
This is not to say that OO is better or worse than RDB. They just do not match well and it is quite stupid to bind them tight together when building new applications from scratch.
OO databases are an evolutionary step...backward (Score:5, Interesting)
In 1999, I worked for a company that used an OO database (ObjectStore) to develop an e-commerce shopping portal. It was a disaster.
OO advocates point to extremely fast (extremely special-case, in practice) queries and natural persistent object mapping as reasons why OO is superior.
However, this is very misleading.
Some of the MAJOR problems we ran into in using ObjectStore were:
When developers first consider OO databases, their first assumption is that OODBMS is to RDBMS as OOP is to Procedural Programming. This is a FALSE analogy! Migrating to OODBMS offers precious little to support better software design while introducing significant maintenance and design issues that should be considered prior to using this technology.
Unless I had a product that had an extremely specialized use case that matched OODB strengths, I would NEVER develop on this kind of platform again.
Speed is not the only factor (Score:4, Interesting)
Here's the issue they are trying to solve: mapping objects to records. That's it. Now, the problem with removing the records / database is that you lose all of the searching power that is inherent in relational databases. The author states that the codebase is 350 lines of code. How can any complex search engine be implemented in 350 lines of code that also covers the persistence?
OOOhhh... (Score:2, Interesting)
But NOTE - to an extent, Tablizer is RIGHT (shudder): single-inheritance, single-dispatch "OO" is inherently weaker than the relational model. Of course, Tablizer completely fails to critique proper OO like Common Lisp CLOS or Guile GOOPS.
OODB are very different from RDBMS (Score:4, Interesting)
If you are thinking of accessing your objects like you do with SQL, then you haven't understood how OODBs work. As for accessing your objects and doing your queries, there are tools (like Inspector for ObjectStore) that enable you to do just that.
In terms of performance, Oracle and co are nowhere near what you can reach with ObjectStore, provided you designed your application well.
The 2 main problems with OODB, are:
- schema evolution
- reporting
But these can easily be solved by a good design of your application.
OODB is a skill that takes time to master. After 4 years of seeing ObjectStore applications from various companies, I can tell the difference between the ones where people knew what they were doing and those from people who didn't have a clue...
Interoperability, Scalability (Score:3, Interesting)
Both of these issues make this solution unusable in an enterprise environment. The RAM size issue has already been mentioned by others and is another very real limitation.
In general, object caching mechanisms are not terribly difficult to create. This generic solution proves the point by only requiring 350 lines of Java code.
I am sure that there is something worthy in this project, I just cannot see it used for anything other than very small-scale development efforts.
Re:FInally OO? I think and hope not! (Score:3, Interesting)
* the separation between _logical_ and _physical_ layers of the database - the DBA controls physical record layout and indices, while the database designer and applications have access to the logical layers. This way they can do their roles independently of each other.
* the ability for the data model to change without affecting the applications. Using VIEWS - you can do quite a bit of modification to the underlying data model, but applications using the older one will still run if the DBA sets up a view.
* the ability to do arbitrary queries on the data
* The ability to set up views to handle more complex interactions. For example, in a mail system I've written, we have a table for campaigns with a sent/not-sent flag, a list of addresses, and three layers of do-not-send lists. We then have a single view which puts all of this together and gets the list of addresses which need to be sent to. This is a view on top of several views.
I'm sure I'm missing some others, too. Basically, a relational database system is a gigantic inference engine when designed appropriately.
Re:OO databases are an evolutionary step...backwar (Score:5, Interesting)
Although you certainly have a point, there are some remarks I have to make here:
There's no "SELECT * FROM USERS".
That's just like saying Latin is a bad language because it does not have equivalents for 'the', 'le/la', 'de/het', 'der/die/das', whatever. An rdbms is *fundamentally* different from an oodbms.
DB Performance when querying outside the normal object hierarchy (...) is orders of magnitude SLOWER on an OODB!
That's right: you are trying to use an oodbms as an rdbms. Ever tried to drive a car like you ride a bicycle?
Oodbms are relatively new, and they have their 'problems', just like rdbms-es have theirs. But the biggest problems arise when one approaches an oodbms like one would an rdbms, just like you run into problems using an OO language when you have only ever used a procedural language.
Re:gigabytes? (Score:2, Interesting)
Not every solution is for every problem. This isn't for huge data warehousing systems. My impression is that this is for smaller databases where there is a lot of interactions with fewer objects.
This was compared to an Oracle db running in RAM. Who would spend the money for an Oracle db (and an Oracle admin) for a database small enough to fit in RAM?
What's new? (Score:2, Interesting)
EOF and EJB (Score:2, Interesting)
Some dude cooks up 350 lines of untested code to do something that's already been done and it's news?
What is this, CNN?
Nods head (Score:3, Interesting)
> is orders of magnitude SLOWER on an OODB!
That's right: you are trying to use an oodbms as an rdbms. Ever tried to drive a car like you ride a bicycle?
I've still never heard a good answer to this problem, only that I'm using the wrong hammer.
When performing activities against pure OO storage in which selectively collecting data from a (potentially large) number of objects is required, what is the OC (object-correct) way to do so? Asking each one via a method call is horrendously slow in comparison to an RDBMS. For instance, contrast "select last_activity, uid from users" to
my %blarg;
while ( my $user = $users->next() ) {
    # collect the same columns "select last_activity, uid from users" would return
    $blarg{ $user->{uid} } = $user->{last_activity};
}
I suppose if one is building a product instead of managing an ongoing project, one could argue that lazy access to the hash will save a little time. I still don't see the performance win, and for ad hoc access, building the methods and accessors just takes too much time to be reasonable.
Use the right tool for the right job, I say. And usually, for managing data, a RDBMS is the right tool. For interacting with that data, OO is frequently nice.
Please correct my incorrect notions.
What happens when your classes change? (Score:1, Interesting)
What happens if, during development or when you upgrade to the next version of the program, you add or remove data members from objects? If your persistence format is just a memory image of objects, and the layout of objects in memory changes, it seems that the memory image would break.
And if you rely on the persistence of commands, what happens if during the upgrade the exact definition of a command changes, or some commands are replaced with other commands? Then you wouldn't restore the exact state, or you might not be able to restore anything at all.
Seems that one of the advantages of a true database is translating into a neutral structure and set of data types.
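This worry is real for plain Java serialization, which is what most prevalence implementations lean on. A sketch of the standard mitigation, as far as I understand it: pin the serialVersionUID so that adding a field does not break compatibility outright, and backfill a default for old data in readObject(). Renamed or removed fields, or commands whose semantics change, still need an explicit migration step. The Subscriber class below is hypothetical.

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;

    // Sketch: surviving the addition of a field to a serialized class.
    // Journal or snapshot data written before "email" existed still
    // deserializes, and readObject() backfills a default value.
    class Subscriber implements Serializable {
        // Keep this constant across versions; otherwise old data is rejected.
        private static final long serialVersionUID = 1L;

        String name;
        String email;   // added in version 2

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // reads whatever fields the old data has
            if (email == null) {
                email = "unknown";      // default for pre-version-2 records
            }
        }
    }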
Re:FInally OO? I think and hope not! (Score:1, Interesting)
OODBMSs are queried with a structured language similar to SQL, called OQL.
While the relations in an RDBMS are artificial and cannot be mapped to the language constructs of the programming language used (IDs in tables/relations versus "numbers" in Pascal or C), associations in OO languages are just pointers and get mapped automatically to backing-store data types.
Your claim that you need to write your own access layer in OO languages to an OODBMS is wrong; if it is an OODBMS, it takes care of that for you.
The prevalence system we are talking about here is not an OODBMS.
And the guys modding you up unfortunately have no clue about OODBMSs.
Well, reading your post three times now, I think it deserves a FUNNY or IRONIC.
Meanwhile I cannot believe you meant it seriously; however, the modders took it like that.
angel'o'sphere
Re:RAM ? (Score:3, Interesting)
Nobody outside of a garage can afford to lose one transaction; don't even imagine losing minutes of them in a failure.
"Carlos Eduardo Villela is a 19-year old Brazilian graduate in Information Systems."
Re:Ever looked at object-oriented databases? (Score:3, Interesting)
Here is why:
1. We realized that, like _most_ projects, there really wasn't anything that object-oriented about the data. The code, yes. But the data was just as easily represented with typical RDBMS relationships, and it was much faster to do basic operations. We saw a several-thousand-fold increase in performance when querying the database for a particular object and its associated data. A join or ten wasn't nearly as expensive as getting data out of what was essentially just a dump.
2. ObjectStore, at the time, had no concept of administration. It was up to the developer to handle things like when files got too big, or creating the OODBMS concept of indexes, or what have you. The "DBA" could stop and start it, and that's about it. So if we grew, or got new hardware, or changed platforms, it was time to dump the old data (because migrating it was a programming project in itself) and start over.
3. People would ask us questions about the data we were storing that would have been absolutely trivial to answer in an RDBMS (like "how many of these events occurred last month when this device was in this state?"), but we'd have to write long, slow-performing pieces of code to retrieve the answers.
4. Other people wanted to write applications that used our data. That wasn't too easy, because they wanted slightly different objects. We would have had to agree on an object for everything we shared, or store things twice. With an RDBMS we could use views, or generate the objects differently from the same tables.
5. There was no way to get a read-consistent hot backup across a couple of hundred files. Maybe there is now. This was just foolish.
Re:Don't go there (Score:2, Interesting)
EJB 2.0 with newer design patterns has made great strides in performance AND scalability. I won't go into it here except to say that the two patterns that EJB developers should look at are
1) Session Facade - If you don't have a trained DBA on staff, or performance is not as important as maintainability.
2) Coarse-Grained Session Beans - If you do have a DBA, and/or you need the best performance.
look here [urbancode.com] for a good EJB patterns benchmark
look here [theserverside.com] for a free book on EJB design patterns.
Re:3 More Issues for the Do-It-Yourself Database (Score:3, Interesting)
Then just wrap all of your lock-sensitive stuff in Prevayler command objects. They've got that working fine, and it guarantees isolation.
Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY ad hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate through your objects. Have fun.
Oh, please. If you really need SQL compatibility, then dump the data occasionally to a data warehouse, which is where you should be doing unconstrained ad-hoc queries anyhow.
Or if it's so the programmers can peek at the live system, then put in something like BeanShell, which will let you see a lot more than just the persistent data.
Or you could drop an SQL interpreter into your system and present your objects as tables. Many of the pieces are already open sourced, so it would be pretty easy.
Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Can you really not think of ways to write these things in flexible ways? If that's the case, you could learn something about being a programmer. Pick up Martin Fowler's Patterns of Enterprise Application Architecture [martinfowler.com].
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error-prone, and time-consuming to implement.
Perhaps you should try it before knocking it. As you are, in order, wrong, mostly wrong, wrong, and confused. It's no magic bullet, but it's a useful approach for some systems.
Re:Speed is not the only factor (Score:2, Interesting)
As has been pointed out already, searching is not the goal of Prevayler -- storage and retrieval are. RDBMSs are very good at scanning through large data sets, but they do so in a general-purpose manner. This is because they have no concept of the nature of the information.
An application I was working on recently required rapid searching through very massive sets of geographic data (without knowing what objects we were looking for). Due to political issues within the project, we were forced to implement a solution as quickly as possible. We turned to a traditional RDBMS solution. Even with considerable amounts of tweaking, indexing and other optimizations, performance was far from satisfactory. Unfortunately, we had to release the product in that state.
Later experimentation yielded a custom indexing algorithm with a 5000-fold increase in performance on hardware with half the clock speed and RAM. The reason was simple: Our custom solution was able to take into consideration the nature of the problem, whereas the RDBMS could only apply a general approach. Even when using hierarchical indexing (our data was hierarchical) the database couldn't even begin to keep up.
In the end, you have to decide what's important to the project. A database may provide lots of features, but its general nature may not be good enough for the task. Prevayler doesn't do everything, but then again I wouldn't necessarily want it to. :)
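The parent doesn't say what the custom index looked like, so the following is only a generic illustration of the idea, not the poster's algorithm: a domain-aware structure, here a toy uniform grid keyed by cell, can answer "what is inside this box" by touching only a handful of buckets, a shortcut a general-purpose B-tree index cannot take.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy spatial grid index: items are bucketed by the grid cell their point
    // falls in, so a box query only scans the cells that overlap the box.
    class GridIndex<T> {
        private final double cellSize;
        private final Map<String, List<T>> cells = new HashMap<String, List<T>>();

        GridIndex(double cellSize) { this.cellSize = cellSize; }

        private String key(double x, double y) {
            return (long) Math.floor(x / cellSize) + ":" + (long) Math.floor(y / cellSize);
        }

        void insert(double x, double y, T item) {
            List<T> bucket = cells.get(key(x, y));
            if (bucket == null) {
                bucket = new ArrayList<T>();
                cells.put(key(x, y), bucket);
            }
            bucket.add(item);
        }

        // Collect everything stored in the cells overlapping the query box.
        List<T> query(double minX, double minY, double maxX, double maxY) {
            List<T> result = new ArrayList<T>();
            for (long cx = (long) Math.floor(minX / cellSize); cx <= (long) Math.floor(maxX / cellSize); cx++) {
                for (long cy = (long) Math.floor(minY / cellSize); cy <= (long) Math.floor(maxY / cellSize); cy++) {
                    List<T> bucket = cells.get(cx + ":" + cy);
                    if (bucket != null) {
                        result.addAll(bucket);
                    }
                }
            }
            return result;
        }
    }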
Re:RAM ? (Score:2, Interesting)
* Tables map to objects; fields map to the properties of that object on a 1:1 basis
This is horrible, and unnecessary. Your object model *should not* be forced to look like your database schema, and any halfway decent O/R mapper (see hibernate, castor etc for Java) allows you to design an object model that is decoupled from your DB schema.
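To make the decoupling concrete, here is a hand-rolled illustration in plain JDBC (not Hibernate's or Castor's actual API, and the table and column names are hypothetical): two normalized tables are folded into one domain object by the mapping layer, so the schema can change behind the mapper without the domain class changing.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // The domain object is shaped for the application, not for the tables.
    class Customer {
        final String name;
        final String shippingStreet;   // lives in a separate ADDRESSES table
        Customer(String name, String shippingStreet) {
            this.name = name;
            this.shippingStreet = shippingStreet;
        }
    }

    class CustomerMapper {
        private final Connection conn;
        CustomerMapper(Connection conn) { this.conn = conn; }

        Customer load(long id) throws SQLException {
            String sql = "SELECT c.name, a.street FROM customers c "
                       + "JOIN addresses a ON a.customer_id = c.id WHERE c.id = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) return null;
                    // Two tables, one object: the join is a mapping detail the
                    // rest of the application never sees.
                    return new Customer(rs.getString("name"), rs.getString("street"));
                }
            }
        }
    }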
Re:Data integrity? (Score:3, Interesting)
I don't believe I have missed the point at all. I am painfully aware that the technology in question is not a relational DBMS. But, normalization (with foreign key constraints) is only the beginning of the constraints needed to create a real database. There are constraints that can be placed on columns, on tables, even on specific types with some systems, and there are *database* constraints (when the system is in condition X, don't allow Y, etc...). Triggers are not the optimal solution for database-level constraints, but that is how most SQL systems are limited.
Yes, I agree that SQL systems are limited, by design. The world has not seen a truly relational DBMS, with the possible exception of Dataphor [alphora.com] (which I dislike for other reasons). The relational model is a logical model. It doesn't "care" which implementation you use. DBMS's are just applications. There is no reason you can't do one in Java. It's not about the language, but about the concept. The relational model offers logical advantages that cannot be found in ANY other system. If any other system achieved those advantages, it would become de facto relational. End of story.
No, the point is being missed by a great many people. This whole thread here really is about physical storage optimization, and running in memory. There is no reason a relational DBMS can't run in memory. That is simply an implementation detail. If the Prevayler application framework achieved the level of logical relational operators and constraints of an RDBMS, then it would become an RDBMS. But it seems pretty obvious that this is not its purpose.
I'm glad you like DBDebunk.com. It helped me see these concepts with much more clarity. Go back and read the site more carefully, and you will see every argument in this Slashdot thread examined and reexamined. Look for the whole logical-physical confusion thing.
Yes, it has many languages, but if you try to access your Java Prevayler application data from PHP or Perl, you are going to have to re-implement all your integrity constraints in those languages. That's my point. No longer will you have a single point of control for your business rules, or the logical model of your data.
Huh? This is the most obvious one of all. With SQL, my VB, Java and Perl applications could all interact with the same DBMS without needing to talk to each other. That's a perfect solution in the real world.
I understand that complex database design is always a difficult thing. And I recognize that there is nothing a relational DBMS can do that can't be re-implemented in application code. But that's my whole point. Separation of operations. Your data should have a single point of control for the logical constraints in the data, and that point of control should be a logical firewall between the data, and everything else that happens in the application.
Also, in the end there is also nothing that an OO DBMS can do that a truly relational DBMS can't. This is what many people don't understand about the true relational model. With the right kind of RDBMS (not the typical SQL ones), there are many more things that are possible than just messing around with base types and simple 3NF normalization. And with updateable views, functions, and logical RULEs (check out PostgreSQL), you can tailor the external access to the application in any way you want, without changing your base table design. This is power.
Re:OO and hierarhcial DB's (Score:3, Interesting)
We don't need a theoretical breakthrough. The relational data model is just straightforward logic, as applied to data storage. What we need is an implementation breakthrough. Even the best SQL systems only implement about 70% of the advantages of the relational model, and suffer from needless performance problems, due to being too tied to physical mapping of tables and views.
Re:Don't go there (Score:3, Interesting)
http://www.onjava.com/pub/a/onjava/2003/02/26/ejb
A comparison of Tomcat, Orion, Resin and Weblogic is here:
http://radio.weblogs.com/0107789/stories/2002/05/
Distributing objects (Score:2, Interesting)
We built what we called an event server, used CORBA as the interface, and clients registered an SQL query against the event server. Not only did they receive the result set, but they then automatically received any updates that other clients applied.
Doing this shifted the system we were building from a >$1 million SGI Origin box onto a cluster of Red Hat servers with a Sun E250 running Oracle as the DB server. Previously, clients only refreshed about every 15 minutes; now they usually receive updates within 10 seconds. The in-memory database is fairly small though, about 500-1000 MB.