
Object Prevalence: Get Rid of Your Database?

A reader writes:" Persistence for object-oriented systems is an incredibly cumbersome task to deal with when building many kinds of applications: mapping objects to tables, XML, flat files or use some other non-OO way to represent data destroys encapsulation completely, and is generally slow, both at development and at runtime. The Object Prevalence concept, developed by the Prevayler team, and implemented in Java, C#, Smalltalk, Python, Perl, PHP, Ruby and Delphi, can be a great a solution to this mess. The concept is pretty simple: keep all the objects in RAM and serialize the commands that change those objects, optionally saving the whole system to disk every now and then (late at night, for example). This architecture results in query speeds that many people won't believe until they see for themselves: some benchmarks point out that it's 9000 times faster than a fully-cached-in-RAM Oracle database, for example. Good thing is: they can see it for themselves. Here's an article about it, in case you want to learn more."
  • gigabytes? (Score:5, Insightful)

    by qoncept ( 599709 ) on Monday March 03, 2003 @09:49AM (#5423491) Homepage
    At first, I had a problem understanding object oriented methodology because I kept thinking of objects in terms of a database -- they seemed so much alike. But...

    Who uses a database small enough to fit in RAM?

  • Re:RAM ? (Score:4, Insightful)

    by bmongar ( 230600 ) on Monday March 03, 2003 @09:50AM (#5423496)
    No more than any other database. Perhaps you missed the part where they said they would serialize the commands that change the objects. In this context they are talking about saving the commands.
  • Re:RAM ? (Score:0, Insightful)

    by krugdm ( 322700 ) <<moc.gurki> <ta> <todhsals>> on Monday March 03, 2003 @09:51AM (#5423500) Homepage Journal

    If you're going to implement this, I'd say you'd better be investing in a good UPS system and some scripts to dump everything to disk in case an outage takes the system down...

  • Re:gigabytes? (Score:5, Insightful)

    by bmongar ( 230600 ) on Monday March 03, 2003 @09:56AM (#5423534)
    Who uses a database small enough to fit in RAM?

    Not every solution is for every problem. This isn't for huge data warehousing systems. My impression is that this is for smaller databases where there are a lot of interactions with fewer objects.

    I have also seen object databases used as the data entry point for huge projects, where the database is then periodically dumped into a large relational database for warehousing and reports.

  • Re:gigabytes? (Score:2, Insightful)

    by qoncept ( 599709 ) on Monday March 03, 2003 @10:02AM (#5423583) Homepage
    Very true. Then again, if the database is that small, you're probably not taking much of a performance hit anyway -- unless you never should have been using a database to begin with.

    Offtopic, though: I'd love to see a solid state revolution. With the amounts of RAM and flash memory available these days, I don't see why we couldn't run an OS off one. I'm not generally one to be anxious to jump into new technologies (I used to hate games that used polygons instead of sprites), but I think moving to solid state in an intelligent manner would be the biggest thing that could happen in the industry in the near future. I.e., along with Serial ATA, introduce fast, ~2GB boot drives that run your OS and favorite programs, and store everything else on a conventional magnetic hard drive.

  • by sielwolf ( 246764 ) on Monday March 03, 2003 @10:04AM (#5423605) Homepage Journal
    I think this would work well for most web-server DB backends as the data isn't changing on the fly that much. But what about even /. where the content of a discussion thread is changing possibly several times a second (with new posts and mods)? I'd think then you'd want to use the strong atomic operators of the DB to pull directly from the tables instead of relying on serial operators to try and refresh.

    Since the benchmark page was slashdotted I might be speaking out of my ass. But I never trust "9000 times faster!". It sounds too much like "2 extra inches to your penis, guaranteed!"
  • by Ummite ( 195748 ) on Monday March 03, 2003 @10:05AM (#5423608)
    The advantage of putting data into a database isn't just speed! Just think about sharing data between applications or between many computers, exporting data into another format, or simply making a query to change some values. You simply don't want to write code that changes data values under some specific conditions: you'd rather have a single query that any database manager, or even an SQL newbie, could write -- not just the 2-3 programmers who worked on that code some years ago. You also sometimes need to visualize data, make reports, and sort data. You simply don't want to code all of that. I think most serious databases can also keep data in RAM if you have enough, and can do commit/rollback when necessary. So RAM data with serialized in-out is OK only as long as you absolutely need 100% speed, don't need to do complex queries on your data, and use it on only one computer.
  • no queries (Score:5, Insightful)

    by The Pim ( 140414 ) on Monday March 03, 2003 @10:06AM (#5423613)
    Queries are run against pure Java language objects, giving developers all the flexibility of the Collections API and other APIs, such as the Jakarta Commons Collections and Jutil.org.

    In other words, "it doesn't have queries". What real project doesn't (eventually) need queries? And even if writing your queries "by hand" in Java is good enough for now, what real project doesn't eventually need indices, transactions, or other features of a real database system?

  • by Frans Faase ( 648933 ) on Monday March 03, 2003 @10:07AM (#5423617) Homepage
    This article made me think about the use of Memory Mapped Files as a means to implement a persistent store in C++. For an example of this, have a look at Suneido [suneido.com].
  • Re:RAM ? (Score:5, Insightful)

    by hrieke ( 126185 ) on Monday March 03, 2003 @10:09AM (#5423636) Homepage
    Reminds me of something that I heard about a year ago -- one of the DB players (I think IBM) built a fully OO DB in C, used to store the relations in RAM.
    Blazing fast, and easy as hell to fuck up beyond repair -- you could do both a read and a write to the same memory area at the same time, or something like that.

    This sounds just as bad.
    For example, let's say that we're doing a transaction of a few million dollars. Mid-process, the power dies and the machine goes dark. Outside of shouting 'redundant this, that and the other', what state would the machine be in when it comes back online, where is the money, and could we back out of and rerun the transaction?

  • Re:gigabytes? (Score:5, Insightful)

    by juahonen ( 544369 ) <jmz@iki.fi> on Monday March 03, 2003 @10:10AM (#5423650) Homepage

    And that goes for OO as well. Not every database (or collection of data) needs to be accessed in an object-oriented way. Most (or should I say all) of the data I store in small tables would not benefit from being objects.

    And how does this differ from storing non-object-oriented data structures in RAM? You'd still need to implement searches, and how do you search a collection of objects without ending up back at something relational?

  • by jj_johny ( 626460 ) on Monday March 03, 2003 @10:11AM (#5423653)
    Reading through the article, it seems to lack a rather small but important item -- multiple systems interacting read/write with the same database. This is not a very robust or scalable way of doing things. I wonder how this stacks up against one of the normal ways of improving performance: having one read/write database with lots of read-only replicas.
  • Re:no queries (Score:4, Insightful)

    by sql*kitten ( 1359 ) on Monday March 03, 2003 @10:20AM (#5423700)
    In other words, "it doesn't have queries". What real project doesn't (eventually) need queries? And even if writing your queries "by hand" in Java is good enough for now, what real project doesn't eventually need indices, transactions, or other features of a real database system?

    Indeed. It looks like a high-level, language-neutral API for traversing linked lists of structs. Yes, you can rip through such a structure far faster than Oracle can process a relational table, but they are two different solutions to two different problems. I wouldn't use an RDBMS for storing vertex data for a scene rendering application, and I wouldn't use an in-memory linked list for storing bank transactions!
  • Re:RAM ? (Score:3, Insightful)

    by sql*kitten ( 1359 ) on Monday March 03, 2003 @10:27AM (#5423738)
    No more than any other database. Perhaps you missed the part where they said they would serialize the commands that change the objects. In this context they are talking about saving the commands.

    Checkpointing once per day? Re-applying 15 MINUTES worth of Oracle transaction logs takes too long for some failover requirements; you force a log switch every 2 minutes if you have to. Or you eat the performance hit of synchronous replication and spec your hardware to compensate.

    I'm guessing this DB was written by a bunch of smart CS graduates who overdosed on OO theory and haven't spent much time in the hard core of OLTP: banks, telcos, airlines, retail, etc.
  • by bcarothers ( 147707 ) on Monday March 03, 2003 @10:29AM (#5423754)
    Another flaw seems to be that it can only read/write the equivalent of a SQL table at a time.

    In the UserMap example, consider what happens when you need to support 50k users. Every time you change a user, you have to serialize the entire HashMap to disk. Blech.

    I guess by "9000 times faster than a fully-cached-in-RAM Oracle database" they really mean 9000 times faster than a fully-cached-in-RAM Oracle database for read-only applications that never join tables and only look up rows by primary key
  • Re:RAM ? (Score:4, Insightful)

    by Zaiff Urgulbunger ( 591514 ) on Monday March 03, 2003 @10:39AM (#5423824)
    This all depends on what data you're trying to preserve. Some data, such as, say, a user's UI preference changes, might be deemed "not that important", and thus you can risk storing it in RAM until it's convenient to commit it somewhere.

    Conversely, some data, such as a financial transaction, really needs to be committed straight away.

    But committed means *you must* write it out to non-volatile storage (i.e. a disk), otherwise the transaction may be lost. So (I believe) most DBs write the update out to their transaction log very quickly and deal with updating the DB tables/indexes at a later stage. Obviously, this all depends on whether you need to allow other processes to access this data immediately or not.
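    (To make that concrete, a write-ahead log is little more than this -- a hypothetical sketch, not any particular DB's code; the point is the sync() to disk before the commit is acknowledged:)

        import java.io.*;

        // Sketch of a write-ahead log: a commit is acknowledged only after the
        // record is physically on disk; tables/indexes can be updated later.
        class WriteAheadLog {
            private final FileOutputStream file;
            private final DataOutputStream out;

            WriteAheadLog(File path) throws IOException {
                file = new FileOutputStream(path, true);            // append mode
                out = new DataOutputStream(new BufferedOutputStream(file));
            }

            synchronized void commit(byte[] record) throws IOException {
                out.writeInt(record.length);                        // length-prefixed record
                out.write(record);
                out.flush();
                file.getFD().sync();                                // force to non-volatile storage
            }
        }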

    Personally, I don't think this represents anything new (**in true /. fashion, I have not read the article!!**) and I doubt this is faster than a well designed system anyway.

    What it might offer however is:
    1) A nicer interface for managing object persistence, 'cos managing the mapping of objects to DB columns is ugly.
    2) A clear guide to help people manage which objects need persisting to disk and which are less important.
    But that's about all.

    ---
    I'll now go and read the article - you can catch me later contradicting myself!
  • Re:RAM ? (Score:2, Insightful)

    by angel'o'sphere ( 80593 ) <angelo,schneider&oomentor,de> on Monday March 03, 2003 @10:44AM (#5423855) Journal
    Why don't you read the article before you post, you dumb ass?


    I'm guessing this DB was written by a bunch of smart CS graduates who overdosed on OO theory


    Probably.

    and haven't spent much time in the hard core of OLTP: banks, telcos, airlines, retail, etc

    No idea.

    But: Checkpointing once per day? Re-applying 15 MINUTES worth of Oracle transaction logs takes too long for some failover requirements.

    Well, why do you come to the braindead opinion that you cannot checkpoint more often?

    I checkpoint every hour in one of my prevalent systems, and never in the other.

    10,000 commands serialized and loaded on system reboot take only 300 seconds of execution time. As the core of the system is still under development, I do not checkpoint ... so I have no class version conflicts causing trouble reading checkpoint data from old snapshots.

    angel'o'sphere
  • Buggy whips (Score:5, Insightful)

    by Camel Pilot ( 78781 ) on Monday March 03, 2003 @10:49AM (#5423882) Homepage Journal
    3) There are millions of people that already know SQL and can write a decent query with it. How does this help them? Never underestimate the power of SQL.

    There are millions of people that already know how to saddle and ride a horse. How does this newfangled automobile help them? Never underestimate the power of a horse.

    While I agree with your other points... number 3 is never a reason to keep from embracing something new. People are surprisingly trainable.

  • Re:gigabytes? (Score:3, Insightful)

    by Ed Avis ( 5917 ) <ed@membled.com> on Monday March 03, 2003 @10:51AM (#5423899) Homepage

    Who uses a database small enough to fit in RAM?

    Even if your database doesn't fit in affordable RAM today, it probably will in a few years. RAM prices fall faster than database sizes increase. Already a couple of gigabytes of storage is more than enough for a big class of applications.
  • by mojorisin67_71 ( 238883 ) on Monday March 03, 2003 @10:52AM (#5423904)
    Main Memory Databases have been researched for nearly 10 years now and there are a number of commercial products. For details you can check out:
    TimesTen [timesten.com]
    Polyhedra [ployhedra.com]
    DataBlitz [bell-labs.com]

    etc..
    The idea is to have enough RAM to be able to store the whole database in memory. This gives higher performance than a fully cached Oracle for two primary reasons:
    - there is no buffer manager, so data can be accessed directly.
    - the index structures use smart pointers to access the data in memory.

    Typically the data is mapped using mmap or shared memory. Each application can have the database directly mapped into its memory space.
    To provide persistence, main memory databases typically offer transaction logging and checkpointing to be able to recover the data. Various techniques have been developed to do this without affecting performance.
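    (In Java, the mmap technique described above is exposed through NIO's FileChannel.map() -- a toy sketch; real main-memory databases layer the logging and checkpointing mentioned above on top of this:)

        import java.io.RandomAccessFile;
        import java.nio.MappedByteBuffer;
        import java.nio.channels.FileChannel;

        // Toy example of direct, buffer-manager-free access to file-backed memory.
        class MappedStore {
            public static void main(String[] args) throws Exception {
                RandomAccessFile file = new RandomAccessFile("store.dat", "rw");
                FileChannel channel = file.getChannel();
                MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putInt(0, 42);                  // write straight into the mapping
                System.out.println(buf.getInt(0));  // read back with no explicit I/O call
                channel.close();
                file.close();
            }
        }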
  • by kriegsman ( 55737 ) on Monday March 03, 2003 @11:07AM (#5423994) Homepage
    Things I want in a persistent datastore:
    - Atomicity of transactions (commit/rollback),
    - Consistency in the enforcement of my data integrity rules,
    - Isolation of each transaction from other competing transactions (locking)
    - Durable storage that can survive a crash without losing transactions (e.g., journaling)

    My experience with RAM-centric, disk-backed object storage is that you, the developer, often have to implement the ACID features yourself, from scratch. And from-scratch implementations of complex data-integrity mechanisms tend to be time-consuming to develop and test, and often take much, much longer than you think to "get right".

    Call me old-fashioned, but I really like using data storage (database) engines that pass the ACID test and have already been debugged and debugged and debugged and debugged and debugged.

    -Mark
  • by cushty ( 565359 ) on Monday March 03, 2003 @11:17AM (#5424067)

    Some people seem to be missing the point: this is not a "database", it is a persistence mechanism. What they are saying is that persisting objects is difficult (er, I tend to disagree, but I'll bite) and so that is what they are solving. Whether an RDBMS offers better searching is completely irrelevant, as searching, in their architecture, is handled by the application.

    What they seem to gloss over is that you need to take snapshots of the actual data. If you didn't, you'd have to keep every single "log" in order to safely play back the actions and know you have the same data in the same state. Lose one log, say the very first one, and you're pretty much screwed.

  • Re:gigabytes? (Score:5, Insightful)

    by mbourgon ( 186257 ) on Monday March 03, 2003 @11:18AM (#5424072) Homepage
    Who uses a database small enough to fit in RAM?

    I do, but I'll thank my SQL server for doing it for me. Most servers aggressively cache data and databases: if database A is used constantly, it'll be kept in RAM, whereas less frequently used databases will either stay on the hard disk or have only certain tables put in memory. It lets you make the most of your RAM.
  • by Anonymous Coward on Monday March 03, 2003 @11:21AM (#5424096)
    "design (and objects) right before you start coding"

    It must be nice never having to change anything after project start. I am _very_ jealous of your perfect world. Congrats to you and your achievement.
  • by praetorian_x ( 610780 ) on Monday March 03, 2003 @11:29AM (#5424159)
    This is not a new idea. There are all sorts of object databases out there. (Versant springs to mind).

    The main problems I see with object databases:

    1) SQL is incredibly powerful. You give up *a lot* of power when you go from sql semantics to object semantics. Sub-selects, group bys and optimized stored procedures, to name just a few things. All the object language query constructs I've seen fall far short of these. (As a side note, most O/R tools make a hash of it as well.)

    2) You immediately make a massive reduction in the number of database administrators who will be willing and/or capable of helping you out on your project.

    3) Scaling is always a question. With Oracle, it just isn't.

    4) Backup, redundancy, monitoring, management, etc. Most mature relational databases have very good tools for doing these infrastructure activities. Developers often forget about banal things like this, but they are crucial for the long term health of IT systems.

    Don't get me wrong. Every time I construct some nasty query and go through the mind-numbing process of moving the results into an object, I think to myself "There has to be a better way!", but I've looked at the O/R tools and the object database out there and, sadly, I don't feel they are worth the trade off.

    Just my opinion,
    prat
  • Re:Data integrity? (Score:5, Insightful)

    by rycamor ( 194164 ) on Monday March 03, 2003 @11:33AM (#5424183)
    Aha.

    Someone's been reading DBDebunk.com [dbdebunk.com] again.

    Yes, data integrity is one of the major considerations here. I'm willing to bet that by the time you implemented the equivalent of constraints, triggers, etc... in a system like this, you would be running no faster than a typical SQL DBMS, and you would have thousands of bugs as you reinvent the wheel. But there are even more considerations than integrity. This is language-specific, or application-specific. What do you do when you need to access your data from another application? Even if it is possible, that means you have to implement all your integrity checks again in that application.

    Essentially, what this looks like is just another OO method of hierarchical (or perhaps "multi-valued") data storage. This is nothing new. It will suffer all of the historical problems the industry has had with hierarchical storage (there is a reason the relational data model was invented: the problems IBM had with hierarchical data). For example, what happens to existing data when you need to change your logical schema or business rules? The cost of re-ordering or reformatting _every_ single stored object since the beginning of your application would be ridiculous, and in some cases even impossible. How do you track dependencies? In theory, these kinds of systems will work fine, if your application stays exactly as created, and if the nature of the data doesn't change, and if no other applications are involved. In other words, NOT in the real world.

    I have a nickname for hierarchical data storage: "headache-ical".
  • by battjt ( 9342 ) on Monday March 03, 2003 @11:33AM (#5424193) Homepage
    and like in every object system, it is very important you get the design (and objects) right before you start coding

    And, like in every programming project, your requirements are incomplete, so your model will be incomplete, so you need to allow for flexibility. OO DBMS that I have used don't allow for that flexibility (schema evolution), so we build layers on top of the OODB, just the same as we do for relational DBs. I don't see the advantage. By the time we are done optimizing a relational DB, it has all the same indexes that the OODB would have, but we were able to evolve the system, instead of designing it all up front.

    I suppose I could argue for an OO DBMS if the number of transactions was high enough and application had a static set of requirements (general ledger, trade system, etc.).

    Joe
  • by so90s ( 610321 ) on Monday March 03, 2003 @11:40AM (#5424257)
    There are numerous approaches to the problem of making objects persistent. Unfortunately, people tend to think that once they serialize an object to disk, they have a database.

    This assumption is incorrect. A real database meets the ACID criteria, and no persistence system is worth mentioning before it does so too!

    cheers
  • by truffle ( 37924 ) on Monday March 03, 2003 @12:07PM (#5424426) Homepage

    A major complaint the article raises is that storing objects in SQL databases or as XML requires storing them in a non-object-oriented fashion (which breaks encapsulation).

    This technology relies on object serialization. In Java, at least, only objects consisting entirely of base types (integers, strings, etc.) can be serialized without writing additional code. That code is analogous to the code required to store an object in a database or an XML file.

  • BS (Score:5, Insightful)

    by bwt ( 68845 ) on Monday March 03, 2003 @12:11PM (#5424452)
    Sorry this just won't cut it in most enterprise systems.

    1) Doesn't scale. Most enterprise databases don't fit in RAM. Data volumes grow with the capacities of hard disks which outpace RAM. If your database fits in memory now and you use this architecture, what do you do when it grows larger than your RAM capacity? You fire the guy that proposed this and switch to an RDBMS.

    2) Performance claims are BS. Good databases already serialize net changes to redo logs via a sort of binary diff of the data block. Redo logs are usually the limiting factor on transaction throughput, since they require IO to disk. Serializing the actual commands is less efficient than using a data block diff. You simply cannot minimize the space any better than an RDBMS does, therefore you cannot minimize the IO for this serialization any better, and therefore you cannot do it faster without sacrificing ACIDity. If your performance is too good to be true, then you gave up an essential feature of the RDBMS.

    3) Consistency. If there is only one object in memory for each record, then you'll be writing a tremendous amount of custom thread-safety code, and even then, either A) writers block readers and readers block writers, or B) read consistency isn't guaranteed. Either is usually unacceptable. One alternative is to clone objects at every write (sounds slow and horribly inefficient). Of course, this too has to be serialized, or you don't have ACIDity. If you are serializing these, then you aren't really different from an RDBMS which uses rollback/undo, except you are wasting disk IO and are slower.

    4) Reliability. A hardware failure, software hang/crash, or system administration mistake would force recovery from the last full backup. Replaying a full day's transactions could take hours. Sure, you could be continually making a disk image, except for read consistency issues like the above. It's not clear what you do even for a daily backup. Are all sessions simply blocked during backup? Ouch.

    Every few years object fanatics try to come up with some way to get rid of RDBMSs. The methods invariably rely on sacrificing some of the core capabilities of the RDBMS: data integrity, performance, consistency, ACID, reliability, etc. These "innovations" are really only of interest to OO fanatics. In the real world, OO gets sacrificed way before RDBMSs are. This is not going to change.

    OO is a tool that is good for writing maintainable code. It is not good for performance-critical uses like OSes, device drivers, and real-time systems. It is not good for data-intensive systems. These things are not likely to change. If all you can accept is OO, then you are a niche player.
  • by moncyb ( 456490 ) on Monday March 03, 2003 @12:22PM (#5424537) Journal

    The first problem I see with this method is the lack of a powerful and flexible querying method.

    Maybe I don't understand this well enough (the Prevayler site is down), but if this is really a database based upon objects, and you can access them as normal objects, then any good programmer can make a "powerful and flexible querying method." You can write your own hashtables, searching functions, or whatever.

    One of the most powerful features of SQL databases is their capability for searching. Nowhere in the article did I see anything about advanced querying of the objects.

    Because they probably didn't put any searching routines into Prevayler. From the SourceForge page: "Ridiculously simple, Prevayler provides transparent persistence for PLAIN Java objects." You write the searching routines.

    Even if there is, I'm sure it's nowhere near as fast as MySQL or Oracle. The author states that it is several orders of magnitude faster, but I bet it is this much faster only on fetch routines where you already know what object you are looking for.

    Ever hear of hash codes and hash tables? You write the code yourself. How do you think MySQL and Oracle do it? They have code which does the searches. With this system you cut out the middleman. It'll have its own weaknesses and strengths, so every manager will have to decide if this system will fill their needs.
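    (For instance, a hand-rolled secondary index is just another HashMap that you keep in step with every mutation -- a sketch with a hypothetical User class:)

        import java.util.*;

        class User {
            final String login, city;
            User(String login, String city) { this.login = login; this.city = city; }
        }

        // Hand-rolled indexes: O(1) lookup by login or by city instead of a full scan.
        class IndexedUsers {
            private final Map<String, User> byLogin = new HashMap<String, User>();
            private final Map<String, Set<User>> byCity = new HashMap<String, Set<User>>();

            void add(User u) {
                byLogin.put(u.login, u);
                Set<User> bucket = byCity.get(u.city);
                if (bucket == null) {
                    bucket = new HashSet<User>();
                    byCity.put(u.city, bucket);
                }
                bucket.add(u);                   // every index must be maintained by hand
            }

            User findByLogin(String login) { return byLogin.get(login); }

            Set<User> findByCity(String city) {
                Set<User> bucket = byCity.get(city);
                return bucket != null ? bucket : Collections.<User>emptySet();
            }
        }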

    At first glance, I see two weaknesses and two strengths to this system. Weaknesses: a) you'll have to be more of a programmer to implement a database. b) the database has to be small enough to fit in memory. Strengths: a) infinitely flexible. b) really fast for anything which will fit in RAM.

    Web hosting services won't want this. (they usually have many customers, and all their databases won't fit in RAM at once.) Big e-commerce sites won't want this for their customer databases. (again, probably won't fit in RAM) They may be able to use it for their product data, unless it's really huge--such as Barnes and Noble. I'm sure it'll be quite usable for most small businesses. The need for a programmer may seem like a huge obstacle, but I'm sure if Object Prevalence gets big, there'll be a book called "Object Prevalence in Java for Dummies" in no time.

  • by puppetman ( 131489 ) on Monday March 03, 2003 @12:30PM (#5424603) Homepage
    As has been mentioned, it fails the ACI portion of ACID (it's not Atomic -- all or nothing; not Consistent -- data left in a consistent state; and it doesn't provide Isolation -- you appear to be the only transaction running, and other processes don't affect your data mid-transaction). It passes Durable, I suppose.

    I've read a few posts saying that the performance claims (vs. a relational database) are not true. I think this will be much faster than a database. This is an in-memory cache. It will be very fast. Our Oracle databases have cache-hit ratios of 98 and 99+ percent, but they will still be slower. Why?

    First, databases (especially Oracle) do a lot of stuff behind the scenes, logging everything from a user connecting to the SQL being run.

    Second, this sort of thing offers nearly direct access to the data. SQL usually needs to be parsed before it is executed. The database needs to come up with the optimal query plan before it actually executes the statement. A database offers different ways of joining data, and accessing data. Find me all managers that make more than $50,000 per year and have a last name that start with K. You will have to decide the best way to get the data yourself. A database will do all the work for you.

    This is a great idea, though, for a middle-tier cache. Say you want to do some fast searching on a small amount of data. You can use this in the middle tier to save yourself the trip to the database.

    A good object oriented database that has not been mentioned yet is Matisse [fresher.com]
  • by MisterFancypants ( 615129 ) on Monday March 03, 2003 @12:44PM (#5424712)
    Their solution really seems to rock, and may finally be the OO to DB paradigm everyone was waiting for.

    Not likely. The REAL problem with OO databases isn't that RDBMSs might be more mature or whatever else you might read; it is that the data is almost always more important to companies than the behaviors that operate on that data. For example, if a company has a database of customers, it might want to use that database in dozens of different ways, and it might want to grow it for years, if not decades. The OO-database view tends to look at things too much from the view of one single application of the data, and the data gets entangled with code behavior based on that specific application. With a clean RDBMS you can hit the same database from many different applications (assuming the database has a well-thought-out schema to begin with)... the data isn't so tightly wound up with a specific bit of application code.

    This 'solution' doesn't fix that aspect of OO databases. In fact, it makes it worse. I will grant that it is a neat technology, but I wouldn't expect it to take over the place of RDBMSs any more than the OO databases of the past have.

  • Re:RAM ? (Score:3, Insightful)

    by angel'o'sphere ( 80593 ) <angelo,schneider&oomentor,de> on Monday March 03, 2003 @01:03PM (#5424839) Journal
    Funny, he gets modded up each hour by one point and my answer to it gets modded down by one point each hour.

    The modders should read the article as well as the poster should.

    The time interval of the snapshot is a configurable option.

    If you read the article you'd know that, and you'd also know that the writer is a 19-year-old CS graduate, indeed.

    Probably you should get a small dose of OO as well before freaking out like you did.

    angel'o'sphere
  • Some of the MAJOR problems we ran into in using ObjectStore were:

    No, the MAJOR problem you ran into was trying to get RDBMS guys to understand OODBMSs, and you clearly failed.

    It is very difficult to "see" an OO database. By nature, the data isn't tabular. It's a persistent object heap. There's no "SELECT * FROM USERS". So tracking down data-related problems involves exporting data to an XML file and sifting through it.

    Well, that would be the hard way to do it. I suppose the easy way would be to take two minutes and write a small program to scour through the DB looking for the problems, but my experience with Objectstore and other OODBMSs would lead me to ask a different question -- How did the "data-related problems" get created? Write your classes with strong invariants and tightly encapsulate your data and you won't really have many such issues.

    Reporting tools don't exist for OODB.

    Actually this isn't really true, but the point is still worth addressing because the available reporting tools aren't very good. This isn't the fault of the tools, it's just a fact that it's impossible to write a general-purpose tool that can intelligently traverse arbitrarily-structured data.

    Again, the solution is: write a small program to extract the data you want to report on.

    If you need to do lots of ad-hoc queries against the database, such that writing a program each time isn't reasonable, then your usage pattern suggests an RDBMS is more appropriate.

    DB Performance when querying outside the normal object hierarchy (aggregate queries grouping on object attributes, etc.) is orders of magnitude SLOWER on an OODB!

    Unless you create indexes for those queries, of course. Ad-hoc querying is a real weakness of OODBMSs. OTOH, queries that are planned for and for which good indexes exist are orders of magnitude FASTER on an OODB! Like, three orders of magnitude faster than an RDBMS.

    32-bit memory limited our max customer size dramatically

    That is a problem if you design your database badly, but Objectstore allows you to segment your DB so that the size of your address space isn't an issue. The segmentation is completely transparent to the programmer using the objects.

    Migrating to OODBMS offers precious little to support better software design while introducing significant maintenance and design issues that should be considered prior to using this technology.

    OODBMSs have advantages and disadvantages. The advantages are:

    • Ease of initial development. No more figuring out how to map between objects and tables.
    • Code can be more object-oriented. With an RDBMS, "tableitis" tends to infect your classes.
    • Performance! Particularly with Objectstore/C++, the facts that (a) database representation is almost identical to in-memory representation and (b) client-side caching means that once an object has been retrieved from the persistent store there is *zero* overhead -- using a persistent object costs *exactly* the same as using a purely in-memory object -- mean that a well-structured Objectstore database is hugely faster than any RDBMS.

    The disadvantages vs. RDBMSs are:

    • Ongoing development requires schema migrations and those can be difficult. Mind you they're not easy for an RDBMS situation, either, since you have to reswizzle all your object-relational mapping stuff.
    • Ad-hoc queries are hard.
    • Getting good performance requires more design effort, particularly with page-oriented OODBMSs like Objectstore (which really act more like a specialized virtual memory system than a database).
    • Very few people understand them.

    Overall, OODBMSs shine when your primary need is for an "application working store" more than a "database", and when you need maximum performance and minimum time to market (assuming you have staff that knows the tool). If you need ad-hoc queries you can still use an OODBMS, but you will want to export the data to a relational DB for query purposes.

    Actually, that's a very nice solution to many problems, IMO. Use an OODBMS as your high-performance working store, and periodically export the data to a relational "data warehouse" for ad-hoc queries and data mining. This means that you still have to implement and maintain an object-relational mapping, but it's much easier to manage a one-way mapping than a bi-directional one.

    The system described in the article is fine for some environments, I'm sure, but a high-quality OODBMS would be just as fast, more robust and would allow you to use databases that won't fit in RAM.

  • Re:BS (Score:3, Insightful)

    by p3d0 ( 42270 ) on Monday March 03, 2003 @01:35PM (#5425036)
    I agree that OO is not so good for databases, but it works well in OSes, device drivers, and realtime systems. You just need to know how to get good abstractions without sacrificing any performance, and that's not an easy skill to master.

    OO is not all about classes and jump tables. For example, you can get polymorphism in C++ without using any virtual methods at all. If you disagree, then I think your view of what constitutes OO is quite limited, and I'm not surprised you think it's a "niche player".

  • Re:RAM ? (Score:4, Insightful)

    by Tassach ( 137772 ) on Monday March 03, 2003 @01:35PM (#5425044)
    It's a good idea because relational databases, XML files, and flat files are not really all that similar to objects. [...]
    Sorry, you're wrong. With a little forethought it's pretty simple to map an object model to an RDBMS schema. The trick is to design the database schema first, then build your objects on top of it. Doing it in reverse is a bitch. The reason many OO programmers have problems using databases is that they treat the database as an afterthought instead of as the foundation of the application. Here's the basic method I use (sketched in code after the list):
    • Tables map to objects; fields map to the properties of that object on a 1:1 basis
    • Object methods map to stored procedures where appropriate
    • A row in a table is an instance of an object
    • Foreign-keyed child tables map to collections within the parent object
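    A sketch of those rules applied to a hypothetical ORDERS/ORDER_LINES schema:

        import java.util.*;

        // Hypothetical ORDERS / ORDER_LINES schema, mapped by the rules above:
        // one class per table, one property per column, one instance per row,
        // and a foreign-keyed child table becomes a collection in the parent.
        class Order {                                   // table ORDERS
            long orderId;                               // column ORDER_ID (primary key)
            String customerName;                        // column CUSTOMER_NAME
            List<OrderLine> lines =                     // child table ORDER_LINES,
                new ArrayList<OrderLine>();             //   FK ORDER_ID -> ORDERS
        }

        class OrderLine {                               // table ORDER_LINES
            long lineId;                                // column LINE_ID (primary key)
            String product;                             // column PRODUCT
            int quantity;                               // column QUANTITY
        }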

    using Oracle for storing objects means that you've already got a badly-designed system
    You illustrate my point perfectly about putting the cart before the horse. You don't build a database to store your objects -- you build objects to manipulate your database. A badly-designed system is one where the database was tacked on after the object model was complete. Your database schema should be the first thing you write, before you even start thinking about the classes.

    Unfortunately, Comp Sci curricula are heavy on OOP concepts but pathetically light on database theory, which is why you wind up with otherwise talented programmers who don't understand the basic fundamentals of designing solid client-server applications.

  • by Tablizer ( 95088 ) on Monday March 03, 2003 @02:12PM (#5425316) Journal
    There is a fundamental philosophical disconnect between OO modeling and relational modeling, and this creates continuous problems or time-consuming translating layers. IMO, either one or the other must go, at least as far as domain modeling itself.

    The OO approach's basic building block is more or less a dictionary array: key, value pairs where the key is the method or attribute name, and the value is either a scalar, function/method algorithm or algorithm pointer, or reference to another dictionary array. This is similar to the "network databases" (NDB's) of the 1960's, and object databases tend to share many characteristics with them (both the good and bad).

    Relational, on the other hand, is based on the concept of "tables". Since tables are a larger-scoped structure than dictionaries, you can do more powerful, larger-scale reasoning with them than a web of dictionaries (OO) in my opinion. At least at this point in history. There has yet to be a Dr. Codd of the NDB world, but I cannot rule out the possibility that some kind of "dictionary algebra" will someday be created to rival the power of relational algebra.

    However, tables perhaps do not handle non-consistent "records" as well as the NDB's. These are records in which the fields may be different per record. Tablizing them tends to lead to lots of sparse columns (which may or may not be a bad thing) or to skinny "attribute tables". I personally think the higher-power logic overrides the drawbacks of dealing with non-consistent records, but after long debates I have to concede that the preference may be subjective. Tables just work better for my mind. I don't have an OO mind, and don't like NDB's. I really dig the power of relational algebra and the simplicity that tables seem to provide (to me). NDB's lack some larger-scale structure beyond the record (dictionary) to grasp onto. Trees have been tried for this, but IMO trees don't fit the change-patterns of the real world very well.
  • Re:RAM ? (Score:3, Insightful)

    by dubl-u ( 51156 ) <2523987012&pota,to> on Monday March 03, 2003 @02:34PM (#5425492)
    You don't use an RDBMS because it's fast. You use it because it's reliable.

    Prevayler can be just as reliable.

    Does this new toy support record locking, transactional isolation and integrity, or any of the other key features that an enterprise RDBMS provides? If the answer is no, then it's not a replacement for an RDBMS.

    Wrong.

    The question isn't the checklist of features, it's whether you can build equivalently reliable systems with Prevayler. The answer: You can.

    You'll recall that Prevayler uses the Command Pattern. Before data is changed, the Command object is serialized and written to disk, then executed. Naturally, this means the commands are run in strict order of arrival, yes?

    That's all you need to get transactional integrity. All writes are isolated. If you need to isolate the reads, you can use the same mechanism.
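    (Concretely, the serialization point can be as simple as one lock around the log-then-apply step -- a sketch of the mechanism, not Prevayler's actual code; the Transaction interface here is hypothetical:)

        import java.io.*;

        interface Transaction extends Serializable {
            void executeOn(Object model);   // hypothetical command interface
        }

        class SerialCommandExecutor {
            private final ObjectOutputStream journal;
            private final Object model;

            SerialCommandExecutor(ObjectOutputStream journal, Object model) {
                this.journal = journal;
                this.model = model;
            }

            // One command at a time, in arrival order: logged durably first,
            // then applied. Writes cannot interleave, which is the isolation
            // guarantee described above.
            synchronized void execute(Transaction t) throws IOException {
                journal.writeObject(t);
                journal.flush();
                t.executeOn(model);
            }
        }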

    The prevalent approach requires developers to do things a little differently, but you don't have to sacrifice reliability.
  • Re:RAM ? (Score:5, Insightful)

    by dubl-u ( 51156 ) <2523987012&pota,to> on Monday March 03, 2003 @02:41PM (#5425539)
    To license Oracle with similar features to Prevalent, you would only be looking at a 5 figure pricetag.

    Don't forget the price tag for all the extra hardware; since a Prevayler system is thousands of times faster, you can get by with a lot less hardware. And add in all the programmer time spent dealing with SQL. Oh, and what about the DBA's salary?

    How well does Prevalent do on 30TB+ datasets?

    One doesn't use Prevayler for systems like that. Prevayler makes sense if your data can fit in RAM. If it doesn't, you should do something else.

    But note that "something else" doesn't have to mean some SQL thingy. Google has a metric shitload of data, and you can bet they don't keep it in an Oracle server.
  • by spideyct ( 250045 ) on Monday March 03, 2003 @02:48PM (#5425588)
    We're trying to make progress in the software development field. It hasn't all been worked out yet, but it's good to see people taking stabs at it.

    One benefit of OO development is the abstraction away from the data store. I want to think about Widgets, and Customers, and Orders, not VARCHAR fields, foreign keys, arbitrary identifier INTEGERs, etc.

    So I would argue that the goal IS to build a database to store your objects, instead of building objects to manipulate your database. And I imagine that's what every OO developer would want. But it's hard.

    Your suggestion doesn't solve the problem, it just avoids it.
  • race conditions? (Score:4, Insightful)

    by vsync64 ( 155958 ) <vsync@quadium.net> on Monday March 03, 2003 @03:11PM (#5425725) Homepage
    Is anyone else bothered by the complete lack of the synchronized keyword in his example code? So the ChangeUser Command can apparently be in between these 2 lines:

    usersMap.remove(user.getLogin());
    usersMap.put(user.getLogin(), user);

    Meanwhile someone else can run an AddUser Command with the same username. Guess what happens when ChangeUser gets to that 2nd line?

    Maybe when this radical new concept in databases can be presented in a way that avoids race conditions I'll pay a little more attention...
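    (For what it's worth, closing that window takes one synchronized block -- a sketch against a hypothetical User class, not the article's actual code:)

        import java.util.*;

        class User {
            private final String login;
            User(String login) { this.login = login; }
            String getLogin() { return login; }
        }

        class ChangeUserCommand {
            // Holding the map's monitor across remove+put means no AddUser (or any
            // other command that also synchronizes on usersMap) can slip in between.
            static void apply(Map<String, User> usersMap, User user) {
                synchronized (usersMap) {
                    usersMap.remove(user.getLogin());
                    usersMap.put(user.getLogin(), user);
                }
            }
        }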

  • by tkrotchko ( 124118 ) on Monday March 03, 2003 @04:40PM (#5426474) Homepage
    This is a useful solution for a single-purpose web site storing session information, typical of most stateful web sites.

    It's not a general-purpose DBMS solution, nor should you interpret it as such.
  • Re:gigabytes? (Score:3, Insightful)

    by Arandir ( 19206 ) on Monday March 03, 2003 @05:07PM (#5426708) Homepage Journal
    How many times has this been said before? "Use the right tool for the job!" If you have a large collection of objects all of the same class, then use a database. If you have a large collection of objects of differing classes, then use an OO method. For small collections of objects, or if you don't have any real objects at all, neither may be appropriate.

    What irks me to no end are database freaks who have to do everything with a database, OO freaks who have to do everything with OO, and GP freaks who have to do everything as pure GP. They're like guys who only know how to use a screwdriver, so they end up using the screwdriver to hammer in nails and chisel wood.
  • by Chazmyrr ( 145612 ) on Monday March 03, 2003 @05:59PM (#5427206)
    I think you missed the point. People coding against the widgets shouldn't worry about the database structure. People coding the widgets absolutely have to know about the database structure. Database comes first because the design has to be optimized based on the type of activity. A transaction processing database tends to be structured differently than a reporting database. Basing your design on your widget leads to horribly designed databases. Basing your design on how it will be used leads to well designed databases.

    Besides, storing your object in the database rather than storing your data in the database is probably the most retarded suggestion I've ever heard of. Outside of the specific application that knows how to work with the object, how do you get your data? Not to mention doing any type of reporting or analysis out of your database.

  • by Anonymous Coward on Monday March 03, 2003 @11:16PM (#5429995)
    This guy doesn't seem to understand data management or database fundamentals. e.g.- "Because object-oriented programmers do not want to spend time thinking about rolling back a transaction, I check every possible thing that can go wrong. I can do this because all the data are available and all validation is performed before actually changing anything."
    He doesn't understand that the atomic unit of work (i.e.- transaction) may encompass changes to several users at once (e.g.- Say I want to convert all user names to upper case and want that atomic transaction to either entirely succeed or not.) He probably doesn't realize that most of his code amounts to a bunch of constraint checks that would be simpler to code as declarative constraints and would be better enforced in a database management system. Finally, he almost certainly does not realize that hash tables are a *physical* structure and that relations can be a *logical* model on top of that physical structure. e.g.- We're talking about a logical user relation with an underlying physical structure that stores the user attributes as an object bundle, uses a hashed index for speedy access via the user primary key, etc.
  • Re:RAM ? (Score:3, Insightful)

    by iabervon ( 1971 ) on Monday March 03, 2003 @11:25PM (#5430043) Homepage Journal
    Designing your application around an RDBMS is great if what you want is a relation-oriented application, but it's terrible if you want an object-oriented application. I've actually done a relational application, and it was just the right thing; when you want to support complex queries spanning your database, there's nothing better than an RDBMS.

    On the other hand, there are some cases where you want an object-oriented design, and limiting yourself to what you can fit in an RDBMS schema is a bad idea. There are cases in which you really want a base class extended by multiple subclasses, and now you can't write your foreign key constraints properly. Alternatively, you could duplicate your common code for each table, but that's even worse.

    Using Oracle for storing objects is basically bad; using it to store relations is good. Good design requires you to determine first what sort of data model you have, and then choose your programming paradigm appropriately; deciding to use a relational database just because you need persistence is foolish. You've done something wrong if your queries are mostly "SELECT * FROM tbl WHERE id=?".

    I do agree that comp sci doesn't teach enough relational design, because it's often an appropriate design. But sometimes OOP is the right tool for the job, and then you need an appropriate storage system. Relational databases are really their own thing, with a different set of efficient and simple operations, and are really not that much like objects.
  • The bad old days (Score:3, Insightful)

    by InnovATIONS ( 588225 ) on Tuesday March 04, 2003 @12:37AM (#5430417)
    Why did DBMSs really come about? It was not because of a need for secure transactions or to store a lot of data, although those are obviously necessary qualities of a DBMS.

    Before DBMSs, applications stored their data in very efficient data stores designed just for that application, but those stores were worthless for anything else and hard to upgrade or extend without breaking or rewriting the existing application.

    DBMSs were developed so that data could be stored in an application-independent store that could be used and extended for new applications without breaking everything that went before.

    DBMSs were never designed to be more efficient than the application-specific data stores they replaced, so somebody saying they can build a custom data store for a particular application that is faster is missing the point entirely.
