Object Prevalence: Get Rid of Your Database?

A reader writes: "Persistence for object-oriented systems is an incredibly cumbersome task when building many kinds of applications: mapping objects to tables, XML, flat files, or some other non-OO representation destroys encapsulation completely, and is generally slow, both at development time and at runtime. The Object Prevalence concept, developed by the Prevayler team and implemented in Java, C#, Smalltalk, Python, Perl, PHP, Ruby and Delphi, can be a great solution to this mess. The concept is pretty simple: keep all the objects in RAM and serialize the commands that change those objects, optionally saving the whole system to disk every now and then (late at night, for example). This architecture results in query speeds that many people won't believe until they see for themselves: some benchmarks indicate it's 9,000 times faster than a fully-cached-in-RAM Oracle database, for example. The good thing is: they can see it for themselves. Here's an article about it, in case you want to learn more."
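For the curious, the core of the idea fits in a screenful. Here is a minimal sketch in Python (hypothetical names throughout — the real Prevayler is Java and its API differs): a command object is serialized to a journal before it mutates the in-RAM model, so the log alone can rebuild the state.

```python
import pickle

class AddUser:
    """Command object: the only way to mutate the domain model."""
    def __init__(self, user_id, name):
        self.user_id, self.name = user_id, name
    def execute(self, model):
        model[self.user_id] = self.name

class Prevayler:
    """Keeps the model in RAM; every command is journaled before it runs."""
    def __init__(self, model, log_path):
        self.model = model
        self.log = open(log_path, "ab")
    def execute(self, command):
        pickle.dump(command, self.log)  # journal first...
        self.log.flush()
        command.execute(self.model)     # ...then mutate in memory

users = {}
p = Prevayler(users, "commands.log")
p.execute(AddUser(1, "alice"))
p.execute(AddUser(2, "bob"))
# All queries run against plain in-RAM objects, no SQL round-trip:
print(users[1])  # alice
```

Queries are just attribute access on live objects, which is where the speed claims come from; durability rests entirely on the journal plus occasional snapshots.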
This discussion has been archived. No new comments can be posted.

  • Re:RAM ? (Score:5, Informative)

    by jmcnally ( 100849 ) on Monday March 03, 2003 @09:58AM (#5423556)
    As someone else also posted, applying transactions that have occurred since the last time the db was saved to disk avoids this problem. A small company in WA years ago, Raima, had this transaction log concept implemented nicely to support their network database, dbVista (later called RDM). Basically a transaction log is started for every sequence of updates. All records and pointers are saved in a transaction file first. If any problems or system abends occured the entire sequence would be flushed, avoiding a half-updated sequence of records (for example an invoice is posted but the customer record is not updated). It worked pretty well. The big problem with the RAM scheme is that for very large databases the capacity of the computer or the times required to save to disk are prohibitive.
  • Two words... (Score:4, Informative)

    by Anonymous Coward on Monday March 03, 2003 @10:00AM (#5423567)
    Enterprise JavaBeans.

    Here's the definition of an EJB from the http://java.sun.com [sun.com] site.
    A component architecture for the development and deployment of object-oriented, distributed, enterprise-level applications. Applications written using the Enterprise JavaBeans architecture are scalable, transactional, multi-user, and secure.

    And more specifically, here's the definition of an Entity EJB:
    An enterprise bean that represents persistent data maintained in a database. An entity bean can manage its own persistence or it can delegate this function to its container. An entity bean is identified by a primary key. If the container in which an entity bean is hosted crashes, the entity bean, its primary key, and any remote references survive the crash.
  • by carstenkuckuk ( 132629 ) on Monday March 03, 2003 @10:00AM (#5423569)
    Have you looked at object-oriented databases? They give you ACID transactions, and also take care of mapping the data into your main memory so that you as a programmer only have to deal with in-memory objects. The leading OODBs are ObjectStore (www.exln.com), Versant (www.versant.com) and Poet (www.poet.com).
  • Re:Neat concept... (Score:5, Informative)

    by truthsearch ( 249536 ) on Monday March 03, 2003 @10:12AM (#5423661) Homepage Journal
    The countless PHP/MySQL sites out there seem to perform just fine.

    Object-oriented programming and data persistence is about a lot more than public web sites. Private, corporate data warehouses with terabytes of persisted objects squeeze every bit of processing power available. For example, I used to work on Mastercard's Oracle data warehouse. An average of 14 million Mastercard transactions occur per day. That's 14 million new records to one table each day, with reporting needing hundreds of other related tables to look up other information. To get something of that scale to run efficiently for a client app (internal to the company) costs millions of dollars. Object persistence on a large scale is tough to get right and is far from perfected, and there's a lot more going on than public web site development. Every new idea helps. Consider the article written on IBM's developerWorks. Its readers are mostly corporate developers.
  • Sourceforge Link (Score:4, Informative)

    by BoomerSooner ( 308737 ) on Monday March 03, 2003 @10:17AM (#5423686) Homepage Journal
  • Re:RAM ? (Score:1, Informative)

    by Anonymous Coward on Monday March 03, 2003 @10:20AM (#5423702)
    because you can't address more than 2GB of RAM without more than 32 bits of address space. That's what I've read anyway, but I can't get the math to work out.

    2^32 = 4.2B ... = 4GB, roughly. A 2GB limit would imply that they're losing a bit. Can someone smarter than me explain?
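The arithmetic above checks out; no bit is being lost. The 2 GB figure comes from an operating-system policy, not the hardware: 32-bit Windows by default reserves the upper half of each process's 4 GiB virtual address space for the kernel. A quick check:

```python
# 32-bit address space: 2**32 distinct byte addresses.
total = 2 ** 32
print(total)                  # 4294967296
print(total // 2 ** 30)       # 4  (GiB addressable by the hardware)

# The "missing" 2 GB is policy, not a lost bit: by default Windows splits
# each process's virtual address space in half, kernel above, user below.
user_space = total // 2
print(user_space // 2 ** 30)  # 2  (GiB visible to user code)
```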
  • Re:RAM ? (Score:4, Informative)

    by tx_mgm ( 82188 ) <notquiteoriginal@gm a i l .com> on Monday March 03, 2003 @10:26AM (#5423728)
    What does 64-bit computing have anything to do with how much RAM one needs

    It's not how much you need that he's talking about. Only with 64-bit computing can one have more than the current limit of RAM (which I believe is 2GB right now). It has to do with the maximum number of 32-bit addresses that can exist. So with a 64-bit processor, you can have enough RAM to hold that database all at once.
  • by CaseyB ( 1105 ) on Monday March 03, 2003 @10:28AM (#5423749)
    "Persistence for object-oriented systems is an incredibly cumbersome task to deal with when building many kinds of applications: mapping objects to tables, XML, flat files or use some other non-OO way to represent data destroys encapsulation completely,..."

    Object prevalence does nothing to change that. You still have to deal with serialization of all of your business objects, unless you're planning on reloading and re-executing all transactions since the beginning of time every time you restart the server. You can do it less frequently at runtime, but that doesn't save you any development time.

  • The first really bad drawback is: in a few minutes Tabelizer will pop up here and tell us all that "relational rules" are much more intuitive and easier and faster and better in all regards than this OO thingy.

    The second drawback is: the prevalence system is built on two OO design patterns (Singleton and a variant of Command), so in a few more minutes all the OO haters ("argh!! Java sucks" etc.) will start bashing here as well :-/

    OK, let's get serious. You might ask why something like this came up now. Well, traditional databases are disk based. All research went into efforts to organize, store/cache and query data on disks. As data piles exploded even faster than RAM and disk prices dropped, there is no fear that prevalent systems will replace relational databases soon.

    OTOH, prevalent systems tend to lead to very clean architectures and scale very well as long as your data is relatively small. Well, today one gig of RAM is affordable. Relational Database Management Systems are from the age when not even one gig of disk space was affordable.

    I'm really glad that the authors released their work. Less than 20KB of Java code for a complete database-like prevalent system. It's cool!!

    angel'o'sphere

    P.S. I use the PircBot framework for a Java IRC bot and the org.prevayler prevalence system for writing an IRC-based interface for a MMORPG.
  • Re:RAM ? (Score:5, Informative)

    by archeopterix ( 594938 ) on Monday March 03, 2003 @10:43AM (#5423845) Journal
    Blazing fast, and easy as hell to fuck up beyond repair - you could do both a read and a write to the same memory area at the same time, or something like that.
    Well, the system is there to stop you from doing that - just like in the traditional DBMSes. Synchronizing memory access isn't harder than synchronizing disk access, it might even be easier if you decide to serialize all access.
    This sounds just as bad.

    For example, let's say that we're doing a transaction of a few million dollars. In mid-process the power dies and the machine goes dark. Outside of shouting 'redundant this that and the other', what state would the machine be in when it comes back online, where is the money, and could we back out of and rerun the transaction?
    It can be implemented exactly like it is in traditional DB systems, after all they handle similar problems pretty well. After the failure, you have:
    1) The last full image dump
    2) all successful transactions (the DB meaning) serialized in the log, from the last dump to the power failure.
    Since your transaction (both DB & business meaning) hasn't been successful, it has not yet been written into the log, so the money stays in the ordering party's account. Of course the power failure could have occurred just after a transaction was written to the log and before the client software got the message that it was successful, but traditional DBs have this problem too. To sum it all up: the synchronization problems are there, but they are no worse than in traditional DBMSes.
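The two-step recovery described above (last full dump plus replayed log) is compact enough to sketch in Python (illustrative names; a real engine streams the log and verifies record checksums):

```python
import pickle

def recover(snapshot_path, log_path):
    """Rebuild in-RAM state after a crash: last full dump + logged commands."""
    with open(snapshot_path, "rb") as f:
        model = pickle.load(f)              # 1) the last full image dump
    with open(log_path, "rb") as f:
        while True:                         # 2) every committed command since
            try:
                op, key, value = pickle.load(f)
            except EOFError:
                break                       # end of log: state is current
            if op == "put":
                model[key] = value
    return model

# Simulate: snapshot taken at balance=100, then one committed update logged
# before the "power failure".
with open("snapshot.bin", "wb") as f:
    pickle.dump({"acct": 100}, f)
with open("cmds.log", "wb") as f:
    pickle.dump(("put", "acct", 250), f)

print(recover("snapshot.bin", "cmds.log"))  # {'acct': 250}
```

An uncommitted transfer never reaches `cmds.log`, so replay can only ever reproduce states that were acknowledged as successful.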
  • by Tikiman ( 468059 ) on Monday March 03, 2003 @10:44AM (#5423859)
    In fact, this concept actually predates SQL-based databases! The first one I am aware of is MUMPS (Massachusetts General Hospital Utility Multi-Programming System), which goes back to 1966. One company that continues this legacy is Sanchez [sanchez-gtm.com]. Another commercial version is Caché [e-dbms.com]. This makes sense, really - the most obvious solution to serializing an object is to store all properties of a single object together (the OO solution), rather than store a single property of all objects together (the RDBMS solution).
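The contrast the comment draws — all properties of one object together versus one property across all objects — can be pictured in a few lines of Python (purely illustrative; real engines lay data out in pages, not dicts):

```python
# Object-at-a-time layout: each record keeps its fields together.
oo_store = [
    {"id": 1, "name": "alice", "balance": 100},
    {"id": 2, "name": "bob",   "balance": 250},
]

# Property-at-a-time layout of the same data: one attribute across all objects.
columns = {
    "id":      [1, 2],
    "name":    ["alice", "bob"],
    "balance": [100, 250],
}

# Fetching one whole object is natural in the first layout...
print(oo_store[0])
# ...while aggregating one attribute over everything is natural in the second.
print(sum(columns["balance"]))  # 350
```

Serialization falls out of the first layout almost for free, which is the point being made about MUMPS-style stores.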
  • Re:RAM ? (Score:3, Informative)

    by moonbender ( 547943 ) <moonbenderNO@SPAMgmail.com> on Monday March 03, 2003 @10:58AM (#5423952)
    As far as I know - and for what it's worth (not much), I checked with Google - x86-compatible CPUs can address 4 GB. There are extensions in some Intel CPUs that allow programs to address a 36-bit address space, even.
    As for the 2 GB limit, there seems to be a feature in the Windows memory architecture - the upper 2 GB of a process' virtual address space is reserved for shared memory. Or something - I kind of stopped thinking at that point. ;) If you're interested, more info is available [windowsitlibrary.com].
  • by Steeltoe ( 98226 ) on Monday March 03, 2003 @11:10AM (#5424015) Homepage
    OODBMSes have been thoroughly and handily debunked. For the best opinions on relational database technology, visit these hardcore guys: http://www.dbdebunk.com/ [dbdebunk.com]

    The problems with OODBMSes can be summarized so (OTOMHRN - on top of my head right now):
    1) Proper relational technology can model OO-hierarchies, but the other way around is unnatural and cumbersome, if not impossible. Proper relational technology is a step up on the ladder in generalization from OO-technology. It's simply a generation or two ahead, while OODBMS is several steps backwards.

    2) Proper relational technology is proven concepts from mathematics and logic, while OODBMSes are just a hack to store application data "quick'n dirty". Everything can be modelled as general relations, while OO-technology lacks the fundamentals to model *ANYTHING* and is limited and impeded by having an obligatory and *meaningless* top-to-bottom hierarchy. (You cannot have *meaning* without relations of differing types to other entities.)

    3) Proper relational technology allows you to extract, convert and manipulate data in standardized methods (using query languages like SQL), in ways not thought of at the time of design. OODBMSes can only be used properly in the context of the OO-application layer, often relying on runtime data. If you need flexible solutions, you will have to spend extra time programming a specialized solution, instead of having the benefit of a fully relational query language (which unlike SQL, can express almost any problem to be solved).

    4) The future is relational. Current RDBMSes do not implement true relational technology; if they did, nothing else would be needed. The mathematics in the theories behind it would be at the programmer's disposal during programming, reducing time and potential errors. Yes, it requires understanding the theory, but wouldn't you like a true DBA to do that anyway?

    Don't buy into the hype; look into true relational technology and educate yourself. As for storing everything in RAM and "saving it for the night", I wouldn't risk having my bank account in such a DB. Such solutions are only usable for storing non-volatile data. For non-commercial game servers, it may be perfect.
  • by JohnDenver ( 246743 ) on Monday March 03, 2003 @11:23AM (#5424112) Homepage
    Personally, I still think it sounds a lot easier to just map objects to a database.

    4) Concurrency - If you haven't implemented locks for an object model, then you haven't lived. Seriously, I can see a lot of people screwing this up with deadlocks galore. Locking up concurrent systems can be a nightmare.

    5) Ad Hoc Support - Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY Ad Hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate through your objects. Have fun.

    6) Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.

    Nothing says flexible like HARDCODING! Yay!

    In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error-prone, and time-consuming to implement.

    (sarcasm) All this just to avoid the "cumbersome" process of mapping objects to tables?

    Seriously people, it's not that hard (3 magnitudes easier than this) and there are a lot of tools that help doing it.

    If you're REALLY hung up on not using a relational database, try an Object Database, XML Database, or an Associative Model Database.
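For what it's worth, the way prevalence layers usually dodge the deadlock problem in point 4 is by not having fine-grained locks at all: every command runs under one global lock. A sketch of that trade-off (assumed design, not Prevayler's actual code) — no deadlocks possible, but also zero write concurrency:

```python
import threading

class SerializedExecutor:
    """One global lock: commands run strictly one at a time, so deadlocks
    are impossible -- at the cost of all write concurrency."""
    def __init__(self, model):
        self.model = model
        self.lock = threading.Lock()

    def execute(self, command):
        with self.lock:
            return command(self.model)

counter = {"n": 0}
ex = SerializedExecutor(counter)

def bump(model):
    model["n"] += 1  # safe: always runs with the global lock held

threads = [
    threading.Thread(target=lambda: [ex.execute(bump) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["n"])  # 4000
```

Whether that serialization is acceptable is exactly the scalability objection raised above.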

  • by Anonymous Coward on Monday March 03, 2003 @11:34AM (#5424201)
    Main memory databases and transaction logging are not new concepts. The problems are:

    1) Doesn't deal with schema changes. If you change an object then your database is broken the next time you try to restore.

    2) Restoring from logs is prohibitively slow in a production environment.

    3) No transaction support, although it would be easy to group log entries into transactions.

    4) Is a Java Map object the most efficient access path for a large database? Tuning insertion/deletion/lookup costs for associative access can be complicated.

    5) No security.

    Modern relational and OO databases provide other features, but those are the biggies.

    It's easy to ignore decades of CS research. Try looking up papers on CiteSeer: http://citeseer.nj.nec.com/cs.

  • by Lucas Membrane ( 524640 ) on Monday March 03, 2003 @11:38AM (#5424244)
    This OO scheme is a database system, but it leaves out much of the management element. (1) Things like changing the database structure without bringing the whole company down probably won't work. (2) You lose all the enforcement of the rules of relational integrity that an RDBMS gives you right out of the box. (3) And you lose Crystal Reports. (1) and (2) kill it technically in many situations, and (3) kills it management-wise.

    Gadfly, a Python package, gives you an in-memory DB and SQL. If you want to trade SQL for extra speed and do more programming, you can run the ISAM-like engines of Btrieve or Berkeley DB without the SQL layer on top. We have SQL RDBMS's because the conventional wisdom is that such a trade is not a good idea.

  • Re:Two words... (Score:5, Informative)

    by neurojab ( 15737 ) on Monday March 03, 2003 @11:53AM (#5424340)
    Entity beans are all about transactions. You've got a transaction context that can propagate over several beans. The EJB container doesn't do this on its own, however; it uses the ACID properties of the database along with the database's commitment-control mechanisms to accomplish the properties you mentioned. Entity beans are usually mapped to tables, and could represent a join in the BMP case. That said, I'm not sure if you're saying EJB will benefit from using this as a backend, or that EJB did this first? The latter is false, but the former... I'm not sure this technology will benefit entity beans, but it may benefit STATEFUL SESSION beans because they're less RDBMS-centric.

  • by kpharmer ( 452893 ) on Monday March 03, 2003 @12:11PM (#5424454)
    Right, the OODBMS is acceptable - as long as you reject the need to query across objects.

    Of course, there's nothing *relational* about the need to do this; it doesn't have anything to do with the mis-application of methodologies.

    These are simple use-cases. And to reject them is to limit the functionality that the solution will offer. That's fine.

    But, almost everyone needs the ability to identify all objects with attribute X. It's called a report, and it provides you with the information needed to manage the process. Without this ability you're driving in the dark without headlights.
  • What about ZODB? (Score:2, Informative)

    by Ragica ( 552891 ) on Monday March 03, 2003 @01:00PM (#5424816) Homepage
    Though I don't know a lot of the implementation details, this sounds fairly similar to the Zope Object Database (ZODB [zope.org]) which the Zope application is built on top of. There is also a standalone distribution for it, to work with any Python program. It is an object database, serialised to disk. While normally everything is serialised to disk fairly quickly I believe there are memory cache (and thread) settings which can be used to optimise the speed.

  • Gemstone (Score:1, Informative)

    by Anonymous Coward on Monday March 03, 2003 @01:58PM (#5425211)
    There is a commercial product based on this concept that works very well, called Gemstone Facets (for java or smalltalk). It has several advantages:
    -Concurrent transaction management
    -The command object is not necessary; all changes to objects are logged at transaction commit
    -Optimized object storage (java serialization is notoriously slow)
    -Transparent caching (not all objects may fit in memory)
    -Clustering/shared remote cache

    I'm curious if anyone else is using this.
  • by zipwow ( 1695 ) <zipwowNO@SPAMgmail.com> on Monday March 03, 2003 @02:06PM (#5425275) Homepage Journal
    I think you misread the article.

    Every time you issue a 'change command', it first makes the change in memory, then records just that command to disk, very much like a journaling file system as I understand it.

    Then, presumably, you also change your object in memory to match. If the whole system comes down, then when you start again, it loads its 'starting point', probably from yesterday, and then executes those recorded commands.

    Furthermore, future 'reads' on that data aren't blocked by the disk i/o. They wait for the object in memory to change (quick) and pretty much ignore the disk write.

    Where I think you're getting confused is that periodically, it goes through the system and makes a new 'starting point', presumably during a period of low utilization (like at night).

    I don't know what your comment about never joining tables means; this wouldn't have 'tables', but would have objects as you've designed them. Presumably you've designed your objects so that they're accessible in some natural and convenient way. If you haven't, you ought to fix that...

    -Zipwow
  • Re:Don't go there (Score:3, Informative)

    by atomray ( 202327 ) on Monday March 03, 2003 @02:10PM (#5425296) Homepage
    EJB certainly isn't the perfect technology (what is?), and the specification is lacking in some points (generation of primary keys and an incomplete query language), but it is certainly useful. Take a look at O'Reilly's "Building Enterprise Java Applications (vol 1)"; it has a nice overview of enterprise Java technologies and how they fit together. It discusses how to use session and entity beans effectively - in essence, you shouldn't be sending entity beans to your client (which it sounds like the guy in the original post was doing - this will kill performance due to the number of RMI calls that will be generated); your clients should typically interact with session beans that perform the business logic using the entity beans.

    With respect to cost, there's JBoss [jboss.org], that's free, and there are many other vendors at a variety of prices and performance levels. I use JBoss - I did, for a time, see some nasty performance problems, but after reading some documentation quickly realized that it was my mistake.
  • Re:RAM ? (Score:2, Informative)

    by scubabear ( 598890 ) on Monday March 03, 2003 @02:12PM (#5425314)
    As others have pointed out, this solution is so ingenious that every major RDBMS vendor implements it. It's called a transaction log, where deltas in the system state (and the commands involved in those changes) are written out to the tran log. The transaction log is designed generally as write-forward-only, meaning that to restore the state of the system, the RDBMS needs to read the log from the front and apply changes as indicated by the log. There are also regular intervals where the DB dirty pages are synced all the way to disk, usually called a checkpoint or something similar. The overall point being - your RDBMS already does this for you, and if you try to implement it yourself you'll generally end up in one of two scenarios:
    • It's really fast, but you screwed up the corner cases - meaning there are holes in your scheme where you can unexpectedly lose data
    • It meticulously covers all holes, but is really, really slow.
    To get it both correct and fast is very, very difficult. And, surprisingly, most people want both!
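The checkpoint step described above — sync the dirty state to disk, after which the older log entries become redundant — can be sketched as follows (Python, hypothetical file names; real engines checkpoint incrementally rather than dumping everything, and this sketch ignores commands arriving mid-checkpoint):

```python
import os
import pickle

def checkpoint(model, snapshot_path, log_path):
    """Sync the full in-RAM state to disk, then truncate the command log:
    recovery afterwards needs only the new snapshot plus later commands."""
    tmp = snapshot_path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(model, f)
        f.flush()
        os.fsync(f.fileno())        # snapshot durable before the log is dropped
    os.replace(tmp, snapshot_path)  # atomic swap: never a half-written dump
    open(log_path, "wb").close()    # old deltas are now redundant

state = {"users": ["alice", "bob"]}
open("cp.log", "wb").write(b"old-deltas")
checkpoint(state, "cp.snapshot", "cp.log")
print(os.path.getsize("cp.log"))  # 0
```

The write-to-temp-then-rename dance is one of the "corner cases" the comment warns about: dump straight over the old snapshot and a crash mid-write leaves you with neither a snapshot nor a log.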
  • Java Persistence (Score:1, Informative)

    by Anonymous Coward on Monday March 03, 2003 @03:05PM (#5425689)
    Or you could change the compiler to support persistence:

    http://www-ccsl.cs.umass.edu/pj2-abs.html

    "Saving to disk every now and then"? What is that, like 50% uptime? Just wait till the cosmic rays from the Perseids hit that Solaris server...
  • by jafac ( 1449 ) on Monday March 03, 2003 @05:01PM (#5426648) Homepage
    At A Previous Employer Who Shall Remain Nameless:
    (product is still on the market)

    We had a product which did (we'll call it "X") and tracked all its information in a "database" we built in-house. The primary architect, of course, was a pretty sharp guy. He had written a whitepaper for the company stating why he thought "unix was dead" and why we should not waste our time, as a company, developing "portable" products, and that we should take full advantage of Microsoft's technologies on Windows.

    As far as ACID test goes, NONE of those elements existed in this "database" we used. Nor were there any verification, export, import, or repair tools initially available.

    As soon as this product scaled to a reasonable level, (the field was always one step ahead of our test lab, as far as scaling the application goes), we started seeing weird crashes and corruption that we just could not reproduce or isolate in the lab. When the term "database corruption" was used, the architect would throw a fit, and blame some other component, denying that database corruption was even possible.

    The absence of tools meant that we could not troubleshoot in the field. Developing tools was the equivalent of admitting that there was a problem. As we scaled our lab, in response, we started to uncover these problems. This was when our architect resigned. His job had suddenly changed from "Technical Primadonna" to "beleaguered fixer of uncounted bugs".

    That's when we REALLY started to get into trouble.

    At some point, there was serious talk about ripping the whole database out and going to a "real" commercial database solution. Some third-party thing. That was shortly before I left that job. But in the end - there was much suffering and pain, and the product lost a great deal of ground to its competitors, all due to a lack of Respect For Those Who Have Gone Before.
  • by AbbeyRoad ( 198852 ) <p@2038bug.com> on Monday March 03, 2003 @05:04PM (#5426684)

    Here is the link that explains the language:

    http://www.kuro5hin.org/story/2002/11/14/22741/791

    Follow the links to find KDB, a database system that sits in RAM and kills Oracle dead. It runs some large mission-critical databases (like the Swiss bank), so you can't argue that it doesn't work.

    -paul

  • by jschultz410 ( 583092 ) on Monday March 03, 2003 @06:47PM (#5427766)
    I can't believe I haven't seen Berkeley DB [sleepycat.com] mentioned a LOT more here as a great alternative to both RDBMS and OODBMS.

    BDB is a key-value DB -- it doesn't do SQL. Each DB is a simple "table" of keys to blocks of data. Tell BDB how to compare keys and augment BDB to convert the blocks of data into whatever you like (e.g. - un/serialize your objects). If you won't be doing too much convoluted SQL-like logic and you need an ACID datastore, then BDB is the way to go.

    It has most everything you could want:
    -- fully ACID (trnxns/rollback/snapshots)
    -- single or multi-threaded/process access
    -- fine grain locking
    -- multi-node failover/balancing
    -- multiple indexes per DB
    -- SUPER flexible (trade features for speed)
    -- AWESOME performance
    -- little to no DB administration
    -- under development for more than 10 years (debugged)
    -- can handle DB sizes up to 256TB
    -- can handle record sizes up to 4GB
    -- supports most popular languages
    -- is maintained by a world class company
    -- you can buy support from the maintainers
    -- open source + FREE for a lot of uses

    I don't think anything out there with a similar feature set can consistently outperform BDB.
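Berkeley DB itself requires the Sleepycat library, but the key-to-serialized-object shape described above can be tried with Python's stdlib `shelve` module (a stand-in for illustration only — none of BDB's transactional or replication features apply):

```python
import shelve

# A key-value "table": keys map to whole serialized objects,
# exactly the access pattern described above -- no SQL anywhere.
with shelve.open("accounts.db") as db:
    db["acct:1"] = {"owner": "alice", "balance": 100}
    db["acct:2"] = {"owner": "bob",   "balance": 250}

# Reopen: values are deserialized back into objects on lookup.
with shelve.open("accounts.db") as db:
    record = db["acct:1"]
    balance = db["acct:2"]["balance"]
print(record["owner"], balance)  # alice 250
```

You supply the key scheme and the object (de)serialization, and the store handles lookup and persistence — the division of labor the comment describes for BDB.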
  • Congratulations! (Score:3, Informative)

    by I Am The Owl ( 531076 ) on Monday March 03, 2003 @07:18PM (#5428077) Homepage Journal
    You've invented an Object-Oriented database! Wowee zowie! Wait, what's that? You say this is nothing new? Well, you're [sleepycat.com] right [odbmsfacts.com]. Of course it's faster than an Oracle database stored in RAM. Oracle is not designed for the purpose of storing objects. It's a relational database, which is something else entirely.
  • Only one index? (Score:2, Informative)

    by wembley ( 81899 ) on Monday March 03, 2003 @07:25PM (#5428135) Homepage

    The article referenced says:

    "AddUser Command is an equivalent to an SQL INSERT -- using the HashMap.put method is the same as inserting data into an indexed relational table."

    This makes the assumption that the data table is indexed only on the primary key. However, Oracle and other RDBMS systems allow indexing of any field in the table, not just the primary key. This system has no way of retrieving information based on other criteria of a record.

    I suppose the alternative would be to create a class for each criterion that you are looking up by, but that means a lot of objects containing the same data, merely to let it be accessed in different ways. I have a table of 2.8 million records that I need to access based on at least 3 criteria. This just won't cut it.
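A common alternative to duplicating the objects is extra in-memory maps over the same records, one per lookup criterion — which is workable but means hand-maintaining every map on every insert, exactly the hardcoded-indexing objection raised elsewhere in this discussion. An illustrative sketch (hypothetical field names):

```python
# Primary store: one HashMap-style index on the primary key.
records = {}   # id -> record

# Each extra criterion is another map over the *same* record objects --
# no duplicated data, but more bookkeeping on every insert.
by_state = {}  # state -> list of records
by_owner = {}  # owner -> list of records

def insert(rec):
    records[rec["id"]] = rec
    by_state.setdefault(rec["state"], []).append(rec)
    by_owner.setdefault(rec["owner"], []).append(rec)

insert({"id": 1, "state": "NY", "owner": "alice"})
insert({"id": 2, "state": "WA", "owner": "bob"})
insert({"id": 3, "state": "NY", "owner": "bob"})

print(len(by_state["NY"]))   # 2
print(len(by_owner["bob"]))  # 2
```

Deletes and updates have to touch every map too, which an RDBMS's `CREATE INDEX` does for you automatically.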

  • by greenrd ( 47933 ) on Tuesday March 04, 2003 @06:33AM (#5431566) Homepage
    A join or ten wasn't nearly as expensive as getting data out of what was essentially just a dump.

    What do you mean by a "dump"? Sounds like you were using the OO db inappropriately, e.g. by querying for "SELECT * FROM extent1" and then linearly searching it, or something.

    3. People would ask us questions about the data we were storing that would have been absolutely trivial to find in an RDBMS (like "how many of these events occurred last month when this device was in this state?") that we'd have to write long, slow-performing pieces of code to retrieve.

    Doesn't ObjectStore have a query language similar to SQL? The Object Data Standard defines OQL, but I know that the Object Data Standard is not exactly "industry standard" yet. As for speed, it's still possible to create indexes, optimise data structures and algorithms, etc.

    Other people wanted to write applications that used our data. That wasn't too easy, because they wanted slightly different objects. We would have had to agree on an object for everything we shared, or store things twice. With an RDBMS we could use views, or generate the objects differently from the same tables.

    This is an odd complaint. In any decent OODBMS, creating views (even manually) should be fairly simple. Yes, you might have to write some repetitive code - there's room for improvement there. This is where aspect-oriented programming (specifically, composition filters) comes in, I think.
