Catch up on stories from the past week (and beyond) at the Slashdot story archive

Is the One-Size-Fits-All Database Dead? 208

Posted by kdawson on Tuesday January 09, 2007 @10:50PM from the specialized-and-optimized dept.

jlbrown writes "In a new benchmarking paper, MIT professor Mike Stonebraker and colleagues demonstrate that specialized databases can have dramatic performance advantages over traditional databases (PDF) in four areas: text processing, data warehousing, stream processing, and scientific and intelligence applications. The advantage can be a factor of 10 or higher. The paper includes some interesting 'apples to apples' performance comparisons between commercial implementations of specialized architectures and relational databases in two areas: data warehousing and stream processing." From the paper: "A single code line will succeed whenever the intended customer base is reasonably uniform in their feature and query requirements. One can easily argue this uniformity for business data processing. However, in the last quarter century, a collection of new markets with new requirements has arisen. In addition, the relentless advance of technology has a tendency to change the optimization tactics from time to time."

This discussion has been archived. No new comments can be posted.

Is the One-Size-Fits-All Database Dead?

Load All Comments

Search 208 Comments Log In/Create an Account

Comments Filter:

"In the last quarter century..." (Score:2, Funny)

by AndroidCat ( 229562 ) writes: on Tuesday January 09, 2007 @10:56PM (#17534110) Homepage

Well it's about time we had some change around here!

Share
twitter facebook
- Stonebreaker has a vested interested in Stream Dbs (Score:2, Informative)
  
  by Anonymous Coward writes: on Wednesday January 10, 2007 @12:16AM (#17534732)
  
  He's the CTO of Streambase, so he's not just a "neutral" academic.
  
  http://www.streambase.com/about/management.php [streambase.com]
  
  Parent Share
  twitter facebook
  - Re:Stonebreaker has a vested interested in Stream (Score:2)
    
    by Omni-Cognate ( 620505 ) writes: on Wednesday January 10, 2007 @10:28AM (#17538892)
    
    Like many academics, he has founded a company. It hardly invalidates his research. They aren't trying to hide it either - one of the other authors is contributing in his capacity as a StreamBase employee, as shown right at the top of the paper.
    
    Also, since nobody else has said anything about his neutrality or otherwise, you shouldn't put the word "neutral" in quotes like that. It makes you look like you are trying to set up a straw man. Neutrality is not even particularly important in a researcher. You'd have to go a long way to find one who didn't want his own particular theory to prevail, whatever the reason. Peer review of the content is the criterion on which papers are judged.
    
    Parent Share
    twitter facebook
Was there ever a one-size-fits-all database? (Score:2)

by Ant P. ( 974313 ) writes: on Tuesday January 09, 2007 @10:57PM (#17534118)

The closest thing I can think of that fits that description is Postgres.

Share
twitter facebook
- Re:Was there ever a one-size-fits-all database? (Score:2, Funny)
  
  by Architect_sasyr ( 938685 ) writes: on Wednesday January 10, 2007 @12:17AM (#17534740)
  
  There's a difference between fitting and being forced to fit into something ;)
  
  Parent Share
  twitter facebook
- Re:Was there ever a one-size-fits-all anything? (Score:2)
  
  by EmbeddedJanitor ( 597831 ) writes: on Wednesday January 10, 2007 @12:22AM (#17534790)
  
  Languages, OSs, file systems, databases, microprocessors, cars, VCRs, diskdrives, pizzas, .... none of these are one-size-fits-all.
  There never has been, and probably never will be. A small embedded database will never be replaced by a fat-asses SQL database any more than Linux will ever find aplace in the really bottom-end microcontroller systems.
  
  Parent Share
  twitter facebook
  - Re:Was there ever a one-size-fits-all anything? (Score:4, Funny)
    
    by Fred_A ( 10934 ) writes: <fred AT fredshome DOT org> on Wednesday January 10, 2007 @04:24AM (#17536310) Homepage
    
    Languages, OSs, file systems, databases, microprocessors, cars, VCRs, diskdrives, pizzas, .... none of these are one-size-fits-all.
    
    There never has been, and probably never will be.
    
    Aren't most condoms sold in one size fits all ?
    
    Maybe they could make rubber databases ?
    
    (or it's a bit of a stretch)
    
    Parent Share
    twitter facebook
    - - Re:Was there ever a one-size-fits-all anything? (Score:3, Funny)
        
        by rufty_tufty ( 888596 ) writes: on Wednesday January 10, 2007 @08:50AM (#17537784) Homepage
        
        Which reminds me of the Robin Williams joke
        
        "They came in 3 sizes, extra large, large and white man"
        
        Parent Share
        twitter facebook
- Re:Was there ever a one-size-fits-all database? (Score:3, Informative)
  
  by egghat ( 73643 ) writes: on Wednesday January 10, 2007 @08:59AM (#17537856) Homepage
  
  Btw. Postgres was a project from Stonebreaker meant to deal with the limitiations of SQL (POST inGRES).
  
  See the history of PostgreSQL [postgresql.org].
  
  When the community picked the old, dormant Postgres source code up (no problem due to the BSD licensing), the first that was added (after some debates) was the SQL syntax, hence the name change to PostgreSQL.
  
  Bye egghat.
  
  Parent Share
  twitter facebook
Noticed how roll your own is faster? (Score:2, Interesting)

by BillGatesLoveChild ( 1046184 ) writes: on Tuesday January 09, 2007 @10:58PM (#17534126) Journal

Have you noticed when you code your own routines for manipulating data (in effect, your own application specific database) you can produce stuff that is very, very fast? In the good old days of the Internet Bubble 1.0 I took an application specific database like this (originally for a record store) and generalized it into a generic database capable of handling all sorts of data. But every change I made to make the code more general also made it less efficient. The end result wasn't bad by any means: we solid it as an eCommerce database to a number of solutions, but as far as the original record store database went, the original version was by far the best. Yes. I *know* generic databases with fantastic optimization engines designed by database experts should be faster, but noticed how much time you have to spend with the likes of Oracle or MySQL trying to get it to do what to you is an exceedingly obvious way of doing something?

Share
twitter facebook
- Re:Noticed how roll your own is faster? (Score:5, Interesting)
  
  by smilindog2000 ( 907665 ) writes: <bill@billrocks.org> on Tuesday January 09, 2007 @11:16PM (#17534244) Homepage
  
  I write all my databases with the fairly generic DataDraw database generator. The resulting C code is faster that if you wrote it manually using pointers to C structures (really). http:datadraw.sourceforge.net [sourceforge.net]. Its generic, and faster than anything EVER.
  
  Parent Share
  twitter facebook
  - Re:Noticed how roll your own is faster? (Score:5, Informative)
    
    by Anonymous Coward writes: on Tuesday January 09, 2007 @11:38PM (#17534430)
    
    Looks interesting, will check it out. Working URL for the lazy: http://datadraw.sourceforge.net/ [sourceforge.net]
    
    Parent Share
    twitter facebook
  - Re:Noticed how roll your own is faster? (Score:2)
    
    by The Real Nem ( 793299 ) writes: on Wednesday January 10, 2007 @01:23AM (#17535246) Homepage
    
    It's hard to take any project seriously (professional or not) when it's web page has such glaring mistakes as random letter b's in its source (clearly visible in the all the browsers I've tried), more white space than anyone can reasonably shake a stick at and poor graphics (I'm looking at the rounded corners of the main content).
    
    As interesting as it sounds, it makes me wonder what could be wrong with the code...
    
    Parent Share
    twitter facebook
    - Taken seriously (Score:3, Funny)
      
      by matria ( 157464 ) writes: on Wednesday January 10, 2007 @02:37AM (#17535698)
      
      Almost as bad as trying to take seriously someone who dosn't know his it's from his its, right?
      
      Parent Share
      twitter facebook
  - Re:Noticed how roll your own is faster? (Score:2)
    
    by RAMMS+EIN ( 578166 ) writes: on Wednesday January 10, 2007 @07:15AM (#17537212) Homepage Journal
    
    Is there any documentation for it (didn't see a link on the webpage)? How do I use it in my program?
    
    Are there any benchmark results that prove the claims about it being faster? How much faster (than what?) is it, really?
    
    Parent Share
    twitter facebook
    - Re:Noticed how roll your own is faster? (Score:2)
      
      by RAMMS+EIN ( 578166 ) writes: on Wednesday January 10, 2007 @07:26AM (#17537272) Homepage Journal
      
      ``Is there any documentation for it (didn't see a link on the webpage)? How do I use it in my program?''
      
      Never mind, I found the link. Must have skipped past it the first time. Perhaps it would be a good idea to add it to one of the edges of the page?
      
      Parent Share
      twitter facebook
    - Re:Noticed how roll your own is faster? (Score:2)
      
      by smilindog2000 ( 907665 ) writes: <bill@billrocks.org> on Wednesday January 10, 2007 @07:33AM (#17537328) Homepage
      
      There is a manual in OpenOffice format [sourceforge.net] (yeah, I really AM a PITA). It was benchmarked heavily in the late 90's, internally at an EDA company before deciding to use the integer-based object references. All programs (including a placer and a router) sped up, and the range was from 10% to 20%, depending on the tool. The average was about 15%. Improvements were more pronounced in tools with larger amounts of data, which we felt was due to cache effects. It would be nice to redo the benchmarks, with open-source programs, but it takes a TON of work. You basically have to take a program that is already well optimized by hand, and convert it to use data in a DataDraw database rather than custom C structures. However, the trend has been that cache effects have become even more important, so I expect the benchmarks to be even better next time.
      
      Parent Share
      twitter facebook
  - - Re:Noticed how roll your own is faster? (Score:2)
      
      by smilindog2000 ( 907665 ) writes: <bill@billrocks.org> on Wednesday January 10, 2007 @06:07AM (#17536854) Homepage
      
      There was a C++ wrapper written for the old (and more stable) version of DataDraw. However, we really wanted it to generated C++ classes, and that turned into a mess. It required DataDraw to parse your C++ header files and insert code into your manually written classes. Microsoft pulled it off with their Class Wizard, but we just didn't have the bandwidth to finish it.
      
      Parent Share
      twitter facebook
- Re:Noticed how roll your own is faster? (Score:2)
  
  by sonofagunn ( 659927 ) writes: on Wednesday January 10, 2007 @10:31AM (#17538942)
  
  In our company, we use the database mostly as a warehouse. Our daily processing is done via flat files and Java code. It's just much, much, much faster that way and easier to maintain. I think we're kind of a special case though.
  
  Parent Share
  twitter facebook
  - Re:Noticed how roll your own is faster? (Score:2)
    
    by suggsjc ( 726146 ) writes: on Wednesday January 10, 2007 @11:51AM (#17540184) Homepage
    
    I think we're kind of a special case though.
    Yep, you are special...just like everyone else.
    
    On a side note. I know the term flat files can mean different things to different people, but I find that they are almost always a bad idea (to some degree and depending on your definition). You always run the risk of whatever you are using as delimiters coming up in the data you are parsing giving those "bugs." You always think "we sanatize our data..." and it will never happen to me, but more times than not, it will.
    
    Parent Share
    twitter facebook
- Re:Noticed how roll your own is faster? (Score:3, Interesting)
  
  by fingusernames ( 695699 ) writes: on Wednesday January 10, 2007 @01:21PM (#17541934) Homepage
  
  Back in the late 90s, I worked on a data warehouse project. We tried Oracle, and had an Oracle tuning expert work with us. However, we couldn't get the performance we needed. We wound up developing a custom "database" system, where data was extracted from the source databases (billing, CDRs, etc.) and de-normalized into several large tables in parallel. The de-normalization performed global transformations and corrections. Those tables were then loaded into shared memory (64bit HP multi-CPU system with a huge amount of RAM for those days, 32GB IIRC), indices were built, and a highly optimized algorithm (over time it kept getting tighter and smaller) was used to join the data based on various criteria using standard, left, right and some hybrid methods. The join algorithm operated on pointers to tables of pointers. Initially, developers used a PERL script to pre-process simple pseudo-SQL into C code/macros, that would be linked to their report application. As the project grew, I developed a SQL-derived language that was run through a cross-compiler to generate the C code and macros to link to applications. That language supported joins, views, temporary tables, and some other useful features that enabled developers to work quickly in implementing report requests. The system was very fast for our purposes, performing fraud analysis and sales trends analysis nightly. In parallel to that analysis on a different server, the de-normalized data was also exported to a Redbrick database so users could perform desktop reporting over historical data. I was the overall technical architect for system, and the developer of the joining system and the SQL-like language and compiling/development tools. I'm sure that today though there are data warehouse specific tools that would eliminate most of that.
  
  Larry
  
  Parent Share
  twitter facebook
Prediction... (Score:5, Insightful)

by Ingolfke ( 515826 ) writes: on Tuesday January 09, 2007 @11:00PM (#17534140) Journal

1) More and more specialized databases will begin cropping up.
2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.
3) Someone will write a paper about how we've gone from specialized to monolithic...
4) Something else will trigger specialization... (repeat)

Dvorak if you steal this one from me I'm going to stop reading your writing... oh wait.

Share
twitter facebook
- Re:Prediction... (Score:4, Interesting)
  
  by Tablizer ( 95088 ) writes: on Tuesday January 09, 2007 @11:47PM (#17534506) Journal
  
  2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.
  
  I agree with this prediction. Database interfaces (such as SQL) do not dictate implimentation. Ideally, query languages only ask for what you want, not tell the computer how to do it. As long as it returns the expected results, it does not matter if the database engine uses pointers, hashes, or gerbiles to get the answer. It may however require "hints" in the schema about what to optimize. Of course, you will sacrifice general-purpose performance to speed up a specific usage pattern. But at least they will give you the option.
  
  It is somewhat similar to what "clustered indexes" do in some RDBMS. Clusters improve the indexing by a chosen key at the expense of other keys or certain write patterns by physically grouping the data by that *one* chosen index/key order. The other keys still work, just not as fast.
  
  Parent Share
  twitter facebook
  - Re:Prediction... (Score:3, Interesting)
    
    by Pseudonym ( 62607 ) writes: on Wednesday January 10, 2007 @01:56AM (#17535468)
    
    Interfaces like SQL don't dictate the implementation, but they do dictate the model. Sometimes, the model that you want is so far from the interface language, that you need to either extend or replace the interface language for the problem to be tractable.
    
    SQL's approach has been to evolve. It isn't quite "there" for a lot of modern applications. I can forsee a day when SQL can efficiently model all the capabilities of, say, Z39.50, but we're not there now.
    
    Parent Share
    twitter facebook
    - - Re:Prediction... (Score:2)
        
        by dkf ( 304284 ) writes: <donal.k.fellows@manchester.ac.uk> on Wednesday January 10, 2007 @05:24AM (#17536608) Homepage
        
        SQL and text searching? Check out the FTS1 module [sqlite.org] for SQLite [sqlite.org]...
        
        Parent Share
        twitter facebook
        
        Re:Prediction... (Score:3, Insightful)
        
        by Pseudonym ( 62607 ) writes: on Wednesday January 10, 2007 @07:17PM (#17548346)
        
        Z39.50 is actually much, much more than mere "text searching". If you think hard about the way that you interact with a library catalogue or Google compared with how you interact with a RDBMS, you'll realise there are quite a few more differences than just "text searching".
        
        Think about highly heterogeneous data. Libraries, for example, might index books, periodicals, audio-visual items and online resources such as journals. Google indexes web pages, Usenet news articles, PDF documents and so on. And you can search them all by "title".
        
        Think about "result sets" instead of sequences of tuples. When you search google, or a library catalogue, what you get is a bunch of summary information which you page through, then eventually retrieve the record that you want. Or you might refine your query by adding new search terms or sorting your results by some key. The key data structure here is the "result set": a sequence of record numbers. Everything happens to result sets. You sort your results by state, or intersect the set with another query. The whole process is record-oriented. SQL, on the other hand, is data-oriented: the central data structure is a sequence of tuples, and tuples contain real data.
        
        I hear you objecting that there are ways to do this in SQL, and you'd be right. But in this kind of application, it's always going to be at the expense of a lot more time (more processing grunt required, or less opportunity to exploit disk locality) or much more disk space, if only because of the extra indirection required. If you have terabytes of information, this bites, and bites hard. You wouldn't use Google or your library catalogue if it were ten times slower.
        
        SQL is optimised for the case where data is "right there". Z39.50 is optimised for the case where accessing real data is expensive, because it might involve parsing XML or PDF. People complain about how supposedly inefficient XML data is, but the fact is, there's no better way to do text with structure. The real problems are a) people use XML for things that aren't structured text, and b) relational databases can't handle it with reasonable efficiency at the moment.
        
        Yes, I know, SQL will eventually be able to handle things like this. But it's not there yet.
        
        Parent Share
        twitter facebook
  - Re:Prediction... (Score:3, Informative)
    
    by Decaff ( 42676 ) writes: on Wednesday January 10, 2007 @08:04AM (#17537476)
    
    I agree with this prediction. Database interfaces (such as SQL) do not dictate implimentation. Ideally, query languages only ask for what you want, not tell the computer how to do it.
    
    This can be taken a stage further, with general persistence APIs. The idea is that you don't even require SQL or relational stores: you express queries in a more abstract way and let a persistence engine generate highly optimised SQL, or some other persistence process. I use the Java JDO 2.0 API like this: I can persist and retrieve information from relational stores, CSV, XML, LDAP, Object Databases or even flat text files using exactly the same code and queries, and yet I get optimised queries on each - if I persist to Oracle, the product knows enough about Oracle (and even the specific version of Oracle) to generate very otimised SQL.
    
    Parent Share
    twitter facebook
- Re:Prediction... (Score:2)
  
  by theshowmecanuck ( 703852 ) writes: on Wednesday January 10, 2007 @01:33AM (#17535328) Journal
  
  The reasons for this "one size fits all" (OSFA) strategy include the following:
  Engineering costs...
  Sales costs...
  Marketing costs...
  
  What about the cost of maintenance for the customer?
  
  Maybe people will keep buying 'one size fits all' DBMSs if they meet enough of their requirements and they don't have to hire specialists for each type of databases they might have for each type of application. That is, it is easier and cheaper to maintain a smaller number of *standard* architectures (e.g. one) for a company. Otherwise you have to pay for all sorts of different types of specialists. Now if your company only does say, data warehousing, then that is another matter and it is smart to purchase a specialized system. Or if you are a mega corporation you might be able to afford to have a number of specialist teams for each type of system. But I think smaller shops might need to make do with the poor old vanilla DBMS.
  
  Parent Share
  twitter facebook
one size fits 90% (Score:5, Insightful)

by JanneM ( 7445 ) writes: on Tuesday January 09, 2007 @11:03PM (#17534158) Homepage

It's natural to look at the edges of any feature or performance envelope. People that want to store petabytes of particle accellerator data, do complex queries to serve a million webpages a second, have hundreds of thousands of employees doing concurrent things to the backend.

But for most uses of databases - or any back-end processing - performance just isn't a factor and haven't been for years. Enron may have needed a huge data warehouse system; "Icepick Johhny's Bail Bonds and Securities Management" does not. Amazon needs the cutting edge in customer management; "Betty's Healing Crystals Online Shop (Now With 30% More Karma!)" not so much.

For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.

Share
twitter facebook
- Re:one size fits 90% (Score:2)
  
  by smilindog2000 ( 907665 ) writes: <bill@billrocks.org> on Tuesday January 09, 2007 @11:22PM (#17534296) Homepage
  
  This is more true all the time. I work in the EDA industry, in chip design. The databases sizes I work with are naturally well correlated with More's Law. In effect, I'm a permanent power user, but my circle of peers is shrinking into oblivion...
  
  Parent Share
  twitter facebook
- Re:one size fits 90% (Score:2, Insightful)
  
  by TubeSteak ( 669689 ) writes: on Wednesday January 10, 2007 @03:23AM (#17535984) Journal
  
  For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.
  I'm willing to concede that...
  But IMO it is not 100% relevant.
  
  Large corporate customers usually have a large effect on what features show up in the next version of [software]. Software companies put a lot of time & effort into pleasing their large accounts.
  
  And since performance isn't a factor for the majority of users, they won't really be affected by any performance losses resulting from increased specialization/optimizations. Right?
  
  Parent Share
  twitter facebook
- New hat same as the old hat (Score:2)
  
  by Bacon Bits ( 926911 ) writes: on Wednesday January 10, 2007 @10:41AM (#17539082)
  
  The same argument is what gave rise to re-programmable generic processing components: CPUs. You'll note that the processor industry today (AMD in particular) is now also moving towards this kind of diversification. Gaming systems have been using dedicated GPUs for ages (today they're more powerful than entire PCs from 5 years ago) and I'm sure we remember back when math co-processors (i387) were introduced. You'll note that math co-processors were just absorbed back into the generic model.
  
  It's another pendulum in the computing world (much like the serial/parallel dichotomy). Moving from a disparate number of diverse systems to a small number of all-purpose systems. The advances are always for performance, and they typically happen when the current generation plateaus. We've mastered the concepts of one generation, time to explore new concepts (by re-exploring old concepts).
  
  In 10 or 15 years people will be complaining about the difficulty of data portability, the esoteric nature of these unique data files, and the lack of features in area X in one product and area Y in a second product, and the archaic languages you have to use on these old, unsupported systems. There will be a move back to generic storage engines, bringing with it the lessons learned from that round of insight.
  
  Of course, there will always be demands for specialized components just as there will always be demand for generic, standard components. It's the centrists whose demands are for the best combination of performance and features that determine popularity.
  
  Parent Share
  twitter facebook
- Re:one size fits 90% (Score:2)
  
  by Bozdune ( 68800 ) writes: on Wednesday January 10, 2007 @10:55AM (#17539264)
  
  Then why do we need specialized OLAP systems like Essbase, Kx Systems, etc.? So much for OSFA (one size fits all). Any transaction-oriented database of sufficient size, requiring multi-way joins between tables, and requiring sub-second response times to queries, is way out of range of OSFA. Furthermore, it doesn't require petabytes to take a relational database system to its knees. Just a few million transactions, and your DBMS will be on its back waving its arms feebly, along with your server.
  
  Performance IS a factor, a very serious factor indeed, for many applications. Not for Betty or for Icepick Johnny, to be sure; but for almost any business with more than about $200M in sales, I guarantee there's a dataset kicking around that will require specialized tools to analyze properly. Since those specialized tools are typically expensive, and typically difficult to use, that dataset will not get analyzed properly, and the business will be "running blind."
  
  Parent Share
  twitter facebook
Imagine that.... (Score:5, Insightful)

by NerveGas ( 168686 ) writes: on Tuesday January 09, 2007 @11:09PM (#17534210)

... a database mechanism particularly written for the task at hand will beat a generic one. Who would have thought?

steve

(+1 Sarcastic)

Share
twitter facebook
- Why imagine, just read ;-) (Score:2)
  
  by shis-ka-bob ( 595298 ) writes: on Wednesday January 10, 2007 @09:17AM (#17538004)
  
  There is an article, and it has many references. How is a 'Captain Obvious' sort of comment labeled Insightful? The insightful part is in the article. The first author, Michael Stonebraker, architected Ingres and Postgres. He looked at OLAP databases, which is a market that is much larger than a special case. He proposed storing the data in columns rather than in rows. He tested this, it works. In fact it works so well that he can clobber a $300,000 server cluster with a $800 dollar PC. I know that I would be pretty happy to spend a year porting to his database if I could pocket half of that annual hardware cost savings. The savings in electricty would be enough to pay for several pretty serious Starbucks addictions. His key insight seems to be that he can vastly improve OLAP performance by storing the data in columns rather than in rows. This change could be quite transparent to the end users & developers, except for the massive speed-up and cost savings, of course. This paper describes a general solution for a common problem. Stonebraker has developed Vertica [vertica.com], which is still support ad-hoc querries in SQL. This seems like a pretty general purpose solution for OLAP.
  
  Parent Share
  twitter facebook
Dammit (Score:5, Insightful)

by AKAImBatman ( 238306 ) * writes: <akaimbatman@noSPAm.gmail.com> on Tuesday January 09, 2007 @11:15PM (#17534238) Homepage Journal

I was just thinking about writing an article on the same issue.

The problem I've noticed is that too many applications are becoming specialized in ways that are not handled well by traditional databases. The key example of this is forum software. Truly heirarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system. Yet we keep trying to cram them into SQL databases, then get surprised when we're hit with performance problems and security issues. It's simply the wrong way to go about solving the problem.

As anyone with a compsci degree or equivalent experience can tell you, creating a custom database is not that hard. In the past it made sense to go with off-the-shelf databases because they were more flexible and robust. But now that modern technology is causing us to fight with the databases just to get the job done, the time saved from generic databases is starting to look like a wash. We might as well go back to custom databases (or database platforms like BerkeleyDB) for these specialized needs.

Share
twitter facebook
- Re:Dammit (Score:3, Funny)
  
  by Jason Earl ( 1894 ) writes: on Wednesday January 10, 2007 @12:36AM (#17534894) Homepage Journal
  
  Eventually the folks working on web forums will realize that they are just recreating NNTP and move on to something else.
  
  Parent Share
  twitter facebook
- Re:Dammit (Score:2)
  
  by Jerf ( 17166 ) writes: on Wednesday January 10, 2007 @12:56AM (#17535050) Journal
  
  Truly heirarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system.
  Actually, I was bitching about this very problem [jerf.org] (and some others) recently, when I came upon this article about recursive queries [teradata.com] on the programming reddit [reddit.com].
  
  Recursive queries would totally, completely solve the "hierarchy" part of the problem, and halfway decent database design would handle the rest.
  
  My theory is that nobody realizes that recursive queries would solve their problems, so nobody asks for them, so nobody ever discovers them, so nobody ever realizes that recursive queries would solve their problem. I don't know of an open source DB that has this, and I'd certainly never seen this in my many years of working with SQL. I wish we did have it, it would solve so many of my problems.
  
  Now, if we could just deal with the problem of having a key that could relate to any one of several tables in some reasonable way... that's the other problem I keep hitting over and over again.
  
  Parent Share
  twitter facebook
  - Re:Dammit (Score:2)
    
    by a_ghostwheel ( 699776 ) writes: on Wednesday January 10, 2007 @03:32AM (#17536044)
    
    Or just use hierarchical queries - like START WITH / CONNECT BY clauses in Oracle. Probably other vendors have something similar too - not sure about that.
    
    Parent Share
    twitter facebook
  - Re:Dammit (Score:2)
    
    by Imsdal ( 930595 ) writes: on Wednesday January 10, 2007 @05:02AM (#17536506)
    
    My theory is that nobody realizes that recursive queries would solve their problems, so nobody asks for them, so nobody ever discovers them, so nobody ever realizes that recursive queries would solve their problem.
    
    It used to be that execution plans in Oracle were retreived from the plan table via a recursive query. Since even the tiniest application will need a minimum amount of tuning, and since all db tuning should start by looking at the execution plans, everyone should have run into recursive queries sooner rather than later.
    
    My theory is instead that too few developers are properly trained. They simply don't know what they are doing or how it should be done. During my years as a consultant, I spent a lot of time improving db performance, and never even once did I run into in-house people who even knew what en execution plan was, let alone how to interpret it. (And, to be honest, not all of my consultant colleagues knew either...)
    
    Software development is a job that requires the training of a surgeon, but it's staffed by people who are trained to be janitors or, worse, economists. (I realise that this isn't true at all for the /. crowd. I'm talking about all the others all of us has run into on every job we have had.)
    
    Parent Share
    twitter facebook
  - - Re:Dammit (Score:2)
      
      by Jerf ( 17166 ) writes: on Wednesday January 10, 2007 @12:29PM (#17540868) Journal
      
      In other words, distributed foreign keys.
      Ah, thank you for the name. It's difficult to find the name of something based on its description. I tried before posting but just got a lot of stuff about how to use foreign keys.
      
      Parent Share
      twitter facebook
- Re:Dammit (Score:2)
  
  by poot_rootbeer ( 188613 ) writes: on Wednesday January 10, 2007 @12:31PM (#17540900)
  
  The key example of this is forum software. Truly heirarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system.
  
  Hierarchichal? Yes, but I don't see any problem using SQL to access hierarchical information. It's easy to have parent/child relationships.
  
  Data of varying sizes? I thought this problem was solved 20 years ago when ANSI adopted a SQL standard including a VARCHAR datatype.
  
  Full of binary blobs? Why? What in the hell for? So that each user can have an obnormous enoxious "signature banner" graphic that readers have to look at 20 times in any given thread?
  
  There's very little data that belongs in a forum interface that can't be represented in plaintext. For the rest, store it on the filesystem and just store a reference to it in the database.
  
  As anyone with a compsci degree or equivalent experience can tell you, creating a custom database is not that hard.
  
  And as anyone who has ever done software development in the real world can tell you, custom components almost always suck worse than similar standard components.
  
  Parent Share
  twitter facebook
- - Re:Dammit (Score:3, Insightful)
    
    by AKAImBatman ( 238306 ) * writes: <akaimbatman@noSPAm.gmail.com> on Wednesday January 10, 2007 @12:30AM (#17534868) Homepage Journal
    
    But is this actually happening? Has slashdot had to abandon general-purpose RDBMS?
    
    I wasn't referring to Slashdot in particular, but rather general web forum software. Your PhpBB, vBulletins, and JForums of the world are more along the lines of what I'm referring to. After dealing with the frustrations of setting up, managing, and hacking projects like these, I've come to the conclusion that the backend datastore is the problem. The relational theories still hold true, but the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.
    
    That being said, Slashdot is a fairly good example of how they've worked around the limitations of their backend database at a cost equalling or far exceeding the cost of building a customized data store. A costly venture that bit them in the rear [slashdot.org] when they reached their maximum post count.
    
    Not that I'm criticizing Slashcode. Hindsight is 20/20. It's just becoming more and more apparent that for some applications the cost of using an off-the-shelf database has become greater than the cost of building a custom datastore.
    
    Parent Share
    twitter facebook
    - Re:Dammit (Score:2, Insightful)
      
      by Tablizer ( 95088 ) writes: on Wednesday January 10, 2007 @12:46AM (#17534984) Journal
      
      The relational theories still hold true, but the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.
      
      That is very true. They haven't seemed to have perfected the performance handling of highly variable "cells".
      
      That being said, Slashdot is a fairly good example of how they've worked around the limitations of their backend database at a cost equalling or far exceeding the cost of building a customized data store. A costly venture that bit them in the rear
      
      Last night we crossed over 16,777,216 comments in the database. The wise amongst you might note that this number is 2^24, or in MySQLese an unsigned mediumint. Unfortunately, like 5 years ago we changed our primary keys in the comment table to unsigned int (32 bits, or 4.1 billion) but neglected to change the index that handles parents.
      
      It would be nice if more RDBMS offered flexible integers such that you didn't have to pick a size up front. Fixed sizes (small-int,int,long) are from the era where variable-sized column calculations were too expensive CPU-wise. Since then CPU is cheap compared to "pipeline" issues such that variable columns are just as efficient as fixed ones, but only take the space they need.
      
      But it would not have been hard for slashdot to use a big integer up-front. They chose to be stingy and made a gamble, it was not forced on them. It may have cost a few cents more early, but would have prevented that disaster. Plus, bleep happens no matter what technology you use. I am sure dedicated-purpose databases have their own gotcha's and trade-off decision points. Being dedicated probably means they are less road-tested also.
      
      Parent Share
      twitter facebook
    - Re:Dammit (Score:2)
      
      by dcam ( 615646 ) writes: <david&uberconcept,com> on Wednesday January 10, 2007 @02:29AM (#17535670) Homepage
      
      ...the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.
      
      This is extremely true.
      
      I work on a web application that stores a lot of documents (on of our clients stores +50Gb). The database back end is SQL Server (yeah I know). When it was designed (~8 years ago) we decided to store the documents in the filesystem and store the paths in the database. This was largely for performance reasons, although some other considerations were the size of database backups and general db management. It was anticipated that in the future we would moce the documents into the db when performance improved sufficiently. It hasn't.
      
      According to Inside SQL Server 2000 [microsoft.com], all data in SQL server is stored on 8K pages in B trees. BLOBs and CLOBs are broken up into 8K chunks. Performance on reading and writing this data is obviously not fantastic, particularly when you have largish files (we have files that are +100Mb, average size of files would be ~2Mb). In addition the tools in SQL server for adding and retrieving BLOBs are a major headache.
      
      SQL Server is not designed for BLOBs. I can't comment on other relational databases, but I suspect that they would suffer similar issues.
      
      Parent Share
      twitter facebook
      - Re:Dammit (Score:2)
        
        by CaymanIslandCarpedie ( 868408 ) writes: on Wednesday January 10, 2007 @07:55AM (#17537442) Journal
        
        The handling of BLOBS isn't really all that bad anymore in many RDMS. Now certinaly speciallized DBs for this could do better I'm sure, but the old maxims about "never store BLOBS in a DB" don't really hold anymore. Since you mention SQL Server, consider that SharePoint Server uses SQL Server as its data store. We have an install of SharePoint with roughly 150GB of documents and scanned archival PDFs with over 100 users accessing those documents pretty much continuously (not all 100 users but it is very activly used). The performance of opening documents in SharePoint are not (noticeably to a human at least) any slower than opening the documents from a network file share.
        
        Now obviously MS probably had some top of the line DBAs tuning this to get that type of performance, but it doesn't seem that BLOBs are a direct limitation in SQL Server any more as much as limitation of the DBAs trying to get the performance out of the system perhaps if others are still having issues with this.
        
        That being said, our current application is only dealing with roughly 100 users on the local LAN. In the next 6 months we will be testing exposing this on the internet to 10s of thousands of users. We'll see how if it still holds up ;-)
        
        Parent Share
        twitter facebook
      - Re:Dammit (Score:2)
        
        by dcam ( 615646 ) writes: <david&uberconcept,com> on Wednesday January 10, 2007 @03:24AM (#17535988) Homepage
        
        Comments like this one [thescripts.com] would suggest that others have different experience. I've been hunting for details on the storage mechanisms of pgsql to try to work out whether it might be faster but no luck so far.
        
        Parent Share
        twitter facebook
Duh (Score:5, Insightful)

by Reality Master 101 ( 179095 ) writes: <RealityMaster101 ... m minus caffeine> on Tuesday January 09, 2007 @11:18PM (#17534258) Homepage Journal

Who thinks that a specialized application (or algorithm) won't beat a generalized one in just about every case?

The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly -- programmer time is expensive, and hardware is cheap.

See also: high level compiled languages versus assembly language*.

(*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.)

Share
twitter facebook
- Re:Duh (Score:5, Informative)
  
  by Waffle Iron ( 339739 ) writes: on Tuesday January 09, 2007 @11:43PM (#17534468)
  
  *and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.
  
  I've programmed extensively in assembly. Your statement may be true up to a couple of thousand lines of code. Past that, to avoid going insane, you'll start using things like assembler macros and your own prefab libraries of general-purpose assembler functions. Once that happens, a compiler that can tirelessly do global optimizations is probably going to beat you hands down.
  
  Parent Share
  twitter facebook
  - Re:Duh (Score:5, Insightful)
    
    by wcbarksdale ( 621327 ) writes: on Wednesday January 10, 2007 @12:07AM (#17534670)
    
    Also, to successfully hand-optimize you need to remember a lot of details about instruction pipelines, caches, and so on, which is fairly detrimental to remembering what your program is supposed to do.
    
    Parent Share
    twitter facebook
  - Re:Duh (Score:2)
    
    by Pseudonym ( 62607 ) writes: on Wednesday January 10, 2007 @01:50AM (#17535424)
    
    The reason why assembly programmers can beat high-level programmers is they can write their code in a high-level language first, then profile to see where the hotspots are, and then rewrite a 100 line subroutine or two in assembly language, using the compiler output as a first draft.
    
    In other words, assembly programmers beat high-level programmers because they can also use modern compilers.
    
    Parent Share
    twitter facebook
  - Re:Duh (Score:3, Insightful)
    
    by RAMMS+EIN ( 578166 ) writes: on Wednesday January 10, 2007 @07:07AM (#17537176) Homepage Journal
    
    Also, the compiler may know more CPUs than you do. For example, do you know the pairing rules for instructions on an original Pentium? The differences one must pay attention to when optimizing for an Thoroughbred Athlon vs. a Prescott P4 vs. a Yonah Pentium-M vs. a VIA Nehemiah? GCC does a pretty good job of generating optimized assembly code for each of these from the same C source code. If you were to do the same in assembly, you would have to write separate code for each CPU, and know the subtle differences as well as the compiler does.
    
    Parent Share
    twitter facebook
  - - Re:Duh (Score:2)
      
      by Waffle Iron ( 339739 ) writes: on Wednesday January 10, 2007 @01:08AM (#17535136)
      
      Even using assembler macros and prefab libraries of general-purpose assembler functions you're generally no worse off than the compiler.
      
      I don't know how true that is, given that assembler macros and fixed assembler APIs won't be particularly good at inlining calls and then integrating the optimizations of the inlined code with the particular facets of the surrounding code for each expansion.
      
      Parent Share
      twitter facebook
- Re:Duh (Score:2)
  
  by smilindog2000 ( 907665 ) writes: <bill@billrocks.org> on Tuesday January 09, 2007 @11:55PM (#17534558) Homepage
  
  I've never heard the "magic compiler myth" phrase, but I'll help educate others about it. It's refreshing to hear someone who understands reality. Of course, a factor of 2 to 4 improvement in speed is less and less important every day...
  
  Parent Share
  twitter facebook
- Re:Duh (Score:3, Interesting)
  
  by suv4x4 ( 956391 ) writes: on Tuesday January 09, 2007 @11:57PM (#17534580)
  
  "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.
  
  Only people who haven't seen recent advancements in CPU design and compiler architecture will say what you just said.
  
  Modenr compilers apply optimizations on a so sophisticated level that would be a nightmare for a human to support such a solution optimized.
  
  As an example, modern Intel processors can process certain "simple" commands in parallel and other commands are broken apart into simpler commands, processed serially. I'm simplifying the explanation a great deal, but anyone who read about how a modern CPU works, branch prediction algorithms and so on is familiar with the concept.
  
  Of course "they can beat human written assembly code in just about every case" is an overstatement, but still, you gotta know there's some sound logic & real reasons behind this "myth".
  
  Parent Share
  twitter facebook
  - Re:Duh (Score:2, Insightful)
    
    by mparker762 ( 315146 ) writes: on Wednesday January 10, 2007 @12:20AM (#17534776) Homepage
    
    Only someone who hasn't recently replaced some critical C code with assembler and gotten substantial improvement would say that. This was MSVC 2003 which isn't the smartest C compiler out there, but not a bad one for the architecture. Still, a few hours with the assembler and a few more hours doing some timings to help fine-tune things improved the CPU performance of this particular service by about 8%.
    
    Humans have been writing optimized assembler for decades, the compilers are still trying to catch up. Modern hand-written assembler isn't necessarily any trickier or more clever than the old stuff (it's actually a bit simpler). Yes compilers are using complicated and advanced techniques, but it's still all an attempt to approximate what humans do easily and intuitively. Artificial intelligence programs use complicated and advanced techniques too, but no one would claim that this suddenly makes philosophy any harder.
    
    Your second point about the sophistication of the CPU's is true but orthogonal to the original claim. These sophisticated CPU's don't know who wrote the machine code, they do parallel execution and branch prediction and so forth on hand-optimized assembly just like they do on compiler-generated code. Which is one reason (along with extra registers and less segment BS) that it's easier to write and maintain assembler nowadays, even well-optimized assembler.
    
    Parent Share
    twitter facebook
    - Re:Duh (Score:2, Insightful)
      
      by suv4x4 ( 956391 ) writes: on Wednesday January 10, 2007 @02:20AM (#17535618)
      
      This was MSVC 2003 which isn't the smartest C compiler out there, but not a bad one for the architecture. Still, a few hours with the assembler and a few more hours doing some timings to help fine-tune things improved the CPU performance of this particular service by about 8%... These sophisticated CPU's don't know who wrote the machine code, they do parallel execution and branch prediction and so forth on hand-optimized assembly just like they do on compiler-generated code. Which is one reason (along with extra registers and less segment BS) that it's easier to write and maintain assembler nowadays, even well-optimized assembler.
      
      Do you know which types of commands when ordered in quadruples will execute at once on a Core Duo? Incidentally those that won't on a Pentium 4.
      
      I hope you're happy with your 8% improvement, enjoy it until your next CPU upgrade that requires different approach to assembly optimization.
      
      The advantage of a compiler is that compiling for a target CPU is a matter of a compiler switch, so compiler programmers can concentrate on performance and smart use of the CPU specifics, and you can concentrate on your program features.
      
      If you were that concerned about performance in first place, you'd use a compiler provided by the processor vendor (Intel I presume) and use the intel libraries for processor specific implementations of common math and algorithm issues needed in applications.
      
      Most likely this would've given you more than 8% boost and still keep your code somewhat less bound to a specific CPU, than with assembler.
      
      An example of "optimization surprise" i like, is the removal of the barrel shifter in Pentium 4 CPU-s. You see, lots of programmers know that it's faster (on most platforms) to bit shift, and not multiply by 2, 4, 8, etc (or divide).
      
      But bit shifting on P4 is handled by the ALU, and is slightly slower than multiplication (why, I don't know, but it's a fact). Code "optimized" for bit shifting would be "antioptimized" on P4 processors.
      
      I know some people adapted their performance critical code to meet this new challenge. But then what? P4 is obsolete and instead we're back to the P3 derived architecture, and the barrel shifter is back!
      
      When I code a huge and complex system, I'd rather buy a 8% faster machine and use a better compiler than have to manage this hell each time a CPU comes out.
      
      Parent Share
      twitter facebook
  - Re:Duh (Score:4, Insightful)
    
    by try_anything ( 880404 ) writes: on Wednesday January 10, 2007 @02:59AM (#17535854)
    
    Modenr compilers apply optimizations on a so sophisticated level that would be a nightmare for a human to support such a solution optimized.
    
    There are three quite simple things that humans can do that aren't commonly available in compilers.
    First, a human gets to start with the compiler output and work from there :-) He can even compare the output of several compilers.
    Second, a human can experiment and discover things accidentally. I recently compiled some trivial for loops to demonstrate that array bounds checking doesn't have a catastrophic effect on performance. With the optimizer cranked up, the loop containing a bounds check was faster than the loop with the bounds check removed. That did not inspire confidence.
    Third, a human can concentrate his effort for hours or days on a single section of code that profiling revealed to be critical and test it using real data. Now, I know JIT compilers and some specialized compilers can do this stuff, but as far as I know I can't tell gcc, "Compile this object file, and make the foo function as fast as possible. Here's some data to test it with. Let me know on Friday how far you got, and don't throw away your notes, because we might need further improvements."
    I hope I'm wrong about my third point (please please please) so feel free to post links proving me wrong. You'll make me dance for joy, because I do NOT have time to write assembly, but I have a nice fast machine here that is usually idle overnight.
    
    Parent Share
    twitter facebook
    - Re:Duh (Score:2)
      
      by ciggieposeur ( 715798 ) writes: on Wednesday January 10, 2007 @11:42AM (#17540002)
      
      With the optimizer cranked up, the loop containing a bounds check was faster than the loop with the bounds check removed.
      
      That actually makes sense to me. If your bounds check was very simple and the only loop outcome was breaking out (throw an exception, exit the loop, exit the function, etc., without altering the loop index), the optimizer could move it out of the loop entirely and alter the loop index check to incorporate the effect of the bounds check. Result is a one-time bounds check before entering the loop and a simplified loop, hence faster execution.
      
      I remember in the discussion on the D compiler someone pointed this out.
      
      Parent Share
      twitter facebook
- Re:Duh (Score:2, Insightful)
  
  by kfg ( 145172 ) writes: on Wednesday January 10, 2007 @12:00AM (#17534606)
  
  The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly. . .
  
  . . .has some level of definable and gauranteed data integrity.
  
  KFG
  
  Parent Share
  twitter facebook
- I thought I was an assembler demon (Score:2)
  
  by ratboy666 ( 104074 ) writes: <fred_weigel.hotmail@com> on Wednesday January 10, 2007 @12:01AM (#17534622) Journal
  
  I had a "simple" optimization project. It came down to one critical function (ISO JBIG compression). I coded the thing by hand in assembler, carefully manually scheduling instructions. It took me days. Managed to beat GNU gcc 2 and 3 by a reasonable margin. The latest Microsoft C compiler? Blew me away. I looked at the assembler it produced -- and I don't get where the gain is coming from. The compiler understands the machine better than I do.
  
  Go figure -- I hung up my assembler badge. Still a useful skill for looking at core dumps, though. And for dealing with micro-controllers.
  
  So, have you had at it and benchmarked your assembler vs. a compilers?
  
  Parent Share
  twitter facebook
  - Re:I thought I was an assembler demon (Score:2)
    
    by Reality Master 101 ( 179095 ) writes: <RealityMaster101 ... m minus caffeine> on Wednesday January 10, 2007 @01:37AM (#17535350) Homepage Journal
    
    I looked at the assembler it produced -- and I don't get where the gain is coming from. The compiler understands the machine better than I do.
    
    All that proves is that the compiler knew a trick you didn't (probably it understood which instructions will go into which pipelines and will parallelize). I bet if you took the time to learn more about the architecture, you could find ways to be even more clever.
    
    I'm not arguing for a return to assembly... it's definitely too much of a hassle these days, and again, hardware is cheap, and programmers are expensive. Just that given enough programmer time, humans can nearly always do better than the compiler, which shouldn't be surprising since humans programmed the compiler, and humans have more contextual knowledge of what a program is trying to accomplish.
    
    Parent Share
    twitter facebook
  - Re:I thought I was an assembler demon (Score:2)
    
    by TheLink ( 130905 ) writes: on Wednesday January 10, 2007 @06:44AM (#17537074) Journal
    
    "The compiler understands the machine better than I do."
    
    Actually the people paid lots of money to write Microsoft's C compiler understand the machine better than you do. I doubt you should be surprised.
    
    And the compiler will hopefully be able to keep all the tricks in mind (a human might forget to use one in some cases).
    
    I'm just waiting/hoping for the really smart people to make stuff like perl and python faster.
    
    Java has improved in speed a lot and already is quite fast in some cases, but I don't consider it a high level language (given the amount of code people have to write just to do simple stuff).
    
    Parent Share
    twitter facebook
- - Re:Duh (Score:2)
    
    by Electrum ( 94638 ) writes: <david@acz.org> on Wednesday January 10, 2007 @01:52AM (#17535444) Homepage
    
    No one trusts automatic C++ code generators. So why do we trust automatic SQL code generators?
    
    We don't. That's why we have explain plans and hints.
    
    Parent Share
    twitter facebook
Parallel databases (Score:2)

by meta-monkey ( 321000 ) writes: on Tuesday January 09, 2007 @11:20PM (#17534288) Journal

This reminds me of the parallel databases class I took in college. Sure, specialized parallel databases (not distributed, mind you, parallel) using specialized hardware were definitely faster than the standard SQL-type relational databases...but so what? The costs were so much higher they were not feasible for most applications.

Specialized software and hardware outperforms generic implementations! Film at 11!

Share
twitter facebook
SQL is Dead - Long Live SQL (Score:2)

by Doc Ruby ( 173196 ) writes: on Tuesday January 09, 2007 @11:45PM (#17534488) Homepage Journal

SW platform development always features a tradeoff between general purpose APIs and optimized performance engines. Databases are like this. The economic advantages for everyone in using an API even as awkward and somewhat inconsistent as SQL are more valuable than the lost performance in the fundamental relational/query model.

But it doesn't have to be that way. SQL can be retained as an API, but different storage/query engines can be run under the hood to better fit different storage/query models for different kinds of data/access. A better way out would be a successor to SQL that is more like a procedural language for objects with all operators/functions implicitly working on collections like tables. Yes, something like object lisp, best organized as a dataflow with triggers and events. So long as SQL can be automatically compiled into the new language, and back, for at least 5 years of peaceful coexistence.

Share
twitter facebook
- SQL is part of the problem. (Score:2)
  
  by sonofagunn ( 659927 ) writes: on Wednesday January 10, 2007 @11:49AM (#17540146)
  
  Databases already have the ability to change storage engines as long as they support SQL. The reason my company shuns the database for many specific tasks is that SQL is ill-suited to perform many types of transformations, calculations, and aggregations on data. What may take many pages of SQL (and many temp tables) in a stored proc can be written in a simple Java class and will perform much better, as well as being easier to maintain. A lot of our processing goes like this Raw data from database (simple select queries, which are very fast) -> flat files -> custom Java code -> reporting engine or another database. The speedup over using stored procs or SQL based ETL tools ranges between a factor of 10 and a factor of 100. MDX is a better language than SQL for a lot of purposes, but not all.
  
  Parent Share
  twitter facebook
- - Re:SQL is Dead - Long Live SQL (Score:2)
    
    by Doc Ruby ( 173196 ) writes: on Wednesday January 10, 2007 @01:37AM (#17535352) Homepage Journal
    
    Collections don't (necessarily) have an order.
    
    Objects don't have to be C++ objects. They can be just class blueprints inherited from other classes, for instantiated objects, which are just related logic and the data accessed.
    
    Your SMEQL looks a lot like lisp.
    
    Something like object lisp for large collections of multidimensional (even asymmetric) objects could bring benefits of encapsulation/reuse and relations to a syntax that better reflects both the data model and the sequence of operations, in rules like policies. A dataflow version would be easy to read, debug and maintain.
    
    Parent Share
    twitter facebook
This has been known for years already (Score:3, Interesting)

by TVmisGuided ( 151197 ) writes: <alan...jump@@@gmail...com> on Tuesday January 09, 2007 @11:56PM (#17534566) Homepage

Sheesh...and it took someone from MIT to point this out? Look at a prime example of a high-end, heavily-scaled, specialized database: American Airlines' SABRE. The reservations and ticket-sales database system alone is arguably one of the most complex databases ever devised, is constantly (and I do mean constantly) being updated, is routinely accessed by hundreds of thousands of separate clients a day...and in its purest form, is completely command-line driven. (Ever see a command line for SABRE? People just THINK the APL symbol set looked arcane!) And yet this one system is expected to maintain carrier-grade uptime or better, and respond to any command or request within eight seconds of input. I've seen desktop (read: non-networked) Oracle databases that couldn't accomplish that!

Share
twitter facebook
- Re:This has been known for years already (Score:4, Insightful)
  
  by sqlgeek ( 168433 ) writes: on Wednesday January 10, 2007 @03:12AM (#17535930)
  
  I don't think that you know Oracle very well. Lets say you want so scale and so you want clustering or grid functionality -- built into Oracle. Lets say that you want to partition your enormous table into one physical table per month or quarter -- built in. Oh, and if you query the whole giant table you'd like parallel processes to run against each partition, balanced across your cluster or grid -- yeah, that's built in too. Lets say you almost always get a group of data together rather than piece by piece so you want it physically colocated to reduce disk i/o -- built in.
  
  This is why you pay a good wage for your Oracle data architect & DBA -- so that you can get people who know how to do these sort of things when needed. And honestly I'm not even scratching the surface.
  
  Consider a data warehouse for a giant telecom in South Africa (with a DBA named Billy in case you wondered). You have over a billion rows in your main fact table, but you're only interested in a few thousand of those rows. You have an index on dates and another index on geographic region and another region on customer. Any one of those indexes will reduce the 1.1 billion rows to 10's of millions of rows, but all three restrictions will reduce it to a few thousand. What if you could read three indexes, perform bitmap comparisons on the results to get only the rows that match the results of all three indexes and then only fetch those few thousand rows from the 1.1 billion row table. Yup, that's built in and Oracle does it for you for behind the scenes.
  
  Now yeah, you can build a faster single-purpose db. But you better have a god damn'd lot of dev hours allocated to the task. My bet is that you'll probably come our way ahead in cash & time to market with Oracle, a good data architect and a good DBA. Any time you want to put your money on the line, you let me know.
  
  Parent Share
  twitter facebook
  - Re:This has been known for years already (Score:2)
    
    by georgewilliamherbert ( 211790 ) writes: on Wednesday January 10, 2007 @07:17AM (#17537218)
    
    Nevertheless - anyone doing serious data warehousing who cares about read performance has been using Teradata (older apps) or column-oriented Sybase-IQ (newer apps). Oracle can store over a billion rows, sure; a terabyte's a lot of data, and people have had multi-terabyte databases for the better part of a decade, for some projects.
    
    Why? Despite all the tuning, Sybase-IQ can still run through a general purpose query into its data around ten times faster than tuned Oracle.
    
    It may not matter in the telephone company, but for people who actually have money on the line (financial companies), huge data processing uses appropriate tools. IQ and columns win.
    
    Parent Share
    twitter facebook
  - Re:This has been known for years already (Score:2)
    
    by TVmisGuided ( 151197 ) writes: <alan...jump@@@gmail...com> on Wednesday January 10, 2007 @08:24AM (#17537576) Homepage
    
    Now yeah, you can build a faster single-purpose db. But you better have a god damn'd lot of dev hours allocated to the task. My bet is that you'll probably come our way ahead in cash & time to market with Oracle, a good data architect and a good DBA. Any time you want to put your money on the line, you let me know.
    
    Seems to me this describes AA perfectly...SABRE has been around since what, the mid- to late-70s? And it's still actively developed and maintained. At a fairly hefty annual price tag. And yeah, the user interface is antiquated and arcane, but no one's come up with anything better yet.
    
    Now, I don't know what they're using to get it to play nice with the Internet (since Travelocity is tied directly into SABRE), but that must have been an interesting exercise in programming on its own. That, however, is a discussion topic for another time and place.
    
    Parent Share
    twitter facebook
- - Re:This has been known for years already (Score:2)
    
    by TVmisGuided ( 151197 ) writes: <alan...jump@@@gmail...com> on Wednesday January 10, 2007 @08:28AM (#17537600) Homepage
    
    I can't think of any. The other major CRS out there all look a great deal like SABRE. Some of the smaller airlines may have built some coprocessing systems on RDBMS which use the CRS as a back-end, just to make it a bit easier on the reservations and ticketing people, but since I don't work in the travel field any more I don't know who that might be or what they're using.
    
    Parent Share
    twitter facebook
Please reduce lameness (Score:5, Insightful)

by suv4x4 ( 956391 ) writes: on Wednesday January 10, 2007 @12:06AM (#17534660)

We're all sick with "new fad: X is dead?" articles. Please reduce lameness to an acceptable level!
Can't we get used to the fact that specialized & new solutions don't magically kill existing popular solution to a problem?

And it's not a recent phenomenon, either, I bet it goes back to when the first proto-journalistic phenomenons formed in early uhman societies, and haunts us to this very day...

"Letters! Spoken speech dead?"

"Bicycles! Walking on foot dead?"

"Trains! Bicycles dead?"

"Cars! Trains dead?"

"Aeroplanes! Trains maybe dead again this time?"

"Computers! Brains dead?"

"Monitors! Printing dead yet?"

"Databases! File systems dead?"

"Specialized databases! Generic databases dead?"

In a nutshell. Don't forget that a database is a very specialized form of a storage system, you can think of it as a very special sort of file system. It didn't kill file systems (as noted above), so specialized systems will thrive just as well without killing anything.

Share
twitter facebook
- Re:Please reduce lameness (Score:2, Funny)
  
  by msormune ( 808119 ) writes: on Wednesday January 10, 2007 @02:48AM (#17535752)
  
  I'll chip in: Public forums! Intelligence dead? Slashdot confirms!
  
  Parent Share
  twitter facebook
- Death to Trees! (Score:3, Interesting)
  
  by Tablizer ( 95088 ) writes: on Wednesday January 10, 2007 @02:50AM (#17535770) Journal
  
  Don't forget that a database is a very specialized form of a storage system, you can think of it as a very special sort of file system. It didn't kill file systems
  
  Very specialized? Please explain. Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors and then you either have to duplicate information (repeat factors), or play messy games, or both. They were okay in 1984 when you only had a few hundred files. But they don't scale. Category philosophers have known since before computers that hierarchy taxonomies were limited.
  
  The problem is that the best alternative, set-based file systems, have a longer learning curve than trees. People pick up hierarchies pretty fast, but sets take longer to click. Power does not always come easy. I hope that geeks start using set-oriented file systems and then others catch up. The thing is that set-oriented file systems are enough like relational that one might as well use relational. If only the RDBMS were performance-tuned for file-like uses (with some special interfaces added).
  
  Parent Share
  twitter facebook
  - Re:Death to Trees! (Score:3, Insightful)
    
    by suv4x4 ( 956391 ) writes: on Wednesday January 10, 2007 @03:02AM (#17535874)
    
    Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors and then you either have to duplicate information (repeat factors), or play messy games, or both.
    
    You know, I've seen my share of RDBMS designs to know the "messiness" is not the fault of the file systems (or databases in that regard).
    
    Sets have more issues than you describe, and you know very well Vista had lots of set based features that were later downscaled, hidden and reduced, not because WinFS was dropped (because the sets in Vista don't use WinFS, they work with indexing too), but because it was terribly confusing to the users.
    
    Parent Share
    twitter facebook
- Comment removed (Score:2)
  
  by account_deleted ( 4530225 ) writes: on Wednesday January 10, 2007 @07:37AM (#17537358)
  
  Comment removed based on user account deletion
  
  Parent Share
  twitter facebook
- - Re:Please reduce lameness (Score:2)
    
    by suv4x4 ( 956391 ) writes: on Wednesday January 10, 2007 @02:38AM (#17535704)
    
    man, you're a fag and an idiot.
    
    The world's not perfect you know. You're a troll, anonymous and a coward.
    
    I'd still pick me over you if given the chance.
    
    Parent Share
    twitter facebook
Isn't it just stating the obvious? (Score:5, Funny)

by Dekortage ( 697532 ) writes: on Wednesday January 10, 2007 @12:24AM (#17534804) Homepage
I've made some similar discoveries myself!
- Transporting 1500 pounds of bricks from the store to my house is much faster if I use a big truck rather than making dozens (if not hundreds) of trips with my Honda Civic.
- Wearing dress pants with a nice shirt and tie often makes an interview more likely to succeed, even if I wear jeans every other day after I get the job.
- Carving pumpkins into "jack-o-lanterns" always turns out better if I use a small, extremely sharp knife instead of a chainsaw.
Who woulda thought that specific-use items might improve the outcome of specific situations?
Share
twitter facebook
- Re:Isn't it just stating the obvious? (Score:2)
  
  by hotdiggitydawg ( 881316 ) writes: on Wednesday January 10, 2007 @09:34AM (#17538212)
  
  What self-respecting geek would carve pumpkins with anything other than a Dremel? Turn your card in at the door, sir...
  
  Parent Share
  twitter facebook
- Re:Isn't it just stating the obvious? (Score:2)
  
  by swillden ( 191260 ) * writes: <shawn-ds@willden.org> on Wednesday January 10, 2007 @11:28AM (#17539758) Journal
  
  Transporting 1500 pounds of bricks from the store to my house is much faster if I use a big truck rather than making dozens (if not hundreds) of trips with my Honda Civic.
  
  Nah, just strap it all on top [snopes.com].
  
  Parent Share
  twitter facebook
- - Re:UPGRADES (Score:2)
    
    by Dekortage ( 697532 ) writes: on Wednesday January 10, 2007 @11:07AM (#17539446) Homepage
    
    You make excellent points. That's why we have things called "planning" and "weighing your options".
    Admittedly, many people do not do this very well, which has led to many of humanity's problems throughout history. Database selection and design are just items #92838701283743^199320 and #92838701283743^199320+1 on the list of things people ought to have thought about more over the last few million years.
    
    Parent Share
    twitter facebook
it's all (okay, mostly) in the queries (Score:2)

by yagu ( 721525 ) * writes: <.moc.liamg. .ta. .ugayay.> on Wednesday January 10, 2007 @12:53AM (#17535032) Journal

I've seen drop dead performance on flat file databases. I've seen molasses slow performance on mainframe relational databases. And I've seen about everything in between.
What I see as a HUGE factor is less the database chosen (though that is obviously important) and more how interactions with the database (updates, queries, etc) are constructed and managed.
For example, we one time had a relational database cycle application that was running for over eight hours every night, longer than the alloted time for all night time runs. One of our senior techs took a look at the program, changed the order of a couple of parentheses, and the program ran in less than fifteen minutes, with correct results.
I've also written flat file "database" applications, specialized with known characteristics that operated on extremely large databases (for the time, greater than 10G), and transactions were measured in milliseconds, typically .001 - .005 seconds) under heavy load. This application would never have held up under any kind of moderate requirement for updates, but I knew that.
I've many times seen overkill with hugely expensive databases hammering lightweight applications into some mangle relational solution.
I've never seen the world as a one-size-fits-all database solution. Vendors of course would tell us all different.

Share
twitter facebook
One size still fits all (Score:2)

by bytesex ( 112972 ) writes: on Wednesday January 10, 2007 @04:32AM (#17536348) Homepage

It's just not called SQL driven RDBMS. It's called Sleepycat.

Share
twitter facebook
Specialization is faster but can be harmful (Score:2)

by Terje Mathisen ( 128806 ) writes: on Wednesday January 10, 2007 @06:38AM (#17537036)

23 years ago I wrote a custom DB to maintain the status of millions of "universal" gift cards, it ran 3-5 orders of magnitude faster (on a 6 MHz IMB AT) than a commercial database running on a big IBM mainframe.

I reduced the key operations (what is the value of this gift card, when was it sold, has it been redeemed previously? etc) to just one operation:

Check and clear a single bit in a bitmap.

My program used 1 second to update 10K semi-randomly-ordered (i.e. in the order we got them back from the shops that had accepted them) records in a database of approximately 10 M records.

20 years later I wrote a totally new version of the same application, but this time the gift cards are electronic debet cards. This time I used Linux-Apache-MySQL-Perl to make a browser-based version, and I stored everything in the DB. Today that is plenty fast enough, and it allows us to make any kind of queries against the DB, like "How many transactions of less than 100 kr was accepted in December, broken down by business area/chain/shop/etc"

Terje

Share
twitter facebook
Creative Commons License (Score:3, Interesting)

by pfafrich ( 647460 ) writes: <(rich) (at) (singsurf.org)> on Wednesday January 10, 2007 @07:27AM (#17537278) Homepage

Has anyone noticed the This article is published under a Creative Commons License Agreement, its the first time I've seen this applied to an academic paper. Another small step for the open-content movement.

Share
twitter facebook
Where's part 1? (Score:2)

by FlopEJoe ( 784551 ) writes: on Wednesday January 10, 2007 @09:04AM (#17537886)

This is titled "OSFA? - Part 2: Benchmarking Results." Has anyone found Part 1?

Share
twitter facebook
MUMPS (Score:2)

by dpbsmith ( 263124 ) writes: on Wednesday January 10, 2007 @10:11AM (#17538670) Homepage

This is, of course, what MUMPS [wikipedia.org] advocates have been saying for years.

MUMPS is a very peculiar language that is very "politically incorrect" in terms of current language fashion. Its development has been entirely governed by pragmatic real-world requirements. It is one of the purest examples of an "application programming" language. It gets no respect from academics or theoreticians.

Its biggest strength is its built-in "globals," which are multidimensional sparse arrays. These arrays and the elements in them are automatically created simply by referring to them. The array indices are arbitrary strings. There can be an arbitrary number of subscripts and the same array can have elements with different numbers of subscripts. Oh, and they're always sorted automatically; each element is created automatically in its proper sequence, and there are fundamental operators for traversing arrays in sequence.

"Global" arrays are persistent across sessions, are stored on the disk, and as in ordinary practice can be hundreds of megabytes in size.

Before you say "this can all be done simply by writing a C++ class," I have to mention the important point, which is that the use of the globals is so intrinsic to the ordinary way MUMPS is really used in practice, that successful implementions of MUMPS must and in practice do make the implementation of globals efficient.

You really can just use "globals" all the time for everything. They work well enough that you don't need to reserve their use for when they're really needed. They're not a luxury. MUMPS programmers rarely use files, except for interchange in and out of the MUMPS universe. Within MUMPS, data is simply kept in globals; it's just the MUMPS way.

"Globals" are extremely flexible and lend themselves naturally to representations of real-world databases. These representations are typically one-off, ad-hoc representations designed by the programmer, who needs to make up-front decisions about the hierarchical organization in which the data will be stored, and writes special-purpose code to perform the accesses. Naturally, this sounds like the dark ages compared to relational technology, but there is an impressive tradeoff. If MUMPS fits the application, development times are short, and performance is dramatically better than for relational systems.

Whether or not this is important in the year 2006, it was very clear a decade ago when medium-scale database applications were typically hosted on minicomputers, that the same hardware resources could support several times as many users running a MUMPS application as a similar application implemented with a relational database, as various organizations found when they converted... in either direction.

Of course relational systems can and are implemented on top of MUMPS.

MUMPS underlies InterSystems' Cache product, and a MUMPS-like language with historical connections to MUMPS underlies the products of Meditech. I'm not sure what the current status of Pick [wikipedia.org] is, but it has some similarities. The company I currently work for has nothing whatsoever to do with either system... except that our business IT system happens to be Pick-based.

Regardless of whether you think of MUMPS itself, there are almost certainly lessons to be learned from the durability of this language and its effectiveness.

Share
twitter facebook
- Re:Perl & CSV (Score:5, Funny)
  
  by Ingolfke ( 515826 ) writes: on Tuesday January 09, 2007 @11:09PM (#17534202) Journal
  
  How did Perl & CSV fare?
  
  It failed the "relational" part of the test. But it failed very quickly.
  
  Parent Share
  twitter facebook
  - Re:Perl & CSV (Score:5, Funny)
    
    by patio11 ( 857072 ) writes: on Wednesday January 10, 2007 @12:29AM (#17534854)
    
    It failed the "relational" part of the test. But it failed very quickly.
    Yep. On the plus side, the Perl hacker who put it together only wasted the time it took to write one line. Granted, the line was 103,954 characters long. He considered breaking it up into two lines to improve readability but ultimately rejected the notion -- anyone not capable of reading the program clearly had no business messing with it anyhow. (Quick question aside from the snark: since Perl has associative arrays can't it emulate a relational database? It was my understanding that after you've got associative arrays you can get to any other conceivable data structure... assuming you're willing to take the performance hit.)
    
    Parent Share
    twitter facebook
    - Re:Perl & CSV (Score:4, Interesting)
      
      by nuzak ( 959558 ) writes: on Wednesday January 10, 2007 @12:39AM (#17534930) Journal
      
      > It was my understanding that after you've got associative arrays you can get to any other conceivable data structure
      
      Once you have lambda you can get to any conceivable data structure. The question is, do you really want to?
      
      sub Y (&) { my $le=shift; return &{sub {&{sub {my $f=shift; &$f($f)}}(sub {my $f=shift; &$le(sub {&{&$f($f)}(@_)})});}}}
      
      Parent Share
      twitter facebook
      - Re:Perl & CSV (Score:5, Interesting)
        
        by patio11 ( 857072 ) writes: on Wednesday January 10, 2007 @04:08AM (#17536216)
        
        I think it implements a Y combinator. Then again, it could just print out "Just another perl hacker". But I'm guessing on the Y combinator. Lets break it down so its readable:
        
        sub Y (&) {
        my $le=shift;
        return &{
        sub { ## SUB_A
        &{
        sub { ## SUB_B
        my $f=shift;
        &$f($f)
        }
        } ##Close SUB_A's block
        (sub { ## SUB_C
        my $f=shift;
        &$le(sub { ##SUB_D
        &{
        &$f($f)
        }
        (@_)
        }## END SUB_D
        )} ##END SUB_C
        ); ##End the block enclosing SUB_C
        } ## END SUB_A
        } ## Close the return line
        } ##Close sub Y
        
        Y can have any number of parameters you want (this is sort of a "welcome to Perl, n00b, hope you enjoy your stay" bit of pain). The first line of the program assigns le to the first parameter and pops that one off the list. That & used in the next line passes the rest of the list to the function he's about to declare. So we're going to be returning the output of that function evaluated on the remaining argument list. Clear so far?
        
        OK, moving on to SUB_A. We again use the & to pass the list of arguments through to ... another block. This one actually makes sense if you look at it -- take the first argument from the list, evaluate it as a function on itself. We're assuming that is going to return a function. Why? Because that opening parent means we have arguments, such as they are, coming to the function.
        
        OK, unwrapping the arguments. There is only one argument -- a block of code encompassing SUB_C. (Wasted 15 minutes figuring that out. Thats what I get for doing this in Notepad instead of an IDE that would auto-indent for me. Friends don't let friends read Perl code.)
        
        By now, bits and pieces of this are starting to look almost easy, if no closer to actual readable computer code. We reuse the function we popped from the list of arguments earlier, and we use the same trick to get a second function off of the argument list. We then apply that function to itself, assume the result is a function, and then run that function on the rest of the argument list. Then we pop that up the call stack and we're, blissfully, done.
        
        So, now that we understand WTF this code is doing, how do we know its the Y combinator? Well, we've essentially got a bunch of arguments (f, x, whatever). We ended up doing LAMBDA(f,(LAMBDA(x,f (x x)),(LAMBDA(x,f (x x)))) . Which, since I took a compiler class once and have the nightmares to prove it, is the Y combinator.
        
        Now you want to know the REALLY warped thing about this? I program Perl for a living (under protest!), I knew the answer going in (Googled the code), and I have an expensive theoretical CS education which includes all of the concepts trotted out here... and the Perl syntax STILL made me bloody swim through WTF was going on.
        
        I. Hate. Perl.
        
        And the reason I hate Perl, more than the fact that the language makes it *possible* to have monstrosities like that one-liner, is that the community which surrounds the language actively encourages them.
        Read the rest of this comment...
        
        Parent Share
        twitter facebook
        
        Re:Perl & CSV (Score:3, Insightful)
        
        by shotgunefx ( 239460 ) writes: on Wednesday January 10, 2007 @10:24AM (#17538836) Journal
        
        Outside of "Golfing", I'd strongly disagree. I don't think the community encourages it for the most.
        
        This is from someone who's spent the last seven years with Perl and in the community. YMMV
        
        Parent Share
        twitter facebook
        
        Re:Perl & CSV (Score:3, Informative)
        
        by FacePlant ( 19134 ) writes: on Wednesday January 10, 2007 @11:42AM (#17540004)
        
        And the reason I hate Perl, more than the fact that the language makes it *possible* to have monstrosities like that one-liner, is that the community which surrounds the language actively encourages them.
        
        Not all of us encourage this.
        
        Its considered *clever* and a mark of great skill that you can strip out all the code that actually explains WTF your code is doing and be left with the perfectly compressed version.
        
        They call this Perl Golf (shaving strokes of your game. Get it?)
        Many of us do not consider it clever. Rather, we consider it stupid and counter-productive.
        
        On the other hand, all of the sample answers posted at the Python Challenge are all golf style, and the Python Challenge is supposed to be a learning tool.
        
        This is modeled as good Perl style to folks just starting with the language,
        People who do this should be tied up with string and left in small dark rooms. For a month.
        
        the Llama book has lots of code which looks like that, and code samples you find will look like it too.
        This just isn't the case. The code samples in the Llama are no more or less obtuse the code samples in my Pragmatic Ruby book.
        
        It appears that the community largely does not teach perl like it is a language that needs to be read.
        I wish I could argue more strongly with you here, other than to assert that I come across code in many languages (Perl, Ruby, Java, C, Lisp), on a regular (daily, weekly, monthly) basis, at work and at home, in books, magazines, and online that appear written to not be read.
        
        Your complaint of bad coding practices is endemic to the industry, and should not be used to condemn a language because it allows the freedom to code poorly.
        
        Parent Share
        twitter facebook
        
        Write-only languages (Score:5, Insightful)
        
        by mysticgoat ( 582871 ) writes: on Wednesday January 10, 2007 @04:15AM (#17536252) Homepage Journal
        
        As any English teacher will tell you, any language that will support great poetry and prose will also make it possible to write the most gawdawful cr*p. Perl bestows great powers, but the perl user must temper his cleverness with wisdom if he is to truly master his craft.
        However in this specific case Google reveals that
        
        ## The Y Combinator sub Y (&) { my $le=shift; return &{ sub { &{sub { my $f=shift; &$f($f) } } (sub { my $f=shift; &$le(sub { &{&$f($f)}(@_) }) }); } } }
        
        was simply "borrowed" from y-combinator.pl [synthcode.com]. This is an instance of Perl being used in a self-referential manner to add a new capability (the Y combinator allows recursion of anonymous subroutines (why anyone would bother to do such an arcane thing comes back to the English teacher's remarks)). Self-referential statements are always difficult to understand because, well, they just are that way (including this one).
        
        Parent Share
        twitter facebook
    - Re:Perl & CSV (Score:2)
      
      by mysticgoat ( 582871 ) writes: on Wednesday January 10, 2007 @03:28AM (#17536016) Homepage Journal
      
      since Perl has associative arrays can't it emulate a relational database?
      I've built actual relational databases to run in memory using Perl's hashes. This was a good way of doing some prototyping for user feedback before telling the MUMPS coders what it was exactly that we wanted them to do. (Their titles were "Programmer/Analyst", but neither one had any interest or skill in analyzing clinical needs: they were both happy to be just codemonkeys.) Performance with Perl was pretty snazzy but my constant worry was that some clever user would find a repeatable way to thrash the disk cache and make the project look bad— but that never happened. Persistence was with modified csv files (using the pipe char as the delimiter since it never occurred in the data sets). The memory resident tables were loaded on startup and written back to disk on shutdown, and we didn't worry about losing data in crashes since these were prototypes, not live. We could open up the disk files between runs with Excel, and use it to do some sanity checking, or introduce strange conditions. The biggest problem was cajoling the doctors and nurses to drop by and play with the prototype, and then try to get useful feedback out of some of them.
      
      Parent Share
      twitter facebook
  - - Re:Perl & CSV (Score:2)
      
      by Ingolfke ( 515826 ) writes: on Wednesday January 10, 2007 @10:52AM (#17539206) Journal
      
      Ah... another one of my ideas shot down by CPAN :)
      
      Parent Share
      twitter facebook
- Re:MIT, not berkley (Score:2)
  
  by hey hey hey ( 659173 ) writes: on Wednesday January 10, 2007 @02:20AM (#17535622)
  
  Back when he did postgres, it was at berkley. He then moved on to the private world to do a start-up from it.
  He was still a professor at Berkeley while working with RTI/Ingres Inc. He didn't leave until his wife wanted to move back near her family (which was after the sale of both Ingres and Illustra). I was at (no doubt just one of) the going away lunches.
  So now he is at MIT. Well, at least MIT picks up good ones.
  Certainly true. Mike is as bright as they come.
  
  Parent Share
  twitter facebook
- Re:No specifics (Score:2, Interesting)
  
  by dedrop ( 308627 ) writes: on Wednesday January 10, 2007 @05:24AM (#17536612)
  
  There's a reason for that. Many years ago, the Wisconsin database group (David DeWitt in particular) authored one of the first popular database benchmarks, the Wisconsin benchmarks. They showed that some databases performed embarrassingly poorly, which made a lot of people really angry. In fact, Larry Ellison got so angry, he tried to get DeWitt fired (Ellison wasn't clear on the concept of tenure). Since then, major databases have a "DeWitt clause" in their end-user license, which says that the name of the database can't be used when reporting benchmark results.
  
  And this years ahead of Microsoft not allowing users to benchmark Vista at all!
  
  Parent Share
  twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

"In the last quarter century..." (Score:2, Funny)

Stonebreaker has a vested interested in Stream Dbs (Score:2, Informative)

Re:Stonebreaker has a vested interested in Stream (Score:2)

Was there ever a one-size-fits-all database? (Score:2)

Re:Was there ever a one-size-fits-all database? (Score:2, Funny)

Re:Was there ever a one-size-fits-all anything? (Score:2)

Re:Was there ever a one-size-fits-all anything? (Score:4, Funny)

Re:Was there ever a one-size-fits-all anything? (Score:3, Funny)

Re:Was there ever a one-size-fits-all database? (Score:3, Informative)

Noticed how roll your own is faster? (Score:2, Interesting)

Re:Noticed how roll your own is faster? (Score:5, Interesting)

Re:Noticed how roll your own is faster? (Score:5, Informative)

Re:Noticed how roll your own is faster? (Score:2)

Taken seriously (Score:3, Funny)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:2)

Re:Noticed how roll your own is faster? (Score:3, Interesting)

Prediction... (Score:5, Insightful)

Re:Prediction... (Score:4, Interesting)

Re:Prediction... (Score:3, Interesting)

Re:Prediction... (Score:2)

Re:Prediction... (Score:3, Insightful)

Re:Prediction... (Score:3, Informative)

Re:Prediction... (Score:2)

one size fits 90% (Score:5, Insightful)

Re:one size fits 90% (Score:2)

Re:one size fits 90% (Score:2, Insightful)

New hat same as the old hat (Score:2)

Re:one size fits 90% (Score:2)

Imagine that.... (Score:5, Insightful)

Why imagine, just read ;-) (Score:2)

Dammit (Score:5, Insightful)

Re:Dammit (Score:3, Funny)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Re:Dammit (Score:3, Insightful)

Re:Dammit (Score:2, Insightful)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Re:Dammit (Score:2)

Duh (Score:5, Insightful)

Re:Duh (Score:5, Informative)

Re:Duh (Score:5, Insightful)

Re:Duh (Score:2)

Re:Duh (Score:3, Insightful)

Re:Duh (Score:2)

Re:Duh (Score:2)

Re:Duh (Score:3, Interesting)

Re:Duh (Score:2, Insightful)

Re:Duh (Score:2, Insightful)

Re:Duh (Score:4, Insightful)

Re:Duh (Score:2)

Re:Duh (Score:2, Insightful)

I thought I was an assembler demon (Score:2)

Re:I thought I was an assembler demon (Score:2)

Re:I thought I was an assembler demon (Score:2)

Re:Duh (Score:2)

Parallel databases (Score:2)

SQL is Dead - Long Live SQL (Score:2)

SQL is part of the problem. (Score:2)

Re:SQL is Dead - Long Live SQL (Score:2)

This has been known for years already (Score:3, Interesting)

Re:This has been known for years already (Score:4, Insightful)

Re:This has been known for years already (Score:2)

Re:This has been known for years already (Score:2)

Re:This has been known for years already (Score:2)

Please reduce lameness (Score:5, Insightful)

Re:Please reduce lameness (Score:2, Funny)

Death to Trees! (Score:3, Interesting)

Re:Death to Trees! (Score:3, Insightful)

Comment removed (Score:2)

Re:Please reduce lameness (Score:2)

Isn't it just stating the obvious? (Score:5, Funny)