Follow Slashdot stories on Twitter

Brian Aker On the Future of Databases 175

Posted by kdawson on Tuesday June 03, 2008 @06:52PM from the everything-you-know-changed dept.

blackbearnh recommends an interview with MySQL Director of Technology Brian Aker that O'Reilly Media is running. Aker talks about the merger of MySQL with Sun, the challenges of designing databases for a SOA world, and what the next decade will bring as far as changes to traditional database architecture. Audio is also available. From the interview: "I think there's two things right now that are pushing the changes... The first thing that's going to push the basic old OLCP transactional database world, which... really hasn't [changed] in some time now — is really a change in the number of cores and the move to solid state disks because a lot of the... concept around database is the idea that you don't have access to enough memory. Your disk is slow, can't do random reads very well, and you maybe have one, maybe eight processors but... you look at some of the upper-end hardware and the mini-core stuff,... and you're almost looking at kind of an array of processing that you're doing; you've got access to so many processors. And well the whole story of trying to optimize... around the problem of random I/O being expensive, well that's not that big of a deal when you actually have solid state disks. So that's one whole area I think that will... cause a rethinking in... the standard Jim Gray relational database design."

This discussion has been archived. No new comments can be posted.

Brian Aker On the Future of Databases

Load All Comments

Search 175 Comments Log In/Create an Account

Comments Filter:

Well (Score:5, Funny)

by CrispBH ( 822439 ) writes: on Tuesday June 03, 2008 @07:03PM (#23645175)

I couldn't... agree... more... I'd say that some... very valid... points... have been... raised.

Share
twitter facebook
- Re: (Score:2, Funny)
  
  by felipekk ( 1007591 ) writes:
  
  Very ... funny
- Re: (Score:3, Informative)
  
  by Bogtha ( 906264 ) writes:
  
  Come on, he's talking about the future of databases. He was just trying to set the mood by doing his best Kirk impression.
- - Re: (Score:3, Informative)
    
    by emurphy42 ( 631808 ) writes:
    
    Service-oriented architecture [wikipedia.org]
  - Re: (Score:3, Funny)
    
    by njcoder ( 657816 ) writes:
    
    SOA = Shit-Outta-Acronyms
    - Re:Well (Score:4, Funny)
      
      by liloldme ( 593606 ) writes: on Wednesday June 04, 2008 @09:13AM (#23650609)
      
      Same Old Architecture
      
      Parent Share
      twitter facebook
- - - Re: (Score:2, Funny)
      
      by deroby ( 568773 ) writes:
      
      Wouldn't he rather say something along the lines of : damnedideloodi neighbour !
Dear Slashot (Score:5, Funny)

by CopaceticOpus ( 965603 ) writes: on Tuesday June 03, 2008 @07:07PM (#23645217)

Can we please have another loud, circular debate over which database is best? It's the only way your favorite database will ever win.

Thank you.

Share
twitter facebook
- Re: (Score:2)
  
  by CastrTroy ( 595695 ) writes:
  
  Since you brought up the topic let me bite. I was just doing some testing today to see how certain queries we were running would run on MySQL, as compared to SQL server. Just to see if the query was so slow because of the size of the dataset, or because SQL Server is made by MS. Anyway I was in the process of transferring data, and I ran into this weird 4GB MyISAM table size limit. You can run a query and alter the table so that it can store more data, but what kind of DB in the year 2008 doesn't by def
  - Re: (Score:3, Informative)
    
    by Bacon Bits ( 926911 ) writes:
    
    Simply put, MyISAM isn't meant for data sets that large. It's meant to be fast with less regard for data integrity than the alternatives. That's by design. When you increase the max size of of the table, you change the bit length of the addresses used for indexing the table and such. Increasing the bit length slows the system, particularly when the bit length exceeds the bit size of the processor. I'd argue more that the default engine should be InnoDB rather than MyISAM, and that internal tables shoul
    - Re: (Score:2)
      
      by crucini ( 98210 ) writes:
      
      I was with you until this part:
      If all you care about is speed, then you can get even more if you go with an embedded database like Firebird or SQLite. Or try a flat file. Those are terrifically fast if you do them right.
      In my experience, SQLite is slower than MySQL, despite being in-process. SQLite is transactional, remember? And flat files will only support linear scan lookups, unless you implement your own indexing scheme. Which would make them "non-flat". Unless the table is very small, MySQL will b
    - Re: (Score:2)
      
      by krow ( 129804 ) * writes:
      
      MyISAM is an analytical engine. Load data into it and then run queries. It has some limited concurrent functionality but really its a "load and query" storage engine.
      
      In 90% of the cases I have seen, Innodb is the right pick. Most people learn this the first time they have to deal with a corrupt MyISAM table.
    - - Re: (Score:2)
        
        by Bacon Bits ( 926911 ) writes:
        
        Oh, I quite agree, but given the complaint about query speed here and the limitations of MyISAM on MySQL, the only situation I could envision was one that involved some highly complex query on the order you'd see in OLAP. Or some poorly designed database. What kind of query would take so long with no need for procedural SQL, triggers, writable views, or foreign key constraints? Obviously, I haven't seen the DB in question, but it's suspicious.
  - Re: (Score:2)
    
    by Ed Avis ( 5917 ) writes:
    
    Yes I realize that InnoDB doesn't have these crazy limitations, but MyISAM is much faster,
    That in a nutshell is the problem with any discussion of MySQL - or indeed with MySQL itself. Any limitation like 'doesn't support tables >4Gibyte' can be answered with 'well use a different table type'. Meanwhile, any benchmark or scalability test can be won by MySQL by using MyISAM tables. It's not really meaningful to talk about 'MySQL' at all; you should say 'MySQL/MyISAM', 'MySQL/InnoDB' and so on. They are
  - Re: (Score:2)
    
    by SteveAyre ( 209812 ) writes:
    
    Not any longer - I'm using 5.0 and have an 8GB MyISAM table currently... I believe they've changed it so the 32bit number is no longer an offset but is instead a row, so the limit is now 4.3billion rows which is much larger. There's still a limit, but it's several orders of magnitude larger.
    
    However, you should look at moving to using InnoDB once your data gets that large as handles locking far better and AFAIK has no limit on the size of tables. You'll also probably find it faster for many queries since its
    - Re: (Score:2)
      
      by CastrTroy ( 595695 ) writes:
      
      So then why does MyISAM even exist? For large tables, you have to use InnoDB, because MyISAM doesn't handle lots of rows easily, and for samll numbers of rows, it doesn't make much of a difference, because both will be fast if you are only dealing with a small amount of data.
  - - Re: (Score:2)
      
      by Bacon Bits ( 926911 ) writes:
      
      No. MyISAM uses a 32-bit address for row pointers for tables. A 32-bit address limits you to a 4GB addressable size.
      
      Besides, the last file system to have a 4GB file size limit was FAT32. NTFS's and HFS+'s limits are in exabytes and ext2's is in terabytes.
- Re:Dear Slashot (Score:5, Interesting)
  
  by dave87656 ( 1179347 ) writes: on Wednesday June 04, 2008 @12:48AM (#23647687)
  
  Okay, I'll bite too ...
  
  We've been running MySQL using MyISAM since 2002. It's delivered acceptable performance until recently as we've expanded our application and the data volumes have increased. Now, we have to reorganize it on a frequent basis (we just backup and restore).
  
  But, we really need to move to a transactional model so I've done some benchmarking between InnoDB and Postgresql. In almost all cases, Postgresql was significantly faster. Our application is very transactional with alot of writes.
  
  And from what I've read, Postgresql scales well to multiprocessors and multiple cores where as MySQL does not. I know Falcon is coming but it was still very Alpha at the time I compared - I couldn't get it to run long enough to perform the tests.
  
  Has anyone else compared Postgres to MySQL/Innodb?
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by ChrisA90278 ( 905188 ) writes:
    
    That's exactly what my tests showed too. The MySQL does not scale well. If you have many concurrent clients it slows. The reason is just that it locks entire tables. PostgreSQL and others have a finer ganularity locks that ca lock rows. With the fine lock more people can get in at ones ut with my SQL they all just stand in line waiting for their turn to hold the lock.
    
    We were hitting MySQLs limit at about 10 concurrent users but it would vary based on how active each of them are and the desiggn of the d
    - - Re: (Score:2)
        
        by TheLink ( 130905 ) writes:
        
        Seems there are (were?) some performance issues with innodb when it comes to concurrent inserts into tables with a "autoincrement" primary key.
Leaky abstractions (Score:5, Interesting)

by yoris ( 776276 ) writes: on Tuesday June 03, 2008 @07:10PM (#23645239)

Gotta love that link between the hardware limitations and the software concepts that may seem fancy but are essentially only built to get around them. I believe someone once called it "the law of leaky abstractions" - would be interesting to see what the new limitations would be if you start combining solid-state storage with pervasive multiprocessing, i.e. what can you do with a multi-processor multi-sdd server that you can not do with a single-processor single-hard drive server?

I think TFA is pretty right on the money that parallellization and massive use of SSD could cause some pretty fundamental changes in how we approach database optimization - if I were to imagine that rack that I'm staring at being filled with SSD drives and processors instead of with nothing but hard drives... locality of data takes on a whole new meaning if you don't require data to be on the same sector of the HD, but rather want certain sets of data to be stored on storage chips located around the same processor chips to avoid having to overload your busses.

Then again, I haven't been in this game for so long, so maybe I'm overestimating the impact. Oldtimer opinion would be very welcome.

Share
twitter facebook
- Re: (Score:2)
  
  by Crispy Critters ( 226798 ) writes:
  
  I don't think that is the way that "leaky abstractions" is generally used. Leaky abstractions occur when you need to understand the underlying technology to predict how something will act.
  One example I know is in Python.
  >>> a=[[5]]
  >>> b=a
  >>> a[0][0]=3
  >>> b
  [[3]]
  Should changing one variable also change another? It makes sense if you think of a Python list as linked list of pointers. (It acts that way -- I know nothing about the internals of Python.) The behavio
  - Re: (Score:2)
    
    by tepples ( 727027 ) writes:
    
    Should changing one variable also change another?
    If both variables point to the same object, yes. This is the case for Java's ArrayList and Python's list, which share similar semantics, and for Java's HashMap and Python's dict.
  - Re: (Score:2)
    
    by K. S. Kyosuke ( 729550 ) writes:
    
    It does not change the variable. In Python, a variable is a lexically scoped named reference to a data object. That means that (lexically - by position in the source code) the same variable can point to several different objects at the same time, if you create multiple closure environments (e. g., by returning an inner function from within the enclosing function, where the inner function refers to a variable in an enclosing scope), or simply if you have several frames of the same function on the call stack,
Get ready, aim, fire! (Score:1, Funny)

by Anonymous Coward writes:

Eagerly awaiting the gratuitous Sun trash talking! Every thread seems to devolve into Sun bashing (especially with regard to the MySQL acquisition). Everyone: Sun did not decide to close source MySQL. If anything they are pushing against it.
This IS news! (Score:5, Funny)

by Bluesman ( 104513 ) writes: on Tuesday June 03, 2008 @07:16PM (#23645305) Homepage

MySQL has people who are responsible for *designing* it? I'm shocked, Shocked.

Share
twitter facebook
- Re: (Score:2)
  
  by BigGerman ( 541312 ) writes:
  
  bravo. I wish I had modpoints.
- - Re: (Score:3, Informative)
    
    by gfody ( 514448 ) writes:
    
    I'm sure he meant the 'implementation of'
    Relational algebra has nothing to do with random IO however building a relational database system has everything to do with random IO because it is by and large the worst bottleneck in the system. The best performing RDBMSs are the ones completely designed around avoiding random IO. That's why TFA says a new RDBMS could be created from scratch and blow the existing players out of the water in the new SSD world.
Admittedly.... (Score:2, Interesting)

by Enderandrew ( 866215 ) writes:

I haven't read the article yet, but that summary terrifies me. I keep hearing how in the modern age we shouldn't think about optimal programming because people have more resources than they need.

Databases need to scale to disgusting large numbers. Memory and disk resources should always be treated as expensive, precious commodities, because they might be plentiful on a simple database on robust hardware, but there are plenty of people out there with massive friggin' databases.

In corporate America, Oracle
- Re: (Score:3)
  
  by Samah ( 729132 ) writes:
  
  Just as a reference Re: big applications and databases, the company I work for maintains a transport management system for the largest rail freight provider in Australia. The database for this application (Oracle 10g) currently stands at 1100+ tables (some containing a few million rows) 2400+ packages, and 450+ triggers. The client application is comprised of 2100+ java source files and 500+ interactive Swing forms.
  Your statement of "Databases need to scale to disgusting large numbers." is spot on. ;)
  - Re: (Score:3)
    
    by Enderandrew ( 866215 ) writes:
    
    I used to work for a casino company (the largest in the world at the time, and largely is).
    
    Our slots database didn't just track totals for rewards credits and such, we kept every transaction ever. We can see how long you played at each machine, for how long, when you put money it, when you hit, when you took money out, etc. We know when you want to come in and play, which machines, etc. Consider that for every customer in a casino. Now consider than on a scale of 60 major casinos.
    
    I'm not a DBA, so I was
    - Re: (Score:3, Funny)
      
      by Samah ( 729132 ) writes:
      
      UPDATE SlotMachines SET ChanceToWinPercent = 100;
      COMMIT;
      try {
      machine.collectMoney();
      player.run();
      } catch(SecurityException e) {
      player.resist();
      if(player.isFast())
      casino.exit(0);
      else
      player.yield();
      }
      - Re: (Score:2)
        
        by aproposofwhat ( 1019098 ) writes:
        
        finally {
        casino.profit();
        }
        :P
        
        Re: (Score:2)
        
        by Samah ( 729132 ) writes:
        
        ...I so wish I'd thought of that!
        :)
- Re: (Score:2)
  
  by CastrTroy ( 595695 ) writes:
  
  All software should be tested and optimized for performance. I ran my web app through the profiler today, and found out it was doing tons extra work, which basically accomplished nothing, and which ended up slowing things down quite a bit. It's amazing how much things can be sped up just by running a profiler. All your unit tests may pass, and everthing may appear to be working correctly, but running through a profiler can fix some major bottlenecks quite quickly.
  - Re: (Score:2)
    
    by hemp ( 36945 ) writes:
    
    Sounds like your web app was badly designed ( or maybe not at all ). You should have not spent any time coding superfluous functions and procedures in the first place.
    - Re: (Score:2)
      
      by CastrTroy ( 595695 ) writes:
      
      Yeah, well back here in the real world, people have to deal with applications that are 15 years old, and have tons of legacy code, a good portion of which wasn't "designed". It's nice to talk about how things should be in and ideal project, but in real life, things are usually not ideal.
    - Re: (Score:2)
      
      by Alpha830RulZ ( 939527 ) writes:
      
      I believe Edgar Dijkstra said it well: "Premature optimization is the root of all evil." Profilers exist for a reason, which is that in significant systems, it can be hard to predict where the system and users will spend all their time. It's not economic for a project to optimize everything, all the time. Nor should you - the job is to get good work done. A slow, correct routine is often just fine, for my work at least.
      - Re: (Score:2)
        
        by WuphonsReach ( 684551 ) writes:
        
        I believe Edgar Dijkstra said it well: "Premature optimization is the root of all evil."
        
        Pretty much spot-on. When designing / developing a system, you should work as fast as possible without doing anything stupid (like using a bubble sort). Get it working, make it faster later.
        
        The art lies in knowing when to do which method (quick and dirty vs smart and efficient) so that you don't end up with something so slow as to be a project-killer.
- - Re: (Score:2)
    
    by m0rph3us0 ( 549631 ) writes:
    
    wow.
    
    I imagine a schema like OrderItemId1, OrderItemId2 .. OrderItem500. I have a feeling if you designed your schema, it would be much faster.
  - Re: (Score:2)
    
    by njcoder ( 657816 ) writes:
    
    I'm constantly having to compromise with the physical limitations of my database, such as having to pick and choose which attributes to delete, as the tables in my Oracle database reached it's 1000 column limit a long time ago.
    You do know you can create more than one table right? You don't have to normalize every schema so much :)
    
    Seriously though, I'm curious what type of application would need and actually benefit from such a design.
    - - Re: (Score:2, Informative)
        
        by allanw ( 842185 ) writes:
        
        If you find you have to create thousands of columns response_0001, response_0002, ... response_4096, then you should probably realize that there's something wrong with your schema. It's just basic database normalization. (Though I suppose you might have a reason for doing it this way. But it sounds incredibly horrible.)
      - Re: (Score:3, Insightful)
        
        by njcoder ( 657816 ) writes:
        
        Yeah, like the other responders have mentioned. Sounds like you're doing it horribly wrong.
        
        I don't know much about SPSS, but if you need to present the data in such a flat format, you can always store the data the proper way and create a view when you need to import it into SPSS.
        
        I'm not sure if there is some sort of limit on the number of columns in Oracle views or not.
      - Wenxiong is that you? (Score:2)
        
        by HornWumpus ( 783565 ) writes:
        
        Still up to your old tricks I see.
        When you run out of columns just join to Participant2 and keep adding columns.
      - Re: (Score:2, Informative)
        
        by spatialguy ( 951355 ) writes:
        
        A database table is not an Excel sheet with fewer limits! Have some local wizard help you in the design. And if you use postgresql or any other full featured database, you can use views to retrieve your data in a format you need for analysis.
      - gilding the lily (Score:2)
        
        by Hognoxious ( 631665 ) writes:
        
        In my system, each participant is a row, and each response is a column.
        
        I was going to post a comment along the lines of "why, to keep track of what products each customer buys, of course!" to the grandparent post [slashdot.org].
        
        However sometimes the truth is stranger than fiction and sarcasm is superfluous.
  - Re: (Score:3, Informative)
    
    by CastrTroy ( 595695 ) writes:
    
    I'm going along with the other two guys. I can't see what application would need more than 1000 columns in a single table. What really gets my is the MS SQL Server 2000 maximum of 8 KB ( SQL Server 7 was 2 KB) in a single row. Now there's a limitation that's badly designed. Oh, and you can define a table with 15 Varchar(8000) fields, just don't try filling every field. 1000 columns I could do just fine with (SQL Server supports 2048?) but the big killer is that you can't even use 2000 columns, because i
    - Re: (Score:2)
      
      by hackstraw ( 262471 ) writes:
      
      ..the big killer is that you can't even use 2000 columns, because if you did, you would run out of space in the row, unless the average field size was under 4 bytes.
      
      I'm so glad I've never actually needed to put, like, data in a database. Sounds tricky.
  - Re: (Score:2)
    
    by CrazedWalrus ( 901897 ) writes:
    
    ?!
    
    1000? That's "One Thousand" ?? As in ten times more than 100?
    
    You might want to take some database design courses if you consider a 1000 column limit to be unreasonable or unworkable. The widest table I'd ever seen was one where the original programmer did a poor job of normalization (that is, he didn't do it), and even that only hit about 150 to 200 columns (times about 60 million rows. Ouch.) Not only is it difficult to maintain wide tables like that, the database will probably perform poorly as well whe
    - Re: (Score:2)
      
      by WuphonsReach ( 684551 ) writes:
      
      Even though I'm a firm believer in normalization, there are specific cases where you will end up with 1000+ attributes for a single object/entity. Sure, you could refactor that into a single-many join into a new table, but that can often be unnecessarily complex.
      
      For example, an entity with a few hundred attributes, all of which are of slightly different data types (some are single byte, some are 2-byte integers, some are varchar). To create a single-many sub-table to store those attributes, there are no
  - Re: (Score:2, Interesting)
    
    by TheFlamingoKing ( 603674 ) writes:
    
    I wonder if I've been reading Slashdot too long - I can't tell whether this is a troll, a joke, a newbie, or an actual legitimate issue...
BigTable, HBase and SimpleDB are the future (Score:3, Informative)

by atomic777 ( 860023 ) writes: on Tuesday June 03, 2008 @07:33PM (#23645481)

I recently blogged on this [tomictech.com], but essentially, as long as your average PHP developer thinks of MySQL as a glorified flat file system to place their serialized PHP objects, an always-available, pay-as-you-go distributed database is going to revolutionize application development in the coming years. For those that want to keep control of their data, HBase [apache.org] is coming along quite nicely.

Share
twitter facebook
Can I ask a stupid question... (Score:3, Interesting)

by Enderandrew ( 866215 ) writes: <enderandrewNO@SPAMgmail.com> on Tuesday June 03, 2008 @07:34PM (#23645501) Homepage Journal

I'm actually reading the article now, and as he is talking about design for a database taking multiple cores into consideration, etc, I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?

Again, I'm sure this is a stupid question, but perhaps someone could clue me in.

Share
twitter facebook
- Re:Can I ask a stupid question... (Score:4, Informative)
  
  by Johnno74 ( 252399 ) writes: on Tuesday June 03, 2008 @07:58PM (#23645743)
  
  Umm I'd say you have it wrong - "Traditional" databases have many different lock granularities, such as Table locks, page locks and row locks. SQL server and Oracle certainly do this.
  
  MySQL only does table locks, which are much simpler and much faster for light workloads, but as I'm sure you can imagine when you have many CPUs trying to update the table at once in the end each thread has to wait their turn to grab the lock and perform their updates sequentially.
  
  In SQL Server, Oracle, or any other "enterprisey" db multiple threads can update the same table at exactly the same time, as long as its not the same row.
  
  Stuff like this is exactly why people who use MS-SQL and oracle look down their nose at people who use MySQL and claim it is capable of playing with the big boys.
  
  Once again, despite what MySQL are saying there is nothing innovative here. All this stuff has existed in the mainstream database engines for many, many years and they are still playing catchup.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Informative)
    
    by XanC ( 644172 ) writes:
    
    What you say is true for MyISAM tables, but MySQL's InnoDB tables fully support row-level locking. And I believe their BDB tables support page-level locking.
    - Re: (Score:2)
      
      by Johnno74 ( 252399 ) writes:
      
      Cheers for the info, I thought it still did table locks only.
      
      Don't get me wrong, I have nothing against MySQL... It has its place, and it does seem to be improving. What does make me grind my teeth is when people say its a serious competitor to MS SQL, or oracle. The typical argument is all the other things that these DBs give you (like proper ACID transactions, distributed transactions, fine-grained security, triggers, functions, etc etc) are just fluff and you don't need them anyway....
- Re: (Score:2)
  
  by kestasjk ( 933987 ) writes:
  
  I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations
  I'm not sure what you mean, fine-grained locking is up to the engine, and MySQL can't really do it.
  InnoDB is the classic example; you can get MySQL to lock entire InnoDB (or MyISAM) tables, but if you want to lock a certain row in an InnoDB table MySQL can't help, because it doesn't know about InnoDB.
  
  If you're saying "the MyISAM/MySQL table-based locking won't work in a parallel environment", then you're right, but that's nothing new that relates SSDs and micro-cores.
  
  Fully transactional engines have
  - Re: (Score:2)
    
    by Enderandrew ( 866215 ) writes:
    
    As far as I understand Interbase (and now Firebird) use a versioning approach as opposed to locking. I wonder if this approach is more beneficial with parallel processing, but you're right in that the concern was already relevant given multiple users in a database at once.
    
    I wonder which one has better performance for multiple operations at once.
    - Re: (Score:3, Informative)
      
      by anarxia ( 651289 ) writes:
      
      It is called MVCC. Other databases such as oracle and postgres also use this approach. MVCC has its pros and cons. It allows for higher concurrency, but it might require extra copies of data and that translates to more memory and disk space. On a "weak" server it might actually end up being less concurrent.
- Re: (Score:2)
  
  by merreborn ( 853723 ) writes:
  
  I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?
  Both interbase and MySQL's InnoDB engine use the concept of Multi Version Concurrency Control (MVCC) to address the issue of simultaneous access -- although their implemenations differ: InnoDB does make use of row-level write locks. The upcoming Falcon engine is said to do the same with even fewer loc
Cores? (Score:2)

by bar-agent ( 698856 ) writes:

In the article, they say the multiple cores will have a big impact on the design of databases. I don't understand this. Databases are already threaded, are they not? Why does multi-core support require different code than multi-thread support?
- - Re:Cores? (Score:5, Insightful)
    
    by bersl2 ( 689221 ) writes: on Tuesday June 03, 2008 @09:04PM (#23646301) Journal
    
    "sudo" is that command which grants one user authorization to act as another user.
    
    "pseudo-" is that verbal prefix which means "false".
    
    I'm seeing language devolve in front of my eyes...
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by MBCook ( 132727 ) writes:
    
    Right. MySQL, at least under Linux, just starts to fall flat on it's face if it's given more than 4 cores. That's the case for 5.0.x, and I think it is still true of 5.1.x (but that's still beta anyway). I seem to remember the situation being better under Solaris, but it's not fixed entirely.
    Oracle, DBII, and (probably to a lesser extent) PostgreSQL can handle many more processors.
    Just like any other software, multithreading is hard. But where there are times in some programs (say games) where stale data
MonetDB (Score:2)

by Frans Faase ( 648933 ) writes:

It almost sounds like he is talking about MonetDB [monetdb.cwi.nl].
Brian's Blog (Score:2)

by apachetoolbox ( 456499 ) writes:

http://tangent.org/ [tangent.org]

Sup Brian!

-Bryan (from apachetoolbox)
self-fulfilling predictions? (Score:2)

by BigGerman ( 541312 ) writes:

If one's product does not support advanced features or, sometimes, even basic common sense features, no wonder one's predictions call for the world that does not need those features.
Locality is the key (Score:5, Interesting)

by Dave500 ( 107484 ) writes: on Tuesday June 03, 2008 @09:55PM (#23646627)

In my mind as a database engineer for a wall street bank, the biggest change in the near term that we forsee is data locality.

Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.

Why? Latency. Consider your average SOA application which reaches out to 4-5 remote services or dataserver calls to execute its task. Each network/rpc/soap/whatever call has a latency cost of anything between one and at worst several hundred milliseconds. Lets say for example that the total latency for all the calls necessary is 10 milliseconds. 1000/10=100 transactions per thread per second. Oh dear.

The amount of memory an "average" server ships with today is in the 32-64GB range. Next year it will be in the 64-128GB range. The average size of an OLTP database is 60-80GB.

So, the amount of memory available to the application tier will very soon be greater than the size of the database, warehouses excluded. Moore's law is quickly going to give the application tier far more memory than it needs to solve the average business state, exceptions noted.

The final fact in the puzzle is that for transaction processing, read operations outnumber write operations by roughly 20 to 1. (This will of course vary on the system, but that *is* the average.)

This situation is strongly in favor in migrating read only data caches back into the application tier, and only paying for the network hop when writes are done in the interests of safety. (There is a lot of research into how writes can be done safely asynchronously at the moment, but its not ready yet IMHO.)

Challenges exist in terms of efficient data access and manipulation when caches are large, performant garbage collection and upset recovery - but they are all solvable with care.

Its my opinion that in the near future large data caches in the application tier will become the norm. What has to be worked out is the most effective way of accessing, manipulating and administering that data tier and dealing with all the inevitable caveats of asynchronous data flow.

Some (not complete) examples of implementing this:

Relational Caches (there are many more):
http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_for_java.html
http://www.alachisoft.com/ncache/index.html

Object Caches:
http://www.ogf.org/OGF21/materials/970/GigaSpaces_DataGrid_OGF_Oct07.ppt
http://jakarta.apache.org/jcs/

Share
twitter facebook
- Re:Locality is the key (Score:5, Interesting)
  
  by ppanon ( 16583 ) writes: on Tuesday June 03, 2008 @11:31PM (#23647247) Homepage Journal
  
  Interesting ideas, but it would seem that, once your application tier is spread over multiple servers that don't share a memory space, you are going to have significant distributed cache coherency issues. While I can understand the desire to avoid the marshalling overhead involved in database reads and updates, you're also going to have to reinvent the wheel of distributed concurrency control for each application when it's already been solved in a general way in the clustered database.
  
  For instance, from the JCS link you provided:
  JCS is not a transactional distribution mechanism. Transactional distributed caches are not scalable. JCS is a cache not a database. The distribution mechanisms provided by JCS can scale into the tens of servers. In a well-designed service oriented architecture, JCS can be used in a high demand service with numerous nodes. This would not be possible if the distribution mechanism were transactional.
  
  So if you're having to give up transactional integrity to have your distributed cache, I think it's going to have limited applications because it doesn't solve that 1000 transactions per thread problem you indicated. Sure you can work your way around it a little by treating it as extremely optimistic locking to maintain transactional integrity on writes, but it also does limit the accuracy of the cache and for some applications (financial for starters, I would expect) that's going to be an issue.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Interesting)
    
    by Dave500 ( 107484 ) writes:
    
    Extremely valid point.
    
    Not to bash Oracle, but the ultimate scalability of their multi-host database partitioning solution (RAC) is indeed limited by the amount of communication the distributed lock manager needs to make to ensure transactional isolation as the number of partitions/hosts increase. (Caveat to Oracle fans - 80% of requirements are beneath this threshold - so I understand Oracle's strategy.) (An alternative solution is the "shared nothing" partitioning approach (example - db2's DPF) but this
  - - Re: (Score:2)
      
      by ppanon ( 16583 ) writes:
      
      Yeah, I had some thoughts running that way as well. I was also thinking about pipelined asynchronous queries. I think people are trying to avoid retraining the application developer populace to know how to deal with asynchronous (and concurrent) processing because it's much more error prone and many programmers aren't up to the task. People that can do that are going to be more expensive and that's hard to sell to business.
      
      But heck, if you're really running an application with a workload that needs to scale
- why are you blocking? (Score:2)
  
  by Chirs ( 87576 ) writes:
  
  Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.
  
  Lets say for example that the total latency for all the calls necessary is 10 milliseconds. 1000/10=100 transactions per thread per second. Oh dear.
  Your numbers assume that you are blocking while waiting for the responses. It should be possible to process other transactions during that time. Some form of async messaging and a state machine, and you should be good to go.
  
  Even if you don't want to go with asynch processing, why is it a problem if you can only handle 100 transactions per thread per second? Linux will happily run with thousands of threads.
  
  I'm sure there's some reason why it's more complicated than it sounds, I'm curious what it is.
- - Re: (Score:2)
    
    by Dave500 ( 107484 ) writes:
    
    You are right - it does.
    
    The best methodology on how to efficiently and robustly manipulate large data sets in the application tier is still up for debate. In memory databases are but one potential solution to this requirement - there are many others. The javaspaces/gigaspace grid style solution is another.
    
    Every current approach to this issue at present has is pros and cons - that's the price for being on the bleeding edge I guess.
    
    What cannot be ignored is that we are on the edge of a potential paradigm shi
Everyone knows which database is the best! (Score:2)

by csoto ( 220540 ) writes:

PC-File (from good 'ol Jim "Button" Knopf of Buttonware). That and PC-Write (Bob Wallace, Quicksoft) are all everyone ever needed. That and 640K. Where's my box of floppies?
Solid state storage devices are more than disks. (Score:5, Interesting)

by Animats ( 122034 ) writes: on Tuesday June 03, 2008 @11:09PM (#23647135) Homepage

Until recently, solid state storage devices have been treated as "disks". But they're not disks. They have orders of magnitude less latency.
For files, this doesn't matter all that much. For databases, it changes everything. Solid state devices need new access mechanisms; I/O based seek/read/write is oriented towards big blocks and long latencies. The optimal access size for solid state storage devices is much smaller, more like the size of a cache line. With smaller, lower latency accesses, you can do more of them, instead of wasting channel bandwidth reading big blocks to get some small index item. It's not RAM, though; these devices usually aren't truly random access.
It starts to make sense to put more lookup-type functions out in the disk, since getting the data into the CPU has become the bottleneck. Search functions in the disk controller went out of fashion decades ago, but it may be time to bring them back. It may make sense to put some of the lower-level database functions in the disk controller, at the level of "query with key, get back record". Cacheing at the disk controller definitely makes sense, and it will be more effective if it's for units smaller than traditional "disk blocks"
This could be the beginning of the end of the UNIX "everything is a stream of bytes" model of data storage. We may see the "database is below the file system" model, which has appeared a few times in mainframes, make it to ordinary servers.

Share
twitter facebook
- Re:Solid state storage devices are more than disks (Score:2)
  
  by gfody ( 514448 ) writes:
  
  SSD hasn't changed the whole seek/read/write paradigm because the current batch are tailored to be retrofit to existing HBA/RAID storage systems.
  Once the IO drive [fusionio.com] is here we'll have a true random access to nonvolatile storage with latency measured in nanoseconds.
Cores, cpus, nodes, .. (Score:4, Informative)

by tuomoks ( 246421 ) writes: <tuomo@descolada.com> on Wednesday June 04, 2008 @02:51AM (#23648253) Homepage

Had to comment, the reference to Jim Gray was a little weird? I was lucky to work with Jim and we were often talking about technology changes and enhancements. Now - see what for example Tandem did call "massively parallel" database! The system was already built to allow several cpus and several nodes to interconnect transparently, Jim did see how that could be used and how the database optimizer really could work. Of course making direct access to any disc faster will help, especially now when the SDD's are getting bigger but the theory is nothing new. Even SQLite can show you that and think systems where you have 32, 128 or even 256 bit flat, memory speed but storage backed world - will change the picture, or? But be careful, we have already gone through many iterations making part of the system faster, as fixed head disks and even indexing in solid state, and found that it may (will) create other problems, not always seen upfront (except by JG!)

Share
twitter facebook
- Re: (Score:2)
  
  by krow ( 129804 ) * writes:
  
  Hi!
  
  A good number of the people who write databases use Jim Gray's book (aka the black book) as the "way to build a database". It is a great read, but I do not believe that in the face of change that the basic design needs to stay the same any longer.
  
  Look at later interviews with Jim and I believe you will find that he too had noticed that the basic design around B-trees really would not cut it for the hardware we see being shipped today.
  
  Cheers,
  -Brian
Object Databases? (Score:2, Interesting)

by Grapedrink ( 1298113 ) writes:

Not trying to start a war here, but seriously Databases != RDBMS. It seems like no one knows that object databases have been around a long time too. In the context of the article, many of the points can be applied to all types of databases, but it's so focused on the RDBMS (no shock considering the author).

There were a multitude of issues in the past with object databases from agendas, performance, complexity, etc that put relational databases at the forefront. Hardware and the quality of object databases
- Re: (Score:2)
  
  by jadavis ( 473492 ) writes:
  
  It seems like no one knows that object databases have been around a long time too.
  
  Actually longer. An "object database" is really just a graph database. Objects refer to eachother via pointer chains, and it forms a graph. There's really no extra semantic value there, object databases are just graph databases adapted to OO development styles.
  
  Relational database systems were invented to overcome the semantic problems with graph databases, not the performance problems.
  
  One answer why object databases are ignore
  - - Re: (Score:2)
      
      by jadavis ( 473492 ) writes:
      
      I re-read your post and it still seems to have a strong negative attitude about relational database systems. It seems to suggest that they are somehow a legacy that has been outdated.
      
      Show me some concrete data that prove RDMBS are the "best" thing out there as you claim.
      
      Best is subjective. But, despite a huge amount of investment, "research", and marketing, neither OODBMSs nor XML Database Systems have made much progress into useful production applications. I would claim that's because those are a dead end
He doesn't get it (Score:2)

by plopez ( 54068 ) writes:

Memory and disks (solid state or platters) are *hardware*. Transactions are *software*. Transactions are implemented in code to insure the information gets written to disks to faithfully represent what happened with the information (updates, deletes, insertions, reads. With distributed applications, this is even more important than ever.

Also looking at the spec. of the Tokyo cabinet DB model and all I see is a return to network databases. Having suffered through IDMS, I can tell you this is the wrong approa
- Re: (Score:2, Funny)
  
  by Tablizer ( 95088 ) writes:
  
  Solid state disks will never match with hard disks.
  
  In 2025, this will be grouped with "640K RAM will always be more than enough." and you'll feel like such a Bill G.
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    Except that Bill Gates never said that. Bluefoxlucid did.
    
    I'm sure he'll feel lots worse. While Gates gets hounded for something he never said, at least he has mountains and mountains of cash to console him.
- Re: (Score:3, Interesting)
  
  by Anonymous Coward writes:
  
  It's more accurate to say that there will probably always be a tradeoff between slow and fast storage, there will probably always be a tradeoff between permanent and temporary storage, and there will probably always be a tradeoff between expensive and cheap storage.
  
  In 20 years, I do not know what form slow, or cheap, or permanent storage may take. It may not be spinning magnetized platters. But I do know that in 20 years, every well-written database will have algorithms and data structures to deal with slow
  - Re: (Score:2)
    
    by gfody ( 514448 ) writes:
    
    Volatile, nonvolatile, expensive, cheap are all hardware concepts not applicable to software. Applications in the future will just use 'memory' and assume it's persistent and as fast as possible.
    
    The only reason applications are forced to account for slow storage is because hard disk drives have been our biggest bottleneck for 50 years!
    - Nope. (Score:2)
      
      by HornWumpus ( 783565 ) writes:
      
      IIRC that's already the way AS-400's try to work.
      It never works out in practice.
      You sometimes (often) need to manage and understand your memory usage.
      If you think that won't be true in the future implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity.
      - Re: (Score:2)
        
        by gfody ( 514448 ) writes:
        
        Never say never - the reason existing storage abstractions fail is because the ridiculously bad performance of random IO on a mechanical disk drive. Hiding that bottleneck from the developer is a bad idea especially in retrospect when you consider that it's a bottleneck that's been with us for over 50 years.
        
        If you think that won't be true in the future implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity.
        I could not parse that sentence, come again?
        
        Re: (Score:2)
        
        by HornWumpus ( 783565 ) writes:
        
        If you think that won't be true in the future, implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity.
        Sorry missed a comma. Perhaps that should be a simicolon? Who cares?
        What I'm failing to get across is that people have thought like you before and been wrong. 'In the future computer resources will be free and virtually limitless' has proven to be almost as wrong as '640K should be enough'.
        New things to do with computers will
- - Re: (Score:1)
    
    by Tablizer ( 95088 ) writes:
    
    You fail to mention the human factor. We maybe so accustomed to hard disks that we will get sick of them and then switch.
    
    Reliability should also be a driving force. The mechanics of hard-drives make them one of the most fragile and problematic parts in a computer. (Although RAID may dampen that somewhat, but double the cost.)
    - Re: (Score:2)
      
      by bluefoxlucid ( 723572 ) writes:
      
      RAID 5 against (n) disks. Given non-failure probability (p) of one disk in a given time window, we have sqrt(p^n). So for 3 disks at a point in time where p=0.5 (0 p 1 in all cases, any number will illustrate the effects), 35% chance that the whole array still works. 5 disks, 17%. 2 disks (not doable), 50%.
      
      Point: RAID5 doesn't statistically increase reliability. I know someone who worked for a data recovery firm, lots of people swapped the drives the moment they failed and had a second go while the
      - Re: (Score:3, Insightful)
        
        by dgatwood ( 11270 ) writes:
        
        Worse than that. Part of that is the fact that RAID vendors tell people to go with drives from the same manufacturer to optimize performance, when in reality, they should be doing with the most diverse pool of drives possible, dramatically reducing the probability of multiple catastrophic failures at the same time (except through human error, some sort of physical impact, electrical surge, etc.). If a drive has a design defect, it isn't at all uncommon for them all to fail at n+/-k hours for some relative
        
        Re: (Score:2)
        
        by symbolset ( 646467 ) writes:
        
        I should think that for guys that can write MySQL solving the filesystem problem should be child's play. I can think of several answers that banish these issues permanently. And no, I don't know where Nina is.
        The spinning platter problem... No. At the scale of problem we're thinking about here the price difference between SSD and Spinning disks is not that much of an issue.
        
        Re: (Score:2, Interesting)
        
        by ZerdZerd ( 1250080 ) writes:
        
        I bought the same batch, but before setting up the RAID, I used each drive differently (ran benchmarks, copied files etc.). Some heavily, some light usage. The probability of them crashing at the same time should be smaller then if all of them get the same wear and tear.
        
        Re: (Score:2)
        
        by Allador ( 537449 ) writes:
        
        Even mirroring is basically useless for reliability if the drives are from the same vendor unless you swap out the mirrored drive daily
        Can we please leave the ridiculous hyperbole at home?
        
        Although what you say sounds reasonable in theory, it doesnt work out that way in the real world.
        
        Even people using multiple deathstar drives didnt see consistent failure times.
        
        Very very few drive failures are because of defects. The deathstar notwithstanding. Even that though isnt relevant because no one in their right mind would use a drive like that for something important. I mean ... it actually said on the box that it wasnt intended for 24x7 use.
        
        E
      - Re: (Score:2)
        
        by WuphonsReach ( 684551 ) writes:
        
        The bigger issue with RAID5:
        
        Rebuild times scale up as the number of disks in the array increases. Which means that as you RAID5 over more disks, your recovery window (during which a 2nd failure will kill the array) grows increasingly larger.
        
        Which also leads to issues that as the array rebuilds, it puts extra wear and tear on the other drives in the array (as it frantically attempts to recalculate parity bits for the new hot spare). Since your recovery window is a lot larger, and you're exercising all
    - Re: (Score:2)
      
      by CastrTroy ( 595695 ) writes:
      
      Personally, I've had way less problems with hard disks than I have had with RAM chips, CPUs, video cards, motherboards, network cards. I've only ever had 1 hard drive die. I've had countless of the other devices die. Hard drives have very simple circuits, so it's very unlikely that the chips will overheat, or have any capacitors and such blow. The only thing left to fail is the mechanics. Since they are all sealed in a dust free chamber, the only thing to kill it is heat. Keep you hard disks properly
  - - Re: (Score:2)
      
      by lanswitch ( 705539 ) writes:
      
      Once you've lost all your data to a disk failure, you should realize the importance of backups...
      - Re: (Score:2)
        
        by TheLink ( 130905 ) writes:
        
        Once you've failed to restore from a backup you realize the importance of working restores ;).
        
        Best to realize a lot of things before they happen to you.
- Re:The furture of databases you say? (Score:4, Interesting)
  
  by njcoder ( 657816 ) writes: on Tuesday June 03, 2008 @08:34PM (#23646071)
  
  If I was migrating away from Oracle, MS SQL Server wouldn't be my first choice. Postgresql would. Given the choice between a free version that is similar to the original vs a product that is very different that I need to pay for it's a no brainer. Also take into consideration that for some database applications you're going to need some serious horsepower. You're limited in the number of procs you can have in a Windows system. Last time I checked, once you get past 8 processors Windows doesn't scale as well. Even linux doesn't do as well as Solaris, AIX or HPUX past a certain number of procs.
  
  Oracle's RAC seems to be a better solution than MSSQL's approach. PostgreSQL (and EnterpriseDB) are working on a more RAC-like approach.
  
  This is a good story about a company that successfully moved from Oracle to Postgresql [arnnet.com.au]. Basically, they had 2 database systems running Oracle, a data warehouse and an OLTP system. They moved their data warehouse over to Postgresql running on Solaris 10, then they used the licenses they no longer need for the data warehouse to boost the computing power of the OLTP system.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by foniksonik ( 573572 ) writes:
  
  Maybe the article should more clearly say "The future of databases in regards to web application development"?
  
  Then you have to think about how many mid-range client/server desktop applications are being replaced with client/server web applications every day as they get technology refreshes...
  
  There are more and more *relatively* cheap web developers getting hired to replace traditional desktop app developers for non-critical business applications.
  
  SO there is a disconnect here with your statements in general.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Well (Score:5, Funny)

Re: (Score:2, Funny)

Re: (Score:3, Informative)

Re: (Score:3, Informative)

Re: (Score:3, Funny)

Re:Well (Score:4, Funny)

Re: (Score:2, Funny)

Dear Slashot (Score:5, Funny)

Re: (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Dear Slashot (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Leaky abstractions (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Get ready, aim, fire! (Score:1, Funny)

This IS news! (Score:5, Funny)

Re: (Score:2)

Re: (Score:3, Informative)

Admittedly.... (Score:2, Interesting)

Re: (Score:3)

Re: (Score:3)

Re: (Score:3, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:3, Insightful)

Wenxiong is that you? (Score:2)

Re: (Score:2, Informative)

gilding the lily (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Interesting)

BigTable, HBase and SimpleDB are the future (Score:3, Informative)

Can I ask a stupid question... (Score:3, Interesting)

Re:Can I ask a stupid question... (Score:4, Informative)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Cores? (Score:2)

Re:Cores? (Score:5, Insightful)

Re: (Score:2)

MonetDB (Score:2)

Brian's Blog (Score:2)

self-fulfilling predictions? (Score:2)

Locality is the key (Score:5, Interesting)

Re:Locality is the key (Score:5, Interesting)

Re: (Score:3, Interesting)

Re: (Score:2)

why are you blocking? (Score:2)

Re: (Score:2)

Everyone knows which database is the best! (Score:2)

Solid state storage devices are more than disks. (Score:5, Interesting)

Re:Solid state storage devices are more than disks (Score:2)

Cores, cpus, nodes, .. (Score:4, Informative)

Re: (Score:2)

Object Databases? (Score:2, Interesting)

Re: (Score:2)

Re: (Score:2)