Conquest FS: "The Disk Is Dead"
Posted by
Hemos
on Mon Apr 21, 2003 10:30 AM
from the long-live-the-disk dept.
andfarm writes "A few days ago, I sat in on a presentation of what seems to be a new file system concept: Conquest. Apparently they've developed a FS that stores all the metadata and a lot of the small files in battery-backed RAM. (No, not flash RAM. That'd be stupid.) According to benchmarks, it's almost as fast as ramfs. Impressive." The page linked above is actually more of a summary page; there are some good .ps research reports in there.
well and good (Score:3, Insightful)
(http://www.jumbocaveman.com/)
Re:well and good (Score:5, Insightful)
(http://www.mwatt.com/index.html | Last Journal: Friday February 11 2005, @02:43PM)
As this new filesystem implicitly admits, the price/MB is still dramatically lower for HDDs than for solid-state memory, so it will still take quite a while for this replacement to happen.
I disagree that some small killer app must come along to make this happen. Yes, solid-state media is coming down in cost and increasing in density, but both need to change by 2 or 3 orders of magnitude before the HDD is dead. What we're waiting for here is the classic convergence of technology and its applications: the apps won't come until the technology can support them, and the tech is driven by our demand for it. Expect another 10 years at least.
Re:well and good (Score:5, Insightful)
This was also pointed out in Saturday's Slashdot story [dansdata.com]
Re:well and good (Score:5, Insightful)
(http://www.mwatt.com/index.html | Last Journal: Friday February 11 2005, @02:43PM)
A couple of reasons I see the death of the HDD as not too imminent:
(1) Those damned HDD makers keep pulling new physics out of their as^H^H hats and keep pushing storage densities to ridiculous new levels.
(2) The solid-state memory of the future ain't gonna be Flash as we know it now (with slow writes and limited write cycles), and it also won't be battery-backed RAM (unless we write it all back to disk for 'permanent' storage at some point). I bet on some variation of today's Flash without its limitations, but the tech has got some ground to make up before this all happens.
My other long-term prediction has been that CRTs (vacuum tube, for pete's sake!) will be replaced with LCD or similar tech and we're getting really close.
Re:well and good (Score:4, Insightful)
(http://cstefan.multiply.com/ | Last Journal: Monday December 03, @12:09PM)
Most high end users who are concerned about image quality are still buying CRTs. If you have to do color matching, CAD/CAM, or are gaming you probably still want a CRT.
The price differential between CRT and LCD monitors is still enough that most larger businesses are still only buying CRTs for most of their users. Sure the executives and receptionists are getting LCDs but everyone else gets cheap 17" CRT monitors.
Re:well and good (Score:5, Interesting)
(http://iabervon.org/~barkalow/ | Last Journal: Saturday May 31 2003, @02:01AM)
Furthermore, there are a number of important directories on any system whose total size won't double in the next ten years, because they add one more file of about the same size for each program you install, and they already have ten years of stuff.
In the cases where you do have exponential growth of storage use, the structure of the stored data is extremely simple; you have directories with huge files which are read sequentially and have a flat structure.
I see a real opportunity for a system when you have one gig of solid-state storage for your structured data and HDDs (note that you can now add a new HDD without any trouble, because it's only data storage, not a filesystem) for the bulk data.
Re:well and good (Score:5, Interesting)
It will be OS-on-a-chip (and a good OS at that), it will go for about twenty bucks a pop down at WalMart or CompUSA and Bill Gates will die of an apoplectic fit when it hits the streets. Hackers will figure out ways to diddle it, but corporations and average users will upgrade by merely dropping another sawbuck on the counter and plugging the damned thing in when they get back to their machine(s). Computers will come with these things preinstalled, so there'll be no bitching about not having an OS with any given machine. High-end weirdness will, as ever, continue to drive a niche market, but everybody else will regard it about the same as they regard their pair of pliers; just another tool. Ho hum.
Re:well and good (Score:5, Insightful)
You may have to keep predicting for some time yet. So far, nobody has managed to come up with a solid-state approach that gets anywhere close to the cost of spinning media, and though solid state gets cheaper over time, spinning media does too.
For the most part, posters to this thread missed the point of this effort. The authors observed that some relatively small portion of filesystem data - the metadata - accounts for a disproportionate amount of the IO traffic. So put just that part in battery-backed ram, and get better performance. Hopefully, the increased performance will outweigh the cost of the extra RAM.
The fly in the ointment is that, in the case where there's a small amount of metadata compared to file data, the cost of transferring the metadata isn't that much. But when there's a lot of metadata, it won't all fit in NVRAM. Oops, it's not as big a gain as you'd first think.
It's surprising how well Ext2 does compared to RAMFS and ConquestFS in the authors' benchmarks.
Drawback (Score:5, Interesting)
Re:Drawback (Score:5, Interesting)
What I find telling is that such a system has to be implemented at all. It seems clear to me that the operating system's filesystem, in conjunction with the VM, should implement this automatically. In Linux, this is true - large portions of the filesystem get cached if you have gobs of RAM lying around. Why certain more commonly-used OSes do the exact opposite is beyond me.
From my perspective, the right way to handle this is obvious. RAM is there to be used. Just as we have multiprogramming to make more efficient use of CPU and disk resources, we should be making the best possible use of available RAM. Letting it sit idle on the odd chance the user will suddenly need hundreds of meg of RAM out of nowhere is ridiculous. From the perspective of the CPU, RAM is dog slow, but from the perspective of the disk, it's blazing fast. ANYTHING that can be done to shift the burden from magnetic storage to RAM should be done. Magnetic storage excels in one area and one area only: cheap permanent storage of vast amounts of data. RAM should be used to cache oft-used data. Why is this not painfully obvious to anyone designing an operating system?
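For anyone who hasn't seen one, the "cache oft-used data in spare RAM" idea boils down to something like a least-recently-used (LRU) cache of disk blocks. This is a hypothetical toy in Python: `LRUBlockCache` and its `read_from_disk` callback are made-up names, and a real kernel page cache is far more elaborate.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy LRU cache of disk blocks keyed by block number (illustration only)."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # hypothetical callback: block number -> bytes
        self.blocks = OrderedDict()           # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_from_disk(block_no)   # slow path: go to the disk
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data
```

With a capacity of 2, reading blocks 1, 2, 1, 3, 2 gives one hit and four misses: block 2 is evicted when 3 arrives, and then block 1 is evicted when 2 comes back.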
Re:Drawback (Score:4, Informative)
In point of fact, Conquest does not use LRU. Conquest uses a very simple rule- files larger than a threshold are stored on disk, and files smaller than a threshold are stored in RAM. The threshold is currently a compiled-in constant (1 MB), but plans are for it eventually to be dynamic.
The advantage of this approach is that it eliminates the many layers of indirection needed to implement LRU-type caching, which is one reason Conquest consistently outperforms filesystems based on LRU caching.
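Here is a minimal sketch of the size-threshold rule. It assumes a 1 MB cutoff and treats a file of exactly 1 MB as "large", because the comment doesn't say which way ties go. `ThresholdPlacer` is a hypothetical name, and plain dicts stand in for battery-backed RAM and disk. This is not Conquest's code.

```python
SMALL_FILE_THRESHOLD = 1 << 20  # 1 MB, matching the compiled-in constant described above


class ThresholdPlacer:
    """Toy size-threshold placement: small files in RAM, large files on disk.

    Hypothetical illustration; dicts stand in for the two storage media.
    """

    def __init__(self, threshold=SMALL_FILE_THRESHOLD):
        self.threshold = threshold
        self.ram = {}   # path -> bytes, stands in for battery-backed RAM
        self.disk = {}  # path -> bytes, stands in for the disk

    def write(self, path, data):
        # Drop any previous copy so a file that grows past the threshold migrates.
        self.ram.pop(path, None)
        self.disk.pop(path, None)
        # Assumption: a file exactly at the threshold counts as "large".
        target = self.ram if len(data) < self.threshold else self.disk
        target[path] = data

    def where(self, path):
        if path in self.ram:
            return "ram"
        if path in self.disk:
            return "disk"
        return None
```

Placement is a single size comparison with no per-access bookkeeping. That is the property the comment credits for the speed advantage.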
wow, if it only could be cost effective... (Score:3, Interesting)
(http://falcon10.ath.cx/)
Who are they kidding? (Score:4, Insightful)
I mean, why *DO* we still have pagefiles?
An MS gripe: I seriously don't understand why I can't turn it off completely. With multiple GB of RAM dirt cheap, writing to a disk pagefile slows my system down. It has to!
Re:Who are they kidding? (Score:5, Informative)
It should be in Control Panel > System > Advanced > Performance. Look in there for a setting to set the page file to 0 or to disable it.
Re:Who are they kidding? (Score:4, Informative)
(http://--/ | Last Journal: Monday December 09 2002, @05:12PM)
Page files considered good (Score:5, Informative)
The swapfile is where the OS puts things it hasn't used in a while. On Windows this would probably include things such as the portions of IE that are now part of the OS, which you are forced to have loaded even if you are not using the box for web browsing. Placing these items in the page file frees up room for things that are currently useful, such as IO buffers/cache (disk and/or net), which can dramatically increase speed by storing things such as recently used executables, meta-information
That being said, I think the technology discussed in this article is a bit too single-minded. I think adding an extra level in the storage hierarchy between main RAM and the non-volatile HD is probably a good thing. My idea is to add a HUGE pile of PC100 or similar RAM to a system and have this RAM accessed in a NUMA style, which is becoming very popular. The Nintendo GameCube uses a form of this approach: there are two types of RAM, with a smaller, faster section and a larger, slower section.
The problem with my idea is that the price difference b/w cheap, slow RAM and fast, expensive RAM is currently not enough to make it worth the extra complexity. But I would guess that if someone took the effort to design/build cheap slow RAM, they could find a niche market for a system accelerator device.
Re:Who are they kidding? (Score:5, Interesting)
Well, a couple of reasons. Most important, the "pagefile" is there to protect against a hard out-of-memory condition. Modern operating systems are in the habit of overcommitting memory, which means they grant allocation requests even if the available RAM can't fulfill them. The idea is that an app will never actually be using all those pages simultaneously. If things go wrong and all that extra memory is actually needed, the system starts kicking pages to disk to satisfy the cascade of page faults. This means the system will become slow and unresponsive, but it will keep running. But say you didn't have anywhere to swap to. The system can't map a page when a process faults on it, and the process gets killed. But which process gets killed? After all, is it the process's fault if the OS decided to overcommit system memory? The swap space serves as a buffer so a real administrator with human intelligence can come in and kill off the right processes to get the system back in shape.
Swap is also important because not all data can just be reloaded from the filesystem on demand. Working data built in a process's memory is dynamic and can't just be "reloaded." If there's no swap, that means this memory must be locked in RAM, even if the process in question has been sleeping for days! We all know the benefits of disk caching on performance. Process data pages are higher priority than cache pages. Thus if old, inactive data pages are wasting space in RAM, those are pages that could have been used to provide a larger disk cache.
You basically always want swap.
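The overcommit-then-swap behaviour described above can be modelled as a toy. Allocations always succeed, and physical frames are consumed only when a page is first touched. When RAM is full, the oldest resident page is pushed to swap. Only when both RAM and swap are exhausted does the model report "oom". `OvercommitModel` is hypothetical, and oldest-first eviction is a simplification because real kernels use much smarter reclaim.

```python
class OvercommitModel:
    """Toy overcommit model: allocations always succeed, RAM is used only on first touch."""

    def __init__(self, ram_frames, swap_slots):
        self.free_frames = ram_frames
        self.free_swap = swap_slots
        self.resident = []    # (pid, page) pairs in RAM, oldest first
        self.swapped = set()  # (pid, page) pairs currently in swap
        self.reserved = {}    # pid -> pages promised (never checked: that's the overcommit)

    def allocate(self, pid, pages):
        # Grant the request without checking available RAM or swap.
        self.reserved[pid] = self.reserved.get(pid, 0) + pages
        return True

    def touch(self, pid, page):
        key = (pid, page)
        if key in self.resident:
            return "hit"
        if key in self.swapped:
            # Swapping the page back in frees its swap slot.
            self.swapped.remove(key)
            self.free_swap += 1
        if self.free_frames == 0:
            if self.free_swap == 0:
                return "oom"  # nowhere to put the page: some process has to be killed
            victim = self.resident.pop(0)  # evict the oldest resident page to swap
            self.swapped.add(victim)
            self.free_swap -= 1
            self.free_frames += 1
        self.free_frames -= 1
        self.resident.append(key)
        return "fault"
```

If the model has zero swap slots, the first touch beyond physical RAM returns "oom" immediately. With even one swap slot, the same touch succeeds by pushing an idle page out. That buffer is the one the comment says gives an administrator time to step in.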
The next boost will be (Score:5, Interesting)
(Last Journal: Wednesday March 30 2005, @04:16PM)
This is not just for exe's but for datafiles as well...
Re:The next boost will be (Score:4, Interesting)
(Last Journal: Thursday December 08 2005, @04:33PM)
It mmap()s executables before running them.
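Assuming the comment refers to the mmap() system call, this is roughly what mapping a file looks like from user space, using Python's standard `mmap` module. The helper name is hypothetical. The sketch shows only the mapping itself, not what any particular loader does with it.

```python
import mmap


def read_header_via_mmap(path, nbytes):
    """Map a (non-empty) file read-only and return its first nbytes via the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:nbytes]
```

Slicing the mapping reads the bytes through the page cache without an explicit read() call. Note that mapping an empty file raises ValueError.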
Dead? Hardly... (Score:4, Insightful)
(http://www.bluefeathertech.com/ | Last Journal: Friday November 04 2005, @11:51AM)
You can do exactly the same thing by sticking an operating program into any sort of non-volatile storage (EPROM, EEPROM, memory card, whatever), and including a hard drive in the same device if need be. The new filesystem they're describing simply shifts more of the load to the silicon side instead of the electromechanical realm.
In short, The Disk is far from dead. This is just a first step in that direction.
Old news. (Score:5, Informative)
(http://www.remix.net/)
http://www.superssd.com/products/tera-ramsan/
Up to a terabyte even.
-n
Yeah wutever (Score:5, Funny)
Could be accomplished with a "preferential cache?" (Score:5, Interesting)
(http://brianm.org/)
Too expensive (Score:1)
Perhaps when it's cheaper it may be more feasible for home users.
where....? (Score:1)
(http://linuxkernel.foundries.sourceforge.net/)
How is this any different from .... (Score:3, Interesting)
I guess I can understand the benefits (as minor as they may be relative to price), but the thing that bothers me the most is why does it take 4 years and NSF funds to come up with something that seems so obvious?
And one major problem would be getting over the fact that if the machine craters, you can't just yank the drive and have everything there, though I assume they have some way to "flush" the ram (can't read the
What if the battery fails? (Score:3, Interesting)
(http://finitestate42i.blogspot.com/ | Last Journal: Thursday November 25 2004, @04:55PM)
Pardon my ignorance, but what happens if the battery fails? Of course, this is highly unlikely, but just a scenario.
In a conventional disk the data would remain even if power is switched off, but RAM would lose the data (or it could get corrupted, and you cannot be sure the data is exactly the same).
Thank you.
GrimReality
2003-04-21 15:51:18 UTC (2003-04-21 11:51:18 EDT)
Same as "what if the hard drive crashes?" (Score:5, Interesting)
Note that hard drive failures are still common and likely to be much more common than battery failures, as it would be trivial to implement a scheme through which battery recharging would be automatic while the computer was plugged in. The battery would only be directly employed when the system was unplugged or the power was out. Even in that case it would also be trivial to implement a continuous/live backup system to a nonvolatile medium like a hard disk, which by that point would be ridiculously cheap.
filesystem (Score:1)
ok, so the FS is faster... (Score:1)
(http://illuminatus.oczombies.net/)
Umm.. (Score:5, Informative)
It's an old IBM 3H 64 bit PCI model with 32MB of ram and battery backup.. newer 4H models support more ram.. but how is this any different?
The most used and smallest files stay in the cache.. the rest are called when needed.. and if god forbid the power fails, and the ups fails.. the card has a battery backup to write out the final changes once the drives come back online.
right.... (Score:2, Redundant)
(http://www.fimble.com/)
Reliability (Score:3, Interesting)
wow (Score:2)
I guess this might be neat now that computers don't have extra 'cards' of memory. But when that was the way to expand your computer, it was quite easy to have this method of storage.
Nonvolatile cache does similar things (Score:1, Insightful)
Full paper in HTML (Score:5, Informative)
(http://www.billglover.com/)
The paper [ucla.edu]
Looks like a great server side file system. This is finally a step away from this whole "file" madness. All storage and IO should be memory mapped, and all execution should be in place. Anything else is just silly.
Dead? (Score:4, Interesting)
One thing I've always wondered, though: why not release an OS on an EPROM? It would make boot time and OS operations extremely fast. I'm still surprised to this day that this isn't mainstream. Ahhh, the good ol' days of Commodore, when your OS was instantly on when you turned on the PC.....
...goes great with 64-bit && cheap RAM (Score:5, Insightful)
I'm hoping that hardware people will realize that we need huge amounts of fast memory...whether or not we think we need it. We're stuck in a "why would I need more RAM than the applications I run need?" kind of mindset. I think that the sudden freedom 64-bit pointers will provide to software developers will result in a paradigm shift in how memory (both permanent and temporary) is used. Though like all paradigm shifts, it's difficult to predict ahead of time exactly what the change will be like...
huh (Score:1, Redundant)
(http://slashdot.org/journal.pl?op=list&uid=100904 | Last Journal: Saturday September 20 2003, @09:32AM)
Unanswered Questions (Score:2)
(http://pl.atyp.us/ | Last Journal: Friday October 11 2002, @12:31PM)
While this is great for some environments, it will remain a research toy until several real-world problems and limitations are addressed. Several people have already brought up the issue of having more small files than will fit into the BB-RAM. Another issue is portability. With a traditional filesystem, if a whole machine dies you can slap the disk into another one (of the same type). With Conquest, you have to transplant the BB-RAM as well. How many slots do you think a machine has for BB-RAM, vs. how many disks can you attach? At the very least you'd need to coordinate use of the BB-RAM across filesystems, plus a way to flush/restore one filesystem's portion to actual disk.
There are many more issues like these, which would need to be addressed before a Conquest-like approach is really viable in the real world. One or more of those issues might turn out to be a show-stopper. It's interesting research, but don't expect it to replace traditional filesystems any time soon.
Just like BSD (Score:1)
I have an app. for this today. (Score:1)
If I could put a one-gigabyte stick of RAM ($124 from pricewatch) onto a DIMM -> IDE drive board (say $100), then workstations could netboot, download the OS of choice, and run off the local RAM disk. They could store their important data on net drives, and (as various Windows versions often need) they could reboot at tremendous speeds. This would eliminate hard drive failures outside the computer room and would provide an easy solution for many virus problems. I wouldn't even need the Conquest method for dividing up the data, as I would manually divide the big data onto the net drives and the OS onto the machine.
With a Customer Care staff of 100, the amount of disk-swapping and disk cleaning that goes on is a serious chore that would just go away. Well worth the small extra investment. This would also make it easier to switch people over to Linux: "If you want to try my latest Linux desktop, just boot in 'Linux (test) mode', and if you don't like it, reboot in Windows mode."
Hybrid storage devices (Score:2)
Rather than being filesystem dependent, I'd have the device not know or care about filesystem, just logical disk sectors. Those that were accessed frequently would stay on the higher speed medium and those that weren't, the less frequent. Large files that were only partially read wouldn't penalize the computer for being on tape or penalize the high speed storage for unaccessed chunks taking their space.
Unfortunately it's probably too complex to actually implement, and disk storage capacities have grown so quickly that it seems like disk is the way to go. The bigger challenge is applying disk intelligently to the servers that need it (iSCSI, Fibre Channel, EMC, etc.).
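The frequency-based tiering idea can be sketched in a few lines. The device counts reads per logical sector and periodically keeps the hottest sectors on the fast tier, without knowing anything about the filesystem. `HybridDevice` and `rebalance` are hypothetical names. This toy never ages its counters, which a real device would have to do so that old hot spots eventually cool off.

```python
from collections import Counter


class HybridDevice:
    """Toy two-tier device: the most frequently read sectors live on the fast tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # number of sectors the fast medium can hold
        self.counts = Counter()             # sector -> number of reads so far
        self.fast = set()                   # sectors currently on the fast medium

    def read(self, sector):
        # Record the access and report which tier served it.
        self.counts[sector] += 1
        return "fast" if sector in self.fast else "slow"

    def rebalance(self):
        # Promote the hottest sectors; everything else stays on (or falls back to) the slow tier.
        self.fast = {s for s, _ in self.counts.most_common(self.fast_capacity)}
```

Because the device works on sectors rather than files, a large file that is only partly read contributes only its hot sectors to the fast tier. That is the behaviour the comment asks for.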
Ramdrive (Score:2, Interesting)
Once 64-bit processing becomes mainstream and the price per gigabyte of memory improves (say, 16 gigs of DDR 3200), store the OS on a small (~5 gig) hard drive partition and transfer the entire thing to a 5 gig ramdrive on startup. Using Serial ATA that shouldn't take too long, and the OS will run at dramatically increased speeds, especially if the swap is housed in the ramdrive as well. On shutdown, transfer the contents of the ramdrive back to the hard drive. With the massive RAM support 64-bit processing promises, I'll wager some incredible things are possible for those willing to experiment with technologies like this. Perhaps that's where the technology in this article is heading, although it is far less volatile/risky than my approach.
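Here is a minimal sketch of the copy-in/copy-out step. It assumes `ram_dir` sits on a RAM-backed filesystem; on Linux, a tmpfs mount such as /dev/shm commonly is one. The function names are hypothetical, and `dirs_exist_ok` needs Python 3.8+. The sketch never deletes anything on the disk side, so files removed while running would reappear. It also loses every change since boot if power fails before `save_from_ramdisk` runs.

```python
import shutil


def load_into_ramdisk(disk_dir, ram_dir):
    """Startup: copy the on-disk image into the RAM-backed directory."""
    shutil.copytree(disk_dir, ram_dir, dirs_exist_ok=True)


def save_from_ramdisk(ram_dir, disk_dir):
    """Shutdown: write the RAM-backed copy back so changes survive power-off."""
    shutil.copytree(ram_dir, disk_dir, dirs_exist_ok=True)
```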
Not a new concept or idea at all (Score:2, Informative)
Additionally, laptops take a similar concept and save the system memory image to hard drive and just read that in order to make your boot time a little shorter when you are away from the machine and it powers down.
ram storage (Score:2, Interesting)
(Last Journal: Friday October 19, @09:21PM)
Has anyone statistics on RAM/HD prices? (Score:3, Interesting)
(http://home.netuse.de/~ms)
Does anyone have statistics on HD and RAM prices over the last few years?
I only have a feeling that in recent years RAM prices have fallen more quickly than HD prices. This will naturally lead toward developments such as the one mentioned.
I think it would be very interesting to study technical developments in light of price developments. My bet is that most inventions are caused not by bright minds but by the need for them. For most technical breakthroughs, the mind is not the cause but the catalyst ;-).
CU, Martin
It's just a persistent RAM cache (Score:2)
(http://www.animats.com)
A more useful line of development might be to reduce the amount of stuff loaded at boot time. Many systems today are loading far too much dreck at startup. Adware, spyware, browsers, Java engines, libraries for Java engines, audio programs, toolbars, color calibrators... Boot-time I/O could be cut 50-80% with modest engineering effort.
Program launch has also become far too complex. It should not take 20 seconds to launch Adobe Photoshop (even LE!) on a gigahertz computer.
Premise is flawed for DB servers (Score:1, Interesting)
Because most accesses to large files are sequential, we can relax many historical disk design constraints, such as complex layout heuristics intended to reduce fragmentation or average seek times.
Ask any DBA of a production DB server: this is plainly not true in a DB environment. My company's Oracle files are very large, and access is very random. In this instance, the Solaris caching algorithm would grossly outperform this.
I suppose this might work for desktops, though.
Journaling FS + journal on NVRAM (Score:5, Interesting)
(http://slashdot.org/~wowbagger/journal/87552 | Last Journal: Monday September 03, @08:07PM)
If you used full journaling (data writes journaled as well as metadata journaled), then writes will happen at RAM speeds (with the journal flush happening "later" when the system isn't busy).
Meanwhile, files that are being used will be in the VFS buffer cache (evicted as they age or as the system needs the RAM for other purposes), thus making reads fast (after the initial read from disk).
It would seem to me that my approach would automatically tune itself to what you are doing, rather than trying to tune things by hand.
(Granted, this assumes your OS has
but given those assumptions...)
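The journaling idea above can be sketched like this, assuming full data journaling with the journal in NVRAM. The `write` method returns once the record is appended to the journal, and reads check the journal first. The `flush` method replays the records to disk later, and replaying the journal in order is also how recovery after a crash would work. `JournaledStore` is a hypothetical toy, and dicts and lists stand in for disk blocks and NVRAM.

```python
class JournaledStore:
    """Toy full-data journaling: writes land in a fast journal and reach 'disk' later."""

    def __init__(self):
        self.journal = []  # (block, data) records; stands in for the NVRAM-resident journal
        self.disk = {}     # block -> data; stands in for the on-disk blocks

    def write(self, block, data):
        # The fast path: the write is durable once it is in the (NVRAM) journal.
        self.journal.append((block, data))

    def read(self, block):
        # The newest journaled version wins over whatever has reached disk.
        for b, data in reversed(self.journal):
            if b == block:
                return data
        return self.disk.get(block)

    def flush(self):
        # Run when the system is idle: replay records in order, then truncate the journal.
        for block, data in self.journal:
            self.disk[block] = data
        self.journal.clear()
```

Writes return at journal speed, and the slow disk update happens only during `flush`. This is the "journal flush happening later" part of the comment.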
Clearing up some misconceptions (Score:5, Informative)
(http://www.cs.ucla.edu/~scottm/)
First off, Conquest uses the system's RAM. It's not attached by an external bus or network system, e.g., Fibre Channel. Not that one would really want to make Fibre Channel a CPU-RAM bus in the first place. So pointing out products of people "who done this already" doesn't apply if it's not done in the system's RAM.
Secondly, Conquest removes all of the disk-related complexity (buffer management, I/O cache management, elevator algorithms, etc.) from the kernel. This allows Conquest to operate at close to theoretical disk I/O bandwidth. Pages go right from RAM to disk. Minimal metadata to update, no inode arrays to traverse.
There is currently a 1M threshold that defines the difference between a "large" file and a "small" file. Conquest doesn't decide to pull in only shared objects, libraries and executables. In fact, emacs falls into the "large" file category. However, Andy noted that most large files have "stylized" access, e.g., MP3s, where the first thing is a seek to the end of the file to read its metadata. The same is true of executables. Conquest has the concept of recursive VMs (VMs in VMs) that handle the different stylized accesses. Not that he's implemented all of them, since he's managed to graduate and is teaching the OS course this quarter.
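The MP3 case is easy to illustrate with ID3v1 tags, which live in the last 128 bytes of the file, so reading the title or artist means seeking straight to the end. This sketch (hypothetical `read_id3v1`) handles only ID3v1; the newer ID3v2 format sits at the start of the file instead. It shows the access pattern only, not how Conquest handles it.

```python
import os

ID3V1_SIZE = 128  # an ID3v1 tag occupies the final 128 bytes of the file


def _text(raw):
    # ID3v1 fields are fixed-width and padded with NULs or spaces.
    return raw.rstrip(b"\x00 ").decode("latin-1")


def read_id3v1(path):
    """Seek straight to the end of the file and parse an ID3v1 tag, or return None."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if size < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        tag = f.read(ID3V1_SIZE)
    if not tag.startswith(b"TAG"):
        return None
    return {
        "title": _text(tag[3:33]),
        "artist": _text(tag[33:63]),
        "album": _text(tag[63:93]),
        "year": _text(tag[93:97]),
    }
```

Only one small read is needed, whatever the file size. This is why a large file like this can still be served efficiently from disk.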
Lastly, Conquest checkpoints RAM out to disk periodically. No, it's probably not the smartest strategy, but it does work. Thus, if the battery dies or the OS chokes, one can roll back to a reasonable state.
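Here is a minimal sketch of periodic checkpointing with rollback, assuming a pickle file as the on-disk checkpoint. `CheckpointedRamStore` is hypothetical. It writes to a temporary file and then renames it with `os.replace`, so a crash mid-checkpoint leaves the previous checkpoint intact. Anything written after the last checkpoint is lost on recovery, which matches the "reasonable state" caveat above.

```python
import os
import pickle


class CheckpointedRamStore:
    """Toy in-RAM store that can be checkpointed to disk and rolled back."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.ram = {}  # stands in for the battery-backed RAM contents

    def write(self, key, value):
        self.ram[key] = value

    def checkpoint(self):
        # Write to a temp file and fsync it, then atomically swap it into place.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.ram, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.checkpoint_path)

    def recover(self):
        # Simulate losing RAM, then roll back to the last checkpoint (empty if none exists).
        self.ram = {}
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, "rb") as f:
                self.ram = pickle.load(f)
```

A real system would call `checkpoint` on a timer rather than by hand. The trade-off is the checkpoint interval: shorter intervals lose less work but cost more disk I/O.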
HTH.
Re:Clearing up some misconceptions (Score:4, Informative)
(http://www.cs.ucla.edu/~scottm/)
Of course, putting anything into ramfs also eats up your swap file, whereas Conquest doesn't.
Not very impressive (Score:2)
Might work on IDE (Score:2)
(http://ghazan.hazara.org/)
IDE drives have large caches. I suppose if the control to the caches could be programmable, it could be used by a driver to achieve this on a regular PC, but then, we'd lose the cache speedup. Better still, move part of the Conquest FS functionality to the south bridge of the chipset, in the IDE wirings, using part of the RAM.
I'm the one who gave the talk (Score:5, Informative)
(http://www.cs.hmc.edu/~geoff)
First, the full title of the talk was "The Disk is Dead! Long Live the Disk!" We make no claim that disk manufacturers are going to go out of business tomorrow; history suggests that the technology will survive for at least a decade, and probably more than two. Talk titles are intended to generate attendance, not to summarize important research results in 8 words.
Second, the most common objection to the work boils down to "just use the cache". This point has been raised repeatedly on Slashdot over the past few years. However, if you read our papers or attend one of my colloquium talks (UCSC, May 22nd -- plug), you'll learn that LRU caching is inferior for a number of reasons. We were surprised by that result, but it's true. Putting a fake disk behind an IDE or SCSI interface is even worse, since that cripples bandwidth and flexibility.
Third, for people worried about battery failures, the only question of interest is the MTBF of the system as a whole. All systems fail, which is why we keep backups and double-check them. If your disk failed every 3 days, you couldn't get work done, but there was a time when we dealt with a failure every few months. Conquest's MTBF hasn't yet been analyzed rigorously, but I believe it to be more than 10,000 hours, which is good enough to make it usable.
Finally, I have chosen not to put my talk slides on the Web, at least not for the moment. But you're welcome to mail me with questions: geoff@cs.hmc.edu. It might take me a few days to answer, so be patient.
Netapp filer NVRAM??? (Score:2, Interesting)
silly (Score:2)
You have two ways of doing that: either put the logic for that sort of caching into the disk driver, or put battery backed RAM into the disk drive or controller itself. I think both have been explored in the past.
This begs a question... (Score:2)
When will we get to the point where storage is so cheap and fast that we no longer need RAM?
Anyone have any ideas?
In the Interim ... Holographic Storage (Score:1)
Excerpt:
Holography makes use of the full thickness of the recording material, providing data densities proportional to media thickness.
This makes possible capacities of more than 1,000 GB on a CD disk format. By comparison, DVD technology provides only 9 GB on a double-sided disk.
Peace...
Ex-MislTech
Re:speaking of ramfs... (Score:1, Informative)
RTFA (Score:1, Informative)
The RAM only holds the meta-data and small files.
Re:Cost (Score:3, Informative)
(http://www.pobox.com/~kwerle | Last Journal: Sunday August 14 2005, @09:57PM)
For a whopping 512MB, no doubt.
Dunno where you buy your RAM, but CNET is willing to sell me Kingston memory (512MB 133 MHz DIMM) for less than $90 (one place says $65, but I don't believe them).
Time for you to find a new RAM supplier.