Lustre File System Getting New Community Distro
darthcamaro writes "Oracle acquired a lot of open source tech from Sun that has since been forked — or is in the process of being forked. The open source Lustre high performance computing file system isn't on the list of forked projects, but it is getting a new, community-driven distro that is trying really hard to say that they're not officially a fork. 'Since April of 2010 there has been confusion in the community, and we've seen an impact in the business confidence in Lustre,' Brent Gorda, CEO and president of Whamcloud told InternetNews.com. 'The community has been asking for leadership, the commitment of a for-profit entity that they can rely on for support and a path forward for the technology.'"
What is Lustre File System (Score:5, Informative)
From their website:
http://wiki.lustre.org/index.php/Main_Page [lustre.org]
High Performance and Scalability
For the world's largest and most complex computing environments, the Lustre file system redefines high performance, scaling to tens of thousands of nodes and petabytes of storage with groundbreaking I/O and metadata throughput.
Re: (Score:2)
Any benchmarks?
Re: (Score:3)
Obviously, we have internal benchmarks that tend to show that Lustre is good but I can't talk about specifics on those. What I can do, though, is link to this: http://www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2009/papers/MADbench2-2009.pdf [rpi.edu]
The stuff that I found most interesting is on page 12. The machines named Jaguar and Franklin are Cray's running Lustre. Bassi and Jacquard are both running GPFS. On page 15 they claim that they can make up for the deficiency in Lustre's default settings for shar
Re: (Score:3)
It certainly *can* be used with commodity hardware, but the majority (or maybe all?) of Lustre installations are in high performance computing with thousands, or tens of thousands, of clients (usually the nodes of a supercomputer) accessing the shared file system.
Where more commodity hardware can come in is the installation of the filesystem servers themselves. A system's Object Storage Targets and Metadata Servers (pieces of Lustre) can be external to the Cray and connected via some interconnect such as I
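To make that division of labor concrete, here is a rough sketch in Python (hypothetical names and a toy layout function, not Lustre's actual API): the metadata servers record each file's layout, and the file's bytes are striped round-robin across Object Storage Targets so clients can read from many servers in parallel.

# Toy model of Lustre-style striping; hypothetical, not Lustre's API.
STRIPE_SIZE = 1 << 20  # 1 MiB stripes, a common default stripe size

def stripe_layout(file_size, ost_ids):
    """Map each stripe of a file to the OST that would hold it."""
    return [(offset, ost_ids[(offset // STRIPE_SIZE) % len(ost_ids)])
            for offset in range(0, file_size, STRIPE_SIZE)]

# A 5 MiB file over 4 OSTs: stripes land on ost0..ost3, then wrap.
# The metadata server stores this mapping; clients then talk to the
# OSTs directly, and in parallel.
print(stripe_layout(5 * (1 << 20), ["ost0", "ost1", "ost2", "ost3"]))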
Re: (Score:2)
Any reason Luster cannot be spelled correctly?
Would that impact performance?
Re: (Score:2)
No, but it would affect the ability of someone to trademark the name, and since Lustre has always been the project of a commercial company (originally Cluster File Systems, then Sun, then Oracle, and now OpenSFS and this company), that is something that would be considered...
Re: (Score:2)
LusterFS, there I solved your problems. I will accept payment in any manner of ways.
Re: (Score:2)
That's nice. Go talk to the people who actually work for one of those companies and complain to them. Until then, it's a product name and it's going to keep getting spelled the way the manufacturer spells it...
Re: (Score:2)
By now you should be near deaf from the whooshing sounds going on right above you.
Re: (Score:3)
Re: (Score:2)
I wonder how they spell illustrate in the US?
Re:What is Lustre File System (Score:4, Funny)
Any reason Luster cannot be spelled correctly?
Assuming the name is supposed to indicate something that shines, and not a sex addict, it is spelled correctly.
Re: (Score:2)
Re: (Score:2)
We are in the process of moving to it on commodity hardware; research is underway to see whether Lustre is feasible for our needs, where high-performance, high-capacity storage is a key requirement.
And I don't see any reason why not. Of course, reproducing the kind of load we have in production is extremely hard in a testing environment, so only the future will tell.
And Lustre does not need much more than consumer-grade hardware with some nice switches, LACP... Of course it solely depends on your bandwidth requirements.
Re: (Score:2)
The machines named Jaguar and Franklin are Cray's running Lustre.
The apostrophe is never used to form a plural. Not ever. No, not even then.
In terms of scalability: according to the Wikipedia page for the Jaguar system at Oak Ridge National Laboratory (a large Cray XT5), its Lustre filesystem is 10 petabytes with read/write performance of approximately 240GB/sec (not sure what benchmark was used to get that number).
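As a quick back-of-envelope on those quoted numbers (my arithmetic, not a published benchmark):

# Sweeping the entire 10 PB filesystem at ~240 GB/s aggregate
# takes about half a day.
capacity = 10e15       # 10 petabytes, in bytes
throughput = 240e9     # ~240 GB/s aggregate read/write
print(f"full sweep: ~{capacity / throughput / 3600:.1f} hours")  # ~11.6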
OK, so I'm not surprised if someone gets good performance from a Cray, but can't Lustre be used with lots of commodity hardware instead? I thought that was kind of the point.
I don't think a lot of people are going to go into many details on this article, because anyone using luster is liking using it in some way to leverage the idea of the cloud inside their respective business'es. Yes, I think it would scale well with commodity hardware, however there is no getting around the need of a fast interconnect between all nodes. If you cannot afford at least 10GbE for your entire storage cluster, don't even bother with luster, you'll hit a bottleneck on network IO likely with just a
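Rough line-rate arithmetic behind that warning, with assumed (ballpark) disk speeds:

# Ballpark only: a 1 Gb/s NIC moves ~125 MB/s, which one modern SATA
# disk can nearly saturate; 10GbE buys roughly an order of magnitude.
disk = 100  # MB/s, conservative streaming rate for a single SATA disk
for name, mbps in (("1GbE", 125), ("10GbE", 1250)):
    print(f"{name}: line rate reached by ~{mbps // disk} disk(s) per server")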
Re: (Score:2)
Re: (Score:2)
I don't think a lot of people are going to go into many details on this article, because anyone using [lustre] is liking using it in some way to leverage the idea of the cloud inside their respective [businesses]
There are people who know a lot about Lustre and aren't beholden to anyone. It is a GPLed open source project after all.
Re: (Score:2)
Re: (Score:2)
Depends on how much bandwidth you need and how many storage nodes you are running.
What we want to test is using the same systems to provide OSTs and act as clients. So if we run an 8-switch stack with 48Gbps switch-to-switch capacity, 176Gbps internal switching capacity, and 48 ports per switch plus two module slots for 10GbE, I think we are going to be fine. If running fewer storage nodes we can put in dual/quad-port NICs and LACP them if 10GbE is not a possibility (out of ports or something).
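A hedged sanity check of that stack math, using only the quoted line rates (real traffic patterns matter far more than this arithmetic):

# Per switch: 48 x 1 Gb/s edge ports against a 48 Gb/s stack link is
# 1:1 oversubscription for traffic leaving the switch; the 176 Gb/s
# internal fabric is not the limiting factor.
edge = 48 * 1      # Gb/s of client-facing ports per switch
stack_link = 48    # Gb/s switch-to-switch capacity
fabric = 176       # Gb/s internal switching capacity
print("oversubscription at the stack hop:", edge / stack_link)   # 1.0
print("fabric exceeds full-duplex edge load:", fabric >= 2 * edge)  # True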
Re: (Score:2)
The machines named Jaguar and Franklin are Cray's running Lustre.
Extraneou's apo'strophe's make me cringe. Come on, people! Thi's i's one of the 'simple'st -- and mo'st ab'solute -- rule's in the whole Engli'sh language!
Re: (Score:2)
The apostrophe is never used to form a plural. Not ever. No, not even then.
You need to mind your p's and q's on this one. There are specific do's and don'ts regarding use of the apostrophe for plurals.
You wouldn't want to go to an Oakland As game - that would be confusing. If you got straight Cs on your report card, folks would think you were a real computer geek.
Since you might be, being on Slashdot, it's possible your colleagues may get confused if you tell them to fix the QoSs on their routers.
(The
Re: (Score:2)
> The apostrophe is never used to form a plural. Not ever. No, not even then.
Actually, it can. Just not in English :-) In my native Dutch, apostrophe-s is a plural, while attached s is a possessive. The OP has still made a mistake, but he might be a non-native speaker.
And, yes, we get the same shit here because of English contamination :-)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Obviously, we have internal benchmarks that tend to show that Lustre is good but I can't talk about specifics on those.
So, uh, why mention them?
The stuff that I found most interesting is on page 12. The machines named Jaguar and Franklin are Cray's running Lustre.
So you need a Cray to get good performance?
Re: (Score:1)
But does it run (on) Linux?
Re:What is Lustre File System (Score:5, Interesting)
At a functional level, Lustre (GPL) is to ZFS (CDDL) as CXFS (commercial) is to XFS (GPL) for SGI. They are the upper 'cluster' layer that takes advantage of the underlying filesystems' capabilities. I believe this upper/lower approach diverges from that of GFS, but I'm not that familiar with clustered filesystems.
However: arguably, Lustre on ZFS is a much better option due to ZFS's inherent superiority over XFS. I've liked XFS historically, but ZFS is so drastically superior to anything else out there (in terms of storage management and available capacity and throughput), all 'out of the box', that it's a no-brainer to use zvols for things other than direct ZFS POSIX access. (For instance, they make great VM iSCSI targets, or local raw disks for VMs, or..)
Side note: the Linux zfsonlinux.org port is being successfully used as the base volume manager for Lustre right now, so it is apparently quite capable/stable at that level. (zfsonlinux does not yet have ZFS POSIX support.) Lustre on ZFS apparently scales much better than the traditional LVM/RAID/etc. backend methods.
Re: (Score:3)
At a functional level, Lustre (GPL) is to ZFS (CDDL) as CXFS (commercial) is to XFS (GPL) for SGI.
And who says the IT world has too many confusing acronyms?
Re: (Score:2)
I don't suppose "They made (asked) me to do it!" is a legitimate excuse?
Conversely, if we had long names for everything, we'd soon get confused and have insufficient time to actually work.
Re: (Score:2)
Lustre on ZFS is a much better option due to ZFS's inherent superiority over XFS.
I can tell you with a high degree of confidence that ZFS is a poor option for Lustre compared to its traditional backends, Ext3 and Ext4. One simple reason: ZFS has about half the transaction throughput.
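For anyone who wants to measure that rather than argue: a minimal, hypothetical microbenchmark of durable small writes, the kind of load behind "transaction throughput". Absolute numbers depend entirely on the hardware, filesystem, and mount options.

import os, time

def fsync_ops_per_sec(path="txbench.tmp", seconds=5.0, size=4096):
    """Count how many 4 KiB synchronous (fsync'd) writes complete per second."""
    buf = os.urandom(size)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    count, deadline = 0, time.time() + seconds
    try:
        while time.time() < deadline:
            os.pwrite(fd, buf, 0)  # overwrite the same 4 KiB block
            os.fsync(fd)           # force it to stable storage
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / seconds

if __name__ == "__main__":
    print("durable 4 KiB writes/sec:", round(fsync_ops_per_sec()))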
Re: (Score:2)
But ZFS is cool and trendy among geeks! Ext3/4 can't beat that!!
Got me :-/
Re: (Score:2)
Really? I thought the CDDL put the kibosh on that idea a year ago...
Re: (Score:2)
It does? How do you figure that? You will be somewhat limited compared to other filesystems for 'raw speed', but that's why ZFS has built-in read and write cache functionality (via SSD or ramdisk). Unless we're talking about massive amounts of sustained reads and writes, with no time for the disks to catch up, I suspect that (oh) 32GB of SSD or so would do the trick for most hosts, or 128GB for a 'high demand' host member of Lustre.
So yeah, if you consider cache, ZFS is going to blow the snot out of anythin
Re: (Score:2)
Actually, I'm 100% certain Lustre is NOT using ZFS today. It is actually using ldiskfs for the backing filesystem, which is a modified version of ext4. While work was ongoing to port Lustre over to ZFS, this was not completed.
Re: (Score:2)
Lustre on ZFS apparently scales much better than the traditional LVM/RAID/etc. backend methods.
By the way, where did you get that idea?
Re: (Score:2)
Same place he got the idea that ZFS is fast and the default for Lustre, and that Lustre and LVM/RAID are the same type of thing :D
Re: (Score:2)
Yes, because that was obviously what I intended to convey. Thank you for pointing out that your level of reading comprehension is likely very similar to that of a politician.
Re: (Score:3)
I'll tell you where I got that idea: experience.
Managing filesystems in lvm2, on raid cards - all with their own specific commands - is a real pain in the ass when you've got tens of hosts or more per admin, with many different roles and functionality.
So then you've got to have SNMP set up for each of those hosts (often with different controller cards) to monitor those RAID cards' status (with the shitty RAID console tool, which lacks anything resembling documentation). Then you've got to manage LVM, with its
Re: (Score:2)
ZFS is slow on Linux, and Lustre runs on ext3/ext4 by default. In fact, Lustre was quite a big contributor to bringing ext4 into existence by optimizing ext3.
http://en.wikipedia.org/wiki/Ext4 [wikipedia.org]
Comparing Lustre to LVM/RAID is comparing apples to oranges. One is a network cluster file system; the other is local storage management.
Re: (Score:2)
I didn't compare Lustre to LVM/RAID - I compared ZFS to it.
What sits on either would be Lustre, obviously. ZFS is significantly superior to RAID + LVM in pretty much every way, barring super-expensive hardware RAID controllers, where RAID has a slight edge in and of itself. (Though it should be noted that these RAID controllers would likely provide significant benefit to a ZFS system, too.)
What I have to wonder is: what kind of storage methods or devices does a 'network cluster file system' use? Here's a gu
Ended project (Score:5, Informative)
According to insidehpc [insidehpc.com], Oracle has stopped developing Lustre and developers "have reportedly been encouraged to apply for other positions within the company".
A group of Lustre users already created OpenSFS [opensfs.org] in October 2010 to continue developing Lustre.
Re: (Score:2)
If necessary, it will be forked. Between OpenSFS and Whamcloud there will always be a home for Lustre. Whamcloud already has contracts with Lawrence Livermore National Lab and Oak Ridge National Lab. Oak Ridge already has the largest Lustre filesystem to date. And there is also DDN, which supplies the hardware for most of the larger Lustre sites and distributes a local copy of Lustre as well. Lustre is more than fine, it's just a little lost finding a home at this time.
I see, so according to the F/OSS folks.... (Score:1)
Oracle acquired a lot of open source tech from Sun that has since been forked — or is in the process of being forked.
Is really:
Oracle acquired a lot of open source tech from Sun that has since been fucked -- or is in the process of being fucked [by Oracle].
Wake me up before you go-go (Score:2)
Very first thing to do is... (Score:4, Interesting)
Lose every tie to ZFS. Every. Single. One.
Right now.
Like every piece of software Oracle is involved in, ZFS is a big fat patent trap. Not only that, but ZFS is a lot slower than Ext3 and Ext4, and probably Btrfs[1] as well. There is absolutely no benefit to using ZFS as an object storage target, there is only the certainty of legal problems.
[1] Oracle is involved with Btrfs too, so exercise due caution.
Re: (Score:1)
Unfortunately, Lustre-on-ZFS [zfsonlinux.org] is substantially faster than Lustre on ext3, mainly because ZFS combines the features of an LVM and a filesystem. That eliminates the need to have SAN appliance heads managing the storage and provides some additional data integrity features. It's cheaper too.
Re: (Score:2)
Is it faster because of the ZFS intent log and second level cache on SSD ?
Re: (Score:3)
Lustre-on-ZFS [zfsonlinux.org] is substantially faster than Lustre on ext3, mainly because ZFS combines the features of an LVM and a filesystem
That's bafflegab and incorrect. Or if you disagree, please explain why.
Re: (Score:3)
And by the way, is your opinion based on benchmarks, or on hype from Sun? I strongly suspect the latter.
Re: (Score:2)
It appears to be based on the linked site:
"In particular, ZFS’s advanced architecture addresses two of our key performance concerns: random I/O, and small I/O. In a large cluster environment a Lustre I/O server (OSS) can be expected to generate a random I/O workload. There will be 100’s of threads concurrently accessing different files in the back-end file system. For writes ZFS’s copy-on-write transaction model converts this random workload in to a streaming workload which is critical when using SATA disks. For small I/O, Lustre can leverage a ZIL placed on separate SSD devices to maximize performance."
The LLNL ZFS study has been pretty widely publicized in the HPC community. Lustre uses the filesystem API rather than mounting in. Until now Lustre used ext under the hood for data storage, so the performance improvement from ZFS is relative to ext. ext3/4 may very well outperform ZFS on a workstation or small server, but that's not what Lustre is used for (even their test system is ~900TB).
Disclaimer: I used to work for LLNL.
Disclaimer: I used to work on Ext3. I would classify the above as "hype from Sun". There is a hidden cost to making all the writes linear on spinning media: the reads become nonlinear. This is usually the wrong tradeoff.
Note that a traditional journal is another way of linearizing writes, in that a write transaction can be considered durably recorded to media as soon as the journal write completes.
Benchmarks tell the true story, not hype, and on good information and belief the benchmarks say Z
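For readers following along, here is a toy sketch (nobody's actual allocator) of the tradeoff both posts are describing: copy-on-write turns random-offset writes into one sequential stream, at the price of scattering logically adjacent blocks, which is exactly where the reads become nonlinear.

class CowDevice:
    """Toy copy-on-write device: all writes append; a map tracks blocks."""
    def __init__(self):
        self.log = []        # the medium: strictly append-only, so sequential
        self.block_map = {}  # logical block number -> physical slot

    def write(self, logical_block, data):
        self.block_map[logical_block] = len(self.log)  # remap to the tail
        self.log.append(data)

    def read(self, logical_block):
        # Logically adjacent blocks may sit far apart physically.
        return self.log[self.block_map[logical_block]]

dev = CowDevice()
for lbn in (907, 3, 512, 44):          # a "random" write pattern
    dev.write(lbn, f"data@{lbn}")
print([dev.block_map[b] for b in (907, 3, 512, 44)])  # -> [0, 1, 2, 3]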
Re: (Score:2)
Re: (Score:2)
If you are comparing ZFS performance on linux, then, yes, it is slower
No, I am comparing Ext3/4 on linux to ZFS on Solaris.
Re: (Score:2)
There have been plenty of benchmarks out there showing ZFS's performance besting ext3 and ext4 on identical hardware (with one running OpenSolaris and the others on Linux)
Link please.
Re: (Score:2)
Sun had good reasons for going with the CDDL, and Oracle has equally good reasons for sticking with it.
Yes, keeping Linux out on purpose:
In the words of Danese Cooper, who is no longer with Sun, one of the reasons for basing the CDDL on the Mozilla license was that the Mozilla license is GPL-incompatible. Cooper stated, at the 6th annual Debian conference, that the engineers who had written the Solaris kernel requested that the license of OpenSolaris be GPL-incompatible. "Mozilla was selected partially because it is GPL incompatible. That was part of the design when they released OpenSolaris. [...] the engineers who wrote Solaris [...] had some biases about how it should be released, and you have to respect that"
http://meetings-archive.debian.net/pub/debian-meetings/2006/debconf6/theora-small/2006-05-14/tower/OpenSolaris_Java_and_Debian-Simon_Phipps__Alvaro_Lopez_Ortega.ogg [debian.net]
the fact that nobody bothers doing this
http://zfsonlinux.org/ [zfsonlinux.org]
On the other hand, if you're hellbent on Linux and not too invested in the kernel
That makes no sense. Linux _is_ the kernel. Do you mean GNU?
Re: (Score:2)
Under Linux this is so true, but under *BSD the deduplication portion works too, and that is an excellent feature if you are running a huge amount of storage.
Re:Very first thing to do is... (Score:5, Informative)
ZFS is a big fat patent trap
Oracle has released the ZFS code under the CDDL. While lots of Linux people hate the license, it has very strong patent retaliation clauses. Oracle explicitly grants you patent licenses for everything required to use ZFS via clause 2.1. All other contributors do via clause 2.2. Anyone exerting patents against ZFS immediately (well, within 60 days) loses this grant and has their (copyright) license terminated as well via clause 6.2.
Since Sun accepted third-party contributions to ZFS under the OpenSolaris program, if Oracle tried exerting patents against any ZFS distributor then they would immediately have to stop distributing Solaris and then remove all of these contributions before they could start again.
The ZFS patents are only an issue for a reimplementation of ZFS for Linux, and that's a problem caused by the GPL. Using the FreeBSD or NetBSD ports of ZFS (or even the FUSE port) gives you an explicit grant to the patents.
Re: (Score:2)
The ZFS patents are only an issue for a reimplementation of ZFS for Linux, and that's a problem caused by the GPL.
"Mozilla was selected partially because it is GPL incompatible. That was part of the design when they released OpenSolaris. [...] the engineers who wrote Solaris [...] had some biases about how it should be released, and you have to respect that" - Danese Cooper
http://caesar.acc.umu.se/pub/debian-meetings/2006/debconf6/theora-small/2006-05-14/tower/OpenSolaris_Java_and_Debian-Simon_Phipps__Alvaro_Lopez_Ortega.ogg [acc.umu.se]
Re: (Score:2)
Comparing ZFS to any of the ext filesystems is pointless, and utterly misses the point of ZFS.
Do ext3/4 provide snapshotting?
Do they provide deduplication?
Do they perform hash checks to avoid duplicating files in the first place?
Do they provide ANY of the dozens of features that set ZFS apart from other filesystems?
Don't bother, the answer is no.
And if you're going to disable those features on ZFS, then you have no reason to be using it in the first place, so you're effectively making an apples-to-zebras comparison.
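For the curious, a toy content-addressed store sketching the hash-based block deduplication mentioned above. This is an illustration only; ZFS dedups at the block level using its checksums, and its on-disk format looks nothing like this.

import hashlib

class DedupStore:
    """Toy block store: identical blocks are stored exactly once."""
    def __init__(self):
        self.blocks = {}  # sha256 digest -> block contents

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(key, block)  # no-op if block already present
        return key

store = DedupStore()
a = store.put(b"x" * 4096)
b = store.put(b"x" * 4096)  # a duplicate block
assert a == b and len(store.blocks) == 1  # stored once, referenced twice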