The Design Of The Google File System
Freddles writes "This is an interesting paper (PDF) describing the design approach to Google's file system. The design had to take account of requirements for huge file sizes, a highly responsive infrastructure and an assumption that hardware components will always fail."
In case you don't like PDF (Score:5, Informative)
Re:In case you don't like PDF (Score:3, Funny)
try this
I wish I had enough RAM to use as a harddisk. Then I could...well no, I wouldn't do anything useful. It would be cool, in a geeky way.
Re:In case you don't like PDF (Score:5, Funny)
Re:In case you don't like PDF (Score:2)
In case you don't like links at all (Score:3, Funny)
Re:In case you don't like PDF (Score:2)
as a sidenote, here is the google pdf-to-html cache of it: http://www.google.fi/search?q=cache:m0TMQYgIlIoJ:
Re:In case you don't like PDF (Score:3, Funny)
Yahoo uses the evil Anti-Google FS. It's the 1's complement called GllgOe. It can store 01111111111 1111111111 1111111111 1111111111 1111111111 1111111111 1111111111 1111111111 1111111111 1111111111 bytes of data.
Thoughtful... (Score:5, Funny)
Re:Thoughtful... (Score:5, Funny)
Re:Thoughtful... (Score:5, Funny)
Re:Thoughtful... (Score:4, Funny)
I-T. Really now, how hard is that?
Re:Thoughtful... (Score:2)
that is sad though.
Re:Thoughtful... (Score:2)
We can use it whenever people know just too much about trivial things.
+5 would show up as (Score:5, No Life)
Re:Thoughtful... (Score:2)
Of course, AllTheWeb is giving Google a run for its money...in the race [searchenginewatch.com] to make it to 4 billion pages indexed, so Google may fall back down for a while...
However, I don't think many ppl will switch because of a few thousand pages...
Re:Thoughtful... (Score:4, Insightful)
Google:
- top result: php.net
- 2nd place was php.net/downloads
AllTheWeb:
- top result: Hands-On PHP Training - 4 days $1695 (also ranked #10 on Turbo10, but not ranked in the top 20 at Google) -- oops, that is a sponsored link, but in AllTheWeb's default view, it looks like a normal link. php.net is actually ranked #1, but it appears 4th in the list of available links.
Turbo10:
- will not provide ANY results without Javascript turned on (BOO!)
- top result: GBF Masonry Cleaning Services..Stone Cleaning
- php.net ranked 5
Draw your own conclusions, but meta-search engines existed prior to Google, yet even at its launch it excelled over them in terms of providing relevant links. It appears that it still does, at least for a first pass.
I suspect that one of the reasons that Google can bring higher quality links to the forefront is that being #1, they have a wider and more generous revenue base and therefore don't have to be as generous to "paying patrons" *cough cough*.
Another problem is that meta engines have to mix "high-quality" results (say from Google) with lower quality results (say from some dippy paid for advertising search engine).
Re:Thoughtful... (Score:2)
Not just that. Google revolutionized the web-search stage with their PageRank software and other improvements. It's not something new; librarians have used such algorithms for a long time. However, it consistently gives "better" results than most of the competition.
I sus
Re:Thoughtful... (Score:2, Funny)
Take note: "Google is not affiliated with the authors of this page nor responsible for its content."
You mean FAT don't cut it no more? (Score:1, Redundant)
Weeeeeeeeeeeeeeeeeeeeee!
Re:You mean FAT don't cut it no more? (Score:3, Funny)
I think you mean "WEEEEEEE.EEE." Or possibly "WEEEEEE~1.EEE."
Re:You mean FAT don't cut it no more? (Score:5, Funny)
Re:You mean FAT don't cut it no more? (Score:2)
Story summary (Score:4, Funny)
Re:Story summary (Score:2)
PDF mirror (Score:5, Informative)
Interesting... (Score:3, Insightful)
Re:Interesting... (Score:2)
Nothing, or you'll be sued for copyright infringement.
Re:Interesting... (Score:1)
Re:Interesting... (Score:2)
There is nothing wrong with following and learning from our ancestors.
Google have put a great deal of thought into their filesystem, and most likely made some huge mistakes along the way. In the end they have a stable, workable system that still gives me the shivers occasionally.
I would see these as guidelines for a future next-generation filesystem rather than ripping the code from underneath them and calling it our own.
Just to make it clear.. (Score:5, Informative)
Okay, so I read this paper as part of the SOSP reading group here [cmu.edu] at school [cmu.edu]. Just want to make it clear that this is not the file system used by the front end that we all see. It is used by internal dev groups as well as the web spiders that they employ. Their unique usage has definitely led to a number of interesting choices (such as the atomic appends) for the file system design. Read the paper for more details.
Re:Just to make it clear.. (Score:4, Insightful)
"It is widely deployed within Google for the generation and processing of data used by our service as well as research and development that requires large data sets."
Re:Just to make it clear.. (Score:4, Informative)
Is there still a Google dance? (Score:3, Interesting)
Hmmm. (Score:4, Funny)
Never mind.
Everything's stolen nowdays. (Score:2, Funny)
Why the google file system is nothing but a waffle iron with a phone attached.
Re:Everything's stolen nowdays. (Score:1)
Only a file system? (Score:5, Interesting)
Luckily the world was saved from this possibility.
-John (now, one of those "why, back in my day..." story telling guys... sigh.)
Re:Only a file system? (Score:2)
Not Really. [lycoris.com]
Re:Only a file system? (Score:3, Interesting)
Steve Jobs must be shitting in his pants.
Is it open source? (Score:4, Funny)
Re:Is it open source? (Score:3, Funny)
Ah, yes. You want a new-fangled "ShelFS" system.
Word processor? (Score:2, Interesting)
Re:Word processor? (Score:1)
Nowadays, who knows? Probably Word (shudder).
-John (managing to not be nostalgic for LaTeX hackery).
Everyone still uses Latex in university. (Score:2, Funny)
Re:Word processor? (Score:2)
Re:Word processor? (Score:2)
I also was curious to see what software they had used to write the paper. It looked like a LaTeX document to me. Sure enough a quick peek at the document info reveals:
Title: paper.dvi
Application: dvips(k) 5.86 Copyright 1999 Radical Eye Software
Re:Word processor? (Score:2)
There was some cool stuff buried in there.
Re:Word processor? (Score:2)
Re:Word processor? (Score:2, Informative)
Re:Word processor? (Score:2)
It's very nice.
LaTex is not a word processor (Score:3, Informative)
That being
Re:LaTex is not a word processor (Score:2)
For changes tracking, why not just use cvs?
html version (Score:4, Informative)
I didn't read the whole article (kinda lengthy) but it seems pretty informative. I found their assumptions interesting, as they reveal some of the essence of what makes Google such a great search tool. Here are a few from the article:
- The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.
- High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.
- The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. Successive operations from the same client often read through a contiguous region of a file.
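A back-of-the-envelope sketch of why those last two assumptions favor large sequential reads. The 10 ms seek time and 50 MB/s transfer rate are made-up round numbers for a commodity disk, not figures from the paper, and `effective_throughput_mb_s` is a hypothetical helper named only for this illustration.

```python
# Hypothetical disk parameters (assumptions, not taken from the paper).
SEEK_SECONDS = 0.010        # time to position the head before each read
TRANSFER_MB_PER_S = 50.0    # sustained transfer rate once positioned


def effective_throughput_mb_s(read_size_mb: float) -> float:
    """Throughput of a single read, counting the seek that precedes it."""
    transfer_seconds = read_size_mb / TRANSFER_MB_PER_S
    return read_size_mb / (SEEK_SECONDS + transfer_seconds)


if __name__ == "__main__":
    reads = [("4 KB random read", 4 / 1024), ("1 MB streaming read", 1.0)]
    for label, size_mb in reads:
        print(f"{label}: {effective_throughput_mb_s(size_mb):.2f} MB/s")
```

With these assumed numbers the 1 MB read gets about two thirds of the raw transfer rate, while the 4 KB read stays under 1 MB/s because the seek dominates.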
Various hardware life expectancies? (Score:3, Interesting)
I think perhaps this is something we could all take a little more seriously. Part of me realises this is a comment on the sheer volume of data being manipulated, but something else that sprang to mind is the gradual reduction of warranties on HDDs, for example. I wonder what sort of stats an operation of this size could gather on various hardware components, and their varying propensities to wither and die.
Re:Various hardware life expectancies? (Score:2, Interesting)
Re:Various hardware life expectancies? (Score:2)
Re:Various hardware life expectancies? (Score:2)
No, Enterprise users won't settle for interruptions. It's the IT guy's work to figure out how to make a noninterruptible environment as cheap as possible.
Such a solution may well involve ultra cheap drives (one-third the cost of reliable ones) in a redundant RAID setup with hotspares, for example.
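A rough sketch of the arithmetic behind that tradeoff, under loud assumptions: drive failures are treated as independent, the annual failure rates (6% for a cheap drive, 2% for a reliable one) are invented for illustration, and rebuild windows, hot spares, and controller costs are ignored. `loss_probability` is a hypothetical helper, not part of any RAID tool.

```python
def loss_probability(per_drive_failure: float, copies: int) -> float:
    """Chance that every copy of the data fails, assuming independent failures."""
    if not 0.0 <= per_drive_failure <= 1.0:
        raise ValueError("per_drive_failure must be a probability")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_drive_failure ** copies


if __name__ == "__main__":
    # Assumed numbers: three cheap drives cost about as much as one reliable one.
    cheap_mirror = loss_probability(0.06, copies=3)
    single_reliable = loss_probability(0.02, copies=1)
    print(f"three cheap mirrored drives: {cheap_mirror:.6f}")
    print(f"one reliable drive:          {single_reliable:.6f}")
```

Under these assumptions the three-way mirror of cheap drives is far less likely to lose data than the single reliable drive at similar cost. Real failures are often correlated (same batch, same power supply), so treat this as an upper bound on the benefit.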
Interactive demo (Score:1, Funny)
Fabulous Insights (Score:5, Informative)
64meg chunk size is pretty huge, but I'm guessing that's blocked out based on continual threads of data, not typical files.
At first glance, this file system seems fairly wasteful. But hey, Google likely require speed and reliability over cost. Right?
This reminds me of the discussions about not-so-far-off database filesystems coming to an OS near you.
Re:Fabulous Insights (Score:2)
64 MB is the maximum chunk size. The assumptions section at the beginning talks about typical read/write operations working on about 1 MB.
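As I read the paper, the client uses the fixed chunk size to turn a byte offset within a file into a chunk index. A minimal sketch of that translation follows; `CHUNK_SIZE` matches the 64 MB figure discussed here, but `chunk_pieces` and its return shape are hypothetical names for illustration, not the paper's API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size discussed above


def chunk_pieces(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        # Read up to the end of this chunk or the end of the request.
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    # A 1 MB read at the start of the file touches only chunk 0.
    print(chunk_pieces(0, 1024 * 1024))
    # A read straddling a chunk boundary is split across two chunks.
    print(chunk_pieces(CHUNK_SIZE - 10, 20))
```

A typical read of about 1 MB almost always falls inside a single chunk, so the client usually needs to contact only one set of chunkservers per request.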
When will it be in the kernel? (Score:4, Funny)
[ ] Google File System.
in the kernel config.
Must be 12pm - the updatedb script is running.
Re:When will it be in the kernel? (Score:2)
Someday I'll set that to a time when I won't be sat at my computer developing.
Maybe 11am.
Re:When will it be in the kernel? (Score:2)
Re:When will it be in the kernel? (Score:2)
Actually, since it's designed for lots of hardware which is expected to die regularly, I wonder if any of the technology could be applied to P2P networks?
And starting with Linux 2.7... (Score:5, Funny)
Re:And starting with Linux 2.7... (Score:3, Interesting)
How long before ILM or Weta has a GFS disk array?
Apples vs. oranges (Score:2)
Re:Apples vs. oranges (Score:2)
Re:And starting with Linux 2.7... (Score:2)
they published it ... (Score:5, Interesting)
I wonder what they believe will protect their business from poaching of these ideas?
Re:they published it ... (Score:2)
It's called "creating prior art" without patenting the stuff. That's good. It's not evil. It's the google folks.
Re:they published it ... (Score:2, Insightful)
Basically it says that if you spend all your time playing catch-up you'll never be first.
If the other search engines use the GoogleFS then you know they aren't the leader. Sort of like if kernel.org was running Windows 2003 or if www.msn.com was running on Linux.
Now if they go and create a FS so they can be the same as Google then they are just catching up. Once they catch up to Google, Google will be somewhere else.
The other thing is there are lots of clustered file systems around so it
They have no reason for worry (Score:2)
Show your hate for SCO [anti-tshirts.com]. Get a cool t-shirt and donate to the Open Source Now Fund.
Re:they published it ... (Score:5, Insightful)
Perhaps the fact that it's taken many very smart people a good amount of time to implement and tune the original design, even after having come up with the basic layout?
Go take a look at the ReiserFS Future Vision page [namesys.com] -- you'll see some more interesting discussion of filesystem design, and overall direction. There are a few solid developers working full-time on the concepts discussed in the Reiser docs, and they still have enough work to keep them busy for years to come.
Google releasing information regarding the structure of their systems is a bit like John Carmack discussing the structure of his graphics engines: there's a hell of a distance between a conceptual description and a fine-tuned, tested, working implementation.
Given Google's history, I'd also imagine that they're on the lookout for up-and-coming young researchers. As such, if some grad student takes their work and extends it, they can certainly benefit.
RAIC?? (Score:3, Interesting)
What else can it be programmed to do? Could this become the basis for a personal computer where you just add computers seamlessly when you need more power?
RAID (Score:2)
Re:RAIC?? (Score:2)
Google cache (Score:5, Funny)
Re:Google cache (Score:3, Funny)
Google is not affiliated with the authors of this page nor responsible for its content.
GFS and GWS? (Score:2, Funny)
Prevayler anyone? (Score:2, Informative)
GooFS? (Score:2, Funny)
PC #1782563 (Score:2, Interesting)
Chunkservers... (Score:2)
user-mode? PVFS? (Score:2)
I wonder how it compares to PVFS [clemson.edu]. It seems like GoogleFS deals more aggressively with component failure. Any ideas?
Ironically Google has been down all day... (Score:2)
Q.
Failure (Score:2)
People, people (Score:3, Funny)
What a waste.... (Score:2, Insightful)
Re:great. now, deal with the spam issue (Score:5, Funny)
Ummm... not very many. Then again, I try not to search on "teen panties" very often. :)
That reminds me of the winter I spent in Chicago. I needed some galoshes to protect my shoes and keep my feet dry. Back in New England, we called them "rubbers" (I am not making this up). Needless to say, a google search on "buy rubbers" did not yield the intended results.
Re:great. now, deal with the spam issue (Score:2)
Hmmm, searching for help on LaTeX can sometimes be... distracting.
Re:great. now, deal with the spam issue (Score:2)
Re:great. now, deal with the spam issue (Score:2)
how many times have you searched for something on google, only to find that the search engine spammers have taken over almost every top 10 result?
Err, never. Even searches for porn images are still pretty useful (as useful as porn images are, I guess). Dozens of non-porn searches a day and always useful.
let them know (Score:2)
Re:great. now, deal with the spam issue (Score:2)
The other major problem is that many webpages aren't made to be easy to locate. At times they don't even include th
Re:google groups mostly down all day (Score:2)
Sure! Also, some of the counts of messages per thread are optimistic. I guess they've been told 1000 times already..or maybe I should mail them about it too?
Re:Thank God (Score:2)
Tomorrow's slashdot headline: Google proves definitively that 1 + 1 = 2
well... (Score:2)
A "real" GFS has multiple masters, as far as I'm concerned. This is a very specific app tied to a specific need for Google's web collection system.
So I think you're okay, even so.
Also, the article was published before Sept. 17 (earliest commentary I saw), so this is moot.
But anyway, kids, listen to him, don't procrastinate! And if you do, make sure you have adequate forged documentation on your 17 grandparents gruesome
Re:This sounds like a GPL violation to me! (Score:2)
Re:Google FS? (Score:2)
Re:Story Summary (Score:2)