2.4.20 ext3 Data-Corrupting Bug Fixed
An anonymous reader writes "The ext3 data-corrupting bug found in the latest stable Linux kernel, and reported in two earlier Slashdot stories, has been fixed. In an interesting KernelTrap story, Andrew Morton describes the problem and offers a working patch. Evidently the bug has its roots in a much bigger design issue, one that is unlikely to be fixed in the current 2.4 kernel series. In any case, with Morton's patch applied, your data will not be corrupted."
QA test cases. (Score:3, Insightful)
Where can I find the QA documentation, test cases and scripts for ext3? I would like to verify that this bug, and variations thereof, will be caught before release in the future. Thanks.
They don't seem to be on the ext3 home page (linked from the story).
Open Source is useless without Open Procedures, Open Documentation and Open Quality Control.
Should be front page. (Score:4, Insightful)
I hate to say it, but maybe /. doesn't like stories that make Linux look bad.
Re:Should be front page. (Score:3, Insightful)
slashdot:
news for whiners, stuff for people who need things explained to them in very small words
Re:Should be front page. (Score:2, Insightful)
Sometime in the future, 2.4 will go down in history as one serious cluster-fuck of a kernel.
Re:Should be front page. (Score:2, Funny)
2.4.11 or 2.4.15 (aka "greased turkey") anyone?
Re:Should be front page. (Score:2)
The fact that you were modded up to +4 proves that that is untrue.
Yah. (Score:2)
Still, Linux has so many filesystems it's not funny. What are the odds of them getting in the way of the kernel in the future?
Re:Yet another proof (Score:2, Informative)
You're an idiot if you don't have backups anyhow. The most reliable filesystem in the world isn't going to save you from a hard-drive failure, user error, malicious code, theft, flood, fire, lightning strike or earthquake. These things eat data a lot more frequently than filesystem bugs do!
Expect data loss. Keep backups.
Re:Yet another proof (Score:2)
Are you saying that users should refrain from upgrading to newer releases even when those have been explicitly tagged as 'stable'? Where do you draw the line?
I do think there is some truth in the argument that you shouldn't upgrade the kernel even within a stable series. Wait for your vendor to release an updated kernel package, if they judge it necessary. And maybe don't upgrade even then.
But it is unfair in this case to criticize users for installing what they thought was a stable, tested, reliable kernel version. Ah well, mistakes happen.
vs 3.0 (Score:3, Interesting)
Re:vs 3.0 (Score:2)
I can only assume that the moderators who modded this up are similarly misinformed, which is why I chose to reply rather than moderate it "Overrated" as it deserves.
Re:vs 3.0 (Score:2)
And here we're talking about calling the next major release "3.0" while things as important as /the file system/ need to be majorly reworked.
2.4.x is the "stable" kernel. That means it's not supposed to incorporate radical changes to its infrastructure. Apparently, the maintainer thought they could pull some "safe" changes from the 2.5.x kernel research to add functionality. The team was wrong. The ideal correction would involve a radical change, so it's going to be a kludge fix instead.
The file system is getting a major rework IN the development kernel (2.5.x). 2.4 is not 3.0. 2.5 is not 3.0. 3.0 will be out when it's ready. Stop judging 3.0 (actually 2.6) based on what's going on in 2.4.
Besides, only an incompetent would use ext3 in a production machine.
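The stable/development distinction this comment leans on followed a simple numbering rule in the 2.x era: an even minor number (2.2, 2.4, 2.6) marked a stable series, an odd one (2.3, 2.5) a development series. A minimal Python sketch of that rule follows; the function name `kernel_series` is hypothetical, and the sketch deliberately handles only 2.x version strings.

```python
def kernel_series(version: str) -> str:
    """Classify a 2.x kernel version by the even/odd minor-number convention.

    Even minor numbers (2.2, 2.4, 2.6) were stable series;
    odd minor numbers (2.3, 2.5) were development series.
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major != 2:
        # This sketch only covers the 2.x numbering scheme.
        raise ValueError("this sketch only handles 2.x kernel versions")
    return "stable" if minor % 2 == 0 else "development"


for v in ["2.2.23", "2.4.20", "2.5.50"]:
    print(v, kernel_series(v))
```

Under this rule, fixes for a 2.4.x bug land in the stable tree, while the larger filesystem rework happens in 2.5.x and ships with the next even-numbered series.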
Re:vs 3.0 (Score:2)
Actually, the server is an emergency backup / mirror server, and it has been pretty unreliable for ages. I am not allowed to replace the NIC, and a kernel that does not require me to pull the power cable every time things go wrong is a big plus. Maybe there was another solution, but anything more than a day per month on that project is seen as lost time for me.
As to your other point, I hope that Linus's feature freeze does not keep fixes for problems like this out of the next stable set of kernels, whatever they are called.
Re:vs 3.0 (Score:2)
The problem with using ext3 in a production system is that it is "new". That means it's subject to "bugs". Some bugs don't get picked up until many months after the code goes into use. On a filesystem, that means you can suffer data corruption and lose files for months before you realize there is a problem (and the corruption would be handed down to your backups). Also, because ext3 is new, it won't have many diagnostic tools or other utilities.
I have heard BAD things about ReiserFS. It's a fact that it doesn't journal file data, just the metadata (the filesystem structures). In certain crashes, you can lose some data even though the system comes back up quickly. But other people swear by it, and perhaps it's better than nothing.
Myself, I use XFS. Some people will grouse endlessly about it, but I've never had a problem with it. In any case, the whole point of a journaling filesystem is quick restart of the filesystems (no fsck) AND integrity of the data. Competent sysadmins don't use flaky filesystems or new kernels on PRODUCTION machines.
Actually, the server is an emergency backup / mirror server and it has been pretty unreliable for ages.
Aiieeee... How can it be an emergency backup/mirror server if it's unreliable? Mind you, it's child's play to use the machine for both prototyping and backup merely by adding a hard drive to it and doing your prototyping work on the second drive. How the heck can they refuse to replace the NIC if it's a clunker? It's a lousy $20. You can probably cannibalize an old machine's NIC for free.
Maybe there was another solution, but anything more than a day per month on that project is seen as lost time for me.
Screwing around for a day because the company is too cheap to spend $20 on a good NIC is ridiculous as well. That's about 1 hour of your salary. I've worked for cheap companies, but that's plain stupid. So is having you mess around with kernels released only days ago.
As to you other point, I hope that Linus's feature freeze does not preclude fixes for problems like this making the next stable set of kernels.
The whole point of the feature freeze is to stop incorporating NEW features. Bug fixes are the only thing allowed into a frozen development kernel until release. It's a mistake to assume that every release of a stable kernel (2.4) is bug-free. Some shops stuck with 2.2 kernels because they didn't like the "instability" of the 2.4 kernels.
Which as-shipped distros are affected by this? (Score:1, Interesting)
Re:Which as-shipped distros are affected by this? (Score:4, Informative)
Did you mean that you run your ext3 filesystems in full-journal mode and would like to know whether you have to update? Yes, regardless of distro.
In either case, please remember that journalled mode (data=journal) is NOT the default; the default is ordered mode (data=ordered). Unless you have explicitly set your filesystem to full journalling, you aren't affected by this problem.
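One way to check whether a machine is affected is to look for ext3 entries in /etc/fstab whose mount options include data=journal, the option that selects full data journalling. The Python sketch below assumes the standard six-field fstab layout (device, mount point, type, options, dump, pass); the function name `affected_mounts` is hypothetical, and it will not notice data=journal set anywhere other than fstab, such as on a manual mount command.

```python
from typing import List


def affected_mounts(fstab_text: str) -> List[str]:
    """Return mount points of ext3 entries that use data=journal.

    Assumes the six-field /etc/fstab layout:
    device  mount_point  fs_type  options  dump  pass
    """
    affected = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry; ignore it
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Options are a comma-separated list, e.g. "defaults,data=journal".
        if fs_type == "ext3" and "data=journal" in options.split(","):
            affected.append(mount_point)
    return affected


example = """\
# device     mount  type  options                dump pass
/dev/hda1    /      ext3  defaults               1    1
/dev/hda2    /home  ext3  defaults,data=journal  1    2
/dev/hda3    /var   xfs   defaults               1    2
"""

print(affected_mounts(example))  # ['/home']
```

Entries with no data= option, or with data=ordered, are left out, which matches the point that the default ordered mode is not affected.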
HTH.
One more reason to use XFS (Score:2)
install Red Hat on ext3
configure Red Hat, especially the networking
get online and get the latest 2.4 kernel
get the XFS patch and xfsprogs; apply the patch and install xfsprogs
recompile the kernel with XFS in it and boot it
mkfs.xfs the new partition
mount it and cd into it
cp -a {bin,usr,etc,...} (every top-level directory except tmp, mnt and proc)
fix
reboot.
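The copy step amounts to running cp -a over every top-level directory except tmp, mnt and proc. A minimal Python sketch that only builds that command (it does not run it) is below; the helper name `copy_command` and the target mount point /mnt/xfs are hypothetical, and on a real system the entry list would come from something like os.listdir("/").

```python
from typing import Iterable, List

# Directories the steps above leave out of the copy.
EXCLUDED = {"tmp", "mnt", "proc"}


def copy_command(entries: Iterable[str], target: str = "/mnt/xfs") -> List[str]:
    """Build a `cp -a` argument list for every top-level entry not in EXCLUDED.

    `target` is a hypothetical mount point for the new XFS filesystem.
    """
    sources = ["/" + name for name in sorted(entries) if name not in EXCLUDED]
    return ["cp", "-a", *sources, target]


# Print the command for review instead of executing it.
print(" ".join(copy_command(["bin", "etc", "mnt", "proc", "tmp", "usr"])))
# cp -a /bin /etc /usr /mnt/xfs
```

Skipping mnt also keeps cp from trying to copy the new filesystem into itself, as long as it is mounted under /mnt as this sketch assumes.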
This still gives some obscure errors at boot, maybe because of redundant scripts, but it works very fast and is stable for me. If you get around to fixing those errors, please write a HOWTO, since no one can tolerate filesystem instability on production servers, yet everyone wants to use 2.4.
Re:One more reason to use XFS (Score:2)
Yes, you have to.
I'm aiming for RHCE, so I have to use Red Hat, and this is the only way to get a decent filesystem. Given the news of ext3 instability, that's all the more reason to take the XFS path.