NetNews Usenet Archive 1992 #18


Internet Message Format | 1992-08-13 | 3.1 KB

Path: sparky!uunet!olivea!sgigate!odin!mips!sdd.hp.com!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!yale!mintaka.lcs.mit.edu!ai-lab!life.ai.mit.edu!burley
From: burley@geech.gnu.ai.mit.edu (Craig Burley)
Newsgroups: comp.os.linux
Subject: Re: Buffer corruption problems.
Message-ID: <BURLEY.92Aug13153840@geech.gnu.ai.mit.edu>
Date: 13 Aug 92 19:38:40 GMT
References: <16078@ucdavis.ucdavis.edu> <1992Aug13.163854.21617@midway.uchicago.edu>
Sender: news@ai.mit.edu
Followup-To: comp.os.linux
Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
Lines: 50
In-reply-to: ace3@quads.uchicago.edu's message of 13 Aug 92 16:38:54 GMT

In article <1992Aug13.163854.21617@midway.uchicago.edu> ace3@quads.uchicago.edu (Tony 'LLama' Acero) writes:

   I have no idea what's going on and would appreciate any input! :-)
   (The smiley is to indicate I'm not complaining and half-expecting
   that I've done something bone-headed)

I'm not sure about your problem or the person's to whose post you followed up, but...
...I believe there is a bug in Linux that has the following behavior:

- causes Linux to "misread" one 1024-byte chunk of data from a disk-based file so that what your app ends up with is some _other_ 1024-byte chunk (apparently from the same file)
- occurs only during very heavy disk access, such as megabytes accessed continually
- is intermittent, but happens often enough to reproduce fairly easily
- might be SCSI-related (I have a SCSI system) but, based on responses I've gotten from others saying they've seen the same behavior, probably isn't
- is still in 0.97 and perhaps happens somewhat more often there (though of course it's hard to measure this)

I keep putting off exploring this bug myself for various reasons, such as: I'm not a Linux-kernel hacker yet; I keep hoping someone else will fix it first; it's hard to debug when your own dev system _has_ the intermittent failure; I'm too busy with GNU Fortran; I'm waiting for the newer SCSI code with 4K blocks to see how that affects the bug (since that might be an important clue); I'm waiting until I have extfs &c up and running reliably so I can make use of my currently wasted disk space before tackling this bug; I'm just plain lazy; I'd rather play tennis with my wife; etc.

I'm convinced that when I finally decide to tackle the bug (rather than just write a shell script and program to create a test-case to demonstrate it, as I've done so far), I'll blow 72 hours on it and _then_ find out someone else found and fixed it!

(Unfortunately, nobody responded to my posted test case saying they'd reproduced the problem, much less had the know-how and desire to look into it themselves. If anyone wants, I could repost it to the mailing list as I did last time. Email me, don't post here, to keep traffic low, if you want me to post the test case.)

Of course it could be a hardware bug in my system, but seeing as others _seem_ to have the same problem on wildly different hardware, I doubt it.
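
For readers who want to check for this kind of misread themselves, here is a minimal sketch of one way to do it. It is not the test case mentioned in the post, which is not reproduced here. The idea is to stamp every block of a file with its own block number, read the file back, and report any block that comes back carrying a different number. The 1024-byte block size, the file name, and the function names (make_block, write_stamped_file, find_misread_blocks) are all assumptions made for illustration.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 1024  # assumption: one block is 1024 bytes


def make_block(index, block_size=BLOCK_SIZE):
    """Build a block in which every 4-byte word holds the block's own index."""
    return struct.pack("<I", index) * (block_size // 4)


def write_stamped_file(path, num_blocks, block_size=BLOCK_SIZE):
    """Write num_blocks self-identifying blocks and push them to disk."""
    with open(path, "wb") as f:
        for i in range(num_blocks):
            f.write(make_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def find_misread_blocks(path, num_blocks, block_size=BLOCK_SIZE):
    """Read the file back; return (expected_index, found_index) per bad block.

    found_index is taken from the first word of the block that was actually
    read, so a swapped-in block from elsewhere in the file identifies itself.
    """
    bad = []
    with open(path, "rb") as f:
        for i in range(num_blocks):
            data = f.read(block_size)
            if data != make_block(i, block_size):
                found = struct.unpack("<I", data[:4])[0] if len(data) >= 4 else None
                bad.append((i, found))
    return bad


if __name__ == "__main__":
    # Hypothetical driver: a few megabytes, re-read several times.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stamped.dat")
        write_stamped_file(path, 4096)
        for pass_no in range(10):
            print(pass_no, find_misread_blocks(path, 4096))
```

A single pass over a small file is unlikely to show anything, since the behavior described above appears only under heavy, sustained disk access. A file larger than available buffer memory, re-read many times and possibly alongside other disk-heavy jobs, is more likely to hit it.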
-- 
James Craig Burley, Software Craftsperson    burley@gnu.ai.mit.edu
Member of the League for Programming Freedom (LPF)