- Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!torvalds
- From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
- Newsgroups: comp.os.linux
- Subject: Re: GP faults and other trivia. . .
- Message-ID: <1992Sep9.094232.17759@klaava.Helsinki.FI>
- Date: 9 Sep 92 09:42:32 GMT
- References: <1992Sep8.104048.1@ualr.edu>
- Organization: University of Helsinki
- Lines: 87
-
- In article <1992Sep8.104048.1@ualr.edu> nmspillers@ualr.edu writes:
- >
- >Anyhoo, the crux of my question is this--I'm working with Linux 0.96c
- >(no patches yet, I want to solve this problem first) and trying to compile
- >the kernel. I usually get a general protection fault somewhere in the kernel
- >compile, this leads to a 4-11 meg (no lie!) core and the compile degenerates
- >into signal 11 and 6 compiler errors. [ deleted ]
-
- There has been some talk about these kinds of things lately, so I might
- as well answer this..
-
- If you see intermittent system failures (core dumping in gcc etc) that
- are not easily repeatable, and don't necessarily happen at the same
- location every time, there are a couple of possible reasons:
-
- (a) kernel bugs
- (b) weird/buggy hardware
- (c) installation problems
-
- (a) is certainly possible, although the fact that the same thing works
- on a lot of other machines/setups does make me wonder about some
- reports. (c) on the other hand usually results in easily repeatable
- problems: they occur at the same point each time.
-
- The above (mostly deleted) description does sound like a memory problem:
- I should probably enable the NMI just to get a warning about it, but I
- think current versions of linux disable it at bootup (I think I disable
- it as soon as possible as the system cannot handle it during setup, and
- after the system is up and running and an NMI could be handled, I never
- re-enable it. I haven't looked into it in a long time, so I could be
- wrong.)
-
- Alternate reasons are disk read errors, although the drivers do check
- error conditions, and you should see kernel messages if they occur. And
- if you are wondering "why only gcc?", the reason is probably that gcc is
- the one program that usually eats up most of your memory, and actively
- shuffles things around. So if you have a bad memory chip or the linux
- disk driver has some problems, they usually show up in gcc - that's when
- all the buffers are in use, and most of your memory is being exercised
- a lot.
-
- Note that memory problems are more likely to show up under linux-0.97
- and newer: not because they are more fragile, but simply because they
- use memory much more dynamically, and are more likely to take full
- advantage of the memory you have got.
-
- So the first thing to check when seeing problems like the above is
- whether it's hardware-related: one good way to do this is usually to slow
- down the machine to 8MHz or whatever, and see if the errors go away. If
- they do, it's probably not a software bug (although races etc can be
- timing-dependent: not very frequent). Another thing you can do is try
- out some system testing software: but note that linux usually is a
- better system tester than most of these, especially if they run under
- 16-bit DOS and don't check 32-bit accesses to high memory etc.
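
As an illustration of what checking 32-bit accesses involves, here is a
minimal sketch of a user-space pattern test. It is not taken from the
post or from any particular testing package: the names pattern_word,
fill_pattern, verify_pattern and check_region are hypothetical, and the
pattern generator is an arbitrary choice. It writes a reproducible
pattern with full 32-bit stores, reads it back, and counts mismatches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: a deterministic pseudo-random word for index i,
 * so every pass writes a different, reproducible pattern. */
static uint32_t pattern_word(uint32_t seed, size_t i)
{
    uint32_t x = seed ^ (uint32_t)(i * 2654435761u);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Write the pattern using full 32-bit stores. */
void fill_pattern(volatile uint32_t *buf, size_t n, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = pattern_word(seed, i);
}

/* Read the words back and count those that differ from what was written. */
size_t verify_pattern(const volatile uint32_t *buf, size_t n, uint32_t seed)
{
    size_t bad = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        if (buf[i] != pattern_word(seed, i))
            bad++;
    return bad;
}

/* Allocate `bytes` of memory and run `passes` fill/verify rounds.
 * Returns the total number of mismatched words, or -1 if the
 * allocation fails or the size is smaller than one word. */
long check_region(size_t bytes, int passes)
{
    size_t n = bytes / sizeof(uint32_t);
    long bad = 0;
    uint32_t *buf;

    if (n == 0)
        return -1;
    buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (int p = 0; p < passes; p++) {
        fill_pattern(buf, n, (uint32_t)p + 1u);
        bad += (long)verify_pattern(buf, n, (uint32_t)p + 1u);
    }
    free(buf);
    return bad;
}
```

A caller would pass a size close to the amount of free memory and a
handful of passes, and treat any non-zero result as a reason to suspect
the hardware. A process like this cannot choose which physical memory it
gets, and the CPU cache can hide a bad chip, so a clean run proves
little; that is one reason a real workload such as a kernel compile
tends to be the harsher test.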
-
- ----- change of subject
-
- Reading the above, your reaction might be "he's obviously trying to
- blame the hardware to get out of a tight spot". True. But it's also a
- case of standard bug-reporting of an operating system: with most other
- programs you can usually safely blame the program. While any bug report
- is preferable to none, there are things you can do to help me find the
- bug. So I might as well use this post to mention some of them, true of
- most bug-reports:
-
- - mention all the necessary information. Too much data can be
- confusing (and boring), and too little can lead to other problems, so
- this isn't easy. Use your own judgement, but THINK about it a bit
- beforehand.
-
- - try to make it repeatable, and find the minimal example. This is
- also almost always difficult or impossible, as the obvious bugs are
- certainly fixed, but it helps /a lot/ if you can simplify the bug-report
- a bit.
-
- - if you cannot make it repeatable or simple, assume it's a hardware
- problem, and start from that. Try different setups. If possible,
- different machines, but if not, try to change your setup as much as
- possible, and see if anything changes.
-
- - If the problem changes or disappears due to hardware changes, it
- might still be a software bug, so you might still want to send it in as
- such. But add your test-results to the report. And if the problem
- seems to be hardware-independent, mentioning in your report that you
- tested this is likely to get your report a higher priority.
-
- Linus
-