- Path: sparky!uunet!dtix!darwin.sura.net!ra!tantalus!eric
- From: eric@tantalus.dell.com (Eric Youngdale)
- Newsgroups: comp.os.linux
- Subject: Re: .97pl5 won't boot (it's a SCSI problem)
- Message-ID: <3612@ra.nrl.navy.mil>
- Date: 15 Sep 92 15:53:50 GMT
- References: <1992Sep14.134849.2716@odin.diku.dk> <1992Sep15.134447.12185@odin.diku.dk>
- Sender: usenet@ra.nrl.navy.mil
- Organization: Naval Research Laboratory
- Lines: 83
-
- In article <1992Sep15.134447.12185@odin.diku.dk> dingbat@diku.dk (Niels Skov Olsen) writes:
- >I wrote:
- >
- >>My .97pl5 won't boot. My configuration:
- >
- >> i386dx/33mhz
- >> 8mb ram
- >> adaptec 1542b with 3 disks on the scsi bus
- >[ ... ]
- >
- >Now I have found out that the problem is in sd_init. The system
- >is trying to do READ_CAPACITY on sd2, and is waiting for the_result
- >to become non-negative:
- >
- >excerpt from kernel/blkdrv/scsi/sd.c:
- >
- > do {
- > the_result = -1;
- >#ifdef DEBUG
- > printk("sd%d : READ CAPACITY\n ", i);
- >#endif
- > scsi_do_cmd (rscsi_disks[i].device->host_no ,
- > rscsi_disks[i].device->id,
- > (void *) cmd, (void *) buffer,
- > 512, sd_init_done, SD_TIMEOUT, sense_buffer,
- > MAX_RETRIES);
- >
- > while(the_result < 0);
- > ^^^^^^^^^^^^^^^^^^^^^^ it never comes out of this loop...
- >
- > } while (try_again && the_result);
- >
- >I don't understand the details of this, because this code has not
- >changed since .97pl4 where it works!
- >
- >What's going on?
-
- Basically what is happening is that the code is waiting for an
- interrupt to change the value of the_result. One of the parameters to
- scsi_do_cmd is the address of the routine sd_init_done, and that routine
- is the one that stores the command's result in the_result.
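
To make the wait loop quoted above concrete, here is a minimal user-space sketch in C of the same pattern: queue a command with a completion callback, then spin until the callback publishes a result. It is not the kernel code. The names do_cmd, init_done, fake_interrupt and wait_for_command are hypothetical stand-ins, the "interrupt" is just a function polled from the loop, and the spin limit exists only so that the sketch terminates when the completion is lost (the real loop has no such limit, which is why the boot hangs).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real kernel symbols. */
typedef void (*done_fn)(int result);

static volatile int the_result;   /* written by the completion callback */
static done_fn pending_done;      /* callback of the command in flight */
static int pending_status;        /* status the fake device will report */

/* Plays the role of sd_init_done: publish the result so the waiter can leave. */
static void init_done(int result)
{
    the_result = result;
}

/* Plays the role of scsi_do_cmd: queue the command and return at once. */
static void do_cmd(int status, done_fn done)
{
    assert(done != NULL);
    pending_status = status;
    pending_done = done;
}

/* Plays the role of the interrupt path. With deliver == 0 the completion
 * is lost, which models the mid-level driver never calling the callback. */
static void fake_interrupt(int deliver)
{
    if (deliver && pending_done != NULL) {
        done_fn done = pending_done;
        pending_done = NULL;
        done(pending_status);
    }
}

/* Returns the command result, or -1 if the callback never ran within
 * max_spins polls. */
int wait_for_command(int status, int deliver, long max_spins)
{
    long spins;

    the_result = -1;
    do_cmd(status, init_done);
    for (spins = 0; the_result < 0 && spins < max_spins; spins++)
        fake_interrupt(deliver);
    return the_result;
}
```

With deliver set to 1 the loop exits as soon as the callback runs. With deliver set to 0 the result stays at -1, which is the state the reported hang is stuck in.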
-
- The interrupts are handled by several routines. Each different
- host adapter has a low-level routine that does any host-specific things,
- and this in turn calls the mid-level routine, which is scsi_done in scsi.c.
- This routine has the job of interpreting the status codes that come back from
- the device, and deciding what to do. Possibilities are:
-
- * I/O completed normally.
- * Some type of fatal error condition.
- * Some type of error for which we can simply retry the I/O request.
- * Some type of error for which we do not know enough. In this
- case, the routine issues the REQUEST_SENSE command, to ask
- for further information, and when the REQUEST_SENSE command
- completes, it calls scsi_done a second time, and then decides what
- to do based upon the data returned from the REQUEST_SENSE command.
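
The four possibilities listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum values, struct request and mid_level_done are hypothetical names, and the real scsi_done works from the actual status bytes returned by the host adapter rather than from a pre-classified outcome. The sketch also assumes that an outcome still unknown after REQUEST_SENSE is treated as a failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what came back from the device. */
enum outcome { OUT_OK, OUT_FATAL, OUT_RETRYABLE, OUT_UNKNOWN };

/* What the mid-level routine decides to do next. */
enum action { ACT_COMPLETE, ACT_FAIL, ACT_RETRY, ACT_REQUEST_SENSE };

struct request {
    int retries_left;     /* remaining retry budget for this request */
    int sense_requested;  /* nonzero once REQUEST_SENSE has been issued */
};

/* Called once per completion. After ACT_REQUEST_SENSE the caller is
 * expected to call it a second time with the outcome derived from the
 * sense data. */
enum action mid_level_done(struct request *req, enum outcome out)
{
    assert(req != NULL);

    switch (out) {
    case OUT_OK:
        return ACT_COMPLETE;
    case OUT_FATAL:
        return ACT_FAIL;
    case OUT_RETRYABLE:
        if (req->retries_left > 0) {
            req->retries_left--;
            return ACT_RETRY;
        }
        return ACT_FAIL;            /* retry budget exhausted */
    case OUT_UNKNOWN:
        if (!req->sense_requested) {
            req->sense_requested = 1;
            return ACT_REQUEST_SENSE;
        }
        return ACT_FAIL;            /* assumption: sense data did not help */
    }
    return ACT_FAIL;
}
```

Only ACT_COMPLETE and ACT_FAIL end the request; the other two actions mean another completion is still to come.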
-
- For each request, there should always come a point where it is
- considered "complete", and at this time the top-level completion function is
- called. In this case the routine sd_init_done is that function.
-
- Basically there were a number of bugs in the SCSI code in the releases
- before 0.97pl5, which resulted in the following:
-
- * The adaptec was being improperly asked for the sense information.
- * The wrong buffer was examined to interpret the sense information.
- * The top-level completion function was being called before the
- request was actually complete, and thus it was inadvertently called
- twice. This could lead to the queueing of another command to
- the scsi device before the first one is really complete.
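
The invariant behind the last item is that the top-level completion function runs exactly once per request. A minimal sketch of a guard that enforces this is shown below. It is not the change that went into 0.97pl5; struct request, top_level_done and finish_request are hypothetical names used only to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

struct request {
    int completed;  /* nonzero once the top-level function has run */
    int calls;      /* how many times the top-level function ran */
};

/* Stand-in for the top-level completion function (e.g. sd_init_done). */
static void top_level_done(struct request *req)
{
    req->calls++;
}

/* Runs the top-level function the first time only. Returns 1 if it ran,
 * 0 if the request had already been completed. */
int finish_request(struct request *req)
{
    assert(req != NULL);

    if (req->completed)
        return 0;
    req->completed = 1;
    top_level_done(req);
    return 1;
}
```

A second call to finish_request on the same request is ignored, so no new command can be queued on the strength of a duplicate completion.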
-
- I thought that I had fixed all of these in 0.97pl5, but there is
- apparently a case where the mid-level driver is not calling the top-level
- function. Offhand I do not know why this is the case, but I would suggest that
- anyone who sees this try compiling scsi.c with DEBUG defined, and looking
- at the resulting output to see why this I/O is not calling the top-level
- function. If I get a good report from someone, I should be able to come up
- with a patch in fairly short order.
-
- -Eric
- --
- Eric Youngdale
- eric@tantalus.nrl.navy.mil
-