2 Common Problems

This section lists some of the common problems that people have. If nothing here answers your questions, you should also consult the sections for your host adapter and for the devices that are giving you problems.

2.1 General Flakiness

If you experience random errors, the most likely causes are cabling and termination problems.

Some products, such as those built around the newer NCR chips, feature digital filtering and active signal negation, and aren't very sensitive to cabling problems.

Others, such as the Adaptec 154xC, 154xCF, and 274x, are extremely sensitive and may fail with cables that work with other systems.

I reiterate: some host adapters are extremely sensitive to cabling and termination problems, so cabling and termination should be the first things you check when there are problems.

To minimize your problems, you should use cables which

  1. Claim SCSI-II compliance
  2. Have a characteristic impedance of 132 ohms
  3. All come from the same source to avoid impedance mismatches
  4. Come from a reputable vendor such as Amphenol

Termination power should be provided by all devices on the SCSI bus, through a diode to prevent current backflow, so that sufficient power is available at the ends of the cable where it is needed. To prevent damage if the bus is shorted, TERMPWR should be driven through a fuse or other current limiting device.

If multiple devices, external cables, or FAST SCSI 2 are used, active or forced perfect termination should be used on both ends of the SCSI bus.

See the comp.periphs.scsi FAQ (ftp://tsx-11.mit.edu/pub/linux/ALPHA/scsi) for more information about active termination.

2.2 The kernel command line

Other parts of the documentation refer to a ``kernel command line''.

The kernel command line is a set of options you may specify from either the LILO: prompt after an image name, or in the append field in your LILO configuration file (LILO .14 and newer use /etc/lilo.conf, older versions use /etc/lilo/config).

Boot your system with LILO, and hit one of the alt, control, or shift keys when it first comes up to get a prompt. LILO should respond with

boot:

At this prompt, you can select a kernel image to boot, or list the available images with ``?''. For example:

boot: ?

ramdisk floppy harddisk

To boot one of these kernels with the command line options you have selected, enter its name followed by a whitespace-delimited list of options, and finish with a return.

Options take the form of

variable=valuelist

where valuelist may be a single value or a comma-delimited list of values with no whitespace. With the exception of the root device, individual values are numbers, and may be specified in either decimal or hexadecimal.

For example, to boot Linux with an Adaptec 1520 clone that is not recognized at bootup, you might type

boot: floppy aha152x=0x340,11,7,1

If you don't care to type all of this at boot time, it is also possible to use the LILO configuration file ``append'' option with LILO .13 and newer.

For example:

append="aha152x=0x340,11,7,1"
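The option syntax described above can be illustrated with a small shell sketch. It is only a way to check by hand what a given option contains, not how the kernel itself parses its command line. The function name parse_option is hypothetical, and the sketch handles numeric values only, so it does not cover the root device exception.

```shell
# Hypothetical helper: split one "variable=valuelist" option and print
# the variable name followed by each value converted to decimal.
# The body runs in a subshell so the IFS change does not leak out.
parse_option() (
    name=${1%%=*}        # text before the first "="
    list=${1#*=}         # text after the first "="
    out=$name
    IFS=,                # split the value list on commas
    for v in $list; do
        # Arithmetic expansion accepts decimal and 0x-prefixed hex.
        out="$out $(($v))"
    done
    printf '%s\n' "$out"
)

parse_option "aha152x=0x340,11,7,1"
```

For the Adaptec example this prints the name followed by 832 11 7 1, since 0x340 is 832 in decimal.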

2.3 A SCSI device shows up at all possible IDs

If this is the case, you have strapped the device at the same address as the controller (typically 7, although some boards use other addresses, with 6 being used by some Future Domain boards).

Please change the jumper settings.

2.4 A SCSI device shows up at all possible LUNs

The device has buggy firmware.

As an interim solution, you should try using the kernel command line option

max_scsi_luns=1

If that works, there is a list of buggy devices in the kernel sources in drivers/scsi/scsi.c in the variable blacklist. Add your device to this list and mail the patch to Linus.

2.5 You get sense errors when you know the devices are error free

Sometimes this is caused by bad cables or improper termination.

See section 2.1, General Flakiness.

2.6 A kernel configured with networking does not work.

The auto-probe routines for many of the network drivers are not passive, and will interfere with the operation of some of the SCSI drivers.

2.7 Device detected, but unable to access.

A SCSI device is detected by the kernel, but you are unable to access it: for example, mkfs /dev/sdc or tar xvf /dev/rst2 fails.

You don't have a special file in /dev for the device.

Unix devices are identified by a type, either block or character (block devices go through the buffer cache, character devices do not), a major number (which driver is used; block major 8 corresponds to SCSI disks), and a minor number (which unit is being accessed through a given driver; for example, character major 4, minor 0 is the first virtual console, minor 1 the next, and so on). However, accessing devices through this separate namespace would break the Unix/Linux metaphor of ``everything is a file,'' so character and block device special files are created under /dev. This lets you access the raw third SCSI disk device as /dev/sdc, the first serial port as /dev/ttyS0, etc.
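Before creating anything, it can help to confirm whether the special file is really missing or is simply of the wrong type. The following is a minimal sketch using the standard shell file tests; the function name dev_type is hypothetical.

```shell
# Hypothetical helper: report what kind of entry a path under /dev is.
dev_type() {
    if [ -b "$1" ]; then
        echo block            # block special file, e.g. a disk
    elif [ -c "$1" ]; then
        echo character        # character special file, e.g. a tape
    elif [ -e "$1" ]; then
        echo "not a device"   # exists, but is an ordinary file or directory
    else
        echo missing          # no entry at all
    fi
}

dev_type /dev/sdc
dev_type /dev/rst0
```

If the result is ``missing'' or ``not a device'', the entry needs to be created as described in the rest of this section.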

The preferred method for creating a device file is the MAKEDEV script. Change to the /dev directory

cd /dev

and run MAKEDEV (as root) for the devices you want to create, for example

./MAKEDEV sdc

Wildcards ``should'' work, for example

./MAKEDEV sd\*

``should'' create entries for all SCSI disk devices (doing this should create /dev/sda through /dev/sdp, with fifteen partition entries for each)

./MAKEDEV sdc\*

``should'' create entries for /dev/sdc and all fifteen permissible partitions on /dev/sdc, etc.

I say ``should'' because this is the standard unix behavior - the MAKEDEV script in your installation may not conform to this behavior, or may have restricted the number of devices it will create.

If MAKEDEV won't do the right magic for you, you'll have to create the device entries by hand with the mknod command.

The block/character type, major, and minor numbers are specified for the various SCSI devices in Subsection 3: Device Files in the appropriate section.

Take those numbers, and use (as root)

mknod /dev/device b|c major minor

For example:

mknod /dev/sdc b 8 32
mknod /dev/rst0 c 9 0
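The minor number in the disk example follows a pattern: each SCSI disk gets sixteen consecutive minors (the whole disk plus its fifteen partitions), so /dev/sdc, the third disk, starts at 2 * 16 = 32. The sketch below assumes this layout and block major 8, and uses a hypothetical helper named sd_minor; check the Device Files subsection for your device before relying on it. It only prints the mknod commands and does not run them.

```shell
# Hypothetical helper: compute the minor number for a SCSI disk entry.
# $1 is the drive letter (a through p), $2 the partition number
# (0 or omitted means the whole disk).
sd_minor() {
    letters=abcdefghijklmnop
    part=${2:-0}
    # Strip the drive letter and everything after it; the length of
    # what remains is the zero-based disk index.
    prefix=${letters%%"$1"*}
    if [ -z "$1" ] || [ "$prefix" = "$letters" ]; then
        echo "unknown drive letter: $1" >&2
        return 1
    fi
    echo $(( ${#prefix} * 16 + $part ))
}

# Print (do not run) the commands for the whole disk and partition 1.
echo "mknod /dev/sdc b 8 $(sd_minor c)"
echo "mknod /dev/sdc1 b 8 $(sd_minor c 1)"
```

The first line printed matches the /dev/sdc example above.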

2.8 SCSI System Lockups

This could be one of a number of things. Also see the section for your specific host adapter for possible further solutions.

There are cases where the lockups seem to occur when multiple devices are in use at the same time. In this case, you can try contacting the manufacturer of the devices to see if firmware upgrades are available which would correct the problem. If possible, try a different SCSI cable, or try the devices on another system. Lockups can also be caused by bad blocks on disks, or by bad handling of DMA by the motherboard (for host adapters that do DMA). There are probably many other conditions that could lead to this type of event.

If the lockups occur when multiple devices are in use on the bus at the same time, and your host adapter driver supports more than one outstanding command on the bus at a time, try reducing this to 1 and see if it helps. If you have tape drives or slow CD-ROM drives on the bus, this might not be a practical solution.

2.9 Configuring and building the kernel

Unused SCSI drivers eat up valuable memory, aggravating memory shortage problems on small systems because kernel memory cannot be paged out.

So, you will want to build a kernel tuned for your system, with only the drivers you need installed.

cd to /usr/src/linux

If you are using a root device other than the current one, or something other than 80x25 VGA, and you are writing a boot floppy, you should edit the Makefile and make sure the

ROOT_DEV =

and

SVGA_MODE =

lines are the way you want them.

If you've installed any patches, you may wish to guarantee that all files are rebuilt. If this is the case, you should type

make mrproper

Regardless of whether you ran make mrproper, type

make config

and answer the configuration questions. Then run

make depend

and finally

make

Once the build completes, you may wish to update the LILO configuration, or write a boot floppy. A boot floppy may be made by running

make zdisk
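The steps above can be summarized in order with a small sketch. It only prints the commands and executes nothing, since make config is interactive and the build should be run by hand. The function name kernel_build_steps and its ``clean'' and ``floppy'' arguments are hypothetical.

```shell
# Hypothetical helper: print the build steps in the order given above.
# Pass "clean" as the first argument if patches were installed and a
# full rebuild is wanted; pass "floppy" as the second argument to add
# the boot floppy step.
kernel_build_steps() {
    echo "cd /usr/src/linux"
    if [ "${1:-}" = "clean" ]; then
        echo "make mrproper"
    fi
    echo "make config"
    echo "make depend"
    echo "make"
    if [ "${2:-}" = "floppy" ]; then
        echo "make zdisk"
    fi
}

kernel_build_steps clean floppy
```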

2.10 LUNs other than 0 don't work

This is often a problem with SCSI to MFM, RLL, ESDI, SMD, and similar bridge boards.

At some point, we came to the conclusion that many SCSI-I devices were extremely broken, and added the following code


/* Some scsi-1 peripherals do not handle lun != 0.
   I am assuming that scsi-2 peripherals do better */
if((scsi_result[2] & 0x07) == 1 && 
   (scsi_result[3] & 0x0f) == 0) break;

to scan_scsis() in drivers/scsi/scsi.c. If you delete this code, your old devices should be detected correctly, provided you have not used the max_scsi_luns kernel command line option or the NO_MULTI_LUN compile time define.
