
2. Common Problems

This section lists some of the common problems that people have. If nothing here answers your questions, you should also consult the sections for your host adapter and for the devices that are giving you problems.

2.1 General Flakiness

If you experience random errors, the most likely causes are cabling and termination problems.

Some products, such as those built around the newer NCR chips, feature digital filtering and active signal negation, and aren't very sensitive to cabling problems.

Others, such as the Adaptec 154xC, 154xCF, and 274x, are _extremely_ sensitive and may fail with cables that work with other systems.

I reiterate: some host adapters are _extremely_ sensitive to cabling and termination problems, so cabling and termination should be the first things checked when there are problems.

To minimize your problems, you should use cables which:

  1. Claim SCSI-II compliance
  2. Have a characteristic impedance of 132 ohms
  3. All come from the same source to avoid impedance mismatches
  4. Come from a reputable vendor such as Amphenol

Termination power should be provided by _all_ devices on the SCSI bus, through a diode to prevent current backflow, so that sufficient power is available at the ends of the cable where it is needed. To prevent damage if the bus is shorted, TERMPWR should be driven through a fuse or other current limiting device.

If multiple devices, external cables, or FAST SCSI 2 are used, active or forced perfect termination should be used on both ends of the SCSI bus.

See the comp.periphs.scsi FAQ (available on tsx-11 in pub/linux/ALPHA/scsi) for more information about active termination.

2.2 The kernel command line

Other parts of the documentation refer to a "kernel command line".

The kernel command line is a set of options you may specify from either the LILO : prompt after an image name, or in the append field in your LILO configuration file (LILO .14 and newer use /etc/lilo.conf, older versions use /etc/lilo/config).

Boot your system with LILO, and hit one of the alt, control, or shift keys when it first comes up to get a prompt. LILO should respond with

:

At this prompt, you can select a kernel image to boot, or list the available images with ?. For example:

:?

ramdisk floppy harddisk

To boot a kernel with the command line options you have selected, enter the image name followed by a whitespace-delimited list of options, and finish with a return.

Options take the form of

variable=valuelist

Where valuelist may be a single value or comma delimited list of values with no whitespace. With the exception of root device, individual values are numbers, and may be specified in either decimal or hexadecimal.

For example, to boot Linux with an Adaptec 1520 clone not recognized at bootup, you might type

:floppy aha152x=0x340,11,7,1

If you don't care to type all of this at boot time, you can instead use the "append" option in the LILO configuration file with LILO .13 and newer. For example:

append="aha152x=0x340,11,7,1"
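The option syntax described above can be sketched in a few lines of Python. This is only an illustration of the variable=valuelist format, not the parser the kernel or LILO actually uses; the function names parse_option and parse_command_line are invented for this example.

```python
def parse_option(option):
    """Split 'variable=valuelist' into (variable, [values])."""
    name, sep, valuelist = option.partition("=")
    if not sep or not name:
        raise ValueError("expected variable=valuelist, got %r" % option)
    values = []
    for item in valuelist.split(","):
        try:
            # Base 0 accepts decimal ("11") and 0x-prefixed hexadecimal ("0x340").
            values.append(int(item, 0))
        except ValueError:
            # Non-numeric values, such as a root device name, stay as text.
            values.append(item)
    return name, values


def parse_command_line(line):
    """Split a LILO entry into the image name and its parsed options."""
    image, *options = line.split()
    return image, [parse_option(opt) for opt in options]


print(parse_command_line("floppy aha152x=0x340,11,7,1"))
# ('floppy', [('aha152x', [832, 11, 7, 1])])
```

Note that 0x340 and 832 are the same value, so either spelling could be given for the first number.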

2.3 A SCSI device shows up at all possible IDs

If this is the case, you have strapped the device at the same address as the controller (typically 7, although some boards use other addresses, with 6 being used by some Future Domain boards).

Please change the jumper settings.

2.4 A SCSI device shows up at all possible LUNs

The device has buggy firmware.

As an interim solution, you should try using the kernel command line option

max_scsi_luns=1

If that works, there is a list of buggy devices in the kernel sources in drivers/scsi/scsi.c in the variable blacklist. Add your device to this list and mail the patch to Linus Torvalds <Linus.Torvalds@cs.Helsinki.FI>.

2.5 You get sense errors when you know the devices are error free

Sometimes this is caused by bad cables or improper termination.

See section 2.1, General Flakiness.

2.6 A kernel configured with networking does not work

The auto-probe routines for many of the network drivers are not passive, and can interfere with the operation of some of the SCSI drivers.

2.7 Device detected, but unable to access

A SCSI device is detected by the kernel, but you are unable to access it; for example, mkfs /dev/sdc or tar xvf /dev/rst2 fails.

You don't have a special file in /dev for the device.

Unix identifies each device by three things: its type, block or character (block devices go through the buffer cache, character devices do not); a major number, which selects the driver (block major 8 corresponds to SCSI disks); and a minor number, which selects the unit accessed through that driver (character major 4, minor 0 is the first virtual console, minor 1 the next, and so on). Because addressing devices through this separate namespace would break the Unix/Linux metaphor of "everything is a file," character and block device special files are created under /dev. This lets you access the raw third SCSI disk device as /dev/sdc, the first serial port as /dev/ttyS0, etc.
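To see the type, major number, and minor number behind an existing special file, something like the following Python sketch may help. It relies only on the standard os and stat modules; describe_device is a name invented for this illustration, and /dev/null is used purely because it exists on practically every system.

```python
import os
import stat


def describe_device(path):
    """Return (type, major, minor) for a device special file."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "b"      # block device
    elif stat.S_ISCHR(st.st_mode):
        kind = "c"      # character device
    else:
        raise ValueError("%s is not a device special file" % path)
    # st_rdev packs the major and minor numbers together.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


print(describe_device("/dev/null"))
```

Running it against /dev/sdc on a system where that file exists should report a block device with major 8.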

The preferred method for creating a device file is the MAKEDEV script. cd to /dev and run MAKEDEV (as root) for the devices you want to create, e.g.

 ./MAKEDEV sdc

Wildcards "should" work, e.g.

 ./MAKEDEV sd\*

"should" create entries for all SCSI disk devices (doing this should create /dev/sda through /dev/sdp, with fifteen partition entries for each)

 ./MAKEDEV sdc\*

"should" create entries for /dev/sdc and all fifteen permissible partitions on /dev/sdc, etc.

I say "should" because this is the standard unix behavior - the MAKEDEV script in your installation may not conform to this behavior, or may have restricted the number of devices it will create.

If MAKEDEV won't do the right magic for you, you'll have to create the device entries by hand with the mknod command.

The block/character type, major, and minor numbers for the various SCSI devices are listed under Device Files in the section for each type of device.

Take those numbers, and use (as root)

mknod /dev/device b|c major minor

For example:

mknod /dev/sdc b 8 32
mknod /dev/rst0 c 9 0
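The minor number for a SCSI disk entry can be worked out from its name. The sketch below assumes the layout implied above: sixteen minors per disk (the whole disk plus fifteen partitions), disks sda through sdp, and block major 8. The helper names sd_minor and mknod_command are invented for this example, and it only prints the command rather than running it; check the Device Files information for your kernel before relying on the numbers.

```python
import re
import string

SD_MAJOR = 8            # block major for SCSI disks, as stated above
MINORS_PER_DISK = 16    # assumed: whole disk plus fifteen partitions


def sd_minor(name):
    """Minor number for names like 'sdc' (whole disk) or 'sdc3' (partition)."""
    match = re.fullmatch(r"sd([a-p])([1-9]\d?)?", name)
    if match is None:
        raise ValueError("not a SCSI disk name: %r" % name)
    disk = string.ascii_lowercase.index(match.group(1))   # sda -> 0, sdb -> 1, ...
    partition = int(match.group(2) or 0)                  # 0 means the whole disk
    if partition > 15:
        raise ValueError("only partitions 1-15 exist: %r" % name)
    return disk * MINORS_PER_DISK + partition


def mknod_command(name):
    """Build the mknod invocation for a SCSI disk device name."""
    return "mknod /dev/%s b %d %d" % (name, SD_MAJOR, sd_minor(name))


print(mknod_command("sdc"))    # mknod /dev/sdc b 8 32
print(mknod_command("sdc1"))   # mknod /dev/sdc1 b 8 33
```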

2.8 SCSI System Lockups

This could be one of a number of things. Also see the section for your specific host adapter for possible further solutions.

There are cases where the lockups seem to occur when multiple devices are in use at the same time. In this case, contact the manufacturer of the devices to see whether firmware upgrades are available that correct the problem. If possible, try a different SCSI cable, or try the devices on another system. Lockups can also be caused by bad blocks on disks, or by bad handling of DMA by the motherboard (for host adapters that do DMA). There are probably many other conditions that could lead to this type of event.

If the lockups occur when multiple devices are in use on the bus at the same time, and your host adapter driver supports more than one outstanding command on the bus at a time, try reducing this to 1 and see if it helps. If you have tape drives or slow CD-ROM drives on the bus, this might not be a practical solution.

2.9 Configuring and building the kernel

Unused SCSI drivers eat up valuable memory, aggravating memory shortage problems on small systems because kernel memory is unpageable.

So, you will want to build a kernel tuned for your system, with only the drivers you need installed.

cd to /usr/src/linux

If you are using a root device other than the current one, or something other than 80x25 VGA, and you are writing a boot floppy, you should edit the Makefile and make sure the

ROOT_DEV =

and

SVGA_MODE =

lines are the way you want them.

If you've installed any patches, you may wish to guarantee that all files are rebuilt. If this is the case, you should type

make mrproper

Regardless of whether you ran make mrproper, type

make config

and answer the configuration questions. Then run

make depend

and finally

make

Once the build completes, you may wish to update the LILO configuration, or write a boot floppy. A boot floppy may be made by running

make zdisk

2.10 LUNs other than 0 don't work

Many SCSI devices are horrendously broken, lock the SCSI bus up solid, and do other bad things when you attempt to talk to them at a logical unit someplace other than zero.

So, by default, recent versions of the Linux kernel will not probe LUNs other than 0. To work around this, you need to use the max_scsi_luns command line option, or recompile the kernel with the CONFIG_SCSI_MULTI_LUN option.

Usually, you'll put

max_scsi_luns=8

on your LILO command line.

If your multi-LUN devices still aren't detected correctly after trying one of these fixes (as will be the case with many old SCSI->MFM, RLL, ESDI, SMD, and similar bridge boards), you are being thwarted by this piece of code

/* Some scsi-1 peripherals do not handle lun != 0.                
   I am assuming that scsi-2 peripherals do better */
if((scsi_result[2] & 0x07) == 1 &&                    
   (scsi_result[3] & 0x0f) == 0) break;                    

in scan_scsis() in drivers/scsi/scsi.c. Delete this code, and you should be fine.
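To check whether a particular device would trip the test quoted above, the same bit masks can be applied to its INQUIRY response. The Python sketch below mirrors only the condition itself. The field names are an interpretation, not something taken from the kernel source: bits 0-2 of byte 2 are read as the ANSI version and bits 0-3 of byte 3 as the response data format. The function name and the sample byte strings are made up for illustration.

```python
def stops_lun_scan(inquiry_data):
    """Mirror the quoted test: True if the scan would break out at this device."""
    ansi_version = inquiry_data[2] & 0x07       # assumed meaning: ANSI version
    response_format = inquiry_data[3] & 0x0F    # assumed meaning: response data format
    return ansi_version == 1 and response_format == 0


# Hypothetical first four bytes of two INQUIRY responses.
scsi1_device = bytes([0x00, 0x00, 0x01, 0x00])
scsi2_device = bytes([0x00, 0x00, 0x02, 0x02])

print(stops_lun_scan(scsi1_device))   # True
print(stops_lun_scan(scsi2_device))   # False
```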

