Palindrome Tech Note TN9401.ASC
Version 1.3 February 6, 1995
PURPOSE OF THIS TECH NOTE
To diagnose causes of NLM hangs and take appropriate action to resolve
the problem.
CONTENTS
1. OVERVIEW
2. FILES TO DOWNLOAD
3. WHAT TO DO FIRST
4. FREEING A HUNG NLM PROCESS
5. SERVERS WITH MORE THAN 16 MB OF RAM
6. SERVER TUNING
6.1 LOW AVAILABLE SERVER CACHE
6.2 IMPROPER SERVER TUNING
6.3 OLD NETWARE MODULES
6.4 OLD TSAs (SMS-BASED PNA & PBD ONLY)
6.5 OLD LAN DRIVERS
7. HARDWARE ISSUES
7.1 ADAPTEC 274X SCSI CONTROLLERS
8. TYING IT ALL TOGETHER
9. WHAT TO DO IF PROBLEMS PERSIST
1. OVERVIEW
Troubleshooting backup or restore hangs can be a time-consuming process.
When a backup or restore process hangs, it is usually caused by some
type of environmental problem and/or hardware conflict. The exact cause
or combination of causes is often difficult to pinpoint.
Please take time to read this entire Tech Note. Your particular hang
problem may be caused by a combination of the reasons described below.
NOTE: The syntax and documentation references in this Tech Note are
based on Network Archivist NLM version 3.1. The syntax you use, and
the section of the documentation you need to consult, may vary
depending on the Network Archivist version you are using.
2. FILES TO DOWNLOAD
This Tech Note refers to several files which are located on Palindrome's
BBS and NETWIRE. You may want to download some of the following files
before proceeding with this Tech Note.
FILE NAME     DESCRIPTION                  AVAILABLE ON     DATE   PASSWORD
-----------------------------------------------------------------------------
APRVMED.ASC   List of Palindrome-          Palindrome BBS   N/A    none
              approved media
-----------------------------------------------------------------------------
LIBUP4.EXE    CLIB.NLM                     Palindrome BBS          none
              version 3.12H for NetWare
              3.11 and 3.12;
              version 4.01E for NetWare
              4.01 and 4.02
-----------------------------------------------------------------------------
CDL31.ASC     PNA/PBD 3.1 Supported        Palindrome BBS   N/A    none
              Device List
-----------------------------------------------------------------------------
CDL30.ASC     PNA 3.0 Supported            Palindrome BBS   N/A    none
              Device List
-----------------------------------------------------------------------------
CDL20G.ASC    TNA 2.0G / PBD 2.1           Palindrome BBS   N/A    none
              Supported Device List
-----------------------------------------------------------------------------
CDL20F.ASC    TNA 2.0F Supported           Palindrome BBS   N/A    none
              Device List
-----------------------------------------------------------------------------
LANDR3.EXE    Updates to Novell-certified  NETWIRE          N/A    N/A
              *.LAN drivers
-----------------------------------------------------------------------------
PALDIAG.ZIP   Backup device diagnostic     Palindrome BBS   N/A    none
              program
-----------------------------------------------------------------------------
STRTL3.EXE    Updates to NetWare NLMs      NETWIRE          N/A    N/A
-----------------------------------------------------------------------------
DRVR.ASC      Tested PNA/PBD SCSI driver   Palindrome BBS   N/A
              configurations
-----------------------------------------------------------------------------
PALSDUMP.ZIP  Server information gathering Palindrome BBS   N/A    N/A
              tool
-----------------------------------------------------------------------------
SPXS4X.EXE    Updated SPXS module for      Palindrome BBS          none
              NetWare 4.0x servers
-----------------------------------------------------------------------------
3. WHAT TO DO FIRST
If a backup or restore process hangs, perform the steps described below.
1) Always check the PNA_LOG file for any related error messages. If you
are using PNA 3.0 or later, you can select the Help/On Error menu option
to get more information on a given error code.
Other pertinent error messages are often recorded in the server's volume
error log and system error log. For instance, if the hang was caused by
a drive deactivation, this event will be recorded in the system error
log. PNA's backup of the server's bindery is also recorded in the system
error log.
2) If the backup or restore process hangs, VERIFY the File History
Databases (AV*.PAC) for each protected volume using the
Restore/Verify History Database(s) menu option. If you are using
PNA version 3.0, use the Tools/Verify History Database(s) menu option;
if you are using TNA version 2.x, use the TNARECOV menu option.
3) Write down (or print screen through RCONSOLE) a description of the
backup/restore activity screen. What is the media state? Where on the
volume is the backup or restore process? What is the state of the
LEDs on the front of the backup device?
4) Include the TIMESTAMP (/TS) switch during the next backup or restore
operation. The TIMESTAMP switch will create a detailed log file
(TMPSTAMP.LOG) which is written to your PNA installation directory.
The TMPSTAMP.LOG file indicates exactly where the backup or restore
process was at the time of the hang. This is especially important if the
hang recurs: comparing the logs from several hangs can establish a pattern
of exactly which process causes the hang each time, which can significantly
narrow down the possible causes.
Additionally, you can edit the "Command to Execute" field (located
on the Network Archivist Configure/System Parameters screen) to include
the TIMESTAMP switch in your unattended backup command.
BACKUP
------
To invoke a TIMESTAMP during a backup operation, issue the following
command:
PNABACK /A /Q /TS
NOTE: Precede the command with LOAD if you are issuing PNABACK
from the server console.
RESTORE
-------
To invoke a TIMESTAMP during a restore operation, simply add the
/TS switch to the end of the command line. For example:
PNAREST /RO FS1/APPS: \*.* /TS
5) If, during a backup, you suspect a particular volume is causing the
problem, deactivate the volume on the Protected Resource List.
(If you are using PNA 3.0 or earlier, remove the volume from the
Protected Volume List instead.) If the backup then completes,
the hang may be caused by an environmental problem on the volume.
NOTE: DO NOT remove the volume from the Protected Resource List if
you are using PNA version 3.1. With version 3.1, removing
the volume also causes the volume's File History Database
to be deleted. However, making the resource INACTIVE simply
causes PNA to ignore the volume during the backup, and leaves
its File History Database intact.
6) To determine whether a backup hang is database-related, perform a
Full Export. Since an export operation does not use the PNA databases
the way a managed backup operation does, a successful completion
may indicate a database problem.
4. FREEING A HUNG NLM PROCESS
IMPORTANT!!
NEVER try to unload an NLM that is hung or turn off the backup device
as a means of freeing up a hung process. Attempting to unload a hung
process or turning off the backup device may result in the entire server
locking or abending.
Always wait until a convenient time, then down the server properly using
the DOWN command.
5. SERVERS WITH MORE THAN 16 MB OF RAM
A server with more than 16 MB of RAM and a lower-end SCSI card such as
an Adaptec 1540 or 1640, or a Bustech 540, can cause NLMs to hang.
Within NetWare, any ISA SCSI card placed in an EISA-bus PC cannot
address memory above 16 MB.
As a workaround to this problem, Palindrome developed an additional
SCSI interface driver called PALSDRV.NLM, which was introduced in PNA 3.x.
PALSDRV.NLM is described below.
1) ARE YOU USING TNA 2.x or PBD 2.x?
If so, upgrade your SCSI card to one that can address memory above
16 MB. If you have an EISA machine, upgrade to a Palindrome-supported
EISA SCSI controller. Optionally, you can upgrade your TNA or PBD
installation to PNA 3.1 or PBD 3.1, both of which include Palindrome's
SCSI interface driver PALSDRV.NLM. Contact your Palindrome reseller
for upgrade information.
2) ARE YOU USING PNA 3.x or PBD 3.1?
If so, add a "SET RESERVED BUFFERS BELOW 16 MEG = 200" statement
to the installation server's STARTUP.NCF file.
Also be sure to load the PALSDRV.NLM driver with the "ABOVE16MEG" switch.
You can add this statement to the installation server's AUTOEXEC.NCF
file.
"SET RESERVED BUFFERS" Statement
--------------------------------
Example of statement to be added to the PNA installation server's
STARTUP.NCF file:
SET RESERVED BUFFERS BELOW 16 MEG = 200
NOTE: In order for this SET command to take effect,
you must reboot the server.
PALSDRV.NLM Driver
------------------
Example of statements to be added to the PNA installation server's
AUTOEXEC.NCF file:
LOAD AHA1540 PORT=330 ABOVE16=Y
LOAD PALSDRV ABOVE16MEG
NOTE: If PALSDRV is currently loaded in server memory
(run MODULES to determine if it is loaded), unload
it and reload it with the ABOVE16MEG switch. However,
do NOT unload PALSDRV if a backup or restore process
is running.
For more information, refer to the Network Archivist 3.1 Installation
and Support Guide or the Network Archivist 3.0 Reference Guide.
6. SERVER TUNING
6.1 LOW AVAILABLE SERVER CACHE
Check the AVAILABLE CACHE BUFFERS on the PNA installation server. To do
this, load MONITOR at the server console, choose RESOURCE UTILIZATION,
and check the percentage of AVAILABLE CACHE BUFFERS.
NOTE:
On a NetWare 4.x server, load MONITOR, choose RESOURCE UTILIZATION, and
check the CACHE BUFFER percentage.
If the percentage is 50% or below, consider adding more memory to your
server. A backup or restore process uses an additional 1.5 MB to 2 MB
of memory when it loads; combined with normal network activity, this
can push the available cache percentage into the critical zone of 30%
or lower.
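The 6.1 check is simple arithmetic, so it can be worked through ahead of time. The following is a minimal Python sketch, not a Palindrome tool. It assumes the default 4 KB cache buffer size and that the backup module's extra 2 MB (the upper end of the estimate above) comes straight out of the cache buffer pool. The MONITOR readings in the example are hypothetical.

```python
# Sketch only. Assumptions (not stated in the Tech Note): cache buffers are
# the default 4 KB, and the backup module's extra memory comes entirely out
# of the cache buffer pool.
CACHE_BUFFER_BYTES = 4096
BACKUP_EXTRA_BYTES = 2 * 1024 * 1024  # upper end of the 1.5 MB to 2 MB estimate


def available_percent(available: int, total: int) -> float:
    """Available cache buffers as a percentage of the total."""
    return 100.0 * available / total


def percent_during_backup(available: int, total: int,
                          extra_bytes: int = BACKUP_EXTRA_BYTES) -> float:
    """Estimated available percentage once the backup module has loaded."""
    buffers_taken = -(-extra_bytes // CACHE_BUFFER_BYTES)  # ceiling division
    return available_percent(max(available - buffers_taken, 0), total)


# Hypothetical MONITOR readings: 2000 total buffers, 1000 available.
now = available_percent(1000, 2000)
during = percent_during_backup(1000, 2000)
print(f"now {now:.1f}%, during backup about {during:.1f}%")
if now <= 50:
    print("50% or below: consider adding memory")
if during <= 30:
    print("a backup may push this server to 30% or lower")
```

With these readings the server sits at 50% before the backup and drops to roughly 24% once 512 buffers (2 MB) are taken, which is why the 50% line is treated as a warning sign.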
6.2 IMPROPER SERVER TUNING (NetWare 3.1x and 4.0x servers)
Refer to page A-4 of the PNA 3.1 NLM Installation and Support
Guide for more information on the procedures described below.
NOTE: When using the settings suggested below, ensure your
available cache buffers do not fall below 30% at any
time, including DURING a backup. To check this, switch
screens to MONITOR from a running backup process. The
backup module uses additional memory during the backup,
so a server with 30% available cache buffers when a
backup is NOT running can easily fall below 30% once
a backup starts. This is especially true of servers
with 16 MB of RAM or less.
The settings suggested below can be placed in the
AUTOEXEC.NCF file unless otherwise noted.
PACKET RECEIVE BUFFERS
----------------------
Palindrome recommends setting the "Minimum Packet Receive Buffers" to
at least 100. A good rule of thumb is 1 packet receive buffer per
licensed connection and 25 packet receive buffers for each NIC in the
server.
For example, if your server has a 250-user license and 2 NICs, your
"Minimum Packet Receive Buffers" setting should be 300 (250 + 2 x 25).
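The rule of thumb above can be written as a minimal Python sketch. The function name is illustrative and is not part of any NetWare or PNA tool.

```python
def min_packet_receive_buffers(licensed_connections: int, nics: int) -> int:
    """Rule of thumb from this Tech Note: one buffer per licensed connection
    plus 25 per NIC, and never fewer than the recommended minimum of 100."""
    return max(100, licensed_connections + 25 * nics)


setting = min_packet_receive_buffers(250, 2)
print(setting)  # 300
print(f"SET MINIMUM PACKET RECEIVE BUFFERS = {setting}")
```

Using max() keeps small servers at the recommended floor of 100 even when the per-connection and per-NIC count comes out lower.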
To set the packet receive buffers, use statements similar to the
following:
SET MINIMUM PACKET RECEIVE BUFFERS = 100
SET MAXIMUM PACKET RECEIVE BUFFERS = 500
NOTE: The "SET MINIMUM PACKET RECEIVE BUFFERS" statement must be
placed in the STARTUP.NCF file; it cannot be placed in the AUTOEXEC.NCF file.
SPXCONFG FOR 3.1x SERVERS (see the next step for 4.0x servers)
-------------------------
SPXCONFG is used to set certain parameters of SPX. Palindrome recommends
using the following timeout values on your installation server if SPX is
used:
Abort=5000, Verify=100, Wait=3000, Retry=255, Quiet=1.
To configure these settings, use a command similar to the following at the
server console:
LOAD SPXCONFG A=5000 V=100 W=3000 R=255 Q=1
WARNING! Do not load SPXCONFG with these settings on a 4.x server; doing
so will cause problems.
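Because the same SPXCONFG line is harmful on a 4.x server, a script that writes NCF files could guard it by NetWare version. The following is a hypothetical Python sketch (the function and dictionary names are illustrative) that builds the line from the recommended values and refuses any version string that does not start with "3.".

```python
# Values recommended in this Tech Note for NetWare 3.1x installation servers.
SPX_SETTINGS = {
    "A": 5000,  # Abort
    "V": 100,   # Verify
    "W": 3000,  # Wait
    "R": 255,   # Retry
    "Q": 1,     # Quiet
}


def spxconfg_command(netware_version: str) -> str:
    """Build the LOAD SPXCONFG line for a 3.x server.

    The Tech Note warns against this command on 4.x servers (use SERVMAN
    there), so any version string not starting with "3." is rejected.
    """
    if not netware_version.startswith("3."):
        raise ValueError("NetWare 4.x: set SPX parameters through SERVMAN instead")
    switches = " ".join(f"{key}={value}" for key, value in SPX_SETTINGS.items())
    return f"LOAD SPXCONFG {switches}"


print(spxconfg_command("3.12"))  # LOAD SPXCONFG A=5000 V=100 W=3000 R=255 Q=1
```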
SPX SETTINGS FOR 4.0x SERVERS
-----------------------------
Load SERVMAN on the 4.0x server.
Select IPX/SPX Configuration.
Select SPX Parameters.
Set the parameters as follows:
SPX watchdog abort timeout 5000
SPX watchdog verify timeout 100
SPX ack wait timeout 3000
SPX default retry count 255
Maximum concurrent SPX sessions 1000
DIRECTORY CACHE BUFFERS
-----------------------
If any protected servers have directories containing a large number of
files, be sure to allocate enough directory cache buffers.
Perform the following steps to calculate the number of directory cache
buffers a server should have:
1) Multiply the number of files (or potential number of files) in the
largest directory by the number of name spaces on the volume.
EXAMPLE: The largest directory has 5000 files and there
are two name spaces loaded (AFP and DOS). Multiply
5000 x 2 = 10,000.
2) Multiply the number from Step 1 by the size of the directory
entry, 128 bytes.
EXAMPLE: 128 x 10000 = 1,280,000
3) Divide the number from Step 2 by the size of the cache buffer
(default size is 4K, or 4096 bytes) and round up.
EXAMPLE: 1,280,000 / 4096 = 312.5, rounded up to 313
4) Using the number from Step 3 as a minimum baseline, enter the
following command at the server console:
SET MINIMUM DIRECTORY CACHE BUFFERS=313
Remember, the Maximum Directory Cache Buffers must be set
to a value greater than the Minimum.
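Steps 1 through 3 can be sketched in Python as follows (the function name is hypothetical). Rounding up matters here: dividing 1,280,000 by exactly 4,096 gives 312.5, so the baseline becomes 313 buffers.

```python
import math

DIRECTORY_ENTRY_BYTES = 128   # Step 2: size of one directory entry
CACHE_BUFFER_BYTES = 4096     # Step 3: default 4K cache buffer


def min_directory_cache_buffers(files_in_largest_dir: int, name_spaces: int,
                                buffer_bytes: int = CACHE_BUFFER_BYTES) -> int:
    """Minimum Directory Cache Buffers baseline, following Steps 1 to 3."""
    entries = files_in_largest_dir * name_spaces       # Step 1
    entry_bytes = entries * DIRECTORY_ENTRY_BYTES      # Step 2
    return math.ceil(entry_bytes / buffer_bytes)       # Step 3, rounded up


minimum = min_directory_cache_buffers(5000, 2)
print(minimum)  # 313
print(f"SET MINIMUM DIRECTORY CACHE BUFFERS={minimum}")
```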
SHORT TERM MEMORY ALLOCATION (NetWare 3.11 servers)
---------------------------------------------------
On servers with a large amount of communication activity (many users),
the default 2MB of Allocated Short Term Memory is often not enough.
Depending on the number of users connected, the Allocated Short
Term Memory should generally be somewhere between 4MB and 6MB and
be increased as needed. Use a command similar to the following at
the server console:
SET MAXIMUM ALLOC SHORT TERM MEMORY = 4000000
6.3 OLD NETWARE MODULES
Be sure you are using current versions of the following modules on all
protected servers: CLIB, STREAMS, TLI, IPXS, SPXS. Update any modules
on protected servers that are not the current version.
Which updates to apply depends on the NetWare version you are running.
Check NETWIRE for information on the current NetWare updates.
Currently, STRTL3.EXE is available on NETWIRE and contains updates to
important NetWare NLMs. STRTL3.EXE is also available on Palindrome's BBS.
Use MODULES to compare the date of the modules loaded on your PNA
installation server and all protected servers with the dates of the
modules listed below. If you find an older version on your server(s),
replace it with the current version.
NetWare 3.1x (includes NetWare 3.12)
STREAMS.NLM 7-20-93
SPXS.NLM 5-17-94
TLI.NLM 9-14-93
IPXS.NLM 8-10-93
NetWare 4.0x
STREAMS.NLM 9-14-93
SPXS.NLM 8-30-94 *
TLI.NLM 9-14-93
IPXS.NLM 8-23-93
SPXDDFIX.NLM 3-10-94
* This updated SPXS module can be downloaded from Palindrome's BBS or
NETWIRE under filename SPXS4X.EXE.
NOTE: Novell will soon discontinue support for NetWare 4.01 and 4.02.
Palindrome strongly recommends updating to NetWare 4.1 as soon as
possible. In the meantime, make sure you are running at least
NetWare 4.02.
NetWare 4.1
The NLMs found in the shipping version of NetWare 4.1 are current as
of the date of this Tech Note.
CURRENT CLIB VERSIONS
Be sure you are using a current version of CLIB. Compare the date of the
CLIB loaded on your server(s) with the dates listed below according
to the version of NetWare you are using. If you are not using the current
CLIB version, update your PNA installation server and remote servers
accordingly.
SERVER RUNNING            USE CLIB.NLM VERSION     DATE
NetWare 3.11 or 3.12      3.12H *                  10-27-94
NetWare 4.0x              4.01E *                  5-25-94
NetWare 4.1               4.10                     11-03-94
* CLIB.NLM versions 3.12H and 4.01E are available on the Palindrome BBS as
LIBUP4.EXE (no password).
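If you look after several servers, the date comparisons in this section can be scripted. The following is a minimal Python sketch. It assumes you type the module dates shown by MODULES into a dictionary by hand (the MODULES screen is not parsed here) and that two-digit years mean 19xx. The function names and the sample server are hypothetical.

```python
from datetime import date
from typing import Dict, List


def mdy(text: str) -> date:
    """Parse an M-D-YY date as written in this Tech Note (assumes 19YY)."""
    month, day, year = (int(part) for part in text.split("-"))
    return date(1900 + year, month, day)


# Minimum dates from Section 6.3, including the CLIB table.
MINIMUM_DATES: Dict[str, Dict[str, str]] = {
    "3.1x": {"STREAMS.NLM": "7-20-93", "SPXS.NLM": "5-17-94",
             "TLI.NLM": "9-14-93", "IPXS.NLM": "8-10-93",
             "CLIB.NLM": "10-27-94"},
    "4.0x": {"STREAMS.NLM": "9-14-93", "SPXS.NLM": "8-30-94",
             "TLI.NLM": "9-14-93", "IPXS.NLM": "8-23-93",
             "SPXDDFIX.NLM": "3-10-94", "CLIB.NLM": "5-25-94"},
}


def outdated_modules(netware: str, loaded: Dict[str, str]) -> List[str]:
    """Names of loaded modules dated earlier than the listed minimum.

    Modules that are not in the table are ignored, and listed modules
    that are not loaded at all are not reported.
    """
    minimums = MINIMUM_DATES[netware]
    return sorted(name for name, when in loaded.items()
                  if name in minimums and mdy(when) < mdy(minimums[name]))


# Dates typed in by hand from MODULES on a hypothetical 3.12 server.
loaded = {"STREAMS.NLM": "7-20-93", "SPXS.NLM": "2-1-94", "CLIB.NLM": "10-27-94"}
print(outdated_modules("3.1x", loaded))  # ['SPXS.NLM']
```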
6.4 OLD TSAs (SMS-BASED PNA & PBD ONLY)
Use MODULES to compare the TSA listing below with the TSAs currently loaded
on your PNA installation server and remote servers.
If you are using an older TSA than those listed below, unload it and
replace it with the corresponding TSA located on Palindrome's TSA diskette.
Also be sure to use the correct TSA name. (For example, do not load TSA_400;
the newer version of the same TSA is TSA400.) If your TSAs are NEWER than
those listed below, that's OK; do NOT replace them.
NetWare 3.11 server
SMDR31X.NLM 10-28-93
TSA311.NLM 11-5-93
WS_MAN.NLM 10-12-93
TSA_DOS.NLM 11-8-93
NetWare 3.12 server
SMDR31X.NLM 10-28-93
TSA312.NLM 11-5-93
WS_MAN.NLM 10-12-93
TSA_DOS.NLM 11-8-93
NetWare 4.0x server
SMDR.NLM 10-28-93
TSA400.NLM 11-5-93
TSANDS.NLM 10-26-93
WS_MAN.NLM 10-12-93
TSA_DOS.NLM 11-8-93
NetWare 4.1 server
SMDR.NLM 10-20-94
TSA410.NLM 10-21-94
TSANDS.NLM 10-21-94
WSMAN.NLM 10-21-94
TSADOS.NLM 10-21-94
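A similar sketch can be written for the TSA lists. It encodes the two rules in this section: a TSA older than the listed date should be replaced, and one that is the same age or newer should be left alone. Names that do not appear in the list for that NetWare version (such as TSA_400 on a 4.0x server) are flagged for review rather than treated as errors. The function names and the loaded dates in the example are hypothetical.

```python
from datetime import date
from typing import Dict, List


def mdy(text: str) -> date:
    """Parse an M-D-YY date as written in this Tech Note (assumes 19YY)."""
    month, day, year = (int(part) for part in text.split("-"))
    return date(1900 + year, month, day)


LISTED_TSAS: Dict[str, Dict[str, str]] = {
    "3.11": {"SMDR31X.NLM": "10-28-93", "TSA311.NLM": "11-5-93",
             "WS_MAN.NLM": "10-12-93", "TSA_DOS.NLM": "11-8-93"},
    "3.12": {"SMDR31X.NLM": "10-28-93", "TSA312.NLM": "11-5-93",
             "WS_MAN.NLM": "10-12-93", "TSA_DOS.NLM": "11-8-93"},
    "4.0x": {"SMDR.NLM": "10-28-93", "TSA400.NLM": "11-5-93",
             "TSANDS.NLM": "10-26-93", "WS_MAN.NLM": "10-12-93",
             "TSA_DOS.NLM": "11-8-93"},
    "4.1": {"SMDR.NLM": "10-20-94", "TSA410.NLM": "10-21-94",
            "TSANDS.NLM": "10-21-94", "WSMAN.NLM": "10-21-94",
            "TSADOS.NLM": "10-21-94"},
}


def check_tsas(netware: str, loaded: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort loaded TSAs into 'not_listed' and 'older' than the listed date."""
    listed = LISTED_TSAS[netware]
    report: Dict[str, List[str]] = {"not_listed": [], "older": []}
    for name, when in sorted(loaded.items()):
        if name not in listed:
            report["not_listed"].append(name)    # e.g. TSA_400 instead of TSA400
        elif mdy(when) < mdy(listed[name]):
            report["older"].append(name)         # replace from the TSA diskette
        # Same date or newer: leave it alone, as the Tech Note advises.
    return report


loaded = {"TSA_400.NLM": "6-1-93", "SMDR.NLM": "1-10-94", "WS_MAN.NLM": "9-1-93"}
print(check_tsas("4.0x", loaded))
# {'not_listed': ['TSA_400.NLM'], 'older': ['WS_MAN.NLM']}
```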
Novell develops the core operating system TSAs and generally releases new
versions every 6 months. The most current release available is on
Palindrome's BBS in a file called TSA0694.ZIP.
IMPORTANT NOTE:
There are issues raised by upgrading TSAs that go beyond the scope of
this Tech Note, so please see Palindrome Tech Note TN94-003.ASC if you
intend to upgrade any TSAs. If you do upgrade TSAs, the Protected
Resource List will need to be reconfigured. This process is explained
in TN94-003.ASC.
The TSAs for NetWare 4.1 ship with NetWare 4.1. Any later releases will
be made available on the Palindrome BBS and NETWIRE.
6.5 OLD LAN DRIVERS
Be sure you are using the most current LAN driver version available for
your specific network card and NetWare version. If you need to verify
the current version of your LAN driver, contact the vendor from whom
you purchased the driver.
Novell provides updates to the latest Novell-certified server
drivers for NExxx.* boards, TOKEN, TRXNET, and PCN2L. These
updates are located on NETWIRE under filename LANDR3.EXE.
LANDR3.EXE also contains updates to the 4.0x and 3.1x media
support module (MSM and MSM31X) and the Ethernet topology
support module (ETHERTSM).
NetWare 4.1 ships with updated LAN drivers.
If applicable, compare the dates of the NLMs listed below to those
currently installed on your protected servers. Update to the version(s)
listed below if your driver is older.
NE2000.LAN 10-8-93
NE3200.LAN 10-8-93
MSM.NLM 10-4-93
MSM31X.NLM 10-4-93
ETHERTSM.NLM 9-28-93
7. HARDWARE ISSUES
1) Be sure your SCSI controller, backup device, and the backup device's
firmware are supported. The Palindrome BBS contains the following
supported device listings:
CDL31.ASC for PNA & PBD 3.1
CDL30.ASC for PNA 3.0
CDL20G.ASC for TNA 2.0G & PBD 2.1
CDL20F.ASC for TNA 2.0F
To view information about the SCSI host adapter and the model and
firmware version of your backup device, use PALDIAG.NLM to scan the
SCSI bus. (PALDIAG.NLM can also be used to run diagnostics on the
backup device.) PALDIAG.NLM is included with Palindrome's 3.x products
and is also available on Palindrome's BBS under filename
PALDIAG.ZIP (no password needed).
To scan the SCSI bus and determine your backup device's model and
firmware version, copy PALDIAG.NLM to your SYS:SYSTEM directory,
and from the server console type:
LOAD PALDIAG /S
2) Verify that the SCSI BUS is properly terminated.
3) Download the file TSTDRVR.ASC from Palindrome's BBS and compare your
current SCSI driver with the list of Palindrome-tested SCSI drivers.
4) Be sure the drive is clean and the media being used is a recommended
DATA GRADE brand. Download APRVMED.ASC from Palindrome's BBS for
a listing of approved media.
5) To test the backup device, use PALDIAG with the "/D" switch to run
diagnostics on the device.
To perform the diagnostics, at the server console type:
LOAD PALDIAG /D
PALDIAG /D performs a short read/write test that will provide
percentages of retries during the read and write phase of the testing.
When you run PALDIAG /D, be sure the drive is clean and perform the
test on at least 2 different tapes. Print the results screen in case
you need help from Palindrome Technical Support.
Refer to the PNA Reference Guide (Chapter 3, "Media Maintenance")
for acceptable percentage thresholds for your particular backup device.
6) If you suspect the media is bad, verify the media using the
Media/Verify menu option. For pre-3.x versions of PNA, use the
Utility/Verify Tape menu option.
7) Check for possible interrupt, I/O, and DMA conflicts with
other adapters in the server.
8) Consider swapping the hardware components. Begin with the backup
device, followed by the cable, and then the host adapter.
7.1 ADAPTEC 274X SCSI CONTROLLERS
ADAPTEC 274X ADAPTERS:
Many hangs occur where an Adaptec 274x card is in use, often as the backup
device controller and sometimes as the NetWare volume's device controller.
It is critical that you have the proper driver for this popular SCSI card.
Typically, drivers dated 5-31-94 or later have resolved the vast majority
of problems. This applies to any SCSI controller that uses the AIC7770
chipset, including the embedded SCSI controllers integrated into the
motherboards of the Intel Deskside server and the HP NetServer.
With earlier revisions of the AIC7770 chip, having too many devices on the
same SCSI bus can precipitate hangs and device deactivations during
high-throughput operations such as a backup. You can tell the chip's
revision by looking at the third line of print on the chip marked Adaptec.
The first line says "Adaptec", the second says "AIC7770" (sometimes with a
"p" after it), and the third is a row of letters and numbers. If that row
begins with a "C", you have an early-revision chip. If the row begins with
"H" or a later letter, updating the driver should resolve the problem.
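The two checks in this section, the driver date and the chip's revision letter, can be combined as in the following Python sketch. The function is hypothetical, and the chip markings in the example are placeholders rather than real part numbers. Letters other than "C", or "H" and later, are reported as not covered because this Tech Note does not address them.

```python
from datetime import date
from typing import List

FIXED_DRIVER_DATE = date(1994, 5, 31)  # drivers from this date on fix most hangs


def mdy(text: str) -> date:
    """Parse an M-D-YY date as written in this Tech Note (assumes 19YY)."""
    month, day, year = (int(part) for part in text.split("-"))
    return date(1900 + year, month, day)


def aic7770_advice(third_line: str, driver_date: str) -> List[str]:
    """Advice based on the chip's third line of print and the driver date."""
    advice = []
    if mdy(driver_date) < FIXED_DRIVER_DATE:
        advice.append("update the driver to one dated 5-31-94 or later")
    code = third_line.strip()[:1].upper()
    if code == "C":
        advice.append("early-revision chip: avoid putting too many devices on this SCSI bus")
    elif not (code.isalpha() and code >= "H"):
        advice.append("revision letter not covered by this Tech Note")
    return advice


print(aic7770_advice("H0000", "6-15-94"))  # []
```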
8. TYING IT ALL TOGETHER
Once you have gathered all of the above information, it is important to
establish a pattern, and a point of failure. There are a number of important
questions that need to be asked that can lead to the solution:
When did the problem first begin to occur?
Was there something that changed immediately prior to the first occurrence?
When does the hang occur?
What operation is the software performing when it happens?
Does it happen on the same operation each time?
The same volume?
The same place on a particular volume?
Does it occur when the software begins writing to the backup media?
What hardware components are being used, and which are being used most
heavily at the time of the hang?
If you change one of the variables (i.e., a hardware component or
configuration, or a software component or setting), does the behavior change?
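One way to organize the answers to these questions is to record each hang as a structured entry and look for values that every hang shares. The following is a minimal Python sketch. The fields, the function name, and the sample entries (including the volume and directory names) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Hang(NamedTuple):
    operation: str        # backup, restore, export, ...
    volume: str
    place: str            # where the process was, from TMPSTAMP.LOG or the screen
    writing_media: bool   # had the software begun writing to the media?


def shared_factors(hangs: List[Hang]) -> Dict[str, object]:
    """Return each field whose value is the same in every recorded hang."""
    shared: Dict[str, object] = {}
    if not hangs:
        return shared
    for field in Hang._fields:
        value, count = Counter(getattr(h, field) for h in hangs).most_common(1)[0]
        if count == len(hangs):
            shared[field] = value
    return shared


hangs = [
    Hang("backup", "FS1/APPS:", "APPS:\\DATA", True),
    Hang("backup", "FS1/APPS:", "APPS:\\DATA", False),
    Hang("backup", "FS1/SYS:", "SYS:\\PUBLIC", True),
]
print(shared_factors(hangs))  # {'operation': 'backup'}
```

A factor shared by every hang (the same volume, the same place, the same point in writing to media) is a strong lead; when nothing is shared, change one variable at a time and record the next hang.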
9. WHAT TO DO IF PROBLEMS PERSIST
If the suggestions described in this Tech Note do not resolve the
hang problem you are experiencing, contact Palindrome Technical Support.
Please have the following items ready to fax to Palindrome Tech Support:
1. PALSDUMP output
(refer to the README file contained in PALSDUMP.ZIP for
instructions. PALSDUMP.ZIP can be downloaded from
Palindrome's BBS.)
2. Server Configuration sheet (completed)
(SERVCFG.ZIP on Palindrome's BBS)
Note: There is no need to fill out redundant information in
the server config sheet that has already been provided in the
PALSDUMP output. Be sure to fill out hardware information.
3. Printout of TMPSTAMP.LOG
4. Printout of PALDIAG /S output
5. Printout of PALDIAG /D output