1995-04-13
TROUBLESHOOTING OPERATING SYSTEM ABENDS
DISCLAIMER: THE ORIGIN OF THIS INFORMATION MAY BE INTERNAL OR
EXTERNAL TO NOVELL. NOVELL MAKES EVERY EFFORT WITHIN ITS MEANS
TO VERIFY THIS INFORMATION. HOWEVER, THE INFORMATION PROVIDED IN
THIS DOCUMENT IS FOR YOUR INFORMATION ONLY. NOVELL MAKES NO
EXPLICIT OR IMPLIED CLAIMS TO THE VALIDITY OF THIS INFORMATION.
This ABEND troubleshooting package, TABEND.EXE, includes the
following files:
Tabend.txt    This document in text format.
Tabend.wp5    This document in WordPerfect 5.1 format.
Tabend.wp6    This document in WordPerfect 6.1 format.
Ck_list.txt   Appendix A of this document in a separate file.
RCSI.txt      "Resolving Critical Server Issues." An article
              from the Feb. 1995 Application Notes in text
              format.
RCSI.wp5      "Resolving Critical Server Issues" article in
              WordPerfect 5.1 format.
RCSI.wp6      "Resolving Critical Server Issues" article in
              WordPerfect 6.1 format.
Config.       The self-extracting file which contains config.nlm.
              This NLM is used to collect server information.
The article "Resolving Critical Server Issues" covers ABEND and
GPPE troubleshooting in great depth. The Application Notes can
be downloaded from CompuServe at "go NetApps." There may be a
charge for this service. You can also purchase the AppNotes by
calling 1-800-377-4136.
This document is intended to help you troubleshoot an ABEND on
your own before you place a call to Novell. Each of the steps
listed here is necessary. Most ABEND problems will be resolved
by doing what this document outlines. If, in your case, this
does not solve your problem, instructions are given for what to
have ready when you place a call for support.
What Is A Server ABEND?
An ABEND occurs when program execution is halted abnormally.
There are many ABEND messages, but the three most common are GPPE
(General Protection Processor Exception), Page Fault Processor
Exception error, and NMI (Non-Maskable Interrupt). These three
errors are all processor exceptions, meaning that they are
generated by the processor. NetWare merely reports the message.
"The NetWare 3 and 4 operating systems continually monitor the
status of various server activities to ensure proper operation.
If NetWare detects a condition that threatens the integrity of
its internal data (such as an invalid parameter being passed in a
function call, or certain hardware errors), it abruptly halts the
active process and displays an "ABEND" message on the screen.
("ABEND" is a computer science term signifying an ABnormal END of
program.)
The primary reason for ABENDs in NetWare is to ensure the
stability and integrity of the internal operating system data.
For example, if the operating system detected invalid pointers to
cache buffers and yet continued to run, data would soon become
unusable or corrupted. Thus an ABEND is NetWare's way of
protecting itself and users against the unpredictable effects of
data corruption." (Resolving Critical Server Issues. Feb. 1995
Application Notes. Page 37.)
How To Troubleshoot An ABEND - Step 1
An ABEND can be caused by hardware or software. It is easier and
cheaper to troubleshoot the software first. The steps in this
section alone may solve your server ABEND, and may also prove to
be valuable preventative maintenance that will avert other
problems. Appendix A of this document is a summary sheet that
you should fill out as you troubleshoot your server. If you end
up opening a Technical Support Incident at Novell, the Support
Engineer will want this sheet from you.
NOTE: An NMI Parity error (ABEND: Non-Maskable Interrupt)
is a special case of ABEND error. NMI errors are
hardware problems. See Appendix B - Dealing With An
NMI Error.
1. Update all LAN and disk drivers. Each manufacturer
of LAN and disk cards must develop their own drivers.
The only way to assure that you have the latest
version of these drivers is to download them from the
respective vendor. Even new hardware does not
usually ship with the most current drivers. THIS STEP
IS CRITICAL - Be certain that drivers are the newest
available from the respective vendor!!! Another part
of this step is to have updated LAN support modules.
These modules include msm31x.nlm or msm.nlm, and
ethertsm.nlm and/or tokentsm.nlm (or any other tsm
module that your system may require). Get the latest
version of Landr?.exe (where the ? represents the
revision number or letter of the file). See Appendix
C - How To Access The NetWare OS Patches And Updated
Files.
2. Apply all patches. There are known issues with the OS
that the patches have been written to fix. Load ALL
the patches that apply to your version of the
operating system. We also find that the patches
invariably solve other problems that we may not have
known about. The file name you need to get is <OS
version>PT<file revision number or letter>.EXE. For
example, patches for a NetWare v3.12 server would be
in the file 312pt6.exe, where "6" is the current
revision of the patch file. See Appendix C - How To
Access The NetWare OS Patches And Updated Files.
3. Re-copy server.exe. File corruption can happen to
any file, even the server.exe. A corrupt server.exe
can be difficult to track down. For this reason, it
is easier to perform this step than to find out,
after a lot of troubleshooting, that a corrupt
server.exe was the problem. If the corruption were
only in server memory the solution would be to down
and exit the server and then power off the machine
and turn it back on.
Just in case the corruption has been written to disk, copy
a fresh copy of server.exe from the original disks or from
a write protected working copy. The same idea applies to
any other file or files in the system or public directory
that may have become corrupted.
Remember, the server.exe in NetWare v3.x contains the
server license number. Don't copy the wrong server.exe.
4. Update clib, streams, & SPX Files. Clib.nlm is a
library of functions that many Novell and third party
modules use to access the operating system
functionality. Because of this, clib.nlm changes
often. Streams.nlm works in conjunction with clib.nlm
but does not change as often. You should check to
see that both of these modules are the current
version.
Spxs.nlm is used for much of the server to workstation
communications. This NLM should also be updated to the
current version.
See Appendix C - How To Access The NetWare OS Patches And
Updated Files.
5. Do a Virus Scan of the DOS and NetWare Partition.
This should be a habit during any troubleshooting.
6. Other Things To Look At. Here is a list of items
that have been known to cause server ABENDs.
- Power fluctuations at the power source.
- A failing power supply.
- A bad cooling fan. (Heat Kills Hardware!)
- A dry, hot or dusty environment can encourage
hardware degradation and failure due to static
electric discharge.
- Check the server's error log for other clues.
- Look for other problems that may end up being
related, for example lost connections, drive
deactivation, climbing packet receive buffers, high
dirty cache buffers, a high number of LAN errors,
high utilization, etc.
Another question to ask that may point you in the right
direction is, "What changes have been made to the server
environment lately?" Don't automatically say "none." Have you
increased the number of users? Is there new software? Has
software been upgraded? Is someone using software in a way
different than it had been used, such as database indexing,
etc.? Is there new or different hardware? Have there been
changes to the LAN, the routers, or the cabling? Have
workstations or the file server been physically moved? Are
there new printers on the LAN? Have there been any power
outages? Have SET parameters been changed? Etc.
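Item 3 above (re-copy server.exe) can be preceded by a quick
check from a workstation. The following is a minimal Python
sketch, not a Novell tool: it compares a suspect file with a
write-protected working copy byte for byte. The function name is
hypothetical, and the comparison is only meaningful against a
copy that carries the same license, since the NetWare v3.x
server.exe contains the server license number.

```python
import filecmp


def same_contents(suspect_path, known_good_path):
    """Return True if the two files are byte-for-byte identical."""
    # shallow=False forces a real content comparison instead of
    # trusting matching file size and modification time.
    return filecmp.cmp(suspect_path, known_good_path, shallow=False)
```

A False result suggests that the file on disk differs from the
working copy and should be re-copied. A True result does not rule
out corruption in server memory.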
How To Troubleshoot An ABEND - Step 2
If the problem is not solved by now, you have two troubleshooting
paths to pursue: one, it is a hardware problem; two, it is a
misbehaving NLM.
Hardware   Hardware is actually the most likely cause at this
           point. When troubleshooting hardware, break the
           network down into its component parts, or
           subsystems. The subsystems to consider are LAN
           Channel, Disk Channel, and System Board. Then use
           the ABEND message to point you in the direction of
           one of these subsystems. Most disk channel errors
           are easy to pick out. ABENDs that mention server
           process... are often, but not always, LAN related.
           Errors that refer to ...memory..., ...alloc...,
           or ...allocator..., etc. can be memory, system
           board, or NLM related. Once you establish a
           direction, try replacing the hardware that you
           think could be causing the ABEND. As a matter of
           routine, always check for poorly seated cards,
           dirty connections, faulty cables, and things like
           termination and SCSI ID (vendors sometimes differ
           on how they handle termination and SCSI ID - Be
           Aware). In some cases the problem can be
           compatibility between hardware components.
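The keyword hints in the hardware paragraph above can be written
down as a small lookup. This is a rough Python sketch only: the
function name and the subsystem labels are this example's own,
the keywords are the ones named above, and the result is a
starting direction, not a diagnosis. Disk channel errors are left
out because no keyword is given for them.

```python
# Keyword hints taken from the paragraph above. The labels are this
# sketch's own wording; treat any result as a starting point only.
HINTS = (
    ("server process", "LAN channel (often, but not always)"),
    ("memory", "memory, system board, or NLM"),
    ("alloc", "memory, system board, or NLM"),  # also matches "allocator"
)


def suggest_subsystems(abend_message):
    """Return the distinct hints whose keyword appears in the message."""
    text = abend_message.lower()
    suggestions = []
    for keyword, subsystem in HINTS:
        if keyword in text and subsystem not in suggestions:
            suggestions.append(subsystem)
    return suggestions
```

An empty list means the message carries none of these keywords
and has to be read by hand.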
NLM's      The next most likely issue is a misbehaving NLM. First,
           try to find a way to duplicate the ABEND at will. Look
           for anything that seems to happen concurrently with the
           ABEND. Ask yourself questions like these: Does the ABEND
           happen at the same time of day, or the same day of the
           week? Is there a certain application that is always
           running, or is there some function in an application that
           is always running, such as database indexing? Is there a
           certain workstation or segment that is also having a lot
           of problems (incorrectly formed packets can cause a
           server ABEND)?
These questions may help you to "divide and conquer" the
problem. Next, remove ALL non-essential NLM's. This
should include virus scanners, diagnostic and monitoring
NLM's, and NLM's that are not Novell certified. If the
server seems to stabilize, load these NLM's back to the
server one at a time. Let the server sit after each NLM is
loaded to ensure that it is ok to continue troubleshooting.
If you have the luxury of being able to duplicate the ABEND
at will, troubleshooting is much easier. Bring up the
server using "server -ns". This will bring up the server
without loading the startup.ncf file. Now load drivers and
NLM's one at a time and try to duplicate the ABEND. The
intention is to find an NLM that is responsible for the
ABEND. If you find an NLM that causes the ABEND, contact
the developer of that NLM.
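The one-at-a-time procedure described above can be sketched in
Python as follows. This is an illustration of the logic, not
something that runs on the server: abend_reproduces is a
hypothetical callback that stands for the manual work of loading
the module at the console and trying to trigger the ABEND.

```python
def find_offending_nlm(nlms, abend_reproduces):
    """Add NLMs one at a time; return the first one that brings the ABEND back.

    nlms             -- module names in the order they will be loaded
    abend_reproduces -- hypothetical callback; receives the list of modules
                        loaded so far and returns True if the ABEND recurs
    """
    loaded = []
    for nlm in nlms:
        loaded.append(nlm)
        if abend_reproduces(loaded):
            return nlm
    return None  # no single load step reproduced the ABEND
```

The module returned is the one whose loading first reproduced
the ABEND. It may still depend on a module loaded earlier, so
report the full load order to the developer.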
How To Troubleshoot An ABEND - Step 3
If the problem hasn't cleared up by this point, it's probably
time to call in reinforcements. Your first step should be to
call a Novell Authorized Service Center (NASC). These Gold and
Platinum dealers are Novell NetWare trained and willing to help
you. To find the service center closest to you, call
1-800-NET-WARE (638-9273), choose option 1, then choose option 2.
Someone is there to assist you from 7:30am to Midnight CST.
If you still need to contact Novell Technical Support, do the
following before you call us.
1. Run the NLM "config.nlm" at your server. This NLM was
included with the TABEND.exe (troubleshooting ABENDs) file
that you downloaded. When it completes it will place a file
named "config.txt" in your sys:system directory. This file
contains important server information that we can use to help
troubleshoot your ABEND. The Novell Technical Support (NTS)
Engineer will probably ask you for this file and will tell you
at that time how to get it to us.
2. Next, fill out the form in Appendix A. This form is included
as the file ck_list.txt. When complete, append the form to
the config.txt file that was created in the previous step.
3. At this point open an incident with Novell Technical Support.
Tell the support engineer that you have the config.txt file
ready.
4. Consider the possibility that you may need to get a core
memory dump from your server. A core memory dump takes a
"snapshot" of the server's RAM as it looks at the time of the
ABEND. We call this the "memory image." This image can be
collected and sent to Novell on floppy, tape, or via FTP. We
are able to use the information found in your memory image to
help isolate what is causing your server ABEND. For complete
instructions on how to collect a memory image see the
appendix of the document "Resolving Critical Server Issues."
This document has been included.
DO NOT automatically take a core dump. Wait until a
Technical Support Engineer instructs you to do so. Also, do
not send us core dumps from servers that do not have the
patches and current LAN and disk drivers loaded. Too often we
end up spending time on a problem that has already been
resolved by current patches or updated software. Make sure
you have the current patches and current LAN and disk
drivers!
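Item 2 in the list above asks you to append the completed
checklist to config.txt. From a workstation this can be done
with any editor or copy command. The Python sketch below shows
one way, under the assumption that both files are reachable as
ordinary paths; the function name is hypothetical.

```python
def append_form(config_path, form_path):
    """Append the completed checklist file to the end of config.txt."""
    # Binary mode copies the form exactly as saved, with no
    # character-set translation.
    with open(form_path, "rb") as form:
        form_bytes = form.read()
    with open(config_path, "ab") as config:
        config.write(b"\r\n")  # DOS line break so the form starts on a new line
        config.write(form_bytes)
```

For example, append_form(r"F:\SYSTEM\CONFIG.TXT",
r"C:\TABEND\CK_LIST.TXT"), where F: is assumed to be a drive
mapped to the SYS volume and C:\TABEND is a made-up local
directory.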
Appendix A - Check list/Summary
Incident Number: Name: Phone:
O/S version ________ DS version _______ Amount of RAM ________
Make/Model of Machine (indicate if a clone)/Bus Type:
______________________________________________________________
LAN card, driver name, driver date & version:
________________, ________________, __________________
LAN card, driver name, driver date & version:
________________, ________________, __________________
HBA (controller), driver name, driver date & version:
________________, ________________, __________________
List the devices on this HBA:
HBA (controller), driver name, driver date & version:
________________, ________________, __________________
List the devices on this HBA:
Are your drives mirrored? Y N   Or, duplexed? Y N
How much total volume space? __________________
1. Have you updated the LAN and disk drivers? Y N
2. Have you applied all the appropriate patches? Y N
3. Have you copied a fresh copy of Server.exe? Y N
4. Is your clib.nlm current? Y N
5. Have you virus scanned the DOS and NetWare Partition? Y N
6. What other information do you have that may help
troubleshoot this problem?
7. What changes have been made to the server recently?
(Increased number of users, new software, upgraded
software, new or different hardware, LAN or router changes,
workstations or file server physically moved, power
outages, set parameter changes, etc...)
8. What hardware has been swapped out already?
9. Do you have config.txt ready to upload to us? Y N
Appendix B - Dealing With An NMI Error
As mentioned in the main body of this document, an NMI error is a
hardware problem. There are three types of interrupts that a
processor can handle: a maskable hardware interrupt (INTR), a
non-maskable hardware interrupt (NMI), and a software interrupt
(INT). The processor has a dedicated line on the system board
bus that handles only non-maskable hardware interrupts.
According to Intel's i486 Microprocessor Hardware Reference
Manual, this NMI line can be asserted as a result of one of three
catastrophic events: 1) an imminent power loss, 2) a
bus-transfer parity error, or 3) a memory-data parity error. When
this NMI line is asserted, the processor generates an NMI error.
This error is received by the NetWare operating system and then
reported to the console screen. There are two flavors of NMI
errors, "ABEND: NMI parity error generated by IO check," and
"ABEND: NMI parity error generated by System Board." If the NMI
is generated by the system board, there is a fairly good chance
the problem is with the system board or its memory, although it
can still be elsewhere. If the NMI is generated by an IO check,
the problem could be anywhere. Here is a list of hardware
related items that we have found to cause NMI's. These ideas
should help you as you troubleshoot an NMI error.
1. Faulty RAM.
2. Faulty system board.
3. Any I/O card. Especially cards with on-board memory.
4. Low or fluctuating power at the power source. Remember,
UPS's can go bad too.
5. Power supply going bad.
6. Memory extension boards.
7. System board memory that is mismatched in either speed or
brand.
8. Conflicting interrupts.
9. Try cleaning and reseating cards, cables, and memory
modules.
10. Incompatibility between hardware pieces.
11. Look at the environment and how the equipment is handled.
NMI's can often be traced back to static electric
discharge. A sometimes overlooked point is that static
does not always cause immediate failure; the damage can be
degenerative. The hard failure may not occur until
sometime in the future.
12. This is rare, but we have also seen hard drives cause
NMI's.
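The two NMI message flavors quoted earlier in this appendix
point in different directions, which can be summarized in a
short Python sketch. The function name is hypothetical, and the
advice strings only restate the guidance given above.

```python
def nmi_guidance(abend_message):
    """Map the two NMI parity messages quoted above to a first direction.

    Returns None when the text is not an NMI parity error at all.
    """
    text = abend_message.lower()
    if "nmi parity error" not in text:
        return None
    if "system board" in text:
        return ("Suspect the system board or its memory first; "
                "the cause can still be elsewhere.")
    if "io check" in text:
        return "The problem could be anywhere; work through the whole list."
    return "Unrecognized NMI message; work through the whole list."
```

In both cases the list above still applies. The message only
suggests where to start.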
Appendix C - How To Access The NetWare OS Patches And Updated
Files
What file to download?
The patches for each version of the OS are grouped into a
compressed, self-extracting executable file. These files are
named as follows: <OS version>PT<file revision number or
letter>.EXE. For example, patches for a NetWare v3.12 server
would be in the file 312pt6.exe, where "6" is the current
revision of the patch.
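The naming convention above can be checked mechanically. The
following Python sketch splits a patch file name into OS version
and revision. It assumes a three-digit OS version and a
single-character revision, as in the 312pt6.exe example; the
function name is hypothetical and other layouts are not handled.

```python
import re

# Assumes three version digits and one revision character, as in the
# 312pt6.exe example above; anything else is treated as "not a patch file".
PATCH_NAME = re.compile(r"^(\d)(\d\d)PT([0-9A-Z])\.EXE$", re.IGNORECASE)


def parse_patch_file(filename):
    """Return (os_version, revision) for a patch file name, else None."""
    match = PATCH_NAME.match(filename)
    if match is None:
        return None
    major, minor, revision = match.groups()
    return major + "." + minor, revision.upper()
```

With this sketch, parse_patch_file("312pt6.exe") yields
("3.12", "6"), and a name such as clib.nlm yields None.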
This is a list of the files mentioned in this document:
NLM            Download      CompuServe          FTP
Name           This File     Location            Location
clib.nlm       Libup?.exe    NovFiles library    NovFiles
streams.nlm    STRTL?.exe    3.x or 4.x files    Novlib\04 and 14
spxs.nlm       STRTL?.exe    3.x or 4.x files    Novlib\04 and 14
3.11 patches   311PT?.exe    3.x files           Novlib\04
3.12 patches   312PT?.exe    3.x files           Novlib\04
4.01 patches   401PT?.exe    4.x files           Novlib\14
4.02 patches   402PT?.exe    4.x files           Novlib\14
4.10 patches   410PT?.exe    4.x files           Novlib\14
Where "?" represents the current revision of the file.
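The patch rows of the table above can be kept as a small lookup,
with the "?" treated as a single-character wildcard. This is a
Python sketch for illustration: the dictionary copies the table,
while the function name and the idea of matching against a list
of downloaded file names are assumptions of the example.

```python
import fnmatch

# Patch rows of the table above:
# OS version -> (file pattern, CompuServe location, FTP location)
PATCH_FILES = {
    "3.11": ("311PT?.exe", "3.x files", r"Novlib\04"),
    "3.12": ("312PT?.exe", "3.x files", r"Novlib\04"),
    "4.01": ("401PT?.exe", "4.x files", r"Novlib\14"),
    "4.02": ("402PT?.exe", "4.x files", r"Novlib\14"),
    "4.10": ("410PT?.exe", "4.x files", r"Novlib\14"),
}


def find_patch_files(os_version, filenames):
    """Return the names that match the patch pattern for this OS version."""
    pattern = PATCH_FILES[os_version][0].lower()
    # "?" in the pattern stands for exactly one revision character.
    return [name for name in filenames
            if fnmatch.fnmatchcase(name.lower(), pattern)]
```

If several revisions match, check the file dates or the readme
to pick the current one; the sketch does not try to rank them.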
How to get the updated files?
NSE Pro      The Network Support Encyclopedia CD-ROM has all the
             latest OS patches and updates. The NSE can be
             purchased by calling 800-346-7177.
CompuServe   Get onto CompuServe and "Go Netwire," choose "File
             Updates," choose "Novlib," choose "Library," then
             choose from the list of libraries.
FTP          If you have an Internet connection, FTP to
             FTP.Novell.Com.
Web Server   http://www.novell.com/
How to apply the patches?
Place the compressed/executable file in its own directory and run
it. Get the read me file named <filename>.txt. This read me file
will give any detailed instructions necessary to properly load
the patches. If you are running NetWare v3.11, load the patches
listed under the abstract section on the first page of the readme
(311PT?.TXT). If you are running NetWare v3.12 or any version of
NetWare v4.xx load ALL the patches.
There are three types of patches.
DYNAMIC -- Dynamic patches are implemented as <patch
name>.nlm files that can be loaded/unloaded while the server
is running. Unloading a dynamic patch will restore the
Operating System to its original "un-patched" state.
SEMI-STATIC -- Semi-static patches can also be loaded while
the server is running, but they cannot be unloaded. It is not
possible to undo the effects of a semi-static patch without
first downing the server and bringing it back up without
loading the semi-static patch.
STATIC -- Does not apply in the context of this document.
Dynamic and semi-static patches modify the Operating System in
memory, not on the disk. This means that dynamic and semi-static
patches must be loaded each time the Operating System is brought
up in order for any 'fixes' to take effect. Add a line to the
AutoExec.ncf or Startup.ncf, whichever is applicable, to
automatically load each patch the next time the server is downed
and brought back up.
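Adding the load lines by hand is usually simplest. As an
illustration of the idea, the Python sketch below takes the text
of an .ncf file and appends a LOAD line for each patch that is
not already listed. The function name and the patch file names
used in the usage note are made up; take the real names and the
correct .ncf file from the patch readme.

```python
def add_patch_loads(ncf_text, patch_nlms):
    """Return ncf_text with a LOAD line appended for each missing patch."""
    lines = ncf_text.splitlines()
    # Compare case-insensitively, ignoring surrounding whitespace.
    present = {line.strip().upper() for line in lines}
    for name in patch_nlms:
        command = "LOAD " + name
        if command.upper() not in present:
            lines.append(command)
            present.add(command.upper())
    return "\n".join(lines) + "\n"
```

For example, add_patch_loads("LOAD CLIB\n", ["PATCHA.NLM"])
returns the original line followed by "LOAD PATCHA.NLM". Running
it again on the result adds nothing further.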
In order to see which patches are currently loaded on the system,
type "PATCHES" at the file server command line. The patches will
then be grouped and displayed according to their type (i.e.,
STATIC, SEMI-STATIC, or DYNAMIC). If you already have patches
loaded, check dates to make sure you have the most current
versions.
Appendix D - Help Us To Help You
If this document has helped you to solve your ABEND problem such
that you did not have to open an incident with Novell Technical
Support, we would like to hear about it. Simply fax us this page
with your comments on it. Fax the form to Novell Technical
Support at (801)429-5200 to the attention of "TABENDS FEEDBACK."
Thanks for your feedback.
NOTE: This form is for comments only. We will not be able to
respond to any comments/questions given here.
Your Name:
Company Name:
Address:
Phone Number:
Were you able to solve your ABEND problem without opening an
incident with Novell Technical Support? If so, tell us the nature
of your problem and how this information helped you.
How can we make this document or the included files more useful
to you?
Are there other issues that might lend themselves to this type of
support?
What else would you like to see Novell Technical Support doing to
make your job of supporting your network environment easier?