Text File | 1997-04-11 | 9KB | 253 lines
MDOS 100% BUG PROBLEM DESCRIPTION Rev. 2.0 / March 9th, 1997
CONTACT PERSON:
Tobias Ernst        e-mail:  tobi@bland.fido.de
Werderstr. 70       fidonet: 2:2476/418
D-76137 Karlsruhe   os2net:  81:449/7835
Germany             phone:   +49 721 9374497
CONTENTS OF THIS DOCUMENT
This document contains
- A typical phenomenological problem description
- A technical problem description
- A kernel patch to solve the problem (!)
- Annotations
ONE-LINE DESCRIPTION OF THE PROBLEM:
APAR JR-10024:
MDOS APPS CAUSE 100% LOAD DUE TO BROKEN TIME SLICE API IN WARP4
A TYPICAL PHENOMENOLOGICAL PROBLEM DESCRIPTION
A lot of different DOS software, especially DOS communications
(DFUE) software, which ran smoothly under Warp 3, has caused
100% system load since the upgrade from Warp 3 to Warp 4. Most
of these programs claim to be OS/2 aware, i.e. to release time
slices to OS/2. The difference between Warp 3 and Warp 4 can be
visualized using a system process monitor. It reveals the
following:
                  WARP 3->             WARP 4->
                  Priority  State      Priority  State
Program active    0x0201    Ready      0x0201    Ready
Program idle      0x0200    Blocked    0x0200    Ready
Warp 4 simply "forgets" to block the program when it is idle.
The following table shows an interesting effect which
illustrates that there must be a serious bug in the time
slicing API. It was taken with the DOS fidonet mailer software
"McMail", a typical OS/2-aware DOS program:
                                 Resulting system load
                                 measured using PULSE.EXE
Program is told to ..            Warp 3      Warp 4
Release OS/2 time slices         3%          100%
Not release any time slices      50%         50%
Actually, giving a time slice back to OS/2 V4 makes the
situation WORSE than not giving a time slice at all!
There seem to be further problems with the DOS emulation idle
time detection. For example, on my system the TSR program
DOSKEY, which is included in OS/2 MDOS, causes 100% system load
*SOMETIMES*. This is not reproducible at will, but happens
occasionally and is very worrying.
A TECHNICAL PROBLEM DESCRIPTION
There are four methods of releasing time slices under OS/2:
Method                     Works with Warp 3   Works with Warp 4
a) "Generic DOS pause"     yes                 yes (?)
   INT 28h
b) "DPMI time slice"       yes                 no
   MOV AX,1680H
   INT 2FH
c) "BIOS delay"            yes                 yes
   CX:DX=microseconds
   MOV AH,86H
   INT 15h
d) "CPU halt"              yes                 no
   DX:AX=msecs
   STI
   HLT
   DB 035h, 0CAh
Of these four methods, b) and d) are the most frequently used,
so that about 80% of all OS/2-aware DOS software is nearly
inoperable under Warp 4.
We have done some debugging of the OS/2 Warp 4 kernel and have
found the reason why INT 2F does not work any more:
OS/2 versions prior to OS/2 Warp 4 handled INT 2F / AX=1680h
via the 16 bit doskrnl. The 16 bit doskrnl contains a routine
to trap the INT 2F / AX=1680h and release the DOS task's
processor time slice. In Warp 4, the 16 bit doskrnl still
contains this code and it is still operable - but the INT 2F /
AX=1680h never reaches the doskrnl at all.
In Warp 4, INT 2F / AX=1680h is trapped by the MVDM before it
ever reaches doskrnl, and MVDM seems to be unable to handle it
correctly. There are two possible solutions to the problem:
either make MVDM process the time slice call correctly, or stop
MVDM from handling the time slice call at all, so that it can
reach the routines in doskrnl, which are still operable.
As a first workaround, we have chosen the latter method:
A KERNEL PATCH TO SOLVE (WORK AROUND) THE PROBLEM
The following kernel patch is valid for OS/2 Warp XR4000 and
XRG4000 service levels. We have not debugged the Fixpack #1
kernel yet, so I cannot say whether it is valid for this kernel
as well (I only know that the Fixpack #1 kernel still has the
problem ...).
Service level XR4000 or XRG4000, file OS2KRNL (located in the
root dir of the installation drive): for Revision 9.023 (Warp
4 w/o fixes) at offset 67C2Eh, for Revision 9.025 (Warp 4 w/
fixpack #1) at offset 67D73h, change the following six bytes:
66 25 80 00 74 45 (old)
as follows:
66 3D 80 00 7E 45 (new)
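The byte replacement above can also be scripted instead of
done by hand in a hex editor. The following is only a minimal
sketch in Python, not part of the original fix: the names
apply_patch and patch_file are made up for illustration, and
the sketch assumes that you work on a COPY of OS2KRNL and pass
the offset that matches your kernel revision. It refuses to
write anything unless the six old bytes are found at the given
offset.

```python
from pathlib import Path

# Byte strings exactly as listed in this document.
OLD = bytes.fromhex("66 25 80 00 74 45")  # and ax,80h / jz  +45h
NEW = bytes.fromhex("66 3D 80 00 7E 45")  # cmp ax,80h / jle +45h


def apply_patch(image: bytes, offset: int,
                old: bytes = OLD, new: bytes = NEW) -> bytes:
    """Return a patched copy of image.

    Raises ValueError unless the expected old bytes are found at
    offset. An already patched image is returned unchanged.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    found = image[offset:offset + len(old)]
    if found == new:
        return image  # nothing to do, patch is already in place
    if found != old:
        raise ValueError(
            f"unexpected bytes at offset {offset:#x}: {found.hex(' ')}")
    return image[:offset] + new + image[offset + len(old):]


def patch_file(src: str, dst: str, offset: int) -> None:
    """Hypothetical helper: read src, patch it, write result to dst."""
    Path(dst).write_bytes(apply_patch(Path(src).read_bytes(), offset))
```

For a Revision 9.023 kernel the call would look like
patch_file("OS2KRNL", "OS2KRNL.NEW", 0x67C2E). Checking the old
bytes first guards against using the offset of the wrong
kernel revision.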
The patch stops MVDM from processing INT 2F/AX=1680h and voila,
time slices work again. What the patch does is also
illustrated by the following disassembly of the OS2KRNL:
F8C6B4 push ebp
F8C6B5 mov ebp, esp
F8C6B7 push ebx
F8C6B8 push esi
F8C6B9 mov ebx, [ebp+8]
F8C6BC cmp byte ptr [ebx+1Dh], 16h
F8C6C0 jnz loc_FFF8C6EA
F8C6C2 movzx esi, byte ptr [ebx+1Ch]
F8C6C6 mov eax, esi
F8C6C8 mov ecx, eax
F8C6CA and ax, 80h
-->CHANGED: cmp ax, 80h
F8C6CE jz loc_FFF8C715
-->CHANGED: jle loc_FFF8C715
F8C6D0 cmp ecx, 8Ah ;
F8C6D6 ja loc_FFF8C715
F8C6D8 and ecx, 0FFFFFF7Fh
F8C6DE mov esi, ecx
F8C6E0 push ebx
F8C6E1 call dword_FFF14C50[ecx*4]
F8C6E8 jmp short loc_FFF8C717
F8C6EA
F8C6EA
F8C6EA loc_FFF8C6EA:
F8C6EA cmp word ptr [ebx+1Ch], 4010h
F8C6F0 jnz loc_FFF8C705
F8C6F2 mov word ptr [ebx+1Ch], 0
F8C6F8 mov word ptr [ebx+10h], 1428h
F8C6FE mov eax, 1
F8C703 jmp short loc_FFF8C717
F8C705
F8C705 loc_FFF8C705:
F8C705 cmp word ptr [ebx+1Ch], 4011h
F8C70B jnz loc_FFF8C715
F8C70D push ebx
F8C70E call loc_FFF94691
F8C713 jmp short loc_FFF8C717
F8C715
F8C715 loc_FFF8C715:
F8C715 sub eax, eax
F8C717
F8C717 loc_FFF8C717:
F8C717 pop esi
F8C718 pop ebx
F8C719 leave
F8C71A retn 4
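The effect of the two changed instructions can be checked with
a small model of the first branch of this routine. This is
only a sketch in Python under two assumptions that the
disassembly suggests but does not prove: that [ebx+1Ch] and
[ebx+1Dh] hold the DOS program's AL and AH, and that returning
0 in EAX means "not handled here". The function name
mvdm_dispatches is made up for illustration.

```python
def mvdm_dispatches(al: int, patched: bool) -> bool:
    """Model the AH=16h branch of the routine above.

    Returns True if the routine would call a handler through the
    jump table, False if it would fall through to loc_FFF8C715
    (sub eax,eax). Assumes the caller already checked AH == 16h.
    """
    al &= 0xFF
    if patched:
        # cmp ax,80h / jle: leave when AL <= 80h. AX holds the
        # zero-extended AL, so the signed compare acts like an
        # unsigned one.
        if al <= 0x80:
            return False
    else:
        # and ax,80h / jz: leave only when bit 7 of AL is clear.
        if al & 0x80 == 0:
            return False
    # cmp ecx,8Ah / ja: leave when AL is above 8Ah.
    return al <= 0x8A


# AL values for which the patch changes the decision.
changed = [al for al in range(256)
           if mvdm_dispatches(al, False) != mvdm_dispatches(al, True)]
```

In this model the unpatched code dispatches AL=80h to 8Ah,
while the patched code dispatches only AL=81h to 8Ah, so the
only call taken away from this routine is AX=1680h.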
HOW TO REPRODUCE THE PROBLEM
The problem can be and has been reproduced on all existing
versions of Warp 4 (Beta, Gamma, German GA, US American GA),
virtually independent of the installed hardware or software
components. Using the following method, I was able to
reproduce it on *any* OS/2 Warp 4 system I have worked with
since.
Enter the following lines at the MDOS prompt and do not omit
the empty line (it ends DEBUG's assemble mode):
==begin==
debug
a100
mov ax,1680
int 2f
jmp 100

rcx
7
nloop.com
w
q
==end==
This creates a little program LOOP.COM. This program is just
an endless loop which does nothing more than continuously
release all of its processor time back to OS/2 via the INT 2F
call. Consequently, this program should not cause any visible
system load.
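If DEBUG is not at hand, the same seven bytes can be written
from another system and copied over. The following is a
minimal sketch in Python; it assumes that DEBUG assembles
"jmp 100" as a two-byte short jump, which the RCX value of 7
implies. The name write_loop_com is made up for illustration.

```python
# The 7 bytes of LOOP.COM, loaded by DOS at offset 0100h.
LOOP_COM = bytes([
    0xB8, 0x80, 0x16,  # 0100: mov ax,1680h
    0xCD, 0x2F,        # 0103: int 2Fh
    0xEB, 0xF9,        # 0105: jmp short 0100h (displacement -7)
])


def write_loop_com(path: str = "LOOP.COM") -> int:
    """Write the program to path and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(LOOP_COM)
```

The displacement of the short jump is counted from the address
after the jump instruction (0107h), so -7 leads back to 0100h.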
Start PULSE.EXE. Then run LOOP.COM in a windowed or full
screen DOS session with standard settings (IDLE_SENSITIVITY=70,
IDLE_SECONDS=0) and watch the Pulse. You will see that
LOOP.COM does not produce any system load under Warp 3, while
it produces, depending on the system, between 40% and 100%
under Warp 4. (Note that you will not see anything when
running the program except for the pulse change. You have to
terminate it by closing the DOS window).
ANNOTATIONS
Note: It is evident that, while LOOP.COM drives the system load
display to 100%, you will not perceive any relevant impact on
overall system performance. LOOP.COM is just a DEMO program
for the problem which does nothing more than trigger the bug.
But imagine a full-grown program with a main loop like this:
- Poll I/O port
- Poll keyboard
- Check hard disk for file semaphores
- If OS/2 then give up timeslice
  else do nothing for 1/10 sec
- Redo from start
With the Warp 4 problem, this program will poll the hardware 10
times as often as with Warp 3. This has a highly perceptible
effect on overall system load.
Note 2: Of course you can pull the system load of LOOP.COM down
by using very low values of IDLE_SENSITIVITY. But you can't do
this with full-grown BBS software. Or rather, you can, but then
the BBS will no longer respond to user input within a
reasonable time.
[EOF]