- Newsgroups: comp.os.linux
- Path: sparky!uunet!snorkelwacker.mit.edu!bloom-picayune.mit.edu!daemon
- From: tytso@ATHENA.MIT.EDU (Theodore Ts'o)
- Subject: Re: 0.97p2
- Message-ID: <1992Aug29.175510.25324@athena.mit.edu>
- Sender: daemon@athena.mit.edu (Mr Background)
- Reply-To: tytso@ATHENA.MIT.EDU (Theodore Ts'o)
- Organization: The Internet
- Date: Sat, 29 Aug 1992 17:55:10 GMT
- Lines: 92
-
- From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
- Date: 28 Aug 92 23:05:26 GMT
-
- Hmm. I'd like to hear more about the problem - especially if you can
- pinpoint it more closely (i.e. having used 0.97, 0.97.pl1 and now pl2) to
- a specific patch. As most people said 0.97.pl1 was fast, I'm assuming
- it's specific to patch2, but I'd like to have some confirmation before I
- start looking into the problem.
-
- Whether or not a particular version is fast probably depends on how
- much memory you have. I ran a controlled series of tests, running
- 0.96c, 0.97, 0.97pl1, and 0.97pl2 on my 40 MHz 386 machine (with 16 meg
- of memory), and I noticed no appreciable difference in times:
-
- Ver.        Time (min:sec) to compile the stock 0.97
-             kernel after doing a "make clean"
-
- 0.96c       9:35 (*)
-             9:33
-             9:34
-
- 0.97        10:21 (*)
-             9:41
-
- 0.97pl1     10:10 (*)
-             9:45
-             9:32
-
- 0.97pl2     10:36 (*)
-             9:25
-             9:30
-             10:11 (*)
-             9:41
-
- All of these times were measured by doing (date;make;date) >& MAKELOG
- and then taking the difference between the first and second timestamps.
- The only processes running on the machine other than the compile were
- the X server and a single xterm. The (*) times indicate the first
- compile after a reboot; they are higher because the buffer cache hasn't
- been primed yet.
-
- So at least if you have a lot of memory, there is no appreciable
- difference between 0.96c, 0.97, 0.97pl1, and 0.97pl2. If I had to make a
- guess, I would guess that the problem happens on machines with less
- memory --- say, 4 or 8 megabytes --- and I further guess that it might
- be related to the buffer changes. It could very well be that the people
- who said that 0.97pl1 was fast were running with a lot of memory.
-
- If it's patch2, the problem is probably the changed mm code: having
- different page tables for each process might be costlier than I thought.
- The old (pre-0.97.pl2) mm was very simple and efficient - TLB flushes
- happened reasonably seldom. With the new mm, the TLB gets flushed at
- every task-switch (not due to any explicit flushing code, but just
- because that's how the 386 does things when tasks have different
- cr3's).
-
- I don't think the TLB cache flush would be much of a problem. Consider:
- There are 32 entries in the TLB, and if you reference a page which is
- not in the TLB, you pay a penalty of between 0 and 5 cycles. So the
- maximum penalty you incur by flushing the TLB is 5x32 or 160 cycles. If
- you further assume the worst case that you are switching contexts every
- tick of the 100hz clock, then you will be flushing the TLB 100 times a
- second, for a penalty of 16,000 cycles/second. On a 16MHz machine,
- there are 16 x 10**6 cycles/second. So the worst case extra time
- incurred by flushing the TLB is (16 x 10**3) / (16 x 10**6) == 10**-3,
- or an overhead of 0.1%. On a 40MHz machine, this overhead declines to
- 0.04%.
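-
- As a quick sanity check on that arithmetic, a throwaway C snippet (not
- from the original post) reproduces the numbers:
-
-     #include <stdio.h>
-
-     int main(void)
-     {
-         double tlb_entries   = 32;    /* 386 TLB size, as above       */
-         double refill_cycles = 5;     /* worst-case refill penalty    */
-         double switches_hz   = 100;   /* one flush per 100hz tick     */
-         double cycles_lost   = tlb_entries * refill_cycles * switches_hz;
-
-         printf("penalty per flush: %.0f cycles\n", tlb_entries * refill_cycles);
-         printf("16 MHz machine   : %.2f%%\n", 100.0 * cycles_lost / 16e6);
-         printf("40 MHz machine   : %.2f%%\n", 100.0 * cycles_lost / 40e6);
-         return 0;
-     }
-
- which prints 160 cycles, 0.10%, and 0.04%, matching the figures above.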
-
- Now, these times do assume that the page tables/directories haven't
- gotten paged out to disk. Since each process must now have at least one
- page directory and two page tables (one for low memory and one for the
- stack segment in high memory), if you assume a 2 meg system has 8-9
- processes running, then 24 or so 4k pages (roughly 10% of its user
- memory) are being used to hold the page tables/directories. This has two
- effects; the first is to increase the memory usage, which may increase
- thrashing. The second is that if these pages get swapped out, the kernel
- will have to bring them in again the moment that process starts
- executing again, since the TLB cache will be empty.
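-
- A similar throwaway snippet (again not from the original post, and
- assuming roughly 1 meg of the 2 meg system is left over as user memory,
- which the text above does not state) gives the same ballpark figure:
-
-     #include <stdio.h>
-
-     int main(void)
-     {
-         int  procs          = 8;             /* 8-9 processes assumed above   */
-         int  pages_per_proc = 3;             /* 1 page dir + 2 page tables    */
-         long page_size      = 4096;          /* 4k pages                      */
-         long user_mem       = 1024L * 1024;  /* assumed ~1 meg for user pages */
-
-         long pt_bytes = (long)procs * pages_per_proc * page_size;
-
-         printf("page table pages  : %d\n", procs * pages_per_proc);
-         printf("page table memory : %ldk\n", pt_bytes / 1024);
-         printf("share of user mem : %.1f%%\n", 100.0 * pt_bytes / user_mem);
-         return 0;
-     }
-
- which gives 24 pages, 96k, and about 9.4%, in the same ballpark as the
- 10% figure above.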
-
- I can optimize things a bit - it's reasonably easy to fake away some of
- the TLB flushes by simply forcing the idle task to always use the same
- cr3 as the last task did (as the idle task runs only in kernel memory,
- and kernel memory is the same for all processes). So, I'd be interested
- to hear if this simple patch speeds linux up at all:
-
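- As the patch itself is not reproduced in this copy of the message, the
- following is only a hypothetical sketch of the idea described above
- (not the actual patch, and with made-up names):
-
-     /* Hypothetical illustration only.  The idle task runs purely in
-      * kernel memory, which every page directory maps identically, so
-      * before switching to it we can hand it the outgoing task's cr3;
-      * the switch then never changes cr3 and (per the reasoning above)
-      * never forces a TLB flush. */
-
-     struct task {
-         unsigned long cr3;          /* page directory this task runs on */
-         /* ... */
-     };
-
-     void choose_page_tables(struct task *prev, struct task *next,
-                             struct task *idle)
-     {
-         if (next == idle)
-             next->cr3 = prev->cr3;  /* idle reuses previous page tables */
-         /* ...then switch to next using next->cr3 as usual... */
-     }
-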
- Given my back-of-the-envelope calculations above, I doubt this patch
- will speed up Linux by any appreciable amount, and any speed improvement
- will probably be taken up by the extra check in the scheduler. But this
- is only a theoretical guess; someone should probably gather experimental
- evidence to make sure.
-
- - Ted
-