ftp.cse.unsw.edu.au

home *** CD-ROM | disk | FTP | other *** search

/ ftp.cse.unsw.edu.au / 2014.06.ftp.cse.unsw.edu.au.tar / ftp.cse.unsw.edu.au / pub / doc / papers / misc / cs.toronto.edu:programming / c-news.tbl.ms < prev next >

Wrap

Text File | 1992-10-18 | 35.0 KB | 1,162 lines

.\" tbl newspaper | troff -ms | dcat .\" tbl newspaper | nroff -ms | col .\" Formatting specs: paragraph space, no header, page number in centered .\" footer, 6-inch text centered on 8.5 page, level-1 headers tight spacing. .SE PB 0.6 .SE CH .SE CF % .SE FPH .SE LL 6i .SE PO 1.11i .HS 1 NUM L B 1 0 NOPAGE 5 4 .\" Macro for footnote-marker planting. .de ZF .ie !'\\$2'*' \\$1\\s-2\\v'-0.1c'\\$2\\v'0.1c'\\s0\\$3 .el \\$1\\s-2\\$2\\s0\\$3 .. .TL .ft HB .ps 14 News Need Not Be Slow .ps .ft .AU .ft I Geoff Collyer .ft .AI .\" Computing Services .ft I .ZF "Department of Statistics" \fR*\fI .\" Stupid macros won't allow a footnote here... moved it down. University of Toronto utzoo!utcsri!utfraser!geoff .ft .AU .ft I Henry Spencer .ft .AI .ft I Zoology Computer Systems University of Toronto utzoo!henry .ft .AB C news is a re-write, from scratch, of the `transport layer' of the Usenet software. C rnews runs at over 19 times the speed of B rnews; C expire runs in minutes rather than the hours taken by B expire. These performance improvements were (mostly) quite simple and straightforward, and they exemplify general principles of performance tuning. .AE .NH 1 .ft HB History and Motivation .PP In the beginning (of Usenet) (1979) was A news, .FS * Work done mostly while at U of T Computing Services. .FE written at Duke University by Steve Bellovin, Stephen Daniel, Tom Truscott and others. A single program, .I news , received, relayed, perused and cleaned out news articles. All articles were stored in a single .ZF \s-2UNIX\s0 \(dg .FS \(dg UNIX is a registered trademark of AT&T. .FE directory, which made A news suitable for local news and low volumes of network news. News articles were exchanged using a simple message format in which the first five lines of a message contained the information nowadays found in the article headers: unique article-id, newsgroup(s), author, subject, date posted. .PP As Usenet began to grow (1981), people in and around the University of California at Berkeley, including Matt Glickman and Mark Horton, modified A news extensively. The articles of each newsgroup were now stored in a separate directory. The message format was changed from the rigid and inextensible A news header format to one conforming to ARPA RFC 822 (the current ARPA mail-message format standard). .I News was broken into separate programs: \fIreadnews\fR, \fIinews\fR (aka \fIrnews\fR), and \fIexpire\fR. The authors dubbed the result ``B news''. Since the release of B news, it has replaced A news .ZF almost \(dd everywhere on Usenet. .FS \(dd AT&T Bell Laboratories Research still runs A news for local newsgroups. .FE .PP It soon became clear that sending individual articles from machine to machine as separate .I uucp transactions was unacceptably slow, in part because it produced large .I uucp spool directories, which are searched quite .ZF slowly * by the kernel. .FS * Recent \fIuucp\fRs (notably Honey DanBer) provide spool sub-directories, and recent 4BSD (4.3BSD and later) kernels provide linear (as opposed to quadratic) directory searching, both of which help this problem. .FE Sites began to .I batch articles into batches of (typically) 50,000\-100,000 bytes for transmission to other machines. .PP At about this time, B news was changed to file news articles in a tree, as (for example) /usr/spool/news/net/women/only/123, rather than as /usr/spool/news/net.women.only/123. The motive for this was primarily elimination of problems with long newsgroup names, but shortening directories (and thus speeding searches) was also a consideration. .PP As Usenet traffic continued to grow explosively, sites began to use data compression on news batches. The main objective was to reduce expensive long-distance phone time, but again performance improved a bit: the extra processor time used for compression and decompression was more than gained back by the reduction in processor time used by .I uucp itself. .PP Unfortunately, B news has been modified by many people since 1981, and has mutated repeatedly to match the changing nature of Usenet. It has become complex, slow, and difficult to maintain. .PP During 1985, we observed that the nightly arrival of new news and expiry of old news were consuming resources far out of proportion to the volume of data .ZF involved \(dg . .FS \(dg Never mind the cost/benefit ratio. .FE .I Expire often ran for 90 minutes, and .I rnews processing averaged 10 seconds or more per article. Both programs tended to cripple systems by performing much disk i/o and eating much system-mode CPU time. .B Utcs was running B 2.10.1 news then and .B utzoo was running B 2.10 news. Although newer B news releases were available, they offered little beyond B 2.10, and it was often necessary to regression-test new B news releases to verify that reported, published bug fixes had in fact been applied. .PP Spencer acted first and rewrote .I expire from the ground up. Though it initially lacked any form of selective expiry, this .I expire , when run each night, finished in about 15 minutes. (This was on 750-class machines holding all Usenet news and expiring after 14 days.) .PP Collyer observed in November 1985 that B .I rnews , upon receiving a batch of news, immediately \fIexec\fRed a trivial unbatcher which copied each article into a temporary file and then forked and \fIexec\fRed B rnews again. Such a technique is clearly overkill for articles averaging about 3,000 bytes each. Preliminary experiments failed to produce a modified B rnews that could unravel a batch without forking. Consultation with Rick Adams, the current B-news maintainer, revealed that this same technique remained in the upcoming B news release (variously B 2.10.3 or B 2.11). Within one .ZF week \(dd , .FS \(dd 40 hours, Collyer didn't have to work hard. .FE a from-scratch C .I rnews prototype was working well enough to run experimentally on a `leaf' machine receiving a subset of news. .PP This prototype version lacked a good many necessary amenities, and over the next eight months it was enhanced to bring it up to full functionality. It was also tuned heavily to improve its performance, since it was faster than B .I rnews but still not fast enough to make us happy. .PP Once the .I rnews newsgroup name matching routines were working, Spencer revised .I expire to add selective expiry, specified in a control file. Recently, we have also revised our old batcher heavily, largely to add capability but with an eye on performance. .NH 1 .ft HB Rnews Performance .PP The basic objective of C news was simpler code and higher performance. This may sound trite, but note that performance was an explicit objective. That was important. \fIPrograms will seldom run faster unless you care about making them run faster.\fR .PP `Faster' implies comparison to a slower version. Knowing the value of improvements, and assessing this in relation to their cost, requires knowing the performance of the unimproved version. Collyer kept detailed records of his work on .I rnews , so he could see how much progress he was making. See the Appendix for the final result. \fITo know how to get somewhere, you must know where you are starting from.\fR .PP The first functional C .I rnews ran at about 3 times the speed of B .I rnews . We had assumed that merely eliminating the fork/exec on each article would give a factor of 10 improvement, so this was disappointing. \fIAvoiding obvious performance disasters helps... but it's not always enough.\fR .PP Profiling, first with .I prof (1) and later with 4.2BSD's .I gprof (1), and rewriting of the bottlenecks thus discovered, eventually brought the speed up to over 19 times the speed of B .I rnews . This required a number of write-profile-study-rewrite cycles. There is undoubtedly still a lot of code which could be faster than it is, but since profiling shows that it doesn't have a significant impact on overall performance, who cares? \fITo locate performance problems, look through the eyes of thy profiler.\fR .PP Collyer first experimented with using .I read and .I write system calls instead of .I fread and .I fwrite , and got a substantial saving. Though the usage of system calls in this experiment was unportable, the saving eventually lead him to rewrite .I fread and .I fwrite from scratch to reduce the per-byte overheads. This helped noticeably, since pre-System-V .I fread and .I fwrite are really quite inefficient. \fIIf thy library function offends thee, pluck it out and fix it.\fR .PP At the time, C .I rnews was doing fairly fine-grain locking, essentially locking each file independently on each use. News doesn't need the resulting potential concurrency, especially when .I rnews runs relatively quickly, and the locking was clearly a substantial fraction of the execution time. C .I rnews was changed to use B-news compatible locking, with a single lock for the news system as a whole. \fISimplicity and speed often go together.\fR .PP When sending articles to a site using batching, .I rnews just appends the filename of each article to a \fIbatch file\fR for that site. The batch file is later processed by the batcher. In principle, batching is an option, and different sites may get different sets of newsgroups. In practice, few articles are ever sent unbatched, and most articles go to all sites fed by a given system. This means that .I rnews is repeatedly appending lines to the same set of batch files. \%Noticing this, Collyer changed C .I rnews to keep these files open, rather than re-opening them for every .ZF article * . .FS * The price for this tactic is that the code has to be prepared for the possibility that the number of sites being fed exceeds the supply of file descriptors. Fortunately, that is rare. .FE \fIOnce you've got it, hang onto it.\fR .PP These two simple changes\(emcoarser locking and retaining open files\(emcut system time by about 20% and real time by still more. .PP On return from Christmas holidays, after considerable agonizing over performance issues, Collyer turned some small, heavily-used character-handling functions into macros. This reduced user-mode time quite a bit. \fIA function call is an expensive way to perform a small, quick task.\fR .PP .I Rnews was always looking up files by full pathnames. Changing it to .I chdir to the right place and use relative names thereafter reduced system time substantially. \fIAbsolute pathnames are convenient but not cheap.\fR .PP Studying the profiling data revealed that .I rnews was spending a lot of time re-re-re-reading the .I sys and .I active files. These files are needed for processing every article, and they are not large. Collyer modified .I rnews to simply read these files in once and keep them in core. This change alone cut system time and real time by roughly 30%. \fIAgain, once you've got it, don't throw it away!\fR .PP There is a more subtle point here, as well. When these files were re-read every time, they were generally processed a line at a time. The revised strategy was to \fIstat\fR the file to determine its size, \fImalloc\fR enough space for the whole file, and bring it in with a single \fIread\fR. This is a vastly more efficient way to read a file! \fITasks which can be done in one operation should be.\fR .PP At this point (mid-January 1986), C .I rnews was faster than B .I rnews by one order of magnitude, and there was much rejoicing. .PP In principle, the `Newsgroups:' header line, determining what directories the article will be filed in, can be arbitrarily far from the start of the article. In practice, it is almost always found within the first thousand bytes or so. By complicating rnews substantially, it became possible in most cases to .I creat the file in the right place (or the first of the right places) in .I /usr/spool/news before writing any of the article to disk, eliminating the need for temporary files or even temporary links. The improvement in system time was noticeable, and the improvement in user time was even more noticeable. \fIPrepare for the worst case, but optimize for the typical case.\fR .PP There are certain circumstances, notably control-message articles, in which it is necessary to re-read the article after filing it. .I Rnews originally re-opened the article to permit this. Changing the invocation of .I fopen to use the \fBw+\fR mode made it possible to just seek back to the beginning instead, which is .I much faster. This, plus some similar elimination of other redundant calls to .I open , reduced system time by over 30%. \fIGet as much mileage as possible out of each call to the kernel.\fR .PP Both scanning the in-core .I active and .I sys files and re-writing the .I active file are simpler if the in-core copies are kept exactly as on disk, but this implied frequent scans to locate the ends of lines. It turned out to be worthwhile to pre-scan the .I active file for line boundaries, and remember them. \fIWhen storing files in an unstructured way, a little remembered information about their structure goes a long way in speeding up access.\fR .PP We already had a .I STREQ macro, just a simple invocation of .I strcmp , as a convenience. As a result of some other experience by Spencer, Collyer tried replacing some calls of .I strncmp by a .I STREQN macro, which compared the first character of the two strings in-line before incurring the overhead of calling .I strncmp . This sped things up noticeably, and later got propagated through more and more of the code. String-equality tests usually fail on the very first character. \fITest the water before taking the plunge.\fR .PP While looking at string functions, Collyer noticed that .I strncmp s to determine whether a line was a particular header line had the comparison length computed by applying .I strlen to the prototype header. With a little bit of work, the prototypes were isolated as individual character arrays initialized at compile time. This permitted substituting the compile-time .I sizeof operation for the run-time .I strlen . \fILet the compiler do the work when possible.\fR .PP At this point, profiling was turned off temporarily for speed tests. Profiling does impose some overhead. The speed trials showed that C .I rnews was now running at over 15 times the speed of B .I rnews . .PP After months of adding frills, bunting and B 2.11 .ZF compatibility * , .FS * And supposed B 2.11 compatibility, as those who remember the short-lived cross-posting restrictions will recall. .FE Collyer again returned to performance tuning in August 1986. The 4.2BSD kernel on .B utcs now included the 4.3BSD \fInamei\fR caches, which improve filename-lookup performance considerably. Unfortunately, considerations of crash recovery dictated some loss in performance: it seemed desirable to put batch-file additions out by the line rather than by the block. \fIPerformance is not everything.\fR .PP .I Gprof revealed that newsgroup name matching was an unexpected bottleneck, so that module was extensively tweaked by adding .B register declarations, turning functions into macros, applying .I STREQN and such more widely, and generally tuning the details of string operations. The code that handled .I sys -file lines got similar treatment next. The combination cut 40% off user-mode time. \fIPersistent tuning of key modules can yield large benefits.\fR .PP Newsgroup matching remained moderately costly, and an investigation of where it was being used revealed two separate tests for a particular special form of name. It proved awkward to combine the two, so the testing routine was changed to remember having done that particular test already. \fIIf the same question is asked repeatedly, memorize the answer.\fR .PP By this time, the number of system calls needed to process a single article could be counted on one's fingers, and their individual contributions could be assessed. At one point it was desirable for a .I creat to fail if the file already existed, so this was being checked with a call to .I access first. John Gilmore pointed out that on systems with a 3-argument .I open (4.2BSD, System V), this test can be folded into the .I open . The elimination of the extra name\(->file (\fInamei\fR) mapping cut both system time and real time by another 15%. (Note that this system \fIdoes\fR have \fInamei\fR cacheing!) \fIFile name lookups are expensive; minimize them.\fR .PP The development system (\fButcs\fP, a 750) is now filing 2-3 articles per second on average; .B utfraser (a Sun 3/160 with an Eagle disk) is typically filing 6-7 articles per second. C .I rnews runs over 19 times as fast in real time as B .I rnews , over 25 times as fast in system-mode CPU time, roughly 3.6 times as fast in user-mode CPU time, and over 10 times as fast in combined CPU times. .PP With one exception (see \fIFuture Directions\fR), it now appears that very little can be done to speed up .I rnews without changing the specifications. It seems to be executing nearly the bare minimum of system calls, and the user-level code has been hand-optimised fairly heavily. .NH 1 .ft HB Expire Performance .PP The rewrite of .I expire that started this whole effort was only partly motivated by performance problems. Performance was definitely bad enough to require attention, but the B .I expire of the time also had some serious bugs. Worse, the code was a terrible mess and was almost impossible to understand, never mind fix. Early efforts were directed mainly at producing a version that would \fIwork\fR; rewriting .I expire from scratch simply looked like the easiest route. Decisions made along the way, largely for other reasons, nevertheless produced major speedups. .PP The first of these decisions was a reduction in the scope of the program. B .I expire had several options for doing quite unrelated tasks, such as rebuilding news's history file. The code for these functions was substantial and was somewhat interwoven with the rest. C .I expire adheres closely to a central tenet of the `Unix Philosophy': \fIa program should do one task, and do it well\fR. This may appear unrelated to performance, but better-focussed programs are generally simpler and smaller, reducing their resource consumption and making performance tuning easier (and hence more likely). In addition, a multipurpose program almost always pays some performance penalty for its generality. .PP The second significant decision had the biggest effect on performance, despite being made for totally unrelated reasons. For each news article, the B news history file contained the arrival date and an indication of what newsgroups it was in. This is \fIalmost\fR all the information that \fIexpire\fR needs to decide whether to expire an article or not. The .ZF missing * data is whether the article contains an explicit expiry date, .FS * Recent versions of B news have made some attempt to redress this lack, but haven't gone as far as C expire. The discussion here applies to the B expire that was current at the time C expire was written. .FE and if so, what it is. B .I expire had to discover this for itself, which required opening the article and parsing its headers. A site which retains news for two weeks will have upwards of 5,000 articles on file. A few dozen of them will have explicit expiry dates. \fIBut B expire opened and scanned all 5,000+ articles every time it ran!\fR This was a performance disaster. .PP We actually did not want to parse headers in \fIexpire\fR at all, because the B news header-parsing code was (and is) complex and was known to contain major bugs. The performance implications of this were obvious, although secondary at the time. Header parsing is itself a non-trivial task, and accessing 5,000+ files simply cannot be made cheap. \fIInformation needed centrally should be kept centrally.\fR .PP The C news history file has the same format as that of B news, with one addition: a field recording the explicit expiry date, if any, of each article. If no expiry date is present in the article, the field contains `\-' as a .ZF placemarker \(dg . .FS \(dg It would be possible to simply compute a definitive expiry date for an article when it arrives, and record that. This would eliminate the decision-making overhead in \fIexpire\fR, but would greatly slow the response to changes in expiry policy. Since one reason to change policy is time-critical problems like a shortage of disk space, this loss of flexibility was judged unacceptable. It is better to leave the expiry decision to \fIexpire\fR and concentrate on making \fIexpire\fR do it quickly. .FE In this way, the header parsing is done \fIonce\fR per article, on arrival. In fact, the extra effort involved is essentially nil, since \fIrnews\fR does full header parsing at arrival time anyway. \fIRnews\fR had to be changed to write out the expiry date, and code which knew the format of the history file had to be changed to know about the extra field. Perhaps a dozen lines of code outside \fIexpire\fR were involved. .PP A crude first version of C .I expire , incorporating these decisions in the most minimal way, ran an order of magnitude faster than B .I expire . Precise timing comparisons were not practical at the time, since the original motive for C .I expire was that B .I expire had stopped working completely, crippled by bugs in its header parsing. Later versions of B .I expire did cure this problem, but we were no longer interested in putting up slow, buggy software just to make an accurate comparison. .PP Further work on C .I expire mostly concentrated on cleaning up the hasty first version, and on incorporating desired features such as selective expiry by newsgroup. Selective expiry caused a small loss in performance by requiring \fIexpire\fR to check the newsgroup(s) of each article against an expiry-control list. Here, \fIexpire\fR benefitted from the work done to speed up the newsgroup-matching primitives of \fIrnews\fR, since \fIexpire\fR uses the same routines. \fIIf you re-invent the square wheel, you will not benefit when somebody else rounds off the .ZF corners \fR\(dd\fI .\fR .FS \(dd A corollary of this is: \fIknow thy libraries, and use them\fR. .FE .PP One improvement that was made late in development was in the format of the dates stored in the history file. B .I rnews stored the arrival date in human-readable form, and \fIexpire\fR converted this into numeric form for comparisons of dates. Date conversion is a complex operation, and the widely-distributed \fIgetdate\fR function used by news is not fast. Inspection of the code established that \fIexpire\fR was the only program that ever looked at the dates in the history file. There is some potential use of the information for debugging, but this is infrequent, and a small program that converts decimal numeric dates to human-readable ones addresses the issue. Both C .I rnews and C .I expire now store the dates in decimal numeric form. \fIStore repeatedly-used information in a form that avoids expensive conversions.\fR .PP Actually, C .I expire bows to compatibility by accepting either form on input, but outputs only the decimal form as it regenerates the history file. Thus, in the worst case, \fIexpire\fR does the conversion only once for each history line, rather than once per line per run. \fI``If they hand you a lemon, make lemonade''.\fR .PP If \fIexpire\fR is archiving expired articles, it may need to create directories to hold them. This is an inherently expensive operation, but it is infrequently needed. However, checking to see whether it \fIis\fR in fact needed is also somewhat expensive... and the answer is almost always `no'. The same is true of checking to see whether the original article really still exists: it almost always does. (This cannot be subsumed under generic `archiving failed' error handling because a missing original is just an article that was cancelled, and does not call for a trouble report.) Accordingly, C .I expire just charges ahead and attempts to do the copying. Only if this fails does \fIexpire\fR analyze the situation in detail. \fICarrying a net in front of you in case you trip is usually wasted effort.\fR .PP Archiving expired articles often requires copying across filesystem boundaries, since it's not uncommon to give current news and archived news rather different treatment for space allocation and backups. Copying from one filesystem to another can involve major disk head movement if the two filesystems are on the same spindle. Since head movement is expensive, maximizing performance requires getting as much use as possible out of each .ZF movement * . .FS * As witness the progressive increase in filesystem block size that produced major performance improvements in successive versions of 4BSD. .FE \fIExpire\fR is not a large program, and even on a small machine it can spare the space for a large copying buffer. So it does its archiving copy operations using an 8KB buffer. \fIBuying in bulk is often cheaper.\fR Since 8KB accommodates most news articles in one gulp, there is little point in enlarging it further. \fIThe law of diminishing returns does apply to buying in bulk.\fR .PP Since .I expire is operating on the history file at (potentially) the same time that .I rnews is adding more articles to it, some form of locking is necessary. Given that .I expire has to look over the whole database of news, and typically has to expire a modest fraction of the articles, it is a relatively long-running process compared to .I rnews . Contention for the history-file lock can be minimized by noting that .I rnews never does anything other than append to the file. So .I expire can leave the file unlocked while scanning it; the contents will not change. When (and only when) .I expire reaches end-of-file, it locks the news system, checks for and handles any further entries arriving on the end of the history file meanwhile, and finishes up. \fILocking data that won't change is wasteful.\fR .PP After careful application of these various improvements, C .I expire is fast enough that further speedup is not worth much effort. However, an analysis of where it spends its time does suggest one area that might merit attention in the future. \fIExpire\fR rebuilds the history file to reflect the removal of expired articles. The history file is large. \fIExpire\fR must also rebuild the \fIdbm\fR indexing data base, since it contains offsets into the history file. This data base is comparable in size to the history file itself, and is generated in a less orderly manner that requires more disk accesses. .PP Much of the time needed for these operations could be eliminated if \fIexpire\fR could mark a history line as `expired' without changing its size. This could be done by writing into the history file rather than by rebuilding the whole file, and the indexing database would not need alteration. This would also permit retaining information about an article after the article itself expires, which would simplify rejecting articles that arrive again (due to loops in the network, etc.) after the original has expired. The history file should still be cleaned out, and the indexing database rebuilt, occasionally. C .I expire contains some preliminary `hooks' for this approach, but to date full implementation does not seem justified: C .I expire is already fast enough. \fIKnow when you are finished.\fR .NH 1 .ft HB Batcher Performance .PP The C batcher is descended from a very old version written to add some minor functionality that was not present in the B batcher of the time. It is small and straightforward, and contains only a couple of noteworthy performance hacks. .PP The batcher works from a list of filed articles, to be composed into batches. The list is by absolute pathname. All of these files reside in the same area of the system's directory tree, and referring to them with absolute pathnames every time implies repeatedly traversing the same initial pathname prefix. To avoid this, the batcher initially \fIchdir\fRs to a likely-looking place such as \fI/usr/spool/news\fR. Thereafter, before using an absolute pathname to open an article, it checks whether the beginning of the pathname is identical to the directory where it already resides. If so, it strips this prefix off the name before proceeding. \fIIf you walk the same road repeatedly, consider moving to the other end.\fR .PP The batcher's input is usually in fairly random order, with little tendency for successive files to be in the same directory. If this were not the case, it would be worthwhile for the batcher to actually move around in the directory tree to be closer to the next file. .PP The batcher used to copy data using .I putc(getc()) loops. This has been replaced by .I fread/fwrite which is significantly faster, especially if using the souped-up .I fread/fwrite mentioned earlier. \fIIf you need to move a mountain, use a bulldozer, not a teaspoon.\fR .NH 1 .ft HB Future Directions .PP The one improvement we are still considering for .I rnews is a radical revision of the newsgroup-matching strategy. Newsgroup matching still consumes about 18% of user-mode processor time. The key observation is that the information that determines which newsgroups go to which sites seldom changes. It would probably be worth precompiling a bit array indexed by newsgroup and site, and recompiling it only when the .I active file or the .I sys file changes in a relevant way. This would cut the newsgroup-matching time to essentially zero. .PP .I Rnews would be faster (and simpler) if `Newsgroups:' and `Control:' were required to be the first two headers (if present) of each article. At present .I rnews tries to find them before starting to write the article out, so that it can put the article in the right place from the start, but it has to allow for the possibility that vast volumes of other headers may precede them. .PP Hashing .I active -file lookups in .I rnews would be fun, but profiling suggests that it's not worthwhile unless the number of newsgroups is in the thousands. .PP When \s-2PDP-11\s+2's are truly dead on Usenet, the use of large per-process memories .I may allow further speedups to .I rnews by reading the entire batch into memory at once and writing each article to disk in a few .I writes (it can't easily be reduced to a single .I write because headers must be modified before filing). .PP One optimization we have \fInot\fR considered is re-coding key parts in assembler. C news already runs on five different types of machine. Use of assembler would be a maintenance nightmare, and probably would not yield benefits comparable to those of the more high-level changes. .NH 1 .ft HB Acknowledgements .PP Ian Darwin ran the very earliest alpha versions of .I rnews and gave helpful feedback. Mike Ghesquiere, Dennis Ferguson and others have run later versions and prodded Collyer to fix or implement assorted things. John Gilmore and Laura Creighton read and criticized an early alpha version of .I rnews . .NH 1 .ft HB Appendix: rnews Times .PP Measurements have been taken on a VAX 750 running 4.2BSD under generally light load, using a batch of 297,054 bytes of net.unix-wizards containing 171 articles and ~104 cross-postings. All times are in seconds per article. .LP .TS c1 c1 c1 c1 c1 c1 n1w(5n) n1w(5n) n1w(5n) l1w(3.5i) . time real user sys comments _ 85 Dec 6 00:54 4.68 0.3 1.29 T{ \fBB news\fP rejecting all. (b.1.rej) T} 85 Dec 6 00:54 3.184 0.69 0.67 T{ first timing trial; \fIprofiling on\fP (c.1) T} 85 Dec 6 00:54 0.66 0.175 0.199 T{ rejecting all (c.2.rej) T} 85 Dec 6 03:25 0.58 0.175 0.175 T{ still rejecting all (c.3.rej) T} 85 Dec 6 23:46 9.058 0.631 2.251 T{ \fBB news\fP using private directories, rejected 53 of the 171 articles as "too old" (b.2) T} 85 Dec 7 00:24 T{ 2.0 (est) T} - - T{ on a 10 MHz 68000 with slow memory and slow disk (crude timings) (c.darwin.1) T} 85 Dec 7 00:40 7.576 0.684 2.403 T{ \fBB news\fP without the "too old" reject code and having cleared out history (b.3) T} 85 Dec 7 04:43 1.99 0.49 0.53 T{ accepting the articles, using read and write for bulk copies (c.4) T} 85 Dec 7 06:10 T{ 2.261 (!) T} 0.497 0.449 T{ optimized by less locking & keeping batch files open (c.5) T} 85 Dec 7 07:32 1.383 0.491 0.414 T{ same as the last one, but with a lower load average (around 1.5) (c.6) T} 85 Dec 16 03:43 1.380 0.447 0.374 T{ for calibration after misc. cleanup (c.7, c.8) T} 86 Jan 13 00:23 1.232 0.349 0.301 T{ turned hostchar() into a macro (c.9) T} 86 Jan 13 04:26 1.36 0.333 T{ 0.242 (!) T} T{ using in-core active file, under heavy load (c.10) T} 86 Jan 13 08:24 1.94 0.349 0.253 T{ using in-core sys file too, under heavy load. Re-run this trial! (c.11) T} 86 Jan 13 08:42 T{ 0.892 (!) T} 0.332 0.245 T{ re-run at better nice. Not striking, except for real time. Was run in a large directory; ignore. (c.12) T} 86 Jan 13 08:59 T{ 0.861 (!) T} 0.333 T{ 0.212 (!) T} T{ re-run at good nice & in a small directory. Have beaten B news by \fIone order of magnitude\fP on real & sys times! Beat it by more than twice on user time. (c.13) T} 86 Jan 21 19:15 1.208 0.349 0.245 T{ creat 1st link under final name, only link to make cross-postings; with HDRMEMSIZ too small (c.14) T} 86 Jan 21 19:57 0.728 0.318 0.193 T{ previous mod, with HDRMEMSIZ of 4096 (c.15) T} 86 Jan 22 01:20 0.719 0.315 0.166 T{ fewer opens (just rewind the spool file), but Xref(s): not working (c.16) T} 86 Jan 22 01:53 T{ 0.637 (!) T} 0.314 T{ 0.154 (!) T} T{ fewer opens fixed to spell Xref: right; Xref: not working (c.17) T} 86 Jan 22 04:00 0.874 0.325 0.174 T{ fewer opens with Xref: fixed (times may be high due to calendar) (c.18) T} 86 Jan 22 05:45 0.694 0.309 0.159 T{ under lighter load, times are better (c.19) T} 86 Jan 24 04:29 0.715 0.317 T{ 0.129 (!) T} T{ turn creat & open into just creat, under slightly heavy load (c.20) T} 86 Jan 24 06:06 T{ 0.628 (!) T} T{ 0.288 (!) T} T{ 0.129 (!) T} T{ reduce number of calls on index (by noting line starts at the start) and strncmp (via macro) in active.mem.c, but still profiling and writing stdout and stderr to the tty (c.21) T} 86 Jan 24 07:22 0.653 T{ 0.209 (!) T} T{ 0.123 (!) T} T{ fewer strlen calls (by using \fBsizeof\fP s - 1), writing stdout to /dev/null and with \fIprofiling off\fP, but under moderate load; try again (c.23) T} 86 Jan 24 07:35 T{ 0.574 (!) T} T{ 0.216 (!) T} T{ 0.123 (!) T} T{ as last time, but stdout to tty(!) and under light load. \fIrunning 15.67 times as fast as B rnews\fP (c.24) T} 86 Aug 8 04:23 0.839 0.51 0.124 T{ performance hit: fflush after each history line for crash-resilience; run for \fIgprof\fP output and calibration with later runs. running under 4.2.1BSD (has 4.3 namei cache) now. real and user times are way up; due to gprof profiling? (c.25) T} 86 Aug 8 04:24 0.962 T{ 0.438 (!) T} 0.131 T{ run with faster ngmatch, with \fBregister\fP decl.s and wordmatch and STREQN macros; saved 15% user. User time is better than c.25, but still up from c.24. (c.26) T} 86 Aug 10 07:35 0.805 T{ 0.345 (!) T} 0.135 T{ further speedups: ngmatch has more \fBregister\fP decl.s and in-line index; more use of STREQ(N) macro for str(n)cmp in hdrmatch, ngmatch.c and transmit.c; faster ishdr without index. real & user times are better than both c.26 and c.25 (c.27) T} 86 Aug 11 04:19 1.012 T{ 0.303 (!) T} 0.146 T{ rewrote sys.c, used INDEX and STREQ(N) macros throughout rnews. real and sys times are up, but user continues to decline. (c.28) T} 86 Aug 12 03:51 1.315 0.315 0.154 T{ minor tweaks: all.all.ctl caching, etc. (c.29) T} 86 Aug 30 17:56 0.564 T{ 0.189 (!) T} T{ 0.112 (!) T} T{ light load, thought we had 3-arg open in fileart, but didn't. Odd. \fIStopped using gprof\fP. (c.30) T} 86 Aug 30 17:57 T{ 0.475 (!) T} 0.191 T{ 0.095 (!) T} T{ Really and truly use the 3-arg open. \fI19 times B rnews speed.\fP (c.31) T} .TE