From: terry@ithaca.Eng.Sandy.Novell.COM (Terry Lambert)
Newsgroups: comp.unix.programmer,comp.unix.internals
Subject: Re: SVR4 semaphore overhead???
Message-ID: <1992Aug14.222159.12495@gateway.novell.com>
Date: 14 Aug 92 22:21:59 GMT
References: <1992Aug8.131020.17966@ttsi.lonestar.org> <1992Aug13.020203.7490@ttsi.lonestar.org>
Organization: Novell NPD -- Sandy, UT

In article <1992Aug13.020203.7490@ttsi.lonestar.org> mse@ttsi.lonestar.org (Mark Evans) writes:
>In article <1992Aug8.131020.17966@ttsi.lonestar.org> I wrote:
>>I have seen postings from time to time regarding the overhead of
>>semaphores in SVR4, but not many followups. I'm now in the boat
>>of considering the usage of semaphores for mutual exclusion and
>>resource coordination between a substantial number (~30) of
>>processes. There is a potential for a large number of semaphores
>>(60-90) to get allocated.
>>
>>Someone recently posted an observation that as the number of
>>semaphores in a system grows, so does the overhead. Can anyone
>>confirm this?
>
>On a Tandem Integrity R3000 machine running SVR4, I found no
>degradation in performance on a given semaphore as the number of
>semaphores in a set increased from 1 to 25 (the current sysgen'd max).
>I am chagrined, however, at the 70 microseconds or so taken to
>do a simple semop() lock, as compared to the .82 microseconds it
>takes to call _test_and_set(), a MIPS-supplied routine in
>libc.a which performs an indivisible test-and-set (actually, it's a
>read-modify-write). As suggested in previous articles and by
>respondents to my original post, I am cooking up a
>lightweight semaphore package using _test_and_set() for the
>lock, and system semaphores (or message queues, or pause/signal)
>for sleep/wakeup when contention arises.
>
>A couple of questions have come up that require expert knowledge:
>
>(1) Is there a way for a process to reschedule itself, i.e., go to
>the end of the current ready-to-run list? I don't want to resort
>to a sleep_ms() call if I can avoid it.

Under SVR4, there is no way to do this (I will assume that you
are talking about SVR4 due to the Subject: line), because the scheduler
code has been intentionally separated. You would have to be able to
determine the order in which something is removed from the ready-to-run
queue and placed on the run queue, and you can't do that contextually.

The Tandem _test_and_set() uses a hardware mechanism (precisely
like the one available on VAXen), and as such is not relative to your
lbolt value, but is instead relative to your system clock. Semaphores
and other consumers of CPU quanta are in terms of lbolt.

>(2) When multiple processes contend for a system semaphore, are
>they awarded it in FIFO order as the semaphore becomes available?
>I have gotten the impression from previous testing that this is not so.
>Can anyone explain the data structures and algorithm used behind the
>scenes?

No, they are not. A wakeup is issued on the address, which moves all
the processes sleeping on it to the ready-to-run queue. The scheduler
then picks one to run at the next timer interrupt. This may not be
the process that has been waiting longest (although the longer a process
waits, the higher its priority in the queue), and so, yes, you
can get starvation. So what you believe is correct: semaphore requests
are not queued.

Now, when the processes awaken, each attempts to grab the semaphore; the
first one to run returns to user land; the rest go back to sleep. This is
called "the thundering herd" problem, and has been at least partially
resolved in Sequent's OS, but a fat lot of good that'll do you.

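For concreteness, here is a rough sketch in C of the sort of hybrid
lock Mark describes above: _test_and_set() on the fast path, with a
System V semaphore (initialized to 0) used only as a sleep/wakeup
channel under contention. The struct and names are mine, the
_test_and_set() signature is from memory (check your man page), and
the waiter accounting shown has exactly the races he is wrestling
with -- an illustration, not a solution:

/*
 * Hybrid user-level lock: spin once on the atomic op, then fall
 * back to a system semaphore purely as a sleep channel.
 */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

extern int _test_and_set(int *addr, int value); /* MIPS libc; see manual */

struct lwlock {
        int     lock;           /* 0 = free, 1 = held                   */
        int     waiters;        /* advisory count of sleepers (racy!)   */
        int     semid;          /* SysV semaphore used only for sleep   */
};

void
lw_acquire(struct lwlock *lp)
{
        struct sembuf op;

        for (;;) {
                if (_test_and_set(&lp->lock, 1) == 0)
                        return;                 /* fast path: no syscall */

                /* Contention: sleep on the system semaphore. */
                lp->waiters++;                  /* racy; see text        */
                op.sem_num = 0;
                op.sem_op = -1;                 /* block until a post    */
                op.sem_flg = 0;
                (void) semop(lp->semid, &op, 1);
                lp->waiters--;                  /* racy; see text        */
        }
}

void
lw_release(struct lwlock *lp)
{
        struct sembuf op;

        lp->lock = 0;                           /* drop the lock         */
        if (lp->waiters > 0) {
                op.sem_num = 0;
                op.sem_op = 1;                  /* let one sleeper retry */
                op.sem_flg = 0;
                (void) semop(lp->semid, &op, 1);
        }
}
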
>Writing this simple lock/unlock package at the user level without
>resorting to priority controls is surprisingly mind-boggling if you
>consider all the races. I'm on my fourth "I've got it" approach.
>When I really do "have it," I'll happily share the results with the
>many who have expressed interest in the subject. If you think
>you have a solution, I know many out here would love to see it.

One solution is the creation of a system call. You can do this on SVR4
even though their equivalent of init_sysent.c is pre-linked into
os.o and is thus not user-replaceable. The way you do this is to go
through the system call table looking for the nosys() function pointers,
and replace one such entry. The reason you use a driver is to give
yourself a known query point (via ioctl()) for finding out which entry
to use syscall() with, and to provide system startup initialization of
the table entry and any data structures.

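The driver-init half of that, sketched in C. The sysent[] and nosys()
names are the usual SVR4 ones, but the struct layout, header location,
and the nsysent count vary by vendor, so treat this as a sketch of the
technique rather than code for your kernel:

/*
 * Find an unused system call slot (one still pointing at the
 * nosys() stub) and steal it for our own handler.
 */
#include <sys/types.h>
#include <sys/systm.h>          /* struct sysent; location varies */

extern struct sysent sysent[];  /* the system call table          */
extern int nsysent;             /* number of entries              */
extern int nosys();             /* stub in unused slots           */

extern int lws_syscall();       /* our handler, defined elsewhere */

static int lws_callno = -1;     /* slot we stole; reported by ioctl */

void
lwsinit(void)                   /* called at driver initialization */
{
        int i;

        for (i = 0; i < nsysent; i++) {
                if (sysent[i].sy_call == (int (*)())nosys) {
                        sysent[i].sy_call = lws_syscall;
                        lws_callno = i;
                        return;
                }
        }
        /* No free slot: lws_callno stays -1 and the ioctl reports it. */
}
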
I have done this in two different instances to provide a FIFOed semaphore
mechanism with low overhead.

Your single system call is then divided, via its first parameter, into
multiple library entry points. "lws_init()" initializes the semaphores:
it queries the driver for the system call number to use, then uses that
entry point to make a syscall() on your behalf with the number of
semaphores you will need. Another ioctl is made to tell the driver what
needs to be cleaned up on close by this process, and then the fd is left
open to allow you to resource-track the close without specific
deinitialization, which might otherwise be defeated by a core dump or
abnormal exit.

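The user-side half might look like this; /dev/lws, the ioctl numbers,
and the subfunction code are invented names, and only open(), ioctl(),
and syscall() themselves are real:

/*
 * lws_init(): open the driver, ask it which sysent slot it took,
 * then set up our semaphore table through the new system call.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

extern int ioctl();             /* K&R-style externs for the era  */
extern int syscall();

#define LWS_GETCALL     (('L' << 8) | 1)  /* invented: which slot?   */
#define LWS_TRACK       (('L' << 8) | 2)  /* invented: track this pid */
#define LWS_INIT        0                 /* syscall subfunction code */

static int lws_fd = -1;         /* left open for resource tracking */
static int lws_callno = -1;     /* sysent slot the driver grabbed  */

int
lws_init(int nsems)
{
        if ((lws_fd = open("/dev/lws", O_RDWR)) < 0)
                return (-1);

        /* Ask the driver which system call entry it installed. */
        if (ioctl(lws_fd, LWS_GETCALL, &lws_callno) < 0)
                return (-1);

        /* Allocate our semaphore table via the new system call. */
        if (syscall(lws_callno, LWS_INIT, nsems) < 0)
                return (-1);

        /*
         * Tell the driver to clean up after us on last close; the fd
         * stays open so a core dump or abnormal exit still triggers it.
         */
        return (ioctl(lws_fd, LWS_TRACK, getpid()));
}
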
A call to "lws_create()" returns the index of an entry in your table of
semaphores allocated during the init, or -1 on failure.

A call to "lws_release()" returns the index to the unallocated pool.

A call to "lws_acquire()" acquires the given index. This will sleep on
the address of the semaphore requested (i.e., &my_sems[idx]) in the
kernel if the semaphore is currently held. The pid of the holder is
stored in the semaphore struct as well, so that resource-track cleanup
of the process can wake up sleepers and the semaphore in question is
not left deadlocked.

A call to "lws_free()" frees the given index, and does a wakeup on the
associated element's address so that the next requestor can proceed.

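In C, the kernel half of acquire/free might look something like this
(the struct and the lws_k_* names are mine; sleep()/wakeup() on the
semaphore's address are the traditional primitives the description
above assumes):

/*
 * Kernel side of acquire/free.  Sleeping at a priority below PZERO
 * keeps the sleep unsignalable, which keeps the example simple.
 */
#include <sys/types.h>
#include <sys/param.h>          /* PZERO */

extern int sleep();             /* kernel sleep/wakeup primitives */
extern void wakeup();

struct lws_sem {
        int     ls_busy;        /* 0 = free, 1 = held             */
        pid_t   ls_owner;       /* for resource-track cleanup     */
};

extern struct lws_sem *my_sems; /* table allocated at lws_init()  */

int
lws_k_acquire(int idx, pid_t pid)
{
        struct lws_sem *sp = &my_sems[idx];

        while (sp->ls_busy)
                (void) sleep((caddr_t)sp, PZERO - 1);  /* herd wakes here */
        sp->ls_busy = 1;
        sp->ls_owner = pid;     /* cleanup can now find the holder */
        return (0);
}

int
lws_k_free(int idx)
{
        struct lws_sem *sp = &my_sems[idx];

        sp->ls_busy = 0;
        sp->ls_owner = 0;
        wakeup((caddr_t)sp);    /* wakes *all* sleepers: the herd  */
        return (0);
}
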
The calling order is:

    lws_init()

    [Any number of:]
        lws_create()

        [Any number of:]
            lws_acquire()
            lws_free()

        lws_release()

    [Resource-track cleanup by close of fd on process exit]

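The same lifecycle as a fragment of C (error handling omitted; the
lws_* entry points are the library calls described above):

extern int lws_init(int nsems);
extern int lws_create(void);
extern int lws_release(int idx);
extern int lws_acquire(int idx);
extern int lws_free(int idx);

void
example(void)
{
        int idx;

        (void) lws_init(60);            /* once, at startup            */

        idx = lws_create();             /* grab a slot from the pool   */

        (void) lws_acquire(idx);
        /* ... critical section ... */
        (void) lws_free(idx);

        (void) lws_release(idx);        /* slot back to the pool       */

        /* On exit, close of the fd resource-tracks anything left over. */
}
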
FIFO order can be enforced in one of two ways. The first is to ignore
"the thundering herd" problem and keep an outstanding count of requests
per semaphore: each attempted acquirer copies this value on its initial
attempt and decrements its copy each time it wakes up. If its copy is
non-zero, it goes back to sleep; if it is zero, it owns the semaphore
and the call returns to the requestor. This relies on kernel
serialization of system calls, and so would require a mutex on AIX or
Solaris 2.0. This has the effect of guaranteeing FIFO order.

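Here is one reading of that counting scheme, sketched kernel side; the
names are mine, and it leans entirely on the kernel serializing these
paths, per the caveat above:

/*
 * The outstanding count doubles as a "ticket" saying how many
 * frees must go by before it is your turn.
 */
#include <sys/types.h>
#include <sys/param.h>

extern int sleep();
extern void wakeup();

struct lws_fsem {
        int     ls_outstanding; /* holder + waiters ahead of you  */
};

int
lws_k_acquire_fifo(struct lws_fsem *sp)
{
        int ahead = sp->ls_outstanding++;  /* copy on initial attempt */

        /*
         * Every lws_k_free_fifo() wakes everyone; each sleeper counts
         * one wakeup off its private copy, and the one that reaches
         * zero owns the semaphore.  FIFO falls out of the ticket
         * order, not out of who the scheduler happens to run first.
         */
        while (ahead-- > 0)
                (void) sleep((caddr_t)sp, PZERO - 1);
        return (0);
}

int
lws_k_free_fifo(struct lws_fsem *sp)
{
        sp->ls_outstanding--;
        wakeup((caddr_t)sp);    /* thundering herd, deliberately ignored */
        return (0);
}
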
The problem of waking up only the one who is next in line for the
semaphore can be resolved by allocating more kernel memory for "queued
request elements" and sleeping on the address of your own queue element
rather than on the semaphore list entry (which becomes a queue head
pointer). Release removes the releaser's element from the queue and
wakes up the next entry. This has the problem of requiring kernel
memory allocation/deallocation on request/release (unless such memory
is statically allocated), and it complicates both the release and the
resource-track cleanup by requiring an additional pointer and a linked
list over all queue entries for the necessary reverse lookup.

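Sketched the same way (structs and names mine, elements statically
allocated per the parenthetical above; an MP kernel would also need a
lock around the list manipulation):

/*
 * Queued-element variant: each requestor sleeps on the address of
 * its own queue element, so a free wakes exactly one process.
 */
#include <sys/types.h>
#include <sys/param.h>

extern int sleep();
extern void wakeup();

struct lws_qel {
        struct lws_qel  *q_next;
        pid_t           q_pid;          /* reverse lookup for cleanup */
};

struct lws_qsem {
        struct lws_qel  *q_head;        /* current holder             */
        struct lws_qel  *q_tail;
};

int
lws_k_qacquire(struct lws_qsem *sp, struct lws_qel *el, pid_t pid)
{
        el->q_next = NULL;
        el->q_pid = pid;

        if (sp->q_tail == NULL)         /* uncontended: we hold it now */
                sp->q_head = el;
        else                            /* join the end of the line    */
                sp->q_tail->q_next = el;
        sp->q_tail = el;

        while (sp->q_head != el)        /* sleep on OUR element only   */
                (void) sleep((caddr_t)el, PZERO - 1);
        return (0);
}

int
lws_k_qfree(struct lws_qsem *sp)
{
        struct lws_qel *el = sp->q_head;        /* the releaser        */

        if ((sp->q_head = el->q_next) == NULL)
                sp->q_tail = NULL;
        else
                wakeup((caddr_t)sp->q_head);    /* wake only the next  */
        return (0);
}
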
Terry Lambert
terry_lambert@gateway.novell.com
terry@icarus.weber.edu

---
Disclaimer: Any opinions in this posting are my own and not those of
my present or previous employers.