- Newsgroups: comp.parallel
- Path: sparky!uunet!gumby!destroyer!gatech!hubcap!fpst
- From: cap@ifi.unizh.ch (Clemens Cap)
- Subject: Re: Linda / The Parform
- Message-ID: <1992Dec23.220706.8281@ifi.unizh.ch>
- Followup-To: poster
- Sender: cap@ifi.unizh.ch
- Organization: University of Zurich, Department of Computer Science
- References: <1992Dec21.132725.23905@hubcap.clemson.edu>
- Date: Wed, 23 Dec 92 22:07:06 GMT
- Approved: parallel@hubcap.clemson.edu
- Lines: 153
-
-
- Some days ago, Volker Strumpen and I posted a message to inform all interested
- scientists who had followed that thread of the final results of our measurements
- with parallel programming platforms. There was a reply by Steven Ericsson Zenith
- who obviously did not know that we had spent several months getting all
- details of those measurements right, and that some 80 emails were exchanged with
- a number of computer science departments to obtain fair and correct data
- under reproducible conditions.
-
- Of course we did a larger number of measurements and compiled extensive
- statistics, took care of compiler versions and options, carefully chose our
- benchmarking problem, and discussed those issues with people who have been in
- the field longer and do better research than we do. Of course we gave a number
- of talks on the subject and spent days discussing the issue, of course we know
- why we obtained superlinear speedup, of course...
-
- Well, I guess we should have mentioned this more explicitly in our posting.
- Nevertheless, we mentioned our old report, where some of the open questions are
- addressed, and we also mentioned a forthcoming publication, which will hopefully
- answer the questions that remained unanswered in the old version. Following
- standard etiquette, we should be able to provide a precise reference for the new
- version soon, but we will not compromise journal copyright by distributing FTP
- versions. We are aware of some aspects that are not addressed in the best way in
- the old report; these problems should be solved in the publication.
-
- Usenet groups are an interesting medium for discussion. Thank you to all
- colleagues who responded to our original posting some months ago and shared
- their experiences and their knowledge with us via email. This helped our
- research significantly. Occasionally, postings by
- Mr. This_is_all_wrong_since_I_know_it_better join the thread, providing some
- late-night amusement in front of the workstation. In that sense, these postings
- help one's research too.
-
- In article <1992Dec21.132725.23905@hubcap.clemson.edu> Steven Ericsson Zenith <zenith@kai.com> writes:
-
- >I find these numbers difficult to believe.
-
- Initially we found those numbers hard to believe, too. It's just what we kept
- measuring.
-
- >Firstly, bells start ringing for the single processor case. Why?
- >Because they are all the same and to my knowledge these systems don't
- >all use the same compiler. I expect to see some variation.
- >No indication is given of what the 1 processor time means - is this the
- >sequential execution time under the respective system compiler? It
- >should be. The above numbers can only begin to make sense if the base
- >compiler *is* the same in all cases - otherwise we do not know what we
- >are comparing.
-
- As written in our report, all platforms use the C language and the C compiler
- with identical compiler options, apart, of course, from system-specific
- enhancements. For the single-processor case it is easy to design a conditional
- statement that skips the communication part of the program. It is therefore
- reasonable to expect identical execution times.
-
- >I would like to see the superlinear speed up explained. It's difficult
- >to assess without a detailed description of the hardware, operating
- >system infrastructure, etc..
-
- This is correct, and therefore the forthcoming publication contains a long
- list of the equipment used, including the versions of all platforms.
- There are several plausible reasons for superlinear speedup (cache effects,
- swapping and paging). I will not address this in detail here; it can be found
- in the report and the publication. Furthermore, it has been measured and
- confirmed by a number of additional tests not explicitly mentioned in the
- report and the publication.
-
- >Since this problem is obviously more than embarrassingly parallel I'd
- >like to know how much, if any, of the interaction mechanism was used
- >during computation. If the answer is, as I suspect, that after the data
- >and work distribution, insignificant interaction took place then the
- >above tells us something about the parallel decomposition of the problem
- >but sweet Fanny Adams about any of the systems tested.
-
- It might be interesting to know that the insignificant interaction, as
- described in our old and new report, was as high as 6 Mbit per second through
- a standard 10 Mbit/s Ethernet. Usually 3 Mbit per second is already considered
- a highly loaded Ethernet. Please inform Fanny Adams about that.
-
- Since we wanted to study dynamic load balancing, we needed a problem with
- tight subtask synchronisation. We furthermore wanted to evaluate the network
- bandwidth. A precise evaluation of overheads is only possible when you know
- the speedup the application would have without any communication and
- administration. For this and other reasons we chose an explicit heat equation
- solver, a choice upon which the director of a major national computing center
- that also hosts Cray equipment commented: "Just the right problem for those
- goals." Of course we know that this approach to the heat equation has some
- numerical and physical problems, but it provides loads of the type we wanted
- to study.
-
-
- >What was measured here? Does the clock start before or after
- >distribution of the data set and processes?
-
- We measured wall clock time on a dedicated system, timing the computational
- phase of the problem. Our old report describes the general setup of the
- experiment; the publication even reports the precise type of every single
- machine employed. Data distribution consisted of distributing the initial
- conditions of a heat equation. This is no problem, since initial conditions
- are often given as coded functions or are read from files. In our
- measurements, the kind of data distribution you mention is of negligible
- importance.
-
- >I know people love to see numbers like this, it makes them feel cosy and
- >all warm inside but most such statistics - in this particular area and
- >that I've seen - belong in the marketing department.
-
- This is the usual problem when you buy a computer and ask the salesman for
- the MIPS (= meaningless instrument for pushing sales). It is not the usual
- situation when doing research. And this is comp.parallel, not comp.marketing.
-
- >Here's a test that might tell you something about each of the systems:
- >using the same base compiler, hardware and operating system plus a
- >library implementing each model tell me (scaling over the same number of
- >processors shown above) the execution times taken to perform a 1024x1024
- >matrix transpose where each element of the matrix is a double float
- >assigned the value of its initial index. ;-) ( <- wicked grin). A
- >second test is to do the above and print the before and after result ;-)
- >(( <- a double wicked grin).
-
- Yes, of course, we faked the measurements, we used different compilers and
- different compiler switches just for the fun of it all.
- (:^% (Not so very funny grin)
-
- >And even if you did this, unless you can show that the implementations
- >are truely comparable, (i.e., they use the same implementation
- >techniques and infrastructure) it will not tell you anything about the
- >model's performance. That leaves you with a serious logistical problem -
- >one I had to confront and failed to fund in the Ease work - you have to
- >personally undertake the implementation of each model in a uniform (dare
- >I say "scientific") way - any other comparison is meaningless.
-
- I did not fail to fund this work. We receive enough funding from our
- industrial sponsor to raise the manpower for precisely this work. We get
- enough support from our department and from the technical staff to be able to
- use the infrastructure fully dedicated: no one is working on the 40 machines
- when we do our measurements, no screenlock is running, everybody is logged
- out, and no dumps are going on. We even used an oscilloscope and special
- Ethernet monitoring equipment during parts of our work, in order to
- understand various effects.
- It is correct that this IS a logistical problem. And this is the reason why
- Volker invested several months in solving it. I learned from earlier
- postings, in which I was in error myself, that one has to read an article and
- do one's homework before hitting the Followup button of the newsreader.
-
- I hope these comments clarify some of the open questions, and I apologize for
- the lack of detail in our previous posting. We just did not want to post both
- of our papers in their entirety.
-
- Happy Xmas.
-
-
- --
- * Dr. Clemens H. CAP cap@ifi.unizh.ch (email)
- * Ass. Professor for Formal Methods in CS +(1) 257-4326 (office)
- * Dept. of Computer Science +(1) 322 02 19 (home)
- * University of Zurich +(1) 363 00 35 (fax)
-
-