- Newsgroups: comp.parallel
- Path: sparky!uunet!gumby!destroyer!gatech!hubcap!fpst
- From: cap@ifi.unizh.ch (Clemens Cap)
- Subject: Re: Linda / The Parform
- Message-ID: <1992Dec23.220706.8281@ifi.unizh.ch>
- Followup-To: poster
- Sender: cap@ifi.unizh.ch
- Organization: University of Zurich, Department of Computer Science
- References: <1992Dec21.132725.23905@hubcap.clemson.edu>
- Date: Wed, 23 Dec 92 22:07:06 GMT
- Approved: parallel@hubcap.clemson.edu
- Lines: 153
-
-
- Some days ago, Volker Strumpen and I posted a message to inform all interested
- scientists who had followed that thread of the final results of our measurements
- with parallel programming platforms. There was a reply by Steven Ericsson Zenith
- who obviously did not know that we had spent several months getting all
- details of those measurements right, and that some 80 emails were exchanged with
- a number of computer science departments to obtain fair and correct data
- under reproducible conditions.
-
- Of course we did a larger number of measurements and compiled extensive
- statistics, took care of compiler versions and options, carefully chose our
- benchmarking problem, and discussed those issues with people who have been in
- the field longer and do better research than we do. Of course we gave a number
- of talks on the subject and spent days discussing the issue, of course we know
- why we obtained superlinear speedup, of course...
-
- Well, I guess we should have mentioned this more explicitly in our posting.
- Nevertheless, we mentioned our old report, where some of the open questions are
- addressed, and we also mentioned a forthcoming publication, which will hopefully
- answer the questions that remained unanswered in the old version. Following
- standard etiquette, we should be able to provide a precise reference for the new
- version soon, but we will not compromise journal copyright by distributing FTP
- versions. We are aware of some aspects that are not addressed in the best way in
- the old report; these problems should be solved in the publication.
-
- Usenet groups are an interesting medium for discussion. Thank you to all
- colleagues who responded to our original posting some months ago and shared
- their experiences and their knowledge with us via email. This helped our
- research significantly. Occasionally, postings by
- Mr. This_is_all_wrong_since_I_know_it_better join the thread, providing some
- late-night amusement in front of the workstation. In that sense, these postings
- help one's research too.
-
- In article <1992Dec21.132725.23905@hubcap.clemson.edu> Steven Ericsson Zenith <zenith@kai.com> writes:
-
- >I find these numbers difficult to believe.
-
- Initially we found those numbers hard to believe, too. It's just what we kept
- measuring.
-
- >Firstly, bells start ringing for the single processor case. Why?
- >Because they are all the same and to my knowledge these systems don't
- >all use the same compiler. I expect to see some variation.
- >No indication is given of what the 1 processor time means - is this the
- >sequential execution time under the respective system compiler? It
- >should be. The above numbers can only begin to make sense if the base
- >compiler *is* the same in all cases - otherwise we do not know what we
- >are comparing.
-
- As written in our report, all platforms use the C language and the C compiler
- with identical compiler options, apart, of course, from system-specific
- enhancements. For the single-processor case it is easy to design a conditional
- statement that skips the communication part of the program. It is therefore
- reasonable to expect identical execution times.
-
- >I would like to see the superlinear speed up explained. It's difficult
- >to assess without a detailed description of the hardware, operating
- >system infrastructure, etc..
-
- This is correct, and therefore the forthcoming publication contains a long
- list of the equipment used, including the versions of all platforms.
- There are several plausible reasons for superlinear speedup (cache effects,
- swapping and paging). I will not address this in detail here; it can be found
- in the report and the publication. Furthermore, it has been measured and
- confirmed by a number of additional tests not explicitly mentioned in the
- report and the publication.
-
- >Since this problem is obviously more than embarrassingly parallel I'd
- >like to know how much, if any, of the interaction mechanism was used
- >during computation. If the answer is, as I suspect, that after the data
- >and work distribution, insignificant interaction took place then the
- >above tells us something about the parallel decomposition of the problem
- >but sweet Fanny Adams about any of the systems tested.
-
- It might be interesting to know that the insignificant interaction, as
- described in our old and new report, was as high as 6 Mbit per second through
- a standard 10 Mbit/s Ethernet. Usually 3 Mbit per second is already considered
- a highly loaded Ethernet. Please inform Fanny Adams about that.
-
- Since we wanted to study dynamic load balancing, we needed a problem with
- tight subtask synchronisation. We furthermore wanted to evaluate the network
- bandwidth. A precise evaluation of overheads is only possible when you know
- the speedup the application would have without any communication and
- administration. For this and other reasons we chose an explicit heat equation
- solver, a choice upon which the director of a major national computing center
- that also hosts Cray equipment commented: "Just the right problem for those
- goals." Of course we know that this approach to the heat equation has some
- numerical and physical problems, but it provides loads of the type we wanted
- to study.
-
-
- >What was measured here? Does the clock start before or after
- >distribution of the data set and processes?
-
- We measured wall clock time on a dedicated system, timing the computational
- phase of the problem. Our old report describes the general setup of the
- experiment; the publication even reports the precise type of every single
- machine employed. Data distribution consisted of distributing the initial
- conditions of a heat equation. This is no problem, since initial conditions
- are often given as coded functions or are read from files. In our
- measurements, the kind of data distribution you mention is of negligible
- importance.
-
- >I know people love to see numbers like this, it makes them feel cosy and
- >all warm inside but most such statistics - in this particular area and
- >that I've seen - belong in the marketing department.
-
- This is the usual problem when you buy a computer and ask the salesman for
- the MIPS (= meaningless instrument for pushing sales). It is not the usual
- situation when doing research. And this is comp.parallel, not comp.marketing.
-
- >Here's a test that might tell you something about each of the systems:
- >using the same base compiler, hardware and operating system plus a
- >library implementing each model tell me (scaling over the same number of
- >processors shown above) the execution times taken to perform a 1024x1024
- >matrix transpose where each element of the matrix is a double float
- >assigned the value of its initial index. ;-) ( <- wicked grin). A
- >second test is to do the above and print the before and after result ;-)
- >(( <- a double wicked grin).
-
- Yes, of course, we faked the measurements, we used different compilers and
- different compiler switches just for the fun of it all.
- (:^% (Not so very funny grin)
-
- >And even if you did this, unless you can show that the implementations
- >are truely comparable, (i.e., they use the same implementation
- >techniques and infrastructure) it will not tell you anything about the
- >model's performance. That leaves you with a serious logistical problem -
- >one I had to confront and failed to fund in the Ease work - you have to
- >personally undertake the implementation of each model in a uniform (dare
- >I say "scientific") way - any other comparison is meaningless.
-
- I did not fail to fund this work. We receive enough funding from our
- industrial sponsor to raise the manpower for precisely this work. We get
- enough support from our department and from the technical staff to be able to
- use the infrastructure fully dedicated: no one is working on the 40 machines
- when we do our measurements, no screenlock is running, everybody is logged
- out, and no dumps are going on. We even used an oscilloscope and special
- Ethernet monitoring equipment during parts of our work, in order to
- understand various effects.
- It is correct that this IS a logistical problem. And this is the reason why
- Volker invested several months in solving it. I learned from earlier
- postings, in which I was in error myself, that one has to read an article and
- do one's homework before hitting the Followup button of the newsreader.
-
- I hope these comments clarify some of the open questions, and I apologize for
- the lack of detail in our previous posting. We just did not want to post both
- of our papers in their entirety.
-
- Happy Xmas.
-
-
- --
- * Dr. Clemens H. CAP cap@ifi.unizh.ch (email)
- * Ass. Professor for Formal Methods in CS +(1) 257-4326 (office)
- * Dept. of Computer Science +(1) 322 02 19 (home)
- * University of Zurich +(1) 363 00 35 (fax)
-
-