- Path: sparky!uunet!haven.umd.edu!mimsy!afterlife!swaliff
- From: swaliff@afterlife.ncsc.mil (Steve Aliff)
- Newsgroups: comp.os.linux
- Subject: Maxtor 7213/HD Timeouts/Disk read doesn't match write??
- Summary: Problems with disk I/O under Linux. Bad hardware? Bugged Software?
- Keywords: Linux I/O, Maxtor 7213, Kouwei KW-556A controller, HD timeout.
- Message-ID: <1992Sep15.175932.13580@afterlife.ncsc.mil>
- Date: 15 Sep 92 17:59:32 GMT
- Organization: The Great Beyond
- Lines: 363
-
- Recently, I have been experimenting with Linux on my home PC. I have
- tried both the MCC and SLS releases with a variety of different disk
- partitioning schemes. I noticed a number of HD timeouts/controller
- reset messages with both the MCC and SLS releases. Most recently, I
- re-partitioned my disk and loaded the SLS release. With this setup, I
- didn't notice any HD timeouts, but I was using only a small part of my
- disk. Namely, I had /dev/hdb1 (32M) as /, swap on /dev/hdb2 (16M), and
- the rest of the disk as one large unused partition (where I had been
- experimenting with extfs under the MCC release).
-
- Since I wasn't having any problems with this minimal setup, I decided
- to see if my problems were caused by something related to the portions
- of my disk not presently in use. So I created a 64M Minix filesystem
- on /dev/hdb3 and mounted it on /mnt. I then ran a trivial little disk
- exerciser program I just wrote (attached below), and got some puzzling
- results. Sometimes the program would read 0xd0 instead of 0.
- Sometimes it would read 0xd5 instead of 0x55. The differences were
- never at exactly the same place. The random reads always produce HD
- timeouts. I have attached a screen dump of one of these errors. Only
- once did I think to do an "od" of the file. The one time I did so, the
- file had the correct values throughout. I have been invoking my tests
- with "ufstst 8000" (write and read 8,000 chunks of 8192 bytes each).
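One quick way to compare the two mismatches described above is to XOR the byte that was written with the byte that was read back, which leaves only the bits that changed. The sketch below is illustrative only; the helper names (flipped_bits, count_bits, report_flip) are hypothetical and not part of ufstst.

```c
#include <assert.h>
#include <stdio.h>

/* Return a mask of the bits that differ between the byte that was
   written and the byte that was read back. */
unsigned char flipped_bits(unsigned char expected, unsigned char observed)
{
    return (unsigned char) (expected ^ observed);
}

/* Count the bits set in a mask (the number of flipped bits). */
int count_bits(unsigned char mask)
{
    int n = 0;

    while (mask) {
        n += mask & 1;
        mask >>= 1;
    }
    return n;
}

/* Print one line describing a mismatch. */
void report_flip(unsigned char expected, unsigned char observed)
{
    unsigned char mask = flipped_bits(expected, observed);

    assert(mask != 0);   /* only call this for bytes that really differ */
    printf("wrote %02x, read %02x: flipped mask %02x (%d bits)\n",
           expected, observed, mask, count_bits(mask));
}
```

Calling report_flip(0x00, 0xd0) and report_flip(0x55, 0xd5) gives masks of 0xd0 and 0x80. In both of these cases the top bit changed, and every changed bit went from 0 to 1. Two samples are too few to draw a firm conclusion from, but the masks are a useful detail to include when comparing results.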
-
- My system is a Technology Power Sonic Motherboard 386@33MHz, 8MHz ISA
- bus, 8MB DRAM, Diamond Speedstar Plus SVGA card, Zoom VFP32bis modem
- card, Kouwei KW-556A IDE/Multi-I/O card. The disks are a Quantum
- lps105 master and Maxtor 7213 slave. There are also the obligatory
- floppies (Teac 3.5 and 5.25 and a CMS DJ-10 off the floppy
- controller).
-
- I'm left with several questions. Do I have a bad or failing disk?
- Could there be unmapped bad spots on the disk? Is there a bug in Linux
- causing this? Could it be that my disk controller is simply
- incompatible? Is anyone else running with either the Maxtor 7213 disk
- or the Kouwei controller, and if so, any problems? If you have a setup
- which gives you HD timeouts, would you take a few minutes to try my
- program and see what happens? If you have a 7213, would you try my
- program regardless? With a few simple mods, this program should
- compile and run under DOS. The open would need to explicitly call for
- binary mode, but that's about it.
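A minimal sketch of the binary-mode change for DOS, assuming the DOS compiler's <fcntl.h> provides O_BINARY (common, but worth checking for a given compiler). The helper name open_test_file is hypothetical and not part of ufstst, and this has not been tested under DOS.

```c
#include <assert.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Unix has no text/binary distinction and usually does not define
   O_BINARY, so fall back to a flag that changes nothing. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: open the test file with the same flags ufstst
   uses, plus binary mode where the platform has such a thing.
   Returns the file descriptor, or -1 on failure. */
int open_test_file(const char *filename)
{
    int fd;

    assert(filename != NULL);
    fd = open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY, 0666);
    if (fd < 0)
        perror(filename);
    return fd;
}
```

Without binary mode, a DOS C library may translate newline bytes on the way through, which would corrupt any test pattern containing them. The four patterns in ufstst (0x00, 0xff, 0xaa, 0x55) happen not to contain newline or carriage-return bytes, but opening in binary mode removes the question entirely.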
-
- Thanks for your patience. I'm open for suggestions.
-
- --Cut here--
- /*
-
- ufstst [#blocks]
- Compile with "cc -O2 -o ufstst ufstst.c"
-
- */
-
- #define DEBUG 0
- #define MAX 512L
- #define BLK 8192L
- #define ME "ufstst"
-
- #include <string.h>
- #include <unistd.h>
- #include <stdlib.h>
- #include <fcntl.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <errno.h>
- #include <stdio.h>
- #include <time.h>
-
- int yes_no(void);
-
- int main(int argc, char *argv[])
- {
- int ufs;
- char *filename = "junk";
- unsigned char pattern[4][BLK];
- unsigned char buff[BLK+1];
- int i, j, k;
- long place, rem;
- long max;
- struct stat stat_buff;
- long seek_status;
- int status;
- long blksize;
- time_t start_time, end_time, elapsed_time;
- double xfer_rate, avg_read_rate, avg_write_rate, avg_random_read_rate;
- int amtwrit;
- int bad_cnt;
-
- #if DEBUG
- long smin, smax;
- #endif
-
- if (argc > 2) {
- fprintf(stderr, "Usage: %s <number of blocks>\n", ME);
- exit(1);
- }
-
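- if (argc == 2)
- max = atol(argv[1]);
- else
- max = MAX;
-
- /* A zero or negative count would later cause a modulo by zero. */
- if (max <= 0) {
- fprintf(stderr, "%s: number of blocks must be positive\n", ME);
- exit(1);
- }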
-
- errno = 0;
- if ((status = stat(filename, &stat_buff)) < 0)
- if (errno != ENOENT) {
- perror(ME);
- exit(99);
- }
-
- if (status == 0) {
- fprintf(stderr,"%s: '%s' exists, continue? (y/n) ", ME, filename);
- if (!yes_no()) exit(99);
- unlink(filename);
- }
-
- /* open() returns -1 on failure; 0 is a valid descriptor. */
- if ((ufs = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666)) < 0)
- {
- fprintf(stderr, "%s: Couldn't open '%s'.\n", ME, filename);
- perror("open");
- exit(99);
- }
-
- errno = 0;
- if ((status = stat(filename, &stat_buff)) < 0)
- {
- perror(ME);
- exit(99);
- }
-
- blksize = BLK;
- printf("%s: Writing %ld blocks of %ld bytes. (%0.2f MB, %ld bytes)\n",
- ME, max, blksize, (double) ((max * blksize)/(1024.0 * 1024.0)),
- max * blksize);
-
- for(i=0; i<blksize; i++)
- pattern[0][i] = 0x00;
-
- for(i=0; i<blksize; i++)
- pattern[1][i] = 0xff;
-
- for(i=0; i<blksize; i++)
- pattern[2][i] = 0xaa;
-
- for(i=0; i<blksize; i++)
- pattern[3][i] = 0x55;
-
- avg_write_rate = 0.0;
- avg_read_rate = 0.0;
- avg_random_read_rate = 0.0;
-
- for(i=0; i<4; i++)
- {
- printf("Sequential writes, pattern %d (0x%02x):\n", i, pattern[i][0]);
- start_time = time(NULL);
- for(k=0; k<max; k++)
- {
- amtwrit = write(ufs, pattern[i], blksize);
- if ((amtwrit < 0) || (amtwrit != blksize))
- {
- fprintf(stderr, "%s: error in write. amtwrit = %d\n",
- ME, amtwrit);
- perror("write:");
- exit(99);
- }
- printf("%5d\015", k);
- }
- end_time = time(NULL);
- elapsed_time = end_time - start_time;
- printf("Elapsed time = %ld seconds.\n", elapsed_time);
- xfer_rate = (double) (max * blksize) /
- (elapsed_time <= 0 ? (-1) : elapsed_time);
- avg_write_rate += xfer_rate;
- printf("Approximately %0.2f bytes per second (%0.2f KB/s).\n",
- xfer_rate,
- xfer_rate/1024.0);
-
- lseek(ufs, 0L, SEEK_SET);
-
- printf("Sequential reads, pattern %d:\n", i);
- start_time = time(NULL);
- for(k=0; k<max; k++)
- {
- if (read(ufs, buff, blksize) != blksize)
- {
- fprintf(stderr, "%s: error in read.\n", ME);
- perror("read:");
- exit(99);
- }
- if (memcmp(buff, pattern[i], blksize) != 0)
- {
- fprintf(stderr, "%s: pattern read differs from what was written!\n", ME);
- bad_cnt = 0;
- for(j=0; j<blksize; j++)
- {
- if (buff[j] != pattern[i][j])
- {
- printf("Record %d, offset %d: was %02x should have been %02x\n",
- k, j, buff[j], pattern[i][j]);
- if (bad_cnt++ >= 20)
- {
- fprintf(stderr,"more? (y/n) ");
- if (!yes_no()) break;
- bad_cnt = 0;
- }
- }
- }
- fprintf(stderr,"Continue tests? (y/n)");
- if (!yes_no()) exit(99);
- }
- printf("%5d\015", k);
- }
- end_time = time(NULL);
- elapsed_time = end_time - start_time;
- printf("Elapsed time = %ld seconds.\n", elapsed_time);
- xfer_rate = (double) (max * blksize) /
- (elapsed_time <= 0 ? (-1) : elapsed_time);
- avg_read_rate += xfer_rate;
- printf("Approximately %0.2f bytes per second (%0.2f KB/s).\n",
- xfer_rate,
- xfer_rate/1024.0);
-
- #if DEBUG
- smin = max * blksize + 1;
- smax = 0;
- #endif
- srand(1);
- printf("Random reads, pattern %d:\n", i);
- start_time = time(NULL);
- for(k=0; k<max; k++)
- {
- place = (long) rand();
- place = (long) (place % (max * blksize));
- if ((rem = place % blksize) != 0) place += (blksize - rem);
- if (place == (max * blksize)) place -= (long) blksize;
- #if DEBUG
- smin = (smin < place) ? smin : place;
- smax = (smax > place) ? smax : place;
- #endif
- if ((seek_status = lseek(ufs, place, SEEK_SET)) < 0L)
- {
- fprintf(stderr, "%s: lseek failed. seek_status = %ld\n", ME, seek_status);
- perror("lseek");
- printf("place = %ld\n", place);
- exit(99);
- }
- if (read(ufs, buff, blksize) != blksize)
- {
- fprintf(stderr, "%s: error in read.\n", ME);
- perror("read:");
- exit(99);
- }
- if (memcmp(buff, pattern[i], blksize) != 0)
- {
- fprintf(stderr, "%s: pattern read differs from what was written!\n", ME);
- bad_cnt = 0;
- for(j=0; j<blksize; j++)
- {
- if (buff[j] != pattern[i][j])
- {
- printf("at %d: was: %02x should have been %02x\n",
- j, buff[j], pattern[i][j]);
- if (bad_cnt++ >= 20)
- {
- fprintf(stderr,"more? (y/n) ");
- if (!yes_no()) break;
- bad_cnt = 0;
- }
- }
- }
- fprintf(stderr,"Continue tests? (y/n)");
- if (!yes_no()) exit(99);
- }
- printf("%5d %10ld\015", k, place);
- }
- end_time = time(NULL);
- elapsed_time = end_time - start_time;
- printf("Elapsed time = %ld seconds.\n", elapsed_time);
- xfer_rate = (double) (max * blksize) /
- (elapsed_time <= 0 ? (-1) : elapsed_time);
- avg_random_read_rate += xfer_rate;
- printf("Approximately %0.2f bytes per second (%0.2f KB/s).\n\n",
- xfer_rate,
- xfer_rate/1024.0);
- #if DEBUG
- printf("DEBUG: smin=%ld, smax=%ld\n", smin, smax);
- #endif
-
- lseek(ufs, 0L, SEEK_SET);
- }
-
- avg_write_rate = avg_write_rate/(4.0 * 1024.0);
- avg_read_rate = avg_read_rate/(4.0 * 1024.0);
- avg_random_read_rate = avg_random_read_rate/(4.0 * 1024.0);
- printf("Avg. write rate = %0.2f, read = %0.2f, rand = %0.2f\n",
- avg_write_rate, avg_read_rate, avg_random_read_rate);
-
- close(ufs);
- if (unlink(filename) < 0)
- fprintf(stderr,"%s: %s not removed.\n", ME, filename);
- printf("%s: completed.\n", ME);
- return 0;
- }
-
-
- int yes_no()
- {
- int ans, dummy;
-
- while((ans = dummy = getchar()) != EOF) {
- switch(ans) {
- case 'y':
- case 'Y':
- while(dummy != '\n' && dummy != EOF)
- dummy = getchar();
- return(1);
- case 'n':
- case 'N':
- while(dummy != '\n' && dummy != EOF)
- dummy = getchar();
- return(0);
- default:
- fprintf(stderr, "%s: '%c' is not valid here, enter 'y' or 'n'\n",
- ME, ans);
- while(dummy != '\n' && dummy != EOF)
- dummy = getchar();
- break;
- } /* end switch */
- } /* end while */
-
- return(0); /* EOF on stdin: treat as "no" so callers stop */
-
- } /* end yes_no() */
- --Cut here--
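
The random-read loop in the program above draws a byte position, rounds it up to the next block boundary, and pulls it back one block if it lands exactly at end of file. The same idea can be sketched more directly by picking a block index and multiplying. This is an alternative formulation, not the code the results were produced with, and random_block_offset is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick the byte offset of a random block in a file of nblocks blocks,
   each blksize bytes long.  The result is always a multiple of blksize
   and always leaves one full block to read before end of file. */
long random_block_offset(long nblocks, long blksize)
{
    assert(nblocks > 0 && blksize > 0);
    return ((long) rand() % nblocks) * blksize;
}
```

One caveat for a DOS port: the C standard only guarantees that RAND_MAX is at least 32767. Where it is that small, taking rand() modulo the file size in bytes can only reach the first 32 KB of the file, while this version still reaches any block as long as the block count does not exceed RAND_MAX.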
-
- The screendump:
-
- 6468Unexpected HD interrupt
- ufstst: pattern read differs from what was written!
- Record 6469, offset 1262: was d5 should have been 55
- Record 6469, offset 1264: was d5 should have been 55
- Record 6469, offset 1266: was d5 should have been 55
- Record 6469, offset 1268: was d5 should have been 55
- Record 6469, offset 1270: was d5 should have been 55
- Record 6469, offset 1272: was d5 should have been 55
- Record 6469, offset 1274: was d5 should have been 55
- Record 6469, offset 1276: was d5 should have been 55
- Record 6469, offset 1278: was d5 should have been 55
- Record 6469, offset 1280: was d5 should have been 55
- Record 6469, offset 1282: was d5 should have been 55
- Record 6469, offset 1284: was d5 should have been 55
- Record 6469, offset 1286: was d5 should have been 55
- Record 6469, offset 1288: was d5 should have been 55
- Record 6469, offset 1290: was d5 should have been 55
- Record 6469, offset 1292: was d5 should have been 55
- Record 6469, offset 1294: was d5 should have been 55
- Record 6469, offset 1296: was d5 should have been 55
- Record 6469, offset 1298: was d5 should have been 55
- Record 6469, offset 1300: was d5 should have been 55
- Record 6469, offset 1302: was d5 should have been 55
- more? (y/n) HD-controller reset
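
In the dump above, every mismatch shown falls on an even offset, two bytes apart, from 1262 through 1302. A summary of the whole block would make that kind of pattern visible without paging through the "more?" prompts. The sketch below shows one way to do it; struct diff_summary and summarize_diff are hypothetical names and not part of ufstst.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of how two buffers differ. */
struct diff_summary {
    size_t count;   /* number of differing bytes */
    size_t first;   /* offset of first difference (valid if count > 0) */
    size_t last;    /* offset of last difference (valid if count > 0) */
};

/* Compare what was read (got) with what was written (want) over len
   bytes and return the count and extent of the differences. */
struct diff_summary summarize_diff(const unsigned char *got,
                                   const unsigned char *want, size_t len)
{
    struct diff_summary s = {0, 0, 0};
    size_t j;

    assert(got != NULL && want != NULL);
    for (j = 0; j < len; j++) {
        if (got[j] != want[j]) {
            if (s.count == 0)
                s.first = j;
            s.last = j;
            s.count++;
        }
    }
    return s;
}
```

Called as summarize_diff(buff, pattern[i], blksize) after a failed memcmp, this would give the number of bad bytes and the range they cover in one line of output.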
-
- --End of Message--
- --
- Steve Aliff (swaliff@afterlife.ncsc.mil [144.51.1.1])
-