Once it is running, diskomizer logs every message with a syslog priority. Based on that priority, you can direct messages to standard error, standard output and/or syslog.
The following options control all aspects of diskomizer's logging: STDOUT, STDERR, STDOUT_PRIORITY, STDERR_PRIORITY, SYSLOG_FACILITY, SYSLOG_LOG_UPTO_PRIORITY and OBSCURE_REPORT_PRIORITY.
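The exact routing rules are not spelled out here. One plausible model is a per-destination threshold: a message goes to a destination if its priority is at least as severe as that destination's setting, which is how syslog's LOG_UPTO() mask works. The sketch below assumes that model. The destination names and example settings are hypothetical, and OBSCURE_REPORT_PRIORITY is not modelled.

```python
# Hypothetical model of priority-based routing, assuming each destination
# has a threshold and accepts messages at least as severe as it
# (the same idea as syslog's LOG_UPTO mask). Not diskomizer's code.
PRIORITIES = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]
RANK = {name: rank for rank, name in enumerate(PRIORITIES)}  # 0 is most severe


def destinations(msg_priority, thresholds):
    """Return the sorted names of the destinations that accept a message.

    thresholds maps a destination name to the least severe priority it
    accepts, or to None if that destination is turned off.
    """
    rank = RANK[msg_priority]
    return sorted(
        dest
        for dest, limit in thresholds.items()
        if limit is not None and rank <= RANK[limit]
    )


# Example settings (assumed, not diskomizer's defaults).
settings = {"stdout": "INFO", "stderr": "WARNING", "syslog": "NOTICE"}
print(destinations("NOTICE", settings))  # ['stdout', 'syslog']
print(destinations("ERR", settings))     # ['stderr', 'stdout', 'syslog']
```

Under this model a NOTICE message such as DEV_STARTING reaches standard output and syslog, but not standard error.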
Once diskomizer has opened all the devices to be tested, it reports that it is starting I/O to each device with a message at priority NOTICE:
PID 1047: NOTICE /dev/vx/rdsk/cjgdg/vol19 DEV_STARTING time now 16:31:22 01/Aug/2000
PID 1047: NOTICE /dev/vx/rdsk/cjgdg/vol18 DEV_STARTING time now 16:31:22 01/Aug/2000
Once all the writes are started you see messages like this:
PID 1049: NOTICE /dev/vx/rdsk/cjgdg/vol19 DEV_RUNNING stop at 03:14:07 19/Jan/2038
PID 1049: NOTICE /dev/vx/rdsk/cjgdg/vol18 DEV_RUNNING stop at 03:14:07 19/Jan/2038
The stop time is the time at which diskomizer will stop activity to that device when the options EXPERT_MIN_ACTIVE_TIME, EXPERT_MAX_ACTIVE_TIME, EXPERT_MIN_IDLE_TIME and EXPERT_MAX_IDLE_TIME are used. In this case none of the active time options are set, so the device is not stopped until time_t wraps.
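The stop time in the DEV_RUNNING messages is exactly the largest value a signed 32-bit time_t can hold, shown in UTC. A quick check, assuming a signed 32-bit time_t:

```python
from datetime import datetime, timezone

# Largest value of a signed 32-bit time_t: the last second before it wraps.
TIME_T_MAX = 2**31 - 1

wrap = datetime.fromtimestamp(TIME_T_MAX, tz=timezone.utc)
print(TIME_T_MAX)                          # 2147483647
print(wrap.strftime("%H:%M:%S %d/%b/%Y"))  # 03:14:07 19/Jan/2038
```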
If you have set the option SYSLOG_FACILITY to DAEMON, these messages are also copied into /var/adm/messages, because they are logged at level NOTICE and the default syslog configuration sends daemon messages of level NOTICE or above to that file.
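syslogd picks the files a message is written to by matching its facility and level against the selectors in /etc/syslog.conf. The sketch below is a simplified model of that matching (it ignores "none" and other special cases), checked against a selector list assumed to be the stock Solaris line for /var/adm/messages:

```python
# Simplified model of syslogd selector matching (no "none", no "=level",
# no later selectors overriding earlier ones).
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selectors, facility, level):
    """True if a message with this facility and level matches any selector
    in a ';'-separated list such as "*.err;daemon.notice"."""
    for selector in selectors.split(";"):
        sel_facilities, sel_level = selector.split(".")
        facility_ok = sel_facilities == "*" or facility in sel_facilities.split(",")
        # A selector level means "this level and anything more severe".
        if facility_ok and LEVELS.index(level) <= LEVELS.index(sel_level):
            return True
    return False


# Assumed to be the stock Solaris selector list for /var/adm/messages.
MESSAGES = "*.err;kern.debug;daemon.notice;mail.crit"
print(selector_matches(MESSAGES, "daemon", "notice"))  # True
print(selector_matches(MESSAGES, "user", "notice"))    # False
```

This is why a NOTICE message logged with the DAEMON facility lands in /var/adm/messages, while the same message under another facility, such as USER, would not.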
Diskomizer then keeps you informed of the progress of the writes, and of the per-process statistics, by reporting INFO messages like these:
PID 1050: INFO /dev/vx/rdsk/cjgdg/vol19 (vxio0:a) write times (3.479,37.451,54.529) 15%
PID 1047: INFO /dev/vx/rdsk/cjgdg/vol18 (vxio0:a) write times (4.295,37.476,53.964) 15%
The write times are given in seconds: the first is the time of the fastest write, the second is the mean time and the third is the time of the slowest write.
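To track these figures over a long run, you can pick the progress messages apart. Below is a minimal parser, assuming the message layout stays exactly as shown above. The field names are my own. The meaning of the trailing percentage is not described here, so it is simply returned as "percent".

```python
import re

# Pattern for the INFO progress messages shown above (layout assumed stable).
PROGRESS = re.compile(
    r"PID (?P<pid>\d+): INFO (?P<device>\S+) \((?P<path>[^)]+)\) "
    r"(?P<op>\w+) times \((?P<fastest>[\d.]+),(?P<mean>[\d.]+),(?P<slowest>[\d.]+)\) "
    r"(?P<percent>\d+)%"
)


def parse_progress(line):
    """Return the fields of one progress message as a dict, or None."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "device": m["device"],
        "path": m["path"],
        "op": m["op"],                   # e.g. "write"
        "fastest": float(m["fastest"]),  # seconds
        "mean": float(m["mean"]),        # seconds
        "slowest": float(m["slowest"]),  # seconds
        "percent": int(m["percent"]),
    }


line = ("PID 1050: INFO /dev/vx/rdsk/cjgdg/vol19 (vxio0:a) "
        "write times (3.479,37.451,54.529) 15%")
print(parse_progress(line))
```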
Diskomizer reports any situation it does not like at log level WARN or above. There are three basic kinds of error it can detect:
System calls that return errors. For example, aioread, aiowrite and aiowait can all return errors for the I/O in question.
Reads and writes that take longer than expected.
Reads that return data that does not match the data that was written to that block.
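The three cases amount to a small classification step for each completed I/O. This is a hypothetical sketch, not diskomizer's code. The time limit is a placeholder parameter, because how diskomizer decides that an I/O is slow is not described here.

```python
from enum import Enum


class IoProblem(Enum):
    SYSCALL_ERROR = 1  # aioread, aiowrite or aiowait reported an error
    SLOW_IO = 2        # the I/O took longer than expected
    CORRUPTION = 3     # a read returned data different from what was written


def classify(errno, elapsed, limit, read_data=None, expected=None):
    """Return the problems seen for one I/O (hypothetical helper).

    elapsed and limit are in seconds; read_data and expected are only
    given for reads.
    """
    problems = []
    if errno != 0:
        problems.append(IoProblem.SYSCALL_ERROR)
    if elapsed > limit:
        problems.append(IoProblem.SLOW_IO)
    # Only compare data for a read that completed without an error.
    if (errno == 0 and read_data is not None and expected is not None
            and read_data != expected):
        problems.append(IoProblem.CORRUPTION)
    return problems


print(classify(0, 1.5, 30.0, read_data=b"\x55" * 8, expected=b"\xaa" * 8))
```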
Of the three kinds of error, the third is the worst: the system has told the application that the data was stored correctly, but it was not. Here is an example that was created by using dd to copy some blocks from one device to another while both were being tested:
$ get_error 3 stderr
PID 1630: ERROR 3 begin: Mon Oct 21 17:10:22 2002
PID 1630: ERR On disk header says device byte offset 14000 (0t81920), which calculates diskomizer block 0x6 (0t6), I requested diskomizer block 0x10 (0t16)
PID 1630: WARNING Block read from sd38:c matches block written to sd31:c
PID 1630: ERR sd38:c aioread retry corrupt
PID 1630: ERR last write to this block requested Mon Oct 21 17:01:04 2002
PID 1630: ERR last write to this block returned Mon Oct 21 17:01:12 2002
PID 1630: ERR last write to this block used path sd38:c
PID 1630: ERR sd38:c
PID 1630: ERR Block at byte offset 0t163840 (0x28000) block size 2048 (0x800)
PID 1630: ERR use "dd if=/devices/sbus@b,0/QLGC,isp@0,10000/sd@9,0:c,raw bs=2048 iseek=80 count=1" to read the block
PID 1630: ERR Requested Mon Oct 21 17:10:21 2002, Time now Mon Oct 21 17:10:22 2002
PID 1630: ERR Diffs file dumped to /var/tmp/a/13/diskomizer/1630/diffs
PID 1630: WARNING On read error retry 2, 3 remaining sd38:c blk 0x28000
PID 1630: WARNING ERROR 3 end: Mon Oct 21 17:10:22 2002
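get_error appears to pull a single error instance out of diskomizer's output. Its implementation is not shown here. A hypothetical equivalent might keep the lines between the "ERROR n begin" and "ERROR n end" markers and drop interleaved messages from other PIDs:

```python
import re


def extract_error(lines, instance):
    """Return the messages of one error instance, from its 'ERROR n begin'
    line to its 'ERROR n end' line, keeping only the reporting PID's lines."""
    begin = re.compile(rf"^PID (\d+): ERROR {instance} begin:")
    end = re.compile(rf"ERROR {instance} end:")
    out, pid = [], None
    for line in lines:
        if pid is None:
            m = begin.match(line)
            if m:
                pid = m.group(1)
                out.append(line)
            continue
        # Skip messages from other diskomizer processes.
        if line.startswith(f"PID {pid}:"):
            out.append(line)
            if end.search(line):
                break
    return out
```

For example, extract_error(saved_stderr_text.splitlines(), 3) would return the fifteen PID 1630 lines above if the saved output contained them, where saved_stderr_text is a hypothetical string holding a captured copy of standard error.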
Because this is a data corruption, as the error messages show, diskomizer dumps a diffs file containing the data that was expected and the data that was actually read:
Type A buffer header start and end value = 0xaaaaaaaaaaaaaaaa:
Element                          Offset        Size
bufhdr.start                     0x00 (0t000)  sizeof (bufhdr.start )          0x08 (0t08)
bufhdr.ab.a.serial_and_provider  0x08 (0t008)  sizeof (bufhdr.ab.a.serial_an)  0x20 (0t32)
bufhdr.ab.a.devid.ino            0x28 (0t040)  sizeof (bufhdr.ab.a.devid.ino)  0x08 (0t08)
bufhdr.ab.a.devid.dev            0x30 (0t048)  sizeof (bufhdr.ab.a.devid.dev)  0x04 (0t04)
bufhdr.ab.a.hdrchksum            0x38 (0t056)  sizeof (bufhdr.ab.a.hdrchksum)  0x04 (0t04)
bufhdr.ab.a.type                 0x3c (0t060)  sizeof (bufhdr.ab.a.type )      0x04 (0t04)
bufhdr.ab.a.chksum               0x40 (0t064)  sizeof (bufhdr.ab.a.chksum )    0x08 (0t08)
bufhdr.ab.a.did                  0x48 (0t072)  sizeof (bufhdr.ab.a.did )       0x04 (0t04)
bufhdr.ab.a.len                  0x4c (0t076)  sizeof (bufhdr.ab.a.len )       0x04 (0t04)
bufhdr.ab.a.off                  0x50 (0t080)  sizeof (bufhdr.ab.a.off )       0x08 (0t08)
bufhdr.ab.a.time                 0x58 (0t088)  sizeof (bufhdr.ab.a.time )      0x08 (0t08)
bufhdr.end                       0x60 (0t096)  sizeof (bufhdr.end )            0x08 (0t08)

Type B buffer header start and end value = 0x5555555555555555:
Element                          Offset        Size
bufhdr.start                     0x00 (0t000)  sizeof (bufhdr.start )          0x08 (0t08)
bufhdr.ab.b.off                  0x08 (0t008)  sizeof (bufhdr.ab.b.off )       0x08 (0t08)
bufhdr.ab.b.time                 0x10 (0t016)  sizeof (bufhdr.ab.b.time )      0x08 (0t08)
bufhdr.ab.b.len                  0x18 (0t024)  sizeof (bufhdr.ab.b.len )       0x04 (0t04)
bufhdr.ab.b.did                  0x1c (0t028)  sizeof (bufhdr.ab.b.did )       0x04 (0t04)
bufhdr.ab.b.devid.ino            0x20 (0t032)  sizeof (bufhdr.ab.b.devid.ino)  0x08 (0t08)
bufhdr.ab.b.devid.dev            0x28 (0t040)  sizeof (bufhdr.ab.b.devid.dev)  0x04 (0t04)
bufhdr.ab.b.hdrchksum            0x30 (0t048)  sizeof (bufhdr.ab.b.hdrchksum)  0x04 (0t04)
bufhdr.ab.b.type                 0x34 (0t052)  sizeof (bufhdr.ab.b.type )      0x04 (0t04)
bufhdr.ab.b.chksum               0x38 (0t056)  sizeof (bufhdr.ab.b.chksum )    0x08 (0t08)
bufhdr.ab.b.serial_and_provider  0x40 (0t064)  sizeof (bufhdr.ab.b.serial_an)  0x20 (0t32)
bufhdr.end                       0x60 (0t096)  sizeof (bufhdr.end )            0x08 (0t08)

Error Instance 0
Diffs dumped Mon Oct 21 17:10:21 2002
Diffs from aioread for block 0x28000
use "dd if=/devices/sbus@b,0/QLGC,isp@0,10000/sd@9,0:c,raw bs=2048 iseek=20 count=1" to read the block

Decoding header                Good                          Bad
hdr.start                      0x5555555555555555            0x5555555555555555
hdr.ab.b.off                   0x28000                       0x14000
hdr.ab.b.time                  "Mon Oct 21 17:01:04 2002"    "Mon Oct 21 17:01:04 2002"
hdr.ab.b.len                   0x800                         0x2000
hdr.ab.b.did                   0x65d                         0x65d
hdr.ab.b.devid.ino             0x5e701                       0x5e6e1
hdr.ab.b.devid.dev             0x800132                      0x8000fa
hdr.ab.b.hdrchksum             0x179e                        0x184d
hdr.ab.b.type.BUF_EXECUTABLE   0                             0
hdr.ab.b.type.BUF_BAD_HDR      0                             0
hdr.ab.b.type.BUF_BAD_CHKSUM   0                             0
hdr.ab.b.type.BUF_READ_ONLY    0                             0
hdr.ab.b.type.BUF_READY        0x1                           0x1
hdr.ab.b.type.sequence         0x1                           0x1
hdr.ab.b.chksum                0xcac3bcb5a99d9614            0xe9eff5fc06121629
hdr.ab.b.serial_and_provider   "2160775739Sun_Microsystems"  "2160775739Sun_Microsystems"
hdr.end                        0x5555555555555555            0x5555555555555555

Offset      Written             Read                Diffs               Bit count
0x00000000  0x5555555555555555  0x5555555555555555  0x0000000000000000  00
0x00000008  0x0000000000028000  0x0000000000014000  0x000000000003c000  04
0x00000010  0x000000003db424c0  0x000000003db424c0  0x0000000000000000  00
0x00000018  0x000008000000065d  0x000020000000065d  0x0000280000000000  02
0x00000020  0x000000000005e701  0x000000000005e6e1  0x00000000000001e0  04
0x00000028  0x0080013200000000  0x008000fa00000000  0x000001c800000000  04
0x00000030  0x0000179e08400000  0x0000184d08400000  0x00000fd300000000  09
0x00000038  0xcac3bcb5a99d9614  0xe9eff5fc06121629  0x232c4949af8f803d  29
0x00000040  0x3231363037373537  0x3231363037373537  0x0000000000000000  00
0x00000048  0x333953756e5f4d69  0x333953756e5f4d69  0x0000000000000000  00
0x00000050  0x63726f7379737465  0x63726f7379737465  0x0000000000000000  00
0x00000058  0x6d73000000000000  0x6d73000000000000  0x0000000000000000  00
0x00000060  0x5555555555555555  0x5555555555555555  0x0000000000000000  00
0x00000068  0x565758595a5b5c5d  0xf8f9fafbfc030405  0xaeaea2a2a6585858  29
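You can reproduce the Diffs and Bit count columns from the Written and Read columns. Every row in the dump is consistent with two rules: Diffs is the XOR of the two 64-bit words, and Bit count is the number of set bits in that XOR, printed in decimal. For example, at offset 0x38 the XOR is 0x232c4949af8f803d, which has 29 bits set. A sketch of one row under that interpretation:

```python
def diff_row(offset: int, written: int, read: int) -> str:
    """Format one row of a diffs table for a 64-bit word, taking Diffs to be
    written XOR read and Bit count to be the number of set bits in it."""
    diff = written ^ read
    bits = bin(diff).count("1")
    return f"0x{offset:08x} 0x{written:016x} 0x{read:016x} 0x{diff:016x} {bits:02d}"


print(diff_row(0x08, 0x28000, 0x14000))
# 0x00000008 0x0000000000028000 0x0000000000014000 0x000000000003c000 04
```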
When the block that was written is a type B block and the block that was read is a type A block, both headers are decoded. If both headers are of the same type and both their checksums are correct, as here, each header is decoded only as its own type. This is clearly a misplaced block, which is what we expect, since I moved it with dd.
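To tell the two header types apart, you only need the start and end markers: 0xaaaaaaaaaaaaaaaa for type A and 0x5555555555555555 for type B. The sketch below uses those markers, plus the type B offsets from the layout table, to decode a raw header. Big-endian byte order is an assumption, but the dump supports it: the word 0x000008000000065d at offset 0x18 decodes as len 0x800 and did 0x65d. The checksum fields are returned raw because the checksum algorithm is not described here.

```python
import struct

TYPE_A_MARK = 0xAAAAAAAAAAAAAAAA
TYPE_B_MARK = 0x5555555555555555

# Type B fields: (name, byte offset, struct format), from the layout table.
# ">" means big-endian, which is an assumption about the test host.
TYPE_B_FIELDS = [
    ("start", 0x00, ">Q"),
    ("off", 0x08, ">Q"),
    ("time", 0x10, ">Q"),
    ("len", 0x18, ">I"),
    ("did", 0x1C, ">I"),
    ("devid.ino", 0x20, ">Q"),
    ("devid.dev", 0x28, ">I"),
    ("hdrchksum", 0x30, ">I"),
    ("type", 0x34, ">I"),
    ("chksum", 0x38, ">Q"),
    ("end", 0x60, ">Q"),
]


def header_type(buf):
    """Return "A" or "B" if the start and end markers agree, else None."""
    (start,) = struct.unpack_from(">Q", buf, 0x00)
    (end,) = struct.unpack_from(">Q", buf, 0x60)
    if start == end == TYPE_A_MARK:
        return "A"
    if start == end == TYPE_B_MARK:
        return "B"
    return None


def decode_type_b(buf):
    """Decode a type B header into raw field values (no checksum checks)."""
    fields = {name: struct.unpack_from(fmt, buf, off)[0]
              for name, off, fmt in TYPE_B_FIELDS}
    # serial_and_provider is 0x20 bytes of NUL-padded ASCII at offset 0x40.
    fields["serial_and_provider"] = buf[0x40:0x60].rstrip(b"\0").decode("ascii")
    return fields
```

Feeding in the 13 words of the Read column (offsets 0x00 to 0x60) gives off 0x14000, len 0x2000 and devid.ino 0x5e6e1, matching the Bad column above.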
Now let's look at the error message in more detail:
PID 1630: ERROR 3 begin: Mon Oct 21 17:10:22 2002
PID 1630: ERR On disk header says device byte offset 14000 (0t81920), which calculates diskomizer block 0x6 (0t6), I requested diskomizer block 0x10 (0t16)
PID 1630: WARNING Block read from sd38:c matches block written to sd31:c
PID 1630: ERR sd38:c aioread retry corrupt
PID 1630: ERR last write to this block requested Mon Oct 21 17:01:04 2002
PID 1630: ERR last write to this block returned Mon Oct 21 17:01:12 2002
PID 1630: ERR last write to this block used path sd38:c
PID 1630: ERR sd38:c
PID 1630: ERR Block at byte offset 0t163840 (0x28000) block size 2048 (0x800)
PID 1630: ERR use "dd if=/devices/sbus@b,0/QLGC,isp@0,10000/sd@9,0:c,raw bs=2048 iseek=80 count=1" to read the block
PID 1630: ERR Requested Mon Oct 21 17:10:21 2002, Time now Mon Oct 21 17:10:22 2002
PID 1630: ERR Diffs file dumped to /var/tmp/a/13/diskomizer/1630/diffs
PID 1630: WARNING On read error retry 2, 3 remaining sd38:c blk 0x28000
PID 1630: WARNING ERROR 3 end: Mon Oct 21 17:10:22 2002
The first line just tells you the error instance, which in this case is 3.
The next line tells you the type of error that was seen. In this case, the header of the block that was read says that it was block 0x6, when diskomizer was reading block 0x10 (decimal 16). These block numbers are diskomizer blocks. The header actually contains byte offsets, so the calculated block here is (81920 / 8192) - 4 = 6, because the largest block size is 8192 bytes and the starting offset is 4.
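The block calculation is plain integer arithmetic. The sketch below takes the largest block size (8192 bytes) and the starting offset (4, in units of the largest block size, as the formula implies) from this example. Treat both as values for this run only, not as a general rule.

```python
LARGEST_BLOCK = 8192  # largest block size in this run, in bytes
START_OFFSET = 4      # starting offset, in units of LARGEST_BLOCK


def diskomizer_block(byte_offset, largest=LARGEST_BLOCK, start=START_OFFSET):
    """Convert a device byte offset into a diskomizer block number."""
    return byte_offset // largest - start


print(diskomizer_block(81920))   # 6: what the on-disk header claims
print(diskomizer_block(163840))  # 16 (0x10): the block that was requested
```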
Having read a block whose header claims that it belongs to another block and/or another device, diskomizer checks whether it matches the block that diskomizer last wrote to the block and/or device named in that header, to confirm that the header of the block that was read is correct. A mislocated block may have been put in the wrong place while it was being written, so diskomizer reports when the aiowrite was submitted and when it returned. You can then correlate the error with any messages in the messages file.
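The check described above amounts to remembering the last write to every block and comparing against it. This is a hypothetical sketch. The record type, its fields and the dictionary keyed by (device, block) are illustrative assumptions, not diskomizer's data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WriteRecord:
    """What a tester might remember about the last write to one block."""
    data: bytes
    requested: str  # when the aiowrite was submitted
    returned: str   # when the aiowrite completed
    path: str       # path the write used, e.g. "sd38:c"


def find_source_write(read_data: bytes,
                      claimed: Tuple[str, int],
                      last_writes: Dict[Tuple[str, int], WriteRecord]
                      ) -> Optional[WriteRecord]:
    """Return the last write to the (device, block) named in the read
    block's header if its data is identical to what was read, else None."""
    record = last_writes.get(claimed)
    if record is not None and record.data == read_data:
        return record
    return None
```

A match shows that the header of the misplaced block is genuine. The returned record's submit and completion times are then the ones worth looking up in the messages file.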
Finally, it reports information about this I/O: when the last write to the block was submitted and when it returned, the path that write used, the byte offset and block size of this request, and when the read was requested and returned.
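The dd command in the error report follows directly from the byte offset and block size: iseek is the offset divided by the block size, so 163840 / 2048 = 80. The sketch below builds the same command. It assumes the offset is a multiple of the block size, and it uses iseek because that is the Solaris dd operand shown in the log.

```python
def dd_command(raw_device: str, byte_offset: int, block_size: int) -> str:
    """Build a dd command line that reads one block at byte_offset."""
    if byte_offset % block_size:
        raise ValueError("byte offset is not a multiple of the block size")
    return (f"dd if={raw_device} bs={block_size} "
            f"iseek={byte_offset // block_size} count=1")


print(dd_command("/devices/sbus@b,0/QLGC,isp@0,10000/sd@9,0:c,raw", 163840, 2048))
# dd if=/devices/sbus@b,0/QLGC,isp@0,10000/sd@9,0:c,raw bs=2048 iseek=80 count=1
```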