Discussion:
[ale] Many thanks to Phil Turel
Malcolm Bibby via Ale
2018-07-30 22:32:35 UTC
About 10 days ago I suffered a major crash on my system, which runs
CentOS 7. I appealed to members of this group for advice. It turned out
better than I could have ever hoped. I got back in operation with very
significant help from Phil Turmel, a member here.
He came to my place one day, took an image, and then, on another day, I
went to his place after he had analyzed and fixed the image.

Along with re-installing the contents of the image, he also set me up
with two rotary disks in a RAID configuration and two SSDs, also in a
RAID configuration.

On the first visit we enjoyed a beer and a great pizza, while on the
second visit we each enjoyed a Dos Equis and great Mexican food! Also,
there were two great conversations on various political matters!!

Thank you Phil.

Malcolm M Bibby

_______________________________________________
Ale mailing list
***@ale.org
https://mail.ale.org/mailman/listinfo/ale
See JOBS, ANNOUNCE and SCHOOLS lists at
http://mail.ale.org/mail
Phil Turmel via Ale
2018-07-30 22:51:39 UTC
You're welcome, Malcolm.

Very interesting and unusual bit of corruption: every superblock except
the first was damaged, and the damage was confined to precisely the
single 512-byte sector of each of those other superblocks. I have never
seen anything like it.
Derek Atkins via Ale
2018-08-06 17:46:07 UTC
Post by Phil Turmel via Ale
You're welcome, Malcolm.
Very interesting and unusual bit of corruption, on all but the first
superblock, and precisely on the single 512-byte sectors of those other
superblocks. Never seen anything like it.
So how did you debug it? And how did you fix it?

If it's that regular a pattern it could be anything from a rotary issue
in the HDD to a failed memory stick.

-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
***@MIT.EDU PGP key available
Phil Turmel via Ale
2018-08-06 18:26:06 UTC
Hi Derek,
Post by Derek Atkins via Ale
So how did you debug it? And how did you fix it?
I used xfs_db, based on a clue from an old mailing list entry with a
similar error message.

Within xfs_db, "sb 0" would move the cursor to the first superblock,
which I could then "print", report the block # with "fsb", and report
the sector number with "daddr". Repeat with "sb 1", "sb 2", and "sb 3".
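
For reference, the sector numbers that "daddr" reports for the secondary
superblocks follow from the filesystem geometry: each allocation group
starts agblocks filesystem blocks after the previous one, and its
superblock is the first 512-byte sector of the group. A minimal sketch of
that arithmetic follows. The geometry values in it are made up for
illustration and are not the ones from this repair; the real ones would
come from the "agcount", "agblocks" and "blocksize" fields shown by
"print" on superblock 0. The helper name sb_sector is hypothetical.

```shell
#!/bin/sh
# Hypothetical geometry, for illustration only; read the real values
# from the agcount, agblocks and blocksize fields of superblock 0.
agcount=4
agblocks=6553600
blocksize=4096

# Superblock N sits in the first 512-byte sector of allocation group N,
# so its sector number is N * agblocks * (sectors per filesystem block).
sb_sector() {
    echo $(( $1 * agblocks * (blocksize / 512) ))
}

ag=0
while [ "$ag" -lt "$agcount" ]; do
    echo "sb $ag: sector $(sb_sector "$ag")"
    ag=$((ag + 1))
done
```

If the computed numbers disagree with what "daddr" reports, trust xfs_db;
the sketch is only a cross-check.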

With the sector numbers, I could get a hex dump of each superblock and
the 15 sectors after it (substitute the real device and sector number):

dd if=/dev/whatever bs=512 skip=sector count=16 |hexdump -C

That showed me the scrambled data in just one sector in the latter
superblocks, with proper data structures following.
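
A quick way to spot a scrambled superblock sector without reading the
whole hex dump is to check its first four bytes, which on XFS hold the
magic string "XFSB" (hex 58 46 53 42). The sketch below assumes a POSIX
shell with dd, od and tr; the function name check_sb_magic is made up for
this example, and the device name and sector in the usage comment are
placeholders.

```shell
#!/bin/sh
# Print "ok" if the given 512-byte sector starts with the XFS superblock
# magic "XFSB" (hex 58465342), otherwise print "BAD".
# Arguments: device-or-image path, sector number.
check_sb_magic() {
    magic=$(dd if="$1" bs=512 skip="$2" count=1 2>/dev/null \
            | od -An -tx1 -N4 | tr -d ' \n')
    if [ "$magic" = "58465342" ]; then
        echo ok
    else
        echo BAD
    fi
}

# Hypothetical usage; both arguments are placeholders:
# check_sb_magic /dev/whatever 52428800
```

A matching magic number only shows that the start of the sector looks
sane; it does not prove the rest of the superblock is intact.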

I then used dd to extract the good superblock:

dd if=/dev/whatever bs=512 count=1 of=tempsb.dat

And write it to the other locations:

dd if=tempsb.dat bs=512 count=1 seek=sector of=/dev/whatever
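
The extract-and-write-back steps can be rehearsed on a throwaway file
before touching anything real. The sketch below is an assumption-laden
demo, not the exact procedure used here: the scratch image, the fake
"good" sector and the sector list 16/32/48 are all invented. It also adds
conv=notrunc, which is not needed on a block device but matters when the
target is a regular image file.

```shell
#!/bin/sh
# Rehearsal on scratch files; nothing here touches a real device.
img=$(mktemp)   # stand-in for /dev/whatever
sb=$(mktemp)    # stand-in for tempsb.dat

# Build a 64-sector scratch image: random data everywhere, with a
# recognizable marker at the start of sector 0 (the "good" superblock).
dd if=/dev/urandom of="$img" bs=512 count=64 2>/dev/null
printf 'XFSB-good-superblock' | dd of="$img" bs=512 conv=notrunc 2>/dev/null

# Extract the good sector, then write it over each "secondary" location.
dd if="$img" of="$sb" bs=512 count=1 2>/dev/null
for sector in 16 32 48; do
    # Without conv=notrunc, dd would cut a regular file off right after
    # the sector it wrote, destroying everything beyond it.
    dd if="$sb" of="$img" bs=512 count=1 seek="$sector" conv=notrunc 2>/dev/null
done

# Verify each copy byte-for-byte against the extracted sector.
result=ok
for sector in 16 32 48; do
    dd if="$img" bs=512 skip="$sector" count=1 2>/dev/null \
        | cmp -s - "$sb" || result=mismatch
done
echo "$result"
# Clean up when done: rm -f "$img" "$sb"
```

Reading each written sector back and comparing it with cmp is cheap
insurance before running xfs_repair.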

xfs_repair then worked, though it made a handful of corrections because
the filesystem could not be mounted to replay the log first.
Post by Derek Atkins via Ale
If it's that regular a pattern it could be anything from a rotary issue
in the HDD to a failed memory stick.
The original failing device was an M.2 mini-PCIe SSD. It was already
failing, and it gave up the ghost completely later.

I have no idea what failure mode made it possible to write just the one
scrambled 512-byte sector to the beginning of each allocation group,
except the first. Smells like an offset calculation bug to me.

Phil