Repair a thin pool
I use ProxMox in my home lab; it has been really helpful for spinning up VMs as needed for tests and experiments.
I recently ran into an issue with the LVM thin pool used by ProxMox: the metadata space was completely full. The metadata usage reported by lvs -a was 99.99%.
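This kind of exhaustion can be caught early by polling lvs -o lv_name,data_percent,metadata_percent and alerting above a threshold. A minimal sketch of the filtering step, using a made-up sample line rather than real lvs output:

```shell
# Flag thin pools whose metadata usage exceeds 90%.
# The sample line stands in for output from
# `lvs --noheadings -o lv_name,data_percent,metadata_percent`;
# the values are illustrative, not from a real system.
sample="data 45.20 99.99"
echo "$sample" | awk '$3 > 90 { print $1 " metadata at " $3 "%" }'
```

In a cron job or monitoring check, the echo would be replaced by the real lvs invocation.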
After a quick search, I noticed I was not the first one to run into this. It seems some felt the default metadata pool size in LVM2 was not large enough:
- https://forum.proxmox.com/threads/is-default-install-lvm2-thin-pool-metadata-size-appropriate.31627/
I came up with steps to fix the issue, starting with resizing the metadata space:
root@pve1:/# lvresize --poolmetadatasize +1G pve/data
Although lvs -a showed the additional space, I was still experiencing issues. I assumed the metadata was corrupted, so I tried:
root@pve1:/# lvconvert --repair pve/data
This did not resolve the issue. Since the root of the tree had already been lost, lvconvert --repair was not able to recover anything, and I was left with no metadata and none of the thin volumes available. lvs -a was still showing the thin volumes, but they remained unavailable:
root@pve1:/# lvchange -ay pve/vm-100-disk-5
device-mapper: reload ioctl on failed: No data available
I tried running vgmknodes -vvv pve but noticed those volumes got marked NODE_DEL:
Processing LV vm-100-disk-5 in VG pve.
dm mknodes pve-vm--100--disk--5 NF [16384] (*1)
pve-vm--100--disk--5: Stacking NODE_DEL
Syncing device names
pve-vm--100--disk--5: Processing NODE_DEL
I reached out to Zdenek Kabelac and Ming-Hung Tsai, who are both extremely knowledgeable about LVM thin pools, and they provided much-needed and very useful assistance. Following advice from Ming-Hung, I grabbed the source code of thin-provisioning-tools from GitHub. To compile it on ProxMox, I had to install a number of tools:
apt-get install git autoconf g++ make
apt-get install libexpat1 libexpat1-dev
apt-get install libaio-dev libaio1
apt-get install libboost1.55-all-dev
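With those packages installed, the build followed the usual autotools flow. A sketch of the steps (the repository URL is the upstream one; the exact steps can vary between versions, and recent releases have since been rewritten in Rust and build with cargo instead):

```shell
# Fetch and build thin-provisioning-tools from source (autotools-era flow).
git clone https://github.com/jthornber/thin-provisioning-tools.git
cd thin-provisioning-tools
autoreconf -iv   # generate the configure script
./configure
make             # produces the pdata_tools binary used below
```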
Using this new set of tools, I started poking around with thin_check, thin_scan and thin_ll_dump:
root@pve1:/# ./pdata_tools thin_check /dev/mapper/pve-data_meta2
examining superblock
examining devices tree
examining mapping tree
missing all mappings for devices: [0, -]
bad checksum in btree node (block 688)
root@pve1:/# ./pdata_tools thin_scan /dev/mapper/pve-data_meta2 -o /tmp/thin_scan_meta2.xml
root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 -o /tmp/thin_ll_dump_meta2.xml
pve-data_meta2 was the oldest backup of the metadata created by lvconvert --repair and was the most likely to contain my metadata. But thin_check showed that all mappings were missing because the root was missing.
To fix this with thin_ll_restore, I needed to find the correct nodes. In the thin_ll_dump metadata dump created above, I was able to find the data-mapping-root:
root@pve1:/# grep "key_begin=\"5\" key_end=\"8\"" /tmp/thin_ll_dump_meta2.xml
<node blocknr="6235" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="8"/>
<node blocknr="20478" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>
In the thin_scan XML file created above, I was able to find the device-details-root:
root@pve1:# grep value_size=\"24\" /tmp/thin_scan_meta2.xml
<single_block type="btree_leaf" location="20477" blocknr="20477" ref_count="0" is_valid="1" value_size="24"/>
<single_block type="btree_leaf" location="20478" blocknr="20478" ref_count="1" is_valid="1" value_size="24"/>
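Shortlisting candidates this way can also be scripted. A sketch that pulls the blocknr values out of the matching lines (the two sample lines in the here-document are the same ones found by the grep above):

```shell
# Extract blocknr attributes from thin_scan XML lines with 24-byte values,
# i.e. candidate device-details roots. Sample lines mirror the dump above.
grep 'value_size="24"' <<'EOF' | sed -n 's/.*blocknr="\([0-9]*\)".*/\1/p'
<single_block type="btree_leaf" location="20477" blocknr="20477" ref_count="0" is_valid="1" value_size="24"/>
<single_block type="btree_leaf" location="20478" blocknr="20478" ref_count="1" is_valid="1" value_size="24"/>
EOF
```

Against the real dump, the grep would read from /tmp/thin_scan_meta2.xml instead of a here-document.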
I used the 6235 and 20477 pair to start, which produced good metadata and far fewer orphans than before:
root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 --device-details-root=20477 --data-mapping-root=6235 -o /tmp/thin_ll_dump2.xml
root@pve1:/# ./pdata_tools thin_ll_dump /tmp/tmeta.bin --device-details-root=20478 --data-mapping-root=6235
<superblock blocknr="0" data_mapping_root="6235" device_details_root="20478">
<device dev_id="5">
<node blocknr="7563" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
</device>
<device dev_id="6">
<node blocknr="171" flags="1" key_begin="0" key_end="799665" nr_entries="51" value_size="8"/>
</device>
<device dev_id="7">
<node blocknr="20413" flags="1" key_begin="0" key_end="1064487" nr_entries="68" value_size="8"/>
</device>
<device dev_id="8">
<node blocknr="19658" flags="1" key_begin="0" key_end="920291" nr_entries="17" value_size="8"/>
</device>
</superblock>
<orphans>
<node blocknr="564" flags="2" key_begin="0" key_end="0" nr_entries="0" value_size="8"/>
<node blocknr="677" flags="1" key_begin="0" key_end="1848" nr_entries="23" value_size="8"/>
<node blocknr="2607" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
<node blocknr="20477" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>
<node blocknr="3020" flags="1" key_begin="370869" key_end="600885" nr_entries="161" value_size="8"/>
<node blocknr="20472" flags="2" key_begin="379123" key_end="379268" nr_entries="126" value_size="8"/>
<node blocknr="20476" flags="2" key_begin="379269" key_end="401330" nr_entries="127" value_size="8"/>
</orphans>
Armed with this modified XML file, and after making sure nothing was active and using the thin pool metadata, I was able to attempt a restore:
root@pve1:/# dmsetup remove pve-data-tpool
root@pve1:/# dmsetup remove pve-data_tdata
root@pve1:/# ./pdata_tools thin_ll_restore -i /tmp/thin_ll_dump_meta2_root_6235.xml -E /tmp/tmeta.bin -o /dev/mapper/pve-data_tmeta
Following the restore, my thin volumes ALL came back and I was able to activate every single volume.
I learned a lot about LVM thin pools in the process AND learned to be more careful with metadata space. ProxMox creates a very small metadata space by default; when deploying a new server, the pool metadata size should always be increased (or, at the very least, checked and monitored).
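LVM can also grow a thin pool on its own when dmeventd is monitoring it; the relevant knobs live in the activation section of lvm.conf. The setting names are real LVM options, but the values below are just illustrative:

```
# /etc/lvm/lvm.conf, activation section -- illustrative values.
# When pool usage crosses the threshold, LVM auto-extends the pool
# by the given percentage (requires the pool to be monitored).
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20
```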
Originally published at unxrlm;.