Friday 23 October 2020

dd is a pretentious fraud, don't trust it except for disks, files, and tape

Especially busybox dd.

[Note: specifying ibs and obs separately seems to work as expected (except with busybox dd). Using bs, which sets ibs and obs to the same value and overrides both, is not a good idea.]

I used to feel guilty writing to the tape drive with cat: cat /tmp/backup.tar.gz > /dev/rct0

What dd was good at (I had somehow learned by osmosis) was reading and writing full-sized blocks where the device driver can't do it so well.

What nonsense.

dd doesn't do "impedance matching" of block sizes, it has barely any regard for them.

This all of a sudden mattered when I was using a shell to set EFI variables through the efivarfs file system:

printf "\x07\x00\x00\x00%s" "$var-value" > "my-var-${owner-UUID}"

That works fine, but this was a problem:

some-process | cat /dev/fd/2 2<<<$'\x07\x00\x00\x00' - > "my-var-${owner-UUID}"

Why? Because the data written to the variable pseudo-file has to be written all in one go, in a single write, and that might not happen. As this example shows, the output is written in two parts:

( echo hello ; sleep 1 ; echo goodbye ) | cat /dev/fd/2 2<<<$'\x07\x00\x00\x00' - | cat | cat

I supposed that this would be ideal for dd. I set the block size to 4096, which is the maximum write to the efi-var pseudo-files anyway, giving:

( echo hello ; sleep 1 ; echo goodbye ) | \
cat /dev/fd/2 2<<<$'\x07\x00\x00\x00' - | dd bs=4096 of="my-var-${owner-UUID}"

But it didn't do the trick: strace showed multiple writes from dd, as we can also see here:

( echo hello ; sleep 1 ; echo goodbye ) | cat /dev/fd/2 2<<<$'\x07\x00\x00\x00' - | dd bs=4096 | cat | cat

It turns out that dd doesn't care if it does a partial read from a pipe. It takes the block size as a maximum, a hint rather than a requirement, and writes out whatever each read happened to return.
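You don't need strace to see this: dd's own record counts give it away. A minimal sketch, where slow_source is a made-up stand-in for the bursty producer above (the counts are in the "full+partial" form dd prints on stderr):

```shell
#!/bin/sh
# slow_source is a hypothetical producer that emits its data in two bursts.
slow_source() {
    printf 'hello\n'
    sleep 1
    printf 'goodbye\n'
}

# dd reports its statistics on stderr: keep those, discard the copied data.
stats=$(slow_source | LC_ALL=C dd bs=4096 2>&1 >/dev/null)
records_in=$(printf '%s\n' "$stats" | grep 'records in')

# "0+2" means no full 4096-byte blocks and two partial ones:
# each short read from the pipe became its own record.
printf '%s\n' "$records_in"
```

Two partial records in means two separate writes out, which is exactly what the efivarfs file won't tolerate.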

So... dd is ostensibly good at impedance matching of block sizes, so let's set an input block size of 1 and an output block size of 4096 and let it accumulate the input blocks for output:

( echo hello ; sleep 1 ; echo goodbye ) | cat /dev/fd/2 2<<<$'\x07\x00\x00\x00' - | dd bs=1 obs=4096 | cat | cat

exactly the same result! dd just doesn't care!
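The note at the top of this post applies here: POSIX says bs sets both block sizes and supersedes ibs and obs, so the command above never really asked for 4096-byte output blocks. A sketch of the ibs/obs variant, assuming GNU dd (busybox dd is the exception mentioned in the note); slow_source is again a made-up stand-in for the producer:

```shell
#!/bin/sh
# slow_source is a hypothetical producer that emits its data in two bursts.
slow_source() {
    printf 'hello\n'
    sleep 1
    printf 'goodbye\n'
}

# Assumption: GNU dd. With ibs and obs given separately (and no bs),
# input is read 1 byte at a time and buffered until obs bytes or EOF.
stats=$(slow_source | LC_ALL=C dd ibs=1 obs=4096 2>&1 >/dev/null)
records_out=$(printf '%s\n' "$stats" | grep 'records out')

# A single partial output record means a single write of all 14 bytes.
printf '%s\n' "$records_out"
```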

GNU has the life-saving non-POSIX iflag=fullblock, which actually reads a full input block (unless it hits EOF first).
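A quick sketch of the difference it makes, assuming GNU dd (the flag is not in POSIX, so don't count on it elsewhere); slow_source is a made-up stand-in for the bursty producer used earlier:

```shell
#!/bin/sh
# slow_source is a hypothetical producer that emits its data in two bursts.
slow_source() {
    printf 'hello\n'
    sleep 1
    printf 'goodbye\n'
}

# iflag=fullblock (GNU extension) makes dd keep reading until it has
# a whole 4096-byte block or reaches EOF, instead of settling for a short read.
stats=$(slow_source | LC_ALL=C dd bs=4096 iflag=fullblock 2>&1 >/dev/null)
records_in=$(printf '%s\n' "$stats" | grep 'records in')

# Both bursts now arrive as one (partial) record, so there is one write.
printf '%s\n' "$records_in"
```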

Without which, what is the point of dd? Well yes, we know it has incidental features such as very limited character set conversion, along with skip and seek. And unlike head, tail, etc., it won't read more input than it intends to use, which makes it very convenient for reading some of stdin from a shell script.
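That last trick looks something like this. A minimal sketch: the 4-byte "HEAD" prefix is an invented format purely for illustration, and bs=1 is used so that a short read can't get in the way:

```shell
#!/bin/sh
# The "HEAD" prefix and the two body lines are made-up sample data.
result=$(
    printf 'HEADline one\nline two\n' | {
        # bs=1 count=4 issues four 1-byte reads, so exactly 4 bytes are consumed.
        header=$(dd bs=1 count=4 2>/dev/null)
        # Everything dd left in the pipe is still there for the next reader.
        body=$(cat)
        printf 'header=%s\nbody=%s\n' "$header" "$body"
    }
)
printf '%s\n' "$result"
```

head -c 4 could pull a bigger chunk off the pipe than it prints, leaving cat with less than it should get; dd asks for only what it was told to.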

But its main purpose is unmet, and that carries real risk in some circles. Want to generate a random password for openssl?

password=$(dd if=/dev/random bs=32 count=1 | base64)

Do you see the danger yet? If the read from /dev/random comes up short for want of randomness, dd will return a partial block, and count=1 means it stops right there!

Simulate it thus: try to read 12 random bytes and see how many we might get:

( printf "a" ; sleep 1 ; printf "bc" ; sleep 1 ; printf "def" ; sleep 1 ; printf "ghij" ; sleep 1 ; printf "klmno" ) | dd bs=12 count=1 | wc -c

We get one character instead of 12 (or instead of 256, or however many you expected).

The fix is to swap bs and count, so that dd keeps reading 1 byte at a time until it has enough:

( printf "a" ; sleep 1 ; printf "bc" ; sleep 1 ; printf "def" ; sleep 1 ; printf "ghij" ; sleep 1 ; printf "klmno" ) | dd bs=1 count=12 | wc -c
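Applied to the password example, that gives something like the following. A sketch under assumptions: it reads /dev/urandom rather than /dev/random so that it never stalls, and it assumes a base64 that accepts -d for decoding (GNU coreutils does):

```shell
#!/bin/sh
# bs=1 count=32: a 1-byte read either succeeds completely or hits EOF,
# so a short read can no longer silently shrink the result.
password=$(dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64)

# Sanity check: decode the password again and count the bytes.
decoded_len=$(printf '%s' "$password" | base64 -d | wc -c)
printf 'decoded length: %s bytes\n' "$decoded_len"
```

It costs 32 read calls instead of one, which nobody will ever notice.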

dd, you are a pretentious ass, without the GNU extension you cannot read lots of small blocks and write one big block, and even on a non-terminating stream with more data becoming available, you will read less than block size multiplied by count, and think yourself smart!

Except with disks, files, and tape, where the driver manages block size and partial reads won't happen, you can't be trusted to get your most basic task right.

Many interesting remarks at:

  • https://unix.stackexchange.com/questions/121865/create-random-data-with-dd-and-get-partial-read-warning-is-the-data-after-the
  • https://unix.stackexchange.com/questions/17295/when-is-dd-suitable-for-copying-data-or-when-are-read-and-write-partial
  • https://superuser.com/questions/520601/why-does-dd-only-copy-128-bytes-from-dev-random-when-i-request-more
