Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Size Discrepancy in the ‘du’ Command

June 21st, 2012 by Multimedia Mike

I had a problem today while using the common Unix command ‘du’. As a refresher, ‘du’ stands for disk usage and is a handy tool for understanding how much disk space is being occupied.

I think ‘du’ itself is probably doing the right thing. The problem is that I’m getting strange numbers (read: half the expected value) when running the tool against directories on vmhgfs, the VMware filesystem.

Science Project
On an Ubuntu Linux VMware session, my home directory is on the main file system, which is ext4. The directory /mnt/hgfs is reported by ‘mount’ to be of type vmhgfs and is shared with the host machine.

Create a directory in the home directory and generate a 10 MiB file:

mkdir /home/melanson/dir
dd if=/dev/urandom of=/home/melanson/dir/random-file bs=1048576 count=10

Create a directory on the shared drive and copy the same file:

mkdir /mnt/hgfs/vmshare/dir
cp /home/melanson/dir/random-file /mnt/hgfs/vmshare/dir

Run ‘du’ on each directory using the -k and -h options:

du -k /home/melanson/dir /mnt/hgfs/vmshare/dir
10244   /home/melanson/dir
5120    /mnt/hgfs/vmshare/dir

du -h /home/melanson/dir /mnt/hgfs/vmshare/dir
11M    /home/melanson/dir
5.0M   /mnt/hgfs/vmshare/dir

I noticed this discrepancy when I was trying to pack a set of files (akin to ‘tar’-ing) living in a directory in the shared location. I was going mad trying to understand why the original directory was only 2 MB as reported by ‘du’ but the final packed file was 4 MB.

To be fair, the man page for ‘du’ succinctly states that the tool’s purpose is merely to “estimate file space usage”.


5 Responses

  1. Kostya Says:

    I had the same fun du’ing files on a mounted non-physical filesystem (fuseiso, to be precise), and it also reported half or a quarter of the real file size. I guess it might have something to do with sector size (512 bytes vs. the 4096-byte default for ext[23]).

  2. Z.T. Says:

    What does stat say on files in the vmware shared dir?

  3. Multimedia Mike Says:

    @Z.T.: Good question:

    $ stat dir/random-file
    File: `dir/random-file'
    Size: 10485760 Blocks: 10240 IO Block: 1024 regular file

    $ du -h dir/
    5.0M dir/
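
    Those numbers line up if each of the 10240 reported blocks is counted as 512 bytes, which is the unit the block count is expressed in on Linux. A rough sketch of the arithmetic (the helper name blocks_to_bytes is made up for illustration; the two values are copied from the stat output above):

    ```shell
    # Hypothetical helper: convert a count of 512-byte blocks into bytes.
    blocks_to_bytes() {
        echo $(( $1 * 512 ))
    }

    size=10485760    # "Size:" field from stat
    blocks=10240     # "Blocks:" field from stat

    allocated=$(blocks_to_bytes "$blocks")
    echo "allocated: $allocated bytes ($(( allocated / 1048576 )) MiB)"
    echo "apparent:  $size bytes ($(( size / 1048576 )) MiB)"

    # Block count a fully allocated file of this size would normally report.
    echo "expected blocks: $(( size / 512 ))"
    ```

    That gives 5 MiB allocated against 10 MiB apparent, and 20480 expected blocks versus the 10240 reported. It looks as though vmhgfs may be counting blocks in 1024-byte units (matching the "IO Block: 1024" field), though that is a guess on my part.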

  4. Coderjoe Says:

    And Kostya gets it in one. du estimates the size of files using the allocated block count times what it thinks the block size is. I don’t know anything about the vmware host-guest shared filesystem, but I suspect it uses a different block size than du expects, reports a different size than it actually uses, or uses several different sizes.

    The size that du would report would also be incorrect if you were dealing with a sparse file, as the file would be using fewer disk blocks than it supposedly does.

    What happens when you ask du for --apparent-size? Or perhaps --bytes?
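
    Both points can be tried on a throwaway sparse file. A minimal sketch, assuming GNU coreutils (truncate, and a du that accepts --apparent-size); the temporary path and variable names are only for illustration:

    ```shell
    # Assumes GNU coreutils; the temporary directory is a scratch location.
    tmpdir=$(mktemp -d)
    sparse="$tmpdir/sparse-file"

    # Create a 10 MiB file without writing any data (sparse where supported).
    truncate -s 10485760 "$sparse"

    # du prints "<size><TAB><path>"; keep only the size column.
    apparent_kib=$(du -k --apparent-size "$sparse" | cut -f1)
    allocated_kib=$(du -k "$sparse" | cut -f1)

    echo "apparent:  ${apparent_kib} KiB"
    echo "allocated: ${allocated_kib} KiB"

    # Clean up the scratch directory.
    rm -rf "$tmpdir"
    ```

    On a filesystem that supports sparse files, the allocated figure should come out well below the apparent one. Running the same two du commands against the vmhgfs directory would show whether only the block-based estimate is off.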

  5. SR Says:

    Sounds to me like it’s clearly the vmhgfs driver that’s at fault.