Bzip2 vs. LZMA

Pursuant to some archiving projects I want to conduct, I wanted to evaluate Bzip2 vs. LZMA for compression. I know that the latter generally compresses more efficiently than the former while requiring more time on the compression side. But I wanted to know whether the extra encoding time was severe enough to outweigh the space saved. I also wanted to know how the relative decode speeds compare.

Methodology: For a number of large files that are each around 1.35 GB, measure the compression speed and ratio and then measure…

You know what? This is the most basic type of profiling experiment to set up and I really don’t feel like describing the process, the hardware used, the variables carefully controlled, or graphing the data. Here’s what I came up with in my tests:
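The post skips the setup details, but the shape of such a benchmark is simple enough to sketch. Here is a minimal illustration using Python's stdlib `bz2` and `lzma` modules (not the command-line tools presumably used for the actual tests, and with made-up sample data rather than the 1.35 GB files):

```python
import bz2
import lzma
import time

def benchmark(data: bytes) -> dict:
    """Time compression/decompression and record the ratio for both codecs."""
    results = {}
    for name, mod in (("bz2", bz2), ("lzma", lzma)):
        start = time.perf_counter()
        compressed = mod.compress(data)
        c_time = time.perf_counter() - start

        start = time.perf_counter()
        restored = mod.decompress(compressed)
        d_time = time.perf_counter() - start

        assert restored == data  # sanity check: lossless round trip
        results[name] = {
            "ratio": len(compressed) / len(data),
            "compress_s": c_time,
            "decompress_s": d_time,
        }
    return results

# Placeholder input; real tests would read the actual archive files.
sample = b"some highly repetitive sample data " * 10000
for codec, stats in benchmark(sample).items():
    print(codec, stats)
```

A serious version would also repeat each run several times and control for disk caching, which is exactly the tedium the post declines to describe.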

  • Bzip2 is 2-2.3x faster to compress than LZMA.
  • The Bzip2 files were 15-20% larger than the LZMA files.
  • The LZMA files decompressed in nearly half the time of the Bzip2 files.

Conclusion: I’ll be going with LZMA for my long-term archival projects.

11 thoughts on “Bzip2 vs. LZMA”

  1. nine

    Some quick testing of my own with a tar of the x264 repository. I did multiple runs, of course, but was lazy beyond that too. bzip2 decompression was faster here, and the differences in efficiency are negligible. I suppose your files are ISOs, given recent posts?

    -rw-r--r-- 1 nine nine 5140480 2010-01-20 15:36 x264.tar (100% size)
    -rw-r--r-- 1 nine nine 3015644 2010-01-20 15:36 x264.tar.bz2 (59%)
    -rw-r--r-- 1 nine nine 2874130 2010-01-20 15:36 x264.tar.lzma (56%)

    bzip2 x264.tar 1.16s user 0.01s system 99% cpu 1.182 total (1x time)
    lzma x264.tar 3.14s user 0.10s system 98% cpu 3.278 total (2.77x)

    bunzip2 x264.tar.bz2 0.37s user 0.02s system 98% cpu 0.391 total (1x time)
    unlzma x264.tar.lzma 0.30s user 0.02s system 99% cpu 0.324 total (0.81x)

  2. Short Circuit

    I once did some benchmarks running gzip, bzip2, lzma and rzip against my SQL backups from Rosetta Code. As I recall, rzip blew the pants off the other three, but for the life of me I can’t find anywhere I wrote about it at the time.

  3. Z.T.

    “bzip2 decompression was faster here”

    “bunzip2” … “0.37s”
    “unlzma” … “0.30s” … “0.81x”

    Copy paste error or analysis error?

  4. Ramiro

    Interesting… I set out a year or so ago to find the best compression for bundling builds. It was back when not many distros had lzma, and its interface to tar had just changed from one version to another, which was kind of annoying.
    I found out 7zip gave the best compression. It was always better (both size-wise and speed-wise) than using lzma standalone or through tar.

  5. Multimedia Mike Post author

    @Falk: Says who?

    @pJok: No, haven’t tried it. I notice that their website is suspiciously devoid of comparisons to compression types that people actually use (save for zlib).

  6. Owen

    @Mike: Says the developer ;). The LZMA tools are in maintenance-only mode now, and the XZ tools can read the files anyway. XZ implements the LZMA2 algorithm, which (I suspect) has better compression.

  7. Denver Gingerich

    I’ve found that lzma -5 provides a reliable space/time tradeoff. On Ubuntu 9.10, I notice that sometimes lzma -6 through -9 give slightly larger compressed files than -5, which means the extra time used to compress at those levels is wasted. Furthermore, the compressed sizes from -5 are almost always within 1% of the compressed sizes from -9. Note that lzma’s default is -7.

    Another advantage of -5 is that it uses much less memory than -9 and a bit less than -7 (both for compressing and decompressing), which may be a concern for you if you’ll be extracting the archives on memory-constrained devices. See the “Memory requirements” section of http://tukaani.org/lzma/benchmarks.html for more details.

    If you want really fast compress times, then use lzma -1 or -2. I’ve found these give 20-30% larger files than lzma -9 (still 30% smaller than gzip at -9, though), but take 1/9th to 1/14th as much time to compress. lzma -9 is similar to -7 (the default) for compress time and compressed file size. Decompression time is similar for -1 through -9.

    I’ve done a fair bit of work with compression testing recently so if you have any questions, let me know.
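    The preset trade-off Denver describes can be poked at directly with Python's stdlib `lzma` module, whose `preset` argument corresponds roughly to the command-line `-0` through `-9` levels. A hedged sketch with made-up sample data (real results will depend heavily on the input, as his Ubuntu observations suggest):

    ```python
    import lzma

    # ~1 MB of mildly repetitive placeholder data; not representative
    # of any particular workload.
    data = bytes(range(256)) * 4000

    for preset in (1, 5, 9):
        compressed = lzma.compress(data, preset=preset)
        print(f"preset {preset}: {len(compressed)} bytes")
    ```

    Timing each call (e.g. with `time.perf_counter()`) would show the compress-time gap he mentions far more clearly than the size gap.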

  8. Pedant 10

    @ Owen: No, compression ratios between lzma and lzma2 are virtually indistinguishable. The differences are more in the data structure: better corruption detection and the like.
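    The container-vs-ratio distinction above can be seen with Python's stdlib `lzma` module, which writes both the legacy `.lzma` container (`FORMAT_ALONE`) and the `.xz` container (`FORMAT_XZ`, which adds integrity checks such as CRC64). A small sketch with placeholder data:

    ```python
    import lzma

    data = b"example payload " * 50000

    legacy = lzma.compress(data, format=lzma.FORMAT_ALONE)
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)

    # Sizes are typically within a few percent of each other; the xz
    # container just carries extra framing and checksum data.
    print(len(legacy), len(xz))
    assert lzma.decompress(legacy) == data
    assert lzma.decompress(xz) == data
    ```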

Comments are closed.