Wiki Agenda Contact Version française

Huffman with two queues

Huffman coding with two queues instead of a priority queue.


Authors: Jean-Christophe Filliâtre

Topics: Algorithms

Tools: Why3

see also the index (by topic, by tool, by reference, by year)


Huffman coding with two queues (instead of a priority queue)

Huffman coding is an algorithm to build a prefix code given a frequency for each symbol. See https://en.wikipedia.org/wiki/Huffman_coding

Huffman coding is typically implemented with a priority queue. We pick up the two trees with the smallest frequencies frequencies, we build a new tree with the sum of the frequencies, we put it back in the priority queue, until there is a single tree.

However, when the initial list of frequencies is sorted, we can implement the algorithm in a simpler (and more efficient way), using two (regular) queues instead of a priority queue. See http://www.staff.science.uu.nl/~leeuw112/huffman.pdf

In the following, we implement and prove the core of this algorithm, using a sorted list of integers as input and its sum as output (we do not build Huffman trees here). The key here is to prove that the two queues remain sorted.

Author: Jean-Christophe FilliĆ¢tre (CNRS) Thanks to JudicaĆ«l Courant for pointing this paper out.

use int.Int
use seq.Seq
use seq.SortedInt
use seq.Sum

function last (s: seq int) : int = s[length s - 1]

lemma add_last:
  forall s: seq int, x.
  length s > 0 -> sorted s -> last s <= x -> sorted (snoc s x)
lemma sorted_tail:
  forall s. sorted s -> length s >= 1 -> sorted s[1 .. ]
lemma sorted_tail_tail:
  forall s. sorted s -> length s >= 2 -> sorted s[2 .. ]

let huffman_coding (s: seq int) : int
  requires { length s > 0 }
  requires { sorted s }
  ensures  { result = sum s }
= let ref x = s in
  let ref y = empty in
  while length x + length y >= 2 do
    invariant { length x + length y > 0 }
    invariant { sum x + sum y = sum s }
    invariant { sorted x } invariant { sorted y }
    invariant { length x >= 2 -> length y >= 1 -> last y <= x[0] + x[1] }
    invariant { length x >= 1 -> length y >= 2 -> last y <= x[0] + y[0] }
    invariant {                  length y >= 2 -> last y <= y[0] + y[1] }
    variant   { length x + length y }
    if length y = 0 then begin
      y <- snoc y (x[0] + x[1]);
      x <- x[2 .. ]
    end else if length x = 0 then begin
      y <- snoc y[2 .. ] (y[0] + y[1]);
    end else begin (* both non-empty *)
      if x[0] <= y[0] then
        if length x >= 2 && x[1] <= y[0] then begin
          y <- snoc y (x[0] + x[1]);
          x <- x[2 .. ]
        end else begin
          y <- snoc y[1 .. ] (x[0] + y[0]);
          x <- x[1 .. ]
        end
      else
        if length y >= 2 && y[1] <= x[0] then begin
          y <- snoc y[2 .. ] (y[0] + y[1]);
        end else begin
          y <- snoc y[1 .. ] (x[0] + y[0]);
          x <- x[1 .. ]
        end
    end
  done;
  if length x > 0 then x[0] else y[0]

download ZIP archive

Why3 Proof Results for Project "huffman_with_two_queues"

Theory "huffman_with_two_queues.Top": fully verified

ObligationsAlt-Ergo 2.3.3CVC4 1.8Z3 4.11.2
add_last0.02------
sorted_tail0.070.47---
sorted_tail_tail0.060.58---
VC for huffman_coding---------
split_vc
loop invariant init0.010.06---
loop invariant init0.020.08---
loop invariant init0.010.06---
loop invariant init0.020.08---
loop invariant init0.010.06---
loop invariant init0.010.06---
loop invariant init0.010.06---
precondition0.010.08---
loop variant decrease0.030.13---
loop invariant preservation0.020.08---
loop invariant preservation0.010.10---
loop invariant preservation0.020.10---
loop invariant preservation0.020.09---
loop invariant preservation3.08------
loop invariant preservation0.010.06---
loop invariant preservation0.010.06---
precondition0.010.07---
loop variant decrease0.030.12---
loop invariant preservation0.020.10---
loop invariant preservation0.020.10---
loop invariant preservation0.010.06---
loop invariant preservation---------
assert (sorted y1[2..])
asserted formula---------
assert (length y1 >= 2)
asserted formula0.010.07---
asserted formula---------
split_all_full
asserted formula0.020.10---
loop invariant preservation0.24------
loop invariant preservation0.010.06---
loop invariant preservation0.010.06---
loop invariant preservation2.461.48---
precondition0.010.06---
loop variant decrease0.040.13---
loop invariant preservation0.020.10---
loop invariant preservation0.020.10---
loop invariant preservation0.020.10---
loop invariant preservation0.020.10---
loop invariant preservation3.13------
loop invariant preservation------0.31
loop invariant preservation0.420.32---
precondition0.010.09---
precondition0.010.09---
loop variant decrease0.090.14---
loop invariant preservation0.020.10---
loop invariant preservation---------
split_all_full
loop invariant preservation---------
inline_goal
loop invariant preservation0.030.11---
loop invariant preservation0.020.10---
loop invariant preservation---------
split_all_full
loop invariant preservation1.332.17---
loop invariant preservation0.94------
loop invariant preservation0.682.55---
loop invariant preservation2.652.88---
precondition0.010.06---
loop variant decrease0.040.12---
loop invariant preservation0.020.10---
loop invariant preservation---------
split_all_full
loop invariant preservation0.020.09---
loop invariant preservation0.010.06---
loop invariant preservation---------
split_all_full
loop invariant preservation0.30------
loop invariant preservation0.31------
loop invariant preservation1.601.47---
loop invariant preservation2.541.45---
precondition0.010.09---
precondition0.020.09---
loop variant decrease0.100.13---
loop invariant preservation0.020.09---
loop invariant preservation---------
split_all_full
loop invariant preservation0.020.09---
loop invariant preservation0.020.08---
loop invariant preservation---------
split_all_full
loop invariant preservation0.531.63---
loop invariant preservation2.33------
loop invariant preservation0.622.22---
loop invariant preservation1.19------
postcondition0.190.14---