VerifyThis @ FM 2012, problem 1

Longest Repeated Substring

Authors: Claude Marché / François Bobot

Topics: Array Data Structure / Ghost code / Sorting Algorithms

Tools: Why3

References: VerifyThis @ FM2012

see also the index (by topic, by tool, by reference, by year)

VerifyThis@fm2012 competition, Challenge 1: Longest Repeated Substring

The following is the original description of the verification task, reproduced verbatim. A reference implementation in Java (based on code by Robert Sedgewick and Kevin Wayne) was also given. The Why3 implementation below follows the Java implementation as faithfully as possible.

Longest Common Prefix (LCP) - 45 minutes

VERIFICATION TASK
-----------------

Longest Common Prefix (LCP) is an algorithm used for text querying. In
the following, we model text as an integer array. You may use other
representations (e.g., Java Strings), if your system supports them.

LCP can be implemented with the pseudocode given below. Formalize and
verify the following informal specification for LCP.

Input:  an integer array a, and two indices x and y into this array
Output: length of the longest common prefix of the subarrays of a
        starting at x and y respectively.

Pseudocode:

    int lcp(int[] a, int x, int y) {
        int l = 0;
        while (x+l<a.length && y+l<a.length && a[x+l]==a[y+l]) {
            l++;
        }
        return l;
    }


ADVANCED
========

BACKGROUND
----------

Together with a suffix array, LCP can be used to solve interesting text
problems, such as finding the longest repeated substring (LRS) in a text.

A suffix array (for a given text) is an array of all suffixes of the
text. For the text [7,8,8,6], the suffix array is
[[7,8,8,6],
   [8,8,6],
     [8,6],
       [6]]

Typically, the suffixes are not stored explicitly as above but
represented as pointers into the original text. The suffixes in a suffix
array  are sorted in lexicographical order. This way, occurrences of
repeated substrings in the original text are neighbors in the suffix
array.

For the above, example (assuming pointers are 0-based integers), the
sorted suffix array is:

[3,0,2,1]

VERIFICATION TASK
-----------------

The attached Java code contains an implementation of a suffix array
(SuffixArray.java), consisting essentially of a lexicographical
comparison on arrays, a sorting routine, and LCP.

The client code (LRS.java) uses these to solve the LRS problem. Verify
that it does so correctly.

First module: longest common prefix

it corresponds to the basic part of the challenge

module LCP

use export int.Int
use array.Array

predicate is_common_prefix (a:array int) (x y l:int) =
  0 <= l /\ x+l <= a.length /\ y+l <= a.length /\
  (forall i:int. 0 <= i < l -> a[x+i] = a[y+i])

is_common_prefix a x y l is true when the prefixes of length l at respective positions x and y in array a are identical. In other words, the array parts a[x..x+l-1] and a[y..y+l-1] are equal

lemma not_common_prefix_if_last_char_are_different:
  forall a:array int, x y:int, l:int.
    0 <= l /\ x+l < a.length /\ y+l < a.length /\ a[x+l] <> a[y+l] ->
      not is_common_prefix a x y (l+1)

This lemma helps for the proof of lcp (but is not mandatory) and is needed later (for le_trans)

predicate is_longest_common_prefix (a:array int) (x y l:int) =
  is_common_prefix a x y l /\
  forall m:int. l < m -> not (is_common_prefix a x y m)

is_longest_common_prefix a x y l is true when l is the maximal length such that prefixes at positions x and y in array a are identical.

This lemma helps to proving lcp (but again is not mandatory), and is needed for proving lcp_sym and le_trans lemmas, and the post-condition of compare function in the "absurd" case

lemma longest_common_prefix_succ:
  forall a:array int, x y l:int.
    is_common_prefix a x y l /\ not (is_common_prefix a x y (l+1)) ->
    is_longest_common_prefix a x y l

use export ref.Refint

let lcp (a:array int) (x y:int) : int
  requires { 0 <= x <= a.length }
  requires { 0 <= y <= a.length }
  ensures { is_longest_common_prefix a x y result }
  = let n = a.length in
    let l = ref 0 in
    while x + !l < n && y + !l < n && a[x + !l] = a[y + !l] do
      invariant { is_common_prefix a x y !l }
      variant { n - !l }
      incr l
    done;
    !l

the first function, that computes the longest common prefix

end

module LCP_test

test harness for lcp

use array.Array
use LCP

(*
let test () =
  let arr = Array.make 4 0 in
  arr[0]<-1; arr[1]<-2; arr[2]<-2; arr[3]<-5;
  let x = lcp arr 1 2 in
  assert { is_common_prefix arr 1 2 1};
  assert { not is_common_prefix arr 1 2 2};
  check { x = 1 };
  (* int[] brr = {1,2,3,5}; *)
  let brr = Array.make 4 0 in
  brr[0]<-1; brr[1]<-2; brr[2]<-3; brr[3]<-5;
  let x = lcp brr 1 2 in
  assert { is_common_prefix brr 1 2 0};
  assert { not is_common_prefix brr 1 2 1};
  check { x = 0 };
  (* int[] crr = {1,2,3,5}; *)
  let crr = Array.make 4 0 in
  crr[0]<-1; crr[1]<-2; crr[2]<-3; crr[3]<-5;
  let x = lcp crr 2 3 in
  assert { is_common_prefix crr 2 3 0};
  assert { not is_common_prefix crr 2 3 1};
  check { x = 0 };
  (* int[] drr = {1,2,3,3}; *)
  let drr = Array.make 4 0 in
  drr[0]<-1; drr[1]<-2; drr[2]<-3; drr[3]<-3;
  let x = lcp drr 2 3 in
  assert { is_common_prefix drr 2 3 1};
  assert { not is_common_prefix drr 2 3 2};
  check {x = 1}
*)

  exception BenchFailure

  let bench () raises { BenchFailure -> true } =
    let arr = Array.make 4 0 in
    arr[0]<-1; arr[1]<-2; arr[2]<-2; arr[3]<-5;
    let x = lcp arr 1 2 in
    if x <> 1 then raise BenchFailure;
    x

end

module SuffixSort

Second module: sorting suffixes

  use int.Int
  use ref.Refint
  use array.Array
  use LCP

  predicate lt (a : array int) (x y:int) =
     let n = a.length in
     0 <= x <= n /\ 0 <= y <= n /\
     exists l:int. LCP.is_common_prefix a x y l /\
     (y+l < n /\ (x+l = n \/ a[x+l] < a[y+l]))

order by prefix

let compare (a:array int) (x y:int) : int
  requires { 0 <= x <= a.length }
  requires { 0 <= y <= a.length }
  ensures { result = 0 -> x = y }
  ensures { result < 0 -> lt a x y }
  ensures { result > 0 -> lt a y x }
  =
  if x = y then 0 else
  let n = a.length in
  let l = LCP.lcp a x y in
  if x + l = n then -1 else
  if y + l = n then 1 else
  if a[x + l] < a[y + l] then -1 else
  if a[x + l] > a[y + l] then 1 else
  absurd

comparison function, that decides the order of prefixes

  use map.MapInjection as Inj

range a is true when for each i, a[i] is between 0 and a.length-1

  predicate range (a:array int) = Inj.range a.elts a.length

  use array.ArrayPermut

for the permut predicate (permutation of elements)

  predicate le (a : array int) (x y:int) = x = y \/ lt a x y

  (* needed for le_trans (for cases x = y /\ le y z and le x y /\ y = z) *)
  lemma lcp_same_index :
    forall a:array int, x:int.
      0 <= x <= a.length -> LCP.is_longest_common_prefix a x x (a.length - x)

  lemma le_trans : forall a:array int, x y z:int.
    le a x y /\ le a y z -> le a x z

  use map.Map as Map

spec of sorted in increasing prefix order

  predicate sorted_sub (a:array int) (data:Map.map int int) (l u:int) =
    forall i1 i2 : int. l <= i1 <= i2 < u ->
      le a (Map.get data i1) (Map.get data i2)

  predicate sorted (a : array int) (data:array int) =
    sorted_sub a data.elts 0 data.length

let sort (a:array int) (data : array int)
  requires { data.length = a.length }
  requires { range data }
  ensures { sorted a data }
  ensures { permut_all (old data) data }
  =
   for i = 0 to data.length - 1 do
     invariant { permut_all (old data) data   }
     invariant { sorted_sub a data.elts 0 i  }
     invariant { range data }
     let j = ref i in
     while !j > 0 && compare a data[!j-1] data[!j] > 0 do
       invariant { 0 <= !j <= i }
       invariant { range data }
       invariant { permut_all (old data) data   }
       invariant { sorted_sub a data.elts 0 !j }
       invariant { sorted_sub a data.elts !j (i+1) }
       invariant { forall k1 k2:int. 0 <= k1 < !j /\ !j+1 <= k2 <= i ->
                     le a data[k1] data[k2] }
       variant { !j }
       label L in
       let b = !j - 1 in
       let t = data[!j] in
       data[!j] <- data[b];
       data[b] <- t;
       assert { exchange (data at L) data (!j-1) !j };
       decr j
     done;
     (* needed to prove the invariant `sorted_sub a data.elts 0 i` *)
     assert { !j > 0 -> le a data[!j-1] data[!j] }
   done

end

module SuffixSort_test

  use int.Int
  use array.Array
  use LCP
  use SuffixSort

  exception BenchFailure

  let bench () raises { BenchFailure -> true } =
    let arr = Array.make 4 0 in
    (* [7,8,8,6] *)
    arr[0]<-7; arr[1]<-8; arr[2]<-8; arr[3]<-6;
    let suf = Array.make 4 0 in
    for i = 0 to 3 do
      invariant { forall j:int. 0 <= j < i -> suf[j] = j }
      suf[i] <- i
    done;
    SuffixSort.sort arr suf;
    (* should be [3,0,2,1] *)
    if suf[0] <> 3 then raise BenchFailure;
    if suf[1] <> 0 then raise BenchFailure;
    if suf[2] <> 2 then raise BenchFailure;
    if suf[3] <> 1 then raise BenchFailure;
    suf

end

Third module: Suffix Array Data Structure

module SuffixArray

use int.Int
use array.Array
use LCP
use SuffixSort
use map.Map as Map
use map.MapInjection as Inj

predicate permutation (m:Map.map int int) (u : int) =
  Inj.range m u /\ Inj.injective m u

type suffixArray = {
    values : array int;
    suffixes : array int;
}
invariant { values.length = suffixes.length /\
  permutation suffixes.elts suffixes.length /\
  SuffixSort.sorted values suffixes }
by { values = make 0 0; suffixes = make 0 0 }

let select (s:suffixArray) (i:int) : int
  requires { 0 <= i < s.values.length }
  ensures { result = s.suffixes[i] }
  = s.suffixes[i]

use array.ArrayPermut

lemma permut_permutation : forall a1 a2:array int.
  permut_all a1 a2 /\ permutation a1.elts a1.length ->
    permutation a2.elts a2.length

needed to establish invariant in function create

let create (a:array int) : suffixArray
  ensures { result.values = a }
= let n = a.length in
  let suf = Array.make n 0 in
  for i = 0 to n-1 do
    invariant { forall j:int. 0 <= j < i -> suf[j] = j }
    suf[i] <- i done;
  SuffixSort.sort a suf;
  { values = a; suffixes = suf }

constructor of suffixArray structure

let lcp (s:suffixArray) (i:int) : int
  requires { 0 < i < s.values.length }
  ensures { LCP.is_longest_common_prefix s.values
              s.suffixes[i-1] s.suffixes[i] result }
= LCP.lcp s.values s.suffixes[i] s.suffixes[i-1]

end

module SuffixArray_test

use int.Int
use array.Array
use SuffixArray

(*
let test () =
  let arr = Array.make 4 0 in
  arr[0]<-1; arr[1]<-2; arr[2]<-2; arr[3]<-5;
  let sa = create arr in
  assert { sa.suffixes[0] = 0 };
  assert { sa.suffixes[1] = 1 };
  assert { sa.suffixes[2] = 2 };
  assert { sa.suffixes[3] = 3 };
  let x = lcp sa 1 in
  check {x = 0};
  (* int[] brr = {1,2,3,5}; *)
  let brr = Array.make 4 0 in
  brr[0]<-1; brr[1]<-2; brr[2]<-3; brr[3]<-5;
  let sa = create brr in
  let x = lcp sa 1 in
  check {x = 0 (* TODO *)};
  (* int[] crr = {1,2,3,5}; *)
  let crr = Array.make 4 0 in
  crr[0]<-1; crr[1]<-2; crr[2]<-3; crr[3]<-5;
  let sa = create crr in
  let x = lcp sa 2 in
  check {x = 0 (* TODO *)};
  (* int[] drr = {1,2,3,3}; *)
  let drr = Array.make 4 0 in
  drr[0]<-1; drr[1]<-2; drr[2]<-3; drr[3]<-3;
  let sa = create drr in
  let x = lcp sa 2 in
  check {x = 0 (* TODO *)}
*)

  exception BenchFailure

  let bench () raises { BenchFailure -> true } =
    let arr = Array.make 4 0 in
    (* [7,8,8,6] *)
    arr[0]<-7; arr[1]<-8; arr[2]<-8; arr[3]<-6;
    let sa = create arr in
    if sa.suffixes[0] <> 3 then raise BenchFailure;
    if sa.suffixes[1] <> 0 then raise BenchFailure;
    if sa.suffixes[2] <> 2 then raise BenchFailure;
    if sa.suffixes[3] <> 1 then raise BenchFailure;
    sa

end

Fourth module: Longest Repeated Substring

module LRS

  use int.Int
  use ref.Ref
  use array.Array
  use map.MapInjection as Inj
  use LCP
  use SuffixSort
  use SuffixArray

additional properties relating le and longest_common_prefix

(* needed for `lrs` function, last before last assert *)
lemma lcp_sym :
  forall a:array int, x y l:int.
     0 <= x <= a.length /\ 0 <= y <= a.length ->
       LCP.is_longest_common_prefix a x y l ->
       LCP.is_longest_common_prefix a y x l

lemma le_le_common_prefix:
    forall a:array int, x y z l:int.
     SuffixSort.le a x y /\ SuffixSort.le a y z ->
     LCP.is_common_prefix a x z l -> LCP.is_common_prefix a y z l

allows CVC to prove the next lemma

  lemma le_le_longest_common_prefix:
    forall a:array int, x y z l m :int.
     SuffixSort.le a x y /\ SuffixSort.le a y z ->
     LCP.is_longest_common_prefix a x z l /\
     LCP.is_longest_common_prefix a y z m
       -> l <= m

proved by Alt-Ergo and CVC. But only by Alt-Ergo if previous lemma is removed

  val solStart : ref int
  val solLength : ref int
  val ghost solStart2 : ref int

  let lrs (a:array int) : unit
    requires { a.length > 0 }
    ensures { 0 <= !solLength <= a.length }
    ensures { 0 <= !solStart <= a.length }
    ensures { 0 <= !solStart2 <= a.length /\ !solStart <> !solStart2 /\
              LCP.is_longest_common_prefix a !solStart !solStart2 !solLength }
    ensures { forall x y l:int.
                0 <= x < y < a.length /\
                LCP.is_longest_common_prefix a x y l ->
                !solLength >= l }
  =
    let sa = SuffixArray.create a in
    solStart := 0;
    solLength := 0;
    solStart2 := a.length;
    for i=1 to a.length - 1 do
      invariant { 0 <= !solLength <= a.length }
      invariant { 0 <= !solStart <= a.length }
      invariant {
           0 <= !solStart2 <= a.length /\ !solStart <> !solStart2 /\
           LCP.is_longest_common_prefix a !solStart !solStart2 !solLength }
      invariant { forall j k l:int.
                  0 <= j < k < i /\
                  LCP.is_longest_common_prefix a
                    sa.SuffixArray.suffixes[j] sa.SuffixArray.suffixes[k] l ->
                  !solLength >= l }
      let l = SuffixArray.lcp sa i in
      if l > !solLength then begin
         solStart := SuffixArray.select sa i;
         solStart2 := SuffixArray.select sa (i-1);
         solLength := l
      end
    done;

the following assert needs lcp_sym lemma

    assert { forall j k l:int.
                0 <= j < a.length /\ 0 <= k < a.length /\ j <> k /\
                 LCP.is_longest_common_prefix a
                 sa.SuffixArray.suffixes[j] sa.SuffixArray.suffixes[k] l ->
              !solLength >= l};
    (* we state explicitly that sa.suffixes is surjective *)
    assert { forall x y:int.
                0 <= x < y < a.length ->
                exists j k : int.
                  0 <= j < a.length /\ 0 <= k < a.length /\ j <> k /\
                  x = sa.SuffixArray.suffixes[j] /\
                  y = sa.SuffixArray.suffixes[k] };
    ()

end

module LRS_test

  use int.Int
  use array.Array
  use ref.Ref
  use LRS

  exception BenchFailure

  let bench () raises { BenchFailure -> true } =
    let arr = Array.make 4 0 in
    (* [7,8,8,6] *)
    arr[0]<-7; arr[1]<-8; arr[2]<-8; arr[3]<-6;
    lrs arr;
    if !solStart <> 1 then raise BenchFailure;
    if !solLength <> 1 then raise BenchFailure;
    !solStart, !solLength

end

Why3 Proof Results for Project "verifythis_fm2012_LRS"

Theory "verifythis_fm2012_LRS.LCP": fully verified

Obligations		Alt-Ergo 2.0.0	CVC4 1.5	Z3 4.11.2
not_common_prefix_if_last_char_are_different		---	---	0.01
longest_common_prefix_succ		0.00	---	---
VC for lcp		---	---	---
split_goal_right
	loop invariant init	---	0.01	---
	index in array bounds	---	0.01	---
	index in array bounds	---	0.01	---
	loop variant decrease	---	0.02	---
	loop invariant preservation	---	---	0.88
	postcondition	---	---	0.01

Theory "verifythis_fm2012_LRS.LCP_test": fully verified

Obligations	CVC4 1.5
VC for bench	0.02

Theory "verifythis_fm2012_LRS.SuffixSort": fully verified

Obligations		Alt-Ergo 2.0.0	CVC4 1.5
VC for compare		---	---
split_goal_right
	precondition	---	0.01
	precondition	---	0.01
	index in array bounds	---	0.01
	index in array bounds	---	0.01
	index in array bounds	---	0.02
	index in array bounds	---	0.01
	unreachable point	0.01	---
	postcondition	---	0.00
	postcondition	---	0.02
	postcondition	0.01	---
lcp_same_index		---	0.04
le_trans		0.08	---
VC for sort		---	---
split_goal_right
	loop invariant init	---	0.04
	loop invariant init	---	0.04
	loop invariant init	---	0.04
	loop invariant init	---	0.02
	loop invariant init	---	0.02
	loop invariant init	---	0.03
	loop invariant init	---	0.03
	loop invariant init	---	0.07
	loop invariant init	---	0.03
	index in array bounds	---	0.05
	index in array bounds	---	0.07
	precondition	---	0.07
	precondition	---	0.08
	index in array bounds	---	0.04
	index in array bounds	---	0.05
	index in array bounds	---	0.04
	index in array bounds	---	0.05
	assertion	0.01	---
	loop variant decrease	---	0.02
	loop invariant preservation	---	0.06
	loop invariant preservation	0.06	---
	loop invariant preservation	---	0.07
	loop invariant preservation	---	0.20
	loop invariant preservation	0.68	---
	loop invariant preservation	0.56	---
	assertion	---	0.05
	loop invariant preservation	---	0.03
	loop invariant preservation	0.13	---
	loop invariant preservation	---	0.03
	postcondition	---	0.04
	postcondition	---	0.04
	out of loop bounds	---	0.04

Theory "verifythis_fm2012_LRS.SuffixSort_test": fully verified

Obligations	CVC4 1.5
VC for bench	0.34

Theory "verifythis_fm2012_LRS.SuffixArray": fully verified

Obligations		Alt-Ergo 2.0.0	CVC4 1.5	Z3 4.11.2
VC for suffixArray		---	0.04	---
VC for select		---	0.04	---
permut_permutation		---	---	0.03
VC for create		---	---	---
split_goal_right
	array creation size	---	0.03	---
	loop invariant init	---	0.02	---
	index in array bounds	---	0.03	---
	loop invariant preservation	0.02	---	---
	precondition	---	0.05	---
	precondition	---	0.06	---
	precondition	---	---	0.04
	postcondition	---	0.02	---
	out of loop bounds	---	0.04	---
VC for lcp		---	---	---
split_goal_right
	index in array bounds	---	0.05	---
	index in array bounds	---	0.04	---
	precondition	---	0.08	---
	precondition	---	0.07	---
	postcondition	0.02	---	---

Theory "verifythis_fm2012_LRS.SuffixArray_test": fully verified

Obligations	CVC4 1.5
VC for bench	0.05

Theory "verifythis_fm2012_LRS.LRS": fully verified

Obligations							Alt-Ergo 2.0.0	CVC4 1.5	CVC4 1.6
lcp_sym							0.01	---	---
le_le_common_prefix							0.05	---	---
le_le_longest_common_prefix							---	0.04	---
VC for lrs							---	---	---
split_goal_right
	loop invariant init						---	0.03	---
	loop invariant init						---	0.02	---
	loop invariant init						---	0.04	---
	loop invariant init						---	0.03	---
	precondition						---	0.04	---
	precondition						---	0.04	---
	precondition						---	0.04	---
	loop invariant preservation						---	0.08	---
	loop invariant preservation						---	0.08	---
	loop invariant preservation						---	0.08	---
	loop invariant preservation						---	0.34	---
	loop invariant preservation						---	0.03	---
	loop invariant preservation						---	0.03	---
	loop invariant preservation						---	0.04	---
	loop invariant preservation						---	0.36	---
	assertion						---	0.08	---
	assertion						---	---	---
	introduce_premises
		assertion					---	---	---
		assert (surjective (suffixes sa).elts a.length)
			asserted formula				---	0.06	---
			assertion				---	---	---
			unfold surjective in h
				assertion			---	---	---
				instantiate h x
					assertion		---	---	---
					instantiate h y
						assertion	---	0.06	---
	postcondition						---	0.04	---
	postcondition						---	0.04	---
	postcondition						---	0.04	---
	postcondition						---	---	---
	introduce_premises
		postcondition					---	---	---
		instantiate Assert x,y
			postcondition				---	---	0.06
	out of loop bounds						---	0.03	---

Theory "verifythis_fm2012_LRS.LRS_test": fully verified

Obligations	CVC4 1.5
VC for bench	0.04

Toccata

Formally Verified Programs, Certified Tools and Numerical Computations