Class DiffXKumarRangan

java.lang.Object
com.topologi.diffx.algorithm.DiffXAlgorithmBase
com.topologi.diffx.algorithm.DiffXKumarRangan
All Implemented Interfaces:
DiffXAlgorithm

public final class DiffXKumarRangan extends DiffXAlgorithmBase
Performs the diff comparison using an optimized version of the linear space algorithm of S.Kiran Kumar and C.Pandu Rangan.

Implementation note: this algorithm effectively detects the correct changes in the sequences, but suffers from two main problems:

  • When the events are formatted directly from reading the matrix, the XML is not necessarily well-formed, this occurs mostly when some elements are swapped, because the closing tags will not necessarily reported in an order that allows the XML to be well-formed.
    Using the SmartXMLFormatter helps in solving the problem as it maintains a stack of the elements that are being written and actually ignores the name of the closing element, so all the elements are closed properly.

For S. Kiran Kumar and C. Pandu Rangan. A linear space algorithm for the LCS problem, Acta Informatica. Volume 24 , Issue 3 (June 1987); Copyright Springer-Verlag 1987

This class reuses portions of code originally written by Mikko Koivisto and Tuomo Saarni.

http://dblp.uni-trier.de/rec/bibtex/journals/acta/KumarR87

http://users.utu.fi/~tuiisa/Java/

Version:
9 March 2005
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final boolean
    Set to true to show debug info.
    The formatter to use for the write diff function.
    private int
    A counter for the index of the second sequence when generating the diff.
    private int
    The length of the LCS.
    private int[]
     
    private int[]
     
    private int[]
     
    private int
    Global integer variable needed in the computation of the LCS.
    private int[]
     
    private int[]
     
    private int
    Global integer variable needed in the computation of the LCS.

    Fields inherited from class com.topologi.diffx.algorithm.DiffXAlgorithmBase

    length1, length2, sequence1, sequence2
  • Constructor Summary

    Constructors
    Constructor
    Description
    Creates a new DiffXAlgorithmBase.
  • Method Summary

    Modifier and Type
    Method
    Description
    private int
    Calculates the LCS length.
    private int[]
    calMid(int start1, int end1, int start2, int end2, int m, int n, int sign, int waste)
    Uses integer arrays to keep track where the longest common subsequence that can be found.
    private static void
    copyUpTo(int[] a, int[] b, int len)
    Copies the first array into the second one up to the specified index (included).
    private void
    fillOne(int start1, int end1, int start2, int end2, int m, int n, int sign)
    This is used to find the index from where the longest common subsequence so far can be found.
    private void
    generateLCS(int start1, int end1, int start2, int end2, int m, int n, int lcs)
    Computes the LCS and returns it in character array.
    private void
    getLCSMinimumWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs)
    Computes the longest common subsequence for the specified boundaries when the waste is (strictly) less than 2 events.
    private void
    getLCSMoreWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs)
    Computes the longest common subsequence for the specified boundaries when the waste is more than 1 character.
    private void
    Initialises the state variables.
    int
    Calculates the length of LCS and returns it.
    private void
    printState(int f)
    Prints the state of this object, that is the values of all of the state variables to System.err.
    void
    Writes the diff sequence using the specified formatter.
    private void
    writeDeleted(int jSeq2)
    Write the deleted events to the formatter.

    Methods inherited from class com.topologi.diffx.algorithm.DiffXAlgorithmBase

    getFirstSequence, getSecondSequence

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEBUG

      private static final boolean DEBUG
      Set to true to show debug info.
      See Also:
    • R1

      private int[] R1
    • R2

      private int[] R2
    • LL

      private int[] LL
    • LL1

      private int[] LL1
    • LL2

      private int[] LL2
    • R

      private int R
      Global integer variable needed in the computation of the LCS.
    • S

      private int S
      Global integer variable needed in the computation of the LCS.
    • iSeq2

      private int iSeq2
      A counter for the index of the second sequence when generating the diff.
    • length

      private int length
      The length of the LCS.
    • df

      private DiffXFormatter df
      The formatter to use for the write diff function.
  • Constructor Details

    • DiffXKumarRangan

      public DiffXKumarRangan(EventSequence seq0, EventSequence seq1)
      Creates a new DiffXAlgorithmBase.
      Parameters:
      seq0 - The first sequence to compare.
      seq1 - The second sequence to compare.
  • Method Details

    • length

      public int length()
      Calculates the length of LCS and returns it.

      If the length is calculated already it'll not be calculated repeatedly.

      This algorithm starts from the length of the first sequence as the maximum possible LCS and reduces the length for every difference with the second sequence.

      The time complexity is O(n(m-p)) and the space complexity is O(n+m).

      Returns:
      The length of LCS
    • process

      public void process(DiffXFormatter formatter) throws IOException
      Writes the diff sequence using the specified formatter.
      Parameters:
      formatter - The formatter that will handle the output.
      Throws:
      IOException - If thrown by the formatter.
    • init

      private void init()
      Initialises the state variables.
    • calculateLength

      private int calculateLength()
      Calculates the LCS length.
      Returns:
      The LCS length.
    • fillOne

      private void fillOne(int start1, int end1, int start2, int end2, int m, int n, int sign)
      This is used to find the index from where the longest common subsequence so far can be found.

      The time complexity is O(n+m) and the space complexity is O(n).

      Parameters:
      start1 - The start index of the first sequence.
      end1 - The last index of the first sequence.
      start2 - The start index of the second sequence.
      end2 - The last index of the second sequence.
      m - The length of the first sequence.
      n - The length of the second sequence.
      sign - This is used to mark wether to start from the beginning of the string or from the end of the string.
    • calMid

      private int[] calMid(int start1, int end1, int start2, int end2, int m, int n, int sign, int waste)
      Uses integer arrays to keep track where the longest common subsequence that can be found.

      The time complexity is O(n(waste+1)) and the space complexity is O(n+m).

      Parameters:
      start1 - The start index of the first sequence.
      end1 - The last index of the first sequence.
      start2 - The start index of the second sequence.
      end2 - The last index of the second sequence.
      m - The length of the first sequence.
      n - The length of the second sequence.
      sign - This is used to mark wether to start from the beginning of the string or from the end of the string.
      waste - The length of characters not included in the LCS between indexes start1 and end1. Similarly between indexes start2 and end2.
      Returns:
      Integer array consisting of the ???.
    • generateLCS

      private void generateLCS(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException
      Computes the LCS and returns it in character array. The time complexity is O(n(m-p)) and the space complexity is O(n+m).
      Parameters:
      start1 - The start index of the first sequence.
      end1 - The last index of the first sequence.
      start2 - The start index of the first sequence.
      end2 - The last index of the first sequence.
      m - The length of the first sequence.
      n - The length of the second sequence.
      lcs - The length of LCS between indexes start1 and end1. Similarly between indexes b_start and b-loppu. (???)
      Throws:
      IOException - If thrown by the formatter
    • getLCSMinimumWaste

      private void getLCSMinimumWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException
      Computes the longest common subsequence for the specified boundaries when the waste is (strictly) less than 2 events.

      This method is iterative; NOT recursive.

      Parameters:
      start1 - The start 0-based index of the first sequence.
      end1 - The last 0-based index of the first sequence.
      start2 - The start 0-based index of the second sequence.
      end2 - The last 0-based index of the second sequence.
      m - The length of the first sequence.
      n - The length of the second sequence.
      lcs - The length of LCS between indexes start1 and end1.
      Throws:
      IOException - If thrown by the formatter.
    • getLCSMoreWaste

      private void getLCSMoreWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException
      Computes the longest common subsequence for the specified boundaries when the waste is more than 1 character.

      This method is recursive and will process ech subsequence with the LCS algorithm.

      Parameters:
      start1 - The start 0-based index of the first sequence.
      end1 - The last 0-based index of the first sequence.
      start2 - The start 0-based index of the second sequence.
      end2 - The last 0-based index of the second sequence.
      m - The length of the first sequence.
      n - The length of the second sequence.
      lcs - The length of LCS between indexes start1 and end1.
      Throws:
      IOException - If thrown by the formatter.
    • writeDeleted

      private void writeDeleted(int jSeq2) throws IOException
      Write the deleted events to the formatter.
      Parameters:
      jSeq2 - The index of the LL array for the next event of the second sequence.
      Throws:
      IOException - If thrown by the formatter.
    • printState

      private void printState(int f)
      Prints the state of this object, that is the values of all of the state variables to System.err. This is for debugging purposes only.
      Parameters:
      f - The flags
    • copyUpTo

      private static void copyUpTo(int[] a, int[] b, int len)
      Copies the first array into the second one up to the specified index (included).
      Parameters:
      a - The first array.
      b - The second array.
      len - The 0-based index of the last copied value.