Class GuanoAlgorithm

java.lang.Object
com.topologi.diffx.algorithm.GuanoAlgorithm
All Implemented Interfaces:
DiffXAlgorithm

public final class GuanoAlgorithm extends Object implements DiffXAlgorithm
A matrix-based algorithm using weighted events which produces correct results, but may require minor adjustments during formatting.

Implementation note: this algorithm effectively detects the correct changes in the sequences, but will not necessarily return events that can be serialised as well-formed XML as they stand.

Known problem in this implementation: elements that contain themselves tend to generate events that are harder to serialise as XML.

This class is said 'fit' because it will adapt the matrix to the sequences that it is being given in order to improve performance.

Note: The name of this class comes from a contracted version of the features of this algorithm, as explained below:

  • Weighted, each token is has a given weight;
  • Symmetrical, when possible, the algorithm will try to choose a path that is symmetrical in regards to the arrangement of the tokens;
  • Matrix, this class uses a matrix for its internal representation;
Version:
11 May 2010
  • Field Details

    • DEBUG

      private static final boolean DEBUG
      Set to true to show debug info.
      See Also:
    • sequence1

      private final EventSequence sequence1
      The first sequence of events to test.
    • sequence2

      private final EventSequence sequence2
      The second sequence of events to test.
    • length1

      private final int length1
      Length of the first sequence to compare.
    • length2

      private final int length2
      Length of the second sequence to compare.
    • matrix

      private transient Matrix matrix
      Matrix storing the paths.
    • estate

      private transient ElementState estate
      The state of the elements.
    • length

      private transient int length
      The length of the LCS.
  • Constructor Details

    • GuanoAlgorithm

      public GuanoAlgorithm(EventSequence seq0, EventSequence seq1)
      Creates a new DiffXAlgorithmBase.
      Parameters:
      seq0 - The first sequence to compare.
      seq1 - The second sequence to compare.
  • Method Details

    • length

      public int length()
      Returns the length of the longest common sequence.
      Specified by:
      length in interface DiffXAlgorithm
      Returns:
      the length of the longest common sequence.
    • process

      public void process(DiffXFormatter formatter) throws IOException
      Writes the diff sequence using the specified formatter.
      Specified by:
      process in interface DiffXAlgorithm
      Parameters:
      formatter - The formatter that will handle the output.
      Throws:
      IOException - If thrown by the formatter.
    • getFirstSequence

      public final EventSequence getFirstSequence()
      Description copied from interface: DiffXAlgorithm
      Returns the first sequence used for the diff-x comparison.
      Specified by:
      getFirstSequence in interface DiffXAlgorithm
      Returns:
      the first sequence used for the diff-x comparison.
      See Also:
    • getSecondSequence

      public final EventSequence getSecondSequence()
      Description copied from interface: DiffXAlgorithm
      Returns the second sequence used for the diff-x comparison.
      Specified by:
      getSecondSequence in interface DiffXAlgorithm
      Returns:
      the second sequence used for the diff-x comparison.
      See Also:
    • processEmpty

      private void processEmpty(DiffXFormatter formatter) throws IOException
      Writes the diff sequence using the specified formatter when one of the sequences is empty.

      The result becomes either only insertions (when the second sequence is empty) or deletions (when the first sequence is empty).

      Parameters:
      formatter - The formatter that will handle the output.
      Throws:
      IOException - If thrown by the formatter.
    • setupMatrix

      private static Matrix setupMatrix(EventSequence s1, EventSequence s2)
      Determines the most appropriate matrix to use.

      Calculates the maximum length of the shortest weighted path if both sequences are totally different, which corresponds to the sum of all the events.

      Parameters:
      s1 - The first sequence.
      s2 - The second sequence.
      Returns:
      The most appropriate matrix.
    • printLost

      private void printLost(int i, int j)
      Print information when the algorithm gets lost in the matrix, ie when it does not know which direction to follow.
      Parameters:
      i - The X position.
      j - The Y position.