Class DiffXKumarRangan
- All Implemented Interfaces:
DiffXAlgorithm
Implementation note: this algorithm effectively detects the correct changes in the sequences, but suffers from two main problems:
- When the events are formatted directly from reading the matrix, the XML is not
necessarily well-formed, this occurs mostly when some elements are swapped, because
the closing tags will not necessarily reported in an order that allows the XML to
be well-formed.
Using theSmartXMLFormatterhelps in solving the problem as it maintains a stack of the elements that are being written and actually ignores the name of the closing element, so all the elements are closed properly.
For S. Kiran Kumar and C. Pandu Rangan. A linear space algorithm for the LCS problem, Acta Informatica. Volume 24 , Issue 3 (June 1987); Copyright Springer-Verlag 1987
This class reuses portions of code originally written by Mikko Koivisto and Tuomo Saarni.
- Version:
- 9 March 2005
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final booleanSet totrueto show debug info.private DiffXFormatterThe formatter to use for the write diff function.private intA counter for the index of the second sequence when generating the diff.private intThe length of the LCS.private int[]private int[]private int[]private intGlobal integer variable needed in the computation of the LCS.private int[]private int[]private intGlobal integer variable needed in the computation of the LCS.Fields inherited from class com.topologi.diffx.algorithm.DiffXAlgorithmBase
length1, length2, sequence1, sequence2 -
Constructor Summary
ConstructorsConstructorDescriptionDiffXKumarRangan(EventSequence seq0, EventSequence seq1) Creates a new DiffXAlgorithmBase. -
Method Summary
Modifier and TypeMethodDescriptionprivate intCalculates the LCS length.private int[]calMid(int start1, int end1, int start2, int end2, int m, int n, int sign, int waste) Uses integer arrays to keep track where the longest common subsequence that can be found.private static voidcopyUpTo(int[] a, int[] b, int len) Copies the first array into the second one up to the specified index (included).private voidfillOne(int start1, int end1, int start2, int end2, int m, int n, int sign) This is used to find the index from where the longest common subsequence so far can be found.private voidgenerateLCS(int start1, int end1, int start2, int end2, int m, int n, int lcs) Computes the LCS and returns it in character array.private voidgetLCSMinimumWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) Computes the longest common subsequence for the specified boundaries when the waste is (strictly) less than 2 events.private voidgetLCSMoreWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) Computes the longest common subsequence for the specified boundaries when the waste is more than 1 character.private voidinit()Initialises the state variables.intlength()Calculates the length of LCS and returns it.private voidprintState(int f) Prints the state of this object, that is the values of all of the state variables toSystem.err.voidprocess(DiffXFormatter formatter) Writes the diff sequence using the specified formatter.private voidwriteDeleted(int jSeq2) Write the deleted events to the formatter.Methods inherited from class com.topologi.diffx.algorithm.DiffXAlgorithmBase
getFirstSequence, getSecondSequence
-
Field Details
-
DEBUG
private static final boolean DEBUGSet totrueto show debug info.- See Also:
-
R1
private int[] R1 -
R2
private int[] R2 -
LL
private int[] LL -
LL1
private int[] LL1 -
LL2
private int[] LL2 -
R
private int RGlobal integer variable needed in the computation of the LCS. -
S
private int SGlobal integer variable needed in the computation of the LCS. -
iSeq2
private int iSeq2A counter for the index of the second sequence when generating the diff. -
length
private int lengthThe length of the LCS. -
df
The formatter to use for the write diff function.
-
-
Constructor Details
-
DiffXKumarRangan
Creates a new DiffXAlgorithmBase.- Parameters:
seq0- The first sequence to compare.seq1- The second sequence to compare.
-
-
Method Details
-
length
public int length()Calculates the length of LCS and returns it.If the length is calculated already it'll not be calculated repeatedly.
This algorithm starts from the length of the first sequence as the maximum possible LCS and reduces the length for every difference with the second sequence.
The time complexity is O(n(m-p)) and the space complexity is O(n+m).
- Returns:
- The length of LCS
-
process
Writes the diff sequence using the specified formatter.- Parameters:
formatter- The formatter that will handle the output.- Throws:
IOException- If thrown by the formatter.
-
init
private void init()Initialises the state variables. -
calculateLength
private int calculateLength()Calculates the LCS length.- Returns:
- The LCS length.
-
fillOne
private void fillOne(int start1, int end1, int start2, int end2, int m, int n, int sign) This is used to find the index from where the longest common subsequence so far can be found.The time complexity is O(n+m) and the space complexity is O(n).
- Parameters:
start1- The start index of the first sequence.end1- The last index of the first sequence.start2- The start index of the second sequence.end2- The last index of the second sequence.m- The length of the first sequence.n- The length of the second sequence.sign- This is used to mark wether to start from the beginning of the string or from the end of the string.
-
calMid
private int[] calMid(int start1, int end1, int start2, int end2, int m, int n, int sign, int waste) Uses integer arrays to keep track where the longest common subsequence that can be found.The time complexity is O(n(waste+1)) and the space complexity is O(n+m).
- Parameters:
start1- The start index of the first sequence.end1- The last index of the first sequence.start2- The start index of the second sequence.end2- The last index of the second sequence.m- The length of the first sequence.n- The length of the second sequence.sign- This is used to mark wether to start from the beginning of the string or from the end of the string.waste- The length of characters not included in the LCS between indexes start1 and end1. Similarly between indexes start2 and end2.- Returns:
- Integer array consisting of the ???.
-
generateLCS
private void generateLCS(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException Computes the LCS and returns it in character array. The time complexity is O(n(m-p)) and the space complexity is O(n+m).- Parameters:
start1- The start index of the first sequence.end1- The last index of the first sequence.start2- The start index of the first sequence.end2- The last index of the first sequence.m- The length of the first sequence.n- The length of the second sequence.lcs- The length of LCS between indexes start1 and end1. Similarly between indexes b_start and b-loppu. (???)- Throws:
IOException- If thrown by the formatter
-
getLCSMinimumWaste
private void getLCSMinimumWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException Computes the longest common subsequence for the specified boundaries when the waste is (strictly) less than 2 events.This method is iterative; NOT recursive.
- Parameters:
start1- The start 0-based index of the first sequence.end1- The last 0-based index of the first sequence.start2- The start 0-based index of the second sequence.end2- The last 0-based index of the second sequence.m- The length of the first sequence.n- The length of the second sequence.lcs- The length of LCS between indexes start1 and end1.- Throws:
IOException- If thrown by the formatter.
-
getLCSMoreWaste
private void getLCSMoreWaste(int start1, int end1, int start2, int end2, int m, int n, int lcs) throws IOException Computes the longest common subsequence for the specified boundaries when the waste is more than 1 character.This method is recursive and will process ech subsequence with the LCS algorithm.
- Parameters:
start1- The start 0-based index of the first sequence.end1- The last 0-based index of the first sequence.start2- The start 0-based index of the second sequence.end2- The last 0-based index of the second sequence.m- The length of the first sequence.n- The length of the second sequence.lcs- The length of LCS between indexes start1 and end1.- Throws:
IOException- If thrown by the formatter.
-
writeDeleted
Write the deleted events to the formatter.- Parameters:
jSeq2- The index of the LL array for the next event of the second sequence.- Throws:
IOException- If thrown by the formatter.
-
printState
private void printState(int f) Prints the state of this object, that is the values of all of the state variables toSystem.err. This is for debugging purposes only.- Parameters:
f- The flags
-
copyUpTo
private static void copyUpTo(int[] a, int[] b, int len) Copies the first array into the second one up to the specified index (included).- Parameters:
a- The first array.b- The second array.len- The 0-based index of the last copied value.
-