Welcome to MatAlign! (a.k.a. CompareTwo)

*
Version V4A is now available.
NOTES ON THIS UPDATE:
MAIN DIFFERENCE FROM V2: batch processing of many matrix comparisons.
This version allows comparison among many matrices and fixes some bugs.
Credit should also be given to Alok Saldanha [alok@caltech.edu] who did
lots of initial experimenting with this idea.
Strategy: the user now provides one or two list files that contain the file
names of the matrices to compare. The first line of a list file always gives
the directory of the matrices (therefore all matrices should
be under the same directory). If one list is provided, all pair-wise
comparisons will be performed; if two lists are provided, all matrices
in list file 1 will be compared to all matrices in list file 2.
I also did some software engineering to speed up the comparison.
In this version, all matrices should have the format of a count vector.
I will allow more variations to the format in later modifications.
To illustrate the speed gain, I compared all TRANSFAC matrices against
themselves (636 matrices, 201930 comparisons). MegaMatAlign took about
30 seconds to complete the comparison, while a perl script that calls
MatAlign that many times took almost an hour. So there is at least
a 50~100x gain in speed.
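
As a quick check of the figures above, the snippet below recomputes the
number of comparisons and a rough speed ratio. The 3600-second figure is an
assumption standing in for "almost an hour", so the ratio it gives is an
upper bound rather than a measurement.

```python
from math import comb

n_matrices = 636
n_pairs = comb(n_matrices, 2)      # unordered pairs, no self-comparisons
print(n_pairs)                     # 201930

batch_seconds = 30                 # "about 30 seconds"
script_seconds = 3600              # assumed upper bound for "almost an hour"

# Average cost of one comparison in the batch run, in milliseconds.
ms_per_comparison = 1000 * batch_seconds / n_pairs
upper_bound_speedup = script_seconds / batch_seconds
print(round(ms_per_comparison, 3), upper_bound_speedup)
```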
I should also point out that another trick for speeding up the process
is to calculate an ALLR lookup table upfront. However, since the pseudocount
treatment differs between the lookup table and the real calculation,
the ALLR scores are slightly different. I don't think this will cause
a significant difference, but it is something to watch out for. I may turn
this off. Then it would run for about twice the time, but the logarithm
calculation would be more consistent.
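
For readers unfamiliar with the score, here is a minimal sketch of the ALLR
(average log-likelihood ratio) between two count columns, using its usual
definition. It does not reproduce the lookup table, whose details are not
given here. The function name allr_column, the uniform background, and the
pseudocount scheme (one pseudocount spread over the bases by background
frequency) are assumptions for illustration; the program's own pseudocount
treatment may differ, which is exactly why scores can shift slightly.

```python
import math


def allr_column(counts_i, counts_j, background=None, pseudocount=1.0):
    """ALLR of two count columns given in A, C, G, T order.

    ALLR = sum_b [n_bj*ln(f_bi/p_b) + n_bi*ln(f_bj/p_b)] / sum_b (n_bi + n_bj)
    """
    if background is None:
        background = [0.25] * 4    # assumed uniform background

    def freqs(counts):
        # Assumed pseudocount scheme: add pseudocount * p_b to each count.
        total = sum(counts) + pseudocount
        return [(c + pseudocount * p) / total
                for c, p in zip(counts, background)]

    f_i, f_j = freqs(counts_i), freqs(counts_j)
    numerator = sum(
        n_j * math.log(fi / p) + n_i * math.log(fj / p)
        for n_i, n_j, fi, fj, p
        in zip(counts_i, counts_j, f_i, f_j, background)
    )
    return numerator / (sum(counts_i) + sum(counts_j))


same = allr_column([10, 0, 0, 0], [10, 0, 0, 0])
different = allr_column([10, 0, 0, 0], [0, 10, 0, 0])
print(same > 0, different < 0)
```

Two identical, strongly conserved columns score positive, while two columns
conserved for different bases score negative.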
*

GCN4_01.matrix

GCN4_C.matrix

HSF_01.matrix

HSF_02.matrix

HSF_03.matrix

HSF_04.matrix

HSF_05.matrix

Welcome to MatAlign! (a.k.a. CompareTwo)

*
Version V2B is now available.
NOTE:
MAIN DIFFERENCE FROM V2A: distance calculation.
In previous versions, Dist is calculated with the following formula:
Dist = ALLR(A:A) + ALLR(B:B) - 2*ALLR(A:B)
Discussion with Ryan Christensen suggests that this formula should
only apply to the aligned portion between A and B. For example, if
A and B are quite similar, but A is longer than B, i.e., A contains B,
then we probably want the program to give a small distance. The previous
versions will give a rather big distance simply because the self ALLR of A
is significantly larger.
If Dist is calculated based only on the aligned portion, that also poses
a severe limitation. When two matrices are very dissimilar, the aligned
part can be very small, which results in a very small distance.
In this version I provide two distance calculations, and these two
must be considered together (and also together with ALLR) to determine
the actual similarity of the two matrices. I'll leave it to the users
to determine which parameter is better.
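
The difference between the two distances can be illustrated with a small
sketch. It assumes that a matrix-level ALLR is the sum of per-column ALLR
scores and that the two distances are the full-length one and the
aligned-portion one; the function names and the per-column scores below are
made up for the example and are not program output.

```python
def dist(self_a, self_b, cross):
    """Dist = ALLR(A:A) + ALLR(B:B) - 2*ALLR(A:B), from per-column scores."""
    return sum(self_a) + sum(self_b) - 2 * sum(cross)


def both_distances(self_a, self_b, cross, start_a, start_b):
    """Return (full-length distance, aligned-portion distance).

    cross holds the A:B column scores of the aligned part, which starts
    at column start_a of A and column start_b of B.
    """
    n = len(cross)
    full = dist(self_a, self_b, cross)
    aligned = dist(self_a[start_a:start_a + n],
                   self_b[start_b:start_b + n],
                   cross)
    return full, aligned


# Hypothetical scores: A has 6 columns and contains B (4 columns) exactly.
self_a = [1.2, 1.0, 1.3, 1.1, 1.2, 0.9]
self_b = [1.0, 1.3, 1.1, 1.2]
cross = [1.0, 1.3, 1.1, 1.2]       # B aligned to columns 1..4 of A

full, aligned = both_distances(self_a, self_b, cross, start_a=1, start_b=0)
print(round(full, 6), aligned)
```

Here the aligned-portion distance is zero because B matches its part of A
perfectly, while the full-length distance stays positive because of the two
flanking columns of A. A very short aligned part between dissimilar
matrices would also give a small aligned-portion distance, which is why the
two values should be read together.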
*

Welcome to MatAlign! (a.k.a. CompareTwo)

*
Give it a try, and tell me if you like it or don't like it, how
you want me to improve it, etc. I know there are still some
bugs, so check back for newer compiles.
Good luck and have a nice day!
*