Matrix Aligner


Comparison, Alignment of PSSMs
Update 07-26-2006

Welcome to MatAlign! (a.k.a. CompareTwo)

Version V4A is now available.
NOTES ON THIS UPDATE:
MAIN DIFFERENCE FROM V2: batch processing of many matrix comparisons.
This version allows comparisons among many matrices and fixes some bugs.

Credit should also be given to Alok Saldanha [alok@caltech.edu] who did
lots of initial experimenting with this idea.

Strategy: the user now provides one or two list files that contain the file
names of the matrices to compare. The first line of a list file always gives
the directory of the matrices (therefore all matrices should
be under the same directory). If one list is provided, all pairwise
comparisons will be performed; if two lists are provided, every matrix
in list file 1 will be compared to every matrix in list file 2.
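
The list-file strategy above can be sketched in Python. This is only an illustration, not MatAlign's own code: it assumes one matrix file name per line after the directory line, and it assumes that "all pairwise" means unordered pairs without self-comparisons. The names read_list and comparison_pairs and the directory /hypothetical/matrix/dir are made up for the example.

```python
from itertools import combinations, product
from pathlib import PurePosixPath


def read_list(text):
    """Parse list-file text: first line is the directory, the rest are file names.

    Assumption: one matrix file name per line; blank lines are ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("list file is empty")
    directory = PurePosixPath(lines[0])
    return [directory / name for name in lines[1:]]


def comparison_pairs(list1, list2=None):
    """One list: every unordered pair. Two lists: every matrix in 1 vs every matrix in 2."""
    if list2 is None:
        return list(combinations(list1, 2))
    return list(product(list1, list2))


# Hypothetical list file; the directory on the first line is a placeholder.
sample = """/hypothetical/matrix/dir
GCN4_01.matrix
GCN4_C.matrix
HSF_01.matrix
"""

matrices = read_list(sample)
pairs = comparison_pairs(matrices)
for a, b in pairs:
    print(a.name, "vs", b.name)
```

With two lists, comparison_pairs(list1, list2) returns len(list1) * len(list2) pairs.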

I also did some software engineering to speed up the comparison.

In this version, all matrices must be in count-vector format.
I will allow more format variations in later versions.

To illustrate the speed gain, I compared all TRANSFAC matrices against
themselves (636 matrices, 201930 comparisons). It took MegaMatAlign about
30 seconds to complete the comparison, while a Perl script that calls
MatAlign that many times took almost an hour. So there is at least
a 50~100x gain in speed.
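
As a quick sanity check of the figures above, the comparison count matches the number of unordered pairs of 636 matrices, and the two timings give a rough bound on the speedup. The variable names are illustrative, and one hour is used only as an upper bound for "almost an hour".

```python
import math

n_matrices = 636
# Unordered pairs, excluding self-comparisons.
n_pairs = math.comb(n_matrices, 2)

batch_seconds = 30          # "about 30 seconds" for the batch run
script_seconds = 60 * 60    # upper bound for "almost an hour"
speedup_upper = script_seconds / batch_seconds

print(n_pairs, speedup_upper)
```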

I should also point out that another trick for speeding up the process
is to calculate an ALLR lookup table up front. However, since the pseudocount
treatment differs between the lookup table and the real calculation,
the ALLR scores are slightly different. I don't think this will cause a
significant difference, but it is something to watch out for. I may turn
this off. Then it would take about twice as long to run, but the logarithm
calculation would be more consistent.
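
For readers unfamiliar with ALLR (average log likelihood ratio), here is a minimal Python sketch of the score for one pair of count columns, following the commonly published definition, plus one plausible way to move the logarithms out of the inner loop by caching per-column log ratios. This is not MatAlign's actual lookup table, whose layout is not described here. The pseudocount scheme, the uniform background, and the function names are all assumptions. Both paths below use the same pseudocount, so they agree; the discrepancy mentioned above would arise only if the two paths treated pseudocounts differently.

```python
import math


def allr_column(counts1, counts2, background, pseudocount=1.0):
    """ALLR of two count columns, computing the logarithms on the fly.

    Assumed pseudocount scheme: f_b = (n_b + pseudocount * p_b) / (N + pseudocount).
    """
    n1, n2 = sum(counts1), sum(counts2)
    numerator = 0.0
    for c1, c2, p in zip(counts1, counts2, background):
        f1 = (c1 + pseudocount * p) / (n1 + pseudocount)
        f2 = (c2 + pseudocount * p) / (n2 + pseudocount)
        numerator += c1 * math.log(f2 / p) + c2 * math.log(f1 / p)
    return numerator / (n1 + n2)


def log_ratios(counts, background, pseudocount=1.0):
    """Precompute ln(f_b / p_b) for one column so it can be reused in every comparison."""
    n = sum(counts)
    return [
        math.log(((c + pseudocount * p) / (n + pseudocount)) / p)
        for c, p in zip(counts, background)
    ]


def allr_from_cache(counts1, llr1, counts2, llr2):
    """Same score as allr_column, using cached log ratios (no log calls here)."""
    numerator = sum(
        c1 * l2 + c2 * l1 for c1, l1, c2, l2 in zip(counts1, llr1, counts2, llr2)
    )
    return numerator / (sum(counts1) + sum(counts2))


# Illustrative columns in A, C, G, T order with a uniform background (made-up data).
background = [0.25, 0.25, 0.25, 0.25]
col_a = [10, 0, 0, 0]
col_b = [9, 1, 0, 0]

direct = allr_column(col_a, col_b, background)
cached = allr_from_cache(
    col_a, log_ratios(col_a, background), col_b, log_ratios(col_b, background)
)
print(direct, cached)
```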

  • Executable for linux: matalign-v4a
  • Source Code: matalign-v4a.tar.gz
  • README/help page: README
  • A sample matrix list file: sample_list
  • Matrix files contained in sample_list:
    GCN4_01.matrix
    GCN4_C.matrix
    HSF_01.matrix
    HSF_02.matrix
    HSF_03.matrix
    HSF_04.matrix
    HSF_05.matrix



  • Update 07-20-2006

    Version V2B is now available.
    NOTE:
    MAIN DIFFERENCE FROM V2A: distance calculation.
    In previous versions, Dist was calculated with the following formula:
    Dist = ALLR(A:A) + ALLR(B:B) - 2*ALLR(A:B)
    A discussion with Ryan Christensen suggested that this formula should
    apply only to the aligned portion between A and B. For example, if
    A and B are quite similar but A is longer than B, i.e., A contains B,
    then we probably want the program to give a small distance. The previous
    versions give a rather large distance simply because the self ALLR of A
    is significantly larger.
    However, calculating Dist based only on the aligned portion has a
    severe limitation of its own: when two matrices are very dissimilar,
    the aligned part can be very small, which results in a very small distance.

    In this version I provide two distance calculations, and these two
    must be considered together (and together with ALLR) to determine
    the actual similarity of the two matrices. I'll leave it to the users
    to determine which parameter is better.
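
    The difference between the two distance calculations can be sketched
    in Python. This is an illustration under stated assumptions, not the
    program's code: it assumes a matrix-level ALLR is the sum of per-column
    scores, the function names dist_full and dist_aligned are made up, and
    the per-column scores are invented numbers chosen to show the two
    failure cases described above.

```python
def dist_full(self_a, self_b, cross):
    """Dist = ALLR(A:A) + ALLR(B:B) - 2*ALLR(A:B), with self scores over whole matrices.

    self_a, self_b: per-column self-ALLR scores of A and B.
    cross: per-column ALLR(A:B) scores over the aligned columns only.
    """
    return sum(self_a) + sum(self_b) - 2 * sum(cross)


def dist_aligned(self_a, self_b, cross, start_a, start_b):
    """Same formula, but self scores are restricted to the aligned columns."""
    k = len(cross)
    aligned_a = self_a[start_a:start_a + k]
    aligned_b = self_b[start_b:start_b + k]
    return sum(aligned_a) + sum(aligned_b) - 2 * sum(cross)


# Case 1 (made-up scores): A (8 columns) contains B (4 columns) at offset 2.
self_a = [1.0] * 8
self_b = [1.0] * 4
cross = [1.0] * 4
contained_full = dist_full(self_a, self_b, cross)
contained_aligned = dist_aligned(self_a, self_b, cross, start_a=2, start_b=0)

# Case 2 (made-up scores): dissimilar 6-column matrices that align on one column.
self_c = [1.0] * 6
self_d = [1.0] * 6
cross_cd = [0.75]
dissimilar_full = dist_full(self_c, self_d, cross_cd)
dissimilar_aligned = dist_aligned(self_c, self_d, cross_cd, start_a=5, start_b=0)

print(contained_full, contained_aligned)
print(dissimilar_full, dissimilar_aligned)
```

    In case 1 the whole-matrix distance is inflated by the unaligned columns
    of A, while the aligned-only distance is zero. In case 2 the aligned-only
    distance is small even though the matrices share a single column, which
    is why the two values need to be read together.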

  • Executable for linux: matalign-v2b
  • Source Code: matalign-v2b.tar.gz
  • README/help page: README
  • A sample matrix file: matrix1
  • A sample matrix file: matrix2


  • Update 08-15-2005

    Welcome to MatAlign! (a.k.a. CompareTwo)

    Give it a try, and tell me whether you like it, what you don't like,
    how you want me to improve it, etc. I know there are still some
    bugs, so check back for newer compiles.
    Good luck and have a nice day!

  • Executable for linux: matalign-v2a
  • Source Code: matalign.tar.gz
  • README/help page: README
  • A sample matrix file: matrix1
  • A sample matrix file: matrix2