The Multi-Stage Matcher, Version 0.7

April 1, 2005

Satanjeev "Bano" Banerjee (satanjeev AT cmu.edu)
Alon Lavie (alavie AT cs.cmu.edu)

Carnegie Mellon University
Pittsburgh, PA, USA



1. Introduction
===============

This software takes two strings of space-separated words as
input and aligns matching words between the two strings. Alignment is
done over several stages, where each stage uses a different criterion
to find candidate matching tokens from the two strings to align.
Supported criteria are "exact", "porter_stem", "wn_stem" and
"wn_synonymy" (details below).
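
The staged idea can be sketched as follows. This is an illustrative
Python sketch, not the distributed Perl code. It assumes that a later
stage only considers tokens left unmatched by earlier stages, and it
pairs tokens greedily from left to right; the real aligner in
mStageMatcher.pm may choose pairs differently. The names run_stages
and crude_stem are hypothetical, and crude_stem is only a stand-in
for a real stemmer.

```python
def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def run_stages(first, second, stages):
    """Apply each stage in order and collect (stage_name, i, j) triples.

    Assumption: a token matched in an earlier stage is not offered to
    later stages. Pairing is greedy, which is a simplification.
    """
    used_first, used_second = set(), set()
    alignment = []
    for name, same in stages:
        for j, tok2 in enumerate(second):
            if j in used_second:
                continue
            for i, tok1 in enumerate(first):
                if i not in used_first and same(tok1, tok2):
                    alignment.append((name, i, j))
                    used_first.add(i)
                    used_second.add(j)
                    break
    return alignment


stages = [
    ("exact", lambda a, b: a == b),
    ("stem", lambda a, b: crude_stem(a) == crude_stem(b)),
]
first = "the cats jumped".split()
second = "the cat jumps".split()
alignment = run_stages(first, second, stages)
for stage_name, i, j in alignment:
    print(stage_name, first[i], second[j])
```

In this toy run "the" is aligned by the exact stage, and the remaining
two word pairs are only picked up by the stem stage.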



2. Code Organization
====================

The software is organized in modules. The overall algorithm is divided
into two main parts: the matching algorithm, which returns candidate
matches between tokens in the two strings, and the aligning
algorithm. Several matching algorithms are implemented, each in
a Perl module of its own:

exact.pm:       Returns tokens from the two strings that are exact
                matches of each other.
porter_stem.pm: Returns tokens from the two strings that are matches
                of each other after being stemmed using the Porter
                stemmer.
wn_stem.pm:     Same as porter_stem, but stemming is done using
                WordNet.
wn_synonymy.pm: Returns for each token in the second string, the first
                token (if any) going left to right in the first string
                such that the two tokens share at least one synset in
                WordNet.
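
To make the matching criteria concrete, here is a small Python sketch
of two of them. It is not the code in exact.pm or wn_synonymy.pm. The
function names, the return format (index pairs), and the TOY_SYNSETS
table with its made-up synset labels are all assumptions for
illustration; the real module consults WordNet.

```python
# Hypothetical toy table standing in for WordNet: word -> synset labels.
TOY_SYNSETS = {
    "car": {"auto.n.01"},
    "automobile": {"auto.n.01"},
    "quick": {"fast.a.01"},
    "fast": {"fast.a.01"},
}


def exact_matches(first, second):
    """Return (i, j) pairs where first[i] and second[j] are identical."""
    return [(i, j)
            for j, tok2 in enumerate(second)
            for i, tok1 in enumerate(first)
            if tok1 == tok2]


def synonym_matches(first, second, synsets=TOY_SYNSETS):
    """For each token in `second`, find the first token in `first`
    (scanning left to right) that shares at least one synset with it."""
    pairs = []
    for j, tok2 in enumerate(second):
        senses2 = synsets.get(tok2, set())
        for i, tok1 in enumerate(first):
            if senses2 & synsets.get(tok1, set()):
                pairs.append((i, j))
                break  # keep only the first such token, if any
    return pairs


first = "the quick car stopped".split()
second = "the fast automobile stopped".split()
exact = exact_matches(first, second)
syn = synonym_matches(first, second)
print(exact)
print(syn)
```

Words missing from the toy table have no synsets, so they can never
match in the synonymy sketch, even when the two tokens are identical.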

Given candidate matches between tokens in the two strings, the
algorithm to actually construct an alignment between the two strings
is implemented in the Perl module mStageMatcher.pm.

The program standAloneMatcher.pl loads mStageMatcher.pm and uses it
to match and align two sentences contained in an input text file. It
also serves as an example of how to use mStageMatcher.pm from inside
another program.



3. How to Run standAloneMatcher.pl
==================================

One or more of the matching modules may be used in any order to run
the program. To run the program with only exact match, run it like so:

perl standAloneMatcher.pl input.txt exact

The input file (input.txt in the above example) should contain the two
strings of words, each on a line of its own. The second string will
be aligned to the first one.


The output format is as follows: 

Line 1: (# of stages)
Line 2: (# of matched words in stage 1) (# of flips needed in aligning words in stage 1)
Line 3: (# of matched words in stage 2) (# of flips needed in aligning words in stage 2)
.
.
.
Line n: (# of chunks) (average chunk length)
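
A script that consumes this summary might parse it roughly as in the
Python sketch below. It assumes the values are printed as plain
whitespace-separated numbers (that is, the parentheses above are
placeholders and not literal output) and that --details is not in
use. The function name parse_summary and the sample text are
hypothetical; the sample is made up to show the layout and is not
real program output.

```python
def parse_summary(text):
    """Parse the summary layout: stage count, one line per stage,
    then a final line with chunk count and average chunk length."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n_stages = int(lines[0][0])
    stages = [
        {"matched": int(matched), "flips": int(flips)}
        for matched, flips in lines[1:1 + n_stages]
    ]
    chunks, avg_len = lines[1 + n_stages]
    return {
        "stages": stages,
        "chunks": int(chunks),
        "avg_chunk_length": float(avg_len),
    }


# Made-up sample in the assumed layout: 2 stages, then the chunk line.
sample = """2
3 0
1 1
2 2.0
"""
summary = parse_summary(sample)
print(summary)
```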


To run the program with only porter stemming, run it like so:

perl standAloneMatcher.pl input.txt porter_stem


To run it with exact matching first and then wn_stem, run it like so:

perl standAloneMatcher.pl input.txt exact wn_stem


Use the --details flag to get the actual final alignment:

perl standAloneMatcher.pl --details input.txt

[Note: The WordNet loading module incurs a one-time loading cost per
program run of about 3 seconds on a 2.4 GHz machine with 1 GB of RAM.]



4. Licensing
============

METEOR is distributed under the following license: 

License Start: 
                   Carnegie Mellon University                      
                     Copyright (c) 2004                            
                      All Rights Reserved.                         
                                                                   
Permission is hereby granted, free of charge, to use and distribute
this software and its documentation without restriction, including 
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of this work, and to    
permit persons to whom this work is furnished to do so, subject to 
the following conditions:                                          
 1. The code must retain the above copyright notice, this list of  
    conditions and the following disclaimer.                       
 2. Any modifications must be clearly marked as such.              
 3. Original authors' names are not deleted.                       
 4. The authors' names are not used to endorse or promote products 
    derived from this software without specific prior written      
    permission.                                                    
                                                                   
CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK       
DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING    
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT 
SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE    
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES  
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN 
AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,        
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF     
THIS SOFTWARE.                                                     

Author: Satanjeev "Bano" Banerjee satanjeev@cmu.edu
Author: Alon Lavie alavie@cs.cmu.edu

License End.



