NAME
    commFasta.pl

SYNOPSIS
    perl commFasta.pl [-cdivh] file1 file2

VERSION
    1.2 of October 1, 2002

    Version history
    1.2 : [October 1, 2002] The program also returns the sequences that
          have no match in file2.
    1.1 : [October 24, 2001] Some bug fixes.

DESCRIPTION
    commFasta.pl takes two fasta formatted files and compares all
    sequences in file1 to file2.

    If it finds two identical sequences, it prints out the sequence in
    fasta format, concatenating the two ids with a %==% sign, and also
    concatenating the descriptions with %==%. The sequence is then
    printed once.

    If it does not find an identical sequence in file2, it prints the
    sequence identifier, followed by %!=%, followed by the sequence, in
    fasta format.

    The number of identical sequences and the number of sequences that
    didn't match any sequence in file2 are printed at the end of the
    program.

    With the -i option (identifiers only), the program prints the two
    identifiers of matching sequences. If the identifiers themselves are
    identical, it appends an = sign, otherwise it prints a # sign.

OPTIONS
    -i  prints identifiers only; no sequence or fasta format output is
        generated.

    -d  prints fasta sequences of file1 that cannot be found in file2.

    -di prints out file1 identifiers for only those sequences that
        don't appear in file2.

    -h  prints a help message.

    -v  prints a version number.

    -c  analyzes a file for duplicate sequence entries. Only one
        argument (a filename) has to be supplied:

            perl commFasta.pl crappyfile.fasta

AUTHOR
    Lukas Mueller
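
EXAMPLE
    The comparison described under DESCRIPTION can be illustrated with a
    short Python sketch. It is not the Perl implementation of
    commFasta.pl. All function names (parse_fasta, compare, format_match,
    format_unmatched, format_id_pair) are hypothetical, and the exact
    spacing of the %==% and %!=% output lines, the tab-separated -i
    layout, the case-sensitive exact string comparison, and the handling
    of several identical sequences in file2 are assumptions.

```python
import io
from collections import defaultdict


def parse_fasta(handle):
    """Yield (identifier, description, sequence) for each FASTA record."""
    ident, desc, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(chunks)
            # The identifier is the first word; the rest is the description.
            header = line[1:].split(None, 1)
            ident = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif ident is not None:
            chunks.append(line)
    if ident is not None:
        yield ident, desc, "".join(chunks)


def compare(records1, records2):
    """Split file1 records into exact-sequence matches and non-matches."""
    by_seq = defaultdict(list)
    for ident, desc, seq in records2:
        by_seq[seq].append((ident, desc))
    matched, unmatched = [], []
    for ident, desc, seq in records1:
        hits = by_seq.get(seq)
        if hits:
            # Assumption: every identical file2 entry is reported.
            for ident2, desc2 in hits:
                matched.append((ident, desc, ident2, desc2, seq))
        else:
            unmatched.append((ident, desc, seq))
    return matched, unmatched


def format_match(ident1, desc1, ident2, desc2, seq):
    # Assumed layout: ids and descriptions are each joined with %==%.
    return f">{ident1}%==%{ident2} {desc1}%==%{desc2}\n{seq}"


def format_unmatched(ident, desc, seq):
    # Assumed layout: identifier followed by %!=%, then the sequence.
    return f">{ident}%!=% {desc}\n{seq}"


def format_id_pair(ident1, ident2):
    # -i mode: '=' when the identifiers are identical, '#' otherwise.
    marker = "=" if ident1 == ident2 else "#"
    return f"{ident1}\t{ident2}\t{marker}"


# Made-up input data for illustration only.
file1 = io.StringIO(">a first\nACGT\nACGT\n>b second\nGGGG\n>c\nTTTT\n")
file2 = io.StringIO(">x other\nACGTACGT\n>b dup\nGGGG\n")

matched, unmatched = compare(list(parse_fasta(file1)), list(parse_fasta(file2)))
for record in matched:
    print(format_match(*record))
for record in unmatched:
    print(format_unmatched(*record))
print(f"{len(matched)} identical, {len(unmatched)} without a match in file2")
```

    Indexing file2 by sequence in a dictionary keeps the comparison to a
    single pass over each file, rather than comparing every sequence in
    file1 against every sequence in file2.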