Unisolve Pty Ltd - Bioinformatics

Perl and Bioinformatics

Perl has been called the "duct tape of the Internet". It has also been called the "ideal" language for bioinformatics, and two reasons stand out:

1. Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences now involve a good deal of numerical analysis, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are text-like. Converting between incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job to a degree that few other modern languages match.

2. Perl is forgiving. Biological data is often incomplete: fields can be missing, a field that is expected to appear once may occur several times (because, for example, an experiment was run in duplicate), or the data may have been entered by hand and not quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry.
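
As a rough illustration of both points, here is a minimal sketch of tolerant parsing followed by a format conversion. The record layout ("key: value" lines), the field names clone, seq and note, and the helper names parse_record, clean_sequence and to_fasta are all assumptions made up for this example; they are not taken from any real data source or library. It requires Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Parse one hand-entered record made of "key: value" lines.
# The layout and field names are hypothetical, chosen for illustration.
sub parse_record {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        # Accept ":" or "=" as the separator and ignore stray whitespace.
        # Lines that do not look like a field at all are skipped.
        next unless $line =~ /^\s*(\w+)\s*[:=]\s*(.*?)\s*$/;
        my ($key, $value) = (lc($1), $2);
        next if $value eq '';               # an empty field is not an error
        push @{ $record{$key} }, $value;    # a repeated field keeps every value
    }
    return \%record;
}

# Tidy a sequence typed by hand: uppercase it, then drop anything
# that is not one of the base letters A, C, G, T or N.
sub clean_sequence {
    my ($raw) = @_;
    (my $seq = uc $raw) =~ s/[^ACGTN]//g;
    return $seq;
}

# Convert a parsed record into a FASTA entry (a ">" header line,
# then the sequence), joining duplicate seq fields in order.
sub to_fasta {
    my ($record) = @_;
    my ($name) = @{ $record->{clone} // [] };
    $name //= 'unnamed';
    my $seq = clean_sequence(join '', @{ $record->{seq} // [] });
    return ">$name\n$seq\n";
}

# Usage: inconsistent separators and case, a duplicated field, an empty field.
my $entry = "Clone = pBR-17 \nseq: acgt acgt 10\nSEQ: ttga\nnote:\n";
print to_fasta(parse_record($entry));
# prints:
# >pBR-17
# ACGTACGTTTGA
```

Storing every field as a list is one way to cope with values that turn up more than once. Whether duplicates should be joined, compared or reported depends on the data, so treat the joining done here as a placeholder.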

The staff at Unisolve are experienced Perl developers. They are heavily involved in the Perl community through Melbourne Perl Mongers and are the principal authors of the Perl reference site perlmeme.org.
