MacTech Network:   MacTech Forums  |  MacForge.net  |  Computer Memory  |  Register Domains  |  Cables  |  iPod Deals  | Mac Deals  | Mac Book Shelf


  MacTech Magazine

The journal of Macintosh technology

 
 

Magazine In Print
  About MacTech  
  Home Page  
  Subscribe  
  Archives DVD  
  Submit News  
  MacTech Forums  
  Get a copy of MacTech RISK FREE  
MacTech Only Search:
Community Search:
MacTech Central
  by Category  
  by Company  
  by Product  
MacTech News
  MacTech News  
  Previous News  
  MacTech RSS  
Article Archives
  Show Indices  
  by Volume  
  by Author  
  Source Code FTP  
Inside MacTech
  Writer's Kit  
  Editorial Staff  
  Editorial Calendar  
  Back Issues  
  Advertising  
Contact Us
  Customer Service  
  MacTech Store  
  Legal/Disclaimers  
  Webmaster Feedback  
 Get Netflix

October 1996 Programmer's Challenge

DNA Match

Mail solutions to: progchallenge@mactech.com

Due Date: 1 October 1996

This month's Challenge is based on a suggestion submitted by Vicente Giles of the Universidad de Málaga. Vincente faces a real-world problem to look for all the genomic sequences that match certain criteria, given a DNA database sequence and a problem sequence. A DNA sequence is a string of the four different nucleotides involved in the genetic code, denoted 'A', 'C', 'G', and 'U', which stand for adenine, cytosine, guanine, and uracil. The problem is to find all possible matches of the problem sequence in the database sequence, allowing a specified number of differences.

The prototype for the code you should write is:

long FindMatch(
  char *alphabet,       /* legal characters for database and fragment strings */
  char *database,       /* reference string to compare fragments against */
  void *storage,        /* storage preallocated for your use */
  char *fragment,       /* string to match against database */
  long diffsAllowed,    /* differences allowed between fragment and database */
  long matchPosition[]  /* return match positions in this array*/
);
 void InitMatch(
  char *alphabet,        /* legal characters for database and fragment strings */
  char *database,        /* reference string to compare fragments against */
  void *storage          /* storage preallocated for your use */
);

Because we would like our DNA-matching algorithm to be useful even if scientists discover an extraterrestrial genetic code based on other nucleotides, the algorithm accepts the genetic alphabet as a parameter. In the problem posed by Vincente, this would be the string "ACGU", but in our Challenge it might include any of the characters 'a'..'z' or 'A'..'Z'. (Extraterrestrial DNA is case sensitive.) The null-terminated reference string contained in the database parameter can be up to 1000000000 (10**9) characters long. The fragment that you are to match is also null-terminated, but will be significantly shorter on average (up to 10000 characters) than the database string. You should compare the input fragment against database, finding all occurrences of fragment that differ in no more than diffsAllowed positions from a substring of database. Your code should populate one entry in the preallocated matchPosition array for each match found, storing the offset of the character in database that corresponds to the first character of fragment. The FindMatch function should return the number of matches found.

As an example, given the following input ...

  alphabet:    ACGU
  database:    ACGTACGTACGTAAAAAATACGTACGTATA
  fragment:    ACGTACGTAC
  diffsAllowed:  5

... your code should find 7 matches and store the following values in matchPosition:

  -4 0 4 8 15 19 23
 

Notice that partial matches can occur at the beginning or the end of database, and as a result, the offsets returned in matchPosition can be negative or greater than strlen(database) - strlen(fragment).

To allow you to do some preprocessing, your InitMatch routine will be called once before a sequence of calls to FindMatch. InitMatch will be called with the same alphabet and database parameters provided to subsequent FindMatch calls. Both routines will also be given the same storage parameter that points to at least 1MB of memory allocated and initialized to zero by the calling routine. FindMatch will be called between 100 and 1000 times, on average, for each call to InitMatch. The winning solution will be the one with the fastest execution time, including the execution time for both InitMatch and FindMatch.

Other fine print: The alphabet characters will be provided in increasing ASCII order. The offsets you store in matchPosition need not be in any particular order. The value for diffsAllowed will typically be smaller than 50% of strlen(fragment). And finally, you should not allocate any dynamic storage in your solution beyond that provided in the storage parameter.

This will be a native PowerPC Challenge, using the latest Symantec environment. Solutions may be coded in C or C++.





Generate a short URL for this page:


Click on the cover to
see this month's issue!

TRIAL SUBSCRIPTION
Get a RISK-FREE subscription to the only technical Mac magazine!

Today's Deal


Apple Special

Order
Snow Leopard,
Mac Box Set, Family Pack,
and Snow Leopard Server
at a discount.
 


MacTech Magazine. www.mactech.com
Toll Free 877-MACTECH, Outside US/Canada: 805-494-9797

Register Low Cost (ok dirt cheap!) Domain Names in the MacTech Domain Store. As low as $1.99!
Save on long distance * Upgrade your Computer. See local info about Westlake Village
appleexpo.com | bathjot.com | bathroomjot.com | bettersupplies.com | comclothing.com | computerlunatics.com | dotcomclothing.com | explainit.com | exposinternational.com | homeismycastle.com | hoodcards.com | intlexpo.com | keyinfocard.com | kosheru.com | learnmorsels.com | localdealcards.com | lvschools.com | macjobsearch.com | mactechjobs.com | mactechmonitor.com | mactechsupplies.com | macwishbook.com | movie-depot.com | netprofessional.com | nibblelearning.com | notesintheshower.com | officejot.com | onlinebigbox.com | palmosdepot.com | peopleslineitemveto.com | showerjot.com | snapestore.com | snapishop.com | snapistore.com | snaptradingpost.com | stimulusmap.com | stimulusroadmap.com | triunfoguides.com | video-depot.com
Staff Site Links



All contents are Copyright 1984-2008 by Xplain Corporation. All rights reserved.

MacTech is a registered trademark of Xplain Corporation. Xplain, Video Depot, Movie Depot, Palm OS Depot, Explain It, MacDev, MacDev-1, THINK Reference, NetProfessional, NetProLive, JavaTech, WebTech, BeTech, LinuxTech, Apple Expo, MacTech Central and the MacTutorMan are trademarks or service marks of Xplain Corporation. Sprocket is a registered trademark of eSprocket Corporation. Other trademarks and copyrights appearing in this printing or software remain the property of their respective holders.