Description
As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.
A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.
Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.
Input
- A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
- m lines each containing a single base sequence consisting of 60 bases.
Output
Sample Input
3 2 GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3 GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA 3 CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Sample Output
no significant commonalities AGATAC CATCATCAT
这个代码说明了STL真心强大,不过效率有点低~~~(比较的次数有点多)
#include<iostream>
#include<string.h>
#define MAXN 60
using namespace std;
int main()
{
int t;
cout<<"请输入测试次数"<<endl;
cin>>t; //测试数据的次数
while(t--)
{
int n; //每组测试数据有多少行数据
cout<<"请输入此数据的行数"<<endl;
cin>>n;
string str[n]; //用以存放数据
for(int i=0;i<n;i++)
{
cin>>str[i];
}
string res = " "; //用于存储找到的符合要求的子串
for(int i=3;i<=MAXN;i++) //i表示子串的长度
{
for(int j=0;j<MAXN;j++) //j表示字符串中的下标位置
{
string tmp=str[0].substr(j,i);
bool flag=true;
for(int k=1;k<n;k++)
{
if(str[k].find(tmp)==string::npos)
{
flag=false;
break;
}
}
if(flag&&res.size()<tmp.size()) res=tmp; //获得较长的子串
else if(flag&&(res.size()==tmp.size())&&(tmp<res)) res=tmp; //如果长度相等,则获得获得较小的字符串
}
}
if(res==" ") cout<<"no significant commonalities"<<endl;
else cout<<res<<endl;
}
return 0;
}
这道题跟算法中的求最长公共子序列有点像,不过区别是算法中的题的子串可以是不连续的,子要存在该字符串就可以了,采用的是递归回溯的方法,此处是连续的