Hi,
Have you read the first two posts about Perl? If you missed them, don't worry, go and read them here ( Perl #1 #2 ).
From today, we will learn a few programs that can help biology students! As a biotechnology student, I use Perl programs for working with DNA, RNA and proteins.
I know only a little Perl (very limited), mostly for processing sequences. Okay, let's learn some basic programs.
"Hashing": here we store the genetic code in a hash called %genetic_code, where each three-letter codon is the key and the one-letter amino acid code is the value (stop codons are stored as "Stop").
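To make the hashing idea concrete, here is a minimal sketch of a codon lookup. The hash name %mini_code and the three codons in it are only for illustration; the full table is in the program below.

```perl
use strict;
use warnings;

# A tiny slice of the genetic code: codon => one-letter amino acid.
# %mini_code is a made-up name for this example only.
my %mini_code = (
    'AUG' => 'M',   # methionine (start codon)
    'UUU' => 'F',   # phenylalanine
    'GGC' => 'G',   # glycine
);

# Look up a single codon by using it as the hash key.
my $codon      = 'UUU';
my $amino_acid = $mini_code{$codon};
print "$codon codes for $amino_acid\n";

# A key that is not in the hash gives undef, so test with exists() first.
my $unknown = 'NNN';
if (exists $mini_code{$unknown}) {
    print "$unknown codes for $mini_code{$unknown}\n";
} else {
    print "$unknown is not in the table\n";
}
```

The exists() check is optional, but it is a simple way to notice letters in the sequence that are not valid codons.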
Program to convert DNA sequence to protein sequence:
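print "enter a dna sequence file\n";
$dnafile=<>;
chomp($dnafile);
# stop here if the file cannot be opened
unless(open(DNAFILENAME,$dnafile))
{
print "file not found\n";
exit;
}
# read every line of the file, not just the first one
@dnalines=<DNAFILENAME>;
close DNAFILENAME;
$dna=join("",@dnalines);
# remove all whitespace (including newlines) and convert to upper case
$dna=~s/\s//g;
$dna=uc($dna);
print "DNA seq. is\n",$dna,"\n";
# transcription: replace every T with U to get the mRNA
$dna=~tr/T/U/;
print "mRNA seq. is\n", $dna,"\n";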
my(%genetic_code)=
(
'UUU'=>'F',
'UUC'=>'F',
'UUG'=>'L',
'UUA'=>'L',
'UCU'=>'S',
'UCC'=>'S',
'UCG'=>'S',
'UCA'=>'S',
'UGU'=>'C',
'UGC'=>'C',
'UGG'=>'W',
'UGA'=>'Stop',
'UAU'=>'Y',
'UAC'=>'Y',
'UAG'=>'Stop',
'UAA'=>'Stop',
'CUU'=>'L',
'CUC'=>'L',
'CUG'=>'L',
'CUA'=>'L',
'CCU'=>'P',
'CCC'=>'P',
'CCG'=>'P',
'CCA'=>'P',
'CGU'=>'R',
'CGC'=>'R',
'CGG'=>'R',
'CGA'=>'R',
'CAU'=>'H',
'CAC'=>'H',
'CAG'=>'Q',
'CAA'=>'Q',
'GUU'=>'V',
'GUC'=>'V',
'GUG'=>'V',
'GUA'=>'V',
'GCU'=>'A',
'GCC'=>'A',
'GCG'=>'A',
'GCA'=>'A',
'GGU'=>'G',
'GGC'=>'G',
'GGG'=>'G',
'GGA'=>'G',
'GAU'=>'D',
'GAC'=>'D',
'GAG'=>'E',
'GAA'=>'E',
'AUU'=>'I',
'AUC'=>'I',
'AUG'=>'M',
'AUA'=>'I',
'ACU'=>'T',
'ACC'=>'T',
'ACG'=>'T',
'ACA'=>'T',
'AGU'=>'S',
'AGC'=>'S',
'AGG'=>'R',
'AGA'=>'R',
'AAU'=>'N',
'AAC'=>'N',
'AAG'=>'K',
'AAA'=>'K',
);
for($i=0;$i<(length($dna)-2);$i+=3)
{
$codon=substr($dna,$i,3);
$protein.=$genetic_code{$codon};
}
print "the protein sequence is\n",$protein,"\n";
open (PROTEIN, ">protein.txt") or die "cannot create protein.txt\n";
print PROTEIN $protein;
close PROTEIN;
exit;
Output of the above Perl code
"length" returns the number of characters in a string. Here it tells the for loop how long the sequence is.
"substr" is used for getting a substring from a given string. You give it the string to take the substring from, the position to start at (counting from 0) and the number of letters to take.
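A small sketch of both functions on a short sequence may help. The string in $seq is made up for this example.

```perl
use strict;
use warnings;

my $seq = 'AUGUUUGGC';             # made-up 9-letter mRNA for illustration

my $n      = length($seq);         # number of characters in the string
my $first  = substr($seq, 0, 3);   # 3 letters starting at position 0
my $second = substr($seq, 3, 3);   # 3 letters starting at position 3
my $rest   = substr($seq, 6);      # no length given: everything from position 6

print "length = $n\n";             # length = 9
print "first codon = $first\n";    # first codon = AUG
print "second codon = $second\n";  # second codon = UUU
print "rest = $rest\n";            # rest = GGC
```

Positions start at 0, which is why the for loop in the program starts with $i=0 and moves forward 3 letters at a time.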
$codon=substr($dna,$i,3);
Here we take a substring from the main string stored in "$dna". Because we want one codon at a time, we start at position $i and take 3 letters.
In the following line,
$protein.=$genetic_code{$codon};
we look up the three-letter codon in the hash to get the one-letter amino acid code and save it in $protein.
We use ".=", which appends the new value to $protein each time the for loop runs, so that we get the complete amino acid sequence rather than a single amino acid. Try it without the "dot" and you will understand it better.
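The difference between ".=" and "=" can be seen in this small sketch. It uses a made-up three-codon table (%mini_code) and a made-up sequence, not the full program.

```perl
use strict;
use warnings;

# Made-up mini table and sequence, just to compare ".=" with "=".
my %mini_code = ('AUG' => 'M', 'UUU' => 'F', 'GGC' => 'G');
my $mrna      = 'AUGUUUGGC';

my $appended    = '';
my $overwritten = '';
for (my $i = 0; $i < length($mrna) - 2; $i += 3) {
    my $codon = substr($mrna, $i, 3);
    $appended   .= $mini_code{$codon};   # grows each time: M, MF, MFG
    $overwritten = $mini_code{$codon};   # replaced each time: M, F, G
}

print "with .= : $appended\n";      # with .= : MFG
print "with =  : $overwritten\n";   # with =  : G
```

With plain "=", only the amino acid of the last codon is left at the end of the loop.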
In the following lines we save the output sequence in a text file. The first line creates the file protein.txt (the ">" opens it for writing), the second line prints the protein sequence into that file, and the third line closes the file.
open (PROTEIN, ">protein.txt") or die "cannot create protein.txt\n";
print PROTEIN $protein;
close PROTEIN;
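The same write, then read back to check it, could look like the sketch below. It is a variation and not the code from the program above: it uses the three-argument form of open with a variable as the filehandle, and the file name protein_demo.txt and the sequence 'MFG' are made up for the example.

```perl
use strict;
use warnings;

my $protein  = 'MFG';                # made-up result for illustration
my $out_file = 'protein_demo.txt';   # made-up file name

# '>' opens the file for writing; it is created if missing, emptied if present.
open(my $out, '>', $out_file) or die "cannot write $out_file: $!";
print $out $protein;
close($out) or die "cannot close $out_file: $!";

# Open the same file for reading ('<') and read the first line back.
open(my $in, '<', $out_file) or die "cannot read $out_file: $!";
my $read_back = <$in>;
close($in);

print "read back: $read_back\n";
```

The "or die" part stops the program with a message if the file cannot be opened, for example when the folder is not writable.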
I explained it in my own way, and I hope it was clear.
Got the concept? Any doubts? Then leave a comment.