I need to count the occurrences of a motif (including overlaps) in sequences. The motif is passed on the first line of standard input and the sequences on subsequent lines. A sequence name starts with >, and everything after the first whitespace is just a comment about the sequence that should be ignored. The program's input looks like:
AT
>seq1 Comment......
AGGTATA
TGGCGCC
>seq2 Comment.....
GGCCGGCGC
The output should be:
seq1: 2
seq2: 0
I decided to save the first line as the motif, strip the comment from the sequence name, join the lines of each sequence into one string, and store sequence names (keys) and sequences (values) in a dictionary. I also wrote a motif_count function and want to call it on the dictionary values, then save the results in a separate dictionary for the final output. Can I do it this way, or is there a better approach?
#!/usr/bin/env python3
import sys

sequence = sys.stdin.readlines()
motif = sequence[0]
d = {}
temp_genename = None
temp_sequence = None

def motif_count(m, s):
    count = 0
    next_pos = -1
    while True:
        next_pos = s.find(m, next_pos + 1)
        if next_pos < 0:
            break
        count += 1
    return count

if sequence[1][0] != '>':
    print("ERROR")
    exit(1)

for line in sequence[1:]:
    if line[0] == '>':
        temp_genename = line.split(' ')[0].strip()
        temp_sequence = ""
    else:
        temp_sequence += line.strip()
    d[temp_genename] = temp_sequence

for value in d:
    motif_count(motif, value)
You can simplify your code by using dictionary and string expressions to pull out the pieces you need. Assuming your sequence comments are consistent and similar to what you provided, you can split on the redundant ' This sequence is ' text, then filter for the uppercase letters, and finally count the occurrences of your motif. This can be done as follows:
def motif_count(motif, key):
    d[key] = d[key].count(motif)

sequence = """AT
>seq1 This sequence is from bacterial genome
AGGTATA
TGGCGCC
>seq2 This sequence is rich is CG
GGCCGGCGC""".split('\n')

d = {}
# print error if format is wrong
if sequence[1][0] != '>':
    print("ERROR")
else:
    seq = "".join(sequence).split('>')[1:]
    func = lambda line: line.split(' This sequence is ')
    d = dict((func(line)[0], ''.join([c for c in func(line)[1] if c.isupper()]))
             for line in seq)
    motif = sequence[0]
    # replace each sequence with its motif count
    for key in d:
        motif_count(motif, key)
    # print output
    print(d)
Output:
{'seq1': 2, 'seq2': 0}
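One caveat worth noting about the str.count approach above: it only counts non-overlapping matches, so it works for a motif like AT but would undercount a self-overlapping motif such as AA. A minimal overlap-aware counter, in the same spirit as the find-based loop from the question:

def motif_count(m, s):
    # Count occurrences of m in s, including overlapping ones,
    # by testing every possible start position.
    return sum(1 for i in range(len(s) - len(m) + 1) if s.startswith(m, i))

print(motif_count('AA', 'AAA'))  # 2
print('AAA'.count('AA'))         # 1 -- str.count misses the overlap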
My professor wants us to create a program that accepts multiple lines of input from a user and determines which line contains the letter U the most times. We then need to print the count and the line.
Example of input:
This u is u an u example u line.
This u is another.
Example u line u.
Example of output:
Line with the most U's: 4
This u is u an u example u line.
I have this so far for the count but I am completely stuck on how to get it to go line by line.
# input line was provided by professor - not needed
def count_letter_u(line):
    count = 0
    for ch in line:
        if ch == 'u':
            count += 1
    return count
I am not sure how you are getting multiple lines as input, so I wrote my own way of reading them. Enter the lines and press Ctrl+D (Ctrl+Z and then Enter on Windows) to stop taking input; the lines get saved in the contents list. The program loops through the list and stores each sentence as a key with its "u" count as the value in a dict. Then the maximum value with its respective key is printed.
print("Enter your content, Ctrl+D and Enter to save it.") # Ctrl+Z if on windows
contents = [] # list of sentences
dict = {}
while True:
try:
line = input()
except EOFError:
break
contents.append(line) # append each line to contents
def count_letter_u(line): # no of "u" in a line
count = 0
for ch in line:
if ch == 'u':
count += 1
return count
for sentence in contents:
dict[sentence] = count_letter_u(sentence) # add sentence, count to dict
max = max(zip(dict.values(), dict.keys()))[1] # max value
print(f"Line with the most U's: {dict.get(max)}, {max}")
Output
Enter your content, Ctrl+D and Enter to save it.
This u is u an u example u line.
This u is another.
Example u line u.
Line with the most U's: 4, This u is u an u example u line.
input_string = "This u is u an u example u line.\nThis u is another.\nExample u line u."
lines = input_string.splitlines()  # split the string into lines

most_u = -1  # highest 'u' count so far (a line may have 0 'u's but never -1, so any line beats it)
index_with_most_u = -1  # index (number) of the line with the most 'u's

def count_letter_u(line):
    count = 0
    for ch in line:
        if ch == 'u':
            count += 1
    return count

for index, line in enumerate(lines):
    if most_u < count_letter_u(line):  # if this line has more 'u's than any seen so far, update most_u and the index
        most_u = count_letter_u(line)
        index_with_most_u = index

print("Line with the most U's: " + str(most_u))
print(lines[index_with_most_u])
Loop through the user's input and store it in a list. Then loop through the characters of each list element and count the U's using the count() method, str.count(sub[, start[, end]]). Create a dictionary from the two lists, find the maximum number of U's, and print the corresponding line.
def user_lines():
    lines = []
    u_input = 0
    while u_input != 'stop':
        # Keep reading lines until the user types 'stop'
        u_input = input('Enter your line: ')
        if u_input == 'stop':
            break
        # Append the user input to the lines list
        lines.append(u_input)
    print(lines)

    word_count = []
    for word in lines:
        # Append the number of u's found inside this line
        word_count.append(word.count('u'))

    # Zip the lines and word_count lists together to create a dictionary
    # This associates each line with the number of u's inside it
    mapping = dict(zip(lines, word_count))

    # Find the biggest number of u's
    maximum_val = max(mapping.values())
    # Find the key associated with the max value
    key_max = max(zip(mapping.values(), mapping.keys()))[1]
    print('The line with the most Us is:', key_max, 'with', maximum_val)

user_lines()
I need some help with my code. I need to convert one input word into another, changing one letter at a time. Currently my program does this, but very inefficiently, and it does not find the shortest route. Any help would be appreciated.
import re

def same(item, target):
    return len([c for (c, t) in zip(item, target) if c == t])

def build(pattern, words, seen, list):
    return [word for word in words
            if re.search(pattern, word) and word not in seen.keys() and
            word not in list]

def find(word, words, seen, target, path):
    list = []
    for i in range(len(word)):
        list += build(word[:i] + "." + word[i + 1:], words, seen, list)
    if len(list) == 0:
        return False
    list = sorted([(same(w, target), w) for w in list])
    for (match, item) in list:
        if match >= len(target) - 1:
            if match == len(target) - 1:
                path.append(item)
            return True
        seen[item] = True
    for (match, item) in list:
        path.append(item)
        if find(item, words, seen, target, path):
            return True
        path.pop()

fname = 'dictionary.txt'
file = open(fname)
lines = file.readlines()

while True:
    start = input("Enter start word:")
    words = []
    for line in lines:
        word = line.rstrip()
        if len(word) == len(start):
            words.append(word)
    target = input("Enter target word:")
    break

count = 0
path = [start]
seen = {start: True}
if find(start, words, seen, target, path):
    path.append(target)
    print(len(path) - 1, path)
else:
    print("No path found")
Edit: below is another failed attempt of mine to fix this problem by trying a different approach. This time it does not seem to loop properly.
def find(start, words, target):  # find function. Word = start word, words =
    start = list(start)
    target = list(target)
    print("Start word is ", start)
    print("Target word is ", target)
    letter = 0
    while start != target:
        if letter == len(target):
            letter = 0
            continue
        elif start[letter] == target[letter]:
            letter = letter + 1
            continue
        else:
            testword = list(start)
            testword[letter] = target[letter]
            testword = ''.join(testword)
            if testword in words:
                start[letter] = target[letter]
                letter = letter + 1
                print(start)
                continue
            else:
                letter = letter + 1
                continue
        letter = letter + 1
        continue
fname = "dictionary.txt"
file = open(fname) # Open the dictionary
lines = file.readlines() # Read each line from the dictionary and store it in lines
while True: # Until ended
start = input("Enter start word:") # Take a word from the user
words = [] # Inititialise Array 'words'
for line in lines: # For each line in the dictionary
word = line.rstrip() #strip all white space and characters from the end of a string
if len(word) == len(start):
words.append(word)
if start not in words:
print("Your start word is not valid")
continue
target = input("Enter target word:")
if len(start) != len(target):
print("Please choose two words of equal length")
continue
if target not in words:
print("Your target word is not valid")
continue
break
Edit: here is the basic algorithm for the code (both variants are compatible with my purpose):
- input start word
- input target word
- if len(start) == len(target), continue
- check the dictionary to see that the start and target words are present
- find which letters differ between the start and target words
- change one differing letter in the start word at a time until start word == target word (after each letter change, the result must be a valid word in the dictionary)
The goal is to achieve this in the least number of steps, which is not currently achieved. The first section of code does find a path, I think, but in a huge number of steps that I know could be far more efficient.
Here's a breadth-first search that doesn't use any 3rd-party modules. I don't guarantee that it finds the shortest solutions, but it appears to work. ;) It stops when it finds a solution, but due to the random ordering of sets, each run of the program may find a different solution for a given start & target pair.
import re

# The file containing the dictionary
fname = '/usr/share/dict/words'

start, target = 'hide', 'seek'
wlen = len(start)
wrange = range(wlen)

words = set()
with open(fname) as f:
    for word in f:
        w = word.rstrip()
        # Grab words of the correct length that aren't proper nouns
        # and that don't contain non-alpha chars like apostrophe or hyphen
        if len(w) == wlen and w.islower() and w.isalpha():
            words.add(w)

print('word set size:', len(words))

# Build a regex to find words that differ from `word` by one char
def make_pattern(word):
    pat = '|'.join(['{}.{}'.format(word[:i], word[i+1:]) for i in wrange])
    return re.compile(pat)

# Find words that extend each chain of words in `seq`
def find(seq):
    result = []
    seen = set()
    for current in seq:
        pat = make_pattern(current[-1])
        matches = {w for w in words if pat.match(w)} - seen
        if target in matches:
            return current + (target,)
        result.extend(current + (w,) for w in matches)
        seen.update(matches)
        words.difference_update(matches)
    seq[:] = result

# Search for a solution
seq = [(start,)]
words.discard(start)
while True:
    solution = find(seq)
    if solution:
        print(solution)
        break
    size = len(seq)
    print(size)
    if size == 0:
        print('No solutions found')
        break
typical output
word set size: 2360
9
55
199
479
691
('hide', 'hire', 'here', 'herd', 'heed', 'seed', 'seek')
I ought to mention that all those word chains chew up a bit of RAM; I'll try to think of a more compact approach. But it shouldn't really be a problem on modern machines unless you're working with really long words.
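As a sketch of what that more compact approach could look like (my own illustration, reusing the words set and make_pattern from above; rebuild the set first if you've already run the search, since that loop consumes it): store a single parent pointer per discovered word instead of whole chains, then walk the pointers back once the target is reached.

from collections import deque

def bfs_path(start, target, words):
    # parent maps each discovered word to the word we reached it from.
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the parent pointers back to reconstruct the path.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        pat = make_pattern(current)
        for w in words:
            if w not in parent and pat.match(w):
                parent[w] = current
                queue.append(w)
    return None

print(bfs_path(start, target, words))

Because this is a plain breadth-first search, the first path it returns is a shortest one, and memory grows with the number of words visited rather than the number of chains.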
Using a bit of preprocessing to group words of equal length, you can use the networkx 3rd-party library to build a graph, then use its shortest_path algorithm to retrieve the route. Note that I've used the default dictionary available on most *nix systems and limited it to words of 5 characters or fewer.
from collections import defaultdict
import networkx as nx

# Group the words by length so we only compare words of equal length
with open('/usr/share/dict/words') as fin:
    words = defaultdict(set)
    for word in (line.strip() for line in fin if line.islower() and len(line) <= 6):
        words[len(word)].add(word)

graph = nx.Graph()
for k, g in words.items():
    while g:
        word = g.pop()
        matches = {w for w in g if sum(c1 != c2 for c1, c2 in zip(word, w)) == 1}
        graph.add_edges_from((word, match) for match in matches)
Then get the shortest route, e.g.:
In [1]: ' -> '.join(nx.shortest_path(graph, 'hide', 'seek'))
Out[1]: 'hide -> hire -> here -> herd -> heed -> seed -> seek'
In [2]: ' -> '.join(nx.shortest_path(graph, 'cat', 'dog'))
Out[2]: 'cat -> cot -> cog -> dog'
I'm trying to find the dinucleotide counts and frequencies from a sequence in a text file, but my code is only outputting single-nucleotide counts.
e = "ecoli.txt"
ecnt = {}
with open(e) as seq:
for line in seq:
for word in line.split():
for i in range(len(seqr)):
dinuc = (seqr[i] + seqr[i:i+2])
for dinuc in seqr:
if dinuc in ecnt:
ecnt[dinuc] += 1
else:
ecnt[dinuc] = 1
for x,y in ecnt.items():
print(x, y)
Sample input: "AAATTTCGTCGTTGCCC"
Sample output:
AA:2
TT:3
TC:2
CG:2
GT:2
GC:1
CC:2
Right now, I'm only getting single nucleotides in my output:
C 83550600
A 60342100
T 88192300
G 92834000
For nucleotides that repeat, i.e. "AAA", the count has to include all possible consecutive 'AA' pairs, so the output should be 2 rather than 1. It doesn't matter what order the dinucleotides are listed in; I just need all combinations, and for the code to return the correct count for repeated nucleotides. I was asking my TA and she said that my only problem was getting my 'for' loop to add the dinucleotides to my dictionary, and I think my range may or may not be wrong. The file is a really big one, so the sequence is split up into lines.
Thank you so much in advance!!!
I took a look at your code and found several things you might want to revisit.
For testing my solution, since I did not have ecoli.txt, I generated one of my own with random nucleotides using the following function:
import random

def write_random_sequence():
    out_file = open("ecoli.txt", "w")
    num_nts = 500
    nts_per_line = 80
    nts = []
    for i in range(num_nts):
        nt = random.choice(["A", "T", "C", "G"])
        nts.append(nt)
    lines = [nts[i:i+nts_per_line] for i in range(0, len(nts), nts_per_line)]
    for line in lines:
        out_file.write("".join(line) + "\n")
    out_file.close()

write_random_sequence()
Notice that this file has a single sequence of 500 nucleotides separated into lines of 80 nucleotides each. In order to count dinucleotides where you have the first nucleotide at the end of one line and the second nucleotide at the start of the next line, we need to merge all of these separate lines into a single string, without spaces. Let's do that first:
seq = ""
with open("ecoli.txt", "r") as seq_data:
for line in seq_data:
seq += line.strip()
Try printing out "seq" and notice that it should be one giant string containing all of the nucleotides. Next, we need to find the dinucleotides in the sequence string. We can do this using slicing, which I see you tried. So for each position in the string, we look at both the current nucleotide and the one after it.
for i in range(len(seq) - 1):  # note the -1
    dinuc = seq[i:i+2]
We can then count the dinucleotides and store them in a dictionary "ecnt" very much like you had. The final code looks like this:
ecnt = {}
seq = ""
with open("ecoli.txt", "r") as seq_data:
    for line in seq_data:
        seq += line.strip()
for i in range(len(seq) - 1):
    dinuc = seq[i:i+2]
    if dinuc in ecnt:
        ecnt[dinuc] += 1
    else:
        ecnt[dinuc] = 1
print(ecnt)
A perfect opportunity to use a defaultdict:
from collections import defaultdict

file_name = "ecoli.txt"
dinucleotide_counts = defaultdict(int)
sequence = ""

with open(file_name) as file:
    for line in file:
        sequence += line.strip()

for i in range(len(sequence) - 1):
    dinucleotide_counts[sequence[i:i + 2]] += 1

for key, value in sorted(dinucleotide_counts.items()):
    print(key, value)
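Since the question also asks for frequencies, here is one possible extension of the loop above (assuming frequency means count divided by the total number of dinucleotides, which is my reading of the question):

total = sum(dinucleotide_counts.values())
for key, value in sorted(dinucleotide_counts.items()):
    print(key, value, value / total)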
I need to display the 10 most frequent words in a text file, from the most frequent to the least, as well as the number of times each has been used. I can't use a dictionary or the Counter function. So far I have this:
import urllib

cnt = 0
i = 0
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []

for line in txtFile:
    words = line.split()
    for word in words:
        if word not in uniques:
            uniques.append(word)
for word in words:
    while i < len(uniques):
        i += 1
        if word in uniques:
            cnt += 1
print cnt
Now I think I should look for every word in the 'uniques' list and see how many times it is repeated in this file, and then add that to another list that counts the instances of each word. But this is where I am stuck; I don't know how to proceed.
Any help would be appreciated. Thank you
The above problem can be easily solved using Python collections; below is the solution.
from collections import Counter

data_set = "Welcome to the world of Geeks " \
           "This portal has been created to provide well written well " \
           "thought and well explained solutions for selected questions " \
           "If you like Geeks for Geeks and would like to contribute " \
           "here is your chance You can write article and mail your article " \
           "to contribute at geeksforgeeks org See your article appearing on " \
           "the Geeks for Geeks main page and help thousands of other Geeks. "

# split() returns a list of all the words in the string
split_it = data_set.split()

# Pass the split_it list to an instance of the Counter class.
Counters_found = Counter(split_it)
# print(Counters_found)

# most_common() produces the k most frequently encountered
# input values and their respective counts.
most_occur = Counters_found.most_common(4)
print(most_occur)
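For this question's stated requirement of the ten most frequent words, the same call would simply take 10:

most_occur = Counters_found.most_common(10)
print(most_occur)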
You're on the right track. Note that this algorithm is quite slow because, for each unique word, it iterates over all of the words. A much faster approach without hashing would involve building a trie (a sketch follows after the code below).
# The following assumes that we already have alice30.txt on disk.
# Start by splitting the file into lowercase words.
words = open('alice30.txt').read().lower().split()

# Get the set of unique words.
uniques = []
for word in words:
    if word not in uniques:
        uniques.append(word)

# Make a list of (count, unique) tuples.
counts = []
for unique in uniques:
    count = 0               # Initialize the count to zero.
    for word in words:      # Iterate over the words.
        if word == unique:  # Is this word equal to the current unique?
            count += 1      # If so, increment the count.
    counts.append((count, unique))

counts.sort()     # Sorting the list puts the lowest counts first.
counts.reverse()  # Reverse it, putting the highest counts first.

# Print the ten words with the highest counts.
for i in range(min(10, len(counts))):
    count, word = counts[i]
    print('%s %d' % (word, count))
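For the trie idea mentioned above, here is a minimal sketch (my own illustration, not part of the original answer): each node is a plain list with one slot per letter plus a count slot, so lookups use indexing rather than hashing. Words containing characters outside a-z are simply skipped.

def make_node():
    # 26 child slots (one per letter a-z) plus a count in slot 26.
    return [None] * 26 + [0]

def trie_add(root, word):
    node = root
    for ch in word:
        i = ord(ch) - ord('a')
        if not 0 <= i < 26:  # skip words containing non a-z characters
            return
        if node[i] is None:
            node[i] = make_node()
        node = node[i]
    node[26] += 1  # count the word ending at this node

def trie_counts(node, prefix='', out=None):
    # Collect (count, word) pairs by walking the trie.
    if out is None:
        out = []
    if node[26]:
        out.append((node[26], prefix))
    for i in range(26):
        if node[i] is not None:
            trie_counts(node[i], prefix + chr(ord('a') + i), out)
    return out

root = make_node()
for word in open('alice30.txt').read().lower().split():
    trie_add(root, word)
for count, word in sorted(trie_counts(root), reverse=True)[:10]:
    print('%s %d' % (word, count))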
from string import punctuation  # you will need it to strip the punctuation
import urllib

txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
counter = {}
for line in txtFile:
    words = line.split()
    for word in words:
        k = word.strip(punctuation).lower()  # so "the"/"The" or "you"/"You" are counted only once
        # you still have words like I've, you're, Alice's
        # you could change re to are, ve to have, etc...
        if "'" in k:
            ks = k.split("'")
        else:
            ks = [k,]
        # now the tally
        for k in ks:
            counter[k] = counter.get(k, 0) + 1

# and sorting the counter by the value which holds the tally
for word in sorted(counter, key=lambda k: counter[k], reverse=True)[:10]:
    print word, "\t", counter[word]
import urllib
import operator

txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt").readlines()
txtFile = " ".join(txtFile)  # with .readlines(), this joins the lines with spaces
txtFile = "".join(char for char in txtFile if char.isalnum() or char.isspace())  # removes everything that's not alphanumeric or whitespace

word_counter = {}
for word in txtFile.split(" "):  # split on every space
    if len(word) > 0 and word != '\r\n':
        if word not in word_counter:  # if 'word' is not in word_counter, add it and set its value to 1
            word_counter[word] = 1
        else:
            word_counter[word] += 1  # if 'word' is already in word_counter, increment it by 1

for i, word in enumerate(sorted(word_counter, key=word_counter.get, reverse=True)[:10]):
    # sorts the dict by value, from top to bottom, and takes the 10 top items
    print "%s: %s - %s" % (i + 1, word, word_counter[word])
output:
1: the - 1432
2: and - 734
3: to - 703
4: a - 579
5: of - 501
6: she - 466
7: it - 440
8: said - 434
9: I - 371
10: in - 338
This method ensures that only alphanumeric characters and spaces reach the counter. It doesn't matter that much, though.
Personally I'd make my own implementation of collections.Counter. I assume you know how that object works, but if not I'll summarize:
text = "some words that are mostly different but are not all different not at all"
words = text.split()
resulting_count = collections.Counter(words)
# {'all': 2,
# 'are': 2,
# 'at': 1,
# 'but': 1,
# 'different': 2,
# 'mostly': 1,
# 'not': 2,
# 'some': 1,
# 'that': 1,
# 'words': 1}
We can certainly sort that based on frequency by using the key keyword argument of sorted, and return the first 10 items in that list. However, that doesn't help you much because you can't use Counter itself. I'll leave THAT part as an exercise for you, and show you how you might implement Counter as a function rather than an object.
def counter(iterable):
    d = {}
    for element in iterable:
        if element in d:
            d[element] += 1
        else:
            d[element] = 1
    return d
Not difficult, actually. Go through each element of an iterable. If that element is NOT in d, add it to d with a value of 1. If it IS in d, increment that value. It's more easily expressed by:
def counter(iterable):
    d = {}
    for element in iterable:
        d[element] = d.get(element, 0) + 1
    return d
Note that in your use case, you probably want to strip out the punctuation and possibly casefold the whole thing (so that someword gets counted the same as Someword rather than as two separate words). I'll leave that to you as well, but I will point out str.strip takes an argument as to what to strip out, and string.punctuation contains all the punctuation you're likely to need.
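Putting those pieces together, a hypothetical version of the whole thing (assuming the counter function above and alice30.txt already on disk):

from string import punctuation

words = [w.strip(punctuation).casefold()
         for w in open('alice30.txt').read().split()]
counts = counter(words)
for word in sorted(counts, key=counts.get, reverse=True)[:10]:
    print(word, counts[word])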
You can also do it with pandas DataFrames and get the result in convenient form as a "word: its frequency" table, ordered by frequency.
import pandas as pn

def count_words(words_list):
    words_df = pn.DataFrame(words_list)
    words_df.columns = ["word"]
    words_df_unique = pn.DataFrame(pn.unique(words_list))
    words_df_unique.columns = ["unique"]
    words_df_unique["count"] = 0
    i = 0
    for word in pn.Series.tolist(words_df_unique.unique):
        words_df_unique.iloc[i, 1] = len(words_df.word[words_df.word == word])
        i += 1
    res = words_df_unique.sort_values('count', ascending=False)
    return res
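A hypothetical call with a small word list (pandas is assumed to be imported as pn, as the function expects):

words = "the cat sat on the mat the end".split()
print(count_words(words))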
To do the same operation on a pandas DataFrame, you may use the Counter class from collections:
from collections import Counter

cnt = Counter()
for text in df['text']:
    for word in text.split():
        cnt[word] += 1

# Find the 10 most common words in the pandas DataFrame
cnt.most_common(10)
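The DataFrame df is assumed to already exist; a minimal hypothetical one for trying this out:

import pandas as pd

df = pd.DataFrame({'text': ['the cat sat on the mat', 'the dog sat']})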
This is going to be long, but I don't know how else to effectively explain this.
So I have 2 files that I am reading in. The first one has a list of characters. The second file is a list of 3-character codons, each with its matching one-character identifier (separated by a tab).
With the second file I made a dictionary with the 3-character codons as the keys and the one-character identifiers as the corresponding values.
What I need to do is take 3 characters at a time from the first list and look them up in the dictionary. If there is a match, I need to take the corresponding value and append it to a new list that I will print out. If the match is a '*' character, I need to stop and not continue comparing the list to the dictionary.
I'm having trouble with the comparison and then building the new list using the append function.
Here is part of the first input file:
Seq0
ATGGAAGCGAGGATGtGa
Here is part the second:
AUU I
AUC I
AUA I
CUU L
GUU V
UGA *
Here is my code so far:
input = open("input.fasta", "r")
codons = open("codons.txt", "r")
counts = 1
amino_acids = {}
for lines in codons:
lines = lines.strip()
codon, acid = lines.split("\t")
amino_acids[codon] = acid
counts += 1
count = 1
for line in input:
if count%2 == 0:
line = line.upper()
line = line.strip()
line = line.replace(" ", "")
line = line.replace("T", "U")
import re
if not re.match("^[AUCG]*$", line):
print "Error!"
if re.match("^[AUCG]*$", line):
mrna = len(line)/3
first = 0
last = 3
while mrna != 0:
codon = line[first:last]
first += 3
last += 3
mrna -= 1
list = []
if codon == amino_acids[codon]:
list.append(acid)
if acid == "*":
mrna = 0
for acid in list:
print acid
So I want my output to look something like this:
M L I V *
But I'm not getting even close to this.
Please help!
The following is purely untested code. Check the indentation, syntax, and logic, but it should be closer to what you want.
import re

codons = open("codons.txt", "r")
amino_acids = {}
for lines in codons:
    lines = lines.strip()
    codon, acid = lines.split("\t")
    amino_acids[codon] = acid

input = open("input.fasta", "r")
count = 0
list = []
for line in input:
    count += 1
    if count%2 == 0:  # i.e. only care about even lines
        line = line.upper()
        line = line.strip()
        line = line.replace(" ", "")
        line = line.replace("T", "U")
        if not re.match("^[AUCG]*$", line):
            print "Error!"
        else:
            mrna = len(line)/3
            first = 0
            while mrna != 0:
                codon = line[first:first+3]
                first += 3
                mrna -= 1
                if codon in amino_acids:
                    acid = amino_acids[codon]  # look up the acid for this codon
                    list.append(acid)
                    if acid == "*":
                        mrna = 0

for acid in list:
    print acid
In Python there's usually a way to avoid writing explicit loops with counters and such. There's an incredibly powerful list comprehension syntax that lets you construct lists in one line. To wit, here's an alternate way to write your second for loop:
import re

def codons_to_acids(amino_acids, sequence):
    sequence = sequence.upper().strip().replace(' ', '').replace('T', 'U')
    codons = re.findall(r'...', sequence)
    acids = [amino_acids.get(codon) for codon in codons if codon in amino_acids]
    if '*' in acids:
        acids = acids[:acids.index('*') + 1]
    return acids
The first line performs all of the string sanitization. Chaining the different methods together makes the code more readable to me; you may or may not like that. The second line uses re.findall in a tricky way to split the string into chunks of three characters. The third line is a list comprehension which looks up each codon in the amino_acids dict and creates a list of the resulting values.
There's no easy way to break out of a for loop inside a list comprehension, so the final if statement slices off any entries occurring after a *.
You would call this function like so:
amino_acids = {
    'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'CUU': 'L', 'GUU': 'V', 'UGA': '*'
}

print codons_to_acids(amino_acids, 'ATGGAAGCGAGGATGtGaATT')
If you can solve the problem without regex, it's best not to use it.
with open('input.fasta', 'r') as f1:
    input = f1.read()

codons = list()
with open('codons.txt', 'r') as f2:
    codons = f2.readlines()

input = [x.replace('T', 'U') for x in input.upper() if x in 'ATCG']
chunks = [''.join(input[x:x+3]) for x in xrange(0, len(input), 3)]
codons = [c.replace('\n', '').upper() for c in codons if c != '\n']
my_dict = {q.split()[0]: q.split()[1] for q in codons}

result = list()
for ch in chunks:
    new_elem = my_dict.get(ch, None)  # use get, not pop, so repeated codons still match
    if new_elem is None:
        print 'Invalid key!'
    else:
        result.append(new_elem)
        if new_elem == '*':
            break

print result