I have to write a program that calls a function which searches a file for all the anagrams of a string and returns them as a list of words.
I wrote everything and it should work, but when I run it, it prints None. I even tried words that are in the file, and the result is the same.
def find_anagrams_in_wordlist(str, str_list):
    str_list = get_dictionary_wordlist()
    for int in range (0, len(str)):
        anagram(str, str_list[int])
        if anagram(str, str_list[int]):
            return(str_list[int])

def find_anagrams(str):
    str_list = get_dictionary_wordlist()
    return find_anagrams_in_wordlist(str, str_list)

def test_find_anagrams():
    print(find_anagrams("tenato"))
And this is my anagram() function:
def anagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()
    return (str1_list == str2_list)
And this is my get_dictionary_wordlist() function:
def get_dictionary_wordlist():
    text_file = open("dictionary.txt", "r")
    return text_file.read().splitlines()
What should I change to make it work?
OK, first guess; this code:
def find_anagrams_in_wordlist(str, str_list):
    str_list = get_dictionary_wordlist()
    for int in range (0, len(str)):
        anagram(str, str_list[int])
        if anagram(str, str_list[int]):
            return(str_list[int])
is looping over range(0, len(str)), the number of characters in 'tenato', instead of range(0, len(str_list)), the number of words in the dictionary.
This means you only test the first six dictionary words and ignore the rest. If none of those six matches, the loop finishes without ever reaching return, so the function returns None. Try it as:
def find_anagrams_in_wordlist(str, str_list):
    str_list = get_dictionary_wordlist()
    for word in str_list:
        if anagram(str, word):
            return word
(There's no need to count through lists in Python using range(); you can write for item in mylist: directly.)
NB: even if this works, your design will still return only the first matching word, not a list of all the words that match. You would need to build up a list of matches and return it after the loop completes.
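To make the NB concrete, here is a minimal sketch that keeps the asker's anagram() idea (written here with sorted(), which gives the same comparison as the in-place sort() version) and collects every match before returning. The sample word list is hypothetical and stands in for the contents of dictionary.txt.

```python
def anagram(str1, str2):
    # Same idea as the asker's anagram(): compare the letters in sorted order.
    return sorted(str1) == sorted(str2)


def find_anagrams_in_wordlist(candidate, word_list):
    matches = []                      # collect every match instead of returning early
    for word in word_list:
        if anagram(candidate, word):
            matches.append(word)
    return matches                    # return only after every word has been checked


# Hypothetical sample list standing in for dictionary.txt.
words = ["enlist", "google", "inlets", "silent"]
print(find_anagrams_in_wordlist("listen", words))   # ['enlist', 'inlets', 'silent']
```

Because the return statement sits after the loop, a candidate with no anagrams now yields an empty list rather than None.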
Let's examine what you have:
def find_anagrams_in_wordlist(str, str_list):   # You shouldn't name something 'str': it shadows the built-in str type.
    str_list = get_dictionary_wordlist()        # You are overwriting your incoming list with the same function call.
    for int in range (0, len(str)):             # You're executing your for loop once per letter in the candidate word.
        anagram(str, str_list[int])             # This call runs, but its result is thrown away.
        if anagram(str, str_list[int]):
            return(str_list[int])               # If you find something, you exit your for loop AND your function.
def anagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()                   # str1 is the same on every call, so re-sorting it each time is wasted work
    str2_list = list(str2)
    str2_list.sort()                   # sort() works in place and returns None; the None is ignored here
    return (str1_list == str2_list)    # Compare the two sorted lists
Finally, you are not actually accumulating your results:
def find_anagrams_in_wordlist(candidate, word_list):  # Name your variables clearly
    results = list()                                  # For accumulating your results
    sorted_candidate = sorted(candidate)              # You only need to sort your candidate once
    for word in word_list:                            # This is the cleanest way to loop through an iterable
        sorted_word = sorted(word)                    # Sort the word once
        if sorted_candidate == sorted_word:           # Now check for equality: an additional function can obfuscate things
            results.append(word)                      # If it's equal, add it to your results
    return results                                    # Send back your results
We can test it like so, in the REPL:
>>> lst = ["cat", "dog", "fish", "god"]
>>> find_anagrams_in_wordlist("odg", lst)
['dog', 'god']
>>> find_anagrams_in_wordlist("cat", lst)
['cat']
>>> find_anagrams_in_wordlist("shif", lst)
['fish']
>>> find_anagrams_in_wordlist("birds", lst)
[]
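Note that in my solution I use sorted(), not sort(). list.sort() sorts the list in place and returns None, so its return value is useless (your anagram() only gets away with it because it ignores that return value and compares the lists afterwards):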
>>> x = "moonshot"
>>> y = list(x)
>>> y
['m', 'o', 'o', 'n', 's', 'h', 'o', 't']
>>> z = y.sort()
>>> z
>>> type(z)
<type 'NoneType'>
>>> sorted(x)
['h', 'm', 'n', 'o', 'o', 'o', 's', 't']
I am working on a list filter. This is as far as I've gone. I would like to remove every string that doesn't contain H, L OR C. So far this is my attempt:
input_list = input("Enter The Results(leave a space after each one):").split(' ')
for i in input_list:
    if 'H'not in i or 'L' not in i or 'C' not in i:
Use this more Pythonic code:
input_list = input("Enter The Results(leave a space after each one):").split(' ')  # this is the input source
after_removed = [a for a in input_list if ('H' not in a and 'L' not in a and 'C' not in a)]  # input_list with every string containing 'H', 'L' or 'C' removed
Using a list comprehension makes the code simpler and typically faster than an explicit loop.
If you don't believe me, just try it yourself :D
For clarity, you can use a function:
def contains_invalid_character(my_string):
    return 'H' in my_string or 'L' in my_string or 'C' in my_string
    # To be more Pythonic, you can use any() instead:
    # return any(letter in my_string for letter in ("H", "L", "C"))

results = []
for i in input_list:
    if not contains_invalid_character(i):
        results.append(i)
# Or, more Pythonically:
# results = [i for i in input_list if not contains_invalid_character(i)]
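A self-contained sketch of the same filter, assuming the input has already been split into a list (the sample strings below are made up rather than read with input()). The invalid parameter is a hypothetical addition so the letters aren't hard-coded; the check is case-sensitive, so a lowercase 'c' is not treated as invalid.

```python
def contains_invalid_character(my_string, invalid=("H", "L", "C")):
    # any() stops as soon as it finds one invalid letter in my_string.
    return any(letter in my_string for letter in invalid)


def remove_invalid(strings):
    # Keep only the strings that contain none of the invalid letters.
    return [s for s in strings if not contains_invalid_character(s)]


# Hypothetical input, as if "HA AB LC X" had been typed at the prompt.
input_list = "HA AB LC X".split(" ")
print(remove_invalid(input_list))   # ['AB', 'X']
```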
Can someone explain to me why, after the for loop, the list res is ['m']?
string = 'spam'
for x in string:
    res =[]
    res.extend(x)
print(res)
I expected the output to be res = ['s', 'p', 'a', 'm']
You are replacing the list object on each step of your loop. The statement res = [] creates a new, empty list object, and the next line then adds a single letter to that list.
Without the loop, this is what you are doing:
>>> x = 's'
>>> res = []
>>> res.extend(x)
>>> res
['s']
>>> x = 'p'
>>> res = []
>>> res.extend(x)
>>> res
['p']
>>> x = 'a'
>>> res = []
>>> res.extend(x)
>>> res
['a']
>>> res = []
>>> x = 'm'
>>> res.extend(x)
>>> res
['m']
Create the list outside of the loop, once:
string = 'spam'
res = []
for x in string:
    res.extend(x)
print(res)
Now you don't keep replacing the list object with a new one each iteration of the for loop.
Again, removing the loop and doing the steps manually, now we have:
>>> res = []
>>> x = 's'
>>> res.extend(x)
>>> res
['s']
>>> x = 'p'
>>> res.extend(x)
>>> res
['s', 'p']
>>> x = 'a'
>>> res.extend(x)
>>> res
['s', 'p', 'a']
>>> x = 'm'
>>> res.extend(x)
>>> res
['s', 'p', 'a', 'm']
Not that you should be using res.extend() here; it only works because each letter assigned to x is itself a string, and even a single-character string is a sequence. What res.extend(x) really does is the equivalent of for element in x: res.append(element), but x always has just one element.
So this would work too:
string = 'spam'
res = []
for x in string:
    res.append(x)
print(res)
or just extend res with the whole string value:
string = 'spam'
res = []
res.extend(string)
print(res)
or, if you just wanted a list of all the characters of a string, just use the list() function:
string = 'spam'
res = list(string)
print(res)
list() does exactly what you wanted to do with your loop: create an empty list, loop over the input, and add each element to the new list, which is then returned:
>>> string = 'spam'
>>> list(string)
['s', 'p', 'a', 'm']
You are resetting res every time inside the loop. You need to use this:
string = 'spam'
res =[]
for x in string:
    res.extend(x)
print(res)
You'll never get that output because on every iteration of the loop you set res = [], so only the last iteration's work survives: it extends a blank list with 'm'.
The fixed code looks like this:
string = 'spam'
res = []
for x in string:
    res.extend(x)
print(res)
Another note: you should probably use .append in this case. .extend adds every element of an entire iterable, but since you are only adding one element at a time, it isn't necessary.
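A short sketch of the difference, using the same 'spam' string: append() adds its argument as one element, while extend() loops over its argument and adds each element. With a single-character string both calls produce the same list, which is why res.extend(x) happened to work in the loop.

```python
res = []
res.append("spam")     # append adds its argument as ONE element
print(res)             # ['spam']

res = []
res.extend("spam")     # extend iterates over its argument and adds each element
print(res)             # ['s', 'p', 'a', 'm']

# With a one-character string both calls give the same result.
a, b = [], []
a.append("s")
b.extend("s")
print(a == b)          # True
```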
One last note: be careful about editing Python code outside plain-text or code editors. You're using curly opening and closing quotes (‘’) instead of regular straight quotes (''), which will cause you trouble at some point, since Python rejects them with a SyntaxError.
You are re-initializing the res list on every pass through the for loop. That is why, on the last iteration, the list is reset to an empty list [] and only the last letter is added to it.
string = 'spam'
res =[]
for x in string:
    res.extend(x)
print(res)
Or, to keep it simple, use the list built-in, which takes an iterable such as a string and converts it into a list object:
>>> list('spam')
['s', 'p', 'a', 'm']
I think this is the simplest way:
string = 'spam'
res = list(string)
My question: how can we find the intersection of two strings?
For example, "address" and "dress" should return "dress".
I used a dict to implement my function, but I can only output the characters in sorted order, not in their original order. How should I modify my code?
def IntersectStrings(first,second):
    a={}
    b={}
    for c in first:
        if c in a:
            a[c] = a[c]+1
        else:
            a[c] = 1
    for c in second:
        if c in b:
            b[c] = b[c]+1
        else:
            b[c] = 1
    l = []
    print a,b
    for key in sorted(a):
        if key in b:
            cnt = min(a[key],b[key])
            while(cnt>0):
                l.append(key)
                cnt = cnt-1
    return ''.join(l)

print IntersectStrings('address','dress')
There are lots of intersecting substrings. One way is to create a set of all substrings of each string and then intersect the two sets. If you want the biggest intersection, just take the longest element of the resulting set with max(), e.g.:
def substrings(s):
    for i in range(len(s)):
        for j in range(i, len(s)):
            yield s[i:j+1]

def intersect(s1, s2):
    return set(substrings(s1)) & set(substrings(s2))
Then you can see the intersections:
>>> intersect('address', 'dress')
{'re', 'ss', 'ess', 'es', 'ress', 'dress', 'dres', 'd', 'e', 's', 'res', 'r', 'dre', 'dr'}
>>> max(intersect('address', 'dress'), key=len)
'dress'
>>> max(intersect('sprinting', 'integer'), key=len)
'int'
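The answer above finds common substrings. If what you want instead is what your dict approach computes (shared characters, counting repeats), but in the original order, one option is a sketch like the following. It swaps your hand-built dicts for collections.Counter and walks the second string so its order is preserved; keeping the second string's order is an assumption, so swap the arguments to follow the first string instead. The name intersect_chars is hypothetical.

```python
from collections import Counter


def intersect_chars(first, second):
    """Return the characters common to both strings (counting repeats),
    in the order they appear in `second`."""
    remaining = Counter(first)        # how many of each character `first` still offers
    result = []
    for c in second:
        if remaining[c] > 0:          # Counter returns 0 for missing keys
            result.append(c)
            remaining[c] -= 1         # use each character of `first` at most once
    return "".join(result)


print(intersect_chars("address", "dress"))   # dress
```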
Here is the code (I took it from the discussion "Translation DNA to Protein", but here I'm using an RNA file instead of a DNA file):
from itertools import takewhile
def translate_rna(sequence, d, stop_codons=('UAA', 'UGA', 'UAG')):
    start = sequence.find('AUG')
    # Take sequence from the first start codon
    trimmed_sequence = sequence[start:]
    # Split it into triplets
    codons = [trimmed_sequence[i:i + 3] for i in range(0, len(trimmed_sequence), 3)]
    # Take all codons until first stop codon
    coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3, codons)
    # Translate and join into string (d is the codon table passed in as an argument)
    protein_sequence = ''.join([d[codon] for codon in coding_sequence])
    # This line assumes there is always stop codon in the sequence
    return "{0}".format(protein_sequence)
Calling the translate_rna function:
sequence = ''
for line in open("to_rna", "r"):
    sequence += line.strip()
translate_rna(sequence, d)
My to_rna file looks like:
CCGCCCCUCUGCCCCAGUCACUGAGCCGCCGCCGAGGAUUCAGCAGCCUCCCCCUUGAGCCCCCUCGCUU
CCCGACGUUCCGUUCCCCCCUGCCCGCCUUCUCCCGCCACCGCCGCCGCCGCCUUCCGCAGGCCGUUUCC
ACCGAGGAAAAGGAAUCGUAUCGUAUGUCCGCUAUCCAG.........
The function translates only the first protein (from the first AUG to the first stop codon).
I think the problem is in this line:
# Take all codons until first stop codon
coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3 , codons)
My question is: how can I tell Python, after finding the first AUG and storing its coding sequence, to search for the next AUG in the RNA file and store that in the next position?
As a result, I want a list like this:
['here_is_the_1st_coding_sequence', 'here_is_the_2nd_coding_sequence', ...]
PS: This is homework, so I can't use Biopython.
EDIT:
A simple way to describe the problem:
From this code:
from itertools import takewhile
lst = ['N', 'A', 'B', 'Z', 'C', 'A', 'V', 'V', 'Z', 'X']
ch = ''.join(lst)
stop = 'Z'
start = ch.find('A')
seq = takewhile(lambda x: x not in stop, ch)
I want to get this:
['AB', 'AVV']
EDIT 2:
For instance, from this string:
UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA
I should get as result:
['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
Looking at your simplified code (I couldn't quite follow your main code), it looks like you want to take the substring starting at the first occurrence of one string and split it on every occurrence of another string. If that's wrong, please tell me and I'll update accordingly.
To achieve this, Python has the built-in method str.split(sub), which splits a string at every occurrence of sub. It also has str.index(sub), which returns the index of the first occurrence of sub (and raises ValueError if there is none). Example:
>>> ch = 'NABZCAVZX'
>>> ch[ch.index('A'):].split('Z')
['AB', 'CAV', 'X']
You can also split on substrings longer than one character:
>>> ch = 'NACBABQZCVEZTZCGE'
>>> ch[ch.index('AB'):].split('ZC')
['ABQ', 'VEZT', 'GE']
Using multiple delimiters:
>>> import re
>>> stop_codons = ['UAA','UGA','UAG']
>>> delim = re.compile('|'.join(stop_codons))
>>> ch = 'CCHAUAABEGTAUAAVEGTUGAVKEGUAABEGEUGABRLVBUAGCGGA'
>>> delim.split(ch)
['CCHA', 'BEGTA', 'VEGT', 'VKEG', 'BEGE', 'BRLVB', 'CGGA']
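Note that the split has no preference among the delimiters: if a UGA appears ahead of a UAA, it still splits on the UGA. It also ignores the reading frame, so a stop codon is matched at any offset, not only at codon boundaries. I'm not sure that's what you want, but that's how it works.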
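If the reading frame matters (stop codons counted only at codon boundaries after each start codon), a plain loop can reproduce the EDIT 2 output. This is a minimal sketch, not a full ORF finder: after each stop codon it resumes searching for the next AUG, so AUGs inside an already-read coding sequence are skipped, and a sequence with no stop codon runs to its last complete codon. The function name coding_sequences is hypothetical.

```python
STOP_CODONS = ("UAA", "UGA", "UAG")


def coding_sequences(rna, stop_codons=STOP_CODONS):
    """Return each in-frame stretch that starts at an AUG and ends just
    before the next in-frame stop codon (the stop codon is excluded)."""
    results = []
    pos = rna.find("AUG")
    while pos != -1:
        codons = []
        i = pos
        # Read codon by codon from this start codon until a stop codon.
        while i + 3 <= len(rna):
            codon = rna[i:i + 3]
            if codon in stop_codons:
                break
            codons.append(codon)
            i += 3
        results.append("".join(codons))
        # Look for the next start codon from where this one stopped.
        pos = rna.find("AUG", i)
    return results


print(coding_sequences("UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA"))
# ['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
```

Each returned string could then be split into codons and looked up in your codon table, as translate_rna already does for the first one.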
Question: DO NOT USE SETS IN YOUR FUNCTION. Use lists to return a list of the letters common to the first and last names (the intersection). Prompt the user for a first and last name, call the function with the first and last names as arguments, and print the returned list.
I can't figure out why my program just prints "No matches" even when there are matching letters. Anything helps! Thanks a bunch!
Code so far:
import string

def getCommonLetters(text1, text2):
    """ Take two strings and return a list of letters common to
    both strings."""
    text1List = text1.split()
    text2List = text2.split()
    for i in range(0, len(text1List)):
        text1List[i] = getCleanText(text1List[i])
    for i in range(0, len(text2List)):
        text2List[i] = getCleanText(text2List[i])
    outList = []
    for letter in text1List:
        if letter in text2List and letter not in outList:
            outList.append(letter)
    return outList

def getCleanText(text):
    """Return letter in lower case stripped of whitespace and
    punctuation characters"""
    text = text.lower()
    badCharacters = string.whitespace + string.punctuation
    for character in badCharacters:
        text = text.replace(character, "")
    return text

userText1 = raw_input("Enter your first name: ")
userText2 = raw_input("Enter your last name: ")
result = getCommonLetters(userText1, userText2)
numMatches = len(result)
if numMatches == 0:
    print "No matches."
else:
    print "Number of matches:", numMatches
    for letter in result:
        print letter
Try this:
def CommonLetters(s1, s2):
    l1=list(''.join(s1.split()))
    l2=list(''.join(s2.split()))
    return [x for x in l1 if x in l2]

print CommonLetters('Tom','Dom de Tommaso')
Output:
['T', 'o', 'm']
for letter in text1List:
Here's your problem. text1List is a list, not a string. You iterate over a list of words (['Bobby', 'Tables'], for instance) and check whether the whole word 'Bobby' is in the list text2List.
You want to iterate over every character of your string text1 and check whether it is present in the string text2.
There are a few non-Pythonic idioms in your code, but you'll learn those in time.
Follow-up: What happens if I type my first name in lowercase and my last name in uppercase? Will your code find any match?
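Following that advice, a set-free getCommonLetters might iterate over characters instead of words and lower-case both names, which also addresses the follow-up question. This is only a sketch: the name get_common_letters and the choice to skip non-letters with isalpha() are assumptions, not part of the original assignment.

```python
def get_common_letters(text1, text2):
    """Return the letters common to both strings, without using sets.

    Letters are compared case-insensitively and listed once each,
    in the order they first appear in text1."""
    text2 = text2.lower()
    common = []
    for letter in text1.lower():              # iterate over characters, not words
        if letter.isalpha() and letter in text2 and letter not in common:
            common.append(letter)
    return common


print(get_common_letters("Matteo", "Tommaso"))   # ['m', 'a', 't', 'o']
```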
Before set() became the common idiom for duplicate removal in Python 2.5, you could remove duplicates by converting a list to a dictionary.
Here is an example:
def CommonLetters(s1, s2):
    d={}
    for l in s1:
        if l in s2 and l.isalpha():
            d[l]=d.get(l,0)+1
    return d

print CommonLetters('matteo', 'dom de tommaso')
This prints the count of the common letters like so:
{'a': 1, 'e': 1, 'm': 1, 't': 2, 'o': 1}
If you want to have a list of those common letters, just use the keys() method of the dictionary:
print CommonLetters('matteo', 'dom de tommaso').keys()
Which prints just the keys:
['a', 'e', 'm', 't', 'o']
If you want upper and lower case letters to match, add the logic to this line:
if l in s2 and l.isalpha():
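For example, lower-casing both strings before the comparison is enough. This sketch is a Python 3 version of the dict-based function above (the name common_letters is hypothetical). In Python 3, keys() returns a view rather than a list, so list() is used instead, and dicts keep insertion order from Python 3.7 on.

```python
def common_letters(s1, s2):
    """Count the letters of s1 that also occur in s2, ignoring case."""
    s2 = s2.lower()
    counts = {}
    for letter in s1.lower():                 # lower-case s1 too, so 'M' matches 'm'
        if letter in s2 and letter.isalpha():
            counts[letter] = counts.get(letter, 0) + 1
    return counts


counts = common_letters("Matteo", "DOM DE TOMMASO")
print(counts)          # {'m': 1, 'a': 1, 't': 2, 'e': 1, 'o': 1}
print(list(counts))    # ['m', 'a', 't', 'e', 'o']
```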