Python: Altering value of a mismatched key in a dictionary - python

This code needs to find the most frequent k-mers (substrings of k letters) with up to d mismatches in a string (genome). In the past I had to find the most frequent k-mer without mismatches, and I'm trying to minimally alter my code. To do so, I would have to be able to increment values in a dictionary whose key differs from the string I'm passing. Is that possible? Below is my code. Is there a way to do what I have written in the comment? HammingDistance() just computes the number of differences between 2 strings.
import operator
def MostFrequentKmer (Text, k, d):
    kmerDict = {}
    freqKmers = list()
    for i in range (0, len(Text)-k+1):
        kmer = Text[i:i+k]
        if kmer in kmerDict:
            kmerDict[kmer] += 1
        #elif a key exists for which HammingDistance(key, kmer) <= d, then increment the value associated with that key
        else:
            kmerDict[kmer] = 1
    maxVal = max(zip(kmerDict.values()))[0]
    for k, v in kmerDict.items():
        if v == maxVal:
            freqKmers.append(k)
    print(sorted(freqKmers))
def HammingDistance (str1, str2):
    hamDis = 0
    for i in range(0, len(str1)):
        if str1[i] != str2[i]:
            hamDis += 1
    return hamDis
Example IO is:
Input- ("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1)
Output- ["ATGC", "ATGT", "GATG"]

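Assuming you want to 1) increment the count for all close keys (those within d mismatches) and 2) add an entry if there are no close keys, the code below does what you want. It replaces the commented elif and the final else in your loop.
else:
    # use a name other than k so the k-mer length parameter is not overwritten
    close_keys = [key for key in kmerDict.keys() if HammingDistance(key, kmer) <= d]
    if close_keys:
        for key in close_keys:
            kmerDict[key] += 1
    else:
        kmerDict[kmer] = 1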
As an aside, please consider following python naming conventions, e.g., change HammingDistance to hamming_distance.
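For context, here is a minimal sketch of how that branch might sit inside the whole function, written with snake_case names. The names most_frequent_kmers and hamming_distance are illustrative, and the function returns the sorted list instead of printing it. It implements the interpretation stated above (an already stored key absorbs later near-matches), so it may not reproduce the expected output given in the question.

```python
def hamming_distance(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(str1, str2))


def most_frequent_kmers(text, k, d):
    """Return the stored keys with the highest count, sorted alphabetically."""
    kmer_counts = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        if kmer in kmer_counts:
            kmer_counts[kmer] += 1
            continue
        # Keys already stored that are within d mismatches of this k-mer.
        close_keys = [key for key in kmer_counts
                      if hamming_distance(key, kmer) <= d]
        if close_keys:
            for key in close_keys:
                kmer_counts[key] += 1
        else:
            kmer_counts[kmer] = 1
    if not kmer_counts:
        return []
    max_count = max(kmer_counts.values())
    return sorted(key for key, count in kmer_counts.items()
                  if count == max_count)
```

With d = 0 no other key is ever "close", so the function falls back to plain exact-match counting.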

Related

Python - Optimize code for finding the largest palindrome number that can be formed from the given digits

I wrote code for this task, but the testing system reports a time-limit-exceeded error.
I'd like advice on how to make this code faster and more precise.
Code:
# input
n = int(input())
seq = input()
pairs = []
seq = list(seq)
# find pairs
counted = []
for i, item in enumerate(seq):
    for j, num in enumerate(seq):
        if (i != j) and (item == num):
            if (i not in counted) and (j not in counted):
                pairs.append((item, num))
                counted.append(i)
                counted.append(j)
# remove pairs from seq
for pair in pairs:
    seq.remove(pair[0])
    seq.remove(pair[1])
# create a palindrome
start = []
end = []
pairs = sorted(pairs)
pairs = list(reversed(pairs))
for item in pairs:
    start.append(item[0])
    end.append(item[1])
end = list(reversed(end))
if len(seq) != 0:
    seq = [int(item) for item in seq]
    max_el = list(sorted(seq))[-1]
    start.append(max_el)
final_s = start + end
# output
output = ''.join([str(item) for item in final_s])
print(output)
It's an interesting problem and not completely trivial. First, I think at most one digit in the input can have an odd count; otherwise the digits cannot be formed into a palindrome. For example, 11333 is a valid input, but 113334 is not (both 3 and 4 have odd counts). It should also be noted that we cannot just dump all copies of the odd-count digit in the middle of the output. For example, we might be tempted to do 1335551 -> 3155513, but the correct answer (largest palindrome) is 5315135.
Given these constraints, here's my attempt at a solution. It uses collections.Counter to count the digit pairs, which are then sorted in descending order and mirrored to create the output. The possible odd-count digit is handled by treating it as a single digit (which goes into the middle of the output), plus a bunch of paired digits.
I tested it for input sizes of 10^5 digits and it didn't seem to take much time at all.
from collections import Counter
def biggest_pal(n):
    c = Counter(str(n))
    s = ''
    evens = {k: v for k, v in c.items() if not v % 2}
    odds = {k: v for k, v in c.items() if v % 2}
    vodd = ''
    if len(odds) > 1:
        raise ValueError('Invalid input')
    elif odds:
        vodd, nodd = odds.popitem()
        if nodd > 1:
            evens[vodd] = nodd - 1
    for k, v in sorted(evens.items(), key=lambda p: -int(p[0])):
        s += k * int(v/2)
    return s + vodd + s[::-1]
Some test inputs:
biggest_pal(112) # 121
biggest_pal(1122) # 2112
biggest_pal(1234123) # 3214123
biggest_pal(1331555) # 5315135
biggest_pal(112212) # ValueError: Invalid input
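The claims above can be cross-checked on short inputs by brute force. This is only a verification sketch, not a replacement for the Counter-based solution: it tries every ordering of the digits, so it is practical only for a handful of digits. The function name largest_palindrome_bruteforce is made up for this illustration.

```python
from itertools import permutations


def largest_palindrome_bruteforce(digits):
    """Try every ordering of the digits; only practical for short inputs."""
    candidates = {
        "".join(p) for p in permutations(digits) if p == p[::-1]
    }
    # All candidates have the same length, so string order equals numeric order.
    return max(candidates) if candidates else None


print(largest_palindrome_bruteforce("1335551"))  # 5315135
print(largest_palindrome_bruteforce("112212"))   # None (two digits have odd counts)
```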

I'm trying to match dictionary keys with its values based on some rules, without using additional libraries

The dict_key contains the correct spelling and its corresponding value contains the spelling of the candidate
The function should identify the degree of correctness as mentioned below:
CORRECT, if it is an exact match
ALMOST CORRECT, if no more than 2 letters are wrong
WRONG, if more than 2 letters are wrong or if length (correct spelling versus spelling given by contestant) mismatches.
and return a list containing the number of CORRECT answers, number of ALMOST CORRECT answers and number of WRONG
My program assumes that all the words are in uppercase and max word length is 10
Here is my code:
def find_correct(word_dict):
    #start writing your code here
    correct_count=0
    almost_correct_count=0
    incorrect_count=0
    for k,v in word_dict.items():
        if len(k)<=10:
            if len(k)==len(v):
                if k==v:
                    correct_count+=1
                else:
                    for i in k:
                        i_count=0
                        #print(i)
                        for j in v:
                            #print(j)
                            if not i==j:
                                i_count+=1
                                break
                    if i_count<=2:
                        almost_correct_count+=i_count
                    else:
                        incorrect_count+=i_count
            else:
                incorrect_count+=1
        else:
            incorrect_count+=1
    print(correct_count,almost_correct_count,incorrect_count)
Driver Code:
word_dict={"WhIZZY":"MIZZLY","PRETTY":"PRESEN"}
print(find_correct(word_dict))
My Output:
0,2,0
Expected Output:
0,0,2
So I came up with a much simpler solution. I hope I got your question right but it produces the desired output.
WORD_DICT = {"THEIR":"THEIR",
             "BUSINESS":"BISINESS",
             "WINDOWS":"WINDMILL",
             "WERE":"WEAR",
             "SAMPLE":"SAMPLE"}
def find_correct(word_dict):
    correct, almost_correct, incorrect = 0, 0, 0
    # iterate over the argument, not the global dictionary
    for key, value in word_dict.items():
        diff_list = set(list(key)).symmetric_difference(set(list(value)))
        diff = len(diff_list)
        if diff == 0:
            correct += 1
        elif diff <= 2:
            almost_correct += 1
        elif diff > 2:
            incorrect += 1
    print(correct, almost_correct, incorrect)
find_correct(WORD_DICT)
Instead of going through every character, I compare the strings as sets of characters. I got the idea from the following post.
This seems to work for your specified dictionaries, though there might be an edge case or two for which it doesn't work properly. If you have any cases this doesn't work for, then the problem is very likely to be with the if/elif/else block in the find_correct function and the way it's evaluating the length of the list.
I took my cue from the accepted answer and converted the strings to lists, but instead of using sets I used the pop method to remove the matched elements, so that duplicates are accounted for.
WORD_DICT = {"THEIR":"THEIR",
             "BUSINESS":"BISINESS",
             "WINDOWS":"WINDMILL",
             "WERE":"WEAR",
             "SAMPLE":"SAMPLE"}
second_dict = {'WHIZZY': 'MIZZLY', 'PRETTY': 'PRESEN'}
def find_correct(k, v):
    k, v = list(k), list(v)
    for k_letter in k:
        if k_letter in v:
            idx = v.index(k_letter)
            v.pop(idx)
    if len(v) == 0:
        return "correct"
    elif len(v) == 1:
        return "almost correct"
    else:
        return "incorrect"
def top_level_func(word_dict):
    d = {"correct":0, "almost correct":0, "incorrect":0}
    for k, v in word_dict.items():
        response = find_correct(k, v)
        d[response] += 1
    return d
results = top_level_func(second_dict)
for item in results.items():
    print("{} = {} instances".format(*item))
def find_correct(word_dict):
    correct,almost,incorrect=0,0,0
    for key,value in word_dict.items():
        count=0
        if(key==value):
            correct+=1
        elif(len(key)==len(value)):
            for i in range(0,len(key)):
                if(key[i]!=value[i]):
                    count+=1
            if(count<=2):
                almost+=1
            else:
                incorrect+=1
        else:
            incorrect+=1
    result=[correct,almost,incorrect]
    return result
word_dict={'WHIZZY': 'MIZZLY', 'PRETTY': 'PRESEN'}
print(find_correct(word_dict))
def find_correct(word_dict):
    correct=0
    almost_correct=0
    wrong=0
    for key,val in word_dict.items():
        key1=key;val1=val
        if(len(key)!=len(val)):
            wrong+=1
        elif(key==val):
            correct+=1
        else:
            var=0;count=0
            for i in range(len(key1)):
                for j in range(i+1):
                    var=j
                if(key1[i]!=val1[j]):
                    count+=1
            if(count<=2):
                almost_correct+=1
            else:
                wrong+=1
    li=[correct,almost_correct,wrong]
    return li
word_dict={"THEIR": "THEIR","BUSINESS":"BISINESS","WINDOWS":"WINDMILL","WERE":"WEAR","SAMPLE":"SAMPLE"}
print(find_correct(word_dict))
def find_correct(word_dict):
    correct_count=0
    almost_correct_count=0
    wrong_count=0
    list1=[]
    for k,v in word_dict.items():
        if len(k)<=10:
            if len(k)==len(v):
                if k==v:
                    correct_count+=1
                else:
                    x=[]
                    y=[]
                    for i in k:
                        x.append(i)
                    for i in v:
                        y.append(i)
                    count=0
                    for i in x:
                        if not(y[x.index(i)]==i):
                            count+=1
                    if count<=2:
                        almost_correct_count+=1
                    else:
                        wrong_count+=1
            else:
                wrong_count+=1
        else:
            wrong_count+=1
    list1.append(correct_count)
    list1.append(almost_correct_count)
    list1.append(wrong_count)
    return list1
word_dict={'MOST': 'MICE', 'GET': 'GOT', 'COME': 'COME', 'THREE': 'TRICE'}
print(find_correct(word_dict))
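For comparison, the rules stated in the question (exact match, at most 2 wrong letters, otherwise wrong or length mismatch) can be sketched as a position-by-position comparison with zip. This is one possible reading of the rules, assuming "letters wrong" means differing positions. The variable names expected and given are illustrative.

```python
def find_correct(word_dict):
    """Return [correct, almost_correct, wrong] counts for a spelling dict."""
    correct = almost_correct = wrong = 0
    for expected, given in word_dict.items():
        if len(expected) != len(given):
            # A length mismatch is always WRONG.
            wrong += 1
            continue
        # Number of positions at which the two spellings differ.
        mismatches = sum(a != b for a, b in zip(expected, given))
        if mismatches == 0:
            correct += 1
        elif mismatches <= 2:
            almost_correct += 1
        else:
            wrong += 1
    return [correct, almost_correct, wrong]


print(find_correct({"WHIZZY": "MIZZLY", "PRETTY": "PRESEN"}))  # [0, 0, 2]
```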

Check the most frequent letter(s) in a word. Python

My task is:
To write a function that gets a string as an argument and returns the letter(s) with the maximum appearance in it.
Example 1:
s = 'Astana'
Output:
a
Example 2:
s = 'Kaskelen'
Output:
ke
So far, I've got this code:
a = input()
def most_used(w):
    a = list(w)
    indexes = []
    g_count_max = a.count(a[0])
    for letter in a:
        count = 0
        i = int()
        for index in range(len(a)):
            if letter == a[index] or letter == a[index].upper():
                count += 1
                i = index
        if g_count_max <= count: # here is the problem.
            g_count_max = count
            if i not in indexes:
                indexes.append(i)
    letters = str()
    for i in indexes:
        letters = letters + a[i].lower()
    return letters
print(most_used(a))
The problem is that it automatically adds the first letter to the result, because the count of the first letter is always equal to the starting value of the maximum (which is simply the count of the first letter).
Example 1:
s = 'hheee'
Output:
he
Example 2:
s = 'malaysia'
Output:
ma
I think what you're trying to do can be much simplified by using the standard library's Counter object:
from collections import Counter
def most_used(word):
    # this has the form [(letter, count), ...] ordered from most to least common
    most_common = Counter(word.lower()).most_common()
    result = []
    for letter, count in most_common:
        if count == most_common[0][1]:
            result.append(letter) # if equal largest -- add to result
        else:
            break # otherwise don't bother looping over the whole thing
    return result # or ''.join(result) to return a string
You can use a dictionary comprehension with a list comprehension and max():
s = 'Kaskelen'
s_lower = s.lower() #convert string to lowercase
counts = {i: s_lower.count(i) for i in s_lower}
max_counts = max(counts.values()) #maximum count
most_common = ''.join(k for k,v in counts.items() if v == max_counts)
Yields:
'ke'
Try this code using list comprehensions:
word = input('word=').lower()
letters = set(list(word))
max_w = max([word.count(item) for item in letters])
out = ''.join([item for item in letters if word.count(item)==max_w])
print(out)
You can also use Counter from the collections module (note that this returns only a single most common letter):
from collections import Counter
a = "dagsdvwdsbd"
print(Counter(a).most_common(3)[0][0])
Then it returns:
d
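The bug described in the question (the running maximum is seeded with the first letter's count, and earlier candidates are never discarded) can be avoided by finding the maximum count first and only then collecting the letters that reach it. A minimal sketch of that two-step idea, assuming Python 3.7+ so that dictionaries keep insertion order and ties come out in order of first appearance:

```python
def most_used(word):
    """Return the most frequent letter(s), case-insensitively, as a string."""
    letters = word.lower()
    if not letters:
        return ""
    counts = {}
    for letter in letters:
        counts[letter] = counts.get(letter, 0) + 1
    # Step 1: the maximum is known before any letter is selected.
    max_count = max(counts.values())
    # Step 2: keep only letters that reach the maximum.
    return "".join(letter for letter, count in counts.items()
                   if count == max_count)


print(most_used("hheee"))     # e
print(most_used("Kaskelen"))  # ke
```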

Comparing occurrences of characters in strings

def jottoScore(s1,s2):
    n = len(s1)
    score = 0
    sorteds1 = ''.join(sorted(s1))
    sorteds2 = ''.join(sorted(s2))
    if sorteds1 == sorteds2:
        return n
    if(sorteds1[0] == sorteds2[0]):
        score = 1
    if(sorteds2[1] == sorteds2[1]):
        score = 2
    if(sorteds2[2] == sorteds2[2]):
        score = 3
    if(sorteds2[3] == sorteds2[3]):
        score = 4
    if(sorteds2[4] == sorteds2[4]):
        score = 5
    return score
print jottoScore('cat', 'mattress')
I am trying to write a jottoScore function that will take in two strings and return how many character occurrences are shared between two strings.
For example, jottoScore('maat','caat') should return 3, because two As and one T are shared.
I feel like this is a simple enough independent practice problem, but I can't figure out how to iterate over the strings and compare each character (I already sorted the strings alphabetically).
If you are on Python2.7+ then this is the approach I would take:
from collections import Counter
def jotto_score(str1, str2):
    count1 = Counter(str1)
    count2 = Counter(str2)
    return sum(min(v, count2.get(k, 0)) for k, v in count1.items())
print jotto_score("caat", "maat")
print jotto_score("bigzeewig", "ringzbuz")
OUTPUT
3
4
In case the strings have the same length and the position of each character matters:
>>> a = "maat"
>>> b = "caat"
>>> sum(1 for c1,c2 in zip(a,b) if c1==c2)
3
def chars_occur(string_a, string_b):
    list_a, list_b = list(string_a), list(string_b) #makes a list of all the chars
    count = 0
    for c in list_a:
        if c in list_b:
            count += 1
            list_b.remove(c)
    return count
EDIT: this solution doesn't take into account if the chars are at the same index in the string or that the strings are of the same length.
A streamlined version of @sberry's answer:
from collections import Counter
def jotto_score(str1, str2):
    return sum((Counter(str1) & Counter(str2)).values())
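To see why the intersection works: the & operator on two Counter objects keeps each key with the minimum of its two counts, which is exactly the number of shared occurrences. A short Python 3 illustration (the earlier snippets use Python 2 print statements):

```python
from collections import Counter


def jotto_score(str1, str2):
    # & keeps, for every character, the smaller of the two counts.
    shared = Counter(str1) & Counter(str2)
    return sum(shared.values())


print(Counter("maat") & Counter("caat"))  # Counter({'a': 2, 't': 1})
print(jotto_score("maat", "caat"))        # 3
print(jotto_score("cat", "mattress"))     # 2
```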

How to produce multiple modes in Python?

Basically I just need to figure out how to produce modes (numbers occurring most frequently) from a list in Python, whether or not that list has multiple modes?
Something like this:
def print_mode (thelist):
    counts = {}
    for item in thelist:
        counts [item] = counts.get (item, 0) + 1
    maxcount = 0
    maxitem = None
    for k, v in counts.items ():
        if v > maxcount:
            maxitem = k
            maxcount = v
    if maxcount == 1:
        print "All values only appear once"
    if counts.values().count (maxcount) > 1:
        print "List has multiple modes"
    else:
        print "Mode of list:", maxitem
But instead of printing the strings "All values only appear once" or "List has multiple modes", I want it to return the actual integers it is referring to.
Make a Counter, then pick off the most common elements:
from collections import Counter
from itertools import groupby
l = [1,2,3,3,3,4,4,4,5,5,6,6,6]
# group most_common output by frequency
freqs = groupby(Counter(l).most_common(), lambda x:x[1])
# pick off the first group (highest frequency)
print([val for val,count in next(freqs)[1]])
# prints [3, 4, 6]
def mode(arr):
    if len(arr) == 0:
        return []
    frequencies = {}
    for num in arr:
        frequencies[num] = frequencies.get(num,0) + 1
    mode = max([value for value in frequencies.values()])
    modes = []
    for key in frequencies.keys():
        if frequencies[key] == mode:
            modes.append(key)
    return modes
This code can handle any list. Make sure the elements of the list are numbers.
Python 3.8 added a function for this to the statistics module:
import statistics as s
print("mode(s): ",s.multimode([1,1,2,2]))
output: mode(s): [1, 2]
