Compare 2 Strings in Python

So, I need to write a function that, given two different strings of the same length, returns (not prints) the number of differences between the two strings. The order of the characters matters as well.
For example if you input ("abcdef", "aabccf")
it should return 4.
("abcdef", "accddf") should return 2.
All I have so far is:
def differencecount(A, B):
    counter = 0
    str1 = list(A)
    str2 = list(B)
    for letter in str1:
        if letter == str2:
            counter = counter + 1
    return counter
All this does is return 0 though so I feel like I'm missing something.

I would use
def difference(word_one, word_two):
    return sum(l1 != l2 for l1, l2 in zip(word_one, word_two))
Which works like
>>> difference('abcdef', 'abcdef')
0
>>> difference('abcdef', 'abcabc')
3

You can zip the strings together and then count how many different pairs there are:
def chardifferencecounter(x, y):
    return len([1 for c1, c2 in zip(x, y) if c1 != c2])
>>> chardifferencecounter('abcdef', 'aabccf')
4
>>> chardifferencecounter('abcdef', 'accddf')
2
Explanation:
Zipping the strings together produces this:
>>> s1 = 'abcdef'
>>> s2 = 'aabccf'
>>> list(zip(s1, s2))  # on Python 2, zip(s1, s2) already returns a list
[('a', 'a'), ('b', 'a'), ('c', 'b'), ('d', 'c'), ('e', 'c'), ('f', 'f')]
so it takes a character from the same position in each string and pairs them together. So you just need to count how many pairs are different. That can be done using a list comprehension to create a list with those pairs that are the same filtered out, and then get the length of that list.
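On Python 3, zip returns a lazy iterator rather than a list, so the pairs need list() to be displayed. A minimal sketch of the same idea, assuming Python 3; the name count_differences is made up for illustration and is not from the answers above:

```python
def count_differences(first, second):
    """Count the positions at which two equal-length strings differ."""
    # zip pairs up the characters position by position; each comparison
    # yields True or False, and sum() counts True as 1.
    return sum(c1 != c2 for c1, c2 in zip(first, second))


pairs = list(zip('abcdef', 'aabccf'))  # list() materialises the iterator
print(pairs)
print(count_differences('abcdef', 'aabccf'))  # 4
print(count_differences('abcdef', 'accddf'))  # 2
```

Note that zip stops at the shorter input, so strings of unequal length are silently truncated rather than rejected.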

Just for a different look here is a solution that doesn't use zip or enumerate:
def chardifferencecounter(x, y):
    if len(x) != len(y):
        raise Exception('Please enter strings of equal length')
    return sum(x[i] != y[i] for i in range(len(x)))
Note that this solution also raises an exception when x and y are of different lengths, which is what you wanted in your comment.

Related

How can I sort this according to the length of the output?

So I wrote this code with the help of Stack Overflow users, and here it is...
def letter_total(filename: str):
    chars = list(filename)
    chars_unique = set(chars)
    chars_unique.remove(' ')
    result = []
    for x in chars_unique:
        result.append([x, chars.count(x)*('*')])
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[0], c[1])
print(letter_count(filename='How was your day'))
and this is the resulting output...
H *
a **
d *
o **
r *
s *
u *
w **
y **
None
but I want my output printed in order from the most * to the fewest. (If two letters have the same number of *, I want them listed in alphabetical order.)
So my output should look like this:
a **
o **
w **
y **
d *
H *
r *
s *
How can I accomplish this without using key = lambda, only using sorted()?
You're asking to drive in a screw without using a screwdriver and only using your bare fingers, but okay.
If you store each tally as a list [negative_count, letter] instead of [letter, stars], the default ordering will first sort by negative_count (longer first) and use letter as a tie-breaker, exactly as you intended. Note that capitals sort before lowercase letters.
With minimal changes to your code:
def letter_total(filename: str):
    chars = list(filename)
    chars_unique = set(chars)
    chars_unique.remove(' ')
    result = []
    for x in chars_unique:
        result.append([-chars.count(x), x])
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[1], (-c[0]) * '*')
print(letter_count(filename='How was your day'))
Then a couple more pointers:
letter_count is already doing the printing; no need to also print its return value (which is None).
It's more efficient and idiomatic to use tuples (negative_count, letter) instead of lists here.
This code is O(n²) which means it's rather inefficient. For each unique letter, it's running through the entire string to count just that letter. It's more efficient to run through the string once, and keep a tally in a dict. Then as the last step, convert the dict into a list of tuples.
Putting all that together:
def letter_total(filename: str):
    l_count = {}
    for x in filename:
        if x != ' ':
            if x not in l_count:
                l_count[x] = 0
            l_count[x] -= 1
    result = [(count, letter) for letter, count in l_count.items()]
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[1], (-c[0]) * '*')
letter_count(filename='How was your day')  # prints directly; no outer print needed
I understand you're just learning, but in production code, I would recommend collections.Counter which does exactly this job for you:
>>> from collections import Counter
>>> sorted(Counter('How was your day').items())
[(' ', 3), ('H', 1), ('a', 2), ('d', 1), ('o', 2), ('r', 1), ('s', 1), ('u', 1), ('w', 2), ('y', 2)]
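Combining Counter with the negative-count trick gives a version that needs no key function at all. This is only a sketch, assuming Python 3; letter_rows is a hypothetical helper name, not something from the question:

```python
from collections import Counter


def letter_rows(text):
    """Return 'letter stars' rows, most frequent first, ties alphabetical."""
    counts = Counter(text.replace(' ', ''))
    # Tuples compare element by element, so sorting (-count, letter) puts
    # higher counts first and breaks ties by letter, with no key function.
    ordered = sorted((-count, letter) for letter, count in counts.items())
    return ['{} {}'.format(letter, '*' * -neg) for neg, letter in ordered]


for row in letter_rows('How was your day'):
    print(row)
```

Capital letters still sort before lowercase ones here, so H comes before d. Using (-count, letter.lower(), letter) as the tuple would make the tie-break case-insensitive.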
Clean the input string,
then use Counter with its most_common method to get a list of letters with their occurrence counts,
then group the resulting list of tuples by the second element (the count),
and sort each group.
from collections import Counter
from itertools import groupby
from typing import List, Tuple
s: str = 'How was your day'.replace(" ", "")
ll: List[Tuple[str, int]] = Counter(s).most_common()
# most_common() is already ordered by count, so groupby sees each count as one run
res = sum([sorted(v, key=lambda ch: ch[0].lower()) for k, v in groupby(ll, lambda x: x[1])], [])
res = [(x, y * "*") for x, y in res]
OUTPUT:
[('a', '**'),
('o', '**'),
('w', '**'),
('y', '**'),
('d', '*'),
('H', '*'),
('r', '*'),
('s', '*'),
('u', '*')]
This way:
sorted(sorted(l_count), key = lambda i:-i[1])

Counting the number of times a letter occurs at a certain position using python

I'm a python beginner and I've come across this problem and I'm not sure how I'd go about tackling it.
If I have the following sequence/strings:
GATCCG
GTACGC
How do I count how often each letter occurs at each position? I.e., G occurs at position one twice in the two sequences, A occurs at position one zero times, etc.
Any help would be appreciated, thank you!
You can use a combination of defaultdict and enumerate like so:
from collections import defaultdict
sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):  # enum('abc'): [(0,'a'),(1,'b'),(2,'c')]
        d[char][i] += 1
d['C'][3]  # 2
d['C'][4]  # 1
d['C'][5]  # 1
This builds a nested defaultdict that takes the character as first and the position as second key and provides the count of occurrences of said character in said position.
If you want lists of position-counts:
max_len = max(map(len, sequences))
d = defaultdict(lambda: [0]*max_len)  # d[char] = [pos0, pos1, ...]
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1
d['G']  # [2, 0, 0, 0, 1, 1]
Not sure this is the best way, but you can use zip to do a sort of transpose on the strings, producing tuples of the letters in each position, e.g.:
x = 'GATCCG'
y = 'GTACGC'
zipped = list(zip(x, y))  # list() is needed on Python 3, where zip returns an iterator
print(zipped)
will produce as output:
[('G', 'G'), ('A', 'T'), ('T', 'A'), ('C', 'C'), ('C', 'G'), ('G', 'C')]
You can see from the tuples that the first positions of the two strings contain two Gs, the second positions contain an A and a T, etc. Then you could use Counter (or some other method) to get at what you want.
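One way to finish that idea, as a sketch assuming Python 3: zip(*sequences) transposes any number of equal-length sequences, and a Counter per column tallies the letters at that position. The variable names are illustrative only.

```python
from collections import Counter

sequences = ['GATCCG', 'GTACGC']

# zip(*sequences) yields one tuple per position: ('G', 'G'), ('A', 'T'), ...
# Counter then tallies the letters found in that column.
position_counts = [Counter(column) for column in zip(*sequences)]

for position, counts in enumerate(position_counts, 1):
    print(position, dict(counts))

print(position_counts[0]['G'])  # 2: both sequences start with G
print(position_counts[0]['A'])  # 0: a Counter returns 0 for missing keys
```

The list index is zero-based, so position_counts[0] describes what the question calls position one.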

How to store multiple numbers and then add to each individual one

I have written a for loop which gives me all the values of a specific letters place in the alphabet.
For example the word hello will give me the numbers 8, 5, 12, 12 and 14. Now I want to add them to another word which is the same length for e.g abcde, which would be 1, 2, 3, 4 and 5. Now I want to add the two numbers together but keeping the individual numbers for example 8+1, 5+2, 12+3, 12+4 and 14+5.
This is the code I have so far
for letter in message:
    if letter.isalpha() == True:
        x = alphabet.find(letter)
for letter in newkeyword:
    if letter.isalpha() == True:
        y = alphabet.find(letter)
When I try adding x and y, I get a single number. Can someone help?
If you are planning to do further calculations with the numbers, consider this solution, which creates a list of tuples (also by using zip, as @Kashyap Maduri suggested):
messages = zip(message, newkeyword)
positions = [(alphabet.find(m), alphabet.find(n)) for m, n in messages]
sums = [(a, b, a + b, "{}+{}".format(a,b)) for a, b in positions]
Each tuple in the sums list consists of both operands, their sum and a string representation of the addition.
Then you could for example print them sorted by their sum:
for a, b, sum_ab, sum_as_str in sorted(sums, key = lambda x: x[2]):
    print(sum_as_str)
Edit
when i run the program i want it to give me the answer of those sums for e.g 14+5=19 i just want the 19 part any ideas? – Shahzaib Shuz Bari
This makes it a lot easier:
messages = zip(message, newkeyword)
sums = [alphabet.find(m) + alphabet.find(n) for m, n in messages]
And you get a list of all the sums.
You are looking for the zip function. It zips two or more iterables together, e.g.:
l1 = 'abc'
l2 = 'def'
zip(l1, l2)
# [('a', 'd'), ('b', 'e'), ('c', 'f')] in python 2.7
and
list(zip(l1, l2))
# [('a', 'd'), ('b', 'e'), ('c', 'f')] in python 3
So here is a solution for your problem:
l = list(zip(message, newkeyword))
[str(alphabet.find(x)) + '+' + str(alphabet.find(y)) for x, y in l]
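The snippets above assume that alphabet, message and newkeyword already exist. Below is a self-contained sketch, assuming alphabet is the lowercase English alphabet. Because str.find is zero-based, 1 is added to get a letter's place in the alphabet. The function names are hypothetical.

```python
import string

alphabet = string.ascii_lowercase  # assumed definition: 'abc...xyz'


def letter_positions(word):
    """1-based alphabet positions of the letters in word."""
    # str.find returns a 0-based index, hence the + 1; non-letters are skipped.
    return [alphabet.find(ch) + 1 for ch in word.lower() if ch in alphabet]


def pairwise_sums(message, keyword):
    """Add the positions of the two words letter by letter."""
    pairs = zip(letter_positions(message), letter_positions(keyword))
    return [a + b for a, b in pairs]


print(letter_positions('hello'))        # [8, 5, 12, 12, 15]
print(pairwise_sums('hello', 'abcde'))  # [9, 7, 15, 16, 20]
```

Keeping the two position lists separate and zipping them is what preserves the individual sums; adding x and y after the loops only ever sees the last letter of each word.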

how to replace the alphabetically smallest letter by 1, the next smallest by 2 but do not discard multiple occurrences of a letter?

I am using Python 3 and I want to write a function that takes a string of all capital letters, so suppose s = 'VENEER', and gives me the following output '614235'.
The function I have so far is:
def key2(s):
    new=''
    for ch in s:
        acc=0
        for temp in s:
            if temp<=ch:
                acc+=1
        new+=str(acc)
    return(new)
If s == 'VENEER' then new == '634335'. If s contains no duplicates, the code works perfectly.
I am stuck on how to edit the code to get the output stated in the beginning.
Note that the built-in method for replacing characters within a string, str.replace, takes a third argument; count. You can use this to your advantage, replacing only the first appearance of each letter (obviously once you replace the first 'E', the second one will become the first appearance, and so on):
def process(s):
    for i, c in enumerate(sorted(s), 1):
        # print(s)  # uncomment to see process
        s = s.replace(c, str(i), 1)
    return s
I have used the built-in functions sorted and enumerate to get the appropriate numbers to replace the characters:
1 2 3 4 5 6 # 'enumerate' from 1 -> 'i'
E E E N R V # 'sorted' input 's' -> 'c'
Example usage:
>>> process("VENEER")
'614235'
One way would be to use numpy.argsort to find the order, then find the ranks, and join them:
>>> import numpy as np
>>> s = 'VENEER'
>>> order = np.argsort(list(s), kind='stable')  # stable sort keeps repeated letters in their original order
>>> rank = np.argsort(order) + 1
>>> ''.join(map(str, rank))
'614235'
You can use a regex:
import re
s="VENEER"
for n, c in enumerate(sorted(s), 1):
    s=re.sub('%c' % c, '%i' % n, s, count=1)
print(s)
# 614235
You can also use several nested generators:
def indexes(seq):
    for v, i in sorted((v, i) for (i, v) in enumerate(seq)):
        yield i
print(''.join('%i' % (e+1) for e in indexes(indexes(s))))
# 614235
From your title, you may want to do like this?
>>> from collections import OrderedDict
>>> s='VENEER'
>>> d = {k: n for n, k in enumerate(OrderedDict.fromkeys(sorted(s)), 1)}
>>> "".join(map(lambda k: str(d[k]), s))
'412113'
As @jonrsharpe commented, I didn't need to use OrderedDict.
def caps_to_nums(in_string):
    indexed_replaced_string = [(idx, val) for val, (idx, ch) in enumerate(sorted(enumerate(in_string), key=lambda x: x[1]), 1)]
    return ''.join(map(lambda x: str(x[1]), sorted(indexed_replaced_string)))
First we run enumerate to be able to save the natural sort order
enumerate("VENEER") -> [(0, 'V'), (1, 'E'), (2, 'N'), (3, 'E'), (4, 'E'), (5, 'R')]
# this gives us somewhere to RETURN to later.
Then we sort that according to its second element, which is alphabetical, and run enumerate again with a start value of 1 to get the replacement value. We throw away the alpha value, since it's not needed anymore.
[(idx, val) for val, (idx, ch) in enumerate(sorted([(0, 'V'), (1, 'E'), ...], key = lambda x: x[1]), start=1)]
# [(1, 1), (3, 2), (4, 3), (2, 4), (5, 5), (0, 6)]
Then map the second element (our value) sorting by the first element (the original index)
map(lambda x: str(x[1]), sorted(replacement_values))
and str.join it
''.join(that_mapping)
Ta-da!
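The same ranking can be written more compactly by sorting the indices rather than (index, letter) pairs. This is a sketch assuming Python 3, and rank_letters is an illustrative name. Python's sort is stable, so equal letters keep their left-to-right order.

```python
def rank_letters(s):
    """Replace each letter by its 1-based rank in alphabetical order."""
    # Indices sorted by the letter they point at; ties stay in original
    # order because sorted() is stable.
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0] * len(s)
    for rank, index in enumerate(order, 1):
        ranks[index] = rank
    return ''.join(str(r) for r in ranks)


print(rank_letters('VENEER'))  # 614235
```

For inputs longer than nine letters the ranks become multi-digit, so the joined string is ambiguous; returning the list of ranks may be preferable in that case.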

Recursive Python function to produce a list of anagrams

After a lot of head scratching and googling I still can't figure this out. I'm very new to Python and I'm struggling with the syntax. Conceptually I think I have a pretty decent idea of what I want to do and how to do so recursively. Technically, however, coding it in Python is proving to be a nightmare.
Basically I want to add all of the permutations of a word to list (no duplicate characters allowed), which can then be called by another program or function.
The return command and how to handle white space is really confusing me. I want the recursive function to "return" something once it unwinds but I don't want it to stop the function until all of the characters have iterated and all the permutations have been recursively generated within those iterations. When I run the code below nothing seems to happen.
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    res = []
    if len(A) == 0: res = res.extend(B)
    else:
        for i in range(len(A)):
            permutations(A[0:i] + A[i+1:], B + A[i])
    return res
permutations('word')
If I run the code below it prints out OK to my display pane, but I can't figure out how to get it into an output format that can be used by other program like a list.
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    if len(A) == 0: print(B)
    else:
        for i in range(len(A)):
            permutations(A[0:i] + A[i+1:], B + A[i])
permutations('word')
Please could someone advise me on this, while I have some hair left! Very gratefully received.
Thank you
Jon
Basically your mistake is in
res = res.extend(B)
.extend() doesn't return a new list, but modifies the instance.
Another problem is that you don't use the return value from your recursive calls.
Here is one way to fix your code:
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    if len(A) == 0:
        return [B]
    else:
        res = []
        for i in range(len(A)):
            res.extend(permutations(A[0:i] + A[i+1:], B + A[i]))
        return res
print(permutations('word'))
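The point about .extend() is easy to check in isolation: it changes the list in place and returns None, so assigning its result throws the list away. A small demonstration, assuming Python 3:

```python
res = []
result = res.extend(['word'])  # extend works in place...
print(result)                  # None
print(res)                     # ['word']

# Note that extend iterates over its argument, so a bare string is added
# character by character; wrap it in a list (or use append) to add it whole.
chars = []
chars.extend('word')
print(chars)                   # ['w', 'o', 'r', 'd']
```

That second behaviour is why the fixed version returns [B] rather than B in the base case.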
Like this?
from itertools import permutations
a = [x for x in permutations('word')]
print(a)
Output:
>>[('w', 'o', 'r', 'd'), ('w', 'o', 'd', 'r'), ('w', 'r', 'o', 'd'),
>>('w', 'r', 'd', 'o'), ('w', 'd', 'o', 'r'), ('w', 'd', 'r', 'o'),
>>('o', 'w', 'r', 'd'), ..............
EDIT:
I just realized you said no duplicate characters allowed. It does not really matter for 'word', but let's say you have 'wordwwwdd'. Then you could do:
[x for x in permutations(''.join(set('wordwwwdd')))]
But it will mess up the order because of using set, so it will look like:
>> [('r', 'o', 'w', 'd'), ('r', 'o', 'd', 'w'), ('r', 'w', 'o', 'd')....
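If the original letter order matters, one alternative is to de-duplicate with dict.fromkeys instead of set. This is a sketch that assumes Python 3.7 or later, where dicts preserve insertion order; the variable names are illustrative.

```python
from itertools import permutations

# dict.fromkeys keeps the first occurrence of each letter, in order.
unique_letters = ''.join(dict.fromkeys('wordwwwdd'))
print(unique_letters)  # word

words = [''.join(p) for p in permutations(unique_letters)]
print(words[:3])   # ['word', 'wodr', 'wrod']
print(len(words))  # 24
```

Joining each tuple also gives plain strings instead of tuples of characters.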
I would do it like this:
import itertools
def permute_nondupe_letters_to_words(iterable):
    return (''.join(i) for i in itertools.permutations(set(iterable)))
And to use it:
word = 'word'
permutation_generator = permute_nondupe_letters_to_words(word)
bucket_1, bucket_2 = [], []
for i in permutation_generator:
    bucket_1.append(i)
    if i == 'owdr':
        break
for i in permutation_generator:
    bucket_2.append(i)
And
print(len(bucket_1), len(bucket_2))
prints:
(10, 14)
Here is another way to approach this problem:
it is Python 2.7 and 3.3 compatible (have not yet tested with other versions)
it will accept input containing duplicate items, and only return unique output
(ie permutations("woozy") will only return "oowzy" once)
it returns output in sorted order (and will allow you to specify sort key and ascending or descending order)
it returns string output on string input
it runs as a generator, ie does not store all combinations in memory. If that's what you want, you have to explicitly say so (example shown below)
Edit: it occurred to me that I had omitted a length parameter, so I added one. You can now ask for things like all unique 4-letter permutations from a six-letter string.
Without further ado:
from collections import Counter
import sys
if sys.hexversion < 0x3000000:
    # Python 2.x
    dict_items_list = lambda d: d.items()
    is_string = lambda s: isinstance(s, basestring)
    rng = xrange
else:
    # Python 3.x
    dict_items_list = lambda d: list(d.items())
    is_string = lambda s: isinstance(s, str)
    rng = range
def permutations(lst, length=None, key=None, reverse=False):
    """
    Generate all unique permutations of lst in sorted order
    lst      list of items to permute
    length   number of items to pick for each permutation (defaults to all items)
    key      sort-key for items in lst
    reverse  sort in reverse order?
    """
    # this function is basically a shell, setting up the values
    # for _permutations, which actually does most of the work
    if length is None:
        length = len(lst)
    elif length < 1 or length > len(lst):
        return []  # no possible answers
    # 'woozy' => [('w', 1), ('o', 2), ('z', 1), ('y', 1)]  # unknown order
    items = dict_items_list(Counter(lst))
    # => [('o', 2), ('w', 1), ('y', 1), ('z', 1)]  # now in sorted order
    items.sort(key=key, reverse=reverse)
    if is_string(lst):
        # if input was string, return generator of string
        return (''.join(s) for s in _permutations(items, length))
    else:
        # return generator of list
        return _permutations(items, length)
def _permutations(items, length):
    if length == 1:
        for item, num in items:
            yield [item]
    else:
        for ndx in rng(len(items)):
            # pick an item to start with
            item, num = items[ndx]
            # make new list of remaining items
            if num == 1:
                remaining_items = items[:ndx] + items[ndx+1:]
            else:
                remaining_items = items[:ndx] + [(item, num-1)] + items[ndx+1:]
            # recurse against remaining items
            for perm in _permutations(remaining_items, length-1):
                yield [item] + perm
# test run!
words = list(permutations("woozy"))
results in
['oowyz',
'oowzy',
'ooywz',
'ooyzw',
'oozwy',
'oozyw',
'owoyz',
# ...
'zwooy',
'zwoyo',
'zwyoo',
'zyoow',
'zyowo',
'zywoo'] # 60 items = 5!/2!, as expected
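A quick way to sanity-check that count, independent of the code above, is to de-duplicate the output of itertools.permutations. This is only a cross-check sketch (Python 3, illustrative names); it builds all 120 permutations in memory, which the generator above avoids.

```python
from itertools import permutations as all_permutations

# itertools treats the two o's as distinct, so 5! = 120 tuples are produced;
# the set collapses the duplicates, leaving 5!/2! = 60 distinct words.
unique_words = sorted(set(''.join(p) for p in all_permutations('woozy')))

print(len(unique_words))  # 60
print(unique_words[:3])   # ['oowyz', 'oowzy', 'ooywz']
```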
