Recursive Python function to produce a list of anagrams - python

After a lot of head scratching and googling I still can't figure this out. I'm very new to Python and I'm struggling with the syntax. Conceptually I think I have a pretty decent idea of what I want to do and how to do it recursively. Technically, however, coding it in Python is proving to be a nightmare.
Basically I want to add all of the permutations of a word to a list (no duplicate characters allowed), which can then be used by another program or function.
The return command and how to handle white space is really confusing me. I want the recursive function to "return" something once it unwinds but I don't want it to stop the function until all of the characters have iterated and all the permutations have been recursively generated within those iterations. When I run the code below nothing seems to happen.
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    res = []
    if len(A) == 0: res = res.extend(B)
    else:
        for i in range(len(A)):
            permutations(A[0:i] + A[i+1:], B + A[i])
    return res
permutations('word')
If I run the code below it prints out OK to my display pane, but I can't figure out how to get the output into a form, such as a list, that can be used by another program.
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    if len(A) == 0: print(B)
    else:
        for i in range(len(A)):
            permutations(A[0:i] + A[i+1:], B + A[i])
permutations('word')
Please could someone advise me on this, while I have some hair left! Very gratefully received.
Thank you
Jon

Basically your mistake is in
res = res.extend(B)
.extend() doesn't return a new list; it modifies the list in place and returns None, so res ends up being None.
Another problem is that you don't use the return value from your recursive calls.
Here is one way to fix your code:
def permutations(A, B = ''):
    assert len(A) >= 0
    assert len(A) == len(set(A))
    if len(A) == 0:
        # base case: B holds one complete permutation
        return [B]
    else:
        res = []
        for i in range(len(A)):
            # collect the permutations returned by each recursive call
            res.extend(permutations(A[0:i] + A[i+1:], B + A[i]))
        return res
print(permutations('word'))
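To make the two points above concrete, here is a minimal sketch (Python 3.3+; the name iter_permutations is made up for illustration and is not from the question). It first shows that list.extend returns None, then writes the same recursion as a generator, so the caller decides whether to build a list:

```python
# list.extend mutates the list in place and returns None
res = []
returned = res.extend(['ab'])
print(returned)  # None
print(res)       # ['ab']

def iter_permutations(remaining, prefix=''):
    """Yield each permutation of `remaining` as a string (hypothetical helper)."""
    if not remaining:
        yield prefix
        return
    for i, ch in enumerate(remaining):
        # recurse on the letters left over once `ch` has been used
        yield from iter_permutations(remaining[:i] + remaining[i + 1:], prefix + ch)

words = list(iter_permutations('word'))
print(len(words))  # 24
```

Note that extending a list with a bare string, as in res.extend(B), would add each character as a separate element, which is another reason to wrap the result as [B].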

Like this?
from itertools import permutations
a = [x for x in permutations('word')]
print a
Output:
>>[('w', 'o', 'r', 'd'), ('w', 'o', 'd', 'r'), ('w', 'r', 'o', 'd'),
>>('w', 'r', 'd', 'o'), ('w', 'd', 'o', 'r'), ('w', 'd', 'r', 'o'),
>>('o', 'w', 'r', 'd'), ..............
EDIT:
I just realized you said no duplicate characters allowed. It does not really matter for 'word', but let's say you have 'wordwwwdd'. Then you could do:
[x for x in permutations(''.join(set('wordwwwdd')))]
But it will mess up the order because of using set, so it will look like:
>> [('r', 'o', 'w', 'd'), ('r', 'o', 'd', 'w'), ('r', 'w', 'o', 'd')....

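On the ordering problem with set mentioned above: one possible workaround, sketched here under the assumption of Python 3.7+ (where dict keeps insertion order), is to drop duplicate letters with dict.fromkeys and join each tuple back into a string. The helper name unique_letters is made up for this example.

```python
from itertools import permutations

def unique_letters(word):
    # dict keys keep first-seen order, so duplicates vanish without reordering
    return ''.join(dict.fromkeys(word))

letters = unique_letters('wordwwwdd')
anagrams = [''.join(p) for p in permutations(letters)]
print(letters)        # word
print(anagrams[:3])   # ['word', 'wodr', 'wrod']
print(len(anagrams))  # 24
```
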
I would do it like this:
import itertools
def permute_nondupe_letters_to_words(iterable):
    # set() drops duplicate letters; each permutation tuple is joined into a string
    return (''.join(i) for i in itertools.permutations(set(iterable)))
And to use it:
word = 'word'
permutation_generator = permute_nondupe_letters_to_words(word)
bucket_1, bucket_2 = [], []
for i in permutation_generator:
    bucket_1.append(i)
    if i == 'owdr':
        break
for i in permutation_generator:
    bucket_2.append(i)
And
print(len(bucket_1), len(bucket_2))
prints:
(10, 14)

Here is another way to approach this problem:
it is Python 2.7 and 3.3 compatible (have not yet tested with other versions)
it will accept input containing duplicate items, and only return unique output
(ie permutations("woozy") will only return "oowzy" once)
it returns output in sorted order (and will allow you to specify sort key and ascending or descending order)
it returns string output on string input
it runs as a generator, ie does not store all combinations in memory. If that's what you want, you have to explicitly say so (example shown below)
Edit: it occurred to me that I had omitted a length parameter, so I added one. You can now ask for things like all unique 4-letter permutations from a six-letter string.
Without further ado:
from collections import Counter
import sys
if sys.hexversion < 0x3000000:
    # Python 2.x
    dict_items_list = lambda d: d.items()
    is_string = lambda s: isinstance(s, basestring)
    rng = xrange
else:
    # Python 3.x
    dict_items_list = lambda d: list(d.items())
    is_string = lambda s: isinstance(s, str)
    rng = range
def permutations(lst, length=None, key=None, reverse=False):
    """
    Generate all unique permutations of lst in sorted order
      lst      list of items to permute
      length   number of items to pick for each permutation (defaults to all items)
      key      sort-key for items in lst
      reverse  sort in reverse order?
    """
    # this function is basically a shell, setting up the values
    # for _permutations, which actually does most of the work
    if length is None:
        length = len(lst)
    elif length < 1 or length > len(lst):
        return [] # no possible answers
    # 'woozy' => [('w', 1), ('o', 2), ('z', 1), ('y', 1)] # unknown order
    items = dict_items_list(Counter(lst))
    # => [('o', 2), ('w', 1), ('y', 1), ('z', 1)] # now in sorted order
    items.sort(key=key, reverse=reverse)
    if is_string(lst):
        # if input was string, return generator of string
        return (''.join(s) for s in _permutations(items, length))
    else:
        # return generator of list
        return _permutations(items, length)
def _permutations(items, length):
    if length == 1:
        for item,num in items:
            yield [item]
    else:
        for ndx in rng(len(items)):
            # pick an item to start with
            item, num = items[ndx]
            # make new list of remaining items
            if num == 1:
                remaining_items = items[:ndx] + items[ndx+1:]
            else:
                remaining_items = items[:ndx] + [(item, num-1)] + items[ndx+1:]
            # recurse against remaining items
            for perm in _permutations(remaining_items, length-1):
                yield [item]+perm
# test run!
words = list(permutations("woozy"))
results in
['oowyz',
'oowzy',
'ooywz',
'ooyzw',
'oozwy',
'oozyw',
'owoyz',
# ...
'zwooy',
'zwoyo',
'zwyoo',
'zyoow',
'zyowo',
'zywoo'] # 60 items = 5!/2!, as expected

Related

How can I sort this according to the length of the output?

So I wrote this code with the help of Stack Overflow users, and here it is...
def letter_total(filename: str):
    chars = list(filename)
    chars_unique = set(chars)
    chars_unique.remove(' ')
    result = []
    for x in chars_unique:
        result.append([x, chars.count(x)*('*')])
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[0], c[1])
print(letter_count(filename='How was your day'))
and this is the resulting output...
H *
a **
d *
o **
r *
s *
u *
w **
y **
None
but I want my output to be printed in order from the most * to the fewest. (If two different letters have the same number of '*', then I want those letters in alphabetical order.)
So my output should look like this
a **
o **
w **
y **
d *
H *
r *
s *
How can I accomplish this without using key = lambda and only using sorted()?
You're asking to drive in a screw without using a screwdriver and only using your bare fingers, but okay.
If you store each tally as a list [negative_count, letter] instead of [letter, stars], the default ordering will first sort by negative_count (longer first) and use letter as a tie-breaker, exactly as you intended. Note that capitals sort before lowercase letters.
With minimal changes to your code:
def letter_total(filename: str):
    chars = list(filename)
    chars_unique = set(chars)
    chars_unique.remove(' ')
    result = []
    for x in chars_unique:
        result.append([-chars.count(x), x])
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[1], (-c[0]) * '*')
print(letter_count(filename='How was your day'))
Then a couple more pointers:
letter_count is already doing the printing; no need to also print its return value (which is None).
It's more efficient and idiomatic to use tuples (negative_count, letter) instead of lists here.
This code is O(n²) which means it's rather inefficient. For each unique letter, it's running through the entire string to count just that letter. It's more efficient to run through the string once, and keep a tally in a dict. Then as the last step, convert the dict into a list of tuples.
Putting all that together:
def letter_total(filename: str):
    l_count = {}
    for x in filename:
        if x != ' ':
            if x not in l_count:
                l_count[x] = 0
            l_count[x] -= 1
    result = [(count, letter) for letter, count in l_count.items()]
    return result
def letter_count(filename: str):
    l_count = letter_total(filename)
    for c in sorted(l_count):
        print(c[1], (-c[0]) * '*')
print(letter_count(filename='How was your day'))
I understand you're just learning, but in production code, I would recommend collections.Counter which does exactly this job for you:
>>> from collections import Counter
>>> list(Counter('How was your day').items())
[(' ', 3), ('H', 1), ('a', 2), ('d', 1), ('o', 2), ('r', 1), ('s', 1), ('u', 1), ('w', 2), ('y', 2)]
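As a rough sketch of how Counter could be combined with the negative-count trick from this answer (the function name letter_rows is invented for this example), plain sorted() with no key is enough once each entry is a (-count, letter) tuple:

```python
from collections import Counter

def letter_rows(text):
    """Return (letter, stars) pairs, most frequent first (hypothetical helper)."""
    counts = Counter(text.replace(' ', ''))
    # (-count, letter) tuples sort by count descending, then by letter ascending
    ordered = sorted((-n, ch) for ch, n in counts.items())
    return [(ch, (-neg) * '*') for neg, ch in ordered]

rows = letter_rows('How was your day')
for letter, stars in rows:
    print(letter, stars)
```

Because the default comparison is case-sensitive, 'H' comes before 'd' among the single-star letters.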
Clean the input string,
then use Counter with its method most_common to get a list of letters counted by their occurrence,
then group the output list of tuples ll by second element (the count),
and apply sorted to each group.
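from collections import Counter
from itertools import groupby
from typing import List, Tuple
s: str = 'How was your day'.replace(" ", "")
# most_common() returns (letter, count) pairs ordered by count, highest first
ll: List[Tuple[str, int]] = Counter(s).most_common()
# group by count, then sort each group alphabetically, ignoring case
res = sum([sorted(v, key=lambda ch: ch[0].lower()) for k,v in groupby(ll, lambda x: x[1])], [])
res = [(x, y * "*") for x,y in res]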
OUTPUT:
[('a', '**'),
('o', '**'),
('w', '**'),
('y', '**'),
('d', '*'),
('H', '*'),
('r', '*'),
('s', '*'),
('u', '*')]
This way:
sorted(sorted(l_count), key = lambda i:-i[1])
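This two-pass approach relies on sorted() being stable: entries that tie in the second pass keep the alphabetical order from the first. A small sketch, assuming l_count holds [letter, stars] pairs as in the question (so the key negates len(stars) rather than the string itself; the sample data is made up):

```python
l_count = [['H', '*'], ['w', '**'], ['a', '**'], ['d', '*'], ['o', '**']]

by_letter = sorted(l_count)                                   # pass 1: alphabetical, capitals first
by_count = sorted(by_letter, key=lambda pair: -len(pair[1]))  # pass 2: most stars first, ties keep pass-1 order

print(by_count)
# [['a', '**'], ['o', '**'], ['w', '**'], ['H', '*'], ['d', '*']]
```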

Using zip_longest on unequal lists but repeat the last entry instead of returning None

There is an existing thread about this
Zipping unequal lists in python in to a list which does not drop any element from longer list being zipped
But it's not quite I'm after.
Instead of returning None, I need it to copy the previous entry on the list.
Is this possible?
a = ["bottle","water","sky"]
b = ["red", "blue"]
for i in itertools.izip_longest(a,b):
print i
#result
# ('bottle', 'red')
# ('water', 'blue')
# ('sky', None)
# What I want on the third line is
# ('sky', 'blue')
itertools.izip_longest takes an optional fillvalue argument that provides the value that is used after the shorter list has been exhausted. fillvalue defaults to None, giving the behaviour you show in your question, but you can specify a different value to get the behaviour you want:
fill = a[-1] if (len(a) < len(b)) else b[-1]
for i in itertools.izip_longest(a, b, fillvalue=fill):
print i
(Obviously if the same list is always the shorter one then choosing the fill character is even easier.)
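For Python 3, where izip_longest was renamed to itertools.zip_longest, the same idea might look like the sketch below. It assumes both lists are non-empty; the variable name shorter is illustrative.

```python
from itertools import zip_longest

a = ["bottle", "water", "sky"]
b = ["red", "blue"]

# pad with the last element of whichever list is shorter
shorter = a if len(a) < len(b) else b
pairs = list(zip_longest(a, b, fillvalue=shorter[-1]))
print(pairs)
# [('bottle', 'red'), ('water', 'blue'), ('sky', 'blue')]
```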
You can chain the shorter list with a repeat of its last value. Then using regular izip, the result will be the length of the longer list:
from itertools import izip, repeat, chain
def izip_longest_repeating(seq1, seq2):
    if len(seq1) < len(seq2):
        repeating = seq1[-1]
        seq1 = chain(seq1, repeat(repeating))
    else:
        repeating = seq2[-1]
        seq2 = chain(seq2, repeat(repeating))
    return izip(seq1, seq2)
print(list(izip_longest_repeating(a, b)))
# [('bottle', 'red'), ('water', 'blue'), ('sky', 'blue')]
And here's a version that should work for any iterables:
from itertools import izip as zip # Python2 only
def zip_longest_repeating(*iterables):
    iters = [iter(i) for i in iterables]
    sentinel = object()
    vals = tuple(next(it, sentinel) for it in iters)
    if any(val is sentinel for val in vals):
        return
    yield vals
    while True:
        cache = vals
        vals = tuple(next(it, sentinel) for it in iters)
        if all(val is sentinel for val in vals):
            return
        vals = tuple(old if new is sentinel else new for old, new in zip(cache, vals))
        yield vals
list(zip_longest_repeating(['a'], ['b', 'c'], ['d', 'r', 'f']))
# [('a', 'b', 'd'), ('a', 'c', 'r'), ('a', 'c', 'f')]

How would I drop the last x characters of a string as a python generator while using calls to iter and while?

I’m having trouble writing a generator function that takes an iterable and one more parameter which is an integer x. It outputs every value except for the last x values. It doesn’t know how to count how many values the iterable outputs.
I don’t know how to do this using a while loop as well as iter. I also need to use a comprehension that creates a list to store x values at most.
Let's say we call:
for i in func_function("abcdefghijk", 5):
    print(i, end='')
It should print abcdef.
Here's what I've tried:
def func_function(iterable, x):
    while True:
        l = []
        for x in iter(iterable):
            if len(x) == x:
                yield x
The trick is to turn this from lookahead into lookbehind.
I'd do this by iterating over the input and maintaining a window of the most recent n elements:
def except_last_n(iterable, n):
    last_n = []
    for val in iterable:
        last_n.append(val)
        if len(last_n) > n:
            yield last_n.pop(0)
for val in except_last_n(range(10), 3):
    print(val)
Rewriting this as a while loop and iter is left as exercise for the reader.
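For readers who want to check their attempt, one possible while/iter version is sketched below. It is a straightforward translation of the for loop, not the only way to do it, and the name window is an arbitrary choice.

```python
def except_last_n(iterable, n):
    it = iter(iterable)
    window = []  # holds at most n + 1 values at any time
    while True:
        try:
            val = next(it)
        except StopIteration:
            return  # input exhausted; the last n values stay unyielded in the window
        window.append(val)
        if len(window) > n:
            # the oldest value is now known not to be among the last n
            yield window.pop(0)

print(''.join(except_last_n("abcdefghijk", 5)))  # abcdef
```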
def except_last_n(iterable, n):
    # reads the whole input into a list first, so it only suits finite iterables
    last_n = [val for val in iterable]
    while len(last_n) > n:
        yield last_n.pop(0)
from collections import deque
def drop_last_few(iterable, x=5):
    it = iter(iterable)
    data = deque(maxlen=x)
    data.extend([next(it) for i in range(x)])
    for val in it:
        yield data[0]
        data.append(val)
This uses a double-ended queue as storage to cache at most x elements. Demo:
>>> print(*drop_last_few("abcdefghijk", 5))
a b c d e f
Strings are sliceable:
def func_function(iterable, x):
    yield from iterable[:-x]
print(func_function("asdfgkjbewqrfgkjb",8))
k = list(func_function("asdfgkjbewqrfgkjb",8))
print(k) # ['a', 's', 'd', 'f', 'g', 'k', 'j', 'b', 'e']
The while loop, iter and l=[] are not needed...

Compare 2 Strings in Python

So, I need to figure out a program that when you input 2 different strings of the same length it will return NOT print the number of differences between the two strings. The order of the characters matters as well.
For example if you input ("abcdef", "aabccf")
it should return 4.
("abcdef", "accddf") should return 2.
All I have so far is:
def differencecount ( A, B):
    counter = 0
    str1 = list (A)
    str2 = list (B)
    for letter in str1:
        if letter == str2:
            counter = counter + 1
    return counter
All this does is return 0 though so I feel like I'm missing something.
I would use
def difference(word_one, word_two):
    return sum(l1 != l2 for l1, l2 in zip(word_one, word_two))
Which works like
>>> difference('abcdef', 'abcdef')
0
>>> difference('abcdef', 'abcabc')
3
You can zip the strings together and then count how many different pairs there are:
def chardifferencecounter(x, y):
    return len([1 for c1, c2 in zip(x, y) if c1 != c2])
>>> chardifferencecounter('abcdef', 'aabccf')
4
>>> chardifferencecounter('abcdef', 'accddf')
2
Explanation:
Zipping the strings together produces this:
>>> s1 = 'abcdef'
>>> s2 = 'aabccf'
>>> zip(s1, s2)
[('a', 'a'), ('b', 'a'), ('c', 'b'), ('d', 'c'), ('e', 'c'), ('f', 'f')]
so it takes a character from the same position in each string and pairs them together. So you just need to count how many pairs are different. That can be done using a list comprehension to create a list with those pairs that are the same filtered out, and then get the length of that list.
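The transcript above is from Python 2, where zip returns a list. In Python 3 zip is lazy, so a sketch of the same steps needs list() to display the pairs (the name mismatches is just for illustration):

```python
s1 = 'abcdef'
s2 = 'aabccf'

pairs = list(zip(s1, s2))  # zip is lazy in Python 3, so materialise it to inspect it
# keep only the positions where the two characters differ
mismatches = [(c1, c2) for c1, c2 in pairs if c1 != c2]

print(mismatches)       # [('b', 'a'), ('c', 'b'), ('d', 'c'), ('e', 'c')]
print(len(mismatches))  # 4
```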
Just for a different look here is a solution that doesn't use zip or enumerate:
def chardifferencecounter(x,y):
    if len(x) != len(y):
        raise Exception('Please enter strings of equal length')
    return sum(x[i] != y[i] for i in range(len(x)))
Note that this solution also raises an exception when x and y are of different lengths, which is what you wanted in your comment.

how to replace the alphabetically smallest letter by 1, the next smallest by 2 but do not discard multiple occurrences of a letter?

I am using Python 3 and I want to write a function that takes a string of all capital letters, so suppose s = 'VENEER', and gives me the following output '614235'.
The function I have so far is:
def key2(s):
    new=''
    for ch in s:
        acc=0
        for temp in s:
            if temp<=ch:
                acc+=1
        new+=str(acc)
    return(new)
If s == 'VENEER' then new == '634335'. If s contains no duplicates, the code works perfectly.
I am stuck on how to edit the code to get the output stated in the beginning.
Note that the built-in method for replacing characters within a string, str.replace, takes a third argument; count. You can use this to your advantage, replacing only the first appearance of each letter (obviously once you replace the first 'E', the second one will become the first appearance, and so on):
def process(s):
    for i, c in enumerate(sorted(s), 1):
        ## print s # uncomment to see process
        s = s.replace(c, str(i), 1)
    return s
I have used the built-in functions sorted and enumerate to get the appropriate numbers to replace the characters:
1 2 3 4 5 6 # 'enumerate' from 1 -> 'i'
E E E N R V # 'sorted' input 's' -> 'c'
Example usage:
>>> process("VENEER")
'614235'
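One way would be to use numpy.argsort to find the order, then find the ranks, and join them. A stable sort keeps repeated letters in their original left-to-right order:
>>> import numpy as np
>>> s = 'VENEER'
>>> order = np.argsort(list(s), kind='stable')
>>> rank = np.argsort(order) + 1
>>> ''.join(map(str, rank))
'614235'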
You can use a regex:
import re
s="VENEER"
for n, c in enumerate(sorted(s), 1):
    s=re.sub('%c' % c, '%i' % n, s, count=1)
print s
# 614235
You can also use several nested generators:
def indexes(seq):
    for v, i in sorted((v, i) for (i, v) in enumerate(seq)):
        yield i
print ''.join('%i' % (e+1) for e in indexes(indexes(s)))
# 614235
From your title, you may want to do like this?
>>> from collections import OrderedDict
>>> s='VENEER'
>>> d = {k: n for n, k in enumerate(OrderedDict.fromkeys(sorted(s)), 1)}
>>> "".join(map(lambda k: str(d[k]), s))
'412113'
As @jonrsharpe commented, I didn't need to use OrderedDict.
def caps_to_nums(in_string):
    indexed_replaced_string = [(idx, val) for val, (idx, ch) in enumerate(sorted(enumerate(in_string), key=lambda x: x[1]), 1)]
    return ''.join(map(lambda x: str(x[1]), sorted(indexed_replaced_string)))
First we run enumerate to be able to save the natural sort order
enumerate("VENEER") -> [(0, 'V'), (1, 'E'), (2, 'N'), (3, 'E'), (4, 'E'), (5, 'R')]
# this gives us somewhere to RETURN to later.
Then we sort that according to its second element, which is alphabetical, and run enumerate again with a start value of 1 to get the replacement value. We throw away the alpha value, since it's not needed anymore.
[(idx, val) for val, (idx, ch) in enumerate(sorted([(0, 'V'), (1, 'E'), ...], key = lambda x: x[1]), start=1)]
# [(1, 1), (3, 2), (4, 3), (2, 4), (5, 5), (0, 6)]
Then map the second element (our value) sorting by the first element (the original index)
map(lambda x: str(x[1]), sorted(replacement_values))
and str.join it
''.join(that_mapping)
Ta-da!
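The same steps can be run one at a time to inspect the intermediate values. This is only a restatement of the walkthrough above in Python 3, with made-up variable names:

```python
s = "VENEER"

# step 1: remember each letter's original position
indexed = list(enumerate(s))
# [(0, 'V'), (1, 'E'), (2, 'N'), (3, 'E'), (4, 'E'), (5, 'R')]

# step 2: sort by letter; sorted() is stable, so equal letters keep their original order
by_letter = sorted(indexed, key=lambda pair: pair[1])

# step 3: number the sorted letters from 1 and keep (original index, rank)
ranked = [(idx, rank) for rank, (idx, ch) in enumerate(by_letter, 1)]
# [(1, 1), (3, 2), (4, 3), (2, 4), (5, 5), (0, 6)]

# step 4: restore the original order and join the ranks
result = ''.join(str(rank) for idx, rank in sorted(ranked))
print(result)  # 614235
```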
