Run Length Encoding in Python with List Comprehension - python

I have a more basic Run Length Encoding question compared to many of the questions about this topic that have already been answered. Essentially, I'm trying to take the string
string = 'aabccccaaa'
and have it return
a2b1c4a3
I thought that if I can manage to get all the information into a list like I have illustrated below, I would easily be able to return a2b1c4a3
test = [['a','a'], ['b'], ['c','c','c','c'], ['a','a','a']]
I came up with the following code so far, but was wondering if someone would be able to help me figure out how to make it create the output I illustrated above.
def string_compression():
    for i in xrange(len(string)):
        prev_item, current_item = string[i-1], string[i]
        print prev_item, current_item
        if prev_item == current_item:
            <HELP>
If anyone has any additional comments regarding more efficient ways to go about solving a question like this I am all ears!

You can use itertools.groupby():
from itertools import groupby
grouped = [list(g) for k, g in groupby(string)]
This will produce your per-letter groups as a list of lists.
You can turn that into a RLE in one step:
rle = ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
Each k is the letter being grouped, and each g is an iterator that yields that same letter N times; the sum(1 for _ in g) expression counts the items without building an intermediate list.
Demo:
>>> from itertools import groupby
>>> string = 'aabccccaaa'
>>> [list(g) for k, g in groupby(string)]
[['a', 'a'], ['b'], ['c', 'c', 'c', 'c'], ['a', 'a', 'a']]
>>> ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
'a2b1c4a3'
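As a rough sketch, the groupby one-liner above can be wrapped in a function and paired with a decoder to check the round trip. The names rle_encode and rle_decode are hypothetical, and the decoder assumes the original text never contains digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Encode each run of equal characters as <char><count>."""
    return "".join(
        "{}{}".format(char, sum(1 for _ in run)) for char, run in groupby(text)
    )

def rle_decode(encoded):
    """Invert rle_encode; assumes the encoded characters are never digits."""
    return "".join(
        char * int(count) for char, count in re.findall(r"(\D)(\d+)", encoded)
    )

encoded = rle_encode("aabccccaaa")
print(encoded)              # a2b1c4a3
print(rle_decode(encoded))  # aabccccaaa
```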

Consider using the more_itertools.run_length tool.
Demo
import more_itertools as mit
iterable = "aabccccaaa"
list(mit.run_length.encode(iterable))
# [('a', 2), ('b', 1), ('c', 4), ('a', 3)]
Code
"".join(f"{x[0]}{x[1]}" for x in mit.run_length.encode(iterable)) # python 3.6
# 'a2b1c4a3'
"".join(x[0] + str(x[1]) for x in mit.run_length.encode(iterable))
# 'a2b1c4a3'
Alternative itertools/functional style:
import itertools as it
"".join(map(str, it.chain.from_iterable(mit.run_length.encode(iterable))))
# 'a2b1c4a3'
Note: more_itertools is a third-party library that is installable via pip install more_itertools.

I'm a Python beginner and this is what I wrote for RLE.
from itertools import groupby
s = 'aabccccaaa'
grouped_d = [(k, len(list(g))) for k, g in groupby(s)]
result = ''
for key, count in grouped_d:
    result += key + str(count)
print(f'result = {result}')
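For comparison, here is a minimal sketch of the same encoding without itertools, tracking the previous character the way the question's attempt tried to. The function name rle_manual is hypothetical.

```python
def rle_manual(text):
    """Run-length encode text by comparing each character with the previous one."""
    if not text:
        return ""
    pieces = []
    prev, count = text[0], 1
    for char in text[1:]:
        if char == prev:
            count += 1          # still inside the same run
        else:
            pieces.append(prev + str(count))
            prev, count = char, 1
    pieces.append(prev + str(count))  # flush the final run
    return "".join(pieces)

print(rle_manual("aabccccaaa"))  # a2b1c4a3
```

Starting the loop at the second character avoids the wrap-around comparison that string[i-1] causes when i is 0.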

Related

How to extract each word consecutive to its own previous number in a string and sorting the result in Python

Input : x3b4U5i2
Output : bbbbiiUUUUUxxx
How can I solve this problem in Python? I have to print the word next to its number n times and then sort the result.
It wasn't clear if multiple digit counts or groups of letters should be handled. Here's a solution that does all of that:
import re
def main(inp):
    parts = re.split(r"(\d+)", inp)
    parts_map = {parts[i]:int(parts[i+1]) for i in range(0, len(parts)-1, 2)}
    print(''.join([c*parts_map[c] for c in sorted(parts_map.keys(),key=str.lower)]))
main("x3b4U5i2")
main("x3brx4U5i2")
main("x23b4U35i2")
Result:
bbbbiiUUUUUxxx
brxbrxbrxbrxiiUUUUUxxx
bbbbiiUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUxxxxxxxxxxxxxxxxxxxxxxx
I'm assuming the formatting will always be <char><int> with <int> being in between 1 and 9...
input_ = "x3b4U5i2"
result_list = [input_[i]*int(input_[i+1]) for i in range(0, len(input_), 2)]
result_list.sort(key=str.lower)
result = ''.join(result_list)
There's probably a much more performance-oriented approach to solving this, it's just the first solution that came into my limited mind.
Edit
After the feedback in the comments I tried to improve performance by sorting first, but I actually decreased performance with the following implementation:
input_ = "x3b4U5i2"
def sort_first(value):
    return value[0].lower()
tuple_construct = [(input_[i], int(input_[i+1])) for i in range(0, len(input_), 2)]
tuple_construct.sort(key=sort_first)
result = ''.join([tc[0] * tc[1] for tc in tuple_construct])
Execution time for 100,000 iterations on it:
1) The execution time is: 0.353036
2) The execution time is: 0.4361724
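A comparison like the one above could be reproduced roughly as follows. This is only a sketch: the function names are hypothetical, the iteration count is reduced, and the absolute timings will differ between machines.

```python
import timeit

input_ = "x3b4U5i2"

def expand_then_sort(text):
    # Approach 1: expand each <char><digit> pair first, then sort the chunks.
    chunks = [text[i] * int(text[i + 1]) for i in range(0, len(text), 2)]
    chunks.sort(key=str.lower)
    return "".join(chunks)

def sort_then_expand(text):
    # Approach 2: sort (char, count) tuples first, then expand them.
    pairs = [(text[i], int(text[i + 1])) for i in range(0, len(text), 2)]
    pairs.sort(key=lambda pair: pair[0].lower())
    return "".join(char * count for char, count in pairs)

for func in (expand_then_sort, sort_then_expand):
    seconds = timeit.timeit(lambda: func(input_), number=10000)
    print(func.__name__, func(input_), seconds)
```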
One option, extract the character/digit(s) pairs with a regex, sort them by letter (ignoring case), multiply the letter by the number of repeats, join:
s = 'x3b4U5i2'
import re
out = ''.join([c*int(i) for c,i in
               sorted(re.findall(r'(\D)(\d+)', s),
                      key=lambda x: x[0].casefold())
              ])
print(out)
Output: bbbbiiUUUUUxxx
If you want to handle groups of multiple characters, you can use r'(\D+)(\d+)' instead.
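A small sketch of that multi-character variant, wrapped in a hypothetical helper named expand_sorted:

```python
import re

def expand_sorted(encoded):
    """Expand <letters><count> groups, sorted case-insensitively by their letters."""
    pairs = re.findall(r"(\D+)(\d+)", encoded)
    pairs.sort(key=lambda pair: pair[0].casefold())
    return "".join(letters * int(count) for letters, count in pairs)

print(expand_sorted("x3b4U5i2"))  # bbbbiiUUUUUxxx
print(expand_sorted("x3br2U1"))   # brbrUxxx
```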
No list comprehensions or generator expressions in sight: just use re.sub with a lambda to expand the run-length encoding, sort the result, and join it back into a string.
import re
s = "x3b4U5i2"
''.join(sorted(re.sub(r"(\D+)(\d+)",
                      lambda m: m.group(1)*int(m.group(2)),
                      s),
               key=lambda x: x[0].casefold()))
# 'bbbbiiUUUUUxxx'
If we use re.findall to extract a list of pairs of strings and multipliers:
import re
s = 'x3b4U5i2'
pairs = re.findall(r"(\D+)(\d+)", s)
Then we can use some functional style to sort that list before expanding it.
from operator import itemgetter
def compose(f, g):
    return lambda x: f(g(x))
sorted(pairs, key=compose(str.lower, itemgetter(0)))
# [('b', '4'), ('i', '2'), ('U', '5'), ('x', '3')]
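The snippet above stops at the sorted pairs. One way to finish the job, sketched here with the same compose helper, is to multiply each string by its count and join the pieces:

```python
import re
from operator import itemgetter

def compose(f, g):
    # compose(f, g)(x) == f(g(x))
    return lambda x: f(g(x))

s = "x3b4U5i2"
pairs = re.findall(r"(\D+)(\d+)", s)
ordered = sorted(pairs, key=compose(str.lower, itemgetter(0)))
# Expand each (text, count) pair only after sorting.
result = "".join(text * int(count) for text, count in ordered)
print(result)  # bbbbiiUUUUUxxx
```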

Permutations of several lists in python efficiently

I'm trying to write a python script that will generate random permutations of several lists without repeating
i.e. for [a,b] and [c,d]:
a,c
b,c
a,d
b,d
I can generate every permutation using the following, however the result is somewhat non random:
for r in itertools.product(list1, list2):
    target.write("%s,%s" % (r[0], r[1]))
Does anyone know how I can implement this so that I can extract only 2 permutations that are completely random but never repeated?
You can use random.choice():
>>> from itertools import product
>>> import random
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['d', 'e', 'f']
>>> prod = tuple(product(l1, l2))
>>>
>>> random.choice(prod)
('c', 'e')
>>> random.choice(prod)
('a', 'f')
>>> random.choice(prod)
('c', 'd')
Or simply use a nested list comprehension for creating the products:
>>> lst = [(i, j) for j in l2 for i in l1]
If you don't want to produce duplicate items, you can build a set from your product and simply pop items from it. Note that a set has no specified order, so the popped items come out in arbitrary order rather than truly random order:
>>> prod = set(product(l1, l2))
>>>
>>> prod.pop()
('c', 'f')
>>> prod.pop()
('a', 'f')
>>> prod.pop()
('a', 'd')
Or use shuffle to shuffle the iterable, as @ayhan has suggested in his answer.
You can use random.shuffle then pop to make sure the results will not be repeated:
import itertools
import random
list1 = ["a", "b"]
list2 = ["c", "d"]
p = list(itertools.product(list1, list2))
random.shuffle(p)
e1 = p.pop()
e2 = p.pop()
list(itertools.product(...)) is not memory-efficient because it generates and stores every combination. If you have big lists, you can generate one pair at a time and check whether it has already been produced:
import random
s = set()
list1 = ["a", "b"]
list2 = ["c", "d"]
while True:
    r = (random.choice(list1), random.choice(list2))
    if r not in s:
        target.write("%s,%s" % (r[0], r[1]))
        s.add(r)
        break
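Another option, sketched here as an alternative rather than taken from the answers above, is to sample distinct flat indices and map each one back to a pair, which avoids both storing the full product and retrying on duplicates. The helper name sample_pairs is hypothetical, and the pairs are only guaranteed distinct if neither list contains repeated elements.

```python
import random

def sample_pairs(list1, list2, k):
    """Pick k distinct (a, b) pairs without building the full product."""
    total = len(list1) * len(list2)
    # random.sample never repeats an index; it raises ValueError if k > total.
    indices = random.sample(range(total), k)
    # divmod maps each flat index back to one position in each list.
    return [
        (list1[i], list2[j])
        for i, j in (divmod(index, len(list2)) for index in indices)
    ]

print(sample_pairs(["a", "b"], ["c", "d"], 2))
```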

'backwards' enumerate

Is there a way to get a generator/iterator that yields the reverse of enumerate:
from itertools import izip, count
enumerate(I) # -> (indx, v)
izip(I, count()) # -> (v, indx)
without pulling in itertools?
You can do this with a simple generator expression:
((v, i) for i, v in enumerate(some_iterable))
Here as a list comprehension to easily see the output:
>>> [(v, i) for i, v in enumerate(["A", "B", "C"])]
[('A', 0), ('B', 1), ('C', 2)]
((v, indx) for indx, v in enumerate(I))
if you really want to avoid itertools. Why would you?
I'm not sure if I have understood your question right. But here is my solution.
Based on the code on: https://docs.python.org/2/library/functions.html#enumerate
def enumerate_rev(sequence, start=0):
    n = start
    for elem in sequence:
        yield elem, n
        n += 1
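The generator-expression approach and the start parameter can be combined by forwarding start to enumerate. A minimal sketch, with the hypothetical name enumerate_swapped:

```python
def enumerate_swapped(iterable, start=0):
    """Lazily yield (value, index) pairs, mirroring enumerate's (index, value)."""
    return ((value, index) for index, value in enumerate(iterable, start))

print(list(enumerate_swapped("ABC")))           # [('A', 0), ('B', 1), ('C', 2)]
print(list(enumerate_swapped("ABC", start=1)))  # [('A', 1), ('B', 2), ('C', 3)]
```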

Getting permutations in Python, itertools

I want to get all the 3 letter permutations possible from every letter in the alphabet using itertools. This comes back blank:
import itertools
def permutations(ABCDEFGHIJKLMNOPQRSTUVWXYZ, r=3):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)
What am I doing wrong?
You are a bit mixed up: that is just code explaining what permutations does. itertools is actually written in C; the Python equivalent is given only to show how it works.
>>> from itertools import permutations
>>> from string import ascii_uppercase
>>> for x in permutations(ascii_uppercase, r=3):
...     print(x)
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'B', 'E')
('A', 'B', 'F')
.....
That should work fine
The code in the itertools.permutations documentation explains how the function is implemented, not how to use it. You want to do this:
perms = itertools.permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ', r=3)
You can print them all out by converting the result to a list (print(list(perms))), or you can iterate over them in a for loop if you want to do something else with them, e.g.:
for perm in perms:
    ...
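As a quick illustration, the tuples can be joined into strings and counted. Since permutations never reuses a letter, there are 26 * 25 * 24 of them:

```python
from itertools import permutations
from string import ascii_uppercase

# Join each 3-tuple of letters into a single string.
words = ["".join(perm) for perm in permutations(ascii_uppercase, 3)]
print(len(words))  # 15600, i.e. 26 * 25 * 24
print(words[:3])   # ['ABC', 'ABD', 'ABE']
```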

Group list by values [duplicate]

Let's say I have a list like this:
mylist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
How can I most elegantly group this to get this list output in Python:
[["A", "C"], ["B"], ["D", "E"]]
So the values are grouped by the second value, but the order is preserved.
values = set(map(lambda x:x[1], mylist))
newlist = [[y[0] for y in mylist if y[1]==x] for x in values]
from operator import itemgetter
from itertools import groupby
lki = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
lki.sort(key=itemgetter(1))
glo = [[x for x,y in g]
for k,g in groupby(lki,key=itemgetter(1))]
print glo
EDIT
Another solution that needs no import, is more readable, keeps the order, and is 22 % shorter than the preceding one:
oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
newlist, dicpos = [],{}
for val,k in oldlist:
    if k in dicpos:
        # append (not extend) so multi-character values stay intact
        newlist[dicpos[k]].append(val)
    else:
        newlist.append([val])
        dicpos[k] = len(dicpos)
print(newlist)
Howard's answer is concise and elegant, but it's also O(n^2) in the worst case. For large lists with large numbers of grouping key values, you'll want to sort the list first and then use itertools.groupby:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> seq = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> seq.sort(key = itemgetter(1))
>>> groups = groupby(seq, itemgetter(1))
>>> [[item[0] for item in data] for (key, data) in groups]
[['A', 'C'], ['B'], ['D', 'E']]
Edit:
I changed this after seeing eyequem's answer: itemgetter(1) is nicer than lambda x: x[1].
>>> import collections
>>> L1 = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> D1 = collections.defaultdict(list)
>>> for element in L1:
...     D1[element[1]].append(element[0])
...
>>> L2 = list(D1.values())
>>> print(L2)
[['A', 'C'], ['B'], ['D', 'E']]
>>>
I don't know about elegant, but it's certainly doable:
oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
# change into: list = [["A", "C"], ["B"], ["D", "E"]]
order=[]
dic=dict()
for value,key in oldlist:
    try:
        dic[key].append(value)
    except KeyError:
        order.append(key)
        dic[key]=[value]
# list(...) so the result is a real list on Python 3 as well
newlist=list(map(dic.get, order))
print(newlist)
This preserves the order of the first occurrence of each key, as well as the order of items for each key. It requires the key to be hashable, but does not otherwise assign meaning to it.
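On Python 3.7 and later, where plain dicts preserve insertion order, the same idea can be sketched more compactly. The helper name group_by_key is hypothetical:

```python
def group_by_key(pairs):
    """Group values by key, keeping keys in first-seen order (Python 3.7+ dicts)."""
    groups = {}
    for value, key in pairs:
        # setdefault creates the list the first time a key is seen
        groups.setdefault(key, []).append(value)
    return list(groups.values())

mylist = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
print(group_by_key(mylist))  # [['A', 'C'], ['B'], ['D', 'E']]
```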
max_key = max(key for (item, key) in list)
newlist = [[] for i in range(max_key+1)]
for item,key in list:
    newlist[key].append(item)
You can do it in a single list comprehension, perhaps more elegant but O(n**2):
[[item for (item,key) in list if key==i] for i in range(max(key for (item,key) in list)+1)]
>>> from functools import reduce
>>> xs = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> xs.sort(key=lambda x: x[1])
>>> reduce(lambda l, x: (l.append([x]) if l[-1][0][1] != x[1] else l[-1].append(x)) or l, xs[1:], [[xs[0]]]) if xs else []
[[['A', 0], ['C', 0]], [['B', 1]], [['D', 2], ['E', 2]]]
Basically, if the list is sorted, it is possible to reduce by looking at the last group constructed by the previous steps - you can tell if you need to start a new group, or modify an existing group. The ... or l bit is a trick that enables us to use lambda in Python. (append returns None. It is always better to return something more useful than None, but, alas, such is Python.)
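The same reduction can be written as an explicit loop, which may make the "start a new group or extend the last one" decision easier to follow. This is only a sketch with a hypothetical function name; unlike the snippet above, it sorts a copy instead of sorting the input in place.

```python
def group_sorted(xs):
    """Group [value, key] items by key after a stable sort on the key."""
    groups = []
    for x in sorted(xs, key=lambda item: item[1]):
        if groups and groups[-1][0][1] == x[1]:
            groups[-1].append(x)   # same key as the last group: extend it
        else:
            groups.append([x])     # new key: start a new group
    return groups

xs = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
print(group_sorted(xs))
# [[['A', 0], ['C', 0]], [['B', 1]], [['D', 2], ['E', 2]]]
```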
If you use the convtools library, which provides many data-processing primitives and generates ad hoc code under the hood, then:
from convtools import conversion as c
my_list = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
# store the converter somewhere because this is where code generation
# takes place
converter = (
    c.group_by(c.item(1))
    .aggregate(c.ReduceFuncs.Array(c.item(0)))
    .gen_converter()
)
assert converter(my_list) == [["A", "C"], ["B"], ["D", "E"]]
An answer inspired by @Howard's answer.
from operator import itemgetter
from typing import Any, Iterable, List, Tuple
def group_by(nested_iterables: Iterable[Iterable], key_index: int) \
        -> List[Tuple[Any, Iterable[Any]]]:
    """ Groups elements nested in <nested_iterables> based on their <key_index>_th element.
    Behaves similarly to itertools.groupby when the input to the itertools function is sorted.
    E.g. If <nested_iterables> = [(1, 2), (2, 3), (5, 2), (9, 3)] and
    <key_index> = 1, we will return [(2, [(1, 2), (5, 2)]), (3, [(2, 3), (9,3)])].
    Returns:
    A list of (group_key, values) tuples where <values> is a list of the iterables in
    <nested_iterables> that all have their <key_index>_th element equal to <group_key>.
    """
    group_keys = set(map(itemgetter(key_index), nested_iterables))
    return [(key, list(filter(lambda x: x[key_index] == key, nested_iterables)))
            for key in group_keys]
