Pythonic way to iterate over a shifted list of tuples - python

Given a list L = [('a',3),('b',4),('c',14),('d',10)],
the desired output is the first item from a tuple and the second item from the next tuple, e.g.:
a 4
b 14
c 10
A straightforward but unpythonic way would be:
for i in range(len(L)-1):
    print(L[i][0], L[i+1][1])
Alternatively, this is what I've come up with:
for (a0,a1),(b0,b1) in zip(L,L[1:]):
    print(a0,b1)
but it seems to be wasteful. Is there a standard way to do this?

I personally think both options are just fine. It is also possible to extract the items and join them:
from operator import itemgetter
pairs = list(zip(map(itemgetter(0), L), map(itemgetter(1), L[1:])))
# [('a', 4), ('b', 14), ('c', 10)]

A pythonic way is to use a generator expression.
You could write it like this:
for newTuple in ((L[i][0], L[i+1][1]) for i in range(len(L)-1)):
    print(newTuple)
It looks like a list comprehension, but a generator expression does not build the full list; it yields one tuple at a time, so it takes no additional memory for a full copy of the list.

To improve your zip example (which is already good), you could use itertools.islice to avoid creating a sliced copy of the initial list. In Python 3, the code below only generates values; no temporary list is created in the process.
import itertools
L = [('a',3),('b',4),('c',14),('d',10)]
for (a0,_),(_,b1) in zip(L,itertools.islice(L,1,None)):
    print(a0,b1)

I'd split the first and second items with the help of two generator expressions, use islice to drop one item, and zip the two streams together again.
from itertools import islice
first_items = (a for a, _ in L)
second_items = (b for _, b in L)
result = zip(first_items, islice(second_items, 1, None))
print(list(result))
# [('a', 4), ('b', 14), ('c', 10)]
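If the input might be an arbitrary iterable rather than a list, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name shifted_pairs is made up for illustration, and it assumes itertools.tee is acceptable for getting two independent iterators over one input.

```python
from itertools import islice, tee

def shifted_pairs(pairs):
    """Yield (first element of item i, second element of item i+1)."""
    current, following = tee(pairs)          # two independent iterators over the input
    following = islice(following, 1, None)   # skip the first item of the second stream
    for (a0, _), (_, b1) in zip(current, following):
        yield a0, b1

L = [('a', 3), ('b', 4), ('c', 14), ('d', 10)]
result = list(shifted_pairs(L))
print(result)  # [('a', 4), ('b', 14), ('c', 10)]
```

Because it never slices or indexes, the helper also accepts one-shot iterators such as iter(L).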

Related

Most efficient way to count duplicate tuples in python list

I have a list with more than 100 million tuples, with key-value elements like this:
list_a = [(1,'a'), (2,'b'), (1,'a'), (3,'b'), (3,'b'), (1,'a')]
I need to output a second list like this:
list_b = [(1,'a', 3), (2, 'b', 1), (3, 'b', 2) ]
The last element of each tuple is the number of times that tuple occurs in list_a. The order of list_b doesn't matter.
Then, I wrote this code:
import collections
list_b = []
for e, c in collections.Counter(list_a).most_common():
    list_b.append("{}, {}, {}".format(e[0], e[1], c))
Running it with 1000 tuples takes approximately 2 seconds... imagine how long it will take with more than 100 million. Any idea how to speed it up?
Your bottleneck is likely the explicit loop calling list.append: it runs as interpreted Python bytecode rather than in optimized C code, so it performs more slowly.
You can use a list comprehension instead, which should be faster:
from collections import Counter
c = Counter(list_a)
result = [(*k, v) for k, v in c.items()]
I ran this on a 1000-item list on my machine and it was pretty quick.
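Applied to the sample data from the question, the approach can be sketched as follows. Note that it produces tuples, as requested for list_b, rather than the formatted strings built in the question's loop; no timing is claimed here.

```python
from collections import Counter

list_a = [(1, 'a'), (2, 'b'), (1, 'a'), (3, 'b'), (3, 'b'), (1, 'a')]

counts = Counter(list_a)                            # one pass over the data
list_b = [(*key, n) for key, n in counts.items()]   # append the count to each tuple
print(list_b)  # [(1, 'a', 3), (2, 'b', 1), (3, 'b', 2)]
```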

Pythonic way to generate a pseudo ordered pair list from an iterable (e.g. a list or set)

Given an iterable consisting of a finite set of elements:
(a, b, c, d)
as an example
What would be a Pythonic way to generate the following (pseudo) ordered pair from the above iterable:
ab
ac
ad
bc
bd
cd
A trivial way would be to use for loops, but I'm wondering if there is a pythonic way of generating this list from the iterable above?
Try using combinations.
import itertools
combinations = itertools.combinations('abcd', n)
This gives you an iterator over all combinations of length n; you can loop over it or convert it into a list with list(combinations).
To include only pairs, as in your example, pass 2 as the second argument:
combinations = itertools.combinations('abcd', 2)
>>> print list(combinations)
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
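To get the two-letter strings shown in the question rather than tuples, each pair can be joined. A short sketch in Python 3, assuming the elements are strings:

```python
from itertools import combinations

items = ('a', 'b', 'c', 'd')
pairs = [''.join(pair) for pair in combinations(items, 2)]
print(pairs)  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
```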
You can accomplish this in a list comprehension.
data = [1, 2, 3, 4]
output = [(data[n], other) for n in range(len(data)) for other in data[n+1:]]
print(repr(output))
Pulling the comprehension apart: for each index n, it makes a tuple of data[n] and each object in a list composed of all the members that come after it. data[n+1:] tells Python to take all members after the nth.
Use list comprehensions - they seem to be considered pythonic.
import operator
stuff = ('a','b','c','d')
Obtain the n-digit binary numbers, in string format, in which exactly two of the digits are one, where n is the number of items.
n = len(stuff)
s = '{{:0{}b}}'.format(n)
z = [s.format(x) for x in range(2**n) if s.format(x).count('1') == 2]
Find the indices of the ones for each combination.
y = [[i for i, c in enumerate(combo) if c == '1'] for combo in z]
Use the indices to select the items, then sort.
x = [''.join(operator.itemgetter(*indices)(stuff)) for indices in y]
x.sort()

Python - Add a tuple to an existing list of tuples in a specific position

Suppose I have a list of tuples as follows:
listA = [ (B,2), (C,3), (D,4) ]
I would like to add another tuple (E,1) to this list. How can I do this?
And more specifically, I would like to add this tuple as the 1st tuple in the list so that I get:
newList = [ (E,1), (B,2), (C,3), (D,4) ]
I am using Python 2.7.
Thanks in advance!
If you are going to be appending to the beginning, a collections.deque would be a more efficient structure:
from collections import deque
deq = deque([("B",2), ("C",3), ("D",4) ])
deq.appendleft(("E",1))
print(deq)
deque([('E', 1), ('B', 2), ('C', 3), ('D', 4)])
Appending to the start of the deque is O(1).
If you actually wanted a new list and to keep the old one, you can simply do:
newList = [("E",1)] + listA
listA.insert(index, item)
For you:
listA.insert(0, ('E', 1))
If you want it in your new variable, assign it AFTER inserting (thanks TigerHawk)
newList = listA
An important thing to remember as well, as Padraic Cunningham pointed out: after this assignment both names reference the same list object. If you change listA, you'll change newList. You can make a new, independent list object with either of the following:
newList = listA[:]
newList = list(listA)
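The difference between sharing a list and copying it can be seen in a short sketch. The variable names alias and shallow are made up for illustration, and the string keys are assumed as in the deque example:

```python
listA = [("B", 2), ("C", 3), ("D", 4)]

alias = listA            # same list object, no copy is made
shallow = list(listA)    # new list object holding the same tuples

listA.insert(0, ("E", 1))

print(alias is listA)    # True
print(alias[0])          # ('E', 1)  (the insert is visible through the alias)
print(shallow[0])        # ('B', 2)  (the copy is unaffected)
```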
To add to the beginning of the list, just do
listA.insert(0, ('E', 1))
insert adds an item at a specific index of a list.
Read the docs: https://docs.python.org/2/tutorial/datastructures.html

Getting all keys in a dict that overlap with other keys in the same dict

I have a list comprehension that looks like this:
cart = [ ((p,pp),(q,qq)) for ((p,pp),(q,qq))\
         in itertools.product(C.items(), repeat=2)\
         if p[1:] == q[:-1] ]
C is a dict whose keys are tuples of arbitrary integers. All the tuples have the same length. In the worst case, all the combinations should be included in the new list. This can happen quite frequently.
As an example, I have a dictionary like this:
C = { (0,1):'b',
      (2,0):'c',
      (0,0):'d' }
And I want the result to be:
cart = [ (((2, 0), 'c'), ((0, 1), 'b')),
         (((2, 0), 'c'), ((0, 0), 'd')),
         (((0, 0), 'd'), ((0, 1), 'b')),
         (((0, 0), 'd'), ((0, 0), 'd')) ]
So, by overlap I am referring to, for instance, that the tuples (1,2,3,4) and (2,3,4,5) have the overlapping section (2,3,4). The overlapping sections must be on the "edges" of the tuples. I only want overlaps that have length one shorter than the tuple length. Thus (1,2,3,4) does not overlap with (3,4,5,6). Also note that when removing the first or last element of a tuple we might end up with non-distinct tuples, all of which must be compared to all the other elements. This last point was not emphasized in my first example.
The better part of my code's execution time is spent in this list comprehension. I always need all elements of cart, so there appears to be no speedup from using a generator instead.
My question is: Is there a faster way of doing this?
A thought I had was that I could try to create two new dictionaries like this:
aa = defaultdict(list)
bb = defaultdict(list)
[aa[p[1:]].append(p) for p in C.keys()]
[bb[p[:-1]].append(p) for p in C.keys()]
And somehow merge all combinations of elements of the list in aa[i] with the list in bb[i] for all i, but I can not seem to wrap my head around this idea either.
Update
Both the solution added by tobias_k and the one by shx2 have better complexity than my original code (as far as I can tell). My code is O(n^2) whereas the two other solutions are O(n). For my problem size and composition, however, all three solutions seem to run in more or less the same time. I suppose this has to do with a combination of overhead associated with function calls, as well as the nature of the data I am working with. In particular the number of different keys, as well as the actual composition of the keys, seem to have a large impact. The latter I know because the code runs much slower for completely random keys. I have accepted tobias_k's answer because his code is the easiest to follow. However, I would still greatly welcome other suggestions on how to perform this task.
You were actually on the right track, using the dictionaries to store all the prefixes to the keys. However, keep in mind that (as far as I understand the question) two keys can also overlap if the overlap is less than len-1, e.g. the keys (1,2,3,4) and (3,4,5,6) would overlap, too. Thus we have to create a map holding all the prefixes of the keys. (If I am mistaken about this, just drop the two inner for loops.) Once we have this map, we can iterate over all the keys a second time, and check whether for any of their suffixes there are matching keys in the prefixes map. (Update: Since keys can overlap w.r.t. more than one prefix/suffix, we store the overlapping pairs in a set.)
from collections import defaultdict
def get_overlap(keys):
    # create map: prefix -> set(keys with that prefix)
    prefixes = defaultdict(set)
    for key in keys:
        for prefix in [key[:i] for i in range(len(key))]:
            prefixes[prefix].add(key)
    # get keys with matching prefixes for all suffixes
    overlap = set()
    for key in keys:
        for suffix in [key[i:] for i in range(len(key))]:
            overlap.update([(key, other) for other in prefixes[suffix]
                            if other != key])
    return overlap
(Note that, for simplicity, I only care about the keys in the dictionary, not the values. Extending this to return the values, too, or doing this as a postprocessing step, should be trivial.)
Overall running time should be only 2*n*k, n being the number of keys and k the length of the keys. Space complexity (the size of the prefixes map) should be between n*k and n^2*k, if there are very many keys with the same prefixes.
Note: The above answer is for the more general case that the overlapping region can have any length. For the simpler case that you consider only overlaps one shorter than the original tuple, the following should suffice and yield the results described in your examples:
def get_overlap_simple(keys):
    prefixes = defaultdict(list)
    for key in keys:
        prefixes[key[:-1]].append(key)
    return [(key, other) for key in keys for other in prefixes[key[1:]]]
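To make the difference between the two variants concrete, here is a small sketch that restates both functions in Python 3 and runs them on three made-up keys (they are not from the question). The only deliberate change is using prefixes.get(...) for lookups, an assumption made here so that the defaultdict is not filled with empty entries as a side effect.

```python
from collections import defaultdict

def get_overlap(keys):
    """General case: any proper suffix of one key equals a prefix of another."""
    prefixes = defaultdict(set)
    for key in keys:
        for i in range(len(key)):
            prefixes[key[:i]].add(key)
    overlap = set()
    for key in keys:
        for i in range(len(key)):
            for other in prefixes.get(key[i:], ()):
                if other != key:
                    overlap.add((key, other))
    return overlap

def get_overlap_simple(keys):
    """Restricted case: the overlap is exactly one element shorter than the key."""
    prefixes = defaultdict(list)
    for key in keys:
        prefixes[key[:-1]].append(key)
    return [(key, other) for key in keys for other in prefixes.get(key[1:], ())]

keys = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6)]  # hypothetical example keys
general = get_overlap(keys)
simple = get_overlap_simple(keys)
print(sorted(general))  # includes ((1, 2, 3, 4), (3, 4, 5, 6)), which overlap on (3, 4)
print(simple)           # only the pairs that overlap on three elements
```

Note that the general version skips pairs of a key with itself, while the simple version keeps them, which is what the question's example output with ((0, 0), 'd') requires.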
Your idea of preprocessing the data into a dict was a good one. Here goes:
from itertools import groupby
C = { (0,1): 'b', (2,0): 'c', (0,0): 'd' }
def my_groupby(seq, key):
    """
    >>> my_groupby(range(10), lambda x: 'mod=%d' % (x % 3))
    {'mod=2': [2, 5, 8], 'mod=0': [0, 3, 6, 9], 'mod=1': [1, 4, 7]}
    """
    groups = dict()
    for x in seq:
        y = key(x)
        groups.setdefault(y, []).append(x)
    return groups
def get_overlapping_items(C):
    prefixes = my_groupby(C.iteritems(), key = lambda (k,v): k[:-1])
    for k1, v1 in C.iteritems():
        prefix = k1[1:]
        for k2, v2 in prefixes.get(prefix, []):
            yield (k1, v1), (k2, v2)
for x in get_overlapping_items(C):
    print x
(((2, 0), 'c'), ((0, 1), 'b'))
(((2, 0), 'c'), ((0, 0), 'd'))
(((0, 0), 'd'), ((0, 1), 'b'))
(((0, 0), 'd'), ((0, 0), 'd'))
And by the way, instead of:
itertools.product(*[C.items()]*2)
do:
itertools.product(C.items(), repeat=2)
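The two spellings produce the same pairs, which a quick check on the example dictionary illustrates (a sketch in Python 3; repeat=2 simply reads more clearly):

```python
from itertools import product

C = {(0, 1): 'b', (2, 0): 'c', (0, 0): 'd'}

unpacked = list(product(*[C.items()] * 2))     # same iterable passed twice
repeated = list(product(C.items(), repeat=2))  # let product repeat it

print(unpacked == repeated)  # True
print(len(repeated))         # 9
```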

Is it possible to take an ordered "slice" of a dictionary in Python based on a list of keys?

Suppose I have the following dictionary and list:
my_dictionary = {1:"hello", 2:"goodbye", 3:"World", "sand":"box"}
my_list = [1,2,3]
Is there a direct (Pythonic) way to get the key-value pairs out of the dictionary for which the keys are elements in the list, in an order defined by the list order?
The naive approach is simply to iterate over the list and pull out the values in the map one by one, but I wonder if python has the equivalent of list slicing for dictionaries.
I don't know if this is pythonic enough, but it works:
res = [(x, my_dictionary[x]) for x in my_list]
This is a list comprehension, but if you need to iterate over that list only once, you can also turn it into a generator expression, e.g.:
for el in ((x, my_dictionary[x]) for x in my_list):
    print el
Of course the previous methods work only if all elements in the list are present in the dictionary; to account for the key-not-present case you can do this:
res = [(x, my_dictionary[x]) for x in my_list if x in my_dictionary]
>>> zip(my_list, operator.itemgetter(*my_list)(my_dictionary))
[(1, 'hello'), (2, 'goodbye'), (3, 'World')]
How about this? Take every item in my_list and pass it to the dictionary's get method. It also handles missing keys by returning None for them instead of raising a KeyError.
map(my_dictionary.get, my_list)
If you want tuples, zip it:
zip(my_list, map(my_dictionary.get, my_list))
If you want a new dict, pass the tuples to dict.
dict(zip(my_list, map(my_dictionary.get, my_list)))
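The handling of missing keys can be seen in a small sketch. It is written for Python 3, where zip and map return iterators and need list(...) to be displayed, and the key 99 is a made-up example of a key that is not in the dictionary.

```python
my_dictionary = {1: "hello", 2: "goodbye", 3: "World", "sand": "box"}
my_list = [1, 2, 99]  # 99 is a hypothetical key that is missing from the dictionary

# dict.get fills in None for the missing key
pairs = list(zip(my_list, map(my_dictionary.get, my_list)))
print(pairs)    # [(1, 'hello'), (2, 'goodbye'), (99, None)]

# filtering with "in" skips the missing key instead
present = [(k, my_dictionary[k]) for k in my_list if k in my_dictionary]
print(present)  # [(1, 'hello'), (2, 'goodbye')]
```

Which behavior is preferable depends on whether None can also be a legitimate value in the dictionary.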
A straightforward way would be to go through each item in the dictionary and check whether its key is present in the list:
>>> [e for e in my_dictionary.items() if e[0] in my_list]
[(1, 'hello'), (2, 'goodbye'), (3, 'World')]
The membership test above is linear in the length of the list, so you might gain some performance by converting the list to a set once, up front:
>>> wanted = set(my_list)
>>> [e for e in my_dictionary.items() if e[0] in wanted]
[(1, 'hello'), (2, 'goodbye'), (3, 'World')]
And finally, if you need a dictionary instead of a list of (key, value) tuples, you can pass the same pairs to dict:
>>> dict(e for e in my_dictionary.items() if e[0] in wanted)
{1: 'hello', 2: 'goodbye', 3: 'World'}
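One caveat worth checking against the original question: filtering my_dictionary.items() returns pairs in the dictionary's order, not in the order of my_list. The sketch below uses a made-up ordering of my_list and assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
my_dictionary = {1: "hello", 2: "goodbye", 3: "World", "sand": "box"}
my_list = [3, 1, 2]  # hypothetical ordering, different from the dictionary's

wanted = set(my_list)
by_dict_order = [item for item in my_dictionary.items() if item[0] in wanted]
by_list_order = [(k, my_dictionary[k]) for k in my_list if k in my_dictionary]

print(by_dict_order)  # [(1, 'hello'), (2, 'goodbye'), (3, 'World')]
print(by_list_order)  # [(3, 'World'), (1, 'hello'), (2, 'goodbye')]
```

If the order defined by the list matters, iterating over the list and looking each key up in the dictionary is the variant that preserves it.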
>>>
