Grouping every three items together in list - Python [duplicate] - python

This question already has answers here:
How can you split a list every x elements and add those x amount of elements to an new list?
(4 answers)
Closed 5 years ago.
I have a list consisting of a repeating pattern, i.e.
list=['a','1','first','b','2','second','c','3','third','4','d','fourth']
I am not sure how long this list will be; it could be fairly long. I want to create one list per repetition of the pattern, each with a generated name, i.e.
list_1=['a','1','first']
list_2=['b','2','second']
list_3=['c','3','third']
..... etc
What is the best, basic code (not requiring import of modules) that I can use to achieve this?

You can get the chunks using zip():
>>> lst = ['a','1','first','b','2','second','c','3','third','4','d','fourth']
>>> list(zip(*[iter(lst)]*3))
[('a', '1', 'first'), ('b', '2', 'second'), ('c', '3', 'third'), ('4', 'd', 'fourth')]
Using zip() avoids creating intermediate lists, which could be important if you have long lists.
zip(*[iter(lst)]*3) could be rewritten:
i = iter(lst) # Create an iterator over the list
zip(i, i, i) # zip the same iterator with itself 3 times, giving chunks of 3 from the original list
But the former, while a little more cryptic, is more general.
If you need names for these lists then I would suggest using a dictionary:
>>> d = {'list_{}'.format(i): e for i, e in enumerate(zip(*[iter(lst)]*3), 1)}
>>> d
{'list_1': ('a', '1', 'first'),
'list_2': ('b', '2', 'second'),
'list_3': ('c', '3', 'third'),
'list_4': ('4', 'd', 'fourth')}
>>> d['list_2']
('b', '2', 'second')

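Try this, where data is your list:
chunks = [data[x:x+3] for x in range(0, len(data), 3)]  # xrange in Python 2
It will make sublists of 3 items each; the last sublist is shorter if len(data) is not a multiple of 3.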
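The two approaches above behave differently when the list length is not a multiple of the chunk size. The following is a minimal sketch, assuming Python 3; the helper names chunk_by_slicing and chunk_by_zip are hypothetical and do not come from either answer.

```python
def chunk_by_slicing(items, size):
    """Slice-based chunks; the final chunk may be shorter than size."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunk_by_zip(items, size):
    """zip-based chunks; an incomplete final chunk is silently dropped."""
    return list(zip(*[iter(items)] * size))


# Seven items, so the last chunk is incomplete.
data = ['a', '1', 'first', 'b', '2', 'second', 'c']
sliced = chunk_by_slicing(data, 3)
zipped = chunk_by_zip(data, 3)
print(sliced)  # [['a', '1', 'first'], ['b', '2', 'second'], ['c']]
print(zipped)  # [('a', '1', 'first'), ('b', '2', 'second')]
```

Which behaviour is preferable depends on whether a trailing partial group is meaningful in your data.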

Related

python: Insert elements into tuples inside of list of tuples

I have two lists. One list of tuples containing two elements, another containing only strings.
What is the best way to combine these in this way:
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
expected_result = [('1','2','one'), ('3','4','two')]
I am stuck on something like:
result = [elt+(list2[0],) for elt in list1]
However, I'm not sure if it's possible to iterate over 2 lists inside one list comprehension at the same time. Having a bit of a brain fart here. Any help would be appreciated.
Bonus points if it fits on one line (list comprehension style)!
You can zip() the two lists together. This will give you elements like (('1', '2'), 'one'). This assumes the lists are the same length. Then, in a list comprehension, make a new tuple from each pair, either by concatenating them or by unpacking the first into a new tuple as below:
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
[(*t, word) for t, word in zip(list1, list2)]
# [('1', '2', 'one'), ('3', '4', 'two')]
alternatively:
[t + (word,) for t, word in zip(list1, list2)]
List comprehension style solution (one liner)
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
new_lst = [ list1[i]+(list2[i],) for i in range(len(list1))]
print(new_lst)
>> [('1', '2', 'one'), ('3', '4', 'two')]
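Both answers assume the two lists have the same length; zip() would otherwise stop quietly at the shorter one. As a sketch under that assumption, a small wrapper can make the requirement explicit. The function name append_each is hypothetical and is not part of either answer.

```python
def append_each(tuples, extras):
    """Return new tuples with the matching extra appended to each one."""
    if len(tuples) != len(extras):
        # zip() would silently truncate, so fail loudly instead.
        raise ValueError("both lists must have the same length")
    return [t + (extra,) for t, extra in zip(tuples, extras)]


list1 = [('1', '2'), ('3', '4')]
list2 = ['one', 'two']
result = append_each(list1, list2)
print(result)  # [('1', '2', 'one'), ('3', '4', 'two')]
```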

Cleanest way to iterate over pair of iterables of different lengths, wrapping the shorter iterable? [duplicate]

This question already has answers here:
How to zip two differently sized lists, repeating the shorter list?
(15 answers)
Closed 5 years ago.
If I have two iterables of different lengths, how can I most cleanly pair them, re-using values from the shorter one until all values from the longer are consumed?
For example, given two lists
l1 = ['a', 'b', 'c']
l2 = ['x', 'y']
It would be desirable to have a function fn() resulting in pairs:
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
I found I could write a function to perform this as such
def fn(l1, l2):
    if len(l1) > len(l2):
        return [(v, l2[i % len(l2)]) for i, v in enumerate(l1)]
    return [(l1[i % len(l1)], v) for i, v in enumerate(l2)]
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
>>> l2 = ['x', 'y', 'z', 'w']
>>> fn(l1,l2)
[('a', 'x'), ('b', 'y'), ('c', 'z'), ('a', 'w')]
However, I'm greedy and curious what other methods exist, so that I may select the most obvious and elegant one and be wary of the others.
itertools.zip_longest as suggested in many similar questions is very close to my desired use case as it has a fillvalue argument which will pad the longer pairs. However, this only takes a single value, instead of wrapping back to the first value in the shorter list.
As a note: in my use case one list will always be much shorter than the other and this may allow a short-cut, but a generic solution would be exciting too!
You may use itertools.cycle() with zip to get the desired behavior.
As the itertools.cycle() documentation says:
Make an iterator returning elements from the iterable and saving a copy of each. When the iterable is exhausted, return elements from the saved copy.
For example:
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['x', 'y']
>>> from itertools import cycle
>>> list(zip(l1, cycle(l2)))  # list() is needed in Python 3, where zip() is lazy
[('a', 'x'), ('b', 'y'), ('c', 'x')]
Since in your case either l1 or l2 could be the shorter list, your generic fn() should look like:
from itertools import cycle
def fn(l1, l2):
    return list(zip(l1, cycle(l2))) if len(l1) > len(l2) else list(zip(cycle(l1), l2))
Sample Run:
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['x', 'y']
# when second parameter is shorter
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
# when first parameter is shorter
>>> fn(l2, l1)
[('x', 'a'), ('y', 'b'), ('x', 'c')]
If you're not sure which one is shorter, cycle both lists and take as many pairs as the longer list has items:
import itertools
def fn(l1, l2):
    pairs = zip(itertools.cycle(l1), itertools.cycle(l2))
    return (next(pairs) for _ in range(max(len(l1), len(l2))))
>>> list(fn(l1, l2))
[('a', 'x'), ('b', 'y'), ('c', 'x')]
itertools.cycle repeats each list indefinitely, and zipping the two cycles gives the pairing you want, also repeated indefinitely, so it has to be trimmed to the right size. max(len(l1), len(l2)) finds the length of the longer list, and next() is called on the zipped iterator that many times. The zip must be created once, outside the generator expression; building a new zip on every step would yield the first pair each time. Note that this returns a generator, so wrap the call in list() to get the output shown.
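The same idea can be written with itertools.islice doing the trimming. This is only a sketch, assuming both inputs are sequences that support len(); the name zip_cycle is hypothetical.

```python
from itertools import cycle, islice


def zip_cycle(first, second):
    """Pair items, wrapping whichever sequence is shorter."""
    longest = max(len(first), len(second))
    # Both sides are cycled, so it does not matter which one is shorter;
    # islice stops after `longest` pairs.
    return list(islice(zip(cycle(first), cycle(second)), longest))


pairs_a = zip_cycle(['a', 'b', 'c'], ['x', 'y'])
pairs_b = zip_cycle(['a', 'b', 'c'], ['x', 'y', 'z', 'w'])
print(pairs_a)  # [('a', 'x'), ('b', 'y'), ('c', 'x')]
print(pairs_b)  # [('a', 'x'), ('b', 'y'), ('c', 'z'), ('a', 'w')]
```

Because neither argument is treated specially, no length comparison between the two inputs is needed.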

Comparing elements between two lists of lists in python

List A:
[('Harry', 'X', 'A'),
('James', 'Y', 'G'),
('John', 'Z', 'D')]
List B:
[('Helen', '2', '(A; B)', '3'),
('Victor', '9', '(C; D; E)', '4'),
('Alan', '10', '(A)', '57'),
('Paul', '11', '(F; B)', '43'),
('Sandra', '12', '(F)', '31')]
Basically I have to compare the third element from list A (for x in listA -> x[2]) and check whether any list in list B has the same element (for y in listB, x[2] == y[2]), but I'm just losing my mind with this.
My idea was to get the third element from each list in list B, put them into a new list, and then remove that ";" so I could access each element way more easily.
for x in listB:
    j = x[2]
    j = j.strip().split(', ')
    for k in j:
        FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
Then I'd take the third element from each list of list A and compare it with the elements inside each list of FinalB: if there was a match, I'd get the index of the matching element in FinalB, use that index to access the corresponding list in listB and get its first element (basically, I have to know the names of the users in each list that have the same 3rd element).
My code so far:
FinalB= []
DomainsList = []
for x in listA:
    j = x[2]
    j = j.strip().split(', ')
    for k in j:
        FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
for y in listA:
    for z in FinalB:
        for k in z:
            if y[2] == k:
                m = FinalB.index(z)
                DomainsList.append([listA[m][0],listB[m][0]])
return DomainsList
Yes, this is not working (no error, I probably just did this in an absolute wrong way) and I can't figure out what and where I'm doing wrong.
First, I think a better way to handle '(C; D; E)' is to change it to 'CDE', so the first loop becomes:
FinalB = [''.join(filter(str.isalpha, x[2])) for x in listB]  # join() is needed in Python 3, where filter() returns an iterator
We take each string and keep only the alpha characters, so we end up with:
In [18]: FinalB
Out[18]: ['AB', 'CDE', 'A', 'FB', 'F']
This means we can use listA[x][2] in FinalB[y] to test if we have a match:
for y in listA:
    for z in FinalB:
        if y[2] in z:
            DomainsList.append([y[0], listB[FinalB.index(z)][0]])
I had to tweak the arguments to the append() to pick the right elements, so we end up with:
In [17]: DomainsList
Out[17]: [['Harry', 'Helen'], ['Harry', 'Alan'], ['John', 'Victor']]
Usefully, if instead of '(C; D; E)' you have '(foo; bar; baz)', then with just one tweak the code can work for that too:
import re
FinalB = [list(filter(None, re.split(r"[; ()]+", x[2]))) for x in listB]
The remaining code works as before.
It will always help to start a question with context and details.
The python version could also come into play.
The data structure you have given us to work with is questionable, especially the third element in each of the tuples in listB: why store several values in a single string like '(C; D; E)'?
Even though I don't understand what this is meant to achieve (no context was provided in the post), this code should get you there.
It will give you a list of tuples (listC), with each tuple having two elements: the first is the name from listA and the second is the name from listB where they have a match as described in the post.
NOTE: at the moment the match is simply done with find(), which works with the data provided. However, you may need to change this if your data could cause false positives (find() matches substrings) or if you want to ignore case.
listA = [('Harry', 'X', 'A'), ('James', 'Y', 'G'), ('John', 'Z', 'D')]
listB = [('Helen', '2', '(A; B)', '3'),
('Victor', '9', '(C; D; E)', '4'),
('Alan', '10', '(A)', '57'),
('Paul', '11', '(F; B)', '43'),
('Sandra', '12', '(F)', '31')]
listC = []
for a in listA:
    for b in listB:
        if b[2].find(a[2]) != -1:
            listC.append((a[0], b[0]))
print(listC)
This gives you:
[('Harry', 'Helen'), ('Harry', 'Alan'), ('John', 'Victor')]
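Both answers match by substring, which the NOTE above flags as a possible source of false positives. One way to avoid that, sketched here under the assumption that the third field in listB is always a parenthesised, semicolon-separated list, is to parse it into a set and test exact membership. The helper names parse_codes and match_names are hypothetical.

```python
def parse_codes(field):
    """Turn a string such as '(C; D; E)' into the set {'C', 'D', 'E'}."""
    inner = field.strip().strip('()')
    return {code.strip() for code in inner.split(';') if code.strip()}


def match_names(list_a, list_b):
    """Pair each name in list_a with every list_b name sharing its code."""
    matches = []
    for name_a, _, code in list_a:
        for row in list_b:
            # Exact set membership, so 'A' does not match a code like 'AB'.
            if code in parse_codes(row[2]):
                matches.append((name_a, row[0]))
    return matches


listA = [('Harry', 'X', 'A'), ('James', 'Y', 'G'), ('John', 'Z', 'D')]
listB = [('Helen', '2', '(A; B)', '3'),
         ('Victor', '9', '(C; D; E)', '4'),
         ('Alan', '10', '(A)', '57'),
         ('Paul', '11', '(F; B)', '43'),
         ('Sandra', '12', '(F)', '31')]
matches = match_names(listA, listB)
print(matches)  # [('Harry', 'Helen'), ('Harry', 'Alan'), ('John', 'Victor')]
```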

Python dictionary key assign

I've created a dictionary from a tuple, but can't seem to find an answer as to how I'd switch my keys and values without editing the original tuple. This is what I have so far:
tuples = [('a', '1'), ('b', '1'), ('c', '2'), ('d', '3')]
dic = dict(tuples)
print dic
This gives the output:
{'a': '1', 'b': '1', 'c': '2', 'd': '3'}
But I'm looking for:
{'1': 'a' 'b', '2': 'c', '3': 'd'}
Is there a simple code that could produce this?
Build a dictionary in a loop, collecting your values into lists:
result = {}
for value, key in tuples:
    result.setdefault(key, []).append(value)
The dict.setdefault() method will set and return a default value if the key isn't present. Here I used it to set a default empty list value if the key is not present, so the .append(value) is applied to a list object, always.
Don't try to make this a mix of single-string values and lists of strings; you'll only complicate matters down the road.
Demo:
>>> tuples = [('a', '1'), ('b', '1'), ('c', '2'), ('d', '3')]
>>> result = {}
>>> for value, key in tuples:
...     result.setdefault(key, []).append(value)
...
>>> result
{'1': ['a', 'b'], '3': ['d'], '2': ['c']}
from operator import itemgetter
from itertools import groupby
first = itemgetter(0)
second = itemgetter(1)
d = dict((x, [v for v, _ in y]) for x, y in groupby(sorted(tuples, key=second), key=second))
groupby groups the tuples into a new iterator of pairs, whose first element is the unique second item of each of the originals, and whose second element is another iterator over the original tuples that share it. A quick example (pretty-printed for clarity):
>>> list(groupby(sorted(tuples, key=second), key=second))
[('1', <itertools._grouper object at 0x10910b8d0>),
('2', <itertools._grouper object at 0x10910b790>),
('3', <itertools._grouper object at 0x10910b750>)]
The sorting by the same key used by groupby is necessary to ensure all like items are grouped together; groupby only makes one pass through the list.
The subiterators consist of the original tuples, such as ('a', '1'), so the first value in each item is the one we want to add to the value in our new dictionary.
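For comparison, the same grouping can be written with collections.defaultdict, which needs no sorting. This is a sketch rather than part of either answer, and it assumes Python 3.7+ for the key order shown in the comment.

```python
from collections import defaultdict

tuples = [('a', '1'), ('b', '1'), ('c', '2'), ('d', '3')]

grouped = defaultdict(list)
for value, key in tuples:
    # A missing key is created with an empty list automatically.
    grouped[key].append(value)

result = dict(grouped)  # convert back to a plain dict
print(result)  # {'1': ['a', 'b'], '2': ['c'], '3': ['d']}
```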

The last list elements and conditional loops

mylist="'a','b','c'"
count=0
i=0
while count< len(mylist):
    if mylist[i]==mylist[i+1]:
        print mylist[i]
    count +=1
    i +=1
Error:
File "<string>", line 6, in <module>
IndexError: string index out of range
I'm assuming that when it gets to the last (nth) element it can't find an n+1 to compare it to, so it gives me an error.
Interestingly, I think that I've done this before and not had this problem on a larger list. Here is an example (with credit to Raymond Hettinger for fixing it up):
list=['a','a','x','c','e','e','f','f','f']
i=0
count = 0
while count < len(list)-2:
    if list[i] == list[i+1]:
        if list [i+1] != list [i+2]:
            print list[i]
            i+=1
            count +=1
        else:
            print "no"
            count += 1
    else:
        i +=1
        count += 1
For crawling through a list in the way I've attempted, is there any fix so that I don't go "out of range"? I plan to implement this on a very large list, where I'll have to check if "list[i]==list[i+16]", for example. In the future, I would like to add on conditions like "if int(mylist[i+3])-int(mylist[i+7])>10: newerlist.append(mylist[i])". So it's important that I solve this problem.
I thought about inserting a break statement, but was unsuccessful.
I know this is not the most efficient, but I'm at the point where it's what I understand best.
So it sounds like you are trying to compare elements in your list at various fixed offsets. Perhaps something like this could help you:
for old, new in zip(lst, lst[n:]):
    if some_cond(old, new):
        do_work()
Explanation:
lst[n:] returns a copy of lst, starting from the element at index n (mind the 0-indexing):
>>> lst = [1, 2, 2, 3]
>>> lst[1:]
[2, 2, 3]
zip(l1, l2) pairs up the elements of the two lists, one from each, as tuples (a list in Python 2, a lazy iterator in Python 3, hence the list() call here):
>>> list(zip(lst, lst[1:]))
[(1, 2), (2, 2), (2, 3)]
Note that it stops as soon as either list runs out. In this case, the offset list runs out first.
When iterating over tuples, you can unpack them directly into the loop variables, so
for old, new in zip(lst, lst[1:]):
loops through the elements you want (pairs of successive elements in your list).
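Applied to the sample data from the question, the zip-with-offset idea can be sketched as follows. The helper name pairs_at_offset is hypothetical, and the equality test stands in for whatever condition you actually need.

```python
def pairs_at_offset(items, n):
    """Pair each element with the one n places after it."""
    # zip() stops when the shorter (offset) slice runs out,
    # so no index can ever go out of range.
    return list(zip(items, items[n:]))


data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
repeats = [old for old, new in pairs_at_offset(data, 1) if old == new]
print(repeats)  # ['a', 'e', 'f', 'f']
```

Changing the second argument to 16 gives the list[i] == list[i+16] comparison mentioned in the question, with the last 16 elements simply having no partner.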
As a general idea, if you are trying to look ahead a certain number of places, you can do a few things:
In the loop check (i.e. count < length), account for the largest offset you use. In your example you wanted to look 16 places ahead, so you would check count < (length - 16). The downside is that the last 16 elements won't be iterated over.
Check inside the loop that the index is valid. That is, start each if statement with: if i + 16 < length and logic_you_want_to_check. This lets you continue through the whole loop, and when the lookahead would be out of bounds the condition is simply false instead of raising an error.
Note: this probably isn't what you want, but I'll add it for completeness. Wrap around. This only works if wrapping around makes sense for your data. If you literally want to check the 16th index ahead of your current index (like a place in a line, perhaps), then wrapping around doesn't suit. But if you don't need that, and want to treat your values as circular, you can take the index modulo the length. That is: if array[i] == array[(i + 16) % len(array)] would check either 16 ahead or wrap around to the front of the array.
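The three options above can be sketched side by side. This is an illustration only: the function names are hypothetical, an equality test stands in for the real condition, and an offset of 2 is used instead of 16 to keep the data small.

```python
def matches_stopping_early(items, offset):
    """Option 1: stop the loop `offset` places before the end."""
    return [i for i in range(len(items) - offset)
            if items[i] == items[i + offset]]


def matches_with_bounds_check(items, offset):
    """Option 2: visit every index, but guard the lookahead."""
    found = []
    for i in range(len(items)):
        if i + offset < len(items) and items[i] == items[i + offset]:
            found.append(i)
    return found


def matches_wrapping(items, offset):
    """Option 3: treat the list as circular using the modulo operator."""
    n = len(items)
    return [i for i in range(n) if items[i] == items[(i + offset) % n]]


data = ['a', 'b', 'a', 'b', 'c', 'a', 'a']
print(matches_stopping_early(data, 2))     # [0, 1]
print(matches_with_bounds_check(data, 2))  # [0, 1]
print(matches_wrapping(data, 2))           # [0, 1, 5]
```

The first two find the same matches; the second is useful when the loop body also does work that does not depend on the lookahead. Only the wrapping version reports index 5, because it compares the element there with the one at index 0.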
Edit:
Right, with the new information in the OP, this becomes much simpler. Use the itertools grouper() recipe to group the data for each person into tuples:
import itertools
def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks"""
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
data = ['John', 'Sally', '5', '10', '11', '4', 'John', 'Sally', '3', '7', '7', '10', 'Bill', 'Hallie', '4', '6', '2', '1']
list(grouper(data, 6))
Now your data looks like:
[
('John', 'Sally', '5', '10', '11', '4'),
('John', 'Sally', '3', '7', '7', '10'),
('Bill', 'Hallie', '4', '6', '2', '1')
]
Which should be easy to work with, by comparison.
Old Answer:
If you need to make more arbitrary links, rather than just checking continuous values:
def offset_iter(iterable, n):
    offset = iter(iterable)
    consume(offset, n)
    return offset
data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
offset_3 = offset_iter(data, 3)
for item, plus_3 in zip(data, offset_3):  # Naturally, itertools.izip() in 2.x
    print(item, plus_3)                   # if memory usage is important.
Naturally, you would want to use semantically valid names. The advantage to this method is it works with arbitrary iterables, not just lists, and is efficient and readable, without any ugly, inefficient iteration by index. If you need to continue checking once the offset values have run out (for other conditions, say) then use itertools.zip_longest() (itertools.izip_longest() in 2.x).
Using the consume() recipe from itertools.
import itertools
import collections
def consume(iterator, n):
    """Advance the iterator n-steps ahead. If n is None, consume entirely."""
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(itertools.islice(iterator, n, n), None)
I would, however, greatly question if you need to re-examine your data structure in this case.
Original Answer:
I'm not sure what your aim is, but from what I gather you probably want itertools.groupby():
>>> import itertools
>>> data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
>>> grouped = itertools.groupby(data)
>>> [(key, len(list(items))) for key, items in grouped]
[('a', 2), ('x', 1), ('c', 1), ('e', 2), ('f', 3)]
You can use this to work out when there are (arbitrarily large) runs of repeated items. It's worth noting you can provide itertools.groupby() with a key argument that will group them based on any factor you want, not just equality.
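As a small illustration of the key argument, the sketch below groups consecutive words by their first letter. The sample data is made up for this example.

```python
from itertools import groupby

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry', 'apricot']

# groupby only merges *consecutive* items with the same key,
# so the trailing 'apricot' forms its own run.
runs = [(letter, list(group))
        for letter, group in groupby(words, key=lambda w: w[0])]
print(runs)
# [('a', ['apple', 'avocado']), ('b', ['banana', 'blueberry']),
#  ('c', ['cherry']), ('a', ['apricot'])]
```

Sort the input by the same key first if you want one group per key rather than one group per run.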
If you adhere to "Practicality beats purity":
for idx, element in enumerate(yourlist[n:]):
    if element == yourlist[idx]:  # element is yourlist[idx + n]
        ...
If you don't care about memory efficiency go for second's answer. If you want the purest answer then go for Lattyware's one.
