Grouping lists of numbers together in Python - python

I have the following list of lists in Python, which is a partial list of the values that I have:
[1,33]
[2,10,42]
[5,1,33,44]
[10,42,98]
[44,12,100,124]
Is there a way of grouping them so they collect the values that are common in each list?
For example, if I look at the first list [1,33], I can see that its values also exist in the third list: [5,1,33,44]
So, those are grouped together as
[5,1,33,44]
If I carry on looking, I can see that 44 is in the final list, and so that will be grouped along with this list.
[44,12,100,124] is added onto [5,1,33,44]
to give:
[1,5,12,33,44,100,124]
The second list [2,10,42] has common values with [10,42,98], so the two are joined together to give:
[2,10,42,98]
So the final lists are:
[1,5,12,33,44,100,124]
[2,10,42,98]
I am guessing there is a specific name for this type of grouping. Is there a library available that can deal with it automatically? Or would I have to write a manual way of searching?
I hope the edit makes it clearer as to what I am trying to achieve.
Thanks.

Here's a solution that does not require anything from the standard library or 3rd party packages. Note that this will modify a. To avoid that, just make a copy of a and work with that. The result is a list of lists containing your resulting sorted lists.
a = [
    [1,33],
    [2,10,42],
    [5,1,33,44],
    [10,42,98],
    [44,12,100,124]
]
res = []
while a:
    el = a.pop(0)
    res.append(el)
    # iterate over a copy so that removing from a does not skip sublists
    for sublist in a[:]:
        if set(el).intersection(set(sublist)):
            res[-1].extend(sublist)
            a.remove(sublist)
res = [sorted(set(i)) for i in res]
print(res)
# [[1, 5, 12, 33, 44, 100, 124], [2, 10, 42, 98]]
How this works:
Form an empty result list res. Groupings from a will be "transferred" here.
.pop() off the first element of a. This modifies a in place and defines el as that element.
Then loop through each sublist in a, comparing your popped el to those sublists and "building up" common sets. This is where your problem is a tiny bit tricky: you need to grow the merged set gradually rather than finding the intersection of multiple sublists all at once.
Repeat this process until a is empty.
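On the question of a name: this kind of grouping is commonly described as finding connected components (lists that share a value are "connected"). A single pass over the remaining sublists can miss chains such as [[1, 2], [3, 4], [2, 3]], where the link between the first two lists only shows up later. The following is a minimal sketch of a transitive version, not part of the answer above; the function name merge_overlapping is made up for illustration, and the input list is left unmodified.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one value, transitively."""
    groups = []  # disjoint sets built so far
    for values in lists:
        current = set(values)
        remaining = []
        for group in groups:
            if group & current:
                current |= group          # absorb every group that overlaps
            else:
                remaining.append(group)   # keep untouched groups as they are
        remaining.append(current)
        groups = remaining
    # sort inside each group, then sort the groups for a stable output order
    return sorted(sorted(group) for group in groups)


data = [[1, 33], [2, 10, 42], [5, 1, 33, 44], [10, 42, 98], [44, 12, 100, 124]]
print(merge_overlapping(data))
# [[1, 5, 12, 33, 44, 100, 124], [2, 10, 42, 98]]
print(merge_overlapping([[1, 2], [3, 4], [2, 3]]))
# [[1, 2, 3, 4]]
```

Each new list absorbs every existing group it touches, so the groups stay disjoint after every step.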
Alternatively, if you just want to group together the even- and odd-numbered sublists (still a bit unclear from your question), you can use itertools (starting again from the original a, since the loop above empties it):
from itertools import chain
grp1 = sorted(set(chain.from_iterable(a[::2])))
grp2 = sorted(set(chain.from_iterable(a[1::2])))
print(grp1)
print(grp2)
# [1, 5, 12, 33, 44, 100, 124]
# [2, 10, 42, 98]

Related

Python cross multiplication with an arbitrary number of lists

I'm not sure what the correct term is for the multiplication here but I need to multiply an element from List A for example by every element in List B and create a new list for the new elements, so that the total length of the new list is len(A)*len(B).
As an example
A = [1,3,5], B=[4,6,8]
I need to multiply the two together to get
C = [4,6,8,12,18,24,20,30,40]
I have researched this and found that itertools.product does exactly what I need; however, it is written for a specific number of lists and I need to generalise it to any number of lists, as requested by the user.
I don't have access to the full code right now but the code asks the user for some lists (can be any number of lists) and the lists can have any number of elements in the lists (but all lists contain the same number of elements). These lists are then stored in one big list.
For example (user input)
A = [2,5,8], B= [4,7,3]
The big list will be
C = [[2,5,8],[4,7,3]]
In this case there are two lists in the big list but in general it can be any number of lists.
Once the code has this I have
print([a*b for a,b in itertools.product(C[0],C[1])])
>> [8,14,6,20,35,15,32,56,24]
The output of this is exactly what I want, however in this case the code is written for exactly two lists and I need it generalised to n lists.
I've been thinking about creating a loop to somehow loop over it n times, but so far I have not been successful. Since C could be of any length, the loop needs a way to know when it has reached the end of the list. I don't need it to compute the product with n lists at the same time
print([a0*a1*...*a(n-1) for a0,a1,...,a(n-1) in itertools.product(C[0],C[1],C[2],...C[n-1])])
The loop could multiply two lists at a time then use the result from that multiplication against the next list in C and so on until C[n-1].
I would appreciate any advice to see if I'm at least heading in the right direction.
p.s. I am using numpy and the lists are arrays.
You can pass a variable number of arguments to itertools.product with *. * is the unpacking operator: it unpacks the list and passes its values to the function as if they had been passed separately.
import itertools
import math
A = [[1, 2], [3, 4], [5, 6]]
result = list(map(math.prod, itertools.product(*A)))
print(result)
Result:
[15, 18, 20, 24, 30, 36, 40, 48]
You can find many explanations on the internet about * operator. In short, if you call a function like f(*lst), it will be roughly equivalent to f(lst[0], lst[1], ..., lst[len(lst) - 1]). So, it will save you from the need to know the length of the list.
Edit: I just realized that math.prod is a 3.8+ feature. If you're running an older version of Python, you can replace it with its numpy equivalent, np.prod.
You could use a reduce function, which is intended for exactly this kind of operation: it repeatedly applies a two-argument function to an accumulated value and the next element. Here is an example with a hand-written reduce so you can better understand how it works (the standard library provides the same thing as functools.reduce):
lists = [
    [4, 6, 8],
    [1, 3, 5]
]
def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        value = next(it)
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value
def cmp(a, b):
    for x in a:
        for y in b:
            yield x*y
summed = list(reduce(cmp, lists))
# OUTPUT
[4, 12, 20, 6, 18, 30, 8, 24, 40]
If you need the result sorted, use sorted() or the list's sort() method.
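The same accumulation can be written with the standard library's functools.reduce, which handles any number of lists. This is only a sketch: the helper name pairwise_products and the third list are made up for illustration.

```python
from functools import reduce


def pairwise_products(a, b):
    """Multiply every element of a by every element of b."""
    return [x * y for x in a for y in b]


# Any number of lists; the third one is an illustrative extra.
lists = [[4, 6, 8], [1, 3, 5], [2, 10]]

# reduce combines the first two lists, then combines that result with the next.
result = reduce(pairwise_products, lists)
print(len(result))  # 18, i.e. 3 * 3 * 2
print(result[:4])   # [8, 40, 24, 120]
```

Because each step multiplies two lists at a time, the length of lists never has to be known in advance.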

How do I extract a desired set of values from a list? [duplicate]

This question already has answers here:
How to modify list entries during for loop?
(10 answers)
Closed 5 months ago.
I know you should not add/remove items while iterating over a list. But can I modify an item in a list I'm iterating over if I do not change the list length?
class Car(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return type(self).__name__ + "_" + self.name
my_cars = [Car("Ferrari"), Car("Mercedes"), Car("BMW")]
print(my_cars) # [Car_Ferrari, Car_Mercedes, Car_BMW]
for car in my_cars:
    car.name = "Moskvich"
print(my_cars) # [Car_Moskvich, Car_Moskvich, Car_Moskvich]
Or should I iterate over the list indices instead? Like that:
for car_id in range(len(my_cars)):
    my_cars[car_id].name = "Moskvich"
The question is: are both of the ways above allowed, or is only the second one error-free?
If the answer is yes, will the following snippet be valid?
lovely_numbers = [[41, 32, 17], [26, 55]]
for numbers_pair in lovely_numbers:
    numbers_pair.pop()
print(lovely_numbers) # [[41, 32], [26]]
UPD. I'd like to see the python documentation where it says "these operations are allowed" rather than someone's assumptions.
You are not modifying the list, so to speak. You are simply modifying the elements in the list. I don't believe this is a problem.
To answer your second question, both ways are indeed allowed (as you know, since you ran the code), but it would depend on the situation. Are the contents mutable or immutable?
For example, if you want to add one to every element in a list of integers, this would not work:
>>> x = [1, 2, 3, 4, 5]
>>> for i in x:
...     i += 1
...
>>> x
[1, 2, 3, 4, 5]
Indeed, ints are immutable objects. Instead, you'd need to iterate over the indices and change the element at each index, like this:
>>> for i in range(len(x)):
...     x[i] += 1
...
>>> x
[2, 3, 4, 5, 6]
If your items are mutable, the first method (iterating directly over the elements rather than the indices) is simpler and avoids the extra indexing step, so it is generally the one to prefer.
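For immutable items such as ints, two common alternatives to range(len(x)) are enumerate() and a list comprehension. The snippet below is a small sketch with made-up sample values, not something taken from the answers here.

```python
x = [1, 2, 3, 4, 5]

# enumerate() yields (index, value) pairs, so the slot can be reassigned
# without changing the length of the list being iterated over.
for index, value in enumerate(x):
    x[index] = value + 1
print(x)  # [2, 3, 4, 5, 6]

# A list comprehension builds a new list and leaves x untouched.
doubled = [value * 2 for value in x]
print(doubled)  # [4, 6, 8, 10, 12]
```

Reassigning x[index] replaces the element in place, whereas `value += 1` inside the loop would only rebind the loop variable.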
I know you should not add/remove items while iterating over a list. But can I modify an item in a list I'm iterating over if I do not change the list length?
You're not modifying the list in any way at all. What you are modifying is the elements in the list; that is perfectly fine. As long as you don't directly change the actual list, you're fine.
There's no need to iterate over the indices. In fact, that's unidiomatic. Unless you are actually trying to change the list itself, simply iterate over the list by value.
If the answer is yes, will the following snippet be valid?
lovely_numbers = [[41, 32, 17], [26, 55]]
for numbers_pair in lovely_numbers:
    numbers_pair.pop()
print(lovely_numbers) # [[41, 32], [26]]
Absolutely, for the exact same reasons as I gave above. You're not modifying lovely_numbers itself. Rather, you're only modifying the elements in lovely_numbers.
Examples where the list being iterated over is modified during iteration, and where it is not:
list_modified_during_iteration.py
a = [1,2]
i = 0
for item in a:
    if i<5:
        print('append')
        a.append(i+2)
    print(a)
    i += 1
list_not_modified_during_iteration.py (Changed item to i)
a = [1,2]
i = 0
for i in range(len(a)):
    if i<5:
        print('append')
        a.append(i+2)
    print(a)
    i += 1
Of course, you can. The first way is normal, but in some cases you can also use list comprehensions or map().

How to get the second half of a list of lists as a list of lists?

So I know that to get a single column, I'd have to write
a = list(zip(*f)[0])
and the resulting a will be a list containing the first element in the lists in f.
How do I do this to get more than one element per list? I tried
a = list(zip(*f)[1:19])
But it just returned a list of lists where each inner list is composed of the ith element of every list.
The easy way is not to use zip(). Instead, use a list comprehension:
a = [sub[1:19] for sub in f]
If it is actually the second half that you are looking for:
a = [sub[len(sub) // 2:] for sub in f]
That will include the 3 in [1, 2, 3, 4, 5]. If you don't want to include it:
a = [sub[(len(sub) + 1) // 2:] for sub in f]
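To make the difference between the three slices concrete, here is a small runnable sketch. The sample data in f is made up, and [1:3] stands in for the [1:19] range used above.

```python
f = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]  # hypothetical sample data

# A fixed range of positions from every sublist.
fixed_range = [sub[1:3] for sub in f]
print(fixed_range)  # [[2, 3], [7, 8]]

# Second half, including the middle element of odd-length sublists.
with_middle = [sub[len(sub) // 2:] for sub in f]
print(with_middle)  # [[3, 4, 5], [8, 9]]

# Second half, excluding the middle element of odd-length sublists.
without_middle = [sub[(len(sub) + 1) // 2:] for sub in f]
print(without_middle)  # [[4, 5], [8, 9]]
```

For even-length sublists the two "second half" variants give the same result; they differ only when the length is odd.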
You should definitely prefer @zondo's solution for both performance and readability. However, a zip-based solution is possible and would look as follows (in Python 2):
zip(*zip(*f)[1:19])
You should not consider this cycle of unpacking, zipping, slicing, unpacking and re-zipping in any serious code though ;)
In Python 3, you would have to cast both zip results to list, making this even less sexy.
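For reference, a Python 3 version of the zip round trip might look like the sketch below. The sample data is made up and [1:3] stands in for [1:19]. Note that zip() stops at the shortest sublist, so this only matches the list-comprehension result when all sublists have the same length.

```python
f = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical sample data

# Transpose, materialise as a list so it can be sliced, keep columns 1 and 2.
columns = list(zip(*f))[1:3]
print(columns)  # [(2, 6), (3, 7)]

# Transpose back and turn the resulting tuples into lists.
a = [list(row) for row in zip(*columns)]
print(a)  # [[2, 3], [6, 7]]
```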

Removing duplicates and preserving order when elements inside the list is list itself

I have the following problem while trying to do some nodal analysis:
For example:
my_list=[[1,2,3,1],[2,3,1,2],[3,2,1,3]]
I want to write a function that treats the element lists inside my_list in the following way:
-The number of occurrences of a given element inside a list of my_list is not important; as long as the unique elements inside two lists are the same, the lists are identical.
Find the identical loops based on the above premise, keep only the first one, and ignore the other identical lists of my_list while preserving the order.
Thus, in the above example the function should return just the first list, which is [1,2,3,1], because all the lists inside my_list are equal based on the above premise.
I wrote a function in python to do this but I think it can be shortened and I am not sure if this is an efficient way to do it. Here is my code:
def _remove_duplicate_loops(duplicate_loop):
    loops=[]
    for i in range(len(duplicate_loop)):
        unique_el_list=[]
        for j in range(len(duplicate_loop[i])):
            if (duplicate_loop[i][j] not in unique_el_list):
                unique_el_list.append(duplicate_loop[i][j])
        loops.append(unique_el_list[:])
    loops_set=[set(x) for x in loops]
    unique_loop_dict={}
    for k in range(len(loops_set)):
        if (loops_set[k] not in list(unique_loop_dict.values())):
            unique_loop_dict[k]=loops_set[k]
    unique_loop_pos=list(unique_loop_dict.keys())
    unique_loops=[]
    for l in range(len(unique_loop_pos)):
        # look up the stored position, not the loop counter itself
        unique_loops.append(duplicate_loop[unique_loop_pos[l]])
    return unique_loops
from collections import OrderedDict
my_list = [[1, 2, 3, 1], [2, 3, 1, 2], [3, 2, 1, 3]]
seen_combos = OrderedDict()
for sublist in my_list:
    unique_elements = frozenset(sublist)
    if unique_elements not in seen_combos:
        seen_combos[unique_elements] = sublist
# list() so the result is a real list on Python 3 as well
my_list = list(seen_combos.values())
You could do it in a fairly straightforward way using dictionaries, but you'll need to use frozenset instead of set, as sets are mutable and therefore not hashable.
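def _remove_duplicate_lists(duplicate_loop):
    dupdict = OrderedDict((frozenset(x), x) for x in reversed(duplicate_loop))
    return list(reversed(dupdict.values()))
should do it. Note the double reversed(): normally the last item with a given key is the one that is preserved, whereas you want the first, and the double reverse accomplishes that. One caveat: the groups come out ordered by their last occurrence in the input rather than their first, so when duplicates are interleaved with other lists (e.g. [[1,2],[3,4],[2,1]]) the result is [[3,4],[1,2]].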
edit: correction, yes, per Steven's answer, it must be an OrderedDict(), or the values returned will not be correct. His version might be slightly faster too.
edit again: You need an ordered dict if the order of the lists is important. Say your list is
[[1,2,3,4], [4,3,2,1], [5,6,7,8]]
The ordered dict version will ALWAYS return
[[1,2,3,4], [5,6,7,8]]
However, the regular dict version may return the above, or (on Python versions before 3.7, where plain dicts do not guarantee insertion order) may return
[[5,6,7,8], [1,2,3,4]]
If you don't care, a non-ordered dict version may be faster/use less memory.
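On Python 3.7 and later, plain dicts preserve insertion order, so the same idea can be sketched without OrderedDict. This is an illustrative variant, not code from the answers above, and the function name remove_duplicate_loops is made up.

```python
def remove_duplicate_loops(loops):
    """Keep the first list for each distinct set of elements, in input order."""
    seen = {}
    for loop in loops:
        # setdefault stores the value only if the key is not present yet,
        # so the first list seen for each frozenset is the one that is kept.
        seen.setdefault(frozenset(loop), loop)
    return list(seen.values())


print(remove_duplicate_loops([[1, 2, 3, 1], [2, 3, 1, 2], [3, 2, 1, 3]]))
# [[1, 2, 3, 1]]
print(remove_duplicate_loops([[1, 2], [3, 4], [2, 1]]))
# [[1, 2], [3, 4]]
```

This relies on the insertion-order guarantee of dict, so it assumes Python 3.7 or newer.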

python multidimensional sorting, combined values

I have a multidimensional list that I would like to sort on a combined weighting of two numeric elements. Here is an example of the results using sorted(results, key=operator.itemgetter(2,3)):
[..,1,34]
...
...
[..,10,2]
[..,11,1]
[..,13,3]
[..,13,3]
[..,13,3]
[..,16,1]
[..,29,1]
The problem with itemgetter is that it first sorts by element 2, then by element 3, whereas
I would like to have the 13,3 at the top/bottom (depending on asc/desc sort).
Is this possible, and if so, how?
Many thanks
Edit 1.
Sorry for being obtuse. I am processing DOM data, results from search pages; it's a generic search engine searcher, so to speak.
What I am doing is finding the a and div tags, then counting how many times a particular class or id occurs in the div/a tags; this is element 2. Then I rescan the list of found tags and see which other classes/ids have the same total as the tag currently being processed. So in this case, item 13,3 has 13 matches for the class/id for that type of tag, and 3 denotes that there are 3 other tags with classes/ids that occur the same number of times. That is why I wish to sort like that. And no, it is not a dict; it's definitely a list.
Thank you.
I'm making a total guess here, given lack of any other explanation, and assuming what you're actually trying to do is sort by the product of the last two keys in your list, secondarily sorted by magnitude of the first element in the product. That's the only explanation I can come up with offhand for why (13,3) would be the top result.
In that case, you'd be looking for something like this:
sorted(results, key=lambda x: (x[-2]*x[-1], x[-2]), reverse=True)
That would give you the following sort:
[[13, 3], [13, 3], [13, 3], [1, 34], [29, 1], [10, 2], [16, 1], [11, 1]]
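As a runnable check of that ordering, here is a small sketch. The first two columns of each row are unknown in the question, so the "tag"/"cls" placeholder strings below are purely hypothetical; only the last two numbers come from the question.

```python
# Placeholder leading columns (hypothetical); last two values from the question.
results = [
    ["tag", "cls", 1, 34],
    ["tag", "cls", 10, 2],
    ["tag", "cls", 11, 1],
    ["tag", "cls", 13, 3],
    ["tag", "cls", 13, 3],
    ["tag", "cls", 13, 3],
    ["tag", "cls", 16, 1],
    ["tag", "cls", 29, 1],
]

# Sort by the product of the last two values, breaking ties on the first of them.
ranked = sorted(results, key=lambda x: (x[-2] * x[-1], x[-2]), reverse=True)
print([row[-2:] for row in ranked])
# [[13, 3], [13, 3], [13, 3], [1, 34], [29, 1], [10, 2], [16, 1], [11, 1]]
```

Negative indices are used so the key works regardless of how many leading columns each row has.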
Alternatively, if what you're actually looking for here is to have the results ordered by the number of times they appear in your list, we can use a collections.Counter. Unfortunately, lists aren't hashable, so we'll cheat a bit and convert them to tuples to use as the keys. There are ways around this, but this is the simplest way for me for now to demonstrate what I'm talking about.
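import collections
def sort_results(results):
    c = collections.Counter([tuple(k) for k in results])
    return sorted(c, key=lambda x: c[x], reverse=True)
This gets you the following (the order among entries with equal counts is arbitrary and may differ between Python versions):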
[(13, 3), (1, 34), (16, 1), (29, 1), (11, 1), (10, 2)]
Thanks J.F. Sebastian for pointing out that tuples could be used instead of str!
Yes, you can write whatever function you want as the key function. For example, if you wanted to sort by the sum of the elements at indices 2 and 3:
import operator
def keyfunc(item):
    return sum(operator.itemgetter(2, 3)(item))
sorted(results, key=keyfunc)
So if you used this function as your keyfunc, the item with 13 at index 2 and 3 at index 3 would be sorted as though it were the value 16.
It's not clear how you want to sort these elements, but you can change the body of keyfunc to perform whatever operation you'd like.
