I have data like
[2, 2, 2, 2, 2, 3, 13, 113]
which I then want to sort into separate lists by keys generated by myself. In fact I want to generate all possible lists.
Some examples:
values: [2, 2, 2, 2, 2, 3, 13, 113]
keys: [0, 0, 1, 2, 1, 3, 3, 1]
sublists: [2, 2], [2, 2, 113], [2], [3, 13]
values: [2, 2, 2, 2, 2, 3, 13, 113]
keys: [0, 1, 0, 0, 0, 1, 1, 0]
sublists: [2, 2, 2, 2, 113], [2, 3, 13]
values: [2, 2, 2, 2, 2, 3, 13, 113]
keys: [2, 3, 0, 0, 4, 4, 1, 3]
sublists: [2, 2], [13], [2], [2, 113], [2, 3]
All possible keys are generated by
def generate_keys(prime_factors):
key_size = len(prime_factors) - 1
key_values = [str(i) for i in range(key_size)]
return list(itertools.combinations_with_replacement(key_values, \
len(prime_factors)))
Then I thought I could use the keys to shift the values into the sublists. That's the part I'm stuck on. I thought itertools.groupby would be my solution but upon further investigation I see no way to use my custom lists as keys for groupby.
How do I split my big list into smaller sublists using these keys? There may even be a way to do this without using keys. Either way, I don't know how to do it and looking at other Stack Overflow questions has eben in the ballpark but not exactly this question.
This does what you want:
def sift(keys, values):
answer = collections.defaultdict(list)
kvs = zip(keys, values)
for k,v in kvs:
answer[k].append(v)
return [answer[k] for k in sorted(answer)]
In [205]: keys = [0, 0, 1, 2, 1, 3, 3, 1]
In [206]: values = [2, 2, 2, 2, 2, 3, 13, 113]
In [207]: sift(keys,values)
Out[207]: [[2, 2], [2, 2, 113], [2], [3, 13]]
Explanation:
collections.defaultdict is a handy dict-like class that lets you define what should happen in the event that a key doesn't exist in the dictionary that you're trying to manipulate. For example, in my code, I have answer[k].append(v). We know that append is a list function, so we know that answer[k] should be a list. However, if I was using a conventional dict and I tried to append to the value of a non-existent key, I would have gotten a KeyError as follows:
In [212]: d = {}
In [213]: d[1] = []
In [214]: d
Out[214]: {1: []}
In [215]: d[1].append('one')
In [216]: d[1]
Out[216]: ['one']
In [217]: d
Out[217]: {1: ['one']}
In [218]: d[2].append('two')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/Users/USER/<ipython-input-218-cc58f739eefa> in <module>()
----> 1 d[2].append('two')
KeyError: 2
This was only made possible because I defined answer = collections.defaultdict(list). If I had defined answer = collections.defaultdict(int), I would gotten a different error - one that would tell me that int objects don't have an append method.
zip on the other hand takes two lists (well actually, it takes at least two iterables), lets call them list1 and list2 and returns a list of tuples in which the ith tuple contains two objects. The first is list1[i] and the second is list2[i]. If list1 and list2 are of unequal length, len(zip(list1, list2)) would be the smaller value among len(list1) and len(list2) (i.e. min(len(list1), len(list2)).
Once I've zipped keys and values, I want to create a dict such that maps a value from keys to a list of values from values. This is why I used a defaultdict, so that I wouldn't have to check for the existence of a key in it before I appended to its value. If I had used a conventional dict, I would have had to do this:
answer = {}
kvs = zip(keys, values)
for k,v, in kvs:
if k in answer:
answer[k].append(v)
else:
answer[k] = [v]
Now that you have a dict (or a dict-like object) that maps values from keys to lists of ints that share the same key, all you need to do is get the lists which are the values of answer in sorted order, sorted by the keys of answer. sorted(answer) gives me a list of all of answers keys in sorted order.
Once I have this list of sorted keys, all I have to do is get their values, which are lists of ints, and put all those lists into one big list and return that big list.
… annnnnd Done! Hope that helps
Related
Given the following arrays (arr,indices) ,I need to sort the array with respect to (i[0])th index in ascending order if i[1] equals 0 and descending order if i[1] equals 1 ,where i refers to each element of the indices array.
Constraints
1<= len(indices) <=10
1<= len(arr) <=10^4
Example
arr=[[1,2,3],[3,2,1],[4,2,1],[6,4,3]]
indices=[[2,0],[0,1]]
required output
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
Explanation
first arr gets sorted with respect to 2nd index as (indices[0][0]=2) in ascending order as (indices[0][1]=0)
[[3,2,1],[ 4,2,1],[1,2,3],[6,4,3]]
then it gets sorted with 0th index as (indices[1][0]=0) in descending order as (indices[1][1]=1)
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
Note
arr,indices need to be taken as input , so it is not possible for me to write arr.sort(key=lambda x: (x[2],-x[0]))
My Approach
I have tried the following but it is not giving the correct output
arr.sort(key=lambda x:next(x[i[0]] if i[1]==0 else -x[i[0]] for i in indices))
My output
[[3,2,1],[4,2,1],[1,2,3],[6,4,3]]
Expected output
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
This one requires a very complex key. It looks to me like you have many different layers of sorting, here, and earlier elements of indices take precedence over later elements, when sort order would be affected.
I think what the sorting key needs to do is return a tuple/iterable, where the first element to sort by is whatever the first element of indices says to do, and the second element to sort by (in case of a tie in the first) is whatever the second element of indices says to do, and so on.
In which case you'd want something like this (a nested comprehension inside the key lambda, to generate that tuple (or, list, in this case)):
arr=[[1,2,3],[3,2,1],[4,2,1],[6,4,3]]
indices=[[2,0],[0,1]]
out = sorted(arr, key=lambda a: [
(-1 if d else 1) * a[i]
for (i, d) in indices
])
# [[4, 2, 1], [3, 2, 1], [6, 4, 3], [1, 2, 3]]
For sorting numbers only, you can use a quick hack of "multiply by -1 to sort descending instead of ascending". Which I did here.
You could use the stability:
from operator import itemgetter
for i, r in reversed(indices):
arr.sort(key=itemgetter(i), reverse=r)
This doesn't use the negation trick, so it also works for data other than numbers.
Check this out:
>>> a
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
>>> i
[[2, 0], [0, 1]]
>>> for j in enumerate(i):
... a.sort(key=lambda x:x[j[1][0]],reverse=False if j[1][1]==0 else True)
... print(a)
...
[[3, 2, 1], [4, 2, 1], [1, 2, 3], [6, 4, 3]]
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
Is this what you want?
I think in your second example is small mistake. It should be
[[6, 4, 3], [4, 2, 1], [3, 2, 1], [1, 2, 3]]
rather than
[[4,2,1],[3,2,1],[6,4,3],[1,2,3]]
this is under
then it gets sorted with 0th index as (indices[1][0]=0) in descending order as (indices[1][1]=1)
I've got a list of lists, with integers in it, like this:
[[[1, 2, 3], [4, 5, 6, 7]], [[3, 7, 5], [1, 2, 4, 6]]]
Given the integer 1, I would like some way to make a function that will return
[2, 3, 4, 6]
My current method is:
bigList = [[[1, 2, 3], [4, 5, 6, 7]], [[3, 7, 5], [1, 2, 4, 6]]]
hasBeenWith = []
integer = 1
for medList in bigList:
for smallList in medList:
if integer in smallList:
hasBeenWith = hasBeenWith + list(set(smallList) - set(hasBeenWith))
I know this is a naive algorithm. What is a better, more pythonic way to do it?
>>> set(integer for medList in bigList for smallList in medList for integer in smallList if 1 in smallList)
{1, 2, 3, 4, 6}
You can use a set comprehension to loop through all the lists and pick out the elements of the lists that contain 1. Notice that the for loops are in the same order as in your code, they're just all in one line.
>>> set(integer for medList in bigList for smallList in medList for integer in smallList if 1 in smallList and integer != 1)
{2, 3, 4, 6}
Then you'd want to exclude 1 from the result.
>>> list(set(integer for medList in bigList for smallList in medList for integer in smallList if 1 in smallList and integer != 1))
[2, 3, 4, 6]
And if you want the result as a list, convert it to one at the end. Working with a set and switching to a list at the end is more efficient than storing the intermediate results as lists.
By the way, the numbers came out in order here, but that's not guaranteed to happen. Sets aren't ordered, so it's just a coincidence that they're sorted. If you want them in order, add in a call to sorted().
This is a bit shorter, and still pretty readable. Since you only care about the bottom level lists, it would be more readable IMO if you had a function that flattened the nested list structure down to a single iterable of lists (basically what all the chain's and list comprehensions are doing. Then you could just have a single list comprehension that iterates through the flattened structure.
from itertools import chain
num = 1
biglist = [[[1, 2, 3], [4, 5, 6, 7]], [[3, 7, 5], [1, 2, 4, 6]]]
been_with = set(chain(*[x for x in chain(*biglist) if num in x])) - {num}
I want to take a list, e.g. [0, 1, 0, 1, 2, 2, 3], and make a list of lists of (index + 1) for each of the unique elements. For the above, for example, it would be [[1, 3], [2, 4], [5, 6], [7]].
Right now my solution is the ultimate in clunkiness:
list_1 = [0, 1, 0, 1, 2, 2, 3]
maximum = max(list_1)
master = []
for i in range(maximum + 1):
temp_list = []
for j,k in enumerate(list_1):
if k == i:
temp_list.append(j + 1)
master.append(temp_list)
print master
Any thoughts on a more pythonic way to do this would be much appreciated!
I would do this in two steps:
Build a map {value: [list, of, indices], ...}:
index_map = {}
for index, value in enumerate(list_1):
index_map.setdefault(value, []).append(index+1)
Extract the value lists from the dictionary into your master list:
master = [index_map.get(index, []) for index in range(max(index_map)+1)]
For your example, this would give:
>>> index_map
{0: [1, 3], 1: [2, 4], 2: [5, 6], 3: [7]}
>>> master
[[1, 3], [2, 4], [5, 6], [7]]
This implementation iterates over the whole list only once (O(n), where n is len(list_1)) whereas the others so far iterate over the whole list once for each unique element (O(n*m), where m is len(set(list_1))). By taking max(d) rather than max(list_1) you only need to iterate over the length of the unique items, which is also more efficient.
The implementation can be slightly simpler if you make d = collections.defaultdict(list).
list_1 = [0, 1, 0, 1, 2, 2, 3]
master = []
for i in range(max(list_1)+1):
master.append([j+1 for j,k in enumerate(list_1) if k==i])
master = [[i+1 for i in range(len(list_1)) if list_1[i]==j] for j in range(max(list_1)+1)]
It is just the same that your current code, but it uses list comprehension which is often a quite good pythonic way to solve this kind of problem.
I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
The output should be, naturally according to my question:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried with set but only removes those lists that are equal (I thought It should work because sets are by definition without order). Other questions i had visited only has examples with lists exactly duplicated or repeated like this: Python : How to remove duplicate lists in a list of list?. For now order of output (for list and sublists) is not a problem.
(ab)using side-effects version of a list comp:
seen = set()
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
x_set = frozenset(x)
if x_set not in seen:
res.append(x)
seen.add(x_set)
The reason that you add frozensets to the set is that you can only add hashable objects to a set, and vanilla sets are not hashable.
If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may have duplicates e.g., [1, 2, 1, 3] then you could use tuple(sorted(sublist)) instead of frozenset(sublist) that removes duplicates from a sublist.
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
seen = {}
pos = 0
for item in seq:
if key(item) not in seen:
seen[key(item)] = True
seq[pos] = item
pos += 1
del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
What about using mentioned by roippi frozenset this way:
>>> g = [list(x) for x in set(frozenset(i) for i in [set(i) for i in g])]
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:
>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])
If you need to convert the elements back to lists:
>>> map(list, set(map(frozenset, g)))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
I'm using Python 2.7 and I have a large dictionary that looks a little like this
{J: [92704, 238476902378, 32490872394, 234798327, 2390470], M: [32974097, 237407, 3248707, 32847987, 34879], Z: [8237, 328947, 239487, 234, 182673]}
How can I sum these by value to create a new dictionary that sums the first values in each dictionary, then the second, etc. Like
{FirstValues: J[0]+M[0]+Z[0]}
etc
In [4]: {'FirstValues': sum(e[0] for e in d.itervalues())}
Out[4]: {'FirstValues': 33075038}
where d is your dictionary.
print [sum(row) for row in zip(*yourdict.values())]
yourdict.values() gets all the lists, zip(* ) groups the first, second, etc items together and sum sums each group.
I don't know why do you need dictionary as output, but here it is:
dict(enumerate( [sum(x) for x in zip(*d.values())] ))
from itertools import izip_longest
totals = (sum(vals) for vals in izip_longest(*mydict.itervalues(), fillvalue=0))
print tuple(totals)
In English...
zip the lists (dict values) together, padding with 0 (if you want, you don't have to).
Sum each zipped group
For example,
mydict = {
'J': [1, 2, 3, 4, 5],
'M': [1, 2, 3, 4, 5],
'Z': [1, 2, 3, 4]
}
## When zipped becomes...
([1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 0])
## When summed becomes...
(3, 6, 9, 12, 10)
It does really not make sense to create a new dictionary as the new keys are (probably) meaningless. The results don't relate to the original keys. More appropriate is a tuple as results[0] holds the sum of all values at position 0 in the original dict values etc.
If you must have a dict, take the totals iterator and turn it into a dict thus:
new_dict = dict(('Values%d' % idx, val) for idx, val in enumerate(totals))
Say you have some dict like:
d = {'J': [92704, 238476902378, 32490872394, 234798327, 2390470],
'M': [32974097, 237407, 3248707, 32847987, 34879],
'Z': [8237, 328947, 239487, 234, 182673]}
Make a defaultdict (int)
from collections import defaultdict
sum_by_index = defaultdict(int)
for alist in d.values():
for index,num in enumerate(alist):
sum_by_index[index] += num