Python: Conversion from dictionary to array - python

I have a Python dictionary (say D) where every key corresponds to some predefined list. I want to create an array with two columns where the first column corresponds to the keys of the dictionary D and the second column corresponds to the sum of the elements in the corresponding lists. As an example, if,
D = {1: [5,55], 2: [25,512], 3: [2, 18]}
Then, the array that I wish to create should be,
A = array( [[1,60], [2,537], [3, 20]] )
I have given a small example here, but I would like to know of a way where the implementation is the fastest. Presently, I am using the following method:
A_List = map( lambda x: [x,sum(D[x])] , D.keys() )
I realize that the output from my method is in the form of a list. I can convert it into an array in another step, but I don't know if that will be a fast method (I presume that the use of arrays will be faster than the use of lists). I will really appreciate an answer where I can know what's the fastest way of achieving this aim.

You can use a list comprehension to create the desired output:
>>> [(k, sum(v)) for k, v in D.items()] # Py2 use D.iteritems()
[(1, 60), (2, 537), (3, 20)]
On my computer, this runs about 50% quicker than the map(lambda:.., D) version.
Note: On Python 3, map returns a lazy iterator rather than a list, so you need list(map(...)) to measure the real time it takes.
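The timing difference quoted above depends on the machine and the Python version, so it is worth measuring locally. Here is a minimal sketch using the standard-library timeit module; the test dictionary, the helper names with_comprehension and with_map, and the repetition count are assumptions chosen for illustration, not part of the original answer.

```python
import timeit

# Hypothetical test data: 10,000 keys, each mapped to a short list.
D = {k: [k, k + 1, k + 2] for k in range(10_000)}

def with_comprehension():
    return [(k, sum(v)) for k, v in D.items()]

def with_map():
    # list(...) forces evaluation, since map is lazy on Python 3.
    return list(map(lambda k: (k, sum(D[k])), D.keys()))

# Time 50 runs of each variant; absolute numbers will vary by machine.
t_comp = timeit.timeit(with_comprehension, number=50)
t_map = timeit.timeit(with_map, number=50)
print(f"comprehension: {t_comp:.3f}s, map+lambda: {t_map:.3f}s")
```

Both helpers return the same list of (key, sum) pairs, so only the elapsed times differ.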

I hope that helps:
Build a list with the keys of D:
first_column = list(D.keys())
Build a list with the sum of the values stored under each key:
second_column = [sum(D[key]) for key in D.keys()]
Combine the two columns into a list of (key, sum) pairs:
your_array = list(zip(first_column,second_column))

You can try this also:
a = []
for i in D.keys():
    a += [[i, sum(D[i])]]

Related

How to sort a list containing frozensets (python)

I have a list of frozensets that I'd like to sort. Each of the frozensets contains a single integer value that results from an intersection operation between two sets:
k = frozenset(w) & frozenset(string.digits)
d[k] = w # w is the value
list(d) # sorted(d) doesn't work since the keys are sets and sets are unordered.
Here is the printed list:
[frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
How can I sort the list using the values contained in the sets?
You need to pass sorted a key function that accepts a frozenset and returns something that can be compared. If each frozenset has exactly one element and that element is always a single digit, you can use max as the key: it extracts the sole element, since the only element of a set is also its largest. That is:
d1 = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
d2 = sorted(d1,key=max)
print(d2)
output
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'4'})]
To learn more, read the Sorting HOW TO in the Python documentation.
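The previous answers do not sort correctly when the elements are multi-digit strings, because strings are compared character by character rather than numerically. Convert the element to int in the key function instead: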
d = [frozenset({'224'}), frozenset({'346'}), frozenset({'2'}), frozenset({'22345'})]
sorted(d, key=lambda x: int(list(x)[0]))
Output:
[frozenset({'2'}),
frozenset({'224'}),
frozenset({'346'}),
frozenset({'22345'})]
Honestly, unless you really need to keep the elements as frozenset, the best might be to generate a list of values upstream ([2, 1, 4, 3]).
Anyway, to be able to sort the frozensets you need to make them ordered elements, for instance by converting to tuple. You can do this transparently using the key parameter of sorted
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
sorted(l, key=tuple)
or natsorted for strings with multiple digits:
from natsort import natsorted
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'14'}), frozenset({'3'})]
natsorted(l, key=tuple)
output:
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'14'})]
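The key-function approaches above can also be combined into a version that needs no third-party package. This is a sketch under the assumption that every frozenset holds exactly one string of digits; the helper name sole_int is hypothetical.

```python
def sole_int(fs):
    """Return the single element of fs as an int (assumes one numeric string)."""
    (element,) = fs  # raises ValueError if fs does not hold exactly one element
    return int(element)

l = [frozenset({'2'}), frozenset({'1'}), frozenset({'14'}), frozenset({'3'})]
result = sorted(l, key=sole_int)
print(result)
```

Unpacking with (element,) = fs fails loudly when a set has zero or several elements, which may be preferable to silently picking one of them.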

Combinations of elements at various keys of a dict

I have a python dict which has various keys such that dict.keys()=[1,2,3] and each key holds an array of possibly different size such that dict[1] = [1,3,6], dict[2] = ['a', 'b'] dict[3] = [x]. I want to have a new array where I get all possible combinations of the n elements from each of the arrays.
For example, if the arrays were provided beforehand,
arr_1 = itertools.combinations(dict[1],n)
arr_2 = itertools.combinations(dict[2],n)
arr_3 = itertools.combinations(dict[3],n)
and then finally,
final = itertools.product(arr_1, arr_2, arr_3)
In a scenario where I do not know the keys of the dict or the array sizes in advance, how can I create the final array?
If I understand correctly
itertools.product(*[itertools.combinations(v, min(n, len(v))) for v in dic.values()])
should do the trick.
edit: adjusted w.r.t. comments.
Your question is a bit coarsely stated. If you are asking, how you can dynamically form the final array when you need to determine dict keys and values on the fly, you could do it like this:
import itertools

def combo_dict(d):  # <--- d is your starting dictionary
    arrays = []
    for v in d.values():  # <--- all values from your dict
        arrays.append(itertools.combinations(v, len(v)))
    # arrays is now populated, so you can build the final product:
    return itertools.product(*arrays)

# now call this function:
final_array = combo_dict(your_dict)
If I understood correctly, this is the spelled-out version of the algorithm. You could actually do it as a one-liner with a list comprehension:
final_array = itertools.product(*[itertools.combinations(v, len(v)) for v in your_dict.values()])
Bear in mind that before Python 3.7 the iteration order of a dict's values is not guaranteed (from 3.7 on, it is insertion order). You may want to use sorted() in one of the steps, depending on your needs.
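To make the idea concrete, here is a small worked sketch that fixes the order by sorting the keys and caps n at the length of each value. The function name combos_per_key and the sample data (with the string 'x' standing in for the question's unspecified x) are assumptions for illustration.

```python
import itertools

def combos_per_key(d, n):
    """Product of the n-element combinations of each value, in sorted key order."""
    per_key = [
        itertools.combinations(d[key], min(n, len(d[key])))
        for key in sorted(d)  # sorting makes the order of the factors explicit
    ]
    return list(itertools.product(*per_key))

data = {1: [1, 3, 6], 2: ['a', 'b'], 3: ['x']}
final = combos_per_key(data, 2)
for row in final:
    print(row)
```

With n = 2, the first key contributes three pairs and the other two keys contribute one combination each, so the product has three rows.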

Most efficient way to return the values of a dict as a list in a predetermined order

Assuming that I have a dictionary such as:
x = {'a':1, 'c':4, 'b':5, 'z':3}
what would be the most efficient way to return a list (or NumPy 1-D array) of the values in a predetermined key order? I want the list to follow this key order -> [a,b,c,z], i.e.
[1, 5, 4, 3]
thanks!
The best answer here is "hidden" in the comments:
Store the keys in a list, iterate over the list and retrieve the corresponding values:
order = ["a", "b", "c", "z"]; items = [x[key] for key in order] ( provided by kindall )
I don't know about "the most efficient way" (since I don't know what cost you are trying to optimize), but here is one way:
x = {'a':1, 'c':4, 'b':5, 'z':3}
y = list(map(x.get, sorted(x)))
print(y)
Here is another choice:
y = [v for k,v in sorted(x.items())]
There are a couple of ways you could do this. The simplest would be using an OrderedDict, which you could read about here.
Another would be to have a separate list of the keys. You could iterate through that list, getting the corresponding values from the dictionary, and appending them to a new list. This would give you the values in the order you want. Let me know if you would like an example of this.
(Speed-wise, I believe OrderedDict would run the best, the second option would be O(N).)
Yet another approach which first sorts the keys, then retrieves the values.
In [51]: [x[el] for el in sorted(list(x.keys()))]
Out[51]: [1, 5, 4, 3]
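When the order is given explicitly rather than by sorting, one more option is operator.itemgetter from the standard library. This is only a sketch: the variable names order and values are assumptions, and whether it beats the list comprehension is something to measure rather than take for granted.

```python
from operator import itemgetter

x = {'a': 1, 'c': 4, 'b': 5, 'z': 3}
order = ['a', 'b', 'c', 'z']  # the predetermined key order

# itemgetter(*order) builds a callable that looks up all keys at once;
# with two or more keys it returns a tuple, so convert it to a list.
values = list(itemgetter(*order)(x))
print(values)
```

Note that with a single key itemgetter returns the bare value instead of a tuple, so this form only suits orders of two or more keys.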

What is the proper way to print a nested list with the highest value in Python

I have a nested list and I'm trying to sum each inner list and print the one whose numbers give the highest total:
x = [[1,2,3],[4,5,6],[7,8,9]]
highest = list()
for i in x:
    highest.append(sum(i))
for ind, a in enumerate(highest):
    if a == max(highest):
        print(x[ind])
I've been able to print out the results but I think there should be a simple and more Pythonic way of doing this (Maybe using a list comprehension).
How would I do this?
How about:
print(max(x, key=sum))
Demo:
>>> x = [[1,2,3],[4,5,6],[7,8,9]]
>>> print(max(x, key=sum))
[7, 8, 9]
This works because max (along with a number of other python builtins like min, sort ...) accepts a function to be used for the comparison. In this case, I just said that we should compare the elements in x based on their individual sum and Bob's our uncle, we're done!
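The same key argument works for the other builtins mentioned above. A brief sketch (the variable names smallest, by_sum_desc and tied are assumptions for illustration):

```python
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# min and sorted accept the same key function as max.
smallest = min(x, key=sum)
by_sum_desc = sorted(x, key=sum, reverse=True)

# When several inner lists share the highest sum, max returns the first one.
tied = max([[1, 2], [3, 0], [0, 1]], key=sum)

print(smallest, by_sum_desc, tied)
```

Unlike the loop in the question, max(x, key=sum) yields a single list even when there are ties.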

Multi Dimensional List - Sum Integer Element X by Common String Element Y

I have a multi dimensional list:
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
I'm trying to sum the instances of element [1] where element [0] is common.
To put it more clearly, my desired output is another multi dimensional list:
multiDimListSum = [['a',3],['b',2],['c',6]]
I see I can access, say the value '2' in multiDimList by
x = multiDimList [3][1]
so I can grab the individual elements, and could probably build some sort of function to do this job, but it would be disgusting.
Does anyone have a suggestion of how to do this pythonically?
Assuming your actual sequence has similar elements grouped together as in your example (all instances of 'a', 'b' etc. together), you can use itertools.groupby() and operator.itemgetter():
from itertools import groupby
from operator import itemgetter
[[k, sum(v[1] for v in g)] for k, g in groupby(multiDimList, itemgetter(0))]
# result: [['a', 3], ['b', 2], ['c', 6]]
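groupby only merges adjacent items, so if the input is not already grouped, one option is to sort by the label first. A minimal sketch, with unsorted_pairs as hypothetical sample data:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where equal labels are not adjacent.
unsorted_pairs = [['c', 3], ['a', 1], ['b', 2], ['a', 1], ['c', 3], ['a', 1]]

first = itemgetter(0)
# Sorting by label makes equal labels adjacent, which groupby requires.
totals = [
    [label, sum(pair[1] for pair in group)]
    for label, group in groupby(sorted(unsorted_pairs, key=first), key=first)
]
print(totals)
```

The sort adds O(n log n) work and returns the labels in sorted order rather than in order of first appearance.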
Zero Piraeus's answer covers the case when field entries are grouped in order. If they're not, then the following is short and reasonably efficient.
from collections import Counter
from functools import reduce  # reduce is a builtin only in Python 2
reduce(lambda c, x: c.update({x[0]: x[1]}) or c, multiDimList, Counter())
This returns a collection, accessible by element name. If you prefer it as a list you can call the .items() method on it, but note that the order of the labels in the output may be different from the order in the input even in the cases where the input was consistently ordered.
You could use a dict to accumulate the total associated to each string
d = {}
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
for string, value in multiDimList:
    # Retrieve the current total from the dict if it exists, or 0
    current_value = d.get(string, 0)
    d[string] = current_value + value
print(d)  # {'a': 3, 'b': 2, 'c': 6}
You can then access the value for b by using d["b"].
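The same accumulation can be written with collections.defaultdict, which supplies the starting 0 automatically, and then converted back into the nested-list shape the question asks for. This is a sketch; the names totals, label and total are assumptions.

```python
from collections import defaultdict

multiDimList = [['a', 1], ['a', 1], ['a', 1], ['b', 2], ['c', 3], ['c', 3]]

totals = defaultdict(int)  # missing labels start at 0
for label, value in multiDimList:
    totals[label] += value

# Convert back to a list of [label, total] pairs.
multiDimListSum = [[label, total] for label, total in totals.items()]
print(multiDimListSum)
```

On Python 3.7 and later the pairs come out in order of first appearance, and the input does not need to be grouped or sorted.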
