Binning variable length lists in python - python

I have a dictionary d with 100 keys where the values are variable length lists, e.g.
In[165]: d.values()[0]
Out[165]:
[0.0432,
0.0336,
0.0345,
0.044,
0.0394,
0.0555]
In[166]: d.values()[1]
Out[166]:
[0.0236,
0.0333,
0.0571]
Here's what I'd like to do: for every list in d.values(), I'd like to organize the values into 10 bins (where a value gets tossed into a bin if it satisfies the criteria, e.g. is between 0.03 and 0.04, 0.04 and 0.05, etc.).
What I'd like to end up with is something that looks exactly like d, but instead of d.values()[0] being a list of numbers, I'd like it to be a list of lists, like so:
In[167]: d.values()[0]
Out[167]:
[[0.0336,0.0345,0.0394],
[0.0432,0.044],
[0.0555]]
Each key would still be associated with the same values, but they'd be structured into the 10 bins.
I've been going crazy with nested for loops and if/elses, etc. What is the best way to go about this?
EDIT: Hi, all. Just wanted to let you know I resolved my issues. I used a variation of @Brent Washburne's answer. Thanks for the help!

def bin(values):
    bins = [[] for _ in range(10)]  # create ten bins
    for n in values:
        b = int(n * 100)  # normalize the value to the bin number
        bins[b].append(n)  # add the number to the bin
    return bins
d = [0.0432,
     0.0336,
     0.0345,
     0.044,
     0.0394,
     0.0555]
print(bin(d))
The result is:
[[], [], [], [0.0336, 0.0345, 0.0394], [0.0432, 0.044], [0.0555], [], [], [], []]
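The function above assumes every value lies in [0, 0.1); a value of 0.1 or more would raise an IndexError. A minimal sketch of applying the same idea to the whole dictionary might look like the following. The name bin_values, the n_bins and width parameters, the clamping of out-of-range values, and the sample keys "a" and "b" are assumptions made for illustration, not part of the original answer.

```python
def bin_values(values, n_bins=10, width=0.01):
    """Place each value into one of n_bins fixed-width bins."""
    bins = [[] for _ in range(n_bins)]
    for n in values:
        # Assumption: out-of-range values are clamped into the first or
        # last bin instead of raising IndexError.
        b = min(max(int(n / width), 0), n_bins - 1)
        bins[b].append(n)
    return bins

# Hypothetical stand-in for the asker's dictionary d
d = {
    "a": [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555],
    "b": [0.0236, 0.0333, 0.0571],
}

# Same keys as d, but each value is now a list of ten bins
binned = {key: bin_values(values) for key, values in d.items()}
print(binned["a"])
```

Values that sit exactly on a bin edge can land in the neighbouring bin because of floating-point rounding, so edge cases are worth checking separately.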

You can use the itertools.groupby() function with a suitable key function to categorize your items. In this case you can use floor(x*100) as the key function:
>>> from math import floor
>>> from itertools import groupby
>>> lst = [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555]
>>> [list(g) for _,g in groupby(sorted(lst), key=lambda x: floor(x*100))]
[[0.0336, 0.0345, 0.0394], [0.0432, 0.044], [0.0555]]
And for applying this on your values you can use a dictionary comprehension:
def categorizer(val):
    return [list(g) for _, g in groupby(sorted(val), key=lambda x: floor(x*100))]
new_dict = {k: categorizer(v) for k, v in old_dict.items()}
As another approach, which is more optimized in terms of execution speed, you can use a dictionary for categorizing:
>>> def categorizer(val):
...     d = {}  # fresh dictionary per call, so bins are not shared between calls
...     for i in val:
...         d.setdefault(floor(i*100), []).append(i)
...     return list(d.values())

Why not make the values a set of dictionaries, where the key is the bin indicator and the value is a list of the items that are in that bin?
You would define
newd = [{bin1:[], bin2:[], ...binn:[]}, ... ]
newd[0][bin1] = (list of items in d[0] that belong in bin1)
You now have a list of dictionaries, each of which has the appropriate bin listings.
newd[0] is now the equivalent of a dictionary built from d[0]: each key (which I call bin1, bin2, ... binn) contains a list of the values that are appropriate for that bin. Thus we have newd[0][bin1], newd[0][bin2], ... newd[k][lastbin].
Dictionary creation allows you to create the appropriate key and value list as you go along. If there is not yet a particular bin key, create the empty list, and then appending the value to the list will succeed.
Now when you want to identify the elements of a bin, you can loop through the list newd and extract whichever bin you want. This allows you to have bins with no entries without having to create empty lists. If a bin key is not in a dictionary of newd, the retrieval can be set to return an empty list as a default (to avoid the KeyError raised for a missing key).
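The idea described above could be sketched roughly as follows. This is only an illustration: the helper name bins_by_key, the width parameter, the integer bin keys, and the sample data are assumptions, and the result is keyed by the original dictionary keys rather than stored in a list.

```python
from collections import defaultdict
from math import floor

def bins_by_key(values, width=0.01):
    """Group values into a dict mapping bin number -> list of values."""
    binned = defaultdict(list)  # a missing bin key starts as an empty list
    for v in values:
        binned[int(floor(v / width))].append(v)
    return dict(binned)  # plain dict, so later lookups do not create bins

# Hypothetical stand-in for the asker's dictionary d
d = {
    "a": [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555],
    "b": [0.0236, 0.0333, 0.0571],
}

newd = {key: bins_by_key(values) for key, values in d.items()}

# Only non-empty bins are stored; .get supplies an empty list for the rest
print(newd["a"].get(3, []))
print(newd["a"].get(7, []))
```

Compared with a fixed list of ten bins, this stores only the bins that actually occur, at the cost of needing a default when reading an absent bin.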

Related

Python: List of pairs. Making every pair single and sum the values of the same keys

I have a list of pairs. The list contains items of the form [x, y]. I would like to make a list or dictionary with the left item as the key and the right item as the value. The list may contain the same key multiple times. I want to sum the values and keep each key once.
E.g.
pairs[0]=['3106124650', 2.86]
pairs[1]=['3106124650', 8.86]
pairs[2]=['5216154610', 23.77]
I want to keep '3106124650' once and sum the values, so my new list or dictionary will contain this key once with the value 11.72.
'3106124650',11.72
Here's a way. For large datasets, numpy will probably be faster though.
import collections
result = collections.defaultdict(lambda: 0)
for k, v in pairs:
    result[k] += v
sumdict = dict()
for i, v in pairs:
    sumdict[i] = v + sumdict.get(i, 0)
li=[['a',1],['a',2],['b',3],['c',4]]
d={}
for w in li:
    d[w[0]]=w[1]+d.get(w[0],0)
Output:{'a': 3, 'b': 3, 'c': 4}
You can try this:
d={}
for entry in pairs:
    if entry[0] in d:
        d[entry[0]]+=entry[1]
    else:
        d[entry[0]]=entry[1]
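For reference, here is a small self-contained sketch that runs one of these approaches on the sample data from the question. The variable name totals and the rounding for display are assumptions added for illustration.

```python
from collections import defaultdict

pairs = [
    ['3106124650', 2.86],
    ['3106124650', 8.86],
    ['5216154610', 23.77],
]

totals = defaultdict(float)  # missing keys start at 0.0
for key, value in pairs:
    totals[key] += value

# Float sums may differ from the decimal result in the last digits,
# so round only when displaying.
for key, total in totals.items():
    print(key, round(total, 2))
```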

Python dictionary - compute the average of a list

I have a dictionary with a list as value.
I want to have an average of this list.
How do I compute that?
dict1 = {
'Monty Python and the Holy Grail': [[9, 10, 9.5, 8.5, 3, 7.5, 8]],
"Monty Python's Life of Brian": [[10, 10, 0, 9, 1, 8, 7.5, 8, 6, 9]],
"Monty Python's Meaning of Life": [[7, 6, 5]],
'And Now For Something Completely Different': [[6, 5, 6, 6]]
}
I have tried
dict2 = {}
for key in dict1:
    dict2[key] = sum(dict1[key])
but it says: "TypeError: unsupported operand type(s) for +: 'int' and 'list'"
As noted in other posts, the first issue is that your dictionary values are lists of lists, and not simple lists. The second issue is that you were calling sum without then dividing by the number of elements, which would not give you an average.
If you are willing to use numpy, try this:
import numpy as np
dict_of_means = {k:np.mean(v) for k,v in dict1.items()}
>>> dict_of_means
{'Monty Python and the Holy Grail': 7.9285714285714288, "Monty Python's Life of Brian": 6.8499999999999996, "Monty Python's Meaning of Life": 6.0, 'And Now For Something Completely Different': 5.75}
Or, without using numpy or any external packages, you can do it manually by first flattening the lists of lists in the values, and going through the same type of dict comprehension, but getting the sum of the flattened list and then dividing by the number of elements in that flattened list:
dict_of_means = {k: sum([i for x in v for i in x])/len([i for x in v for i in x])
                 for k, v in dict1.items()}
Note that [i for x in v for i in x] takes a list of lists v and flattens it to a simple list.
FYI, the dictionary comprehension syntax is more or less equivalent to this for loop:
dict_of_means = {}
for k, v in dict1.items():
    dict_of_means[k] = sum([i for x in v for i in x])/len([i for x in v for i in x])
There is an in-depth description of dictionary comprehensions in the question I linked above.
If you don't want to use external libraries and you want to keep that structure:
dict2 = {}
for key in dict1:
    dict2[key] = sum(dict1[key][0])/len(dict1[key][0])
The problem is that your values are not 1D lists, they're 2D lists. If you simply remove the extra brackets, your solution should work.
Also don't forget to divide the sum of the list by the length of the list (and if you're using python 2, to import the new division).
You can do that simply by using itertools.chain and a helper function to compute average.
Here is the helper function to compute average
def average(iterable):
    total = 0.0  # named total to avoid shadowing the built-in sum
    count = 0
    for v in iterable:
        total += v
        count += 1
    if count > 0:
        return total / count
If you want the average for each key, you can simply use a dictionary comprehension and the helper function we wrote above:
from itertools import chain
averages = {k: average(chain.from_iterable(v)) for k, v in dict1.items()}
Or, if you want the average across all the keys:
from itertools import chain
average(chain.from_iterable(chain.from_iterable(dict1.values())))
Your lists are nested, all being lists of a single item, which is itself a list of the actual numbers. Here I extract these lists using val[0], val being the outer lists:
for key, val in dict1.copy().items():
    the_list = val[0]
    dict1[key] = sum(the_list)/len(the_list)
This replaces all these nested lists with the average you are after. Also, you should never mutate anything while looping over it. Therefore, a copy of the dict is used above.
Alternatively you could make use of the fancier dictionary comprehension:
dict2 = {key: sum(the_list)/len(the_list) for key, (the_list,) in dict1.items()}
Note the clever but subtle way the inner list is extracted here.
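The answers above assume each value holds exactly one non-empty inner list. A slightly more defensive sketch, assuming Python 3.4+ (for the statistics module), could flatten however many inner lists there are and return None when there is no data. The helper name nested_mean is hypothetical.

```python
from itertools import chain
from statistics import mean

dict1 = {
    'Monty Python and the Holy Grail': [[9, 10, 9.5, 8.5, 3, 7.5, 8]],
    "Monty Python's Life of Brian": [[10, 10, 0, 9, 1, 8, 7.5, 8, 6, 9]],
    "Monty Python's Meaning of Life": [[7, 6, 5]],
    'And Now For Something Completely Different': [[6, 5, 6, 6]],
}

def nested_mean(list_of_lists):
    """Average all numbers in a list of lists; None if there are none."""
    flat = list(chain.from_iterable(list_of_lists))
    return mean(flat) if flat else None

averages = {key: nested_mean(value) for key, value in dict1.items()}
for key, avg in averages.items():
    print(key, avg)
```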

dictionary comprehension of multiple items in set in python

How do I use dictionary comprehension to obtain the average of the student scores
co_dct = {"Juan":[90,85,98], "Lana":[94,80,100], "Alicia":[100,90], "Sam":[]}
co_dct = d/d[] for d in co_dct
print(co_dct)
As a dictionary comprehension:
>>> co_dct = {"Juan":[90,85,98], "Lana":[94,80,100], "Alicia":[100,90], "Sam":[]}
>>> {k: sum(co_dct[k])/float(len(co_dct[k])) for k in co_dct if co_dct[k]}
{'Juan': 91.0, 'Lana': 91.33333333333333, 'Alicia': 95.0}
Note the use of a filter to guard against division by zero errors when the sample list is empty. This results in the loss of keys that have empty samples, but that seems reasonable since you can't produce an average without data.
Since you are using Python 3 another way is to use statistics.mean():
>>> from statistics import mean
>>> {k: mean(co_dct[k]) for k in co_dct if co_dct[k]}
{'Lana': 91.33333333333333, 'Alicia': 95, 'Juan': 91}
A minor optimisation might be to use co_dct.items() to avoid multiple dict lookups:
>>> {k: mean(values) for k, values in co_dct.items() if values}
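If dropping the keys with empty lists is not acceptable, one possible variation keeps every key and stores None where no average can be computed. This is a sketch assuming Python 3.4+; the choice of None as the placeholder and the name averages are assumptions.

```python
from statistics import mean

co_dct = {"Juan": [90, 85, 98], "Lana": [94, 80, 100], "Alicia": [100, 90], "Sam": []}

# Conditional expression instead of a filter: every key is kept,
# and students without scores get None.
averages = {k: (mean(values) if values else None) for k, values in co_dct.items()}
print(averages)
```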

Access keys and vals in listed python-dict

I've got a list k with the 0'th element:
k[0]: {'pattern': 0, 'pos': array([ 9.83698, 106.539 , 130.314 ]), 'id': 1922}
(It looks like a dict, but it is actually a list.)
When I iterate through the 0th element of the list k and print out each element, I get:
for i in k:
    print i
=>output:
pattern
pos
id
I'd like to access not only the keys but the values as well. How can I do this?
I've also tried to convert the list back into a dict using zip and izip, but got the same results, i.e. only keys are printed, no values.
Any help will be appreciated.
Thanks in advance.
You can use k.values() to iterate through the values, or k.items() to iterate through (key, value) pairs:
for value in k.values():
    print value
for key, value in k.items():
    print key, value
The fastest way to iterate over the dictionary you created (it is in fact a dictionary) in Python 2 is not to create the lists of keys/values using k[0].keys(), k[0].values() and k[0].items(), but to use k[0].iteritems(), which creates a dictionary iterator that returns just the pairs without allocating lists in memory.
It also runs much faster for big dictionaries (a being the dictionary):
>>> import timeit
>>> non_iter_timer = timeit.Timer("for k,v in a.items(): k + v", setup="a = {x:x for x in xrange(10000000)}")
>>> non_iter_timer.repeat(3, 10)
[25.612606023166585, 25.100741935717622, 24.840450306339463]
>>> iter_timer = timeit.Timer("for k,v in a.iteritems(): k + v", setup="a = {x:x for x in xrange(10000000)}")
>>> iter_timer.repeat(3, 10)
[9.26259596885518, 9.198298194571748, 9.77466250122282]
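The timings above are for Python 2. In Python 3, iteritems() no longer exists and items() returns a lightweight view rather than a list, so plain items() is the usual choice there. A minimal sketch follows, with a plain list standing in for the numpy array in the question (an assumption, to keep the example free of external packages); the names record and pairs are hypothetical.

```python
# Stand-in for the asker's data: a list whose first element is a dict
record = {'pattern': 0, 'pos': [9.83698, 106.539, 130.314], 'id': 1922}
k = [record]

# Iterate over (key, value) pairs of the dict stored at index 0
for key, value in k[0].items():
    print(key, value)

# Materialize the pairs only if a real list is needed
pairs = list(k[0].items())
```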

Python: OrderedDictionary sorting based on length of key's value

I have an object like this:
t = {'rand_key_1': ['x'], 'rand_key_2': [13,23], 'rand_key_3': [(1)], 'rk5': [1,100,3,4,3,3]}
This is a dictionary with random keys (string and/or int), ALL of which have a list as a value, with varying sizes.
I want to turn this dictionary into an OrderedDict which is ordered by the length of each item's list. So after ordering I want to get:
t_ordered = {'rk5': ..., 'rand_key_2': .., 'rand_key_1': .., 'rand_key_3': ..}
(If two or more items have lists of the same length, their relative order does not really matter.)
I tried this but I am failing:
OrderedDict(sorted(d, key=lambda t: len(t[1])))
I am not experienced, so excuse me if what I tried is uber stupid.
What can I do?
Thank you.
You were actually very close with the sorting function you passed to sorted. The thing to note is that sorted, when given a dictionary, returns a list of the dictionary's keys in order. So we fix your function to index the dictionary with each key:
>>> sorted(t, key=lambda k: len(t[k]))
['rand_key_3', 'rand_key_1', 'rand_key_2', 'rk5']
You can also specify that the keys are returned in reverse order and iterate directly over these keys:
>>> for sorted_key in sorted(t, key=lambda k: len(t[k]), reverse=True):
...     print sorted_key, t[sorted_key]
rk5 [1, 100, 3, 4, 3, 3]
rand_key_2 [13, 23]
rand_key_3 [1]
rand_key_1 ['x']
Usually you wouldn't need to create an OrderedDict, as you would just iterate over a new sorted list using the latest dictionary data.
Using simple dictionary sorting first and then using OrderedDict():
>>> from collections import OrderedDict as od
>>> k=sorted(t, key=lambda x:len(t[x]), reverse=True)
>>> k
['rk5', 'rand_key_2', 'rand_key_3', 'rand_key_1']
>>> od((x, t[x]) for x in k)
OrderedDict([('rk5', [1, 100, 3, 4, 3, 3]), ('rand_key_2', [13, 23]), ('rand_key_3', [1]), ('rand_key_1', ['x'])])
Since an ordered dictionary remembers its insertion order, you can do this:
OrderedDict(sorted(t.items(), key=lambda kv: len(kv[1]), reverse=True))
OrderedDict in Python is a collection that remembers the order in which items were inserted. Ordered in this context does not mean sorted.
If all you need is to get all the items in sorted order you can do something like this:
for key, value in sorted(t.items(), key=lambda x: -len(x[1])):
    pass  # do something with key and value
However, you are still using an unsorted data structure - just iterating over it in sorted order. This still does not support operations like looking up the k-th element, or the successor or predecessor of an element in the dict.
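If positional lookups are needed, one option is to keep the sorted (key, value) pairs in a list and index into it. The sketch below uses the dictionary from the question; the names ranked, longest_key, longest_list and second_key are assumptions for illustration.

```python
t = {'rand_key_1': ['x'], 'rand_key_2': [13, 23], 'rand_key_3': [1],
     'rk5': [1, 100, 3, 4, 3, 3]}

# Sort once by list length, longest first; ties keep their original order
ranked = sorted(t.items(), key=lambda kv: len(kv[1]), reverse=True)

# The k-th entry is now a plain list index
longest_key, longest_list = ranked[0]
second_key = ranked[1][0]
print(longest_key, longest_list)
print(second_key)
```

The list is a snapshot, so it would need to be rebuilt after the dictionary changes.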
