Slicing a dictionary of lists - python

I have a dictionary of lists, each list longer than 50 items, and to simplify, let's say the dictionary keys are ['a','b','c']. I spent way too long trying to figure out a Pythonic way to sort and slice these lists. What I have so far:
dict = dictionary_of_lists  # the dictionary under discussion
[dict[k].sort(reverse=True) for k in dict.keys()]
for k, l in dict.items():
    slice = 10 if k in ('a','c') else 20
    dict[k] = l[:slice]
I end up with sorted and trimmed lists, just like I want. But what I wanted was a one-line piece of code, like [dict[k].sort(reverse=True) for k in dict.keys()], for slicing the sorted lists. And if someone can figure out how to put the sorting and slicing together, they would be my hero.
UPDATE: First, I like being able to ask somewhat complex questions because they help me learn better coding skills (since I am self-taught). So thanks to everyone below! My new code:
for c in list_of_categories:
    list = [getattr(p,c.name) for p in people if hasattr(p,c.name)]
    slice = c.get_slice_value # I added a @property function to a class named `Category`
    c.total = sum(sorted(list, reverse=True)[:slice])

List comprehensions with side effects are usually considered bad style. Create a new dict instead:
dct = {k: sorted(l, reverse=True)[:10 if k in ('a','c') else 20]
       for k, l in dct.items()}
Also, the slice values look arbitrary at the moment; it might be better to configure them separately, for example:
slices = {
    'a': 10,
    'b': 20,
    'c': 10
}
dct = {k: sorted(l, reverse=True)[:slices[k]]
       for k, l in dct.items()}
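A minimal, self-contained sketch of the dict-comprehension approach above. The sample data and the names `data`, `DEFAULT_SLICE`, and `top_items` are made up for illustration and are not from the question; falling back to a default length for keys missing from `slices` is likewise an assumption.

```python
# Hypothetical sample data standing in for the real dictionary of lists.
data = {
    'a': list(range(50)),
    'b': list(range(100, 160)),
    'c': list(range(200, 250)),
}

# Per-key slice lengths; keys not listed here fall back to DEFAULT_SLICE
# (the fallback is an assumption, not part of the original question).
slices = {'a': 10, 'b': 20, 'c': 10}
DEFAULT_SLICE = 20

# sorted() returns a new list, so the original lists are left untouched.
top_items = {
    k: sorted(v, reverse=True)[:slices.get(k, DEFAULT_SLICE)]
    for k, v in data.items()
}
```

Because a new dict is built, the sorting and slicing happen in one expression without any side effects on `data`.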

sort() works in place, affecting each list. You'd want to create new ones:
[sorted(d[k], reverse = True)[:10 if k in ('a','c') else 20] for k in d.keys()]
Note that it's not very readable.

Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts: each dict is added to a list under the key matching its ObjectID.
The code below works, but I am trying to convert it to a dictionary comprehension.
# group together subblocks if they have equal ObjectID
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)
Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
from collections import defaultdict
output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
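As a runnable illustration of the defaultdict approach, here is a small sketch. The `rows` list is hypothetical sample data standing in for subblkDBF, and the `name` field is invented for the example.

```python
from collections import defaultdict

# Hypothetical rows standing in for subblkDBF.
rows = [
    {"OBJECTID": 1, "name": "first"},
    {"OBJECTID": 2, "name": "second"},
    {"OBJECTID": 1, "name": "third"},
]

grouped = defaultdict(list)
for row in rows:
    # A missing key is created with an empty list on first access.
    grouped[row["OBJECTID"]].append(row)

# Convert to a plain dict so later lookups of missing keys raise KeyError
# instead of silently creating empty lists.
grouped = dict(grouped)
```

This is a single pass over the input, so it stays O(n) regardless of the order of the rows.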
The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.
If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than the version above or the plain loop (O(n log n) instead of O(n)).
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
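To make the difference between the sorted and unsorted cases concrete, here is a short sketch with made-up rows (the `rows` data and the `name` field are illustrative assumptions). It shows that groupby only merges adjacent equal keys, so on unsorted input a later run overwrites an earlier one in the resulting dict.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; the two rows with OBJECTID 1 are not adjacent.
rows = [
    {"OBJECTID": 1, "name": "first"},
    {"OBJECTID": 2, "name": "second"},
    {"OBJECTID": 1, "name": "third"},
]
key = itemgetter("OBJECTID")

# Without sorting, key 1 appears as two separate groups, and the later
# group replaces the earlier one in the dict.
unsorted_result = {k: list(g) for k, g in groupby(rows, key=key)}

# Sorting first makes equal keys adjacent. sorted() is stable, so rows keep
# their relative order within each group.
sorted_result = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}
```

Silently losing rows is the main risk of skipping the sort, which is one reason the plain loop or defaultdict is often the safer choice.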
You can add an else block to slightly improve performance:
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)

tuple to list conversion within dictionary values (list of lists (and tuples))

I am dealing with a dictionary that is formatted as such:
dic = {'Start': [['Story' , '.']],
       'Wonderful': [('thing1',), ["thing1", "and", "thing2"]],
       'Amazing': [["The", "thing", "action", "the", "thing"]],
       'Fantastic': [['loved'], ['ate'], ['messaged']],
       'Example': [['bus'], ['car'], ['truck'], ['pickup']]}
If you notice, in the 'Wonderful' key there is a tuple within a list. I am looking for a way to convert all tuples within the inner lists of each key into lists.
I have tried the following:
for value in dic.values():
    for inner in value:
        inner = list(inner)
but that does not work and I don't see why. I also tried an if type(inner) = tuple statement to convert it only if it's a tuple, but that is not working either. Any help would be greatly appreciated.
Edit: I am not allowed to import anything, and I have only learned a basic level of Python. A solution that I could understand with that in mind is preferred.
You need to invest some time learning how assignment in Python works.
inner = list(inner) constructs a list (right hand side), then binds the name inner to that new list and then... you do nothing with it.
Fixing your code:
for k, vs in dic.items():
    dic[k] = [list(x) if isinstance(x, tuple) else x for x in vs]
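To see the difference between rebinding a loop variable and assigning back into the container, here is a small sketch with no imports. It uses a trimmed-down copy of the question's data; the names `after_rebinding` and `after_assignment` are illustrative.

```python
dic = {'Wonderful': [('thing1',), ['thing1', 'and', 'thing2']]}

# Rebinding the loop variable: `inner` now names a new list, but the outer
# list still holds the original tuple.
for value in dic.values():
    for inner in value:
        inner = list(inner)
after_rebinding = type(dic['Wonderful'][0])

# Assigning back into the dictionary replaces the stored objects.
for k, vs in dic.items():
    dic[k] = [list(x) if isinstance(x, tuple) else x for x in vs]
after_assignment = type(dic['Wonderful'][0])
```

After the first loop the stored element is still a tuple; only the second loop, which writes the result back under the key, changes what the dictionary holds.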
You need to update the element by its index:
for curr in dic.values():
    for i, v in enumerate(curr):
        if isinstance(v, tuple):
            curr[i] = list(v)
print(dic)
Your title, data and code suggest that you only have tuples and lists there and are willing to run list() on all of them, so here's a short way to convert them all to lists and assign them back into the outer lists (which is what you were missing):
for value in dic.values():
    value[:] = map(list, value)
And a fun way:
for value in dic.values():
    for i, [*value[i]] in enumerate(value):
        pass

List comprehension for dict in dict

I have the following python dictionary (a dict in a dict):
d = {'k1': {'kk1':'v1','kk2':'v2','kk3':'v3'},'k2':{'kk1':'v4'}}
I can't figure out the list comprehension to get a list of all values (v1, v2...). If you can also give me an example with a lambda, that would be nice.
The goal is to have values_lst = ['v1','v2','v3','v4']
Thanks
Combine two loops to "flatten" the dict of dicts. Loop over the values of d first, and then loop over the values of the values of d. The syntax might be a bit hard to grasp at first:
values_lst = [v for x in d.values() for v in x.values()]
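Since the question also asks for a lambda, here is a sketch showing the nested comprehension next to one possible lambda-based equivalent. Using `itertools.chain.from_iterable` for the flattening step is my choice for illustration, not something from the answer above, and the name `values_via_lambda` is hypothetical.

```python
from itertools import chain

d = {'k1': {'kk1': 'v1', 'kk2': 'v2', 'kk3': 'v3'}, 'k2': {'kk1': 'v4'}}

# Nested comprehension: the outer loop comes first, the inner loop second.
values_lst = [v for inner in d.values() for v in inner.values()]

# Lambda variant: map each inner dict to its values, then flatten the result.
values_via_lambda = list(
    chain.from_iterable(map(lambda inner: inner.values(), d.values()))
)
```

Both produce the same list; the comprehension is usually considered the more readable of the two.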

Make this faster. (Min, Max in same iteration using a condition)

I would like to ask whether and how I could rewrite the lines below to run faster.
*(-10000, 10000) is just a range that I can be sure my numbers fall within.
first = 10000
last = -10000
for key in my_data.keys():
    if "LastFirst_" in key: # In my_data there are many more keys with lots of vals.
        first = min(first, min(my_data[key]))
        last = max(last, max(my_data[key]))
print first, last
Also, is there any pythonic way to write that (even if that wouldn't mean it will run faster)?
Thx
Use the * operator to unpack the values:
>>> my_data = {'LastFirst_1':[1, 4, 5], 'LastFirst_2':[2, 4, 6]}
>>> d = [item for k,v in my_data.items() if 'LastFirst_' in k for item in v]
>>> first = 2
>>> last = 5
>>> min(first, *d)
1
>>> max(last, *d)
6
You could use some comprehensions to simplify the code.
first = min(min(data) for (key, data) in my_data.items() if "LastFirst_" in key)
last = max(max(data) for (key, data) in my_data.items() if "LastFirst_" in key)
The min and max functions are overloaded to take either multiple values (as you use it), or one sequence of values, so you can pass in iterables (e.g. lists) and get the min or max of them.
Also, if you're only interested in the values, use .values() or itervalues(). If you're interested in both, use .items() or .iteritems(). (In Python 3, there is no .iter- version.)
If you have many sequences, you can use itertools.chain to make them one long iterable. You can also manually string them along using multiple for in a single comprehension, but that can be distasteful.
import itertools
def flatten1(iterables):
    # The "list" is necessary, because we want to use this twice
    # but `chain` returns an iterator, which can only be used once.
    return list(itertools.chain(*iterables))
# Note: The "(" ")" make this a generator expression (a one-shot iterator), not a list.
valid_lists = (v for k,v in my_data.iteritems() if "LastFirst_" in k)
valid_values = flatten1(valid_lists)
# Alternative: [w for k,v in my_data.iteritems() if "LastFirst_" in k for w in v]
first = min(valid_values)
last = max(valid_values)
print first, last
If the maximum and minimum elements are NOT in the dict, then the coder should decide what to do, but I would suggest that they consider allowing the default behavior of max/min (probably a raised exception, or the None value), rather than try to guess the upper or lower bound. Either one would be more Pythonic.
In Python 3.4 and later, you may specify a default argument, e.g. max(valid_values, default=10000).
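A Python 3 sketch of the same idea, combining the flattening step with the `default` argument. The sample `my_data` (including the unrelated `'Other'` key) is made up, and choosing `None` as the default is an assumption about how the caller wants to handle the empty case.

```python
from itertools import chain

# Hypothetical data; 'Other' stands in for the many unrelated keys.
my_data = {
    'LastFirst_1': [1, 4, 5],
    'LastFirst_2': [2, 4, 6],
    'Other': [-99, 99],
}

# Flatten only the lists whose key contains the marker. Materialize the
# result once so it can be scanned by both min() and max().
valid_values = list(chain.from_iterable(
    v for k, v in my_data.items() if "LastFirst_" in k
))

# default= is returned when the iterable is empty, instead of raising ValueError.
first = min(valid_values, default=None)
last = max(valid_values, default=None)

# With no matching values at all, the default comes back.
empty_first = min([], default=None)
```

Returning `None` makes the "no matching keys" case explicit, rather than hiding it behind a guessed bound such as 10000.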
my_data = {'LastFirst_a': [1, 2, 34000], 'LastFirst_b': [-12000, 1, 5]}
first = 10000
last = -10000
# Note: replace .items() with .iteritems() if you're using Python 2.
relevant_data = [el for k, v in my_data.items() for el in v if "LastFirst_" in k]
# maybe faster:
# relevant_data = [el for k, v in my_data.items() for el in v if k.startswith("LastFirst_")]
first = min(first, min(relevant_data))
last = max(last, max(relevant_data))
print(first, last)
values = [my_data[k] for k in my_data if 'LastFirst_' in k]
flattened = [item for sublist in values for item in sublist]
first = min(first, min(flattened))
last = max(last, max(flattened))
or
values = [item for sublist in (j for a, j in my_data.iteritems() if 'LastFirst_' in a) for item in sublist]
first = min(first, min(values))
last = max(last, max(values))
I was running some benchmarks and it seems that the second solution is slightly faster than the first.
However, I also compared these two versions with the code posted by other posters.
solution one: 0.648876905441
solution two: 0.634277105331
solution three (TigerhawkT3): 2.14495801926
solution four (Def_Os): 1.07884407043
solution five (leewangzhong): 0.635314941406
These timings are based on a randomly generated dictionary of 1 million keys.
I think that leewangzhong's solution is really good. Besides the timings shown above, in the next experiment it came out slightly faster than my second solution (we are talking about milliseconds, though):
solution one: 0.678879022598
solution two: 0.62641787529
solution three: 2.15943193436
solution four: 1.05863213539
solution five: 0.611482858658
Itertools is really a great module!

Replacing multiple occurrences in nested arrays

I've got this Python dictionary "mydict", containing arrays. Here's what it looks like:
mydict = dict(
    one=['foo', 'bar', 'foobar', 'barfoo', 'example'],
    two=['bar', 'example', 'foobar'],
    three=['foo', 'example'])
I'd like to replace all the occurrences of "example" with "someotherword".
While I can already think of a few ways to do it, is there a more "pythonic" way to achieve this?
for arr in mydict.values():
    for i, s in enumerate(arr):
        if s == 'example':
            arr[i] = 'someotherword'
If you want to leave the original untouched, and just return a new dictionary with the modifications applied, you can use:
replacements = {'example' : 'someotherword'}
newdict = dict((k, [replacements.get(x,x) for x in v])
               for (k,v) in mydict.iteritems())
This also has the advantage that it's easy to extend with new words just by adding them to the replacements dict. If you want to mutate an existing dict in place, you can use the same approach:
for l in mydict.values():
    l[:]=[replacements.get(x,x) for x in l]
However it's probably going to be slower than J.F Sebastian's solution, as it rebuilds the whole list rather than just modifying the changed elements in place.
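For Python 3, the two variants above might look like the following sketch. It uses a dict comprehension and `.items()` in place of `dict(...)` and `.iteritems()`; the `alias` variable is added only to show that slice assignment keeps the same list objects.

```python
mydict = dict(
    one=['foo', 'bar', 'foobar', 'barfoo', 'example'],
    two=['bar', 'example', 'foobar'],
    three=['foo', 'example'])

replacements = {'example': 'someotherword'}

# New dictionary; the original lists are left untouched at this point.
newdict = {k: [replacements.get(x, x) for x in v] for k, v in mydict.items()}

# In-place variant: slice assignment refills the existing list objects,
# so other references to those lists see the change.
alias = mydict['two']
for lst in mydict.values():
    lst[:] = [replacements.get(x, x) for x in lst]
```

Here `alias` still refers to the same list as `mydict['two']` after the loop, which would not be the case if the loop rebound `mydict[key]` to a freshly built list.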
Here's another take:
for key, val in mydict.items():
    mydict[key] = ["someotherword" if x == "example" else x for x in val]
I've found that building lists is very fast, but of course profile if performance is important.
