Make this faster. (Min, Max in same iteration using a condition) - python

I would like to ask if/how I could rewrite the lines below so that they run faster.
(-10000, 10000) is just a range that I can be sure my numbers fall within.
first = 10000
last = -10000
for key in my_data.keys():
    if "LastFirst_" in key:  # my_data contains many more keys with lots of values.
        first = min(first, min(my_data[key]))
        last = max(last, max(my_data[key]))
print first, last
Also, is there a Pythonic way to write this (even if it wouldn't run faster)?
Thanks.

Use the * operator to unpack the values:
>>> my_data = {'LastFirst_1':[1, 4, 5], 'LastFirst_2':[2, 4, 6]}
>>> d = [item for k,v in my_data.items() if 'LastFirst_' in k for item in v]
>>> first = 2
>>> last = 5
>>> min(first, *d)
1
>>> max(last, *d)
6

You could use some comprehensions to simplify the code.
first = min(min(data) for (key, data) in my_data.items() if "LastFirst_" in key)
last = max(max(data) for (key, data) in my_data.items() if "LastFirst_" in key)

The min and max functions are overloaded to take either multiple arguments (as you use them) or a single iterable of values, so you can pass in iterables (e.g. lists) and get the min or max of their elements.
Also, if you're only interested in the values, use .values() or itervalues(). If you're interested in both, use .items() or .iteritems(). (In Python 3, there is no .iter- version.)
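For reference, the calling conventions mentioned above look like this (a quick illustration, not part of the original answer):

```python
nums = [3, 1, 2]

# A single iterable argument: min/max of its elements.
print(min(nums))          # 1

# Multiple positional arguments: min/max of the arguments themselves.
print(min(3, 1, 2))       # 1

# A seed value plus unpacked elements, as in the * answer above.
print(min(10000, *nums))  # 1
```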
If you have many sequences, you can use itertools.chain to make them one long iterable. You can also manually string them along using multiple for in a single comprehension, but that can be distasteful.
import itertools

def flatten1(iterables):
    # The "list" is necessary because we want to iterate the result twice,
    # but `chain` returns an iterator, which can only be consumed once.
    return list(itertools.chain(*iterables))

# Note: the parentheses make this a generator expression, not a list.
valid_lists = (v for k, v in my_data.iteritems() if "LastFirst_" in k)
valid_values = flatten1(valid_lists)
# Alternative: [w for k, v in my_data.iteritems() if "LastFirst_" in k for w in v]
first = min(valid_values)
last = max(valid_values)
print first, last
If the maximum and minimum elements might NOT be in the dict, then the coder should decide what to do, but I would suggest they consider allowing the default behavior of max/min (a ValueError raised on an empty sequence), rather than trying to guess an upper or lower bound. Either would be more Pythonic.
In Python 3, you may specify a default argument, e.g. max(valid_values, default=10000).
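A quick sketch of both behaviors (the default keyword requires Python 3.4+):

```python
vals = []

# With no elements and no default, min()/max() raise ValueError.
try:
    max(vals)
except ValueError:
    print("max() arg is an empty sequence")

# With a default, the fallback is returned instead.
print(max(vals, default=10000))       # 10000
print(max([1, 5, 3], default=10000))  # 5 -- the default is ignored when values exist
```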

my_data = {'LastFirst_a': [1, 2, 34000], 'LastFirst_b': [-12000, 1, 5]}
first = 10000
last = -10000
# Note: replace .items() with .iteritems() if you're using Python 2.
relevant_data = [el for k, v in my_data.items() for el in v if "LastFirst_" in k]
# maybe faster:
# relevant_data = [el for k, v in my_data.items() for el in v if k.startswith("LastFirst_")]
first = min(first, min(relevant_data))
last = max(last, max(relevant_data))
print(first, last)

values = [my_data[k] for k in my_data if 'LastFirst_' in k]
flattened = [item for sublist in values for item in sublist]
first = min(first, min(flattened))
last = max(last, max(flattened))
or
values = [item for sublist in (v for k, v in my_data.iteritems() if 'LastFirst_' in k) for item in sublist]
first = min(first, min(values))
last = max(last, max(values))
I was running some benchmarks and it seems that the second solution is slightly faster than the first.
However, I also compared these two versions with the code posted by other posters.
solution one: 0.648876905441
solution two: 0.634277105331
solution three (TigerhawkT3): 2.14495801926
solution four (Def_Os): 1.07884407043
solution five (leewangzhong): 0.635314941406
based on a randomly generated dictionary of 1 million keys.
I think that leewangzhong's solution is really good. Besides the timings shown above, in subsequent experiments it came out slightly faster than my second solution (we are talking about milliseconds, though):
solution one: 0.678879022598
solution two: 0.62641787529
solution three: 2.15943193436
solution four: 1.05863213539
solution five: 0.611482858658
Itertools is really a great module!

Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts, appending each dict with an equal ObjectID to a list under that key.
The code below works, but I am trying to convert it to a dictionary comprehension:
# group together subblocks if they have equal ObjectID
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)
Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set, the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
from collections import defaultdict

output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.
If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than above or the loop (O(nlogn) instead of O(n)).
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
You can add an else block to save time and slightly improve performance:
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)

How can I find the first time an element occurs for the second time?

I would like to find the first time an element occurs for the second time in a list. For example, if my list was
['1','2','1B','2B','2B','2','1B','1']
the result should be '2B' (or it could return the index 4), since the element '2B' is the first element to occur twice (going left to right).
I know I can do this with a basic for loop counting occurrences as I go along; I just wondered what's the most efficient way to do it.
You can't do better than worst-case O(N). If you want to be concise and don't mind some side-effect kung fu, you can use next with a conditional generator:
lst = ['1','2','1B','2B','2B','2','1B','1']
s = set()
next(x for x in lst if x in s or s.add(x))
# '2B'
You could loop over the elements in the list, keep track of which have appeared by adding them to a set, and break as soon as an element is already in the set:
l = ['1','2','1B','2B','2B','2','1B','1']
s = set()
for i in l:
    if i in s:
        result = i
        break
    else:
        s.add(i)
print(result)
# '2B'
You could do:
appearance = set()
data = ['1','2','1B','2B','2B','2','1B','1']
for i, d in enumerate(data):
    if d in appearance:
        print(i)
        break
    else:
        appearance.add(d)
Output
4
x = ['1','2','1B','2B','2B','2','1B','1']
d = {}
for i in x:
    if i not in d:
        d[i] = 0
    else:
        print(i)
        break
output
2B
Another solution but not as efficient as yatu's answer.
l = ['1','2','1B','2B','2B','2','1B','1']
next(x for i, x in enumerate(l, 1) if len(l[:i]) > len(set(l[:i])))
# '2B'
In addition to the great answers posted, it's worth mentioning itertools.takewhile:
>>> from itertools import takewhile
>>> ls = ['1','2','1B','2B','2B','2','1B','1']
>>> seen = set()
>>> len(list(takewhile(lambda x: x not in seen and not seen.add(x), ls)))
4
or
>>> list(takewhile(lambda x: x not in seen and not seen.add(x), ls)).pop()
'2B'
The above raises IndexError if the list is empty and both methods return the whole list if all items are unique, requiring a bit of interpretation.
This also builds a temporary list (unlike the explicit loop approach) and is not especially readable, but it only performs a partial traversal when a duplicate does exist, and it makes it easy to get the index or the group of elements to the left of the duplicate.
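The interpretation step can be packaged into a small helper (a hypothetical wrapper around the takewhile trick, not part of the original answer): if takewhile consumed the whole list, there was no duplicate.

```python
from itertools import takewhile

def first_dupe_index(items):
    """Return the index of the first repeated element, or None if all are unique."""
    seen = set()
    # seen.add returns None, so `not seen.add(x)` is always True and
    # serves only to record x as seen.
    prefix = list(takewhile(lambda x: x not in seen and not seen.add(x), items))
    return len(prefix) if len(prefix) < len(items) else None

print(first_dupe_index(['1','2','1B','2B','2B','2','1B','1']))  # 4
print(first_dupe_index(['a','b','c']))                          # None
```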

An interesting code for obtaining unique values from a list

Say given a list s = [2,2,2,3,3,3,4,4,4]
I saw the following code being used to obtain unique values from s:
unique_s = sorted(unique(s))
where unique is defined as:
def unique(seq):
    # not order preserving
    set = {}
    map(set.__setitem__, seq, [])
    return set.keys()
I'm just curious to know if there is any difference between this and just doing list(set(s))? Both results in a mutable object with the same values.
I'm guessing this code is faster since it's only looping once rather than twice in the case of type conversion?
You should use the code you describe:
list(set(s))
This works on all Pythons from 2.4 (I think) to 3.3, is concise, and uses built-ins in an easy to understand way.
The function unique appears to be designed to work if set is not a built-in, which is true for Python 2.3. Python 2.3 is fairly ancient (2003). The unique function is also broken on the Python 3.x series: map is lazy there, so the items are never actually inserted, and dict.keys returns a view rather than a list.
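On Python 3, a similar dict-based trick, dict.fromkeys, still works and even preserves insertion order (a sketch, not part of the original answer):

```python
def unique(seq):
    # dict preserves insertion order (guaranteed since Python 3.7),
    # so unlike the original, this version is order preserving.
    return list(dict.fromkeys(seq))

print(unique([2, 2, 2, 3, 3, 3, 4, 4, 4]))  # [2, 3, 4]
print(unique("abracadabra"))                # ['a', 'b', 'r', 'c', 'd']
```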
For a sorted sequence you could use itertools unique_justseen() recipe to get unique values while preserving order:
from itertools import groupby
from operator import itemgetter
print map(itemgetter(0), groupby([2,2,2,3,3,3,4,4,4]))
# -> [2, 3, 4]
To remove duplicate items from a sorted sequence inplace (to leave only unique values):
def del_dups(sorted_seq):
    prev = object()
    pos = 0
    for item in sorted_seq:
        if item != prev:
            prev = item
            sorted_seq[pos] = item
            pos += 1
    del sorted_seq[pos:]

L = [2,2,2,3,3,3,4,4,4]
del_dups(L)
print L  # -> [2, 3, 4]

What is the proper way of deleting item from a dictionary in "for" loop (Python)?

I want to iterate over a dictionary, examine the values, and delete items that match certain values.
Example
d = {1, 1, 2, 1, 4, 5}
for i in d:
    if i == 1:
        del i
But we know doing this is dangerous, since the collection is updated while it is being iterated. What is the clean way of doing this in Python?
If k is your dictionary, you can do
k = {x:v for x,v in k.iteritems() if x != 1}
This works on Python 2.7+ and 3.x.
For anything older, you can do
k = dict((x,v) for x,v in k.iteritems() if x!=1)
I would recommend the built-in filter function, since it makes clear what is being done with the list. filter also works with any iterable: a sequence, a container, or an iterator.
Here is an example how to work with the function
In [2]: d = [1,1,2,1,4,5]
In [3]: filter(lambda x: (x!=1), d)
Out[3]: [2, 4, 5]
Instead of a lambda, you can pass the name of any other function to be used to filter data from the list.
Documentation on filter is available at http://docs.python.org/library/functions.html#filter
Also note, that filter is well-known in world of functional programming and utilizes powerful concepts so you can find it in many other languages with even different paradigm.
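Note that the session above is Python 2; on Python 3 the same call returns a lazy iterator rather than a list, so materialize the result explicitly:

```python
d = [1, 1, 2, 1, 4, 5]

# On Python 3, filter returns a lazy filter object, not a list,
# so wrap it in list() when you need an actual list.
result = list(filter(lambda x: x != 1, d))
print(result)  # [2, 4, 5]
```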
for j in range(len(d)-1, -1, -1):
    if d[j] == 1:
        del d[j]
While I would normally just make a new dictionary (see Noufal Ibrahim's answer), you can safely modify the existing dictionary in place by first making a list of the keys and iterating through that:
# list() is needed on Python 3, where d.keys() is a live view;
# on Python 2, d.keys() already returns a list.
for k in list(d.keys()):
    if d[k] == 1:
        del d[k]

what is a quick way to delete all elements from a list that do not satisfy a constraint?

I have a list of strings. I have a function that given a string returns 0 or 1. How can I delete all strings in the list for which the function returns 0?
[x for x in lst if fn(x) != 0]
This is a "list comprehension", one of Python's nicest pieces of syntactical sugar that often takes lines of code in other languages and additional variable declarations, etc.
See:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions
I would use a generator expression over a list comprehension to avoid a potentially large, intermediate list.
result = (x for x in l if f(x))
# print it, or something
print list(result)
Like a list comprehension, this will not modify your original list, in place.
edit: see the bottom for the best answer.
If you need to mutate an existing list, for example because you have another reference to it somewhere else, you'll need to actually remove the values from the list.
I'm not aware of any such function in Python, but something like this would work (untested code):
def cull_list(lst, pred):
    """Remove all values from ``lst`` for which ``pred(v)`` is false."""
    def remove_all(v):
        """Remove all instances of ``v`` from ``lst``."""
        try:
            while True:
                lst.remove(v)
        except ValueError:
            pass
    values = set(lst)
    for v in values:
        if not pred(v):
            remove_all(v)
A probably more-efficient alternative that may look a bit too much like C code for some people's taste:
def efficient_cull_list(lst, pred):
    end = len(lst)
    i = 0
    while i < end:
        if not pred(lst[i]):
            del lst[i]
            end -= 1
        else:
            i += 1
edit...: as Aaron pointed out in the comments, this can be done much more cleanly with something like
def reversed_cull_list(lst, pred):
    for i in range(len(lst) - 1, -1, -1):
        if not pred(lst[i]):
            del lst[i]
...edit
The trick with these routines is that using a function like enumerate, as suggested by other responders, will not take into account that elements have been removed from the list. The only way (that I know of) to handle that is to track the index manually instead of letting Python do the iteration. There's bound to be a speed compromise there, so it may end up being better just to do something like
lst[:] = (v for v in lst if pred(v))
Actually, now that I think of it, this is by far the most sensible way to do an 'in-place' filter on a list. The generator's values are iterated before filling lst's elements with them, so there are no index conflict issues. If you want to make this more explicit just do
lst[:] = [v for v in lst if pred(v)]
I don't think it will make much difference in this case, in terms of efficiency.
Either of these last two approaches will, if I understand correctly how they actually work, make an extra copy of the list, so one of the bona fide in-place solutions mentioned above would be better if you're dealing with some "huge tracts of land."
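To see why the slice-assignment form still counts as "in place" despite the temporary copy: it keeps the same list object, so every other reference observes the change. A small demonstration (keep is a hypothetical predicate, not from the original answer):

```python
def keep(x):
    return x > 2  # hypothetical predicate for the demonstration

lst = [1, 2, 3, 4]
alias = lst  # a second reference, as in the in-place scenario

# Slice assignment builds a temporary list, then overwrites lst's contents.
lst[:] = [v for v in lst if keep(v)]
print(alias, alias is lst)  # [3, 4] True -- the alias sees the change

# Plain rebinding would instead leave the alias pointing at the old list.
lst = [v for v in lst if keep(v)]
print(alias is lst)  # False
```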
>>> s = [1, 2, 3, 4, 5, 6]
>>> def f(x):
...     if x <= 2: return 0
...     else: return 1
...
>>> for n, x in enumerate(s):
...     if f(x) == 0: s[n] = None
...
>>> s = filter(None, s)
>>> s
[3, 4, 5, 6]
With a generator expression:
alist[:] = (item for item in alist if afunction(item))
Functional:
alist[:] = filter(afunction, alist)
or:
import itertools
alist[:] = itertools.ifilter(afunction, alist)
All equivalent.
You can also use a list comprehension:
alist = [item for item in alist if afunction(item)]
An in-place modification:
import collections
indexes_to_delete= collections.deque(
idx
for idx, item in enumerate(alist)
if afunction(item))
while indexes_to_delete:
del alist[indexes_to_delete.pop()]
