I've created a function to combine specific items in a python list, but I suspect there is a better way I can't find despite extreme googling. I need the code to be fast, as I'm going to be doing this thousands of times.
mergeleft takes a list of items and a list of indices. In the example below, I call it as mergeleft(fields, (2,4,5)). Items 5, 4, and 2 of the list fields (1-based) are each concatenated onto the item immediately to their left. In this case, 3 and d get concatenated to c, and b gets concatenated to a. The result is the list ['ab', 'cd3', 'f'].
fields = ['a','b','c','d', 3,'f']

def mergeleft(x, fieldnums):
    if 1 in fieldnums: raise Exception('Cannot merge field 1 left')
    if max(fieldnums) > len(x): raise IndexError('Fieldnum {} exceeds available fields {}'.format(max(fieldnums), len(x)))
    y = []
    deleted_rows = ''
    for i, l in enumerate(reversed(x)):
        if (len(x) - i) in fieldnums:
            deleted_rows = str(l) + deleted_rows
        else:
            y.append(str(l) + deleted_rows)
            deleted_rows = ''
    y.reverse()
    return y

print(mergeleft(fields, (2,4,5)))
# Returns ['ab', 'cd3', 'f']
This assumes a list of indices in monotonic ascending order.
I reverse the order, so that I'm merging right-to-left.
For each given index, I merge that element into the one on the left, converting to string at each point.
Do note that I've changed the fieldnums type to a list, so that it's easily reversible in place. You could also just traverse the tuple in reverse order.
fields = ['a','b','c','d', 3,'f']

def mergeleft(lst, fieldnums):
    fieldnums.reverse()
    for pos in fieldnums:
        # Merge this field left
        lst[pos-2] = str(lst[pos-2]) + str(lst[pos-1])
        lst = lst[:pos-1] + lst[pos:]
    return lst

print(mergeleft(fields, [2,4,5]))
Output:
['ab', 'cd3', 'f']
Here's a decently concise solution, probably among many.
def mergeleft(x, fieldnums):
    if 1 in fieldnums: raise Exception('Cannot merge field 1 left')
    if max(fieldnums) > len(x): raise IndexError('Fieldnum {} exceeds available fields {}'.format(max(fieldnums), len(x)))
    ret = list(x)
    for i in reversed(sorted(set(fieldnums))):
        # Pop the 1-based field i and concatenate it onto the field to its left.
        ret[i-2] = str(ret[i-2]) + str(ret.pop(i-1))
    return ret
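A quick check against the example from the question:

>>> mergeleft(['a','b','c','d', 3,'f'], (2,4,5))
['ab', 'cd3', 'f']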
This is a follow up to a similar question which asked the best way to write
for item in somelist:
    if determine(item):
        code_to_remove_item
and it seems the consensus was on something like
somelist[:] = [x for x in somelist if not determine(x)]
However, I think if you are only removing a few items, most of the items are being copied just to rebuild the list, and perhaps that is slow. In an answer to another related question, someone suggests:
for item in reversed(somelist):
    if determine(item):
        somelist.remove(item)
However, here list.remove will search for the item, which is O(N) in the length of the list. Maybe we are limited by the fact that the list is represented as an array, rather than a linked list, so removing items needs to move everything after them. However, it is suggested here that collections.deque is implemented as a doubly linked list. It should then be possible to remove in O(1) while iterating. How would we actually accomplish this?
Update:
I did some time testing as well, with the following code:
import timeit

setup = """
import random
random.seed(1)
b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1] > .45) and (x[1] < .5)
"""

listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""

filt = """
c = filter(tokeep, b)
"""

print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
and got:
list comp = 4.01255393028
filtering = 3.59962391853
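For what it's worth, here is a minimal sketch of the deque idea the question asks about, using the question's own names (somelist, determine): rotate through the deque, popping each item off the head and re-appending the survivors at the tail. Each popleft/append is O(1), so the whole pass is O(n), though it still visits every element.

from collections import deque

d = deque(somelist)
for _ in range(len(d)):
    item = d.popleft()      # O(1) removal at the head
    if not determine(item):
        d.append(item)      # O(1) re-append at the tail; kept items rotate back
somelist = list(d)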
The list comprehension is the asymptotically optimal solution:
somelist = [x for x in somelist if not determine(x)]
It only makes one pass over the list, so runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it's only copying references to the objects not copying the objects themselves.
Removing items from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.
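(If you want to verify that claim on your own machine, here is a minimal timeit sketch; the data and the filter condition are purely illustrative.)

import timeit

setup = "data = list(range(1000))"
comp = "[x for x in data if x % 3]"
loop = """
out = []
for x in data:
    if x % 3:
        out.append(x)
"""
print("comprehension:", timeit.timeit(comp, setup=setup, number=10000))
print("for loop:     ", timeit.timeit(loop, setup=setup, number=10000))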
If you need O(1) removal, you can use a hash-based container (a dict or set) instead of a list.
Since list.remove is equivalent to del list[list.index(x)], you could do:
for idx, item in enumerate(somelist):
    if determine(item):
        del somelist[idx]
But: you should not modify the list while iterating over it. It will bite you, sooner or later. Use filter or list comprehension first, and optimise later.
A deque is optimized for head and tail removal, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you're iterating through the entire length, then the only difference between filtering a deque and filtering a list (using filter or a comprehension) is the overhead of copying, which at worst is a constant multiple; it's still an O(n) operation. Also, note that the objects in the list aren't being copied -- just the references to them. So it's not that much overhead.
It's possible that you could avoid copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension -- it's probably not:
write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
        write_i += 1
del L[write_i:]
I took a stab at this. My solution is slower, but requires less memory overhead (i.e. doesn't create a new array). It might even be faster in some circumstances!
This code has been edited since its first posting
I had problems with timeit, I might be doing this wrong.
import timeit

setup = """
import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1] > .45) and (x[1] < .5)

# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
    c[:] = [x for x in b if tokeep(x)]
listcomp()

b = setup_b[:]
def filt():
    c = filter(tokeep, b)
filt()

b = setup_b[:]
def forfilt():
    marked = (i for i, x in enumerate(b) if tokeep(x))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfilt()

b = setup_b[:]
def forfiltCheating():
    marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfiltCheating()
"""

listcomp = """
b = setup_b[:]
listcomp()
"""

filt = """
b = setup_b[:]
filt()
"""

forfilt = """
b = setup_b[:]
forfilt()
"""

forfiltCheating = '''
b = setup_b[:]
forfiltCheating()
'''

psycosetup = '''
import psyco
psyco.full()
'''

print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)

print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)
And here are the results
list comp = 6.56407690048
filtering = 5.64738512039
forfilter = 7.31555104256
forfiltCheating = 4.8994679451
now with psyco
list comp = 8.0485959053
filtering = 7.79016900063
forfilter = 9.00477004051
forfiltCheating = 4.90830993652
I must be doing something wrong with psyco, because it is actually running slower.
Elements are not copied by a list comprehension.
This took me a while to figure out. See the example code below to experiment yourself with different approaches.
code
You can specify how long a list element takes to copy and how long it takes to evaluate. The time to copy is irrelevant for list comprehension, as it turned out.
import time
import timeit
import numpy as np

def ObjectFactory(time_eval, time_copy):
    """
    Creates a class

    Parameters
    ----------
    time_eval : float
        time to evaluate (True or False, i.e. keep in list or not) an object
    time_copy : float
        time to (shallow-) copy an object. Used by list comprehension.

    Returns
    -------
    New class with defined copy-evaluate performance
    """
    class Object:
        def __init__(self, id_, keep):
            self.id_ = id_
            self._keep = keep

        def __repr__(self):
            return f"Object({self.id_}, {self.keep})"

        @property
        def keep(self):
            time.sleep(time_eval)
            return self._keep

        def __copy__(self):  # list comprehension does not copy the object
            time.sleep(time_copy)
            return self.__class__(self.id_, self._keep)

    return Object

def remove_items_from_list_list_comprehension(lst):
    return [el for el in lst if el.keep]

def remove_items_from_list_new_list(lst):
    new_list = []
    for el in lst:
        if el.keep:
            new_list += [el]
    return new_list

def remove_items_from_list_new_list_by_ind(lst):
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    return [lst[ee] for ee in new_list_inds]

def remove_items_from_list_del_elements(lst):
    """WARNING: Modifies lst"""
    # Collect the indices of items to drop, then delete from the back
    # so the remaining indices stay valid.
    del_inds = []
    for ee in range(len(lst)):
        if not lst[ee].keep:
            del_inds += [ee]
    for ind in del_inds[::-1]:
        del lst[ind]

if __name__ == "__main__":
    ClassSlowCopy = ObjectFactory(time_eval=0, time_copy=0.1)
    ClassSlowEval = ObjectFactory(time_eval=1e-8, time_copy=0)
    keep_ratio = .8
    n_runs_timeit = int(1e2)
    n_elements_list = int(1e2)
    lsts_to_tests = dict(
        list_slow_copy_remove_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_copy_keep_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_remove_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_keep_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
    )

    for lbl, lst in lsts_to_tests.items():
        print()
        for fct in [
            remove_items_from_list_list_comprehension,
            remove_items_from_list_new_list,
            remove_items_from_list_new_list_by_ind,
            remove_items_from_list_del_elements,
        ]:
            lst_loc = lst.copy()
            t = timeit.timeit(lambda: fct(lst_loc), number=n_runs_timeit)
            print(f"{fct.__name__}, {lbl}: {t=}")
output
remove_items_from_list_list_comprehension, list_slow_copy_remove_many: t=0.0064229519994114526
remove_items_from_list_new_list, list_slow_copy_remove_many: t=0.006507338999654166
remove_items_from_list_new_list_by_ind, list_slow_copy_remove_many: t=0.006562008995388169
remove_items_from_list_del_elements, list_slow_copy_remove_many: t=0.0076057760015828535
remove_items_from_list_list_comprehension, list_slow_copy_keep_many: t=0.006243691001145635
remove_items_from_list_new_list, list_slow_copy_keep_many: t=0.007145451003452763
remove_items_from_list_new_list_by_ind, list_slow_copy_keep_many: t=0.007032064997474663
remove_items_from_list_del_elements, list_slow_copy_keep_many: t=0.007690364996960852
remove_items_from_list_list_comprehension, list_slow_eval_remove_many: t=1.2495998149970546
remove_items_from_list_new_list, list_slow_eval_remove_many: t=1.1657221479981672
remove_items_from_list_new_list_by_ind, list_slow_eval_remove_many: t=1.2621939050004585
remove_items_from_list_del_elements, list_slow_eval_remove_many: t=1.4632593330024974
remove_items_from_list_list_comprehension, list_slow_eval_keep_many: t=1.1344162709938246
remove_items_from_list_new_list, list_slow_eval_keep_many: t=1.1323430630000075
remove_items_from_list_new_list_by_ind, list_slow_eval_keep_many: t=1.1354237199993804
remove_items_from_list_del_elements, list_slow_eval_keep_many: t=1.3084568729973398
import collections

list1 = collections.deque(list1)
for i in list2:
    try:
        list1.remove(i)
    except ValueError:
        pass

Instead of checking whether the element is there, this uses try/except. I guess this is faster.
What would be an efficient and pythonic way to check list monotonicity? i.e. that it has monotonically increasing or decreasing values?
Examples:
[0, 1, 2, 3, 3, 4] # This is a monotonically increasing list
[4.3, 4.2, 4.2, -2] # This is a monotonically decreasing list
[2, 3, 1] # This is neither
It's better to avoid ambiguous terms like "increasing" or "decreasing", since it's not clear whether equality is acceptable. Always use, for example, "non-increasing" (equality is clearly accepted) or "strictly decreasing" (equality is clearly NOT accepted).
def strictly_increasing(L):
    return all(x < y for x, y in zip(L, L[1:]))

def strictly_decreasing(L):
    return all(x > y for x, y in zip(L, L[1:]))

def non_increasing(L):
    return all(x >= y for x, y in zip(L, L[1:]))

def non_decreasing(L):
    return all(x <= y for x, y in zip(L, L[1:]))

def monotonic(L):
    return non_increasing(L) or non_decreasing(L)
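For example, with the lists from the question:

>>> strictly_increasing([0, 1, 2, 3, 3, 4])
False
>>> non_decreasing([0, 1, 2, 3, 3, 4])
True
>>> non_increasing([4.3, 4.2, 4.2, -2])
True
>>> monotonic([2, 3, 1])
False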
If you have large lists of numbers, it might be best to use numpy, and if you already are:
import numpy as np

def monotonic(x):
    dx = np.diff(x)
    return np.all(dx <= 0) or np.all(dx >= 0)
should do the trick.
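Checked against the question's examples (the result is a NumPy boolean, truthy/falsy as expected):

monotonic([0, 1, 2, 3, 3, 4])   # -> True
monotonic([4.3, 4.2, 4.2, -2])  # -> True
monotonic([2, 3, 1])            # -> False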
import itertools
import operator

def monotone_increasing(lst):
    pairs = zip(lst, lst[1:])
    return all(itertools.starmap(operator.le, pairs))

def monotone_decreasing(lst):
    pairs = zip(lst, lst[1:])
    return all(itertools.starmap(operator.ge, pairs))

def monotone(lst):
    return monotone_increasing(lst) or monotone_decreasing(lst)
This approach is O(N) in the length of the list.
@6502 has elegant code for sequences (iterables with __getitem__ and __len__ methods) and @chqrlie has an even better version that does not create temporary copies of sequences with slicing. I just want to add a general version that works for all iterables (objects with an __iter__ method):
def pairwise(iterable):
    items = iter(iterable)
    try:
        last = next(items)
    except StopIteration:  # empty iterable: no pairs to yield
        return
    for item in items:
        yield last, item
        last = item

def strictly_increasing(iterable):
    return all(x < y for x, y in pairwise(iterable))

def strictly_decreasing(iterable):
    return all(x > y for x, y in pairwise(iterable))

def non_increasing(iterable):
    return all(x >= y for x, y in pairwise(iterable))

def non_decreasing(iterable):
    return all(x <= y for x, y in pairwise(iterable))

def monotonic(iterable):
    # Note: this makes two passes, so pass a re-iterable (e.g. a list),
    # not a one-shot iterator.
    return non_increasing(iterable) or non_decreasing(iterable)
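Because pairwise only needs a single pass, the individual checks also work on one-shot iterators and generators, e.g.:

>>> non_decreasing(iter([1, 2, 2, 3]))
True
>>> strictly_increasing(x * x for x in range(5))
True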
The pandas package makes this convenient.
import pandas as pd
The following commands work with a list of integers or floats.
Monotonically increasing (≥):
pd.Series(mylist).is_monotonic_increasing
Strictly monotonically increasing (>):
myseries = pd.Series(mylist)
myseries.is_unique and myseries.is_monotonic_increasing
Alternative using an undocumented private method:
pd.Index(mylist)._is_strictly_monotonic_increasing
Monotonically decreasing (≤):
pd.Series(mylist).is_monotonic_decreasing
Strictly monotonically decreasing (<):
myseries = pd.Series(mylist)
myseries.is_unique and myseries.is_monotonic_decreasing
Alternative using an undocumented private method:
pd.Index(mylist)._is_strictly_monotonic_decreasing
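For example, with the sample lists from the question:

>>> import pandas as pd
>>> pd.Series([0, 1, 2, 3, 3, 4]).is_monotonic_increasing
True
>>> pd.Series([4.3, 4.2, 4.2, -2]).is_monotonic_decreasing
True
>>> pd.Series([2, 3, 1]).is_monotonic_increasing
False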
@6502 has elegant Python code for this. Here is an alternative solution with simpler iterators and no potentially expensive temporary slices:
def strictly_increasing(L):
    return all(L[i] < L[i+1] for i in range(len(L)-1))

def strictly_decreasing(L):
    return all(L[i] > L[i+1] for i in range(len(L)-1))

def non_increasing(L):
    return all(L[i] >= L[i+1] for i in range(len(L)-1))

def non_decreasing(L):
    return all(L[i] <= L[i+1] for i in range(len(L)-1))

def monotonic(L):
    return non_increasing(L) or non_decreasing(L)
import operator, itertools

def is_monotone(lst):
    op = operator.le            # pick 'op' based upon trend between
    if not op(lst[0], lst[-1]): # first and last element in the 'lst'
        op = operator.ge
    return all(op(x, y) for x, y in itertools.izip(lst, lst[1:]))
Here is a functional solution using reduce of complexity O(n):
is_increasing = lambda L: reduce(lambda a, b: b if a < b else 9999, L) != 9999
is_decreasing = lambda L: reduce(lambda a, b: b if a > b else -9999, L) != -9999
Replace 9999 with the top limit of your values, and -9999 with the bottom limit. For example, if you are testing a list of digits, you can use 10 and -1.
I tested its performance against @6502's answer and it's faster.
Case True: [1,2,3,4,5,6,7,8,9]
# my solution ..
$ python -m timeit "inc = lambda L: reduce(lambda a,b: b if a < b else 9999 , L)!=9999; inc([1,2,3,4,5,6,7,8,9])"
1000000 loops, best of 3: 1.9 usec per loop
# while the other solution:
$ python -m timeit "inc = lambda L: all(x<y for x, y in zip(L, L[1:]));inc([1,2,3,4,5,6,7,8,9])"
100000 loops, best of 3: 2.77 usec per loop
Case False from the 2nd element: [4,2,3,4,5,6,7,8,7]:
# my solution ..
$ python -m timeit "inc = lambda L: reduce(lambda a,b: b if a < b else 9999 , L)!=9999; inc([4,2,3,4,5,6,7,8,7])"
1000000 loops, best of 3: 1.87 usec per loop
# while the other solution:
$ python -m timeit "inc = lambda L: all(x<y for x, y in zip(L, L[1:]));inc([4,2,3,4,5,6,7,8,7])"
100000 loops, best of 3: 2.15 usec per loop
I timed all of the answers in this question under different conditions, and found that:
Sorting was the fastest by a long shot IF the list was already monotonically increasing
Sorting was the slowest by a long shot IF the list was shuffled/random or if the number of elements out of order was greater than ~1. The more out of order the list of course corresponds to a slower result.
Michael J. Barber's method was the fastest IF the list was mostly monotonically increasing, or completely random.
Here is the code to try it out:
import timeit

setup = '''
import random
from itertools import izip, starmap, islice
import operator

def is_increasing_normal(lst):
    for i in range(0, len(lst) - 1):
        if lst[i] >= lst[i + 1]:
            return False
    return True

def is_increasing_zip(lst):
    return all(x < y for x, y in izip(lst, islice(lst, 1, None)))

def is_increasing_sorted(lst):
    return lst == sorted(lst)

def is_increasing_starmap(lst):
    pairs = izip(lst, islice(lst, 1, None))
    return all(starmap(operator.le, pairs))

if {list_method} in (1, 2):
    lst = list(range({n}))
if {list_method} == 2:
    for _ in range(int({n} * 0.0001)):
        lst.insert(random.randrange(0, len(lst)), -random.randrange(1, 100))
if {list_method} == 3:
    lst = [int(1000*random.random()) for i in xrange({n})]
'''

n = 100000
iterations = 10000
list_method = 1

timeit.timeit('is_increasing_normal(lst)', setup=setup.format(n=n, list_method=list_method), number=iterations)
timeit.timeit('is_increasing_zip(lst)', setup=setup.format(n=n, list_method=list_method), number=iterations)
timeit.timeit('is_increasing_sorted(lst)', setup=setup.format(n=n, list_method=list_method), number=iterations)
timeit.timeit('is_increasing_starmap(lst)', setup=setup.format(n=n, list_method=list_method), number=iterations)
If the list was already monotonically increasing (list_method == 1), the fastest to slowest was:
sorted
starmap
normal
zip
If the list was mostly monotonically increasing (list_method == 2), the fastest to slowest was:
starmap
zip
normal
sorted
(Whether or not the starmap or zip was fastest depended on the execution and I couldn't identify a pattern. Starmap appeared to be usually faster)
If the list was completely random (list_method == 3), the fastest to slowest was:
starmap
zip
normal
sorted (extremely bad)
Here's a variant that accepts both materialized and non-materialized sequences. It automatically determines whether or not it's monotonic, and if so, its direction (i.e. increasing or decreasing) and strictness. Inline comments are provided to help the reader. Similarly for test-cases provided at the end.
def isMonotonic(seq):
    """
    seq.............: - A Python sequence, materialized or not.
    Returns.........:
        (True,0,True):    - Mono Const, Strict: Seq empty or 1-item.
        (True,0,False):   - Mono Const, Not-Strict: All 2+ Seq items same.
        (True,+1,True):   - Mono Incr, Strict.
        (True,+1,False):  - Mono Incr, Not-Strict.
        (True,-1,True):   - Mono Decr, Strict.
        (True,-1,False):  - Mono Decr, Not-Strict.
        (False,None,None) - Not Monotonic.
    """
    items = iter(seq)               # Ensure iterator (i.e. that next(...) works).
    prev_value = next(items, None)  # Fetch 1st item, or None if empty.
    if prev_value is None: return (True,0,True) # seq was empty.

    # ============================================================
    # The next for/loop scans until it finds the first value-change.
    # ============================================================
    # Ex: [3,3,3,78,...] --or-- [-5,-5,-5,-102,...]
    # ============================================================
    # -- If that 'change-value' represents an Increase or Decrease,
    #    then we know to look for Monotonically Increasing or
    #    Decreasing, respectively.
    # -- If no value-change is found end-to-end (e.g. [3,3,3,...3]),
    #    then it's Monotonically Constant, Non-Strict.
    # -- Finally, if the sequence was exhausted above, which means
    #    it had exactly one element, then it's Monotonically Constant,
    #    Strict.
    # ============================================================
    isSequenceExhausted = True
    curr_value = prev_value
    strict = True
    for item in items:
        isSequenceExhausted = False # Tiny inefficiency.
        if item == prev_value:
            strict = False          # Equal neighbors seen; can't be strict.
            continue
        curr_value = item
        break
    else:
        return (True,0,True) if isSequenceExhausted else (True,0,False)

    # ============================================================
    # If we trickled down to here, then none of the above
    # checked-cases applied (i.e. didn't short-circuit and
    # 'return'); so we continue with the final step of
    # iterating through the remaining sequence items to
    # determine Monotonicity, direction and strictness.
    # ============================================================
    if curr_value > prev_value: # Scan for Increasing Monotonicity.
        for item in items:
            if item < curr_value: return (False,None,None)
            if item == curr_value: strict = False # Tiny inefficiency.
            curr_value = item
        return (True,+1,strict)
    else:                       # Scan for Decreasing Monotonicity.
        for item in items:
            if item > curr_value: return (False,None,None)
            if item == curr_value: strict = False # Tiny inefficiency.
            curr_value = item
        return (True,-1,strict)

# Test cases ...
assert isMonotonic([1,2,3,4])     == (True,+1,True)
assert isMonotonic([4,3,2,1])     == (True,-1,True)
assert isMonotonic([-1,-2,-3,-4]) == (True,-1,True)
assert isMonotonic([])            == (True,0,True)
assert isMonotonic([20])          == (True,0,True)
assert isMonotonic([-20])         == (True,0,True)
assert isMonotonic([1,1])         == (True,0,False)
assert isMonotonic([1,-1])        == (True,-1,True)
assert isMonotonic([1,-1,-1])     == (True,-1,False)
assert isMonotonic([1,3,3])       == (True,+1,False)
assert isMonotonic([1,2,1])       == (False,None,None)
assert isMonotonic([0,0,0,0])     == (True,0,False)
I suppose this could be more Pythonic, but it's tricky because it avoids creating intermediate collections (e.g. lists, genexps, etc.) and employs a fall-through, short-circuit approach to filter the various cases: edge sequences (empty or one-item sequences, or sequences with all identical items), then increasing or decreasing monotonicity, strictness, and so on. I hope it helps.
L = [1, 2, 3]
L == sorted(L)               # non-strict increasing
L == sorted(L, reverse=True) # non-strict decreasing
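Wrapped into a helper (note this is O(n log n) because of the sorts, versus O(n) for the pairwise checks above):

def monotonic(L):
    # Non-strict check in both directions.
    return L == sorted(L) or L == sorted(L, reverse=True)

monotonic([0, 1, 2, 3, 3, 4])  # True
monotonic([2, 3, 1])           # False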
Here is an implementation that is both efficient (the space required is constant, no slicing performing a temporary shallow copy of the input) and general (any iterables are supported as input, not just sequences):
from itertools import tee

def is_weakly_increasing(iterable):
    a, b = tee(iterable)  # tee() so one-shot iterators work too
    next(b, None)
    return all(x <= y for x, y in zip(a, b))

def is_weakly_decreasing(iterable):
    a, b = tee(iterable)
    next(b, None)
    return all(x >= y for x, y in zip(a, b))

def is_weakly_monotonic(iterable):
    # Note: two passes, so pass a re-iterable (e.g. a list),
    # not a one-shot iterator.
    return is_weakly_increasing(iterable) or is_weakly_decreasing(iterable)

def is_strictly_increasing(iterable):
    a, b = tee(iterable)
    next(b, None)
    return all(x < y for x, y in zip(a, b))

def is_strictly_decreasing(iterable):
    a, b = tee(iterable)
    next(b, None)
    return all(x > y for x, y in zip(a, b))

def is_strictly_monotonic(iterable):
    return is_strictly_increasing(iterable) or is_strictly_decreasing(iterable)
import numpy as np

def IsMonotonic(data):
    '''Returns true if data is monotonic.'''
    data = np.array(data)
    # Greater-Equal
    if data[-1] > data[0]:
        return np.all(data[1:] >= data[:-1])
    # Less-Equal
    else:
        return np.all(data[1:] <= data[:-1])
My proposition (with numpy), as a summary of a few ideas here. It uses:
- casting to np.array to create boolean values for each pairwise comparison,
- np.all to check whether all results are True,
- the difference between the first and last element to choose the comparison operator,
- direct comparison with >=, <= instead of calculating np.diff.
Here are two ways of determining if a list is monotonically increasing or decreasing: an explicit loop over range, and a one-liner with all() over a generator expression. Both short-circuit as soon as a violation is found. Enjoy.
a = [1, 2, 3, 4, 5]
b = [0, 1, 6, 1, 0]
c = [9, 8, 7, 6, 5]

def monotonic_increase(x):
    if len(x) <= 1: return False
    for i in range(1, len(x)):
        if x[i-1] >= x[i]:
            return False
    return True

def monotonic_decrease(x):
    if len(x) <= 1: return False
    for i in range(1, len(x)):
        if x[i-1] <= x[i]:
            return False
    return True

monotonic_increase = lambda x: len(x) > 1 and all(x[i-1] < x[i] for i in range(1, len(x)))
monotonic_decrease = lambda x: len(x) > 1 and all(x[i-1] > x[i] for i in range(1, len(x)))

print(monotonic_increase(a))   # True
print(monotonic_decrease(c))   # True
print(monotonic_decrease([]))  # False
print(monotonic_increase(c))   # False
print(monotonic_decrease(a))   # False
print(monotonic_increase(b))   # False
print(monotonic_decrease(b))   # False
def solution1(a):
    up, down = True, True
    for i in range(1, len(a)):
        if a[i] < a[i-1]: up = False
        if a[i] > a[i-1]: down = False
    return up or down

def solution2(a):
    return a == sorted(a) or a == sorted(a, reverse=True)

Solution1 is O(n) and the shorter solution2 is O(n log n).
summary: Yes, setting a single variable from another variable is thread safe. This provides a high-speed way to transfer a single value between tasks.
This link has a clear explanation: Grok the GIL: How to write fast and thread-safe Python
My simplified explanation:
All computers, way down at the machine code, provide some (not all) instructions that are non-interruptible. These are referred to as "atomic".
For example, storing a 16-bit constant into a 16-bit memory location. Way back, the 8-bit computers were a mix of features and implementations. For example, the 8-bit processors 6502 and 6800 both had non-interruptible (atomic) instructions to move 16 bits from one memory location to another. The Z80 did not.
The PDP-11 machine code had atomic increment/decrement instructions (var++; var--), which is how K&R came to add those to the early C compilers. At least from what I remember.
Why does this matter?
Today a lot of computer programming languages are written to a custom-designed "virtual machine". These virtual machines, like all computer processors, support a mix of features and implementations.
Python's virtual machine is written with its custom GIL ("global interpreter lock").
What this means is that some (but not all) of Python's operations are "thread-safe", in that some (but not all) execute in a single "GIL" cycle.
There are some surprises. For example, "sort" is atomic and therefore thread-safe. Since this is a relatively long operation, that's a surprise (read the link for a much better explanation).
Python has a function that will display the "virtual machine codes" a function is "compiled" into:
## ref: https://opensource.com/article/17/4/grok-gil
import dis

vara = 33
varb = 55

def foo():
    global vara
    vara = 33
    varb = 55
    varb = vara
    return

dis.dis(foo)
which gives the output:
415 0 LOAD_CONST 1 (33)
2 STORE_GLOBAL 0 (vara)
416 4 LOAD_CONST 2 (55)
6 STORE_FAST 0 (varb)
417 8 LOAD_GLOBAL 0 (vara)
10 STORE_FAST 0 (varb)
418 12 LOAD_CONST 0 (None)
14 RETURN_VALUE
Care must be taken when using this (well, all multithreaded programs should be carefully designed and coded), since ONLY A SINGLE value is safe! The good news is that this SINGLE VALUE can point to an entire data structure. Of course, the entire data structure isn't atomic, so any changes to it should be wrapped in multithread "stalls" (another name for semaphores, locks, etc., since they basically "stall" the code until the semaphores/locks are released by other threads). I read somewhere from one of the co-developers of multithreading that, in hindsight, he thought "stalls" was a better, more descriptive term. Perhaps Dijkstra?
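To make that last point concrete, here is a minimal sketch (the names shared, replace_structure, and mutate_structure are illustrative): replacing the whole structure is a single reference assignment, which compiles to one store bytecode as in the dis output above, while a multi-step in-place change needs a lock.

import threading

lock = threading.Lock()
shared = {"a": 1}

def replace_structure(new_data):
    global shared
    shared = new_data    # single reference assignment: one bytecode

def mutate_structure(key, value):
    with lock:           # multi-step change: guard it with a lock
        shared[key] = value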
>>> seq = [0, 1, 2, 3, 3, 4]
>>> seq == sorted(seq) or seq == sorted(seq, reverse=True)
True
lst = [1,2,3,4,1]
I want to know that 1 occurs twice in this list. Is there an efficient way to do this?
lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function for this is list.count(x); it returns the number of times x occurs in a list.
Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
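For example, a short-circuiting version might look like this (occurs_at_least is an illustrative helper, not a builtin):

def occurs_at_least(lst, x, n):
    count = 0
    for item in lst:
        if item == x:
            count += 1
            if count >= n:   # stop scanning as soon as we know the answer
                return True
    return False

occurs_at_least([1, 2, 3, 4, 1], 1, 2)  # True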
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?
For multiple occurrences, this gives you the index of each occurrence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
... if(tgt==suspect):
... found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
... count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}
def valCount(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res

u = [x for x, y in valCount(lst).iteritems() if y > 1]
u is now a list of all values which appear more than once.
Edit:
@katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time

def test1(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res

def test2(lst):
    res = collections.defaultdict(lambda: 0)
    for v in lst:
        res[v] += 1
    return res

def test3(lst):
    return collections.Counter(lst)

def rndLst(lstLen):
    r = random.randint
    return [r(0, lstLen) for i in xrange(lstLen)]

def timeFn(fn, *args):
    st = time.clock()
    res = fn(*args)
    return time.clock() - st

def main():
    reps = 5000
    res = []
    tests = [test1, test2, test3]
    for t in xrange(reps):
        lstLen = random.randint(10, 50000)
        lst = rndLst(lstLen)
        res.append([lstLen] + [timeFn(fn, lst) for fn in tests])
    res.sort()
    return res
And the results, for random lists containing up to 50,000 items, are as follows:
[plot omitted; vertical axis is time in seconds, horizontal axis is number of items in the list]
Another way to get all items that occur more than once:
lst = [1, 2, 3, 4, 1]
d = {}
for x in lst:
    d[x] = x in d

print d[1]                   # True
print d[2]                   # False
print [x for x in d if d[x]] # [1]
You could also sort the list, which is O(n log n), then check the adjacent elements for equality, which is O(n); the total is O(n log n). This has the disadvantage of requiring the entire list be sorted before possibly bailing out when a duplicate is found.
For a large list with relatively rare duplicates, this could be about the best you can do. The best way to approach this really does depend on the size of the data involved and its nature.
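A sketch of that sort-then-compare-adjacent approach (has_duplicates is an illustrative name):

def has_duplicates(lst):
    s = sorted(lst)                               # O(n log n)
    return any(a == b for a, b in zip(s, s[1:]))  # O(n), short-circuits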
What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
    return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter

def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator

def most_common(L):
    # get an iterable of (item, iterable) pairs
    SL = sorted((x, i) for i, x in enumerate(L))
    # print 'SL:', SL
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get "quality" for an item
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        # print 'item %r, count %r, minind %r' % (item, count, min_index)
        return count, -min_index
    # pick the highest-count/earliest item
    return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
    groups = itertools.groupby(sorted(L))
    def _auxfun((item, iterable)):
        return len(list(iterable)), -L.index(item)
    return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
    return max(g(sorted(L)), key=lambda(x, v): (len(list(v)), -L.index(x)))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element", such as cases where the top two are tied, this will raise StatisticsError on Python <= 3.7; on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1))  # the one most common element... 2 would mean the 2 most common
[(9216, 2)]              # a list with one (element, count) tuple from 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
Here is an O(n) solution:
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
    mydict[item] = mydict.get(item, 0) + 1
    if mydict[item] >= cnt:
        cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common(lst):
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using the scipy.stats module and a lambda:

import scipy.stats
lst = [1, 2, 3, 4, 5, 6, 7, 5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))

Result:
5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
    # Make a list with tuples: (item, index)
    # The index will be used later to break ties for most common item.
    lst = [(x, i) for i, x in enumerate(iterable)]
    lst.sort()
    # lst_final will also be a list of tuples: (count, index, item)
    # Sorting on this list will find us the most common item, and the index
    # will break ties so the one listed first wins. Count is negative so
    # largest count will have lowest value and sort first.
    lst_final = []
    # Get an iterator for our new list...
    itr = iter(lst)
    # ...and pop the first tuple off. Setup current state vars for loop.
    count = 1
    tup = next(itr)
    x_cur, i_cur = tup
    # Loop over sorted list of tuples, counting occurrences of item.
    for tup in itr:
        # Same item again?
        if x_cur == tup[0]:
            # Yes, same item; increment count
            count += 1
        else:
            # No, new item, so write previous current item to lst_final...
            t = (-count, i_cur, x_cur)
            lst_final.append(t)
            # ...and reset current state vars for loop.
            x_cur, i_cur = tup
            count = 1
    # Write final item after loop ends
    t = (-count, i_cur, x_cur)
    lst_final.append(t)
    lst_final.sort()
    answer = lst_final[0][2]
    return answer

print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e'])  # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose'])   # prints 'goose'
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError

def most_common(l):
    try:
        return mode(l)
    except StatisticsError as e:
        # will only return the first element if no unique mode found
        if 'no unique mode' in e.args[0]:
            return l[0]
        # this is for "StatisticsError: no mode for empty data"
        # after calling mode([])
        raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
Simple one-line solution:

moc = max((lst.count(item), item) for item in set(lst))

It will return the most frequent element with its frequency, as a (count, element) tuple.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']

counter = {}
maxItemCount = 0
for item in itemList:
    try:
        # Referencing this will cause a KeyError exception
        # if it doesn't already exist
        counter[item]
        # ... meaning if we get this far it didn't happen so
        # we'll increment
        counter[item] += 1
    except KeyError:
        # If we got a KeyError we need to create the
        # dictionary key
        counter[item] = 1
    # Keep overwriting maxItemCount with the latest number,
    # if it's higher than the existing itemCount
    if counter[item] > maxItemCount:
        maxItemCount = counter[item]
        mostPopularItem = item

print mostPopularItem
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
# {4: 1, 2: 0}
max_key = max(all_ans.keys())
# 4
print(all_ans[max_key])
# 1
import numpy as np

# This will return the list sorted by frequency:
def orderByFrequency(list):
    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency

# And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
    if len(list) <= 1:
        return list
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    index = 0
    while index < len(list_ordered_by_frequency):
        index = index + frequency
        if index < len(list_ordered_by_frequency):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            if testValueFrequency == frequency:
                list_most_frequent.append(testValue)
            else:
                break
    return list_most_frequent

# tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1, 1]))
print(getMostFrequentValues([2, 1]))
print(getMostFrequentValues([2, 2, 1]))
print(getMostFrequentValues([1, 2, 1, 2]))
print(getMostFrequentValues([1, 2, 1, 2, 2]))
print(getMostFrequentValues([3, 2, 3, 5, 6, 3, 2, 2]))
print(getMostFrequentValues([1, 2, 2, 60, 50, 3, 3, 50, 3, 4, 50, 4, 4, 60, 60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count = l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
    if not items:
        raise ValueError
    fitems = []  # [item, count, order-of-first-appearance] triples
    best_idx = 0
    for item in items:
        item_missing = True
        i = 0
        for fitem in fitems:
            if fitem[0] == item:
                fitem[1] += 1
                d = fitem[1] - fitems[best_idx][1]
                if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
                    best_idx = i
                item_missing = False
                break
            i += 1
        if item_missing:
            fitems.append([item, 1, i])
    # Return the item itself, not an index into the input list.
    return fitems[best_idx][0]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
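For instance, if the unhashable items happen to be lists, one way to make them hashable is to convert them to tuples first and then use collections.Counter (a sketch, assuming the items are lists of hashable values):

from collections import Counter

data = [[1, 2], [3, 4], [1, 2]]
key, count = Counter(map(tuple, data)).most_common(1)[0]
print(list(key), count)  # [1, 2] 2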
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
...     st = set(li)
...     mx = -1
...     for each in st:
...         temp = li.count(each)
...         if mx < temp:
...             mx = temp
...             h = each
...     return h
...
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
    mpEl = None
    mpIndex = 0
    mpCount = 0
    curEl = None
    curCount = 0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount = curCount + 1 if el == curEl else 1
        curEl = el
        if curCount > mpCount \
           or (curCount == mpCount and i < mpIndex):
            mpEl = curEl
            mpIndex = i
            mpCount = curCount
    return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
    counter = 0
    num = List[0]
    for i in List:
        curr_frequency = List.count(i)
        if curr_frequency > counter:
            counter = curr_frequency
            num = i
    return num

List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi, this is a very simple solution. (Note that list.count is itself O(n), so calling it inside the loop makes this O(n²) overall, not linear.)
L = ['goose', 'duck', 'duck']

def most_common(L):
    current_winner = 0
    max_repeated = None
    for i in L:
        amount_times = L.count(i)
        if amount_times > current_winner:
            current_winner = amount_times
            max_repeated = i
    return max_repeated

print(most_common(L))
# 'duck'
Here, max_repeat_num is the element in the list that repeats most of the time:

numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count)  # which number occurs most frequently
max_repeat = numbers.count(max_repeat_num)        # how many times it occurs
print(f"the number {max_repeat_num} is repeated {max_repeat} times")
def mostCommonElement(list):
    count = {}     # dict holder
    max = 0        # keep track of the count by key
    result = None  # holder when count is greater than max
    for i in list:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
        if count[i] > max:
            max = count[i]
            result = i
    return result

mostCommonElement(["a", "b", "a", "c"])  # -> "a"
This finds a majority element: one which appears more than N/2 times in the array, where N is len(array). The technique below does it in O(n) time (note that the Counter it builds uses O(n) auxiliary space, not O(1)):
from collections import Counter

def majorityElement(arr):
    majority_elem = Counter(arr)
    size = len(arr)
    for key, val in majority_elem.items():
        if val > size / 2:
            return key
    return -1
def most_common(lst):
    if max([lst.count(i) for i in lst]) == 1:
        return False
    else:
        return max(set(lst), key=lst.count)
def popular(L):
    C = {}
    for a in L:
        C[a] = L.count(a)
    for b in C.keys():
        if C[b] == max(C.values()):
            return b

L = [2, 3, 5, 3, 6, 3, 6, 3, 6, 3, 7, 467, 4, 7, 4]
print popular(L)