I have a (long) array a containing a handful of different integers. I would now like to create a dictionary where the keys are the integers and the values are arrays of the indices in a where the respective integer occurs. This
import numpy
a = numpy.array([1, 1, 5, 5, 1])
u = numpy.unique(a)
d = {val: numpy.where(a==val)[0] for val in u}
print(d)
{1: array([0, 1, 4]), 5: array([2, 3])}
works fine, but it seems rather wasteful to first call unique, followed by a couple of wheres.
np.digitize doesn't seem to be ideal either as you have to specify the bins in advance.
Any ideas of how to improve the above?
Approach #1
One approach based on sorting would be -
import numpy as np

def group_into_dict(a):
    # Get argsort indices
    sidx = a.argsort()
    # Use argsort indices to sort input array
    sorted_a = a[sidx]
    # Get indices that define the grouping boundaries based on identical elems
    cut_idx = np.flatnonzero(np.r_[True, sorted_a[1:] != sorted_a[:-1], True])
    # Form the final dict with slices of the argsort indices as the values and
    # the group starts as the keys
    return {sorted_a[i]: sidx[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])}
Sample run -
In [55]: a
Out[55]: array([1, 1, 5, 5, 1])
In [56]: group_into_dict(a)
Out[56]: {1: array([0, 1, 4]), 5: array([2, 3])}
Timings on an array with 1000000 elements and a varying proportion of unique numbers, comparing the proposed approach against the original one -
# 1/100 unique numbers
In [75]: a = np.random.randint(0,10000,(1000000))
In [76]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
1 loop, best of 3: 6.62 s per loop
In [77]: %timeit group_into_dict(a)
10 loops, best of 3: 121 ms per loop
# 1/1000 unique numbers
In [78]: a = np.random.randint(0,1000,(1000000))
In [79]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
1 loop, best of 3: 720 ms per loop
In [80]: %timeit group_into_dict(a)
10 loops, best of 3: 92.1 ms per loop
# 1/10000 unique numbers
In [81]: a = np.random.randint(0,100,(1000000))
In [82]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 120 ms per loop
In [83]: %timeit group_into_dict(a)
10 loops, best of 3: 75 ms per loop
# 1/50000 unique numbers
In [84]: a = np.random.randint(0,20,(1000000))
In [85]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 60.8 ms per loop
In [86]: %timeit group_into_dict(a)
10 loops, best of 3: 60.3 ms per loop
So, if you are dealing with just 20 or fewer unique numbers, read on; otherwise the sorting-based one seems to work well.
Approach #2
A pandas-based one, suited for very few unique numbers -
In [142]: a
Out[142]: array([1, 1, 5, 5, 1])
In [143]: import pandas as pd
In [144]: {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
Out[144]: {1: array([0, 1, 4]), 5: array([2, 3])}
Timings on an array with 1000000 elements and 20 unique elements -
In [146]: a = np.random.randint(0,20,(1000000))
In [147]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
10 loops, best of 3: 35.6 ms per loop
# Original solution
In [148]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 58 ms per loop
and for fewer unique elements -
In [149]: a = np.random.randint(0,10,(1000000))
In [150]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
10 loops, best of 3: 25.3 ms per loop
In [151]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 44.9 ms per loop
In [152]: a = np.random.randint(0,5,(1000000))
In [153]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
100 loops, best of 3: 17.9 ms per loop
In [154]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 34.4 ms per loop
How does pandas help here for fewer elements?
With sorting based approach #1, for the case of 20 unique elements, getting the argsort indices was the bottleneck -
In [164]: a = np.random.randint(0,20,(1000000))
In [165]: %timeit a.argsort()
10 loops, best of 3: 51 ms per loop
Now, the pandas-based function gives us the unique elements, be they negative numbers or anything else, and we simply compare these against the elements in the input array without needing to sort. Let's see the improvement on that side:
In [166]: %timeit pd.Series(a).unique()
100 loops, best of 3: 3.17 ms per loop
Of course, it then still needs the np.flatnonzero calls, but overall it remains comparatively more efficient.
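For convenience, the pandas-assisted version can be wrapped into a function; this is just a sketch consolidating the snippet above (the name group_into_dict_pd is only for illustration):
import numpy as np
import pandas as pd

def group_into_dict_pd(a):
    # pd.Series.unique returns the distinct values without sorting the whole
    # array, which avoids the argsort bottleneck when there are few uniques
    return {u: np.flatnonzero(a == u) for u in pd.Series(a).unique()}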
If it's really just a handful of distincts it may be worthwhile using argpartition instead of argsort. The downside is that this requires a compression step:
import numpy as np
def f_pp_1(a):
    # scatter positions into a lookup table to get one representative
    # index per distinct value
    ws = np.empty(a.max()+1, int)
    rng = np.arange(a.size)
    ws[a] = rng
    unq = rng[ws[a] == rng]
    idx = np.argsort(a[unq])
    unq = unq[idx]
    # compression: relabel the values as 0 .. n_unique-1
    ws[a[unq]] = np.arange(len(unq))
    compressed = ws[a]
    counts = np.cumsum(np.bincount(compressed))
    return dict(zip(a[unq], np.split(np.argpartition(a, counts[:-1]), counts[:-1])))
If the unique values are small we can skip the compression step:
def f_pp_2(a):
    bc = np.bincount(a)
    keys, = np.where(bc)
    counts = np.cumsum(bc[keys])
    return dict(zip(keys, np.split(np.argpartition(a, counts[:-1]), counts[:-1])))
data = np.random.randint(0, 10, (5,))[np.random.randint(0, 5, (10000000,))]
sol = f_pp_1(data)
for k, v in sol.items():
    assert np.all(k == data[v])
For small numbers of distincts, where is competitive if we can avoid unique:
def f_OP_plus(a):
    ws = np.empty(a.max()+1, int)
    rng = np.arange(a.size)
    ws[a] = rng
    unq = rng[ws[a] == rng]
    idx = np.argsort(a[unq])
    unq = unq[idx]
    return {val: np.where(a==val)[0] for val in unq}
Here are my timings (best of 3, 10 per block) using the same test arrays as @Divakar (randint(0, nd, (ns,)) -- nd, ns = number of distincts, number of samples):
nd, ns: 5, 1000000
OP 39.88609421 ms
OP_plus 13.04150990 ms
Divakar_1 44.14700069 ms
Divakar_2 21.64940450 ms
pp_1 33.15216140 ms
pp_2 22.43267260 ms
nd, ns: 10, 1000000
OP 52.33878891 ms
OP_plus 17.14743648 ms
Divakar_1 57.76002519 ms
Divakar_2 30.70066951 ms
pp_1 45.33982391 ms
pp_2 34.71166079 ms
nd, ns: 20, 1000000
OP 67.47841339 ms
OP_plus 26.41335099 ms
Divakar_1 71.37646740 ms
Divakar_2 43.09316459 ms
pp_1 57.16468811 ms
pp_2 45.55416510 ms
nd, ns: 50, 1000000
OP 98.91191521 ms
OP_plus 51.15756912 ms
Divakar_1 72.72288438 ms
Divakar_2 70.31920571 ms
pp_1 63.78925461 ms
pp_2 53.00321991 ms
nd, ns: 100, 1000000
OP 148.17743159 ms
OP_plus 92.62091429 ms
Divakar_1 85.02774101 ms
Divakar_2 116.78823209 ms
pp_1 77.01576019 ms
pp_2 66.70976470 ms
And if we don't use the first nd integers for uniques but instead draw them randomly from between 0 and 10000:
nd, ns: 5, 1000000
OP 40.11689581 ms
OP_plus 12.99256920 ms
Divakar_1 42.13181480 ms
Divakar_2 21.55767360 ms
pp_1 33.21835019 ms
pp_2 23.46851982 ms
nd, ns: 10, 1000000
OP 52.84317869 ms
OP_plus 17.96655210 ms
Divakar_1 57.74175161 ms
Divakar_2 32.31985010 ms
pp_1 44.79893579 ms
pp_2 33.42640731 ms
nd, ns: 20, 1000000
OP 66.46886449 ms
OP_plus 25.78120639 ms
Divakar_1 66.58960858 ms
Divakar_2 42.47685110 ms
pp_1 53.67698781 ms
pp_2 44.53037870 ms
nd, ns: 50, 1000000
OP 98.95576960 ms
OP_plus 50.79147881 ms
Divakar_1 72.44545210 ms
Divakar_2 70.91441818 ms
pp_1 64.19071071 ms
pp_2 53.36350428 ms
nd, ns: 100, 1000000
OP 145.62422500 ms
OP_plus 90.82918381 ms
Divakar_1 76.92769479 ms
Divakar_2 115.24481240 ms
pp_1 70.85122908 ms
pp_2 58.85340699 ms
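For completeness, a rough sketch of a driver that could reproduce timings of this kind (the exact harness isn't shown in the post, so the bench name and the repeat/number settings are assumptions):
import timeit
import numpy as np

def bench(funcs, nd, ns, key_pool=None, repeat=3, number=10):
    # distinct values: either 0 .. nd-1, or nd values drawn from a larger pool
    keys = np.arange(nd) if key_pool is None else np.random.randint(0, key_pool, nd)
    a = keys[np.random.randint(0, nd, ns)]
    print('nd, ns:', nd, ns)
    for name, f in funcs.items():
        t = min(timeit.repeat(lambda: f(a), repeat=repeat, number=number)) / number
        print('%-10s %12.8f ms' % (name, t * 1000))

# e.g. bench({'OP_plus': f_OP_plus, 'pp_1': f_pp_1, 'pp_2': f_pp_2}, nd=20, ns=1000000)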
With ns, nd = number of samples, number of distincts, your solution is O(ns*nd) and thus inefficient.
A simple O(ns) approach with defaultdict :
from collections import defaultdict
d = defaultdict(list)
for i, x in enumerate(a):
    d[x].append(i)
Unfortunately it is slow because Python loops are slow, but it is still faster than yours if nd/ns > 1%.
Another O(ns) linear solution is possible if nd/ns << 1 (optimized here with numba):
import numpy as np
import numba

@numba.njit
def filldict_(a):
    v = a.max()+1
    # first pass: count occurrences of each value
    cnts = np.zeros(v, np.int64)
    for x in a:
        cnts[x] += 1
    g = cnts.max()
    # second pass: fill one row of indices per value
    res = np.empty((v, g), np.int64)
    cnts[:] = 0
    i = 0
    for x in a:
        res[x, cnts[x]] = i
        cnts[x] += 1
        i += 1
    return res, cnts, v

def filldict(a):
    res, cnts, v = filldict_(a)
    return {x: res[x, :cnts[x]] for x in range(v) if cnts[x] > 0}
Faster on random numbers with few keys. Sample runs:
In [51]: a=numpy.random.randint(0,100,1000000)
In [52]: %timeit d=group_into_dict(a)  # @Divakar
134 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [53]: %timeit c=filldict(a)
11.2 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
A lookup table mechanism can be added if the keys are big integers, with little overhead.
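A minimal sketch of such a lookup table: remap the big keys to a compact range before calling filldict_ above, then translate back (np.unique sorts, so a hash-based remapping would be needed to stay strictly O(ns)):
def filldict_bigkeys(a):
    # compress arbitrary integer keys into the range 0 .. nkeys-1
    keys, compact = np.unique(a, return_inverse=True)
    res, cnts, v = filldict_(compact)
    return {keys[x]: res[x, :cnts[x]] for x in range(v) if cnts[x] > 0}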
pandas solution 1: Use groupby and its indices function
df = pd.DataFrame(a)
d = df.groupby(0).indices
a = np.random.randint(0,10000,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0).indices
42.6 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
a = np.random.randint(0,100,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0).indices
22.3 ms ± 5.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
pandas solution 2: Use groupby only (if you already know the keys or can get keys fast using other methods)
a = np.random.randint(0,100,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0)
206 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
groupby itself is really fast, but it won't give you the keys. If you already know the keys, you can then just use the groupby object as your dictionary. Usage:
d.get_group(key).index # index part is what you need!
Disadvantage: d.get_group(key) itself will cost non-trivial time.
%timeit d.get_group(10).index
496 µs ± 56.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
so it depends on your application and whether you know the keys to decide whether to go with this approach.
If all your values are non-negative integers, you may use np.nonzero(np.bincount(a))[0] to get the keys at a reasonable speed. (1.57 ms ± 78.2 µs for a = np.random.randint(0,1000,(1000000)))
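Putting the two together, a sketch of how the bincount-derived keys could drive the groupby object (our combination of the pieces above; calling get_group for every key gives back most of the time saved, so this mainly pays off when only a few keys are needed):
import numpy as np
import pandas as pd

a = np.random.randint(0, 100, (1000000,))
grouped = pd.DataFrame(a).groupby(0)
keys = np.nonzero(np.bincount(a))[0]               # works for non-negative ints
d = {k: grouped.get_group(k).index for k in keys}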
Related
Say I have an array of distances x=[1,2,1,3,3,2,1,5,1,1].
I want to get the indices from x where cumsum reaches 10, in this case, idx=[4,9].
So the cumsum restarts after the condition is met.
I can do it with a loop, but loops are slow for large arrays and I was wondering if I could do it in a vectorized way.
A fun method
sumlm = np.frompyfunc(lambda a,b:a+b if a < 10 else b,2,1)
newx=sumlm.accumulate(x, dtype=np.object)
newx
array([1, 3, 4, 7, 10, 2, 3, 8, 9, 10], dtype=object)
np.nonzero(newx==10)
(array([4, 9]),)
Here's one with numba and array-initialization -
from numba import njit
import numpy as np

@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x), dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]
Timings
Including @piRSquared's solutions and using the benchmarking setup from the same post -
In [58]: np.random.seed([3, 1415])
...: x = np.random.randint(100, size=1000000).tolist()
# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop
# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop
# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop
Numba : Appending vs. array-initialization
For a closer look at how the array-initialization helps, which seems to be the big difference between the two numba implementations, let's time these on the array data, as the array data creation was in itself heavy on runtime and they both depend on it -
In [62]: x = np.array(x)
In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop
In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop
To force the output to have its own memory space, we can make a copy. It won't change things in a big way though -
In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop
Loops are not always bad (especially when you need one). Also, there is no tool or algorithm that will make this quicker than O(n). So let's just make a good loop.
Generator Function
def cumsum_breach(x, target):
    total = 0
    for i, y in enumerate(x):
        total += y
        if total >= target:
            yield i
            total = 0
list(cumsum_breach(x, 10))
[4, 9]
Just In Time compiling with Numba
Numba is a third party library that needs to be installed.
Numba can be persnickety about what features are supported. But this works.
Also, as pointed out by Divakar, Numba performs better with arrays
from numba import njit
@njit
def cumsum_breach_numba(x, target):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total >= target:
            result.append(i)
            total = 0
    return result
cumsum_breach_numba(x, 10)
Testing the Two
Because I felt like it ¯\_(ツ)_/¯
Setup
np.random.seed([3, 1415])
x0 = np.random.randint(100, size=1_000_000)
x1 = x0.tolist()
Accuracy
i0 = cumsum_breach_numba(x0, 200_000)
i1 = list(cumsum_breach(x1, 200_000))
assert i0 == i1
Time
%timeit cumsum_breach_numba(x0, 200_000)
%timeit list(cumsum_breach(x1, 200_000))
582 µs ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
64.3 ms ± 5.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba was on the order of 100 times faster.
For a truer apples-to-apples test, I convert the list to a NumPy array
%timeit cumsum_breach_numba(np.array(x1), 200_000)
%timeit list(cumsum_breach(x1, 200_000))
43.1 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
62.8 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Which brings them to about even.
I have an N-by-M array; at each entry of it, I need to do some NumPy operations and put the result there.
Right now, I'm doing it the naive way with a double loop:
import numpy as np
N = 10
M = 11
K = 100
result = np.zeros((N, M))
is_relevant = np.random.rand(N, M, K) > 0.5
weight = np.random.rand(3, 3, K)
values1 = np.random.rand(3, 3, K)
values2 = np.random.rand(3, 3, K)
for i in range(N):
    for j in range(M):
        selector = is_relevant[i, j, :]
        result[i, j] = np.sum(
            np.multiply(
                np.multiply(
                    values1[..., selector],
                    values2[..., selector]
                ), weight[..., selector]
            )
        )
Since all the in-loop operations are simply NumPy operations, I think there must be a way to do this faster or loop-free.
We can use a combination of np.einsum and np.tensordot -
a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
out = np.tensordot(a,is_relevant,axes=(0,2))
Alternatively, with one einsum call -
np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
And with np.dot and einsum -
is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
Also, play around with the optimize flag in np.einsum by setting it as True to use BLAS.
Timings -
In [146]: %%timeit
...: a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
...: out = np.tensordot(a,is_relevant,axes=(0,2))
10000 loops, best of 3: 121 µs per loop
In [147]: %timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
1000 loops, best of 3: 851 µs per loop
In [148]: %timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant,optimize=True)
1000 loops, best of 3: 347 µs per loop
In [156]: %timeit is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
10000 loops, best of 3: 58.6 µs per loop
Very large arrays
For very large arrays, we can leverage numexpr to make use of multi-cores -
import numexpr as ne
a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
out = np.empty((N, M))
for i in range(N):
    for j in range(M):
        out[i,j] = ne.evaluate('sum(is_relevant_ij*a)',{'is_relevant_ij':is_relevant[i,j], 'a':a})
Another very simple option is just:
result = (values1 * values2 * weight * is_relevant[:, :, np.newaxis, np.newaxis]).sum((2, 3, 4))
Divakar's last solution is faster than this though. Timings for comparison:
%timeit np.tensordot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight),is_relevant,axes=(0,2))
# 30.9 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
# 379 µs ± 486 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant,optimize=True)
# 145 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
# 15 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit (values1 * values2 * weight * is_relevant[:, :, np.newaxis, np.newaxis]).sum((2, 3, 4))
# 152 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I have an array like:
[10 20 30 40]
And I want to build a matrix M1 like this:
10 0 0 0
20 10 0 0
30 20 10 0
40 30 20 10
My approach is to first build the following matrix M2 out of consecutive "rolls" of the array:
10 20 30 40
20 10 40 30
30 20 10 40
40 30 20 10
And then take the lower triangular matrix with np.tril. I would be interested in efficient methods to build M2, or to build M1 directly without going through M2.
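Spelled out literally, that roll-then-tril idea would look something like this (shown only as a reference point, since it materializes the full n-by-n intermediate; the name M1_roll is arbitrary):
import numpy as np

def M1_roll(a):
    a = np.asarray(a)
    # column i of M2 is the array rolled down by i positions
    M2 = np.stack([np.roll(a, i) for i in range(len(a))], axis=1)
    return np.tril(M2)

print(M1_roll([10, 20, 30, 40]))
# [[10  0  0  0]
#  [20 10  0  0]
#  [30 20 10  0]
#  [40 30 20 10]]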
A simple way to build M2 could be:
import numpy as np
def M2_simple(a):
    a = np.asarray(a)
    return np.stack([a[np.arange(-i, len(a) - i)] for i in range(len(a))]).T
print(M2_simple(np.array([10, 20, 30, 40])))
# [[10 40 30 20]
# [20 10 40 30]
# [30 20 10 40]
# [40 30 20 10]]
After some trying I came to the following better solution, based on advanced indexing:
def M2_indexing(a):
    a = np.asarray(a)
    r = np.arange(len(a))[np.newaxis]
    return a[np.newaxis][np.zeros_like(r), r - r.T].T
This is obviously much faster than the previous one, but measuring the performance it still seems not as fast as it could be (for example, it takes an order of magnitude longer than tiling, which is not such a different operation), and it requires me to build big indexing matrices.
Is there a better way to build these matrices?
EDIT:
Actually, you can build M1 directly using the same method:
import numpy as np
def M1_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    a0 = np.concatenate([np.zeros(len(a) - 1, a.dtype), a])
    return np.lib.stride_tricks.as_strided(
        a0, (n, n), (s, s), writeable=False)[:, ::-1]
print(M1_strided(np.array([10, 20, 30, 40])))
# [[10 0 0 0]
# [20 10 0 0]
# [30 20 10 0]
# [40 30 20 10]]
In this case the speed benefit is even better, since you are saving the call to np.tril:
N = 100
a = np.square(np.arange(N))
%timeit np.tril(M2_simple(a))
# 792 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_indexing(a))
# 259 µs ± 9.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_strided(a))
# 134 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M1_strided(a)
# 45.2 µs ± 583 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can build the M2 matrix more efficiently with np.lib.stride_tricks.as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def M2_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    return np.lib.stride_tricks.as_strided(
        np.tile(a[::-1], 2), (n, n), (s, s), writeable=False)[::-1]
As an extra benefit, you will only use twice as much memory as the original array (as opposed to the squared size). You just need to be careful not to write to the array created like this (which should not be a problem if you are going to call np.tril later on anyway) - I added writeable=False to disallow writing operations.
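Since writeable=False makes the result a read-only view, an accidental write fails loudly; a quick check using the M2_strided above:
a = np.square(np.arange(5))
m = M2_strided(a)
try:
    m[0, 0] = -1
except ValueError as e:
    print(e)   # assignment destination is read-only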
A quick speed comparison with IPython:
N = 100
a = np.square(np.arange(N))
%timeit M2_simple(a)
# 693 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit M2_indexing(a)
# 163 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M2_strided(a)
# 38.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Another one using as_strided, similar to @jdehesa's solution, but with negative strides, which saves us the flipping at the end, like so -
def strided_app2(a):
    n = len(a)
    ae = np.concatenate((np.zeros(n-1,dtype=a.dtype),a))
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:],(n,n),(s,-s),writeable=False)
Sample run -
In [66]: a
Out[66]: array([10, 20, 30, 40])
In [67]: strided_app2(a)
Out[67]:
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
Digging further
Going deeper into the timings for each step, it's revealed that the bottleneck is the concatenation part. So, we can employ array-initialization instead, giving us an alternative that seems to be much better for large arrays, like so -
def strided_app3(a):
    n = len(a)
    ae = np.zeros(2*n-1,dtype=a.dtype)
    ae[-n:] = a
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:],(n,n),(s,-s),writeable=False)
Timings -
In [55]: a = np.random.rand(100000)
In [56]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10000 loops, best of 3: 107 µs per loop
10000 loops, best of 3: 94.5 µs per loop
10000 loops, best of 3: 84.4 µs per loop
In [61]: a = np.random.rand(1000000)
In [62]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 2 ms per loop
1000 loops, best of 3: 1.84 ms per loop
In [63]: a = np.random.rand(10000000)
In [64]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10 loops, best of 3: 25.2 ms per loop
10 loops, best of 3: 24.6 ms per loop
100 loops, best of 3: 13.9 ms per loop
Actually, there is a builtin for that:
>>> import scipy.linalg as sl
>>> sl.toeplitz([10,20,30,40], [0,0,0,0])
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
I know this is a very basic question but for some reason I can't find an answer. How can I get the index of certain element of a Series in python pandas? (first occurrence would suffice)
I.e., I'd like something like:
import pandas as pd
myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
print myseries.find(7) # should output 3
Certainly, it is possible to define such a method with a loop:
def find(s, el):
    for i in s.index:
        if s[i] == el:
            return i
    return None
print find(myseries, 7)
but I assume there should be a better way. Is there?
>>> myseries[myseries == 7]
3 7
dtype: int64
>>> myseries[myseries == 7].index[0]
3
Though I admit there should be a better way to do that, this at least avoids iterating and looping through the object in Python and moves it to the C level.
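If the value might be absent, the boolean-mask result is simply empty, so taking index[0] needs a small guard, for example:
matches = myseries[myseries == 7].index
idx = matches[0] if len(matches) else None   # None when 7 is not present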
Converting to an Index, you can use get_loc
In [1]: myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
In [3]: Index(myseries).get_loc(7)
Out[3]: 3
In [4]: Index(myseries).get_loc(10)
KeyError: 10
Duplicate handling
In [5]: Index([1,1,2,2,3,4]).get_loc(2)
Out[5]: slice(2, 4, None)
It will return a boolean array if the occurrences are non-contiguous
In [6]: Index([1,1,2,1,3,2,4]).get_loc(2)
Out[6]: array([False, False, True, False, False, True, False], dtype=bool)
Uses a hashtable internally, so fast
In [7]: s = Series(randint(0,10,10000))
In [9]: %timeit s[s == 5]
1000 loops, best of 3: 203 µs per loop
In [12]: i = Index(s)
In [13]: %timeit i.get_loc(5)
1000 loops, best of 3: 226 µs per loop
As Viktor points out, there is a one-time creation overhead to creating an index (it's incurred when you actually DO something with the index, e.g. the is_unique check)
In [2]: s = Series(randint(0,10,10000))
In [3]: %timeit Index(s)
100000 loops, best of 3: 9.6 µs per loop
In [4]: %timeit Index(s).is_unique
10000 loops, best of 3: 140 µs per loop
I'm impressed with all the answers here. This is not a new answer, just an attempt to summarize the timings of all these methods. I considered the case of a series with 25 elements and assumed the general case where the index could contain any values and you want the index value corresponding to the search value which is towards the end of the series.
Here are the speed tests on a 2012 Mac Mini in Python 3.9.10 with Pandas version 1.4.0.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: data = [406400, 203200, 101600, 76100, 50800, 25400, 19050, 12700, 9500,
   ...:         6700, 4750, 3350, 2360, 1700, 1180, 850, 600, 425, 300, 212, 150,
   ...:         106, 75, 53, 38]
In [4]: myseries = pd.Series(data, index=range(1,26))
In [5]: assert(myseries[21] == 150)
In [6]: %timeit myseries[myseries == 150].index[0]
179 µs ± 891 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit myseries[myseries == 150].first_valid_index()
205 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: %timeit myseries.where(myseries == 150).first_valid_index()
597 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit myseries.index[np.where(myseries == 150)[0][0]]
110 µs ± 872 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: %timeit pd.Series(myseries.index, index=myseries)[150]
125 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [11]: %timeit myseries.index[pd.Index(myseries).get_loc(150)]
49.5 µs ± 814 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [12]: %timeit myseries.index[list(myseries).index(150)]
7.75 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit myseries.index[myseries.tolist().index(150)]
2.55 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit dict(zip(myseries.values, myseries.index))[150]
9.89 µs ± 79.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [15]: %timeit {v: k for k, v in myseries.items()}[150]
9.99 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
@Jeff's answer seems to be the fastest - although it doesn't handle duplicates.
Correction: Sorry, I missed one, @Alex Spangher's solution using the list index method is by far the fastest.
Update: Added @EliadL's answer.
Hope this helps.
Amazing that such a simple operation requires such convoluted solutions and many are so slow. Over half a millisecond in some cases to find a value in a series of 25.
2022-02-18 Update
Updated all the timings with the latest Pandas version and Python 3.9. Even on an older computer, all the timings have significantly reduced (10 to 70%) compared to the previous tests (version 0.25.3).
Plus: Added two more methods utilizing dictionaries.
In [92]: (myseries==7).argmax()
Out[92]: 3
This works if you know 7 is there in advance. You can check this with
(myseries==7).any()
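Combining the two, a small sketch that falls back to None when the value is absent:
mask = (myseries == 7)
idx = mask.argmax() if mask.any() else None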
Another approach (very similar to the first answer) that also accounts for multiple 7's (or none) is
In [122]: myseries = pd.Series([1,7,0,7,5], index=['a','b','c','d','e'])
In [123]: list(myseries[myseries==7].index)
Out[123]: ['b', 'd']
Another way to do this, although equally unsatisfying is:
s = pd.Series([1,3,0,7,5],index=[0,1,2,3,4])
list(s).index(7)
returns:
3
On timing tests using a current dataset I'm working with (consider it random):
In [64]: %timeit pd.Index(article_reference_df.asset_id).get_loc('100000003003614')
10000 loops, best of 3: 60.1 µs per loop
In [66]: %timeit article_reference_df.asset_id[article_reference_df.asset_id == '100000003003614'].index[0]
1000 loops, best of 3: 255 µs per loop
In [65]: %timeit list(article_reference_df.asset_id).index('100000003003614')
100000 loops, best of 3: 14.5 µs per loop
If you use numpy, you can get an array of the indices where your value is found:
import numpy as np
import pandas as pd
myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
np.where(myseries == 7)
This returns a one-element tuple containing an array of the indices where 7 is the value in myseries:
(array([3], dtype=int64),)
you can use Series.idxmax(), though note that it returns the index of the maximum value, so this only works here because 7 happens to be the largest element
>>> import pandas as pd
>>> myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
>>> myseries.idxmax()
3
>>>
This is the most native and scalable approach I could find:
>>> myindex = pd.Series(myseries.index, index=myseries)
>>> myindex[7]
3
>>> myindex[[7, 5, 7]]
7 3
5 4
7 3
dtype: int64
Another way to do it that hasn't been mentioned yet is the tolist method:
myseries.tolist().index(7)
should return the correct index, assuming the value exists in the Series.
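Since list.index raises ValueError when the value is missing, a guarded version might look like:
try:
    idx = myseries.tolist().index(7)
except ValueError:
    idx = None   # value not present in the Series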
Often your value occurs at multiple indices:
>>> myseries = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1])
>>> myseries.index[myseries == 1]
Int64Index([3, 4, 5, 6, 10, 11], dtype='int64')
Pandas has a builtin class Index with a function called get_loc. This function will return one of:
an index (the element's position)
a slice (if the specified number occurs in a consecutive run)
an array (a boolean array if the number is at multiple non-consecutive indexes)
Example:
import pandas as pd
>>> mySer = pd.Series([1, 3, 8, 10, 13])
>>> pd.Index(mySer).get_loc(10) # Returns index
3 # Index of 10 in series
>>> mySer = pd.Series([1, 3, 8, 10, 10, 10, 13])
>>> pd.Index(mySer).get_loc(10) # Returns slice
slice(3, 6, None) # 10 occurs at index 3 (included) to 6 (not included)
# If the data is not in sequence then it would return an array of bool's.
>>> mySer = pd.Series([1, 10, 3, 8, 10, 10, 10, 13, 10])
>>> pd.Index(mySer).get_loc(10)
array([False,  True, False, False,  True,  True,  True, False,  True])
There are many other options too but I found it very simple for me.
The df.index attribute will help you find the exact row number
my_fl2=(df['ConvertedCompYearly'] == 45241312 )
print (df[my_fl2].index)
Name: ConvertedCompYearly, dtype: float64
Int64Index([66910], dtype='int64')
I'm using the numpy.array() function to create numpy.float64 ndarrays from lists.
I noticed that this is very slow when either the list contains None or a list of lists is provided.
Below are some examples with times. There are obvious workarounds but why is this so slow?
Examples for list of None:
### Very slow to call array() with list of None
In [3]: %timeit numpy.array([None]*100000, dtype=numpy.float64)
1 loops, best of 3: 240 ms per loop
### Problem doesn't exist with array of zeroes
In [4]: %timeit numpy.array([0.0]*100000, dtype=numpy.float64)
100 loops, best of 3: 9.94 ms per loop
### Also fast if we use dtype=object and convert to float64
In [5]: %timeit numpy.array([None]*100000, dtype=numpy.object).astype(numpy.float64)
100 loops, best of 3: 4.92 ms per loop
### Also fast if we use fromiter() instead of array()
In [6]: %timeit numpy.fromiter([None]*100000, dtype=numpy.float64)
100 loops, best of 3: 3.29 ms per loop
Examples for list of lists:
### Very slow to create column matrix
In [7]: %timeit numpy.array([[0.0]]*100000, dtype=numpy.float64)
1 loops, best of 3: 353 ms per loop
### No problem to create column vector and reshape
In [8]: %timeit numpy.array([0.0]*100000, dtype=numpy.float64).reshape((-1,1))
100 loops, best of 3: 10 ms per loop
### Can use itertools to flatten input lists
In [9]: %timeit numpy.fromiter(itertools.chain.from_iterable([[0.0]]*100000),dtype=numpy.float64).reshape((-1,1))
100 loops, best of 3: 9.65 ms per loop
I've reported this as a numpy issue. The report and patch files are here:
https://github.com/numpy/numpy/issues/3392
After patching:
# was 240 ms, best alternate version was 3.29
In [5]: %timeit numpy.array([None]*100000)
100 loops, best of 3: 7.49 ms per loop
# was 353 ms, best alternate version was 9.65
In [6]: %timeit numpy.array([[0.0]]*100000)
10 loops, best of 3: 23.7 ms per loop
My guess would be that the code for converting lists just calls float on everything. If the argument defines __float__, we call that; otherwise we treat it like a string (throwing an exception on None, which we catch and replace with np.nan). The exception handling would be relatively slow.
Timing seems to verify this hypothesis:
import numpy as np
%timeit [None] * 100000
> 1000 loops, best of 3: 1.04 ms per loop
%timeit np.array([0.0] * 100000)
> 10 loops, best of 3: 21.3 ms per loop
%timeit [i.__float__() for i in [0.0] * 100000]
> 10 loops, best of 3: 32 ms per loop
def flt(d):
    try:
        return float(d)
    except:
        return np.nan
%timeit np.array([None] * 100000, dtype=np.float64)
> 1 loops, best of 3: 477 ms per loop
%timeit [flt(d) for d in [None] * 100000]
> 1 loops, best of 3: 328 ms per loop
Adding another case just to be obvious about where I'm going with this. If there were an explicit check for None, the code above would not be this slow:
def flt2(d):
    if d is None:
        return np.nan
    try:
        return float(d)
    except:
        return np.nan
%timeit [flt2(d) for d in [None] * 100000]
> 10 loops, best of 3: 45 ms per loop