How to replace a list comprehension with a numpy command? - python

Is there a way to replace the following Python list comprehension with a numpy function that doesn't use loops?
import numpy as np
a = np.array([0, 1, 1, 1, 0, 3])
bins = np.bincount(a)
>>> bins: [2 3 0 1]
a_counts = [bins[val] for val in a]
>>> a_counts: [2, 3, 3, 3, 2, 1]
So the basic idea is to generate an array in which each value is replaced by the number of occurrences of that value in the array.
I want to do this calculation in a custom Keras loss function, which, to my knowledge, cannot use loops or list comprehensions.

You just need to index the result from np.bincount with a:
a = np.array([0, 1, 1, 1, 0, 3])
bins = np.bincount(a)
a_counts = bins[a]
print(a_counts)
# [2 3 3 3 2 1]
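Note that np.bincount only works for non-negative integers. If the values can be arbitrary (negative, float, ...), the same loop-free result can be built from np.unique; a minimal sketch of that variant (not part of the original answer):
import numpy as np

a = np.array([0, 1, 1, 1, 0, 3])
# unique values, each element's index into them, and per-value counts
_, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
a_counts = counts[inverse]
print(a_counts)
# [2 3 3 3 2 1]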

Or use collections.Counter:
from collections import Counter
l = [0, 1, 1, 1, 0, 3]
print(Counter(l))
Which Outputs:
Counter({1: 3, 0: 2, 3: 1})
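Counter gives the count of each distinct value, not the per-element array the question asks for; mapping back still needs a comprehension, so this does not satisfy the loop-free requirement. A small sketch of the remaining step:
from collections import Counter

l = [0, 1, 1, 1, 0, 3]
counts = Counter(l)
a_counts = [counts[v] for v in l]
print(a_counts)
# [2, 3, 3, 3, 2, 1]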

If you want to avoid loops, you can use the pandas library:
import pandas as pd
import numpy as np
a = np.array([0, 1, 1, 1, 0, 3])
a_counts = pd.Series(a).value_counts()[a].values
>>> a_counts: array([2, 3, 3, 3, 2, 1], dtype=int64)
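A variant of the same idea using Series.map, sketched here as an alternative (not part of the original answer):
import pandas as pd
import numpy as np

a = np.array([0, 1, 1, 1, 0, 3])
s = pd.Series(a)
# map each element to its count via the value_counts index
a_counts = s.map(s.value_counts()).to_numpy()
print(a_counts)
# [2 3 3 3 2 1]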

Related

partial cumulative sum in python

Suppose I have a numpy array (or pandas Series if it makes it any easier), which looks like this:
foo = np.array([1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0])
I want to transform it into an array
bar = np.array([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3])
where the entry is how many steps you need to walk to the left to find a 1 in foo.
Now, obviously one can write a loop to compute bar from foo, but this will be bog slow. Is there anything more clever one can do?
UPDATE: The pd.Series solution is around 7 times slower than the pure numpy solution. The stupid loop solution is very slow (no surprise), but when JIT-compiled with numba it is as fast as the numpy solution.
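For reference, a minimal sketch of the JIT-compiled loop mentioned in the update, assuming numba is installed (steps_since_one is a name invented here):
import numpy as np
from numba import njit

@njit
def steps_since_one(foo):
    # walk left to right, resetting the step counter at every 1
    bar = np.empty(foo.size, dtype=np.int64)
    steps = 0
    for i in range(foo.size):
        if foo[i] == 1:
            steps = 0
        else:
            steps += 1
        bar[i] = steps
    return bar

foo = np.array([1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0])
print(steps_since_one(foo))
# [0 1 2 3 4 0 1 2 0 1 2 3]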
You could do cumcount with pandas:
import pandas as pd

s = pd.Series(foo)
bar = s.groupby(s.cumsum()).cumcount().to_numpy()
bar
array([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3])
One option, specific to the shared example (it assumes foo[0] == 1), with numpy:
# positions where foo is 1
pos = foo.nonzero()[0]
# number of zeros between consecutive 1s; needed to reset the cumsum
values = np.diff(pos) - 1
arr = np.ones(foo.size, dtype=int)
arr[0] = 0
# at each new 1, step down so the running sum restarts at 0
arr[pos[1:]] = -values
arr.cumsum()
array([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3])
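A more general pure-numpy variant (a sketch, not one of the original answers) that does not assume foo[0] == 1: record the index of the most recent 1 with a running maximum, then subtract it from each position:
# index of the most recent 1 at or before each position (-1 if none yet)
idx = np.maximum.accumulate(np.where(foo == 1, np.arange(foo.size), -1))
bar = np.arange(foo.size) - idx
bar
array([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3])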

Find runs and lengths of consecutive values in an array

I'd like to find equal values in an array, and their indices, if they occur consecutively more than 2 times.
a = [0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
So in this example I would find that the value 2 occurred 4 times, starting from position 8. Is there any built-in function to do that?
I found a way with collections.Counter:
import collections
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop, compare two values at a time and count them, but maybe there is a more elegant solution?
Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:]) # set equal, consecutive elements to 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res)  # indices of the non-zero elements, i.e. the run starts
values = arr[idxs]
counts = np.diff(idxs, append=len(arr))  # differences between consecutive run starts are the run lengths
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 2:
        print([arr[index], count, index])
# [2, 4, 8]
A slightly more compact way of doing it that works for all input types.
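For completeness (not from the original answers), the standard library's itertools.groupby also detects runs directly and works for any comparable values; a minimal sketch:
from itertools import groupby

arr = [0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
pos = 0
for value, group in groupby(arr):
    length = sum(1 for _ in group)  # consume the run to measure its length
    if length > 2:
        print(value, length, pos)
    pos += length
# 2 4 8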

Extracting values from an array using a logical array

I have the following MATLAB code which I would like to replicate using Python.
The MATLAB code creates a logical array for the condition xDiff == 2, then uses that logical array to extract the corresponding values from tDiff, producing the result tTacho.
MATLAB code:
tTacho = tDiff(xDiff == 2)
You can do boolean indexing with NumPy.
For example:
import numpy as np
x_diff = np.array([0, 2, 2, 0, 0, 2])
t_diff = np.array([0, 1, 2, 3, 4, 5])
print(t_diff[x_diff == 2])
gives:
[1 2 5]
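If you also need the indices themselves (the equivalent of MATLAB's find), np.where returns them; a small sketch reusing the arrays above (not part of the original answer):
idx = np.where(x_diff == 2)[0]  # positions where the condition holds
print(idx)
# [1 2 5]
t_diff[idx] then selects the same elements as t_diff[x_diff == 2].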
If you don't want to use NumPy, then you can use list comprehensions with zip:
x_diff = [0, 2, 2, 0, 0, 2]
t_diff = [0, 1, 2, 3, 4, 5]
print([t for t, x in zip(t_diff, x_diff) if x == 2])
gives:
[1, 2, 5]
You can also use list indexing:
tDiff = [1, 2, 4, 5, 6, 6, 7]
xDiff = [2, 3, 2, 2, 2, 2, 2]
for x in range(len(xDiff)):
    if xDiff[x] == 2:
        print(tDiff[x])
Hope this helps.

Python: is there any way to update a list based on a list of (index, value) pairs?

I am new to Python, and I am having trouble efficiently updating a vector.
e.g.
>>>idx_val_list = [[0, 1], [2, 3], [4, 5]]
or
>>>idx_list = [0, 2, 4]
>>>val_list = [1, 3, 5]
>>>vector = [0,0,0,0,0,0,0,0]
I am looking for an efficient way to achieve a batch update, something like:
>>>vector.update(indexes=idx_list, values=val_list)
>>>vector
[1,0,3,0,5,0,0,0]
Is there an efficient way, other than for loops, to achieve this?
I suggest using np.put instead of loops or a list comprehension.
Try this:
>>> import numpy as np
>>> vector = np.array([0, 0, 0, 0, 0, 0, 0, 0])
>>> idx_list = [0, 2, 4]
>>> val_list = [1, 3, 5]
>>> vector
array([0, 0, 0, 0, 0, 0, 0, 0])
>>> np.put(vector, idx_list, val_list)
>>> vector
array([1, 0, 3, 0, 5, 0, 0, 0])
for idx, val in zip(idx_list, val_list):
    vector[idx] = val
In [149]: idx_list = [0, 2, 4]
...:
...: val_list = [1, 3, 5]
...:
...: vector = [0,0,0,0,0,0,0,0]
...:
In [150]: for i, j in zip(idx_list, val_list):
     ...:     vector[i] = j
...:
In [151]: vector
Out[151]: [1, 0, 3, 0, 5, 0, 0, 0]
If you don't have repeating indices, @Memduh has it right using np.put.
You can also just do v[idx_list] = val_list
If you want to accumulate the values in place, you can also use np.add.at
v = np.array(vector)
np.add.at(v, idx_list, val_list)
v
Out[]: array([1, 0, 3, 0, 5, 0, 0, 0])
If your initial vector is going to be very large, you may want to make it a scipy.sparse.lil_matrix as this allows directly assigning the values to their locations without assigning zeroes elsewhere.
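A minimal sketch of that sparse suggestion, assuming scipy is available (the width is kept small here for illustration):
from scipy.sparse import lil_matrix

idx_list = [0, 2, 4]
val_list = [1, 3, 5]
v = lil_matrix((1, 8), dtype=int)  # imagine a much larger width
v[0, idx_list] = val_list          # only the assigned entries are stored
print(v.toarray())
# [[1 0 3 0 5 0 0 0]]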
Using numpy:
import numpy as np
vector = np.array(vector)
vector[idx_list] = val_list
print(vector)

Getting the indexes to the duplicate columns of a numpy array [duplicate]

I have a numpy array with duplicate columns:
import numpy as np
A = np.array([[1, 1, 1, 0, 1, 1],
              [1, 2, 2, 0, 1, 2],
              [1, 3, 3, 0, 1, 3]])
I need to find the indices of those duplicates, something like this:
[0, 4]
[1, 2, 5]
I have a hard time dealing with indices in Python, and I really don't know how to approach it.
Thanks
I tried identifying the unique columns first with this function:
def unique_columns(data):
    ind = np.lexsort(data)
    return data.T[ind[np.concatenate(([True], np.any(data.T[ind[1:]] != data.T[ind[:-1]], axis=1)))]].T
But I can't figure out the indexes from there.
There is not a simple way to do this, unfortunately. One approach builds on np.unique, viewing each column as a single void item. This method requires that the axis you want to make unique is contiguous in memory, and numpy's typical memory layout is C-contiguous, i.e. contiguous in rows. Fortunately numpy makes the conversion simple:
A = np.array([[1, 1, 1, 0, 1, 1],
              [1, 2, 2, 0, 1, 2],
              [1, 3, 3, 0, 1, 3]])
def unique_columns2(data):
    dt = np.dtype((np.void, data.dtype.itemsize * data.shape[0]))
    dataf = np.asfortranarray(data).view(dt)
    u, uind = np.unique(dataf, return_inverse=True)
    u = u.view(data.dtype).reshape(-1, data.shape[0]).T
    return u, uind
Our result:
u, uind = unique_columns2(A)
u
array([[0, 1, 1],
       [0, 1, 2],
       [0, 1, 3]])
uind
array([1, 2, 2, 0, 1, 2])
I am not really sure what you want to do from here; for example, you can do something like this:
>>> [np.where(uind==x)[0] for x in range(u.shape[0])]
[array([3]), array([0, 4]), array([1, 2, 5])]
Some timings:
tmp = np.random.randint(0,4,(30000,500))
#BiRico and OP's answer
%timeit unique_columns(tmp)
1 loops, best of 3: 2.91 s per loop
%timeit unique_columns2(tmp)
1 loops, best of 3: 208 ms per loop
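On newer numpy (1.13+) the void-view trick is no longer necessary, since np.unique accepts an axis argument directly; a sketch of that variant (not part of the original answers):
u, uind = np.unique(A, axis=1, return_inverse=True)
uind
array([1, 2, 2, 0, 1, 2])
[np.flatnonzero(uind == g) for g in range(u.shape[1]) if (uind == g).sum() >= 2]
[array([0, 4]), array([1, 2, 5])]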
Here is an outline of how to approach it. Use numpy.lexsort to sort the columns; that way all the duplicates will be grouped together. Once the duplicates are all together, you can easily tell which columns are duplicates and which indices correspond to those columns.
Here's an implementation of the method described above.
import numpy as np

def duplicate_columns(data, minoccur=2):
    # sort column indices so that identical columns end up adjacent
    ind = np.lexsort(data)
    # mark positions where the sorted columns change
    diff = np.any(data.T[ind[1:]] != data.T[ind[:-1]], axis=1)
    edges = np.where(diff)[0] + 1
    # split the sorted indices into groups of identical columns
    result = np.split(ind, edges)
    result = [group for group in result if len(group) >= minoccur]
    return result

A = np.array([[1, 1, 1, 0, 1, 1],
              [1, 2, 2, 0, 1, 2],
              [1, 3, 3, 0, 1, 3]])
print(duplicate_columns(A))
# [array([0, 4]), array([1, 2, 5])]
