Building a specific sequence with Python NumPy

Hi, I have two arrays as below.
1) An array made up of 1s and 0s, where 1 signifies an active day and 0 a holiday.
2) An arithmetic sequence that is shorter than array 1.
The result array needs to be a combination of 1) and 2): the arithmetic sequence has to follow the positions of the 1s. In other words, array 2 needs to be expanded to the length of array 1, with 0s inserted at the same positions as in array 1.
One way I could solve this is using numpy.insert with slices. However, since the lengths are different and array 1 is dynamic, I need an efficient way to achieve this.
Thanks

An alternative one-liner solution
Setup
binary = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
arithmetic = np.arange(1, 7)
Solution
# find the 1s in binary and overwrite them with the values from arithmetic (in place)
binary[np.where(binary > 0)] = arithmetic
binary
array([1, 2, 0, 3, 0, 4, 5, 0, 6, 0])
Note that this mutates binary itself; copy it first if the original 0/1 array must be preserved.

Create the result array of the right length (len(binary)) filled with 0s, then use the binary array as a mask to assign into it. Make sure the binary mask has the bool dtype: an integer array would be interpreted as fancy indexing with the values 0 and 1, not as a mask.
>>> binary = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], dtype=bool)
>>> arithmetic = np.arange(1, 7)
>>> result = np.zeros(len(binary), dtype=arithmetic.dtype)
>>> result[binary] = arithmetic
>>> result
array([1, 2, 0, 3, 0, 4, 5, 0, 6, 0])
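
To see why the dtype matters, here is a quick sketch of the failure mode with an integer mask (hypothetical, not from the answer): the 0s and 1s get treated as positions rather than as a mask.
>>> binary_int = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])   # int dtype, not bool
>>> result = np.zeros(len(binary_int), dtype=int)
>>> result[binary_int] = np.arange(1, 7)   # fancy indexing: 10 positions, 6 values
# raises ValueError (shape mismatch) instead of performing masked assignment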

Another way is to create a copy of the binary array and replace all of its non-zero values with the arithmetic sequence:
bi_arr = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
seq = np.arange(1, np.count_nonzero(bi_arr) + 1)
final_result = bi_arr.copy()
final_result[final_result != 0] = seq
print(final_result)
[1 2 0 3 0 4 5 0 6 0]
The original bi_arr remains unchanged.

Related

Python: make each matrix in a list the same size

Assume multiple matrices whose rows have different lengths, contained in a list. How can I pad the smaller matrices so that they all have the same size as the biggest matrix?
list_of_matrices = []
list_of_matrices.append(np.array([[3,3],[4,4]]))
list_of_matrices.append(np.array([[1,1,3],[2,2,5]]))
list_of_matrices.append(np.array([[1,1,3,7],[2,2,5,9]]))
From list_of_matrices I want to create a 3D numpy array, e.g. of shape 3x2x4, where the missing values (because the first two 2D matrices are too small) are filled with a scalar value (more specifically, the mean of each matrix along axis 1). I want to do that in a performant way (no for loops).
I tried various ways and concluded this should be readable and quite efficient:
z = np.zeros((3, 2, 4), dtype=int)
for i, n in enumerate(list_of_matrices):
    z[i, :n.shape[0], :n.shape[1]] = n
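The question actually asks for the padding to be each matrix's mean along axis 1 rather than 0; a minimal variation of the same loop (a sketch, assuming as in the example that all matrices share the same number of rows) pre-fills each slice with its row means before overwriting:
z = np.zeros((3, 2, 4))
for i, n in enumerate(list_of_matrices):
    z[i] = n.mean(axis=1, keepdims=True)    # broadcast each row's mean across the columns
    z[i, :n.shape[0], :n.shape[1]] = n      # overwrite the real cells with the actual values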
While trying to find a method other than looping, I concluded that each matrix in list_of_matrices contributes an unequal number of cells to be assigned into z, so even the cleverest approaches require nothing better than concatenating these index groups, which is slow in numpy.
This is an example of concatenation of index groups:
a = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
b = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
c = [0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3]
z[a, b, c] = np.concatenate([n.ravel() for n in list_of_matrices])
This still requires np.concatenate and a list comprehension, so efficiency is lost here as well. But if you really need to optimise further, you can try replacing the concatenation like so:
# Use list of lists instead because arrays are slow while iterating
list_of_matrices = [[[3,3],[4,4]], [[1,1,3],[2,2,5]], [[1,1,3,7],[2,2,5,9]]]
from itertools import chain
concatenation = list(chain(*list(chain(*list_of_matrices))))
and build the above index sequences a, b, and c with np.repeat plus repetition of specific blocks, as sketched below.
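For completeness, here is one hedged sketch of how those index sequences could be built from the matrix shapes with np.repeat and np.tile (the variable names are illustrative, not from the original answer; it still uses Python-level list comprehensions, so it is not fully loop-free either):
shapes = np.array([(len(m), len(m[0])) for m in list_of_matrices])  # (rows, cols) per matrix
cells = shapes[:, 0] * shapes[:, 1]                                 # number of cells per matrix
a = np.repeat(np.arange(len(list_of_matrices)), cells)
b = np.concatenate([np.repeat(np.arange(r), c) for r, c in shapes])
c = np.concatenate([np.tile(np.arange(c), r) for r, c in shapes])
This reproduces the sequences a, b, and c listed above.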

Creating 2D numpy array of start and end indices of "streaks" in another array.

Say I have a 1D numpy array of numbers, myArray = np.array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1]).
I want to create a 2D numpy array that describe the first (column 1) and last (column 2) indices of any "streak" of consecutive 1's that is longer than 2.
So for the example above, the 2D array should look like this:
indicesArray = ([[ 5,  8],
                 [13, 15]])
since there are at least three consecutive 1s at positions 5 through 8 and at positions 13 through 15.
Any help would be appreciated.
Approach #1
Here's one approach inspired by this post -
def start_stop(a, trigger_val, len_thresh=2):
    # "Enclose" mask with sentinels to catch shifts later on
    mask = np.r_[False, np.equal(a, trigger_val), False]
    # Get the shifting indices
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    # Get the run lengths
    lens = idx[1::2] - idx[::2]
    # Keep runs longer than len_thresh; make the stop index inclusive
    return idx.reshape(-1, 2)[lens > len_thresh] - [0, 1]
Sample run -
In [47]: myArray
Out[47]: array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1])
In [48]: start_stop(myArray, trigger_val=1, len_thresh=2)
Out[48]:
array([[ 5,  8],
       [13, 15]])
Approach #2
Another with binary_erosion -
from scipy.ndimage import binary_erosion
mask = binary_erosion(myArray == 1, structure=np.ones(3))
idx = np.flatnonzero(mask[1:] != mask[:-1])
out = idx.reshape(-1,2)+[0,1]
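For the sample myArray, this reproduces the same pairs as approach #1:
In [49]: out
Out[49]:
array([[ 5,  8],
       [13, 15]])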

How to maintain the sequence of occurrence of numbers from an ndarray in a set using Python?

The Scenario
I'm trying to get the number of clusters a dataframe belongs to.
Its data type is <type 'numpy.ndarray'>, with data as below:
records_Array = array([0, 0, 0, 0, 2, 2, 1, 1, 1], dtype=int32)
Obviously, when printing I see it in the format [0 0 0 ..., 1 1 1].
Now, I need each number only once, so I convert to a set and then to a list:
cluster_set = list(set(records_Array))
The Output
On printing cluster_set, I get [0, 1, 2],
whereas the clusters occur in the sequence 0, 2, 1.
Required
I need some function/method that preserves the order of occurrence in records_Array and returns it in cluster_set.
You want pandas' pd.unique, because it does not sort while finding the unique values; NumPy's np.unique does.
a = np.array([0, 0, 0, 0, 2, 2, 1, 1, 1])
pd.unique(a)
array([0, 2, 1])
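If you would rather avoid the pandas dependency, a pure-NumPy sketch using np.unique with return_index=True gives the same result: sort the first-occurrence positions to restore the original order.
a = np.array([0, 0, 0, 0, 2, 2, 1, 1, 1])
_, first_idx = np.unique(a, return_index=True)
a[np.sort(first_idx)]
array([0, 2, 1])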

Apply an offset to the indices obtained from np.where

I have a 3d numpy array and I obtain the indices that meet a certain condition, for example:
a = np.tile([[1,2],[3,4]],(2,2,2))
indices = np.where(a == 2)
To these indices I need to apply an offset, for example (0, 0, 1), and check whether the shifted positions meet another condition.
Something like this:
offset = [0, 0, 1]
indices_shift = indices + offset
count = 0
for i in indices_shift:
    if a[i] == 3:
        count += 1
In this example, the indices look like:
indices = (array([0, 0, 0, 0, 1, 1, 1, 1], dtype=int64), array([0, 0, 2, 2, 0, 0, 2, 2], dtype=int64), array([1, 3, 1, 3, 1, 3, 1, 3], dtype=int64))
and I think that after adding the offset the result should be something like:
indices_shift = (array([0, 0, 0, 0, 1, 1, 1, 1], dtype=int64), array([0, 0, 2, 2, 0, 0, 2, 2], dtype=int64), array([2, 4, 2, 4, 2, 4, 2, 4], dtype=int64))
Is there any easy way to do that?
Thanks.
Here's one approach -
idx = np.argwhere(a == 2) + [0, 0, 1]
valid_mask = (idx < a.shape).all(1)
valid_idx = idx[valid_mask]
count = np.count_nonzero(a[tuple(valid_idx.T)] == 3)
Steps:
Get the indices of matches against 2. np.argwhere gives these in a convenient 2D array with one column per axis, which also makes the approach generic over the number of dimensions. Then add the offset in a broadcasted manner; this is idx.
Among the indices in idx, there may be a few invalid ones that go beyond the array shape. So, build a validity mask valid_mask and from it the valid indices valid_idx.
Finally, index into the input array with those, compare against 3, and count the number of matches.
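One caveat not covered in the original answer: valid_mask only checks the upper bound, which suffices for the non-negative offset used here. If the offset can be negative, a sketch of the extra lower-bound check would be:
valid_mask = ((idx >= 0) & (idx < a.shape)).all(1)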

Numpy finding element index in another array

I have an array/set with unique positive integers, i.e.
>>> unique = np.unique(np.random.choice(100, 4, replace=False))
And an array containing multiple elements sampled from this previous array, such as
>>> A = np.random.choice(unique, 100)
I want to map the values of the array A to the positions at which those values occur in unique.
So far the best solution I found is through a mapping array:
>>> table = np.zeros(unique.max()+1, unique.dtype)
>>> table[unique] = np.arange(unique.size)
The above stores, at the position of each unique value, its index within unique; table can then map A through advanced indexing:
>>> table[A]
array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0,
0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1,
3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2,
3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1,
3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)
This already gives me the proper solution. However, if the numbers in unique are large and sparse, this approach implies creating a very large table array just to store a few numbers for later mapping.
Is there any better solution?
NOTE: both A and unique are sample arrays, not the real ones. So the question is not how to generate positional indexes; it is how to efficiently map the elements of A to indexes in unique. The pseudocode of what I'd like to speed up in numpy is as follows,
B = np.zeros_like(A)
for i in range(A.size):
    B[i] = unique.index(A[i])
(assuming unique is a list in the above pseudocode).
The table approach described in your question is the best option when unique is fairly dense, but unique.searchsorted(A) should produce the same result without requiring unique to be dense. searchsorted is great with ints; if anyone is trying to do this kind of thing with floats, which have precision limitations, consider something like this.
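A minimal sketch of the integer searchsorted approach (the sample values are illustrative, not from the question; unique must be sorted, which np.unique already guarantees):
>>> unique = np.array([3, 17, 42, 90])
>>> A = np.array([42, 3, 90, 3, 17])
>>> unique.searchsorted(A)
array([2, 0, 3, 0, 1])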
You can use a standard Python dict with np.vectorize (note that np.vectorize is essentially a Python-level loop, so this is convenient rather than fast):
inds = {e:i for i, e in enumerate(unique)}
B = np.vectorize(inds.get)(A)
The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index, which does not require memory proportional to the max element, but only proportional to the input itself:
import numpy_indexed as npi
npi.indices(unique, A)
Note that it also works for arbitrary dtypes and dimensions. Also, the array being queried does not need to be unique; the first index encountered is returned, the same as with list.index.
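For instance, a usage sketch with the same illustrative arrays as in the searchsorted sketch above (not from the original answer):
>>> import numpy_indexed as npi
>>> unique = np.array([3, 17, 42, 90])
>>> A = np.array([42, 3, 90, 3, 17])
>>> npi.indices(unique, A)
array([2, 0, 3, 0, 1])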
