Two arrays (one holding row indices, the other the number of repetitions): how to remove entries based on the number of repetitions (Python)

I am working in a colab with some dataframes and I have two numpy arrays:
-First one indicates the index of a row.
-The other one indicates the number of repetitions (I did some methods before all this).
If I print both arrays I get something like this:
print(uniqueValues, occurCount)
OUTPUT: [ 13 33 66 ... 99907 99911 99928] [7 1 6 ... 1 6 4]
We can interpret it as: 13 is repeated 7 times, 33 is repeated 1 time, and so on.
Now the question:
How can I remove entries from both arrays based on the number of repetitions?
Example:
if the count is < 5, remove the element
Expected output: [ 13 66 ... 99911] [7 6 ... 6]

You can use the matching values from occurCount as a filter on uniqueValues and occurCount using boolean indexing:
uniqueValues = uniqueValues[occurCount >= 5]
occurCount = occurCount[occurCount >= 5]
For example:
import numpy as np
uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
uniqueValues = uniqueValues[occurCount >= 5]
occurCount = occurCount[occurCount >= 5]
print(uniqueValues )
print(occurCount)
Output:
[ 13 66 99911]
[7 6 6]
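The same filter can also be written with the mask computed once and reused for both arrays, which keeps the two results aligned. A minimal sketch using the sample data from this answer; the name keep is introduced here for illustration only:

```python
import numpy as np

uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])

# Build the boolean mask once so both arrays are filtered identically.
keep = occurCount >= 5

uniqueValues = uniqueValues[keep]  # keeps 13, 66, 99911
occurCount = occurCount[keep]      # keeps 7, 6, 6

print(uniqueValues)
print(occurCount)
```

Computing the mask before either array is reassigned avoids any dependence on the order of the two filtering lines.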

uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
np.array([uniqueValues, occurCount])[:, occurCount >= 5]
returns a 2-D array with your results, but the logic is the same as in Nick's answer.

Collect the positions where occurCount is below the threshold of 5, then pass those positions to np.delete for both arrays. np.delete does not modify its input; it returns a new array, so the result has to be assigned back to the variables.
import numpy as np
uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
indexes = []
# Record the position of every count below the threshold
for index, item in enumerate(occurCount):
    if item < 5:
        indexes.append(index)
occurCount = np.delete(occurCount, indexes)
uniqueValues = np.delete(uniqueValues, indexes)
print(uniqueValues, occurCount)
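The positions to delete can also be found without a Python loop. A small sketch, assuming the same sample data; np.flatnonzero returns the indices where a condition holds, and to_drop is a name chosen here for illustration:

```python
import numpy as np

uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])

# Positions whose count is below the threshold, found in one vectorized call.
to_drop = np.flatnonzero(occurCount < 5)

# np.delete returns new arrays, so assign the results back.
uniqueValues = np.delete(uniqueValues, to_drop)
occurCount = np.delete(occurCount, to_drop)

print(uniqueValues, occurCount)
```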

Related

In python numpy, how to replace some rows in array A with array B if we know the index

In python numpy, how to replace some rows in array A with array B if we know the index.
For example
we have
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[10,10],[1000, 1000]])
index = [0,2]
I want to change a to
a = np.array([[10,10],[3,4],[1000,1000]])
I have considered the function np.where, but it requires building a boolean condition, which is not very convenient.

I would do it following way
import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[10,10],[1000, 1000]])
index = [0,2]
a[index] = b
print(a)
gives output
[[ 10 10]
[ 3 4]
[1000 1000]]

You can use :
a[index] = b
For example :
import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[10,10],[1000, 1000]])
index = [0,2]
a[index] = b
print(a)
Result :
[[ 10 10]
[ 3 4]
[1000 1000]]

In NumPy you can replace some rows in array A with the rows of array B by assigning through a list of row indices. Note that numpy.put() is not suited to this: it interprets the indices as positions in the flattened array, so it would overwrite individual elements rather than whole rows. Here's an example:
import numpy as np
# Initialize array A
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Initialize array B
B = np.array([[10, 20, 30], [40, 50, 60]])
# Indices of the rows to be replaced in array A
indices = [0, 1]
# Replace rows in array A with rows in array B
A[indices] = B
print(A)
In this example, the first two rows in array A are replaced with the two rows of array B, so the output will be
[[10 20 30]
[40 50 60]
[ 7 8 9]]

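Simply a[indices] = b. Note that np.put(a, indices, b) is not equivalent: it treats indices as positions in the flattened array, so it replaces single elements instead of rows.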
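To see how the two calls behave on the question's data, here is a short comparison. It is only a sketch; the names by_rows and by_put are introduced for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[10, 10], [1000, 1000]])
index = [0, 2]

# Index assignment: index selects whole rows of a.
by_rows = a.copy()
by_rows[index] = b

# np.put: index selects positions in the flattened array,
# so only elements 0 and 2 (values 1 and 3) are overwritten.
by_put = a.copy()
np.put(by_put, index, b)

print(by_rows)
print(by_put)
```

Only the index assignment produces the array asked for in the question.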

How to calculate moving average of NumPy array with varying window sizes defined by an array of indices?

Which is the most pythonic way to average the values in a 2d array (axis=1) based on a range in a 1d array?
I am trying to average arrays of environmental variables (my 2d array) based on every 2 degrees of latitude (my 1d array). I have a latitude array that goes from -33.9 to 29.5. I'd like to average the environmental variables within every 2 degrees from -34 to 30.
The number of elements within each 2 degrees may be different, for example:
arr = array([[5,3,4,5,6,4,2,4,5,8],
[4,5,8,5,2,3,6,4,1,7],
[8,3,5,8,5,2,5,9,9,4]])
idx = array([1,1,1,2,2,3,3,3,3,4])
I would then average the values in arr based on idx[0:3], idx[3:9], idx[9].
I would like to get a result of:
arrAvg = array([[4,4.3,8],
[5.7,3.5,7],
[5.3,6.3,4]])

@Andyk already explained in his post how to calculate the average having a list of indices.
I will provide a solution for getting those indices.
Here is a general approach:
from typing import Optional
import numpy as np
def get_split_indices(array: np.ndarray,
                      *,
                      window_size: int,
                      start_value: Optional[int] = None) -> np.ndarray:
    """
    :param array: input array with consecutive integer indices
    :param window_size: specifies range of indices
    which will be included in a separate window
    :param start_value: from which the window will start
    :return: array of indices marking the borders of the windows
    """
    if start_value is None:
        start_value = array[0]
    # Positions where the value changes from one element to the next
    diff = np.diff(array)
    diff_indices = np.where(diff)[0] + 1
    # Keep every window_size-th change, offset by the starting value
    slice_ = slice(window_size - 1 - (array[0] - start_value) % window_size,
                   None,
                   window_size)
    return diff_indices[slice_]
Examples of usage:
Checking it with your example data:
# indices: 3 9
idx = np.array([1,1,1, 2,2,3,3,3,3, 4])
you can get the indices separating different windows like this:
get_split_indices(idx,
window_size=2,
start_value=0)
>>> array([3, 9])
With this function you can also specify different window sizes:
# indices: 7 11 17
idx = np.array([0,1,1,2,2,3,3, 4,5,6,7, 8,9,10,11,11,11, 12,13])
get_split_indices(idx,
window_size=4,
start_value=0)
>>> array([ 7, 11, 17])
and different starting values:
# indices: 1 7 10 13 18
idx = np.array([0, 1,1,2,2,3,3, 4,5,6, 7,8,9, 10,11,11,11,12, 13])
get_split_indices(idx,
window_size=3,
start_value=-2)
>>> array([ 1, 7, 10, 13, 18])
Note that I made the first element of array a starting value by default.

You could use the np.hsplit function. For your example of indices 0:3, 3:9, 9 it goes like this:
np.hsplit(arr, [3, 9])
which gives you a list of arrays:
[array([[5, 3, 4],
[4, 5, 8],
[8, 3, 5]]),
array([[5, 6, 4, 2, 4, 5],
[5, 2, 3, 6, 4, 1],
[8, 5, 2, 5, 9, 9]]),
array([[8],
[7],
[4]])]
Then you can compute the mean as follows:
m = [np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])]
And convert it back to an array:
np.vstack(m).T
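The split-and-average steps can also be combined into one vectorized helper. The following is a sketch rather than a definitive implementation: window_means is a hypothetical name, idx is assumed to be sorted in non-decreasing order, and np.add.reduceat is used to sum each block of columns.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])
idx = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 4])


def window_means(arr, idx, window_size, start_value=0):
    """Average the columns of arr within windows of idx (idx assumed sorted)."""
    # Window number of every column, e.g. idx 2 and 3 share a window of size 2.
    labels = (idx - start_value) // window_size
    # Column positions where a new window starts.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(labels)) + 1))
    # Sum each block of columns, then divide by the block lengths.
    sums = np.add.reduceat(arr, starts, axis=1)
    counts = np.diff(np.concatenate((starts, [arr.shape[1]])))
    return sums / counts


arr_avg = window_means(arr, idx, window_size=2)
print(arr_avg)
```

For the sample data the windows start at columns 0, 3 and 9, the same borders that are passed to np.hsplit above.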

Grouping elements of a numpy array using an array of group counts

Given two arrays, one representing a stream of data, and another representing group counts, such as:
import numpy as np
# given group counts: 3 4 3 2
# given flattened data:[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]
group_counts = np.array([3,4,3,2])
data = np.arange(group_counts.sum()) # placeholder data, real life application will be a very large array
I want to generate matrices based on the group counts for the streamed data, such as:
target_count = 3 # I want to make a matrix of all data items whose group_counts = target_count
# Expected result
# [[ 0 1 2]
# [ 7 8 9]]
To do this I wrote the following:
# Find all matches
match = np.where(group_counts == target_count)[0]
i1 = np.cumsum(group_counts)[match] # end index for slicing
i0 = i1 - group_counts[match] # start index for slicing
# Prep the blank matrix and fill with results
matched_matrix = np.empty((match.size,target_count))
# Is it possible to get rid of this loop?
for i in range(match.size):
    matched_matrix[i] = data[i0[i]:i1[i]]
matched_matrix
# Result: array([[0., 1., 2.],
#                [7., 8., 9.]])
This works, but I would like to get rid of the loop and I can't figure out how.
Doing some research I did find numpy.split and numpy.array_split:
match = np.where(group_counts == target_count)[0]
match = np.array(np.split(data,np.cumsum(group_counts)), dtype=object)[match]
# Result: array([array([0, 1, 2]), array([7, 8, 9])], dtype=object) #
But numpy.split produces a list of dtype=object that I have to convert.
Is there an elegant way to produce the desired result without a loop?

You can repeat group_counts so it has the same size as data, then filter and reshape based on the target:
group_counts = np.array([3,4,3,2])
data = np.arange(group_counts.sum())
target = 3
data[np.repeat(group_counts, group_counts) == target].reshape(-1, target)
#array([[0, 1, 2],
# [7, 8, 9]])
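To make the intermediate step visible, the one-liner can be unpacked as below. This is just an illustrative sketch; the names labels, mask and result are not part of the original answer:

```python
import numpy as np

group_counts = np.array([3, 4, 3, 2])
data = np.arange(group_counts.sum())
target = 3

# One label per data element: the size of the group that element belongs to.
labels = np.repeat(group_counts, group_counts)  # 3 3 3 4 4 4 4 3 3 3 2 2

# Keep only elements from groups of the target size, one group per row.
mask = labels == target
result = data[mask].reshape(-1, target)

print(result)
```

The reshape works because every kept group has exactly target elements and the groups are stored contiguously in data.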

NumPy random shuffle rows independently

I have the following array:
import numpy as np
a = np.array([[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3]])
I understand that np.random.shuffle(a.T) will shuffle the array along the row, but what I need is for it to shuffle each row independently. How can this be done in numpy? Speed is critical as there will be several million rows.
For this specific problem, each row will contain the same starting population.

import numpy as np
np.random.seed(2018)
def scramble(a, axis=-1):
    """
    Return an array with the values of `a` shuffled along the
    given axis, using one permutation shared by all slices
    """
    b = a.swapaxes(axis, -1)
    n = a.shape[axis]
    idx = np.random.choice(n, n, replace=False)
    b = b[..., idx]
    return b.swapaxes(axis, -1)
a = np.arange(4*9).reshape(4, 9)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
# [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23, 24, 25, 26],
# [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3 8 7 0 4 5 1 2 6]
[12 17 16 9 13 14 10 11 15]
[21 26 25 18 22 23 19 20 24]
[30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
[ 0 1 2 3 4 5 6 7 8]
[27 28 29 30 31 32 33 34 35]
[ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes returns a view, there is no copying involved and so calling swapaxes is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
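Now we can reorder b along the last axis (the same permutation idx is applied to every slice, which is why each row in the output above has the same column order):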
b = b[..., idx]
and then reverse the swapaxes to return an a-shaped result:
return b.swapaxes(axis, -1)
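If each row needs its own permutation, one common alternative is to argsort an array of random keys and gather with np.take_along_axis. This is a different technique from scramble above, sketched under the assumption that the new Generator API (np.random.default_rng) is available; order and shuffled are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(2018)
a = np.arange(4 * 9).reshape(4, 9)

# Random keys with the same shape as a; sorting the keys along axis 1
# yields a separate random permutation of column indices for every row.
order = rng.random(a.shape).argsort(axis=1)

# Gather the elements of each row in that row's own random order.
shuffled = np.take_along_axis(a, order, axis=1)

print(shuffled)
```

Every row of shuffled contains the same values as the corresponding row of a, only in a row-specific order.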

If you don't want a return value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
[1, 2, 3],
[3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
[1, 2, 3],
[1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation, in which case replace np.random.shuffle(a[n]) with a[n] = np.random.permutation(a[n]).
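Warning: do not do a[n] = np.random.shuffle(a[n]). shuffle works in place and returns None, so the assignment does not store a shuffled row: on a float array the row/column you end up "shuffling" will be filled with nan, and on an integer array like the one above the assignment raises an error.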

Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
ignore_list_output = [np.random.shuffle(x) for x in a]
Then, a can be something like this
array([[2, 1, 3],
[4, 6, 5],
[9, 7, 8]])
Not very elegant but you can get this job done with just one short line.

Building on my comment to @Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
    """Minimal in place independent-row shuffler."""
    [np.random.shuffle(x) for x in X]
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
    """More elaborate version of the above."""
    if not inline:
        X = X.copy()
    if axis == 0:
        [np.random.shuffle(x) for x in X]
    if axis == 1:
        [np.random.shuffle(x) for x in X.T]
    if not inline:
        return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
    """Shuffle along any axis of a tensor."""
    if not inline:
        X = X.copy()
    np.apply_along_axis(np.random.shuffle, axis, X) # <-- I just changed this
    if not inline:
        return X

You can do it with NumPy without any loop or extra function, and much faster. E.g., we have an array of shape (6, 2) and we want a sub-array of shape (2, 2) with an independent random index for each column.
import numpy as np
test = np.array([[1, 1],
[2, 2],
[0.5, 0.5],
[0.3, 0.3],
[4, 4],
[7, 7]])
id_rnd = np.random.randint(6, size=(2, 2)) # select random row indices; use choice and range if you don't want replacement.
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
[0.5, 2. ]])
It works for any number of dimensions.

As of NumPy 1.20.0 released in January 2021 we have a permuted() method on the new Generator type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
[1 3 2]
[2 1 3]]
As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation() and rng.shuffle().
If you want an in-place update you can pass the original array as the out keyword argument. And you can use the axis keyword argument to choose the direction along which to shuffle your array.

Operations on 'N' dimensional numpy arrays

I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
    b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
    c[:,jj] = operation(b[:,jj])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});

In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
   .....:     kx = int(np.floor(nx/2.))
   .....:     kx = kx-1;
   .....:     idx.append(np.r_[kx:nx, 0:kx])
   .....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
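Since the MATLAB index lists are cyclic shifts, the same result can be sketched with np.roll applied to all axes at once. This is an illustration rather than a drop-in translation: foldfft_roll is a hypothetical name, and the shift of -(nx // 2 - 1) per axis is derived from the index construction shown above.

```python
import numpy as np


def foldfft_roll(ain):
    """Cyclically shift every axis like the MATLAB foldfft index lists."""
    # MATLAB builds [kx:nx 1:kx-1] with kx = floor(nx/2) (1-based),
    # which is a left rotation by kx - 1 positions along that axis.
    shifts = tuple(-(nx // 2 - 1) for nx in ain.shape)
    return np.roll(ain, shifts, axis=tuple(range(ain.ndim)))


ain = np.arange(2 * 3 * 4).reshape(2, 3, 4)
out = foldfft_roll(ain)
print(out)
```

For the (2, 3, 4) example only the last axis is shifted, matching the np.ix_ result above.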

If you are looking for a programmatic way to index the k-th dimension of an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
    result = ain
    nd = len(ain.shape)
    for k in range(nd):
        nx = ain.shape[k]
        kx = (nx+1)//2
        shifted_index = list(range(kx,nx)) + list(range(kx))
        result = np.take(result, shifted_index, k)
    return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]

You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
    b.flat[i] = operation(x.flat[i])

The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
    # Set the first and last element of the 1D vector to 1
    s[0]=1
    s[-1]=1
    return s
a = np.random.rand(4,4,3)
print('------')
print('Array a')
print(a)
print('------')
# Apply ops along each axis in turn
for ii in np.arange(a.ndim):
    a = np.apply_along_axis(ops,ii,a)
    print('------')
    print(' Axis',str(ii))
    print(a)
    print('------')
    print(' ')
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page
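The same idea can be wrapped in a small reusable helper. This is a sketch under stated assumptions: apply_along_all_axes and set_ends_to_one are hypothetical names, and the 1D function copies its input so the original array is left untouched.

```python
import numpy as np


def apply_along_all_axes(func1d, a):
    """Apply a 1D function along every axis of a, one axis after another."""
    out = np.asarray(a)
    for axis in range(out.ndim):
        out = np.apply_along_axis(func1d, axis, out)
    return out


def set_ends_to_one(v):
    """Toy 1D operation: return a copy of v with first and last element set to 1."""
    v = v.copy()
    v[0] = 1
    v[-1] = 1
    return v


a = np.zeros((4, 4, 3))
b = apply_along_all_axes(set_ends_to_one, a)
print(b)
```

Because the loop runs over range(out.ndim), the helper needs no special cases for 1D, 2D or higher-dimensional input.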
