Splitting an array into two smaller arrays in Python

I have an array of size 80x40 and want to send each row into one of two smaller arrays based on the value in a specific column (10). I have code similar to the snippet below, but this ends up flattening the array. I don't know the Y dimensions of the output arrays (Array2, Array3). I guess I could have some code count all the values above and below 50 to get the Y dimensions of the output arrays, then make 2 output arrays of np.zeros(Array.shape[0],Yvalues) and append row by row to those, but I'm still not sure how that would work.
# Array.shape == (80, 40)
Array2 = []
Array3 = []
for x in range(0, Array.shape[0]):
    if Array[x, 10] < 50:
        Array2.append(Array[x, :])
    else:
        Array3.append(Array[x, :])

As a smaller example:
a = np.array([[1, 10], [1, 20], [2, 30], [2, 40], [1, 50], [3, 60], [1, 70]])
a2 = a[a[:, 0] < 1.5]
a3 = a[a[:, 0] >= 1.5]
a2 is now:
array([[ 1, 10],
       [ 1, 20],
       [ 1, 50],
       [ 1, 70]])
and a3 is now:
array([[ 2, 30],
       [ 2, 40],
       [ 3, 60]])
So in your case, use:
a2 = a[a[:, 10] < 50]
a3 = a[a[:, 10] >= 50]
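A minimal sketch of the same mask applied to an 80x40 array, assuming random integer data stands in for the real Array (the data and the name mask are illustrative, not from the question). It shows that both outputs stay two-dimensional and that their row counts do not need to be known in advance:

```python
import numpy as np

# Hypothetical stand-in for the 80x40 array from the question.
rng = np.random.default_rng(0)
Array = rng.integers(0, 100, size=(80, 40))

# Boolean mask with one entry per row, based on column 10.
mask = Array[:, 10] < 50

Array2 = Array[mask]    # rows where column 10 is below 50
Array3 = Array[~mask]   # all remaining rows

print(Array2.shape, Array3.shape)  # both keep 40 columns
```

Computing the mask once and negating it avoids evaluating the comparison twice.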

Related

Function over each value in Python Array (without using def)

The input array x has dimensions 1 x 3 and the output array is 3 x 3 (columns of input x columns of input). The diagonal entries of the output are the squared input values. Where row != column, the entry is x(row)+x(col). The input is currently 1 x 3, but the code should handle inputs of other sizes. I cannot use 'def'. My current code does not work; what would you recommend?
x = np.array([[0, 5, 10]])
output array formulas =
[[i^2, x(row)+x(col), x(row)+x(col)]
[x(row)+x(col), i^2, x(row)+x(col)]
[x(row)+x(col), x(row)+x(col), i^2]]
# where row and column refer to the output matrix row, column. For example, the value in (1,2) is x(1)+x(2)= 5
ideal output =
[[0 5 10]
[5 25 15]
[10 15 100]]
Code Attempted:
x = np.array([[0, 5, 10]])
r, c = np.shape(x)
results = np.zeros((c, c))
g[range(c), range(c)] = x**2
for i in x:
    for j in i:
        results[i,j] = x[i]+x[j]
Learn to use numpy methods and broadcasting:
>>> x
array([[ 0,  5, 10]])
>>> x.T
array([[ 0],
       [ 5],
       [10]])
>>> x.T + x
array([[ 0,  5, 10],
       [ 5, 10, 15],
       [10, 15, 20]])
>>> result = x.T + x
>>> result
array([[ 0,  5, 10],
       [ 5, 10, 15],
       [10, 15, 20]])
Then this handy built-in:
>>> np.fill_diagonal(result, x**2)
>>> result
array([[  0,   5,  10],
       [  5,  25,  15],
       [ 10,  15, 100]])
This can replace the results[range(c), range(c)] = x**2 assignment.
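The broadcasting sum and the diagonal fill shown above can be combined into a short sketch that works for a 1 x n input of any length and uses no def. The name v is introduced here only for illustration:

```python
import numpy as np

x = np.array([[0, 5, 10]])        # 1 x n input; n may vary

v = x.ravel()                     # the n values as a 1-D array
result = v[:, None] + v[None, :]  # result[i, j] = v[i] + v[j] via broadcasting
np.fill_diagonal(result, v**2)    # overwrite the diagonal with the squares

print(result)
# [[  0   5  10]
#  [  5  25  15]
#  [ 10  15 100]]
```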
Try this:
x = x.repeat(x.shape[1], axis=0)  # repeat returns a new array, so assign it
x = x + x.T
x[np.arange(len(x)), np.arange(len(x))] = (np.diag(x) / 2)**2

Apply np.vectorize along one axis

Say I have two arrays arr1 and arr2:
arr1 = [0, 1, 2]
arr2 = [
[0, 1, 2],
[3, 4, 5],
[6, 7, 8],
]
And say I have a function that does something to the elements of this array:
def func(arr):
    new_arr = arr.copy()
    new_arr[0] = new_arr[0] * 2
    new_arr[1] = new_arr[1] * 10
    new_arr[2] = new_arr[2] * 100
    return new_arr
Now I want to vectorize this, so that it works for both arr1 and arr2:
func(arr1)
# returns [0, 10, 200]
func(arr2)
# returns
# [0, 10, 200],
# [6, 40, 500],
# [12, 70, 800],
np.vectorize doesn't work because it breaks down each and every element in my array parameter. I want it to apply the function only along the first axis.
np.apply_along_axis almost works, except it won't consider 1-D array parameter to be a single parameter.
What's the best way to do this?
You can just directly multiply the arrays. It works thanks to numpy broadcasting:
factor = np.array([2, 10, 100])
arr1 * factor
array([  0,  10, 200])
arr2 * factor
array([[  0,  10, 200],
       [  6,  40, 500],
       [ 12,  70, 800]])
If you take time to read the np.vectorize docs, you'll eventually encounter the signature option:
In [27]: f = np.vectorize(func, signature='(n)->(n)')
In [28]: f(arr1)
Out[28]: array([  0,  10, 200])
In [29]: f(arr2)
Out[29]:
array([[  0,  10, 200],
       [  6,  40, 500],
       [ 12,  70, 800]])
And reading a bit further you'll encounter the caveats about performance.
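A small sketch, assuming the func from the question, that checks the signature-based np.vectorize against plain broadcasting. The NumPy docs describe np.vectorize as essentially a for loop provided for convenience, so broadcasting is usually the faster choice when it applies. The names same_1d and same_2d are illustrative:

```python
import numpy as np

def func(arr):
    new_arr = arr.copy()
    new_arr[0] = new_arr[0] * 2
    new_arr[1] = new_arr[1] * 10
    new_arr[2] = new_arr[2] * 100
    return new_arr

arr1 = np.array([0, 1, 2])
arr2 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
factor = np.array([2, 10, 100])

# signature='(n)->(n)' makes np.vectorize pass whole 1-D rows to func
f = np.vectorize(func, signature='(n)->(n)')

same_1d = np.array_equal(f(arr1), arr1 * factor)
same_2d = np.array_equal(f(arr2), arr2 * factor)
print(same_1d, same_2d)
```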
Just do this:
import numpy as np
a = np.array([0, 1, 2])
b = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8],
])
c = np.array([2, 10, 100])
print(a*c)
print(b*c)
Output:
[  0  10 200]
[[  0  10 200]
 [  6  40 500]
 [ 12  70 800]]

Average of elements in a 2d list

I have a list like the following:
[[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
I'm trying to build a new list with the same number of sub-lists, where each element is replaced by the average of that element and the elements before and after it.
What do I mean by that?
Let's say I have the sub-list sub = [7, 7] at index 1. I want this sub-list to become [9, 9], because sub[0] + lst_before_sub[0] + lst_after_sub[0] = 7 + 1 + 20 = 28, and 28 // 3 = 9 (I want integer division).
The ideal output would be:
[[4, 4], [9, 9], [12, 12], [5, 5], [-1, -1]]
I have currently this code:
copy_l = copy.deepcopy(audio_data)
sub_list = []
for i in range(0, len(audio_data)-1):
    sub_data = []
    for j in range(2):
        if i == 0:
            audio_data[i][j] += int(audio_data[i+1][j] / 2)
            sub_data.append(audio_data[i][j])
        elif audio_data[i+1] == audio_data[-1]:
            audio_data[i+1][j] = int((audio_data[i+1][j]+audio_data[i][j])/2)
            sub_data.append(audio_data[i+1][j])
        else:
            audio_data = copy_l
            audio_data[i][j] = int((audio_data[i-1][j] + audio_data[i][j] + audio_data[i+1][j])/3)
            sub_data.append(audio_data[i][j])
    sub_list.append(sub_data)
print(sub_list)
where audio_data is the list [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]] that I passed in.
(I have separated the average calculation into three cases:
- First element of the list: [1,1], so the average is just (1 + 7) // 2 (no element before [1,1])
- Last element of the list: [-12,-12], so the average is just (-12 + 9) // 2 (no element after [-12,-12])
- All the elements in between
)
Problem is, my output (sub_list) is:
[[4, 4], [9, 9], [12, 12], [-1, -1]]
And it seems that [9,9] never turns into [5,5]
Does someone have any idea how to achieve what I want, or even an idea to make it simpler ? I hope I was clear enough, if not feel free to ask me more details, thank you!
EDIT:
I'm seeking a solution without numpy, list comprehension or zip.
Here is a way to do it:
data = [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
out = []
for i in range(len(data)):
    first = max(i - 1, 0)  # don't let the slice start below 0
    last = min(i + 2, len(data))  # nor run beyond the end of the list
    mean = [sum(col) // (last - first) for col in zip(*data[first:last])]
    out.append(mean)
print(out)
# [[4, 4], [9, 9], [12, 12], [5, 5], [-2, -2]]
We take slices of data around the current item.
Then, we zip the sublists, and we calculate the result on the first (resp. second) values of the sublists.
Also, note that Python's integer division floors the result, so we get -2 for -3//2, not the -1 you got by rounding toward 0. If you really want that, you'll have to use a custom function for the division.
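Since the edit to the question asks for a solution without numpy, list comprehensions or zip, here is a sketch of the same sliding-window idea using only plain loops. The function name neighbour_average and the truncate flag are hypothetical names chosen for this example; truncate=True uses int(total / count), which rounds toward zero as in the question's expected output:

```python
def neighbour_average(data, truncate=False):
    """Average each sub-list element with its neighbours, using plain loops."""
    out = []
    n = len(data)
    for i in range(n):
        first = max(i - 1, 0)   # clamp the window at the start of the list
        last = min(i + 2, n)    # and at the end of the list
        count = last - first    # 2 at the edges, 3 in the middle
        row = []
        for j in range(len(data[i])):
            total = 0
            for k in range(first, last):
                total += data[k][j]
            if truncate:
                row.append(int(total / count))  # rounds toward zero
            else:
                row.append(total // count)      # floors, like the answer above
        out.append(row)
    return out


data = [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
print(neighbour_average(data))
# [[4, 4], [9, 9], [12, 12], [5, 5], [-2, -2]]
print(neighbour_average(data, truncate=True))
# [[4, 4], [9, 9], [12, 12], [5, 5], [-1, -1]]
```

The input list is never modified, which avoids the problem in the question of averages being computed from already-updated values.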
Here's a NumPy solution:
import numpy as np
def mean3(data):
    return np.convolve(np.r_[data[:2].mean(), data, data[-2:].mean()], np.ones(3), 'valid')//3
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4.,  4.],
       [ 9.,  9.],
       [12., 12.],
       [ 5.,  5.],
       [-2., -2.]])
Or, if you prefer the int(x/y) definition of integer division:
import numpy as np
def mean3(data):
    return (np.convolve(np.r_[data[:2].mean(), data, data[-2:].mean()], np.ones(3), 'valid')/3).astype(int)
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4,  4],
       [ 9,  9],
       [12, 12],
       [ 5,  5],
       [-1, -1]])

how to understand such shuffling data code in Numpy

I am learning NumPy and want to understand the following data-shuffling code:
# x is a m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
I pass in an np.array object like this:
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
and get output from shuffle_col_vals(x1) such as:
array([[ 1,  5, 11, 15],
       [ 3,  8,  9, 14],
       [ 4,  6, 12, 16],
       [ 2,  7, 10, 13]], dtype=int64)
I am confused by how rand_x is constructed; I have not seen this pattern with numpy.array before.
I have been thinking about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] produces a shuffled-rows array.
If you don't mind, could anyone explain the code to me?
Thanks in advance.
When indexing NumPy arrays, you can take single elements. Let's use a 3x4 array to be able to differentiate between the axes:
In [1]: x1 = np.array([[1, 2, 3, 4],
   ...:                [5, 6, 7, 8],
   ...:                [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review NumPy's advanced indexing, you will find that you can do more by providing a list for each dimension. With indexing of the form x1[rows..., cols...], let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
       [5, 6]])
np.indices creates a row index array and a column index array that, if used for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.all(x1[grid[0], grid[1]] == x1)
Out[6]: True
Now if you shuffle the values of grid[0] column-wise, but keep grid[1] as-is, and then use these for indexing, you get an array in which the values within each column are shuffled.
Each column of grid[0] is the row-index vector [0, 1, 2]. The code shuffles this vector independently for each column and stacks the results into rand_x, which has the same shape as x1.
Create a single shuffled index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
    ...:             for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
       [1, 0, 2, 0],
       [0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10,  3, 12],
       [ 5,  2, 11,  4],
       [ 1,  6,  7,  8]])
Details to note:
- The example output you give differs from what the function you provide actually does. It seems to be transposed.
- The names rand_x and rand_y in the sample code can be confusing if you are used to the convention x = column index, y = row index.
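For comparison, here is a sketch of a loop-free way to get the same kind of per-column shuffle, assuming NumPy 1.17 or later (for np.random.default_rng and np.take_along_axis). This is an alternative technique, not what the code in the question does: argsort of random numbers yields an independent permutation of the row indices for each column. The names rand_rows and shuffled are illustrative:

```python
import numpy as np

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])

rng = np.random.default_rng(0)

# Sorting random numbers along axis 0 gives, for every column,
# an independent random permutation of the row indices 0..m-1.
rand_rows = np.argsort(rng.random(x1.shape), axis=0)

# Pick x1[rand_rows[i, j], j] for every position (i, j).
shuffled = np.take_along_axis(x1, rand_rows, axis=0)

print(shuffled)
```

Each column of shuffled contains the same values as the matching column of x1, only in a different order.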
See output:
import numpy as np
def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]

Multiplying two 2D numpy arrays to a 3D array

I've got two 2D numpy arrays called A and B, where A is M x N and B is M x n. I wish to multiply each element of each row of B with the corresponding row of A and create a 3D array C of size M x n x N, without using for-loops.
As an example, if A is:
A = np.array([[1, 2, 3],
[4, 5, 6]])
and B is
B = np.array([[1, 2],
[3, 4]])
Then the resulting multiplication C = A x B would look something like
C = [
[[1, 2],
[12, 16]],
[[2, 4],
[15, 20]],
[[3, 6],
[18, 24]]
]
Is it clear what I'm trying to achieve, and is it possible to do it without any for-loops? Best, tingis
C = np.einsum('ij,ik->jik', A, B)
It is possible by creating a new axis in each array and transposing the modified A:
A[np.newaxis,...].T * B[np.newaxis,...]
giving:
array([[[ 1,  2],
        [12, 16]],
       [[ 2,  4],
        [15, 20]],
       [[ 3,  6],
        [18, 24]]])
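A quick check, using the A and B from the question, that the two answers agree. Note that both produce an array of shape (N, M, n), matching the example output; if the (M, n, N) layout mentioned in the question text is wanted instead, reordering the einsum output subscripts to 'ikj' is one way to get it. The names C_einsum, C_broadcast and C_mnN are illustrative:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # M x N
B = np.array([[1, 2],
              [3, 4]])         # M x n

# C[j, i, k] = A[i, j] * B[i, k], shape (N, M, n)
C_einsum = np.einsum('ij,ik->jik', A, B)
C_broadcast = A[np.newaxis, ...].T * B[np.newaxis, ...]

# Same products arranged as (M, n, N): C[i, k, j] = A[i, j] * B[i, k]
C_mnN = np.einsum('ij,ik->ikj', A, B)

print(C_einsum.shape, np.array_equal(C_einsum, C_broadcast), C_mnN.shape)
# (3, 2, 2) True (2, 2, 3)
```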
