As you can see below, I have created three arrays that contain different random numbers:
import numpy as np
import pandas as pd
np.random.seed(200)
Array1 = np.random.randn(300)
Array2 = Array1 + np.random.randn(300) * 2
Array3 = Array1 + np.random.randn(300) * 2
data = np.array([Array1, Array2 , Array3])
#data.reshape(data, (Array3, Array1)
mydf = pd.DataFrame(data)
mydf.tail()
My objective is to build a DataFrame from those three arrays, with each array's values in its own column, so the DataFrame should have three columns plus the index. My problem with the above code is that the DataFrame is built horizontally instead of vertically: it ends up with 3 rows and 300 columns.
I have tried to use the reshape function to reshape the NumPy array called "data", but I couldn't make it work. Any help would be more than welcome. Thanks!
You can use .T to transpose either the data, data = np.array([Array1, Array2, Array3]).T, or the DataFrame, mydf = pd.DataFrame(data).T.
Output:
0 1 2
295 -0.126758 1.697413 0.399351
296 0.548405 1.402154 -4.396156
297 -1.063243 0.279774 -0.636649
298 -0.678952 -2.061554 0.244339
299 -0.527970 -0.290680 -0.930381
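Or build a 2D array right away:
arr = np.random.randn(300, 3)
# columns 1 and 2 become column 0 plus noise scaled by 2, as in the question
arr[:, 1:] = arr[:, [0]] + 2 * arr[:, 1:]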
mydf = pd.DataFrame(arr)
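Another option, shown here only as a minimal sketch, is to pass a dict of the 1D arrays to pd.DataFrame. Each array then becomes one column, and the columns get names instead of 0, 1, 2. The column labels below are an assumption chosen to match the variable names in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(200)
Array1 = np.random.randn(300)
Array2 = Array1 + np.random.randn(300) * 2
Array3 = Array1 + np.random.randn(300) * 2

# Each dict key becomes a column label; each 1D array becomes one column.
mydf = pd.DataFrame({"Array1": Array1, "Array2": Array2, "Array3": Array3})

# The values match the transposed stacked array from the answer above.
stacked = np.array([Array1, Array2, Array3]).T
print(mydf.shape)
print(np.allclose(mydf.to_numpy(), stacked))
```

This avoids the transpose entirely, at the cost of spelling out the column names.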
Related
I am trying to put two NumPy arrays into a matrix or horizontally stack them. Each array is 76 elements long, and I want the ending matrix to have 76 rows and 2 columns. I basically have a velocity/frequency model and want to have two columns with corresponding frequency/velocity values in each row.
Here is my code ('f' is frequency and 'v' the velocity values, previously already defined):
print(f.shape)
print(v.shape)
print(type(f))
print(type(v))
x = np.concatenate((f, v), axis = 1)
This returns
(76,)
(76,)
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
And an error about the concatenate line that says:
AxisError: axis 1 is out of bounds for array of dimension 1
I've also tried hstack instead of concatenate, as well as vstack and transposing with .T, and got the same error. I've also tried using Pandas, but I need to use NumPy, because when I save the result to a txt/dat file, Pandas adds an extra numbering column that I do not need.
Your problem is that your vectors are one-dimensional, like in this example:
f_1d = np.array([1,2,3,4])
print(f_1d.shape)
> (4,)
As you can see, the shape has only one entry: the array is one-dimensional, so it has no axis 1. Instead, you could create your vectors like this:
f = np.expand_dims(np.array([1,2,3,4]), axis=1)
v = np.expand_dims(np.array([5,6,7,8]), axis=1)
print(f.shape)
print(v.shape)
>(4,1)
>(4,1)
As you may notice, the second dimension is equal to one, but now your vector is represented in matrix form.
It is now possible to transpose the matrix-vectors:
f_t = f.T
v_t = v.T
print(f_t.shape)
> (1,4)
Instead of using concatenate, you could use vstack or hstack to create cleaner code:
x = np.hstack((f,v))
x_t = np.vstack((f_t,v_t))
print(x.shape)
print(x_t.shape)
>(4,2)
>(2,4)
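If the arrays have to stay one-dimensional, np.column_stack may be a shorter route to the 76 x 2 result, and np.savetxt writes it without any index column. The sketch below assumes placeholder values for f and v and a hypothetical file name; only the shapes are taken from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the 76-element frequency and velocity arrays.
f = np.linspace(1.0, 76.0, 76)
v = np.linspace(100.0, 250.0, 76)

# column_stack treats each 1D array as one column of the result.
x = np.column_stack((f, v))
print(x.shape)

# savetxt writes only the values, one row per line, with no index column.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.dat")  # hypothetical file name
    np.savetxt(path, x, fmt="%.6f")
    reloaded = np.loadtxt(path)

print(reloaded.shape)
```

Reading the file back with np.loadtxt is only there to confirm that the two-column layout survives the round trip.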
I have 3 different 2D numpy arrays, each having a different number of columns:
import numpy as np
arr1 = np.random.randn(10, 5)
arr2 = np.random.randn(10, 6)
arr3 = np.random.randn(10, 4)
Now I want to merge these 3 2D arrays into a single 3D array:
myList = []
myList.append(arr1)
myList.append(arr2)
myList.append(arr3)
arr3d = np.dstack(myList)
However, since the arrays have different numbers of columns, I get an error:
ValueError: all the input array dimensions for the concatenation axis
must match exactly, but along dimension 1, the array at index 0 has
size 5 and the array at index 1 has size 6
How can I create a 3D array by taking the minimum number of columns across the 3 2D arrays? In other words, the minimum number of columns across the 3 2D arrays is 4, so I want to drop every column after the 4th in all 2D arrays that have more than 4 columns.
Hope the following helps.
import numpy as np
a = []
a.append(np.random.randn(10, 5))
a.append(np.random.randn(10, 6))
a.append(np.random.randn(10, 4))
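# smallest number of columns across all arrays
k = min(arr.shape[1] for arr in a)
# result has shape (rows, k, number of arrays)
out = np.zeros((a[0].shape[0], k, len(a)))
for i in range(len(a)):
    # keep only the first k columns of each array
    out[:, :, i] = a[i][:, :k]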
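The same result can probably be obtained without the explicit loop by truncating each array first and then calling np.dstack on the truncated list. This is a sketch using the array shapes from the question; the names arrays and arr3d are assumptions for illustration.

```python
import numpy as np

arr1 = np.random.randn(10, 5)
arr2 = np.random.randn(10, 6)
arr3 = np.random.randn(10, 4)
arrays = [arr1, arr2, arr3]

# Smallest column count across the inputs (4 for these shapes).
k = min(arr.shape[1] for arr in arrays)

# Slice every array down to its first k columns, then stack along a new third axis.
arr3d = np.dstack([arr[:, :k] for arr in arrays])
print(arr3d.shape)  # (10, 4, 3)
```

Once all slices share the shape (10, 4), np.dstack no longer raises the ValueError.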
I have a 2D numpy array of 2D points:
np.random.seed(0)
a = np.random.rand(3, 4, 2) # each value is a 2D point
I would like to sort each row by the norm of every point
norms = np.linalg.norm(a, axis=2) # shape(3, 4)
indices = np.argsort(norms, axis=0) # indices of each sorted row
Now I would like to create an array with the same shape and values as a, but with each row of 2D points sorted by norm.
How can I achieve that?
I tried variations of np.take and np.take_along_axis, but with no success.
For example:
np.take(a, indices, axis=1) # shape (3,3,4,2)
This samples a three times, once for each row in indices.
I would like to sample a just once: each row of indices holds the columns that should be taken from the corresponding row of a.
If I understand you correctly, you want this:
norms = np.linalg.norm(a,axis=2) # shape(3,4)
indices = np.argsort(norms , axis=1)
np.take_along_axis(a, indices[:,:,None], axis=1)
Output for your example:
[[[0.4236548 0.64589411]
[0.60276338 0.54488318]
[0.5488135 0.71518937]
[0.43758721 0.891773 ]]
[[0.07103606 0.0871293 ]
[0.79172504 0.52889492]
[0.96366276 0.38344152]
[0.56804456 0.92559664]]
[[0.0202184 0.83261985]
[0.46147936 0.78052918]
[0.77815675 0.87001215]
[0.97861834 0.79915856]]]
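For comparison, the same reordering can be written with plain advanced indexing, which may make it clearer why the index array needs an extra axis in take_along_axis. The sketch below reuses the data from the question; the names rows, sorted_take and sorted_fancy are introduced here for illustration, and the last two lines check that the norms in each row end up in ascending order.

```python
import numpy as np

np.random.seed(0)
a = np.random.rand(3, 4, 2)  # 3 rows of 4 two-dimensional points

norms = np.linalg.norm(a, axis=2)    # shape (3, 4)
indices = np.argsort(norms, axis=1)  # per-row sort order, shape (3, 4)

# The trailing axis lets the indices broadcast across both point coordinates.
sorted_take = np.take_along_axis(a, indices[:, :, None], axis=1)

# Equivalent advanced indexing: pair each row number with its own column order.
rows = np.arange(a.shape[0])[:, None]  # shape (3, 1), broadcasts against indices
sorted_fancy = a[rows, indices]        # shape (3, 4, 2)

sorted_norms = np.linalg.norm(sorted_take, axis=2)
print(np.array_equal(sorted_take, sorted_fancy))
print(np.all(np.diff(sorted_norms, axis=1) >= 0))
```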
I am very new to Python and very familiar with R, and my question is a very simple one about NumPy arrays:
Observe:
I have one array X of dimension (100,2) of floating point type and I want to add a 3rd column, preferably into a new Numpy array of dimension (100,3) such that the 3rd column = col(1)^2 for every row in array of X.
My understanding is Numpy arrays are generally of fixed dimension so I'm OK with creating a new array of dim 100x3, I just don't know how to do so using Numpy arrays.
Thanks!
One way to do this is by creating a new array and then concatenating it. For instance, say that M is currently your array.
You can compute col(1)^2 as C = M[:,0] ** 2 (which I'm interpreting as column 1 squared, not column 1 to the power of the values in column 2). C will now be an array with shape (100,), so we reshape it using C = np.expand_dims(C, 1), which creates a new axis of length 1 and gives our new column the shape (100, 1). This is important because both of our arrays must have the same number of dimensions when we concatenate them.
The last step here is to concatenate them using np.concatenate. In total, our result looks like this
C = M[:, 0] ** 2
C = np.expand_dims(C, 1)
M = np.concatenate([M, C], axis=1) # third column will now be col(1) ^ 2
If you're the kind of person who likes to do things in one line, you have:
M = np.concatenate([M, np.expand_dims(M[:, 0] ** 2, 1)], axis=1)
That being said, I would recommend looking at Pandas, it supports these actions more naturally, in my opinion. In Pandas, it would be
M["your_col_3_name"] = M["your_col_1_name"] ** 2
where M is a pandas dataframe.
Append with axis=1 should work.
a = np.zeros((5,2))
b = np.ones((5,1))
print(np.append(a,b,axis=1))
This should print:
[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
import numpy as np
# generate an array with shape (100,2), filled with 2
a = np.full((100,2),2)
# calculate the square of the first column; this will be a 1-d array
squared=a[:,0]**2
# concatenate the 1-d array to a;
# first convert it to a 2-d array with shape (100,1) using reshape(-1,1)
c = np.concatenate((a,squared.reshape(-1,1)),axis=1)
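A slightly shorter variant of the answers above is np.column_stack, which accepts a 1D array as a column, so the explicit reshape or expand_dims step can be dropped. The sketch below assumes random placeholder data for X; only the (100, 2) shape comes from the question.

```python
import numpy as np

# Hypothetical stand-in for the (100, 2) float array X from the question.
rng = np.random.default_rng(0)
X = rng.random((100, 2))

# column_stack takes the 1D squared column directly and appends it as column 3.
X3 = np.column_stack((X, X[:, 0] ** 2))
print(X3.shape)  # (100, 3)
```

X itself is left unchanged; X3 is a new array, which matches the "new array of dimension (100,3)" the question asks for.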
I have a 2D array with 19 rows and 1280 columns. I want to split it into two arrays, each with 19 rows: one with 70% of the columns for training and one with the remaining 30% for testing. The columns should be selected randomly. My code is in Python. Please help me. Thank you.
Edited to include randomised shuffle
You can use slicing to split the array into the desired shapes and numpy.random.shuffle() to randomize the column indices.
import numpy as np
# create example data
num_cols, num_rows = 10, 3
arr = np.array([[f'{row}_{col}' for col in range(num_cols)] for row in range(num_rows)])
# create a list of random indices
random_cols = list(range(arr.shape[1]))
np.random.shuffle(random_cols)
# calculate truncation index as 70% of total number of columns
truncation_index = int(arr.shape[1] * 0.7)
# use slices of the shuffled index list to extract two sub-arrays
train_array = arr[:, random_cols[:truncation_index]]
test_array = arr[:, random_cols[truncation_index:]]
print(f'arr: \n{arr} \n')
print(f'train array: \n{train_array} \n')
print(f'test array: \n{test_array} \n')
With output (the column order will differ between runs, because the shuffle is not seeded):
arr:
[['0_0' '0_1' '0_2' '0_3' '0_4' '0_5' '0_6' '0_7' '0_8' '0_9']
['1_0' '1_1' '1_2' '1_3' '1_4' '1_5' '1_6' '1_7' '1_8' '1_9']
['2_0' '2_1' '2_2' '2_3' '2_4' '2_5' '2_6' '2_7' '2_8' '2_9']]
train array:
[['0_5' '0_8' '0_0' '0_7' '0_6' '0_1' '0_4']
['1_5' '1_8' '1_0' '1_7' '1_6' '1_1' '1_4']
['2_5' '2_8' '2_0' '2_7' '2_6' '2_1' '2_4']]
test array:
[['0_3' '0_9' '0_2']
['1_3' '1_9' '1_2']
['2_3' '2_9' '2_2']]
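For the 19 x 1280 case in the question, a reproducible variant could use NumPy's Generator API with a fixed seed. This is only a sketch: the seed value and the random placeholder data are assumptions, and the 70% cut is computed in integer arithmetic to avoid floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(42)  # the seed value is an arbitrary assumption
arr = rng.random((19, 1280))     # hypothetical stand-in for the real data

# A random ordering of all column indices, each index appearing exactly once.
perm = rng.permutation(arr.shape[1])

# 70% of the columns, computed with integers.
split = arr.shape[1] * 7 // 10

train = arr[:, perm[:split]]
test = arr[:, perm[split:]]
print(train.shape, test.shape)  # (19, 896) (19, 384)
```

Because perm is a permutation, no column ends up in both train and test, and rerunning with the same seed gives the same split.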