I want to convert two NumPy arrays into one DataFrame with two columns.
The first NumPy array, 'images', has shape (1020, 1024).
The second NumPy array, 'label', has shape (1020,).
My core code is:
images=np.array(images)
label=np.array(label)
l=np.array([images,label])
dataset=pd.DataFrame(l)
But it fails with this error:
ValueError: could not broadcast input array from shape (1020,1024) into shape (1020)
What should I do to convert these two NumPy arrays into two columns in one DataFrame?
You can't stack them directly if you want them as separate columns, because a single DataFrame column can't hold a 2D array. You need to convert the 2D array to something else first, for example a list of 1D arrays.
So something like this would work:
import pandas as pd
import numpy as np
images = np.array(images)
label = np.array(label)
dataset = pd.DataFrame({'label': label, 'images': list(images)}, columns=['label', 'images'])
This will create a DataFrame with 1020 rows and 2 columns, where each item in the second column is a 1D array of length 1024.
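To see what that layout looks like, here is a minimal sketch on small stand-in data (4 rows of 6 values instead of 1020 x 1024; the sample values and the name `restored` are made up for illustration). It also shows one way to get the 2D array back out of the column.

```python
import numpy as np
import pandas as pd

# Stand-in data: 4 "images" of 6 values each instead of 1020 x 1024
images = np.arange(24).reshape(4, 6)
label = np.array([0, 1, 0, 1])

# list(images) splits the 2D array into a list of 1D row arrays,
# so every cell of the 'images' column holds one array
dataset = pd.DataFrame({'label': label, 'images': list(images)})

print(dataset.shape)              # (4, 2)
print(dataset['images'].iloc[0])  # the first row of images as a 1D array

# Rebuild the 2D array from the column when it is needed again
restored = np.stack(dataset['images'].tolist())
```

Keep in mind that a column of arrays has object dtype, so vectorized pandas operations on it will not behave like they do on numeric columns.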
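Coming from engineering, I like the visual side of building matrices.
matrix_aux = np.vstack([label, images.T])  # transpose images so both have 1020 columns
matrix = np.transpose(matrix_aux)  # shape (1020, 1025): label first, then the image values
df_lab_img = pd.DataFrame(matrix)
It takes a little more code, but it leaves you with the NumPy array too.
You can also use hstack, which gives one column per image value followed by the label column: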
import pandas as pd
import numpy as np
dataset = pd.DataFrame(np.hstack((images, label.reshape(-1, 1))))
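If you go for the wide layout, naming the columns makes the result easier to work with. A small sketch, assuming stand-in data and made-up column names (`px_0`, `px_1`, ...); `np.column_stack` is used here instead of `hstack` because it accepts the 1D label array without a reshape.

```python
import numpy as np
import pandas as pd

images = np.arange(24).reshape(4, 6)  # stand-in for the (1020, 1024) array
label = np.array([0, 1, 0, 1])        # stand-in for the (1020,) array

# column_stack treats the 1D label array as a single column
wide = np.column_stack((label, images))

# Hypothetical column names: 'label' first, then one name per image value
columns = ['label'] + [f'px_{i}' for i in range(images.shape[1])]
df_wide = pd.DataFrame(wide, columns=columns)

print(df_wide.shape)  # (4, 7)
```

Note that stacking forces both arrays into one common dtype, so integer labels become floats if the images are floats.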
Goal: I am working with RNNs in PyTorch, and my data is given as a list of DataFrames, where each DataFrame represents one observation, like this:
import numpy as np
import pandas as pd
data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]
which means 100 observations with 5 timesteps and 50 parameters each. For my model I need a tensor of shape (100,5,50).
Issue: I tried a lot of things but nothing seems to work, does anyone know how this is done?
This approach doesn't work:
import torch
torch.tensor(np.array(data))
I think the problem is converting the DataFrames into arrays and the list into a tensor at the same time.
I don't think you can convert the list of dataframes in a single command, but you can convert the list of dataframes into a list of tensors and then concatenate the list.
E.g.
import pandas as pd
import numpy as np
import torch
data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]
list_of_arrays = [np.array(df) for df in data]
torch.tensor(np.stack(list_of_arrays))
#or
list_of_tensors = [torch.tensor(np.array(df)) for df in data]
torch.stack(list_of_tensors)
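Either way, it can help to check the intermediate NumPy array before handing it to torch. A minimal sketch using only NumPy and pandas (the name `stacked` and the float32 cast are my additions; the cast assumes the model uses PyTorch's default float32 parameters):

```python
import numpy as np
import pandas as pd

data = [pd.DataFrame(np.zeros((5, 50))) for _ in range(100)]

# One (5, 50) array per DataFrame, stacked along a new leading axis
stacked = np.stack([df.to_numpy() for df in data])
print(stacked.shape)  # (100, 5, 50)

# DataFrames of floats come out as float64; cast if the model expects float32
stacked = stacked.astype(np.float32)
```

The resulting array can then be passed to torch.tensor as in the first variant above.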
It's late, but in case someone else is still asking this question somewhere, this is for you <3
from typing import List
import numpy as np
import pandas as pd
import torch
list_of_dataframe : List[pd.DataFrame] #= ....
my_tensor = torch.tensor(np.array(list_of_dataframe))
(python 3.9, numpy 1.20, pytorch 1.10)
I'm trying to get maximum and minimum values out of a numpy array. In order to have a good overview of the array, I used pandas. Based on this resulting array, I wanted to get a column of maximum and minimum values.
import pandas as pd
import numpy as np
TEST = np.load('NPY TEST.npy')
input_array = pd.DataFrame(TEST)
print(input_array)
inputs_max = np.max(input_array, axis=0)
print(inputs_max)
inputs_min = np.min(input_array[np.nonzero(input_array)], axis=0)
print(inputs_min)
The problem is that if I use
np.min(input_array, axis=0)
the resulting column only consists of zeros, although there is not one 0 in my numpy array. So I tried to use the np.nonzero command, which led to many errors:
AttributeError: 'DataFrame' object has no attribute 'nonzero'
Could anyone help me? Thanks in advance.
I can only guess what your data looks like, but I'll give it a try:
inputs_min = input_array[input_array != 0.].min(axis=0)
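Here is why that works, as a small sketch on made-up data (the values and the name `masked` are only for illustration): boolean indexing on a DataFrame keeps the shape and turns the excluded cells into NaN, and `min` skips NaN by default.

```python
import numpy as np
import pandas as pd

# Made-up sample with one zero per column
input_array = pd.DataFrame([[0.0, 2.0, 5.0],
                            [3.0, 0.0, 1.0],
                            [4.0, 7.0, 0.0]])

# Boolean indexing keeps the shape and replaces the zeros with NaN
masked = input_array[input_array != 0.]

# min skips NaN by default, so only the nonzero values are compared
inputs_min = masked.min(axis=0)
print(inputs_min.tolist())  # [3.0, 2.0, 1.0]
```

If the plain column minimum is 0, printing `(input_array == 0).sum()` shows how many zeros each column actually contains.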
Hey guys, I need help.
I want to use TensorFlow's data import, where the data is loaded by reading the feature and label vectors from a structured NumPy array.
https://www.tensorflow.org/programmers_guide/datasets#consuming_numpy_arrays
I want to create such a structured array by successively adding the two vectors (feature_vec and label_vec) to a NumPy structured array.
import numpy as np
# example vectors
feature_vec= np.arange(10)
label_vec = np.arange(10)
# structured array which should get the vectors
struc_array = np.array([feature_vec,label_vec],dtype=([('features',np.float32), ('labels',np.float32)]))
# How can I add now new vectors to struc_array?
struc_array.append(---)
Later, when this array is loaded from file, I want to access either the feature vectors (which form a matrix by then) or the label vectors by field name:
with np.load("/var/data/training_data.npy") as data:
    features = data["features"] # matrix containing feature vectors as rows
    labels = data["labels"] #matrix containing labels vectors as rows
Everything I tried to code was complete crap.. never got a correct output..
Thanks for your help!
Don't create a NumPy array and then append to it. That doesn't really make sense, as NumPy arrays have a fixed size and require a full copy to append a single row or column. Instead, create a list, append to it, then construct the array at the end:
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
# append as many times as you want:
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
Of course, you probably need ot
Unfortunately, this doesn't solve the problem.
I want to get just the labels or just the features from the structured array by using:
labels = struc_array['labels']
features = struc_array['features']
But when I build the structured array the way you did, both labels and features contain all of the appended vectors:
import numpy as np
feature_vec= np.arange(10)
label_vec = np.arange(0,5,0.5)
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
other_vec = np.arange(6,11,0.5)
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
# This contains all vectors.. not just the labels vector
labels = struc_array['labels']
# This also contains all vectors.. not just the feature vector
features = struc_array['features']
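One possible way to get the field access described above is to give each field a sub-array shape, so that one record holds one feature vector and one label vector. This is only a sketch under that assumption (vectors of length 10, and the list names `feature_vecs` and `label_vecs` are made up); the vectors are still collected in plain lists first and the array is built once at the end.

```python
import numpy as np

# Collect vectors in plain lists first (appending to lists is cheap)
feature_vecs = [np.arange(10), np.arange(10, 20)]
label_vecs = [np.arange(0, 5, 0.5), np.arange(6, 11, 0.5)]

# Each field holds a sub-array of 10 float32 values per record
dtype = [('features', np.float32, (10,)), ('labels', np.float32, (10,))]
struc_array = np.empty(len(feature_vecs), dtype=dtype)

# Fill each field from the stacked lists
struc_array['features'] = np.stack(feature_vecs)
struc_array['labels'] = np.stack(label_vecs)

features = struc_array['features']  # shape (2, 10), feature vectors as rows
labels = struc_array['labels']      # shape (2, 10), label vectors as rows
```

With this layout, `struc_array['labels']` contains only the label vectors, one per row.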
I am new to Python and I loaded a large dataset from a CSV file into a pandas DataFrame. However, I cannot find a way to create a 2D array for each row of the DataFrame, where each row of the new NumPy array corresponds to a range of X values. For example, in my code:
import pandas as pd
import numpy as np
data = pd.read_csv("categorization/dataAll10Overfit.csv",header=None)
#print(data)
rec = data.iloc[:,0:3968] # i rows x 3968 columns
There are 3968 values in each row of the DataFrame and I would like to create a 124x32 NumPy array so that each block of 124 values becomes a row in the 2D array. I know C#, where I would fill the new array using a for loop, but I guess there is a one-line function in Python to split the data of a DataFrame row into a new NumPy array. If this question is a duplicate, please refer me to the other post. Thanks in advance.
If you want all 2D arrays within one 3D array you can do:
arr = np.zeros((data.shape[0], 124, 32))
for idx, row in data.iterrows():
    arr[idx] = np.asarray(row).reshape(124, 32)
Or as a one-liner list of arrays:
arr = [np.asarray(row).reshape(124, 32) for idx, row in data.iterrows()]
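If every row has exactly the right number of values, the loop can probably be skipped altogether by reshaping the whole array at once. A minimal sketch on stand-in data (3 rows of 12 values reshaped into 4 x 3 blocks; for the real data it would be `reshape(-1, 124, 32)` on the 3968-column frame):

```python
import numpy as np
import pandas as pd

# Stand-in: 3 rows of 12 values each instead of i rows of 3968 values
data = pd.DataFrame(np.arange(36).reshape(3, 12))

# -1 lets NumPy infer the number of rows; each row becomes one 4 x 3 block
arr = data.to_numpy().reshape(-1, 4, 3)
print(arr.shape)  # (3, 4, 3)

# Same result as reshaping row by row
looped = np.stack([np.asarray(row).reshape(4, 3) for _, row in data.iterrows()])
```

Reshape fills in row-major order, so the first 3 values of a DataFrame row become the first row of its block.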
I assume you don't want to replace the array in place.
nested_record = pd.DataFrame(columns=['record'], index=range(len(rec)), dtype=object)
for i in range(len(rec)):
    # .at stores the whole 124 x 32 array in a single cell
    nested_record.at[i, 'record'] = rec.iloc[i].to_numpy().reshape(124, 32)
I have two raster files which I have converted into NumPy arrays (arcpy.RasterToNumpyArray) to work with the values in the raster cells with Python.
One of the rasters has two values, True and False. The other raster has values in the range 0 to 1000. Both rasters have exactly the same extent, so both NumPy arrays are built identically (columns and rows), except for the values.
My aim is to identify all positions in NumPy array A which have the value True. These positions should then be used to get the values at the same positions in NumPy array B.
Do you have any idea how I can implement this?
If I understand your description right, you should just be able to do B[A].
You can use the array with True and False values to simply index into the other. Here's a sample:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[True,False,False],[False,True,False],[False,False,True]])
a[b] ## gives array([1, 5, 9])
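If the positions themselves are needed and not only the values, `np.argwhere` or `np.nonzero` can list them. A short sketch with made-up arrays named A and B as in the question; it assumes A really has boolean dtype (if the raster comes out as 0/1 integers, `A.astype(bool)` would be needed first).

```python
import numpy as np

A = np.array([[True, False], [False, True], [True, True]])
B = np.array([[10, 20], [30, 40], [50, 60]])

# Values of B wherever A is True, in row-major order
values = B[A]
print(values)  # [10 40 50 60]

# (row, col) index pairs of the True cells
positions = np.argwhere(A)

# The same positions as separate row and column index arrays
rows, cols = np.nonzero(A)
same_values = B[rows, cols]
```

The mask also works for assignment, for example `B[A] = 0` to overwrite those cells.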