2D numpy array from a pandas dataframe row with delimited range

I am new to Python and have loaded a large dataset from a CSV file into a pandas dataframe. However, I cannot find a way to create a 2D array for each row of the dataframe, where each row of the new NumPy array corresponds to a fixed-size range of values. For example, in my code:
import pandas as pd
import numpy as np
data = pd.read_csv("categorization/dataAll10Overfit.csv",header=None)
#print(data)
rec = data.iloc[:,0:3968] # i rows x 3968 columns
There are 3968 values in each row of the dataframe and I would like to create a 124x32 numpy array so that each block of 124 values becomes a row in the 2D array. I know C#, where I would fill the new array with a for loop, but I guess there should be a one-line function in Python to split all the data of a dataframe row into a new NumPy array. If this question is a duplicate, please refer me to the other post. Thanks in advance.

If you want all 2D arrays within one 3D array you can do:
arr = np.zeros((data.shape[0], 124, 32))
for idx, row in data.iterrows():
    arr[idx] = np.asarray(row).reshape(124, 32)
Or as a one-liner list of arrays:
arr = [np.asarray(row).reshape(124, 32) for idx, row in data.iterrows()]
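The loop can probably be avoided altogether, since the whole frame can be reshaped in one call. The following is a minimal sketch under stated assumptions: the frame here is synthetic (5 rows of made-up values stand in for the CSV), and `rec` is assumed to hold exactly 3968 numeric columns as in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV data: 5 rows x 3968 columns (assumed sizes).
n_rows, n_cols = 5, 124 * 32
rec = pd.DataFrame(np.arange(n_rows * n_cols).reshape(n_rows, n_cols))

# One reshape for the whole frame; -1 lets NumPy infer the number of blocks.
arr = rec.to_numpy().reshape(-1, 124, 32)

# arr[i] is the 124x32 block built from dataframe row i.
first_block = arr[0]
```

With `reshape(-1, 124, 32)` each row of a block holds 32 consecutive values. If each block of 124 consecutive values should form a row instead, as the question's wording suggests, `reshape(-1, 32, 124)` would be the shape to use.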

I assume you don't want to replace the array in place.
# Build one 124x32 array per dataframe row and store it in an object column.
records = [row.to_numpy().reshape(124, 32) for _, row in rec.iterrows()]
nested_record = pd.DataFrame({'record': records})

Related

How to create the Numpy array X of shape (2638, 1838) for a dataframe has shape (2638, 1840)?

Hi, can someone please help me with this? What should I do if I want to use NumPy to get an array X of shape (2638, 1838) when the dataframe has shape (2638, 1840)?
Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv('pbmc_data.csv', index_col = 0)
df.shape
Converting to NumPy and back to pandas, as advised in one of the comments on your post, is not an elegant solution. Fortunately, pandas can do both tasks on its own.
Your first task is to select all columns of the input df except the last 2 (cell_type and cell_type_string). To do it, run:
X = df.iloc[:, :-2]
The second task is to extract the second-to-last column. To do it, run:
y = df.iloc[:, -2]
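The selections above return pandas objects. If a NumPy array is required, as the question asks, `to_numpy()` can be called on the result. This is a small sketch on a made-up frame: the feature column names `g1` to `g4` and the values are hypothetical, only the two trailing column names come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: 4 feature columns followed by the 2 label columns.
df = pd.DataFrame(
    np.arange(18).reshape(3, 6),
    columns=["g1", "g2", "g3", "g4", "cell_type", "cell_type_string"],
)

# Everything except the last two columns, as a 2D NumPy array.
X = df.iloc[:, :-2].to_numpy()

# The second-to-last column, as a 1D NumPy array.
y = df.iloc[:, -2].to_numpy()
```

For the frame in the question, the same two lines should give `X` with shape (2638, 1838).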

Insert a vector (array 1D) into array two-dimensional?

How can I insert a vector (1D array) into a two-dimensional array in Python?
My code generates a one-dimensional array of length 11 on each iteration, and I want to save each vector in a two-dimensional array of shape (i, 11),
where i is the number of rows and 11 is the number of columns.
Any suggestions, please?
I solved this problem; the solution is below.
import numpy as np
col_num = 4
row_num = 4
multi_list = [[0 for col in range(col_num)] for row in range(row_num)]
b = np.array([1, 2, 3, 4])
a = np.array([0, 0, 0, 0])
# Copy the vector b into every row of the 2D list.
for row in range(row_num):
    for col in range(col_num):
        multi_list[row][col] = b[col]
# Sum each column of the 2D list into a.
for col in range(col_num):
    for row in range(row_num):
        a[col] = a[col] + multi_list[row][col]
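The same idea can be written with NumPy arrays directly, without element-wise loops. This is only a sketch: `make_vector` is a hypothetical stand-in for whatever code produces each length-11 vector, and the row count of 3 is an arbitrary choice.

```python
import numpy as np

n_rows, n_cols = 3, 11  # 11 matches the vector length in the question

def make_vector(i):
    """Hypothetical stand-in for the code that produces one vector per step."""
    return np.full(n_cols, i, dtype=float)

# Option 1: preallocate the 2D array and assign one whole row at a time.
table = np.empty((n_rows, n_cols))
for i in range(n_rows):
    table[i] = make_vector(i)

# Option 2: collect the vectors in a list and stack them once at the end,
# which is convenient when the number of rows is not known in advance.
stacked = np.vstack([make_vector(i) for i in range(n_rows)])
```

Both options give an array of shape (3, 11) here. Column sums like those computed into `a` above are then available as `table.sum(axis=0)`.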

Concatenate columns while maintaining rows

I have a numpy array that I would like to concatenate the columns into a single value for the row. Below is what I have tried so far.
import numpy as np
randoma=np.random.choice(list('ACTG'),(5,21),replace=True)# create a 5x21 random matrix with A,C,T,G
randoma=np.concatenate(randoma, axis=None)
The expected result is something like:
randoma = ['AAGCCGCACACAGACCCTGAG',
'AAGCTGCACGCAGACCCTGAG',
'AGGCTGCACGCAGACCCTGAG',
'AAGCTGCACGTGGACCCTGAG',
'AGGCTGCACGTGGACCCTGAG',
'AGGCTGCACGTGGACCCTGAG',
'AAGCTGCATGTGGACCCTGAG']
import numpy as np
randoma = np.random.choice(list('ACTG'),(5,21),replace=True) # create a 5x21 random matrix with A,C,T,G
new_list = [''.join(x) for x in randoma.tolist()]
new_list
['CGGGACGCACTTCCTGTGCAG',
'TGTAGCGGCTTGGTGTCCAAG',
'GAAAGTTTAGGATTGCGTCGG',
'AGTATTGTGATTCTATCTGAC',
'TTAGTAAGAGTGTCTCACTAT']

How to add several vectors to a numpy structured array and call the matrix later by field name?

Hey guys, I need help.
I want to use TensorFlow's data import, where data is loaded by calling the feature/label vectors from a structured numpy array.
https://www.tensorflow.org/programmers_guide/datasets#consuming_numpy_arrays
I want to create such a structured array by consecutively adding the 2 vectors (feature_vec and label_vec) to a numpy structured array.
import numpy as np
# example vectors
feature_vec= np.arange(10)
label_vec = np.arange(10)
# structured array which should get the vectors
struc_array = np.array([feature_vec,label_vec],dtype=([('features',np.float32), ('labels',np.float32)]))
# How can I add now new vectors to struc_array?
struc_array.append(---)
Later, when this array is loaded from file, I want to get either the feature vectors (which form a matrix now) or the label vectors by using the field name:
with np.load("/var/data/training_data.npy") as data:
    features = data["features"] # matrix containing feature vectors as rows
    labels = data["labels"] # matrix containing label vectors as rows
Everything I tried to code was complete crap.. never got a correct output..
Thanks for your help!
Don't create a NumPy array and then append to it. That doesn't really make sense, as NumPy arrays have a fixed size and require a full copy to append a single row or column. Instead, create a list, append to it, then construct the array at the end:
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
# append as many times as you want:
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
Of course, you probably need ot
Unfortunately, this doesn't solve the problem.
I want to get just the labels or the features from the structured array by using:
labels = struc_array['labels']
features = struc_array['features']
But when I use the structured array like you did, both labels and features contain all the appended vectors:
import numpy as np
feature_vec= np.arange(10)
label_vec = np.arange(0,5,0.5)
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
other_vec = np.arange(6,11,0.5)
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
# This contains all vectors.. not just the labels vector
labels = struc_array['labels']
# This also contains all vectors.. not just the feature vector
features = struc_array['features']
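As far as I can tell, the behaviour described above comes from the dtype: with scalar fields, every single number in the input becomes its own record, so each field ends up holding a copy of all the data. One possible way around it is to give each field a sub-array shape, so that one record holds one whole feature vector and one whole label vector. This is a sketch under assumptions: the vector length of 10 and the sample values are taken from the snippets above, and the names `vec_len`, `record_dtype`, `feature_vecs` and `label_vecs` are introduced here for illustration.

```python
import numpy as np

vec_len = 10  # assumed length of every feature and label vector

# Each record holds one full feature vector and one full label vector.
record_dtype = np.dtype([
    ("features", np.float32, (vec_len,)),
    ("labels", np.float32, (vec_len,)),
])

# Collect vectors in plain lists first; appending to a list is cheap.
feature_vecs = [np.arange(10), np.arange(10, 20)]
label_vecs = [np.arange(0, 5, 0.5), np.arange(6, 11, 0.5)]

# Build the structured array once, then fill each field as a matrix.
struc_array = np.zeros(len(feature_vecs), dtype=record_dtype)
struc_array["features"] = np.stack(feature_vecs)
struc_array["labels"] = np.stack(label_vecs)

# Field access now returns one matrix per field, with vectors as rows.
features = struc_array["features"]
labels = struc_array["labels"]
```

Here `features` and `labels` each have shape (2, 10), one row per appended vector.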

Convert two numpy array to dataframe

I want to convert two numpy array to one DataFrame containing two columns.
The first numpy array 'images' is of shape (1020, 1024).
The second numpy array 'label' is of shape (1020, )
My core code is:
images=np.array(images)
label=np.array(label)
l=np.array([images,label])
dataset=pd.DataFrame(l)
But it raises an error saying:
ValueError: could not broadcast input array from shape (1020,1024) into shape (1020)
What should I do to convert these two numpy arrays into two columns in one dataframe?
You can't stack them easily, especially if you want them as different columns, because you can't insert a 2D array in one column of a DataFrame, so you need to convert it to something else, for example a list.
So something like this would work:
import pandas as pd
import numpy as np
images = np.array(images)
label = np.array(label)
dataset = pd.DataFrame({'label': label, 'images': list(images)}, columns=['label', 'images'])
This will create a DataFrame with 1020 rows and 2 columns, where each item in the second column is a 1D array of length 1024.
Coming from engineering, I like the visual side of creating matrices.
matrix_aux = np.vstack([label, images.T])  # images must be transposed so both have 1020 columns
matrix = np.transpose(matrix_aux)
df_lab_img = pd.DataFrame(matrix)
It takes a little more code but leaves you with the NumPy array too.
You can also use hstack:
import pandas as pd
import numpy as np
dataset = pd.DataFrame(np.hstack((images, label.reshape(-1, 1))))
