I have a numpy array, and I would like to concatenate the columns of each row into a single value per row. Below is what I have tried so far.
import numpy as np
randoma=np.random.choice(list('ACTG'),(5,21),replace=True)# create a 5x21 random matrix with A,C,T,G
randoma=np.concatenate(randoma, axis=None)
The expected result is something like:
randoma = ['AAGCCGCACACAGACCCTGAG',
'AAGCTGCACGCAGACCCTGAG',
'AGGCTGCACGCAGACCCTGAG',
'AAGCTGCACGTGGACCCTGAG',
'AGGCTGCACGTGGACCCTGAG',
'AGGCTGCACGTGGACCCTGAG',
'AAGCTGCATGTGGACCCTGAG']
import numpy as np
randoma = np.random.choice(list('ACTG'),(5,21),replace=True) # create a 5x21 random matrix with A,C,T,G
new_list = [''.join(x) for x in randoma.tolist()]
new_list
['CGGGACGCACTTCCTGTGCAG',
'TGTAGCGGCTTGGTGTCCAAG',
'GAAAGTTTAGGATTGCGTCGG',
'AGTATTGTGATTCTATCTGAC',
'TTAGTAAGAGTGTCTCACTAT']
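For a reproducible check, the same join can be sketched on a small fixed array. The helper name `join_rows` and the sample data are illustrative assumptions, not part of the original answer:

```python
import numpy as np

def join_rows(char_matrix):
    """Join each row of a 2D array of single characters into one string."""
    return [''.join(row) for row in char_matrix.tolist()]

# Small fixed input so the result is reproducible (illustrative data).
example = np.array([list('ACTG'), list('GGTA'), list('TTCA')])
joined = join_rows(example)
print(joined)  # ['ACTG', 'GGTA', 'TTCA']
```

The result is a plain Python list of strings; wrap it in `np.array(...)` if an array is needed.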
I used the following code to sum all the rows in a 2D matrix but I want to sum all the columns instead:
row_sum = sum(map(sum,[arr]))
You can try the code below:
import numpy as np
# arr: <2D matrix>
col_sum = np.sum(arr, axis=0, keepdims=True)  # axis=0 sums down each column
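As a quick illustration of how the `axis` argument behaves, here is a minimal sketch on a made-up 2x3 matrix: `axis=0` collapses the rows and yields one sum per column, while `axis=1` yields one sum per row.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # illustrative 2x3 matrix

col_sums = arr.sum(axis=0)  # one value per column
row_sums = arr.sum(axis=1)  # one value per row

print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]
```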
Goal: I am working with RNNs in PyTorch, and my data is given as a list of DataFrames, where each DataFrame represents one observation, like:
import numpy as np
import pandas as pd
data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]
which means 100 observations, each with 50 parameters and 5 timesteps. For my model I need a tensor of shape (100,5,50).
Issue: I tried a lot of things but nothing seems to work. Does anyone know how this is done?
This approach doesn't work:
import torch
torch.tensor(np.array(data))
I think the problem is converting the DataFrames into arrays and the list into a tensor at the same time.
I don't think you can convert the list of dataframes in a single command, but you can convert the list of dataframes into a list of arrays or tensors and then stack that list.
E.g.
import pandas as pd
import numpy as np
import torch
data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]
list_of_arrays = [np.array(df) for df in data]
torch.tensor(np.stack(list_of_arrays))
#or
list_of_tensors = [torch.tensor(np.array(df)) for df in data]
torch.stack(list_of_tensors)
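Before handing the data to PyTorch, it can help to confirm the stacked shape with NumPy alone. The sketch below assumes every DataFrame has the same shape and numeric dtype, and it uses `DataFrame.to_numpy()` in place of `np.array(df)`:

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: 100 observations of shape (5, 50).
data = [pd.DataFrame(np.zeros((5, 50))) for _ in range(100)]

# Stack along a new leading axis: (100, 5, 50).
stacked = np.stack([df.to_numpy() for df in data])
print(stacked.shape)  # (100, 5, 50)
```

The resulting array can then be passed to `torch.tensor(...)` as in the snippet above.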
It's too late, but if someone else is still asking this question somewhere... This is for you <3
import torch
import numpy as np
import pandas as pd
from typing import List
list_of_dataframe : List[pd.DataFrame] #= ....
my_tensor = torch.tensor(np.array(list_of_dataframe))
(python 3.9, numpy 1.20, pytorch 1.10)
I want to convert two numpy arrays to one DataFrame containing two columns.
The first numpy array 'images' is of shape (1020, 1024).
The second numpy array 'label' is of shape (1020,).
My core code is:
images=np.array(images)
label=np.array(label)
l=np.array([images,label])
dataset=pd.DataFrame(l)
But it raises an error:
ValueError: could not broadcast input array from shape (1020,1024) into shape (1020)
What should I do to convert these two numpy arrays into two columns in one dataframe?
You can't stack them easily, especially if you want them as different columns, because you can't insert a 2D array in one column of a DataFrame, so you need to convert it to something else, for example a list.
So something like this would work:
import pandas as pd
import numpy as np
images = np.array(images)
label = np.array(label)
dataset = pd.DataFrame({'label': label, 'images': list(images)}, columns=['label', 'images'])
This will create a DataFrame with 1020 rows and 2 columns, where each item in the second column is a 1D array of length 1024.
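The same idea can be checked on a much smaller input. The shapes and values below are made up for illustration (4 samples with 3 features each instead of 1020 x 1024):

```python
import numpy as np
import pandas as pd

images = np.arange(12).reshape(4, 3)   # 4 samples, 3 features each
label = np.array([0, 1, 0, 1])

# list(images) splits the 2D array into a list of 1D row arrays,
# so each DataFrame cell holds one row.
dataset = pd.DataFrame({'label': label, 'images': list(images)})

print(dataset.shape)                      # (4, 2)
print(dataset['images'].iloc[0].tolist()) # [0, 1, 2]
```

The `images` column has `object` dtype, so vectorized numeric operations on it are not available; it is mainly useful for keeping each sample next to its label.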
Coming from engineering, I like the visual side of creating matrices.
matrix_aux = np.vstack([label, images.T])  # transpose images so both have 1020 columns
matrix = np.transpose(matrix_aux)
df_lab_img = pd.DataFrame(matrix)
It takes a little more code but leaves you with the NumPy array too.
You can also use hstack
import pandas as pd
import numpy as np
dataset = pd.DataFrame(np.hstack((images, label.reshape(-1, 1))))
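A small sketch of what this produces, using made-up shapes (4 x 3 images, 4 labels). Unlike the list-based approach, every feature becomes its own column and the label ends up in the last one. `np.column_stack` is shown as an equivalent that accepts the 1D label directly:

```python
import numpy as np
import pandas as pd

images = np.arange(12).reshape(4, 3)
label = np.array([10, 20, 30, 40])

# hstack needs the label as a (4, 1) column.
wide = pd.DataFrame(np.hstack((images, label.reshape(-1, 1))))

# column_stack turns 1D inputs into columns itself.
same = pd.DataFrame(np.column_stack((images, label)))

print(wide.shape)  # (4, 4)
```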
I am a newbie in Python and I loaded a large dataset from a CSV into a pandas DataFrame. However, I cannot find a method to create a 2D array for each row of the DataFrame, where each row of the new NumPy array corresponds to a range of X values. For example, in my code:
import pandas as pd
import numpy as np
data = pd.read_csv("categorization/dataAll10Overfit.csv",header=None)
#print(data)
rec = data.iloc[:,0:3968] # outputs i rows x 3968 columns
There are 3968 values in each row of the DataFrame and I would like to create a 124x32 numpy array so that each block of 124 values becomes a row in the 2D array. I know C#, and there I would fill the new array using a for loop, but I guess there should be a one-line function in Python to split all the data of a DataFrame row into a new NumPy array. If this question is a duplicate, please refer me to the other post. Thanks in advance.
If you want all 2D arrays within one 3D array you can do:
arr = np.zeros((data.shape[0], 124, 32))
for idx, row in data.iterrows():
    arr[idx] = np.asarray(row).reshape(124, 32)
Or as a one-liner list of arrays:
arr = [np.asarray(row).reshape(124, 32) for idx, row in data.iterrows()]
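If the whole DataFrame is numeric, the loop can probably be avoided by reshaping the underlying array in one call. This is a sketch on a tiny stand-in frame (2 rows of 12 values reshaped to 3 x 4), not the asker's real file; for the real data the call would be `reshape(-1, 124, 32)`:

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV data: 2 rows of 12 values each.
data = pd.DataFrame(np.arange(24).reshape(2, 12))

# -1 lets NumPy infer the number of rows; each row becomes a 3x4 block.
arr = data.to_numpy().reshape(-1, 3, 4)
print(arr.shape)  # (2, 3, 4)
```

Row-major reshaping fills each block left to right, which matches what the per-row `reshape` in the loop does.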
I assume you don't want to replace the array in place.
nested_record = pd.DataFrame(columns=['record'], index=data.index)
for i in data.index:
    # store one 124x32 array per row in an object-dtype column
    nested_record.at[i, 'record'] = data.loc[i].to_numpy().reshape(124, 32)
I have a numpy array vector, and I want to get a subset based on the indexes:
import numpy as np
input=np.array([1,2,3,4,5,6,7,8,9,10])
index=np.array([0,1,0,0,0,0,1,0,0,1])
What is a Pythonic way to get output=[2,7,10]?
output = input[index.astype(bool)]
or
output = input[np.where(index)[0]]
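Both variants can be checked side by side. The sketch below renames `input` to `values` (an assumption made here to avoid shadowing the built-in `input`) and uses `np.flatnonzero` as a shorthand for `np.where(index)[0]`:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
index = np.array([0, 1, 0, 0, 0, 0, 1, 0, 0, 1])

# Boolean mask: keep positions where index is nonzero.
by_mask = values[index.astype(bool)]

# Integer positions of the nonzero entries, then fancy indexing.
by_position = values[np.flatnonzero(index)]

print(by_mask.tolist())      # [2, 7, 10]
print(by_position.tolist())  # [2, 7, 10]
```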