pandas dataframe extract spectrograms - python

I have a pandas dataframe with 8528 rows × 18287 columns. Each row has a label. How can I compute one spectrogram per row (8528 in total) and store the spectrograms in a variable so that I can feed them into an AlexNet for training?
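One possible approach, shown only as a sketch: treat every row as a 1-D signal and pass the whole array to scipy.signal.spectrogram, which operates along the last axis. The toy dataframe, the column name "label", the sampling rate fs and the window settings nperseg/noverlap are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Toy stand-in for the real dataframe: 4 rows of 1000 samples plus a label column.
# The column name "label" is hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 1000)))
df["label"] = [0, 1, 0, 1]

fs = 300  # assumed sampling rate in Hz; replace with the real value

labels = df["label"].to_numpy()
signals = df.drop(columns="label").to_numpy(dtype=float)

# spectrogram works along the last axis, so every row is processed in one call.
# Result shape: (n_rows, n_frequency_bins, n_time_segments)
freqs, times, spectrograms = signal.spectrogram(
    signals, fs=fs, nperseg=64, noverlap=32
)

# A CNN usually expects an explicit channel axis; add one at the end.
cnn_input = spectrograms[..., np.newaxis]
```

The array spectrograms holds one 2-D spectrogram per row and stays aligned with labels. If the rows have different true lengths and are padded with NaN, the padding would need to be handled before this step.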


Keras preprocessing trading data

I have a problem preprocessing my trading data from .csv so that it fits the input/output of a neural network model trained with SGD.
I have imported the data using the pandas library, but maybe there's a better way to do it?
I need to set column names, data inside needs to be double type, and convert it into
I have 2 data sets: testingdata.csv and trainingdata.csv.
Each has 4 columns: Open, max, min, close.
The 'Open' column is the value to forecast (Y), while 'max', 'min' and 'close' are the inputs (X).
inside my .csv file
Also, I have no idea what a 'metric' is in Keras or which metric I should use here.
So my questions are: what is the best way to do this, and how do I do it?
Using pd.read_csv is a good way to import .csv files:
import pandas as pd
df = pd.read_csv('data.csv')
If you need custom column names, pass them through the names argument:
df = pd.read_csv('data.csv',
                 names=["Open", "max", "min", "close"])
You can look at the head of the imported .csv file in the dataframe:
df.head()
If you want to convert the pandas dataframe to a TensorFlow Dataset:
import tensorflow as tf
target = df.pop('Open')
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
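The loading step can be sketched end to end with pandas alone. This assumes the CSV has no header row and exactly four numeric columns in the order Open, max, min, close; the in-memory text and its numbers are made up so the snippet runs without a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for trainingdata.csv (made-up values, no header row assumed).
csv_text = "1.10,1.20,1.00,1.15\n1.15,1.25,1.10,1.20\n1.20,1.30,1.15,1.25\n"
columns = ["Open", "max", "min", "close"]

# names sets the column names, dtype forces double precision for every column.
train = pd.read_csv(io.StringIO(csv_text), names=columns, dtype=np.float64)

# Separate the value to forecast (Y) from the inputs (X).
y_train = train.pop("Open").to_numpy()
X_train = train.to_numpy()
```

X_train and y_train are plain NumPy arrays, which a Keras model's fit method accepts directly, so the conversion to a TensorFlow Dataset is optional for data of this size.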

dividing big dataset python

My dataset's features have shape (80102, 2592) and the labels have shape (80102, 2). I want to use only a few rows for training, because training the CNN model takes a lot of time. How can I divide the dataset in Python and use only a few rows for both training and testing?
If your data is in the form of arrays, let X be the array containing the data and y the array containing the labels. You can use sklearn's train_test_split function to draw a random subset of the data, as in the code below:
from sklearn.model_selection import train_test_split
percent = .1  # fraction of the data you want to use, in this case 10%
X_data, X_dummy, y_labels, y_dummy = train_test_split(X, y, train_size=percent, random_state=123, shuffle=True)
X_data will contain 10% of the original data and will be shuffled.
y_labels will contain the corresponding 10% of the labels.
If you want to set the number of samples explicitly, set train_size to an integer value. The scikit-learn documentation for train_test_split has further details. If your data is a pandas dataframe, you can use the pandas function pandas.DataFrame.sample, which is described in the pandas documentation. Assume your data frame is called data. The code below will produce a new data frame with the specified percentage of the original rows:
percent = .1
new_data = data.sample(frac=percent, replace=False, weights=None, random_state=123, axis=0)  # new_data is an illustrative name
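Since the question asks for a reduced set for both training and testing, the two ideas above can be combined: first subsample, then split the subsample. The following is a sketch on toy arrays; the shapes, the 10% fraction and the 80/20 split are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real arrays, which have shapes (80102, 2592) and (80102, 2).
rng = np.random.default_rng(123)
X = rng.standard_normal((1000, 8))
y = rng.integers(0, 2, size=(1000, 2))

# Step 1: keep a random 10% of the rows and discard the rest.
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=0.1, random_state=123, shuffle=True
)

# Step 2: split the reduced set into training and test parts (80/20).
X_train, X_test, y_train, y_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=123
)

# pandas equivalent of step 1 when the data lives in a dataframe.
data = pd.DataFrame(X)
data_small = data.sample(frac=0.1, random_state=123)
```

Fixing random_state keeps the selection reproducible, so the same rows are used on every run.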

Use multiple csv files as test and training set for CNN

I am writing code to detect minerals from their Raman spectra using a CNN. I have data (the RRUFF dataset) for different minerals stored in separate csv/text files, each consisting of 2 columns: the intensity and the corresponding Raman shift value of the mineral.
How should I use these multiple files for training and testing my CNN?
Can I use flow_from_directory directly for csv files under Train and Test folders?
Total csv/txt files in dataset: 3696
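import pandas
def merge_data(csv_files, columns, output_file):
    # Read every file, stack the rows, and keep the requested columns
    frames = [pandas.read_csv(file) for file in csv_files]
    df = pandas.concat(frames, ignore_index=True, sort=False).reindex(columns=columns)
    df.to_csv(output_file, index=False)  # save the merged table
    return df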
Now, call the function: df = merge_data(['file1.csv', 'file2.csv'], ['column1', 'column2'], 'all_data.csv')
Then, split the merged data into train and test sets using train_test_split from sklearn.model_selection.
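Note that flow_from_directory belongs to the Keras image pipeline and expects image files, so it is not a direct fit for csv spectra. When merging, it may also help to record which file each row came from, since a plain merge loses that information. The sketch below assumes header-less two-column files and uses the file name as a hypothetical label; the helper load_spectra, the column names and the sample values are invented for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_spectra(csv_files, columns):
    """Read each file and tag its rows with the file stem (a hypothetical label)."""
    frames = []
    for path in csv_files:
        frame = pd.read_csv(path, names=columns, header=None)
        frame["source"] = Path(path).stem
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Write two tiny made-up spectra to a temporary folder so the sketch is runnable.
samples = {
    "quartz": "100,0.5\n200,0.9\n",
    "calcite": "100,0.1\n200,0.7\n300,0.2\n",
}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, rows in samples.items():
        path = Path(tmp) / f"{name}.csv"
        path.write_text(rows)
        paths.append(path)
    merged = load_spectra(paths, ["raman_shift", "intensity"])
```

The source column can then serve as the class label, or as a grouping key so that all rows from one spectrum end up on the same side of the train/test split.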

How can I put the data in a dataframe after using an imputer?

I have some code that helps me predict some missing values. This is the code:
from datawig import SimpleImputer
from datawig.utils import random_split
from sklearn.metrics import f1_score, classification_report
df_train, df_test = random_split(df, split_ratios=[0.8, 0.2])
# Initialize a SimpleImputer model
imputer = SimpleImputer(
    input_columns=['SITUACION_DNI_A'],  # columns containing information about the column we want to impute
    output_column='EXTRANJERO_A',  # the column we'd like to impute values for
    output_path='imputer_model'  # stores model data and metrics
)
# Fit an imputer model on the train data
imputer.fit(train_df=df_train, num_epochs=10)
# Impute missing values and return original dataframe with predictions
predictions = imputer.predict(df_test)
After that I get a new dataframe with fewer rows than the original. How can I insert the values I get from the prediction into my original dataframe, or is there a way to run the code on my whole dataframe and not just the test split?
If both dataframes have a unique column, or something else that can act as an ID, then this method will work:
df_test = df_test.set_index('unique_col')
If the above method does not work, drop the rows with the missing values and append the imputer predictions to the dataframe. See the following link for help:
Delete rows if there are null values in a specific column in Pandas dataframe
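If the predictions keep the index of the rows they were computed from, the values can be written back by index alignment. The sketch below uses pandas only and a hand-made stand-in for the imputer output. The prediction column name 'EXTRANJERO_A_imputed' and the sample values are assumptions, so check the column names of your actual predictions dataframe.

```python
import numpy as np
import pandas as pd

# Toy version of the original dataframe with two missing values.
df = pd.DataFrame({
    "SITUACION_DNI_A": ["a", "b", "a", "b", "a"],
    "EXTRANJERO_A": ["N", np.nan, "N", np.nan, "S"],
})

# Stand-in for imputer.predict(df_test): a subset of rows that kept the
# original index, plus a prediction column (the name is an assumption).
predictions = df.loc[[1, 3]].copy()
predictions["EXTRANJERO_A_imputed"] = ["S", "N"]

# fillna aligns on the index, so only missing cells with a matching
# prediction are filled; existing values stay untouched.
df["EXTRANJERO_A"] = df["EXTRANJERO_A"].fillna(
    predictions["EXTRANJERO_A_imputed"]
)
```

This only works while the index is preserved. If the index was reset along the way, a unique ID column set as the index on both dataframes serves the same purpose.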

How to one hot encode with pandas on a new dataset?

I have a training data set that has categorical features on which I use pd.get_dummies to one hot encode. This produces a data set with n features. I then train a classification model on this data set with n features. If I now get some new data with the same categorical features and again perform one hot encoding, the resultant number of features is m < n.
I cannot predict the classes of the new data set if the dimensions don't match with the original training data.
Is there a way to include all of the original n features in the new data set after one hot encoding?
EDIT: I am using sklearn.ensemble.RandomForestClassifier as my classification library.
For example:
You have tradf with columns ['A_1', 'A_2'].
Your new df has a column 'A' that contains only category 1, so you can do:
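One common way to handle this, given here as a sketch and not necessarily the code the answer originally contained, is to encode the new data and then reindex its columns to match the training columns, filling the missing dummy columns with 0. The frame names tradf and df follow the example above; the sample values are made up.

```python
import pandas as pd

# Training data contains categories 1 and 2, so encoding yields A_1 and A_2.
train_raw = pd.DataFrame({"A": [1, 2, 1]})
tradf = pd.get_dummies(train_raw, columns=["A"], dtype=int)

# New data contains only category 1, so encoding alone yields just A_1.
df = pd.DataFrame({"A": [1, 1]})
new_encoded = pd.get_dummies(df, columns=["A"], dtype=int)

# Align with the training columns: missing dummies are added as 0,
# and any category never seen in training is dropped.
new_encoded = new_encoded.reindex(columns=tradf.columns, fill_value=0)
```

After the reindex, new_encoded has the same n columns in the same order as tradf, so it can be passed to the trained RandomForestClassifier.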

