pandas dataframe extract spectrograms - python

I have a pandas DataFrame with 8528 rows × 18287 columns. Each row has a label. How can I compute one spectrogram per row (8528 in total) and store them in a variable so that I can feed them into AlexNet for training?
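One possible approach, sketched under several assumptions: each row is a 1-D signal of equal length with no missing values, the label lives in a column named `label` (a hypothetical name), and the sampling rate `fs` is known (the value below is a placeholder). `scipy.signal.spectrogram` accepts a 2-D array and computes one spectrogram per row along the last axis, so the whole frame can be processed in a single call.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Small stand-in for the real 8528 x 18287 frame (assumption: no NaN padding).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(4, 1000)))
df["label"] = [0, 1, 0, 1]  # hypothetical label column name

labels = df.pop("label").to_numpy()
signals = df.to_numpy(dtype=np.float64)

fs = 300  # assumed sampling rate in Hz; replace with the real one
freqs, times, spectrograms = signal.spectrogram(
    signals, fs=fs, nperseg=64, noverlap=32, axis=-1
)

# spectrograms has shape (n_rows, n_frequencies, n_time_bins)
print(spectrograms.shape, labels.shape)
```

The resulting array still has to be resized and given a channel axis to match the input shape AlexNet expects; `nperseg` and `noverlap` control the time/frequency resolution and are illustrative values only.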

Related

Keras preprocessing trading data

I am having trouble preprocessing my trading data from .csv files so that it fits the input/output of a neural network trained with SGD.
I imported the data with pandas, but maybe there is a better way to do it?
I need to set the column names, make sure the data is of type double, and convert it into a tf.data.Dataset.
I have two data sets: testingdata.csv and trainingdata.csv.
Each has 4 columns: Open, max, min, close.
The 'Open' column is the value to forecast (Y), while 'max', 'min' and 'close' are the inputs (X).
Also, I have no idea what a 'metric' is in Keras or which metric I should use here.
So my questions are: what is the best way to do this, and how do I do it?
Thanks
Using pd.read_csv is a good way to import .csv files:
import pandas as pd
df = pd.read_csv('data.csv')
If the file has no header row and you need to set the column names yourself, you can do this:
df = pd.read_csv('data.csv',
                 header=None,
                 names=["open", "max", "min", "close"],
                 encoding='utf-16')  # only needed if the file is UTF-16 encoded
You can look at the first rows of the imported DataFrame with:
df.head(5)
To convert the pandas DataFrame to a TensorFlow Dataset:
import tensorflow as tf
target = df.pop('open')  # the name must match the column names set above
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
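The question also asks for the data to be of type double. A minimal sketch of that step, assuming the file has no header row and four numeric columns; the inline CSV text is a made-up stand-in for trainingdata.csv:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for trainingdata.csv (no header row, four numeric columns).
csv_text = "1.10,1.15,1.05,1.12\n1.12,1.20,1.10,1.18\n1.18,1.19,1.11,1.13\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["open", "max", "min", "close"],
    dtype=np.float64,  # force double precision for every column
)

y = df.pop("open").to_numpy()  # forecast target
X = df.to_numpy()              # inputs: max, min, close

print(X.dtype, X.shape, y.shape)
```

The arrays `X` and `y` can then be passed to `tf.data.Dataset.from_tensor_slices((X, y))` as in the answer above. As for metrics: in Keras a metric is a quantity reported during training and evaluation, and it is not what the optimizer minimizes. For a regression target like this one, mean absolute error is a common choice.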

dividing big dataset python

My dataset's features have shape (80102, 2592) and the labels have shape (80102, 2). I want to use only a few rows, because training the CNN model on everything takes a lot of time. How can I split the dataset in Python and use only a few rows for both training and testing?
If your data is in the form of arrays, let X be the array containing the data and y be the array containing the labels. You can use sklearn's train_test_split function to draw a smaller sample of the data, as in the code below:
from sklearn.model_selection import train_test_split
percent = .1  # fraction of the data you want to use, in this case 10%
X_data, X_dummy, y_labels, y_dummy = train_test_split(X, y, train_size=percent, random_state=123, shuffle=True)
# X_data will contain 10% of the original data and will be shuffled
# y_labels will contain the corresponding 10% of the labels
If you want to set the number of samples explicitly, set train_size to an integer value. For further information, see the scikit-learn documentation for train_test_split. If your data is a pandas DataFrame, you can use pandas.DataFrame.sample instead. Assume your data frame is called data. The code below produces a new data frame with the specified fraction of the original rows:
percent = .1
new_data = data.sample(frac=percent, replace=False, random_state=123, axis=0)
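Since the question asks for a small subset for both training and testing, here is a sketch that passes integer values for both `train_size` and `test_size`, so that the rows not selected are simply left out. The arrays are small random stand-ins for the real (80102, 2592) features and (80102, 2) labels, and the sizes 200 and 50 are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small random stand-ins for the real features and one-hot labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16)).astype(np.float32)
y = np.eye(2, dtype=np.float32)[rng.integers(0, 2, size=1000)]

# Integer sizes select an exact number of rows; the remaining rows are unused.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=200, test_size=50, random_state=123, shuffle=True
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

The train and test subsets do not overlap, because both are drawn from a single shuffled permutation of the rows.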

Use multiple csv files as test and training set for CNN

I am writing code to detect minerals from their Raman spectra using a CNN. I have data (the RRUFF dataset) for different minerals stored in separate csv/text files, each consisting of 2 columns: the intensity and the corresponding Raman shift value of the mineral.
How should I use these multiple files for training and testing my CNN?
Can I use flow_from_directory directly for csv files under Train and Test folders?
Total csv/txt files in dataset: 3696
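You can first merge the files into a single DataFrame:
import pandas

def merge_data(csv_files, columns, output_file):
    # Read every file and stack the rows (DataFrame.append was removed in pandas 2.0)
    frames = [pandas.read_csv(file) for file in csv_files]
    df = pandas.concat(frames, ignore_index=True, sort=False)
    # Keep the requested columns, in the requested order
    df = df.reindex(columns=columns)
    # Save the merged data so it can be reused later
    df.to_csv(output_file, index=False)
    return df
Now, call the function: df = merge_data(['file1.csv', 'file2.csv'], ['column1', 'column2'], 'all_data.csv')
Then split the merged data into a train and a test set using train_test_split from sklearn.model_selection.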
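On the `flow_from_directory` question: that Keras helper reads image files, so it is not a direct fit for csv spectra. An alternative to merging everything into one frame is to keep one sample per file. The sketch below rests on assumptions that are not in the question: each file has the Raman shift in the first column and the intensity in the second, has no header row, and the mineral name is the part of the file name before a double underscore. The file names and the common grid are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_spectrum(path, grid):
    """Read one two-column file and resample the intensities onto `grid`."""
    spec = pd.read_csv(path, header=None, names=["shift", "intensity"])
    spec = spec.sort_values("shift")
    return np.interp(grid, spec["shift"].to_numpy(), spec["intensity"].to_numpy())


# Create a few hypothetical files so the example is self-contained.
tmp = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for name, n in [("Quartz__a.csv", 300), ("Quartz__b.csv", 280), ("Calcite__a.csv", 320)]:
    shift = np.linspace(100, 1200, n)
    pd.DataFrame({"shift": shift, "intensity": rng.random(n)}).to_csv(
        os.path.join(tmp, name), header=False, index=False
    )

# Spectra have different lengths, so resample them onto one common grid.
grid = np.linspace(150, 1100, 256)
files = sorted(os.listdir(tmp))
X = np.stack([load_spectrum(os.path.join(tmp, f), grid) for f in files])

# Integer class labels derived from the (assumed) file naming scheme.
labels = [f.split("__")[0] for f in files]
classes = sorted(set(labels))
y = np.array([classes.index(label) for label in labels])

print(X.shape, y)
```

`X` and `y` can then be split with train_test_split as suggested above; a 1-D convolutional network would typically expect an extra channel axis, for example `X[..., np.newaxis]`.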

How can i put the data in dataframe after use imputer?

I have some code that helps me predict some missing values. This is the code:
from datawig import SimpleImputer
from datawig.utils import random_split
from sklearn.metrics import f1_score, classification_report
df_train, df_test = random_split(df, split_ratios=[0.8, 0.2])
# Initialize a SimpleImputer model
imputer = SimpleImputer(
    input_columns=['SITUACION_DNI_A'],  # columns containing information about the column we want to impute
    output_column='EXTRANJERO_A',  # the column we'd like to impute values for
    output_path='imputer_model'  # stores model data and metrics
)
# Fit an imputer model on the train data
imputer.fit(train_df=df_train, num_epochs=10)
# Impute missing values and return original dataframe with predictions
predictions = imputer.predict(df_test)
After that I get a new dataframe with fewer rows than the original. How can I insert the predicted values into my original dataframe, or is there a way to run the code on my whole dataframe and not just the test split?
If both dataframes have a unique column, or something else that can act as an ID, then this method will work:
df_test = df_test.set_index('unique_col')
df_test = df_test.fillna(predictions.set_index('unique_col'))  # fillna returns a new dataframe, so assign the result
If the above method does not work, drop the rows with the missing values and append the imputer predictions to the dataframe (DataFrame.append was removed in pandas 2.0; use pandas.concat instead). See the following links for help:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
Delete rows if there are null values in a specific column in Pandas dataframe
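A pandas-only sketch of writing predictions back into the original frame. It assumes that the prediction frame keeps the original row index and stores the imputed values in a column named `EXTRANJERO_A_imputed`; check `predictions.columns` to confirm the actual name. The `predictions` frame below is a hand-made stand-in, not real imputer output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SITUACION_DNI_A": ["a", "b", "a", "b"],
    "EXTRANJERO_A": ["no", np.nan, "no", np.nan],
})

# Stand-in for the frame returned by imputer.predict(df); the column name
# "EXTRANJERO_A_imputed" is an assumption about that output.
predictions = df.copy()
predictions["EXTRANJERO_A_imputed"] = ["no", "si", "no", "si"]

# Fill only the missing entries; rows are matched by index label.
df["EXTRANJERO_A"] = df["EXTRANJERO_A"].fillna(predictions["EXTRANJERO_A_imputed"])

print(df)
```

Because `fillna` aligns on the index, the same line works when `predictions` covers only a subset of the rows (such as `df_test`), provided the index was not reset after the split. Calling the imputer's `predict` on the full dataframe rather than on `df_test` may also be an option, but that should be checked against the datawig documentation.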

How to one hot encode with pandas on a new dataset?

I have a training data set that has categorical features on which I use pd.get_dummies to one hot encode. This produces a data set with n features. I then train a classification model on this data set with n features. If I now get some new data with the same categorical features and again perform one hot encoding, the resultant number of features is m < n.
I cannot predict the classes of the new data set if the dimensions don't match with the original training data.
Is there a way to include all of the original n features in the new data set after one hot encoding?
EDIT: I am using sklearn.ensemble.RandomForestClassifier as my classification library.
For example, suppose your training DataFrame tradf (after get_dummies) has the columns ['A_1', 'A_2'].
Your new df has the column ['A'], but it only contains category 1. You can do:
pd.get_dummies(df).reindex(columns=tradf.columns, fill_value=0)
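A small self-contained illustration of the reindex approach, using made-up data. The `columns=["A"]` argument is passed explicitly because `get_dummies` does not encode integer columns by default.

```python
import pandas as pd

train = pd.DataFrame({"A": [1, 2, 1]})  # training data sees categories 1 and 2
new = pd.DataFrame({"A": [1, 1]})       # new data only contains category 1

tradf = pd.get_dummies(train, columns=["A"])
newdf = pd.get_dummies(new, columns=["A"]).reindex(
    columns=tradf.columns, fill_value=0
)

print(list(newdf.columns))
print(newdf)
```

Columns that are missing from the new data are added and filled with 0. Categories that appear only in the new data are dropped by the reindex, so the model never sees unknown columns.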
