My Python code produces a pandas DataFrame that includes a "Type" column.
I need to transform it into another format: loop through every row in the DataFrame and output as many DataFrames as there are rows in the table. Each DataFrame should have an additional timestamp column and be named after the value in the "Type" column.
I am struggling with where to start. Can anyone here advise me?
Here is code for what you want to achieve. It reads a CSV file like yours, loops through the rows, adds a column with the current time and saves each row to a separate CSV file. Let me know if it works for you.
import pandas as pd
from datetime import datetime

# Give the path to your CSV
df = pd.read_csv('C:/Users/username/Downloads/test.csv')
# iterate over the rows of the DataFrame
for index, row in df.iterrows():
    # store the current time in a new Timestamp column for this row
    df.loc[index, 'Timestamp'] = datetime.now().strftime('%c')
    print(df.loc[index])
    # copy the row into a new single-row DataFrame
    df_new = df.loc[index].to_frame().T
    # save that DataFrame to its own CSV file
    df_new.to_csv(f'C:/Users/username/Downloads/test_{index}.csv', index=False)
Pandas' .to_dict(orient='records') is your friend: it turns each row into a plain dict that you can extend with a timestamp.
import pandas as pd
from datetime import datetime

# df is the DataFrame from the question
list_of_final_dataframes = []
for record in df.to_dict(orient='records'):
    # copy the row's values and add a timestamp key
    record_with_timestamp = {**record, 'timestamp': datetime.now()}
    list_of_final_dataframes.append(pd.DataFrame([record_with_timestamp]))
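Neither answer names the DataFrames after the "Type" column. A common way to do that is a dict keyed by the Type value, rather than creating variables dynamically. Below is a minimal sketch, assuming the Type values are unique (a repeated value would overwrite the earlier entry); the sample data and the name frames_by_type are hypothetical.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data standing in for the table in the question
df = pd.DataFrame({
    "Type": ["A", "B", "C"],
    "Value": [1, 2, 3],
})

# One single-row DataFrame per row, keyed by that row's "Type" value.
# Assumes "Type" values are unique; duplicates would overwrite earlier keys.
frames_by_type = {
    type_value: df.iloc[[pos]].assign(timestamp=datetime.now())
    for pos, type_value in enumerate(df["Type"])
}

for name, frame in frames_by_type.items():
    print(name)
    print(frame)
```

df.iloc[[pos]] (double brackets) returns a one-row DataFrame that keeps the original column dtypes, whereas row.to_frame().T on a row from iterrows() turns mixed-type columns into object dtype. To save each frame, something like frames_by_type[name].to_csv(f'{name}.csv', index=False) would work.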
Related
I saw this code in "combine rows and add up value in dataframe", but I want to add up the values in cells for the same day, i.e. sum all data for a day. How do I modify the code to achieve this?
Check the code below:
import pandas as pd
df = pd.DataFrame({'Price': [10000, 10000, 10000, 10000, 10000, 10000],
                   'Time': ['2012.05', '2012.05', '2012.05', '2012.06', '2012.06', '2012.07'],
                   'Type': ['Q', 'T', 'Q', 'T', 'T', 'Q'],
                   'Volume': [10, 20, 10, 20, 30, 10]})
# add each day's total Volume to every row of that day
print(df.assign(daily_volume=df.groupby('Time')['Volume'].transform('sum')))
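Output: every row gains a daily_volume column holding its day's total Volume (40 for 2012.05, 50 for 2012.06 and 10 for 2012.07).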
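If instead you want one row per day, with all of that day's volumes added together (rather than the total repeated on every row), aggregate with sum() instead of transform('sum'). A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Price': [10000, 10000, 10000, 10000, 10000, 10000],
                   'Time': ['2012.05', '2012.05', '2012.05', '2012.06', '2012.06', '2012.07'],
                   'Type': ['Q', 'T', 'Q', 'T', 'T', 'Q'],
                   'Volume': [10, 20, 10, 20, 30, 10]})

# Collapse to one row per Time value, summing Volume within each group
daily = df.groupby('Time', as_index=False)['Volume'].sum()
print(daily)
```

as_index=False keeps Time as a regular column instead of moving it into the index.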
In one of the code snippets, the authors provide the input as:
variants = [ 'rs425277', 'rs1571149', 'rs1240707', 'rs1240708', 'rs873927', 'rs880051', 'rs1878745', 'rs2296716', 'rs2298217', 'rs2459994' ]
However, I have similar values in one of the columns of a CSV file. How can I supply that column as input, like the example above?
Thanks in advance
First, import your CSV as a pandas DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
Then get a list from the DataFrame column:
col_one_list = df['column_one'].tolist()
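A self-contained sketch of the two steps, using an in-memory CSV in place of data.csv; the column name rsid and the sample rows are hypothetical. usecols limits parsing to the one column you need.

```python
import io

import pandas as pd

# Hypothetical CSV contents; "rsid" is an assumed column name
csv_text = """rsid,chrom
rs425277,1
rs1571149,1
rs1240707,1
"""

# Read only the column of interest, then turn it into a plain Python list
df = pd.read_csv(io.StringIO(csv_text), usecols=['rsid'])
variants = df['rsid'].tolist()
print(variants)
```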
I have a CSV file whose first row contains wrong data; the label names are in row 2. So when I load this file into a DataFrame, the labels are incorrect and the correct names become the values of row 0. Is there a function similar to reset_index() but for columns? P.S. I cannot change the CSV file.
Hello, let's suppose your CSV file is data.csv. Try this code:
import pandas as pd
# read the CSV file (the wrong first row becomes the header)
df = pd.read_csv('data.csv')
# replace the wrong header names with integers
df.columns = range(df.shape[1])
# save the data to another CSV file without a header row
df.to_csv('data_without_header.csv', header=False, index=False)
# read the new CSV file: its first line (the correct names) becomes the header
new_df = pd.read_csv('data_without_header.csv')
# display the first rows of the new data
print(new_df.head())
If you do not care about the rows preceding your column names, pass the header argument the 0-based line number of the row that holds the correct names. In your file the names are on the second line, so:
df = pd.read_csv('my_csv.csv', header=1)
Keep in mind that this discards the rows above the header. If you still want to keep them, read the file without a header and promote the names row yourself:
df = pd.read_csv('my_csv.csv', header=None)
df.columns = df.iloc[1]  # use the values in row 1 (the file's second line) as column names
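A small sketch of both options, using an in-memory CSV whose first line is junk and whose second line holds the real names (the sample contents are hypothetical):

```python
import io

import pandas as pd

# Hypothetical file contents: a junk first line, real names on the second line
csv_text = "junk1,junk2\nname,age\nalice,30\nbob,25\n"

# Option 1: header=1 uses line 1 (0-based) as the header and skips line 0
df1 = pd.read_csv(io.StringIO(csv_text), header=1)

# Option 2: keep every line as data, then take the column names from row 1
df2 = pd.read_csv(io.StringIO(csv_text), header=None)
df2.columns = df2.iloc[1]

print(df1)
print(df2)
```

With header=None every column is read as text, because the junk and name rows sit in the same columns as the data; the header=1 version keeps the age column numeric.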
Cheers.
I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I tried using groupby, but I don't think it is the right approach.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. Its output shows 'file1' repeated on both rows (once next to 'a' and once next to 'b').
What can I do to have only one entry of 'file1' under 'filename'?
Eventually I need to write this to a CSV file.
Thank you
If you only have one entry in a column, the other cells will be NaN, so you could filter out the NaNs with something like df = df[df["filename"].notnull()]
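If the goal is only to show 'file1' once when the table is written out, another option (different from the NaN filter above) is to blank out repeated filenames before calling to_csv. A pandas DataFrame always has columns of equal length, so the "missing" cells are empty strings rather than absent rows. A minimal sketch using the question's data, with an in-memory buffer standing in for the output file:

```python
import io

import pandas as pd

data = {'filename': ['file1', 'file1'], 'variables': ['a', 'b']}
df = pd.DataFrame(data)

# Blank out every repeat of a filename so each name appears only once
df.loc[df['filename'].duplicated(), 'filename'] = ''

# Write to an in-memory buffer; pass a path instead to write a real file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

duplicated() flags every repeat, not only consecutive ones, so sort by filename first if the rows are not already grouped.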
Hello, I have xlsx files and merged them into one DataFrame using pandas. It worked, but instead of getting back the column names from the xlsx files, I got numbers as column names and the column titles became a row, like this:
Output: 1 2 3
COLTITLE1 COLTITLE2 COLTITLE3
When they should be like this:
Output: COLTITLE1 COLTITLE2 COLTITLE3
The column titles are no longer column titles; they have become a row. How can I get back the column names from the xlsx files? For clarity, all the column names are the same in both xlsx files. Help would be appreciated; here's my code:
# import modules
from IPython.display import display
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", 999)
pd.set_option('max_colwidth',100)
%matplotlib inline
# filenames
file_names = ["data/OrderReport.xlsx", "data/OrderReport2.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in file_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
# concatenate them
atlantic_data = pd.concat(frames)
# write it out
atlantic_data.to_excel("c.xlsx", header=False, index=False)
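I hope I understood your question correctly. The column names are turned into a row by header=None: it tells pandas the sheet has no header row, so the first row is read as data and the columns get integer labels. Drop header=None (the default, header=0, uses the first row as the column names) and the names come back as usual:
frames = [x.parse(x.sheet_names[0]) for x in excels]
index_col=None is already the default and does not affect the column names. Also write the result with the default header=True, otherwise to_excel leaves the names out of c.xlsx again.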
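A minimal sketch of the difference. It uses read_csv on in-memory text so it runs without an Excel engine; read_csv's header parameter has the same meaning as in ExcelFile.parse and read_excel. The file contents are hypothetical.

```python
import io

import pandas as pd

# Two hypothetical files with identical headers (CSV text stands in for xlsx)
file_a = "COLTITLE1,COLTITLE2,COLTITLE3\n1,2,3\n"
file_b = "COLTITLE1,COLTITLE2,COLTITLE3\n4,5,6\n"

# header=None: the title row is read as data and columns are labelled 0, 1, 2
wrong = pd.read_csv(io.StringIO(file_a), header=None)
print(wrong)

# Default header=0: the first row becomes the column names
frames = [pd.read_csv(io.StringIO(text)) for text in (file_a, file_b)]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

When writing the real result, atlantic_data.to_excel("c.xlsx", index=False) keeps the header row (this needs an Excel writer such as openpyxl installed).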