I have this table which I export to CSV using this code:
df['time'] = df['time'].astype("datetime64").dt.date
df = df.set_index("time")
df = df.groupby(df.index).agg(['min', 'max', 'mean'])
df = df.reset_index()
df.to_csv(r'C:\****\Exports\exportMMA.csv', index=False)
While exporting this, my result is:
column 1,column 2,column 3
time,BufTF2,BufTF3
12/12/2022,10,150
I want to get rid of column 1, 2, 3 and replace the header with BufTF2 and BufTF3.
I tried this:
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
And this:
df.columns = df.iloc[0]
df = df[1:]
Somehow it won't work. I don't really need to replace the headers in the dataframe; having the right headers in the CSV is more important.
Thanks!
You can try rename:
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
When loading the input file you can specify which row to use as the header:
pd.read_csv(inputfile,header=1) # this will use the 2nd row in the file as column titles
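Building on that, a minimal round-trip sketch for the export above (the masked path is kept exactly as in the question): re-read the exported file with its second row as the header, then overwrite it so the CSV keeps only that header row.
import pandas as pd
# Sketch only: assumes the exported CSV contains the three rows shown above.
fixed = pd.read_csv(r'C:\****\Exports\exportMMA.csv', header=1)  # 2nd row becomes the header
fixed.to_csv(r'C:\****\Exports\exportMMA.csv', index=False)      # rewrite with a single header row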
Related
I'm only allowed 4 lines of code to complete 6 objectives.
df = pd.DataFrame(columns = ['Name', 'Year 1','Year 2'])
df = df.rename(columns = {'Name':'Project', 'Year 1':'y23','Year 2':'y24'})
df.to_csv('data/df.csv')
df2 = pd.read_csv('data/df.csv')
print(df2)
I've got to get this down to 4 lines, whilst also indexing a column. Basically: create a df with the first 3 columns, rename the 3 columns, save it as a CSV, read it back with an index column, and then print it.
Update:
pd.DataFrame(columns = ['Name', 'Year 1','Year 2']).rename(columns = {'Name':'Project', 'Year 1':'y23','Year 2':'y24'}).to_csv('data/df.csv')
df2 = pd.read_csv('data/df.csv')
print(df2)
Solved
A few of your '.' functions can be joined onto the previous line. For example:
df = pd.DataFrame(data)
df = df.function()
can be written as:
df = pd.DataFrame(data).function()
There are some exceptions, including to_csv or to_excel, where:
df = pd.DataFrame(data)
df.to_csv('file.csv')
can be written as:
pd.DataFrame(data).to_csv('file.csv')
(Answer deliberately vague, as this is a homework question.)
I have the following df, with the row 0 being the header:
teacher,grade,subject
black,a,english
grayson,b,math
yodd,a,science
What is the best way to use to_csv in Python to save each row to a CSV so that the files are named:
black.csv
grayson.csv
yodd.csv
Contents of black.csv will be:
teacher,grade,subject
black,a,english
Thanks in advance!
Updated Code:
df8['CaseNumber'] = df8['CaseNumber'].map(str)
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    data.to_csv('c:\\users\\admin\\' + Casenumber + '.csv')
This can be done simply by using pandas:
import pandas as pd
# Preempt the issue of columns being read as numeric by marking dtype=str
df = pd.read_csv('your_data.csv', dtype=str)
df.set_index('teacher', inplace=True)
for teacher, data in df.iterrows():
    data.to_csv(teacher + '.csv')
Edits:
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    # Use raw and f-strings to make your life easier:
    data.to_csv(rf'c:\users\admin\{Casenumber}.csv')
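If each per-teacher file should also keep the original header line (as in the black.csv example above), one variation is to turn each row back into a one-row DataFrame before writing. A minimal sketch, reusing the placeholder file name your_data.csv and the teacher/grade/subject columns from the question:
import pandas as pd

df = pd.read_csv('your_data.csv', dtype=str)  # placeholder file name from the answer above
for _, row in df.iterrows():
    # to_frame().T turns the row (a Series) into a one-row DataFrame,
    # so to_csv writes the header line followed by that single row.
    row.to_frame().T.to_csv(f"{row['teacher']}.csv", index=False)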
I have multiple dataframes that look like this (the data is irrelevant).
I want it to look like this: I want to insert a title above the column headers.
I want to combine them into multiple tabs in an Excel file.
Is it possible to add another row above the column headers and insert a title into the first cell before saving the file to Excel?
I am currently doing it like this:
with pd.ExcelWriter('merged_file.xlsx', engine='xlsxwriter') as writer:
    for filename in os.listdir(directory):
        if filename.endswith('xlsx'):
            print(filename)
            if 'brands' in filename:
                some function
            elif 'share' in filename:
                somefunction
            else:
                some function
            df.to_excel(writer, sheet_name=f'{filename[:-5]}', index=True, index_label=True)
    writer.close()
But the sheet_name is too long; that's why I want to add the title above the column headers.
I tried this code:
columns = df.columns
columns = list(zip([f'{filename[:-5]}'] * len(df.columns), columns))
columns = pd.MultiIndex.from_tuples(columns)
df2 = pd.DataFrame(df,index=df.index,columns=columns)
df2.to_excel(writer,sheet_name=f'{filename[0:3]}',index=True,index_label=True)
But it ends up looking like this, with all the data gone.
It should look like this:
You can write the data starting from the second row first and then write your text to the first cell:
df = pd.DataFrame({'col': list('abc'), 'col1': list('def')})
print (df)
col col1
0 a d
1 b e
2 c f
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow = 1, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
text = 'sometitle'
worksheet.write(0, 0, text)
writer.save()
Then for reading you need:
title = pd.read_excel('test.xlsx', nrows=0).columns[0]
print (title)
sometitle
df = pd.read_excel('test.xlsx', skiprows=1)
print (df)
col col1
0 a d
1 b e
2 c f
You can use a MultiIndex. Here is an example:
import pandas as pd
df = pd.read_excel('data.xls')
header = pd.MultiIndex.from_product([['Title'], list(df.columns)])
pd.DataFrame(df.to_numpy(), index=None, columns=header)
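To actually get that frame into a sheet, a minimal standalone sketch (toy data; 'out.xlsx' and the 'Title' text are placeholders, not names from the question):
import pandas as pd

# Placeholders only: toy data, 'out.xlsx' and 'Title' stand in for the real values.
df = pd.DataFrame({'col': list('abc'), 'col1': list('def')})
header = pd.MultiIndex.from_product([['Title'], df.columns])
df_titled = pd.DataFrame(df.to_numpy(), index=df.index, columns=header)
with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:
    df_titled.to_excel(writer, sheet_name='Sheet1')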
Also, I can share with you my solution with real data in Deepnote (my favorite tool). Feel free to duplicate and play with your own .xls:
https://deepnote.com/publish/3cfd4171-58e8-48fd-af21-930347e8e713
I am reading an xlsx file like this:
df = pd.read_excel('cleaned_data.xlsx', header=0)
df = df.drop(df.columns[0], axis=1)
df.head()
The problem is that the column names are coming in as the first row of data.
# reading data from the xlsx file
df = pd.read_excel('cleaned_data.xlsx', header=0)
#df = df.drop(df.columns[0], axis=1)
df = df.drop(0, inplace=True)
df.head()
I tried this way but still no luck. Any suggestions?
One idea is to use header=1:
df = pd.read_excel('cleaned_data.xlsx', header=1)
Another is to skip the first row:
df = pd.read_excel('cleaned_data.xlsx', skiprows=1)
Here is my function for reading a huge CSV file chunk by chunk and writing it back in the same manner, in chunks.
What I want to do is:
Skip reading the first row, which is the header row, but keep it and add it back later as the header. I saw a piece of code on Stack Overflow which probably extracts the header, but I don't know how to add it back when writing the data using to_sql.
def csv_to_sqlite(input_file_name, output_db, output_db_table_name, size_of_chunk):
    number_of_lines = sum(1 for row in open(input_file_name))
    for eachRow in range(0, number_of_lines, size_of_chunk):
        df = pd.read_csv(input_file_name,
                         header=None,
                         nrows=size_of_chunk,
                         skiprows=eachRow,
                         low_memory=False,
                         error_bad_lines=False)
        # new_header = df.iloc[0]
        # df = df[1:]
        # df.columns = new_header
        df = df.drop_duplicates(keep='last')
        df = df.apply(lambda x: x.astype(str).str.lower())
        df.to_sql(output_db_table_name, output_db, if_exists='append',
                  index=False,
                  chunksize=size_of_chunk)
There is no need to reinvent the wheel; use the chunksize parameter:
for df in pd.read_csv(filename, ..., chunksize=size_of_chunk):
    # process chunk here
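For reference, a minimal sketch of the function above reworked around chunksize (keeping the same cleanup steps; the names come from the question):
import pandas as pd

def csv_to_sqlite(input_file_name, output_db, output_db_table_name, size_of_chunk):
    # read_csv parses the header row once and applies it to every chunk,
    # so nothing has to be stripped out and re-attached later.
    for chunk in pd.read_csv(input_file_name, chunksize=size_of_chunk, low_memory=False):
        chunk = chunk.drop_duplicates(keep='last')
        chunk = chunk.apply(lambda x: x.astype(str).str.lower())
        chunk.to_sql(output_db_table_name, output_db, if_exists='append', index=False)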