Pandas excel file reading gives first column name as unnamed - python

I am reading an xlsx file like this:
df = pd.read_excel('cleaned_data.xlsx', header=0)
df = df.drop(df.columns[0], axis=1)
df.head()
The problem is that the column names are coming in as the first row of data.
# reading data from the Excel file
df = pd.read_excel('cleaned_data.xlsx', header=0)
#df = df.drop(df.columns[0], axis=1)
df = df.drop(0)  # drop() returns a new frame; assigning the result of inplace=True would set df to None
df.head()
I tried it this way, but still no luck. Any suggestions?

One idea is to use header=1:
df = pd.read_excel('cleaned_data.xlsx', header=1)
Another is to skip the first row:
df = pd.read_excel('cleaned_data.xlsx', skiprows=1)
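Both suggestions assume the real column names sit on the second row of the sheet, below an empty or junk row. Here is a minimal sketch of why they give the same result. It uses read_csv on an in-memory string as a stand-in for the Excel file, since read_excel accepts the same header and skiprows parameters; the sample data is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents: an empty first row, then the real header row.
raw = ",\nname,score\nalice,1\nbob,2\n"

# Default behaviour: the empty first row becomes the header ("Unnamed: ..."),
# and the real column names end up as the first row of data.
default = pd.read_csv(io.StringIO(raw), header=0)

# Option 1: tell pandas the header is on the second row (index 1).
with_header = pd.read_csv(io.StringIO(raw), header=1)

# Option 2: skip the first row, so the header is found on the next one.
with_skiprows = pd.read_csv(io.StringIO(raw), skiprows=1)

print(default)
print(with_header)
print(with_skiprows)
```

If the junk row were not the very first one, header=N or skiprows=N would need adjusting accordingly.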

Related

issue with index on a saved DataFrame imported with to_csv() function

Hi, I have created a DataFrame with pandas from a CSV in this way:
elementi = pd.read_csv('elementi.csv')
df = pd.DataFrame(elementi)
lst = []
lst2 = []
for x in df['elementi']:
    a = x.split(";")
    lst.append(a[0])
    lst2.append(a[1])
ipo_oso = np.random.randint(0,3,76)
oso = np.random.randint(3,5,76)
ico = np.random.randint(5,6,76)
per_ico = np.random.randint(6,7,76)
df = pd.DataFrame(lst,index=lst2,columns=['elementi'])
# drop the elements I don't use from the periodic table
df = df.drop(df[103:117].index)
df = df.drop(df[90:104].index)
df = df.drop(df[58:72].index)
df.head()
df['ipo_oso'] = ipo_oso
df['oso'] = oso
df['ico'] = ico
df['per_ico'] = per_ico
df.to_csv('period_table')
df.head()
and it looks like this
When I save this table with to_csv() and import it into another project with read_csv(), the index of the table is treated as an ordinary column, although it should be the index:
e= pd.read_csv('period_table')
e.head()
or
e= pd.read_csv('period_table')
df =pd.DataFrame(e)
df.head()
How can I fix that? :)
Just use index_col=0 as parameter of read_csv:
df = pd.read_csv('period_table', index_col=0)
df.head()
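A minimal sketch of the round trip, assuming a small made-up table and an in-memory buffer instead of the period_table file, shows the difference index_col=0 makes:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the periodic table frame.
df = pd.DataFrame({"elementi": ["Hydrogen", "Helium"]}, index=["H", "He"])

# to_csv writes the index as the first, unnamed column.
buffer = io.StringIO()
df.to_csv(buffer)

# Without index_col, that first column comes back as "Unnamed: 0".
buffer.seek(0)
plain = pd.read_csv(buffer)

# With index_col=0, the first column is restored as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(plain)
print(restored)
```

The alternative is to write the file with index=False, but that only makes sense when the index carries no information.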

Pandas CSV Move first row to header row

I have this table, which I export to CSV using this code:
df['time'] = df['time'].astype("datetime64").dt.date
df = df.set_index("time")
df = df.groupby(df.index).agg(['min', 'max', 'mean'])
df = df.reset_index()
df = df.to_csv(r'C:\****\Exports\exportMMA.csv', index=False)
While exporting this, my result is:
column 1    column 2  column 3
time        BufTF2    BufTF3
12/12/2022  10        150
I want to get rid of the column 1, column 2, column 3 row and replace the header with BufFT2 and BufFT3.
I tried this:
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
And this:
df.columns = df.iloc[0]
df = df[1:]
Somehow it won't work. I don't really need to replace the headers in the DataFrame; having the right headers in the CSV is more important.
Thanks!
You can try rename:
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
When loading the input file, you can specify which row to use as the header:
pd.read_csv(inputfile,header=1) # this will use the 2nd row in the file as column titles
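For the rename suggestion, a minimal sketch on a made-up frame shaped like the table in the question (placeholder column names, real names in the first data row) could look like this:

```python
import pandas as pd

# Hypothetical frame: placeholder headers, with the real names in row 0.
df = pd.DataFrame(
    [["time", "BufTF2", "BufTF3"], ["12/12/2022", 10, 150]],
    columns=["column 1", "column 2", "column 3"],
)

# df.iloc[0] is a Series indexed by the old column names, so rename() can
# use it as an old-name -> new-name mapping; then the first row is dropped.
promoted = df.rename(columns=df.iloc[0]).drop(df.index[0])

print(promoted)
```

This only helps when the real names are an ordinary data row. Whether that matches the frame produced by the groupby in the question is an assumption.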

How would I go about renaming a column name in pandas, in the same line that I made a column

I'm only allowed 4 lines of code to complete 6 objectives.
df = pd.DataFrame(columns = ['Name', 'Year 1','Year 2'])
df = df.rename(columns = {'Name':'Project', 'Year 1':'y23','Year 2':'y24'})
df.to_csv('data/df.csv')
df2 = pd.read_csv('data/df.csv')
print(df2)
I've got to get this down to 4 lines, whilst also indexing a column. Basically, create a df, with the first 3 columns, then append the 3 columns. Then save it as a csv, and then read it with an index column and then print it.
Update:
df = pd.DataFrame(columns = ['Name', 'Year 1','Year 2']).rename(columns = {'Name':'Project', 'Year 1':'y23','Year 2':'y24'}).to_csv('data/df.csv')
df2 = pd.read_csv('data/df.csv')
print(df2)
Solved
A few of your method calls can be chained onto the previous line, for example:
df = pd.DataFrame(data)
df = df.function()
can be written as:
df = pd.DataFrame(data).function()
The exceptions are methods such as to_csv or to_excel, which write a file and return None, so there is nothing useful to assign. There:
df = pd.DataFrame(data)
df.to_csv('file.csv')
can be written as:
pd.DataFrame(data).to_csv('file.csv')
(Answer deliberately vague as homework question).
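As a generic illustration of the chaining rule, not a solution to the exercise, here is a minimal sketch with made-up data and an in-memory buffer in place of a file path:

```python
import io

import pandas as pd

buffer = io.StringIO()  # stand-in for a file path such as 'file.csv'

# Each method before to_csv returns a DataFrame, so the calls can be chained.
# to_csv writes to the buffer and returns None, so it has to come last.
result = (
    pd.DataFrame({"a": [1, 2]})
    .rename(columns={"a": "b"})
    .to_csv(buffer)
)

print(result)             # None
print(buffer.getvalue())  # the CSV text that was written
```

Assigning the result of a chain that ends in to_csv therefore leaves the variable set to None, which is why the assignment is dropped in that case.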

How to save each row to csv in dataframe AND name the file based on the the first column in each row

I have the following df, with the row 0 being the header:
teacher,grade,subject
black,a,english
grayson,b,math
yodd,a,science
What is the best way to use to_csv in Python to save each row to its own CSV, so that the files are named:
black.csv
grayson.csv
yodd.csv
Contents of black.csv will be:
teacher,grade,subject
black,a,english
Thanks in advance!
Updated code:
df8['CaseNumber'] = df8['CaseNumber'].map(str)
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    data.to_csv('c:\\users\\admin\\' + Casenumber + '.csv')
This can be done simply by using pandas:
import pandas as pd
# Preempt the issue of columns being numeric by marking dtype=str
# header=0 because the column names are on the first row of the file
df = pd.read_csv('your_data.csv', header=0, dtype=str)
df.set_index('teacher', inplace=True)
for teacher, data in df.iterrows():
    data.to_csv(teacher + '.csv')
Edits:
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    # Use r and f strings to make your life easier:
    data.to_csv(rf'c:\users\admin\{Casenumber}.csv')
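Note that iterrows() yields each row as a Series, so the files written above list the fields vertically rather than as a header line plus one data line. A minimal sketch of one way to get exactly the layout asked for in the question, assuming the sample data from the question and a temporary output directory (both stand-ins for the real file and folder):

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data from the question, read from a string instead of a file.
csv_text = "teacher,grade,subject\nblack,a,english\ngrayson,b,math\nyodd,a,science\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location

for i in range(len(df)):
    row = df.iloc[[i]]  # double brackets keep a one-row DataFrame, header included
    teacher = row["teacher"].iloc[0]
    row.to_csv(out_dir / f"{teacher}.csv", index=False)

black_text = (out_dir / "black.csv").read_text()
print(black_text)
```

Keeping teacher as a regular column instead of the index means it appears in each output file, as the expected contents of black.csv require.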

How do I use Python to iterate through a directory and delete specific columns from all CSVs?

I have a directory with several csvs.
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
Each CSV has the same columns, shown in the example below:
yes no maybe ofcourse
1 2 3 4
I want my script to iterate through all csvs in the folder and delete the columns maybe and ofcourse.
If glob provides you with file paths, you can do the following with pandas:
from glob import glob
import pandas as pd
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
drop = ['maybe', 'ofcourse']
for file in files:
    df = pd.read_csv(file)
    for col in drop:
        if col in df:
            df = df.drop(col, axis=1)
    # index=False avoids writing the row index as an extra unnamed column
    df.to_csv(file, index=False)
Alternatively, if you want a cleaner way to avoid KeyErrors from drop, you can do this:
from glob import glob
import pandas as pd
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
drop = ['maybe', 'ofcourse']
for file in files:
    df = pd.read_csv(file)
    # only drop the columns that are actually present in this file
    df = df.drop([c for c in drop if c in df], axis=1)
    df.to_csv(file, index=False)
Do you mean something like this?
from glob import glob
import pandas as pd
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
for filename in files:
    df = pd.read_csv(filename)
    df = df.drop(['maybe', 'ofcourse'], axis=1)
    df.to_csv(filename, index=False)
This code will remove the maybe and ofcourse columns and save it back to the csv.
You can use pandas to read the CSV file into a DataFrame, then use drop() to remove specific columns. Something like this:
df = pd.read_csv(csv_filename)
df = df.drop(['maybe', 'ofcourse'], axis=1)  # drop() returns a new DataFrame
If the files look exactly like what you have there, then maybe something like this:
import pandas as pd
from glob import glob
files = glob(r'C:/Users/jj/Desktop/Bulk_Wav/*.csv')
for filename in files:
    df = pd.read_csv(filename, sep='\t')
    df.drop(['maybe', 'ofcourse'], axis=1, inplace=True)
    df.to_csv(filename, sep='\t', index=False)
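To try the approach safely before touching real files, here is a minimal sketch that builds two sample CSVs in a temporary folder (a made-up stand-in for the Bulk_Wav directory) and then strips the unwanted columns from each. It assumes comma-separated files and uses the errors="ignore" option of drop so that a file missing one of the columns does not raise a KeyError.

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical test folder with two CSVs that share the question's columns.
folder = Path(tempfile.mkdtemp())
sample = pd.DataFrame({"yes": [1], "no": [2], "maybe": [3], "ofcourse": [4]})
for name in ("a.csv", "b.csv"):
    sample.to_csv(folder / name, index=False)

unwanted = ["maybe", "ofcourse"]

for path in glob(str(folder / "*.csv")):
    df = pd.read_csv(path)
    # errors="ignore" skips any listed column that is not present.
    df = df.drop(columns=unwanted, errors="ignore")
    # index=False keeps an extra unnamed index column out of the file.
    df.to_csv(path, index=False)

remaining = list(pd.read_csv(folder / "a.csv").columns)
print(remaining)
```

Since the loop overwrites each file in place, running it on a copy of the directory first is a reasonable precaution.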
