Experts, I want to remove the very first row from an Excel file using Python. I am sharing a screen print of my source Excel file.
I want the output as
I am using the Python code below to remove the first row, but when I read the file as a DataFrame and print it, the data is read as shown in the screen print below
and the code I am using is
import pandas as pd
import os

def remove_header():
    file_name = "AV Clients.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name)  # Read Excel file as a DataFrame
    #df = df.drop([0])
    print(df)
    #df.to_excel("AV_Clients1.xlsx", index=False)

remove_header()
Please suggest how I can remove the very first row from the Excel file whose screen print I have shared at the top.
Thanks in advance
Kawaljeet
Just add the skiprows argument when reading the Excel file.
import pandas as pd
import os

def remove_header():
    file_name = "AV Clients.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name, skiprows=1)  # Read Excel file as a DataFrame, skipping the first row
    print(df)
    df.to_excel("AV_Clients1.xlsx", index=False)

remove_header()
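Since the screen prints are not available, here is a minimal sketch of what skiprows=1 does, run on a small workbook built in memory. The title row, the column names and the client names are made up for illustration, and writing .xlsx this way assumes openpyxl is installed.

```python
import io

import pandas as pd

# Hypothetical sheet: a title row on top, then the real header and the data
raw = pd.DataFrame([
    ["AV Clients report", None],
    ["Client", "Status"],
    ["Acme", "Active"],
    ["Globex", "Expired"],
])

# Write the sheet to an in-memory .xlsx file (assumes openpyxl is installed)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    raw.to_excel(writer, index=False, header=False)
buffer.seek(0)

# skiprows=1 drops the first physical row, so the second row becomes the header
df = pd.read_excel(buffer, skiprows=1)
print(df)
```

The unwanted first row never reaches the DataFrame, so there is nothing left to drop afterwards.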
Related
I am trying to insert data into a specific cell in a CSV file.
The existing file.
Output
The data in cell A1 ("Customer") is replaced with new data ("Name").
My code is as follows.
import pandas as pd
# The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
# Read the existing CSV file
df = pd.read_csv(file_source)
# Insert "Name" into cell A1 to replace "Customer"
df[1][0]="Name"
# Save the file
df.to_csv(file_source, index=False)
And it doesn't work. Please help me find the bug.
Customer is the column header, not a data cell, so you need to rename the column:
df = df.rename(columns={'Customer': 'Name'})
I am assuming you want to work with a headerless CSV. If that's the case, your code is nearly correct: add header=None when reading the CSV. Note that cell A1 is then row 0, column 0.
import pandas as pd
# The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
# Read the existing CSV file without treating the first row as a header
df = pd.read_csv(file_source, header=None)  # notice this line is now different
# Insert "Name" into cell A1 (row 0, column 0) to replace "Customer"
df.iloc[0, 0] = "Name"
# Save the file
df.to_csv(file_source, index=False, header=False)  # made this headerless too
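To compare the two suggestions side by side, here is a small sketch on in-memory data. The file contents and the City column are made up for illustration; only the Customer header comes from the question.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for Customer.csv
csv_text = "Customer,City\nAlice,Paris\nBob,Rome\n"

# Default read: the first line becomes the column labels, not row 0
with_header = pd.read_csv(io.StringIO(csv_text))
renamed = with_header.rename(columns={"Customer": "Name"})

# header=None: every line is data and the columns are labelled 0, 1, ...
headerless = pd.read_csv(io.StringIO(csv_text), header=None)
headerless.iloc[0, 0] = "Name"  # row 0, column 0 is cell A1

# Both approaches should produce the same CSV text
out_renamed = renamed.to_csv(index=False)
out_headerless = headerless.to_csv(index=False, header=False)
print(out_renamed)
```

Renaming the column is usually the safer choice, because the data columns keep their inferred types instead of all being read as text.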
I have the file below (file1.xlsx) as input. In total it has 32 columns and almost 2500 rows. As an example, I am showing only 5 columns in the screen print
I want to edit the same file with Python and get this output (file1.xlsx)
Note that I am adding one column named short, whose data is the substring of the Name (A) column of the same Excel file up to the first decimal point.
Please help
Regards
Kawaljeet
Here is what you need...
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name) #Read Excel file as a DataFrame
df['short'] = df['Name'].str.split(".").str[0]  # text before the first "." in each row
df.to_excel("file1.xlsx", index=False)
Hello guys, I solved the problem with the code below:
import pandas as pd
import os

def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name)  # Read Excel file as a DataFrame
    df['short'] = [x.split(".")[0] for x in df['Name']]
    df.to_excel("cmdb_inuse.xlsx", index=False)

add_column()
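For reference, a minimal sketch of the vectorized pandas equivalent of that list comprehension. The sample names are made up; the point is that .str[0] takes the first piece of every row's split result, whereas a plain [0] after .str.split would select the first row instead.

```python
import pandas as pd

# Hypothetical sample standing in for the Name column of the workbook
df = pd.DataFrame({"Name": ["server01.example.com", "db02.corp.local", "standalone"]})

# .str[0] picks the first piece of each row's split result
df["short"] = df["Name"].str.split(".").str[0]

# Equivalent result, splitting only once per value
df["short_once"] = df["Name"].str.split(".", n=1).str[0]
print(df)
```

Unlike the list comprehension, the .str accessor passes missing values through as NaN instead of raising an AttributeError.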
I have a file named Example.xls in which I have data in the tabs sales and purchase.
We have data in both tabs from column A to E.
When I import this data with pandas, I want a result with columns A to F, where column F shows the sheet name. How can I add the sheet name as a column in pandas?
I am using this code
All = pd.read_excel('Example.xlsx', sheet_name=['Sales', 'Purchase'])
and then
df = pd.concat(All[frame] for frame in All.keys())
and after that I want to put the tab names in the last column (F) of the combined data frame.
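I think this is the simplest way. (DataFrame.append was removed in pandas 2.0, so the sheets are collected in a list and combined with pd.concat.)
import pandas as pd

path = r'path_of_your_file'
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
workbook = pd.read_excel(path, sheet_name=None)
frames = []
for sheet_name, sheet in workbook.items():
    sheet['sheet'] = sheet_name  # tag every row with the sheet it came from
    frames.append(sheet)
# ignore_index=True resets the index, otherwise you'll have duplicates
df = pd.concat(frames, ignore_index=True)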
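Here is a self-contained sketch of the same idea that can be run without the original file. The two sheets and their contents are made up, and building the workbook in memory assumes openpyxl is installed.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (hypothetical data)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"item": ["pen", "ink"], "qty": [3, 1]}).to_excel(
        writer, sheet_name="Sales", index=False)
    pd.DataFrame({"item": ["paper"], "qty": [10]}).to_excel(
        writer, sheet_name="Purchase", index=False)
buffer.seek(0)

# sheet_name=None returns a dict of {sheet name: DataFrame}
workbook = pd.read_excel(buffer, sheet_name=None)

# assign() adds the sheet name as the last column of each frame
combined = pd.concat(
    [frame.assign(sheet=name) for name, frame in workbook.items()],
    ignore_index=True,
)
print(combined)
```

Because assign() appends the new column at the end, the sheet name lands in the last position, which matches the column F requested in the question.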
The code below will solve your problem:
import os
from glob import glob
import pandas as pd

f_mask = r'path\*.xlsx'  # The folder path where your Example.xlsx is stored
df = pd.concat(
    [df.assign(file=os.path.splitext(os.path.basename(f))[0], sheet=sheet)
     for f in glob(f_mask)
     for sheet, df in pd.read_excel(f, sheet_name=None).items()],
    ignore_index=True)
The code works in the following way:
Check the base folder and take all the .xlsx files in it
Read the files one by one, including every sheet in each file
Make two additional columns, one for the file name and the other for the sheet name
This solution also works if you want to do the exercise for more than one .xlsx file
I have around 100 Excel files in a folder. I need to extract one cell, say the name in cell D6, from the first sheet of each file and write the values to a new Excel file/sheet. I have followed a few SO questions but have not been able to get the desired output. When I run the program below, I get this error:
TypeError: cannot concatenate a non-NDFrame object
import os
import pandas as pd
import xlrd
import xlwt
files = os.listdir(path)
files
all_data = pd.DataFrame()
for file in files:
    wb = xlrd.open_workbook(file)
    sheet = wb.sheet_by_index(0)
    df = sheet.cell_value(5,3)
    all_data.append(df,ignore_index=True)
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
all_data.to_excel(writer,'sheet1')
writer.save()
Your error says that you can only concatenate a DataFrame with another DataFrame. When you read the cell with xlrd you don't get a DataFrame object, so either turn the single cell into a DataFrame or store the values temporarily and build the DataFrame afterwards.
Something like this (untested) should do it.
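all_data = []  # list of cell values
for file in files:
    # sheet_name=0 is the first sheet; header=None keeps row 1 as data, so iloc[5, 3] is cell D6
    df = pd.read_excel(file, sheet_name=0, header=None)
    all_data.append(df.iloc[5, 3])
all_data = pd.DataFrame(all_data)  # dataframe
all_data.to_excel('all_data.xlsx')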
Alternatively, one could use another library, such as openpyxl, to do the same.
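As a rough sketch of the openpyxl route, assuming the files are .xlsx (openpyxl does not read the old .xls format): the helper name collect_cell and the two sample workbooks are made up for illustration.

```python
import glob
import os
import tempfile

from openpyxl import Workbook, load_workbook


def collect_cell(folder, cell="D6"):
    """Return (file name, cell value) pairs from the first sheet of each .xlsx file."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.xlsx"))):
        # data_only=True returns cached formula results instead of formula text
        wb = load_workbook(path, data_only=True)
        first_sheet = wb.worksheets[0]
        rows.append((os.path.basename(path), first_sheet[cell].value))
        wb.close()
    return rows


# Usage with two hypothetical workbooks created in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for name, value in [("a.xlsx", "Alice"), ("b.xlsx", "Bob")]:
        wb = Workbook()
        wb.active["D6"] = value
        wb.save(os.path.join(folder, name))
    result = collect_cell(folder)

print(result)
```

The collected pairs can then be written out with a new Workbook by calling append() on its active sheet once per pair, followed by save().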
I am still learning Python. I am trying to import multiple workbooks and all of their worksheets into one data frame.
Here is what I have so far:
import glob
import pandas as pd
import numpy as np
import os  # checking the working directory
print(os.getcwd())
all_data = pd.DataFrame()  # creating an empty data frame
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    df = pd.read_excel(file)
    all_data = all_data.append(df, ignore_index = True)
all_data.shape  # 12796 rows with 19 columns # we will have to find a way to check if this is accurate
I am having real trouble finding any documentation that will confirm/explain whether or not this code imports all the data sheets in every workbook. Some of these files have 15-20 sheets
Here is a link to where I found the glob explanation: http://pbpython.com/excel-file-combine.html
Any and all advice is greatly appreciated. I am still really new to R and Python so if you could explain this in as much detail as possible I would greatly appreciate it!
What you are missing is importing all the sheets in the workbook.
import glob
import pandas as pd
import numpy as np
import os  # checking the working directory
print(os.getcwd())
frames = []  # one DataFrame per sheet (DataFrame.append was removed in pandas 2.0)
rows = 0
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    # df = pd.read_excel(file) would import only the first sheet
    xls = pd.ExcelFile(file)
    sheets = xls.sheet_names  # To get names of all the sheets
    for sheet_name in sheets:
        df = pd.read_excel(xls, sheet_name=sheet_name)
        rows += df.shape[0]
        frames.append(df)
all_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(all_data.shape[0])  # Now you will get all the rows, which should be equal to rows
print(rows)
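A shorter variant, sketched under the assumption that reading every sheet with sheet_name=None is acceptable: it also records where each row came from, which makes it easier to check the row count per sheet. The helper name combine_workbooks, the source_file and source_sheet columns, and the sample workbooks are made up for illustration; the demo uses .xlsx files and assumes openpyxl is installed.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_workbooks(pattern):
    """Concatenate every sheet of every workbook matching pattern into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # sheet_name=None returns a dict of {sheet name: DataFrame} for the whole workbook
        for sheet_name, frame in pd.read_excel(path, sheet_name=None).items():
            frames.append(frame.assign(source_file=os.path.basename(path),
                                       source_sheet=sheet_name))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Usage with two hypothetical workbooks written to a temporary folder
samples = {"a.xlsx": {"S1": [1, 2], "S2": [3]}, "b.xlsx": {"S1": [4, 5, 6]}}
with tempfile.TemporaryDirectory() as folder:
    for file_name, sheets in samples.items():
        with pd.ExcelWriter(os.path.join(folder, file_name), engine="openpyxl") as writer:
            for sheet_name, values in sheets.items():
                pd.DataFrame({"value": values}).to_excel(
                    writer, sheet_name=sheet_name, index=False)
    all_data = combine_workbooks(os.path.join(folder, "*.xlsx"))

# Rows contributed by each sheet, useful for checking nothing was skipped
rows_per_sheet = all_data.groupby(["source_file", "source_sheet"]).size()
print(rows_per_sheet)
```

Comparing rows_per_sheet against the row counts seen in Excel is one way to confirm that every sheet was imported.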