Does anyone know how to delete the same single column from multiple xlsx sheets using Python?
After that, the sheets should be saved back to the same path.
First of all, create a list of the Excel workbook paths and save it in a variable called "files":
dfs = []
for file in files:
    # sheet_name=None returns a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(file, sheet_name=None)
    dfs.append(sheets)

col_to_delete = ['Your columns']
for sheets in dfs:
    # Here you delete the column from every sheet, one by one
    for df in sheets.values():
        df.drop(columns=col_to_delete, inplace=True)
After that, I don't know if you want to merge all the .xlsx files or overwrite them, but that is another question;
look at the documentation for DataFrame.to_excel() and pd.ExcelWriter to save the files.
Hope it helps
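For completeness, here is a minimal end-to-end sketch of that idea, assuming the workbooks sit in a single folder and 'ColumnToDrop' stands in for the column you actually want removed (both are placeholders):

import glob
import pandas as pd

files = glob.glob('path/to/workbooks/*.xlsx')  # placeholder folder
col_to_delete = ['ColumnToDrop']               # placeholder column name

for file in files:
    # read every sheet of the workbook into a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(file, sheet_name=None)
    # overwrite the workbook in place, sheet by sheet
    with pd.ExcelWriter(file) as writer:
        for name, df in sheets.items():
            # errors='ignore' skips sheets that do not contain the column
            df.drop(columns=col_to_delete, errors='ignore').to_excel(
                writer, sheet_name=name, index=False)

Note that writing back this way rewrites the whole workbook, so any formatting or formulas in the original files are not preserved.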
My first time using pandas. I am sure the answer is something along the lines of storing the worksheet names in a list, then looping through the list based on the name I am looking for. I'm just not experienced enough to know how to do that.
The goal is to use pandas to extract and concatenate data from multiple worksheets of a user-selected workbook. The final output is a single-worksheet Excel file containing all the data extracted from the various worksheets.
The Excel workbook consists of approximately 100 worksheets. The number of visible sheets will always vary, and the number of sheets occurring before 'Main Frames BUP1' is variable as well.
I currently have the portion of code checking for sheet visibility working. I cannot figure out how to start at a specific worksheet when that worksheet's position in the workbook can vary (i.e. it is not always the 3rd worksheet counting from 0; it could be the 5th in a user's workbook). It will, however, always be the sheet that data should start being pulled from. Everything I find are examples of specifying specific sheets to read.
Any help/direction would be appreciated.
# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)

xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets

for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        df = pd.concat(pd.read_excel(xls, sheet_name=None, header=None,
                                     skiprows=5, nrows=32, usecols='M:AD'),
                       ignore_index=True)

writer = pd.ExcelWriter(f'{loc}/test.xlsx')
df.to_excel(writer, 'bananas')
writer.save()
Additional clarification on the final goal: exclude all sheets occurring before 'Main Frames BUP 1', only consider visible sheets, pull data from 'M6:AD37', do not add (or at least remove) rows that are entirely blank, and stop pulling data at the sheet just before a worksheet whose name has a partial match to 'panel'.
If I create a dictionary of visible sheets, how do I create a new dictionary from it consisting only of 'Main Frames BUP 1' up to whatever sheet occurs just before a partial match of 'panel'? Then I can use that dictionary for my data pull.
I created a minimal sample myself and worked it out for you.
xls = pd.ExcelFile('data/Test.xlsx')
sheets = xls.book.worksheets
sList = [x.title for x in sheets if x.sheet_state == 'visible']
dfs = [pd.read_excel('data/Test.xlsx', sheet_name=s, skiprows=5, nrows=32, usecols='M:AD') for s in sList]
dfconcat = pd.concat(dfs)
Now you need to adjust the columns, headers and so on as you did in your question. I hope that it works out for you. From my side it worked like a charm.
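To address the additional clarification, one possible approach (a sketch; it assumes the names in sList are in workbook order and that a sheet literally named 'Main Frames BUP 1' exists) is to slice the visible-sheet list so it starts at 'Main Frames BUP 1' and stops just before the first sheet whose name contains 'panel':

# keep only the sheets from 'Main Frames BUP 1' onwards
start = sList.index('Main Frames BUP 1')
sList = sList[start:]

# stop just before the first sheet whose name partially matches 'panel'
stop = next((i for i, s in enumerate(sList) if 'panel' in s.lower()), len(sList))
sList = sList[:stop]

dfs = [pd.read_excel('data/Test.xlsx', sheet_name=s, skiprows=5, nrows=32, usecols='M:AD')
       for s in sList]
# drop rows that are entirely blank
dfconcat = pd.concat(dfs, ignore_index=True).dropna(how='all')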
It is a bit hard without actually seeing what is going on with your data.
I believe what you are missing is that you need to create one dataframe first and then concat the others onto it. Also, you need to pass a specific sheet (x) so pandas is able to read it; otherwise it returns a dictionary. In case it does not work, get the first sheet, create a df from it, then concat.
# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)

xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets

df = pd.DataFrame()
for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        # pass the sheet name (x.title), not the worksheet object,
        # and concat the new sheet onto the running dataframe
        new = pd.read_excel(xls, sheet_name=x.title, header=None,
                            skiprows=5, nrows=32, usecols='M:AD')
        df = pd.concat([df, new], ignore_index=True)

with pd.ExcelWriter(f'{loc}/test.xlsx') as writer:
    df.to_excel(writer, sheet_name='bananas')
You can also put all the dfs in a dictionary; again, it is difficult without knowing what you are working with.
xl = pd.ExcelFile('yourFile.xlsx')

# collect the names of the visible sheets via the openpyxl workbook
visible = [ws.title for ws in xl.book.worksheets if ws.sheet_state == 'visible']

# build a dictionary of all sheets by passing None to sheet_name
diDF = pd.read_excel('yourFile.xlsx', sheet_name=None)

# keep only the visible sheets
dfs = {name: df for name, df in diDF.items() if name in visible}
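From there, if the end goal is still a single table, the filtered dictionary can be collapsed in one step (a small follow-up sketch, assuming the sheets share the same columns):

# stack every visible sheet into one dataframe
combined = pd.concat(dfs.values(), ignore_index=True)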
I'm currently creating a dataframe from an Excel spreadsheet in pandas. Most of the files only contain one sheet, but in some of the files the sheet I need is not the first one. However, the sheets in all of the files have the same name format: 'ITD_XXX_XXXX'. Is there a way to tell pandas to select the sheet that has this form, something like
df = pd.read_excel(path, sheet_name=contains('ITD_'))
so that pandas would only read data from the sheet whose name starts with the string 'ITD_'?
Cheers.
I think the answer here would probably give you what you need.
Bring in the file as an ExcelFile before reading it as a dataframe, get the sheet names, and then extract the sheet name that starts with 'ITD_'.
excel = pd.ExcelFile("your_excel.xlsx")
excel.sheet_names
# ["Sheet1", "Sheet2"]
for n in excel.sheet_names:
    if n.startswith('ITD_'):
        sheetname = n
        break

df = excel.parse(sheetname)
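An equivalent, more compact variant (just a sketch; it assumes exactly one sheet matches and will raise StopIteration if none does):

excel = pd.ExcelFile("your_excel.xlsx")
# pick the first sheet whose name starts with 'ITD_'
sheetname = next(n for n in excel.sheet_names if n.startswith('ITD_'))
df = excel.parse(sheetname)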
I have one list of Excel files with different tables:
List1:
I want to unify them into one table in pandas. Could you advise me how to do it in pandas?
Thanks in advance!
If you want to concatenate the first sheet of some excel files, you can use the following code block:
import os
import pandas as pd

cwd = os.path.abspath('')
files = os.listdir(cwd)

## gets the first sheet of each file and stacks them together
# (DataFrame.append was removed in pandas 2.0, so collect the frames and concat them)
frames = []
for file in files:
    if file.endswith('.xlsx'):
        frames.append(pd.read_excel(file))

df = pd.concat(frames, ignore_index=True)
df.head()
df.to_excel('total_sales.xlsx')
and if you want to merge the various sheets of a given Excel file into one pandas data frame, you can use the following code:
## gets all sheets of every Excel file in the folder
frames = []
for file in files:  # loop through Excel files
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)
        for sheet in excel_file.sheet_names:  # loop through sheets inside an Excel file
            frames.append(excel_file.parse(sheet_name=sheet))

# concatenate the collected frames into one dataframe and save it
df_total = pd.concat(frames, ignore_index=True)
df_total.to_excel('combined_file.xlsx')
You can put your different tables in various sheets of one Excel file, or in different Excel files, and then concatenate them using the code above.
I am trying to read multiple Excel files in a loop using read_excel.
The Excel files contain sheet names which contain the word "staff",
e.g. Staff_2013, Staff_list, etc.
Is there a way to read all these files dynamically using some wild card concept ?
Something like the code below :
df = pd.read_excel(folder,col_names=True,sheet_name='Staff*')
You can list the sheets and select the ones you want to read one by one.
For instance:
xls_file = pd.ExcelFile('my_excel_file.xls')
staff_fnames = [sheet for sheet in xls_file.sheet_names if sheet.startswith('Staff')]
for staff_fname in staff_fnames:
    df = pd.read_excel('my_excel_file.xls', sheet_name=staff_fname)
Or, if you don't mind loading all the sheets, you can also use sheet_name=None to load all sheets in a dict and filter afterwards:
dfs_dict = pd.read_excel('my_excel_file.xls', sheet_name=None)
dfs_dict = {s: df for s, df in dfs_dict.items() if s.startswith('Staff')}
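To extend this to the multiple-files scenario in the question, a sketch along these lines could work (the folder path is a placeholder, and it assumes you want every matching sheet from every workbook stacked into one dataframe):

import glob
import pandas as pd

staff_frames = []
for path in glob.glob('path/to/folder/*.xls*'):  # placeholder folder
    # load every sheet of the workbook, then keep only the Staff* sheets
    sheets = pd.read_excel(path, sheet_name=None)
    staff_frames.extend(df for name, df in sheets.items() if name.startswith('Staff'))

staff_df = pd.concat(staff_frames, ignore_index=True)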
I have one Excel file with several identically structured sheets in it (same headers and number of columns) (sheet names: 01, 02, ..., 12).
How can I get this into one dataframe?
Right now I would load them all separately with:
df1 = pd.read_excel('path.xls', sheet_name='01')
df2 = pd.read_excel('path.xls', sheet_name='02')
...
and would then concatenate them.
What is the most Pythonic way to do this and directly get one dataframe with all the sheets? Also assuming I do not know every sheet name in advance.
read the file as:
collection = pd.read_excel('path.xls', sheet_name=None)
combined = pd.concat([value.assign(sheet_source=key)
                      for key, value in collection.items()],
                     ignore_index=True)
sheet_name = None ensures all the sheets are read in.
collection is a dictionary, with the sheet_name as key, and the actual data as the values. combined uses the pandas concat method to get you one dataframe. I added the extra column sheet_source, in case you need to track where the data for each row comes from.
You can read more about it on the pandas doco
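As a quick usage example, the sheet_source column then lets you pull back the rows that came from any one sheet, e.g. (assuming the sheets really are named '01', '02', ...):

rows_from_01 = combined[combined['sheet_source'] == '01']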
you can use:
df_final = pd.concat([pd.read_excel('path.xls', sheet_name="{:02d}".format(sheet)) for sheet in range(1, 13)], axis=0)