I have 10 Excel files, each containing a single sheet:
Sheet Name = Report Output 1
I built a DataFrame from all 10 files by importing them with glob and pandas:
import glob
import pandas as pd

# Collect each file's sheet and concatenate once at the end
# (DataFrame.append is deprecated in recent pandas).
frames = []
for f in glob.glob('filename*.xlsx'):
    info = pd.read_excel(f, sheet_name='Report Output 1')
    frames.append(info)
df = pd.concat(frames, ignore_index=True)
I then did some filtering, merging, and calculations as required.
Now I have one consolidated DataFrame, final_df, which holds the data for all 10 files after my calculations.
I want to write final_df back into each of the respective 10 files in a new sheet, splitting it with groupby on the Source column (which holds a unique value per file), while keeping the original data in each file's existing sheet (Report Output 1) as it is.
I know openpyxl can do this through dataframe_to_rows, but how do I write the code that copies each slice of the DataFrame to a separate sheet in the right workbook?
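Not part of the original thread, but here is a minimal sketch of one way to do it with pandas' ExcelWriter in append mode (which uses openpyxl under the hood, so dataframe_to_rows is not needed directly). It assumes final_df has a Source column and that you can build a source_to_file mapping from each Source value to its workbook path; both names are placeholders for your own data:

```python
import pandas as pd

def write_back(final_df, source_to_file, sheet_name='New Sheet'):
    """Split final_df by 'Source' and append each group to its matching
    workbook as a new sheet, leaving 'Report Output 1' untouched."""
    for source, group in final_df.groupby('Source'):
        path = source_to_file[source]
        # mode='a' opens the existing workbook instead of overwriting it;
        # if_sheet_exists='replace' (pandas >= 1.3) lets the script re-run safely.
        with pd.ExcelWriter(path, engine='openpyxl', mode='a',
                            if_sheet_exists='replace') as writer:
            group.to_excel(writer, sheet_name=sheet_name, index=False)
```

With 10 files you would call write_back(final_df, source_to_file) once; each group lands in its own workbook because the Source value is unique per file.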
I am trying to stack multiple workbooks by sheet in python. Currently my folder contains over 50 individual workbooks which are separated by date and usually contain up to 3 sheets, although not always.
The sheets include "Pay", "Tablet", "SIMO".
I want to try to stack all data from the "Tablet" sheet into a new workbook and have been using the following code.
import os
import pandas as pd

path = r"path_location"
files = os.listdir(path)
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file, sheet_name='Tablet'), ignore_index=True)
df.head()
df.to_csv('Tablet_total.csv')
However, after checking the files I realised this is not pulling from all workbooks that have the sheet "Tablet". I suspect this may be because not all workbooks have this sheet, but in case I missed anything I'd greatly appreciate some ideas about what I may be doing wrong.
Also, as a final request, the sheet "Tablet" across all workbooks has unnecessary columns in the beginning.
I have tried incorporating df.drop(index=df.index[:7], axis=0, inplace=True) into my loop yet this only removes the first 7 rows from the first iteration. Again any support with this would be greatly appreciated.
Many thanks
First, I would check that you do not have any .xls files or other excel file suffixes with:
import os
path = r"path_location"
files = os.listdir(path)
print({
    file.split('.')[-1]
    for file in files
})
Then, I would check that you don't have any sheets with trailing white spaces or capitalization issues with:
import os
import pandas
path = r"path_location"
files = os.listdir(path)
print({
    sheet_name
    for file in files
    if file.endswith('.xlsx')
    for sheet_name in pandas.ExcelFile(file).sheet_names
})
I would use pandas.concat() with a list comprehension to concatenate the sheets. I would also add a check to ensure that each workbook actually has a sheet named 'Tablet'. Finally, if you want to skip the first seven columns, you should a) do it on each dataframe as it is read in, before it is concatenated with the other dataframes, and b) first include all the rows and then specify the columns with .iloc[:, 7:]:
import os
import pandas
path = r"path_location"
files = os.listdir(path)
df = pandas.concat([
    pandas.read_excel(file, sheet_name='Tablet').iloc[:, 7:]
    for file in files
    if file.endswith('.xlsx') and 'Tablet' in pandas.ExcelFile(file).sheet_names
])
df.head()
Check whether you have Excel files with other extensions, such as .xlsm or .xlsb.
To remove the seven rows on each iteration, read the sheet into a temporary dataframe and delete them from it before appending:
df_tmp = pd.read_excel(file, sheet_name='Tablet')
df_tmp.drop(index=df_tmp.index[:7], axis=0, inplace=True)
Since append is deprecated, use concat() instead.
pandas.DataFrame.append
df = pd.concat([df, df_tmp], ignore_index=True)
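Pulling the pieces above together, here is a sketch of the full loop under the same assumptions (a folder of .xlsx files, only some of which have a "Tablet" sheet, with the folder path as a placeholder). Collecting the per-file frames in a list and calling concat once is also faster than concatenating inside the loop, which copies the accumulated data on every iteration:

```python
import os
import pandas as pd

def stack_tablet(path):
    """Stack the 'Tablet' sheet from every .xlsx workbook in `path`,
    skipping workbooks without that sheet and dropping the first
    7 rows of each sheet before stacking."""
    frames = []
    for file in sorted(os.listdir(path)):
        if not file.endswith('.xlsx'):
            continue
        full = os.path.join(path, file)
        if 'Tablet' not in pd.ExcelFile(full).sheet_names:
            continue  # this workbook has no 'Tablet' sheet
        df_tmp = pd.read_excel(full, sheet_name='Tablet')
        df_tmp.drop(index=df_tmp.index[:7], inplace=True)  # first 7 rows
        frames.append(df_tmp)
    return pd.concat(frames, ignore_index=True)
```

Joining the directory path onto each filename with os.path.join also avoids silently depending on the current working directory, which is another common reason files appear to be skipped.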
After parsing eml files and extracting them into many dataframes, I want to save them all to one Excel file.
After saving all my dataframes to the Excel file, only the last dataframe is in the sheet, not all of them.
Does anyone have an idea how I can solve that?
You should write the dataframes through a single pd.ExcelWriter, giving each one its own sheet_name. Calling to_excel directly on the filename overwrites the whole file each time, which is why only the last dataframe survives:
import pandas as pd

df_1 = pd.DataFrame()
df_2 = pd.DataFrame()
with pd.ExcelWriter(filename) as writer:
    df_1.to_excel(writer, sheet_name='df1')
    df_2.to_excel(writer, sheet_name='df2')
I have 5 sheets in an excel workbook. I would like to export each sheet to csv using python libraries.
This is a sheet showing sales in 2019. I have named the sheets according to the year they represent, as shown here.
I have read the excel spreadsheet using pandas. I have used the for loop since I am interested in saving the csv file like the_sheet_name.csv. This is my code in a jupyter notebook:
import pandas as pd

df = pd.DataFrame()
myfile = 'sampledata.xlsx'
xl = pd.ExcelFile(myfile)
for sheet in xl.sheet_names:
    df_tmp = xl.parse(sheet)
    print(df_tmp)
    df = df.append(df_tmp, ignore_index=True, sort=False)
csvfile = f'{sheet}.csv'
df.to_csv(csvfile, index=False)
Executing the code is producing just one csv file that has the data for all the other sheets. I would like to know if there is a way to customize my code so that I can produce individual sheets e.g sales2011.csv, sales2012.csv and so on.
Use sheet_name=None, which returns a dictionary of dataframes keyed by sheet name:
dfs = pd.read_excel('file.xlsx', sheet_name=None)
for sheet_name, data in dfs.items():
    data.to_csv(f"{sheet_name}.csv")
Firstly, I ask the admins not to close this topic. The last time I opened one it was closed because there were similar topics, but those are not the same. Thanks in advance.
Every day I receive 15-20 Excel files with a huge number of worksheets (more than 200). Fortunately, the worksheet names and counts are the same across all the files. I want to merge all the Excel files into one file with multiple worksheets. I am new to Python; I have watched and read a lot about the options but could not find a way. Thanks for your support.
For example, here is what I tried: I have two files with two sheets each (the actual sheet count is huge, as mentioned above), and I want to merge both files into one two-sheet file, sum.xlsx.
Data1.xlsx + Data2.xlsx → sum.xlsx
import os
import openpyxl
import pandas as pd

files = os.listdir(r'C:\Python\db')  # folder where the files are located
os.chdir(r'C:\Python\db')  # change the working directory
df = pd.DataFrame()  # create an empty data frame
wb = openpyxl.load_workbook(r'C:\Python\db\Data1.xlsx')  # load one file to extract the sheet names
sh_name = wb.sheetnames  # list of sheet names
for i in sh_name:
    for f in files:
        data = pd.read_excel(f, sheet_name=i)
        df = df.append(data)
    df.to_excel('sum.xlsx', index=False, sheet_name=i)
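No answer appears in the thread, but here is a minimal sketch of one way to do it, with the folder layout assumed from the question. The attempt above has two problems: df is never reset between sheets, and to_excel called with a filename overwrites sum.xlsx on every iteration, so only one sheet survives. Writing every sheet through a single pd.ExcelWriter fixes both:

```python
import os
import pandas as pd

def merge_workbooks(folder, out_name='sum.xlsx'):
    """Concatenate the same-named sheets across every .xlsx file in
    `folder` and write them into one multi-sheet workbook."""
    files = sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.endswith('.xlsx') and f != out_name  # skip a previous output
    )
    # Sheet names and counts are the same everywhere, so take the
    # list from the first file.
    sheet_names = pd.ExcelFile(files[0]).sheet_names
    out_path = os.path.join(folder, out_name)
    with pd.ExcelWriter(out_path) as writer:
        for name in sheet_names:
            merged = pd.concat(
                [pd.read_excel(f, sheet_name=name) for f in files],
                ignore_index=True,
            )
            merged.to_excel(writer, sheet_name=name, index=False)
    return out_path
```

For the example above, merge_workbooks(r'C:\Python\db') would produce one sum.xlsx whose two sheets each hold the stacked rows from Data1.xlsx and Data2.xlsx.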
I have one excel file with many rows of data. I have a second file with multiple sheets. Using python, I want to loop through each sheet on the second file, and merge it with the data on the first file (they have the same column headers).
As a final export, I would like to have all the merged data back on the first file.
I'm relatively new to python and don't have any code written except for reading in the pandas library and the two files.
Given that file1.xlsx is your main file and file2.xlsx is your file with the multiple sheets:
import pandas as pd

df_main = pd.read_excel('file1.xlsx')
# sheet_name=None means all sheets; this produces a dict of DataFrames
# keyed by sheet name.
multiple_sheets = pd.read_excel('file2.xlsx', sheet_name=None)
for x in multiple_sheets.values():  # x is the DataFrame for one sheet
    # Clean up here before adding.
    df_main = pd.concat([df_main, x], ignore_index=True)
From there, you can now do your cleanup and save the DataFrame as a new Excel file (i.e., df_main.to_excel('file1.xlsx')).
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html