How to append columns to an existing Excel sheet using pandas in Python

import pandas as pd
from pandas import ExcelWriter
trans=pd.read_csv('HMIS-DICR-2011-12-Manipur-Bishnupur.csv')
df=trans[["April 10-11","May 10-11","June 10-11","July 10-11","August 10-11","September 10-11","October 10-11","November 10-11","December 10-11","January 10-11","February 10-11","March 10-11","April 11-12","May 11-12","June 11-12","July 11-12","August 11-12","September 11-12","October 11-12","November 11-12","December 11-12","January 11-12","February 11-12","March 11-12"]]
writer1 = ExcelWriter('manipur1.xlsx')
df.to_excel(writer1, sheet_name='Sheet1', index=False)
writer1.close()  # close() writes the file; ExcelWriter.save() was removed in pandas 2.0
This code successfully writes the data to Sheet1. How can I append the data of another data frame (df), read from a different file (shown below), to the existing sheet (Sheet1) of the "manipur1" Excel file?
For example, my data frame looks like this:
trans=pd.read_csv('HMIS-DICR-2013-2014-Manipur-Bishnupur.csv')
df=trans[["April 12-13","May 12-13","June 12-13","July 12-13","August 12-13","September 12-13","October 12-13","November 12-13","December 12-13","January 12-13","February 12-13","March 12-13","April 13-14","May 13-14","June 13-14","July 13-14","August 13-14","September 13-14","October 13-14","November 13-14","December 13-14","January 13-14","February 13-14","March 13-14"]]

With pandas alone, you can only append new data to an existing Excel file by loading the existing data into pandas, appending the new data, and saving the concatenated data frame again.
To preserve existing sheets which are supposed to remain unchanged, you need to iterate over the entire workbook and handle each sheet. Sheets to be changed and appended are defined in the to_update dictionary.
import pandas as pd
# get data to be appended (the 2013-2014 file from the question holds the 12-13 and 13-14 columns)
trans = pd.read_csv('HMIS-DICR-2013-2014-Manipur-Bishnupur.csv')
df_append = trans[["April 12-13","May 12-13","June 12-13","July 12-13","August 12-13","September 12-13","October 12-13","November 12-13","December 12-13","January 12-13","February 12-13","March 12-13","April 13-14","May 13-14","June 13-14","July 13-14","August 13-14","September 13-14","October 13-14","November 13-14","December 13-14","January 13-14","February 13-14","March 13-14"]]
# define what sheets to update
to_update = {"Sheet1": df_append}
# load every existing sheet first, since creating the writer below may truncate the file
file_name = 'manipur1.xlsx'
existing_sheets = pd.read_excel(file_name, sheet_name=None)
# write and update
with pd.ExcelWriter(file_name) as excel_writer:
    for sheet, sheet_df in existing_sheets.items():
        append_df = to_update.get(sheet)
        if append_df is not None:
            # axis=1 places the new columns to the right of the existing ones
            sheet_df = pd.concat([sheet_df, append_df], axis=1)
        sheet_df.to_excel(excel_writer, sheet_name=sheet, index=False)
However, any layout or formatting in your existing Excel file will be lost. You can use openpyxl if you want to retain the formatting, but this is more complicated.
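As a hedged alternative to rewriting the whole workbook, the following minimal sketch writes the new columns next to the existing ones and leaves the other cells alone. It assumes pandas 1.4 or later (for if_sheet_exists="overlay") with the openpyxl engine, and a target sheet that already contains data. append_columns is a hypothetical helper name, not part of pandas.

```python
import pandas as pd
from openpyxl import load_workbook


def append_columns(path, sheet_name, new_df):
    """Write new_df to the right of the data already on sheet_name."""
    # Find the first free column. max_column is the 1-based index of the
    # last used column, which equals the 0-based index of the next free one.
    book = load_workbook(path)
    first_free_col = book[sheet_name].max_column
    book.close()

    # mode="a" opens the existing workbook; "overlay" writes into the
    # existing sheet instead of replacing it or creating "Sheet11".
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        new_df.to_excel(writer, sheet_name=sheet_name,
                        startcol=first_free_col, index=False)
```

With the names from the question, the call would be append_columns('manipur1.xlsx', 'Sheet1', df_append). Because openpyxl loads and re-saves the workbook, features it does not support (charts or images, for instance) may still be dropped.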

Related

Using pandas in Python to loop through Worksheets updating a range of cells

I am trying to take a workbook, loop through specific worksheets, retrieve a dataframe from each, manipulate it, and essentially paste the dataframe back in the same place without changing any of the other data or sheets in the document. This is what I am trying:
path = '<folder location>.xlsx'
wb = pd.ExcelFile(path)
for sht in ['sheet1', 'sheet2', 'sheet3']:
    df = pd.read_excel(wb, sheet_name=sht, skiprows=607, nrows=11, usecols=range(2, 15))
    # here I manipulate the df, to then save it down in the same place
    df.to_excel(wb, sheet_name=sht, startcol=3, startrow=607)
# Save down file
wb.save(path)
wb.close()
My solution so far saves only the first sheet, and ONLY the data that I manipulated. I lose all the other sheets and the rest of the data on that sheet, which I want to keep, so I end up with just sheet1 containing only the manipulated data.
I would really appreciate any help, thank you.
Try using an ExcelWriter instead of an ExcelFile:
path = 'folder location.xlsx'
sheets = ['sheet1', 'sheet2', 'sheet3']
# read the ranges first, before the writer opens the same file for writing
frames = {sht: pd.read_excel(path, sheet_name=sht, skiprows=607, nrows=11, usecols=range(2, 15)) for sht in sheets}
with pd.ExcelWriter(path) as writer:
    for sht, df in frames.items():
        # here I manipulate the df, to then save it down in the same place
        df.to_excel(writer, sheet_name=sht, startcol=3, startrow=607)
I am not sure, though, how this behaves when the file already exists and you overwrite only some of its sheets. It might be easier to read everything in first, manipulate the required sheets, and save to a new file.
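One way to keep the rest of the workbook, sketched here under assumptions: open the writer in append mode with if_sheet_exists="overlay", which needs pandas 1.4 or later and the openpyxl engine. update_block and its parameters are hypothetical names chosen for illustration. The block is read and written without headers, so the same cell range is used in both directions.

```python
import pandas as pd


def update_block(path, sheet_names, transform, startrow, startcol, nrows, ncols):
    """Read a cell block from each sheet, transform it, and write it back in place."""
    cols = list(range(startcol, startcol + ncols))
    # Read all blocks before opening the writer on the same file.
    blocks = {
        name: pd.read_excel(path, sheet_name=name, header=None,
                            skiprows=startrow, nrows=nrows, usecols=cols)
        for name in sheet_names
    }
    # Append mode keeps the existing sheets; "overlay" keeps the cells
    # outside the rewritten block.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        for name, block in blocks.items():
            transform(block).to_excel(writer, sheet_name=name,
                                      startrow=startrow, startcol=startcol,
                                      header=False, index=False)
```

For the range in the question, a call might look like update_block(path, ['sheet1', 'sheet2', 'sheet3'], my_transform, startrow=607, startcol=2, nrows=11, ncols=13), where my_transform stands for whatever manipulation is applied to the dataframe.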

Excel Copy data without Formulas openpyxl

I'm trying to copy and paste some data from one sheet to another sheet. The code works fine, but I only need the values, not the formulas.
import openpyxl as xl
original_wb = xl.load_workbook(filename1)
copy_to_wb = xl.load_workbook(filename1)
source_sheet = original_wb.worksheets[0]  # The first worksheet
copy_to_sheet = copy_to_wb.create_sheet(source_sheet.title + "_copy")
for row in source_sheet:
    for cell in row:
        copy_to_sheet[cell.coordinate].value = cell.value
copy_to_wb.save(str(filename1))
Can this be done in pandas instead?
If you want only the values to be read and copied to a new sheet, try the pandas read_excel and to_excel functions.
import pandas as pd
file_name = r"path"
# read
df = pd.read_excel(io=file_name, sheet_name='name')
# process the required data
# write to a new workbook or sheet
df.to_excel(file_name, sheet_name='name')
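If staying with openpyxl is acceptable, a values-only copy can be sketched by loading the workbook a second time with data_only=True, which returns the value Excel last cached for each formula cell. This is a minimal sketch, not the asker's or the answerer's method. copy_values_only is a hypothetical helper, and the cached value is None when the file has never been calculated and saved by Excel.

```python
import openpyxl


def copy_values_only(path, source_index=0, suffix="_copy"):
    """Copy one sheet's cached values (not its formulas) into a new sheet."""
    # data_only=True yields cached results instead of formula strings.
    values_wb = openpyxl.load_workbook(path, data_only=True)
    # The workbook that gets saved is loaded normally, so the formulas
    # on the original sheets are kept.
    target_wb = openpyxl.load_workbook(path)

    source = values_wb.worksheets[source_index]
    target = target_wb.create_sheet(source.title + suffix)
    for row in source.iter_rows():
        for cell in row:
            target[cell.coordinate].value = cell.value

    target_wb.save(path)
    return target.title
```

Saving the data_only workbook itself would replace every formula in the file with its cached value, which is why the sketch writes into a separately loaded copy.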

Overwrite sheets in Excel with Python

I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like this:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
writer.close()  # close() saves the file; save() was removed in pandas 2.0
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.close()
If you inspect the workbook now, you will have just one worksheet, named new_df.
If there are other worksheets in the Excel file that you want to keep, and you only want to overwrite the desired worksheets, you need to use load_workbook.
Before you write any data, you can delete the sheets you want to write to with:
std = book[<sheet_name>]  # replaces the deprecated book.get_sheet_by_name(...)
book.remove(std)  # replaces the deprecated book.remove_sheet(...)
That stops the behavior where a number gets appended to the worksheet name when you write a sheet whose name already exists in the workbook.
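In recent pandas versions the same effect can be had without deleting sheets by hand. The sketch below assumes pandas 1.3 or later, where ExcelWriter accepts if_sheet_exists="replace" in append mode with the openpyxl engine. write_sheets is a hypothetical helper name.

```python
import os

import pandas as pd


def write_sheets(path, frames):
    """Write each DataFrame in frames to its sheet, replacing existing sheets."""
    if os.path.exists(path):
        # Append mode keeps the other sheets; "replace" overwrites a sheet
        # of the same name instead of creating e.g. "TestSheet11".
        options = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # if_sheet_exists is only accepted in append mode.
        options = {"mode": "w"}
    with pd.ExcelWriter(path, engine="openpyxl", **options) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name)
```

With the names from the question, the call would be write_sheets(output_file, {'TestSheet1': df_1, 'TestSheet2': df_2}), and running the script repeatedly should leave exactly one sheet of each name.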

Write a dataframe to a separate Excel document for each item in a list, or else to separate sheets of the same document

How can I get a separate Excel file from the data frame for each item in a list?
I don't want to write four separate functions, one for each cost center.
import pandas as pd
df = pd.read_excel(r'c:\temp\code.xlsx')
costcenter = ['1130', '1236', '3427', '3148' ]
for each_costcenter in costcenter:
    y = df[df['COST CENTER'] == each_costcenter]
    y.to_excel(r'c:\temp\code\finalput.xlsx', sheet_name=each_costcenter)
I thought I would get finalput with four sheets of data, but I end up with one sheet holding the data for the last item in the list.
I wouldn't mind getting four separate files named after the cost centers instead.
You need to create the Excel writer object first:
writer = pd.ExcelWriter(r'c:\temp\code\finalput.xlsx')
Then in your loop:
y.to_excel(excel_writer=writer, sheet_name=each_costcenter)
Then, after the loop, save the file (close() writes it to disk; save() was removed in pandas 2.0):
writer.close()
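Put together, the three steps above might look like the following sketch. split_to_sheets and split_to_files are hypothetical helper names. The second one covers the "separate files" variant the question mentions, and the file naming scheme in it is an assumption.

```python
import os

import pandas as pd


def split_to_sheets(df, column, values, path):
    """Write one sheet per value, holding the rows where df[column] equals it."""
    # One writer for the whole loop, so every sheet lands in the same file.
    with pd.ExcelWriter(path) as writer:
        for value in values:
            subset = df[df[column] == value]
            subset.to_excel(writer, sheet_name=str(value), index=False)


def split_to_files(df, column, values, folder):
    """Write one workbook per value, named after the value (assumed naming)."""
    paths = []
    for value in values:
        out_path = os.path.join(folder, f"{value}.xlsx")
        df[df[column] == value].to_excel(out_path, index=False)
        paths.append(out_path)
    return paths
```

A call such as split_to_sheets(df, 'COST CENTER', costcenter, r'c:\temp\code\finalput.xlsx') would mirror the question. Note that the comparison only matches when the column and the list hold the same type: if read_excel parsed the cost centers as integers, comparing them with the strings '1130', '1236' and so on yields empty sheets.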

How to export a dataframe to excel with sheets based on values in a column

I have created a data frame in Python by merging multiple Excel files, and now I would like to export the combined data to one .xlsx file with multiple sheets, split by the values in a column, with each sheet named after its value.
Using the examples from the images below, I can currently export all the data to an .xlsx file.
What I would like to do is filter the data by the Zone column, and export all the data flagged Zone1 in the Zone column to a sheet named "Zone1" and all the data flagged Zone2 to a sheet named "Zone2". Ideally, in the second image, the highlighted data is the only data that would show up in the "Zone1" sheet, and the unhighlighted data would be in the "Zone2" sheet.
I'm using the following code to pull in the data, merge, and export.
import pandas as pd
import numpy as np
import glob
glob.glob("/Users/xxx/Desktop/PythonTests/Test_Zone*.xlsx")
all_data = pd.DataFrame()
for f in glob.glob("/Users/xxx/Desktop/PythonTests/Test_Zone*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)
all_data.to_excel("/Users/xxx/Desktop/merged.xlsx",index=False)
Assuming you have a dataframe, df, that holds all of your data (if you have it already saved as an excel file, you can just use df = pd.read_excel('path_to_file.xlsx')), you can use the following code to subset your dataframe by Zone and save each resultant subsetted dataframe to the dictionary df_dict:
df_dict = {}
for zone in df['Zone'].unique():
    zone_df = df[df['Zone'] == zone]
    df_dict[zone] = zone_df
Once you have this dictionary, you can use pd.ExcelWriter and to_excel to write your dataframes into separate sheets of an excel file:
def save_xlsx(df_dict, path):
    """
    Save a dictionary of dataframes to an excel file, with each dataframe as a separate sheet
    """
    # the with block saves and closes the file on exit, so no explicit save() is needed
    with pd.ExcelWriter(path) as writer:
        for key in df_dict:
            df_dict[key].to_excel(writer, sheet_name=key, index=False)
Calling the function will give you your desired result:
save_xlsx(df_dict, 'path_to_file.xlsx')
If you wanted to take out the spaces in your sheet names (as you have done in your example) you can modify the save_xlsx function accordingly:
            df_dict[key].to_excel(writer, sheet_name=key.replace(' ', ''), index=False)
Here is the code I tried; it exported only Zone2 rather than Zones 1 and 2:
import pandas as pd
import numpy as np
import glob
#list
glob.glob("/Users/TTT/Desktop/PythonTests/Test_Zone*.xlsx")
ALLDATA = pd.DataFrame()
for f in glob.glob("/Users/TTT/Desktop/PythonTests/Test_Zone*.xlsx"):
    df = pd.read_excel(f)
    ALLDATA = pd.concat([ALLDATA, df], ignore_index=True)
df_dict = {}
for zone in df['Zone'].unique():
    zone_df = df[df['Zone'] == zone]
    df_dict[zone] = zone_df
save_xlsx(df_dict, '/Users/TTT/Desktop/PythonTests/ExportTest.xlsx')
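A likely reason only Zone2 is exported: the zone loop in this attempt runs over df, which after the file loop holds just the last file that was read, rather than over the merged ALLDATA. The sketch below splits the merged frame instead, using groupby in place of the dictionary. merge_and_split is a hypothetical name, and the sketch assumes at least one file matches the pattern and that each file has the grouping column.

```python
import glob

import pandas as pd


def merge_and_split(pattern, column, out_path):
    """Merge all workbooks matching pattern and write one sheet per group."""
    frames = [pd.read_excel(f) for f in sorted(glob.glob(pattern))]
    # pd.concat replaces the DataFrame.append loop (append was removed in pandas 2.0).
    all_data = pd.concat(frames, ignore_index=True)

    with pd.ExcelWriter(out_path) as writer:
        # Group the MERGED data, not the last file read.
        for key, group in all_data.groupby(column):
            sheet_name = str(key).replace(" ", "")
            group.to_excel(writer, sheet_name=sheet_name, index=False)
    return all_data
```

With the paths from the attempt, the call would be merge_and_split("/Users/TTT/Desktop/PythonTests/Test_Zone*.xlsx", "Zone", "/Users/TTT/Desktop/PythonTests/ExportTest.xlsx"). The output file name must not match the input pattern, or a later run would read its own export.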
