I have a DataFrame read from an Excel sheet, to which I've added a few new columns using XlsxWriter. Now I need to filter this new data set on one of the columns I created in XlsxWriter (a date column). Is there a way to turn this new worksheet into a DataFrame again so I can filter on the new column? Here is the relevant code:
export = "files/extract.xlsx"
future_days = 12
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine ='xlsxwriter')
workbook = writer.book
df = pd.read_excel(export)
df.to_excel(writer, 'Full Log', index=False)
log_sheet = writer.sheets['Full Log']
new_headers = ('todays date', 'Milestone Date')
log_sheet.write_row('CW1', new_headers)
# This for loop just writes in the formula for my new columns on every line
for row_num in range(2, len(df.index)+2):
    log_sheet.write_formula('CX' + str(row_num),'=IF(AND($BS{0}>1/1/1990,$BT{0}<>"Yes"),IF($BS{0}<=$CW{0},$BS{0},"Date In Future"),IF(AND($BW{0}>1/1/1990,$BX{0}<>"Yes"),IF($BW{0}<=CW{0},$BW{0},"Date In Future"),IF(AND($CA{0}>1/1/1990,$CCW{0}<>"Yes"),IF($CA{0}<=CW{0},$CA{0},"Date In Future"),IF(AND($CE{0}>1/1/1990,$CF{0}<>"Yes"),IF($CE{0}<CW{0},$CE{0},"Date In Future"),IF(AND($CI{0}>1/1/1990,$CJ{0}<>"Yes"),IF($CI{0}<CW{0},$CI{0},"Date In Future"),IF(AND($CM{0}>1/1/1990,$CN{0}<>"Yes"),IF($CM{0}<CW{0},$CM{0},"Date In Future"),"No Date"))))))'.format(row_num))
    log_sheet.write_formula('CW' + str(row_num), '=TODAY()+' + str(future_days))
    log_sheet.write_formula('CY' + str(row_num), '=IF(AND(AI{0}>DATEVALUE("1/1/1900"), AH{0}>DATEVALUE("1/1/1900"),A{0}<>"Test",A{0}<>"Dummy Test"),NETWORKDAYS(AH{0},AI{0}-1),"Test")'.format(row_num))
Now that that's done, I need to filter this "Full Log" sheet so it only keeps rows where the date in the new Milestone Date column has already passed today's date. I've used XlsxWriter's autofilter for this, but I don't like it because it doesn't actually apply the filter; it just sets it.
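You can call the save function on the writer and then load the file into a new DataFrame. Note that read_excel needs the path of the file you wrote plus the sheet name, and that XlsxWriter stores formulas without calculating them, so the formula columns only hold real values once the file has been opened and saved in Excel:
writer.close()  # writer.save() on pandas older than 2.0
df2 = pd.read_excel('files/new_report-%s.xlsx' % (date.today()), sheet_name='Full Log')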
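If the goal is only to filter on the milestone date, it may be simpler to compute that column in pandas and skip the round trip through Excel. What follows is a minimal sketch, not the original poster's method. The column names (`m1_date`, `m1_done`, ...) and the helper `first_open_milestone` are made up for illustration, it covers two date/flag pairs instead of six, it treats a missing date as "no date" rather than testing against 1990, and it returns `NaT` where the formula would write "Date In Future" or "No Date" so that the column stays a datetime.

```python
import pandas as pd


def first_open_milestone(row, pairs, cutoff):
    """Return the date of the first milestone whose flag is not "Yes".

    Mirrors the nested IF formula, but yields NaT for a date beyond the
    cutoff or for a row with no open milestone at all.
    """
    for date_col, done_col in pairs:
        value = row[date_col]
        if pd.notna(value) and row[done_col] != "Yes":
            return value if value <= cutoff else pd.NaT
    return pd.NaT


# Hypothetical stand-in for the sheet; the real column names will differ.
df = pd.DataFrame({
    "m1_date": pd.to_datetime(["2020-01-10", "2020-02-01", None]),
    "m1_done": ["No", "Yes", "No"],
    "m2_date": pd.to_datetime(["2020-03-01", "2020-04-15", None]),
    "m2_done": ["No", "No", "No"],
})
pairs = [("m1_date", "m1_done"), ("m2_date", "m2_done")]

future_days = 12
today = pd.Timestamp.today().normalize()
cutoff = today + pd.Timedelta(days=future_days)  # same idea as TODAY()+12

df["Milestone Date"] = pd.to_datetime(
    df.apply(first_open_milestone, axis=1, args=(pairs, cutoff))
)

# Keep only rows whose milestone date has already passed.
overdue = df[df["Milestone Date"] < today]
print(overdue)
```

The filtered frame can then be written with `overdue.to_excel(writer, 'Full Log', index=False)`, so the sheet contains only the rows of interest.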
I have input data in the form of a dictionary consisting of 3 dataframes of numbers. I wish to iterate through each dataframe with some operations and then finally write results for each dataframe to excel.
The following code works fine except that it only writes the resulting dataframe for the last key in the dictionary.
How do I get results for all 3 dataframes written to individual sheets?
Input_Data={'k1':test1,'k2':test24,'k3':test3}
for v in Input_Data.values():
    df1 = v[126:236]
    df=df1.sort_index(ascending=False)
    Indexer=df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]],df[Indexer[num]]],axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df=pd.DataFrame(df)
    dff=df.loc[0].append(df.loc[1].append(df.loc[2].append(df.loc[3].append(df.loc[4].append(df.loc[5])))))
    dff.to_excel('test.xlsx',index=False, header=False)
Your first issue is that with each iteration of the loop you are opening a new file.
As per pandas documentation:
"Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased."
Second, you are not providing a variable sheet name, so each time the data is being re-written as the same sheet.
An example solution, with ExcelWriter:
# df1, df2, df3 - dataframes
input_data = {
    'sheet_name1': df1,
    'sheet_name2': df2,
    'sheet_name3': df3
}
# Initiate ExcelWriter - use xlsxwriter engine
writer = pd.ExcelWriter('multiple_sheets.xlsx', engine='xlsxwriter')
# Iterate over input_data dictionary
for sheet_name, df in input_data.items():
    """
    Perform operations here
    """
    # Write each dataframe to a different worksheet.
    df.to_excel(writer, sheet_name=sheet_name)
# Finally, save ExcelWriter to file
writer.close()  # writer.save() on pandas older than 2.0
Note 1. You initiate and save the ExcelWriter object only once; the iterations only add sheets to that object.
Note 2. Unlike in your code, the variable "sheet_name" is passed to the "to_excel()" function.
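Putting the two notes together with the transformation from the question might look like the sketch below. It is an illustration under assumptions rather than a drop-in fix: the helper names `pair_with_first_column` and `write_all` are invented here, and `pd.concat` stands in for the chained `Series.append` calls, since `append` was removed in pandas 2.0.

```python
import pandas as pd


def pair_with_first_column(frame):
    """Join the first column with each of the next six columns as
    "first,other" strings and stack the results into one long Series."""
    first = frame.iloc[:, 0].astype(str)
    pieces = [first + "," + frame[col].astype(str) for col in frame.columns[1:7]]
    return pd.concat(pieces, ignore_index=True)


def write_all(frames, path):
    """Write one sheet per dictionary key; the writer is opened only once."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            # Same row slice and sort order as in the question.
            subset = frame.iloc[126:236].sort_index(ascending=False)
            result = pair_with_first_column(subset)
            result.to_excel(writer, sheet_name=sheet_name,
                            index=False, header=False)


# Small made-up frame to show what the helper produces.
demo = pd.DataFrame({"t": [1, 2], "a": [10, 20], "b": [30, 40]})
stacked = pair_with_first_column(demo)
print(stacked.tolist())
```

With the question's data the call would be something like `write_all({'k1': test1, 'k2': test24, 'k3': test3}, 'test.xlsx')`, which produces three sheets named k1, k2 and k3.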
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
# (sheet_names and dfs are assumed to be lists of equal length)
for sheet_name, df in zip(sheet_names, dfs):
    df.to_excel(writer, sheet_name=sheet_name)
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() on pandas older than 2.0
Try to change the file name at each iteration:
Input_Data={'k1':test1,'k2':test24,'k3':test3}
file_number = 1
for v in Input_Data.values():
    df1 = v[126:236]
    df=df1.sort_index(ascending=False)
    Indexer=df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]],df[Indexer[num]]],axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df=pd.DataFrame(df)
    dff=df.loc[0].append(df.loc[1].append(df.loc[2].append(df.loc[3].append(df.loc[4].append(df.loc[5])))))
    file_name='test'
    file_number=str(file_number)
    dff.to_excel( str(file_name+file_number)+".xlsx",index=False, header=False)
    file_number=int(file_number)
    file_number = file_number+1
I am working on a pandas program where I fetch rows from other Excel sheets and append them to the main file:
import pandas as pd
from openpyxl import load_workbook
#reading all three ticket excel sheets
df1 = pd.read_excel("sheet a.xlsx")
df2 = pd.read_excel("sheet b.xlsx")
df3 = pd.read_excel("sheet c.xlsx")
#Creating pandas Excel writer using openpyxl as engine
writer = pd.ExcelWriter(r"main_excel.xlsx", engine = "openpyxl")
writer.book = load_workbook(r"main_excel.xlsx")
sheets = writer.book.sheetnames
reader1 = pd.read_excel(r"main_excel.xlsx", "sheet a")
reader2 = pd.read_excel(r"main_excel.xlsx", "sheet b")
reader3 = pd.read_excel(r"main_excel.xlsx", "sheet c")
df1.to_excel(writer, sheet_name =sheets[0], index = False, header = False,startrow=len(reader1)+1)
df2.to_excel(writer, sheet_name =sheets[2], index = False, header = False,startrow=len(reader2)+1)
df3.to_excel(writer, sheet_name =sheets[4], index = False, header = False,startrow=len(reader3)+1)
writer.save()
writer.close()
After writing the data to the Excel file, I have to calculate the month and the week number from the dates in the data and fill in the missing columns.
The data gets appended each week, so I would like to append the data to the pre-existing columns.
Is there a way to do that in the program itself, without writing the formula in the Excel sheet?
You can convert your date column using the code below:
df['Opened'] = pd.to_datetime(df['Opened'])
Then you can get your other columns using:
df['Month'] = df['Opened'].dt.month_name()
df['Week'] = df['Opened'].dt.isocalendar().week  # .dt.week was removed in pandas 2.0
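As a small worked example, with made-up dates standing in for the real "Opened" column: the week numbers below come from `Series.dt.isocalendar()`, which follows ISO 8601, so a date in early January can belong to week 52 or 53 of the previous year.

```python
import pandas as pd

# Made-up sample rows; "Opened" holds dates stored as text.
df = pd.DataFrame({"Opened": ["2021-01-01", "2021-01-04", "2021-03-15"]})

df["Opened"] = pd.to_datetime(df["Opened"])
df["Month"] = df["Opened"].dt.month_name()
# isocalendar() returns year/week/day columns; keep the ISO week as plain ints.
df["Week"] = df["Opened"].dt.isocalendar().week.astype(int)

print(df)
# 2021-01-01 lands in ISO week 53 (of 2020), 2021-01-04 in week 1.
```

If week 1 should instead always start on 1 January, a different rule is needed; the ISO definition is only what pandas provides out of the box here.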
I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like this:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
writer.close()  # writer.save() on pandas older than 2.0
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.close()
If you inspect the workbook now, you will have just one worksheet, named new_df.
If there are other worksheets in the Excel file that you want to keep, and you only want to overwrite specific worksheets, you need to use load_workbook.
Before you write any data, you can delete the sheets you want to write to with:
std = book[<sheet_name>]  # older openpyxl: book.get_sheet_by_name(<sheet_name>)
book.remove(std)          # older openpyxl: book.remove_sheet(std)
That will stop the behavior where a number gets appended to the worksheet name once you attempt to write a workbook with a duplicate sheet name.
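On pandas 1.3 or newer there is also a way to get the same effect without deleting sheets by hand: `ExcelWriter` accepts `if_sheet_exists='replace'` when it is opened in append mode with the openpyxl engine. The sketch below assumes that setup; the function name `overwrite_sheets` is made up for this example.

```python
import pandas as pd


def overwrite_sheets(path, frames):
    """Replace the named sheets in an existing workbook and keep the rest.

    Assumes pandas >= 1.3 (for if_sheet_exists) and that openpyxl is
    installed; the file at `path` must already exist.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name)
```

For the code in the question this would be called as `overwrite_sheets(output_file, {'TestSheet1': df_1, 'TestSheet2': df_2})`, and repeated runs should then leave exactly one sheet per name instead of creating numbered copies.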
I am trying to add an empty excel sheet into an existing Excel File using python xlsxwriter.
Setting the formula up as follows works well.
workbook = xlsxwriter.Workbook(file_name)
worksheet_cover = workbook.add_worksheet("Cover")
Output4 = workbook
Output4.close()
But once I try to add further sheets with dataframes into the Excel it overwrites the previous excel:
with pd.ExcelWriter('Luther_April_Output4.xlsx') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
How should I write the code, so that I can add empty sheets and existing data frames into an existing excel file.
Alternatively it would be helpful to answer how to switch engines, once I have produced the Excel file...
Thanks for any help!
If you're not forced to use xlsxwriter, try openpyxl. Simply pass 'openpyxl' as the engine for the pandas built-in ExcelWriter class. I asked a question a while back about why this works. It works well with the syntax of pd.to_excel() and it won't overwrite your existing sheets.
from openpyxl import load_workbook
import pandas as pd
book = load_workbook(file_name)
writer = pd.ExcelWriter(file_name, engine='openpyxl')
writer.book = book
data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
writer.close()  # writer.save() on pandas older than 2.0
You could use pandas.ExcelWriter with the optional mode='a' argument to append to an existing Excel workbook.
You can also append to an existing Excel file:
>>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
...     df.to_excel(writer, sheet_name='Sheet3')
Unfortunately, this requires a different engine because, as you observed, the xlsxwriter engine does not support mode='a' (append). If you pass this parameter to the constructor with that engine, it raises an error.
So you will need to use a different engine to do the append, such as openpyxl. Make sure the package is installed, otherwise you'll get a ModuleNotFoundError. I have tested using openpyxl as the engine, and it is able to append a new worksheet to an existing workbook:
with pd.ExcelWriter(engine='openpyxl', path='Luther_April_Output4.xlsx', mode='a') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
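One practical wrinkle is that append mode expects the workbook to exist already. A small sketch of one way to handle both cases follows; the helpers `writer_mode` and `add_sheets` are hypothetical names introduced here, not part of pandas.

```python
import os

import pandas as pd


def writer_mode(path):
    """Return 'a' when the workbook already exists, otherwise 'w'."""
    return "a" if os.path.exists(path) else "w"


def add_sheets(path, frames):
    """Add each DataFrame in `frames` as a sheet, creating the file if needed.

    Uses the openpyxl engine because the xlsxwriter engine cannot append.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode=writer_mode(path)) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name)
```

Usage would be along the lines of `add_sheets('Luther_April_Output4.xlsx', {'Differing Rates': data_DifferingRates})`. On recent pandas versions, appending a sheet whose name already exists raises an error unless `if_sheet_exists` is set, so this sketch assumes the new sheet names are not yet in the workbook.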
I think you need to write the data into a new file. This works for me:
# Write multiple tabs (sheets) into to a new file
import pandas as pd
from openpyxl import load_workbook
Work_PATH = r'C:\PythonTest'+'\\'
ar_source = Work_PATH + 'Test.xlsx'
Output_Wkbk = Work_PATH + 'New_Wkbk.xlsx'
# Need workbook from openpyxl load_workbook to enumerate tabs
# Is there another way with only xlsxwriter?
workbook = load_workbook(filename=ar_source)
# Set sheet names in workbook as a series.
# You can also set the series manually tabs = ['sheet1', 'sheet2']
tabs = workbook.sheetnames
print ('\nWorkbook sheets: ',tabs,'\n')
# Replace this function with functions for what you need to do
def default_col_width(df, sheetname, writer):
    # Note, this seems to use xlsxwriter as the default engine.
    for column in df:
        # map col width to col name. Ugh.
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        # set special column widths
        narrower_col = ['OS', 'URL']  # change to fit your workbook
        if column in narrower_col: column_width = 10
        if column_width > 30: column_width = 30
        if column == 'IP Address': column_width = 15  # change for your workbook
        col_index = df.columns.get_loc(column)
        writer.sheets[sheetname].set_column(col_index, col_index, column_width)
    return
# Note nothing is returned. Writer.sheets is global.
with pd.ExcelWriter(Output_Wkbk, engine='xlsxwriter') as writer:
    # Iterate through the series of sheet names
    for tab in tabs:
        df1 = pd.read_excel(ar_source, tab).astype(str)
        # I need to trim my input
        df1.drop(list(df1)[23:], axis='columns', inplace=True, errors='ignore')
        try:
            # Set spreadsheet focus
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
            # Do something with the spreadsheet - Calling a function
            default_col_width(df1, tab, writer)
        except Exception:
            # Function call failed so just copy tab with no changes
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
If I use the input file name as the output file name, it fails and erases the original. There is no need to save or close if you use a with block; it closes automatically.
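The width calculation in that function can also be separated from the writing step, which makes it easier to check on its own. The following is a rough sketch under a few assumptions: `column_widths` is an invented name, the 30-character cap mirrors the answer above, and the frame is assumed to be non-empty.

```python
import pandas as pd


def column_widths(df, max_width=30, overrides=None):
    """Return {column name: width} based on the longest cell or header.

    `overrides` maps column names to fixed widths that win over the
    computed value (for example {'IP Address': 15}).
    """
    overrides = overrides or {}
    widths = {}
    for column in df.columns:
        longest_cell = int(df[column].astype(str).map(len).max())
        computed = min(max(longest_cell, len(str(column))), max_width)
        widths[column] = overrides.get(column, computed)
    return widths


# Made-up data to show the result.
sample = pd.DataFrame({
    "Host": ["a", "bbbb"],
    "Notes": ["x" * 50, "y"],
    "OS": ["Linux", "Windows 11"],
})
print(column_widths(sample, overrides={"OS": 5}))
```

With the xlsxwriter engine, each width can then be applied using `writer.sheets[sheetname].set_column(i, i, width)`, where `i` is the column position, as in the function above.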
I'd like for the code to run 12345 thru the loop, input it in a worksheet, then start on 54321 and do the same thing except input the dataframe into a new worksheet but in the same workbook. Below is my code.
workbook = xlsxwriter.Workbook('Renewals.xlsx')
groups = ['12345', '54321']
for x in groups:
    # (Do a bunch of data manipulation and get pandas df called renewals)
    writer = pd.ExcelWriter('Renewals.xlsx', engine='xlsxwriter')
    worksheet = workbook.add_worksheet(str(x))
    renewals.to_excel(writer, sheet_name=str(x))
When this runs, I am left with a workbook with only 1 worksheet (54321).
Try something like this:
import pandas as pd
# initialize the Excel writer
writer = pd.ExcelWriter('MyFile.xlsx', engine='xlsxwriter')
# store your dataframes in a dict, where the key is the sheet name you want
frames = {'sheetName_1': dataframe1, 'sheetName_2': dataframe2,
          'sheetName_3': dataframe3}
# now loop through and put each on a specific sheet
for sheet, frame in frames.items():  # use .iteritems() on Python 2
    frame.to_excel(writer, sheet_name=sheet)
# critical last step
writer.close()  # writer.save() on pandas older than 2.0
import pandas as pd
writer = pd.ExcelWriter('Renewals.xlsx', engine='xlsxwriter')
renewals.to_excel(writer, sheet_name=groups[0])
renewals.to_excel(writer, sheet_name=groups[1])
writer.close()  # writer.save() on pandas older than 2.0
Building on the accepted answer, you can find situations where the sheet name will cause the save to fail if it has invalid characters or is too long. This could happen if you are using grouped values for the sheet name as an example. A helper function could address this and save you some pain.
def clean_sheet_name(sheet):
    """Clean sheet name so that it is a valid Excel sheet name.
    Removes characters in []:*?/\ and limits to 30 characters.
    Args:
        sheet (str): Name to use for sheet.
    Returns:
        cleaned_sheet (str): Cleaned sheet name.
    """
    if sheet in (None, ''):
        return sheet
    clean_sheet = sheet.translate({ord(i): None for i in '[]:*?/\\'})
    if len(clean_sheet) > 30:  # Set value you feel is appropriate
        clean_sheet = clean_sheet[:30]
    return clean_sheet
Then add a call to the helper function before writing to Excel.
for sheet, frame in groups.items():
    # Clean sheet name for length and invalid characters
    sheet = clean_sheet_name(sheet)
    frame.to_excel(writer, sheet_name=sheet, index=False)
writer.close()  # writer.save() on pandas older than 2.0
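One case the helper does not cover is two different group names that become identical after cleaning or truncation, which would then collide as sheet names. A possible extension is sketched below; `unique_sheet_names` is a hypothetical helper, and it uses Excel's 31-character limit and case-insensitive comparison of sheet names.

```python
import re

# Characters Excel does not allow in sheet names: [ ] : * ? / \
INVALID = re.compile(r"[\[\]:*?/\\]")


def unique_sheet_names(names, limit=31):
    """Map each original name to a cleaned, truncated, unique sheet name."""
    seen = set()
    result = {}
    for name in names:
        base = INVALID.sub("", str(name))[:limit] or "Sheet"
        candidate, n = base, 1
        # Excel compares sheet names case-insensitively.
        while candidate.lower() in seen:
            suffix = f"_{n}"
            candidate = base[:limit - len(suffix)] + suffix
            n += 1
        seen.add(candidate.lower())
        result[name] = candidate
    return result


mapping = unique_sheet_names(["a/b", "ab", "x" * 40, "x" * 41])
print(mapping)
```

The mapping can be built once before the loop, and `mapping[sheet]` then used as the `sheet_name` argument in place of the cleaned name.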