I am trying to find all the duplicates using pandas, and I have managed to do so. However, there are multiple worksheets in the Excel file, and I would like to repeat the process for all of them. The final Excel file should contain all the new data without duplicates, with each sheet placed on its own worksheet. I am currently stuck because my code loops through the sheets, but the result contains only the last worksheet processed. It would be great if anyone could enlighten me on this issue. Below is my code:
import pandas as pd
final_audited_filepath = '<file_path>\\test12.xlsx'
x1 = pd.ExcelFile(final_audited_filepath)
writer = pd.ExcelWriter("<file_path>\\test123.xlsx")
for sheet in x1.sheet_names:  # loop over every worksheet in the workbook
    data = pd.read_excel(final_audited_filepath, sheet_name=sheet)
    data_first_record = data.drop_duplicates(subset=['Reference ID', 'Check Description'], keep="first")
    data_first_record.to_excel(writer, index=False, sheet_name=sheet)
To write more than one sheet to a workbook using pandas, you need an ExcelWriter object. Using it as a context manager (a with block) saves and closes the file automatically when the block ends:
with pd.ExcelWriter('output.xlsx') as f:
    df1.to_excel(f, sheet_name='sheet 1')
    df2.to_excel(f, sheet_name='sheet 2')
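After dropping the duplicates, create a separate DataFrame (df1, df2, and so on) for each worksheet and write them with the code above. The key point is to create the writer once, before the loop, and to save or close it once, after the loop: creating it inside the loop overwrites the file on every iteration (leaving only the last sheet), and never closing it means the file is not written at all.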
See the pandas DataFrame.to_excel documentation for details.
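A minimal sketch of the whole fix, assuming the column names from the question and pandas with openpyxl installed; the function name dedupe_workbook is only illustrative. It reads every sheet at once with sheet_name=None, opens the writer a single time, and lets the with block save the file after the last sheet has been written:

```python
import pandas as pd


def dedupe_workbook(src_path, dst_path, subset):
    """Drop duplicate rows from every sheet of src_path and write each
    cleaned sheet, under its original name, to dst_path.

    dedupe_workbook is a hypothetical helper name used for illustration.
    """
    # sheet_name=None reads every worksheet into a dict {sheet name: DataFrame}
    sheets = pd.read_excel(src_path, sheet_name=None)
    # Open the writer ONCE, outside the loop; leaving the with block saves
    # and closes the file after all sheets have been written.
    with pd.ExcelWriter(dst_path) as writer:
        for name, data in sheets.items():
            cleaned = data.drop_duplicates(subset=subset, keep="first")
            cleaned.to_excel(writer, sheet_name=name, index=False)


# Example call with the file names from the question:
# dedupe_workbook("test12.xlsx", "test123.xlsx",
#                 subset=["Reference ID", "Check Description"])
```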
Related
I am trying to export a dataframe I've generated in Pandas to an Excel Workbook. I have been able to get that part working, but unfortunately no matter what I try, the dataframe goes into the workbook as a brand new worksheet.
What I am ultimately trying to do here is create a program that pulls API data from a website and imports it into an existing Excel sheet, in order to create a sort of "live updating Excel workbook". This means that the worksheet already has proper formatting, VBA, and other calculated columns applied, and all of this should ideally stay the same except for the basic data in the DataFrame I'm importing.
Is there any way to go about this? Any direction at all would be quite helpful. Thanks.
Here is my current code:
import pandas as pd
file = 'testbook.xlsx'
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1")
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
writer.close()
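If you have both an existing Excel file and a DataFrame with the same columns, you can read the existing file into another DataFrame, concatenate the two DataFrames, and then save the result to a new Excel file or back to the existing one. Note that this rewrites the file from scratch, so cell formatting, VBA, and formulas in the original workbook are not preserved.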
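import pandas as pd
df1 = pd.read_excel('testbook.xlsx')  # the data already in the file
df2 = df  # your new DataFrame, with the same columns
combined = pd.concat([df1, df2], ignore_index=True)
combined.to_excel('testbook.xlsx', index=False)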
There are multiple ways of doing this; if you want to do it entirely with the pandas library, this will work.
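If the workbook's formatting, formulas, and VBA must survive, one option outside pandas is to write only the data cells with openpyxl and leave everything else alone. A rough sketch, assuming openpyxl is installed; write_values_into_sheet is a hypothetical helper, and row/column numbers are 1-based as in openpyxl. For an .xlsm file, keep_vba=True asks openpyxl to keep the VBA project when saving. Be aware that openpyxl does not read every part of an Excel file (images and charts in existing files, for example, can be lost), so try it on a copy first:

```python
import pandas as pd
from openpyxl import load_workbook


def write_values_into_sheet(df, path, sheet_name, start_row=1, start_col=1,
                            header=True, keep_vba=False):
    """Write df's header and values into an existing worksheet, cell by cell.

    Hypothetical helper: only the target cells get new values, other cells
    keep their contents and formatting. start_row/start_col are 1-based.
    Use keep_vba=True for macro-enabled (.xlsm) workbooks.
    """
    wb = load_workbook(path, keep_vba=keep_vba)
    ws = wb[sheet_name]
    rows = [list(df.columns)] if header else []
    # itertuples(index=False, name=None) yields each row as a plain tuple
    rows.extend(df.itertuples(index=False, name=None))
    for r_off, row in enumerate(rows):
        for c_off, value in enumerate(row):
            ws.cell(row=start_row + r_off, column=start_col + c_off, value=value)
    wb.save(path)
```

For the question's setup, write_values_into_sheet(df, 'testbook.xlsx', 'Sheet1') would place the header in row 1 starting at column A and the data below it, without touching any other cell.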
I have a dataframe in my Jupyter notebook that I can successfully write to an Excel file with pandas ExcelWriter, but I'd rather split the dataframe into smaller dataframes (based on its index), then loop through them to write each to a different sheet in one Excel file. This seems syntactically correct but my code cell just runs without ever finishing:
from pandas import ExcelWriter
path = r'/root/notebooks/my_file.xlsx'
writer = ExcelWriter(path)
sheets = df.index.unique().tolist()
for sheet in sheets:
    df.loc[sheet].to_excel(writer, sheet_name=sheet, index=False)
writer.save()
I've tried a few different approaches without any luck. Am I missing something simple?
It is hard to pin down the problem on your system without an error message (you say the cell never finishes). Check the size of your dataset: the code creates one sheet per unique index value, and with a unique label df.loc[sheet] returns just one row. If the index is nearly unique, you are asking for roughly one sheet per row, which can take a very long time to write.
That said, I tried your code on my own dataset, and there are some errors worth fixing anyway:
import pandas as pd
path = 'raw/test_so.xlsx'
writer = pd.ExcelWriter(path)
sheets = df.index.unique().tolist()
for sheet in sheets:
    df.loc[[sheet]].to_excel(writer, sheet_name=str(sheet), index=False)
writer.close()  # writer.save() was removed in pandas 2.0; close() writes the file
Note the double brackets in df.loc[[sheet]]: selecting with a list always returns a DataFrame, so each sheet still gets the DataFrame layout in Excel (with column headers), whereas df.loc[sheet] returns a Series when the label occurs only once.
If your DataFrame index holds integers, pass sheet_name=str(sheet), because sheet names must be strings.
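As an alternative to looking up each label with df.loc, groupby(level=0) hands back every index label together with all of its rows in a single pass. A minimal sketch, assuming pandas with an Excel engine installed; write_groups_to_sheets is a hypothetical name. Excel limits sheet names to 31 characters, so long labels are truncated here, which could make two names collide:

```python
import pandas as pd


def write_groups_to_sheets(df, path):
    """Write each group of rows sharing an index label to its own sheet.

    Hypothetical helper for illustration.
    """
    with pd.ExcelWriter(path) as writer:
        # groupby(level=0) yields (label, sub-DataFrame) pairs, always as
        # DataFrames, even when a label occurs only once.
        for label, group in df.groupby(level=0):
            # Sheet names must be strings of at most 31 characters.
            group.to_excel(writer, sheet_name=str(label)[:31], index=False)
```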
I need to write multiple DataFrames to an Excel file. These DataFrames need to be written to a specific sheet, and they should not overwrite existing data on that sheet.
The code I have is as follows:
import pandas as pd
from openpyxl import load_workbook
excelbook = 'test.xlsx'
book = load_workbook(excelbook)
writer = pd.ExcelWriter(excelbook, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='apple', startcol=5, startrow=0)
writer.save()
writer.close()
The problem with my code is that each time I run it to write a DataFrame, it creates a new sheet in the Excel file. For example, if the sheet name I need is "apple", then since I run this code 3 times (to write 3 DataFrames to the same sheet), it creates a new sheet each time and names them "apple1", "apple2", and "apple3".
I need to write multiple dataframes to the same excel file, to the same sheet in that file, without overwriting the existing data in the sheet.
Please help. Thanks in advance.
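One way to get this behavior, assuming pandas 1.4 or later and openpyxl, is to open the file in append mode with if_sheet_exists='overlay' and start writing below the last used row. In append mode, writer.book is the openpyxl Workbook that pandas loaded. append_df_to_sheet is a hypothetical helper; it writes the header only when it creates the sheet, and otherwise assumes the header is already there:

```python
import pandas as pd


def append_df_to_sheet(df, path, sheet_name):
    """Append df below the existing rows of sheet_name in an existing .xlsx.

    Hypothetical helper; assumes pandas >= 1.4 and openpyxl.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        if sheet_name in writer.book.sheetnames:
            # max_row is the 1-based number of the last used row, which is
            # the 0-based startrow of the first free row.
            startrow = writer.book[sheet_name].max_row
            header = False  # the header is assumed to be there already
        else:
            startrow, header = 0, True  # new sheet: write the header too
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=header, index=False)
```

Calling it three times with the same sheet name stacks the three DataFrames on the single "apple" sheet instead of creating "apple1", "apple2", and "apple3".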
I want to import the values from a pandas DataFrame into an existing Excel sheet. I want to insert the data into the sheet without deleting what is already in the other cells (like formulas that use that data, etc.).
I tried using data.to_excel like this:
import pandas as pd
writer = pd.ExcelWriter(r'path\TestBook.xlsm')
data.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=11, index=False)
writer.close()
The problem is that this way I overwrite the entire sheet.
Is there a way to only add the dataframe? It would be perfect if I could also keep the format of the destination cells.
Thanks
I found a good solution for this: xlwings natively supports pandas DataFrames:
https://docs.xlwings.org/en/stable/datastructures.html#pandas-dataframes
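The mode parameter belongs to pd.ExcelWriter, not to to_excel: mode='w' (the default) writes a new file, replacing any existing one, and mode='a' appends to an existing file (openpyxl engine only). To write into an existing sheet instead of creating a new one, also pass if_sheet_exists='overlay' (pandas 1.4 or later), for example:
import pandas as pd
with pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Data', startrow=2, startcol=2)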
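For the .xlsm file in the question there is one more detail: openpyxl drops the VBA project when it re-saves a workbook unless the workbook was loaded with keep_vba=True. In append mode, pandas (1.3 and later) forwards engine_kwargs to openpyxl's load_workbook, so a sketch along these lines should keep the macros. overlay_into_sheet is a hypothetical helper, and pandas 1.4 or later is assumed for if_sheet_exists='overlay':

```python
import pandas as pd


def overlay_into_sheet(df, path, sheet_name, startrow=0, startcol=0):
    """Write df into an existing sheet at a 0-based (startrow, startcol),
    leaving cells outside the written block in place.

    Hypothetical helper; assumes pandas >= 1.4 and openpyxl.
    """
    # keep_vba only makes sense for macro-enabled (.xlsm) workbooks.
    engine_kwargs = {"keep_vba": True} if str(path).lower().endswith(".xlsm") else {}
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay",
                        engine_kwargs=engine_kwargs) as writer:
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=startrow, startcol=startcol, index=False)
```

overlay_into_sheet(data, r'path\TestBook.xlsm', 'Sheet1', startrow=1, startcol=11) mirrors the call from the question: the header lands in cell L2 and the values below it.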
I'm pretty new to Python and have been having some difficulty getting started on this. I am using Python 3.
I've googled and found quite a few Python modules that help with this, but was hoping for a more definite answer here. Basically, I need to read certain columns from a CSV file, i.e. G, H, I, K, and M; the ones I need aren't consecutive.
I need to read those columns from the CSV file and transfer them to empty columns in an existing .xls file that already has data in it.
I looked into openpyxl, but it doesn't seem to work with csv/xls files, only xlsx.
Can I use xlwt module to do this?
Any guidance on which module may work best for my use case would be greatly appreciated. Meanwhile, I'm going to tinker around with xlwt/xlrd.
I recommend using pandas. It has convenient functions to read and write csv and xls files.
import pandas as pd
#read the csv file
df_1 = pd.read_csv('c:/test/test.csv')
#let's say df_1 has columns colA and colB
print(df_1)
#read the xls(x) file
df_2 = pd.read_excel('c:/test/test.xlsx')
#let's say df_2 has columns aa and bb
#now add a column from df_1 to df_2
df_2['colA'] = df_1['colA']
#save the combined output (leaving the with block saves and closes the file)
with pd.ExcelWriter('c:/test/combined.xlsx') as writer:
    df_2.to_excel(writer)
#alternatively, if you want to add just one column to an existing xlsx file:
#i.e. get colA from df_1 into a new dataframe
df_3 = pd.DataFrame(df_1['colA'])
column_to_write = 16  #this would go to column Q (zero based index)
writeRowIndex = False  #don't write the row index
sheetName = 'Sheet1'  #which sheet to write on
#append mode with if_sheet_exists='overlay' (pandas >= 1.4) writes into the existing sheet;
#it replaces the old writer.book / writer.sheets workaround, which newer pandas versions no longer allow
with pd.ExcelWriter('c:/test/combined.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    #now write the single column df_3 to the file
    df_3.to_excel(writer, sheet_name=sheetName, columns=['colA'], startcol=column_to_write, index=writeRowIndex)
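The answer above copies a single named column. For the non-consecutive columns G, H, I, K, and M from the question, read_csv's usecols parameter also accepts 0-based column positions. A small sketch; letters_to_index and read_csv_columns are hypothetical helpers, and note that usecols returns the columns in file order regardless of the order in which they are requested:

```python
import pandas as pd


def letters_to_index(letters):
    """Convert a spreadsheet column label ("A", "G", "AA") to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def read_csv_columns(path_or_buf, letters):
    """Read only the given spreadsheet-style columns from a CSV file."""
    positions = [letters_to_index(c) for c in letters]
    return pd.read_csv(path_or_buf, usecols=positions)


# Example with the path from the answer above:
# df_cols = read_csv_columns('c:/test/test.csv', ["G", "H", "I", "K", "M"])
```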
You could try XlsxWriter, which is a fully featured Python module for writing files in the Excel 2007+ XLSX format. Note that it only creates new files; it cannot read or modify an existing workbook.
https://pypi.python.org/pypi/XlsxWriter