Pandas dataframe limit using xlwings? - python

I am pulling data from an Excel sheet using xlwings and pandas. However, there are 3063 rows in the sheet, but pandas only pulls in 3000 rows. How can I pull in all 3063 rows?
Code:
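import pandas as pd
import xlwings as xw
wb = xw.Book("Sheet.xlsb", read_only=True, update_links=False)
sheet = wb.sheets["Sheet1"]
# Read from A3, use the first row of the block as the header,
# and expand the range down and to the right as a table
df = sheet.range('A3').options(pd.DataFrame,
                               header=1,
                               index=False,
                               expand='table').value
wb.close()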
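One plausible cause, offered as an assumption since no answer is given here: `expand='table'` grows the range downward only until the first empty cell in the leftmost column (and rightward until the first empty cell in the top row), so a blank cell or blank row near row 3000 would cut the read short. If that is the cause, a minimal sketch is to find the last used row by jumping up from the bottom of the sheet (like Ctrl+Up) and read an explicit range. The function name `read_full_table` is hypothetical; it assumes the last data row has a value in the first column and that the header row has no gaps.

```python
import pandas as pd


def read_full_table(sheet, top_left="A3"):
    """Read a block starting at top_left without relying on expand='table'.

    `sheet` is assumed to be an xlwings Sheet, e.g. wb.sheets["Sheet1"].
    Blank rows inside the data do not stop this read.
    """
    first = sheet.range(top_left)
    # Start at the very last row of the sheet in the same column and jump
    # up (like Ctrl+Up) to the last non-empty cell in that column.
    bottom = sheet.range((sheet.cells.last_cell.row, first.column))
    last_row = bottom.end("up").row
    # Assume the header row has no gaps, so Ctrl+Right finds the last column.
    last_col = first.end("right").column
    block = sheet.range((first.row, first.column), (last_row, last_col))
    return block.options(pd.DataFrame, header=1, index=False).value
```

With the workbook from the question, you would call it as `df = read_full_table(wb.sheets["Sheet1"], "A3")` before `wb.close()`. Passing the sheet in keeps the function free of any Excel connection logic.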

Related

Read each excel sheet as a different dataframe in Python

I have an Excel file with 40 sheets. I want to read each sheet into a different DataFrame, so I can export an xlsx file for each sheet.
Instead of writing all the sheet names one by one, I want to create a loop that gets all sheet names and passes each one to the sheet_name option of pandas.read_excel.
I am trying to avoid this:
df1 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name='Sheet1')
df2 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name='Sheet2')
...
df40 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name='Sheet40')
Thank you all!
Specifying sheet_name as None with read_excel reads all worksheets and returns a dict of DataFrames.
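import pandas as pd
# raw string, so the backslashes in the Windows path are not treated as escapes
file = r'C:\Users\filename.xlsx'
# sheet_name=None returns a dict mapping each sheet name to its DataFrame
xl = pd.read_excel(file, sheet_name=None)
for sheet, df in xl.items():
    df.to_excel(f"{sheet}.xlsx")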
I think this is what you are looking for.
import pandas as pd
# header=None keeps every row as data, so nothing is lost on the round trip
xlsx = pd.read_excel('file.xlsx', sheet_name=None, header=None)
for sheet, df in xlsx.items():
    df.to_excel(sheet + '.xlsx', header=False, index=False)
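If you specifically want the loop over sheet names that the question describes, `pd.ExcelFile` exposes them as `sheet_names`, and the file is opened only once. A minimal sketch, assuming openpyxl is installed; the three-sheet workbook it builds is a hypothetical stand-in for filename.xlsx.

```python
import os
import tempfile

import pandas as pd

# Build a small stand-in workbook with three sheets (hypothetical data).
out_dir = tempfile.mkdtemp()
source = os.path.join(out_dir, "demo.xlsx")
with pd.ExcelWriter(source, engine="openpyxl") as writer:
    for n in range(1, 4):
        pd.DataFrame({"value": [n, n * 10]}).to_excel(
            writer, sheet_name=f"Sheet{n}", index=False)

# Loop over the sheet names and export each sheet to its own file.
written = []
with pd.ExcelFile(source, engine="openpyxl") as xls:
    for name in xls.sheet_names:
        df = xls.parse(name)  # same as pd.read_excel(xls, sheet_name=name)
        target = os.path.join(out_dir, f"{name}.xlsx")
        df.to_excel(target, index=False)
        written.append(target)

print(written)
```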

Append data to the last row of the Excel sheet using Pandas

I have Excel data for three variables (Acct, Order, Date) in a sheet named Orders.
I have created a DataFrame by reading this sheet:
import pandas as pd
sheet_file = pd.ExcelFile("Orders.xlsx", engine="openpyxl")
worksheets = ["Orders"]  # sheet(s) to read; assumed from the description above
append_data = []
for sheet_name in worksheets:
    df = pd.read_excel(sheet_file, sheet_name=sheet_name, header=1)
    append_data.append(df)
append_data = pd.concat(append_data)
I have another Excel file called "Total_Orders.xlsx" with ~100k rows, and I need to append the above DataFrame to its "Orders" sheet:
with pd.ExcelWriter('Total_Orders.xlsx', engine="openpyxl") as writer:
    # the default mode="w" creates a new file, replacing the existing one
    append_data.to_excel(writer, sheet_name='Orders', startrow=2, header=False, index=False)
The above overwrites the data instead of appending it. I know startrow is the key here, but I am not sure how to fix this. Any help is much appreciated.
Have you tried mode="a", along these lines:
with pd.ExcelWriter("Total_Orders.xlsx", mode="a", engine="openpyxl") as writer:
    # note: by default this does not append rows to the existing "Orders" sheet
    append_data.to_excel(writer, sheet_name="Orders")
EDIT - in response to comment
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
append_data = pd.DataFrame([{'Acct': 3, 'Order': 333, 'Note': 'third'},
                            {'Acct': 4, 'Order': 444, 'Note': 'fourth'}])
# load the existing workbook; ws.append() writes each row below the last used row
wb = load_workbook(filename="stackoverflow.xlsx")
ws = wb["Orders"]
# no index, and don't append the column headers
for r in dataframe_to_rows(append_data, index=False, header=False):
    ws.append(r)
wb.save("stackoverflow.xlsx")
(The answer showed screenshots of stackoverflow.xlsx before and after the append, which are not reproduced here; the 'Other' sheet was not affected.)
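Since pandas 1.4, `ExcelWriter` in `mode="a"` also accepts `if_sheet_exists="overlay"`, which writes into an existing sheet instead of replacing it, so `startrow` can point just past the last used row. A minimal sketch, assuming pandas >= 1.4 with openpyxl; the file orders_demo.xlsx and its contents are hypothetical stand-ins for Total_Orders.xlsx. It takes the row count from openpyxl's `max_row` rather than re-reading the ~100k existing rows into pandas.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "orders_demo.xlsx")

# Stand-in for Total_Orders.xlsx: a header row plus two existing orders.
pd.DataFrame({"Acct": [1, 2], "Order": [111, 222]}).to_excel(
    path, sheet_name="Orders", index=False)

append_data = pd.DataFrame({"Acct": [3, 4], "Order": [333, 444]})

# mode="a" keeps the other sheets; "overlay" writes into the existing sheet.
with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    # max_row is the last used row (1-based); startrow is 0-based, so this
    # places the new rows directly below the existing data.
    next_row = writer.sheets["Orders"].max_row
    append_data.to_excel(writer, sheet_name="Orders", startrow=next_row,
                         header=False, index=False)

print(pd.read_excel(path, sheet_name="Orders"))
```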

how to append dataframe in existing sheet of excel file using python

You can find what I've tried so far below:
import pandas
from openpyxl import load_workbook
book = load_workbook('C:/Users/Abhijeet/Downloads/New Project/Masterfil.xlsx')
# df is the DataFrame to append (defined elsewhere)
with pandas.ExcelWriter('C:/Users/Abhijeet/Downloads/New Project/Masterfiles.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    # if_sheet_exists='replace' deletes the contents of the existing 'b2b' sheet before writing
    df.to_excel(writer, sheet_name='b2b')
Generate sample data:
import pandas as pd
# dataframe with Col1 and Col2 columns
df = pd.DataFrame({'Col1': ['A', 'B', 'C', 'D'],
                   'Col2': [10, 0, 30, 50]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('sample.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
This code will add two columns, Col1 and Col2, with data to Sheet1 of sample.xlsx.
To append data to the existing Excel file:
import pandas as pd
# new dataframe with same columns
df = pd.DataFrame({'Col1': ['E', 'F', 'G', 'H'],
                   'Col2': [100, 70, 40, 60]})
# read existing file to count the rows already in Sheet1
reader = pd.read_excel('sample.xlsx')
# open the existing workbook in append mode; 'overlay' writes into the existing sheet
# (needs pandas >= 1.4; older versions assigned writer.book and writer.sheets by hand)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    # header row + len(reader) data rows, so the next empty row has 0-based index len(reader) + 1
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False, startrow=len(reader) + 1)
This code appends the new rows below the existing data in Sheet1 of sample.xlsx.
Check these as well
how to append data using openpyxl python to excel file from a specified row?
Suppose you have an Excel file abc.xlsx
and a DataFrame df1 to be appended.
1. Read the file using pandas
import pandas as pd
df = pd.read_excel("abc.xlsx")
2. Concatenate the two DataFrames and write the result to abc.xlsx
finaldf = pd.concat([df, df1])
# this rewrites abc.xlsx as a new file containing only this one sheet
finaldf.to_excel("abc.xlsx", index=False)

Python reading an updated excel file

I’m really stuck on what should be an easy problem.
I have an Excel workbook in which I update two columns of one record in the clean_data sheet. Then I save and close the file.
After that, I try to read the updated values of the roll-up sheet (graphs_rolling_avg), whose formulas use the clean_data sheet, into a DataFrame.
When I view the DataFrame, all the values are NaN, yet when I open the Excel file I can see the updated values on the graphs_rolling_avg sheet. What can I do to get the DataFrame populated with values?
Code is shown below:
import pandas as pd
path = '//CPI Projects//Test//SampleSSM//NewSSM.xlsx'
#Import Data with Correct Rows and Columns for SSM Commercial
df1 = pd.read_excel(path, sheet_name='clean_data')
# update two columns for one record
df1.loc[df1['ev_id'] == 20201127, 'commercial_weight'] = 0
df1.loc[df1['ev_id'] == 20201127, 'commercial'] = 0
# rewrite the clean_data sheet in place, keeping the other sheets (if_sheet_exists needs pandas >= 1.3)
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df1.to_excel(writer, sheet_name='clean_data', index=False)
df5 = pd.read_excel(path, sheet_name='graphs_rolling_avg', skiprows=30)
print(df5)
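A likely explanation, offered as an assumption since the question has no answer here: openpyxl does not calculate formulas. pandas reads the cached result that Excel stores alongside each formula, and a workbook saved by openpyxl has no cached results for its formula cells, so they come back as NaN until Excel (or another calculation engine) opens, recalculates, and saves the file. The sketch below reproduces the effect with a hypothetical two-sheet workbook; it assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")

# Hypothetical stand-ins for the clean_data and graphs_rolling_avg sheets.
wb = Workbook()
data = wb.active
data.title = "clean_data"
data.append(["commercial"])
data.append([5])
data.append([7])
summary = wb.create_sheet("graphs_rolling_avg")
summary.append(["label", "total"])
# openpyxl stores only the formula text; it never evaluates it.
summary.append(["sum", "=SUM(clean_data!A2:A3)"])
wb.save(path)

# pandas reads cached values, and openpyxl saved none, so "total" is NaN.
df5 = pd.read_excel(path, sheet_name="graphs_rolling_avg")
print(df5)
```

Workarounds would be to let Excel recalculate and save the file (for example by driving Excel through xlwings) before reading it, or to compute the roll-up directly in pandas from df1 instead of reading it back from the sheet.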

How to separate multiple data frames in pd.read_html() when saving to excel using Python

I am attempting to save data from multiple tables brought in through pd.read_html(). If I print df, I can see it captured all the data, but when saving the data it is only saving the first table to excel. How do I separate out the tables so I can save each one to a separate sheet in excel (i.e. Quarterly Income Statement on sheet1, Annual Income Statement on sheet2, etc.). Below is my code. Any help is appreciated.
import pandas as pd
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
for df in dfs:
    # every table is written to the same sheet, 'Sheet1'
    df.to_excel(writer, sheet_name='Sheet1')
writer.close()
You can iterate over the list and write each DataFrame to a new sheet of the same workbook:
import pandas as pd
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
# Create a Pandas Excel writer.
xlWriter = pd.ExcelWriter('myworkbook.xlsx', engine='xlsxwriter')
# Write each df to its own sheet
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
# Close the writer and output the Excel file (mandatory!)
xlWriter.close()
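If you want descriptive sheet names such as "Quarterly Income Statement" rather than Sheet0, Sheet1 and so on, you can zip the tables with a list of names. A minimal sketch, assuming openpyxl is installed; the two small DataFrames are hypothetical stand-ins for what `pd.read_html` returns (the Google Finance URL may no longer serve these tables), and the names list is assumed to be in the same order as the tables. Excel does not accept sheet names longer than 31 characters, hence the slice.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the list that pd.read_html(...) returns.
dfs = [
    pd.DataFrame({"Item": ["Revenue"], "Q1": [100]}),
    pd.DataFrame({"Item": ["Revenue"], "FY": [400]}),
]
# Hypothetical names, assumed to be in the same order as dfs.
names = ["Quarterly Income Statement", "Annual Income Statement"]

path = os.path.join(tempfile.mkdtemp(), "statements.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name, df in zip(names, dfs):
        # Each table gets its own sheet; names are capped at 31 characters.
        df.to_excel(writer, sheet_name=name[:31], index=False)

with pd.ExcelFile(path) as xls:
    print(xls.sheet_names)
```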
