How to parse from a streaming Excel sheet to a Python pandas df? - python

I have an Excel sheet in .xls format containing live-streaming stock data from a piece of software.
I want to read and process the data from the sheet in Python every 5 seconds.
Python gets refreshed data only when I manually save the .xls file. It does not automatically pick up new data points when the script runs again after the first time.
Any help?

This should help you:
import threading
import pandas as pd
def main_task():
    # Schedule the next call, so main_task repeats every 5 seconds
    threading.Timer(5.0, main_task).start()
    df = pd.read_excel("filename.xls")  # Re-reads the Excel file
main_task()  # Starts the cycle
This code re-reads the file into a pandas DataFrame every 5 seconds.
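Since pd.read_excel reads whatever is currently saved on disk, one possible variation is to reload only when the file's modification time changes. The sketch below assumes the program producing the data saves the file from time to time; poll_file, on_change and max_polls are hypothetical names introduced for illustration, not part of any library.

```python
import os
import time


def poll_file(path, on_change, interval=5.0, max_polls=None):
    """Call on_change(path) each time the file's modification time changes.

    poll_file, on_change and max_polls are illustrative names only.
    """
    last_mtime = None
    polls = 0
    while max_polls is None or polls < max_polls:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:  # first poll, or the file was saved again
            last_mtime = mtime
            on_change(path)
        polls += 1
        time.sleep(interval)


# Example use (assumes pandas is installed and "filename.xls" exists):
# import pandas as pd
# poll_file("filename.xls", lambda p: print(pd.read_excel(p).tail()))
```

This does not make unsaved data visible to Python; it only avoids re-parsing a file that has not changed since the last read.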

Related

Read excel file to python but time format has changed

I read the Excel file into Python:
import pandas as pd
data = pd.read_excel('1.xlsx')
data
Part of my time data loaded successfully, but another part has problems. The problem is in these columns (in_time, call_time, process_in_time, out_time).
Why did this happen?
And how can I handle and normalize this time data?

Python: Save dataframe to excel (win32.com)

I have code that downloads data from Yahoo Finance into a df for a list of stocks. Then I create a new spreadsheet for each stock. But I cannot manage to copy the data from the df to this spreadsheet.
n = number_of_stocks
m = 0
while n > 0:
    x = Input_Stock_Names[m]
    m += 1
    n -= 1
    df = pdr.get_data_yahoo(x, starting_date, ending_date)
    df = df.reset_index()
    ExcelWrksht = ExcelWrkbook.Worksheets.Add()
    ExcelWrksht.Name = x
    ExcelWrksht = ExcelWrkbook.Worksheets(x)
Also, the Excel file is open while the code is running.
If the task is to simply store downloaded data, then using win32com is over-engineering. Simply use the facilities within pandas to write to the Excel file in the .xlsx format:
import pandas as pd
import yfinance as yf
from datetime import date, timedelta
# 90 days of history from today
end_date = date.today()
start_date = end_date - timedelta(days=90)
Input_Stock_Names = ['AAPL', 'TSLA', 'MSFT']
with pd.ExcelWriter('c:\\somepath\\stocks.xlsx', mode='w') as ew:
    for stock in Input_Stock_Names:
        df = yf.download(stock, start=start_date, end=end_date)
        # One sheet per stock, named after the ticker
        df.to_excel(ew, sheet_name=stock)
This will create a new Excel file, with one sheet for each stock.
win32com allows you to 'drive' Excel, and do pretty much everything you would do if you had the Excel application open. However, it is relatively slow: the Excel application has to be started (and closed) and all the data and commands have to cross the 'process boundary' from one process (python) to the other (Excel).
Using ExcelWriter you simply write data to a file in the .xlsx format, so that Excel can read it later. If all you want to do is store data this is very much more efficient than using win32com.

Saving data to an Excel file but too many entries

I am using this script to grab a CSV from a local microcontroller and store the information in an Excel file. The issue I am running into is that I hit the limit on how many entries an Excel sheet can hold, so I need to find a way to adapt the script to do something like
if excel_file == full:
    open new excel sheet and print data there
Does anyone have any ideas?
Here is the exact error in case anyone is curious:
ValueError('This sheet is too large! Your sheet size is: 1744517, 27 Max sheet size is: 1048576, 16384')
Solved by putting everything into a pandas DataFrame and using
df_1 = df.iloc[:1000000, :]
df_2 = df.iloc[1000000:, :]
which splits the df after a million rows; the two parts are then written to different sheets of the same Excel file.
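The same idea can be generalized to any number of sheets. The following is a minimal sketch, assuming the data is already in a DataFrame; split_frame, write_in_sheets and the part_N sheet names are hypothetical, chosen here for illustration.

```python
import pandas as pd


def split_frame(df, max_rows=1_000_000):
    """Return consecutive slices of df, each with at most max_rows rows."""
    return [df.iloc[start:start + max_rows]
            for start in range(0, len(df), max_rows)]


def write_in_sheets(df, path, max_rows=1_000_000):
    """Write each slice to its own sheet (part_1, part_2, ...) of one file.

    Assumes an Excel engine such as openpyxl is installed for .xlsx output.
    """
    with pd.ExcelWriter(path) as writer:
        for i, chunk in enumerate(split_frame(df, max_rows), start=1):
            chunk.to_excel(writer, sheet_name=f"part_{i}", index=False)
```

Keeping max_rows below the 1048576-row sheet limit reported in the error leaves room for the header row.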

Python: load excel header without loading remaining data

I am working with very big Excel files, which take a long time to load with Pandas in Python. Before processing the data, the user has to select quite a few options related to the data, which only require the names of the columns in each dataset. It is very inconvenient for the user to have to wait, sometimes for minutes, until the data is loaded before being able to select the necessary options, and then let the program do the actual processing for another few minutes.
So, my question is: is there a way to load only the data header from an Excel file with Python? In a way I think of it as an alternate version to the "skiprows" parameter in the read_excel Pandas function, where instead of skipping rows in the beginning of the data, I would like to skip rows at the end of the data. I want to emphasize that my goal is to reduce the time Python takes to load the files. I also know there are ways to do this with csv files, but unfortunately it didn't help me.
Thank you for the help!
You can try to use the sxl module (https://pypi.org/project/sxl/). Here is the code I tried for a large excel file (around 75,000 rows) and the timing results:
from datetime import datetime
startTime = datetime.now()
import pandas as pd
import sxl
startTime = datetime.now()
df = pd.read_excel('\\Big_Excel.xlsx')
print("Time taken to load whole data with pandas read excel is {}".format((datetime.now() - startTime)))
startTime = datetime.now()
df = pd.read_excel('\\Big_Excel.xlsx', nrows = 5)
print("Time taken with top 5 rows with pandas read excel is {}".format((datetime.now() - startTime)))
startTime = datetime.now()
wb = sxl.Workbook('\\Big_Excel.xlsx')
ws = wb.sheets[1]
data = ws.head(5)
print("Time taken to load top 5 rows using sxl is {}".format((datetime.now() - startTime)))
Pandas read_excel loads the whole data into memory, so there is not much of a difference in timing. Here are the outputs from the above:
Time taken to load whole data with pandas read excel is 0:00:49.174538
Time taken with top 5 rows with pandas read excel is 0:00:44.478523
Time taken to load top 5 rows using sxl is 0:00:00.671717
I hope this helps!!
You can use the 'skipfooter' parameter or the 'nrows' parameter with both .xlsx and .csv files. However, the two cannot be used together.
path = r'c:\users\abc\def\stack.xlsx'
df = pd.read_excel(path, skipfooter = 99999)
This skips 99999 rows counting from the bottom and loads the remaining records from the top, including the header.
path = r'c:\users\abc\def\stack.xlsx'
df = pd.read_excel(path, nrows= 5)
This loads the first 5 rows along with the header.
Also refer to this Stack Overflow question.
from dask import dataframe as dd
df = dd.read_csv("filename")
Trust me, it's fast. I am reading an 800 MB file.
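For the header-only case asked about in the question, another option that may be worth trying is openpyxl's read-only mode, which streams rows instead of building the whole workbook in memory. This is a sketch under the assumption that the file is .xlsx and the header is the first row of the sheet; read_header is a hypothetical helper name, and the timing benefit is not measured here.

```python
from openpyxl import load_workbook


def read_header(path, sheet_name=None):
    """Return the values of the first row of a sheet as a list.

    read_header is an illustrative helper; it opens the workbook in
    openpyxl's read-only mode and stops after the first row.
    """
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        rows = ws.iter_rows(min_row=1, max_row=1, values_only=True)
        return list(next(rows, ()))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

The returned list can be shown to the user for option selection before the full load is started.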

Update Excel Spreadsheet with Real-Time Python Data

I'm quite new to Python and am learning the language mainly to automate some processes and update/populate Excel spreadsheets with real-time data. Is there a way (e.g. through openpyxl) to update specific cells with data that's extracted through Python packages such as pandas or through web scraping with BeautifulSoup?
I already have the necessary code to extract the data-series that I need for my project in Python but am stuck entirely on how to link this data to excel.
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2000,1,1)
end = dt.datetime.today()
tickers = [
    "PYPL",
    "BABA",
    "SAP"
]
df = web.DataReader(tickers, 'yahoo', start, end)
print(df.tail(365)['Adj Close'])
Pandas has a method to export a Dataframe to Excel. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
filename = "output.xlsx"
df.to_excel(filename)
One option is to run your Python script on a schedule and output to .csv or another format that Excel can link to. This option allows the data to be updated whenever the Python script is executed.
Setup:
Output your dataframe to CSV, a database, or another Excel-readable format
Set up your Python file to run on a schedule (either with a scheduler, or a loop with a delay)
Create a data connection from Excel to the file or database your Python script outputs
Build pivot tables based on the table in Excel
Refresh the data connection/pivot tables in Excel to get the new data
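The first two steps above (write an Excel-readable file, repeat with a delay) could be sketched roughly as follows. export_loop, fetch and iterations are hypothetical names used only for this illustration; fetch stands for whatever function returns your DataFrame.

```python
import time

import pandas as pd


def export_loop(fetch, csv_path, interval=60.0, iterations=None):
    """Write fetch() to csv_path repeatedly, waiting interval seconds between runs.

    iterations=None loops forever; a number stops after that many writes.
    """
    done = 0
    while iterations is None or done < iterations:
        fetch().to_csv(csv_path, index=False)  # overwrite with the latest data
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)


# Example use with a stand-in data source:
# export_loop(lambda: pd.DataFrame({"price": [1.0, 2.0]}), "output.csv")
```

Excel's data connection would then point at the CSV file and be refreshed from the Excel side.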
(Appreciate that this is an old question). Real time data in Excel is possible with xlOil. xlOil allows you to very easily define an Excel RTD (real time data) function in python. Excel's RTD functions operate outside the normal calc cycle and can push data onto a sheet.
Your example could be written as:
import xloil, datetime as dt, asyncio
import pandas_datareader.data as web
start = dt.datetime(2000, 1, 1)
@xloil.func
async def pyGetTickers(names: list, fetch_every_secs=10):
    while True:
        yield web.DataReader(
            names, 'yahoo', start, dt.datetime.now())
        await asyncio.sleep(fetch_every_secs)
This would appear as the worksheet function pyGetTickers.
One easy solution is to use the xlwings library:
import xlwings as xw
..
xw.Book(file_path).sheets['Sheet_name'].range('A1').value = df
This writes your df to the Excel file starting at cell A1, via COM, which means it actually writes the values while the file is open.
Hope this is helpful
