I am developing an application in Python that uses MS Excel as the frontend. I need to read Excel cells repeatedly to check whether a value has been updated by the user; after an update, some logic is called to pull data from the DB and write it into the next cell.
The input is my search field and the output is a table.
I am using the following logic, but it consumes a lot of resources and makes life difficult when using the cursor to move from one cell to another.
import time

sht = self.wb.sheets[0]  # first worksheet (xlwings)
while True:
    search_id = sht.range('C10').value        # poll the input cell
    search_result = search_for_id(search_id)  # DB lookup
    sht.range('C12').value = search_result    # write the result back
    time.sleep(1)
My Excel gets stuck every time I use it. I can increase the delay, but that hurts performance. I need this to work on a live basis; even a single second of delay is costly.
Note: I am working on a fintech project; for ease of understanding I have made my scenario very simple.
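Not a full fix, but one way to lighten the loop is to keep the polling cheap and only run the DB lookup when the watched cell actually changes. A minimal sketch, assuming the same xlwings sheet object and the existing search_for_id helper:

import time

def watch(sht, poll=0.2):
    last_seen = object()  # sentinel that never equals a real cell value
    while True:
        search_id = sht.range('C10').value
        if search_id != last_seen:  # only react to actual edits
            last_seen = search_id
            sht.range('C12').value = search_for_id(search_id)
        time.sleep(poll)

This still polls, so it is not truly event-driven, but the expensive DB call now fires once per edit instead of once per second.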
I am trying to develop a Python program for a task: find all routes in a state/county which are not reachable within 60 minutes from certain locations.
The task will take its data (route info, speed limits, etc.) from Excel. The Excel data will change, because there are many states/counties in the USA/Canada.
I want to link the Python code with Excel such that we can get answers by clicking on the Excel sheet.
Can you tell me how this can be done? Any ideas?
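One possible starting point, as a rough sketch: read the route data with pandas and flag the slow routes. The file name and column names below are invented, and real reachability over a road network would need a graph search (travel time along connected roads) rather than this per-route arithmetic:

import pandas as pd

# Hypothetical layout: one row per route with distance and speed limit.
df = pd.read_excel('routes.xlsx')
df['travel_minutes'] = df['distance_miles'] / df['speed_limit_mph'] * 60
unreachable = df[df['travel_minutes'] > 60]
unreachable.to_excel('unreachable_routes.xlsx', index=False)

For triggering this by clicking inside the Excel sheet itself, xlwings' RunPython mechanism (a button bound to a small macro that calls your script) is one common route.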
I regularly receive a spreadsheet from an external source (via Google Docs) that I have to convert into a local (somewhat proprietary) format. To do that, I have written a script that converts the spreadsheet, as an ODS file, into the needed (non-ODS) format.
This script needs to interact with a lot of higher-level, business-specific PHP code, so I use PhpSpreadsheet for this purpose (https://github.com/PHPOffice/PhpSpreadsheet/).
This PHP library does, in theory, everything I need, but it cannot deal with overly complex spreadsheets without spending a gigantic amount of time on all the cross-referencing formulas. To speed up the processing, I manually prepare the ODS file by converting all formulas to values in the needed sheets (select all cells, then trigger [Data] > [Calculate] > [Formula to Value]). Then I delete the unneeded sheets (which otherwise only contain source data for the replaced formulas). The resulting file is a lot smaller and contains no formulas; the PHP script finishes within a few seconds on the simplified spreadsheet, while it runs out of memory after a long while on the original.
I now want to automate this conversion of formulas to values with a new Python script. (This needs to happen on a Linux server, so my best bet would be a headless LibreOffice controlled via a UNO socket from Python, correct?)
So far I have managed to connect to the LibreOffice UNO socket and manipulate cells via the old OpenOffice API (https://www.openoffice.org/api/docs/common/ref/com/sun/star/sheet/module-ix.html).
My current big question is:
How do I apply the UI's Formula to Value functionality to all cells of a sheet at once via the UNO API in Python?
I have tried searching the old OpenOffice API documentation for this for a while, but so far I cannot find what I am looking for.
Currently the python script looks (in essence) like this:
import uno

localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext(
    "com.sun.star.bridge.UnoUrlResolver",
    localContext
)
context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
serviceManager = context.ServiceManager
desktop = serviceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
# com.sun.star.lang.XComponent / com.sun.star.sheet.SpreadsheetDocument
document = desktop.getCurrentComponent()
# com.sun.star.sheet.XSpreadsheets / XNameAccess
sheets = document.getSheets()
# com.sun.star.sheet.XSpreadsheet
# https://www.openoffice.org/api/docs/common/ref/com/sun/star/sheet/XSpreadsheet.html
sheet = sheets.getByName('OneOfTheSheets')
#print(sheet.getCellRangeByName("A1:AP1000"))
# WAY TOO SLOW AND DESTRUCTIVE (replaces every formula with its display string):
for row in range(0, 1000):
    for column in range(0, 42):
        cell = sheet.getCellByPosition(column, row)
        cell.setFormula(cell.getString())
Thank you for any help you can provide.
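One lead that may help: the menu entry appears to correspond to a dispatch command, so instead of rewriting cells one by one you could select the range and fire the same command the UI uses. An untested sketch, reusing the document, sheet, serviceManager, and context objects from above, and assuming .uno:ConvertFormulaToValue is the command behind [Data] > [Calculate] > [Formula to Value]:

# Select the target cells, then dispatch the UI command on the frame.
controller = document.getCurrentController()
controller.select(sheet.getCellRangeByName("A1:AP1000"))

dispatcher = serviceManager.createInstanceWithContext(
    "com.sun.star.frame.DispatchHelper", context)
dispatcher.executeDispatch(
    controller.getFrame(), ".uno:ConvertFormulaToValue", "", 0, ())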
# Background
I am currently playing with a web scraping project as I am learning Python.
I have a project which scrapes products, with information about price etc., using Selenium.
Then I add every record to a pandas DataFrame, do some additional data manipulation, and then store the data in a CSV and upload it to Google Drive. This runs every night.
# Question itself
I would like to watch price changes, new products, etc. How would you recommend storing the data with a date key, so that there is an option to flag new products and the like?
My idea is to store every load in one CSV and add a column with "date_of_load"... but this seems rather amateurish. Maybe store the data in PostgreSQL instead? I would like to start learning SQL, so I would try building my own DB.
Thanks for your ideas.
For this task I would rather use NoSQL (MongoDB): you can store a JSON document of price data per product, keyed by date.
This can help you:
https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb
https://www.mongodb.com/python
https://realpython.com/introduction-to-mongodb-and-python/
https://www.google.com/search?&q=python+mongo
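A tiny sketch of that idea with pymongo (the database, collection, and field names here are made up):

from datetime import date
from pymongo import MongoClient

client = MongoClient()          # assumes a MongoDB instance on localhost
col = client.scraper.snapshots  # hypothetical db/collection names

# One document per product per nightly load, keyed by the load date.
col.insert_one({
    'product_id': 'sku-123',
    'price': 19.99,
    'date_of_load': date.today().isoformat(),
})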
That is cool! I would suggest sqlite3 (https://docs.python.org/3/library/sqlite3.html) just to get a feel for SQL. As the docs say, "It's also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle", which is sort of what you suggested(?), so it could be a nice place to start.
However, CSV might do just fine. As long as there is not so much data that loading and processing it takes forever, it doesn't matter much how you store it, as long as you can work with it the way you want.
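As a sketch of the SQL route (table and column names invented), the nightly load plus a "new products" check could look like:

import sqlite3
from datetime import date

conn = sqlite3.connect('prices.db')
conn.execute("""CREATE TABLE IF NOT EXISTS prices (
                    product_id TEXT, price REAL, date_of_load TEXT)""")
conn.execute("INSERT INTO prices VALUES (?, ?, ?)",
             ('sku-123', 19.99, date.today().isoformat()))
conn.commit()

# New products: ids whose earliest load date is today's load.
new_ids = conn.execute("""SELECT product_id FROM prices
                          GROUP BY product_id
                          HAVING MIN(date_of_load) = ?""",
                       (date.today().isoformat(),)).fetchall()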
Problem
I was trying to implement a web API (based on Flask) to query the database given some specific conditions, reconstruct the data, and finally export the result to a .csv file.
Since the amount of data is really, really huge, I cannot construct the whole dataset and generate the .csv file all at once (e.g. create a DataFrame using pandas and finally call df.to_csv()), because that would mean a slow query and the HTTP connection might time out.
So I created a generator which queries the database 500 records at a time and yields the results one by one, like:
def __generator(q):
    [...]  # some code here (q is a SQLAlchemy query object)
    offset, limit = 0, 500
    while True:
        records = q[offset:offset + limit]  # fetch the next page
        if not records:  # check *after* fetching, or the loop never starts
            break
        offset += limit  # advance, or the same page repeats forever
        [...]  # omit some reconstruction code
        for record in records:
            yield record
and finally construct a Response object to send the .csv to the client side:
return Response(__generator(q), mimetype='text/csv')  # Flask
The generator works well and all data is encoded as UTF-8, but when I try to open the .csv file in Microsoft Excel, the text comes out garbled.
Measures Already Tried
adding a BOM header to the exported file: doesn't work;
using other encodings like 'gb18030' and 'cp936': most of the garbled characters disappear, but some remain, and parts of the table structure become weird.
My Question Is
How can I make my output compatible with Microsoft Excel? That means at least two conditions should be satisfied:
no garbled characters, everything displayed correctly;
a well-structured table.
I would really appreciate your answer!
How are you importing the CSV file into Excel? Have you tried importing the CSV as a text file?
By reading each column in text format, Excel won't modify columns that it would otherwise parse as other types, such as dates. Your code may be correct, and Excel may simply be modifying the data when it parses it as a CSV; by importing in text format, it won't modify anything.
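If the goal is for Excel to handle UTF-8 on a plain double-click, one common trick (a hedged sketch, not tested against this exact codebase) is to make a UTF-8 BOM the very first chunk of the streamed response; a BOM appended anywhere else in the stream does nothing:

import codecs
from flask import Flask, Response

app = Flask(__name__)

def bom_first(gen):
    # Excel only honors a UTF-8 BOM when it is the first bytes of the file.
    yield codecs.BOM_UTF8
    for chunk in gen:
        yield chunk.encode('utf-8') if isinstance(chunk, str) else chunk

@app.route('/export')
def export():
    q = build_query()  # hypothetical helper returning the SQLAlchemy query
    return Response(bom_first(__generator(q)), mimetype='text/csv')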
I would recommend you look into xlutils. It has been around for quite some time, and our company has used it both for reading configuration files to run automated tests and for generating reports of test results.
I'm trying to use python-gdata to populate a worksheet in a spreadsheet. The problem is, updating individual cells is woefully slow (doing them one at a time, each request takes about 500 ms!), so I'm attempting to use the batch mechanism built into gdata to speed things up.
The problem is, I can't seem to insert new cells. I've scoured the web for examples, but I couldn't find any. This is my code, which I've adapted from an example in the documentation. (The documentation does not actually say how to insert cells, but it does show how to update cells. Since this is a new worksheet, it has no cells.)
Furthermore, with debugging enabled I can see that my request returns HTTP 200 OK.
import time
import gdata.spreadsheet
import gdata.spreadsheet.service
import gdata.spreadsheets.data
email = '<snip>'
password = '<snip>'
spreadsheet_key = '<snip>'
worksheet_id = 'od6'
spr_client = gdata.spreadsheet.service.SpreadsheetsService()
spr_client.email = email
spr_client.password = password
spr_client.source = 'Example Spreadsheet Writing Application'
spr_client.ProgrammaticLogin()
# create a cells feed and batch request
cells = spr_client.GetCellsFeed(spreadsheet_key, worksheet_id)
batchRequest = gdata.spreadsheet.SpreadsheetsCellsFeed()
# create a cell entry
cell_entry = gdata.spreadsheet.SpreadsheetsCell()
cell_entry.cell = gdata.spreadsheet.Cell(inputValue="foo", text="bar", row='1', col='1')
# add the cell entry to the batch request
batchRequest.AddInsert(cell_entry)
# submit the batch request
updated = spr_client.ExecuteBatch(batchRequest, cells.GetBatchLink().href)
My hunch is that I'm simply misunderstanding the API, and that this should work with changes. Any help is much appreciated.
I recently ran across this as well (when trying to delete), but per the docs here it doesn't appear that batch insert or delete operations are supported:
A number of batch operations can be combined into a single request.
The two types of batch operations supported are query and update.
insert and delete are not supported because the cells feed cannot be
used to insert or delete cells. Remember that the worksheets feed must
be used to do that.
I'm not sure of your use case, but would using the ListFeed help at all? It still won't let you batch operations, so there will be the associated latency, but it may be more tolerable than what you're dealing with now (or were at the time).
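For reference, inserting via the list feed is row-at-a-time rather than batched; a short sketch (assuming the worksheet's first row already contains the column headers used as keys):

# Hypothetical headers 'name' and 'value' in row 1 of the worksheet.
row = {'name': 'foo', 'value': 'bar'}
entry = spr_client.InsertRow(row, spreadsheet_key, worksheet_id)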
As of Google I/O 2016, the latest Google Sheets API supports batch cell updates (and reads). Be aware, however, that GData is now deprecated, along with most GData-based APIs, including your sample above, as the new API is not GData-based. Also, putting email addresses and passwords in plain text in code is a security risk, so new(er) Google APIs use OAuth2 for authorization. You need to get the latest Google APIs Client Library for Python; it's as easy as pip install -U google-api-python-client (or pip3 for Python 3).
As far as batch insert goes, here's a simple code sample. Assume you have multiple rows of data in rows. To mass-inject this into a Sheet, say with file ID SHEET_ID & starting at the upper-left in cell A1, you'd make one call like this:
SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID, range='A1',
                                      body={'values': rows},
                                      valueInputOption='RAW').execute()
If you want a longer example, see the first video below where those rows are read out of a relational database. For those new to this API, here's one code sample from the official docs to help get you kickstarted. For slightly longer, more "real-world" examples, see these videos & blog posts:
Migrating SQL data to a Sheet plus code deep dive post
Formatting text using the Sheets API plus code deep dive post
Generating slides from spreadsheet data plus code deep dive post
The latest Sheets API provides features not available in older releases, namely giving developers programmatic, document-oriented access to a Sheet as if they were using the user interface (creating frozen rows, performing cell formatting, resizing rows/columns, adding pivot tables, creating charts, etc.).
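For example, freezing the header row goes through batchUpdate; a brief sketch (assuming the default grid's sheetId is 0):

SHEETS.spreadsheets().batchUpdate(
    spreadsheetId=SHEET_ID,
    body={'requests': [{
        'updateSheetProperties': {
            'properties': {'sheetId': 0,
                           'gridProperties': {'frozenRowCount': 1}},
            'fields': 'gridProperties.frozenRowCount',
        }
    }]}).execute()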
However, to perform file-level access on Sheets, such as import/export, copy, move, rename, etc., you'd use the Google Drive API. Examples of using the Drive API:
Exporting a Google Sheet as CSV (blogpost)
"Poor man's plain text to PDF" converter (blogpost) (*)
(*) - TL;DR: upload a plain text file to Drive, import/convert it to Google Docs format, then export that Doc as PDF. The post above uses Drive API v2; this follow-up post describes migrating it to Drive API v3, and here's a developer video combining both "poor man's converter" posts.