Database in Excel using win32com or xlrd, or database in MySQL - Python

I have developed a website where the pages are simply html tables. I have also developed a server by expanding on python's SimpleHTTPServer. Now I am developing my database.
Most of the table contents on each page are static and don't need to be touched. However, there is one column per table (i.e. page) that needs to be editable and stored. The values are simply text that the user can enter. The user enters the text via html textareas that are appended to the tables via javascript.
The database is to store key/value pairs where the value is the user entered text (for now at least).
Current situation
Because the original format of my webpages was xlsx files I opted to use an excel workbook as my database that basically just mirrors the displayed web html tables (pages).
I hook up to the excel workbook through win32com. Every time the table (page) loads, javascript iterates through the html textareas and sends an individual request to the server to load in its respective text from the database.
Currently this approach works but is terribly slow. I have tried to optimize everything as much as I can and I believe the speed limitation is a direct consequence of win32com.
Thus, I see four possible ways to go:
1. Replace my current win32com functionality with xlrd
2. Try to load all the html textareas for a table (page) at once through one server call to the database using win32com
3. Switch to something like SQL (probably MySQL, since it's simple and robust enough for my needs)
4. Use xlrd but make a single call to the server for each table (page) as in (2)
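The second and fourth options both come down to replacing one request per textarea with one request per page. A minimal sketch of the server side, assuming Python 3's http.server (the successor to SimpleHTTPServer); the STORE dict, the /values?page=... URL and the page/key names are hypothetical:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical backing store: {page_name: {textarea_key: text}}, loaded once
# (from the workbook, via xlrd, or from a database) instead of once per textarea.
STORE = {
    "page1": {"row1": "first note", "row2": "second note"},
}


def page_payload(store, page):
    """Return every stored value for one page as a single JSON document."""
    return json.dumps(store.get(page, {}))


class ValuesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/values":
            self.send_error(404)
            return
        page = parse_qs(url.query).get("page", [""])[0]
        body = page_payload(STORE, page).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; GET /values?page=page1 returns all of page1's values."""
    ThreadingHTTPServer(("127.0.0.1", port), ValuesHandler).serve_forever()
```

On the client, the single JSON response is then used to fill every textarea on the page, so the cost of talking to the backing store is paid once per page load rather than once per cell.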
My schedule to build this functionality is around two days.
Does anyone have any thoughts on the tradeoffs in time-spent-coding versus speed of these approaches? If anyone has any better/more streamlined methods in mind please share!

Probably not the answer you were looking for, but your post is very broad, and I've used win32com and Excel a fair bit and don't see those as good tools for your goal. An easier strategy is this:
for the server, use Flask: it is a Python web framework that makes it crazy easy to respond to HTTP requests via Python code and HTML templates. You'll have a fully capable server running in 5 minutes; then you will need a bit of time to write code that gets data from your DB and renders it through templates (which are really easy to use).
for the database, use SQLite (there is far more overhead integrating with MySQL). Because you only have 2 days:
you could also use a simple CSV file, since the API (Python has a csv read/write module) is much simpler, with less ramp-up time. One CSV per user, easy to manage. You don't worry about inserting rows for a user, you just append; and you don't implement removing rows for a user, you just mark them as inactive (a column for active/inactive in your CSV). While processing a GET request from the client, as you read the CSV you can count how many rows are inactive and, once there are enough of them, rewrite the CSV without them, so once in a while a request will be a little slower to respond to the client.
even simpler, you could use an in-memory data structure of your choice if you don't need persistence across server restarts. If this is for a demo, that should be an acceptable limitation.
for the client side, use jQuery on top of JavaScript (maybe you are doing that already). It makes it super easy to manipulate the DOM and use effects like slide-in/out. Get yourself the book "Learning jQuery"; you'll be able to make good use of jQuery in just a couple of hours.
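For the SQLite route, Python's built-in sqlite3 module is enough for the key/value storage the question describes. A minimal sketch, assuming a hypothetical notes table keyed by (page, key); the file name notes.db is also just an example:

```python
import sqlite3


def open_store(path="notes.db"):
    """Open (or create) a SQLite file holding one row per editable cell."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        " page TEXT NOT NULL,"
        " key TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (page, key))"
    )
    return conn


def save_value(conn, page, key, value):
    # INSERT OR REPLACE overwrites the existing row for (page, key), if any.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO notes (page, key, value) VALUES (?, ?, ?)",
            (page, key, value),
        )


def load_page(conn, page):
    """Return every stored value for one page with a single query."""
    rows = conn.execute("SELECT key, value FROM notes WHERE page = ?", (page,))
    return dict(rows.fetchall())
```

Because load_page returns a whole page in one query, it pairs naturally with answering one request per page instead of one per textarea.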
If you only have two days it might be a little tight, but you will probably need more than 2 days to get around the issues you are facing with your current strategy, and issues you will face imminently.
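The CSV alternative can be sketched with the standard csv module. One assumption here: since a CSV row can't cheaply be edited in place, "marking as inactive" is done by appending a row with the active flag set to 0, and the latest row for each key wins. The function names, the column layout and the compaction threshold are all hypothetical:

```python
import csv
import os

FIELDS = ["key", "value", "active"]


def append_row(path, key, value, active=True):
    """Append-only write: a new row supersedes any earlier row for the same key."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(FIELDS)
        writer.writerow([key, value, "1" if active else "0"])


def mark_inactive(path, key):
    # Instead of deleting in place, append a row flagged as inactive.
    append_row(path, key, "", active=False)


def read_active(path, compact_after=100):
    """Return {key: value} for active keys, rewriting the file once it has
    accumulated more than `compact_after` stale (superseded or inactive) rows."""
    if not os.path.exists(path):
        return {}
    latest = {}
    total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            latest[row["key"]] = row  # later rows win
    current = {k: r["value"] for k, r in latest.items() if r["active"] == "1"}
    if total - len(current) > compact_after:
        # The occasional rewrite: this is the "slightly slower" request.
        tmp = path + ".tmp"
        with open(tmp, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(FIELDS)
            for k, v in current.items():
                writer.writerow([k, v, "1"])
        os.replace(tmp, path)
    return current
```

This assumes a single writer per file, which matches the one-CSV-per-user layout; concurrent writers to the same file would need locking.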


How to store data from web scraping project

#Background
I am currently playing with a web scraping project as I am learning Python.
I have a project which scrapes products, with information about price etc., using Selenium.
Then I add every record to a pandas DF, do some additional data manipulation, and then store the data in a CSV and upload it to Google Drive. This runs every night.
#Question itself
I would like to watch price changes, new products etc. How would you recommend storing the data with a date key, so that there is an option to flag new products etc.?
My idea is to store every load in one CSV and add one column with "date_of_load"... But this seems noob-like... Maybe store the data in PostgreSQL? I would like to start learning SQL, so I would try making my own DB.
Thanks for your ideas
In my view it's better to use NoSQL (Mongo) for this task. You can create JSON documents (price data) whose keys are dates.
This can help you:
https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb
https://www.mongodb.com/python
https://realpython.com/introduction-to-mongodb-and-python/
https://www.google.com/search?&q=python+mongo
That is cool! I would suggest sqlite3 (https://docs.python.org/3/library/sqlite3.html) just to get a feeling with SQL. As you can see, it says "It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle", which is sort of what you suggested(?), so it could be a nice place to start.
However, CSV might do just fine. As long as there is not too much data (so much that it takes forever to load and process it all), it doesn't matter much how you store it, as long as you manage to use it the way you want.
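The "every load in one table with a date_of_load column" idea from the question is a reasonable design and maps directly onto SQL. A minimal sketch with the standard-library sqlite3 module, assuming hypothetical product/price columns and ISO-formatted date strings:

```python
import sqlite3


def open_db(path="prices.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        " product TEXT NOT NULL,"
        " price REAL NOT NULL,"
        " date_of_load TEXT NOT NULL,"  # ISO date string, e.g. '2024-01-31'
        " PRIMARY KEY (product, date_of_load))"
    )
    return conn


def store_load(conn, rows, date_of_load):
    """Store one nightly scrape; rows is an iterable of (product, price)."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO prices (product, price, date_of_load) "
            "VALUES (?, ?, ?)",
            [(product, price, date_of_load) for product, price in rows],
        )


def new_products(conn, today, yesterday):
    """Products present in today's load but not in yesterday's."""
    sql = (
        "SELECT product FROM prices WHERE date_of_load = ? "
        "EXCEPT SELECT product FROM prices WHERE date_of_load = ?"
    )
    return sorted(r[0] for r in conn.execute(sql, (today, yesterday)))


def price_changes(conn, today, yesterday):
    """(product, old_price, new_price) for products whose price changed."""
    sql = (
        "SELECT t.product, y.price, t.price FROM prices t "
        "JOIN prices y ON y.product = t.product "
        "WHERE t.date_of_load = ? AND y.date_of_load = ? AND t.price <> y.price "
        "ORDER BY t.product"
    )
    return conn.execute(sql, (today, yesterday)).fetchall()
```

If the nightly data is already in a pandas DataFrame, DataFrame.to_sql("prices", conn, if_exists="append", index=False) can append it to the same table, provided the DataFrame's columns match the table's. The same schema and queries should carry over to PostgreSQL later with only minor changes.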

What is the correct way to upload files to a SQL Server inside my web application?

I am developing a web application in which users can upload excel files. I know I can use the OPENROWSET function to read data from excel into a SQL Server but I am refraining from doing so because this function requires a file path.
It seems kind of indirect as I am uploading a file to a directory and then telling SQL Server go look in that directory for the file instead of just giving SQL Server the file.
The other option would be to read the Excel file into a pandas DataFrame and then use the to_sql function, but the pandas read_excel function is quite slow and I am sure the other method would be faster.
Which of these two methods is "correct" when handling file uploads from a web application?
If the first method is not frowned upon or "incorrect", then I am almost certain it is faster and will use that. I just want experienced developers' thoughts or opinions. The web app's backend is Python and Flask.
If I am understanding your question correctly, you are trying to load the contents of an xls(x) file into a SQL Server database. This is actually not trivial to do: depending on what is in the Excel file, you might want one table, or more probably multiple tables, based on the data. So I would step back for a bit and ask three questions:
1. What is the data I need to save, and how should that data be structured in my SQL tables? Forget about Excel at this point; maybe just examine the first row of data and see how you need to save it.
2. How do I get the file into my web application? For example, when the user uploads a file, you would want to use a POST form to send the file data to your server, and have your server save that file (for example on S3, in a /tmp folder, or in memory for temporary processing).
3. Now that you know what your input is (the xls(x) file and its location) and how you need to save your data (the SQL schema), it's time to decide what the best tool for the job is. Pandas is probably not going to be a good tool, unless you literally just want to load the file and dump it as-is, with minimal (if any) changes, into a single table. At this point I would suggest using something like xlrd for legacy .xls files, or openpyxl for .xlsx files. This way you can shape your data any way you want: for example, handling malformed dates entered by the user, empty cells (should they default to something?), mismatched types, etc.
In other words, the task you're describing is not trivial at all. It will take quite a bit of planning and designing, and then a good deal of Python code once you have settled on your design. Feel free to ask more specific questions here if you need to (for example, how to capture the POST data in a file upload, or whatever else you need help with).
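The per-cell shaping described above (malformed dates, empty cells, mismatched types) might look like the following. This is a sketch that assumes a hypothetical three-column layout (name, quantity, order_date) and uses openpyxl, which reads .xlsx but not legacy .xls. The cleaned tuples could then go to SQL Server through a parameterized INSERT with whichever DB-API driver you use:

```python
from datetime import date, datetime


def clean_row(raw):
    """Validate one spreadsheet row of (name, quantity, order_date).

    Returns a tuple ready for a parameterized INSERT, or raises ValueError.
    The column layout and the defaults are hypothetical.
    """
    name, quantity, order_date = (list(raw) + [None, None, None])[:3]
    if name is None or str(name).strip() == "":
        raise ValueError("name is required")
    # Empty cell -> default quantity of 0; otherwise insist on an integer.
    quantity = 0 if quantity in (None, "") else int(quantity)
    # Date cells usually arrive as datetime objects, but users also type
    # dates as text, so accept ISO strings too. (datetime is a subclass of
    # date, so it has to be checked first.)
    if isinstance(order_date, datetime):
        order_date = order_date.date()
    elif isinstance(order_date, str):
        order_date = date.fromisoformat(order_date.strip())
    elif order_date is not None and not isinstance(order_date, date):
        raise ValueError(f"unrecognised date: {order_date!r}")
    return (str(name).strip(), quantity, order_date)


def rows_from_xlsx(fileobj):
    """Yield cleaned rows from an .xlsx path or seekable file object,
    skipping the header row."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    ws = load_workbook(fileobj, read_only=True, data_only=True).active
    for raw in ws.iter_rows(min_row=2, values_only=True):
        yield clean_row(raw)
```

Keeping clean_row separate from the file reading makes the validation rules easy to test without a real workbook, and a row that raises ValueError can be reported back to the user instead of silently loaded.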

How to extract serialised data from Access

I want to enable a user to export some data to a web application I am building. The data from the legacy application can be accessed through MS Access (ODBC). The web application is written in Django/Python, but that is not very relevant.
The user would have to export data from time to time and import it into the web app. The table structure in the web app more-or-less mirrors the one in the legacy application.
My question is how to get the data from Access into a format that is easily parseable in the web app. The data comes from 5 different tables and is interrelated. Is there a way to serialise the data from Access into an XML / JSON file? I know that you can do an XML export, but as far as I know that is limited to a query, so I wouldn't have the hierarchy... Is there a VBA library to help with the task?
You can reference Microsoft XML, v5.0 (or whatever version) in the Visual Basic Editor and create XML programmatically.
See
- Simple example
- Introduction to XML in Microsoft Windows (in depth example)
Answering my own question here. I did some googling and it looks like you can export data from a table together with selected other tables. For that, it is necessary to draw the relationships within Access.
That might also solve my problem (and without composing the XML manually). Will find out if this works and check back later.
source: http://msdn.microsoft.com/en-us/library/office/aa167823(v=office.11).aspx#odc_accessnewxmlfeatures_includingrelatedtableswhenexportingxml
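Since the data is reachable over ODBC, another route is to build the hierarchy on the Python side: query each table (for example with the third-party pyodbc package, which is an assumption here), then nest child rows under their parents before serialising to JSON. A minimal sketch of the nesting step, with hypothetical customers/orders tables and column names:

```python
import json
from collections import defaultdict


def nest(parents, children, fk, child_key):
    """Attach each child row to its parent under `child_key`.

    parents/children are lists of dicts (e.g. rows fetched over ODBC);
    fk names the child column that holds the parent's "id".
    """
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child[fk]].append(child)
    return [dict(p, **{child_key: by_parent.get(p["id"], [])}) for p in parents]


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
    orders = [{"id": 10, "customer_id": 1, "total": 5.0}]
    print(json.dumps(nest(customers, orders, "customer_id", "orders"), indent=2))
```

For deeper hierarchies across all 5 tables, nest the innermost level first and pass the result in as the children of the next level up.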

Best way to scrape CSVs on the web with Python

I am looking to replace Yahoo Query Language with something more manageable and dependable. Right now we use it to scrape public CSV files and use the information in our web app.
Currently I am having trouble trying to find an alternative and it seems that scraping websites with Python is the best bet. However I don't even know where to start.
My question is what is needed to scrape a CSV, save the data and use it elsewhere in a web application using Python? Do I need a dedicated database or can I save the data a different way?
A simple explanation is appreciated.
This is a bit broad, but let's divide it into separate tasks:
My question is what is needed to scrape a CSV
If you mean downloading CSVs files from already known URLs, you can simply use urllib. If you don't have the CSVs URLs you'll have to obtain them somehow. If you want to get the URLs from webpages, beautifulsoup is commonly used to parse HTML. scrapy is used for larger-scale scraping.
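For CSVs at known URLs, the standard library covers both the download and the parsing. A minimal sketch; the UTF-8 encoding and the 30-second timeout are assumptions to adjust per source:

```python
import csv
import io
from urllib.request import urlopen


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url, encoding="utf-8"):
    """Download a CSV from a known URL and parse it."""
    with urlopen(url, timeout=30) as resp:
        return parse_csv(resp.read().decode(encoding))
```

Keeping the parsing separate from the download makes it easy to test against saved sample files.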
save the data.
Do I need a dedicated database or can I save the data a different way?
Not at all. You can save the CSV files directly to disk, store them with pickle, serialize them to JSON, or use a relational or NoSQL database. What you should use depends heavily on what you want to do and what kind of access you need to the data (local/remote, centralized/distributed).
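As an example of the "no dedicated database" option, parsed rows can simply be written to a JSON file that the web application reads. A minimal sketch; the function names are hypothetical, and the write-then-rename step is a design choice so that a request reading the file never sees a half-written one:

```python
import json
import os


def save_rows(rows, path):
    """Persist parsed CSV rows (a list of dicts) as JSON.

    Writing to a temporary file first and then renaming it means a reader of
    `path` never sees a half-written file (os.replace is atomic on POSIX).
    """
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    os.replace(tmp, path)


def load_rows(path):
    """Read the rows back, e.g. from inside a web framework view."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```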
and use it elsewhere in a web application using Python
You'll probably want to learn how to use a web framework for that (Django, Flask and CherryPy are common choices). If you don't need concurrent write access, any of the storage approaches I mentioned would work with them.

How to display database query results of 100,000 rows or more with HTML?

We're rewriting a website used by one of our clients. The user traffic on it is very low, less than 100 unique visitors a week. It's basically just a nice interface to their data in our databases. It allows them to query and filter on different sets of data of theirs.
We're rewriting the site in Python, re-using the same Oracle database that the data is currently on. The current version is written in an old, old version of ColdFusion. One of the things that ColdFusion does well, though, is display tons of database records on a single page. It's capable of displaying hundreds of thousands of rows at once without crashing the browser. It uses a Java applet, and it looks like the contents of the rows are perhaps compressed and passed in through the HTML or something. There is a large block of data in the HTML but it's not displayed; it's just rendered by the Java applet.
I've tried several JavaScript solutions but they all hinge on the fact that the data will be present in an HTML table or something along those lines. This causes browsers to freeze and run out of memory.
Does anyone know of any solutions to this situation? Our client loves the ability to scroll through all of this data without clicking a "next page" link.
I have done just what you are describing using the following (which works very well):
jQuery Datatables
It enables you to do 'fetch as you scroll' pagination, so you can disable the pagination arrows in favor of a 'forever' scroll.
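On the server, "fetch as you scroll" means answering many small requests for a slice of rows instead of rendering all of them at once. A minimal sketch of a jQuery DataTables server-side processing response (draw, start and length in; draw, recordsTotal, recordsFiltered and data out), using sqlite3 and a hypothetical records table as a stand-in for the Oracle database:

```python
import json
import sqlite3


def datatables_page(conn, params):
    """Build a DataTables server-side processing response for one chunk.

    params holds the request's query parameters (draw, start, length).
    Searching and ordering are left out to keep the sketch short.
    """
    draw = int(params.get("draw", 1))  # echo back as an int, never raw text
    start = max(int(params.get("start", 0)), 0)
    length = int(params.get("length", 100))
    total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    if length < 0:  # DataTables sends -1 for "all rows"
        length = total
    rows = conn.execute(
        "SELECT id, name FROM records ORDER BY id LIMIT ? OFFSET ?",
        (length, start),
    ).fetchall()
    return json.dumps({
        "draw": draw,
        "recordsTotal": total,
        "recordsFiltered": total,  # no filtering in this sketch
        "data": [list(r) for r in rows],
    })
```

Against Oracle 12c or later, the slice would use OFFSET ... ROWS FETCH NEXT ... ROWS ONLY instead of LIMIT/OFFSET. Large offsets get slower as the user scrolls deeper, so paging on the last seen key (WHERE id > ? ORDER BY id) is a common alternative.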
Give jQuery scroll a try.
Instead of scrolling images, you need to scroll data.
You should populate the divs with data instead of images.
http://www.smoothdivscroll.com/#quickdemo
It should work, I hope.
You've got a great client anyway :-)
Some links related to your question:
http://www.9lessons.info/2009/07/load-data-while-scroll-with-jquery-php.html
http://api.jquery.com/scroll/
I'm using Open Rico's LiveGrid in a project to display a table with thousands of rows in a page as an endless scrolling table. It has been working really well so far. The table requests data on demand as you scroll through the rows. The parameters are sent as simple GET parameters, and the response you have to create on the server side is simple XML. It should be possible to implement a data backend for a Rico LiveGrid in Python.
Most people, in this case, would use a framework. The best documented and most popular framework in Python is Django. It has good database support (including Oracle), and you'll have the easiest time getting help using it since there's such an active Django community.
You can try some other frameworks, but if you're tied to Python I'd recommend Django.
Of course, Jython (if it's an option), would make your job very easy. You could take the existing Java framework you have and just use Jython to build a frontend (and continue to use your Java applet and Java classes and Java server).
The memory problem is an interesting one; I'd be curious to see what you come up with.
Have you tried jqGrid? It can be buggy at times, but overall it's one of the better JavaScript grids. It's fairly efficient in dealing with large datasets. It also has a feature whereby the grid retrieves data asynchronously in chunks, but still allows continuous scrolling. It just asks for more data as the user scrolls down to it.
I did something like this a while ago and successfully implemented YUI's DataTable combined with Django.
http://developer.yahoo.com/yui/datatable/
This gives you column sorting, pagination, scrolling and so on. It also allows you to use a variety of data sources such as JSON or XML.
