Ways to reduce loading time of wxPython GUI - python

This question is a continuation of my question Desktop GUI Loading Slow.
I have a desktop GUI developed in wxPython which uses SQLAlchemy for many record-fetching queries against a database. I put the fetched records into Python dictionaries and populate the GUI from them. But since I am reading thousands of records in the background, the GUI gets stuck and loads very slowly. Now the questions are:
Should I create individual threads for each of the SQLAlchemy fetch queries? If so, is wx.CallAfter() the method I should focus on (for each query)? Sample/untested code or a link would be helpful.
Is there any other way to implement lazy loading in a desktop GUI?
P.S.: Please note that this is the first time I am doing multithreading and wxPython; I was previously a web developer working with Python/Django. Also, I can't share code due to restrictions.

You should redesign your app so that the data-loading and data-display parts are separate. Load the data in a separate thread that populates a data model in your app, and use that model to populate the GUI. The GUI itself then loads fast, displaying 'loading...' or something similar wherever the data has not arrived yet.
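For example, here is a minimal sketch of that thread-plus-model idea in wxPython, using wx.CallAfter to hand each fetched record to the GUI thread; fetch_all_records is a hypothetical stand-in for your SQLAlchemy query:

    import threading
    import time
    import wx

    def fetch_all_records():
        # Hypothetical stand-in for a SQLAlchemy query such as
        # session.query(Record).all(); sleeps to simulate a slow fetch.
        for i in range(1000):
            time.sleep(0.001)
            yield "record %d" % i

    class MainFrame(wx.Frame):
        def __init__(self):
            super(MainFrame, self).__init__(None, title="Lazy loading demo")
            self.listbox = wx.ListBox(self)
            self.listbox.Append("loading...")  # placeholder until data arrives
            thread = threading.Thread(target=self.load_data)
            thread.daemon = True
            thread.start()

        def load_data(self):
            # Worker thread: no direct GUI calls allowed here.
            for record in fetch_all_records():
                wx.CallAfter(self.add_row, record)  # marshal to the GUI thread

        def add_row(self, record):
            # Runs in the GUI thread, scheduled by wx.CallAfter.
            if self.listbox.GetString(0) == "loading...":
                self.listbox.Delete(0)
            self.listbox.Append(record)

    app = wx.App(False)
    MainFrame().Show()
    app.MainLoop()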
Another way to speed things up is not to run queries until they are needed, e.g. wrap them in a class with a get method that only queries the DB on first access (see the sketch below), though all of it will depend on context.
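A minimal sketch of such a wrapper, assuming a query callable you supply yourself:

    class LazyQuery(object):
        """Defer a DB query until get() is first called, then cache it."""

        def __init__(self, query_fn):
            self._query_fn = query_fn  # e.g. lambda: session.query(Artist).all()
            self._result = None
            self._loaded = False

        def get(self):
            if not self._loaded:
                self._result = self._query_fn()
                self._loaded = True
            return self._result

    # Usage (session and Artist are your own SQLAlchemy objects):
    # artists = LazyQuery(lambda: session.query(Artist).all())
    # ...nothing hits the DB until a view calls artists.get().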
Also, if the GUI is mostly for viewing, you could load a small first set of data and push the rest to other views that the user reaches through menus or tabs; that way you delay loading until it is needed, or load it in the background.

There are several ways to prevent your GUI from hanging. Of course you can do multi-threading and stuff the records in a global dictionary, but you'd probably run into the global interpreter lock (GIL), which would not help the responsiveness of your GUI.
The first option is to use the event-driven nature of the GUI toolkit: use the "timeout" or "timer" functionality it provides to call a function that loads a couple of records on every invocation. A generator function works nicely for that, and this is probably the easiest approach to implement. How many records you can load in one go depends on the speed of the machine: start with a single record, measure how long loading takes, and increase the batch size so that each invocation takes no longer than, say, 0.1 second.
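A rough sketch of that approach with wx.Timer, assuming a generator that yields one record at a time (the record source and batch size are placeholders):

    import wx

    BATCH = 50  # tune so one tick stays under ~0.1 s

    def record_source():
        # Hypothetical stand-in for your real record fetch.
        for i in range(100000):
            yield "record %d" % i

    class Frame(wx.Frame):
        def __init__(self):
            super(Frame, self).__init__(None, title="Timer-driven loading")
            self.listbox = wx.ListBox(self)
            self.records = record_source()
            self.timer = wx.Timer(self)
            self.Bind(wx.EVT_TIMER, self.on_tick, self.timer)
            self.timer.Start(50)  # milliseconds between batches

        def on_tick(self, event):
            # Load one small batch per tick; the event loop stays responsive.
            try:
                for _ in range(BATCH):
                    self.listbox.Append(next(self.records))
            except StopIteration:
                self.timer.Stop()  # everything loaded

    app = wx.App(False)
    Frame().Show()
    app.MainLoop()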
The second option is to use a separate process for loading the data, and then send it to the GUI in small chunks. Using a separate process (via the multiprocessing module) has the advantage that you cannot run into Python's GIL. Note that this method more or less includes the first one, because you still have to process the messages from the second process in the event loop of the GUI.
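A bare-bones sketch of that combination: a worker process fills a multiprocessing.Queue, and the GUI's timer handler drains it without blocking (the record source is again a placeholder):

    import multiprocessing
    import queue  # only for the queue.Empty exception

    def worker(q):
        # Separate process: a long-running fetch cannot block the GUI's GIL.
        for i in range(100000):
            q.put("record %d" % i)
        q.put(None)  # sentinel: finished

    def drain(q, append_row):
        # Call this from the GUI's timer handler; returns True when done.
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return False  # nothing more right now, try again next tick
            if item is None:
                return True
            append_row(item)

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        multiprocessing.Process(target=worker, args=(q,)).start()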

You don't mention which widgets you load your data into, but if you use wx.grid.Grid or ListCtrl, then yes, there is "lazy" loading support in the virtual implementations of those widgets. See the wxPython demo for a grid that can hold a million cells, for example. Also see Anurag's answer: you really don't need to load all the data at once, just the data that you can actually display. Then you can load more when the user scrolls (or pre-load it in a background thread).
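For example, a minimal virtual wx.ListCtrl; the control asks OnGetItemText only for the rows it actually draws, so a million rows cost nothing up front (the row lookup here is a placeholder for your dictionary or a targeted query):

    import wx

    class VirtualList(wx.ListCtrl):
        def __init__(self, parent, count):
            super(VirtualList, self).__init__(
                parent, style=wx.LC_REPORT | wx.LC_VIRTUAL)
            self.InsertColumn(0, "Record")
            self.SetItemCount(count)  # how many rows exist, not loaded

        def OnGetItemText(self, item, col):
            # Called on demand for visible rows only.
            return "record %d" % item

    app = wx.App(False)
    frame = wx.Frame(None, title="Virtual list demo")
    VirtualList(frame, 1000000)
    frame.Show()
    app.MainLoop()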

Related

live stock tick data transfer from one python code to other python code or flutter app

I am working on a personal project and would appreciate the community's help. I am not highly experienced and my question may be trivial; please forgive me if so.
I would like to transfer live tick data (a DataFrame) of a stock price from one Python program to another. I understand multiprocessing pipes/queues may be used, but I wish to post the question here so that I can make an informed decision. I'd appreciate your thoughts on the requirement below.
Basically, I have live tick-by-tick data for a few hundred stock symbols coming in from a websocket. The websocket's callback function slightly cleans up the data and stores it in a RedisTimeSeries DB (and InfluxDB in the future). I do not wish to add extra overhead to this code, since it handles the real-time processing and storing, and I prefer it to remain a single-purpose application.
At the same time, I need to forward each tick data packet to a different program which will do the analysis (for example watchlist updates, plotting, or an auto-trade algorithm). I believe Redis does not emit any signal on data update (maybe a SQL DB can). I am currently polling the DB for a particular symbol's data several times a second. This is inefficient, and moreover I would like the algorithm to act on data as and when it arrives instead of polling each symbol continuously, which in turn loads the DB.
Is there an efficient method like signals and slots, or a callback mechanism like the websocket provides? I'd prefer the first app to simply emit the DataFrame from its *.py program and let the other *.py application consume it. I definitely have to use a second *.py program for the auto-trade algorithm (it cannot be combined with the first). In the near future I may also add a front-end app for plotting graphs / UI, possibly written in Flutter or some other language. So the generic requirement is to emit the DataFrame from the *.py app and collect it from some other app, irrespective of the language used by the second application.
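One push-style option worth evaluating, since the ticks already pass through Redis, is Redis pub/sub. This is only a sketch; the channel naming and JSON payload are assumptions:

    import json
    import redis

    r = redis.Redis()

    # Publisher side: call this from the websocket callback after storing
    # the tick ("ticks.<symbol>" is an arbitrary channel convention).
    def publish_tick(symbol, tick_dict):
        r.publish("ticks." + symbol, json.dumps(tick_dict))

    # Consumer side: the analysis / auto-trade script blocks on listen()
    # and reacts to each tick as it arrives, instead of polling the DB.
    def consume(symbol):
        p = r.pubsub()
        p.subscribe("ticks." + symbol)
        for message in p.listen():
            if message["type"] == "message":
                tick = json.loads(message["data"])
                # ...act on the tick here

Since pub/sub is part of the Redis protocol itself, a future Flutter or other non-Python client could subscribe to the same channels.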

Python API implementing a simple log file

I have a Python script that regularly checks an API for data updates. Since it runs without supervision, I would like to be able to monitor what the script does to make sure it works properly.
My initial thought is just to write every communication attempt with the API to a text file with the date, time, and whether data was pulled or not, with a new line for every input. My question is whether you would recommend doing it another way: writing to Excel, for example, to be able to sort the columns? Or are there other options worth considering?
I would say it really depends on two factors:
How often you update
How much interaction you want with the monitoring data (i.e. notification, reporting, etc.)
I have had projects where we've updated Google Sheets (using the API) to be able to collaboratively extract reports from update data.
However, note that this means a web call at every update, so if your updates are close together, this will affect performance. Also, if your app is interactive, there may be a delay while the data gets updated.
The upside is you can build things like graphs and timelines really easily (and collaboratively) where needed.
Also: yes, definitely use the logging module, as answered below. I had sort of assumed you were already using the logging module for the local file for some reason!
Take a look at the logging documentation.
A new line for every input is a good start, and you can configure the logging module to print the date and time automatically.
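A minimal configuration along those lines (the file name and format are just examples):

    import logging

    logging.basicConfig(
        filename="api_monitor.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    logging.info("API call succeeded, data pulled")
    logging.warning("API call failed, no new data")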

How to load from disk, process, then store data in a common hdf5 concurrently with python, pyqt, h5py?

Premise:
I've created a main window. One of the drop-down menus has a 'ProcessData' item. When it's selected, I create a QProgressDialog, then do a lot of processing in the main loop and periodically update the label and percentage in the QProgressDialog.
My processing looks like: read a large amount of data from a file (a numpy memmapped array), do some signal processing, and write the output to a common h5py file. I iterate over the available input files, and all of the output is stored in a common h5py hdf5 file. The entire process takes about two minutes per input file and pins one CPU at 100%.
Goal:
How do I make this process non-blocking, so that the UI stays responsive? I'd still like my processing function to be able to update the QProgressDialog and its associated label.
Can I extend this to process more than one dataset concurrently and retain the ability to update the progress bar info?
Can I write into h5py from more than one thread/process/etc.? Will I have to implement locking on the write operation?
Software Versions:
I use Python 3.3+ with numpy/scipy/etc. The UI is in PyQt4 4.11 / Qt 4.8, although I'd be interested in solutions that use Python 3.4 (and therefore asyncio) or PyQt5.
This is quite a complex problem to solve, and this format is not really suited to providing complete answers to all your questions. However, I'll attempt to put you on the right track.
How do I make this process non-blocking, so that the UI stays responsive? I'd still like my processing function to be able to update the QProgressDialog and its associated label.
To make it non-blocking, you need to offload the processing into a Python thread or a QThread. Better yet, offload it into a subprocess that communicates progress back to the main program via a thread in the main program.
I'll leave you to implement (or ask another question about) creating subprocesses or threads. However, you need to be aware that only the main thread can access GUI methods. This means you need to emit a signal if using a QThread, or use QApplication.postEvent() from a Python thread (I've wrapped the latter up into a library for Python 2.7 here; Python 3 compatibility will come one day).
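As a rough sketch of the QThread route (PyQt4 names, matching the question; process_one_file is a hypothetical stand-in for the read/filter/write step):

    from PyQt4.QtCore import QThread, pyqtSignal

    class Worker(QThread):
        progress = pyqtSignal(int, str)  # percent done, label text

        def __init__(self, files, parent=None):
            super(Worker, self).__init__(parent)
            self.files = files

        def run(self):
            # Worker thread: no GUI calls here, only signal emissions.
            for i, path in enumerate(self.files):
                process_one_file(path)  # hypothetical: memmap, filter, write
                pct = int(100.0 * (i + 1) / len(self.files))
                self.progress.emit(pct, "Processed %s" % path)

    # In the main window, after creating the QProgressDialog:
    # def on_progress(pct, text):
    #     dialog.setValue(pct)
    #     dialog.setLabelText(text)
    # worker = Worker(input_files, self)
    # worker.progress.connect(on_progress)
    # worker.start()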
Can I extend this to process more than one dataset concurrently and retain the ability to update the progressbar info?
Yes. One example would be to spawn many subprocesses. Each subprocess can be configured to send messages back to an associated thread in the main process, which communicates the progress information to the GUI via the method described for the above point. How you display this progress information is up to you.
Can I write into h5py from more than one thread/process/etc.? Will I have to implement locking on the write operation?
You should not write to an hdf5 file from more than one thread at a time, so you will need to implement locking. Possibly even read access should be serialised.
A colleague of mine has produced something along these lines for Python 2.7 (see here and here); you are welcome to look at it or fork it if you wish.
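One common pattern for the locking question, sketched under the assumption that a single dedicated writer is acceptable: the workers never open the file; they send finished arrays through a queue to the one process that writes, so h5py never sees concurrent writes:

    import multiprocessing
    import h5py
    import numpy as np

    def writer(q, path):
        # The only process that ever opens the file for writing.
        with h5py.File(path, "w") as f:
            while True:
                item = q.get()
                if item is None:  # sentinel: all workers finished
                    break
                name, array = item
                f.create_dataset(name, data=array)

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        w = multiprocessing.Process(target=writer, args=(q, "results.h5"))
        w.start()
        q.put(("dataset_0", np.arange(10)))  # a worker would put real output
        q.put(None)
        w.join()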

Python program on server - control via browser

I have to set up a program which reads some parameters from a widget/GUI, calculates some results based on database values and the input, and finally sends some ASCII files via FTP to remote servers.
In general, I would suggest a Python program for these tasks: write a Qt widget as a GUI (interactively changing views, putting numbers into tables, setting up check boxes, switching between various layers; I've never done anything this complex in Python, but I have some experience in IDL with event handling etc.), and set up data classes that have functions both to create the ASCII files following the given convention and to send the files via FTP to a remote server.
However, since my company is a bunch of Windows users, each sitting at their personal desktop, installing Python and all the necessary libraries on each individual machine would be a pain in the ass.
In addition, in a future version the program is supposed to become smart and do some optimization 24/7. It therefore makes sense to put it on a server. As I personally prefer Linux, the server is already set up with Ubuntu Server.
The idea is now to run my application on the server. But how can the users access and control the program?
The easiest way for everybody to access something like a common control panel would be a browser, I guess. I have to make sure only one person at a time is sending signals to the same units, but that should be doable via flags in the database.
After some googling, next to QtWebKit, Django seems to be the first choice for such a task. But...
Can I run a full-fledged Python program underneath my web application? Is Django the right tool to do so?
As mentioned previously, in the intermediate future (~1 year) we might have to implement some computationally expensive tasks. Is it then also possible to use C, as in normal Python?
Another question concerns development. In order to become productive, we have to advance in small steps. Can I first create regular Python classes which can later be imported into my web application? (The same question applies to widgets/Qt.)
Finally: is there a better way to go? Any standards, any references?
Django is a good candidate for the website. However:
It is not a good idea to run heavy functionality from a website; it should happen in a separate process.
All functions should be asynchronous, i.e. you should never wait for something to complete.
I would personally recommend writing a separate process with a message queue; the website would only ask that process for status and always display a result to the user immediately.
You can use AJAX so that the browser always shows the latest result.
ZeroMQ or Celery are useful for implementing the functionality.
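For instance, a minimal Celery sketch of that separation (the broker URL and task body are assumptions): the website only enqueues work and can poll the task's state from an AJAX view:

    from celery import Celery

    app = Celery("tasks",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/0")

    @app.task
    def build_and_send_files(params):
        # The heavy calculation and the FTP upload happen here,
        # outside the web process, so no request ever blocks on it.
        ...
        return "done"

    # In a Django view:
    # result = build_and_send_files.delay(form.cleaned_data)
    # ...and an AJAX endpoint can later report result.state / result.ready().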
You can implement functionality in C pretty easily. I recommend, however, that you write that functionality as pure C with a SWIG wrapper rather than as an extension module for Python. That way the functionality will be portable and not dependent on the Python website.

Where should I define data gathering methods in Django

I am working on a simple music website which gathers data from Last.Fm, MusicBrainz, and others.
So far, when I add an Artist (just by name), it gathers all its albums and songs and saves them. I decided to put this in a customized save method on the Artist model.
This process can take up to 2 or 3 minutes using runserver.
I was wondering whether this is the "right" place to do all these things, since when I add a new Artist I have to wait for the page to load until the gathering finishes.
Is there a better place to do this?
You'd be much better off doing this sort of task in a background process, one that doesn't block the request/response cycle of HTTP. There are a couple of decent ways to do this; for example, you can write a management command that you would run via python manage.py ....
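A minimal sketch of the management-command route (the app, model, and command names are hypothetical): save the file as music/management/commands/fetch_artist.py and run python manage.py fetch_artist "Some Artist".

    from django.core.management.base import BaseCommand
    from music.models import Artist  # hypothetical app and model

    class Command(BaseCommand):
        help = "Fetch albums and songs for an artist outside the request cycle"

        def add_arguments(self, parser):
            parser.add_argument("name")

        def handle(self, *args, **options):
            artist, _ = Artist.objects.get_or_create(name=options["name"])
            artist.gather_data()  # hypothetical: the slow Last.Fm/MusicBrainz work
            self.stdout.write("done")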
However, I strongly suggest you have a look at Celery. There's more overhead initially to get it set up, but it's really a better direction to head in (rather than rolling your own background-process machinery).
