Where should I define data gathering methods in Django - python

I am working on a simple music website which gathers data from Last.Fm, MusicBrainz, and others.
So far, when I add an Artist (just by name), it gathers all of its albums and songs and saves them. I decided to put this in a customized save method for the Artist.
This process can last up to 2 or 3 minutes using runserver.
I was wondering if this is the "right" place to do all these things, since when I add a new Artist I have to wait for the page to load until it finishes all the gathering.
Is there a better place to do this?

You'd be much better off doing this sort of task in a background process, one that doesn't block the HTTP request/response cycle. There are a couple of decent ways to do this: you can write a management command that you would run via python manage.py ....
However, I strongly suggest you have a look at Celery. There's more overhead initially to get it set up, but it's really a better direction to head in (rather than rolling your own background process stuff).
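A minimal sketch of what the Celery route could look like. The task and helper names (fetch_artist_data, gather_albums_and_songs) are illustrative, not part of any real API, and the import fallback only exists so the sketch runs even without Celery installed:

```python
# Sketch: move the slow gathering out of Artist.save() into a Celery task.
try:
    from celery import shared_task
except ImportError:  # fallback stub so the sketch runs without Celery installed
    def shared_task(func):
        return func

@shared_task
def fetch_artist_data(artist_name):
    """Gather albums and songs for an artist outside the request cycle."""
    # Stand-in for the Last.fm / MusicBrainz calls described in the question.
    albums = gather_albums_and_songs(artist_name)
    return {"artist": artist_name, "albums": len(albums)}

def gather_albums_and_songs(artist_name):
    # Dummy placeholder for the real network calls.
    return ["album-1", "album-2"]
```

In the view you would then call fetch_artist_data.delay(name) and return the response immediately, letting a Celery worker do the slow gathering in the background.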

Related

How do I run Python scripts automatically, while my Flask website is running on a VPS?

Okay, so basically I am creating a website. The data I need to display on this website is delivered twice daily, where I need to read the delivered data from a file and store this new data in the database (instead of the old data).
I have created the python functions to do this. However, I would like to know, what would be the best way to run this script, while my flask application is running? This may be a very simple answer, but I have seen some answers saying to incorporate the script into the website design (however these answers didn't explain how), and others saying to run it separately. The script needs to run automatically throughout the day with no monitoring or input from me.
TIA
Generally it's a really bad idea to have the web server handle such tasks (the Flask application, in your case). There are many reasons for this; to name a few:
Python's Achilles' heel, the GIL.
Sharing system resources of the application between users and other operations.
Crashes happen; they may be unlikely, but they do occur, and if you are not careful the web application goes down along with the task.
So with that in mind I'd advise you to ditch this idea and use crontabs. Basically, write a script that does whatever transformations or operations it needs to do, and create a cron job to run it at the desired times.
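A sketch of what such a standalone script might look like, assuming a simple CSV delivery file and a SQLite database (both assumptions; adapt to your actual file format and database):

```python
# refresh_data.py -- standalone script run by cron, never by the Flask app.
# The CSV layout (name,value columns) and SQLite storage are assumptions.
import csv
import sqlite3

def load_delivery(path):
    """Read the twice-daily delivery file into a list of row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def replace_data(db_path, rows):
    """Replace the old data with the new delivery in one transaction."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, value REAL)")
        conn.execute("DELETE FROM prices")  # old data is discarded, per the question
        conn.executemany(
            "INSERT INTO prices (name, value) VALUES (:name, :value)", rows
        )

# Crontab entry running the script at 06:00 and 18:00 with no manual input:
#   0 6,18 * * * /usr/bin/python3 /srv/app/refresh_data.py
```

The Flask application just reads from the same database and never knows the refresh script exists.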

django run a daemon?

I would like to recreate some data in my project every 30 minutes (prices that change). I also have another job that needs to refresh every minute.
Now I heard I should use a daemon, but I'm not sure how that works.
Can someone point me in the right direction?
Also, should I make an extra model to save that temporary data, or is that part of the daemon?
PS: not sure if Stack Overflow can be used for this sort of question, but I don't know where else to search for this sort of information.
You don't want a daemon. You just want cron jobs.
The best thing to do is to write your scripts as custom Django management commands and use cron to trigger them to run at the specified intervals.
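As a rough illustration of that approach (the command name and refresh function are made up, and the try/except stub only exists so the sketch runs outside a Django project, where you would import BaseCommand directly):

```python
# yourapp/management/commands/refresh_prices.py -- hypothetical command name.
try:
    from django.core.management.base import BaseCommand
except ImportError:  # stub so the sketch runs without Django installed
    class BaseCommand:
        def handle(self, *args, **options):
            raise NotImplementedError

def recalculate_prices():
    # Stand-in for the real price-refresh logic from the question.
    return 42  # number of prices updated, for illustration

class Command(BaseCommand):
    help = "Recreate the cached price data."

    def handle(self, *args, **options):
        updated = recalculate_prices()
        return f"updated {updated} prices"

# Cron then triggers the commands at the intervals mentioned above, e.g.:
#   */30 * * * * /usr/bin/python3 /srv/project/manage.py refresh_prices
#   * * * * *    /usr/bin/python3 /srv/project/manage.py refresh_every_minute
```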

Python program on server - control via browser

I have to setup a program which reads in some parameters from a widget/gui, calculates some stuff based on database values and the input, and finally sends some ascii files via ftp to remote servers.
In general, I would suggest a Python program for these tasks: write a Qt widget as a GUI (interactively changing views, putting numbers into tables, setting up check boxes, switching between various layers; I've never done something this complex in Python, but I have some experience in IDL with event handling etc.), and set up data classes with functions both to create the ASCII files with the given convention and to send the files via FTP to some remote server.
However, since my company is a bunch of Windows users, each sitting at their personal desktop, installing python and all necessary libraries on each individual machine would be a pain in the ass.
In addition, in a future version the program is supposed to become smart and do some optimization 24/7. Therefore, it makes sense to put it to a server. As I personally rather use Linux, the server is already set up using Ubuntu server.
The idea is now to run my application on the server. But how can the users access and control the program?
The easiest way for everybody to access something like a common control panel would be a browser, I guess. I have to make sure only one person is sending signals to the same units at a time, but that should be doable via flags in the database.
After some googling, next to QtWebKit, Django seems to be the first choice for such a task. But...
Can I run a full-fledged Python program underneath my web application? Is Django the right tool to do so?
As mentioned previously, in the (intermediate) future (~1 year) we might have to implement some computationally expensive tasks. Is it then also possible to utilize C, as one would within normal Python?
Another question I have is about development. In order to be productive, we have to advance in small steps. Can I first create regular Python classes, which can later be imported into my web application? (Does the same apply to widgets/Qt?)
Finally: Is there a better way to go? Any standards, any references?
Django is a good candidate for the website, however:
It is not a good idea to run heavy functionality from a website; it should happen in a separate process.
All functions should be asynchronous, i.e. you should never wait for something to complete.
I would personally recommend writing a separate process with a message queue; the website would only ask that process for statuses and always display a result immediately to the user.
You can use ajax so that the browser will always have the latest result.
ZeroMQ or Celery are useful for implementing the functionality.
You can implement functionality in C pretty easily. I recommend, however, that you write that functionality as pure C with a SWIG wrapper rather than as an extension module for Python. That way the functionality will be portable and not dependent on the Python website.
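A standard-library-only sketch of the "separate worker process plus status queue" pattern described above, using multiprocessing in place of ZeroMQ or Celery to keep it self-contained (run_calculation and the status tuples are illustrative):

```python
# Worker process reports progress on a queue; the web layer polls the queue
# for statuses instead of waiting for the computation to finish.
import multiprocessing as mp

def run_calculation(job_id, status_queue):
    """Worker process: do the heavy work, reporting status as it goes."""
    status_queue.put((job_id, "started"))
    total = sum(i * i for i in range(1000))  # stand-in for the real number crunching
    status_queue.put((job_id, f"done: {total}"))

def launch(job_id):
    """Called from the web layer: start the worker and return immediately."""
    queue = mp.Queue()
    proc = mp.Process(target=run_calculation, args=(job_id, queue))
    proc.start()
    return proc, queue
```

The website's view would call launch() once, then on each AJAX poll drain the queue and show the latest status, so the user always gets an immediate response.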

Speed up feedparser

I'm using feedparser to print the top 5 Google news titles. I get all the information from the URL the same way as always.
import feedparser as fp

x = 'https://news.google.com/news/feeds?pz=1&cf=all&ned=us&hl=en&topic=t&output=rss'
feed = fp.parse(x)
My problem is that I'm running this script when I start a shell, so that ~2 second lag gets quite annoying. Is this time delay primarily from communications through the network, or is it from parsing the file?
If it's from parsing the file, is there a way to only take what I need (since that is very minimal in this case)?
If it's from the former possibility, is there any way to speed this process up?
I suppose that a few delays are adding up:
The Python interpreter needs a while to start and import the module
Network communication takes a bit
Parsing probably consumes only a little time, but it does contribute
I think there is no straightforward way of speeding things up, especially not the first point. My suggestion is that you have your feeds downloaded on a regular basis (you could set up a cron job or write a Python daemon) and stored somewhere on your disk (e.g. a plain text file), so you just need to display them at your terminal's startup (echo would probably be the easiest and fastest).
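A dependency-free sketch of the cache-writer half of that idea, using the standard library's XML parser instead of feedparser (the function names and one-title-per-line cache format are assumptions):

```python
# Cron fetches the RSS text (e.g. with urllib.request) and calls write_cache;
# the shell startup then only needs to `cat` the cache file.
import xml.etree.ElementTree as ET

def extract_titles(rss_text, limit=5):
    """Pull the first few <item><title> values out of an RSS document."""
    root = ET.fromstring(rss_text)
    titles = [item.findtext("title") for item in root.iter("item")]
    return titles[:limit]

def write_cache(rss_text, path, limit=5):
    """Store one title per line so the shell can display it instantly."""
    with open(path, "w") as f:
        f.write("\n".join(extract_titles(rss_text, limit)))
```

This shifts all three delays (interpreter startup, network, parsing) out of the interactive shell and into the scheduled job.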
I have personally had good experiences with feedparser. I use it to download ~100 feeds every half hour with a Python daemon.
Parsing in real time is not the best approach if you want faster results.
You can try doing it asynchronously with Celery or other similar solutions. I like Celery; it gives you many capabilities, such as running tasks on a cron-like schedule or asynchronously, and more.

Ways to reduce loading time of wxPython GUI

This question is a continuation of my question Desktop GUI Loading Slow.
I have a desktop GUI developed in wxPython which uses SQLAlchemy for many record-fetch queries from the database. I am putting the fetched records in Python dictionaries and populating the GUI from them. But since I am reading thousands of records in the background, the GUI gets stuck and loads very slowly. Now the question is:
Should I create individual threads for each of the SQLAlchemy data-fetch queries? If the answer is yes, is wx.CallAfter() the method I have to focus on (for each query)? If someone could give sample/untested code or a link, it would be helpful.
Is there any other way to implement lazy loading in a desktop GUI ?
P.S.: Please note that this is the first time I am doing multithreading and wxPython. I was previously a web developer on Python/Django. Also, I can't share code due to restrictions.
You should redesign your app so that the data-loading and data-display parts are separate. Load data in a separate thread, which should populate a DB model in your app, and use that model to populate the GUI. The GUI will then load fast, but will display 'loading...' or something similar in places where the data has not loaded yet.
Another way to speed things up is to not run queries until they are needed, e.g. wrap them in a class with a get method that queries the DB on first access; but all of this will depend on context.
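A toy sketch of that wrap-in-a-class idea: the query callable runs only on the first .get() and the result is cached afterwards (LazyQuery is a made-up name, not a library class):

```python
# Lazy, cached wrapper around any zero-argument query callable.
class LazyQuery:
    _UNSET = object()  # sentinel so a None result still counts as cached

    def __init__(self, run_query):
        self._run_query = run_query  # e.g. session.query(Album).all
        self._result = self._UNSET

    def get(self):
        if self._result is self._UNSET:
            self._result = self._run_query()  # hits the DB once, on demand
        return self._result
```

GUI startup then only constructs cheap wrappers; a panel that is never shown never triggers its query.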
Also, if the GUI is mostly for viewing, you could load a first small set of data and push the rest to other views that the user reaches through some menu or tabs; that way you can delay loading until it is needed, or load it in the background.
There are several ways to prevent your GUI from hanging. Of course you can do multithreading and stuff the records in a global dictionary, but you'd probably run into the global interpreter lock (GIL), which would probably not help the responsiveness of your GUI.
The first option is to use the event-driven nature of the GUI toolkit and use the "timeout" or "timer" functionality provided by the toolkit to call a function that loads a couple of records every time it is called. A generator function would work nicely for that. This is probably the easiest to implement. How many records you can load in one go depends on the speed of the machine. I would suggest starting with a single record, measuring the loading time, and increasing the number of records so that each invocation doesn't take longer than, say, 0.1 seconds.
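A framework-neutral sketch of that timer-plus-generator pattern; in wxPython the returned callback would be driven by a wx.Timer tick, but nothing below actually depends on wx:

```python
# Load records in small batches from a timer callback so the event loop
# never blocks for more than one chunk.
def batched(records, batch_size):
    """Generator yielding records a few at a time."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def make_loader(records, display, batch_size=1):
    """Return a zero-argument callable suitable as a timer callback."""
    batches = batched(records, batch_size)
    def load_next_batch():
        try:
            display(next(batches))
            return True   # keep the timer running
        except StopIteration:
            return False  # all records loaded; stop the timer
    return load_next_batch
```

Each timer tick calls load_next_batch() once; when it returns False, you stop the timer.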
Second is to use a second process for loading data, and then send it to the GUI in small chunks. Using a separate process (via the multiprocessing module) has the advantage that you cannot run into Python's GIL. Note that this method more or less includes the first one, because you still have to process messages from the second process in the event loop of the GUI.
You don't mention which widgets you load your data into, but if you use wx.grid.Grid or the ListCtrl, then yes, there is some "lazy" loading support in the virtual implementations of those widgets. See the wxPython demo for a grid that can hold a million cells, for example. Also see Anurag's answer. You really don't need to load all the data at once; just load the data that you can actually display. Then you can load more when the user scrolls (or pre-load it in a background thread).
