Lock files while process is running - python

I have skeleton code where the user specifies a scripts.txt file that contains service names. When the application is installed, the code automatically generates a .service file for every entry in scripts.txt. These services are placed in /lib/systemd/system, so that the application gets restarted whenever the machine crashes.
I would like to add an uninstallation feature that removes the .service files that were created. For that, the code has to look into scripts.txt again, but if the user has changed this file in the meantime, the application will not know which services to remove.
Therefore, I would like to know: is there a way to lock a file against editing while a certain process is running?
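On Linux, the closest standard tool is an advisory lock via the standard-library fcntl module. Here is a minimal sketch; note that advisory locks only stop other processes that also call flock(), so a plain text editor is not actually prevented from modifying the file:
import fcntl

# Hold an advisory exclusive lock on scripts.txt for the lifetime of the
# process. Other cooperating processes that call flock() will block;
# an ordinary editor is NOT prevented from modifying the file.
lock_file = open("scripts.txt", "r")
fcntl.flock(lock_file, fcntl.LOCK_EX)
try:
    services = [line.strip() for line in lock_file if line.strip()]
    # ... generate or remove the .service files here ...
finally:
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()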

Related

Calling Python CLI program from django views and displaying the results back asynchronously?

I have a Django project with a web interface where you can upload files; after the upload succeeds, it calls the CLI version of the software to process them and returns the result after successful execution.
Here is a snippet I use in my views.py:
from cliproject.main import clirunner
# Some code for file upload and saving
clirunner()
This runs the command-line Python script main.py, which lives inside the cliproject/ directory, does its work, and saves the output.
The problem is that this whole process is synchronous at the moment: the user's page keeps loading from the moment they upload the file in the UI until the Python CLI script has finished processing it behind the scenes.
The flow is as follows:
Django UI
| (User upload files)
views.py gets request and saves it somewhere
| (views run clirunner() to give python cli program control)
cliproject runs
| (After doing the stuff which is intended, it saves the output file)
views.py resumes
| (Reads the output file)
Django UI displays the output file
So the problem is that I am calling a separate CLI program from views.py to do the work I want, and it happens synchronously.
What I need is to make the process asynchronous, and I want to show something like a loading bar to tell the user that the CLI program is executing in the background, along with its status.
When the CLI program is done executing, the loading bar will reach 100% and the Django UI will asynchronously display the output.
I tried Celery, but I could not figure out how to make this loading bar work based on the Python CLI script. Any ideas?
Here's a thought. You need:
A) To launch the task asynchronously
B) To be able to get the value of its current status.
Here's an idea:
1) Make the task a manage.py command that you can invoke using threads or have a Celery task call.
2) As the task executes, have it write its current completion state using a Django model to your DB of choice. (The step above is meant to simplify using the DB. You can always write directly if you need to do so.)
3) Pass the task id (assigned by you or generated by Celery, stored in an indexed db column) to the template context, and use an AJAX call to ping a view that returns the percentage complete from a database lookup; update your loading bar from there.
This way, your view submits and launches the task, it takes care of marking its own work, and then the other view just makes a quick db query to find out where it is.
Edited to add: You could also use the cache backend and write to a key in something like memcached, Redis, etc. to avoid pinging your relational database.
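A minimal sketch of steps 2 and 3, assuming a Django model named TaskProgress and a polling view (both names are illustrative, not from the question):
from django.db import models
from django.http import JsonResponse

class TaskProgress(models.Model):
    # One row per background task; task_id is assigned by you or by Celery.
    task_id = models.CharField(max_length=64, unique=True)
    percent = models.PositiveSmallIntegerField(default=0)

# Inside the long-running task, update the row as work proceeds:
#   TaskProgress.objects.filter(task_id=tid).update(percent=done * 100 // total)

def task_progress(request, task_id):
    # The template polls this view via AJAX and moves the loading bar.
    row = TaskProgress.objects.filter(task_id=task_id).first()
    return JsonResponse({"percent": row.percent if row else 0})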

How to insert uWSGI spool files entry meta data into a database table?

I'm integrating the uWSGI spooler into my system. The uWSGI spooler is a task queue that works by writing task files to a specific directory; background processes then take those files one by one, process them, and finally delete them. I want to save each task file entry in a database: every time a task file is written to the specified directory, I want to put an entry into the DB as well, and as soon as the task is processed, I want to mark it as complete in the DB. How should I go about it?
Write the DB record at the start of the spooler function (in the function itself) and update it before the function returns. (Eventually you can write a handy decorator for it.)
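A rough sketch of that decorator, assuming uWSGI's bundled uwsgidecorators module; mark_started and mark_done are hypothetical stand-ins for whatever DB writes you use:
import functools
import uwsgi
from uwsgidecorators import spool

def tracked(func):
    # Record the task in the DB when it starts, mark it complete when done.
    @functools.wraps(func)
    def wrapper(args):
        mark_started(args)        # hypothetical: INSERT a row for this task
        result = func(args)
        mark_done(args)           # hypothetical: UPDATE the row as complete
        return result
    return wrapper

@spool
@tracked
def process_task(args):
    # ... process the spool file's payload here ...
    return uwsgi.SPOOL_OK

# Enqueue a task (this writes a file into the spool directory):
# process_task.spool(filename="upload-42.dat")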

How to download part of a file over SFTP connection?

So I have a Python program that pulls access logs from remote servers and processes them. There are separate log files for each day. The files on the servers are in this format:
access.log
access.log-20130715
access.log-20130717
The file "access.log" is the log file for the current day, and is modified throughout the day with new data. The files with the timestamp appended are archived log files, and are not modified. If any of the files in the directory are ever modified, it is either because (1) data is being added to the "access.log" file, or (2) the "access.log" file is being archived, and an empty file takes its place. Every minute or so, my program checks for the most recent modification time of any files in the directory, and if it changes it pulls down the "access.log" file and any newly archived files
All of this currently works fine. However, if a lot of data is added to the log file throughout the day, downloading the whole thing over and over just to get some of the data at the end of the file will create a lot of traffic on the network, and I would like to avoid that. Is there any way to only download a part of the file? If I have already processed, say 1 GB of the file, and another 500 bytes suddenly get added to the log file, is there a way to only download the 500 bytes at the end?
I am using Python 3.2, my local machine is running Windows, and the remote servers all run Linux. I am using Chilkat for making SSH and SFTP connections. Any help would be greatly appreciated!
Call ResumeDownloadFileByName. Here's the description of the method in the Chilkat reference documentation:
Resumes an SFTP download. The size of the localFilePath is checked and the download begins at the appropriate position in the remoteFilePath. If localFilePath is empty or non-existent, then this method is identical to DownloadFileByName. If the localFilePath is already fully downloaded, then no additional data is downloaded and the method will return True.
See http://www.chilkatsoft.com/refdoc/pythonCkSFtpRef.html
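If you're not tied to Chilkat, the same resume-from-offset idea can be sketched with Paramiko, whose remote file handles support seek(); the host and paths below are placeholders:
import os
import paramiko

def download_tail(sftp, remote_path, local_path):
    # Start reading where the local copy ends, appending only the new bytes.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    with sftp.open(remote_path, "rb") as remote, open(local_path, "ab") as local:
        remote.seek(offset)
        for chunk in iter(lambda: remote.read(32768), b""):
            local.write(chunk)

transport = paramiko.Transport(("server.example.com", 22))
transport.connect(username="user", password="secret")
download_tail(paramiko.SFTPClient.from_transport(transport),
              "/var/log/access.log", "access.log")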
You could do that, or you could massively reduce your complexity by splitting the latest log file into hourly or ten-minute chunks.

Python: check whether a file is still being uploaded

Python 2.6
My script needs to monitor some 1 GB files on an FTP server; whenever one is changed/modified, the script downloads it to another place. The file names remain unchanged: people delete the original file on the FTP server first, then upload a newer version. My script checks file metadata, like file size and date modified, to detect any difference.
The question is: while the script is checking metadata, the new file may still be being uploaded. How do I handle this situation? Is there any file attribute that indicates upload status (like the file being locked)? Thanks.
There is no such attribute. You may be unable to GET such a file, but that depends on the server software. Also, file access flags may be set one way while the file is being uploaded and then changed when the upload is complete; or the incomplete file may have a modified name (e.g. original_filename.ext.part) -- it all depends on the server-side software used for the upload.
If you control the server, make your own metadata, e.g. create an empty flag file alongside the newly uploaded file when upload is finished.
In the general case, I'm afraid, the best you can do is monitor file size and consider the file completely uploaded if its size is not changing for a while. Make this interval sufficiently large (on the order of minutes).
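A minimal sketch of that heuristic over FTP, using the standard-library ftplib (the host, file name, and thresholds are made up):
import time
from ftplib import FTP

def wait_until_stable(ftp, filename, checks=5, interval=60):
    # Return once the reported size has been unchanged for `checks` polls.
    last, stable = -1, 0
    while stable < checks:
        time.sleep(interval)
        size = ftp.size(filename)     # FTP SIZE command
        stable = stable + 1 if size == last else 0
        last = size
    return last

ftp = FTP("ftp.example.com")
ftp.login("user", "secret")
ftp.voidcmd("TYPE I")                 # some servers only answer SIZE in binary mode
wait_until_stable(ftp, "big_file.tar.gz")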
Your question leaves out a few details, but I'll try to answer.
If you're running your status-checker program on the same server that's running the FTP service:
1) Depending on your operating system: if you're using Linux and you've built inotify into your kernel, you could use pyinotify to watch your upload directory -- inotify distinguishes between open, modify, and close events and lets you watch filesystem events asynchronously, so you're not polling constantly (see the sketch below). OSX and Windows both have similar but differently implemented facilities.
2) You could pythonically tail -f the server's transfer log to see when a new file is put on the server (if you're even logging that) and just update when you see the related log messages.
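For illustration, option 1 might look like this with pyinotify (Linux only; the watched directory is an assumption):
import pyinotify

class UploadHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires when a file opened for writing is closed, i.e. the FTP
        # daemon has finished writing the uploaded file.
        print("upload finished: %s" % event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch("/srv/ftp/uploads", pyinotify.IN_CLOSE_WRITE)
pyinotify.Notifier(wm, UploadHandler()).loop()   # blocks, dispatching events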
If you're running your program remotely:
3) If your status-checking utility has to run on a host remote from the FTP server, you'd have to poll the file and build in some logic to detect size changes. You can use the FTP SIZE command for this; it returns an easily parseable string.
You'd have to add logic such that if the file size gets smaller, you assume the file is being replaced, and then wait for it to grow until it stops growing and stays the same size for some duration. If the archive is compressed in a way that lets you verify a checksum, you could then download it, verify the checksum, and re-upload it to the remote site.

Where should I place a one-time operation in the Django framework?

I want to perform some one-time operations, such as starting a background thread and populating a cache every 30 minutes, as initialization actions when the Django server starts, so that they will not block users from visiting the website. Where should I place this code in Django?
Putting it into the settings.py file does not work: it seems to cause a circular dependency.
Putting it into the __init__.py file does not work either: the Django server calls it many times (what is the reason?).
I just create standalone scripts and schedule them with cron. Admittedly it's a bit low-tech, but It Just Works. Just place this at the top of a script in your project's top-level directory and call it as needed.
#!/usr/bin/env python
# setup_environ is the old-style (pre-Django 1.4) way to bootstrap a
# standalone script with the project's settings.
from django.core.management import setup_environ
import settings
setup_environ(settings)
from django.db import transaction
# random interesting things
# If you change the database, make sure you use this next line
transaction.commit_unless_managed()
We put one-time startup scripts in the top-level urls.py. This is often where your admin bindings go -- they're one-time startup, also.
Some folks like to put these things in settings.py, but that seems to conflate settings (which don't do much) with the rest of the site's code (which does stuff).
For a one-time operation at server start, you can use custom management commands; if you want a periodic task or a queue of tasks, you can use Celery.
__init__.py will be called every time the app is imported. So if you're using mod_wsgi with Apache, for instance with the prefork method, every new process created effectively 'starts' the project and thus imports __init__.py. It sounds like your best option is to create a new management command and then cron it to run every so often, if that's an option. Either that, or run the management command before starting the server; you could write a quick script that runs the management command and then starts the server, for instance.
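For reference, a management command is just a file like yourapp/management/commands/warm_cache.py (the name is illustrative), runnable with python manage.py warm_cache from cron or a startup script:
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Populate the cache; run every 30 minutes via cron."

    def handle(self, *args, **options):
        # ... do the one-time initialization / cache population here ...
        self.stdout.write("cache warmed")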
