Concurrent access to a data file in Python - python

I have a small web server doing some operations on a POST request. It reads a data file, does some checks, and then resaves the file adding some informations from the POST in it.
The issue I have is that if two clients are doing a POST request at almost the same time, both will read the same file, then one will write the file containing the new information, and then the other client will write the file containing its new information, but without the information from the other client, since that part wasn't in the file when it was read.
f = open("foo.txt", "r+")
tests_data = yaml.safe_load(f)
post_data = json.loads(web.data())
#Some checks
f.write(json.dumps(tests_data))
f.close()
I wanted the script to "wait", without giving an error, at the "open" line if the file is already opened by another process of the same code, then read the file when the other process is done and has closed the file.
Or something else if other solutions exist.

Would a standard lock not suit your needs? The lock would need to be at the module level.
from threading import Lock
# this needs to be module level variable
lock = Lock
with lock:
# do your stuff. only one thread at a time can
# work in this space...

Related

How to save and edit server rendering data?

I am using Flask server with python.
I have an integer pics_to_show. Every time a request is recieved, the user recieves pics_to_show integer. And pics_to_show gets decremented by 1.
pics_to_show is an integer that`s shared with all website users. I could make a database to save it, but I want something simpler and flexible. Is there any OTHER way to save this integer.
I made a class that saves such variables in a JSON file.
class GlobalSate:
def __init__(self, path_to_file):
self.path = path_to_file
try:
open(path_to_file)
except FileNotFoundError:
f = open(path_to_file, 'x')
f.write('{}')
def __getitem__(self, key):
file = self.load_file()
data = json.loads(file.read())
return data[key]
def __setitem__(self, key, value):
file = self.load_file()
data = json.loads(file.read())
data[key] = value
json.dump(data, open(self.path, 'w+'), indent=4)
def load_file(self):
return open(self.path, 'r+')
The class is over-simplified ofcourse. I initialize an instance in the __init__.py and import it to all routes files (I am using BluePrints).
My apllication is threaded, so this class might not work... Since multiple users are editing the data at the same time. Does anybody have another solution?
Note:
The g variable would not work, since data is shared across users not requests.
Also, what if I want to increment such variable every week? Would it be thread safe to run a seperate python script to keep track of the date, or check the date on each request to the server?
You will definitely end up with inconsistent state, with out a locking mechanism between reads and writes, you will have race conditions. so you will loose some increments.
Also you are not closing the open file, if you do that enough times it will crash the application.
Also one good pice of advice is that you do not want to write a state managing software, (database) it is very very difficult to get it right.
I think in your situation the best solution is to use sqlite, as it is a lib that you call from your app, there is not additional server.
"I could make a database to save it, but I want something simpler and flexible"
In a multi threaded app you can not go simpler than sqlite, (if you want your app to be correct that is).
If you do not like SQL then there are some simpler options:
zodb http://www.zodb.org/en/latest/guide/transactions-and-threading.html
pikleDB https://github.com/patx/pickledb
python shelve but you will need to use file system locks

python seek function return empty string on dynamic updated data

I am using seek function to extract new lines in an updated file. My code looks like this:
read_data=open('path-to-myfile','r')
read_data.seek(0,2)
while True:
time.sleep(sometime)
new_data=read_data.readlines()
do something with new_data
myfile is a csv file that will be constantly updated
The problem is that usually after several loops inside the while, new_data return nothing. It can be different loop numbers. While I checked myfile, it is still updating..... So any problem I have on my code ? Or is there any other way to do this ?
Any help appreciated !!
You have two programs accessing the same file on disk? If that is the case, then the resource may be locking. I set up an example script that writes to a file, and another file that reads for changes based on the code you provided.
So in one instance of python:
import time
while True:
time.sleep(2)
with open('test.txt','a') as read_data:
read_data.seek(0,2)
read_data.write("bibbity boopity\n")
And in another instance of python
import time
read_data=open('test.txt','r')
read_data.seek(0,2)
while True:
time.sleep(1)
new_data=read_data.readlines()
print(new_data)
In this case, the resource is updating slower than its being read, so changes printed by the bottom prog will be blank. But if I speed up the changes per second, well I still see them. But there are some instances where not all the updates are seen.
You may want to use asynchronous file reading to catch all the changes. Python 3 asyncio library doesn't support async file read/write, but curio does.
See also this question

File closes before async call finishes, causing IO error

I wrote a package that includes a function to upload something asynchronously. The intent is that the user can use my package, open a file, and upload it async. The problem is, depending on how the user writes their code, I get an IO error.
# EXAMPLE 1
with open("my_file", "rb") as my_file:
package.upload(my_file)
# I/O operation on closed file error
#EXAMPLE 2
my_file = open("my_file", "rb")
package.upload(my_file)
# everything works
I understand that in the first example the file is closing immediately because the call is async. I don't know how to fix this though. I can't tell the user they can't open files in the style of example 1. Can I do something in my package.upload() implementation to prevent the file from closing?
You can use os.dup to duplicate the file descriptor and shield the async process from a close in the caller. The duplicated handle shares other characteristics of the original such as the current file position, so you are not completely shielded from bad things the caller can do.
This also limits your process to things that have file descriptors. If you stick to using the standard file calls, then a user can hand in any file-like object instead of just a file on disk.
def upload(my_file):
my_file = os.fdopen(os.dup(my_file.fileno()))
# ...queue for async
If you are using with to open files, it will close when code block execution finishes inside with. In your case, just pass filename and open inside asynchronus function

Python: Reread contents of a file

I have a file that an application updates every few seconds, and I want to extract a single number field in that file, and record it into a list for use later. So, I'd like to make an infinite loop where the script reads a source file, and any time it notices a change in a particular figure, it writes that figure to an output file.
I'm not sure why I can't get Python to notice that the source file is changing:
#!/usr/bin/python
import re
from time import gmtime, strftime, sleep
def write_data(new_datapoint):
output_path = '/media/USBHDD/PythonStudy/torrent_data_collection/data_one.csv'
outfile = open(output_path, 'a')
outfile.write(new_datapoint)
outfile.close()
forever = 0
previous_data = "0"
while forever < 1:
input_path = '/var/lib/transmission-daemon/info/stats.json'
infile = open(input_path, "r")
infile.seek(0)
contents = infile.read()
uploaded_bytes = re.search('"uploaded-bytes":\s(\d+)', contents)
if uploaded_bytes:
current_time = strftime("%Y-%m-%d %X", gmtime())
current_data = uploaded_bytes.group(1)
if current_data != previous_data:
write_data(","+ current_time + "$" + uploaded_bytes.group(1))
previous_data = uploaded_bytes.group(1)
infile.close()
sleep(5)
else:
print "couldn't write" + strftime("%Y-%m-%d %X", gmtime())
infile.close()
sleep(60)
As is now, the (messy) script writes once correctly, and then I can see that although my source file (stats.json) file is changing, my script never picks up on any changes. It keeps on running, but my output file doesn't grow.
I thought that an open() and a close() would do the trick, and then tried throwing in a .seek(0).
What file method am I missing to ensure that python re-opens and re-reads my source file, (stats.json)?
Unless you are implementing some synchronization mechanism or could guarantee somehow atomic read and write, I think you are calling for race condition and subtle bugs here.
Imagine the "reader" accessing the file whereas the "writer" hasn't completed its write cycle. There is a risk of reading incomplete/inconsistent data. In "modern" systems, you could also hit the cache -- and not seeing file modifications "live" as they appends.
I can think of two possible solutions:
You forgot the parentheses on the close in the else of the infinite loop.
infile.close --> infile.close()
The program that is changing the JSON file is not closing the file, and therefore it is not actually changing.
Two problems I see:
Are you sure your file is really updated on filesystem? I do not know on what operating system you are playing with your code, but caching may kick your a$$ in this case, if the files is not flushed by producer.
Your problem is worth considering pipe instead of file, however I cannot guarantee what transmission will do if it stuck on writing to pipe if your consumer is dead.
Answering your problems, consider using one of the following:
pynotifyu
watchdog
watcher
These modules are intended to monitor changes on filesystem and then call proper actions. Method in your example is primitive, has big performance penalty and couple other problems mentioned already in other answers.
Ilya, would it help to check(os.path.getmtime), whether stats.json changed before you process the file?
Moreover, i'd suggest to make advantage of the fact it's JSON file:
import json
import os
import sys
dir_name ='/home/klaus/.config/transmission/'
# stats.json of daemon might be elsewhere
file_name ='stats.json'
full_path = os.path.join(dir_name, file_name)
with open(full_path) as fp:
json.load(fp)
data = json.load(fp)
print data['uploaded-bytes']
Thanks for all the answers, unfortunately my error was in the shell, and not in the script with Python.
The cause of the problem turned out to be the way I was putting the script in the background. I was doing: Ctrl+Z which I thought would put the task in the background. But it does not, Ctrl+Z only suspends the task and returns you to the shell, a subsequent bg command is necessary for the script to run on infinite loop in the background

Record streaming and saving internet radio in python

I am looking for a python snippet to read an internet radio stream(.asx, .pls etc) and save it to a file.
The final project is cron'ed script that will record an hour or two of internet radio and then transfer it to my phone for playback during my commute. (3g is kind of spotty along my commute)
any snippits or pointers are welcome.
The following has worked for me using the requests library to handle the http request.
import requests
stream_url = 'http://your-stream-source.com/stream'
r = requests.get(stream_url, stream=True)
with open('stream.mp3', 'wb') as f:
try:
for block in r.iter_content(1024):
f.write(block)
except KeyboardInterrupt:
pass
That will save a stream to the stream.mp3 file until you interrupt it with ctrl+C.
So after tinkering and playing with it Ive found Streamripper to work best. This is the command i use
streamripper http://yp.shoutcast.com/sbin/tunein-station.pls?id=1377200 -d ./streams -l 10800 -a tb$FNAME
If you find that your requests or urllib.request call in Python 3 fails to save a stream because you receive "ICY 200 OK" in return instead of an "HTTP/1.0 200 OK" header, you need to tell the underlying functions ICY 200 OK is OK!
What you can effectively do is intercept the routine that handles reading the status after opening the stream, just before processing the headers.
Simply put a routine like this above your stream opening code.
def NiceToICY(self):
class InterceptedHTTPResponse():
pass
import io
line = self.fp.readline().replace(b"ICY 200 OK\r\n", b"HTTP/1.0 200 OK\r\n")
InterceptedSelf = InterceptedHTTPResponse()
InterceptedSelf.fp = io.BufferedReader(io.BytesIO(line))
InterceptedSelf.debuglevel = self.debuglevel
InterceptedSelf._close_conn = self._close_conn
return ORIGINAL_HTTP_CLIENT_READ_STATUS(InterceptedSelf)
Then put these lines at the start of your main routine, before you open the URL.
ORIGINAL_HTTP_CLIENT_READ_STATUS = urllib.request.http.client.HTTPResponse._read_status
urllib.request.http.client.HTTPResponse._read_status = NiceToICY
They will override the standard routine (this one time only) and run the NiceToICY function in place of the normal status check when it has opened the stream. NiceToICY replaces the unrecognised status response, then copies across the relevant bits of the original response which are needed by the 'real' _read_status function. Finally the original is called and the values from that are passed back to the caller and everything else continues as normal.
I have found this to be the simplest way to get round the problem of the status message causing an error. Hope it's useful for you, too.
I am aware this is a year old, but this is still a viable question, which I have recently been fiddling with.
Most internet radio stations will give you an option of type of download, I choose the MP3 version, then read the info from a raw socket and write it to a file. The trick is figuring out how fast your download is compared to playing the song so you can create a balance on the read/write size. This would be in your buffer def.
Now that you have the file, it is fine to simply leave it on your drive (record), but most players will delete from file the already played chunk and clear the file out off the drive and ram when streaming is stopped.
I have used some code snippets from a file archive without compression app to handle a lot of the file file handling, playing, buffering magic. It's very similar in how the process flows. If you write up some sudo-code (which I highly recommend) you can see the similarities.
I'm only familiar with how shoutcast streaming works (which would be the .pls file you mention):
You download the pls file, which is just a playlist. It's format is fairly simple as it's just a text file that points to where the real stream is.
You can connect to that stream as it's just HTTP, that streams either MP3 or AAC. For your use, just save every byte you get to a file and you'll get an MP3 or AAC file you can transfer to your mp3 player.
Shoutcast has one addition that is optional: metadata. You can find how that works here, but is not really needed.
If you want a sample application that does this, let me know and I'll make up something later.
In line with the answer from https://stackoverflow.com/users/1543257/dingles (https://stackoverflow.com/a/41338150), here's how you can achieve the same result with the asynchronous HTTP client library - aiohttp:
import functools
import aiohttp
from aiohttp.client_proto import ResponseHandler
from aiohttp.http_parser import HttpResponseParserPy
class ICYHttpResponseParser(HttpResponseParserPy):
def parse_message(self, lines):
if lines[0].startswith(b"ICY "):
lines[0] = b"HTTP/1.0 " + lines[0][4:]
return super().parse_message(lines)
class ICYResponseHandler(ResponseHandler):
def set_response_params(
self,
*,
timer = None,
skip_payload = False,
read_until_eof = False,
auto_decompress = True,
read_timeout = None,
read_bufsize = 2 ** 16,
timeout_ceil_threshold = 5,
) -> None:
# this is a copy of the implementation from here:
# https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client_proto.py#L137-L165
self._skip_payload = skip_payload
self._read_timeout = read_timeout
self._reschedule_timeout()
self._timeout_ceil_threshold = timeout_ceil_threshold
self._parser = ICYHttpResponseParser(
self,
self._loop,
read_bufsize,
timer=timer,
payload_exception=aiohttp.ClientPayloadError,
response_with_body=not skip_payload,
read_until_eof=read_until_eof,
auto_decompress=auto_decompress,
)
if self._tail:
data, self._tail = self._tail, b""
self.data_received(data)
class ICYConnector(aiohttp.TCPConnector):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._factory = functools.partial(ICYResponseHandler, loop=self._loop)
This can then be used as follows:
session = aiohttp.ClientSession(connector=ICYConnector())
async with session.get("url") as resp:
print(resp.status)
Yes, it's using a few private classes and attributes but this is the only solution to change the handling of something that's part of HTTP spec and (theoretically) should not ever need to be changed by the library's user...
All things considered, I would say this is still rather clean in comparison to monkey patching which would cause the behavior to be changed for all requests (especially true for asyncio where setting before and resetting after a request does not guarantee that something else won't make a request while request to ICY is being made). This way, you can dedicate a ClientSession object specifically for requests to servers that respond with the ICY status line.
Note that this comes with a performance penalty for requests made with ICYConnector - in order for this to work, I am using the pure Python implementation of HttpResponseParser which is going to be slower than the one that aiohttp uses by default and is written in C. This cannot really be done differently without vendoring the whole library as the behavior for parsing status line is deeply hidden in the C code.

Categories

Resources