How to save and edit server rendering data?

How to save and edit server rendering data? - python

I am using Flask server with python.
I have an integer pics_to_show. Every time a request is recieved, the user recieves pics_to_show integer. And pics_to_show gets decremented by 1.
pics_to_show is an integer that`s shared with all website users. I could make a database to save it, but I want something simpler and flexible. Is there any OTHER way to save this integer.
I made a class that saves such variables in a JSON file.
class GlobalSate:
def __init__(self, path_to_file):
self.path = path_to_file
try:
open(path_to_file)
except FileNotFoundError:
f = open(path_to_file, 'x')
f.write('{}')
def __getitem__(self, key):
file = self.load_file()
data = json.loads(file.read())
return data[key]
def __setitem__(self, key, value):
file = self.load_file()
data = json.loads(file.read())
data[key] = value
json.dump(data, open(self.path, 'w+'), indent=4)
def load_file(self):
return open(self.path, 'r+')
The class is over-simplified ofcourse. I initialize an instance in the __init__.py and import it to all routes files (I am using BluePrints).
My apllication is threaded, so this class might not work... Since multiple users are editing the data at the same time. Does anybody have another solution?
Note:
The g variable would not work, since data is shared across users not requests.
Also, what if I want to increment such variable every week? Would it be thread safe to run a seperate python script to keep track of the date, or check the date on each request to the server?

You will definitely end up with inconsistent state, with out a locking mechanism between reads and writes, you will have race conditions. so you will loose some increments.
Also you are not closing the open file, if you do that enough times it will crash the application.
Also one good pice of advice is that you do not want to write a state managing software, (database) it is very very difficult to get it right.
I think in your situation the best solution is to use sqlite, as it is a lib that you call from your app, there is not additional server.
"I could make a database to save it, but I want something simpler and flexible"
In a multi threaded app you can not go simpler than sqlite, (if you want your app to be correct that is).
If you do not like SQL then there are some simpler options:
zodb http://www.zodb.org/en/latest/guide/transactions-and-threading.html
pikleDB https://github.com/patx/pickledb
python shelve but you will need to use file system locks

Related

Occasional error with shelve module (pickle.UnpicklingError: pickle data was truncated and EOFError: Ran out of input)

I'm writing a bot that listens to streams and notifies it's subscribers about posts. I'm working with module shelve which is giving me what to me seems as random errors (they usually don't occur, but sometimes when starting the bot they do and the bot is no longer being able to start until I remove my database files). This means that some posts that were already sent to subscribers are going to be resent because of loss of data.
Two errors are being raised:
EOFError: Ran out of input
and
pickle.UnpicklingError: pickle data was truncated
I was (I believe) able to diagnose the cause of EOFError which happens when bot is interrupted (e.g. KeyboardInterrupt) during IO operations. This error is less of an issue as it won't occur often in real world use but when it does I'm still forced to delete the whole database.
The "pickle data was truncated" error however stays a mystery to me as I can't seem to figure out when it happens or what exactly goes wrong. It also happens more often. The line that raises the error is: if message.id in self.db['processed_messages']:. This is the first line apart from the constructor that does anything with the database.
I'm hoping for some help in resolving these two errors. I included code that relates to the errors in case anyone sees what could be causing them from it.
Code that is relative to my problem:
import shelve
class Bot():
def __init__(self):
# get data from file or create it if it does not exist
self.db = shelve.open('database')
# init keys if they do not exist (example: first time running the program with empty database):
if 'processed_messages' not in self.db:
self.db['processed_messages'] = []
if 'processed_submissions' not in self.db:
self.db['processed_submissions'] = []
if 'subscribers' not in self.db:
self.db['subscribers'] = []
def handler(self, message):
if message.id in self.db['processed_messages']:
return
self.store_data('processed_messages', message.id)
def store_data(self, key, obj):
""" stores data locally with shelve module """
temp = self.db[key]
temp.append(obj)
self.db[key] = temp
On completely different note: if someone knows a better (more elegant) way of handling case of empty database I would also love to hear some input as current approach in constructor is rather wonky

Flask get request not using the updated version of a global variable

I'm new to both flask and python. I've got an application I'm working on to hold weather data. I'm allowing for both get and post commands to come into my flask application. unfortunately, the automated calls for my API are not always coming back with the proper results. I'm currently storing my data in a global variable when a post command is called, the new data is appended to my existing data. Unfortunately sometimes when the get is called, it is not receiving the most up to date version of my global data variable. I believe that the issue is that the change is not being passed up from the post function to the global variable before the get is called because I can run the get and the proper result comes back.
weatherData = [filed with data read from csv on initialization]
class FullHistory(Resource):
def get(self):
ret = [];
for row in weatherData:
val = row['DATE']
ret.append({"DATE":str(val)})
return ret
def post(self):
global weatherData
newWeatherData = weatherData
args = parser.parse_args()
newVal = int(args['DATE'])
newWeatherData.append({'DATE':int(args['DATE']),'TMAX':float(args['TMAX']),'TMIN':float(args['TMIN'])})
weatherData = newWeatherData
#time.sleep(5)
return {"DATE":str(newVal)},201
class SelectHistory(Resource):
def get(self, date_id):
val = int(date_id)
bVal = False
#time.sleep(5)
global weatherData
for row in weatherData:
if(row['DATE'] == val):
wd = row
bVal = True
break
if bVal:
return {"DATE":str(wd['DATE']),"TMAX":float(wd['TMAX']),"TMIN":float(wd['TMIN'])}
else:
return "HTTP Error code 404",404
def delete(self, date_id):
val = int(date_id)
wdIter = None
for row in weatherData:
if(row['DATE'] == val):
wdIter = row
break
if wdIter != None:
weatherData.remove(wdIter)
return {"DATE":str(val)},204
else:
return "HTTP Error code 404",404
Is there any way I can assure that my global variable is up to date or make my API wait to return until I'm sure that the update has been passed along? This was supposed to be a simple application. I would really rather not have to learn how to use threads in python just yet. I've made sure that my calls get request is not starting until after the post has given a response. I know that one workaround was to use sleep to delay my responses, I would rather understand why my update isn't occurring immediately in the first place.

I believe your problem is the application context. As stated here:
The application context is created and destroyed as necessary. It
never moves between threads and it will not be shared between
requests. As such it is the perfect place to store database connection
information and other things. The internal stack object is called
flask._app_ctx_stack. Extensions are free to store additional
information on the topmost level, assuming they pick a sufficiently
unique name and should put their information there, instead of on the
flask.g object which is reserved for user code.
Though it says you can store data at the "topmost level," it's not reliable, and if you extrapolate your project to use worker processes with uWSGI, for instance, you'll need persistence to share data between threads regardless. You should be using a database, redis, or at very least updating your .csv file each time you mutate your data.

Concurrent access to a data file in Python

I have a small web server doing some operations on a POST request. It reads a data file, does some checks, and then resaves the file adding some informations from the POST in it.
The issue I have is that if two clients are doing a POST request at almost the same time, both will read the same file, then one will write the file containing the new information, and then the other client will write the file containing its new information, but without the information from the other client, since that part wasn't in the file when it was read.
f = open("foo.txt", "r+")
tests_data = yaml.safe_load(f)
post_data = json.loads(web.data())
#Some checks
f.write(json.dumps(tests_data))
f.close()
I wanted the script to "wait", without giving an error, at the "open" line if the file is already opened by another process of the same code, then read the file when the other process is done and has closed the file.
Or something else if other solutions exist.

Would a standard lock not suit your needs? The lock would need to be at the module level.
from threading import Lock
# this needs to be module level variable
lock = Lock
with lock:
# do your stuff. only one thread at a time can
# work in this space...

My python program is running really slow

I'm making a program that (at least right now) retrives stream information from TwitchTV (streaming platform). This program is to self educate myself but when i run it, it's taking 2 minutes to print just the name of the streamer.
I'm using Python 2.7.3 64bit on Windows7 if that is important in anyway.
classes.py:
#imports:
import urllib
import re
#classes:
class Streamer:
#constructor:
def __init__(self, name, mode, link):
self.name = name
self.mode = mode
self.link = link
class Information:
#constructor:
def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
self.TWITCH_STREAMS = TWITCH_STREAMS
self.GAME = GAME
self.STREAMER_INFO = STREAMER_INFO
def get_game_streamer_names(self):
"Connects to Twitch.TV API, extracts and returns all streams for a spesific game."
#start connection
self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
self.info = self.con.read()
self.con.close()
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
#run in a for to reduce all "live_user_NAME" values
for name in self.streamers_names:
if name.startswith("live_user_"):
self.streamers_names.remove(name)
#end method
return self.streamers_names
def get_streamer_mode(self, name):
"Returns a streamers mode (on/off)"
#start connection
self.con = urllib2.urlopen(self.STREAMER_INFO + name)
self.info = self.con.read()
self.con.close()
#check if stream is online or offline ("stream":null indicates offline stream)
if self.info.count('"stream":null') > 0:
return "offline"
else:
return "online"
main.py:
#imports:
from classes import *
#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
GAME = "League+of+Legends"
def main():
#create an information object
info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
streamer_list = [] #create a streamer list
for name in info.get_game_streamer_names():
#run for every streamer name, create a streamer object and place it in the list
mode = info.get_streamer_mode(name)
streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
streamer_list.append(streamer_name)
#this line is just to try and print something
print streamer_list[0].name, streamer_list[0].mode
if __name__ == '__main__':
main()
the program itself works perfectly, just really slow
any ideas?

Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, 80% of the time the program is actually running in 20% of the code. In other words, there is a good shot that your code has a "bottleneck": a small area of the code that is running slow, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.
The best way to do this is to profile your code. This means you are logging the time of when a specific action occurs with the logging module, use timeit like a commenter suggested, use some of the built-in profilers, or simply print out the current time at very points of the program. Eventually, you will find one part of the code that seems to be taking the most amount of time.
Experience will tell you that I/O (stuff like reading from a disk, or accessing resources over the internet) will take longer than in-memory calculations. My guess as to the problem is that you're using 1 HTTP connection to get a list of streamers, and then one HTTP connection to get the status of that streamer. Let's say that there are 10000 streamers: your program will need to make 10001 HTTP connections before it finishes.
There would be a few ways to fix this if this is indeed the case:
See if Twitch.TV has some alternatives in their API that allows you to retrieve a list of users WITH their streaming mode so that you don't need to call an API for each streamer.
Cache results. This won't actually help your program run faster the first time it runs, but you might be able to make it so that if it runs a second time within a minute, it can reuse results.
Limit your application to only dealing with a few streamers at a time. If there are 10000 streamers, what exactly does your application do that it really needs to look at the mode of all 10000 of them? Perhaps it's better to just grab the top 20, at which point the user can press a key to get the next 20, or close the application. Often times, programming is not just about writing code, but managing expectations of what your users want. This seems to be a pet project, so there might not be "users", meaning you have free reign to change what the app does.
Use multiple connections. Right now, your app makes one connection to the server, waits for the results to come back, parses the results, saves it, then starts on the next connection. This process might take an entire half a second. If there were 250 streamers, running this process for each of them would take a little over two minutes total. However, if you could run four of them at a time, you could potentially reduce your time to just under 30 seconds total. Check out the multiprocessing module. Keep in mind that some APIs might have limits to how many connections you can make at a certain time, so hitting them with 50 connections at a time might irk them and cause them to forbid you from accessing their API. Use caution here.

You are using the wrong tool here to parse the json data returned by your URL. You need to use json library provided by default rather than parsing the data using regex.
This will give you a boost in your program's performance
Change the regex parser
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
To json parser
self.info = json.loads(self.info) #This will parse the json data as a Python Object
#Parse the name and return a generator
return (stream['name'] for stream in data[u'streams'])

Record streaming and saving internet radio in python

I am looking for a python snippet to read an internet radio stream(.asx, .pls etc) and save it to a file.
The final project is cron'ed script that will record an hour or two of internet radio and then transfer it to my phone for playback during my commute. (3g is kind of spotty along my commute)
any snippits or pointers are welcome.

The following has worked for me using the requests library to handle the http request.
import requests
stream_url = 'http://your-stream-source.com/stream'
r = requests.get(stream_url, stream=True)
with open('stream.mp3', 'wb') as f:
try:
for block in r.iter_content(1024):
f.write(block)
except KeyboardInterrupt:
pass
That will save a stream to the stream.mp3 file until you interrupt it with ctrl+C.

So after tinkering and playing with it Ive found Streamripper to work best. This is the command i use
streamripper http://yp.shoutcast.com/sbin/tunein-station.pls?id=1377200 -d ./streams -l 10800 -a tb$FNAME

If you find that your requests or urllib.request call in Python 3 fails to save a stream because you receive "ICY 200 OK" in return instead of an "HTTP/1.0 200 OK" header, you need to tell the underlying functions ICY 200 OK is OK!
What you can effectively do is intercept the routine that handles reading the status after opening the stream, just before processing the headers.
Simply put a routine like this above your stream opening code.
def NiceToICY(self):
class InterceptedHTTPResponse():
pass
import io
line = self.fp.readline().replace(b"ICY 200 OK\r\n", b"HTTP/1.0 200 OK\r\n")
InterceptedSelf = InterceptedHTTPResponse()
InterceptedSelf.fp = io.BufferedReader(io.BytesIO(line))
InterceptedSelf.debuglevel = self.debuglevel
InterceptedSelf._close_conn = self._close_conn
return ORIGINAL_HTTP_CLIENT_READ_STATUS(InterceptedSelf)
Then put these lines at the start of your main routine, before you open the URL.
ORIGINAL_HTTP_CLIENT_READ_STATUS = urllib.request.http.client.HTTPResponse._read_status
urllib.request.http.client.HTTPResponse._read_status = NiceToICY
They will override the standard routine (this one time only) and run the NiceToICY function in place of the normal status check when it has opened the stream. NiceToICY replaces the unrecognised status response, then copies across the relevant bits of the original response which are needed by the 'real' _read_status function. Finally the original is called and the values from that are passed back to the caller and everything else continues as normal.
I have found this to be the simplest way to get round the problem of the status message causing an error. Hope it's useful for you, too.

I am aware this is a year old, but this is still a viable question, which I have recently been fiddling with.
Most internet radio stations will give you an option of type of download, I choose the MP3 version, then read the info from a raw socket and write it to a file. The trick is figuring out how fast your download is compared to playing the song so you can create a balance on the read/write size. This would be in your buffer def.
Now that you have the file, it is fine to simply leave it on your drive (record), but most players will delete from file the already played chunk and clear the file out off the drive and ram when streaming is stopped.
I have used some code snippets from a file archive without compression app to handle a lot of the file file handling, playing, buffering magic. It's very similar in how the process flows. If you write up some sudo-code (which I highly recommend) you can see the similarities.

I'm only familiar with how shoutcast streaming works (which would be the .pls file you mention):
You download the pls file, which is just a playlist. It's format is fairly simple as it's just a text file that points to where the real stream is.
You can connect to that stream as it's just HTTP, that streams either MP3 or AAC. For your use, just save every byte you get to a file and you'll get an MP3 or AAC file you can transfer to your mp3 player.
Shoutcast has one addition that is optional: metadata. You can find how that works here, but is not really needed.
If you want a sample application that does this, let me know and I'll make up something later.

In line with the answer from https://stackoverflow.com/users/1543257/dingles (https://stackoverflow.com/a/41338150), here's how you can achieve the same result with the asynchronous HTTP client library - aiohttp:
import functools
import aiohttp
from aiohttp.client_proto import ResponseHandler
from aiohttp.http_parser import HttpResponseParserPy
class ICYHttpResponseParser(HttpResponseParserPy):
def parse_message(self, lines):
if lines[0].startswith(b"ICY "):
lines[0] = b"HTTP/1.0 " + lines[0][4:]
return super().parse_message(lines)
class ICYResponseHandler(ResponseHandler):
def set_response_params(
self,
*,
timer = None,
skip_payload = False,
read_until_eof = False,
auto_decompress = True,
read_timeout = None,
read_bufsize = 2 ** 16,
timeout_ceil_threshold = 5,
) -> None:
# this is a copy of the implementation from here:
# https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client_proto.py#L137-L165
self._skip_payload = skip_payload
self._read_timeout = read_timeout
self._reschedule_timeout()
self._timeout_ceil_threshold = timeout_ceil_threshold
self._parser = ICYHttpResponseParser(
self,
self._loop,
read_bufsize,
timer=timer,
payload_exception=aiohttp.ClientPayloadError,
response_with_body=not skip_payload,
read_until_eof=read_until_eof,
auto_decompress=auto_decompress,
)
if self._tail:
data, self._tail = self._tail, b""
self.data_received(data)
class ICYConnector(aiohttp.TCPConnector):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._factory = functools.partial(ICYResponseHandler, loop=self._loop)
This can then be used as follows:
session = aiohttp.ClientSession(connector=ICYConnector())
async with session.get("url") as resp:
print(resp.status)
Yes, it's using a few private classes and attributes but this is the only solution to change the handling of something that's part of HTTP spec and (theoretically) should not ever need to be changed by the library's user...
All things considered, I would say this is still rather clean in comparison to monkey patching which would cause the behavior to be changed for all requests (especially true for asyncio where setting before and resetting after a request does not guarantee that something else won't make a request while request to ICY is being made). This way, you can dedicate a ClientSession object specifically for requests to servers that respond with the ICY status line.
Note that this comes with a performance penalty for requests made with ICYConnector - in order for this to work, I am using the pure Python implementation of HttpResponseParser which is going to be slower than the one that aiohttp uses by default and is written in C. This cannot really be done differently without vendoring the whole library as the behavior for parsing status line is deeply hidden in the C code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.