Python gspread On-Change Listener? - python

I'm making a script that checks a Google Sheet populated by a Google Form and returns the result as a live-feed visualization of a poll. I need to figure out how to update the value counts, but only when the Google Sheet is updated, as opposed to polling every 60 seconds (or something).
Here is my current setup:
string = ""
while True:
responses = gc.open("QOTD Responses").sheet1
data = pd.DataFrame(responses.get_all_records())
vals = data['Response'].value_counts()
str = "{} currently has {} votes. \n{} currently has {} votes.".format(vals.index[0], vals[0], vals.index[1],
vals[1])
if(str != string):
string = str
print(string)
time.sleep(60) # Updates 1440 times per day
I'm almost certain that there has to be a better way to do this, but what would that be?
Thanks!

You won't be able to do it with Python alone. You'll need to integrate with a trigger function from Google Apps Script.
You could use the onEdit trigger function to send a signal to your Python script (via an HTTP call, for example).
To use a simple trigger, simply create a function that uses one of these reserved function names:
onOpen(e) runs when a user opens a spreadsheet, document, presentation, or form that the user has permission to edit.
onEdit(e) runs when a user changes a value in a spreadsheet.
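A minimal sketch of the Python side, assuming the onEdit trigger calls UrlFetchApp.fetch() against a small Flask endpoint (the route name, the service-account auth, and the re-read logic here are assumptions, not part of the original setup):
from flask import Flask
import gspread
import pandas as pd

app = Flask(__name__)
gc = gspread.service_account()  # or however the question's gc client was created

@app.route("/sheet-updated", methods=["POST"])
def sheet_updated():
    # re-read the sheet only when Apps Script reports a change
    responses = gc.open("QOTD Responses").sheet1
    data = pd.DataFrame(responses.get_all_records())
    vals = data['Response'].value_counts()
    print("{} currently has {} votes.\n{} currently has {} votes.".format(
        vals.index[0], vals.iloc[0], vals.index[1], vals.iloc[1]))
    return "", 204
Note that a simple onEdit trigger can't call external services such as UrlFetchApp; you'd need an installable onEdit trigger (set up via the Triggers menu or ScriptApp.newTrigger) for the HTTP call to be allowed.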

Related

My Viber bot is working very slowly (Python). How can I make it faster?

Here is part of my code; db.users_vi() is meant to give me a list of users. When the program reaches viber_not, it works very slowly: it sends 1 message per 30 seconds or even slower. How can I make it work faster, and why is it so slow?
def viber_not():
    users = db.users_vi()
    text = random.choice(texts)
    for k in users:
        try:
            viber.send_messages(k[1], [TextMessage(text=text)])
        except:
            pass
Try materializing the data requested from the DB. Most database APIs return a cursor, not the data itself, so each iteration step may trigger a new fetch. For some of them, wrapping the result in a list is enough, but check the documentation of the library you use.
users = list(db.users_vi())
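As an illustration of the cursor-vs-list distinction, here is a sketch with sqlite3 (the question's db module isn't shown, so the connection and table layout are assumptions):
import sqlite3

conn = sqlite3.connect("bot.db")
cur = conn.execute("SELECT id, viber_id FROM users")  # returns a cursor, not rows
users = list(cur)  # materialize every row up front, in one pass
for k in users:    # iterating the list is now pure in-memory work
    print(k[1])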

Why does Firebase event return empty object on second and subsequent events?

I have a Python Firebase SDK on the server, which writes to Firebase real-time DB.
I have a Javascript Firebase client on the browser, which registers itself as a listener for "child_added" events.
Authentication is handled by the Python server.
With Firebase rules allowing reads, the client listener gets data on the first event (all data at that FB location), but only a key with empty data on subsequent child_added events.
Here's the listener registration:
firebaseRef.on(
    "child_added",
    function(snapshot, prevChildKey)
    {
        console.log("FIREBASE REF: ", firebaseRef);
        console.log("FIREBASE KEY: ", snapshot.key);
        console.log("FIREBASE VALUE: ", snapshot.val());
    }
);
"REF" is always good.
"KEY" is always good.
But "VALUE" is empty after the first full retrieval of that db location.
I tried instantiating the firebase reference each time anew inside the listen function. Same result.
I tried a "value" event instead of "child_added". No improvement.
The data on the Firebase side looks perfect in the FB console.
Here's how the data is being written to Firebase by the Python admin:
def push_value(rootAddr, childAddr, data):
    try:
        ref = db.reference(rootAddr)
        posts_ref = ref.child(childAddr)
        new_post_ref = posts_ref.push()
        new_post_ref.set(data)
    except Exception:
        raise
And as I said, this works perfectly to put the data at the correct place in FB.
Why the empty event objects after the first download of the database, on subsequent events?
I found the answer. Like most things, it turned out to be simple, but took a couple of days to find. Maybe this will save someone else.
On the docs page:
http://firebase.google.com/docs/database/admin/save-data#section-push
"In JavaScript and Python, the pattern of calling push() and then
immediately calling set() is so common that the Firebase SDK lets you
combine them by passing the data to be set directly to push() as
follows..."
I suggest the wording should emphasize that you must do it that way.
The earlier Python example on the same page doesn't work:
new_post_ref = posts_ref.push()
new_post_ref.set({
    'author': 'gracehop',
    'title': 'Announcing COBOL, a New Programming Language'
})
A separate, empty push() followed by set(data), as in this example, won't work in Python and JavaScript, because in those SDKs push() implicitly also does a set(); the empty push() therefore triggers unwanted event listeners with empty data, and the subsequent set(data) doesn't trigger an event with the data either.
In other words, the code in the question:
new_post_ref = posts_ref.push()
new_post_ref.set(data)
must be:
new_post_ref = posts_ref.push(data)
with set() not explicitly called.
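Applied to the question's push_value() helper, the corrected version would look like this sketch (returning the new key is an addition for illustration, not part of the original code):
def push_value(rootAddr, childAddr, data):
    ref = db.reference(rootAddr)
    posts_ref = ref.child(childAddr)
    # push(data) creates the child and writes the payload in one step,
    # so listeners fire exactly once, with the full data
    new_post_ref = posts_ref.push(data)
    return new_post_ref.key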
Since this push() code happens only when new objects are written to FB, the initial download to the client wasn't affected.
Though the documentation may be trying to convey the evolution of the design, it fails to point out that only the last Python and JavaScript example given will work, and that the others shouldn't be used.

Flask get request not using the updated version of a global variable

I'm new to both Flask and Python. I've got an application I'm working on to hold weather data, and I'm allowing both GET and POST commands to come into my Flask application. Unfortunately, the automated calls to my API are not always coming back with the proper results. I'm currently storing my data in a global variable; when a POST command is called, the new data is appended to my existing data. Sometimes, though, when the GET is called, it does not receive the most up-to-date version of my global data variable. I believe the change is not being passed up from the post function to the global variable before the GET is called, because when I run the GET manually the proper result comes back.
# Resource and parser presumably come from flask_restful / reqparse, set up elsewhere
weatherData = [...]  # filled with data read from CSV on initialization

class FullHistory(Resource):
    def get(self):
        ret = []
        for row in weatherData:
            val = row['DATE']
            ret.append({"DATE": str(val)})
        return ret

    def post(self):
        global weatherData
        newWeatherData = weatherData
        args = parser.parse_args()
        newVal = int(args['DATE'])
        newWeatherData.append({'DATE': int(args['DATE']), 'TMAX': float(args['TMAX']), 'TMIN': float(args['TMIN'])})
        weatherData = newWeatherData
        #time.sleep(5)
        return {"DATE": str(newVal)}, 201

class SelectHistory(Resource):
    def get(self, date_id):
        val = int(date_id)
        bVal = False
        #time.sleep(5)
        global weatherData
        for row in weatherData:
            if row['DATE'] == val:
                wd = row
                bVal = True
                break
        if bVal:
            return {"DATE": str(wd['DATE']), "TMAX": float(wd['TMAX']), "TMIN": float(wd['TMIN'])}
        else:
            return "HTTP Error code 404", 404

    def delete(self, date_id):
        val = int(date_id)
        wdIter = None
        for row in weatherData:
            if row['DATE'] == val:
                wdIter = row
                break
        if wdIter is not None:
            weatherData.remove(wdIter)
            return {"DATE": str(val)}, 204
        else:
            return "HTTP Error code 404", 404
Is there any way I can ensure that my global variable is up to date, or make my API wait to return until I'm sure the update has been passed along? This was supposed to be a simple application, and I would really rather not have to learn how to use threads in Python just yet. I've made sure that the caller's GET request does not start until after the POST has given a response. I know one workaround is to use sleep to delay my responses, but I would rather understand why my update isn't occurring immediately in the first place.
I believe your problem is the application context. As stated here:
The application context is created and destroyed as necessary. It never moves between threads and it will not be shared between requests. As such it is the perfect place to store database connection information and other things. The internal stack object is called flask._app_ctx_stack. Extensions are free to store additional information on the topmost level, assuming they pick a sufficiently unique name and should put their information there, instead of on the flask.g object which is reserved for user code.
Though it says you can store data at the "topmost level," it's not reliable, and if you scale your project out to worker processes with uWSGI, for instance, you'll need persistence to share data between threads and processes regardless. You should be using a database, Redis, or at the very least updating your .csv file each time you mutate your data.
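A minimal sketch of the shared-store idea using Redis (the redis-py client, the key name, and the JSON encoding are assumptions; any real database works the same way):
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def save_weather(rows):
    # one shared copy, visible to every worker process
    r.set("weatherData", json.dumps(rows))

def load_weather():
    raw = r.get("weatherData")
    return json.loads(raw) if raw else []
Each handler then calls load_weather() instead of touching a module-level global, so every worker sees the latest POSTed data.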

My python program is running really slow

I'm making a program that (at least right now) retrieves stream information from Twitch.TV (a streaming platform). This program is to educate myself, but when I run it, it takes 2 minutes to print just the name of the streamer.
I'm using Python 2.7.3 64-bit on Windows 7, if that is important in any way.
classes.py:
#imports:
import urllib2  # the code below uses urllib2, not urllib
import re

#classes:
class Streamer:
    #constructor:
    def __init__(self, name, mode, link):
        self.name = name
        self.mode = mode
        self.link = link

class Information:
    #constructor:
    def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
        self.TWITCH_STREAMS = TWITCH_STREAMS
        self.GAME = GAME
        self.STREAMER_INFO = STREAMER_INFO

    def get_game_streamer_names(self):
        "Connects to the Twitch.TV API, extracts and returns all streams for a specific game."
        #start connection
        self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
        self.info = self.con.read()
        self.con.close()
        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same "name": parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
        #run a for loop to remove all "live_user_NAME" values
        for name in self.streamers_names:
            if name.startswith("live_user_"):
                self.streamers_names.remove(name)
        #end method
        return self.streamers_names

    def get_streamer_mode(self, name):
        "Returns a streamer's mode (on/off)"
        #start connection
        self.con = urllib2.urlopen(self.STREAMER_INFO + name)
        self.info = self.con.read()
        self.con.close()
        #check if the stream is online or offline ("stream":null indicates an offline stream)
        if self.info.count('"stream":null') > 0:
            return "offline"
        else:
            return "online"
main.py:
#imports:
from classes import *

#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO = "https://api.twitch.tv/kraken/streams/" #add the streamer name at the end of the link
GAME = "League+of+Legends"

def main():
    #create an information object
    info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
    streamer_list = [] #create a streamer list
    for name in info.get_game_streamer_names():
        #run for every streamer name, create a streamer object and place it in the list
        mode = info.get_streamer_mode(name)
        streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
        streamer_list.append(streamer_name)
    #this line is just to try and print something
    print streamer_list[0].name, streamer_list[0].mode

if __name__ == '__main__':
    main()
The program itself works correctly, it's just really slow.
Any ideas?
Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, 80% of the time the program is actually running in 20% of the code. In other words, there is a good shot that your code has a "bottleneck": a small area of the code that is running slow, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.
The best way to do this is to profile your code. This could mean logging the time at which a specific action occurs with the logging module, using timeit as a commenter suggested, using one of the built-in profilers, or simply printing out the current time at various points of the program. Eventually, you will find the part of the code that seems to be taking the most time.
Experience will tell you that I/O (reading from a disk, or accessing resources over the internet) takes far longer than in-memory computation. My guess as to the problem is that you're using one HTTP connection to get the list of streamers, and then one more HTTP connection per streamer to get that streamer's status. Let's say there are 10000 streamers: your program would need to make 10001 HTTP connections before it finishes.
There would be a few ways to fix this if this is indeed the case:
See if Twitch.TV has some alternative in their API that allows you to retrieve a list of users WITH their streaming mode, so that you don't need to call the API once per streamer.
Cache results. This won't actually help your program run faster the first time it runs, but you might be able to make it so that if it runs a second time within a minute, it can reuse results.
Limit your application to only dealing with a few streamers at a time. If there are 10000 streamers, what exactly does your application do that it really needs to look at the mode of all 10000 of them? Perhaps it's better to just grab the top 20, at which point the user can press a key to get the next 20, or close the application. Oftentimes, programming is not just about writing code, but about managing the expectations of what your users want. This seems to be a pet project, so there might not be "users", meaning you have free rein to change what the app does.
Use multiple connections. Right now, your app makes one connection to the server, waits for the results to come back, parses the results, saves them, then starts on the next connection. This process might take half a second per streamer. If there were 250 streamers, running it for each of them would take a little over two minutes total. However, if you could run four of them at a time, you could potentially reduce the total to around 30 seconds (a sketch follows below). Check out the multiprocessing module. Keep in mind that some APIs might limit how many connections you can make in a certain window of time, so hitting them with 50 connections at once might irk them and cause them to forbid you from accessing their API. Use caution here.
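A sketch of that last idea, using a thread pool (multiprocessing.dummy ships with Python 2.7 and suits I/O-bound work; the Streamer class and the info object come from the question, and the pool size of 8 is an assumption):
from multiprocessing.dummy import Pool  # thread-based pool with the multiprocessing API

def fetch_mode(name):
    # each call performs one HTTP request, so several can wait in parallel
    return Streamer(name, info.get_streamer_mode(name), 'http://twitch.tv/' + name)

pool = Pool(8)  # 8 requests in flight at once; mind the API's rate limits
streamer_list = pool.map(fetch_mode, info.get_game_streamer_names())
pool.close()
pool.join()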
You are using the wrong tool here to parse the JSON data returned by your URL. You should use the json library provided by default, rather than parsing the data with regular expressions.
This will give you a boost in your program's performance.
Change the regex parser:
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
to a JSON parser:
import json  # add to the imports at the top of classes.py

self.info = json.loads(self.info) #parse the JSON data into Python objects
#pull out the name of each stream and return a generator
return (stream['name'] for stream in self.info[u'streams'])

Populate a Google App Engine app's datastore with 20,000 strings

I'm trying to create and store 20,000 random codes in my local datastore, before trying this on appspot... This is the model:
class PromotionCode(db.Model):
    code = db.StringProperty(required=True)
And this is the class that handles the populate request (only a logged admin may use it). It creates random alphanumeric codes and tries to store 20000 of them in the datastore:
class Populate(webapp.RequestHandler):
    def GenerateCode(self):
        chars = string.letters + string.digits
        code = ""
        for i in range(8):
            code = code + choice(chars)
        return code.upper()

    def get(self):
        codes = ""
        code_list = []
        for i in range(20000):
            new_code = self.GenerateCode()
            promotion_code = PromotionCode(code=new_code)
            code_list.append(promotion_code)
            codes = codes + "<br>" + new_code
        db.put(code_list)
        self.response.out.write("populating datastore...<br>")
        self.response.out.write(codes)
I thought I could try batching all those put() calls, so I created a list of codes (code_list). It takes 2-5 minutes to do this locally.
Is it possible to do it faster without using the bulkloader option? Because I'm getting the 500 server error, obviously. Or maybe by doing it in consecutive calls or steps...
Why not just change your code above to insert 100 at a time, and just run something like:
for i in {1..200}
do
    curl --cookie "ACSID=your-acsid-cookie" http://your-app-id.appspot.com/populatepath
    sleep 5
done
from your command line? The entries are random anyway, you don't need to remember any state.
You can get the ACSID cookie by logging in manually and inspecting the cookies from your browser.
The sleep between requests will prevent you from spinning up a gigantic number of instances or hitting short-term quotas.
The task queue suggestion is good if this is something you need to automate, but if it's a one-time thing you might as well keep it simple.
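The handler change that loop assumes might look like this sketch (reading the batch size from a query parameter is an assumption, not part of the original code):
def get(self):
    count = int(self.request.get("count", 100))  # codes per request
    code_list = [PromotionCode(code=self.GenerateCode())
                 for _ in xrange(count)]
    db.put(code_list)  # one batched datastore write
    self.response.out.write("inserted %d codes" % count)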
You could batch the process using task queues. With a high batch size per task, you can achieve this much faster.
I don't understand why you have to create 20,000 in advance as opposed to creating each as needed on the fly, but I bet you could speed up your code quite a bit. Something like this (untested):
class Populate(webapp.RequestHandler):
    chars = "AB...Z01...9"

    def GenerateCode(self):
        # note: must be self.chars; a bare chars would be a NameError here
        return ''.join(choice(self.chars) for _ in xrange(8))

    def get(self):
        code_list = []
        for i in range(20000):
            new_code = self.GenerateCode()
            promotion_code = PromotionCode(code=new_code)
            code_list.append(promotion_code)
        db.put(code_list)
        self.response.out.write("populating datastore...<br>")
        self.response.out.write("done")
Not printing out the codes may save time.
I'm sure others here can do better...
If your task won't complete within the 30-second request deadline, you can break it up into chunks - which should be easy, since they're all doing the same thing - and run them as tasks on the Task Queue. You should probably do all your work there anyway, so you don't force the user to wait for it to complete before returning a response.
Like Jeff, though, I'm puzzled why you'd want to generate 20,000 of these upfront rather than just generating them when you need them.
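For completeness, a sketch of the chunked Task Queue version using the SDK's deferred library (the chunk size and the module-level generate_code() helper are assumptions):
import string
from random import choice

from google.appengine.ext import db, deferred

def generate_code():
    # assumed module-level variant of the handler's GenerateCode()
    return ''.join(choice(string.letters + string.digits) for _ in xrange(8)).upper()

def populate_chunk(count):
    # runs inside a Task Queue task, outside the user-facing request deadline
    db.put([PromotionCode(code=generate_code()) for _ in xrange(count)])

def populate_all(total=20000, chunk=500):
    for _ in xrange(total // chunk):
        deferred.defer(populate_chunk, chunk)
deferred.defer serializes each call and enqueues it; the deferred builtin must be enabled in app.yaml for the worker URL to exist.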
