My Python program is running really slow

I'm making a program that (at least right now) retrieves stream information from TwitchTV (a streaming platform). The program is a self-education project, but when I run it, it takes about two minutes just to print the name of a single streamer.
I'm using Python 2.7.3 64-bit on Windows 7, if that matters in any way.
classes.py:
#imports:
import urllib2
import re

#classes:
class Streamer:
    #constructor:
    def __init__(self, name, mode, link):
        self.name = name
        self.mode = mode
        self.link = link

class Information:
    #constructor:
    def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
        self.TWITCH_STREAMS = TWITCH_STREAMS
        self.GAME = GAME
        self.STREAMER_INFO = STREAMER_INFO

    def get_game_streamer_names(self):
        "Connects to the Twitch.TV API, extracts and returns all streams for a specific game."
        #start connection
        self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
        self.info = self.con.read()
        self.con.close()
        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same "name" parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
        #iterate over a copy so that removing "live_user_NAME" values does not skip elements
        for name in self.streamers_names[:]:
            if name.startswith("live_user_"):
                self.streamers_names.remove(name)
        #end method
        return self.streamers_names

    def get_streamer_mode(self, name):
        "Returns a streamer's mode (on/off)"
        #start connection
        self.con = urllib2.urlopen(self.STREAMER_INFO + name)
        self.info = self.con.read()
        self.con.close()
        #check if the stream is online or offline ("stream":null indicates an offline stream)
        if self.info.count('"stream":null') > 0:
            return "offline"
        else:
            return "online"
main.py:
#imports:
from classes import *

#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO = "https://api.twitch.tv/kraken/streams/" #add the streamer name at the end of the link
GAME = "League+of+Legends"

def main():
    #create an information object
    info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
    streamer_list = [] #create a streamer list
    for name in info.get_game_streamer_names():
        #run for every streamer name, create a streamer object and place it in the list
        mode = info.get_streamer_mode(name)
        streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
        streamer_list.append(streamer_name)
    #this line is just to try and print something
    print streamer_list[0].name, streamer_list[0].mode

if __name__ == '__main__':
    main()
The program itself works perfectly; it's just really slow.
Any ideas?

Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, 80% of the time the program is actually running in 20% of the code. In other words, there is a good shot that your code has a "bottleneck": a small area of the code that is running slow, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.
The best way to do this is to profile your code. This means logging the time when a specific action occurs with the logging module, using timeit as a commenter suggested, using one of the built-in profilers, or simply printing out the current time at various points in the program. Eventually, you will find one part of the code that seems to take the most time.
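For example, Python ships with cProfile, which reports how much time is spent in each function. A minimal sketch, assuming the main.py from the question is importable:

import cProfile
import main  # the main.py from the question

# sort the report by cumulative time so the bottleneck floats to the top
cProfile.run('main.main()', sort='cumulative')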
Experience will tell you that I/O (stuff like reading from a disk, or accessing resources over the internet) takes longer than in-memory calculations. My guess as to the problem is that you're using one HTTP request to get the list of streamers, and then one more HTTP request per streamer to get that streamer's status. Say there are 10000 streamers: your program will need to make 10001 HTTP requests before it finishes.
There would be a few ways to fix this if this is indeed the case:
See if Twitch.TV has some alternatives in their API that allows you to retrieve a list of users WITH their streaming mode so that you don't need to call an API for each streamer.
Cache results. This won't actually help your program run faster the first time it runs, but you might be able to make it so that if it runs a second time within a minute, it can reuse results.
Limit your application to only dealing with a few streamers at a time. If there are 10000 streamers, what exactly does your application do that it really needs to look at the mode of all 10000 of them? Perhaps it's better to just grab the top 20, at which point the user can press a key to get the next 20, or close the application. Oftentimes, programming is not just about writing code, but managing the expectations of what your users want. This seems to be a pet project, so there might not be "users", meaning you have free rein to change what the app does.
Use multiple connections. Right now, your app makes one connection to the server, waits for the results to come back, parses the results, saves them, then starts on the next connection. This whole process might take half a second. If there were 250 streamers, running this process for each of them would take a little over two minutes in total. However, if you could run four of them at a time, you could potentially reduce the total time to just under 30 seconds. Check out the multiprocessing module; a sketch follows below. Keep in mind that some APIs might limit how many connections you can make in a certain time, so hitting them with 50 connections at once might irk them and cause them to forbid you from accessing their API. Use caution here.
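As a rough sketch of the multiple-connections idea (not the author's code): multiprocessing.dummy provides a thread pool with the same interface as multiprocessing.Pool, which suits I/O-bound work like this. Here info is assumed to be the Information object from main():

from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API

def fetch_mode(name):
    # each worker performs one HTTP request
    return name, info.get_streamer_mode(name)

pool = Pool(4)  # four concurrent connections; be gentle with the API
modes = pool.map(fetch_mode, info.get_game_streamer_names())
pool.close()
pool.join()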

You are using the wrong tool here to parse the JSON data returned by your URL. You should use the json library from the standard library rather than parsing the data with regular expressions.
This will give you a boost in your program's performance.
Change the regex parser
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
to the json parser:
self.info = json.loads(self.info) #parses the JSON data into a Python object (requires "import json" at the top)
#parse the names and return a generator
return (stream['name'] for stream in self.info[u'streams'])
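Put together, the whole method could look like the sketch below (keeping the question's Python 2 setup, the answer's stream['name'] access, and the original live_user_ filter; json and urllib2 must be imported at the top of classes.py):

def get_game_streamer_names(self):
    "Connects to the Twitch.TV API and returns all stream names for a specific game."
    con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
    try:
        info = json.load(con)  # parse the response body as JSON
    finally:
        con.close()
    return [stream['name'] for stream in info[u'streams']
            if not stream['name'].startswith("live_user_")]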

Related

Receiving Audio Data (and Metadata) from iPhone over Bluetooth in Python

I'm trying to write a Python script to retrieve audio data from my iPhone on my Raspberry Pi over Bluetooth. Currently, I'm able to get audio to come out of my Pi's speakers just by navigating to Settings > Bluetooth on my phone and selecting the Pi. (I paired it earlier.) I've specified the Pi's device type as Car Stereo, because I'm interested in later using an AVRCP-type connection to retrieve metadata for the songs I'm playing.
I've been using PyBluez to retrieve a list of available bluetooth services with my phone. The code returns a list of dictionaries containing the service classes, profiles, name, description, provider, service id, protocol, port and host for each service, in the following format.
{'service-classes': ['110A'], 'profiles': [('110D', 259)], 'name': 'Audio Source', 'description': None, 'provider': None, 'service-id': None, 'protocol': 'RFCOMM', 'port': 13, 'host': 'FF:FF:FF:FF:FF:FF'}
Unfortunately, that's as far as my code gets. I've set it up to continuously request data, but after printing the available services the program ceases to log anything. I've tried the code with most of the available services, including 'Audio Source', 'Wireless iAP', 'Wireless iAp v2', 'Phonebook' and two instances of 'AVRCP Device'.
Below is my code. It's important to note that it only works if you have your phone open to Settings > Bluetooth, which is evidently the iPhone equivalent of entering pairing mode. Thanks in advance!
import bluetooth as bt
from bluetooth import BluetoothSocket

if __name__ == "__main__":
    services = bt.find_service()
    print(sep='\n', *services)
    for service in services:
        if service['name'] == 'Audio Source':
            socket = BluetoothSocket()
            socket.bind((service['host'], service['port']))
            print('\nListening...')
            while True:
                print(socket.recv(1024))
I've spent a lot of time on this project, and have found that while guidance for this kind of task is available out there, it can be hard to cross the barrier between useless fodder and helpful information. Below I'll detail the way I solved my most important problems, as well as deliver some quick pointers as to useful functionalities.
After receiving a helpful comment, I moved away from PyBluez; it turns out it's not useful for streaming audio data. Instead, I realised that because the Raspberry Pi had already established a connection with my iPhone that allowed me to stream music, I should just find a way to tap into that audio stream. After looking into various means of doing so, I came up with the Python library PyAudio, which provides bindings for the tool PortAudio. Below is some example code that worked to read audio data from the stream. I found that using the default output device worked well; it didn't contain any audio data from other sources on the Pi that I could hear, although I believe it may have included other sounds such as notifications from the iPhone.
from pyaudio import PyAudio, paInt16

class AudioSource(object):
    def __init__(self):
        self.pa = PyAudio()
        self.device = self.pa.get_default_output_device_info()
        self.sample_format = paInt16
        self.channels = 2
        self.frames_per_buffer = 1024
        self.rate = int(self.device['defaultSampleRate'])
        self.stream = self.pa.open(
            format=self.sample_format,
            channels=self.channels,
            rate=self.rate,
            frames_per_buffer=self.frames_per_buffer,
            input=True)

    def read(self):
        return self.stream.read(self.frames_per_buffer)

    def kill(self):
        self.stream.stop_stream()
        self.stream.close()
        self.pa.terminate()
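A minimal usage sketch (the filename and buffer count are arbitrary): read a number of buffers from the stream and dump the raw frames to disk.

source = AudioSource()
with open('capture.raw', 'wb') as f:
    for _ in range(100):  # grab ~100 buffers of raw audio
        f.write(source.read())
source.kill()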
After leaping that hurdle, I moved on to attempting to retrieve metadata from the music. For this I discovered dbus, a system used by applications to communicate with each other. In this case, we'll be using it to engage a dialogue between our program and the music player on the iPhone via the library pydbus, which provides a way to access dbus in Python. Finally, we will employ the PyGObject library, which provides a way of polling for emitted Bluetooth signals by way of GLib.MainLoop().
Firstly, let's retrieve the object that will provide us with an interface to the music player. Below, you'll see that I've created a class that iterates through all the available objects belonging to the service bluez, which is responsible for Bluetooth connections. Once it finds one ending with '/player0', it returns it. I do this because I don't want to include the Bluetooth address of the iPhone as an input. If you would rather hardcode the address, this can be achieved with the path '/org/bluez/hci0/dev_XX_XX_XX_XX_XX_XX/player0', modified to include your Bluetooth address. (The 0 in 'player0' increases in count with multiple connections; I've yet to have more than one.)
from pydbus import SystemBus

class MediaPlayer(object):
    def __new__(cls):
        bus = SystemBus()
        manager = bus.get('org.bluez', '/')
        for obj in manager.GetManagedObjects():
            if obj.endswith('/player0'):
                return bus.get('org.bluez', obj)
        raise MediaPlayer.DeviceNotFoundError

    class DeviceNotFoundError(Exception):
        def __init__(self):
            super().__init__('No bluetooth device was found')

handle = MediaPlayer()
Once you've retrieved the object, you can use it to retrieve various attributes, as well as send various commands. handle.Position, for example, will return the current position of the media player in milliseconds, and handle.Pause() will pause the current track. The full list of commands and attributes can be found in the documentation, under the section MediaPlayer1.
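For example, handle.Track returns a dictionary of metadata for the current track. A small sketch (the exact keys come from the bluez MediaPlayer1 documentation; 'Title' and 'Artist' are typical):

track = handle.Track
# typical keys include 'Title', 'Artist', 'Album' and 'Duration'
print(track.get('Title'), track.get('Artist'))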
In order for this to work correctly, it's imperative that you employ GLib.MainLoop(), which will poll for Bluetooth signals.
from gi.repository import GLib
loop = GLib.MainLoop()
loop.run()
If you're like me and you need to poll for signals while at the same time running some other sort of main loop, GLib.MainLoop().run() won't work outright, as it's a blocking function. I've developed a solution below.
from threading import Thread
from gi.repository import GLib

class Receiver(Thread):
    def __init__(self):
        super().__init__()
        self.loop = GLib.MainLoop()
        self.context = self.loop.get_context()
        self._keep_going = True

    def kill(self):
        self._keep_going = False

    def run(self):
        while self._keep_going:
            self.context.iteration()
        self.context.release()
        self.loop.quit()
Something extremely useful for me was the ability to register a callback with the MediaPlayer object. The callback will be called any time an attribute of the MediaPlayer object changes. I found the two most useful properties to be handle.Status, which delivers the current status of the media player, and handle.Track, which can alert you to when the current track finishes, as well as provide metadata.
def callback(self, interface, changed_properties, invalidated_properties):
    for change in changed_properties:
        pass  # react to changes such as 'Status' or 'Track' here

subscription = handle.PropertiesChanged.connect(callback)
# To avoid continuing interactions after the program ends:
#subscription.disconnect()
Finally, you're probably going to want the ability to set the value of certain properties of the MediaPlayer object. For this you require the Variant object. ('s' evidently stands for string; I haven't yet had to try this with any other type).
from gi.repository.GLib import Variant

def set_property(prop, val):
    handle.Set('org.bluez.MediaPlayer1', prop, Variant('s', val))

set_property('Shuffle', 'off')
That's all the advice I have to give. I hope that somebody eventually finds some help here, although I know it's more likely I'll just end up having rambled endlessly to myself. Regardless, if somebody's actually taken the time to read through all this, then good luck with whatever it is you're working on.

Data-logging from an i2c-connected Atlas Scientific Sensor to a CSV file

I am relatively new to Python, and programming as a whole. I am progressively getting the hang of it, however I have been stumped as of late in regards to one of my latest projects. I have a set of Atlas Scientific EZO circuits w/ their corresponding sensors hooked up to my Raspberry Pi 3. I can run the i2c script fine, and the majority of the code makes sense to me. However, I would like to pull data from the sensors and log it with a time stamp in a CSV file, taking data points in timed intervals. I am not quite sure how to pull the data from the sensor, and put it into a CSV. Making CSVs in Python is fairly simple, as is filling them with data, but I cannot seem to understand how I would make the data that goes into the CSV the same as what is displayed in the terminal when one runs the Poll function. Attached is the i2c sample code from Atlas' website. I have annotated it a bit more so as to help me understand it better.
I have already attempted to make sense of the poll function, but am confused by the self.file_write and self.file_read methods used throughout the code. I do believe they would be of use in this instance, but I am generally stumped in terms of implementation. Below is a link to the Python script (i2c.py) written by Atlas Scientific:
https://github.com/AtlasScientific/Raspberry-Pi-sample-code/blob/master/i2c.py
I'm guessing by "the polling function" you are referring to this section of the code:
# continuous polling command automatically polls the board
elif user_cmd.upper().startswith("POLL"):
    delaytime = float(string.split(user_cmd, ',')[1])

    # check for polling time being too short, change it to the minimum timeout if too short
    if delaytime < AtlasI2C.long_timeout:
        print("Polling time is shorter than timeout, setting polling time to %0.2f" % AtlasI2C.long_timeout)
        delaytime = AtlasI2C.long_timeout

    # get the information of the board you're polling
    info = string.split(device.query("I"), ",")[1]
    print("Polling %s sensor every %0.2f seconds, press ctrl-c to stop polling" % (info, delaytime))

    try:
        while True:
            print(device.query("R"))
            time.sleep(delaytime - AtlasI2C.long_timeout)
    except KeyboardInterrupt:  # catches the ctrl-c command, which breaks the loop above
        print("Continuous polling stopped")
If this is the case, then it looks like you can recycle most of this code for your use. You can grab the string you are seeing in your console with device.query("R"); instead of printing it, grab the return value and write it to your CSV.
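A minimal sketch of that idea, assuming the device object and delaytime from the sample code (the filename and column layout are just examples):

import csv
import time
from datetime import datetime

with open('sensor_log.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(['timestamp', 'reading'])
    while True:
        reading = device.query("R")  # the same call the POLL branch prints
        writer.writerow([datetime.now().isoformat(), reading])
        f.flush()  # make sure each row reaches the disk
        time.sleep(delaytime)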
I think you should add a method to the AtlasI2C class that writes data to a file.
Just add this method below AtlasI2C's __init__():
def update_file(self, new_data):
    with open(self.csv_file, 'a') as data_file:
        try:
            data = "{}\n".format(str(new_data))
            data_file.write(data)
        except Exception as e:
            print(e)
Then add the CSV file name to AtlasI2C's __init__():
self.csv_file = "my_filename.csv"  # replace my_filename with your own file name
and then under line 51 (char_list = list(map(lambda x: chr(ord(x) & ~0x80), list(response[1:])))) add this line:
self.update_file(''.join(char_list))
Hope it's gonna help you.
Cheers,
Fenrir

Python GSpead On-Change Listener?

I'm making a script that checks a Google Sheet fed by a Google Form and returns the result as a live-feed visualization of a poll. I need to figure out how to update the value counts only when the sheet is updated, as opposed to checking every 60 seconds (or something).
Here is my current setup:
string = ""
while True:
responses = gc.open("QOTD Responses").sheet1
data = pd.DataFrame(responses.get_all_records())
vals = data['Response'].value_counts()
str = "{} currently has {} votes. \n{} currently has {} votes.".format(vals.index[0], vals[0], vals.index[1],
vals[1])
if(str != string):
string = str
print(string)
time.sleep(60) # Updates 1440 times per day
I'm almost certain that there has to be a better way to do this, but what would that be?
Thanks!
You won't be able to do it with Python alone. You'll need to integrate with a trigger function from Google Apps Script.
You could use the onEdit trigger function to send a signal to your Python script (via an HTTP call, for example); a sketch of the receiving side follows the quoted documentation below.
To use a simple trigger, simply create a function that uses one of
these reserved function names:
onOpen(e) runs when a user opens a spreadsheet, document,
presentation, or form that the user has permission to edit.
onEdit(e) runs when a user changes a value in a spreadsheet.
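On the Python side, a minimal sketch of the receiving end, assuming Flask (the route and port are arbitrary); the onEdit trigger would POST to this endpoint, e.g. via UrlFetchApp:

from flask import Flask, request

app = Flask(__name__)

@app.route('/sheet-edited', methods=['POST'])
def sheet_edited():
    # re-read the sheet and recompute the value counts here
    print('Sheet changed:', request.get_json(silent=True))
    return '', 204

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)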

Populate a Google App Engine app's datastore with 20,000 strings

I'm trying to create and store 20,000 random codes in my local datastore before trying this on appspot... This is the model:
class PromotionCode(db.Model):
    code = db.StringProperty(required=True)
And this is the class that handles the populate request (only a logged-in admin may use it). It creates random alphanumeric codes and tries to store 20,000 of them in the datastore:
class Populate(webapp.RequestHandler):
    def GenerateCode(self):
        chars = string.letters + string.digits
        code = ""
        for i in range(8):
            code = code + choice(chars)
        return code.upper()

    def get(self):
        codes = ""
        code_list = []
        for i in range(20000):
            new_code = self.GenerateCode()
            promotion_code = PromotionCode(code=new_code)
            code_list.append(promotion_code)
            codes = codes + "<br>" + new_code
        db.put(code_list)
        self.response.out.write("populating datastore...<br>")
        self.response.out.write(codes)
I thought I could try batching all those put() calls, so I created a list of codes (code_list). It takes 2-5 minutes to do this locally.
Is it possible to do it faster without using the bulkuploader option? I'm getting the 500 server error, obviously. Or maybe it could be done in consecutive calls or steps...
Why not just change your code above to insert 100 at a time, and just run something like:
for i in {1..200}
do
    curl --cookie "ACSID=your-acsid-cookie" http://your-app-id.appspot.com/populatepath
    sleep 5
done
from your command line? The entries are random anyway, you don't need to remember any state.
You can get the ACSID cookie by logging in manually and inspecting the cookies from your browser.
The sleep between requests will prevent you from spinning up a gigantic number of instances or hitting short-term quotas.
The task queue suggestion is good if this is something you need to automate, but if it's a one-time thing you might as well keep it simple.
You can batch the process in task queues. Set the batch size high for each task queue task; that way you can achieve it much faster.
I don't understand why you have to create 20,000 in advance as opposed to creating each as needed on the fly, but I bet you could speed up your code quite a bit. Something like this (untested):
class Populate(webapp.RequestHandler):
    chars = "AB...Z01...9"  # i.e. the question's string.letters + string.digits

    def GenerateCode(self):
        return ''.join(choice(self.chars) for _ in xrange(8))

    def get(self):
        code_list = []
        for i in range(20000):
            new_code = self.GenerateCode()
            promotion_code = PromotionCode(code=new_code)
            code_list.append(promotion_code)
        db.put(code_list)
        self.response.out.write("populating datastore...<br>")
        self.response.out.write("done")
Not printing out the codes may save time.
I'm sure others here can do better...
If your task won't complete in the 30 second request deadline, you can break it up into chunks - which should be easy since they're all doing the same thing - and run them in tasks on the Task Queue. You should probably do all your work there anyway, so you don't force the user to wait for it to complete before returning a response.
Like Jeff, though, I'm puzzled why you'd want to generate 20,000 of these upfront rather than just generating them when you need them.
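A minimal sketch of that chunked approach, assuming the deferred library bundled with the Python 2 App Engine SDK and a module-level generate_code() equivalent to the handler's GenerateCode():

from google.appengine.ext import db, deferred

BATCH_SIZE = 1000

def populate_batch(count):
    # one task inserts one batch, comfortably inside the request deadline
    db.put([PromotionCode(code=generate_code()) for _ in xrange(count)])

def populate_all(total=20000):
    for _ in xrange(total // BATCH_SIZE):
        deferred.defer(populate_batch, BATCH_SIZE)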

Record streaming and saving internet radio in python

I am looking for a Python snippet to read an internet radio stream (.asx, .pls, etc.) and save it to a file.
The final project is a cron'ed script that will record an hour or two of internet radio and then transfer it to my phone for playback during my commute. (3G is kind of spotty along my commute.)
Any snippets or pointers are welcome.
The following has worked for me using the requests library to handle the http request.
import requests

stream_url = 'http://your-stream-source.com/stream'

r = requests.get(stream_url, stream=True)

with open('stream.mp3', 'wb') as f:
    try:
        for block in r.iter_content(1024):
            f.write(block)
    except KeyboardInterrupt:
        pass
That will save a stream to the stream.mp3 file until you interrupt it with ctrl+C.
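For the cron'ed use case in the question, a small variant (a sketch, not tested against a real station) stops on its own after a fixed duration instead of waiting for Ctrl+C:

import time
import requests

stream_url = 'http://your-stream-source.com/stream'
duration = 2 * 60 * 60  # stop after two hours

r = requests.get(stream_url, stream=True)
deadline = time.time() + duration
with open('stream.mp3', 'wb') as f:
    for block in r.iter_content(1024):
        f.write(block)
        if time.time() >= deadline:
            break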
After tinkering and playing with it, I've found Streamripper to work best. This is the command I use:
streamripper http://yp.shoutcast.com/sbin/tunein-station.pls?id=1377200 -d ./streams -l 10800 -a tb$FNAME
If you find that your requests or urllib.request call in Python 3 fails to save a stream because you receive "ICY 200 OK" in return instead of an "HTTP/1.0 200 OK" header, you need to tell the underlying functions ICY 200 OK is OK!
What you can effectively do is intercept the routine that handles reading the status after opening the stream, just before processing the headers.
Simply put a routine like this above your stream opening code.
def NiceToICY(self):
    class InterceptedHTTPResponse():
        pass

    import io
    line = self.fp.readline().replace(b"ICY 200 OK\r\n", b"HTTP/1.0 200 OK\r\n")
    InterceptedSelf = InterceptedHTTPResponse()
    InterceptedSelf.fp = io.BufferedReader(io.BytesIO(line))
    InterceptedSelf.debuglevel = self.debuglevel
    InterceptedSelf._close_conn = self._close_conn
    return ORIGINAL_HTTP_CLIENT_READ_STATUS(InterceptedSelf)
Then put these lines at the start of your main routine, before you open the URL.
ORIGINAL_HTTP_CLIENT_READ_STATUS = urllib.request.http.client.HTTPResponse._read_status
urllib.request.http.client.HTTPResponse._read_status = NiceToICY
They will override the standard routine (this one time only) and run the NiceToICY function in place of the normal status check when it has opened the stream. NiceToICY replaces the unrecognised status response, then copies across the relevant bits of the original response which are needed by the 'real' _read_status function. Finally the original is called and the values from that are passed back to the caller and everything else continues as normal.
I have found this to be the simplest way to get round the problem of the status message causing an error. Hope it's useful for you, too.
I am aware this is a year old, but this is still a viable question, which I have recently been fiddling with.
Most internet radio stations will give you a choice of download type; I choose the MP3 version, then read the info from a raw socket and write it to a file. The trick is figuring out how fast your download is compared to playing the song, so you can strike a balance on the read/write size. This would be in your buffer def.
Now that you have the file, it's fine to simply leave it on your drive (record), but most players will delete the already-played chunk from the file and clear the file off the drive and RAM when streaming is stopped.
I have used some code snippets from a no-compression file-archiving app to handle a lot of the file handling, playing, and buffering magic. It's very similar in how the process flows. If you write up some pseudo-code (which I highly recommend), you can see the similarities.
I'm only familiar with how shoutcast streaming works (which would be the .pls file you mention):
You download the .pls file, which is just a playlist. Its format is fairly simple: it's just a text file that points to where the real stream is.
You can connect to that stream, as it's just HTTP that streams either MP3 or AAC. For your use, just save every byte you get to a file and you'll get an MP3 or AAC file you can transfer to your mp3 player.
Shoutcast has one addition that is optional: metadata. You can find how that works here, but is not really needed.
If you want a sample application that does this, let me know and I'll make up something later.
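To illustrate that flow with a concrete sketch (Python 3; the URL and size cap are placeholders): a .pls playlist is a small INI-style text file whose File1 entry points at the real stream, so configparser can extract the stream URL before the raw bytes are saved.

import configparser
import requests

def record_pls(pls_url, out_path, max_bytes=64 * 1024 * 1024):
    # the .pls playlist is INI-style text; File1 holds the real stream URL
    playlist = configparser.ConfigParser()
    playlist.read_string(requests.get(pls_url).text)
    stream_url = playlist['playlist']['File1']

    written = 0
    with requests.get(stream_url, stream=True) as r, open(out_path, 'wb') as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)
            written += len(chunk)
            if written >= max_bytes:
                break

record_pls('http://example.com/station.pls', 'station.mp3')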
In line with the answer from https://stackoverflow.com/users/1543257/dingles (https://stackoverflow.com/a/41338150), here's how you can achieve the same result with the asynchronous HTTP client library, aiohttp:
import functools

import aiohttp
from aiohttp.client_proto import ResponseHandler
from aiohttp.http_parser import HttpResponseParserPy

class ICYHttpResponseParser(HttpResponseParserPy):
    def parse_message(self, lines):
        if lines[0].startswith(b"ICY "):
            lines[0] = b"HTTP/1.0 " + lines[0][4:]
        return super().parse_message(lines)

class ICYResponseHandler(ResponseHandler):
    def set_response_params(
        self,
        *,
        timer = None,
        skip_payload = False,
        read_until_eof = False,
        auto_decompress = True,
        read_timeout = None,
        read_bufsize = 2 ** 16,
        timeout_ceil_threshold = 5,
    ) -> None:
        # this is a copy of the implementation from here:
        # https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client_proto.py#L137-L165
        self._skip_payload = skip_payload
        self._read_timeout = read_timeout
        self._reschedule_timeout()
        self._timeout_ceil_threshold = timeout_ceil_threshold
        self._parser = ICYHttpResponseParser(
            self,
            self._loop,
            read_bufsize,
            timer=timer,
            payload_exception=aiohttp.ClientPayloadError,
            response_with_body=not skip_payload,
            read_until_eof=read_until_eof,
            auto_decompress=auto_decompress,
        )
        if self._tail:
            data, self._tail = self._tail, b""
            self.data_received(data)

class ICYConnector(aiohttp.TCPConnector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._factory = functools.partial(ICYResponseHandler, loop=self._loop)
This can then be used as follows:
session = aiohttp.ClientSession(connector=ICYConnector())

async with session.get("url") as resp:
    print(resp.status)
Yes, it's using a few private classes and attributes, but this is the only solution to change the handling of something that's part of the HTTP spec and (theoretically) should never need to be changed by the library's user...
All things considered, I would say this is still rather clean in comparison to monkey patching which would cause the behavior to be changed for all requests (especially true for asyncio where setting before and resetting after a request does not guarantee that something else won't make a request while request to ICY is being made). This way, you can dedicate a ClientSession object specifically for requests to servers that respond with the ICY status line.
Note that this comes with a performance penalty for requests made with ICYConnector - in order for this to work, I am using the pure Python implementation of HttpResponseParser which is going to be slower than the one that aiohttp uses by default and is written in C. This cannot really be done differently without vendoring the whole library as the behavior for parsing status line is deeply hidden in the C code.
