"Snooping" python's telnetlib - python

I have an application that calls telnetlib.read_until(). For the most part, it works fine.
However when my app's telnet connection fails, it's hard to debug the exact cause. Is it my script or is the server connection dodgy? (This is a development lab, so there are a lot of dodgy servers).
What I would like is an easy way to snoop the data placed into the cooked queue before my app calls telnetlib.read_until(), hopefully without impacting my app's operation.
Poking around in telnetlib.py, I found that 'buf[0]' is just the data I want: the newly-added data without the repetition caused by snooping 'cookedq'.
I can insert a line right before the end of telnetlib.process_rawq() to print out the processed data as it is received from the server.
telnetlib.process_rawq ...
    ...
    self.cookedq = self.cookedq + buf[0]
    print("Dbg: Cooked Queue contents = %r" % buf[0]) # <= my added debug line
    self.sbdataq = self.sbdataq + buf[1]
This works well. I can see the data almost exactly as received by my app without impacting its operation at all.
Here's the question: Is there a snazzier way to accomplish this? This approach is basic and works, but I'll have to remember to re-make this change every time I upgrade Python's libraries.
My attempts to simply extend Telnet.process_rawq() were unsuccessful, as buf is local to process_rawq().
Is there a (more pythonic) way to snoop this telnetlib.process_rawq()-internal value without modifying telnetlib.py?
Thanks.

I just found a much better solution (by reading the code, duh!)
telnetlib has a debugging output option already built-in. Just call set_debuglevel(1) and Bob's your uncle.
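A minimal sketch (hypothetical host name, as in the question; with a debug level of 1, telnetlib reports the data it reads as it processes it):

import telnetlib

tn = telnetlib.Telnet("my_server")
tn.set_debuglevel(1) # telnetlib now prints what it receives as it cooks it
data = tn.read_until("login: ")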

The easy hack is to monkey patch the library. Copy and paste the function you want to change into your source (unfortunately, process_rawq is a rather large function), and modify it as you need. You can then replace the method in the class with your own.
import telnetlib

def process_rawq(self):
    # ...existing body, copied from telnetlib.py...
    self.cookedq = self.cookedq + buf[0]
    print("Dbg: Cooked Queue contents = %r" % buf[0])
    self.sbdataq = self.sbdataq + buf[1]

telnetlib.Telnet.process_rawq = process_rawq
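If you'd rather not copy the whole function, a lighter-weight sketch is to wrap the original method and diff self.cookedq before and after the call. This assumes only that process_rawq() appends newly cooked data to self.cookedq, so it should survive library upgrades:

import telnetlib

_original_process_rawq = telnetlib.Telnet.process_rawq

def process_rawq(self):
    already_cooked = len(self.cookedq)
    _original_process_rawq(self)
    new_data = self.cookedq[already_cooked:] # exactly what this call appended
    if new_data:
        print("Dbg: Cooked Queue contents = %r" % new_data)

telnetlib.Telnet.process_rawq = process_rawq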
You could alternatively try the debugging built-in to the telnetlib module with set_debuglevel(1), which prints a lot of info to stdout.
In this situation, I would tend to just grab wireshark/tshark/tcpdump and directly inspect the network session.
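For instance, something along these lines (adjust the interface and host to your lab; telnet is TCP port 23, and -A prints the payload as ASCII):

tcpdump -i any -A 'tcp port 23 and host my-dev-server'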

Is it reasonable to wrap an entire main loop in a try..finally block?

I've made a map editor in Python 2.7.9 for a small project, and I'm looking for ways to preserve the data I edit in the event of some unhandled exception. My editor already has a method for saving out data, and my current solution is to have the main loop wrapped in a try..finally block, similar to this example:
import os, datetime #..and others.
if __name__ == '__main__':
    DataMgr = DataManager() # initializes the editor.
    save_note = None
    try:
        MainLoop() # unsurprisingly, this calls the main loop.
    except Exception as e: # I am of the impression this will catch every type of exception.
        save_note = "Exception dump: %s : %s." % (type(e).__name__, e) # A memo appended to the comments in the save file.
    finally:
        exception_fp = DataMgr.cwd + "dump_%s.kmap" % str(datetime.datetime.now())
        DataMgr.saveFile(exception_fp, memo = save_note) # saves out to a dump file using a familiar method with a note outlining what happened.
This seems like the best way to make sure that, no matter what happens, an attempt is made to preserve the editor's current state (to the extent that saveFile() is equipped to do so) in the event that it should crash. But I wonder if encapsulating my entire main loop in a try block is actually safe and efficient and good form. Is it? Are there risks or problems? Is there a better or more conventional way?
Wrapping the main loop in a try...finally block is the accepted pattern when you need something to happen no matter what. In some cases it's logging and continuing, in others it's saving everything possible and quitting.
So your code is fine.
If your file isn't that big, I would suggest reading the entire input file into memory, closing the file, then doing your data processing on the copy in memory. This solves any problem of corrupting your data, at the cost of potentially slowing down your runtime.
Alternatively, take a look at the atexit Python module. This allows you to register one or more functions to be called back automatically when the program exits.
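A minimal sketch of that approach, reusing the hypothetical DataMgr object from the question (note that atexit handlers run on normal interpreter shutdown, including after an unhandled exception, but not on os._exit() or a hard crash):

import atexit, datetime

def emergency_save():
    # last-chance save via the question's DataManager API
    fp = DataMgr.cwd + "atexit_dump_%s.kmap" % str(datetime.datetime.now())
    DataMgr.saveFile(fp, memo = "saved from atexit handler")

atexit.register(emergency_save)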
That being said, what you have should work reasonably well.

Is get_result() a required call for put_async() in Google App Engine

With the new release of GAE 1.5.0, we now have an easy way to do async datastore calls. Are we required to call get_result() after calling put_async()?
For example, if I have a model called MyLogData, can I just call:
put_async(MyLogData(text="My Text"))
right before my handler returns without calling the matching get_result()?
Does GAE automatically block on any pending calls before sending the result to the client?
Note that I don't really care to handle error conditions. i.e. I don't mind if some of these puts fail.
I don't think there is any sure way to know if get_result() is required unless someone on the GAE team verifies this, but I think it's not needed. Here is how I tested it.
I wrote a simple handler:
class DB_TempTestModel(db.Model):
    data = db.BlobProperty()

class MyHandler(webapp.RequestHandler):
    def get(self):
        starttime = datetime.datetime.now()
        lots_of_data = ' '*500000
        if self.request.get('a') == '1':
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
        if self.request.get('a') == '2':
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
        self.response.out.write(str(datetime.datetime.now()-starttime))
I ran it a bunch of times on a High Replication Application.
The data was always there, making me believe that unless there is a failure in the datastore side of things (unlikely), it's gonna be written.
Here's the interesting part. When the data is written with put_async() (?a=2), the request was on average about 2 to 3 times faster to process than with put() (?a=1) (not a very scientific test - just eyeballing it).
But the cpu_ms and api_cpu_ms were the same for both ?a=1 and ?a=2.
From the logs:
ms=440 cpu_ms=627 api_cpu_ms=580 cpm_usd=0.036244
vs
ms=149 cpu_ms=627 api_cpu_ms=580 cpm_usd=0.036244
On the client side, looking at the network latency of the requests, it showed the same results, i.e. ?a=2 requests were at least 2 times faster. Definitely a win on the client side... but it seems to not have any gain on the server side.
Anyone on the GAE team care to comment?
db.put_async works fine without get_result when deployed (in fire-and-forget style), but locally it won't take effect until get_result is called.
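For reference, the explicit form looks like this (a sketch reusing the asker's hypothetical MyLogData model):

from google.appengine.ext import db

rpc = db.put_async(MyLogData(text="My Text"))
# ... do other work while the datastore RPC is in flight ...
rpc.get_result() # blocks until the put finishes and surfaces any errors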
I dunno, but this works:
import datetime
from google.appengine.api import urlfetch

def main():
    rpc = urlfetch.create_rpc()
    urlfetch.make_fetch_call(rpc, "some://artificially/slow.url")
    print "Content-type: text/plain"
    print
    print str(datetime.datetime.now())

if __name__ == '__main__':
    main()
The remote URL sleeps 3 seconds and then sends me an email. The App Engine handler returns immediately, and the remote URL completes as expected. Since both services abstract the same underlying RPC framework, I would guess the datastore behaves similarly.
Good question, though. Perhaps Nick or another Googler can answer definitively.

Record streaming and saving internet radio in python

I am looking for a Python snippet to read an internet radio stream (.asx, .pls, etc.) and save it to a file.
The final project is a cron'ed script that will record an hour or two of internet radio and then transfer it to my phone for playback during my commute. (3G is kind of spotty along my commute.)
Any snippets or pointers are welcome.
The following has worked for me using the requests library to handle the http request.
import requests

stream_url = 'http://your-stream-source.com/stream'
r = requests.get(stream_url, stream=True)

with open('stream.mp3', 'wb') as f:
    try:
        for block in r.iter_content(1024):
            f.write(block)
    except KeyboardInterrupt:
        pass
That will save a stream to the stream.mp3 file until you interrupt it with ctrl+C.
So after tinkering and playing with it, I've found Streamripper to work best. This is the command I use:
streamripper http://yp.shoutcast.com/sbin/tunein-station.pls?id=1377200 -d ./streams -l 10800 -a tb$FNAME
If you find that your requests or urllib.request call in Python 3 fails to save a stream because you receive "ICY 200 OK" in return instead of an "HTTP/1.0 200 OK" header, you need to tell the underlying functions that ICY 200 OK is OK!
What you can effectively do is intercept the routine that handles reading the status after opening the stream, just before processing the headers.
Simply put a routine like this above your stream opening code.
def NiceToICY(self):
    class InterceptedHTTPResponse():
        pass
    import io
    line = self.fp.readline().replace(b"ICY 200 OK\r\n", b"HTTP/1.0 200 OK\r\n")
    InterceptedSelf = InterceptedHTTPResponse()
    InterceptedSelf.fp = io.BufferedReader(io.BytesIO(line))
    InterceptedSelf.debuglevel = self.debuglevel
    InterceptedSelf._close_conn = self._close_conn
    return ORIGINAL_HTTP_CLIENT_READ_STATUS(InterceptedSelf)
Then put these lines at the start of your main routine, before you open the URL.
ORIGINAL_HTTP_CLIENT_READ_STATUS = urllib.request.http.client.HTTPResponse._read_status
urllib.request.http.client.HTTPResponse._read_status = NiceToICY
They will override the standard routine (this one time only) and run the NiceToICY function in place of the normal status check when it has opened the stream. NiceToICY replaces the unrecognised status response, then copies across the relevant bits of the original response which are needed by the 'real' _read_status function. Finally the original is called and the values from that are passed back to the caller and everything else continues as normal.
I have found this to be the simplest way to get round the problem of the status message causing an error. Hope it's useful for you, too.
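Putting it together, a sketch of the download once the patch above is installed (hypothetical stream URL):

import urllib.request

# NiceToICY and the two override lines above must already have run
with urllib.request.urlopen('http://example.com:8000/stream') as r:
    with open('capture.mp3', 'wb') as f:
        while True:
            chunk = r.read(8192)
            if not chunk:
                break
            f.write(chunk)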
I am aware this is a year old, but this is still a viable question, which I have recently been fiddling with.
Most internet radio stations will give you a choice of download type; I choose the MP3 version, then read the info from a raw socket and write it to a file. The trick is figuring out how fast your download is compared to playing the song, so you can create a balance on the read/write size. This would be in your buffer def.
Now that you have the file, it is fine to simply leave it on your drive (record), but most players will delete the already-played chunk from the file and clear it out of the drive and RAM when streaming is stopped.
I have used some code snippets from a no-compression file-archiving app to handle a lot of the file handling, playing, and buffering magic. It's very similar in how the process flows. If you write up some pseudo-code (which I highly recommend) you can see the similarities.
I'm only familiar with how shoutcast streaming works (which would be the .pls file you mention):
You download the pls file, which is just a playlist. Its format is fairly simple: it's just a text file that points to where the real stream is.
You can then connect to that stream, as it's just HTTP, and it streams either MP3 or AAC. For your use, just save every byte you get to a file and you'll get an MP3 or AAC file you can transfer to your mp3 player.
Shoutcast has one addition that is optional: metadata. You can find how that works here, but it is not really needed.
If you want a sample application that does this, let me know and I'll make up something later.
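In the meantime, here is a rough sketch of those two steps (hypothetical URLs; a .pls file is an INI-style playlist whose [playlist] section contains File1, File2, ... entries pointing at the real stream):

import configparser
import requests

# step 1: fetch the .pls playlist and pull out the stream URL
pls_text = requests.get('http://example.com/station.pls').text
cp = configparser.ConfigParser()
cp.read_string(pls_text)
stream_url = cp['playlist']['file1'] # configparser lowercases option names

# step 2: connect to the stream and save every byte to a file
with requests.get(stream_url, stream=True) as r, open('capture.mp3', 'wb') as f:
    for chunk in r.iter_content(8192):
        f.write(chunk)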
In line with the answer from https://stackoverflow.com/users/1543257/dingles (https://stackoverflow.com/a/41338150), here's how you can achieve the same result with the asynchronous HTTP client library - aiohttp:
import functools

import aiohttp
from aiohttp.client_proto import ResponseHandler
from aiohttp.http_parser import HttpResponseParserPy

class ICYHttpResponseParser(HttpResponseParserPy):
    def parse_message(self, lines):
        if lines[0].startswith(b"ICY "):
            lines[0] = b"HTTP/1.0 " + lines[0][4:]
        return super().parse_message(lines)

class ICYResponseHandler(ResponseHandler):
    def set_response_params(
        self,
        *,
        timer=None,
        skip_payload=False,
        read_until_eof=False,
        auto_decompress=True,
        read_timeout=None,
        read_bufsize=2 ** 16,
        timeout_ceil_threshold=5,
    ) -> None:
        # this is a copy of the implementation from here:
        # https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client_proto.py#L137-L165
        self._skip_payload = skip_payload
        self._read_timeout = read_timeout
        self._reschedule_timeout()
        self._timeout_ceil_threshold = timeout_ceil_threshold
        self._parser = ICYHttpResponseParser(
            self,
            self._loop,
            read_bufsize,
            timer=timer,
            payload_exception=aiohttp.ClientPayloadError,
            response_with_body=not skip_payload,
            read_until_eof=read_until_eof,
            auto_decompress=auto_decompress,
        )
        if self._tail:
            data, self._tail = self._tail, b""
            self.data_received(data)

class ICYConnector(aiohttp.TCPConnector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._factory = functools.partial(ICYResponseHandler, loop=self._loop)
This can then be used as follows:
session = aiohttp.ClientSession(connector=ICYConnector())

async with session.get("url") as resp:
    print(resp.status)
Yes, it's using a few private classes and attributes but this is the only solution to change the handling of something that's part of HTTP spec and (theoretically) should not ever need to be changed by the library's user...
All things considered, I would say this is still rather clean in comparison to monkey patching which would cause the behavior to be changed for all requests (especially true for asyncio where setting before and resetting after a request does not guarantee that something else won't make a request while request to ICY is being made). This way, you can dedicate a ClientSession object specifically for requests to servers that respond with the ICY status line.
Note that this comes with a performance penalty for requests made with ICYConnector - in order for this to work, I am using the pure Python implementation of HttpResponseParser which is going to be slower than the one that aiohttp uses by default and is written in C. This cannot really be done differently without vendoring the whole library as the behavior for parsing status line is deeply hidden in the C code.

How to solve Python memory leak when using urllib2?

I'm trying to write a simple Python script for my mobile phone to periodically load a web page using urllib2. In fact I don't really care about the server response, I'd only like to pass some values in the URL to the PHP. The problem is that Python for S60 uses the old 2.5.4 Python core, which seems to have a memory leak in the urllib2 module. As I read, there seem to be such problems in every type of network communication as well. This bug has been reported here a couple of years ago, and some workarounds were posted as well. I've tried everything I could find on that page, and with the help of Google, but my phone still runs out of memory after ~70 page loads. Strangely the garbage collector does not seem to make any difference either, except making my script much slower. It is said that the newer (3.1) core solves this issue, but unfortunately I can't wait a year (or more) for the S60 port to come.
Here's how my script looks after adding every little trick I've found:
import urllib2, httplib, gc

while True:
    url = "http://something.com/foo.php?parameter=" + value
    f = urllib2.urlopen(url)
    f.read(1)
    f.fp._sock.recv=None # hacky avoidance
    f.close()
    del f
    gc.collect()
Any suggestions, how to make it work forever without getting the "cannot allocate memory" error?
Thanks in advance,
cheers, b_m
update:
I've managed to connect 92 times before it ran out of memory, but it's still not good enough.
update2:
Tried the socket method as suggested earlier; this is the second best (wrong) solution so far:
import socket, threading
from time import sleep

class UpdateSocketThread(threading.Thread):
    def run(self):
        global data
        while 1:
            url = "/foo.php?parameter=%d" % data
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect(('something.com', 80))
            s.send('GET ' + url + ' HTTP/1.0\r\n\r\n')
            s.close()
            sleep(1)
I tried the little tricks from above too. The thread closes after ~50 uploads (the phone has 50MB of memory left; obviously the Python shell has not).
UPDATE:
I think I'm getting closer to the solution! I tried sending multiple pieces of data without closing and reopening the socket. This may be the key, since this method will only leave one open file descriptor. The problem is:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("something.com", 80))
s.send("test") # returns 4 (sent bytes, which is cool)
s.send("test") # 4
s.send("test") # 4
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") # returns the number of sent bytes, ok
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") # returns 0 on the phone, error on Windows 7*
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") # returns 0 on the phone, error on Windows 7*
s.send("test") # returns 0, strange...
*: error message: 10053, software caused connection abort
Why can't I send multiple messages??
Using the test code suggested by your link, I tested my Python installation and confirmed that it indeed leaks. But if, as @Russell suggested, I put each urlopen in its own process, the OS should clean up the memory leaks. In my tests, memory, unreachable objects and open files all remained more or less constant. I split the code into two files:
connection.py
import cPickle, sys, urllib2

def connectFunction(queryString):
    conn = urllib2.urlopen('http://something.com/foo.php?parameter=' + str(queryString))
    data = conn.read()
    outfile = open('sometempfile', 'wb')
    cPickle.dump(data, outfile)
    outfile.close()

if __name__ == '__main__':
    connectFunction(sys.argv[1])
launcher.py
import subprocess, cPickle

# code from your link to check the number of unreachable objects
def print_unreachable_len():
    # check memory on memory leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    unreachableL = []
    for it in gc.garbage:
        unreachableL.append(it)
    return len(str(unreachableL))

# my code
if __name__ == '__main__':
    print 'Before running a single process:', print_unreachable_len()
    return_value_list = []
    for i, value in enumerate(values): # where values is a list or a generator containing (or yielding) the parameters to pass to the URL
        subprocess.call(['python', 'connection.py', str(value)])
        print 'after running', i, 'processes:', print_unreachable_len()
        infile = open('sometempfile', 'rb')
        return_value_list.append(cPickle.load(infile))
        infile.close()
Obviously, this is sequential, so you will only execute a single connection at a time, which may or may not be an issue for you. If it is, you will have to find a non-blocking way of communicating with the processes you're launching, but I'll leave that as an exercise for you.
EDIT: On re-reading your question, it seems you don't care about the server response. In that case, you can get rid of all the pickling related code. And obviously, you won't have the print_unreachable_len() related bits in your final code either.
There exists a reference cycle in urllib2, created at urllib2.py:1216. The issue is ongoing and has existed since 2009.
https://bugs.python.org/issue1208304
I think this is probably your problem. To summarize that thread, there's a memory leak in Pys60's DNS lookup, and you can work around it by moving DNS lookup outside the inner loop.
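A sketch of that workaround (hypothetical host, as in the question; resolve the name once outside the loop, then request by IP with an explicit Host header):

import socket
import urllib2

host = 'something.com'
ip = socket.gethostbyname(host) # the leaky DNS lookup runs only once

while True:
    req = urllib2.Request('http://%s/foo.php?parameter=bar' % ip,
                          headers={'Host': host})
    f = urllib2.urlopen(req)
    f.read(1)
    f.close()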
This seems like a (very!) hacky workaround, but a bit of googling found this comment on the problem:
Apparently adding f.read(1) will stop the leaking!
import urllib2
f = urllib2.urlopen('http://www.google.com')
f.read(1)
f.close()
EDIT: oh, I see you already have f.read(1)... I'm all out of ideas then :/
Consider using the low-level socket API (related howto) instead of urllib2.
import socket

HOST = 'daring.cwi.nl' # The remote host
PORT = 50007 # The same port as used by the server

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.send('GET /path/to/file/index.html HTTP/1.0\n\n')

# you'll need to figure out how much data to read and read that exactly
# or wait for read() to return data of zero length (I think!)
DATA_SZ = 1024
data = s.recv(DATA_SZ)
s.close()
print 'Received', repr(data)
How to execute and read an HTTP request via low-level sockets is a bit beyond the scope of the question (and might make a good question on its own on stackoverflow; I searched but didn't see it), but I hope this points you in the direction of a solution that may resolve your problem!
edit An answer in here about using makefile may be helpful: HTTP basic authentication using sockets in python
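A rough sketch of the makefile() approach (hypothetical host and path; with HTTP/1.0 and no keep-alive, the server closing the connection marks the end of the body):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('something.com', 80))
s.send('GET /foo.php?parameter=bar HTTP/1.0\r\nHost: something.com\r\n\r\n')

# makefile() wraps the socket in a file-like object, so we can read
# line by line and then drain the rest until the server closes.
fp = s.makefile('rb')
status_line = fp.readline()
body = fp.read() # returns once the server closes the connection
fp.close()
s.close()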
This does not leak for me with Python 2.6.1 on a Mac. Which version are you using?
BTW, your program doesn't work due to a few typos. Here is one that does work:
import urllib2, httplib, gc

value = "foo"
count = 0

while True:
    url = "http://192.168.1.1/?parameter=" + value
    f = urllib2.urlopen(url)
    f.read(1)
    f.fp._sock.recv=None # hacky avoidance
    f.close()
    del f
    print "count=", count
    count += 1
Depending on platform and Python version, Python might not release memory back to the OS. See this stackoverflow thread. That said, Python should not endlessly consume memory. Judging from the code you use, it appears to be a bug in the Python runtime, unless urllib/sockets use globals, which I don't believe they do. Blame it on Python on S60!
Have you considered other sources of memory leakage? An endlessly open log file, an ever-growing array, or something like that? If it truly is a bug in the sockets interface, then your only option is to use the subprocess approach.

Python telnetlib: surprising problem

I am using the Python module telnetlib to create a telnet session (with a chess server), and I'm having an issue I really can't wrap my brain around. The following code works perfectly:
>>> f = login("my_server") #code for login(host) below.
>>> f.read_very_eager()
This spits out everything the server usually prints upon login. However, when I put it inside a function and then call it thus:
>>> def foo():
... f = login("my_server")
... return f.read_very_eager()
...
>>> foo()
I get nothing (the empty string). I can check that the login is performed properly, but for some reason I can't see the text. So where does it get swallowed?
Many thanks.
For completeness, here is login(host):
def login(host, handle="guest", password=""):
    try:
        f = telnetlib.Telnet(host) # connect to host
    except:
        raise Error("Could not connect to host")
    f.read_until("login: ")
    try:
        f.write(handle + "\n\r")
    except:
        raise Error("Could not write username to host")
    if handle == "guest":
        f.read_until(":\n\r")
    else:
        f.read_until("password: ")
    try:
        f.write(password + "\n\r")
    except:
        raise Error("Could not write password to host")
    return f
The reason why this works when you try it out manually but not when in a function is because when you try it out manually, the server has enough time to react upon the login and send data back. When it's all in one function, you send the password to the server and never wait long enough for the server to reply.
If you prefer a (probably more correct) technical answer:
In the file telnetlib.py (c:\python26\Lib\telnetlib.py on my Windows computer), the function read_very_eager(self) calls self.sock_avail(). Now, the function sock_avail(self) does the following:
def sock_avail(self):
    """Test whether data is available on the socket."""
    return select.select([self], [], [], 0) == ([self], [], [])
What this does is really simple: if there is -anything- to read from our socket (the server has answered), it'll return True, otherwise it'll return False.
So, what read_very_eager(self) does is: check if there is anything available to read. If there is, then read from the socket, otherwise just return an empty string.
If you look at the code of read_some(self) you'll see that it doesn't check if there is any data available to read. It'll try reading till there is something available, which means that if the server takes for instance 100ms before answering you, it'll wait 100ms before returning the answer.
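In other words, the fix is to use a blocking read with a timeout instead of read_very_eager(). A sketch (the "fics%" prompt is hypothetical; use whatever your chess server actually prints after login):

f = login("my_server")
# read_until blocks until the prompt arrives or the timeout expires,
# giving the server time to answer, unlike read_very_eager()
banner = f.read_until("fics%", timeout=5)
print banner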
I'm having the same trouble as you. Unfortunately the combination of select.select (which I have in a while loop until I am able to read) and then calling read_some() does not work for me; it still reads only 1% of the actual output. If I put a time.sleep(10) in before I read and do a read_very_eager(), it seems to work. This is a very crude way of doing things, but it does work. I wish there was a better answer, and I wish I had more reputation points so I could respond to user387821 and see if he has any additional tips.
