I want to get audio streaming data from a server using Python.
I tried a simple request to the audio stream URL using urllib:
req = urllib.request.Request(<url>)
but I get an exception:
http.client.BadStatusLine: Uª¨Ì5¦`
It looks like the server responds and sends data without any HTTP headers, not even a status line.
Is there any way to get and process the response in this case?
It is also worth mentioning the results I got when requesting this URL with other clients:
Curl:
curl "http://<server>:81/audiostream.cgi?user=<user>&pwd=<password>&streamid=0&filename=" curl: (1) Received HTTP/0.9 when not allowed
The workaround is use --http0.9 switch.
Chrome/Chromium based browsers shows:
ERR_INVALID_HTTP_RESPONSE
Mozilla Firefox can correctly fetch this data as binary.
(Screenshot of the Firefox response omitted.)
Can you post the full code fragment? Or you may just need to search Google and SO; I have found several links that mention this problem, such as:
Why am I getting httplib.BadStatusLine in python?
BadStatusLine exception raised when returning reply from server in Python 3
Why does this url raise BadStatusLine with httplib2 and urllib2?
Issue 42432: Http client, Bad Status Line triggered for no reason
Check again and think twice! Search on SO before starting a new thread.
HTTP 0.9 is about the simplest possible HTTP protocol:
The client sends a document request consisting of a line of ASCII characters terminated by a CR LF (carriage return, line feed) pair [...]
This request consists of the word "GET", a space, and the document address, omitting the "http:", host, and port parts when they are the coordinates just used to make the connection.
The response to a simple GET request is a message in hypertext mark-up language (HTML). This is a byte stream of ASCII characters.
source
Thus your server is not sending a valid HTTP 0.9 response, as it's not HTML. Chrome (etc.) is quite within its rights to reject it, although in practice it may not even support HTTP 0.9.
In this case the camera is apparently (ab)using HTTP to start a stream (since presumably it will carry on sending data over the connection, which is also not HTTP 0.9, although not explicitly forbidden). The simplest way to get the data you want is to do it manually:
Create and open a socket to the server's address and port.
Send a GET request for audiostream.cgi?user=<user>&pwd=<password>&streamid=0&filename= (do you really need that last param?).
Run socket.recv(max_bytes) in a loop in one thread, transferring the data to a (thread-safe) buffer; do whatever you want with that buffer in another thread. A minimal sketch follows this list.
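For illustration, here is a rough single-threaded sketch of that approach; the host, port, credentials, and the handle_audio callback are placeholders you would fill in (in practice the receive loop would run in its own thread feeding a buffer):
import socket

HOST, PORT = "192.168.1.10", 81   # placeholder camera address
PATH = "/audiostream.cgi?user=USER&pwd=PASSWORD&streamid=0&filename="

sock = socket.create_connection((HOST, PORT))
# HTTP/0.9-style "simple request": just the GET line, terminated by CRLF.
sock.sendall("GET {0}\r\n".format(PATH).encode("ascii"))

try:
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break                # server closed the connection
        handle_audio(chunk)      # e.g. append to a thread-safe buffer / feed a decoder
finally:
    sock.close()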
Alternatively if you're familiar with async programming, use asyncio rather than threads.
You will obviously need to handle decoding the file stream yourself. Hopefully you can identify the format and pass it to a decoder; alternatively something like ffmpeg might be able to guess it.
Have you tried including a User-Agent header with this request? Sometimes this kind of failure is caused by web-scraping detection.
import urllib2

opener = urllib2.build_opener()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1',
}
opener.addheaders = headers.items()
response = opener.open(<url>)
Related
I need to intercept an HTTP response packet from the server and replace it with my own response, or at least modify that response, before it arrives at my browser.
I'm already able to sniff this response and print it; the problem is with manipulating/replacing it.
Is there a way to do so with the scapy library?
Or do I have to connect my browser through a proxy to manipulate the response?
If you want to work from your ordinary browser, then you need a proxy between the browser and the server in order to manipulate the traffic. See e.g. https://portswigger.net/burp/, a proxy created specifically for penetration testing that makes it easy to replace responses/requests (and it is scriptable, too).
If you want to script the whole session in scapy, then you can craft requests and responses to your liking, but the response will not go to the browser. You can also record an ordinary web session (with tcpdump/wireshark/scapy) into a pcap, then use scapy to read the pcap, modify it, and send similar requests to the server; a rough sketch of that last step follows.
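As an illustration of the pcap route, here is a minimal sketch, assuming a capture file session.pcap recorded beforehand; it only extracts the raw HTTP payloads, which you could then edit and replay with a client of your choice:
from scapy.all import rdpcap, TCP, Raw

packets = rdpcap("session.pcap")      # hypothetical capture of a browsing session

for pkt in packets:
    # Keep only TCP segments that actually carry application data.
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        payload = pkt[Raw].load
        if payload.startswith(b"GET ") or payload.startswith(b"HTTP/"):
            print(payload[:200])      # inspect (and later modify/replay) the HTTP data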
I have this basic code (from https://docs.python.org/2/howto/urllib2.html):
import urllib2
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()
I would like to get the size of the entire request and the size of the entire response. Is there any way?
(I haven't seen one for urllib2 or for requests.)
"Entire" means including headers and any metadata that might be sent with it.
Thanks.
res.headers may or may not contain a Content-Length field provided by the server, so int(res.headers['content-length']) will give you the body size only if the server provides it.
A very simple implementation of an HTTP stream might not provide this information at all, so you don't know it until you hit EOF.
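As a rough sketch with urllib2 (the sizes are approximate: they cover the response header lines and body, but not the status line or the request that urllib2 builds internally):
import urllib2

req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()

# Response side: the raw header lines are available on the HTTPMessage object.
header_bytes = sum(len(line) for line in response.info().headers)
body_bytes = len(the_page)
print 'response ~= %d (headers) + %d (body) bytes' % (header_bytes, body_bytes)

# Content-Length, if the server sent one, should match body_bytes.
print response.info().getheader('content-length')

# To see the exact request (request line + headers) that gets sent, enable
# httplib's debug output: urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))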
So, I'm using the following snippet of code to fetch parts of a web page and parse it (not related to this problem).
def load_max_resp(self, resp, size=4096):
    it = resp.iter_content()
    file_str = StringIO()
    for i in xrange(size):
        try:
            file_str.write(it.next())
        except StopIteration:
            break
    return file_str.getvalue()
The resp element is loaded with:
resp = requests.get(url, stream=True)
This fragment of code works properly on my own machine/network; I have no problem whatsoever. When I upload it to my server, however, iter_content() sometimes returns an empty iterator (the first call to it.next() throws a StopIteration exception). This only happens for some (most, actually) websites, and always the same ones.
I have tested it in a console/interpreter, and if I remove the stream=True parameter it works as intended, but I cannot remove it because I only want to download a maximum number of bytes from the page (to avoid network congestion). I have upgraded to the latest requests package from pip and made sure the library version is the same on my development and production machines.
My wild guess is that there's a Linux flag somewhere stopping some streamed connections? (Using Ubuntu on the dev machine, Debian Wheezy on the production server.)
Alternatively, how do I make an HTTP request (a GET) specifying a maximum allowed size for the returned resource? I cannot rely on the headers because some websites don't send a Content-Length.
Alternatively, how do I make an HTTP request (a GET) specifying a maximum allowed size for the returned resource? I cannot rely on the headers because some websites don't send a Content-Length.
You might want to look at Byte serving:
Byte serving is the process of sending only a portion of an HTTP/1.1 message from a server to a client. Byte serving uses the Range HTTP request header and the Accept-Ranges and Content-Range HTTP response headers.
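As a sketch of how that looks with requests (the URL is a placeholder; servers that advertise Accept-Ranges: bytes honor the Range header, while others may ignore it and return the full body with status 200 instead of 206, so the length still has to be checked):
import requests

url = "http://example.com/big-file"          # placeholder URL
resp = requests.get(url, headers={"Range": "bytes=0-4095"})

if resp.status_code == 206:      # Partial Content: the server honored the range
    data = resp.content          # at most 4096 bytes
else:                            # the server ignored the Range header
    data = resp.content[:4096]   # note: the full body was still downloaded here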
I am using urllib2 to do an http post request using Python 2.7.3. My request is returning an HTTPError exception (HTTP Error 502: Proxy Error).
Looking at the message traffic with Charles, I see the following happening:
I send the HTTP request (POST /index.asp?action=login HTTP/1.1) using urllib2
The remote server replies with status 303 and a location header of ../index.asp?action=news
urllib2 retries sending a get request: (GET /../index.asp?action=news HTTP/1.1)
The remote server replies with status 502 (Proxy error)
The 502 reply includes this in the response body: "DNS lookup failure for: 10.0.0.30:80index.asp" (Notice the malformed URL)
So I take this to mean that a proxy server on the remote server's network sees the "/../index.asp" URL in the request and misinterprets it, sending my request on with a bad URL.
When I make the same request with my browser (Chrome), the retry is sent to GET /index.asp?action=news. So Chrome takes off the leading "/.." from the URL, and the remote server replies with a valid response.
Is this a urllib2 bug? Is there something I can do so the retry ignores the "/.." in the URL? Or is there some other way to solve this problem? Thinking it might be a urllib2 bug, I swapped out urllib2 with requests but requests produced the same result. Of course, that may be because requests is built on urllib2.
Thanks for any help.
The Location being sent with that 303 is wrong in multiple ways.
First, if you read RFC2616 (HTTP/1.1 Header Field Definitions) 14.30 Location, the Location must be an absoluteURI, not a relative one. And section 10.3.3 makes it clear that this is the relevant definition.
Second, even if a relative URI were allowed, RFC 1808, Relative Uniform Resource Locators, 4. Resolving Relative URLs, step 6, only specifies special handling for .. in the pattern <segment>/../. That means that a relative URL shouldn't start with ... So, even if the base URL is http://example.com/foo/bar/ and the relative URL is ../baz/, the resolved URL is not http://example.com/foo/baz/, but http://example.com/foo/bar/../baz. (Of course most servers will treat these the same way, but that's up to each server.)
Finally, even if you did combine the relative and base URLs before resolving .., an absolute URI with a path starting with .. is invalid.
So, the bug is in the server's configuration.
Now, it just so happens that many user-agents will work around this bug. In particular, they turn /../foo into /foo to block users (or arbitrary JS running on their behalf without their knowledge) from trying to do "escape from webroot" attacks.
But that doesn't mean that urllib2 should do so, or that it's buggy for not doing so. Of course urllib2 should detect the error earlier so it can tell you "invalid path" or something, instead of running together an illegal absolute URI that's going to confuse the server into sending you back nonsense errors. But it is right to fail.
It's all well and good to say that the server configuration is wrong, but unless you're the one in charge of the server, you'll probably face an uphill battle trying to convince them that their site is broken and needs to be fixed when it works with every web browser they care about. Which means you may need to write your own workaround to deal with their site.
The way to do that with urllib2 is to supply your own HTTPRedirectHandler with an implementation of the redirect_request method that recognizes this case and returns a different Request than the default code would (in particular, http://example.com/index.asp?action=news instead of http://example.com/../index.asp?action=news).
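A minimal sketch of such a handler (the class name and the blind string replacement are only illustrative; you may want a stricter rewriting rule):
import urllib2

class FixRelativeRedirectHandler(urllib2.HTTPRedirectHandler):
    # Strip an illegal leading '/..' from redirect URLs before following them.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # e.g. 'http://example.com/../index.asp?action=news'
        #   -> 'http://example.com/index.asp?action=news'
        newurl = newurl.replace('/../', '/', 1)
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)

opener = urllib2.build_opener(FixRelativeRedirectHandler)
response = opener.open('http://example.com/index.asp?action=login',
                       data='...')   # placeholder form body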
I'm writing a little tool to monitor class openings at my school.
I wrote a Python script that fetches the current availability of classes in each department every few minutes.
The script was functioning properly until the uni's site started returning this:
SIS Server is not available at this time
The uni must have blocked my server, right? Well, not really, because that is the output I get when I go to the URL directly from other PCs. But if I go through the intermediary form on the uni's site that does a POST, I don't get that message.
The URL I'm requesting is https://s4.its.unc.edu/SISMisc/SISTalkerServlet
This is what my Python code looks like:
data = urllib.urlencode({"progname" : "SIR033WA", "SUBJ" : "busi", "CRS" : "", "TERM" : "20099"})
f = urllib.urlopen("https://s4.its.unc.edu/SISMisc/SISTalkerServlet", data)
s = f.read()
print (s)
I am really stumped! It seems like Python isn't sending a proper request. At first I thought it wasn't sending proper POST data, but I changed the URL to point at my local box and the POST data Apache received looked just fine.
If you'd like to see the system actually functioning, go to https://s4.its.unc.edu/SISMisc/browser/student_pass_z.jsp, click on the "Enter as Guest" button, and then look for "Course Availability". (Now you know why I'm building this!)
The weirdest thing is that this was working until 11 am! I've had the same error before, but it only lasted a few minutes. This tells me it is more likely a problem somewhere else than any blocking of my server by the uni.
Update
Upon suggestion, I tried playing with a more legitimate referer/user-agent. Same result. This is what I tried:
import httplib
import urllib
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4',
    'Content-type': 'application/x-www-form-urlencoded',
    'Accept': 'text/plain',
    'Referrer': 'https://s4.its.unc.edu/SISMisc/SISTalkerServlet',
}
data = urllib.urlencode({"progname" : "SIR033WA", "SUBJ" : "busi", "CRS" : "", "TERM" : "20099"})
c = httplib.HTTPSConnection("s4.its.unc.edu",443)
c.request("POST", "/SISMisc/SISTalkerServlet",data,headers)
r = c.getresponse()
print r.read()
This post doesn't attempt to fix your code, but suggests a debugging tool.
Once upon a time I was coding a program to fill out online forms for me. To learn exactly how my browser was handling the POSTs, cookies, and whatnot, I installed Wireshark (http://www.wireshark.org/), a network sniffer. This application allowed me to view, chunk by chunk, the data that was being sent and received at the IP and hardware level.
You might consider trying out a similar program and comparing the network flow. This might highlight differences between what your browser is doing and what your script is doing.
After seeing multiple requests with an odd non-browser User-Agent string, it's possible that they are blocking users who are not being referred from within the site. For example, PHP exposes $_SERVER['HTTP_REFERER'] (IIRC), which lets a page check which page referred the user to the current one. Since your request does not include a referrer (you are trying to access the URL directly), it is very possible they are denying you access based on that. Try adding a Referer header to your HTTP request and see how it goes (preferably one pointing at a page that links to the one you're trying to access).
http://whatsmyuseragent.com/ can assist you in building your spoofed user agent.
You then build headers like so:
headers = {"Content-type": "application/x-www-form-urlencoded",
"Accept": "text/plain"}
and then send them as an additional parameter with your HTTPConnection request:
conn.request("POST", "/page/on/site", params, headers)
See the Python docs on httplib for further reference and examples.
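Putting the pieces together, a minimal sketch for this particular site might look like the following (the header values and form fields are taken from the question and are only illustrative):
import httplib
import urllib

headers = {
    'Content-type': 'application/x-www-form-urlencoded',
    'Accept': 'text/plain',
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4',
    # Pretend we arrived from the page that links to the servlet.
    'Referer': 'https://s4.its.unc.edu/SISMisc/browser/student_pass_z.jsp',
}
params = urllib.urlencode({'progname': 'SIR033WA', 'SUBJ': 'busi', 'CRS': '', 'TERM': '20099'})

conn = httplib.HTTPSConnection('s4.its.unc.edu', 443)
conn.request('POST', '/SISMisc/SISTalkerServlet', params, headers)
response = conn.getresponse()
print response.status, response.reason
print response.read()
conn.close()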