Python documentation on possibly inherited method - python

I am writing a program (Python 3.5.2) that uses an HTTPSConnection to get a JSON object as a response. I have it working using some example code, but I'm not sure where a method comes from.
My question is this: in the code below, the decode('utf-8') method doesn't appear in the documentation at https://docs.python.org/3.4/library/http.client.html#http.client.HTTPResponse under "21.12.2. HTTPResponse Objects". How would I know that the return value of response.read() has the decode('utf-8') method available?
Do Python objects inherit from a base class like C# objects do or am I missing something?
from http.client import HTTPSConnection
import json

http = HTTPSConnection(get_hostname(token))
http.request('GET', uri_path, headers=get_authorization_header(token))
response = http.getresponse()
print(response.status, response.reason)
feed = json.loads(response.read().decode('utf-8'))
Thank you for your help.

The read method of the response object always returns a byte string (in Python 3, which I presume you are using since you use the print function). Byte strings have a decode method, so there is no problem with this code. Of course it assumes the response is encoded in UTF-8, which may or may not be correct.
[Technical note: email is a very difficult medium to handle: messages can be made up of different parts, each of which is differently encoded. At least with web traffic you stand a chance of reading the Content-Type header's charset attribute to find the correct encoding].
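To see this for yourself, here is a minimal sketch (the host and path are placeholders, not taken from the question) that prints the type read() returns and decodes the body using the charset from the Content-Type header when the server declares one:
from http.client import HTTPSConnection

conn = HTTPSConnection('example.com')   # placeholder host
conn.request('GET', '/')                # placeholder path
resp = conn.getresponse()

raw = resp.read()
print(type(raw))                        # <class 'bytes'>: read() returns bytes, not str

# Honour the charset the server declared, falling back to UTF-8.
charset = resp.headers.get_content_charset() or 'utf-8'
print(raw.decode(charset)[:80])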

Related

Can't replace spaces in a Python variable

I tried to replace spaces in a variable in Python, but it returns this error:
AttributeError: 'HTTPHeaders' object has no attribute 'replace'
This is my code:
for req in driver.requests:
    print(req.headers)
    d = req.headers
    x = d.replace("""
""", "")
So, if you check out the class HTTPHeaders you'll see it has a __repr__ function and that it's an HTTPMessage object.
Depending on what exactly you want to achieve (which is still not clear to me, i.e. for which header do you want to replace spaces?), you can go about this in two ways: use the methods on the HTTPMessage object (documented here), or use the string version of it by calling repr on the headers. I recommend the first approach as it is much cleaner.
I'll give an example in which I remove spaces for all canary values in all of the requests:
for req in driver.requests:
    canary = req.headers.get("canary")
    canary = canary.replace(" ", "")
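For completeness, a sketch of the second (string-based) approach mentioned above, assuming str(req.headers) gives you the raw header block as text:
for req in driver.requests:
    # Flatten the header block into one line; the per-header approach above
    # is usually cleaner, but this works on the text representation.
    flattened = str(req.headers).replace("\n", " ")
    print(flattened)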
P.S. Your question is nowhere near clear enough as it stands. Only after asking multiple times and linking your other question did it become clear that you are using seleniumwire, for example. Ideally, the code you provide can be run by anyone with the packages installed and reproduces the issue you have. But all right, the comments made it clearer.

Some help understanding my own Python code

I'm starting to learn Python and I've written the following Python code (some of it omitted) and it works fine, but I'd like to understand it better. So I do the following:
html_doc = requests.get('[url here]')
Followed by:
if html_doc.status_code == 200:
    soup = BeautifulSoup(html_doc.text, 'html.parser')
    line = soup.find('a', class_="some_class")
    value = re.search('[regex]', str(line))
    print(value.group(0))
My questions are:
What does html_doc.text really do? I understand that it makes "text" (a string?) out of html_doc, but why isn't it text already? What is it? Bytes? Maybe a stupid question but why doesn't requests.get create a really long string containing the HTML code?
The only way that I could get the result of re.search was by value.group(0) but I have literally no idea what this does. Why can't I just look at value directly? I'm passing it a string, there's only one match, why is the resulting value not a string?
requests.get()'s return value, as stated in the docs, is a Response object.
re.search()'s return value, as stated in the docs, is a match object.
Both objects exist because they carry much more information than just the response bytes (e.g. the HTTP status code, response headers, etc.) or the matched string (e.g. the match object includes the positions of the first and last matched characters).
For more information you'll have to study the docs.
FYI, to check the type of a returned value you can use the built-in type function:
response = requests.get('[url here]')
print(type(response))  # <class 'requests.models.Response'>
It seems to me you are lacking some basic knowledge about classes, objects, methods, etc.; you need to read more about them here (for Python 2.7) and about the requests module here.
Concerning what you asked: when you type html_doc = requests.get('url'), you are creating an instance of the class requests.models.Response, which you can check with:
>>> type(html_doc)
<class 'requests.models.Response'>
Now, html_doc has methods and attributes, so html_doc.text will return the server's response body as a string.
The same goes for the re module: its functions return match objects rather than plain ints or strings.
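As a rough sketch tying both answers together (the URL and regex are placeholders), you can inspect the types and the match object like this:
import re
import requests

response = requests.get('https://example.com/')    # placeholder URL

print(type(response))            # <class 'requests.models.Response'>
print(type(response.content))    # bytes: the raw body as received
print(type(response.text))       # str: the body decoded using the detected encoding

match = re.search(r'<title>(.*?)</title>', response.text)   # placeholder regex
if match:
    print(type(match))       # a match object, not a str
    print(match.group(0))    # the whole matched text
    print(match.group(1))    # the first captured group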

AMF serialization for python3

I am trying to write a python3 encoder/decoder for AMF.
The reason I'm doing it is that I didn't find a suitable library that works on Python 3 (I'm looking for a non-obtrusive library, one that will provide me with the methods and let me handle the gateway myself).
Available libraries I tested for Python are amfast, pyamf and amfy. The first two are for Python 2 (several forks of pyamf suggest that they support Python 3, but I couldn't get them to work), while amfy was designed for Python 3 but lacks some features that I need (specifically object serialization).
Reading through the specifications of AMF0 and AMF3, I was able to add a packet encoder/decoder, but I stumbled on object serialization and the available documentation was not enough (I would love to see some examples). Existing libraries were of no help either.
Using RemoteObject (in Flex), I managed to send the following request to my parser:
b'\x00\x03\x00\x00\x00\x01\x00\x04null\x00\x02/1\x00\x00\x00\xe0\n\x00\x00\x00\x01\x11
\n\x81\x13Mflex.messaging.messages.CommandMessage\x13operation\x1bcorrelationId\x13
timestamp\x11clientId\x15timeToLive\tbody\x0fheaders\x17destination\x13messageId\x04\x05
\x06\x01\x04\x00\x01\x04\x00\n\x0b\x01\x01\n\x05\tDSId\x06\x07nil%DSMessagingVersion\x04
\x01\x01\x06\x01\x06I03ACB769-9733-6A6C-0923-79F667AE8249'
(notice that newlines were introduced to make the request more readable)
The headers are parsed OK, but when I get to the first object (the \n near the end of the first line), it is marked as a reference (LSB = 0) even though there is no other object it could reference.
Am I reading this wrong? Is this a malformed byte request?
Any help decoding these bytes will be welcomed.
From the AMF3 spec, section 4.1 NetConnection and AMF3:
The format of this messaging structure is AMF 0 (See [AMF0]). A context header value or message body can switch to AMF 3 encoding using the special avmplus-object-marker type.
What this means is that by default, the message body must be parsed as AMF0. Only when encountering an avmplus-object-marker (0x11) should you switch to AMF3. As a result, the 0x0a type marker in your value is not actually an AMF3 object-marker, but an AMF0 strict-array-marker.
Looking at section 2.12 Strict Array Type in the AMF0 spec, we can see that this type is simply defined as an u32 array-count, followed that number of value-types.
In your data, the array-count is 0x00, 0x00, 0x00, 0x01 (i.e. 1), and the value following that has a type marker of 0x11 - which is the avmplus-object-marker mentioned above. Thus, only after starting to parse the AMF0 array contents should you actually switch to AMF3 to parse the following object.
In this case, the object then is an actual AMF3 object (type marker 0x0a), followed by a non-dynamic U29O-traits with 9 sealed members. But I'm sure you can take it from here. :)
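A rough sketch of that parsing flow (the marker values come from the specs; read_amf3_value is a hypothetical helper standing in for your AMF3 decoder):
import struct

AMF0_STRICT_ARRAY = 0x0a
AMF0_AVMPLUS_OBJECT = 0x11

def read_amf0_value(buf, pos):
    marker = buf[pos]
    pos += 1
    if marker == AMF0_STRICT_ARRAY:
        # u32 array-count, followed by that many AMF0 values
        count = struct.unpack_from('>I', buf, pos)[0]
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_amf0_value(buf, pos)
            items.append(item)
        return items, pos
    elif marker == AMF0_AVMPLUS_OBJECT:
        # only here do we switch to the AMF3 parser
        return read_amf3_value(buf, pos)   # hypothetical AMF3 decoder
    else:
        raise NotImplementedError('AMF0 marker 0x%02x not handled in this sketch' % marker)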

bottle convert post request to unicode

I have a Bottle server as the app and everything works OK except when I save AJAX forms. If I save from a Python script - with the right input - the data comes back as Unicode. But the data from JS is strange: on the wire there should only be bytes (that's the only data type HTTP knows), yet Bottle shows me a str that is not UTF-8, and I can't encode/decode it to get the correct value. On the JS side I tried jQuery and form.serialize(), which works for other frameworks.
@post('/agt/save')
def saveagt():
    a = Agent({x: request.forms.get(x) for x in request.forms})
    print(a.nume, a.nume.encode())
    return {'ret': ags.add(a)}
... and a name like „țânțar” becomes „ÈânÈar”.
It may be a simple problem, but I think I didn't drink enough coffee yet.
If anyone is curious: Bottle doesn't handle the URL encoding correctly here.
So urllib.parse.unquote(request.body.read().decode()) solves the problem,
or
d = urllib.parse.parse_qs(request.body.read().decode())
a = Agent({x: d[x][0] for x in d})
in my case.
Is this a bug in Bottle? Or should I tell it to decode the URI, and I just don't know how?
Use
request.forms.getunicode('some_form_field_name')
as shorthand, if you want to get around the character conversion to latin-1.
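A minimal sketch of both options inside a handler (the route and the field name nume are taken from the question; the rest is a placeholder):
from bottle import post, request

@post('/agt/save')
def saveagt():
    # getunicode() re-decodes the latin-1 value as UTF-8 for you
    name = request.forms.getunicode('nume')

    # equivalent manual round-trip on the raw value
    raw = request.forms.get('nume')
    name_manual = raw.encode('latin-1').decode('utf-8')

    return {'nume': name, 'nume_manual': name_manual}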

Python urllib2 Response header

I'm trying to extract the response header of a URL request. When I use firebug to analyze the response output of a URL request, it returns:
Content-Type text/html
However when I use the python code:
urllib2.urlopen(URL).info()
the resulting output returns:
Content-Type: video/x-flv
I am new to python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed please let me know.
Thanks in advance for reading this post
Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:
import urllib2
request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')
There's also HTTPCookieProcessor which can make it better, but I don't think you'll need it in most cases. Have a look at python's documentation:
http://docs.python.org/library/urllib2.html
Content-Type text/html
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
This peculiar discrepancy might be explained by different headers (maybe ones of the Accept kind) being sent by the two requests -- can you check that? Or, if JavaScript is running in Firefox (which I assume you're using when you're running Firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say ;-).
Keep in mind that a web server can return different results for the same URL based on differences in the request. For example, content-type negotiation: the requester can specify a list of content-types it will accept, and the server can return different results to try to accommodate different needs.
Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
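If you want to test the negotiation theory, a small sketch (same placeholder URL as above) that sends an explicit Accept header and prints what comes back:
import urllib2

request = urllib2.Request('http://your.tld/...')   # placeholder URL
request.add_header('Accept', 'text/html')          # ask explicitly for HTML
response = urllib2.urlopen(request)
print response.info().getheader('Content-Type')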
According to http://docs.python.org/library/urllib2.html there is only a get_header() method and nothing about getheader.
I'm asking because your code works fine for
response.info().getheader('Set-Cookie')
but once I execute
response.info().get_header('Set-Cookie')
I get:
Traceback (most recent call last):
File "baza.py", line 11, in <module>
cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'
Edit:
Moreover,
response.headers.get('Set-Cookie') works fine as well, though it's not mentioned in the urllib2 docs....
For getting the raw data for the headers in Python 2, it's a little bit of a hack but it works:
"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])
basically "".join(list) will the list of headers, which all include "\n" at the end.
__dict__ is a built in python variable for all dicts, basically you can select a list out of a 2d array with it.
and ofcourse ["headers"] is selecting the list value from the .info() response value dict
hope this helped you learn a few ez python tricks :)
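For what it's worth, the same list is reachable as a plain attribute, so the __dict__ indirection isn't strictly needed (still Python 2 / urllib2):
import urllib2

info = urllib2.urlopen("http://google.com/").info()
print "".join(info.headers)   # the same raw, newline-terminated header lines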
