Urllib2 is somehow changing my post data on send - python

I'm trying to make a POST request to a server. However, when making this post, the data gets messed up somewhere along the way.
My Code:
import urllib
import urllib2
# url is the target endpoint, defined elsewhere
headers = {"Content-Type": "application/x-www-form-urlencoded",
           "Authorization": "Basic user pass"}
values = {"query": "select", "table": "testtable"}
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers=headers)
res = urllib2.urlopen(request)
print res.read()
However, I noticed that the "data" is somehow changed. It should (and does when I print it) look something like
query=select&table=testtable
However, when I actually do a post request, this site registers:
<parameter id="&#13;&#10query">select</parameter>
<parameter id="table">testtab</parameter>
So it looks like the data is somehow shifted over by two characters. This is independent of where I make the POST request from. Has anyone ever seen an error like this?

It sounds like your original data has a carriage return ('\r') and line feed ('\n') in it (the '&#13;' and '&#10;'). So, it might look like your key name is 'query', but you probably have some extra characters hidden in there.
Your site should use urlparse.parse_qs to read the request data. By default, that function will ignore those two characters:
>>> import urlparse
>>> data = '&#13;&#10;query=select&table=testtable'
>>> urlparse.parse_qs(data)
{'query': ['select'], 'table': ['testtable']}
>>> # make the parsing strict:
... urlparse.parse_qs(data, strict_parsing=True)
ValueError: bad query field: ''
So, those two characters are ignored if you use the default parameters with urlparse.parse_qs. The best solution is to check your input and get rid of those characters before sending it.
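One way to follow that advice is to strip stray whitespace from the keys and values before encoding them. This is a minimal sketch in Python 3, where urllib.parse replaces the Python 2 urllib and urlparse modules used above. clean_params is a hypothetical helper, and the sample dict with a leading '\r\n' is an assumed input, not the asker's actual data.

```python
from urllib.parse import urlencode, parse_qs


def clean_params(params):
    """Strip surrounding whitespace (including '\\r' and '\\n') from keys and values."""
    return {str(key).strip(): str(value).strip() for key, value in params.items()}


# Assumed input: the first key carries a hidden carriage return and line feed.
values = {"\r\nquery": "select", "table": "testtable"}

data = urlencode(clean_params(values))
print(data)  # query=select&table=testtable

# Parsing the cleaned body gives back the expected field names.
parsed = parse_qs(data)
print(parsed)
```

Cleaning on the client side keeps the request body correct regardless of how strict the server's parser is.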

Related

How to handle API responses (JSON) containing \x00 (or \u0000) in their data, and store the data in Postgres using Django models?

I'm making an API call and trying to store its response in Postgres using Django models.
Here's what I have been doing:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
# handler is an instance of a model
handler.api_response = response_data
handler.save()
But this failed when my JSON had fields like 'field_name': '\x00\x00\x00\x00\x00'. It gave the following error:
DataError at /api/booking/reprice/
unsupported Unicode escape sequence
LINE 1: ... NULL, "api_status" = 0, "api_response" = '{"errorRe...
^
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ..."hoursConfirmed": 0, "field_name":...
I tried to resolve this with the following:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
# handler is an instance of a model
handler.api_response = json.loads(json.dumps(response_data).encode("unicode-escape").decode())
handler.save()
That solved the initial issue. But recently I got a field with the value 'field2_name': 'Hey "Whats up"', and this failed with the error:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 143 (char 142)
Probably because json.loads() treated the " inside the value as an enclosing " rather than an escaped ".
Now, if I print the initial response just after the json.loads(response.content.decode('utf-8')) statement, it shows the field as \x00\x00\x00\x00\x00.
But the output of the following code:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
print(json.dumps(response_data))
shows the field as \\u0000\\u0000\\u0000\\u0000\\u0000.
How does \x00 change to \\u0000?
And how do I save this field into Postgres tables?
This is what I could think of:
json.loads(json.dumps(response_data).replace('\\u0000',''))
I would add this statement before saving to Postgres.
Is there a better way?
Is the code response_data = json.loads(response.content.decode('utf-8')) wrong? Or is it causing that particular character not to be escaped?
FWIW, I recently ran into this and your proposed solution also worked for me. I think it's probably the simplest way to deal with it. Apparently this is the only character that can't go into a Postgres JSON column.
json.loads(json.dumps(response_data).replace('\\u0000',''))
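A variant of the same idea is to remove the NUL characters from the parsed data itself rather than from the serialized string, which avoids the extra json.dumps/json.loads round trip. This is only a sketch: strip_nul is a hypothetical helper, and the sample payload is made up to mirror the fields mentioned in the question.

```python
import json


def strip_nul(value):
    """Recursively remove NUL characters from every string in parsed JSON data."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Assumed sample response body containing NULs and an escaped double quote.
raw = '{"field_name": "\\u0000\\u0000ab", "field2_name": "Hey \\"Whats up\\"", "hoursConfirmed": 0}'
response_data = json.loads(raw)

cleaned = strip_nul(response_data)
print(json.dumps(cleaned))
```

Because the data is never re-parsed from a modified string, values that contain quotes are left untouched.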

Python requests with a raw body

Apologies if this is a repost/stupid question, I have tried searching around but found nothing.
I have a CGI web server that takes a POST payload that is neither percent-encoded nor form data. That is, I need to stop the request from being percent-encoded, as happens with:
req = Request('POST', license_url, data={ None: data})
And I also need the request to not include the multipart boundaries introduced by:
req = Request('POST', license_url, files={ None: file_handle})
My request body looks something like:
abc
def
ghi=
And this is exactly what I want to post to the server (Postman achieves this when set to raw payload).
However, sending it through the data param looks like:
data=abc%3Adef%3a..
And sending it as a file looks like:
--b70cc8bd5ac1411488c719a1d773edde
Content-Disposition: form-data; filename="blahblah"
abc
def
ghi=
--b70cc8bd5ac1411488c719a1d773edde
^^^ This is the closest to what I need but I also want to strip the boundary/content-disposition terms (not on the server).
Many Thanks
You can just post the raw text content like
req = Request('POST', license_url, data=data.encode())
provided the data is a string.
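To check what would actually go on the wire, one option is to prepare the request and inspect its body without sending it. A minimal sketch, assuming the requests library is installed. The license_url value is a placeholder, and the payload is the sample body from the question.

```python
from requests import Request

# Placeholder endpoint (assumption); nothing is sent in this sketch.
license_url = "http://localhost/cgi-bin/license"
data = "abc\ndef\nghi="

# Bytes (or a str) passed as `data` are used as the body verbatim:
# no percent-encoding and no multipart boundaries.
req = Request("POST", license_url, data=data.encode())
prepared = req.prepare()

print(prepared.body)
print(prepared.headers.get("Content-Length"))
```

To send it, the prepared request can be passed to a requests.Session via its send method.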

How to return this valid json data in Python?

I translated a curl command into Python to get some data.
import requests
import json
username="abc"
password="123"
headers = {
'Content-Type': 'application/json',
}
params = (
('version', '2017-05-01'),
)
data = '{"text":["This is message one."], "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
The above works fine. It returns json data.
It seems ["This is message one."] is a list. I want to use a variable that loads a file to replace this list.
I tried:
with open(f, "r", encoding='utf-8') as fp:
    file_in_list = fp.read().splitlines()
toStr = str(file_in_list)
data = '{"text":' + toStr + ', "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
But it returned error below.
{
"code" : 400,
"error" : "Mapping error, invalid JSON"
}
Can you help? How can I have valid response.text?
Thanks.
update:
The content of f contains only five lines below:
This is message one.
this is 2.
this is three.
this is four.
this is five.
The reason your existing code fails is that str applied to a list of strings will only rarely give you valid JSON. They're not intended to do the same thing. JSON only allows double-quoted strings; Python allows both single- and double-quoted strings. And, unless your strings all happen to include ' characters, Python will render them with single quotes:
>>> print(["abc'def"]) # gives you valid JSON, but only by accident
["abc'def"]
>>> print(["abc"]) # does not give you valid JSON
['abc']
If you want to get the valid JSON encoding of a list of strings, don't try to trick str into giving you valid JSON by accident, just use the json module:
toStr = json.dumps(file_in_list)
But, even more simply, you shouldn't be trying to figure out how to construct JSON strings in the first place. Just create a dict and json.dumps the whole thing:
data = {"text": file_in_list, "id": "en-es"}
data_str = json.dumps(data)
Being able to do this is pretty much the whole point of JSON: it's a simple way to automatically serialize all of the types that are common to all the major scripting languages.
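The difference between the two approaches can be checked locally without calling the service. This is a small sketch using only the standard library. The list stands in for the lines read from the file, and broken_is_valid is just an illustrative name.

```python
import json

# Stand-in for fp.read().splitlines() on the file from the question.
file_in_list = ["This is message one.", "this is 2.", "this is three."]

# Hand-built string using str(): Python renders the items with single quotes.
broken = '{"text":' + str(file_in_list) + ', "id":"en-es"}'
try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False

# Serializing the whole dict produces valid JSON that round-trips cleanly.
data_str = json.dumps({"text": file_in_list, "id": "en-es"})
round_trip = json.loads(data_str)

print(broken_is_valid)
print(data_str)
```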
Or, even better, let requests do it for you by passing a json argument instead of a data argument:
data = {"text": file_in_list, "id": "en-es"}
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, json=data, auth=(username, password))
This also takes care of setting the Content-Type header to application/json for you, so the entry you set by hand in headers becomes redundant. Many servers will accept JSON without that header, but some will reject it, so make sure it is set one way or the other.
For more details, see the section More complicated POST requests in the requests docs. But there really aren't many more details.
TL;DR
toStr = json.dumps(file_in_list)
Explanation
Assuming your file contains something like
String_A
String_B
You need to ensure that in toStr:
The whole list is enclosed by [ and ].
Every string in the list is enclosed by double quotation marks.
So your raw json (as a String) is equal to '{"text":["String_A", "String_B"], "id":"en-es"}'

Python: Json dumps escape quote

There is a POST request which works perfectly when I pass the data as below:
url = 'https://www.nnnow.com/api/product/details'
requests.post(url, data="{\"styleId\":\"BMHSUR2HTS\"}", headers=headers)
But when I use json.dumps() on a dictionary and send the request with headers={'Content-Type': 'application/json'}, I do not get a response (response code 504). I have also tried the json parameter of requests.post.
requests.post(url, data=json.dumps({"styleId":"BMHSUR2HTS"}), headers={'content-type': 'application/json'})
Now, the data returned by json.dumps({"styleId":"BMHSUR2HTS"}) and
"{\"styleId\":\"BMHSUR2HTS\"}" is not the same.
json.dumps({"styleId":"BMHSUR2HTS"}) == "{\"styleId\":\"BMHSUR2HTS\"}" gives False even though a print on both shows a similar string.
How can I get the same format as "{\"styleId\":\"BMHSUR2HTS\"}" from a dictionary {"styleId":"BMHSUR2HTS"} ?
If you print json.dumps({"styleId":"BMHSUR2HTS"}), you will notice two things:
your output is a string (just try type(json.dumps({"styleId":"BMHSUR2HTS"})));
the output has a space between the JSON name and value: {"styleId": "BMHSUR2HTS"}.
I'm not sure how you want to handle this in your code, but there are two main options to work around the issue:
Replace the space in the json.dumps output: json.dumps({"styleId":"BMHSUR2HTS"}).replace(': ', ':')
Convert both sides back to Python objects with eval() and compare those: eval(json.dumps({"styleId":"BMHSUR2HTS"})) and eval(YOUR_JSON_STRING)
I hope this helps you.
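If the server really is sensitive to the space after the colon (an assumption, since the question does not confirm what causes the 504), json.dumps can produce the compact form directly through its separators parameter. Unlike a string replace, this does not touch any ': ' sequence that happens to appear inside a value.

```python
import json

payload = {"styleId": "BMHSUR2HTS"}

# Default separators are (', ', ': '), which adds a space after the colon.
default_form = json.dumps(payload)

# Compact separators drop the whitespace between tokens.
compact_form = json.dumps(payload, separators=(",", ":"))

print(default_form)  # {"styleId": "BMHSUR2HTS"}
print(compact_form)  # {"styleId":"BMHSUR2HTS"}
print(compact_form == "{\"styleId\":\"BMHSUR2HTS\"}")  # True
```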

Microsoft Translator gives HTTP Error 400 only with certain strings

I am translating a large number of strings and using urllib2 to send requests to the API. My program will run fine, but I always get HTTP Error 400 when translating certain strings in particular. Each request I make is exactly the same, except for the text parameter, so I think it must be the text somehow causing the request to be malformed. Here are two strings that I know of that always cause this error:
#monaeltahawy hatla2eh fel jym bytmrn wla 3arf en fe 7aga bt7sl :d
and
#yaratambash ta3aly a3zemek 3l fetar
for example.
I know for certain that it isn't the "#" character causing the error, or the fact that "#" is at the front of the string. The API has processed strings with these attributes just fine before.
It also is not the nonsense words in these strings causing issues, because the API has processed nonsense words fine before as well. It just returns the same string that I sent to it.
Here is the code where the error seems to be coming from:
tweet = tweet.encode("utf-8")
to = "en"
translate_params = { 'text' : tweet, 'to' : to }
request = urllib2.Request('http://api.microsofttranslator.com/v2/Http.svc/Translate?' + urllib.urlencode(translate_params))
request.add_header('Authorization', 'Bearer '+ self.access_token)
response = urllib2.urlopen(request)
# Removes XML tags to return only the translated text
response_text = ET.fromstring(response.read())
response_text = ET.tostring(response_text, encoding = 'utf8', method = 'text')
return response_text
I am running Python 2.7 in Eclipse 4.3.2.
Any insight or suggestions would be very much appreciated.
