Microsoft Translator gives HTTP Error 400 only with certain strings - python

I am translating a large number of strings and using urllib2 to send requests to the API. My program will run fine, but I always get HTTP Error 400 when translating certain strings in particular. Each request I make is exactly the same, except for the text parameter, so I think it must be the text somehow causing the request to be malformed. Here are two strings that I know of that always cause this error:
#monaeltahawy hatla2eh fel jym bytmrn wla 3arf en fe 7aga bt7sl :d
and
#yaratambash ta3aly a3zemek 3l fetar
for example.
I know for certain that it isn't the "#" character causing the error, or the fact that "#" is at the front of the string. The API has processed strings with these attributes just fine before.
It also is not the nonsense words in these strings causing issues, because the API has processed nonsense words fine before as well. It just returns the same string that I sent to it.
Here is the code where the error seems to be coming from:
tweet = tweet.encode("utf-8")
to = "en"
translate_params = { 'text' : tweet, 'to' : to }
request = urllib2.Request('http://api.microsofttranslator.com/v2/Http.svc/Translate?' + urllib.urlencode(translate_params))
request.add_header('Authorization', 'Bearer '+ self.access_token)
response = urllib2.urlopen(request)
# Removes XML tags to return only the translated text
response_text = ET.fromstring(response.read())
response_text = ET.tostring(response_text, encoding = 'utf8', method = 'text')
return response_text
I am running Python 2.7 in Eclipse 4.3.2.
Any insight or suggestions would be very much appreciated.
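One way to narrow this down is to read the body of the 400 response, since the service may say there what it rejected. The following is a minimal sketch, not a diagnosis: it uses Python 3's urllib (in Python 2.7, urllib2.HTTPError offers the same code attribute and read() method), and the helper name describe_http_error and the fake error used for the demonstration are hypothetical.

```python
import io
import urllib.error


def describe_http_error(err):
    """Return the status line plus the response body of an HTTPError."""
    # An HTTPError is also a file-like response, so the body can be read.
    body = err.read().decode("utf-8", errors="replace")
    return "HTTP %d %s: %s" % (err.code, err.reason, body)


# Demonstration with a hand-built error (hypothetical URL and body),
# so that the sketch runs without contacting any service.
fake = urllib.error.HTTPError(
    "http://example.invalid/Translate", 400, "Bad Request", {},
    io.BytesIO(b"<p>example detail</p>"),
)
summary = describe_http_error(fake)
print(summary)
```

In the code from the question, this would mean wrapping the urlopen call in try/except HTTPError and logging the summary together with the tweet that triggered it.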

Related

Python API request POST does not pass parameters that contain "="

I've been trying out an API call that contains content with the character "=". When I check the API content inside the C# API, the parameter with the "=" character is missing.
There's also an issue with the "+" character, which simply disappears and gets replaced with " ". I've tried different encodings but the issue persists.
Do you have any idea where the issue might be?
import requests
import json
qry = "select SUM(this + that) as a from asd WHERE a = 10"
URL = "http://127.0.0.1:42100"
PARAMS = {'request_type':'translate_model',
'content_string': qry}
r = requests.post(url = URL, params = PARAMS)
In the example above the content_string parameter is not present in the C# API parameters.
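To check what actually goes over the wire, it can help to look at how the query string is encoded. The sketch below uses only the standard library and assumes the encoding requests applies to params is equivalent for these characters: "=" becomes %3D, "+" becomes %2B, and a space becomes "+". A receiver that decodes the query string once gets the original text back, so a "+" turning into " " suggests the value was decoded a second time or sent unencoded somewhere along the way.

```python
from urllib.parse import parse_qs, urlencode

qry = "select SUM(this + that) as a from asd WHERE a = 10"

# Encode the parameters the way a query string is normally built.
encoded = urlencode({"request_type": "translate_model",
                     "content_string": qry})
print(encoded)

# Decoding once restores the original text, including "+" and "=".
decoded = parse_qs(encoded)["content_string"][0]
```

If the server reads the request body rather than the URL, passing the same dict as data= instead of params= in requests.post may be what it expects; that depends on the C# side, which is not shown here.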

How to handle API responses(JSON) containing \x00 (or \u0000) in its data, and store the data in Postgres using django models?

I'm making an api call, and trying to store its response in Postgres using Django models.
Here's what I have been doing:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
#handler is a object of a model
handler.api_response = response_data
handler.save()
But this failed when my JSON had fields like 'field_name': '\x00\x00\x00\x00\x00'. It gave the following error:
DataError at /api/booking/reprice/
unsupported Unicode escape sequence
LINE 1: ... NULL, "api_status" = 0, "api_response" = '{"errorRe...
^
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ..."hoursConfirmed": 0, "field_name":...
I tried to resolve this with the following:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
#handler is a object of a model
handler.api_response = json.loads(json.dumps(response_data).encode("unicode-escape").decode())
handler.save()
That solved the initial issue. But recently I got a field with the value 'field2_name': 'Hey "Whats up"', and this failed with the error:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 143 (char 142)
Probably because json.loads() treated the " inside the value as a closing " rather than an escaped one.
Now, if I print the initial response right after the json.loads(response.content.decode('utf-8')) statement, it shows the field as \x00\x00\x00\x00\x00.
But the output of the following code:
response = requests.post(url='some.url.com', data=json.dumps(data), headers={'some': 'header'})
response_data = json.loads(response.content.decode('utf-8'))
print(json.dumps(response_data))
This shows the field as \\u0000\\u0000\\u0000\\u0000\\u0000.
How does \x00 change to \\u0000?
And how do I save this field into Postgres tables?
This is what I could think of:
json.loads(json.dumps(response_data).replace('\\u0000',''))
I would add this statement before saving to Postgres.
Is there a better way?
Is the code response_data = json.loads(response.content.decode('utf-8')) wrong? Or is it failing to escape that particular character?
FWIW, I recently ran into this and your proposed solution also worked for me. I think it's probably the simplest way to deal with it. Apparently this is the only character that can't go into a Postgres JSON column.
json.loads(json.dumps(response_data).replace('\\u0000',''))
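An alternative sketch, assuming the goal is simply to drop NUL characters: walk the parsed data and remove "\x00" from every string, which avoids depending on how the escapes look in the dumped text. The helper name strip_nul and the sample payload are hypothetical. It also illustrates the \x00 versus \u0000 question: json.dumps writes a NUL character as the six-character escape \u0000, and json.loads turns that escape back into the single character "\x00".

```python
import json


def strip_nul(value):
    """Recursively remove NUL characters from all strings in parsed JSON."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical payload: NUL-only field, embedded quotes, nested list.
sample = json.loads(
    '{"field_name": "\\u0000\\u0000", '
    '"field2_name": "Hey \\"Whats up\\"", '
    '"items": [1, "a\\u0000b"]}'
)
cleaned = strip_nul(sample)
```

The cleaned structure can then be assigned to the model field as before, for example handler.api_response = strip_nul(response_data).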

How to parse jsonp returned from api using python

I'm very new to coding, and I'm building my first web application using open REST api with python flask.
I think the API is returning JSONP, which looks like this: callbackfunction{ json }; and I gather from other posts that all I need to do is get rid of this padding. However, I can't figure out at which point I should implement the stripping.
This is my code. The 5th line throws the error "the JSON object must be str, bytes or bytearray, not HTTPResponse"
def lookup(title):
    try:
        url = "http://www.aladin.co.kr/ttb/api/ItemSearch.aspx?ttbkey=foo&Query=bar"
        result = urllib.request.urlopen(url)
        data = json.loads(result)
        data_json = data.split("{", 1)[1].strip("}")
        return data_json
    except requests.RequestException:
        return None
I'm sure it works up to the 4th line. When I tried the code below, it at least returned a result, though a cryptic one, like this.
b'{ "version" : "20070901", "title" :
"\xec\x95\x8c\xeb\x9d\xbc\xeb\x94\x98 \xea\xb2\x80 ...
"customerReviewRank":9 } ] };'
Judging by the keys, I'm pretty sure this is the information I requested. So what can I do to fix this? Thanks in advance!
def lookup(title):
    try:
        url = "http://www.aladin.co.kr/ttb/api/ItemSearch.aspx?ttbkey=foo&Query=bar"
        result = urllib.request.urlopen(url)
        res = result.readline()
        return res
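A minimal sketch of one possible approach, assuming the response is a single JSON object with optional padding around it (as the output above suggests, with its trailing ";"): read the bytes from the response, decode them, keep everything from the first "{" to the last "}", and only then call json.loads. The helper name parse_jsonp and the shortened sample are hypothetical.

```python
import json


def parse_jsonp(raw):
    """Decode a response body and parse the JSON object inside any padding."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start:end + 1])


# Shortened sample modelled on the output shown in the question.
sample = b'{ "version" : "20070901", "title" : "\xec\x95\x8c\xeb\x9d\xbc\xeb\x94\x98" };'
data = parse_jsonp(sample)
print(data["title"])
```

In lookup, this would be applied to result.read() (the whole body) rather than to the HTTPResponse object itself, which is what triggered the original error.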

How to return this valid json data in Python?

I translated a curl command into Python to get some data.
import requests
import json
username="abc"
password="123"
headers = {
'Content-Type': 'application/json',
}
params = (
('version', '2017-05-01'),
)
data = '{"text":["This is message one."], "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
The above works fine. It returns json data.
It seems ["This is message one."] is a list. I want to use a variable that loads a file to replace this list.
I tried:
with open(f,"r",encoding='utf-8') as fp:
    file_in_list=fp.read().splitlines()
toStr=str(file_in_list)
data = '{"text":'+toStr+', "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
But it returned the error below.
{
"code" : 400,
"error" : "Mapping error, invalid JSON"
}
Can you help? How can I have valid response.text?
Thanks.
update:
The content of f contains only five lines below:
This is message one.
this is 2.
this is three.
this is four.
this is five.
The reason your existing code fails is that str applied to a list of strings will only rarely give you valid JSON. They're not intended to do the same thing. JSON only allows double-quoted strings; Python allows both single- and double-quoted strings. And, unless your strings all happen to include ' characters, Python will render them with single quotes:
>>> print(["abc'def"]) # gives you valid JSON, but only by accident
["abc'def"]
>>> print(["abc"]) # does not give you valid JSON
['abc']
If you want to get the valid JSON encoding of a list of strings, don't try to trick str into giving you valid JSON by accident, just use the json module:
toStr = json.dumps(file_in_list)
But, even more simply, you shouldn't be trying to figure out how to construct JSON strings in the first place. Just create a dict and json.dumps the whole thing:
data = {"text": file_in_list, "id": "en-es"}
data_str = json.dumps(data)
Being able to do this is pretty much the whole point of JSON: it's a simple way to automatically serialize all of the types that are common to all the major scripting languages.
Or, even better, let requests do it for you by passing a json argument instead of a data argument:
data = {"text": file_in_list, "id": "en-es"}
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, json=data, auth=(username, password))
This also automatically takes care of setting the Content-Type header to application/json for you, so the explicit entry in headers becomes redundant. Sending a JSON body without that header is incorrect: many servers will accept the input anyway, but some will reject it.
For more details, see the section More complicated POST requests in the requests docs. But there really aren't many more details.
TL;DR
toStr = json.dumps(file_in_list)
Explanation
Assuming your file contains something like
String_A
String_B
You need to ensure that toStr is:
Enclosed by [ and ]
Every String in the list is enclosed by quotation marks.
So your raw json (as a String) is equal to '{"text":["String_A", "String_B"], "id":"en-es"}'
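The difference can be checked directly. This is a small sketch using three of the lines from the file in the question; the helper name is_valid_json is hypothetical. It builds the body both ways and tests whether each result parses as JSON.

```python
import json

lines = ["This is message one.", "this is 2.", "this is three."]

# Hand-built body: str() renders the list with single quotes.
broken = '{"text":' + str(lines) + ', "id":"en-es"}'

# Body built by json.dumps: always double-quoted, always valid JSON.
valid = json.dumps({"text": lines, "id": "en-es"})


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(broken), is_valid_json(valid))
```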

curl post request failing in the presence of special characters

Ok, I know there are too many questions on this topic already; reading every one of those hasn't helped me solve my problem.
I have " hello'© " on my webpage. My objective is to get this content as JSON, strip the "hello", and write the remaining contents, i.e. "'©", back to the page.
I am using a CURL POST request to write back to the webpage. My code for getting the json is as follows:
request = urllib2.Request("http://XXXXXXXX.json")
user = 'xxx'
base64string = base64.encodestring('%s:%s' % (xxx, xxx))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request) #send URL request
newjson = json.loads(result.read().decode('utf-8'))
At this point, newjson is a unicode string. I discovered that my curl POST request works only with percent-encoding (like "%A3" for £).
What is the best way to do this? The code I wrote is as follows:
encode_dict = {'!':'%21',
'"':'%22',
'#':'%24',
'$':'%25',
'&':'%26',
'*':'%2A',
'+':'%2B',
'#':'%40',
'^':'%5E',
'`':'%60',
'©':'\xa9',
'®':'%AE',
'™':'%99',
'£':'%A3'
}
for letter in text1:
print (letter)
for keyz, valz in encode_dict.iteritems():
if letter == keyz:
print(text1.replace(letter, valz))
path = "xxxx"
subprocess.Popen(['curl','-u', 'xxx:xxx', 'Content-Type: text/html','-X','POST','--data',"text="+text1, ""+path])
This code gives me an error saying " UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if letter == keyz:"
Is there a better way to do this?
The problem was with the encoding. result.read() returns a stream of bytes, which needs to be decoded to unicode using the decode() function. Then I replaced all non-ASCII characters by encoding the unicode with the ascii codec using encode('ascii','xmlcharrefreplace').
newjson = json.loads(result.read().decode('utf-8').encode("ascii","xmlcharrefreplace"))
Also, learning unicode basics helped me a great deal! This is an excellent tutorial.
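For the percent-encoding itself, a hand-written lookup table is not needed: the standard library can do it. This is a sketch in Python 3, where the function is urllib.parse.quote (in Python 2.7 the counterpart is urllib.quote, applied to an encoded byte string). Whether the server expects UTF-8 or Latin-1 bytes is an assumption to verify; the "%A3" for £ in the question suggests Latin-1.

```python
from urllib.parse import quote

text = "'\u00a9"  # the apostrophe and copyright sign left after stripping "hello"

# Default: percent-encode the UTF-8 bytes of every unsafe character.
utf8_encoded = quote(text, safe="")

# Alternative: percent-encode the Latin-1 bytes instead.
latin1_encoded = quote(text, safe="", encoding="latin-1")

print(utf8_encoded, latin1_encoded)
```

curl can also do the encoding itself through its --data-urlencode option, which would replace the manual step entirely.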
