curl post request failing in the presence of special characters - python

Ok, I know there are too many questions on this topic already; reading every one of those hasn't helped me solve my problem.
I have " hello'© " on my webpage. My objective is to get this content as JSON, strip the "hello", and write the remaining contents, i.e. "'©", back to the page.
I am using a curl POST request to write back to the webpage. My code for getting the JSON is as follows:
request = urllib2.Request("http://XXXXXXXX.json")
user = 'xxx'
base64string = base64.encodestring('%s:%s' % (xxx, xxx))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request) #send URL request
newjson = json.loads(result.read().decode('utf-8'))
At this point, my newres is a unicode string. I discovered that my curl POST request works only with percent-encoding (like "%A3" for £).
What is the best way to do this? The code I wrote is as follows:
encode_dict = {'!':'%21',
'"':'%22',
'#':'%24',
'$':'%25',
'&':'%26',
'*':'%2A',
'+':'%2B',
'#':'%40',
'^':'%5E',
'`':'%60',
'©':'\xa9',
'®':'%AE',
'™':'%99',
'£':'%A3'
}
for letter in text1:
    print (letter)
    for keyz, valz in encode_dict.iteritems():
        if letter == keyz:
            print(text1.replace(letter, valz))
path = "xxxx"
subprocess.Popen(['curl','-u', 'xxx:xxx', 'Content-Type: text/html','-X','POST','--data',"text="+text1, ""+path])
This code gives me an error saying " UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if letter == keyz:"
Is there a better way to do this?

The problem was with the encoding. result.read() returns a stream of bytes, which needs to be decoded to unicode using the decode() function. Then I replaced all non-ASCII characters by encoding the unicode with the ASCII codec, using encode('ascii','xmlcharrefreplace').
newjson = json.loads(result.read().decode('utf-8').encode("ascii","xmlcharrefreplace"))
Also, learning unicode basics helped me a great deal! This is an excellent tutorial.
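As a rough illustration of the two encodings discussed in this thread, here is a minimal Python 3 sketch (the code above is Python 2). It shows what xmlcharrefreplace produces for the sample text and, for comparison, what percent-encoding looks like. Using urllib.parse.quote is an assumption about what the receiving endpoint might accept; it is not something the original post used.

```python
from urllib.parse import quote

text = "'©"  # what remains after stripping "hello" from the sample

# Replace every non-ASCII character with an XML numeric character reference.
ascii_safe = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Alternative (assumption): percent-encode the UTF-8 bytes of the text.
percent_utf8 = quote(text, safe="")

# The "%A3" style from the question corresponds to Latin-1 bytes, not UTF-8.
pound_latin1 = quote("£", encoding="latin-1")
pound_utf8 = quote("£")

print(ascii_safe)
print(percent_utf8)
print(pound_latin1, pound_utf8)
```

Which of these the server expects depends on how it decodes the request body, so it is worth checking against the service before settling on one.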

Related

Validating webhook with sha256 hmac using PHP and Python

I am working with webhooks from Bold Commerce, which are validated using a hash of the timestamp and the body of the webhook, with a secret key as the signing key. The headers from the webhook look like this:
X-Bold-Signature: 06cc9aab9fd856bdc326f21d54a23e62441adb5966182e784db47ab4f2568231
timestamp: 1556410547
Content-Type: application/json
charset: utf-8
According to their documentation, the hash is built like so (in PHP):
$now = time(); // current unix timestamp
$json = json_encode($payload, JSON_FORCE_OBJECT);
$signature = hash_hmac('sha256', $now.'.'.$json, $signingKey);
I am trying to recreate the same hash using Python, and I am always getting the wrong value for the hash. I've tried several combinations, with and without base64 encoding. In Python 3, a bytes object is expected for the hmac, so I need to encode everything before I can compare it. At this point my code looks like this:
json_loaded = json.loads(request.body)
json_dumped = json.dumps(json_loaded)
# if I dont load and then dump the json, the message has escaped \n characters in it
message = timestamp + '.' + json_dumped
# => 1556410547.{"event_type" : "order.created", "date": "2020-06-08".....}
hash = hmac.new(secret.encode(), message.encode(), hashlib.sha256)
hex_digest = hash.hexdigest()
# => 3e4520c869552a282ed29b6523eecbd591fc19d1a3f9e4933e03ae8ce3f77bd4
# hmac_to_verify = 06cc9aab9fd856bdc326f21d54a23e62441adb5966182e784db47ab4f2568231
return hmac.compare_digest(hex_digest, hmac_to_verify)
I'm not sure what I am doing wrong. For the other webhooks I am validating, I used base64 encoding, but it seems that hasn't been used on the PHP side here. I am not so familiar with PHP, so maybe there is something I've missed in how they built the original hash. There could be complications coming from the fact that I have to convert back and forth between byte arrays and strings; maybe I am using the wrong encoding for that? Please someone help, I feel like I've tried every combination and am at a loss.
EDIT : Tried this solution by leaving the body without encoding it in json and it still fails :
print(type(timestamp))
# => <class 'str'>
print(type(body))
# => <class 'bytes'>
# cant concatenate bytes to string...
message = timestamp.encode('utf-8') + b'.' + body
# => b'1556410547.{\n "event_type": "order.created",\n "event_time": "2020-06-08 11:16:04",\n ...
hash = hmac.new(secret.encode(), message, hashlib.sha256)
hex_digest = hash.hexdigest()
# ...etc
EDIT EDIT :
Actually it is working in production ! Thanks to the solution described above (concatenating everything as bytes). My Postman request with the faked webhook was still failing, but that's probably because of how I copied and pasted the webhook data from my heroku logs :) .
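For reference, the bytes-only approach that ended up working can be sketched as a small Python 3 helper. The function name verify_signature and the sample secret, timestamp, and body are made up for illustration; only the "timestamp, dot, raw body" message layout comes from the PHP snippet above.

```python
import hashlib
import hmac


def verify_signature(secret: str, timestamp: str, raw_body: bytes, expected_hex: str) -> bool:
    """Return True if expected_hex is the HMAC-SHA256 of b"<timestamp>.<raw_body>"."""
    # Sign the body bytes exactly as received; re-serializing the JSON
    # can change whitespace and therefore the digest.
    message = timestamp.encode("utf-8") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest, expected_hex)


# Made-up values for illustration only.
secret = "not-a-real-secret"
timestamp = "1556410547"
body = b'{\n  "event_type": "order.created"\n}'
good = hmac.new(secret.encode(), timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()

same_bytes = verify_signature(secret, timestamp, body, good)
altered_whitespace = verify_signature(secret, timestamp, body.replace(b"\n", b""), good)
print(same_bytes, altered_whitespace)
```

The second call shows why a copied-and-pasted body can fail: any change to the bytes, even whitespace, produces a different digest.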

How to return this valid json data in Python?

I tested using Python to translate a curl to get some data.
import requests
import json
username="abc"
password="123"
headers = {
'Content-Type': 'application/json',
}
params = (
('version', '2017-05-01'),
)
data = '{"text":["This is message one."], "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
The above works fine. It returns json data.
It seems ["This is message one."] is a list. I want to use a variable that loads a file to replace this list.
I tried:
with open(f,"r",encoding='utf-8') as fp:
    file_in_list=fp.read().splitlines()
toStr=str(file_in_list)
data = '{"text":'+toStr+', "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
But it returned error below.
{
"code" : 400,
"error" : "Mapping error, invalid JSON"
}
Can you help? How can I have valid response.text?
Thanks.
update:
The content of f contains only five lines below:
This is message one.
this is 2.
this is three.
this is four.
this is five.
The reason your existing code fails is that str applied to a list of strings will only rarely give you valid JSON. They're not intended to do the same thing. JSON only allows double-quoted strings; Python allows both single- and double-quoted strings. And, unless your strings all happen to include ' characters, Python will render them with single quotes:
>>> print(["abc'def"]) # gives you valid JSON, but only by accident
["abc'def"]
>>> print(["abc"]) # does not give you valid JSON
['abc']
If you want to get the valid JSON encoding of a list of strings, don't try to trick str into giving you valid JSON by accident, just use the json module:
toStr = json.dumps(file_in_list)
But, even more simply, you shouldn't be trying to figure out how to construct JSON strings in the first place. Just create a dict and json.dumps the whole thing:
data = {"text": file_in_list, "id": "en-es"}
data_str = json.dumps(data)
Being able to do this is pretty much the whole point of JSON: it's a simple way to automatically serialize all of the types that are common to all the major scripting languages.
Or, even better, let requests do it for you by passing a json argument instead of a data argument:
data = {"text": file_in_list, "id": "en-es"}
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, json=data, auth=(username, password))
This also automatically takes care of setting the Content-Type header to application/json for you, so you no longer have to set it by hand. Many servers will accept JSON input without that header, but some will reject the request.
For more details, see the section More complicated POST requests in the requests docs. But there really aren't many more details.
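To make the difference concrete, here is a small sketch that builds the body both ways and checks whether each result parses as JSON. The helper is_valid_json is a hypothetical name used only for this illustration, and the list holds the first two lines of the sample file.

```python
import json

file_in_list = ["This is message one.", "this is 2."]

# str() gives Python's repr of the list, which uses single quotes here.
via_str = '{"text":' + str(file_in_list) + ', "id":"en-es"}'

# json.dumps() serializes the whole dict according to the JSON rules.
via_json = json.dumps({"text": file_in_list, "id": "en-es"})


def is_valid_json(candidate: str) -> bool:
    """Return True if candidate parses as JSON."""
    try:
        json.loads(candidate)
        return True
    except ValueError:
        return False


print(via_str, is_valid_json(via_str))
print(via_json, is_valid_json(via_json))
```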
tl;dr
toStr = json.dumps(file_in_list)
Explanation
Assuming your file contains something like
String_A
String_B
You need to ensure that toStr is:
Enclosed by [ and ]
Every string in the list is enclosed by double quotation marks.
So your raw json (as a String) is equal to '{"text":["String_A", "String_B"], "id":"en-es"}'

Microsoft Translator gives HTTP Error 400 only with certain strings

I am translating a large number of strings and using urllib2 to send requests to the API. My program will run fine, but I always get HTTP Error 400 when translating certain strings in particular. Each request I make is exactly the same, except for the text parameter, so I think it must be the text somehow causing the request to be malformed. Here are two strings that I know of that always cause this error:
#monaeltahawy hatla2eh fel jym bytmrn wla 3arf en fe 7aga bt7sl :d
and
#yaratambash ta3aly a3zemek 3l fetar
for example.
I know for certain that it isn't the "#" character causing the error, or the fact that "#" is at the front of the string. The API has processed strings with these attributes just fine before.
It also is not the nonsense words in these strings causing issues, because the API has processed nonsense words fine before as well. It just returns the same string that I sent to it.
Here is the code where the error seems to be coming from:
tweet = tweet.encode("utf-8")
to = "en"
translate_params = { 'text' : tweet, 'to' : to }
request = urllib2.Request('http://api.microsofttranslator.com/v2/Http.svc/Translate?' + urllib.urlencode(translate_params))
request.add_header('Authorization', 'Bearer '+ self.access_token)
response = urllib2.urlopen(request)
# Removes XML tags to return only the translated text
response_text = ET.fromstring(response.read())
response_text = ET.tostring(response_text, encoding = 'utf8', method = 'text')
return response_text
I am running Python 2.7 in Eclipse 4.3.2.
Any insight or suggestions would be very much appreciated.
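One way to rule the "#" in or out is to look at the query string that actually gets sent. The sketch below uses Python 3's urllib.parse.urlencode (the code above uses the Python 2 urllib.urlencode) on one of the failing strings. It only inspects the encoding and does not call the API, so it says nothing about how the service treats the text.

```python
from urllib.parse import urlencode

tweet = "#yaratambash ta3aly a3zemek 3l fetar"
translate_params = {"text": tweet, "to": "en"}

# urlencode percent-escapes reserved characters and turns spaces into "+".
query = urlencode(translate_params)
print(query)
```

If the printed query shows "#" escaped as %23, the URL itself is well-formed and the cause is more likely on the service side.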

BOM in server response screws up json parsing

I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.
import urllib2
import json
url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}
req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()
data = json.loads(response) # <-- Crashes
The last line throws an exception:
ValueError: No JSON object could be decoded
When I look at response, I see valid JSON, but the first few characters are a BOM:
>>> response
'\xef\xbb\xbf[\r\n {\r\n ... Valid JSON here
So, if I manually strip out the first three bytes:
data = json.loads(response[3::])
Everything works and response is turned into a dictionary.
My Question:
It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib or the json library to let it know this is a UTF8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.
You should probably yell at whoever's running this service, because a BOM on UTF-8 text makes no sense. The BOM exists to disambiguate byte order, and UTF-8 has no byte order to disambiguate: it is defined as a sequence of single bytes.
That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig.
>>> '\xef\xbb\xbffoo'.decode('utf-8-sig')
u'foo'
So you just need:
data = json.loads(response.decode('utf-8-sig'))
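The same idea carries over to Python 3, where the response body is a bytes object. A minimal sketch, with a made-up response body standing in for the server's reply:

```python
import json

# Made-up response: a UTF-8 BOM followed by a small JSON array.
response = b'\xef\xbb\xbf[\r\n  {"color": "red"}\r\n]'

# utf-8-sig strips a leading BOM if one is present and otherwise
# behaves like plain utf-8, so it is safe to use in both cases.
data = json.loads(response.decode("utf-8-sig"))
without_bom = json.loads(b"[1, 2]".decode("utf-8-sig"))

print(data)
print(without_bom)
```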
In case I'm not the only one who experienced the same problem, but is using requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:
import requests
r = requests.get(url, params=my_dict, auth=(user, password)) # "pass" is a reserved word
print(r.headers['content-type']) # 'application/json; charset=utf8'
if r.text[0] == u'\ufeff': # bytes \xef\xbb\xbf in utf-8 encoding
    r.encoding = 'utf-8-sig'
print(r.json())
Since I lack enough reputation for a comment, I'll write an answer instead.
I usually encounter that problem when I need to leave the underlying Stream of a StreamWriter open. However, the overload that has the option to leave the underlying Stream open needs an encoding (which will be UTF8 in most cases), here's how to do it without emitting the BOM.
/* Since Encoding.UTF8 (the one you'd normally use in those cases) **emits**
* the BOM, use whats below instead!
*/
// UTF8Encoding has an overload which enables / disables BOMs in the output
UTF8Encoding encoding = new UTF8Encoding(false);
using (MemoryStream ms = new MemoryStream())
using (StreamWriter sw = new StreamWriter(ms, encoding, 4096, true))
using (JsonTextWriter jtw = new JsonTextWriter(sw))
{
    serializer.Serialize(jtw, myObject);
}

Python: email get_payload decode fails when hitting equal sign?

Running into strangeness with get_payload: it seems to crap out when it sees an equal sign in the message it's decoding. Here's code that displays the error:
import email
data = file('testmessage.txt').read()
msg = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload
And here's a sample message: test message.
The message is printed only until the first "=" . The rest is omitted. Anybody know what's going on?
The same script with "decode=False" returns the full message, so it appears the decode is unhappy with the equal sign.
This is under Python 2.5 .
You have a line endings problem. The body of your test message uses bare carriage returns (\r) without newlines (\n). If you fix up the line endings before parsing the email, it all works:
import email, re
data = file('testmessage.txt').read()
data = re.sub(r'\r(?!\n)', '\r\n', data) # Bare \r becomes \r\n
msg = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload
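For readers on Python 3, the same normalization can be sketched as below. The helper name normalize_line_endings and the sample message are made up for illustration; the regular expression is the one from the answer above. Note that get_payload(decode=True) returns bytes in Python 3.

```python
import email
import re


def normalize_line_endings(raw: str) -> str:
    """Turn every bare carriage return into CRLF, leaving existing CRLF pairs alone."""
    return re.sub(r"\r(?!\n)", "\r\n", raw)


# Made-up message that uses bare \r line endings and a quoted-printable body.
raw = (
    "Content-Type: text/plain; charset=us-ascii\r"
    "Content-Transfer-Encoding: quoted-printable\r"
    "\r"
    "price =3D 5\r"
    "second line\r"
)

fixed = normalize_line_endings(raw)
msg = email.message_from_string(fixed)
payload = msg.get_payload(decode=True)  # bytes, with =3D decoded to "="
print(payload)
```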
