Python UnicodeEncodeError, but I have encoded the parameters to UTF-8

Here is my code:
def renren_get_sig(params):
    cat_params = ''.join([u'%s=%s'%(unicode(k), unicode(params[k])) for k in sorted(params)])
    sig = hashlib.md5(u"%s%s"%(unicode(cat_params), unicode(SEC_KEY))).hexdigest()
    return sig
The exception message is:
Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode characters in position 138-141: ordinal not in range(128)
The value of the params dict is as follows:
params ={
'access_token':u'195036|6.3cf38700f.2592000.1347375600-462350295',
'action_link': u'http://wohenchun.xxx.com',
'action_name': u'\u6d4b\u8bd5\u4e00\u4e0b',
'api_key': u'8c0a2cded4f84bbba4328ccba22c3374',
'caption': u'\u7eaf\u6d01\u6307\u6570\u6d4b\u8bd5',
'description': u'\u4e16\u754c\u8fd9\u4e48\u4e71\uff0c\u88c5\u7eaf\u7ed9\u8c01\u770b\uff1f\u5230\u5e95\u4f60\u6709\u591a\u5355\u7eaf\uff0c\u8d85\u7ea7\u5185\u6db5\u7684\u4f60\uff0c\u6562\u4e0d\u6562\u6311\u6218\u8d85\u7ea7\u5185\u6db5\u7684\u9898\u76ee?\u4e0d\u7ba1\u4f60\u6d4b\u4e0d\u6d4b\uff0c\u53cd\u6b63\u6211\u662f\u6d4b\u4e86\uff01',
'format': u'JSON',
'image': u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
'message': u'\u5c3c\u?!! \u3010\u4f60\u96be\u9053\u6bd4\u6211\u66f4\u7eaf\u6d01\u4e48,\u6765\u6d4b\u8bd5\u4e00\u4e0b\u5427!\u4f20\u9001\u95e8 >> http://wohenchun.jiongceyan.com \u3011\r\n\t\t\t\t\t\t\t\t\t\t',
'method': u'feed.publishFeed',
'name': u'\u4eba\u4eba\u53f2\u4e0a\u6700\u706b\u7206\u6d4b\u8bd5\u4e4b\u5355\u7eaf\u6d4b\u8bd5',
'url': u'http://wohenchun.xxx.com',
'v': u'1.0'}
All the key-value pairs in params are Unicode objects. Why do I still get such an exception?
Thank you!

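Unicode is the problem. Hashing algorithms operate on bytes, not on Unicode code points. When you pass a unicode object to hashlib.md5 in Python 2, Python implicitly encodes it with the 'ascii' codec, which fails on any non-ASCII character. So you must choose an encoding and encode your unicode strings to byte strings before applying the hashing algorithm: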
from hashlib import md5
str_to_hash = unicode_str.encode('utf-8')
md5(str_to_hash).hexdigest()
There was an issue about this problem in the Python bug tracker; see it for more information.
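The difference between the two encodings can be sketched in Python 3, where the encoding step has to be written out explicitly. This is only an illustration: the sample string is the action_name value from the question, and the variable names are my own.

```python
import hashlib

text = "\u6d4b\u8bd5\u4e00\u4e0b"  # the action_name value from the question

# Encoding with ASCII fails; this is the step Python 2 attempted implicitly.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding with UTF-8 succeeds and yields bytes that md5 accepts.
digest = hashlib.md5(text.encode("utf-8")).hexdigest()

print(ascii_ok, len(digest))  # prints: False 32
```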

@Rostyslav has it right: use byte strings with hashlib. May I also suggest declaring a source file encoding so the strings stay readable? Also check the message parameter: the original code had an error, \u?!!, in that string, which I left out:
# coding: utf8
import hashlib
SEC_KEY = 'salt'
params = {
u'access_token' : u'195036|6.3cf38700f.2592000.1347375600-462350295',
u'action_link' : u'http://wohenchun.xxx.com',
u'action_name' : u'测试一下',
u'api_key' : u'8c0a2cded4f84bbba4328ccba22c3374',
u'caption' : u'纯洁指数测试',
u'description' : u'世界这么乱，装纯给谁看？到底你有多单纯，超级内涵的你，敢不敢挑战超级内涵的题目?不管你测不测，反正我是测了！',
u'format' : u'JSON',
u'image' : u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
u'message' : u'尼【你难道比我更纯洁么,来测试一下吧!传送门 >> http://wohenchun.jiongceyan.com 】\r\n\t\t\t\t\t\t\t\t\t\t',
u'method' : u'feed.publishFeed',
u'name' : u'人人史上最火爆测试之单纯测试',
u'url' : u'http://wohenchun.xxx.com',
u'v' : u'1.0'}
def renren_get_sig(params):
    data = u''.join(u'{0}={1}'.format(k,v) for k,v in sorted(params.items()))
    return hashlib.md5(data.encode('utf8') + SEC_KEY).hexdigest()
print renren_get_sig(params)
Output:
085b14d1384ba805d2d5d5e979913b27
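For readers on Python 3, a rough port of the function above might look like the sketch below. It assumes the secret is a text string that should be encoded with the same UTF-8 encoding as the parameters; the `secret` argument and the 'salt' placeholder are illustrative, not part of the Renren API.

```python
import hashlib

SEC_KEY = "salt"  # placeholder secret, as in the answer above


def renren_get_sig(params, secret=SEC_KEY):
    """Join sorted key=value pairs, append the secret, and hash the UTF-8 bytes."""
    data = "".join("{0}={1}".format(k, v) for k, v in sorted(params.items()))
    # In Python 3, str must be encoded explicitly before it reaches md5.
    return hashlib.md5((data + secret).encode("utf-8")).hexdigest()


sig = renren_get_sig({"v": "1.0", "action_name": "测试一下"})
print(sig)
```

Because the pairs are sorted by key before joining, the signature does not depend on the insertion order of the dict.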

Related

'latin-1' codec can't encode characters

My code works fine for English text, but doesn't work for a Russian search_text. How can I fix it?
Error text
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 41-46: Body ('Москва') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
My code
import requests
# search_text = "London" # OK: for english text
search_text = "Москва" # ERROR: 'latin-1' codec can't encode characters in position 41-46: Body ('Москва')
headers = {
'cookie': 'bci=6040686626671285074; _statid=a741e249-8adb-4c9a-8344-6e7e8360700a; viewport=762; _hd=h; tmr_lvid=ea50ffe34e269b16d061756e9a17b263; tmr_lvidTS=1609852383671; AUTHCODE=VCmGBS9d9sIxDnxN-hzApvPxPoLNADWCZLYyW8JOTcolv2dJjwH7ALYd8dNP9ljxZZuLvoKsDXgozEUt-PjSwXYEDt4syizx1I2LS58gb49kCFae-5uIap--mtLsff2ZqGbFqK5r7buboZ0_3; JSESSIONID=adca48748b8f0c58a926f5e4948f42c0c0aa9463798a9240.1f3566ed; LASTSRV=ok.ru; msg_conf=2468555756792551; TZ=6; _flashVersion=0; CDN=; nbp=; tmr_detect=0%7C1609852395541; cudr=0; klos=0; tmr_reqNum=4; TZD=6.200; TD=200',
}
data = '''{\n "id": 24,\n "parameters": {\n "query": "''' + search_text + '''"\n }\n}'''
response = requests.post('https://ok.ru/web-api/v2/search/suggestCities', headers=headers, data=data)
json_data = response.json()
print(json_data['result'][0]['id'])
I tried
city_name = city_name.encode('utf-8')
but received TypeError: must be str, not bytes

Try adding this after the line where you create the data variable, before you post the request:
data=data.encode() #will produce bytes object encoded with utf-8
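As a variation on the same idea, the body could be built with json.dumps instead of string concatenation and then encoded to UTF-8 bytes. This is a sketch under assumptions: the helper name build_search_body is hypothetical, and the payload layout simply mirrors the string in the question.

```python
import json


def build_search_body(search_text):
    """Return the request body as UTF-8 bytes (hypothetical helper)."""
    payload = {"id": 24, "parameters": {"query": search_text}}
    # ensure_ascii=False keeps 'Москва' as real characters instead of \uXXXX escapes;
    # encoding afterwards gives requests a bytes body, so it never falls back to Latin-1.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = build_search_body("Москва")
print(type(body).__name__, len(body))
```

The resulting bytes can be passed as data=body to requests.post. Building the payload with json.dumps also escapes any quotes inside search_text, which plain concatenation does not.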

In my case, this worked:
import json
import http.client
f = open('YOURFILE.json', encoding="utf8")
data = json.load(f)
message = json.dumps(data['MESSAGE'],ensure_ascii=False).encode('utf-8').decode('unicode-escape')
This is the JSON file content:
{
"MESSAGE": "یو تی اف 8 کاراکتر"
}

TypeError: string indices must be integers - Python JSON

I'm getting TypeError: string indices must be integers on the line that assigns my_object (the third line of the function body below):
def get_note_retrieval_body(event):
    sns_body = event["Records"][0]["Sns"]
    message = json.loads(sns_body["Message"])
    my_object = message["data"]["getNotes"]
    return my_object
I get the following (account numbers etc. are marked with XXX) when I use str(event) and then format it as JSON, so this shows what the event looks like:
{
'Records': [{
'EventSource': 'aws:sns',
'EventVersion': '1.0',
'EventSubscriptionArn': 'arn:aws:sns:us-east-1:XXXXXX:topic-name-sandbox:7XXXX',
'Sns': {
'Type': 'Notification',
'MessageId': 'd1074c88-ae21-52b6-8a75-1b07d766cfdd',
'TopicArn': 'arn:aws:sns:us-east-1:XXXXXXX:topic-name',
'Subject': None,
'Message': '"{\\"data\\": {\\"getNotes\\": {\\"claimNumber\\": \\"AAAB09000010\\", \\"dateEntered\\": \\"2010-04-22T08:03:53\\",\\"categoryCode\\": \\"fdf49\\",\\"subCategoryCode\\": \\"ATT\\", \\"fileNoteTextDetails\\": [{\\"fileNoteText\\": {\\"fileNoteID\\": \\"112B40FE42934055\\", \\"noteText\\": \\"Send Acknowledgement Letter to Claimant\\", \\"authorID\\": \\"0\\"}, \\"fileNoteAttachments\\": [{\\"attachment\\": {\\"fileName\\": \\"F70F880879D35FC4.doc\\", \\"fileExtension\\": \\".URL\\", \\"dateCreated\\": \\"2010-04-22T08:59:57\\", \\"createdBy\\": \\"CLONER\\", \\"dateUpdated\\": \\"2020-07-30T08:36:19.1903051\\", \\"updatedBy\\": \\"EVERYONE\\"}}]}], \\"fileNoteExtendedEntityData\\": {\\"dateOnDocument\\": \\"2010-04-22T08:59:57\\", \\"serviceDateFrom\\": \\"2010-04-22T08:59:57\\", \\"serviceDateThrough\\": \\"2010-04-22T08:59:57\\", \\"author\\": \\"n0000000\\"}}}}"',
'Timestamp': '2020-07-20T10:50:47.850Z',
'SignatureVersion': '1',
'Signature': 'XXXXXX',
'SigningCertUrl': 'https://sns.us-east-1.amazonaws.com/XXXcert.pem',
'UnsubscribeUrl': 'https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:XXXXXX:topic-name-sandbox:7XX',
'MessageAttributes': {}
}
}]
}

When you run json.loads on sns_body["Message"], you still get back a string. You can run json.loads twice, and that should solve the issue.
sns_body["Message"] is JSON-encoded twice: the outer single quotes are just the Python string delimiters, and inside them is a JSON string (in double quotes) whose content is itself JSON text. So the first json.loads returns a str, and a second json.loads decodes that string into a dictionary.
def get_note_retrieval_body(event):
    sns_body = event["Records"][0]["Sns"]
    message = json.loads(json.loads(sns_body["Message"]))
    my_object = message["data"]["getNotes"]
    return my_object
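The double encoding can be reproduced without SNS. The sketch below builds a small double-encoded message and decodes it; the helper decode_message is hypothetical and only calls json.loads a second time when the first pass still yields a string.

```python
import json

inner = {"data": {"getNotes": {"claimNumber": "AAAB09000010"}}}

# Encoding twice produces a JSON string whose content is JSON text,
# the same shape as the 'Message' field in the event above.
double_encoded = json.dumps(json.dumps(inner))


def decode_message(raw):
    """Decode a message that may have been JSON-encoded once or twice."""
    value = json.loads(raw)
    if isinstance(value, str):  # still a string: it was encoded twice
        value = json.loads(value)
    return value


first_pass = json.loads(double_encoded)
print(type(first_pass).__name__)  # prints: str
print(decode_message(double_encoded)["data"]["getNotes"]["claimNumber"])  # prints: AAAB09000010
```

Checking the type after the first pass keeps the function working if the publisher later stops double-encoding the message.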

JSON decoding error when querying Wikidata

I get a decoding error when trying to decode the JSON data I obtain from Wikidata. I've read it may be because I'm trying to decode bytes instead of UTF-8, but I've tried to decode into UTF-8 and can't find a way to do so. Here's my function (the parameter is a string and the return value is a boolean):
def es_enfermedad(candidato):
    url = 'https://query.wikidata.org/sparql'
    query = """
    SELECT ?item WHERE {
    ?item rdfs:label ?nombre.
    ?item wdt:P31 ?tipo.
    VALUES ?tipo {wd:Q12135}
    FILTER(LCASE(?nombre) = "%s"@en)
    }
    """ % (candidato)
    r = requests.get(url, params = {'format': 'json', 'query': query})
    data = r.json()
    return len(data['results']['bindings']) > 0

Python JSON decoder error with unicode characters in request content

I'm using the requests library to execute an HTTP GET that returns a JSON response, and I get this error when the response string contains a Unicode character:
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)
When I execute the same HTTP request with Postman, the JSON output is:
{ "value": "VILLE D\u0019ANAUNIA" }
My python code is:
data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)
Can I remove or replace all Unicode chars before executing conversion with json.loads(...)?

It is likely to be caused by a RIGHT SINGLE QUOTATION MARK U+2019 (’). For reasons I cannot guess, the high order byte has been dropped leaving you with a control character which should be escaped in a correct JSON string.
So the correct approach is to check exactly what the API returns. If it does return a '\u0019' control character, you should contact the API owner, because the problem lies there.
As a workaround, you can limit the problem on your side by filtering out non-ASCII and control characters:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127)) # filter out unwanted chars
json_data = json.loads(data)
You should get {'value': 'VILLE DANAUNIA'}
Alternatively, you can replace all unwanted characters with spaces:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)
You would get {'value': 'VILLE D ANAUNIA'}
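A narrower variant of the same workaround, sketched here as an assumption rather than as part of the answer above, removes only the control characters (code points below 0x20) so that legitimate non-ASCII text survives. The helper name strip_control_chars is hypothetical, and the sample string stands in for the API response.

```python
import json


def strip_control_chars(text, replacement=""):
    """Replace code points below 0x20 with `replacement`; keep everything else."""
    return "".join(ch if ord(ch) >= 0x20 else replacement for ch in text)


# Stand-in for the response text: \x19 is a raw control character inside the string.
raw = '{ "value": "VILLE D\x19ANAUNIA" }'

try:
    json.loads(raw)
    strict_failed = False
except json.JSONDecodeError:
    strict_failed = True  # the strict parser rejects the raw control character

removed = json.loads(strip_control_chars(raw))
spaced = json.loads(strip_control_chars(raw, " "))
print(strict_failed, removed, spaced)
```

Newlines and tabs between JSON tokens are also below 0x20, but they are insignificant whitespace there, so removing them does not change the parsed result.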

The code below works on Python 2.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)
The code below works on Python 3.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)
Output:
{u'value': u'VILLE D\x19ANAUNIA'}
Another point is that requests can parse the response body as JSON for you:
r = requests.get('https://api.github.com/events')
r.json()
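The reason the two Python versions behave differently can be sketched as follows (Python 3 only; the variable names are mine). In a Python 3 str literal, \u0019 becomes a real control character before the JSON parser ever sees it, whereas a raw string keeps the six-character escape, which is valid JSON.

```python
import json

# Raw string: the JSON text contains the escape sequence \u0019, which is valid JSON.
escaped = r'{ "value": "VILLE D\u0019ANAUNIA" }'

# Normal string: Python turns \u0019 into an actual control character,
# which the strict JSON parser rejects inside a string.
raw = '{ "value": "VILLE D\u0019ANAUNIA" }'

from_escaped = json.loads(escaped)

try:
    json.loads(raw)
    strict_failed = False
except json.JSONDecodeError:
    strict_failed = True

# strict=False tells the decoder to accept control characters inside strings.
lenient = json.loads(raw, strict=False)

print(strict_failed, from_escaped == lenient)  # prints: True True
```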

Control/space characters not allowed using MD5 in Django cache backend

I am using caching for my site. This gives the following error:
"Control/space characters not allowed (key="\xebw\x1b}\xae\xa3\xb8\x18\xc4\xb5\xce\x0c%\x13'\xed")".
The code which I am using is as follows:
def hash_key(key, key_prefix, version):
    new_key = '%s :%s :%s' % (key_prefix, version, key)
    if len(new_key) > 250:
        m = hashlib.md5()
        m.update(new_key)
        new_key = m.digest()
    return new_key
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'KEY_FUNCTION': hash_key,
    }
}

Try using m.hexdigest() instead of m.digest(). The data in the error message is 16 bytes, the length of the binary hash data. It appears you want the 32-character ASCII representation, which is what hexdigest provides.
Docs, for Python 3
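A minimal Python 3 sketch of the key function with hexdigest might look like this. Two things here are my assumptions rather than part of the answer: the key is encoded as UTF-8 before hashing (Python 3's hashlib only accepts bytes), and the spaces around the separators are dropped, since the error message indicates that keys containing spaces are rejected as well.

```python
import hashlib

MAX_KEY_LENGTH = 250  # same threshold as the check in the question


def hash_key(key, key_prefix, version):
    # Assumed format: no spaces in the separators, so short keys stay valid too.
    new_key = "%s:%s:%s" % (key_prefix, version, key)
    if len(new_key) > MAX_KEY_LENGTH:
        # hexdigest() returns 32 printable hex characters,
        # unlike digest(), which returns 16 raw bytes.
        new_key = hashlib.md5(new_key.encode("utf-8")).hexdigest()
    return new_key


short_key = hash_key("k", "p", 1)
long_key = hash_key("k" * 300, "p", 1)
print(short_key, len(long_key))  # prints: p:1:k 32
```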
