My code works fine for English text, but fails for a Russian search_text. How can I fix it?
Error text
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 41-46: Body ('Москва') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
My code
import requests
# search_text = "London" # OK: for english text
search_text = "Москва" # ERROR: 'latin-1' codec can't encode characters in position 41-46: Body ('Москва')
headers = {
'cookie': 'bci=6040686626671285074; _statid=a741e249-8adb-4c9a-8344-6e7e8360700a; viewport=762; _hd=h; tmr_lvid=ea50ffe34e269b16d061756e9a17b263; tmr_lvidTS=1609852383671; AUTHCODE=VCmGBS9d9sIxDnxN-hzApvPxPoLNADWCZLYyW8JOTcolv2dJjwH7ALYd8dNP9ljxZZuLvoKsDXgozEUt-PjSwXYEDt4syizx1I2LS58gb49kCFae-5uIap--mtLsff2ZqGbFqK5r7buboZ0_3; JSESSIONID=adca48748b8f0c58a926f5e4948f42c0c0aa9463798a9240.1f3566ed; LASTSRV=ok.ru; msg_conf=2468555756792551; TZ=6; _flashVersion=0; CDN=; nbp=; tmr_detect=0%7C1609852395541; cudr=0; klos=0; tmr_reqNum=4; TZD=6.200; TD=200',
}
data = '''{\n "id": 24,\n "parameters": {\n "query": "''' + search_text + '''"\n }\n}'''
response = requests.post('https://ok.ru/web-api/v2/search/suggestCities', headers=headers, data=data)
json_data = response.json()
print(json_data['result'][0]['id'])
I tried
city_name = city_name.encode('utf-8')
but received TypeError: must be str, not bytes
Try adding this line after you create the data variable and before you post the request:
data = data.encode()  # produces a bytes object encoded as UTF-8 (the default)
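A minimal sketch of the same idea, assuming the endpoint accepts a UTF-8 encoded JSON body. It builds the payload with json.dumps instead of string concatenation, so quotes inside search_text are escaped correctly, and then encodes it explicitly. The helper name build_body is hypothetical, and no request is actually sent here.

```python
import json


def build_body(search_text):
    """Serialize the query as JSON and encode it as UTF-8 bytes."""
    payload = {"id": 24, "parameters": {"query": search_text}}
    # ensure_ascii=False keeps the Cyrillic characters as-is instead of \uXXXX escapes
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = build_body("Москва")

# body is a bytes object, so requests would send it unchanged
# instead of trying to encode a str as Latin-1:
# response = requests.post(url, headers=headers, data=body)
print(type(body).__name__)
```

requests.post also accepts a json= argument that serializes a dict for you; whether this particular server treats both forms the same is not something the original post confirms.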
In my case, this worked:
import json
import http.client
f = open('YOURFILE.json', encoding="utf8")
data = json.load(f)
message = json.dumps(data['MESSAGE'],ensure_ascii=False).encode('utf-8').decode('unicode-escape')
this is json file content:
{
"MESSAGE": "یو تی اف 8 کاراکتر"
}
I need to send special characters, such as accented characters with diacritics (e.g. o-acute, ó), via an API.
This is my test code:
import string
import http.client
import datetime
import json
def apiSendFarmacia(idatencion,articulo,deviceid):
    ##API PAYLOAD
    now = datetime.datetime.now()
    conn = http.client.HTTPSConnection("apimocha.com")
    payload = json.dumps({
        "idatencion": idatencion,
        "articulo": articulo,
        "timestamp": str(now),
        "deviceId": deviceid
    }).encode("utf-8")
    headers = {
        'Content-Type': 'application/json'
    }
    conn.request("POST"
        ,"/xxxx/api2"#"/my/api/path" #"/alexa"
        , payload
        , headers)
    res = conn.getresponse()
    httpcode = res.status
    data = res.read()
    return httpcode#, data.decode("utf-8")
    ##API PAYLOAD
When I execute the function with some special characters:
apiSendFarmacia(2222,"solución",2222)
the mock API receives the following JSON payload, with \u00f3 instead of ó:
{
"idatencion": 2222,
"articulo": "soluci\u00f3n",
"timestamp": "2022-12-07 14:52:24.878976",
"deviceId": 2222
}
I was expecting the special character with its accent-mark ó to show in the mock API.
If you print it, it appears as the special character:
>>> print('soluci\u00f3n')
solución
\u00f3 is the hexadecimal escape for the Unicode character 'LATIN SMALL LETTER O WITH ACUTE' (U+00F3).
The \ is an escape signal that tells the interpreter to treat the following character(s) specially. For example, \n is interpreted as a newline, \t as a tab, and \u00f3 as the Unicode character ó.
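To illustrate, here is a small sketch using only the standard json module. By default json.dumps escapes non-ASCII characters (ensure_ascii=True), which is where \u00f3 comes from; passing ensure_ascii=False keeps the ó literally. Both forms are valid JSON and decode to the same string. The record dict is just sample data.

```python
import json

record = {"articulo": "solución"}

escaped = json.dumps(record)                      # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)  # keeps ó as a literal character

print(escaped)  # {"articulo": "soluci\u00f3n"}
print(literal)  # {"articulo": "solución"}

# Either representation decodes back to the same Python string
same_after_decoding = json.loads(escaped) == json.loads(literal) == record
print(same_after_decoding)  # True
```

If the receiving side must show the literal ó in the raw body, json.dumps(..., ensure_ascii=False).encode("utf-8") is the variant to send; a JSON parser on the other end will see identical data either way.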
I am trying to convert the JSON at this URL: https://wtrl.racing/assets/js/miscttt170620.php?wtrlid=63, which I have saved in a file, to a CSV using this code:
import csv
import json
json_data_file = open('TTT json', 'r')
content = json.load(json_data_file)
csv_results = csv.writer(open("TTT_results.csv.csv", "w", newline=''))
for item in content:
    print(item)
    csv_results.writerow(item)
This returns: json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 489351 (char 489350), which is the '2' in this section of JSON "ll": 43.529}, {"aa":
I'm bemused as to why this would be.
It looks like the JSON is malformed. You need to ask whoever generated this JSON to fix it.
See these entries:
"cc": "Nick "Lionel" Berry(TriTalk)"
"cc": ""Sherpa" Dave (R&K Hyenas)"
The inner quotes are not properly escaped. The entries need to be:
"cc": "Nick \"Lionel\" Berry(TriTalk)"
"cc": "\"Sherpa\" Dave (R&K Hyenas)"
It looks like you are probably sending the wrong Accept header when requesting the data. Try downloading it while specifying that you want a JSON response rather than an HTML payload:
import requests
url = 'https://wtrl.racing/assets/js/miscttt170620.php?wtrlid=63'
headers = {'Accept': 'application/json'}
response = requests.get(url, headers=headers)
data = response.json()
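If the text still fails to parse, the position stored on the exception can be used to look at the offending region. The following is a minimal sketch using the standard json module; json_error_context is a hypothetical helper, and the sample strings are shortened stand-ins modelled on the entries quoted in this thread, not the real file.

```python
import json


def json_error_context(text, width=20):
    """Return the text around the position where parsing fails, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        start = max(exc.pos - width, 0)
        return text[start:exc.pos + width]
    return None


# Shortened samples (assumed for illustration)
broken = '[{"cc": "Nick "Lionel" Berry(TriTalk)"}]'
fixed = '[{"cc": "Nick \\"Lionel\\" Berry(TriTalk)"}]'

print(json_error_context(broken))  # text surrounding the unescaped quote
print(json_error_context(fixed))   # None
```

On the saved file, the same helper could be called with the file's contents to see what surrounds character 489350.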
I get a decoding error when trying to decode JSON data obtained from Wikidata. I've read it may be because I'm trying to decode bytes instead of UTF-8, but I've tried decoding to UTF-8 and can't find a way to do so. Here's my method (the parameter is a string and the return value is a boolean):
def es_enfermedad(candidato):
    url = 'https://query.wikidata.org/sparql'
    query = """
    SELECT ?item WHERE {
    ?item rdfs:label ?nombre.
    ?item wdt:P31 ?tipo.
    VALUES ?tipo {wd:Q12135}
    FILTER(LCASE(?nombre) = "%s"@en)
    }
    """ % (candidato)
    r = requests.get(url, params = {'format': 'json', 'query': query})
    data = r.json()
    return len(data['results']['bindings']) > 0
I'm using the requests library to execute an HTTP GET that returns a JSON response, and I get this error when the response string contains a Unicode character:
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)
When I execute the same HTTP request with Postman, the JSON output is:
{ "value": "VILLE D\u0019ANAUNIA" }
My python code is:
data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)
Can I remove or replace all Unicode chars before executing conversion with json.loads(...)?
It is likely caused by a RIGHT SINGLE QUOTATION MARK U+2019 (’). For reasons I cannot guess, the high-order byte has been dropped, leaving you with a control character, which must be escaped in a correct JSON string.
So the correct approach is to check exactly what the API returns. If it does return a '\u0019' control character, you should contact the API owner, because the problem lies there.
As a workaround, you can limit the problem in your own processing by filtering out non-ASCII and control characters:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127)) # filter out unwanted chars
json_data = json.loads(data)
You should get {'value': 'VILLE DANAUNIA'}
Alternatively, you can replace all unwanted characters with spaces:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)
You would get {'value': 'VILLE D ANAUNIA'}
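Both variants can be tried without a network call. The sketch below assumes the response text contains a raw U+0019 character, as described above; keep_printable_ascii is a hypothetical helper that wraps the same generator expression.

```python
import json

# Sample response text containing a raw U+0019 control character (assumed input)
raw = '{ "value": "VILLE D\x19ANAUNIA" }'


def keep_printable_ascii(text, replacement=""):
    """Replace every character outside the range 0x20..0x7E with `replacement`."""
    return "".join(ch if 0x20 <= ord(ch) < 127 else replacement for ch in text)


dropped = json.loads(keep_printable_ascii(raw))
spaced = json.loads(keep_printable_ascii(raw, " "))

print(dropped)  # {'value': 'VILLE DANAUNIA'}
print(spaced)   # {'value': 'VILLE D ANAUNIA'}
```

Note that this filter also removes legitimate non-ASCII characters such as é or ó, so it only suits data where losing them is acceptable.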
The code below works on python 2.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)
The code below works on python 3.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)
Output (as printed by Python 2.7):
{u'value': u'VILLE D\x19ANAUNIA'}
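The difference between the two versions comes from how the string literal is read. In Python 3, "\u0019" inside a normal string literal is already the raw control character, which the strict parser rejects; in a raw string it stays as the six-character JSON escape, which is valid. A small sketch of this, using only the standard json module:

```python
import json

# Normal literal: Python 3 turns \u0019 into the actual control character
raw_control = '{ "value": "VILLE D\u0019ANAUNIA" }'
# Raw literal: the backslash sequence survives and is a valid JSON escape
json_escape = r'{ "value": "VILLE D\u0019ANAUNIA" }'

try:
    json.loads(raw_control)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False  # "Invalid control character" in strict mode

lenient = json.loads(raw_control, strict=False)  # control characters allowed
escaped = json.loads(json_escape)                # parses in strict mode

print(strict_ok)  # False
print(lenient)    # {'value': 'VILLE D\x19ANAUNIA'}
print(lenient == escaped)  # True
```

So strict=False keeps the control character in the result rather than removing it; any cleanup of the value still has to happen afterwards.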
Another point: requests can decode a JSON response for you via the response's json() method:
r = requests.get('https://api.github.com/events')
r.json()
Here is my code:
def renren_get_sig(params):
    cat_params = ''.join([u'%s=%s'%(unicode(k), unicode(params[k])) for k in sorted(params)])
    sig = hashlib.md5(u"%s%s"%(unicode(cat_params), unicode(SEC_KEY))).hexdigest()
    return sig
The exception message is:
Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode characters in position 138-141: ordinal not in range(128)
The params dict has the following value:
params ={
'access_token':u'195036|6.3cf38700f.2592000.1347375600-462350295',
'action_link': u'http://wohenchun.xxx.com',
'action_name': u'\u6d4b\u8bd5\u4e00\u4e0b',
'api_key': u'8c0a2cded4f84bbba4328ccba22c3374',
'caption': u'\u7eaf\u6d01\u6307\u6570\u6d4b\u8bd5',
'description': u'\u4e16\u754c\u8fd9\u4e48\u4e71\uff0c\u88c5\u7eaf\u7ed9\u8c01\u770b\uff1f\u5230\u5e95\u4f60\u6709\u591a\u5355\u7eaf\uff0c\u8d85\u7ea7\u5185\u6db5\u7684\u4f60\uff0c\u6562\u4e0d\u6562\u6311\u6218\u8d85\u7ea7\u5185\u6db5\u7684\u9898\u76ee?\u4e0d\u7ba1\u4f60\u6d4b\u4e0d\u6d4b\uff0c\u53cd\u6b63\u6211\u662f\u6d4b\u4e86\uff01',
'format': u'JSON',
'image': u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
'message': u'\u5c3c\u?!! \u3010\u4f60\u96be\u9053\u6bd4\u6211\u66f4\u7eaf\u6d01\u4e48,\u6765\u6d4b\u8bd5\u4e00\u4e0b\u5427!\u4f20\u9001\u95e8 >> http://wohenchun.jiongceyan.com \u3011\r\n\t\t\t\t\t\t\t\t\t\t',
'method': u'feed.publishFeed',
'name': u'\u4eba\u4eba\u53f2\u4e0a\u6700\u706b\u7206\u6d4b\u8bd5\u4e4b\u5355\u7eaf\u6d4b\u8bd5',
'url': u'http://wohenchun.xxx.com',
'v': u'1.0'}
All the key-value pairs in params are Unicode objects. Why do I still get such an exception?
Thank you!
Unicode is the problem. Hashing algorithms are designed to work on bytes, not Unicode code points. So you must choose an encoding and encode your Unicode strings to byte strings before applying the hashing algorithm:
from hashlib import md5
str_to_hash = unicode_str.encode('utf-8')
md5(str_to_hash).hexdigest()
There was an issue about this problem in the Python tracker; see it for more information.
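For comparison, here is a minimal Python 3 sketch of the same rule (the thread itself uses Python 2, where the implicit ASCII conversion is what raises UnicodeEncodeError). In Python 3, hashlib rejects str outright with a TypeError, so the explicit encode step is mandatory. The sample text is arbitrary.

```python
import hashlib

text = "测试一下"

try:
    hashlib.md5(text)  # passing str instead of bytes
    accepted_str = True
except TypeError:
    accepted_str = False  # Python 3 requires bytes

# Choose an encoding explicitly, then hash the resulting bytes
digest = hashlib.md5(text.encode("utf-8")).hexdigest()

print(accepted_str)  # False
print(len(digest))   # 32
```

Whichever encoding is chosen, both sides computing the signature have to use the same one, since different encodings produce different bytes and therefore different digests.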
@Rostyslav has it right. Use byte strings with hashlib. May I also suggest declaring a source file encoding for readability? Check the message parameter: the original code had an error with \u?!! in the string, which I left out:
# coding: utf8
import hashlib
SEC_KEY = 'salt'
params = {
u'access_token' : u'195036|6.3cf38700f.2592000.1347375600-462350295',
u'action_link' : u'http://wohenchun.xxx.com',
u'action_name' : u'测试一下',
u'api_key' : u'8c0a2cded4f84bbba4328ccba22c3374',
u'caption' : u'纯洁指数测试',
u'description' : u'世界这么乱,装纯给谁看?到底你有多单纯,超级内涵的你,敢不敢挑战超级内涵的题目?不管你测不测,反正我是测了!',
u'format' : u'JSON',
u'image' : u'http://hdn.xnimg.cn/photos/hdn21/20120809/1440/h0dd1376.jpg',
u'message' : u'尼【你难道比我更纯洁么,来测试一下吧!传送门 >> http://wohenchun.jiongceyan.com 】\r\n\t\t\t\t\t\t\t\t\t\t',
u'method' : u'feed.publishFeed',
u'name' : u'人人史上最火爆测试之单纯测试',
u'url' : u'http://wohenchun.xxx.com',
u'v' : u'1.0'}
def renren_get_sig(params):
    data = u''.join(u'{0}={1}'.format(k,v) for k,v in sorted(params.items()))
    return hashlib.md5(data.encode('utf8') + SEC_KEY).hexdigest()
print renren_get_sig(params)
Output:
085b14d1384ba805d2d5d5e979913b27