I got a UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-4: character maps to <undefined> when using the esy.osmfilter package (version 1.0.7) to filter an OSM *.pbf file and then save it to a *.json file with the following code:
import os
from esy.osmfilter import Node, Way, Relation
from esy.osmfilter import run_filter
PBF_inputfile = os.path.join(os.getcwd(), 'liechtenstein-latest.osm.pbf')
JSON_outputfile = os.path.join(os.getcwd(), 'liechtenstein-latest_river.json')
prefilter = {Node: {}, Way: {'waterway': ['river', ], }, Relation: {}}
whitefilter = []
blackfilter = []
[Data, _] = run_filter('noname',
                       PBF_inputfile,
                       JSON_outputfile,
                       prefilter,
                       whitefilter,
                       blackfilter,
                       NewPreFilterData=True,
                       CreateElements=False,
                       LoadElements=False,
                       verbose=True)
print(len(Data['Node']))
print(len(Data['Relation']))
print(len(Data['Way']))
I followed the tutorial and used tags like {'waterway': ['stream', ], }, {'waterway': ['canal', ], }, {'waterway': ['dam', ], }, etc. in the prefilter, and they were all error-free. Then I found that the tag {'waterway': ['river', ], } causes the error mentioned above. I ran into the same situation with the Berlin data. Then I tried the Delaware data, which was error-free. So I thought it might be related to German words? My default encoding is 'utf-8'.
I believe this is a pure Windows bug. Please use esy-osmfilter on a Linux machine for the moment. This error results from an external library; however, I will fix it within the next few days.
This error is fixed in version 1.0.11.
I'm having an issue very similar to this post: pdfkit - python : 'str' object has no attribute decode
I'm running a Python script via a web app. I import pdfkit after installing it with pip3; the Python version is 3.6.
import pdfkit
def pdfkit(source, method):
    if method == "string":
        try:
            options = {
                'page-size': 'A4',
                'margin-top': '0.75in',
                'margin-right': '0.75in',
                'margin-bottom': '0.75in',
                'margin-left': '0.75in',
            }
            config = pdfkit.configuration(wkhtmltopdf=bytes("/usr/local/bin/wkhtmltopdf", 'utf8'))
            pdf = pdfkit.from_string(source, False, options=options, configuration=config)
            return pdf
        except Exception as e:
            return str(e)
    else:
        return "Error: Not yet Supported"
I installed wkhtmltopdf following these instructions for Ubuntu 20.04. It says the binaries are "headless" and can be executed from the command line. In fact, it does work when using the pdfkit wrapper from the command line, but when I try to run it via the Python script itself it doesn't.
One of the errors that I am getting is:
{
"pdf": "'function' object has no attribute 'configuration'"
}
among others, like the same for from_string if I remove the configuration.
Just wondering if I need to import some other modules or if I need a different version of wkhtmltopdf on the system.
Do I need to get a different binary, or follow the directions here. It is confusing because there are multiple ways to install that, the CLI, the .deb package and using the info on GitHub. Thanks.
wkhtmltopdf/packaging
wkhtmltopdf for UBUNTU
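For what it's worth, the "'function' object has no attribute 'configuration'" error is a name-shadowing problem: defining def pdfkit(...) after import pdfkit rebinds the name pdfkit from the module to the function. The effect can be reproduced with any module; a minimal sketch using json as a stand-in:

```python
import json  # stands in for pdfkit; any module shows the same effect

def json(source):  # rebinds the name 'json' from the module to this function
    return source

try:
    json.dumps({})  # the module attribute is no longer reachable
except AttributeError as e:
    print(e)  # 'function' object has no attribute 'dumps'
```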
Thanks. Posting an Answer because I want to follow up and elaborate a little bit.
That was one problem, the function name (Duh !):
Still playing around with it a bit, but mostly working. I just started using Python. Pretty nice really. Wish I had discovered that sooner.
def getpdf(source, method):
    if method == "string":
        try:
            options = {
                'page-size': 'A4',
                'margin-top': '0.75in',
                'margin-right': '0.75in',
                'margin-bottom': '0.75in',
                'margin-left': '0.75in',
            }
            config = pdfkit.configuration(wkhtmltopdf=bytes("/usr/local/bin/wkhtmltopdf", 'utf8'))
            pdf = pdfkit.from_string(source, False, options=options)
            return pdf
        except Exception as e:
            return str(e)
        # pdf = pdfkit.from_string(html, False)
        # return pdf
    else:
        return "Error: Not yet Supported"
def HTMLTOPDF(output, uri, **request):
    if request['method'] != 'POST':
        output.SendMethodNotAllowed('POST')
    else:
        query = json.loads(request['body'])
        pdf = getpdf(query['html'], query['method'])
        print(pdf)
        encoded = base64.b64encode(pdf)
        response = dict()
        response['pdf'] = pdf
        output.AnswerBuffer(encoded, 'text/html')
That is mostly working now; as it is, I'm just returning the encoded result as base64, or so it seems, since the response I get via the request looks like (JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/KQovQ3JlYXRvciAo), which I think is a PDF. I am a little interested in returning the result as JSON instead, because I'd like to pass something else back, like:
{"base64": "encoded", "status": "", "error": ""}, something like that.
I tried something like this:
encoded = base64.b64encode(pdf)
response = dict()
response['pdf'] = encoded
response['status'] = "status"
output.AnswerBuffer(json.dumps(response, indent=3), 'application/json')
and I get an error:
Object of type 'bytes' is not JSON serializable
Thanks though. At least I can get raw base64 back, which might be easier really.
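For the record, the "Object of type 'bytes' is not JSON serializable" error goes away if the base64 bytes are decoded to a str before serializing; a minimal sketch with hypothetical PDF bytes:

```python
import base64
import json

pdf = b"%PDF-1.4 ..."  # hypothetical raw PDF bytes from pdfkit
encoded = base64.b64encode(pdf).decode("ascii")  # bytes -> str, now JSON-safe

response = {"pdf": encoded, "status": "ok", "error": ""}
payload = json.dumps(response, indent=3)

# The client can recover the original bytes:
assert base64.b64decode(json.loads(payload)["pdf"]) == pdf
```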
I am running into this error when I use options(FromCache()) with SQLAlchemy running on Python 3.6.5, dogpile.cache==0.7.1 and SQLAlchemy==1.3.2:
UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 744: ordinal not in range(128)
I figured out it's because of the trademark symbol in "BrandX®".
Example:
vendors = ['BrandX®', 'BrandY Inc.']
engine = create_engine(os.getenv('DEV_DATABASE_URL'), client_encoding='utf-8')
Session = scoped_session(sessionmaker(bind=engine, autoflush=False))
store_id = 123
db = Session()
q = db.query(Order).join(Product) \
    .options(FromCache()) \
    .filter(Order.store_id == store_id)
if vendors:
    clauses = []
    for v in vendors:
        clauses.append(Product.vendor == v)
    q = q.filter(or_(*clauses))
return q.all()
I tried changing the vendor encoding to 'utf-8' and 'ascii', and it's not working. I'd appreciate any help.
Ok, after playing around with encoding to no avail, I figured out the error is actually due to the caching. Specifically, the .options(FromCache()) is causing the problem.
I traced the error to a function called md5_key_mangler, and here's the function.
def md5_key_mangler(key):
"""Receive cache keys as long concatenated strings;
distill them into an md5 hash.
"""
return md5(key.encode("ascii")).hexdigest()
Full documentation from Sqlalchemy around dogpile caching.
It appears to be this line
md5(key.encode("ascii")).hexdigest()
that is causing the problem.
I was then able to go into the file containing my dogpile_caching.environment which I got from the attached link and change the key.encode to utf-8.
md5(key.encode("utf-8")).hexdigest()
And that solved the error. Hope that helps!
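The fixed key mangler can be exercised on its own; a minimal sketch (the cache key string is hypothetical):

```python
from hashlib import md5

def md5_key_mangler(key):
    """Receive cache keys as long concatenated strings;
    distill them into an md5 hash. UTF-8 handles any character."""
    return md5(key.encode("utf-8")).hexdigest()

digest = md5_key_mangler("query:Order:BrandX®")  # hypothetical cache key
print(digest)  # 32 hex characters; no UnicodeEncodeError
```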
Please go through the archival data: USA GOV Sample Data.
I want to read this file in R, but I get the error mentioned below:
result = fromJSON(textFileName)
Error in fromJSON(textFileName) : unexpected character 'u'
When I try to read it in Python, I get the error mentioned below:
import json
records = [json.loads(line) for line in open(path)]
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
codecs.charmap_decode(input, self.errors, decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4088: character maps to <undefined>
Can someone please help me with how I can read this kind of file?
I couldn't get the code the OP provided in the question working on my system either (Windows/RStudio/Jupyter). I dug around and found this for R, adapting it to this case:
library(jsonlite)
out <- lapply(readLines("usagov_bitly_data2013-05-17-1368817803"), fromJSON)
df <- data.frame(Reduce(rbind, out))
Although the error I got in R is curiously different from yours.
result = fromJSON("usagov_bitly_data2013-05-17-1368817803")
#Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
# [ 34.730400, -86.586098 ] } { "a": "Mozilla\/5.0 (Windows N
# (right here) ------^
For Python, as mentioned by juanpa, it seems to be a matter of encoding. The following code works for me.
import json
import os

path = os.path.abspath("usagov_bitly_data2013-05-17-1368817803")
print(path)
with open(path, encoding="utf8") as file:
    records = [json.loads(line) for line in file]
Solution in R:
library(jsonlite)
# if you have a local file
conn <- gzcon(file("usagov_bitly_data2013-05-17-1368817803.gz", "rb"))
# if you read it from URL
conn <- gzcon(url("http://1usagov.measuredvoice.com/bitly_archive/usagov_bitly_data2013-05-17-1368817803.gz"))
data <- stream_in(conn)
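The gzipped stream can be handled the same way in Python; a self-contained sketch using one hypothetical record instead of the real archive:

```python
import gzip
import io
import json

# One hypothetical line mimicking the archive's format
sample = b'{"a": "Mozilla/5.0 (Windows NT)", "ll": [34.7304, -86.586098]}\n'
buf = io.BytesIO(gzip.compress(sample))

# gzip.open in text mode decompresses and decodes in one step;
# pass a path to the local .gz file instead of buf in practice
with gzip.open(buf, mode="rt", encoding="utf-8") as fh:
    records = [json.loads(line) for line in fh]

print(records[0]["ll"])  # [34.7304, -86.586098]
```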
{
"Sponge": {
"orientation": "Straight",
"gender": "Woman",
"age": 23,
"rel_status": "Single",
"summary": " Bonjour! Je m'appelle Jacqueline!, Enjoy cooking, reading and traveling!, Love animals, languages and nature :-) ",
"location": "Kao-hsiung-k’a",
"id": "6693397339871"
}
}
I have this JSON above and I'm trying to read it, except there are some special characters in it, for example the "’" in location. This raises an error when I try to read the JSON:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-28: character maps to <undefined>
I'm using Python 3.5 and I have the following code:
import json

with open('test.json') as json_data:
    users = json.load(json_data)
print(users)
Use the codecs module to open the file for a quick fix.
with codecs.open('test.json', 'r', 'utf-8') as json_data:
users = json.load(json_data)
print(users)
Also, the answer to this question can be found easily on the web. (Hint: that's how I learned about this module.)
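On Python 3 the codecs module isn't strictly needed: the built-in open() takes an encoding argument directly. A self-contained sketch that writes and re-reads a file like the one above (the temp path is illustrative):

```python
import json
import os
import tempfile

data = {"Sponge": {"location": "Kao-hsiung-k’a"}}  # contains U+2019
path = os.path.join(tempfile.gettempdir(), "test.json")  # illustrative path

# Write the file as UTF-8, keeping the special character literal
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Reading with an explicit encoding avoids the 'charmap' codec entirely
with open(path, encoding="utf-8") as json_data:
    users = json.load(json_data)
print(users["Sponge"]["location"])  # Kao-hsiung-k’a
```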
OK, I found my solution: it's a problem with the Windows terminal. You have to type this in the terminal: chcp 65001
After that, launch your program!
More explanation here: Why doesn't Python recognize my utf-8 encoded source file?
I am trying to write a script to download images from an API. I have set up a loop that is as follows:
response = requests.get(url, params=query)
json_data = json.dumps(response.text)
pythonVal = json.loads(json.loads(json_data))
print(pythonVal)
The print(pythonVal) returns:
{
"metadata": {
"code": 200,
"message": "OK",
"version": "v2.0"
},
"data": {
"_links": {
"self": {
"href": "redactedLink"
}
},
"id": "123456789",
"_fixed": true,
"type": "IMAGE",
"source": "social media",
"source_id": "1234567890_1234567890",
"original_source": "link",
"caption": "caption",
"video_url": null,
"share_url": "link",
"date_submitted": "2016-07-11T09:34:35+00:00",
"date_published": "2016-09-11T16:30:26+00:00",
I keep getting an error that reads:
UnicodeEncodeError: 'ascii' codec can't encode character '\xc4' in
position 527: ordinal not in range(128)
For the pythonVal variable, if I just have it set to json.loads(json_data), it prints out the JSON response, but then when I try doing pythonVal['data'] I get another error that reads:
TypeError: string indices must be integers
Ultimately I'd like to be able to get data from it by doing something like
pythonVal['data']['_embedded']['uploader']['username']
Thanks for your input!
Why call json.dumps() on response.text at all? The response body is already a JSON string. Change:
json_data = json.dumps(response.text)
pythonVal = json.loads(json.loads(json_data))
to:
pythonVal = json.loads(response.text)
(or simply pythonVal = response.json()) and it should work.
As for the TypeError: string indices must be integers on pythonVal['data']: a single json.loads() applied to the dumps-ed value only undoes the dumps(), so pythonVal is still a str, and indexing a string with 'data' raises that error. Once the response text is parsed exactly once, pythonVal is a dict and pythonVal['data'] works.
Please also mention the sample JSON content with the question, if you want better help from others :)
Putting the following at the top of your code declares the source file's encoding as UTF-8. (Note that it only tells the interpreter how to decode the source file itself; it does not change the runtime encoding used when printing.)
# -*- coding: utf-8 -*-
The second error is because you have already gotten the string, and you need integer indices to get the characters of the string.
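The dumps/loads round trip can be seen in isolation; a minimal sketch with a hypothetical response body:

```python
import json

text = '{"data": {"id": "123456789"}}'  # hypothetical response.text

wrapped = json.dumps(text)   # JSON-encodes the string itself, adding quoting
once = json.loads(wrapped)   # only undoes the dumps: still a str
print(type(once).__name__)   # str -> once['data'] raises TypeError

data = json.loads(text)      # parse the body exactly once
print(data["data"]["id"])    # 123456789
```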