non-ascii characters in sqlalchemy query with caching environment - python

I am running into this error when I use options(FromCache()) with SQLAlchemy on Python 3.6.5, dogpile.cache==0.7.1 and SQLAlchemy==1.3.2:
UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 744: ordinal not in range(128)
I figured out it's caused by the registered-trademark sign in "BrandX®".
Example:
vendors = ['BrandX®', 'BrandY Inc.']
engine = create_engine(os.getenv('DEV_DATABASE_URL'), client_encoding='utf-8')
Session = scoped_session(sessionmaker(bind=engine, autoflush=False))
store_id = 123
db = Session()
q = db.query(Order).join(Product) \
    .options(FromCache()) \
    .filter(Order.store_id == store_id)
if vendors:
    # match any of the requested vendors
    clauses = []
    for v in vendors:
        clauses.append(Product.vendor == v)
    q = q.filter(or_(*clauses))
results = q.all()
I tried encoding the vendor strings as 'utf-8' and as 'ascii', but neither worked. I'd appreciate any help.

Ok, after playing around with encoding to no avail, I figured out the error is actually due to the caching. Specifically, the .options(FromCache()) is causing the problem.
I traced the error to a function called md5_key_mangler, and here's the function.
def md5_key_mangler(key):
    """Receive cache keys as long concatenated strings;
    distill them into an md5 hash.
    """
    return md5(key.encode("ascii")).hexdigest()
The full example is in the SQLAlchemy documentation on dogpile caching.
It appears to be this line
md5(key.encode("ascii")).hexdigest()
that is causing the problem.
I then went into the file containing my dogpile_caching.environment module, which I had copied from that documentation, and changed key.encode to use utf-8:
md5(key.encode("utf-8")).hexdigest()
And that solved the error. Hope that helps!
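As a rough sketch of the fix above, the mangler can be exercised on its own. The sample key string is made up for illustration; real keys are built from the query by the caching code.

```python
from hashlib import md5


def md5_key_mangler(key):
    """Distill a long cache-key string into an md5 hex digest."""
    # UTF-8 can encode any character, so "®" no longer raises.
    return md5(key.encode("utf-8")).hexdigest()


# Hypothetical key text containing a non-ASCII character.
sample_key = "SELECT ... WHERE product.vendor = 'BrandX®'"
digest = md5_key_mangler(sample_key)
print(digest)
```

With "ascii" in place of "utf-8", the same call raises UnicodeEncodeError for this key.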

Related

Fast API JSON response with Bytes

I'm using FastAPI to retrieve a MongoDB document that contains some bytes. The structure is as follows:
item = {
    "name": "xyz",
    "value1": b'\x89PNG\r\n\sla\...',
    ...
    "some_other_byte": b'\x89PNG\r\n\sla\...',
}
When I return this data from a POST endpoint, FastAPI tries to convert it to JSON but fails to do so automatically.
So I tried this:
json_compatible_item_data = jsonable_encoder(item)
but then I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
Is there a way to automatically convert the above dict into a json so it can be returned in a rest API? What would be the best way to do that?
With FastAPI jsonable_encoder you can use custom encoders. Example of converting arbitrary bytes object to base64 str:
import base64
from fastapi.encoders import jsonable_encoder
json_compatible_item_data = jsonable_encoder(item, custom_encoder={
    bytes: lambda v: base64.b64encode(v).decode('utf-8')})
Decoding target fields on the client side can be done like this:
value1 = base64.b64decode(response_dict["value1"])
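The full round trip can be sketched with the standard library alone. This is not the jsonable_encoder call itself, just the same base64 idea applied by hand; the item contents and the bytes_to_base64 helper are assumptions made for illustration.

```python
import base64
import json

# Hypothetical document with a binary field, similar to the one in the question.
item = {"name": "xyz", "value1": b"\x89PNG\r\n", "count": 3}


def bytes_to_base64(doc):
    """Return a copy of doc with every bytes value replaced by base64 text."""
    return {
        key: base64.b64encode(value).decode("ascii") if isinstance(value, bytes) else value
        for key, value in doc.items()
    }


payload = json.dumps(bytes_to_base64(item))   # what the server would send
response_dict = json.loads(payload)           # what the client receives
value1 = base64.b64decode(response_dict["value1"])
```

The client has to know which fields were base64-encoded, since after encoding they are ordinary JSON strings.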
In my case, the jsonable_encoder is an unnecessary wrapper around the lambda. I just call base64 directly on the pyodbc data row...
column_names = tuple(c[0] for c in cursor.description)
for row in cursor:
    row = [base64.b64encode(x).decode() if isinstance(x, bytes) else x for x in row]
    yield dict(zip(column_names, row))
But seriously, why is this necessary? For everything else, FastAPI just works "out of the box". This seems like a bug.

Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3

I'm reading a config file in Python, getting its sections, and creating a new config file for each section.
However, I'm getting a decode error because one of the strings contains Español=spain
self.output_file.write( what.replace( " = ", "=", 1 ) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
How would I adjust my code to allow for encoded characters such as these? I'm very new to this so please excuse me if this is something simple..
class EqualsSpaceRemover:
    output_file = None
    def __init__( self, new_output_file ):
        self.output_file = new_output_file
    def write( self, what ):
        self.output_file.write( what.replace( " = ", "=", 1 ) )
def get_sections():
    configFilePath = 'C:\\test.ini'
    config = ConfigParser.ConfigParser()
    config.optionxform = str
    config.read(configFilePath)
    for section in config.sections():
        configdata = {k:v for k,v in config.items(section)}
        confignew = ConfigParser.ConfigParser()
        cfgfile = open("C:\\" + section + ".ini", 'w')
        confignew.add_section(section)
        for x in configdata.items():
            confignew.set(section,x[0],x[1])
        confignew.write( EqualsSpaceRemover( cfgfile ) )
        cfgfile.close()
If you use Python 2 with from __future__ import unicode_literals, then every string literal you write is a unicode literal, as if you had prefixed every literal with u"...", unless you explicitly write b"...".
This explains why you get a UnicodeDecodeError on this line:
what.replace(" = ", "=", 1)
because what you actually do is
what.replace(u" = ",u"=",1 )
ConfigParser uses plain old str for its items when it reads a file using the parser.read() method, which means what will be a str. If you pass unicode arguments to str.replace(), the string is converted (decoded) to unicode, the replacement is applied, and the result is returned as unicode. But if what contains bytes that can't be decoded to unicode using the default encoding, you get a UnicodeDecodeError where you wouldn't expect one.
So to make this work you can
use explicit prefixes for byte strings: what.replace(b" = ", b"=", 1)
or remove the unicode_literals future import.
Generally you shouldn't mix unicode and str (Python 3 fixes this by making it an error in almost every case). You should be aware that from __future__ import unicode_literals changes every non-prefixed literal to unicode and doesn't automatically change your code to work with unicode in all cases. Quite the opposite in many cases.
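The same rule can be seen in Python 3, where mixing the two types fails immediately instead of triggering a hidden decode. This is a minimal sketch; strip_equals_spaces, mixing_types_fails and the sample line are made up for illustration and are not part of ConfigParser.

```python
# UTF-8 bytes for a config line, as they would come from a file opened in binary mode.
raw_line = "Español = spain".encode("utf-8")


def strip_equals_spaces(line):
    """Remove the spaces around the first " = " using byte-string arguments only."""
    return line.replace(b" = ", b"=", 1)


def mixing_types_fails(line):
    """Return True if bytes.replace() rejects text (str) arguments."""
    try:
        line.replace(" = ", "=", 1)
    except TypeError:
        return True
    return False


print(strip_equals_spaces(raw_line).decode("utf-8"))
print(mixing_types_fails(raw_line))
```

Keeping both the data and the arguments as bytes (or both as text) avoids any conversion between the two.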

UnicodeDecodeError (once again) with format() but not with concatenation

I have a class chunk with text fields title and text. When I want to print them, I get (surprise, surprise!) UnicodeDecodeError. It gives me an error when I try to format an output string, but when I just concatenate text and title and return it, I get no error:
class Chunk:
    # init, fields, ...
    # this implementation will give me an error
    def __str__( self ):
        return u'{0} {1}'.format ( enc(self.text), enc(self.title) )
    # but this is OK - all is printed without error
    def __str__( self ):
        return enc(self.text) + enc(self.title)
def enc(x):
    return x.encode('utf-8','ignore') # tried many combinations of arguments...
c = Chunk()
c.text, c.title = ... # feed from external file
print c
Bum! Error!
return u'{0} {1}'.format ( enc(self.text), enc(self.title) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2844: ordinal not in range(128)
I think I used all the possible combinations of encode/decode/utf-8/ascii/replace/ignore/...
(the python unicode issue is really irritating!)
You should override __unicode__, not __str__, when you return a unicode.
There is no need to call .encode(), since the input is already a unicode. Just write
def __unicode__(self):
    return u"{0} {1}".format(self.text, self.title)
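To see why format() fails while concatenation does not: in Python 2, formatting into a u'...' template forces each byte string through an implicit ASCII decode, whereas concatenating two byte strings never decodes anything. Python 3 has no implicit step, so the sketch below performs the decode explicitly. The sample text and the ascii_decode_fails helper are assumptions for illustration.

```python
# UTF-8 bytes, comparable to what enc(self.text) returns in the question.
encoded = "naïve text".encode("utf-8")


def ascii_decode_fails(data):
    """Return True if decoding data as ASCII raises UnicodeDecodeError."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# Concatenating bytes with bytes involves no decoding, so it always succeeds.
joined = encoded + b" " + encoded

# Formatting text into a text template needs no encode/decode step either.
formatted = "{0} {1}".format("naïve text", "title")
```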
The simplest way to avoid Python 2.x's unicode problem is to set the default encoding to utf-8; otherwise such problems will keep arising in unexpected places:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

How can I understand this python error message?

Hi, can you help me understand this message and what to do about it?
main.py", line 1278, in post
message.body = "%s %s/%s/%s" % (msg, host, ad.key().id(), slugify(ad.title.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Thanks
UPDATE: having tried removing the encode call, it appears to work:
class Recommend(webapp.RequestHandler):
    def post(self, key):
        ad= db.get(db.Key(key))
        email = self.request.POST['tip_email']
        host = os.environ.get("HTTP_HOST", os.environ["SERVER_NAME"])
        senderemail = users.get_current_user().email() if users.get_current_user() else 'info#monton.cl' if host.endswith('.cl') else 'info#monton.com.mx' if host.endswith('.mx') else 'info#montao.com.br' if host.endswith('.br') else 'admin#koolbusiness.com'
        message = mail.EmailMessage(sender=senderemail, subject="%s recommends %s" % (self.request.POST['tip_name'], ad.title) )
        message.to = email
        message.body = "%s %s/%s/%s" % (self.request.POST['tip_msg'],host,ad.key().id(),slugify(ad.title))
        message.send()
        matched_images=ad.matched_images
        count = matched_images.count()
        if ad.text:
            p = re.compile(r'(www[^ ]*|http://[^ ]*)')
            text = p.sub(r'\1',ad.text.replace('http://',''))
        else:
            text = None
        self.response.out.write("Message sent<br>")
        path = os.path.join(os.path.dirname(__file__), 'market', 'market_ad_detail.html')
        self.response.out.write(template.render(path, {'user_url':users.create_logout_url(self.request.uri) if users.get_current_user() else users.create_login_url(self.request.uri),
            'user':users.get_current_user(), 'ad.user':ad.user,'count':count, 'ad':ad, 'matched_images': matched_images,}))
The problem here is that your underlying model (message.body) only wants ASCII text, but you're trying to give it a string that contains non-ASCII characters.
Since the target only accepts ASCII, you can have Python substitute a '?' for every character that has no ASCII representation:
"UNICODE STRING".encode('ascii','replace').decode('ascii')
So, using your example above:
message.body = "%s %s/%s/%s" % (
    msg.encode('ascii','replace').decode('ascii'),
    host.encode('ascii','replace').decode('ascii'),
    ad.key().id(),  # numeric ID, nothing to encode
    slugify(ad.title).encode('ascii','replace').decode('ascii'))
Or just encode/decode on the variable that has the unicode character.
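For reference, here is roughly what the 'replace' error handler does, written for Python 3. The to_ascii helper name is made up for this sketch.

```python
def to_ascii(text):
    """Replace every character that has no ASCII representation with '?'."""
    return text.encode("ascii", "replace").decode("ascii")


print(to_ascii("Espa\u00f1ol"))   # the ñ is replaced by '?'
print(to_ascii("plain"))          # pure ASCII input comes back unchanged
```

The substitution is lossy: once a character has been replaced, the original cannot be recovered from the result.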
But this isn't an optimal solution. The best idea is to make message.body a unicode string. Since that doesn't seem feasible (I'm not familiar with GAE), you can use this to at least avoid the errors.
You've got a Unicode character in a place where it isn't supported. Most often I find this error is caused by MS Word-style slanted quotes.
One of these fields contains characters that cannot be encoded. The problem should stop if you switch to Python 3 (it has better Unicode support) or change the encoding of the entire script. In 2.x, about the best way to change the encoding is the encoding comment line: add # -*- coding: utf-8 -*- to the top of your script. See http://evanjones.ca/python-utf8.html for more explanation of using Python with UTF-8. Then handle strings like this:
s = "hello normal string"
u = unicode( s, "utf-8" )
backToBytes = u.encode( "utf-8" )
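In Python 3 the same round trip uses bytes and str instead of str and unicode. A small sketch with an assumed sample value:

```python
raw = b"Espa\xc3\xb1ol"               # UTF-8 encoded bytes, e.g. read from a file or socket
text = raw.decode("utf-8")            # bytes -> str (Unicode text)
back_to_bytes = text.encode("utf-8")  # str -> bytes, identical to the input
```

Decoding once at the input boundary and encoding once at the output boundary keeps the rest of the code working with text only.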
I had a similar problem when using django-nonrel and Google App Engine.
The problem was the folder containing the application. This probably isn't the problem described in this question, but maybe it will save someone from wasting time like I did.
First try moving your application folder, for example to /home/, and run it again. If that doesn't work, try something else.

Unicode with active_directory

I'm using the active_directory module, and trying to print a list of the users. My code is:
import active_directory as ad
users = ad.AD_Object("LDAP://OU=Home, DC=dome, DC=net")
for user in users.search(objectCategory="Person"):
    print str(user)
It prints some of the users until it reaches a Unicode username. Then it throws the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)
What can I do?
Thank you very much.
Try:
print user.decode('utf-8')
