shutil.make_archive throws UnicodeEncodeError


I'm trying to zip some backups of WordPress websites (a lot of files) via Python's shutil.make_archive, but I'm getting this error:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/zipfile.py", line 432, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcc3' in position 61: ordinal not in range(128)
I am using python 3.6.1 and run it on heroku.
Here's the actual code that works in some cases and in some it doesn't:
zipped = shutil.make_archive( zip_file_name, 'zip', self.folder_path, self.time )
I hope someone can help me find a solution to this problem.
Thanks!

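Looks like zip_file_name has non-English chars.
In Python 3 a str has no decode() method, so encode with 'surrogateescape' first and then decode. Try:
zipped = shutil.make_archive( zip_file_name.encode('utf8','surrogateescape').decode('utf8'), 'zip', self.folder_path, self.time )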
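For context, the '\udcc3' in the traceback is a lone surrogate, which is how Python 3 represents a byte (here 0xC3) that it could not decode when it read a file name. The sketch below shows the round trip through 'surrogateescape'. It assumes the underlying bytes are valid UTF-8; the helper name repair_name and the sample name are hypothetical, and if the bytes are in some other encoding the final decode will raise UnicodeDecodeError.

```python
def repair_name(name: str) -> str:
    """Turn surrogate-escaped bytes back into real characters.

    Assumption: the original bytes of the name were UTF-8.
    """
    # 'surrogateescape' maps each lone surrogate U+DCxx back to byte 0xxx.
    raw = name.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8")


# Hypothetical name as Python would hand it over after failing to decode
# the two UTF-8 bytes 0xC3 0xA9 (the letter e with an acute accent).
broken = "caf\udcc3\udca9.txt"
print(ascii(repair_name(broken)))  # 'caf\xe9.txt'
```

The repaired name no longer contains surrogates, so it can be encoded normally when it is written into an archive.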

Related

Scalene: An exception of type UnicodeEncodeError

I'm trying to run Scalene inside a .ipynb file in Jupyter with %%scalene and get the following error:
Scalene: An exception of type UnicodeEncodeError occurred. Arguments:
('charmap', '\r\n<html>\r\n <head>\r\n <title>Scalene</title> ...
followed by basically the whole Scalene Github website html code and ending with:
Traceback (most recent call last):
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1949, in run_profiler
exit_status = profiler.profile_code(
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1781, in profile_code
Scalene.generate_html(profile_fname=Scalene.__profile_filename, output_fname=Scalene.__profiler_html)
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1729, in generate_html
f.write(rendered_content)
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\encodings\cp1250.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xa3' in position 79534: character maps to <undefined>
Am I right to think it is the website coding that is causing this error? Or is it the browser?
If anything else: what can be done about it?
python 3.8.15

Got the same error related to cp1250. Seems scalene profiler didn't bother to test national environments. Tried to use chcp 65001 to set runtime to utf-8, but it didn't help.
My fix was to hack its source "(...)\scalene\scalene_profiler.py" at line 1728:
instead of:
with open(output_fname, "w") as f:
use:
with open(output_fname, "w", encoding='utf-8') as f:
That solved the problem.
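The same fix in isolation, as a minimal sketch: without an explicit encoding, open() uses the locale's default (cp1250 in the traceback above), which has no mapping for '\xa3'. The file name and sample text here are made up for illustration.

```python
import os
import tempfile

text = "Price: \xa3 10"  # '\xa3' is the pound sign from the traceback

# cp1250 cannot represent the pound sign, which is what the traceback shows.
try:
    text.encode("cp1250")
    cp1250_ok = True
except UnicodeEncodeError:
    cp1250_ok = False

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "report.html")

# Passing encoding='utf-8' makes the write independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

print(cp1250_ok, round_tripped == text)  # False True
```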

Unable to use csv.reader for a non ascii string in python 3

This is currently my code:
# -*- coding: utf-8 -*-
import csv
import codecs
# original directory
phys_comp_dir = '/Users/lmnt74/Physician_Compare'
# for row in Performance_Scores:
#     print(','.join(row))
# file name
National_Downloadable_File = ('/Physician_Compare_National_Downloadable'
                              '_File.csv')
National_File = csv.reader(open(phys_comp_dir+National_Downloadable_File,
                                newline='', encoding='utf-8'),
                           quotechar='|', quoting=csv.QUOTE_MINIMAL,
                           lineterminator='\n'
                           )
for row in National_File:
    for i in row:
        try:
            print(i)
        except UnicodeError:
            print(i.encode('latin-1').decode('utf-8'))
I receive the following error:
Traceback (most recent call last):
File "/Users/lmn74/Physician_Compare/q2.py", line 41, in <module>
print(i)
UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 52: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/lmnt74/Physician_Compare/q2.py", line 43, in <module>
print(i.encode('latin-1').decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 52: invalid start byte
I am unsure about how to proceed. I know the character that is throwing the error is the (R), the registered trademark sign. I would like to figure out how to rewrite my code so that it checks for this in each string, or, if a better way exists to account for this when reading the file initially, I'm all for that.
What I've done so far:
I've read the unicode documentation.
I've read the CSV documentation
I've read about the unicode sandwich
None of which have helped me or are easy enough reads for me to understand. I'm a fairly new beginner and anything to point me in the right direction would be greatly appreciated.
EDIT: Figured it out, see below:
changed:
print(i.encode('latin-1').decode('utf-8'))
to:
print(i.encode('ascii', 'ignore').decode('utf-8', 'ignore'))
Sorry to waste anyone's time.
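Note that the 'ignore' handler silently drops the character. A small sketch of the difference between dropping and escaping, using a made-up sample string; 'backslashreplace' is a standard codec error handler that keeps a visible marker instead:

```python
text = "Brand\xae Name"  # '\xae' is the registered trademark sign

# 'ignore' removes anything ASCII cannot represent.
dropped = text.encode("ascii", "ignore").decode("ascii")

# 'backslashreplace' keeps an escaped form of the character instead.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")

print(dropped)  # Brand Name
print(escaped)  # Brand\xae Name
```

Either form prints safely on an ASCII-only stdout; which one is preferable depends on whether losing the character matters for the data.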

Python and IBM DB2: UnicodeDecodeError

I'm getting this error message
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38: ordinal not in range(128)
when I try to execute any sql query in Python, like this one:
>>> import ibm_db
>>> conn = ibm_db.connect("sample","root","root")
>>> ibm_db.exec_immediate(conn, "select * from act")
I checked default encoding and it seems to be 'utf8':
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
I also know about this thread, where people are discussing quite a similar problem. One piece of advice there is:
Have you applied the required database PTFs (SI57014 and SI57015 for 7.1 and SI57146 and SI57147 for 7.2)? They are included as a distreq, so they should have been in the order with your PTFs, but won't be automatically applied.
However, I do not know what a database PTF is or how to apply it. Need help.
PS. I'm using Windows 10.
EDIT
This is how I get my error message:
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38:
ordinal not in range(128)
But when I run the same query "select * from act" in DB2 CLP, then it's ok.
And this is the driver information, which I got by running this code in Python:
if client:
    print("DRIVER_NAME: string(%d) \"%s\"" % (len(client.DRIVER_NAME), client.DRIVER_NAME))
    print("DRIVER_VER: string(%d) \"%s\"" % (len(client.DRIVER_VER), client.DRIVER_VER))
    print("DATA_SOURCE_NAME: string(%d) \"%s\"" % (len(client.DATA_SOURCE_NAME), client.DATA_SOURCE_NAME))
    print("DRIVER_ODBC_VER: string(%d) \"%s\"" % (len(client.DRIVER_ODBC_VER), client.DRIVER_ODBC_VER))
    print("ODBC_VER: string(%d) \"%s\"" % (len(client.ODBC_VER), client.ODBC_VER))
    print("ODBC_SQL_CONFORMANCE: string(%d) \"%s\"" % (len(client.ODBC_SQL_CONFORMANCE), client.ODBC_SQL_CONFORMANCE))
    print("APPL_CODEPAGE: int(%s)" % client.APPL_CODEPAGE)
    print("CONN_CODEPAGE: int(%s)" % client.CONN_CODEPAGE)
    ibm_db.close(conn)
else:
    print("Error.")
it prints:
DRIVER_NAME: string(10) "DB2CLI.DLL"
DRIVER_VER: string(10) "10.05.0007"
DATA_SOURCE_NAME: string(6) "SAMPLE"
DRIVER_ODBC_VER: string(5) "03.51"
ODBC_VER: string(10) "03.01.0000"
ODBC_SQL_CONFORMANCE: string(8) "EXTENDED"
APPL_CODEPAGE: int(1251)
CONN_CODEPAGE: int(1208)
True
EDIT
I also tried this:
>>> cnx = ibm_db.connect("sample","root","root")
>>> query = "select * from act"
>>> query.encode('ascii')
b'select * from act'
>>> ibm_db.exec_immediate(cnx, query)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38:
ordinal not in range(128)
As you can see, in this case I also get the very same error message.
SUMMARY
Below are all my attempts:
C:\Windows\system32>chcp
Active code page: 65001
C:\Windows\system32>python
Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 20:20:57) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
>>> cnx = ibm_db.connect("sample","root","root")
>>> ibm_db.exec_immediate(cnx, "select * from act")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38: ordinal not in range(128)
>>> ibm_db.exec_immediate(cnx, b"select * from act")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: statement must be a string or unicode
>>> query = "select * from act"
>>> query = query.encode()
>>> ibm_db.exec_immediate(cnx, query)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: statement must be a string or unicode
>>> ibm_db.exec_immediate(cnx, "select * from act").decode('cp-1251')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception

What you have here is an incompatibility between your client code (ibm_db) and the DB2 server. As you can see in the client code the logic for your query is basically:
1. Extract and check the parameters passed in (lines 4873 to 4918).
2. Allocate native objects for the query (up to 4954).
3. Do the query and decode the results (the rest of the function).
Based on our investigations so far, you know that the data you're passing in for the query is well-formed (and so it is not step 1). Looking at the error paths in step 2, you'd see simple error messages explaining these failures. You're therefore failing in step 3.
You are getting an empty Exception raised on the query and when you try to get the details of the error you get another Unicode decoding Exception. This looks like either a bug in ibm_db or a configuration error that means your DB2 installation is not compatible. So how can we find out which...?
As flagged elsewhere, the issue is fundamentally to do with codepages. All the ibm_db code basically interprets strings as ASCII (by converting them with StringOBJ_FromASCII which maps down to calls into Python APIs that insist on receiving ASCII chars - and will throw unicode exceptions if not).
Based on your diagnostics, you could try to prove or disprove this by installing or configuring both your systems (client and DB2 server) to use US English. This should get you past the codepage incompatibility so you can find the real error here.
If the query is really going out over the network, you might just get a network trace that shows the response coming back from the server. However, based on the fact that you saw nothing in the logs, I'm not convinced this will bear any fruit.
Failing that you need to patch the ibm_db code to handle non-ASCII content - either by raising a bug report with the maintainer or trying it yourself (if you know how to build and debug C extensions).

The problem is that the DB2 server is returning CP-1251 (also known as Windows-1251) text, as evidenced by APPL_CODEPAGE: int(1251) in your config output. Python (specifically, the interactive Python REPL) is expecting either UTF-8 or ASCII output, so this causes issues.
The solution is to do:
ibm_db.exec_immediate(conn, "select * from act").decode('cp-1251')
Additionally, you need to make sure that your terminal's text encoding is set to UTF-8. Details on changing that setting will depend on the specific terminal that you are using. Since you have said you are using cmd, the appropriate command is chcp 65001.
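To illustrate the codec side of this without involving ibm_db at all, here is a minimal sketch using the byte from the error message. That the driver's message bytes really are CP-1251 is an assumption based on APPL_CODEPAGE above; Python's name for the codec is cp1251.

```python
raw = b"\xc8"  # the byte reported in the UnicodeDecodeError

# ASCII only covers bytes 0x00-0x7F, so this fails like the traceback does.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Under the CP-1251 assumption the same byte is a Cyrillic capital letter.
decoded = raw.decode("cp1251")

print(ascii_error)
print(ascii(decoded))  # '\u0418'
```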

In this kind of case, where a UTF-8 environment meets something that requires ASCII, I use the decode method.
'ascii' codec can't decode byte 0xc8
All right, that's expected: this is not an ASCII string but a UTF-8 one, so you should decode it with the UTF-8 encoding.
...
query.decode('utf8')
ibm_db.exec_immediate(cnx, query)
After that you may need to re-encode the results to write or print them.

Unicode Decoding error with xlwt

I'm trying to backport a script to Python 2.4 and an error that keeps coming up is the following:
Traceback (most recent call last)
File "vuln-excel-processor.py", line 103, in ?
values[row][col] = str(sheet.cell(row + 1, col).value).decode('utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 44: ordinal not in range(128)
Here's the offending code block.
for row in range(sheet.nrows - 1):
    for col in range(sheet.ncols):
        values[row][col] = str(sheet.cell(row + 1, col).value).decode('utf-8')
I found this page discussing this error Python - Finding unicode/ascii problems, which prompted me to put the call to str.decode() in the last line but that failed to resolve the problem. Any ideas on the culprit here? Much thanks for any assistance.

dictionary data extraction issue

top_100 is a mongodb collection:
the following code:
x=[]
thread=[]
for doc in top_100.find():
    x.append(doc['_id'])
db = Connection().test
top_100 = db.top_100_thread
thread = [a["thread"] for a in x]
for doc in thread:
    print doc
gives this error:
Traceback (most recent call last):
File "C:\Users\chatterjees\workspace\de.vogella.python.first\src\top_100_thread.py", line 21, in <module>
print doc
File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u03b9' in position 10: character maps to <undefined>
what's going on?

It's because your document contains some Unicode data that the console's cp1252 encoding cannot represent.
You need to encode Unicode data for output explicitly
instead of printing it directly.
see:
python 3.0, how to make print() output unicode?
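One way to do that is sketched below, in Python 3 syntax (the question itself runs Python 2.7, so treat it as an illustration of the idea only). The helper name printable is hypothetical, and cp1252 is assumed because it is the console encoding shown in the traceback.

```python
def printable(text: str, encoding: str = "cp1252") -> str:
    """Return text with characters the target encoding lacks replaced by '?'."""
    # errors='replace' substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# U+03B9 (Greek small letter iota) is the character from the traceback.
print(printable("thread \u03b9"))  # thread ?
```

This loses the original character, so it suits console output for debugging rather than anything written back to storage.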
