rendering a file in python using pygal - ascii code error - python

I am trying to create a pygal chart in Python and save it to an .svg file.
import pygal
from pygal.style import DarkSolarizedStyle
# Creating pygal charts
pie_chart = pygal.Pie(style=DarkSolarizedStyle, legend_box_size=20, pretty_print=True)
pie_chart.title = 'Github-Migration Status Chart (in %)'
pie_chart.add('Intro', int(intro))
pie_chart.add('Parallel', int(parallel))
pie_chart.add('In Progress', int(in_progress))
pie_chart.add('Complete', int(complete))
pie_chart.render_to_file('../../../../../usr/share/nginx/html/TeamFornax/githubMigration/OverallProgress/overallProgress.svg')
This simple piece of code gives the following error:
> Traceback (most recent call last):
> File "/home/ec2-user/githubr/migrationcharts.py", line 161, in <module>
> pie_chart.render_to_file('../../../../../usr/share/nginx/html/TeamFornax/githubMigration/OverallProgress/overallProgress.svg')
> File "/usr/lib/python2.6/site-packages/pygal/ghost.py", line 149, in render_to_file
> f.write(self.render(is_unicode=True, **kwargs))
> File "/usr/lib/python2.6/site-packages/pygal/ghost.py", line 112, in render
> .render(is_unicode=is_unicode))
> File "/usr/lib/python2.6/site-packages/pygal/graph/base.py", line 293, in render
> is_unicode=is_unicode, pretty_print=self.pretty_print)
> File "/usr/lib/python2.6/site-packages/pygal/svg.py", line 271, in render
> self.root, **args)
> File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 1010, in tostring
> return string.join(data, "")
> File "/usr/lib64/python2.6/string.py", line 318, in join
> return sep.join(words)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 40: ordinal not in range(128)
Any idea why?

Try decoding the path string that you pass to render_to_file to unicode, for example:
pie_chart.render_to_file('path/to/overallProgress.svg'.decode('utf-8'))
The charset used for decoding should match your file's encoding.
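To illustrate the kind of failure shown in the traceback, here is a minimal sketch in Python 3 syntax (the question uses Python 2.6, where mixing str and unicode triggers an implicit ASCII decode). The sample byte string and the helper name to_text are assumptions made for illustration; they are not part of pygal.

```python
# Hypothetical data: a UTF-8 encoded label containing a non-breaking space (bytes 0xc2 0xa0).
raw_label = b'Github-Migration\xc2\xa0Status'


def to_text(value, encoding='utf-8'):
    """Return text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decoding as ASCII fails on the first byte outside the 0-127 range.
try:
    raw_label.decode('ascii')
    ascii_ok = True
    failing_byte = None
except UnicodeDecodeError as exc:
    ascii_ok = False
    failing_byte = raw_label[exc.start]

# Decoding with the encoding the bytes were written in succeeds.
label = to_text(raw_label)
```

Decoding every byte string once, at the point where it enters the program, keeps later joins and writes from falling back to ASCII.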

Related

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 49: for textacy

I am using the textacy method to get synonyms.
import textacy.resources
rs = textacy.resources.ConceptNet()
syn=rs.get_synonyms('happy')
I get the error below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Dhiraj\Desktop\Work\QGen\lib\site-packages\textacy\resources\concept_net.py", line 353, in get_synonyms
return self._get_relation_values(self.synonyms, term, lang=lang, sense=sense)
File "C:\Users\Dhiraj\Desktop\Work\QGen\lib\site-packages\textacy\resources\concept_net.py", line 338, in synonyms
self._synonyms = self._get_relation_data("/r/Synonym", is_symmetric=True)
File "C:\Users\Dhiraj\Desktop\Work\QGen\lib\site-packages\textacy\resources\concept_net.py", line 162, in _get_relation_data
for row in rows:
File "C:\Users\Dhiraj\Desktop\Work\QGen\lib\site-packages\textacy\io\csv.py", line 96, in read_csv
for row in csv_reader:
File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 49: character maps to <undefined>
I have tried to enforce encoding='utf8' in both concept_net.py (line 162) and io\csv.py (line 96, in read_csv), but that gives another error:
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
What can be done?
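The first traceback goes through cp1252.py, which has no mapping for byte 0x81. As a hedged sketch of reading compressed text with an explicit encoding, the following uses only the standard library; the file name and sample contents are hypothetical and unrelated to the real ConceptNet data.

```python
import gzip
import os
import tempfile

# Hypothetical sample: the character 'ぁ' encodes to the UTF-8 bytes e3 81 81, so it contains 0x81.
text = 'happy,ぁ\n'
path = os.path.join(tempfile.mkdtemp(), 'sample.csv.gz')

with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write(text)

# cp1252 has no mapping for byte 0x81, so decoding with it fails.
try:
    with gzip.open(path, 'rt', encoding='cp1252') as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Naming the real encoding explicitly reads the data back intact.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    roundtrip = f.read()
```

The EOFError in the question is a separate matter that an encoding argument cannot fix; it usually signals a truncated compressed file, so downloading the resource again may be worth trying.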

ignore encoding error when parsing pdf with pdfminer

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1
fn='test.pdf'
with open(fn, mode='rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    fields = resolve1(doc.catalog['AcroForm'])['Fields']
    item = {}
    for i in fields:
        field = resolve1(i)
        name, value = field.get('T'), field.get('V')
        item[name]=value
Hello, I need help with this code, as it gives me a Unicode error on some characters:
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/pdftypes.py", line 80, in resolve1
x = x.resolve(default=default)
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/pdftypes.py", line 67, in resolve
return self.doc.getobj(self.objid)
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/pdfdocument.py", line 673, in getobj
stream = stream_value(self.getobj(strmid))
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/pdfdocument.py", line 676, in getobj
obj = self._getobj_parse(index, objid)
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/pdfdocument.py", line 648, in _getobj_parse
raise PDFSyntaxError('objid mismatch: %r=%r' % (objid1, objid))
File "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/psparser.py", line 85, in __repr__
return self.name.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
Is there anything I can add so that it ignores the characters it cannot decode, or at least returns the name with a blank value in name, value = field.get('T'), field.get('V')?
Any help is appreciated.
Here is one way you can fix it:
nano "/home/timmy/.local/lib/python3.8/site-packages/pdfminer/psparser.py"
Then, at line 85:
def __repr__(self):
    return self.name.decode('ascii', 'ignore') # this fixes it
I don't believe editing source scripts is recommended, so you should also post an issue on GitHub.
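For reference, here is a small sketch of how the second argument to bytes.decode changes the result. The sample bytes are hypothetical; only the leading 0xe5 is taken from the traceback above.

```python
raw_name = b'\xe5abc'  # hypothetical name starting with a non-ASCII byte

# 'ignore' silently drops bytes that cannot be decoded.
ignored = raw_name.decode('ascii', 'ignore')

# 'replace' substitutes U+FFFD, so the loss stays visible in the output.
replaced = raw_name.decode('ascii', 'replace')

# 'backslashreplace' keeps the original byte value as an escape sequence.
escaped = raw_name.decode('ascii', 'backslashreplace')
```

'ignore' is the quietest option but loses data, so 'replace' or 'backslashreplace' may be preferable when the value is only used in an error message.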

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 5: invalid start byte

I am using Python 2.7 and get an error that I can't fix. I am trying to download HTML pages from a site, and the next button looks like this: Next »
Traceback (most recent call last):
File "C:\Users\Said&Nour\Desktop\Documents\PythonFiles\LebanonParsing\Al Rifai\alrifai.py", line 109, in <module>
if PageP.find('a',attrs={'title':'Next »'}) is None:
File "C:\Python27\lib\site-packages\bs4\element.py", line 1300, in find
l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
File "C:\Python27\lib\site-packages\bs4\element.py", line 1321, in find_all
return self._find_all(name, attrs, text, limit, generator, **kwargs)
File "C:\Python27\lib\site-packages\bs4\element.py", line 602, in _find_all
strainer = SoupStrainer(name, attrs, text, **kwargs)
File "C:\Python27\lib\site-packages\bs4\element.py", line 1420, in __init__
normalized_attrs[key] = self._normalize_search_value(value)
File "C:\Python27\lib\site-packages\bs4\element.py", line 1434, in _normalize_search_value
return value.decode("utf8")
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 5: invalid start byte
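The byte and position in the message are consistent with the string 'Next »' being stored as cp1252 or Latin-1 bytes rather than UTF-8. That is an assumption about the script's file encoding, which the question does not state. A minimal sketch in Python 3 syntax:

```python
title = 'Next »'

# If the source file is saved as cp1252 (or Latin-1), '»' is the single byte 0xbb.
as_cp1252 = title.encode('cp1252')

# Decoding those bytes as UTF-8 fails at the index of the offending byte.
try:
    as_cp1252.decode('utf-8')
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start

# Decoding with the encoding the bytes were actually written in recovers the text.
recovered = as_cp1252.decode('cp1252')
```

Under Python 2.7, one possible fix would be to pass a unicode literal (u'Next »') and declare the file's real encoding in a coding comment at the top of the script, so that no implicit UTF-8 decode of a byte string is needed.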

decode subprocess.Popen and store in file

I wrote a script/add-on for pyLoad.
Basically, it executes FileBot with arguments.
What I am trying to do is capture the output and store it in the pyLoad log file.
So far so good. It works until the point where a single character needs to be decoded:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 5: ordinal not in range(128)
I don't know how to do that.
I hope you can help.
try:
    if self.getConfig('output_to_log') is True:
        log = open('Logs/log.txt', 'a')
        subprocess.Popen(args, stdout=log, stderr=log, bufsize=-1)
Thanks in advance
[edit]
28.05.2015 12:34:06 DEBUG FileBot-Hook: MKV-Checkup (package_extracted)
28.05.2015 12:34:06 DEBUG Hier sind keine Archive
28.05.2015 12:34:06 INFO FileBot: executed
28.05.2015 12:34:06 INFO FileBot: cleaning
Locking /usr/share/filebot/data/logs/amc.log
Done ヾ(@⌒ー⌒@)ノ
Parameter: exec = cd / && ./filebot.sh "{file}"
Parameter: clean = y
Parameter: skipExtract = y
Parameter: reportError = n
Parameter: storeReport = n
Parameter: artwork = n
Parameter: subtitles = de
Parameter: movieFormat = /mnt/HD/Medien/Movies/{n} ({y})/{n} ({y})
Parameter: seriesFormat = /mnt/HD/Medien/TV Shows/{n}/Season {s.pad(2)}/{n} - {s00e00} - {t}
Parameter: extras = n
So I'm guessing this
Done ヾ(@⌒ー⌒@)ノ
is causing the issue.
When I open the log interface in the web GUI to view the log, this is the traceback:
Traceback (most recent call last):
File "/usr/share/pyload/module/lib/bottle.py", line 733, in _handle
return route.call(**args)
File "/usr/share/pyload/module/lib/bottle.py", line 1448, in wrapper
rv = callback(*a, **ka)
File "/usr/share/pyload/module/web/utils.py", line 113, in _view
return func(*args, **kwargs)
File "/usr/share/pyload/module/web/pyload_app.py", line 464, in logs
[pre_processor])
File "/usr/share/pyload/module/web/utils.py", line 30, in render_to_response
return t.render(**args)
File "/usr/share/pyload/module/lib/jinja2/environment.py", line 891, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/share/pyload/module/web/templates/Next/logs.html", line 1, in top-level template code
{% extends 'Next/base.html' %}
File "/usr/share/pyload/module/web/templates/Next/base.html", line 179, in top-level template code
{% block content %}
File "/usr/share/pyload/module/web/templates/Next/logs.html", line 30, in block "content"
<tr><td class="logline">{{line.line}}</td><td>{{line.date}}</td><td class="loglevel">{{line.level}}</td><td>{{line.message}}</td></tr>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 5: ordinal not in range(128)
I found a solution.
proc = subprocess.Popen(args, stdout=subprocess.PIPE)
for line in proc.stdout:
    # decode the bytes from FileBot and strip the trailing line ending
    self.logInfo(line.decode('utf-8').rstrip('\r\n'))
proc.wait()
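The same approach can be tried outside pyLoad. The sketch below is in Python 3 syntax and uses a stand-in child process instead of FileBot; the child command and the lines list are assumptions for illustration, and self.logInfo is replaced by a plain list append.

```python
import subprocess
import sys

# Hypothetical child process standing in for FileBot: it writes UTF-8 bytes to stdout.
child_code = "import sys; sys.stdout.buffer.write('Done \\u30ce\\n'.encode('utf-8'))"
args = [sys.executable, '-c', child_code]

lines = []
with subprocess.Popen(args, stdout=subprocess.PIPE) as proc:
    for raw_line in proc.stdout:
        # Decode the bytes explicitly; 'replace' keeps logging alive on malformed input.
        lines.append(raw_line.decode('utf-8', errors='replace').rstrip('\r\n'))

# Leaving the with-block closes the pipe and waits for the child to exit.
return_code = proc.returncode
```

Reading from a pipe and decoding each line means the log only ever receives text, instead of raw bytes written straight into the file by the child process.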

Unable to decode yml file ... 'utf8' codec can't decode byte #xa0: invalid start byte

I'm trying to read a YAML file and convert it into a dictionary. I'm seeing an issue while loading the file into a dict variable.
I searched for similar issues. One of the replies on Stack Overflow suggested replacing each '\\xa0' character with ' ', so I tried line = line.replace('\\xa0',' '). The program does not work on Python 2.7, but it works fine on Python 3.
import yaml
import sys
yaml_dir = "/root/tools/test_case/"
#file_name = "TC_CFD_SR.yml"
file_name = "TC_QB.yml"
tc_file_name = yaml_dir + file_name
def write(file,content):
    file = open(file,'a')
    file.write(content)
    file.close()
def verifyYmlFile(yml_file):
    data = {}
    with open(yml_file, 'r') as fin:
        for line in fin:
            line = line.replace('\\xa0',' ')
            write('anand-yaml.yml',line)
    with open('anand-yaml.yml','r') as fin:
        data = yaml.load(fin)
    return data
if __name__ == '__main__':
    data = {}
    print "verifying yaml"
    data= verifyYmlFile(tc_file_name)
Error:
[root@anand-harness test_case]# python verify_yaml.py
verifying yaml
Traceback (most recent call last):
File "verify_yaml.py", line 29, in <module>
data= verifyYmlFile(tc_file_name)
File "verify_yaml.py", line 23, in verifyYmlFile
data = yaml.load(fin)
File "/usr/lib64/python2.6/site-packages/yaml/__init__.py", line 71, in load
return loader.get_single_data()
File "/usr/lib64/python2.6/site-packages/yaml/constructor.py", line 37, in get_single_data
node = self.get_single_node()
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 111, in compose_sequence_node
node.value.append(self.compose_node(node, index))
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/lib64/python2.6/site-packages/yaml/composer.py", line 64, in compose_node
if self.check_event(AliasEvent):
File "/usr/lib64/python2.6/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/lib64/python2.6/site-packages/yaml/parser.py", line 449, in parse_block_mapping_value
if not self.check_token(KeyToken, ValueToken, BlockEndToken):
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 244, in fetch_more_tokens
return self.fetch_single()
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 653, in fetch_single
self.fetch_flow_scalar(style='\'')
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 667, in fetch_flow_scalar
self.tokens.append(self.scan_flow_scalar(style))
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 1156, in scan_flow_scalar
chunks.extend(self.scan_flow_scalar_non_spaces(double, start_mark))
File "/usr/lib64/python2.6/site-packages/yaml/scanner.py", line 1196, in scan_flow_scalar_non_spaces
while self.peek(length) not in u'\'\"\\\0 \t\r\n\x85\u2028\u2029':
File "/usr/lib64/python2.6/site-packages/yaml/reader.py", line 91, in peek
self.update(index+1)
File "/usr/lib64/python2.6/site-packages/yaml/reader.py", line 165, in update
exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #xa0: invalid start byte
in "anand-yaml.yml", position 3246
What am I missing?
The character sequence "\\xa0" is not the problem reported in the message; the problem is the sequence "\xa0" (note that the backslash is not escaped).
Your replacement line should be:
line = line.replace('\xa0',' ')
to circumvent the problem.
If you know what the actual encoding is, you can do the correct conversion yourself, but that should not be necessary, and neither that nor the patching above is a structural solution. It would be best if the YAML file were generated correctly (YAML files default to UTF-8, so the file should contain valid UTF-8). It could be UTF-16 without the appropriate BOM (which the yaml library interprets, IIRC).
s1 = 'abc\\xa0xyz'
print(repr(s1))
u1 = s1.decode('utf-8') # this works fine
s = 'abc\xa0xyz'
print(repr(s))
u = s.decode('utf-8') # this throws an error
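The difference between the two literals can also be checked in Python 3 syntax. In this sketch the sample file content is hypothetical, and working on bytes is an assumption about how the file would be read; it is not something the yaml library requires.

```python
escaped = 'abc\\xa0xyz'  # backslash, 'x', 'a', '0': four ordinary ASCII characters
actual = 'abc\xa0xyz'    # one character: U+00A0, a non-breaking space

# Hypothetical file content in which the non-breaking space was saved as the single byte 0xa0.
raw = b"key: 'some\xa0value'\n"

# A lone 0xa0 is not valid UTF-8, so decoding fails like the YAML reader does.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Replace the raw byte before decoding, working on bytes throughout.
cleaned = raw.replace(b'\xa0', b' ').decode('utf-8')
```

Opening the file in binary mode ('rb') and cleaning the bytes before handing the text to the YAML loader avoids depending on how each Python version decodes text files by default.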
