Getting unicode decode error in python? - python

I am using the Facebook Graph API, but I get an error when I run graph.py.
How can I resolve this charmap problem? I am getting a UnicodeDecodeError.
In graph.py :
table = json2html.convert(json = variable)
htmlfile=table.encode('utf-8')
f = open('Table.html','wb')
f.write(htmlfile)
f.close()
# replacing '&gt' with '>' and '&lt' with '<'
f = open('Table.html','r')
s=f.read()
s=s.replace("&gt;",">")
s=s.replace("&lt;","<")
f.close()
# writing content to html file
f = open('Table.html','w')
f.write(s)
f.close()
# output
webbrowser.open("Table.html")
else:
print("We couldn't find anything for",PageName)
I don't understand why I am facing this issue. I also get an error at 's=f.read()'.

The error message shows that Python guessed the encoding when reading the file and ended up using cp1250 (probably because Windows uses cp1250 as the system default), but that is the wrong encoding because you saved the file as 'utf-8'.
So use open(..., encoding='utf-8'), and Python will not have to guess the encoding.
# replacing '&gt' with '>' and '&lt' with '<'
f = open('Table.html','r', encoding='utf-8')
s = f.read()
f.close()
s = s.replace("&gt;",">")
s = s.replace("&lt;","<")
# writing content to html file
f = open('Table.html','w', encoding='utf-8')
f.write(s)
f.close()
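The mismatch is easy to reproduce. The following is only an illustrative sketch (the helper names read_with and demo and the sample text are made up here, not taken from graph.py): it saves a file as UTF-8, then reads it back once with cp1250 and once with UTF-8.

```python
import os
import tempfile


def read_with(path, encoding):
    """Return the file's text decoded with `encoding`, or None if decoding fails."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None


def demo():
    """Write UTF-8 bytes, then read them back with a wrong and a matching codec."""
    text = "<td>Á propos, café</td>"  # sample text, hypothetical
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Table.html")
        # Save as UTF-8 bytes, as graph.py does via table.encode('utf-8')
        with open(path, "wb") as f:
            f.write(text.encode("utf-8"))
        wrong = read_with(path, "cp1250")  # wrong codec: None or garbled text
        right = read_with(path, "utf-8")   # matching codec: the original text
    return text, wrong, right
```

Calling demo() should return the original text only for the UTF-8 read; the cp1250 read either fails or comes back garbled.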
But you could do the replacement before you save the file. Then you don't have to open it again.
table = json2html.convert(json=variable)
table = table.replace("&gt;",">").replace("&lt;","<")
f = open('Table.html', 'w', encoding='utf-8')
f.write(table)
f.close()
# output
webbrowser.open("Table.html")
BTW: Python has the function html.unescape(text), which replaces all character references like &gt; (so-called entities).
import html
table = json2html.convert(json=variable)
table = html.unescape(table)
f = open('Table.html', 'w', encoding='utf-8')
f.write(table)
f.close()
# output
webbrowser.open("Table.html")
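The same idea can be wrapped in a small function that uses a with block so the file is closed automatically. This is a sketch only; save_unescaped is a hypothetical helper name, and the HTML string is assumed to come from json2html.convert as in the question.

```python
import html


def save_unescaped(table_html, path):
    """Unescape character references and write the result to path as UTF-8."""
    text = html.unescape(table_html)
    # 'with' closes the file even if write() raises an exception
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

For example, save_unescaped(table, 'Table.html') would replace the encode/write/reopen/replace sequence from the question.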

Related

How can I convert a UTF-16-LE txt file to an ANSI txt file and remove the header in Python?

I have a .txt file in UTF-16-LE encoding.
I want to remove the header (first row) and save it as ANSI.
I can do it manually, but I need to do that for 150 txt files every day,
so I want to use Python to do it automatically.
But I am stuck.
I have tried this code, but it does not work and produces an error:
"return mbcs_encode(input, self.errors)[0]
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character"
filename = "filetochangecodec.txt"
path = "C:/Users/fallen/Desktop/New folder/"
pathfile = path + filename
coding1 = "utf-16-le"
coding2 = "ANSI"
f= open(pathfile, 'r', encoding=coding1)
content= f.read()
f.close()
f= open(pathfile, 'w', encoding=coding2)
f.write(content)
f.close()
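A kind contributor helped me with the solution, and I am posting it now so everyone can benefit and save time.
Instead of writing all the content at once, we build a list of the lines of the txt file and then write them to a new file one by one with a for loop. Note that the working version also reads with "utf-16" instead of "utf-16-le"; that codec consumes the byte order mark at the start of the file, which is probably the character at position 0 that 'mbcs' could not encode.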
import os
inpath = r"C:/Users/user/Desktop/insert/"
expath = r"C:/Users/user/Desktop/export/"
encoding1 = "utf-16"
encoding2 = "ansi"
input_filename = "text.txt"
input_pathfile = os.path.join(inpath, input_filename)
output_filename = "new_text.txt"
output_pathfile = os.path.join(expath, output_filename)
with open(input_pathfile, 'r', encoding=encoding1) as file_in:
    lines = []
    for line in file_in:
        lines.append(line)
with open(output_pathfile, 'w', encoding='ANSI') as f:
    for line in lines:
        f.write(line)
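The code above still keeps the header row and handles a single file. A possible extension for the daily batch is sketched below, under several assumptions: "ANSI" is taken to mean cp1252 (the "ansi"/"mbcs" codec names exist only on Windows), characters that cp1252 cannot represent are replaced with "?", and the function names convert_file and convert_folder are made up for this example.

```python
import os


def convert_file(src, dst, src_encoding="utf-16", dst_encoding="cp1252"):
    """Copy src to dst, dropping the first line and re-encoding the rest."""
    # newline="" keeps the original line endings untouched in both directions
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, errors="replace", newline="") as fout:
        next(fin, None)      # skip the header row (no error if the file is empty)
        for line in fin:     # stream line by line; no intermediate list needed
            fout.write(line)


def convert_folder(in_dir, out_dir):
    """Convert every .txt file in in_dir and write the results to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(in_dir):
        if name.lower().endswith(".txt"):
            convert_file(os.path.join(in_dir, name), os.path.join(out_dir, name))
```

With the folders from the answer, the call would be convert_folder(inpath, expath).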

How to call a date within a gzip.open call

I want to write a script that opens a gzipped file with today's date in its name.
Here is what I have so far:
todays_date = time.strftime("%Y%m%d") #format time as YYYYMMDD
nextpath = os.getcwd()
service_file = glob.glob(nextpath+"\\"+"shot_*_"+todays_date+"*_vice.gz")
input_file = glob.glob(nextpath+"\\"+"input_file.csv")
myData = gzip.open(service_file, 'rb')
myFile = open(input_file, 'wb') with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
This was working when I wrote the full path:
myData = gzip.open(D:/Temp/shot_655_20180109121455_vice.gz
myFile = open(D:/Temp/input_file.csv, 'wb') with myFile:
But since I changed it to build the file name from the date, I get the error:
SyntaxError: invalid syntax
I know I am calling it wrong somehow, but I am stuck, and any help would be appreciated.
Thanks
You're using 'with open' incorrectly. It should look like this:
with open(my_file, 'r') as mf:
    # do stuff here
This way you don't have to worry about closing the file later. Otherwise, you can just assign the result of open() to a variable and close it yourself:
mf = open(my_file, 'r')
....
mf.close()
Here's a link to the docs, with more information: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
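One more thing to watch for in the question's code: glob.glob returns a list of matching paths, not a single path, so one match has to be picked before it is passed to gzip.open. A minimal sketch follows; the function names are hypothetical, and it assumes the .gz file already contains CSV text, so it copies the decompressed bytes as they are instead of going through csv.writer.

```python
import glob
import gzip
import os
import shutil
import time


def find_todays_file(folder, date_str=None):
    """Return the first file matching shot_*_<date>*_vice.gz in folder, or None."""
    if date_str is None:
        date_str = time.strftime("%Y%m%d")  # today's date as YYYYMMDD
    pattern = os.path.join(folder, "shot_*_" + date_str + "*_vice.gz")
    matches = sorted(glob.glob(pattern))    # a list, possibly empty
    return matches[0] if matches else None


def unpack_to_csv(gz_path, csv_path):
    """Decompress gz_path and write its bytes to csv_path unchanged."""
    with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Usage would be along the lines of calling find_todays_file(os.getcwd()), checking the result for None, and then passing it to unpack_to_csv together with the path of input_file.csv.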

How to encode a pre-existing text file to utf-8 in separate file?

I'm attempting to encode a pre-existing text file and write it in utf-8. I've made a menu in which the user is asked for which text file they would like to encode, but after that I am absolutely lost. I was looking at a previous post and I incorporated that code into my code, however I am unsure of how it works or what I'm doing.
Any help would be greatly appreciated!
import codecs
def getMenuSelection():
    print "\n"
    print "\t\tWhich of the following files would you like to encode?"
    print "\n"
    print "\t\t================================================"
    print "\t\t1. hamletQuote.txt"
    print "\t\t2. RandomQuote.txt"
    print "\t\t3. WeWillRockYou.txt"
    print "\t\t================================================"
    print "\t\tq or Q to quit"
    print "\t\t================================================"
    print ""
    selection = raw_input("\t\t")
    return selection
again = True
while (again == True):
    choice = getMenuSelection()
    if choice.lower() == 1 :
        with codecs.open(hamletQuote.txt,'r',encoding='utf8') as f:
            text = f.read()
        with codecs.open(hamletQuote.txt,'w',encoding='utf8') as f:
            f.write(text)
    if choice.lower() == 2 :
        with codecs.open(RandomQuote.txt,'r',encoding='utf8') as f:
            text = f.read()
        with codecs.open(RandomQuote.txt,'w',encoding='utf8') as f:
            f.write(text)
    if choice.lower() == 3 :
        with codecs.open(WeWillRockYou.txt,'r',encoding='utf8') as f:
            text = f.read()
        with codecs.open(WeWillRockYou.txt,'w',encoding='utf8') as f:
            f.write(text)
    elif choice.lower() == "q":
        again = False
Your code will work correctly once you make the filenames strings. Also compare the menu choice against strings such as "1", since raw_input returns a string and comparing it with the integer 1 is never true. Your input filename is also the same as the output filename, so the input file will be overwritten. You can fix this by naming the output file something different:
with codecs.open("hamletQuote.txt",'r',encoding='utf8') as f:
    text = f.read()
with codecs.open("hamletQuote2.txt",'w',encoding='utf8') as f:
    f.write(text)
If you're curious how it works: codecs.open opens an encoded file in the given mode, in this case r, which means read mode; w refers to write mode. f refers to the file object, which has several methods including read() and write() (which you used).
The with statement simplifies working with the file: it ensures that clean-up always happens. Without the with block, you would have to call f.close() after you have finished working with the file.
Why don't you use the regular open function? Open the input in regular read mode, since it's not encoded yet, then open the output as binary and write the text encoded as UTF-8:
with open("hamletQuote.txt", 'r') as read_file:
    text = read_file.read()
with open("hamletQuote.txt", 'wb') as write_file:
    write_file.write(text.encode("utf-8"))
But if you insist on using codecs, you can do this:
with codecs.open("hamletQuote.txt", 'r') as read_file:
    text = read_file.read()
with codecs.open("hamletQuote.txt", 'wb', encoding="utf-8") as write_file:
    write_file.write(text.encode("utf-8"))
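For comparison, here is how the same task might look in Python 3, where the built-in open accepts an encoding argument. This is a sketch under assumptions: the source files are assumed to be latin-1 (the question does not say which encoding they use), and reencode_to_utf8, FILES, and paths_for_choice are names invented for this example.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Read src_path as src_encoding and write the same text to dst_path as UTF-8."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Menu keys are strings because input() returns a string
FILES = {"1": "hamletQuote.txt", "2": "RandomQuote.txt", "3": "WeWillRockYou.txt"}


def paths_for_choice(choice):
    """Return (source, destination) names for a menu choice, or None if unknown."""
    name = FILES.get(choice.strip())
    if name is None:
        return None
    base, ext = name.rsplit(".", 1)
    # A separate output name keeps the original file intact
    return name, base + "_utf8." + ext
```

In the menu loop, a valid choice would be handled by unpacking paths_for_choice(choice) and passing both names to reencode_to_utf8.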

How to save multiple output in multiple file where each file has a different title coming from an object in python?

I'm scraping an RSS feed from a web site (http://www.gfrvitale.altervista.org/index.php/autismo-in?format=feed&type=rss).
I have written a script to extract and clean the text of every item in the feed. My main problem is saving the text of each item in a different file; I also need to name each file with its proper title extracted from the item.
My code is:
for item in myFeed["items"]:
    time_structure=item["published_parsed"]
    dt = datetime.fromtimestamp(mktime(time_structure))
    if dt>t:
        link=item["link"]
        response= requests.get(link)
        doc=Document(response.text)
        doc.summary(html_partial=False)
        # extracting text
        h = html2text.HTML2Text()
        # converting
        h.ignore_links = True # ignore links
        h.skip_internal_links=True # ignore external links
        h.inline_links=True
        h.ignore_images=True # ignore image links
        h.ignore_emphasis=True
        h.ignore_anchors=True
        h.ignore_tables=True
        testo= h.handle(doc.summary()) # extracted text
        s = doc.title()+"."+" "+testo # content to write to the final file
        tit=item["title"]
        # save each file with its proper title
        with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
            f.write(s)
            f.close()
The error is:
File "<ipython-input-57-cd683dec157f>", line 34
with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
^
SyntaxError: invalid syntax
You need to put the comma after %tit. It should be:
# save each file with its proper title
with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f:
    f.write(s)
    f.close()
However, if the file name contains invalid characters, opening it will raise an error (e.g. [Errno 22]).
You can try this code:
...
tit = item["title"]
# Not the best way, but it could help for now (a list of characters to strip would be better)
tit = tit.replace(' ', '').replace("'", "").replace('?', '')
with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f:
    f.write(s)
    f.close()
Another way, using nltk:
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
tit = item["title"]
tit = tokenizer.tokenize(tit)
tit = ''.join(tit)
with codecs.open("testo_%s" %tit, "w", encoding="utf-8") as f:
    f.write(s)
    f.close()
First off, you misplaced the comma: it should be after %tit, not before.
Secondly, you don't need to close the file, because the with statement you use does that for you automatically. And where does codecs come from? I don't see it imported anywhere. Anyway, the correct with statement would be:
with open("testo_%s" %tit, "w", encoding="utf-8") as f:
    f.write(s)
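The title clean-up can also be done with the standard re module instead of chained replace calls or nltk. The sketch below is one possible approach, not the method from either answer: safe_filename and save_item are hypothetical helpers, the ".txt" extension is an assumption (the original file names have none), and every run of characters other than letters, digits, underscore, and hyphen is collapsed to a single underscore.

```python
import os
import re


def safe_filename(title, prefix="testo_", ext=".txt"):
    """Build a file name from title, replacing runs of risky characters with '_'."""
    cleaned = re.sub(r"[^\w\-]+", "_", title).strip("_")
    # Fall back to a fixed name if nothing usable is left
    return prefix + (cleaned or "untitled") + ext


def save_item(folder, title, text):
    """Write text to a UTF-8 file named after title and return its path."""
    path = os.path.join(folder, safe_filename(title))
    with open(path, "w", encoding="utf-8") as f:  # closed automatically
        f.write(text)
    return path
```

Inside the loop from the question, the call would be save_item(".", item["title"], s).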

Append JSON to file

I am trying to append values to a JSON file. How can I append the data? I have tried many ways, but none of them work.
Code:
def all(title,author,body,type):
    title = "hello"
    author = "njas"
    body = "vgbhn"
    data = {
        "id" : id,
        "author": author,
        "body" : body,
        "title" : title,
        "type" : type
    }
    data_json = json.dumps(data)
    #data = ast.literal_eval(data)
    #print data_json
    if(os.path.isfile("offline_post.json")):
        with open('offline_post.json','a') as f:
            new = json.loads(f)
            new.update(a_dict)
            json.dump(new,f)
    else:
        open('offline_post.json', 'a')
        with open('offline_post.json','a') as f:
            new = json.loads(f)
            new.update(a_dict)
            json.dump(new,f)
How can I append data to json file when this function is called?
I suspect you left out that you're getting a TypeError in the blocks where you're trying to write the file. Here's where you're trying to write:
with open('offline_post.json','a') as f:
    new = json.loads(f)
    new.update(a_dict)
    json.dump(new,f)
There's a couple of problems here. First, you're passing a file object to the json.loads command, which expects a string. You probably meant to use json.load.
Second, you're opening the file in append mode, which places the pointer at the end of the file. When you run the json.load, you're not going to get anything because it's reading at the end of the file. You would need to seek to 0 before loading (edit: this would fail anyway, as append mode is not readable).
Third, when you json.dump the new data to the file, it's going to append it to the file in addition to the old data. From the structure, it appears you want to replace the contents of the file (as the new data contains the old data already).
You probably want to use r+ mode, seeking back to the start of the file between the read and the write, and truncating at the end in case the size of the data structure ever shrinks.
with open('offline_post.json', 'r+') as f:
    new = json.load(f)
    new.update(a_dict)
    f.seek(0)
    json.dump(new, f)
    f.truncate()
Alternatively, you can open the file twice:
with open('offline_post.json', 'r') as f:
    new = json.load(f)
new.update(a_dict)
with open('offline_post.json', 'w') as f:
    json.dump(new, f)
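Neither snippet covers the case from the question where the file does not exist yet. A minimal sketch of the open-twice approach that handles it is shown below. It makes one assumption that differs from the answer: the file is taken to hold a JSON list of posts, so each call appends a dict instead of merging keys with update. The name append_post is invented for this example.

```python
import json
import os


def append_post(path, post):
    """Append post (a dict) to the JSON list stored at path, creating the file if needed."""
    if os.path.isfile(path):
        with open(path, "r", encoding="utf-8") as f:
            posts = json.load(f)
    else:
        posts = []  # first call: start with an empty list
    posts.append(post)
    # Mode 'w' truncates the file, so no stale bytes are left behind
    with open(path, "w", encoding="utf-8") as f:
        json.dump(posts, f, indent=2)
    return posts
```

Each call to append_post('offline_post.json', data) then leaves a file that json.load can read back in full.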
This is a different approach: I wanted to append without reloading all the data. It runs on a Raspberry Pi, so I want to conserve memory. The test code:
import os

json_file_exists = 0
filename = "/home/pi/scratch_pad/test.json"
# remove the last run json data
try:
    os.remove(filename)
except OSError:
    pass
count = 0
boiler = 90
tower = 78
while count < 10:
    if json_file_exists == 0:
        # create the json file (double-quoted keys, so the result is valid JSON)
        with open(filename, mode='w') as fw:
            json_string = '[\n\t{"boiler":' + str(boiler) + ',"tower":' + str(tower) + '}\n]'
            fw.write(json_string)
        json_file_exists = 1
    else:
        # append to the json file
        boiler = boiler + .01
        tower = tower + .02
        # strip trailing bytes until the file ends with the last "}"
        with open(filename, mode='rb+') as f:
            while True:
                f.seek(-1, os.SEEK_END)
                size = f.tell()
                if f.read(1) == b"}":
                    break
                f.truncate(size)
        # append mode always writes at the end, so no seek is needed
        with open(filename, mode='a') as fw:
            json_string = '\n\t,{"boiler":' + str(boiler) + ',"tower":' + str(tower) + '}\n]'
            fw.write(json_string)
    count = count + 1
