I'm having a problem with decoding in Python. I'm trying to fetch an IMDb page (example address: http://www.imdb.com/title/tt2216240/):
import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
page = response.read().decode('utf-8', 'ignore')
with open('film.html', 'w') as f:
    print(page, file=f)
I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>
Try explicitly specifying UTF-8 as the file encoding:
with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
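Alternatively, if you just want to save the page exactly as the server sent it, you can skip decoding altogether and write the raw bytes. A minimal sketch, reusing the setup from the question:

import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)

# Write the undecoded bytes straight to disk; there is no encode/decode
# round trip, so no UnicodeEncodeError can occur.
with open('film.html', 'wb') as f:
    f.write(response.read())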
Have you tried the requests library yet? Either way, it makes this simpler:
#samplerequest.py
import requests
address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)
print(req.text)
print(req.encoding)
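To actually save the page to disk, you can combine this with the explicit-encoding advice above. A minimal sketch, reusing req from the snippet:

# req.text is already a decoded str (requests picks the encoding for you);
# write it back out as UTF-8 so the file's encoding is unambiguous.
with open('film.html', 'w', encoding='utf-8') as f:
    f.write(req.text)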
I am reading a zip file from a URL. Inside the zip file, there is an HTML file. After I read the file everything works fine. But when I print the text I am facing a Unicode problem. Python version: 3.8
import requests
from zipfile import ZipFile
from io import BytesIO
from bs4 import BeautifulSoup
from lxml import html

content = requests.get("www.url.com")
zf = ZipFile(BytesIO(content.content))
file_name = zf.namelist()[0]
file = zf.open(file_name)

soup = BeautifulSoup(file.read(), 'html.parser', from_encoding='utf-8', exclude_encodings='utf-8')
for product in soup.find_all('tr'):
    product = product.find_all('td')
    if len(product) < 2: continue
    print(product[1].text)
I already tried to open the file and print the text with .decode('utf-8'), and I got the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte
I added from_encoding and exclude_encodings to BeautifulSoup, but nothing changed and I didn't get an error.
Expected prints:
ÇEŞİTLİ MADDELER TOPLAMI
Tarçın
Fidanı
What I am getting:
ÇEÞÝTLÝ MADDELER TOPLAMI
Tarçýn
Fidaný
I looked at the file, and the encoding is not UTF-8 but ISO-8859-9.
Change the encoding and everything will be fine:
soup = BeautifulSoup(file.read(),'html.parser',from_encoding='iso-8859-9')
This will output: ÇEŞİTLİ MADDELER TOPLAMI
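If you're not sure what the real encoding is, BeautifulSoup can report what it detected, and the page may declare its charset itself. A small sketch, with variable names following the question's code:

from bs4 import BeautifulSoup

# Parse without forcing an encoding and see what BeautifulSoup picked.
soup = BeautifulSoup(file.read(), 'html.parser')
print(soup.original_encoding)

# The document may also declare its charset in a <meta> tag,
# e.g. <meta charset="iso-8859-9">.
print(soup.find('meta', charset=True))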
So I'm trying to get a CSV file with requests and save it to my project:
import requests
import pandas as pd
import csv
def get_and_save_countries():
    url = 'https://www.trackcorona.live/api/countries'
    r = requests.get(url)
    data = r.json()
    data = data["data"]
    with open("corona/dash_apps/finished_apps/apicountries.csv", "w", newline="") as f:
        title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
        cw = csv.DictWriter(f, title, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        cw.writeheader()
        cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.
Try:
with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="", encoding ='utf-8') as f:
to explicitly specify the encoding with encoding='utf-8'.
When you write to a file, the default encoding is locale.getpreferredencoding(False). On Windows that is usually not UTF-8, and even on Linux the locale could be configured to something other than UTF-8. Pandas defaults to UTF-8 when reading, so pass encoding='utf-8' as another parameter to open when writing.
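As a quick sanity check, you can see what your system default actually is and make the read side explicit too. A minimal sketch, with the path taken from the question:

import locale

import pandas as pd

# open() falls back to this when no encoding is given; on Windows it is
# often something like 'cp1252' rather than 'utf-8'.
print(locale.getpreferredencoding(False))

# Read with the same explicit encoding that was used to write the file.
df = pd.read_csv("corona/dash_apps/finished_apps/apicountries.csv", encoding="utf-8")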
I would like to import the JSON file located at "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next" in Python.
When I run this:
with open('C:/Users/coppe/Documents/py trials/eval.json') as json_file:
    reviews = json.load(json_file)
I get an error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6776: character maps to <undefined>
Actually, this error is due to a special character contained in the html key's value. Knowing that this character is an emoji (a thumbs-up), how can I still import my JSON while ignoring it?
You need to specify the correct encoding when opening the file. Most JSON files are UTF-8, so use something like:
reviews = json.load(
open("C:/Users/coppe/Documents/py trials/eval.json", encoding="utf8")
)
or
with open('C:/Users/coppe/Documents/py trials/eval.json', encoding="utf8") as json_file:
    reviews = json.load(json_file)
Good Luck!
use
open(json_file, encoding="utf8")
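Since the file in the question ultimately comes from a URL, another option (a sketch, assuming the endpoint returns JSON directly) is to fetch and parse it in one step with requests:

import requests

url = "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next"
response = requests.get(url)

# requests handles the content encoding, and .json() parses the body,
# emoji and all, with no charmap involved.
reviews = response.json()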
I need to save BeautifulSoup results to a .txt file. I convert the results to a string with str(), but it doesn't work because the content is UTF-8:
# -*- coding: utf-8 -*-
page_content = soup(page.content, "lxml")
links = page_content.select('h3.LC20lb')
for link in links:
    with open("results.txt", 'a') as file:
        file.write(str(link) + "\n")
and I get this error:
File "C:\Users\omido\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 183-186: character maps to <undefined>
If you want to write to the file as UTF-8 as well, you’ll need to specify that:
with open("results.txt", 'a', encoding='utf-8') as file:
file.write(str(link) + "\n")
and it’s a good idea to only open the file once:
with open("results.txt", 'a', encoding='utf-8') as file:
for link in links:
file.write(str(link) + "\n")
(You can also print(link, file=file).)
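If you only want the headline text rather than the raw <h3> markup, a small variation (assuming plain text is what you're after) is:

with open("results.txt", 'a', encoding='utf-8') as file:
    for link in links:
        # get_text() strips the tags and keeps just the visible text.
        file.write(link.get_text() + "\n")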
When I hit a POST API, it returns zip file content as output (which comes back as Unicode text), and I want to save that content as a zip file locally.
How can I do that?
Trials:
Try 1:
# variable data contains the API response (i.e. data = response.text)
f = open('test.zip', 'wb')
f.write(data.encode('utf8'))
f.close()
The above code creates a zip file, but the file is corrupted.
Try 2:
with zipfile.ZipFile('spam.zip', 'w') as myzip:
    myzip.write(data.decode("utf8"))
The above code gives me an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 97: ordinal not in range(128)
Can anyone help me resolve this?
I found the answer to the above problem. Maybe someone will want the same in the future, so I'm writing an answer to my own question.
Using response.content instead of response.text resolved my problem.
import requests
response = requests.request("POST", <<url>>, data=<<payload>>, headers=<<headers>>, verify=False)
data = response.content
f = open('test.zip', 'wb')
f.write(data)
f.close()
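To double-check the result, you can write the bytes with a context manager and confirm that the saved file really is a zip archive. A minimal sketch along the same lines, where url, payload, and headers are placeholders for your own values:

import zipfile

import requests

# url, payload, and headers are hypothetical placeholders for your API call.
response = requests.request("POST", url, data=payload, headers=headers, verify=False)

with open('test.zip', 'wb') as f:
    f.write(response.content)

# is_zipfile() returns True only if the file starts with a valid zip signature.
print(zipfile.is_zipfile('test.zip'))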