I'm having a problem with decoding in Python. I'm trying to fetch an IMDb page (example address: http://www.imdb.com/title/tt2216240/):
import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
page = response.read().decode('utf-8', 'ignore')
with open('film.html', 'w') as f:
    print(page, file=f)
I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>
Try explicitly specifying UTF-8 as the file encoding:
with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
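Alternatively, if you just want to save the page exactly as the server sent it, you can skip decoding altogether and write the raw bytes. A minimal sketch, reusing the setup from the question:

import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)

# Write the undecoded bytes straight to disk; there is no encode/decode
# round trip, so no UnicodeEncodeError can occur.
with open('film.html', 'wb') as f:
    f.write(response.read())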
Have you tried the requests library yet? Either way, it makes this simpler:
#samplerequest.py
import requests
address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)
print(req.text)
print(req.encoding)
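To actually save the page to disk, you can combine this with the explicit-encoding advice above. A minimal sketch, reusing req from the snippet:

# req.text is already a decoded str (requests picks the encoding for you);
# write it back out as UTF-8 so the file's encoding is unambiguous.
with open('film.html', 'w', encoding='utf-8') as f:
    f.write(req.text)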
I am reading a zip file from a URL. Inside the zip file, there is an HTML file. After I read the file everything works fine. But when I print the text I am facing a Unicode problem. Python version: 3.8
import requests
from zipfile import ZipFile
from io import BytesIO
from bs4 import BeautifulSoup
from lxml import html

content = requests.get("www.url.com")
zf = ZipFile(BytesIO(content.content))
file_name = zf.namelist()[0]
file = zf.open(file_name)

soup = BeautifulSoup(file.read(), 'html.parser', from_encoding='utf-8', exclude_encodings='utf-8')
for product in soup.find_all('tr'):
    product = product.find_all('td')
    if len(product) < 2: continue
    print(product[1].text)
I already tried to open the file and print the text with .decode('utf-8'), and I got the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte
I added from_encoding and exclude_encodings to BeautifulSoup, but nothing changed and I didn't get an error.
Expected prints:
ÇEŞİTLİ MADDELER TOPLAMI
Tarçın
Fidanı
What I am getting:
ÇEÞÝTLÝ MADDELER TOPLAMI
Tarçýn
Fidaný
I looked at the file, and the encoding is not UTF-8 but ISO-8859-9.
Change the encoding and everything will be fine:
soup = BeautifulSoup(file.read(),'html.parser',from_encoding='iso-8859-9')
This will output: ÇEŞİTLİ MADDELER TOPLAMI
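If you're not sure what the real encoding is, BeautifulSoup can report what it detected, and the page may declare its charset itself. A small sketch, with variable names following the question's code:

from bs4 import BeautifulSoup

# Parse without forcing an encoding and see what BeautifulSoup picked.
soup = BeautifulSoup(file.read(), 'html.parser')
print(soup.original_encoding)

# The document may also declare its charset in a <meta> tag,
# e.g. <meta charset="iso-8859-9">.
print(soup.find('meta', charset=True))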
So I'm trying to get a CSV file with requests and save it to my project:
import requests
import pandas as pd
import csv
def get_and_save_countries():
    url = 'https://www.trackcorona.live/api/countries'
    r = requests.get(url)
    data = r.json()
    data = data["data"]
    with open("corona/dash_apps/finished_apps/apicountries.csv", "w", newline="") as f:
        title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
        cw = csv.DictWriter(f, title, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        cw.writeheader()
        cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.
Try:
with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="", encoding ='utf-8') as f:
to explicitly specify the encoding with encoding='utf-8'.
When you write to a file, the default encoding is locale.getpreferredencoding(False). On Windows that is usually not UTF-8, and even on Linux the locale could be configured to something other than UTF-8. Pandas defaults to UTF-8 when reading, so pass encoding='utf-8' as another parameter to open when writing.
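As a quick sanity check, you can see what your system default actually is and make the read side explicit too. A minimal sketch, with the path taken from the question:

import locale

import pandas as pd

# open() falls back to this when no encoding is given; on Windows it is
# often something like 'cp1252' rather than 'utf-8'.
print(locale.getpreferredencoding(False))

# Read with the same explicit encoding that was used to write the file.
df = pd.read_csv("corona/dash_apps/finished_apps/apicountries.csv", encoding="utf-8")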
I would like to import the JSON file located at "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next" in Python.
When I run this:
with open('C:/Users/coppe/Documents/py trials/eval.json') as json_file:
    reviews = json.load(json_file)
I get an error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6776: character maps to <undefined>
Actually, this error is due to a special character contained in the html key's value. Knowing that this character is an emoji (a thumbs-up), how can I still import my JSON while ignoring it?
You need to specify the correct encoding when opening the file. Most JSON files are UTF-8, so use something like:
reviews = json.load(
open("C:/Users/coppe/Documents/py trials/eval.json", encoding="utf8")
)
or
with open('C:/Users/coppe/Documents/py trials/eval.json', encoding="utf8") as json_file:
    reviews = json.load(json_file)
Good Luck!
use
open(json_file, encoding="utf8")
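Since the file in the question ultimately comes from a URL, another option (a sketch, assuming the endpoint returns JSON directly) is to fetch and parse it in one step with requests:

import requests

url = "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next"
response = requests.get(url)

# requests handles the content encoding, and .json() parses the body,
# emoji and all, with no charmap involved.
reviews = response.json()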
I need to save BeautifulSoup results to a .txt file. I convert the results to a string with str(), but it doesn't work because the content is UTF-8:
# -*- coding: utf-8 -*-
page_content = soup(page.content, "lxml")
links = page_content.select('h3.LC20lb')
for link in links:
    with open("results.txt", 'a') as file:
        file.write(str(link) + "\n")
and I get this error:
File "C:\Users\omido\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 183-186: character maps to <undefined>
If you want to write to the file as UTF-8 as well, you’ll need to specify that:
with open("results.txt", 'a', encoding='utf-8') as file:
file.write(str(link) + "\n")
and it’s a good idea to only open the file once:
with open("results.txt", 'a', encoding='utf-8') as file:
for link in links:
file.write(str(link) + "\n")
(You can also print(link, file=file).)
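If you only want the headline text rather than the raw <h3> markup, a small variation (assuming plain text is what you're after) is:

with open("results.txt", 'a', encoding='utf-8') as file:
    for link in links:
        # get_text() strips the tags and keeps just the visible text.
        file.write(link.get_text() + "\n")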
When I hit a POST API, it returns zip file content as output (which comes back as Unicode text), and I want to save that content as a zip file locally.
How can I do that?
Trials:
Try 1:
# variable data contains the API response (i.e. data = response.text)
f = open('test.zip', 'wb')
f.write(data.encode('utf8'))
f.close()
The above code creates a zip file, but the file is corrupted.
Try 2:
with zipfile.ZipFile('spam.zip', 'w') as myzip:
    myzip.write(data.decode("utf8"))
The above code gives me an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 97: ordinal not in range(128)
Can anyone help me resolve this?
I found the answer to the above problem. Maybe someone will want the same in the future, so I'm writing an answer to my own question.
Using response.content instead of response.text resolved my problem.
import requests
response = requests.request("POST", <<url>>, data=<<payload>>, headers=<<headers>>, verify=False)
data = response.content
f = open('test.zip', 'wb')
f.write(data)
f.close()
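To double-check the result, you can write the bytes with a context manager and confirm that the saved file really is a zip archive. A minimal sketch along the same lines, where url, payload, and headers are placeholders for your own values:

import zipfile

import requests

# url, payload, and headers are hypothetical placeholders for your API call.
response = requests.request("POST", url, data=payload, headers=headers, verify=False)

with open('test.zip', 'wb') as f:
    f.write(response.content)

# is_zipfile() returns True only if the file starts with a valid zip signature.
print(zipfile.is_zipfile('test.zip'))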