Writing to a file but getting ASCII encoding error - python

I am trying to scrape data from a table on a website.
page_soup = soup(html, 'html.parser')
stat_table = page_soup.find_all('table')
stat_table = stat_table[0]
with open('stats.txt', 'w') as q:
    for row in stat_table.find_all('tr'):
        for cell in row.find_all('td'):
            q.write(cell.text)
However, when I try to write the file, I get this error message: 'ascii' codec can't encode character '\xa0' in position 19: ordinal not in range(128).
I understand that it should be encoded with .encode('utf-8'), but
cell.text.encode('utf-8')
doesn't work.
Any help would be greatly appreciated. Using Python 3.6

The file's encoding is taken from the current environment (the locale), which in this case resolves to ASCII. You can specify the encoding explicitly:
with open('stats.txt', 'w', encoding='utf8') as q:
    pass
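As a minimal sketch of the fix above: the list `cells` is a hypothetical stand-in for the scraped `cell.text` values, and the temporary path is only there so the example runs anywhere. Calling `cell.text.encode('utf-8')` does not help because it produces `bytes`, which a file opened in text mode will not accept; passing `encoding` to `open` lets the file object do the encoding.

```python
import os
import tempfile

# Hypothetical stand-in for the scraped cell.text values; "\xa0" is the
# non-breaking space that triggered the error in the question.
cells = ["Goals\xa0scored", "12"]

path = os.path.join(tempfile.mkdtemp(), "stats.txt")

# With an explicit encoding, the result no longer depends on the locale.
with open(path, "w", encoding="utf-8") as q:
    for text in cells:
        q.write(text + "\n")

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as q:
    lines = q.read().splitlines()
```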

Related

Python - Pandas : how to save csv file from url

I'm trying to get a CSV file with requests and save it to my project:
import requests
import pandas as pd
import csv
def get_and_save_countries():
    url = 'https://www.trackcorona.live/api/countries'
    r = requests.get(url)
    data = r.json()
    data = data["data"]
    with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="") as f:
        title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
        cw = csv.DictWriter(f,title,delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        cw.writeheader()
        cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.
Try:
with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="", encoding ='utf-8') as f:
to explicitly specify the encoding with encoding='utf-8'
When you write to a file, the default encoding is locale.getpreferredencoding(False). On Windows that is usually not UTF-8, and even on Linux the locale may be configured to something other than UTF-8. pandas defaults to UTF-8 when reading, so pass encoding='utf8' as another parameter to open when writing.
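The mismatch described above can be reproduced without the API. In this sketch the row data, the field names, and the temporary path are made up for illustration, and `csv.DictReader` stands in for `pd.read_csv` so that only the standard library is needed.

```python
import csv
import os
import tempfile

# Hypothetical row; "é" is a character that cp1252 and UTF-8 encode differently.
rows = [{"location": "Réunion", "confirmed": 1}]
path = os.path.join(tempfile.mkdtemp(), "countries.csv")

# Write with an explicit encoding instead of relying on the locale default.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["location", "confirmed"])
    writer.writeheader()
    writer.writerows(rows)

# A UTF-8 reader now gets the text back intact.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.DictReader(f))

# What goes wrong otherwise: bytes written as cp1252 are not valid UTF-8.
try:
    "Réunion".encode("cp1252").decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)
```

With pandas, the same file should load with `pd.read_csv(path)`, since it reads UTF-8 by default.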

JSON import in Python

I would like to import the JSON file located at "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next" in python.
When I run this:
with open('C:/Users/coppe/Documents/py trials/eval.json') as json_file:
    reviews = json.load(json_file)
I get an error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6776: character maps to <undefined>
Actually this error is due to a special character contained in the html keyvalue. Knowing that this character is an emoticon (a thumb), how can I still import my JSON by ignoring this ?
You need to specify the file's encoding when you open it; otherwise Python falls back to the platform default, which is the 'charmap' codec in your traceback. Most JSON files are UTF-8, so use something like:
reviews = json.load(
open("C:/Users/coppe/Documents/py trials/eval.json", encoding="utf8")
)
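or, so that the file is closed automatically (note that the encoding belongs to open, not to json.load, whose encoding argument was removed in Python 3.9):
with open('C:/Users/coppe/Documents/py trials/eval.json', encoding="utf8") as json_file:
    reviews = json.load(json_file)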
Good Luck!
use
open(json_file, encoding="utf8")
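A self-contained sketch of the same idea follows. The review text and the temporary file are invented for the example; the thumbs-up emoji is used because its UTF-8 encoding ends in byte 0x8d, the byte named in the error message.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "eval.json")

# Hypothetical payload containing a thumbs-up emoji (U+1F44D).
payload = {"review": "Great car \U0001F44D"}

# ensure_ascii=False writes the emoji as real UTF-8 bytes rather than \u escapes,
# which mimics a file downloaded from the web.
with open(path, "w", encoding="utf-8") as json_file:
    json.dump(payload, json_file, ensure_ascii=False)

# Opening with the matching encoding lets json.load see correctly decoded text.
with open(path, encoding="utf-8") as json_file:
    reviews = json.load(json_file)
```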

How to extract String from a Unicoded JSONObject in Python?

I'm getting the below error when I try to parse a String with Unicodes like ' symbol and Emojis, etc :
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f33b' in position 19: ordinal not in range(128)
Sample Object:
{"user":{"name":"\u0e2a\u0e31\u0e48\u0e07\u0e14\u0e48\u0e27\u0e19 \u0e2b\u0e21\u0e14\u0e44\u0e27 \u0e40\u0e14\u0e23\u0e2a\u0e41\u0e1f\u0e0a\u0e31\u0e48\u0e19\u0e21\u0e32\u0e43\u0e2b\u0e21\u0e48 \u0e23\u0e32\u0e04\u0e32\u0e40\u0e1a\u0e32\u0e46 \u0e2a\u0e48\u0e07\u0e17\u0e31\u0e48\u0e27\u0e44\u0e17\u0e22 \u0e44\u0e14\u0e49\u0e02\u0e2d\u0e07\u0e0a\u0e31\u0e27\u0e23\u0e4c\u0e08\u0e49\u0e32 \u0e2a\u0e19\u0e43\u0e08\u0e15\u0e34\u0e14\u0e15\u0e48\u0e2d\u0e2a\u0e2d\u0e1a\u0e16\u0e32\u0e21 Is it","tag":"XYZ"}}
I'm able to extract tag value, but I'm unable to extract name value.
Here is my code:
dict = json.loads(json_data)
print('Tag - ' + dict['user']['tag'])
print('Name - ' + dict['user']['name'])
You can save the data in CSV format, which can also be opened in Excel. When you open a file with open(filename, "w") and no encoding argument, Python uses the platform's default encoding; if that is ASCII, writing any non-ASCII character raises UnicodeEncodeError. To store Unicode data reliably, open the file with UTF-8 encoding.
mydict = json.loads(json_data) # or whatever dictionary it is...
# Open the file with UTF-8 encoding, most important step
f = open("userdata.csv", "w", encoding='utf-8')
f.write(mydict['user']['name'] + ", " + mydict['user']['tag'] + "\n")
f.close()
Feel free to change the code based on the data you have.
That's it...
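The error in the question is probably raised by print itself, when the console's encoding is ASCII. The sketch below illustrates that under stated assumptions: the JSON string is a shortened, made-up version of the sample object, and the two in-memory streams are stand-ins for an ASCII console and a UTF-8 one.

```python
import io
import json

# Shortened, hypothetical version of the sample object. The emoji U+1F33B is
# written as a surrogate pair because JSON has no \U escape.
json_data = r'{"user": {"name": "\u0e2a\u0e31\u0e48\u0e07 \ud83c\udf3b", "tag": "XYZ"}}'

data = json.loads(json_data)
name = data["user"]["name"]

# An ASCII-only text stream, standing in for a misconfigured console.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    print(name, file=ascii_stream)
    ascii_stream.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The same print succeeds when the stream encodes as UTF-8.
utf8_buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
print(name, file=utf8_stream)
utf8_stream.flush()
```

If the console is the culprit, calling `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or setting the `PYTHONIOENCODING` environment variable to `utf-8` may be enough to make the original print calls work.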

Python Selenium page cannot save source code encode error

I am trying to save source code with Selenium into .txt, but the .txt file stays empty.
When I tried to print the source code with these commands:
htmlcode = driver.page_source
(driver.page_source).encode('utf-8')
print(htmlcode)
It will print the source code but then it kills the script with error:
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 16329: character maps to <undefined>
Problem solved! After 3 hours searching ':-)
html = driver.page_source
f = open('savepage.html', 'w')
f.write(html.encode('utf-8'))
f.close()
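The snippet above targets Python 2.7, where writing encoded bytes to a file opened with 'w' is accepted. On Python 3 the equivalent would look roughly like the sketch below, where the `html` string is a made-up stand-in for `driver.page_source` and the path is temporary.

```python
import os
import tempfile

# Hypothetical stand-in for driver.page_source; it contains the euro sign
# (U+20AC) that cp850 could not encode in the traceback.
html = "<p>Price: 20 \u20ac</p>"

path = os.path.join(tempfile.mkdtemp(), "savepage.html")

# On Python 3, pass the encoding to open() and write the str directly;
# calling .encode() first would fail because the file is in text mode.
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Inspect the raw bytes to confirm they are UTF-8.
with open(path, "rb") as f:
    raw = f.read()
```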

Python CSV cant encode character

Using Python and Beautiful Soup, I have created a script that takes the name, address and phone number of businesses off a website and saves the output into three columns of a CSV file.
The script works fine, but it stops when it reaches a value like the following:
u'\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n'
The problem is the "xa0" part. The error message states:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 35: ordinal not in range(128)
I have a vague idea of what this error means but have no idea how to deal with it. Any ideas?
Thanks
Edit:
My script is as follows:
import csv
import bs4
import requests
page = requests.get('http://accountantlist.com.au/x123-Accountants-in-Sydney.aspx?Page=0')
soup = bs4.BeautifulSoup(page.content)
for company in soup.select('table#ctl00_ContentPlaceHolder1_dgLawyers tr > td > table'):
    name = company.a.text
    address = company.find_all('tr')[1].text
    phone = company.tr.find_all('td')[1].text
    with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
        s = csv.writer(csvfile)
        s.writerow([name,address,phone])
You need to encode the values as UTF-8 when writing to the CSV file, because the built-in csv module in Python 2 doesn't support Unicode.
def remove_non_ascii(text):
    return ''.join(i for i in text if ord(i)<128)
name = remove_non_ascii(company.a.text)
address = remove_non_ascii(company.find_all('tr')[1].text)
phone = remove_non_ascii(company.tr.find_all('td')[1].text)
with open('/home/kwal0203/Desktop/eggs.csv', 'a') as csvfile:
    s = csv.writer(csvfile)
    s.writerow([data.encode("utf-8") for data in [name,address,phone]])
Alternatively, you can use unicodecsv, which supports Unicode by default.
You can install it like this:
pip install unicodecsv
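Stripping every non-ASCII character also discards accented letters, so a gentler option is to normalize only the non-breaking spaces. The helper name `normalize_spaces` is hypothetical, and the sketch assumes Python 3, where text is already Unicode; the sample string is the address from the question.

```python
def normalize_spaces(text):
    """Replace non-breaking spaces with ordinary ones and collapse whitespace."""
    # "\xa0" is the non-breaking space reported in the error message.
    text = text.replace("\xa0", " ")
    # split() with no arguments drops leading/trailing whitespace and
    # collapses runs of whitespace, including the newlines around the cell.
    return " ".join(text.split())


raw_address = "\nLevel 12, 280 George Street SYDNEY\xa0 NSW\xa0 2000. . Sydney. NSW 2000\n"
clean_address = normalize_spaces(raw_address)
```

The cleaned value can then be passed to `csv.writer` on a file opened with `encoding='utf-8'`, which keeps any remaining non-ASCII characters intact.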
