Trouble loading CSV files - python

I get an error from the following code when trying to load a .csv file:
import xlrd
import pandas as pd
path = '/Milk_Papers_Estimated_Class.csv'
data = pd.read_csv(path)
The error is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 504: invalid continuation byte.
I do not know why I am getting this error. Can anyone help me out with this?

By default, read_csv decodes the file as UTF-8. Try passing the encoding as latin-1:
data = pd.read_csv("/Milk_Papers_Estimated_Class.csv", encoding='latin-1')
That might work.
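To illustrate why changing the encoding can help, here is a minimal sketch using only the standard library. The helper name decode_with_fallback and the sample bytes are hypothetical, chosen to reproduce the same kind of error (byte 0xe2 followed by a byte that is not a UTF-8 continuation byte). Note that latin-1 maps every possible byte to some character, so it never raises; a successful latin-1 read does not prove the file was really written in latin-1.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample: 0xe2 starts a multi-byte sequence in UTF-8,
# but here it is followed by a plain ASCII letter, so UTF-8 decoding fails.
sample = b"p\xe2t\xe9"
text, used = decode_with_fallback(sample)
print(used, text)
```

For a real file, the bytes could be obtained with open(path, "rb").read() and the encoding that succeeds passed to read_csv through its encoding parameter.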

Related

Pandas read_csv UnicodeDecodeError: invalid start byte

I am trying to read a .csv file using pandas but get this error.
Line of code:
pd.read_csv(r"C:\Users\antba\Desktop\ffstats.csv")
Error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 85: invalid start byte
I've removed the 'r' from the pd.read_csv command but was met with a different error message. Any help would be appreciated, thank you.
This is probably due to encoding. (Please also try searching before asking questions on Stack Overflow; it will help you learn more.)
If you know the encoding of the CSV, try something like this:
pd.read_csv('your_file.csv', encoding = 'ISO-8859-1')

UnicodeDecodeError when reading CSV file in Pandas with Python "'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"

I am having trouble reading a csv file using read_csv in Pandas. Here's the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I have tried a bunch of different encoding types with the file I am dealing with and none seem to work. The file is from Google's Search Ads 360 product, which says the csv should be in the 'UTF-16' format. Strangely, if I open the file in Excel and save it in UTF-8 format, I can use read_csv normally.
I've tried the solutions to a similar problem here, but they did not work for me. This is the only code I am running:
import pandas as pd
df = pd.read_csv('path/file.csv')
Edit: I read in the file as tab delimited, and that seemed to work. I still don't understand why I got the error I did when I tried to read it in as a normal csv. Any insight into this would be appreciated!!
Try this encoding:
import pandas as pd
df = pd.read_csv('path/file.csv',encoding='cp1252')
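A first byte of 0xff at position 0 is consistent with a UTF-16 byte-order mark (BOM), which would fit the 'UTF-16' note in the question. As a rough sketch, assuming the file does start with a BOM, the leading bytes can be checked with the standard codecs constants. The helper name sniff_bom and the sample data are hypothetical.

```python
import codecs


def sniff_bom(raw):
    """Return a codec name suggested by a leading byte-order mark, or None."""
    # UTF-32 is checked first because the UTF-32 LE BOM begins with
    # the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None


# Hypothetical tab-delimited sample; encoding as "utf-16" prepends a BOM.
sample = "a\tb\n1\t2\n".encode("utf-16")
print(sniff_bom(sample))
```

For a real file, the first few bytes could be read with open(path, "rb").read(4). If the result is "utf-16" and the file is tab-delimited, as the edit to the question suggests, something like pd.read_csv(path, encoding="utf-16", sep="\t") may be worth trying.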

UnicodeEncodeError: 'ascii' codec can't encode character error

I am reading some files from Google Cloud Storage using Python:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.option("sep","\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True,encoding='utf-8')
df.count()
df.show(10)
However, I keep getting an error that complains about the df.show(10) line:
df.show(10)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)
I googled this and found that it seems to be a common error; the suggested solution is to add the "UTF-8" encoding to spark.read.option, which I have already done. Since that doesn't help and I am still getting the error, could anyone help? Thanks in advance.
How about exporting PYTHONIOENCODING before running your Spark job:
export PYTHONIOENCODING=utf8
For Python 3.7+ the following should also do the trick:
import sys
sys.stdout.reconfigure(encoding='utf-8')
For Python 2.x you can use the following:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
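The error in the question is an encode error raised while printing, not a decode error raised while reading, which is why the suggestions above target the output stream. A small sketch, assuming the job's stdout behaves like a text stream with an ASCII encoding, shows the difference; the helper name write_with_encoding is hypothetical.

```python
import io


def write_with_encoding(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # keep the wrapper from closing the buffer
    return data


# U+FFFD is the Unicode replacement character from the traceback.
try:
    write_with_encoding("\ufffd", "ascii")
except UnicodeEncodeError as err:
    print("ascii stream failed:", err.reason)

print(write_with_encoding("\ufffd", "utf-8"))
```

Separately, U+FFFD appearing in the data usually means some bytes could not be decoded when the file was read, so the source files may not be UTF-8 even once printing works.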

Encoding error while trying to read Hindi text from csv file using pandas

I am trying to read Devanagari text from a csv file using pandas. I get an error when using encoding="utf-8". When I changed it to encoding="latin1", I got NaN values.
Please help if someone has already encountered a similar problem or knows how to solve this.
Thanks in advance.
Here is the Error I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte

Python Codecs Unicode Decode Error

I was using the codecs module for reading a text file and extracting information from it. My code is as follows:
import codecs
handle = codecs.open('try.txt',encoding="utf-8")
f1 = handle.read()
# Do further stuff with f1
However, it is giving me the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 628: invalid continuation byte
Can anybody help me on this? Thanks in advance! :)
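One way to investigate an error like this is to look at the bytes around the reported position. The sketch below is only an illustration using the attributes that UnicodeDecodeError exposes (start, end, reason); the helper name describe_decode_error and the sample bytes are hypothetical.

```python
def describe_decode_error(raw, encoding="utf-8", context=10):
    """Return details about the first undecodable byte, or None if raw decodes."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        return {
            "position": err.start,
            "byte": hex(raw[err.start]),
            "reason": err.reason,
            # Raw bytes surrounding the failure, for inspection.
            "context": raw[start:err.end + context],
        }
    return None


# Hypothetical sample: 0xea begins a multi-byte UTF-8 sequence,
# but the next byte is a space rather than a continuation byte.
info = describe_decode_error(b"abc\xea def")
print(info)
```

For the file in the question, the bytes could be read with open('try.txt', 'rb').read(). Seeing the surrounding bytes often makes it easier to guess whether the file is in a different encoding or is only corrupted in a few places.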
