Pandas read_csv UnicodeDecodeError: invalid start byte

I am trying to read a .csv file using pandas but get this error.
Line of code:
pd.read_csv(r"C:\Users\antba\Desktop\ffstats.csv")
Error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 85: invalid start byte
I've removed the 'r' from the pd.read_csv command but was met with a different error message. Any help would be appreciated, thank you.

This is most likely an encoding problem: read_csv assumes UTF-8 by default, and your file appears to be in some other encoding. Also, please try searching before asking on Stack Overflow; you will learn more that way.
If you know the encoding of the CSV, pass it explicitly, for example:
pd.read_csv('your_file.csv', encoding = 'ISO-8859-1')
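If you do not know the encoding, one rough approach is to try a short list of candidates in order and keep the first one that decodes without error. The sketch below is only an illustration: the helper name first_working_encoding and the candidate list are my own assumptions, not something pandas provides. Note that ISO-8859-1 (latin-1) accepts every byte value, so it never raises and has to come last; a successful decode there does not prove the text came out right.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper for illustration; the candidate order is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeError:
            continue  # this candidate cannot decode the bytes, try the next one
        return name
    raise ValueError("none of the candidate encodings could decode the data")


sample = "naïve café".encode("cp1252")  # bytes that are not valid UTF-8
print(first_working_encoding(sample))  # cp1252
```

For a real file, read the bytes with open(path, "rb").read() and pass the returned name as the encoding argument of pd.read_csv.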

Related

UnicodeDecodeError when reading CSV file in Pandas with Python "'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"

I am having trouble reading a csv file using read_csv in Pandas. Here's the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I have tried a bunch of different encodings with the file I am dealing with and none seem to work. The file is from Google's Search Ads 360 product, which says the csv should be in the 'UTF-16' format. Strangely, if I open the file in Excel and save it in utf-8 format, I can use read_csv normally.
I've tried the solutions to a similar problem here, but they did not work for me. This is the only code I am running:
import pandas as pd
df = pd.read_csv('path/file.csv')
Edit: I read in the file as tab delimited, and that seemed to work. I still don't understand why I got the error I did when I tried to read it in as a normal csv. Any insight into this would be appreciated!!

Try this encoding:
import pandas as pd
df = pd.read_csv('path/file.csv',encoding='cp1252')
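For what it's worth, a 0xff byte at position 0 is what you would see if the file starts with a UTF-16 byte order mark (FF FE for little-endian), which would fit both the UTF-16 format the export claims and the tab-delimited read that worked. Here is a minimal sketch for checking the first bytes, using the BOM constants from Python's codecs module. The helper name sniff_bom is hypothetical.

```python
import codecs


def sniff_bom(raw):
    """Return an encoding name implied by a leading byte order mark, or None.

    Hypothetical helper; it only looks at the first few bytes.
    """
    # UTF-32 is tested before UTF-16 because the UTF-32 LE mark also begins with FF FE.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


head = b"\xff\xfeA\x00\t\x00B\x00"  # "A<tab>B" as UTF-16 LE with a BOM
print(sniff_bom(head))  # utf-16
```

If it reports utf-16 for your file, something like pd.read_csv('path/file.csv', encoding='utf-16', sep='\t') is worth trying.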

Trouble loading CSV files

import xlrd
import pandas as pd
data = pd.read_csv("/Milk_Papers_Estimated_Class.csv")
path ='/Milk_Papers_Estimated_Class.csv'
I get the following error when running the code above on the .csv file:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 504: invalid continuation byte.
I do not know why I am facing this error. Can anyone help me out with this?

By default, read_csv uses utf-8 as the encoding. Try passing latin-1 as the encoding instead; it might work:
data = pd.read_csv("/Milk_Papers_Estimated_Class.csv", encoding='latin-1')
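Before settling on latin-1, it can help to look at the bytes around the reported position. The sketch below reproduces the same kind of error on a made-up sample and returns the neighbouring bytes, using the start and reason attributes of the exception. The sample word and the helper name show_context are assumptions for illustration.

```python
def show_context(raw, width=10):
    """Try UTF-8; on failure return (position, reason, nearby bytes), else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        lo = max(err.start - width, 0)
        return err.start, err.reason, raw[lo:err.start + width]
    return None


sample = "château".encode("latin-1")  # b'ch\xe2teau', 0xe2 is "â" in latin-1
print(show_context(sample))  # (2, 'invalid continuation byte', b'ch\xe2teau')
```

For a real file, pass in open(path, "rb").read(). If the bytes around the failing position look like ordinary Western European text, latin-1 or cp1252 is a plausible guess.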

Encoding error while trying to read Hindi text from csv file using pandas

I am trying to read Devanagari text from a csv file using pandas. I get an error when using encoding="utf-8". When I changed it to encoding="latin1", I got NaN values.
Please help if someone has already encountered a similar problem or knows how to solve this.
Thanks in advance.
Here is the Error I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte

Python encoding issue while reading a file

I am trying to read a file that contains the character "ë". I cannot figure out how to read it no matter what I try with the encoding. When I look at the file in TextEdit, it is listed as an unknown 8-bit file. If I try changing it to utf-8, utf-16 or anything else, it either does not work or messes up the entire file. I have tried reading the file with standard Python commands as well as with codecs, and cannot come up with anything that reads it correctly. A code sample of the read is below. Does anyone have any clue what I am doing wrong? This is Python 2.7.10, by the way.
readFile = codecs.open("FileName",encoding='utf-8')
The line I am trying to read is this with nothing else in it.
Aeëtes
Here are some of the errors I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 0: invalid start byte
UnicodeError: UTF-16 stream does not start with BOM (I know this one just means it is not a utf-16 file.)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x91 in position 0: ordinal not in range(128)
If I don't use a Codec the word comes in as Ae?tes which then crashes later in the program. Just to be clear, none of the suggested questions or any other anywhere on the net have pointed to an answer. One other detail that might help is that I am using OS X, not Windows.

Credit for this answer goes to RadLexus for figuring out the proper encoding, and also to Mad Physicist, who put me on the right track even though I had not considered all possible encodings.
The issue is apparently that a Mac will convert the .txt file to mac_roman. If you use that encoding, it works perfectly.
This is the line of code that I used to read it:
readFile = codecs.open("FileName",encoding='mac_roman')
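To see why mac_roman fits, note that the byte 0x91 reported in the errors maps to "ë" in that encoding. Below is a small Python 3 sketch. The question used Python 2 and codecs.open, so the built-in open with an encoding argument is my substitution, and the function name read_mac_roman is hypothetical.

```python
raw = b"Ae\x91tes"  # the line as it would be stored in a mac_roman file

print(raw.decode("mac_roman"))  # Aeëtes
print(raw.decode("cp1252"))     # Ae‘tes: a wrong guess still decodes, but to the wrong character


def read_mac_roman(path):
    """Read a whole text file as mac_roman (Python 3 built-in open)."""
    with open(path, encoding="mac_roman") as handle:
        return handle.read()
```

The second print is a reminder that a decode which raises no error is not necessarily the right one.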

UnicodeDecodeError when reading CSV File into Dataframe

I am using the code below to read a csv file into a dataframe. At first I got the error pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2, so I changed pd.read_csv('D:/TRYOUT.csv') to pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False) as suggested here. However, I now get the following error on the same line:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 1: invalid continuation byte
import pandas as pd

def ExcelFileReader():
    mergedf = pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False)
    return mergedf

If you're on Windows, you probably need to use pd.read_csv(filename, encoding='latin-1')

I had a similar problem and had to use utf-8-sig as the encoding.
I chose utf-8-sig over latin-1 because latin-1 will not handle non-Latin characters correctly if you ever get any. There are a few ways around the problem, so pick whichever best suits your needs.
Hope that helps.
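For context, utf-8-sig is UTF-8 that also skips a leading byte order mark (BOM) if the file has one, which some Windows tools write. A minimal sketch with made-up data:

```python
raw = b"\xef\xbb\xbfname,score\nAda,1\n"  # UTF-8 bytes with a BOM in front

plain = raw.decode("utf-8")         # keeps the BOM as an invisible first character
stripped = raw.decode("utf-8-sig")  # drops the BOM

print(repr(plain[:5]))     # '\ufeffname'
print(repr(stripped[:4]))  # 'name'
```

Files without a BOM decode the same way under both names, so utf-8-sig is a safe choice when the file is UTF-8 either way. It does not help when the file is in a different encoding altogether.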

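If you would like to skip the rows that cause parsing errors and ignore bytes that cannot be decoded, you can use the following. Note that on_bad_lines="skip" replaces error_bad_lines=False from pandas 1.3 onward, which is also the version that added encoding_errors:
pd.read_csv(file_path, encoding="utf8", on_bad_lines="skip", encoding_errors="ignore")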
