Error while trying to read a CSV file into Jupyter - Python

I am trying to read a CSV file into my Jupyter notebook using Python 3:
import pandas as pd
address = 'C:/Users/X/Y/Z/Data.csv'
data = pd.read_csv(address)
This is the message I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 0: invalid continuation byte
Any suggestions? I am having trouble understanding what it wants me to do with the data in order to load it.
Thanks a lot !
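Before changing anything in pandas, it can help to look at the raw bytes the decoder is complaining about. The following is a minimal sketch, not a definitive diagnosis: `show_bytes_at` is a hypothetical helper name, and the sample file is invented so that its first byte is 0xEE, as in the error above.

```python
import os
import tempfile


def show_bytes_at(path, position, context=8):
    """Return the raw bytes around `position` (hypothetical helper)."""
    with open(path, "rb") as f:  # binary mode: no decoding takes place
        raw = f.read()
    start = max(0, position - context)
    return raw[start:position + context]


# Invented sample: 0xEE followed by a comma cannot be decoded as UTF-8.
sample = b"\xee,a,b\n1,2,3\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample)

print(show_bytes_at(tmp.name, 0))
os.remove(tmp.name)
```

If the bytes look like ordinary text in some single-byte encoding, that encoding's name is a candidate for the `encoding=` argument of `pd.read_csv`.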

Related

Unable to Read a tsv file in pandas. Gives UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 113: invalid start byte

I am trying to read a TSV file (one of many), but it is giving me the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 113: invalid start byte
Before this, I read other similar files using the same code:
df = pd.read_csv(fname,sep='\t',error_bad_lines=False)
But for this particular file, I am getting an error.
How can I read this file? Hopefully without missing any data.
A suggestion would be to check what encoding you actually have. Do it this way:
with open('filename.tsv') as f:  # or whatever your extension is
    print(f)
From that you'll obtain the encoding Python uses to open the file (this is its default, which may differ from the encoding the file was actually written in). Then,
df = pd.read_csv('filename.tsv', encoding="the encoding that was returned")
It would be nice if you could post the content around byte position 113 of your dataset, where the error first occurred.
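Since a file does not record its own encoding, one pragmatic approach is to try a short list of candidates against the raw bytes. This is a sketch under assumptions: `first_working_encoding` is a hypothetical helper, the candidate list is only a guess at likely encodings, and the sample bytes are invented to contain 0xA5.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None


# Invented sample: 0xA5 is an invalid start byte in UTF-8 but valid in cp1252.
sample = b"price\t100\xa5\n"
print(first_working_encoding(sample))
```

A decode that succeeds is not proof that the encoding is right. Latin-1 in particular accepts any byte sequence, so it is best kept last and the resulting text checked by eye.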

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 2: invalid continuation byte

Basically, I was using pandas to read CSV files in order to split a column that holds "Date + Hour" in the format "dd/mm/yy hh".
I had help here writing a script to split the column into two separate columns.
First of all, this is what the dataset looked like:
The combined field is "FECHA", and I managed to run this code on some of the CSV files:
import pandas as pd,os
sal = pd.read_csv('C:/Users/drivasti/Documents/002_Script_Separa_Fecha_Hora/Anexo2_THP_UL.csv')
df=sal.join(sal['FECHA'].str.partition(' ')[[0, 2]]).rename({0: 'DATE', 2: 'HOUR'}, axis=1)
df.to_csv('C:/Users/drivasti/Documents/002_Script_Separa_Fecha_Hora/Anexo2_THP_UL_2.csv',index=False)
And they worked perfectly as seen here:
However, I encountered this error when I tried running it on another CSV file (note that I change the file name every time I run it, but they're all CSV files):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 2: invalid continuation byte
Now I have tried some of the answers here but none have helped:
UnicodeDecodeError: 'utf-8' codec can't decode byte
'utf-8' codec can't decode byte 0xdb in position 1:
Does anyone know how to parse this as UTF-8? Or is it a problem in the "FECHA" field?
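The byte 0xD1 is `Ñ` in Latin-1 and cp1252, which would be plausible in Spanish-language data. That is an assumption about this file, not something the error message proves. A small sketch with invented sample bytes reproduces the same kind of failure and shows what a Latin-1 decode yields:

```python
# Invented sample: 0xD1 at position 2, followed by an ASCII letter.
raw = b"CA\xd1ETE,01/02/20 13\n"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc  # keep the exception so it can be inspected afterwards

decoded = raw.decode("latin-1")
print(utf8_error)
print(decoded)
```

If the decoded text looks right for the real file, the corresponding pandas call would pass `encoding='latin-1'` to `pd.read_csv`.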

UnicodeDecodeError when reading a CSV file in pandas with Python for Bulgarian Cyrillic

I receive a UnicodeDecodeError when reading a CSV file in pandas with Python:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte.
There aren't any Cyrillic symbols in the data. Is it because of a setting for Bulgarian Cyrillic? Do you know what to set?
I tried the following:
df = pd.read_csv('kgb.csv', header=0, encoding='cp1251')
But I receive the same error.
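In cp1251 the byte 0xC0 is the Cyrillic capital letter `А`, so the file may contain Cyrillic text after all. Also, a failure inside a cp1251 decode would not normally be reported as coming from the 'utf-8' codec, so it may be worth checking that the `encoding` argument is reaching the call that actually fails. A hedged sketch with invented bytes:

```python
# Invented sample bytes; 0xC0 is the first byte, as in the error message.
raw = b"\xc0\xe1\xe2,1\n"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # The exception records which codec failed and at which byte offset.
    failed_codec, failed_at = exc.encoding, exc.start

as_cyrillic = raw.decode("cp1251")
print(failed_codec, failed_at)
print(as_cyrillic)
```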

Encoding error while trying to read Hindi text from csv file using pandas

I am trying to read Devanagari text from a CSV file using pandas. I get an error when using encoding="utf-8". When I changed it to encoding="latin1", I got NaN values.
Please help if you have already encountered a similar problem or know how to solve it.
Thanks in advance.
Here is the error I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte
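Latin-1 maps every possible byte to some character, so decoding with it never raises, but it does not produce Devanagari from UTF-8 bytes. The sketch below uses an invented sample string; it illustrates that effect only and does not by itself explain the NaN values.

```python
text = "नमस्ते"  # invented sample, not taken from the question's file
utf8_bytes = text.encode("utf-8")

# Latin-1 never raises, but each byte becomes an unrelated character.
garbled = utf8_bytes.decode("latin-1")

# Because Latin-1 is a one-to-one byte mapping, the damage is reversible.
recovered = garbled.encode("latin-1").decode("utf-8")

print(garbled == text)    # False
print(recovered == text)  # True
```

The byte 0x92 is a right single quotation mark in cp1252, so one possibility (an assumption, not a diagnosis) is that the file was saved as cp1252 rather than UTF-8.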

load .json into python; UnicodeDecodeError

I am trying to load a JSON file into Python with no success. I have been googling for a solution for the past few hours and just cannot seem to get it to load. I have tried to load it using the same json.load('filename') function that has worked for everyone. I keep getting:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 124: invalid continuation byte"
Here is the code I am using:
import json
json_data = open('myfile.json')
for line in json_data:
    data = json.loads(line)  # <-- I get an error here.
Here is a sample line from my file:
{"topic":"security","question":"Putting the Biba-LaPadula Mandatory Access Control Methods to Practise?","excerpt":"Text books on database systems always refer to the two Mandatory Access Control models; Biba for the Integrity objective and Bell-LaPadula for the Secrecy or Confidentiality objective.\n\nText books ...\r\n "}
What is my error if this seems to have worked for everyone in every example I have googled?
Have you tried:
json.loads(line.decode("utf-8"))
Similar question asked here: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2
Edit:
If the above does not work,
json.loads(line.decode("utf-8","ignore"))
will.
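In Python 3, `str` has no `decode` method; the decoding happens when the file is opened. A minimal sketch, assuming one JSON object per line as in the question. `load_json_lines` is a hypothetical helper, and the sample file is invented to contain a stray 0xC2 byte.

```python
import json
import os
import tempfile


def load_json_lines(path, encoding="utf-8", errors="strict"):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    records = []
    with open(path, encoding=encoding, errors=errors) as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Invented sample: 0xC2 followed by a space is not valid UTF-8.
sample = b'{"topic": "security \xc2 x"}\n'
with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
    tmp.write(sample)

# errors="replace" substitutes U+FFFD for bytes that cannot be decoded.
records = load_json_lines(tmp.name, errors="replace")
os.remove(tmp.name)
print(records[0]["topic"])
```

Using `errors="replace"` (or `"ignore"`) loses the undecodable bytes, so it is worth trying the file's real encoding first if it can be identified.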
