Error while trying to read a CSV file into Jupyter


I am trying to read a CSV file into my Jupyter notebook using Python 3:
import pandas as pd
address = 'C:/Users/X/Y/Z/Data.csv'
data = pd.read_csv(address)
This is the message I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 0: invalid continuation byte
Any suggestions? I am having trouble understanding what it wants me to do with the data in order to load it.
Thanks a lot!

Related

Unable to Read a tsv file in pandas. Gives UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 113: invalid start byte

I am trying to read a TSV file (one of many), but it is giving me the following error.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 113: invalid start byte
Before this, I read other similar files using the same code.
df = pd.read_csv(fname,sep='\t',error_bad_lines=False)
But for this particular file, I am getting an error.
How can I read this file? Hopefully without missing any data.

One suggestion is to check which encoding is actually being used. Do it this way:
with open('filename.tsv') as f:  # or whatever your extension is
    print(f)
From that you'll obtain the encoding. Then,
df = pd.read_csv('filename.tsv', encoding="the encoding that was returned")
Note that print(f) reports the encoding Python uses by default on your system, which may differ from the one the file was saved in, so you may still need to try other encodings.
It would help if you could post the data around position 113 of your dataset, where the error first occurred.
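As a rough sketch of the idea above, the helper below tries a few candidate encodings on the raw bytes and reports the first one that decodes without error. The name `first_working_encoding`, the candidate list, and the sample bytes are assumptions made for illustration; they are not part of pandas. A successful decode does not prove the guess is right (latin-1 accepts any byte sequence), so the loaded data should still be inspected.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejects the bytes, try the next one
        return name
    return None


# Hypothetical sample: 0xa5 is not a valid UTF-8 start byte.
sample = b"price\t\xa5100\n"
guess = first_working_encoding(sample)
print(guess)
```

With a real file, the bytes could be read with open(fname, 'rb').read() and the resulting guess passed on as pd.read_csv(fname, sep='\t', encoding=guess).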

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 2: invalid continuation byte

Basically, I was using pandas to read csv files to separate a column which had "Date + Hour" in the format "dd/mm/yy hh".
I had help here writing a script to split the column into two separate columns.
First of all, this is what the dataset looked like:
The combined field is "FECHA", and I managed to run this code on some of the CSV files:
import pandas as pd
sal = pd.read_csv('C:/Users/drivasti/Documents/002_Script_Separa_Fecha_Hora/Anexo2_THP_UL.csv')
# Split "FECHA" at the first space and keep the date (0) and hour (2) parts
df = sal.join(sal['FECHA'].str.partition(' ')[[0, 2]]).rename({0: 'DATE', 2: 'HOUR'}, axis=1)
df.to_csv('C:/Users/drivasti/Documents/002_Script_Separa_Fecha_Hora/Anexo2_THP_UL_2.csv', index=False)
And they worked perfectly as seen here:
However, I encountered this error when I tried running it on another CSV file (note that I change the name of the file every time I run it, but they're all CSV files):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 2: invalid continuation byte
Now I have tried some of the answers here but none have helped:
UnicodeDecodeError: 'utf-8' codec can't decode byte
'utf-8' codec can't decode byte 0xdb in position 1:
Does anyone know how to parse this as UTF-8? Or is it a problem in the "FECHA" field?
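One possible explanation, offered only as an assumption about this file: byte 0xd1 is the letter Ñ in Latin-1 and cp1252, which is common in Spanish text. The small sketch below uses made-up data to show how such a byte produces exactly this kind of error under UTF-8 and decodes cleanly as Latin-1.

```python
# Hypothetical data: the Spanish word "AÑO" saved as Latin-1,
# where the letter Ñ (U+00D1) is the single byte 0xd1.
raw = "A\u00d1O".encode("latin-1")

try:
    raw.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    # UTF-8 treats 0xd1 as the start of a two-byte sequence,
    # but the next byte ("O") is not a valid continuation byte.
    reason = exc.reason

text = raw.decode("latin-1")
print(reason, text)
```

If the file really is Latin-1, passing encoding='latin-1' to pd.read_csv should load it, and the "FECHA" column itself would not be the cause.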

UnicodeDecodeError when reading CSV file in Pandas with Python for Bulgarian cyrillic

I receive a UnicodeDecodeError when reading a CSV file in pandas with Python:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte.
There aren't any Cyrillic symbols in the data. Is it because of a setting for Bulgarian Cyrillic? Do you know what to set?
I tried the following:
df = pd.read_csv('kgb.csv', header=0, encoding='cp1251')
But I receive the same error.

Encoding error while trying to read Hindi text from csv file using pandas

I am trying to read Devanagari text from a CSV file using pandas. I get an error when using encoding="utf-8". When I changed to encoding="latin1", I got NaN values.
Please help if you have already encountered a similar problem or know how to solve this.
Thanks in advance.
Here is the error I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte

load .json into python; UnicodeDecodeError

I am trying to load a JSON file into Python with no success. I have been googling for a solution for the past few hours and just cannot seem to get it to load. I have tried to load it using the same json.load('filename') function that has worked for everyone. I keep getting:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 124: invalid continuation byte"
Here is the code I am using:
import json
json_data = open('myfile.json')
for line in json_data:
    data = json.loads(line)  # <-- I get an error at this line
Here is a sample line from my file:
{"topic":"security","question":"Putting the Biba-LaPadula Mandatory Access Control Methods to Practise?","excerpt":"Text books on database systems always refer to the two Mandatory Access Control models; Biba for the Integrity objective and Bell-LaPadula for the Secrecy or Confidentiality objective.\n\nText books ...\r\n "}
What is my error if this seems to have worked for everyone in every example I have googled?

Have you tried:
json.loads(line.decode("utf-8"))
Similar question asked here: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2
Edit:
If the above does not work,
json.loads(line.decode("utf-8","ignore"))
will.
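In Python 3, lines read from a file opened in text mode are already str and have no .decode method, so the same idea is usually expressed through the encoding and errors arguments of open(). The sketch below is one way to do that; the helper name `load_json_lines` and the temporary demo file are assumptions for illustration. Using errors="replace" swaps undecodable bytes for U+FFFD instead of raising, which means some characters are lost.

```python
import json
import os
import tempfile


def load_json_lines(path, encoding="utf-8", errors="replace"):
    """Parse a file with one JSON object per line, tolerating bad bytes."""
    records = []
    with open(path, encoding=encoding, errors=errors) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical demo file containing a byte (0xe9) that is not valid UTF-8 here.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"topic": "security", "excerpt": "caf\xe9"}\n')
    demo_path = tmp.name

records = load_json_lines(demo_path)
os.remove(demo_path)
print(records)
```

If the file's real encoding is known, passing it as the encoding argument (and leaving errors at "strict") avoids losing any characters.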
