I have a .xlsx file that I converted into a .csv file. I'm uploading the .csv file to a Python script I wrote, but an error is thrown.
Since the file is uploaded through HTTP, I'm accessing it with file = request.files['file']. This returns a file of type FileStorage. I then try to read it into a StringIO object as follows:
io.StringIO(file.stream.read().decode("UTF8"), newline=None)
I'm getting the following error:
TypeError: initial_value must be str or None, not bytes
I also tried to read the FileStorage object this way:
file_data = file.read().decode("utf-8")
and I'm getting the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 97: invalid start byte
It may be worth noting that I am able to read the file directly, i.e. as a CSV file, with the following code:
with open('file_path', 'r') as file:
    csv_reader = csv.reader(file, delimiter=";")
    ...
But since I'm getting the file from an upload button, i.e. an HTML input element of type file, as mentioned above, I receive a FileStorage object, which I'm not able to read.
Does anyone have an idea how I could approach this?
Thank you in advance!
It could be that the file is not encoded in UTF-8. Try decoding it as latin-1 instead:
file_data = file.read().decode("latin-1")
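Byte 0xfc is 'ü' in both latin-1 and cp1252, which hints that the CSV came from a Western-European Windows/Excel export. As a sketch (decode_upload is a made-up helper name, and the candidate list is an assumption), you could try a few likely encodings in order:

```python
def decode_upload(raw: bytes) -> str:
    """Try a few likely encodings in order and return the first
    successful decode. decode_upload is a made-up helper name."""
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps all 256 byte values, so this line is unreachable
    raise ValueError("no candidate encoding matched")
```

Trying UTF-8 first means correctly encoded files are untouched, and the cp1252/latin-1 fallbacks only kick in for legacy exports.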
I have a Prophet model that I stored in a Google Cloud Storage folder, and now I want to load this model in my code to run a prediction pipeline. The model object was saved as JSON following https://facebook.github.io/prophet/docs/additional_topics.html
To do this, I first download the JSON object locally from the bucket, and then I call the model_from_json() method. However, I keep getting the error below:
import json
from google.cloud import bigquery, storage
from prophet.serialize import model_to_json, model_from_json

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob('/GCSpath/to/.json')
blob.download_to_filename('mymodel.json')  # download the file locally
with open('mymodel.json', 'r') as fin:
    m = model_from_json(json.load(fin))
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/python/3.7.11/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/Users/python/3.7.11/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I also tried the method specified here, but it still does not work: Downloading a file from google cloud storage inside a folder
What is the correct way to save and load Prophet models?
The error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte indicates that some content inside your file is not encoded as UTF-8.
This means that your file contains byte sequences that cannot be decoded as UTF-8; for example, it could contain Cyrillic or other non-ASCII characters stored in a different encoding. Check here for a reference on the difference between Unicode and UTF-8; you will find some examples too.
I would recommend checking your files for incompatible characters and removing them. The error also reports the position where decoding failed, so you could start looking from there.
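The exception object itself carries that failing position, so you don't have to eyeball the file. A minimal sketch (find_undecodable is a hypothetical helper name):

```python
def find_undecodable(path: str, encoding: str = "utf-8"):
    """Return (position, offending byte value) for the first byte
    that fails to decode, or None if the file decodes cleanly.
    find_undecodable is a hypothetical helper name."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None
```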
On the other hand, if reviewing the files and removing characters one by one is not viable, you could also try opening your files in binary mode.
Instead of using 'r' in the open() call:
with open('mymodel.json', 'r') as fin: m = model_from_json(json.load(fin))
Try using 'rb':
with open('mymodel.json', 'rb') as fin: m = model_from_json(json.load(fin))
This will most likely solve your problem, since reading a file in binary mode does not try to decode bytes to strings, so no decoding errors occur while reading (json.load accepts a bytes payload and detects its encoding itself). You may find more information about file reading in Python here, and more about how and why to read files in binary here.
I think the error message is quite clear: the 'utf-8' codec cannot decode the data in your file.
When you use open(), a Python built-in function, it accepts an "encoding" argument; when you omit it, a default (typically UTF-8) is used.
You need to find the encoding that actually matches the data in your file and pass it as encoding="your-encoding".
Hope this helps!
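For illustration, here is a self-contained sketch of that idea; the filename data.csv and the cp1252 guess are assumptions, not taken from the question:

```python
# A sketch of passing an explicit encoding to open(); the filename
# 'data.csv' and the cp1252 choice are assumptions for illustration.
with open("data.csv", "w", encoding="cp1252") as f:
    f.write("Bj\u00f6rk;Reykjav\u00edk\n")  # written as cp1252 bytes

# Reading with the matching encoding decodes cleanly;
# the same read with encoding='utf-8' would raise UnicodeDecodeError.
with open("data.csv", "r", encoding="cp1252") as f:
    print(f.read())
```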
I'm trying to read a CSV over SFTP using pysftp/Paramiko. My code looks like this:
input_conn = pysftp.Connection(hostname, username=username, password=password)
file = input_conn.open("Data.csv")
file_contents = list(csv.reader(file))
But when I do this, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 23: invalid start byte
I know that this means the file is expected to be in UTF-8 encoding but isn't. The strange thing is, if I download the file and then use my code to open the file, I can specify the encoding as "macroman" and get no error:
with open("Data.csv", "r", encoding="macroman") as csvfile:
file_contents = list(csv.reader(csvfile))
The Paramiko docs say that the encoding of a file is meaningless over SFTP because it treats all files as bytes – but then, how can I get Python's CSV module to recognize the encoding if I use Paramiko to open the file?
If the file is not huge, so it's not a problem to have it loaded into memory twice, you can download and convert the contents in memory:
with io.BytesIO() as bio:
    input_conn.getfo("Data.csv", bio)
    bio.seek(0)
    with io.TextIOWrapper(bio, encoding='macroman') as f:
        file_contents = list(csv.reader(f))
Partially based on Convert io.BytesIO to io.StringIO to parse HTML page.
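The same wrapping works for any binary stream, so the idea can be tried without an SFTP server. In this self-contained sketch the downloaded bytes are simulated (the sample data is invented; 0x96 is 'ñ' in Mac Roman):

```python
import csv
import io

# Simulated download: 0x96 is the byte for 'ñ' in the Mac Roman encoding.
raw = b"name,city\nPe\x96a,Lima\n"

with io.BytesIO(raw) as bio:
    with io.TextIOWrapper(bio, encoding="macroman") as f:
        rows = list(csv.reader(f))

print(rows)  # [['name', 'city'], ['Peña', 'Lima']]
```

TextIOWrapper does the decoding lazily as csv.reader pulls lines, so nothing needs to be decoded up front.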
I have a function that reads a file memory object (submitted via POST). It reads most files, but for one file it is raising UnicodeDecodeError.
Here is my code:
import csv
from io import StringIO

def read(file):
    """
    :param file: File Memory Object (submitted from POST)
    :return: File Iterable object
    """
    file = StringIO(file.read().decode())
    return csv.DictReader(file, delimiter=',')
File which raises no issue.
File which is raising the issue.
The complete error is as follows: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
In other questions about similar errors, people use open() to stream the file, but I already have a streamed memory object, so I can't use open()
Your file is already opened in binary mode: decode is a method of bytes, not str.
For your problem, the encoding and errors parameters of bytes.decode work the same as for open(). You can apply the appropriate encoding, or ignore errors:
def read(file, encoding: str = 'utf-8', errors: str = 'strict'):
    """
    :param file: File Memory Object (submitted from POST)
    :return: File Iterable object
    """
    file = StringIO(file.read().decode(encoding=encoding, errors=errors))
    return csv.DictReader(file, delimiter=',')
Note that you either must know the encoding, or suppress errors by ignoring them. You can try different encodings to find one that works, but in the end you must know what your data means.
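To illustrate the difference between naming the right encoding and suppressing errors, here is a short sketch (the sample bytes and the cp1251 guess are invented for illustration):

```python
raw = b"\xd0abc"  # invented sample: 0xd0 starts a two-byte UTF-8 sequence

# strict (the default) raises, as in the question:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid continuation byte

# errors='replace' substitutes U+FFFD instead of raising:
print(raw.decode("utf-8", errors="replace"))  # �abc

# if the data is actually e.g. cp1251, naming that encoding decodes it fully:
print(raw.decode("cp1251"))  # Рabc
```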
I am trying to do this:
fh = request.FILES['csv']
fh = io.StringIO(fh.read().decode())
reader = csv.DictReader(fh, delimiter=";")
This always fails with the error in the title, and I've spent almost 8 hours on it.
Here is my understanding:
I am using Python 3, so the file fh is read as bytes. I am decoding it into a string and putting it in memory via StringIO.
Then csv.DictReader() tries to read it as dicts into memory. It is failing here.
I also tried io.StringIO(fh.read().decode('utf-8')), but I get the same error.
What am I missing? :/
The error occurs because there is some non-ASCII content in the file that can't be decoded with the default codec. One simple way to avoid the error is to decode such bytes explicitly with the decode() method, naming the right encoding (if a is the bytes object containing the non-ASCII content):
a.decode('utf-8')
Also, you could try opening the file as:
with open('filename', 'r', encoding='utf-8') as f:
    # your code, using f as the file pointer
Use 'rb' if your file is binary.
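As a sketch of that idea applied to an uploaded CSV, read_uploaded_csv is a hypothetical helper that falls back to latin-1 (which accepts any byte) when UTF-8 decoding fails:

```python
import csv
import io

def read_uploaded_csv(data: bytes, delimiter=";"):
    """Hypothetical helper: decode uploaded CSV bytes as UTF-8,
    falling back to latin-1 (which accepts any byte) on failure."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("latin-1")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

rows = read_uploaded_csv(b"name;city\nJos\xe9;Lyon\n")
print(rows)  # one dict per data row
```

The fallback guarantees a result, at the cost of mojibake when the true encoding is neither UTF-8 nor latin-1, so it is a pragmatic choice rather than a universal fix.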
I am trying to parse multiple GPX files stored in a directory with gpxpy in Python and create a pandas data frame.
Here is my code:
import gpxpy
import os

# Open each file in read mode and parse it
gpx_dir = r'/Users/Gav/GPX Data/'
for filename in os.listdir(gpx_dir):
    gpx_file = open(os.path.join(gpx_dir, filename), 'r')
    gpx = gpxpy.parse(gpx_file)
I am getting the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)
I know the GPX files are fine, as I am able to open and parse each one as a single file, but as soon as I try to open multiple GPX files I get this error.
OK, after lots of digging around I fixed the problem myself... it turns out there was a .DS_Store file in my data folder, a hidden, auto-generated macOS file, and it was causing the issue. I was able to fix the problem after removing it.
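A defensive sketch that would have sidestepped this: skip hidden and non-.gpx entries before parsing (gpx_paths is a made-up helper name):

```python
import os

def gpx_paths(directory):
    """Yield full paths of .gpx files in directory, skipping hidden
    entries such as the auto-generated .DS_Store. gpx_paths is a
    made-up helper name."""
    for name in sorted(os.listdir(directory)):
        if name.startswith(".") or not name.lower().endswith(".gpx"):
            continue
        yield os.path.join(directory, name)
```

Each yielded path can then be opened and handed to gpxpy.parse() as before.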