I'm trying to return the contents of an image file via a Python Connexion application generated from an OpenAPI v2 spec file using swagger-codegen and the python-flask language setting. In my controller module, I simply do the following:
def file_contents_get(file_id):
    file = app.datastore.get_instance().get_file(file_id)
    with open(file.path, "rb") as f:
        return f.read()
However, this results in the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
What is the proper way to return a file's contents? Note that I don't want the file as an attachment but rather inline.
I have code that uploads a SQLite3 file to GitHub (using the PyGithub module).
import github
with open('server.db', 'r') as file:
    content = file.read()
g = github.Github('token')
repo = g.get_user().get_repo("my-repo")
file = repo.get_contents("server.db")
repo.update_file("server.db", "Python Upload", content, file.sha, branch="main")
If you open this file through a text editor, then there will be characters that are not included in UTF-8, since this is a database file. I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 99: invalid continuation byte
How can I fix it?
Maybe I can upload the file to GitHub as a non-text file, like a PNG?
I use this:
f = open("yourtextfile", encoding="utf8")
contents = get_blob_content(repo, branch="main",path_name="yourfile")
repo.delete_file("yourfile", "test title", contents.sha)
repo.create_file("yourfile", "test title", f.read())
together with this helper function:
def get_blob_content(repo, branch, path_name):
    # first get the branch reference
    ref = repo.get_git_ref(f'heads/{branch}')
    # then get the tree
    tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
    # look for path in tree
    sha = [x.sha for x in tree if x.path == path_name]
    if not sha:
        # well, not found..
        return None
    # we have sha
    return repo.get_git_blob(sha[0])
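Note that the snippet above still opens the local file as UTF-8 text, so a SQLite database would raise the same UnicodeDecodeError. Here is a minimal sketch of a binary variant. It assumes that PyGithub's update_file accepts bytes for the content argument (recent versions do, as far as I know) and that the file already exists in the repo. The helper name upload_binary is made up for illustration and is not part of PyGithub; repo is whatever g.get_user().get_repo(...) returns.

```python
def upload_binary(repo, local_path, repo_path, message, branch="main"):
    """Update an existing file in a GitHub repo without decoding its bytes.

    `repo` is assumed to be a PyGithub Repository object; this helper
    is hypothetical and only illustrates reading in binary mode.
    """
    # "rb" returns bytes, so Python never tries to decode the data as UTF-8
    with open(local_path, "rb") as f:
        content = f.read()
    # the SHA of the current blob is required to update an existing file
    existing = repo.get_contents(repo_path, ref=branch)
    return repo.update_file(repo_path, message, content, existing.sha, branch=branch)
```

With the code from the question this would be called as upload_binary(repo, "server.db", "server.db", "Python Upload").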
I'm trying to read a gzip file from an S3 bucket. Here is my attempt:
s3client = boto3.client(
    's3',
    region_name='us-east-1'
)
bucketname = 'wind-obj'
file_to_read = '20190101_0000.gz'
fileobj = s3client.get_object(
    Bucket=bucketname,
    Key=file_to_read
)
filedata = fileobj['Body'].read()
To open the gzip file I then call:
gzip.open(filedata,'rb')
but this raises an error:
ValueError: embedded null byte
So I'm trying to decode it first:
contents = filedata.decode('utf-8')
which is throwing another error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I have also tried decoding it as ISO-8859-1. The decoding succeeds, but opening the gzip file then fails with the same error.
Is there any other way to pull the data from S3, for example through a URL?
gzip.open expects a filename or an already opened file object, but you are passing it the downloaded data directly. Try using gzip.decompress instead:
filedata = fileobj['Body'].read()
uncompressed = gzip.decompress(filedata)
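To make the difference concrete, here is a small sketch using only the standard library. The function names are just for illustration, and the utf-8 default is an assumption about what the compressed file contains. If you would rather iterate over lines than hold the whole decompressed payload in memory, wrapping the bytes in io.BytesIO gives gzip.open the file object it expects.

```python
import gzip
import io

def gunzip_bytes(data: bytes) -> bytes:
    """Decompress gzip data that is already in memory."""
    return gzip.decompress(data)

def iter_gzip_lines(data: bytes, encoding: str = "utf-8"):
    """Yield decoded text lines from in-memory gzip data."""
    # io.BytesIO turns the bytes into a file-like object for gzip.open
    with gzip.open(io.BytesIO(data), "rt", encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")
```

In the question's code, data would be the result of fileobj['Body'].read().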
I'm trying to write an HTTP server, but that detail doesn't matter here.
When I try to decode image data (after 'data = file.read()'), I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I opened the file in 'rb' mode.
Other people usually open the file in 'r' mode, and that causes the error. But what causes the error here?
def get_content_file(file_path):
    """
    Gets a full path to a file and returns the content of it.
    file_path must be a valid path.
    :param file_path: str (path)
    :return: str (data)
    """
    print(file_path)
    file = open(file_path, 'rb')
    data = file.read()
    file.close()
    return data.decode()
I suggest confirming the encoding of the file at 'file_path'. Open the file in Notepad++ and check the lower right corner of the window, which shows the file's encoding and whether it has a byte order mark (BOM). If the encoding is not the one you expect, or a BOM is present, use 'Save As' to save the file in the required format.
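Another way to look at it: the byte 0xff at position 0 is how a JPEG file starts (FF D8), so the data is an image rather than text, and no text encoding applies to it. Opening in 'rb' mode is correct; it is the final data.decode() call that fails. Here is a sketch that keeps the body as bytes, assuming the server writes raw bytes to the socket. The names read_file_bytes and build_response are illustrative only.

```python
def read_file_bytes(file_path):
    """Return the raw content of a file as bytes, without decoding."""
    with open(file_path, "rb") as f:
        return f.read()

def build_response(body: bytes, content_type: str) -> bytes:
    """Build a minimal HTTP/1.1 response; only the headers are text."""
    headers = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    # encode the header text, then append the untouched binary body
    return headers.encode("ascii") + body
```

Text files can still be decoded when needed, but that decision is better made by the caller, who knows the content type.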
I was reading a file in Python 2.7 and it was read without problems. When I run the same program in Python 3.4, I get this error:
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte
However, when I run the program on Windows (with Python 3.4), the error doesn't appear. The first line of the document is:
Codi;Codi_lloc_anonim;Nom
and the code of my program is:
def lectdict(filename,colkey,colvalue):
    f = open(filename,'r')
    D = dict()
    for line in f:
        if line == '\n': continue
        D[line.split(';')[colkey]] = D.get(line.split(';')[colkey],[]) + [line.split(';')[colvalue]]
    f.close()
    return D
Traduccio = lectdict('Noms_departaments_centres.txt',1,2)
In Python2,
    f = open(filename,'r')
    for line in f:
reads lines from the file as bytes.
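In Python3, the same code reads lines from the file as strings. Python3
strings are what Python2 calls unicode objects. These are bytes decoded
according to some encoding. When no encoding is given, open() uses the
platform's preferred encoding (locale.getpreferredencoding(False)), which is
typically utf-8 on Linux and macOS and a code page such as cp1252 on Windows.
That is likely why the same program runs without error on Windows.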
The error message
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte
shows Python3 is trying to decode the bytes as utf-8. Since there is an error, the file apparently does not contain utf-8 encoded bytes.
To fix the problem you need to specify the correct encoding of the file:
    with open(filename, encoding=enc) as f:
        for line in f:
If you do not know the correct encoding, you could run this program to simply
try all the encodings known to Python. If you are lucky there will be an
encoding which turns the bytes into recognizable characters. Sometimes more
than one encoding may appear to work, in which case you'll need to check and
compare the results carefully.
# Python3
import pkgutil
import os
import encodings

def all_encodings():
    modnames = set(
        [modname for importer, modname, ispkg in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

filename = '/tmp/test'
encodings = all_encodings()
for enc in encodings:
    try:
        with open(filename, encoding=enc) as f:
            # print the encoding and the first 500 characters
            print(enc, f.read(500))
    except Exception:
        pass
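As a small illustration of why the results need careful comparison, here is a sketch with a made-up sample. The word "Història" is only an assumed example of the kind of text the file might contain; it is not taken from the actual file. The byte 0xf2 decodes without error under several single-byte encodings, but only one of them gives the intended character.

```python
# Hypothetical sample: a Catalan word stored in a Windows/Latin code page
raw = "Història".encode("cp1252")  # b'Hist\xf2ria'

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# utf-8 fails, while cp1250 and cp1252 both "work" but give different letters
results = {enc: try_decode(raw, enc) for enc in ("utf-8", "cp1250", "cp1252")}
for enc, text in results.items():
    print(enc, repr(text))
```

An encoding that merely avoids the exception is not necessarily the right one, so check accented words in the output against what the text should say.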
OK, I did what @unutbu suggested. The result was a long list of encodings, one of which was cp1250, so I changed:
f = open(filename,'r')
to
f = open(filename,'r', encoding='cp1250')
as @triplee suggested. Now I can read my files.
In my case I can't change the encoding because my file really is UTF-8 encoded. But some rows are corrupted and cause the same error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 7092: invalid continuation byte
My solution is to open the file in binary mode:
open(filename, 'rb')
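Binary mode gives you bytes, so each row still has to be decoded somewhere. Here is one possible sketch, assuming it is acceptable for corrupted bytes to show up as the replacement character U+FFFD instead of stopping the program. The function name read_lines_lenient is made up for illustration.

```python
def read_lines_lenient(filename, encoding="utf-8"):
    """Read a mostly valid text file, replacing undecodable bytes."""
    lines = []
    with open(filename, "rb") as f:
        for raw in f:
            # errors="replace" substitutes U+FFFD for each invalid byte
            lines.append(raw.decode(encoding, errors="replace").rstrip("\r\n"))
    return lines
```

Passing errors="replace" directly to open() in text mode has a similar effect; decoding row by row just makes it easy to log or skip the corrupted rows instead.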
I'm trying to serve a PDF file with Django 1.7, and this is basically the code that "should" work. It certainly works if I change the content_type to 'text' and download a .tex file with it, but when I try it with a binary file, I get:
UnicodeDecodeError at /path/to/file/filename.pdf
'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte
def download(request, file_name):
    file = open('path/to/file/{}'.format(file_name), 'r')
    response = HttpResponse(file, content_type='application/pdf')
    response['Content-Disposition'] = "attachment; filename={}".format(file_name)
    return response
So basically, if I understand correctly, it's trying to serve the file as a UTF-8 encoded text file, instead of a binary file. I've tried to change the content_type to 'application/octet-stream' with similar results. What am I missing?
Try opening the file using binary mode:
file = open('path/to/file/{}'.format(file_name), 'rb')