Convert file into BytesIO object using Python

I have a file and want to convert it into BytesIO object so that it can be stored in database's varbinary column.
Please can anyone help me convert it using python.
Below is my code:
f = open(filepath, "rb")
print(f.read())
myBytesIO = io.BytesIO(f)
myBytesIO.seek(0)
print(type(myBytesIO))

Opening a file with open and mode read-binary already gives you a Binary I/O object.
Documentation:
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
So in normal circumstances, you'd be fine just passing the file handle wherever you need to supply it. If you really want/need to get a BytesIO instance, just pass the bytes you've read from the file when creating your BytesIO instance like so:
from io import BytesIO
with open(filepath, "rb") as fh:
    buf = BytesIO(fh.read())
This has the disadvantage of loading the entire file into memory, which might be avoidable if the code you're passing the instance to is smart enough to stream the file without keeping it in memory. Note that the example uses open as a context manager that will reliably close the file, even in case of errors.
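As a minimal sketch of the approach above, the snippet below writes a small throwaway file (a stand-in for the real filepath), loads it into a BytesIO, and retrieves the raw bytes with getvalue(). The payload and the temporary file are assumptions made for illustration. Whether a database driver wants the bytes or the stream for a varbinary column depends on the driver, so check its documentation.

```python
import os
import tempfile
from io import BytesIO

# Create a small throwaway file; in real code `filepath` already exists.
payload = b"\x89PNG\r\n\x1a\n example bytes"
fd, filepath = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(payload)

try:
    # Read the whole file into an in-memory binary stream.
    with open(filepath, "rb") as fh:
        buf = BytesIO(fh.read())
finally:
    os.unlink(filepath)

# A freshly created BytesIO starts at position 0, so no seek() is needed.
first_four = buf.read(4)

# getvalue() returns the full contents regardless of the current position.
raw_bytes = buf.getvalue()
```

Note that the BytesIO stays usable after the original file has been closed and deleted, because it holds its own copy of the data.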

Related

Python--best way to use "open" command with in-memory str

I have a library I need to call that takes a local file path as input and runs open(local_path, 'rb'). However, I don't have a local file--I have an in memory text string. Right now I am writing that to a temp file and passing that, but it seems wasteful. Is there a better way to do this, given that I need to be able to run open(local_path, 'rb') on it?
Current code:
text = "Some text"
temp = tempfile.NamedTemporaryFile(delete=False)  # TemporaryFile has no portable delete= or .name
temp.write(bytes(text, 'UTF-8'))
temp.seek(0)
temp.close()
#call external lib here, passing in temp.name as the local_path input
Later, inside the lib I need to use (I can't edit this):
with open(local_path, 'rb') as content_file:
    file_content = content_file.read()
Since the function you call in turn calls open() with the passed parameter, you must give it a str or a PathLike. This means you basically need a file which exists in the file system. You won't be able to pass an in-memory object like I was originally thinking.
Original answer:
I suggest looking at the io package. Specifically, StringIO provides a file-like wrapper on an in-memory string object. If you need binary, then try BytesIO.
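Given that the library insists on a real path, a temporary file remains the practical route. The sketch below wraps that pattern in a helper so the file is always cleaned up. The names call_with_temp_path and read_all are hypothetical; read_all merely stands in for the external library function that calls open(local_path, 'rb').

```python
import os
import tempfile


def call_with_temp_path(text, func):
    """Write `text` to a temporary file, call func(path), then delete the file."""
    # delete=False keeps the file on disk after it is closed, so code that
    # calls open(path, 'rb') can read it on any platform.
    tmp = tempfile.NamedTemporaryFile("wb", delete=False)
    try:
        with tmp:
            tmp.write(text.encode("utf-8"))
        # The file is closed (and flushed) at this point.
        return func(tmp.name)
    finally:
        os.unlink(tmp.name)


def read_all(local_path):
    # Hypothetical stand-in for the external library function.
    with open(local_path, "rb") as content_file:
        return content_file.read()


result = call_with_temp_path("Some text", read_all)
```

Closing the file before handing over its name avoids the need for seek(0) and makes sure buffered data has reached the disk.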

Convert bytes to a file object in python

I have a small application that reads local files using:
open(diefile_path, 'r') as csv_file
open(diefile_path, 'r') as file
and also uses the linecache module.
I need to extend it to files that are sent from a remote server.
The content received from the server is of type bytes.
I couldn't find much information about handling the BytesIO type, and I was wondering if there is a way to convert the chunk of bytes to a file-like object.
My goal is to use the API specified above (open, linecache).
I was able to convert the bytes into a string using data.decode("utf-8"),
but I can't use the methods above (open and linecache).
A small example to illustrate:
data = b'First line\nSecond line\nThird line\n'
with open(data) as file:
    line = file.readline()
    print(line)
output:
First line
Second line
Third line
Can it be done?
open is used to open actual files, returning a file-like object. Here, you already have the data in memory, not in a file, so you can instantiate the file-like object directly.
import io
data = b'First line\nSecond line\nThird line\n'
file = io.StringIO(data.decode())
for line in file:
    print(line.strip())
However, if what you are getting is really just a newline-separated string, you can simply split it into a list directly.
lines = data.decode().strip().split('\n')
The main difference is that the StringIO version is slightly lazier; it has a smaller memory footprint compared to the list, as it splits strings off as requested by the iterator.
The StringIO approach in the answer above requires choosing an encoding, and the wrong choice can corrupt the conversion. If the consumer accepts bytes, BytesIO avoids the decoding step.
From the Python documentation, using BytesIO:
from io import BytesIO
f = BytesIO(b"some initial binary data: \x00\x01")
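If text lines are needed rather than bytes, one option is to put a text layer over the BytesIO. The sketch below assumes the data is UTF-8 encoded, which the question suggests but which should be confirmed for real input.

```python
import io

data = b"First line\nSecond line\nThird line\n"

# Wrap the in-memory bytes in a text layer; decoding happens as lines are read.
text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")

# readline() now returns str instead of bytes.
first = text_stream.readline()

# Iterating continues from the current position.
rest = [line.rstrip("\n") for line in text_stream]
```

This covers code that expects an open file object. linecache works with file names rather than file objects, so it would still need the data written to a real file.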

Open a TemporaryFile using open()

I'm trying to interface with an existing library that uses the built-in open() function to read a .json file, given either a str or bytes object representing a path, or an object implementing the os.PathLike protocol.
My function generates a dictionary which is converted to JSON using json.dump(), but I'm not sure how to pass that to the existing function, which expects a file path.
I was thinking something like this might work, but I'm not sure how to get an os.PathLike object for a TemporaryFile.
import tempfile
temp_file = tempfile.TemporaryFile('w')
json.dump('{"test": 1}', fp=temp_file)
file = open(temp_file.path(), 'r')
Create a NamedTemporaryFile() object instead; it has a .name attribute you can pass on to the function:
import json
from tempfile import NamedTemporaryFile
with NamedTemporaryFile('w') as jsonfile:
    json.dump({"test": 1}, jsonfile)  # dump the dict itself, not a pre-serialized string
    jsonfile.flush()  # make sure all data is flushed to disk
    # pass the filename to something that expects a string
    open(jsonfile.name, 'r')
Opening an already-open file object does have issues on Windows (you are not allowed to); there you'd have to close the file object first (making sure to disable delete-on-close), and delete it manually afterwards:
import json
import os
from tempfile import NamedTemporaryFile
jsonfile = NamedTemporaryFile('w', delete=False)
try:
    with jsonfile:
        json.dump({"test": 1}, jsonfile)  # dump the dict itself, not a pre-serialized string
    # the file is closed here; pass the filename to something that expects a string
    open(jsonfile.name, 'r')
finally:
    os.unlink(jsonfile.name)
The with statement causes the file to be closed when the suite exits, so it is already closed by the time you reach the open() call.
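The close-then-delete pattern can be packaged as a context manager so callers only deal with a path. This is a sketch under the assumption that the consuming function just needs a path string; temp_json_path is a hypothetical helper name, not part of the tempfile module.

```python
import json
import os
from contextlib import contextmanager
from tempfile import NamedTemporaryFile


@contextmanager
def temp_json_path(data):
    """Yield the path of a closed temporary file containing `data` as JSON."""
    jsonfile = NamedTemporaryFile("w", suffix=".json", delete=False)
    try:
        with jsonfile:
            json.dump(data, jsonfile)
        # The file is closed here, so it can be reopened on any platform.
        yield jsonfile.name
    finally:
        os.unlink(jsonfile.name)


with temp_json_path({"test": 1}) as path:
    # Stand-in for the existing library call that opens the path itself.
    with open(path, "r") as fh:
        loaded = json.load(fh)
    existed_inside = os.path.exists(path)

exists_after = os.path.exists(path)
```

The finally clause removes the file even if the code inside the with block raises.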

Django: uploaded file is binary. Is it possible to change to utf? So readline() returns unicode rather than bytes

Uploading a file in Django (1.7) using Python 3:
f = form.files['file']
f.__repr__()
outputs
<InMemoryUploadedFile: index.html (text/html)>
If I call f.readline() I get bytes back.
Normally that would be okay, I could just read the file and decode it, however in this case I'm passing the file on to another function that expects to call readline() on the parameter it receives, and readline() needs to return unicode rather than bytes.
Is it possible to set encoding or such on an instance of InMemoryUploadedFile, so readline would return unicode rather than bytes? Or do I have to use StringIO to first read in the entire file and then pass the instance of StringIO to my function?
The general way to handle this may be to write a custom upload handler and tell Django to use it. But I've never done this, so I'm not sure.
But a simple approach would be to just wrap the underlying file object. (If you use TextIOWrapper instead of StringIO you shouldn't need to worry about the overhead.)
from io import TextIOWrapper
f = form.files['file']
text_f = TextIOWrapper(f.file, encoding='utf-8')
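The effect of the wrapper can be checked without Django. In the sketch below a BytesIO stands in for f.file, on the assumption that the uploaded file's underlying object behaves like an ordinary binary stream; the sample content is made up for illustration.

```python
from io import BytesIO, TextIOWrapper

# Stand-in for the binary file object behind an upload (f.file in the answer).
binary_file = BytesIO("<h1>caf\u00e9</h1>\nsecond line\n".encode("utf-8"))

# Reading directly from the binary object yields bytes.
raw_line = binary_file.readline()
binary_file.seek(0)

# The wrapper decodes on the fly, so readline() returns str.
text_f = TextIOWrapper(binary_file, encoding="utf-8")
line = text_f.readline()
```

If the wrapped file must stay open after the wrapper is discarded, text_f.detach() separates the two.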

Python urllib2 Images Distorted

I'm making a program using the website http://placekitten.com, but I've run into a bit of a problem. Using this:
im = urllib2.urlopen(url).read()
f = open('kitten.jpeg', 'w')
f.write(im)
f.close()
The image turns out distorted with mismatched colors, like this:
http://imgur.com/zVg64Kn.jpeg
I was wondering if there was an alternative to extracting images with urllib2. If anyone could help, that would be great!
You need to open the file in binary mode:
f = open('kitten.jpeg', 'wb')
Python will otherwise translate line endings to the native platform form, a transformation that breaks binary data, as documented for the open() function:
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability.
When copying data from a URL to a file, you could use shutil.copyfileobj() to handle streaming efficiently:
from shutil import copyfileobj
im = urllib2.urlopen(url)
with open('kitten.jpeg', 'wb') as out:
    copyfileobj(im, out)
This will read data in chunks, avoiding filling memory with large blobs of binary data. The with statement handles closing the file object for you.
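The same chunked copy can be tried without a network connection. In this sketch a BytesIO stands in for the response object (copyfileobj only needs something with a read() method), and the byte pattern is made-up data that deliberately contains newline bytes. In Python 3 the urlopen function lives in urllib.request rather than urllib2.

```python
import os
import tempfile
from io import BytesIO
from shutil import copyfileobj

# Stand-in for the HTTP response; includes \r\n and \n bytes on purpose.
image_bytes = b"\xff\xd8\xff\xe0" + b"\r\n\n\x00" * 1000
source = BytesIO(image_bytes)

with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "kitten.jpeg")

    # Binary mode keeps every byte intact; copy in 1024-byte chunks.
    with open(out_path, "wb") as out:
        copyfileobj(source, out, 1024)

    # Read the file back to confirm nothing was altered.
    with open(out_path, "rb") as check:
        written = check.read()
```

The third argument to copyfileobj is the chunk size; leaving it out uses the library default.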
Change
f = open('kitten.jpeg', 'w')
to read
f = open('kitten.jpeg', 'wb')
See http://docs.python.org/2/library/functions.html#open for more information. What's happening is that the newlines in the jpeg are getting modified in the process of saving, and opening as a binary file will prevent this.
If you're using Windows, you have to open the file in binary mode:
f = open('kitten.jpeg', 'wb')
Or more Pythonically:
import urllib2
url = 'http://placekitten.com.s3.amazonaws.com/homepage-samples/200/140.jpg'
image = urllib2.urlopen(url).read()
with open('kitten.jpg', 'wb') as handle:
    handle.write(image)
