Open a TemporaryFile using open() - Python

I'm trying to interface with an existing library that uses the built-in open() function to read a .json file, given either a str or bytes object representing a path, or an object implementing the os.PathLike protocol.
My function generates a dictionary, which is converted to JSON using json.dump(), but I'm not sure how to pass that to the existing function, which expects a file path.
I was thinking something like this might work, but I'm not sure how to get an os.PathLike object from a TemporaryFile.
import json
import tempfile

temp_file = tempfile.TemporaryFile('w')
json.dump({"test": 1}, fp=temp_file)
file = open(temp_file.path(), 'r')  # no such method exists; how do I get a path?

Create a NamedTemporaryFile() object instead; it has a .name attribute you can pass on to the function:
from tempfile import NamedTemporaryFile
import json

with NamedTemporaryFile('w') as jsonfile:
    json.dump({"test": 1}, jsonfile)
    jsonfile.flush()  # make sure all data is flushed to disk
    # pass the filename to something that expects a string
    open(jsonfile.name, 'r')
Opening an already-open file does have issues on Windows (it is not allowed); there you'd have to close the file object first (making sure to disable delete-on-close) and delete it manually afterwards:
from tempfile import NamedTemporaryFile
import json
import os

jsonfile = NamedTemporaryFile('w', delete=False)
try:
    with jsonfile:
        json.dump({"test": 1}, jsonfile)
    # pass the filename to something that expects a string
    open(jsonfile.name, 'r')
finally:
    os.unlink(jsonfile.name)
The with statement causes the file to be closed when the suite exits (so by the time you reach the open() call).
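As an aside, Python 3.12 added a delete_on_close parameter to NamedTemporaryFile that covers the Windows case without manual cleanup; a minimal sketch, assuming Python 3.12+:
from tempfile import NamedTemporaryFile
import json

with NamedTemporaryFile('w', delete_on_close=False) as jsonfile:
    json.dump({"test": 1}, jsonfile)
    jsonfile.close()  # closed but not yet deleted, so Windows can reopen it
    # pass the filename to something that expects a string
    open(jsonfile.name, 'r')
# the file is deleted when the with block exits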

Related

Is there a Python analog to MATLAB's "save()" function?

Say I ran some code that produces multiple arrays as its output. How might I save the entire output in one go, as in MATLAB?
In MATLAB I'd simply say save(data) -> load('data').
Apologies if this is a basic question.
How to save a Python object:
To save objects to a file in Python, you can use pickle:
Import the pickle module.
Get a file handle in binary write mode that points to a file path.
Use pickle.dump to write the object you want to save to the file via that file handle.
How to use it:
import pickle

obj = MyObject()  # any picklable object
filehandler = open(filename, 'wb')  # pickle data is binary, so use 'wb'
pickle.dump(obj, filehandler)
filehandler.close()
How to load a Python object:
Import the pickle module.
Get a file handle in binary read mode that points to a file containing the serialized form of a Python object.
Use pickle.load to read the object back from the file via that file handle.
How to use it:
import pickle
filehandler = open(filename, 'rb')  # binary mode, matching how it was written
obj = pickle.load(filehandler)
filehandler.close()
Extras: save multiple objects at once
Obviously, using a list you can also store multiple objects at once:
import pickle

object_a = "foo"
object_b = "bar"
object_c = "baz"
objects_list = [object_a, object_b, object_c]

file_name = "my_objects.pkl"
open_file = open(file_name, "wb")
pickle.dump(objects_list, open_file)
open_file.close()
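Loading them back is a single pickle.load call; a minimal sketch reusing the file written above:
import pickle

with open("my_objects.pkl", "rb") as open_file:
    loaded_list = pickle.load(open_file)

print(loaded_list)  # ['foo', 'bar', 'baz']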

Python--best way to use "open" command with in-memory str

I have a library I need to call that takes a local file path as input and runs open(local_path, 'rb'). However, I don't have a local file, just an in-memory text string. Right now I am writing that to a temp file and passing that, but it seems wasteful. Is there a better way to do this, given that I need to be able to run open(local_path, 'rb') on it?
Current code:
text = "Some text"
temp = tempfile.TemporaryFile(delete=False)
temp.write(bytes(text, 'UTF-8'))
temp.seek(0)
temp.close()
#call external lib here, passing in temp.name as the local_path input
Later, inside the lib I need to use (I can't edit this):
with open(local_path, 'rb') as content_file:
    file_content = content_file.read()
Since the function you call in turn calls open() with the passed parameter, you must give it a str or a PathLike. This means you basically need a file which exists in the file system. You won't be able to pass an in-memory object like I was originally thinking.
Original answer:
I suggest looking at the io package. Specifically, StringIO provides a file-like wrapper on an in-memory string object. If you need binary, then try BytesIO.
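For cases where the consuming code accepts a file object rather than a path, a BytesIO sidesteps the file system entirely; a minimal sketch:
from io import BytesIO

text = "Some text"
buf = BytesIO(text.encode("utf-8"))  # in-memory binary stream
file_content = buf.read()            # same bytes open(path, 'rb').read() would give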

Convert file into BytesIO object using python

I have a file and want to convert it into a BytesIO object so that it can be stored in a database's varbinary column.
Can anyone help me convert it using Python?
Below is my code:
import io

f = open(filepath, "rb")
print(f.read())
myBytesIO = io.BytesIO(f)  # this line fails: BytesIO expects bytes, not a file object
myBytesIO.seek(0)
print(type(myBytesIO))
Opening a file with open and mode read-binary already gives you a Binary I/O object.
Documentation:
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
So in normal circumstances, you'd be fine just passing the file handle wherever you need to supply it. If you really want/need to get a BytesIO instance, just pass the bytes you've read from the file when creating your BytesIO instance like so:
from io import BytesIO
with open(filepath, "rb") as fh:
    buf = BytesIO(fh.read())
This has the disadvantage of loading the entire file into memory, which might be avoidable if the code you're passing the instance to is smart enough to stream the file without keeping it in memory. Note that the example uses open as a context manager that will reliably close the file, even in case of errors.
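As an aside, for the varbinary use case from the question, what gets bound to the query parameter is usually the raw bytes rather than the BytesIO wrapper; a minimal sketch, where cursor is an assumed open DB-API cursor and the table and column names are made up:
with open(filepath, "rb") as fh:
    data = fh.read()  # raw bytes of the file

# `cursor` is an assumed DB-API cursor; table and column names are hypothetical
cursor.execute("INSERT INTO files (content) VALUES (?)", (data,))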

saving string to tarfile in python 3 throws unexpected end of data error

I'm trying to open a tar.gz file full of json data, extract the text from them, and save them back to tar.gz. Here's my code in Python 3 thus far.
from get_clean_text import get_cleaned_text  # my own module
import tarfile
import os
import json
from io import StringIO
from pathlib import Path

def make_clean_gzip(inzip):
    outzip = "extracted/clean-" + inzip
    with tarfile.open(inzip, 'r:gz') as infile, tarfile.open(outzip, 'w:gz') as outfile:
        jfiles = infile.getnames()
        for j in jfiles:
            dirtycase = json.loads(infile.extractfile(j).read().decode("utf-8"))
            cleaned = get_cleaned_text(dirtycase)
            newtarfile = tarfile.TarInfo(Path(j).stem + ".txt")
            fobj = StringIO()
            fobj.write(cleaned)
            newtarfile.size = fobj.tell()
            outfile.addfile(newtarfile, fobj)
However, this throws an OSError: unexpected end of data. (I've verified, incidentally, that all the strings I want to write are of non-zero length, and also verified that calling tell() on the file object returns the same value as calling len() on the string.)
I found this prior SO answer, which suggested that the problem is that StringIO isn't encoded, so I swapped StringIO out for BytesIO and wrote fobj.write(cleaned.encode("utf-8")) instead, but this still throws the same error.
I also tried simply not setting the size on the TarInfo object, and that code ran, but created an archive with a bunch of empty files.
What am I missing? Thanks!
The .addfile() method presumably just calls .read() on the file object you give it, which returns nothing in this case because you're already at the end of the file. Try adding fobj.seek(0) just before that line.
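Putting that together with the BytesIO variant from the question, the loop body might look like this; a sketch reusing the question's j, cleaned, and outfile, and building the BytesIO from the encoded bytes so it already starts at position 0:
import tarfile
from io import BytesIO
from pathlib import Path

# inside the loop: j, cleaned, and outfile come from the question's code
data = cleaned.encode("utf-8")
info = tarfile.TarInfo(Path(j).stem + ".txt")
info.size = len(data)        # size in encoded bytes, not characters
fobj = BytesIO(data)         # positioned at 0, so addfile() can read it all
outfile.addfile(info, fobj)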

How can I pass a Python StringIO() object to a ZipFile(), or is it not supported?

I have a StringIO() file-like object, and I am trying to write it to a ZipFile(), but I get this TypeError:
coercing to Unicode: need string or buffer, cStringIO.StringI found
Here is a sample of the code I am using:
import zipfile
from cStringIO import StringIO  # Python 2, per the error message

file_like = StringIO()
archive = zipfile.ZipFile(file_like, 'w', zipfile.ZIP_DEFLATED)
# my_file is a StringIO object returned by a remote file storage server.
archive.write(my_file)
The docs say that StringIO() is a file-like class and that ZipFile() can accept a file-like object. Is there something I am missing?
To add a string to a ZipFile you need to use the writestr method and pass the data from the StringIO instance using its getvalue method, e.g.
archive.writestr("name of file in zip", my_file.getvalue())
Note that you also need to give the data a name, to say where it is placed inside the zip file.
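For reference, the same pattern on Python 3, where the archive itself lives in a BytesIO; my_file is the same object as in the question, and the member name is illustrative:
import zipfile
from io import BytesIO

file_like = BytesIO()
with zipfile.ZipFile(file_like, 'w', zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("remote_file.txt", my_file.getvalue())

zip_bytes = file_like.getvalue()  # the finished in-memory archive as bytes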
