How do I create the binary contents of a zip file from .csv file binary contents? I don't want to actually write any files to memory.
For instance, I have tried zipObj = ZipFile(outputZipFileName, 'w'), but that requires a file name, which means it is not just a file in binary format.
EDIT: I just found the answer at https://www.neilgrogan.com/py-bin-zip/
The file in zipfile.ZipFile() can be an actual disk file or a file-like-object. Your solution probably lies in the io.BytesIO or io.StringIO classes. These let you create bytes or strings in memory and treat them like files in other functions and classes that take file-like-objects.
Answer is at https://www.neilgrogan.com/py-bin-zip/.
Turns out, using BytesIO alongside zipfile was the ticket!
Related
I'm trying, just for fun, to understand if I can extract the full path of my file while using the with statement (python 3.8)
I have this simple code:
with open('tmp.txt', 'r') as file:
print(os.path.basename(file))
But I keep getting an error that it's not a suitable type format.
I've been trying also with the relpath, abspath, and so on.
It says that the input should be a string, but even after casting it into string, I'm getting something that I can't manipulate.
Perhaps there isn't an actual way to extract that full path name, but I think there is. I just can't find it, yet.
You could try:
import os
with open("tmp.txt", "r") as file_handle:
print(os.path.abspath(file_handle.name))
The functions in os.path accept strings or path-like objects. You are attempting to pass in a file instead. There are lots of reasons the types aren't interchangable.
Since you opened the file for text reading, file is an instance of io.TextIOWrapper. This class is just an interface that provides text encoding and decoding for some underlying data. It is not associated with a path in general: the underlying stream can be a file on disk, but also a pipe, a network socket, or an in-memory buffer (like io.StringIO). None of the latter are associated with a path or filename in the way that you are thinking, even though you would interface with them as through normal file objects.
If your file-like is an instance of io.FileIO, it will have a name attribute to keep track of this information for you. Other sources of data will not. Since the example in your question uses FileIO, you can do
with open('tmp.txt', 'r') as file:
print(os.path.abspath(file.name))
The full file path is given by os.path.abspath.
That being said, since file objects don't generally care about file names, it is probably better for you to keep track of that info yourself, in case one day you decide to use something else as input. Python 3.8+ allows you to do this without changing your line count using the walrus operator:
with open((filename := 'tmp.txt'), 'r') as file:
print(os.path.abspath(filename))
I have a file data type. I need to convert it to binary. Is there a way to do it? What im finding out is using
open(file_path, 'rb')
But using that means I need to save the file first to a directory and its a problem with efficiency. I get the file from HTML form
instead of using
bucket.put_object_from_file('file_upload', fileobj)
I use:
bucket.put_object('file_upload', fileobj)
to upload using network streams
I have a text file which I constantly append data to. When processing is done I need to gzip the file. I tried several options like shutil.make_archive, tarfile, gzip but could not eventually do it. Is there no simple way to compress a file without actually writing to it?
Let's say I have mydata.txt file and I want it to be gzipped and saved as mydata.txt.gz.
I don't see the problem. You should be able to use e.g. the gzip module just fine, something like this:
inf = open("mydata.txt", "rb")
outf = gzip.open("file.txt.gz", "wb")
outf.write(inf.read())
outf.close()
inf.close()
There's no problem with the file being overwritten, the name given to gzip.open() is completely independent of the name given to plain open().
If you want to compress a file without writing to it, you could run a shell command such as gzip using the Python libraries subprocess or popen or os.system.
When you do:
file = open("my file","wt")
and
file = open("my file" , "rt")
These both create file objects that we use file methods on. But are they creating different file objects? And if they are creating different file objects would it be fair to say that the "wt" one is mutable, while the "rt" one is immutable?
No, that would not be fair to say. You are creating instances of the same standard file type, which proxies file manipulation calls to the operating system. The mode defines what the operating system will let you do.
It doesn't matter if you use the same filename or different filenames; the OS doesn't care, and neither does Python; the open file objects are distinct.
The Python object itself is immutable; you cannot change the mode, filename or other attributes after the fact.
Note that by adding + to the mode, you can both read and write to the file object; w+ will truncate the file first, while r+ would not.
At the OS level, they would be created as two distinct file descriptors. They would (likely) point to the same data in the VFS/cache, but can be operated independently.
I am a programming student for the semester. In class we have been learning about file opening, reading and writing.
We have used a_reader to achieve such tasks for file opening. I have been reading our associated text/s and I have noticed that there is a CSV reader option which I have been using.
I wanted to know if there were anymore possible ways to open/read a file as I am trying to grow my knowledge base in python and its associated contents.
EDIT:
I was referring to CSV more specifically as that is the type of files we use at the moment. We have learnt about CSV Reader and a_reader and an example from one of our lectures is shown below.
def main():
a_reader = open('IDCJAC0016_009225_1800_Data.csv', 'rU')
file_data = a_reader.read()
a_reader.close()
print file_data
main()
It may seem overly broad but I have no knowledge which is why I am asking is there more than just the 2 ways above. If there is can someone who knows provide the types so I can read up on and research on them.
If you're asking about places to store things, the first interfaces you'll meet are files and sockets (pretend a network connection is like a file, see http://docs.python.org/2/library/socket.html).
If you mean file formats (like csv), there are many! Probably you can think of many yourself, but besides csv there are html files, pictures (png, jpg, gif), archive formats (tar, zip), text files (.txt!), python files (.py). The list goes on.
There are many ways to read files in different ways.
Just plain open will take a filename and open it as a sequence of lines. Or, you can just call read() on it, and it will read the whole file at once into one giant string.
codecs.open will take a filename and a character set, and decode each line to Unicode automatically. Or, again, you can just call read() on it, and it will read and decode the whole file at once into one giant Unicode string.
csv.reader will take a file or file-like object, and read it as a sequence of CSV rows. There's no direct equivalent of read()—but you can turn any sequence into a list by just calling list on it, so list(my_reader) will give you a list of rows (each of which is, itself, a list).
zipfile.ZipFile will take a filename, or a file or file-like object, and read it as a ZIP archive. This doesn't go line by line, of course, but you can go archived file by archived file. Or you can do fancier things, like search for archived files by name.
There are modules for reading JSON and XML documents, different ways of handling binary files, and so on. Some of them work differently—for example, you can search an XML document as a tree with one module, or go element by element with a different one.
Python has a pretty extensive standard library, and you can find the documentation online. Every module that seems like it should be able to work on files, probably can.
And, beyond what comes in the standard library, PyPI, the Python Package Index has thousands of additional modules. Looking for a way to read YAML documents? Search PyPI for yaml and you'll find it.
Finally, Python makes it very easy to add things like this on your own. The skeleton of a function like csv.reader is as simple as this:
def reader(fileobj):
for line in fileobj:
yield parse_one_csv_line(line)
You can replace that parse_one_csv_line with anything you want, and you've got a custom reader. For example, here's an uppercase_reader:
def uppercase_reader(fileobj):
for line in fileobj:
yield line.upper()
In fact, you can even write the whole thing in one line:
shouts = (line.upper() for line in fileobj)
And the best thing is that, as long as your reader only yields one line at a time, your reader is itself a file-like object, so you can pass uppercase_reader(fileobj) to csv.reader and it works just fine.