Python: Closing and removing files

I'm trying to unzip a file, read one of the extracted files, and then delete the extracted files.
The files are extracted (e.g. we get file1 and file2).
Read file1, and close it:
with open(file1, 'r') as f:
    data = f.readline()
f.close()
Do something with the "data".
Remove the extracted files:
os.remove(file1)
Everything went fine, except I received these messages at the end. The files were also removed. How do I close the files properly?
/tmp/file1: No such file or directory
140347508795048:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('/tmp/file1','r')
140347508795048:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
UPDATE:
(My script looks similar to this:)
#!/usr/bin/python
import subprocess, os
infile = "filename.enc"
outfile = "filename.dec"
opensslCmd = "openssl enc -a -d -aes-256-cbc -in %s -out %s" % (infile, outfile)
subprocess.Popen(opensslCmd, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
os.remove(infile)

There is no need to close a file handle explicitly when using the with statement: the handle is closed automatically when execution leaves the with block (not merely when readline is done).
See the Python tutorial.
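A quick sketch (assuming file1 names an existing file, as in the question) makes this visible via the handle's closed attribute:
with open(file1, 'r') as f:
    data = f.readline()
print(f.closed)  # True: leaving the with block closed the handle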

The errors you see are not formatted the way Python reports errors. They mean that something other than Python tried to open these files, although it's hard to tell what from such a small snippet.
If you're simply trying to retrieve some data from a zip file, there isn't really a reason to extract the files to disk. You can read the data directly from the zip file, extracting to memory only, with zipfile.ZipFile.open.
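For example, a minimal sketch (the archive name archive.zip is a placeholder; file1 is the member name from the question):
import zipfile

# Read a member straight from the archive; nothing touches the disk,
# so there is nothing to remove afterwards.
with zipfile.ZipFile('archive.zip') as zf:
    with zf.open('file1') as member:
        data = member.readline()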

Related

How to *properly* compress and decompress a text file using bz2 and python

So I've had this system that scrapes and compresses files for a while now using bz2 compression. The way it does so is using the following block of code I found on SO a few months back:
Let's assume for the purposes of this post the filename is always file.XXXX where XXXX is the relevant extension. We start with .txt
### How to compress a text file
import bz2

filepath_compressed = "file.tar.bz2"
with open("file.txt", 'rb') as data:
    tarbz2contents = bz2.compress(data.read(), 9)
with bz2.BZ2File(filepath_compressed, 'wb') as f_comp:
    f_comp.write(tarbz2contents)
Now, to decompress it, I've always got it to work using decompression software I have called Keka, which decompresses the .tar.bz2 file to .tar; then I run it through Keka again to get an "extensionless" file, to which I add a .txt on my Mac, and then it works.
To decompress it programmatically, I've tried a few things: the stuff from this post and the code from this post. I've tried using BZ2Decompressor and BZ2File and everything. I just seem to be missing something, and I'm not sure what it is.
Here is what I have so far, and I'd like to know what is wrong with this code:
import bz2, tarfile, shutil

# Decompress to tar
with bz2.BZ2File("file.tar.bz2") as fr, open("file.tar", "wb") as fw:
    shutil.copyfileobj(fr, fw)

# Decompress from tar to txt
with tarfile.open("file.tar", "r:") as tar:
    tar.extractall("file_out.txt")
This code crashes with "tarfile.ReadError: truncated header". I think the first context manager outputs a binary text file, and I tried decoding that, but that failed too. What am I missing here? I feel like a noob.
If you would like a minimum runnable piece of code to replicate this, add the following to make a dummy file:
lines = ["Line 1", "Line 2", "Line 3"]
with open("file.txt", "w") as f:
    for line in lines:
        f.write(line + "\n")
The thing that you're making is not a .tar.bz2 file, but rather a .bz2.bz2 file. You are compressing twice with bzip2 (the second time with no effect), and there is no tar file generation anywhere to be seen.
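A minimal sketch of the fix, letting tarfile handle both the tar container and the bzip2 layer ("out_dir" is a placeholder directory):
import tarfile

# Create a genuine .tar.bz2: tarfile bzip2-compresses the archive itself,
# so no separate bz2.compress step is needed.
with tarfile.open("file.tar.bz2", "w:bz2") as tar:
    tar.add("file.txt")

# Unpack it again; note that extractall takes a directory, not a filename.
with tarfile.open("file.tar.bz2", "r:bz2") as tar:
    tar.extractall("out_dir")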

Error when using gzip on a file containing line breaks

I'm attempting to use python's gzip library to streamline some python scripts that create csv output files. I've tried a number of different methods of creating the gzip file, but no matter which method I've tried, I'm running into the same issue.
My python script runs successfully, but when I try to decompress the gzip file in Finder (using MacOS 10.15.6), I'm prompted with the following error:
Unable to expand "file.csv.gz" into "Documents". (Error 79 - Inappropriate file type or format.)
After some debugging, I've narrowed down the cause of the error to the file content containing line break (\n) characters.
This simple example code triggers the above error on gzip expansion:
import gzip
content = b'Id,Food\n1,Spam\n2,Eggs\n'
f = gzip.open('file.csv.gz', 'wb')
f.write(content)
f.close()
When I remove all \n characters from the content variable, everything works fine:
import gzip
content = b'Id,Food,1,Spam,2,Eggs'
f = gzip.open('file.csv.gz', 'wb')
f.write(content)
f.close()
Does gzip want me to use a different line break mechanism? I'm sure I'm missing some sort of foundational knowledge about gzip or binaries, so any info that helps get me back on track would be much appreciated.
It has nothing to do with Python's gzip. It is, arguably, a bug in macOS: Archive Utility sometimes detects the resulting uncompressed data as an mtree, but then finds that the data violates the mtree format.
The solution is to not double-click to decompress. Use gzip to decompress.
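As a sanity check, a small sketch (reusing the data from the question) shows that Python reads the archive back without complaint, so the file is a valid gzip stream:
import gzip

# Round-trip check: the decompressed bytes match what was written.
with gzip.open('file.csv.gz', 'rb') as f:
    assert f.read() == b'Id,Food\n1,Spam\n2,Eggs\n'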

How to compress a processed text file in Python?

I have a text file to which I constantly append data. When processing is done I need to gzip the file. I tried several options like shutil.make_archive, tarfile, and gzip, but could not get it to work. Is there no simple way to compress a file without actually writing to it?
Let's say I have mydata.txt file and I want it to be gzipped and saved as mydata.txt.gz.
I don't see the problem. You should be able to use e.g. the gzip module just fine, something like this:
import gzip

inf = open("mydata.txt", "rb")
outf = gzip.open("file.txt.gz", "wb")
outf.write(inf.read())
outf.close()
inf.close()
There's no problem with the file being overwritten, the name given to gzip.open() is completely independent of the name given to plain open().
If you want to compress a file without writing to it from Python, you can run a shell command such as gzip via the subprocess module (or the older os.popen / os.system).
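For example, a sketch of that route (assuming a gzip build recent enough to support -k, which keeps the original file):
import subprocess

# Compress mydata.txt to mydata.txt.gz; -k keeps the input file instead of
# replacing it (drop -k if your gzip lacks that flag).
subprocess.call(['gzip', '-k', 'mydata.txt'])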

How to close 30 open ms-word files with python code?

I'm using Python 2.7.8. I'm working with 30 open docx files simultaneously.
Is there some way with Python code to close all the files simultaneously instead of closing every file separately?
UPDATE:
I'm using different files every day, so the file names change every time. My code must work generically, without hard-coded file names (if that is possible).
What I suggest is using the with statement when opening files:
with open('file1.txt', 'w') as file1, open('file2.txt', 'w') as file2:
    file1.write('stuff to write')
    file2.write('stuff to write')
    # ...do other stuff...
print "Both files are closed because I'm out of the with statement"
When you leave the with statement, your files are closed. You can even open all of your files on one line, but that's not recommended unless you are actively using all of them at once.
You need to find the pid of each Word process and then use os.kill to terminate it, e.g.:
import os, signal

os.kill(pid, signal.SIGTERM)
First, append every opened file object to a list:
l = []
f1 = open('f1.txt')
# ...do something
l.append(f1)
f2 = open('f2.txt')
# ...do something
l.append(f2)
Then take each file object from the list and close it:
for f in l:
    f.close()
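If the number of files varies from day to day, contextlib.ExitStack (Python 3.3+; the contextlib2 backport provides it on 2.7) closes everything in one place. A sketch with placeholder names:
from contextlib import ExitStack  # on Python 2.7: from contextlib2 import ExitStack

filenames = ['f1.txt', 'f2.txt', 'f3.txt']  # placeholder names
with ExitStack() as stack:
    files = [stack.enter_context(open(name)) for name in filenames]
    # ...do something with the files...
# every file is closed here, however many were opened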

unzipping large files using python

I am attempting to unzip files of various sizes (some 4 GB or above) using Python, but I have noticed that on several occasions, especially when the files are extremely large, a file fails to unzip. When I open the resulting file it is empty. Below is the code I am using. Is there anything wrong with my approach?
import gzip

inF = gzip.open(localFile, 'rb')
localFile = localFile[:-3]
outF = open(localFile, 'wb')
outF.write(inF.read())
inF.close()
outF.close()
In this case it looks like you don't need Python to do any processing on the file you read in, so you might be better off just using subprocess.Popen:
from subprocess import Popen

# gunzip -c writes the decompressed stream to stdout; the shell redirection
# sends it to outfilename, which is why shell=True is needed here.
Popen('gunzip -c %s > %s' % (infilename, outfilename), shell=True).wait()
Another solution for large .zip files (works on Ubuntu 16.04.4).
First install 7z:
sudo apt-get install p7zip-full
Then, in your Python code, call 7z with:
import subprocess
subprocess.call(['7z', 'x', src_file, '-o'+target_dir])
This code loops over blocks of input data, writing each block to the output file, so we never read the entire input into memory at once. That conserves memory and avoids the mysterious crashes.
import gzip, os

localFile = 'cat.gz'
outFile = os.path.splitext(localFile)[0]
print 'Unzipping {} to {}'.format(localFile, outFile)

with gzip.open(localFile, 'rb') as inF:
    with open(outFile, 'wb') as outF:
        while True:
            block = inF.read(1024)  # read one 1 KiB block at a time
            if not block:           # empty bytes means end of stream
                break
            outF.write(block)
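Equivalently, shutil.copyfileobj (already used in an earlier answer here) does the block loop for you; a minimal sketch:
import gzip, shutil

with gzip.open('cat.gz', 'rb') as inF:
    with open('cat', 'wb') as outF:
        shutil.copyfileobj(inF, outF)  # copies in fixed-size chunks, not all at once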
