How to use subprocess to unzip gz file in python - python

I don't know how to unzip a gz file in Python using subprocess.
The gzip library is slow, and I was thinking of reimplementing the function below using GNU/Linux shell commands and the subprocess library.
def __unzipGz(filePath):
    import gzip
    import os

    inputFile = gzip.GzipFile(filePath, 'rb')
    stream = inputFile.read()
    inputFile.close()

    outputFile = open(os.path.splitext(filePath)[0], 'wb')
    outputFile.write(stream)
    outputFile.close()

You can use something like this:
import subprocess
filename = "some.gunzip.file.tar.gz"
process = subprocess.Popen(['tar', '-xzf', filename])
process.wait()  # wait for the extraction to finish
Since there is not much useful output here, you could also use os.system instead of subprocess.Popen, like this:
import os
filename = "some.gunzip.file.tar.gz"
exit_code = os.system("tar -xzf {}".format(filename))
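The question was actually about a plain .gz file rather than a tarball. A minimal sketch of the same subprocess idea for that case, assuming the gzip binary is on the PATH and supports -k (gzip 1.6+):
import os
import subprocess

def unzip_gz(file_path):
    # "gzip -d" decompresses; "-k" keeps the original .gz file around.
    subprocess.run(["gzip", "-dk", file_path], check=True)
    # The decompressed file is written next to the original, without the .gz suffix.
    return os.path.splitext(file_path)[0]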

Related

How can I write this file in Python, preferably using gzip, to a zipped file?

I have some code writing output to a file that I want zipped, but I can't figure out how I would write it to a zipped file.
subprocess.run([f"grep -i -m 1 'REMARK VINA RESULT:' ./output/{docking_type}/output_{filename} \
| awk '{{print $4}}' >> results_{rank}.txt; echo {filename} \
>> results_{rank}.txt"], shell=True)
At this point I can only see writing the output then taking that file and zipping it, but I'm hoping to combine those steps, since I am writing a very large number of files. From the gzip documentation this would be done via:
import gzip
content = b"Lots of content here"
with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
    f.write(content)
Am I just misunderstanding gzip? Thanks for any help!
I've tried a few variations without success so far!
Try this:
import gzip

content = b"Lots of content here"
with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
    f.write(content)  # content is already bytes, so no encoding is needed
If content were a str instead of bytes, you would encode it first: f.write(content.encode()).
Yes, you can write the output directly to a gzipped file.
Here's how you can achieve it:
import subprocess
import gzip

# Run the shell command to get the result and the filename
result = subprocess.run([f"grep -i -m 1 'REMARK VINA RESULT:' ./output/{docking_type}/output_{filename} \
    | awk '{{print $4}}' && echo {filename}"], shell=True, stdout=subprocess.PIPE)

# result.stdout is already bytes
content = result.stdout

# Write the content to a gzipped file
with gzip.open(f'results_{rank}.txt.gz', 'wb') as f:
    f.write(content)
This will run the shell command, store the output in result.stdout, and then write that output directly to a gzipped file with a .gz extension.
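If the command output is large, you could also stream it into the gzipped file chunk by chunk rather than buffering it all in memory first. A rough sketch of that variant (the command string here is a placeholder for the real grep/awk pipeline):
import gzip
import shutil
import subprocess

proc = subprocess.Popen("your shell pipeline here", shell=True, stdout=subprocess.PIPE)
with gzip.open(f'results_{rank}.txt.gz', 'wb') as f:
    # Copy the pipe into the gzip file in fixed-size chunks.
    shutil.copyfileobj(proc.stdout, f)
proc.wait()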

How to pipe tar.extractall from python

I'm extracting a tarball using the tarfile module of python. I don't want the extracted files to be written on the disk, but rather get piped directly to another program, specifically bgzip. I'm also trying to use StringIO for that matter, but I get stuck even on that stage - the tarball gets extracted on the disk.
#!/usr/bin/env python
import tarfile, StringIO

tar = tarfile.open("6genomes.tgz", "r:gz")

def enafun(members):
    for tarkati in tar:
        if tarkati.isreg():
            yield tarkati

reles = StringIO.StringIO()
reles.write(tar.extractall(members=enafun(tar)))
tar.close()
How then do I pipe correctly the output of tar.extractall?
You cannot use the extractall method, but you can use the getmembers and extractfile methods instead:
#!/usr/bin/env python
import tarfile, StringIO

reles = StringIO.StringIO()
with tarfile.open("6genomes.tgz", "r:gz") as tar:
    for m in tar.getmembers():
        if m.isreg():
            reles.write(tar.extractfile(m).read())
# do what you want with "reles".
According to the documentation, the extractfile() method can take a TarInfo and returns a file-like object. You can then get the content of that file with read().
[EDIT] I'm adding here what you asked for in the comments, since formatting in comments does not render properly.
#!/usr/bin/env python
import tarfile
import subprocess

with tarfile.open("6genomes.tgz", "r:gz") as tar:
    for m in tar.getmembers():
        if m.isreg():
            f = tar.extractfile(m)
            new_filename = generate_new_filename(f.name)
            with open(new_filename, 'wb') as new_file:
                proc = subprocess.Popen(['bgzip', '-c'], stdin=subprocess.PIPE, stdout=new_file)
                proc.stdin.write(f.read())
                proc.stdin.close()
                proc.wait()
            f.close()
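For very large members you could also stream each extracted file into bgzip instead of reading it fully into memory. A minimal sketch of that variant, assuming bgzip is on the PATH and generate_new_filename is the same (undefined) helper as in the answer above:
#!/usr/bin/env python
import shutil
import subprocess
import tarfile

with tarfile.open("6genomes.tgz", "r:gz") as tar:
    for m in tar.getmembers():
        if not m.isreg():
            continue
        src = tar.extractfile(m)
        out_name = generate_new_filename(m.name)  # same helper as above
        with open(out_name, 'wb') as out:
            proc = subprocess.Popen(['bgzip', '-c'], stdin=subprocess.PIPE, stdout=out)
            # Copy in chunks so the whole member is never held in memory at once.
            shutil.copyfileobj(src, proc.stdin)
            proc.stdin.close()
            proc.wait()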

How can I call Vim from Python?

I would like a Python script to prompt me for a string, but I would like to use Vim to enter that string (because the string might be long and I want to use Vim's editing capability while entering it).
You can call vim with a file path of your choice:
from subprocess import call
call(["vim","hello.txt"])
Now you can use this file as your string:
file = open("hello.txt", "r")
aString = file.read()
Solution:
#!/usr/bin/env python
from __future__ import print_function

from os import close, unlink
from tempfile import mkstemp
from subprocess import Popen

def callvim():
    fd, filename = mkstemp()
    close(fd)  # we only need the path; vim opens the file itself
    p = Popen(["/usr/bin/vim", filename])
    p.wait()
    try:
        return open(filename, "r").read()
    finally:
        unlink(filename)

data = callvim()
print(data)
Example:
$ python foo.py
This is a big string.
This is another line in the string.
Bye!
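An equivalent sketch using tempfile.NamedTemporaryFile instead of mkstemp (an alternative, not the original answer's code; the suffix and default editor name are just illustrative):
import os
import subprocess
import tempfile

def call_editor(editor="vim"):
    # Create a named temp file that the editor can open by path.
    tmp = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
    tmp.close()
    try:
        subprocess.call([editor, tmp.name])
        with open(tmp.name, "r") as f:
            return f.read()
    finally:
        os.unlink(tmp.name)

text = call_editor()
print(text)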

How to execute a python script and write output to txt file?

I'm executing a .py file, which spits out a given string. This command works fine:
execfile ('file.py')
But I want the output (in addition to it being shown in the shell) written into a text file.
I tried this, but it's not working :(
execfile ('file.py') > ('output.txt')
All I get is this:
tugsjs6555
False
I guess "False" is referring to the output file not being successfully written :(
Thanks for your help
What you're doing there is comparing the return value of execfile('file.py') against the string 'output.txt'.
You can do what you want with subprocess:
#!/usr/bin/env python
import subprocess

with open("output.txt", "w+") as output:
    subprocess.call(["python", "./script.py"], stdout=output)
This will also work, because it redirects standard output to the file output.txt before executing file.py:
import sys

orig = sys.stdout
with open("output.txt", "wb") as f:
    sys.stdout = f
    try:
        execfile("file.py", {})
    finally:
        sys.stdout = orig
Alternatively, execute the script in a subprocess:
import subprocess

with open("output.txt", "wb") as f:
    subprocess.check_call(["python", "file.py"], stdout=f)
If you want to write to a directory, assuming you wish to hardcode the directory path:
import sys
import os.path

orig = sys.stdout
with open(os.path.join("dir", "output.txt"), "wb") as f:
    sys.stdout = f
    try:
        execfile("file.py", {})
    finally:
        sys.stdout = orig
If you are running the file from the Windows command prompt:
python filename.py >> textfile.txt
The output will be redirected to textfile.txt in the same folder where filename.py is stored. Note that >> appends to the file; use > to overwrite it instead.
This is useful when the results shown in cmd are so long that they get truncated.
The simplest way to run a script and send its output to a text file is to type the following in the terminal:
PCname:~/Path/WorkFolderName$ python scriptname.py > output.txt
output.txt is created (or overwritten) in the work folder by the shell redirection.
Use this instead:
text_file = open('output.txt', 'w')
text_file.write('my string i want to put in file')
text_file.close()
Put it into your main file and go ahead and run it. Replace the string in the 2nd line with your string or a variable containing the string you want to output. If you have further questions post below.
# Python 3 syntax (on Python 2, add "from __future__ import print_function")
file_open = open("test1.txt", "r")
file_output = open("output.txt", "w")
for line in file_open:
    print("%s" % (line), file=file_output)
file_open.close()
file_output.close()
Using some hints from Remolten in the posts above and some other links, I have written the following:
from os import listdir
from os.path import isfile, join
folderpath = "/Users/nupadhy/Downloads"
filenames = [A for A in listdir(folderpath) if isfile(join(folderpath,A))]
newlistfiles = ("\n".join(filenames))
OuttxtFile = open('listallfiles.txt', 'w')
OuttxtFile.write(newlistfiles)
OuttxtFile.close()
The code above lists all files in my Downloads folder and saves the output to listallfiles.txt. If the file does not exist it is created, and it is overwritten each time you run the code. The only thing to be mindful of is that the output file is created in the folder where your .py script is saved. See how you go; hope it helps.
You could also do this by opening cmd in the folder where the Python script is saved and running python name.py > filename.txt.
It worked for me on Windows 10.

How to test a directory of files for gzip and uncompress gzipped files in Python using zcat?

I'm in my 2nd week of Python and I'm stuck on a directory of zipped/unzipped logfiles, which I need to parse and process.
Currently I'm doing this:
import os
import sys
import glob
import operator
import zipfile
import zlib
import gzip
import subprocess

if sys.version.startswith("3."):
    import io
    io_method = io.BytesIO
else:
    import cStringIO
    io_method = cStringIO.StringIO

for f in glob.glob('logs/*'):
    file = open(f, 'rb')
    new_file_name = f + "_unzipped"
    last_pos = file.tell()
    # test for gzip
    if (file.read(2) == b'\x1f\x8b'):
        file.seek(last_pos)
        # unzip to new file
        out = open(new_file_name, "wb")
        process = subprocess.Popen(["zcat", f], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            if process.poll() != None:
                break
        output = io_method(process.communicate()[0])
        exitCode = process.returncode
        if (exitCode == 0):
            print "done"
            out.write(output)
            out.close()
        else:
            raise ProcessException(command, exitCode, output)
which I've "stitched" together using these SO answers (here) and blogposts (here)
However, it does not seem to work, because my test file is 2.5GB and the script has been chewing on it for 10+mins plus I'm not really sure if what I'm doing is correct anyway.
Question:
If I don't want to use GZIP module and need to de-compress chunk-by-chunk (actual files are >10GB), how do I uncompress and save to file using zcat and subprocess in Python?
Thanks!
This should read the first line of every file in the logs subdirectory, unzipping as required:
#!/usr/bin/env python
import glob
import gzip
import subprocess

for f in glob.glob('logs/*'):
    if f.endswith('.gz'):
        # Open a compressed file. Here is the easy way:
        # file = gzip.open(f, 'rb')
        # Or, here is the hard way:
        proc = subprocess.Popen(['zcat', f], stdout=subprocess.PIPE)
        file = proc.stdout
    else:
        # Otherwise, it must be a regular file
        file = open(f, 'rb')
    # Process file, for example:
    print f, file.readline()
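To decompress whole files to disk with zcat, as the question asks, you could also point the subprocess's stdout straight at the output file instead of collecting everything with communicate(), so nothing is held in memory. A rough sketch along those lines, reusing the gzip magic-number check and the _unzipped naming from the question:
import glob
import subprocess

for f in glob.glob('logs/*'):
    # Check the gzip magic number, as in the question.
    with open(f, 'rb') as probe:
        if probe.read(2) != b'\x1f\x8b':
            continue  # not gzip-compressed, skip
    with open(f + "_unzipped", "wb") as out:
        # zcat writes its output directly into the file, chunk by chunk.
        proc = subprocess.Popen(["zcat", f], stdout=out)
        proc.wait()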
