I have a .tar.gz file which I want to unpack (when I unpack it manually with 7-Zip, I get a .tar file inside). I can then unpack this .tar file easily with the Python tarfile module.
When I right-click the .tar.gz file in Windows Explorer, I can see under Type of file: 7-Zip.gz (.gz). I have tried using the gzip module (gzip.open), but I get the exception 'Not a gzipped file'. So there must be some other way to go.
I have searched the Internet and seen that people use 7-Zip manually or some batch commands, however I cannot find a way to do this in Python. I am on Python 2.7.
The tarfile library is able to read gzipped tar files. You should look at the examples here:
http://docs.python.org/2/library/tarfile.html#examples
The first example might accomplish what you want. It extracts the content of the archive to the current working directory:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
import os
import tarfile
import zipfile
def extract_file(path, to_directory='.'):
    if path.endswith('.zip'):
        opener, mode = zipfile.ZipFile, 'r'
    elif path.endswith('.tar.gz') or path.endswith('.tgz'):
        opener, mode = tarfile.open, 'r:gz'
    elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
        opener, mode = tarfile.open, 'r:bz2'
    else:
        raise ValueError("Could not extract `%s` as no appropriate extractor is found" % path)
    # Resolve the archive path before changing directory, so a relative path still works
    path = os.path.abspath(path)
    cwd = os.getcwd()
    os.chdir(to_directory)
    try:
        archive = opener(path, mode)
        try:
            archive.extractall()
        finally:
            archive.close()
    finally:
        os.chdir(cwd)
Found this here:
http://code.activestate.com/recipes/576714-extract-a-compressed-file/
This is the example from the python-docs and should work:
import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()
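If gzip.open raises 'Not a gzipped file', it can help to check whether the file really is gzip-compressed regardless of its extension. The following is a minimal sketch, assuming only that a gzip stream starts with the two magic bytes 0x1f 0x8b; the helper name looks_like_gzip and the demo file names are made up for illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Self-check with two throwaway files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
real_gz = os.path.join(tmp_dir, "real.txt.gz")
fake_gz = os.path.join(tmp_dir, "fake.txt.gz")

with gzip.open(real_gz, "wb") as fh:
    fh.write(b"hello")
with open(fake_gz, "wb") as fh:
    fh.write(b"plain text with a misleading extension")

print(looks_like_gzip(real_gz), looks_like_gzip(fake_gz))
```

If the check fails for your file, the archive is probably in another format (or already decompressed), and tarfile.open with its default mode may still be able to read it.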
I'm trying to extract all from a tar.gz file into the same Directory. The following code works to extract all, but the files are stored in the working directory instead of the path I entered as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory path where I need them? I've been trying at this for ages, I really can't see why this doesn't work.
You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
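To avoid typing the target directory twice, the destination can be derived from the archive path itself. This is only a sketch: the helper name extract_beside and the demo file names are hypothetical, and it assumes the archive comes from a trusted source, since extractall writes whatever paths the archive contains.

```python
import os
import tarfile
import tempfile


def extract_beside(archive_path):
    """Extract an archive into the directory that contains it."""
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, ...) itself.
    with tarfile.open(archive_path, mode="r:*") as tar:
        tar.extractall(path=target_dir)
    return target_dir


# Demo: build a throwaway archive, then extract it next to itself.
work_dir = tempfile.mkdtemp()
source_dir = os.path.join(work_dir, "source")
os.mkdir(source_dir)
with open(os.path.join(source_dir, "data.txt"), "w") as fh:
    fh.write("some data")

archive = os.path.join(work_dir, "RW_data.tar.gz")
with tarfile.open(archive, mode="w:gz") as tar:
    tar.add(os.path.join(source_dir, "data.txt"), arcname="data.txt")

out_dir = extract_beside(archive)
print(os.listdir(out_dir))
```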
You can try using shutil.unpack_archive (available since Python 3.2):
import shutil
def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
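As a quick round trip, shutil can both create and unpack the archive. The sketch below assumes Python 3; the directory and file names (src, note.txt, bundle, out) are placeholders chosen for the example.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "src")
os.mkdir(src_dir)
with open(os.path.join(src_dir, "note.txt"), "w") as fh:
    fh.write("hello")

# make_archive appends the extension and returns the path of the archive.
archive_path = shutil.make_archive(
    os.path.join(work_dir, "bundle"), "gztar", root_dir=work_dir, base_dir="src"
)

extract_path = os.path.join(work_dir, "out")
os.mkdir(extract_path)
# The format is inferred from the ".tar.gz" extension.
shutil.unpack_archive(archive_path, extract_path)

print(os.listdir(os.path.join(extract_path, "src")))
```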
For a pattern recognition application, I want to read and operate on JPEG files from another folder using the os module.
I tried to use str(file) and file.encode('latin-1'), but they both give me errors.
I tried:
allLines = []
path = 'results/'
fileList = os.listdir(path)
for file in fileList:
    file = open(os.path.join('results/'+ str(file.encode('latin-1'))), 'r')
    allLines.append(file.read())
print(allLines)
but I get an error saying:
No such file or directory "results/b'thefilename"
when I expect a list with the desired file names that are accessible
If you can use Python 3.4 or newer, you can use the pathlib module to handle the paths.
from pathlib import Path
all_lines = []
path = Path('results/')
for file in path.iterdir():
    with file.open() as f:
        all_lines.append(f.read())
print(all_lines)
By using the with statement, you don't have to close the file by hand (which is currently missing from your code); it is closed even if an exception is raised at some point.
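Since the files in question are JPEGs, reading them as text may fail or corrupt the data. A variant that reads raw bytes and skips sub-directories could look like the sketch below. It assumes Python 3.5 or newer (for Path.read_bytes); the function name read_files and the demo file names are hypothetical.

```python
from pathlib import Path
import tempfile


def read_files(folder):
    """Return {file name: raw bytes} for every regular file in folder."""
    contents = {}
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file():  # skip sub-directories
            contents[entry.name] = entry.read_bytes()
    return contents


# Demo on a throwaway folder with one fake image and one sub-directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.jpg").write_bytes(b"\xff\xd8\xff fake jpeg bytes")
(demo_dir / "sub").mkdir()

result = read_files(demo_dir)
print(list(result))
```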
I am trying to zip a single file in python. For whatever reason, I'm having a hard time getting down the syntax. What I am trying to do is keep the original file and create a new zipped file of the original (like what a Mac or Windows would do if you archive a file).
Here is what I have so far:
import zipfile
myfilepath = '/tmp/%s' % self.file_name
myzippath = myfilepath.replace('.xml', '.zip')
zipfile.ZipFile(myzippath, 'w').write(open(myfilepath).read()) # does not zip the file properly
The correct way to zip a file is:
zipfile.ZipFile('hello.zip', mode='w').write("hello.csv")
# assume your xxx.py under the same dir with hello.csv
The python official doc says:
ZipFile.write(filename, arcname=None, compress_type=None)
Write the file named filename to the archive, giving it the archive name arcname
You are passing open(filename).read() to write(). That expression is a single string holding the whole content of the file, so write() raises FileNotFoundError because it looks for a file whose name is that content.
If the file to be zipped (filename) is in a different directory called pathname, you should use the arcname parameter. Otherwise, it will recreate the full folder hierarchy to the file folder.
from zipfile import ZipFile
import os
with ZipFile(zip_file, 'w') as zipf:
    zipf.write(os.path.join(pathname, filename), arcname=filename)
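To see the effect of arcname, the archive's member list can be inspected after writing. The sketch below reuses the names zip_file, pathname and filename from the snippet above, but fills them with made-up temporary paths so that it runs on its own.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

work_dir = tempfile.mkdtemp()
pathname = os.path.join(work_dir, "nested", "folder")
os.makedirs(pathname)
filename = "hello.csv"
with open(os.path.join(pathname, filename), "w") as fh:
    fh.write("a,b\n1,2\n")

zip_file = os.path.join(work_dir, "hello.zip")
with ZipFile(zip_file, "w", ZIP_DEFLATED) as zipf:
    # Without arcname the member name would keep the folder hierarchy.
    zipf.write(os.path.join(pathname, filename), arcname=filename)

with ZipFile(zip_file) as zipf:
    names = zipf.namelist()
print(names)
```

The original file is left in place; only a compressed copy is added to the archive.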
Try calling zipfile.close() afterwards?
from zipfile import ZipFile, ZIP_DEFLATED
zipf = ZipFile("main.zip", "w", ZIP_DEFLATED)
zipf.write("main.json")
zipf.close()
Since you also want to specify the directory try using os.chdir:
#!/usr/bin/python
from zipfile import ZipFile
import os
os.chdir('/path/of/target/and/destination')
ZipFile('archive.zip', 'w').write('original_file.txt')
Python zipfile : Work with Zip archives
Python Miscellaneous operating system interfaces
Does anybody have any code for converting a tar.gz file into a zip using only Python code? I have been facing many issues with tar.gz, as mentioned in "How can I read tar.gz file using pandas read_csv with gzip compression option?"
You would have to use the tarfile module, with mode 'r|gz' for reading.
Then use zipfile for writing.
import tarfile, zipfile
tarf = tarfile.open( name='mytar.tar.gz', mode='r|gz' )
zipf = zipfile.ZipFile( file='myzip.zip', mode='a', compression=zipfile.ZIP_DEFLATED )
for m in tarf:
    f = tarf.extractfile( m )
    fl = f.read()
    fn = m.name
    zipf.writestr( fn, fl )
tarf.close()
zipf.close()
You can use is_tarfile() to check for a valid tar file.
Perhaps you could also use shutil, but I think it cannot work in memory.
PS: From the brief testing that I performed, you may have issues with members m which are directories.
If so, you may have to use isdir(), or even first get the info on each tar file member with tarf.getmembers(), and then reopen the tar.gz file for transferring to zip, since you cannot do it after tarf.getmembers() (you cannot seek backwards).
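The two checks mentioned here can be combined in a small helper. This is a sketch rather than a full converter: the function name regular_member_names is hypothetical, and the demo archive is built in a temporary directory with one directory entry and one file so the filtering is visible.

```python
import io
import os
import tarfile
import tempfile


def regular_member_names(path):
    """Return the names of the regular-file members of a tar archive."""
    if not tarfile.is_tarfile(path):
        raise ValueError("not a tar archive: %s" % path)
    with tarfile.open(path, mode="r:*") as tarf:
        # Directories, links and other special members are skipped.
        return [m.name for m in tarf if m.isfile()]


# Demo archive containing a directory entry and a single file.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "mytar.tar.gz")
payload = b"hello"
with tarfile.open(archive, mode="w:gz") as tarf:
    dir_info = tarfile.TarInfo("docs")
    dir_info.type = tarfile.DIRTYPE
    tarf.addfile(dir_info)
    file_info = tarfile.TarInfo("docs/readme.txt")
    file_info.size = len(payload)
    tarf.addfile(file_info, io.BytesIO(payload))

names = regular_member_names(archive)
print(names)
```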
This just fixes a couple of tiny issues from the above answer, makes sure the mtime is preserved and makes sure compression is happening on all the files. All credit to the above for the simple answer.
from datetime import datetime
import sys
from tarfile import open
from zipfile import ZipFile, ZIP_DEFLATED, ZipInfo
compresslevel = 9
compression = ZIP_DEFLATED
with open(name=sys.argv[1], mode='r|gz') as tarf:
    with ZipFile(file=sys.argv[2], mode='w', compression=compression, compresslevel=compresslevel) as zipf:
        for m in tarf:
            mtime = datetime.fromtimestamp(m.mtime)
            print(f'{mtime} - {m.name}')
            zinfo: ZipInfo = ZipInfo(
                filename=m.name,
                date_time=(mtime.year, mtime.month, mtime.day, mtime.hour, mtime.minute, mtime.second)
            )
            if not m.isfile():
                # for directories and other types
                continue
            f = tarf.extractfile(m)
            fl = f.read()
            zipf.writestr(zinfo, fl, compress_type=compression, compresslevel=compresslevel)
print('done.')
I have this function that references the path of a file:
some_obj.file_name(FILE_PATH)
where FILE_PATH is a string of the path of a file, i.e. H:/path/FILE_NAME.ext
I want to create a file FILE_NAME.ext inside my python script with the content of a string:
some_string = 'this is some content'
How to go about this? The Python script will be placed inside a Linux box.
I think you're looking for a tempfile.NamedTemporaryFile.
import tempfile
with tempfile.NamedTemporaryFile() as tmp:
    print(tmp.name)
    tmp.write(...)
But:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
If that is a concern for you:
import os, tempfile
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    print(tmp.name)
    tmp.write(...)
finally:
    tmp.close()
    os.unlink(tmp.name)
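Filled in with the string from the question, the delete=False pattern might look like the sketch below. The suffix ".ext" and the plain open() call, which stands in for whatever code consumes the path, are assumptions made for the example.

```python
import os
import tempfile

some_string = "this is some content"

tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".ext", delete=False)
try:
    tmp.write(some_string)
    tmp.close()  # close first so the name can be reopened on any platform
    with open(tmp.name) as fh:  # stand-in for code that takes a file path
        read_back = fh.read()
finally:
    tmp.close()  # harmless if the file is already closed
    os.unlink(tmp.name)

print(read_back)
```

Note that the file name is still generated by tempfile; only the suffix is under your control here.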
There is a tempfile module for python, but a simple file creation also does the trick:
new_file = open("path/to/FILE_NAME.ext", "w")
Now you can write to it using the write method:
new_file.write('this is some content')
With the tempfile module this might look like this:
import os
import tempfile
new_file, filename = tempfile.mkstemp()
print(filename)
os.write(new_file, b"this is some content")  # os.write expects bytes
os.close(new_file)
With mkstemp you are responsible for deleting the file after you are done with it. With other arguments, you can influence the directory and name of the file.
UPDATE
As rightfully pointed out by Emmet Speer, there are security considerations when using mkstemp, as the client code is responsible for closing/cleaning up the created file. A better way to handle it is the following snippet (as taken from the link):
import os
import tempfile
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'w') as tmp:
        # do stuff with temp file
        tmp.write('stuff')
finally:
    os.remove(path)
The os.fdopen wraps the file descriptor in a Python file object, that closes automatically when the with exits. The call to os.remove deletes the file when no longer needed.
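If the file must be called exactly FILE_NAME.ext, as in the question, one option is to create a temporary directory and write a normally named file inside it. The following is a minimal sketch under that assumption; the helper name write_in_temp_dir is made up, and the plain open() call stands in for some_obj.file_name(FILE_PATH).

```python
import os
import shutil
import tempfile


def write_in_temp_dir(file_name, content):
    """Create file_name with the given content inside a fresh temp directory."""
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, file_name)
    with open(path, "w") as fh:
        fh.write(content)
    return path


file_path = write_in_temp_dir("FILE_NAME.ext", "this is some content")
try:
    with open(file_path) as fh:  # stand-in for some_obj.file_name(file_path)
        seen = fh.read()
finally:
    # Remove the whole temporary directory once the file is no longer needed.
    shutil.rmtree(os.path.dirname(file_path))

print(seen)
```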