Extract only a single directory from tar (in python) - python

I am working on a Python project in which I need to extract only one subfolder of a tar archive, not all of the files.
I tried to use
tar = tarfile.open(tarfile)
tar.extract("dirname", targetdir)
But this does not work: it does not extract the given subdirectory, and no exception is thrown. I am a beginner in Python.
Also, if the above function doesn't work for directories, what's the difference between this command and tar.extractfile()?

Building on the second example from the tarfile module documentation, you could extract the contained sub-folder and all of its contents with something like this:
import tarfile

with tarfile.open("sample.tar") as tar:
    subdir_and_files = [
        tarinfo for tarinfo in tar.getmembers()
        if tarinfo.name.startswith("subfolder/")
    ]
    tar.extractall(members=subdir_and_files)
This creates a list of the subfolder and its contents, and then uses the recommended extractall() method to extract just them. Of course, replace "subfolder/" with the actual path (relative to the root of the tar file) of the sub-folder you want to extract.
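On the second part of the question: extract() writes a member to disk, while extractfile() returns a readable file object for a regular file and writes nothing (it returns None for a directory). A minimal sketch of the latter, assuming a throwaway archive built on the spot; the helper name read_member_text and the file names are made up for illustration:

```python
import io
import os
import tarfile
import tempfile


def read_member_text(archive_path, member_name):
    """Return the text of one regular file in the archive without writing it to disk."""
    with tarfile.open(archive_path) as tar:
        fileobj = tar.extractfile(member_name)  # None for directories
        if fileobj is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with fileobj:
            return fileobj.read().decode("utf-8")


# Build a tiny throwaway archive so the sketch runs on its own.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "sample.tar")
payload = b"hello from the archive\n"
with tarfile.open(archive_path, "w") as tar:
    info = tarfile.TarInfo("subfolder/hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

text = read_member_text(archive_path, "subfolder/hello.txt")
print(text, end="")
```

So extractfile() is the tool for reading one file's content in memory, and extract()/extractall() are the tools for putting files on disk.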

The other answer will retain the subfolder path, meaning that subfolder/a/b will be extracted to ./subfolder/a/b. To extract a subfolder to the root, so subfolder/a/b would be extracted to ./a/b, you can rewrite the paths with something like this:
import tarfile

def members(tf):
    # Yield only entries under "subfolder/", with that prefix stripped from their paths.
    l = len("subfolder/")
    for member in tf.getmembers():
        if member.path.startswith("subfolder/"):
            member.path = member.path[l:]
            yield member

with tarfile.open("sample.tar") as tar:
    tar.extractall(members=members(tar))

Related

Python tarfile.extract func not extracting content of directory

I'm trying to extract a directory from a tarfile using Python, but some or all of the files inside that directory are missing after extraction. Only the path itself gets extracted (i.e., I get the folder home inside /tmp/myfolder, but it is empty).
The code is as follows:
for tar in tarfiles:
    mytar = tarfile.open(tar)
    for file in mytar:
        if file.name == "myfile":
            mytar.extract('home', '/tmp/myfolder')
Found a fix: by default, extract() on a directory extracts only the directory entry itself, not its contents. I can get the contents with
tar.extractall(members=members(tar))
Reference:
https://stackoverflow.com/a/43094365/20223973
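To see the behavior in isolation, here is a small sketch, assuming a throwaway archive with a home directory entry and one file inside it (the file name notes.txt and the output folder names are made up for illustration). extract() handles exactly one member, so a directory comes out empty; selecting the directory plus everything below it brings the files along:

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "demo.tar")

# Build a small archive: a directory entry "home" plus one file inside it.
with tarfile.open(archive_path, "w") as tar:
    dir_info = tarfile.TarInfo("home")
    dir_info.type = tarfile.DIRTYPE
    dir_info.mode = 0o755
    tar.addfile(dir_info)
    data = b"content\n"
    file_info = tarfile.TarInfo("home/notes.txt")
    file_info.size = len(data)
    tar.addfile(file_info, io.BytesIO(data))

only_dir = os.path.join(workdir, "only_dir")
with_contents = os.path.join(workdir, "with_contents")

with tarfile.open(archive_path) as tar:
    # One member only: this creates the "home" directory and nothing else.
    tar.extract("home", only_dir)
    # Select the directory and everything below it.
    wanted = [m for m in tar.getmembers()
              if m.name == "home" or m.name.startswith("home/")]
    tar.extractall(path=with_contents, members=wanted)

print(os.listdir(os.path.join(only_dir, "home")))
print(os.listdir(os.path.join(with_contents, "home")))
```

Matching on "home/" with the trailing slash avoids picking up unrelated entries such as homework/.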

Unzipping a file with subfolders into the same directory without creating an extra folder

I hope I don't duplicate here, but I didn't find a solution until now since the answers don't include subfolders. I have a zipfile that contains a folder which contains files and subfolders.
I want to extract the files within the folder (my_folder) and the subfolders to a specific path: Users/myuser/Desktop/another. I want only the files and subfolders in the another dir. With my current code, a directory my_folder is created, in which my files and subfolders are placed. But I don't want that directory created. This is what I am doing:
with zipfile.ZipFile("Users/myuser/Desktop/another/my_file.zip", "r") as zip_ref:
    zip_ref.extractall("Users/myuser/Desktop/another")
I tried listing all the zipfiles within the folder and extracting them manually:
with ZipFile('Users/myuser/Desktop/another/myfile.zip', 'r') as zipObj:
    # Get a list of all archived file names from the zip
    listOfFileNames = zipObj.namelist()
    for fileName in listOfFileNames:
        print(fileName)
        zipObj.extract(fileName, 'Users/myuser/Desktop/another/')
This yields the same result. I then tried to create a new list, stripping the names so that they no longer include the name of the folder, but then it tells me that there is no item named xyz in the archive.
Finally I leveraged those two questions/code (extract zip file without folder python and Extract files from zip without keeping the structure using python ZipFile?) and this works, but only if there are no subfolders involved. If there are subfolders, it raises the error FileNotFoundError: [Errno 2] No such file or directory: ''. What I want, though, is for the files in the subdirectory to be extracted to the subdirectory.
I can only use this code if I skip all directories:
import os
import shutil
import zipfile

my_zip = "Users/myuser/Desktop/another/myfile.zip"
my_dir = "Users/myuser/Desktop/another/"
with zipfile.ZipFile(my_zip, 'r') as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        print(filename)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
So I am looking for a way to do this which would also extract subdirs to their respective dir. That means I want a structure within /Users/myuser/Desktop/another:
- file1
- file2
- file3
...
- subfolder
    - file1
    - file2
    - file3
    ...
I have the feeling this must be doable with shutil, but I don't really know how.
Is there a way I can do this? Thanks so much for any help. Very much appreciated.
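One possible approach, as a sketch rather than a definitive answer: walk the archive entries, strip the leading my_folder/ from each name, and recreate the remaining relative path under the destination. The helper name extract_without_top_folder and the demo file names are made up for illustration; the demo builds its own small archive in a temporary directory so it can be run as-is.

```python
import os
import shutil
import tempfile
import zipfile


def extract_without_top_folder(zip_path, dest_dir, top_folder):
    """Extract everything under top_folder/ into dest_dir, keeping subfolders."""
    prefix = top_folder.rstrip("/") + "/"
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.filename.startswith(prefix):
                continue
            relative = info.filename[len(prefix):]
            if not relative:
                continue  # the entry for the top-level folder itself
            target = os.path.realpath(os.path.join(dest_root, relative))
            # Refuse entries that would land outside dest_dir (e.g. via "..").
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {info.filename!r}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)


# Demo on a throwaway archive with one file and one subfolder.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "myfile.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("my_folder/file1", "one")
    zf.writestr("my_folder/subfolder/file2", "two")

dest = os.path.join(workdir, "another")
extract_without_top_folder(zip_path, dest, "my_folder")
print(sorted(os.listdir(dest)))
```

Creating the parent directory of each target before opening it is what avoids the FileNotFoundError that the basename-only approach runs into once subfolders are involved.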

extract zip file without folder python

I am currently using the extractall function in Python to unzip. After unzipping, it also creates a folder, like: myfile.zip -> myfile/myfile.zip. How do I get rid of the myfile folder and just unzip to the current folder without it? Is it possible?
I use the standard module zipfile. It has the method extract, which provides what I think you want. This method has the optional argument path, so the content is extracted either to the current working directory or to the given path.
import os, zipfile
os.chdir('path/of/my.zip')  # the directory that contains my.zip
with zipfile.ZipFile('my.zip') as Z:
    for elem in Z.namelist():
        Z.extract(elem, 'path/where/extract/to')
If you omit the 'path/where/extract/to' the files from the ZIP-File will be extracted to the directory of the ZIP-File.
import os
import shutil
import zipfile

with zipfile.ZipFile('my.zip') as myzip:
    # loop over everything in the zip
    for name in myzip.namelist():
        filename = os.path.basename(name)
        if not filename:
            continue  # directory entries have an empty basename
        # copy the entry directly to the current directory,
        # without creating the intermediate directory
        with myzip.open(name) as member, open(filename, 'wb') as outfile:
            shutil.copyfileobj(member, outfile)

Create a package files python

I need to create a script that copies all .class and .xml files from multiple folders and generates a package, something like a tar archive. The different folder paths will be filled in when the script runs. Is this possible?
I'm using linux - Centos
Thanks
Python's standard library comes with multiple archiving modules, and more are available from PyPI and elsewhere.
I'm not sure how you want to fill in the paths to the things to include, but let's say you've already got that part done, and you have a list or iterator full of (appropriately relative) pathnames to files. Then, you can just do this:
with tarfile.open('package.tgz', 'w:gz') as tar:
    for pathname in pathnames:
        tar.add(pathname)
But you don't even have to gather all the files one by one, because tarfile can do that for you. Let's say your script just takes one or more directory names as command-line arguments, and you want it to recursively add all of the files whose names end in .xml or .class anywhere in any of those directories:
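import os
import sys
import tarfile

def package_filter(info):
    # Keep directories (so recursion continues) and files ending in .xml or .class.
    if info.isdir() or os.path.splitext(info.name)[-1] in ('.xml', '.class'):
        return info
    return None

# filter is an argument of TarFile.add(), not of the TarFile constructor.
with tarfile.open('package.tgz', 'w:gz') as tar:
    for pathname in sys.argv[1:]:
        tar.add(pathname, filter=package_filter)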
See the examples for more. But mainly, read the docs for tarfile.open() and TarFile.add().

How can files be added to a tarfile with Python, without adding the directory hierarchy?

When I invoke add() on a tarfile object with a file path, the file is added to the tarball with directory hierarchy associated. In other words, if I unzip the tarfile the directories in the original directories hierarchy are reproduced.
Is there a way to simply add a plain file without directory information, so that untarring the resulting tarball produces a flat list of files?
Using the arcname argument of the TarFile.add() method is a convenient alternative for controlling the path stored in the archive.
Example: you want to archive the directory repo/a.git/ to a tar.gz file, but you want the tree root in the archive to begin with a.git/ rather than repo/a.git/. You can do it as follows:
archive = tarfile.open("a.git.tar.gz", "w|gz")
archive.add("repo/a.git", arcname="a.git")
archive.close()
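You can also use TarFile.addfile(). Its first parameter is a TarInfo object, in which you can specify a name that differs from the name of the file you're adding.
This piece of code adds /path/to/filename.txt to the TAR file, but it will be extracted as myfilename.txt. The TarInfo comes from gettarinfo() so that its size matches the file; a bare TarInfo has size 0, so no data would be written:
info = tar.gettarinfo("/path/to/filename.txt", arcname="myfilename.txt")
with open("/path/to/filename.txt", "rb") as f:
    tar.addfile(info, f)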
Maybe you can use the "arcname" argument to TarFile.add(name, arcname). It takes an alternate name that the file will have inside the archive.
Thanks to @diabloneo, here is a function to create a selective tarball of a directory:
import os
import tarfile

def compress(output_file="archive.tar.gz", output_dir="", root_dir=".", items=()):
    """Compress selected directories into a gzipped tarball.

    Keyword arguments
    -----------------
    output_file : str, default "archive.tar.gz"
        name of the archive to create
    output_dir : str, default ""
        absolute path to the output directory
    root_dir : str, default "."
        absolute path to the input root directory
    items : list
        directories/items relative to root_dir
    """
    os.chdir(root_dir)
    with tarfile.open(os.path.join(output_dir, output_file), "w:gz") as tar:
        for item in items:
            tar.add(item, arcname=item)

>>> root_dir = "/abs/pth/to/dir/"
>>> compress(output_file="archive.tar.gz", output_dir=root_dir,
...          root_dir=root_dir, items=["logs", "output"])
Here is a code sample that tars the list of files in a folder without adding the folder itself:
with tarfile.open(tar_path, 'w') as tar:
    for filename in os.listdir(folder):
        fpath = os.path.join(folder, filename)
        tar.add(fpath, arcname=filename)
If you want to add the directory name but not its contents inside a tarfile, you can do the following:
(1) create an empty directory called empty
(2) tf.add("empty", arcname=path_you_want_to_add)
That creates an empty directory with the name path_you_want_to_add.
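If you would rather not create a real empty directory on disk, a possible alternative is to build the directory entry by hand with a TarInfo object. This is a sketch under assumptions: the archive is written to an in-memory buffer just to keep the example self-contained, and the mode 0o755 is an arbitrary choice.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo("path_you_want_to_add")
    info.type = tarfile.DIRTYPE  # mark the entry as a directory
    info.mode = 0o755            # TarInfo defaults to 0o644, which lacks the execute bit
    tar.addfile(info)            # no file object: directory entries carry no data

# Read the archive back to confirm the entry is a directory.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    member = tar.getmember("path_you_want_to_add")
    print(member.isdir())
```

Setting the mode explicitly matters for directories, because a directory extracted without the execute bit cannot be entered.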
