I have some .tar.gz files and I can extract them with:
if fname.endswith("tar.gz"):
    tar = tarfile.open(fname, "r:gz")
    tar.extractall()
    tar.close()
I also want to write information about the extracted files to a .txt file, but I don't know the names of the folders inside the .tar.gz files. Is it possible to find out (or rename) those folders without knowing their names in advance? Thank you.
Each entry in the tarfile has a TarInfo header. You can get that info in several ways; the easiest is simply to iterate over the open archive. The header includes the path name, which you can manipulate with the posixpath functions, since member names always use forward slashes. For example, given a tgz file I happen to have on hand:
>>> tf = tarfile.open("Downloads/dbutil-0.5.0.tar.gz", "r:gz")
>>> for info in tf:
...     print(info.name, "DIR" if info.isdir() else "FILE")
...
dbutil-0.5.0 DIR
dbutil-0.5.0/setup.py FILE
dbutil-0.5.0/dbutil DIR
dbutil-0.5.0/dbutil/connection.py FILE
dbutil-0.5.0/dbutil/__init__.py FILE
dbutil-0.5.0/dbutil/row.py FILE
dbutil-0.5.0/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info DIR
dbutil-0.5.0/dbutil.egg-info/dependency_links.txt FILE
dbutil-0.5.0/dbutil.egg-info/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info/SOURCES.txt FILE
dbutil-0.5.0/dbutil.egg-info/top_level.txt FILE
dbutil-0.5.0/setup.cfg FILE
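Building on the iteration above, here is a minimal sketch of writing each member's details to a text file and collecting the top-level folder names without extracting anything. The function name `write_listing` and the tab-separated output format are assumptions made for illustration, not part of the original answer.

```python
import posixpath
import tarfile


def write_listing(archive_path, listing_path):
    """Write one line per archive member and return the top-level names."""
    top_level = set()
    with tarfile.open(archive_path, "r:*") as tf:
        with open(listing_path, "w", encoding="utf-8") as out:
            for info in tf:
                # Member names use "/" separators regardless of the host OS.
                name = posixpath.normpath(info.name)
                top_level.add(name.split("/")[0])
                kind = "DIR" if info.isdir() else "FILE"
                out.write(f"{kind}\t{info.size}\t{info.name}\n")
    return sorted(top_level)
```

The returned list tells you which folders an extraction would create, so you can decide on a destination or a rename before calling `extractall()`.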
I would suggest comparing the list of files in the directory before and after extracting the archive. The additional files and folders will be the ones that came from the tar file.
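The before-and-after comparison could be sketched roughly as follows. The helper name `extract_and_report_new` is hypothetical, and the sketch only looks at the top level of the destination folder; anything that already existed and was merely overwritten will not be reported.

```python
import os
import tarfile


def extract_and_report_new(archive_path, dest="."):
    """Extract into dest and return the top-level names that appeared there."""
    before = set(os.listdir(dest))
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(path=dest)
    after = set(os.listdir(dest))
    return sorted(after - before)
```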
Related
I am trying to iterate through a folder which contains n subfolders, each of which has a subfolder with TIFF files in it. Using the zipfile module, I've tried the following:
path = r'D:\Project\I20\top'
with ZipFile(path, 'r') as zipObj:
    listOfiles = zipObj.infolist()
    for elem in listOfiles:
        print(elem.filename, ' : ', elem.file_size, ' : ')
I am getting the following error when I try to do this:
Traceback (most recent call last):
  File "D:\Test\algo\checksize.py", line 30, in <module>
    with ZipFile(path, 'r') as zipObj:
  File "C:\Users\manaT\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
    self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'D:\\Project\\I20\\top'
I have tried running Atom as administrator but that doesn't work. I have tried changing the drive's properties to allow full access to authenticated users.
The folder properties are still read only and every time I change it it reverts back to read only.
Is there a fix for this? If there is another method that will allow me to loop through the files in the folders within the zip files and store their names and sizes in a dictionary that would help as well.
If you want a list of the .zip files in a folder, you can use glob() or rglob() on the directory. Note that the ZipFile class expects the path of a .zip file as its argument, not a directory, which is why opening the folder fails. Once a zip file is open, you can iterate over its entries.
from pathlib import Path
from zipfile import ZipFile

zips = {}  # dictionary of entry names and sizes
path = Path(r'D:\Project\I20\top')
for file in path.glob('*.zip'):
    with ZipFile(file, 'r') as zipObj:
        for entry in zipObj.infolist():
            print(entry.filename, ' : ', entry.file_size, ' : ')
            # store filename and size in dictionary
            zips[entry.filename] = entry.file_size
If you want to find .zip files recursively in the sub-folders of the target folder, replace glob() with rglob().
If a zip file includes directory entries, add if not entry.filename.endswith('/'): before printing the entry and/or adding it to the dictionary, so that directory entries are ignored.
You don't open a directory using ZipFile; you can only open a zip file. You need to read the list of files in the zip file:
from zipfile import ZipFile

with ZipFile(zip_path, 'r') as f:  # zip_path is the path of a .zip file
    files = f.infolist()
filenames = [file.filename for file in files]
You will now have a list of strings representing filenames. You can now manipulate these strings as if they were filenames and figure out what's in what directory.
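As an illustration of working out "what's in what directory", the sketch below groups entries by the folder they sit in and records each file's uncompressed size. The function name `sizes_by_folder` and the nested-dictionary layout are assumptions, one of several reasonable ways to arrange the result.

```python
import posixpath
from collections import defaultdict
from zipfile import ZipFile


def sizes_by_folder(zip_path):
    """Map each folder inside the zip to {file name: uncompressed size}."""
    folders = defaultdict(dict)
    with ZipFile(zip_path, "r") as zf:
        for entry in zf.infolist():
            if entry.is_dir():
                continue  # skip explicit directory entries
            # Names inside a zip archive use "/" separators.
            folder, name = posixpath.split(entry.filename)
            folders[folder][name] = entry.file_size
    return dict(folders)
```

Files stored at the root of the archive end up under the empty-string key.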
I want to open an HTML file that is inside a zip file (both have the same name), and this is how I'm trying to open it:
old_file = input("DRAG:") #dir C:\Users\GG\PycharmProjects\pythonProject\f1dbef77-342b-4026-85d8-7f30fe691a63_f.zip
file_parts = old_file.split(".") #[C:\Users\GG\PycharmProjects\pythonProject\f1dbef77-342b-4026-85d8-7f30fe691a63_f] [zip]
first= file_parts[0]
direcs = first.split("\\")
file_itself = direcs[-1] # the file name that i need to use
last = file_parts[1]
file = open(f'{first}.zip\\{file_itself}.html', encoding="UTF-8").read()
You should first unzip the archive into a temporary folder and then open the file from there. When everything is done, you may delete the folder into which you extracted your data.
You can use Python's zipfile library and its ZipFile.extract() call to unzip your HTML file.
See ZipFile Docs
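A minimal sketch of that approach, assuming the HTML file sits at the root of the archive and shares the zip file's base name, as described in the question. The function name `read_html_from_zip` is made up for this example.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def read_html_from_zip(zip_path):
    """Return the text of <stem>.html stored inside <stem>.zip."""
    zip_path = Path(zip_path)
    # Path.stem drops only the final ".zip", so dots elsewhere in the
    # path do not matter (unlike splitting the whole string on ".").
    member = zip_path.stem + ".html"
    with tempfile.TemporaryDirectory() as tmp:
        with ZipFile(zip_path) as zf:
            extracted = zf.extract(member, path=tmp)
        # The temporary folder is deleted when this block ends.
        return Path(extracted).read_text(encoding="UTF-8")
```

If only the contents are needed, `ZipFile.read(member)` returns the raw bytes without writing anything to disk.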
I'm trying to extract all from a tar.gz file into the same Directory. The following code works to extract all, but the files are stored in the working directory instead of the path I entered as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory path where I need them? I've been trying at this for ages, I really can't see why this doesn't work.
You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
You can try using shutil.unpack_archive:
import shutil

def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
I have a series of tar files whose contents I wish to extract into a new directory. I want this directory's name to be an edited version of the original tar file name.
import tarfile
import glob
import os
for file in glob.glob("*.tar"):
    # Open file
    tar = tarfile.open(file, "r:")
    # Create new directory with name of tar file (minus .tar)
    new_dir = file[0:-4]
    os.makedirs(new_dir)
    tar.extractall()
    os.chdir(new_dir)
This works fine up until the tar.extractall() part. Is there a way to directly extract the tar file into the target directory or am I forced to extract all and then move the files across?
new_dir = "path"
tar.extractall(path=new_dir)
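Put together with the loop from the question, the `path` argument might be used like this. It is a sketch rather than a drop-in replacement: the function name `extract_each_to_own_dir` and its `pattern` parameter are assumptions, and `os.chdir()` is dropped because extracting with `path=` makes it unnecessary.

```python
import glob
import os
import tarfile


def extract_each_to_own_dir(pattern="*.tar"):
    """Extract every matching tar file into a folder named after it."""
    created = []
    for file in glob.glob(pattern):
        new_dir = os.path.splitext(file)[0]  # drop the ".tar" suffix
        os.makedirs(new_dir, exist_ok=True)
        with tarfile.open(file, "r:") as tar:
            tar.extractall(path=new_dir)
        created.append(new_dir)
    return created
```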
I want to extract only selected files from a sample.tar.gz file (not everything inside it needs to be extracted). The file structure looks like this.
sample.tar.gz is the base file, and it contains the following files:
hello_usb.tar.gz
hello_usb1.tar.gz
hello_usb2.tar.gz
world_usb1.tar.gz
world_usb2.tar.gz
world_usb3.tar.gz
I want to extract only the "hello_*" files. Extracting each one should create a folder with the same name as the file; for example, extracting hello_usb.tar.gz should create a folder called hello_usb.tar.gz/...
I have tried the following code, but no luck.
tar = tarfile.open(r"D:\Python34\Testing\sample.tar.gz", "r:gz")
only_names = tar.getnames()
for count in range(len(only_names)):
    print(only_names[count][:9])
    if only_names[count][:9] == "hello_usb":
        tar.extract(only_names[count], path=r"D:\Python34\Testing")
        #tar.extractfile(only_names[count])
Your help would be highly appreciated
Something like this may work.
import glob
import os
import shutil
import tarfile

with tarfile.open(r'D:\Python34\Testing\sample.tar.gz', 'r:gz') as tar:
    tar.extractall()
# assumes the archive unpacks into a top-level 'sample' folder
os.chdir('sample')
for arch in glob.glob('hello_*'):
    shutil.unpack_archive(arch, r'D:\Python34\Testing')
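An alternative sketch that avoids extracting the unwanted archives at all: it reads each matching inner archive directly from the outer one and unpacks it into a folder named after the inner file, as the question asks. The function name `extract_nested` and the `pattern` parameter are hypothetical, and the sketch assumes the inner files are themselves tar archives (compressed or not).

```python
import fnmatch
import os
import posixpath
import tarfile


def extract_nested(outer_path, dest, pattern="hello_*"):
    """Unpack inner archives matching pattern, each into its own folder."""
    created = []
    with tarfile.open(outer_path, "r:gz") as outer:
        for member in outer:
            base = posixpath.basename(member.name)
            if not (member.isfile() and fnmatch.fnmatch(base, pattern)):
                continue
            # Folder named like the inner archive, e.g. dest/hello_usb.tar.gz/
            target = os.path.join(dest, base)
            os.makedirs(target, exist_ok=True)
            # Open the inner archive straight from the outer one, without
            # writing the intermediate .tar.gz file to disk.
            inner_file = outer.extractfile(member)
            with tarfile.open(fileobj=inner_file, mode="r:*") as inner:
                inner.extractall(path=target)
            created.append(target)
    return created
```

For the layout in the question, a call such as `extract_nested(r"D:\Python34\Testing\sample.tar.gz", r"D:\Python34\Testing")` would be the intended usage.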