I have a series of tar files whose contents I want to extract into new directories. Each directory should be named after an edited version of the original tar file name.
import tarfile
import glob
import os
for file in glob.glob("*.tar"):
    # Open file
    tar = tarfile.open(file, "r:")
    # Create new directory with name of tar file (minus .tar)
    new_dir = file[0:-4]
    os.makedirs(new_dir)
    tar.extractall()
    os.chdir(new_dir)
This works fine up until the tar.extractall() part. Is there a way to directly extract the tar file into the target directory or am I forced to extract all and then move the files across?
new_dir = "path"
tar.extractall(path=new_dir)
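To put the two pieces together, here is a minimal sketch of the whole loop, assuming each archive should land in a directory named after the file minus its .tar extension. The function name extract_each_to_own_dir and its pattern parameter are hypothetical, not something from the question.

```python
import glob
import os
import tarfile


def extract_each_to_own_dir(pattern="*.tar"):
    """Extract every archive matching pattern into a directory named after it.

    Hypothetical helper: the name and the pattern parameter are illustrative.
    """
    created = []
    for archive in sorted(glob.glob(pattern)):
        # Strip the ".tar" extension to get the target directory name.
        target_dir = os.path.splitext(archive)[0]
        os.makedirs(target_dir, exist_ok=True)
        # The context manager closes the archive even if extraction fails.
        with tarfile.open(archive, "r:") as tar:
            # path= sends the contents straight to target_dir, so no chdir
            # and no moving of files afterwards is needed.
            tar.extractall(path=target_dir)
        created.append(target_dir)
    return created
```

On Python 3.12 and later, extractall also accepts a filter argument that restricts what an untrusted archive is allowed to do during extraction.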
Related
I'm trying to extract everything from a tar.gz file into the same directory as the archive. The following code extracts everything, but the files end up in the working directory instead of the path I entered as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory path where I need them? I've been trying at this for ages, I really can't see why this doesn't work.
You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
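If you would rather not type the directory twice, a small sketch like the following derives the target from the archive path itself. The helper name extract_next_to_archive is made up for illustration.

```python
import os
import tarfile


def extract_next_to_archive(archive_path):
    """Extract an archive into the directory that contains it.

    Hypothetical helper: returns the directory that was used as the target.
    """
    # Resolve the folder holding the archive, independent of the
    # current working directory.
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    # mode="r" lets tarfile detect the compression (gzip here) by itself.
    with tarfile.open(archive_path, mode="r") as tar:
        tar.extractall(path=target_dir)
    return target_dir
```

Called as extract_next_to_archive(zip_rw_data), this should place the files beside RW_data.tar.gz no matter where the script is started from.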
You can try using shutil.unpack_archive:
import shutil
def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
I have some .tar.gz files and I can extract them with:
if (fname.endswith("tar.gz")):
    tar = tarfile.open(fname, "r:gz")
    tar.extractall()
    tar.close()
I want to write information about the extracted files to a .txt file, but I don't know the names of the folders inside the .tar.gz files. Is it possible to find out (or rename) the folders during extraction if you don't know their names in advance? Thank you.
Each entry in the tarfile has a TarInfo header. You can get that info several ways; the easiest is just by iteration. That includes the path name, which you can manipulate with the functions in the posixpath module (member names use forward slashes). For example, given a tgz file I happen to have on hand:
>>> tf = tarfile.open("Downloads/dbutil-0.5.0.tar.gz", "r:gz")
>>> for info in tf:
...     print(info.name, "DIR" if info.isdir() else "FILE")
...
dbutil-0.5.0 DIR
dbutil-0.5.0/setup.py FILE
dbutil-0.5.0/dbutil DIR
dbutil-0.5.0/dbutil/connection.py FILE
dbutil-0.5.0/dbutil/__init__.py FILE
dbutil-0.5.0/dbutil/row.py FILE
dbutil-0.5.0/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info DIR
dbutil-0.5.0/dbutil.egg-info/dependency_links.txt FILE
dbutil-0.5.0/dbutil.egg-info/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info/SOURCES.txt FILE
dbutil-0.5.0/dbutil.egg-info/top_level.txt FILE
dbutil-0.5.0/setup.cfg FILE
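Building on that iteration, here is a rough sketch that collects the top-level folder (or file) names without extracting anything. The function name top_level_names is hypothetical.

```python
import posixpath
import tarfile


def top_level_names(archive_path):
    """Return the sorted first path components found in the archive.

    Hypothetical helper: nothing is extracted, only headers are read.
    """
    roots = set()
    # "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, "r:*") as tf:
        for info in tf:
            # Normalise away prefixes such as "./" before splitting.
            name = posixpath.normpath(info.name)
            first = name.split("/", 1)[0]
            # Skip the archive root itself and absolute-path leftovers.
            if first not in ("", "."):
                roots.add(first)
    return sorted(roots)
```

The returned names can then be written to the .txt file, or used to locate the folder after a normal extractall().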
I would suggest comparing the list of files in the directory before and after extracting the archive. Any additional files and folders will be those that came from the tar file.
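A minimal sketch of that before/after comparison, assuming nothing else writes to the directory while the extraction runs. The helper name extract_and_report_new_entries is made up for illustration.

```python
import os
import tarfile


def extract_and_report_new_entries(archive_path, target_dir):
    """Extract into target_dir and return the entries that were not there before.

    Hypothetical helper: only the top level of target_dir is compared.
    """
    os.makedirs(target_dir, exist_ok=True)
    # Snapshot of the directory before extraction.
    before = set(os.listdir(target_dir))
    with tarfile.open(archive_path) as tar:
        tar.extractall(path=target_dir)
    # Anything present now but not before came from the archive.
    after = set(os.listdir(target_dir))
    return sorted(after - before)
```

Note that an entry the archive overwrites (same name as something already present) will not show up in the result.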
I am extracting .tar.gz files that contain folders (holding files with many different extensions). I want to move all the .txt files from those folders into another folder, but I don't know the folders' names.
.txt files location ---> my_path/extracted/?unknown_name_folder?/file.txt
I want to do ---> my_path/extracted/file.txt
My code:
os.mkdir('extracted')
t = tarfile.open('xxx.tar.gz', 'r')
for member in t.getmembers():
    if ".txt" in member.name:
        t.extract(member, 'extracted')
        ###
I would try extracting the tar file first:
import tarfile
tar = tarfile.open("xxx.tar.gz")
tar.extractall()
tar.close()
and then use the os.walk() method to collect the full paths of the .txt files:
import os
txt_files = []
for root, dirs, files in os.walk('.\\xxx\\'):
    txt_files += [os.path.join(root, name) for name in files if name.endswith('.txt')]
Or use the glob package to gather the txt files, as suggested by @alper in the comments:
txt_files = glob.glob('./**/*.txt', recursive=True)
This is untested, but should get you pretty close.
Then move them once you have the list of text files:
new_path = ".\\extracted\\"
for path in txt_files:
    name = os.path.basename(path)
    os.rename(path, os.path.join(new_path, name))
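As an alternative to extracting everything and moving files afterwards, here is a sketch that writes only the .txt members directly into one flat folder. The function name extract_txt_flat is hypothetical, and it assumes file names are unique across the archive's folders (a later file with the same name would overwrite an earlier one).

```python
import os
import posixpath
import shutil
import tarfile


def extract_txt_flat(archive_path, target_dir):
    """Copy every .txt member into target_dir, dropping its folder prefix.

    Hypothetical helper: returns the list of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # Only regular files whose name ends in .txt are of interest.
            if not (member.isfile() and member.name.endswith(".txt")):
                continue
            # Keep only the file name; the unknown folder part is discarded.
            dest = os.path.join(target_dir, posixpath.basename(member.name))
            # extractfile() gives a readable stream of the member's bytes.
            with tar.extractfile(member) as src, open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(dest)
    return written
```

With the names from the question, extract_txt_flat('xxx.tar.gz', 'extracted') would be the call to try.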
I have two tar.gz files, 2014_SRS.tar.gz and 2013_SRS.tar.gz. Each of the files contains a folder called SRS, which is full of text files. I downloaded these from an ftp server. I want to unzip them automatically in Python. This is my code:
import re
import ftplib
import os
import time
import tarfile
import sys
print('1')
tar = tarfile.open('2014_SRS.tar.gz')
tar.extractall()
tar.close()
print('2')
tar = tarfile.open('2013_SRS.tar.gz')
tar.extractall()
tar.close()
print('3')
This code only opens the second file. How do I fix it to open both files?
Also, I tried using a for loop to run through the whole directory. The code is shown below.
for i in os.listdir(os.getcwd()):
    if i.endswith(".tar.gz"):
        tar = tarfile.open(i, "r:gz")
        tar.extractall()
        tar.close()
However, this gave me an EOFError. In addition, before I ran this bit of code, I was able to unzip both files manually. After running it and getting the error, I can no longer unzip the 2014_SRS file manually. How do I fix this?
While this may not answer your specific question as to why both files could not be unzipped with your code, the following is one way to unzip a list of tar.gz files.
import tarfile, glob
srcDir = "/your/src/directory"
dstDir = "/your/dst/directory"
for f in glob.glob(srcDir + "/*.gz"):
    t = tarfile.open(f,"r:gz")
    for member in t.getmembers():
        t.extract(member,dstDir)
    t.close()
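Since the question mentions an EOFError, here is a sketch of the same loop that keeps going when one archive is truncated or corrupt and reports which files failed. The function name extract_all_gz is hypothetical, and the set of exceptions caught is an assumption about what a damaged .tar.gz can raise.

```python
import glob
import os
import tarfile
import zlib


def extract_all_gz(src_dir, dst_dir):
    """Extract every *.tar.gz in src_dir into dst_dir, collecting failures.

    Hypothetical helper: returns a list of (path, error description) pairs.
    """
    os.makedirs(dst_dir, exist_ok=True)
    failed = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.tar.gz"))):
        try:
            # The context manager closes the file even when extraction fails.
            with tarfile.open(path, "r:gz") as tar:
                tar.extractall(path=dst_dir)
        except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
            # A truncated download typically ends up here; record it and move on.
            failed.append((path, repr(exc)))
    return failed
```

An archive that shows up in the returned list is probably incomplete on disk, in which case downloading it again (in binary mode, if it came over FTP) is the thing to check first.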
I want to extract only selected files from a sample.tar.gz file (I do not need to extract everything inside it). The file structure looks like this:
sample.tar.gz is the base file; it contains the following files:
hello_usb.tar.gz
hello_usb1.tar.gz
hello_usb2.tar.gz
world_usb1.tar.gz
world_usb2.tar.gz
world_usb3.tar.gz
I want to extract only the "hello_*" files. While extracting, it should create a folder with the same name as the file; for example, extracting hello_usb.tar.gz should create a folder called hello_usb.tar.gz/...
I have tried the following code, but with no luck.
tar = tarfile.open(r"D:\Python34\Testing\sample.tar.gz", "r:gz")
only_names = tar.getnames()
for count in range(len(only_names)):
    print (only_names[count][:9])
    if only_names[count][:9] == "hello_usb":
        tar.extract(only_names[count], path = r"D:\Python34\Testing")
        #tar.extractfile(only_names[count])
Your help would be highly appreciated
Something like this may work.
import tarfile
import shutil
import glob
import os
with tarfile.open(r'D:\Python34\Testing\sample.tar.gz', 'r:gz') as tar:
    tar.extractall()
os.chdir('sample')
for arch in glob.glob('hello_*'):
    shutil.unpack_archive(arch, r'D:\Python34\Testing')
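A variation that avoids extracting the world_* archives at all: the sketch below reads each matching inner archive straight out of the outer one and unpacks it into a folder named after it. The function name extract_matching_nested and its parameters are hypothetical, and it assumes the inner files are themselves gzip-compressed tar archives.

```python
import fnmatch
import os
import posixpath
import tarfile


def extract_matching_nested(outer_path, dest_root, pattern="hello_*"):
    """Unpack inner archives whose file name matches pattern.

    Hypothetical helper: each inner archive goes into dest_root/<its file name>/.
    """
    extracted = []
    with tarfile.open(outer_path, "r:gz") as outer:
        for member in outer.getmembers():
            base = posixpath.basename(member.name)
            if not (member.isfile() and fnmatch.fnmatchcase(base, pattern)):
                continue
            # Folder named exactly like the inner archive, as the question asks.
            target = os.path.join(dest_root, base)
            os.makedirs(target, exist_ok=True)
            # Open the inner archive from the outer one without writing it to disk.
            with outer.extractfile(member) as inner_file:
                with tarfile.open(fileobj=inner_file, mode="r:gz") as inner:
                    inner.extractall(path=target)
            extracted.append(target)
    return extracted
```

For the layout in the question, the call would look like extract_matching_nested(r"D:\Python34\Testing\sample.tar.gz", r"D:\Python34\Testing").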