Extract a series of tar files into self-titled directories


I have a series of tar files whose contents I want to extract into new directories, one per archive. I want each directory's name to be an edited version of the original tar file name.
import tarfile
import glob
import os

for file in glob.glob("*.tar"):
    # Open file
    tar = tarfile.open(file, "r:")
    # Create new directory with name of tar file (minus .tar)
    new_dir = file[0:-4]
    os.makedirs(new_dir)
    tar.extractall()
    os.chdir(new_dir)
This works fine up until the tar.extractall() part. Is there a way to directly extract the tar file into the target directory or am I forced to extract all and then move the files across?

Pass the target directory to extractall() as its path argument:
new_dir = "path"
tar.extractall(path=new_dir)
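Putting the path argument back into the loop from the question, a minimal sketch might look like this. It assumes uncompressed .tar archives (matching the question's glob pattern), and the function name extract_each_to_own_dir is hypothetical. Because extractall() receives an explicit destination, the os.chdir() call is no longer needed.

```python
import tarfile
from pathlib import Path


def extract_each_to_own_dir(src_dir="."):
    """Hypothetical helper: extract every *.tar in src_dir into a folder named after it."""
    created = []
    for archive in sorted(Path(src_dir).glob("*.tar")):
        # "data.tar" -> "data" (with_suffix("") drops only the last suffix)
        target = archive.with_suffix("")
        # exist_ok avoids the FileExistsError os.makedirs raises on a rerun
        target.mkdir(exist_ok=True)
        # The with block closes the archive even if extraction fails
        with tarfile.open(archive, "r:") as tar:
            tar.extractall(path=target)
        created.append(target)
    return created
```

Using with_suffix("") keeps the name editing in one place; any other renaming rule could be applied to target before mkdir().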

Related

tarfile.open() does not extract into the right directory path

I'm trying to extract everything from a tar.gz file into the directory that contains it. The following code extracts all the members, but the files end up in the current working directory instead of the directory of the path I passed as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory where I need them? I've been at this for ages and really can't see why it doesn't work.

extractall() extracts into the current working directory unless it is given a path argument. You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
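If the destination is always "the folder the archive lives in", it can be derived from the archive path instead of being typed twice. This is a small sketch assuming that is the intended target; extract_next_to_archive is a hypothetical name. Mode "r" (equivalent to "r:*") lets tarfile detect gzip, bz2 or xz compression by itself.

```python
import os
import tarfile


def extract_next_to_archive(archive_path):
    """Hypothetical helper: extract archive_path into the directory that contains it."""
    # abspath so a bare file name still yields a usable directory
    dest = os.path.dirname(os.path.abspath(archive_path))
    # mode "r" reads with transparent compression detection
    with tarfile.open(archive_path, mode="r") as tar:
        tar.extractall(path=dest)
    return dest
```

Usage would be extract_next_to_archive(zip_rw_data), which places the contents beside RW_data.tar.gz regardless of the current working directory.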

You can also try shutil.unpack_archive, which picks the archive format from the file extension:
import shutil

def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
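To get the per-archive folders the original question asks for with unpack_archive, each archive can be given its own extract directory. This is a sketch under assumptions: the helper names strip_archive_ext and extract_all_to_own_dirs are hypothetical, and the extension list comes from shutil.get_unpack_formats(), so registered formats such as .tar.gz and .zip are recognized but anything else keeps its full name.

```python
import os
import shutil


def strip_archive_ext(filename):
    """Hypothetical helper: drop a registered archive extension, e.g. 'data.tar.gz' -> 'data'."""
    exts = [ext for _, fmt_exts, _ in shutil.get_unpack_formats() for ext in fmt_exts]
    # Check longer extensions first in case one is a suffix of another
    for ext in sorted(exts, key=len, reverse=True):
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return filename


def extract_all_to_own_dirs(archives, extract_path):
    """Hypothetical helper: unpack each archive into extract_path/<name without extension>."""
    for filename in archives:
        name = strip_archive_ext(os.path.basename(filename))
        target = os.path.join(extract_path, name)
        os.makedirs(target, exist_ok=True)
        # unpack_archive chooses tar/zip/... from the extension
        shutil.unpack_archive(filename, target)
```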

Rename the folder of an extracted file

I have some .tar.gz files and I can extract them with:
if fname.endswith("tar.gz"):
    tar = tarfile.open(fname, "r:gz")
    tar.extractall()
    tar.close()
However, I want to write information about the extracted files to a .txt file, and to do that I need the names of the folders inside the .tar.gz files, which I don't know in advance. Is it possible to find out (or rename) those folders without knowing their names beforehand, and then extract them? Thank you.

Each entry in the tarfile has a TarInfo header. You can get that info in several ways; the easiest is simply to iterate over the archive. The header includes the path name, which you can manipulate with the functions in the posixpath module (tar member names always use forward slashes). For example, given a tgz file I happen to have on hand:
>>> tf = tarfile.open("Downloads/dbutil-0.5.0.tar.gz", "r:gz")
>>> for info in tf:
...     print(info.name, "DIR" if info.isdir() else "FILE")
...
dbutil-0.5.0 DIR
dbutil-0.5.0/setup.py FILE
dbutil-0.5.0/dbutil DIR
dbutil-0.5.0/dbutil/connection.py FILE
dbutil-0.5.0/dbutil/__init__.py FILE
dbutil-0.5.0/dbutil/row.py FILE
dbutil-0.5.0/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info DIR
dbutil-0.5.0/dbutil.egg-info/dependency_links.txt FILE
dbutil-0.5.0/dbutil.egg-info/PKG-INFO FILE
dbutil-0.5.0/dbutil.egg-info/SOURCES.txt FILE
dbutil-0.5.0/dbutil.egg-info/top_level.txt FILE
dbutil-0.5.0/setup.cfg FILE
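Building on that iteration, here is a sketch covering both halves of the question: listing the top-level folders, and extracting with the top-level folder renamed by rewriting each TarInfo.name before extraction. Both function names are hypothetical. The renaming helper assumes every member sits under a single top-level folder (as in the dbutil-0.5.0 listing above) and that the archive has no hard links, whose link targets it does not rewrite.

```python
import posixpath
import tarfile


def top_level_names(archive_path):
    """Hypothetical helper: first path component of every member, e.g. {'dbutil-0.5.0'}."""
    names = set()
    with tarfile.open(archive_path, "r:*") as tar:
        for info in tar:
            # Member names use "/", so normalise with posixpath, not os.path
            first = posixpath.normpath(info.name).split("/", 1)[0]
            if first not in ("", "."):
                names.add(first)
    return names


def extract_with_new_root(archive_path, dest, new_root):
    """Hypothetical helper: extract, replacing each member's top-level folder with new_root."""
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for info in members:
            rest = posixpath.normpath(info.name).split("/", 1)[1:]
            # The folder entry itself becomes new_root; files keep their subpath
            info.name = posixpath.join(new_root, *rest)
        tar.extractall(path=dest, members=members)
```

With the archive above, top_level_names would report dbutil-0.5.0, and that name could then be written to the .txt file or replaced via extract_with_new_root.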

I would suggest comparing the list of files in the directory before and after extracting the archive. Any additional files and folders came from the tar file.
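The before/after comparison can be sketched with two os.listdir() snapshots turned into sets. The function name extract_and_report_new is hypothetical, and the approach assumes nothing else writes to the directory during extraction. Entries that already existed and were overwritten by the archive will not show up as new.

```python
import os
import tarfile


def extract_and_report_new(archive_path, dest="."):
    """Hypothetical helper: extract into dest and return top-level entries that are new."""
    before = set(os.listdir(dest))
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(path=dest)
    after = set(os.listdir(dest))
    # Names that existed beforehand (even if overwritten) are not reported
    return after - before
```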

Moving files from an unknown folder to another

I am extracting .tar.gz files that contain folders (holding files with many different extensions). I want to move all the .txt files from those folders into another folder, but I don't know the folders' names.
.txt files location ---> my_path/extracted/?unknown_name_folder?/file.txt
I want to do ---> my_path/extracted/file.txt
My code:
import os
import tarfile

os.mkdir('extracted')
t = tarfile.open('xxx.tar.gz', 'r')
for member in t.getmembers():
    if ".txt" in member.name:
        t.extract(member, 'extracted')
###

I would try extracting the tar file first:
import tarfile
tar = tarfile.open("xxx.tar.gz")
tar.extractall()
tar.close()
and then use os.walk() to collect the .txt files:
import os

txt_files = []
for root, dirs, files in os.walk('xxx'):
    # Keep the full path; the folder names are not known in advance
    txt_files += [os.path.join(root, name) for name in files if name.endswith('.txt')]
Or use the glob module to gather the .txt files, as suggested by @alper in the comments:
import glob
txt_files = glob.glob('./**/*.txt', recursive=True)
This is untested, but it should get you pretty close. Once you have the list of text files, move them:
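import os
import shutil

new_path = 'extracted'
os.makedirs(new_path, exist_ok=True)
for path in txt_files:
    # Keep only the file name, dropping the unknown folder
    shutil.move(path, os.path.join(new_path, os.path.basename(path)))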
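As an alternative to extracting everything and moving files afterwards, the question's own getmembers() loop can drop the unknown folder at extraction time by rewriting each member's name to its base name. This is a different technique from the os.walk approach above, sketched under the assumption that the .txt base names are unique (a later file with the same name would overwrite an earlier one); extract_txt_flat is a hypothetical name.

```python
import os
import posixpath
import tarfile


def extract_txt_flat(archive_path, dest="extracted"):
    """Hypothetical helper: extract only regular .txt members, directly into dest."""
    os.makedirs(dest, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".txt"):
                # Drop the unknown folder(s): "folder/file.txt" -> "file.txt"
                member.name = posixpath.basename(member.name)
                tar.extract(member, dest)
                extracted.append(member.name)
    return extracted
```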

Why won't it expand both tar.gz files?

I have two tar.gz files, 2014_SRS.tar.gz and 2013_SRS.tar.gz. Each of the files contains a folder called SRS, which is full of text files. I downloaded these from an FTP server. I want to unzip them automatically in Python. This is my code:
import re
import ftplib
import os
import time
import tarfile
import sys
print('1')
tar = tarfile.open('2014_SRS.tar.gz')
tar.extractall()
tar.close()
print('2')
tar = tarfile.open('2013_SRS.tar.gz')
tar.extractall()
tar.close()
print('3')
This code only opens the second file. How do I fix it to open both files?
Also, I tried using a for loop to run through the whole directory. The code is shown below.
for i in os.listdir(os.getcwd()):
    if i.endswith(".tar.gz"):
        tar = tarfile.open(i, "r:gz")
        tar.extractall()
        tar.close()
However, this gave me an EOFError. In addition, before I ran this bit of code, I was able to unzip both files manually; after running it, and after the code gave me the error, I can no longer unzip the 2014_SRS file manually. How do I fix this?

While this may not answer your specific question as to why both files could not be unzipped with your code, the following is one way to unzip a list of tar.gz files.
import tarfile, glob
srcDir = "/your/src/directory"
dstDir = "/your/dst/directory"
for f in glob.glob(srcDir + "/*.gz"):
    t = tarfile.open(f, "r:gz")
    for member in t.getmembers():
        t.extract(member, dstDir)
    t.close()
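The EOFError in the question is typically what gzip raises when a compressed file ends early (for example a truncated download); that is an assumption about the cause here, not something the question confirms. A sketch that keeps going past a bad archive and reports it, instead of aborting the whole loop, could look like this; extract_all_tgz is a hypothetical name.

```python
import glob
import os
import tarfile


def extract_all_tgz(src_dir, dst_dir):
    """Hypothetical helper: extract every *.tar.gz in src_dir; return (path, error) failures."""
    failed = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.tar.gz"))):
        try:
            # The with block closes each archive before moving on to the next
            with tarfile.open(path, "r:gz") as tar:
                tar.extractall(path=dst_dir)
        except (tarfile.TarError, EOFError) as exc:
            # A truncated or non-gzip file should not stop the remaining archives
            failed.append((path, exc))
    return failed
```

Printing the returned list shows which archive is damaged, which is a useful first step before re-downloading it.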

How to extract selected files from a sample.tar.gz file using a Python script

I want to extract only some selected files from a sample.tar.gz file, not all of the files inside it. sample.tar.gz is the base archive, and it contains the following files:
hello_usb.tar.gz
hello_usb1.tar.gz
hello_usb2.tar.gz
world_usb1.tar.gz
world_usb2.tar.gz
world_usb3.tar.gz
I want to extract only the "hello_*" files. While extracting, each one should get a folder with the same name as the file; for example, extracting hello_usb.tar.gz should create a folder called hello_usb.tar.gz/...
I have tried the following code, but with no luck:
import tarfile

tar = tarfile.open(r"D:\Python34\Testing\sample.tar.gz", "r:gz")
only_names = tar.getnames()
for count in range(len(only_names)):
    print(only_names[count][:9])
    if only_names[count][:9] == "hello_usb":
        tar.extract(only_names[count], path=r"D:\Python34\Testing")
        #tar.extractfile(only_names[count])
Your help would be highly appreciated

Something like this may work.
import glob
import os
import shutil
import tarfile

with tarfile.open(r'D:\Python34\Testing\sample.tar.gz', 'r:gz') as tar:
    tar.extractall()
os.chdir('sample')
for arch in glob.glob('hello_*'):
    shutil.unpack_archive(arch, r'D:\Python34\Testing')
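Extracting the whole outer archive first is not strictly necessary: TarFile.extractfile() returns a file object for a member, and tarfile.open(fileobj=...) can read a nested archive from it. The sketch below assumes the inner archives are gzip-compressed tar files whose base names match hello_*.tar.gz, and, following the question's example, creates one folder per inner archive named after its file name. extract_nested_matching and its pattern parameter are hypothetical.

```python
import fnmatch
import os
import posixpath
import tarfile


def extract_nested_matching(outer_path, dest, pattern="hello_*.tar.gz"):
    """Hypothetical helper: unpack only matching inner archives, each into its own folder."""
    done = []
    with tarfile.open(outer_path, "r:gz") as outer:
        for member in outer.getmembers():
            base = posixpath.basename(member.name)
            if not (member.isfile() and fnmatch.fnmatch(base, pattern)):
                continue
            # Folder named after the inner archive, as the question describes
            target = os.path.join(dest, base)
            os.makedirs(target, exist_ok=True)
            inner_file = outer.extractfile(member)
            # Read the nested archive straight from the outer one; nothing else is unpacked
            with tarfile.open(fileobj=inner_file, mode="r:gz") as inner:
                inner.extractall(path=target)
            done.append(base)
    return done
```

The world_* archives are never written to disk, which matters when the outer archive is large.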
