tarfile.open() does not extract into the right directory path - python

I'm trying to extract everything from a tar.gz file into the directory that contains it. The following code extracts everything, but the files end up in the working directory instead of the directory from the path I entered as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory path where I need them? I've been trying at this for ages, I really can't see why this doesn't work.

You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
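If the files should always land next to the archive, the target directory can be derived from the archive path instead of being typed a second time. A minimal sketch, assuming a hypothetical helper name extract_next_to_archive; calling extractall() without a path argument falls back to the current working directory, which matches what the question observed:

```python
import tarfile
from pathlib import Path


def extract_next_to_archive(archive_path):
    """Extract a tar archive into the directory that contains it.

    Hypothetical helper for illustration; returns the target directory.
    """
    archive = Path(archive_path)
    target_dir = archive.parent
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(archive, mode="r") as tar:
        tar.extractall(path=target_dir)
    return target_dir
```

With the path from the question, this would be called as extract_next_to_archive(r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz").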

You can try using shutil.unpack_archive:
import shutil

def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)

Related

How to modify this script so that all of my files are not deleted when trying to delete files that do not have XML files with them?

I am trying to delete all .JPG files that do not have .xml files with the same name attached to them. However, when I run this script, all of my files are deleted in my directory and not just the desired images. How can I change this script so that I can just delete the images without corresponding .xml files?
Note: The only files I have in the directory are .JPG and .XML
import os
from tqdm import tqdm
path = 'C:\\users\\my_username\\path_to_directory_with_xml_and_jpg_images'
files = os.listdir(path)
for file in tqdm(files):
    filename, filetype = file.split('.')
    if filetype == 'xml':
        continue
    imgfile = os.path.join(path, file)
    xmlfile = os.path.join(path, filename + '.xml')
    if not os.path.exists(xmlfile):
        print('{} deleted.'.format(imgfile))
        os.remove(imgfile)
It's hard to tell why your code doesn't work as we don't know the exact contents of the directory. But a simpler way to do what you want could be to use the amazing pathlib library (Python >= 3.4). The method Path.with_suffix() will make the task quite easy, together with Path.glob():
from pathlib import Path
path = Path('C:\\users\\my_username\\path_to_directory_with_xml_and_jpg_images')
for imgfile in path.glob("*.jpg"):
    xmlfile = imgfile.with_suffix(".xml")
    if not xmlfile.exists():
        imgfile.unlink()
        print(imgfile, 'deleted.')
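Because the script deletes files, it may be worth listing the candidates before removing anything. A cautious sketch under stated assumptions: the function name find_orphan_images and the dry_run flag are hypothetical, and file suffixes are compared case-insensitively so that both .JPG and .jpg (and .XML and .xml) are recognized:

```python
from pathlib import Path


def find_orphan_images(directory, dry_run=True):
    """Return .jpg files that have no .xml file with the same stem.

    Hypothetical helper; the orphans are deleted only when dry_run is False.
    """
    directory = Path(directory)
    entries = [p for p in directory.iterdir() if p.is_file()]
    # Stems of all XML files; the suffix check ignores upper/lower case.
    xml_stems = {p.stem for p in entries if p.suffix.lower() == ".xml"}
    orphans = sorted(
        p for p in entries
        if p.suffix.lower() == ".jpg" and p.stem not in xml_stems
    )
    if not dry_run:
        for img in orphans:
            img.unlink()
    return orphans
```

Calling find_orphan_images(path) first and inspecting the returned list, then calling it again with dry_run=False, keeps the destructive step explicit.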

How can I extract all .zip files in a folder without retaining the directory structure using Python?

Here is my code. I don't know how to loop over every .zip file in a folder; please help. I want the contents of all 5 zip files to be extracted into one folder, without the directory names stored in the archives.
import os
import shutil
import zipfile
my_dir = r"C:\\Users\\Guest\\Desktop\\OJT\\scanner\\samples_raw"
my_zip = r"C:\\Users\\Guest\\Desktop\\OJT\\samples\\001-100.zip"
with zipfile.ZipFile(my_zip) as zip_file:
    zip_file.setpassword(b"virus")
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        # open() replaces the Python 2-only file() builtin
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
This is a repeated question; please refer to the link below.
How to extract zip file recursively in Python
What you are looking for is glob, which can be used like this:
#<snip>
import glob
# assuming all your zip files are in the directory below.
for my_zip in glob.glob(r"C:\\Users\\Guest\\Desktop\\OJT\\samples\\*.zip"):
    with zipfile.ZipFile(my_zip) as zip_file:
        zip_file.setpassword(b"virus")
        for member in zip_file.namelist():
            #<snip> rest of your code here.
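Putting the glob loop and the flattening code from the question together might look like the following. This is a sketch, not a definitive implementation: the function name extract_zips_flat and its parameters are hypothetical, the password is optional, and files that share a base name in different archives overwrite each other.

```python
import glob
import os
import shutil
import zipfile


def extract_zips_flat(zip_dir, out_dir, password=None):
    """Extract every file from every .zip in zip_dir directly into out_dir.

    Hypothetical helper; directory names stored inside the archives are
    dropped, so files with the same base name overwrite each other.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as zip_file:
            if password is not None:
                zip_file.setpassword(password)
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                if not filename:  # directory entry, nothing to copy
                    continue
                target_path = os.path.join(out_dir, filename)
                with zip_file.open(member) as source, \
                        open(target_path, "wb") as target:
                    shutil.copyfileobj(source, target)
                extracted.append(target_path)
    return extracted
```

For the archives in the question, the call would pass the samples folder, the samples_raw folder, and password=b"virus".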

Extract a series of tar files into self-titled directories

I have a series of tar files whose contents I wish to extract into new directories. I want each directory name to be an edited version of the original tar file name.
import tarfile
import glob
import os
for file in glob.glob("*.tar"):
    # Open file
    tar = tarfile.open(file, "r:")
    # Create new directory with name of tar file (minus .tar)
    new_dir = file[0:-4]
    os.makedirs(new_dir)
    tar.extractall()
    os.chdir(new_dir)
This works fine up until the tar.extractall() part. Is there a way to directly extract the tar file into the target directory or am I forced to extract all and then move the files across?
new_dir = "path"
tar.extractall(path=new_dir)
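Applied to the loop from the question, passing path= to extractall() removes the need for os.chdir() or for moving files afterwards. A minimal sketch, assuming a hypothetical function name extract_each_tar and that each target directory is simply the archive name without its .tar suffix:

```python
import glob
import os
import tarfile


def extract_each_tar(src_dir):
    """Extract every .tar in src_dir into a directory named after it.

    Hypothetical helper; returns the list of directories that were filled.
    """
    created = []
    for tar_path in sorted(glob.glob(os.path.join(src_dir, "*.tar"))):
        # ".../data.tar" -> ".../data"
        new_dir = os.path.splitext(tar_path)[0]
        os.makedirs(new_dir, exist_ok=True)
        with tarfile.open(tar_path, "r:") as tar:
            # Extract straight into the target directory.
            tar.extractall(path=new_dir)
        created.append(new_dir)
    return created
```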

Why won't it expand both tar.gz files?

I have two tar.gz files, 2014_SRS.tar.gz and 2013_SRS.tar.gz. Each of the files contains a folder called SRS, which is full of text files. I downloaded these from an ftp server. I want to unzip them automatically in Python. This is my code:
import re
import ftplib
import os
import time
import tarfile
import sys
print('1')
tar = tarfile.open('2014_SRS.tar.gz')
tar.extractall()
tar.close()
print('2')
tar = tarfile.open('2013_SRS.tar.gz')
tar.extractall()
tar.close()
print('3')
This code only opens the second file. How do I fix it to open both files?
Also, I tried using a for loop to run through the whole directory. The code is shown below.
for i in os.listdir(os.getcwd()):
    if i.endswith(".tar.gz"):
        tar = tarfile.open(i, "r:gz")
        tar.extractall()
        tar.close()
However, this gave me an EOFError. In addition, before I ran this bit of code, I was able to unzip both files manually. However, after I ran it and it gave me the error, I can no longer unzip the 2014_SRS file manually. How do I fix this?
While this may not answer your specific question as to why both files could not be unzipped with your code, the following is one way to unzip a list of tar.gz files.
import tarfile, glob
srcDir = "/your/src/directory"
dstDir = "/your/dst/directory"
for f in glob.glob(srcDir + "/*.gz"):
    t = tarfile.open(f, "r:gz")
    for member in t.getmembers():
        t.extract(member, dstDir)
    t.close()
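Two details from the question may be worth handling explicitly. Both archives contain a folder called SRS, so extracting them into the same directory merges the two folders, which can look as if only one archive was opened. An EOFError may also indicate a truncated or incomplete archive. The sketch below is one possible way to deal with both, under stated assumptions: the function name extract_all_tar_gz is hypothetical, each archive gets its own subdirectory named after the file, and unreadable archives are reported instead of stopping the loop.

```python
import glob
import os
import tarfile


def extract_all_tar_gz(src_dir, dst_dir):
    """Extract every .tar.gz in src_dir into its own folder under dst_dir.

    Hypothetical helper; returns (extracted, failed) lists of archive paths.
    """
    extracted, failed = [], []
    for archive in sorted(glob.glob(os.path.join(src_dir, "*.tar.gz"))):
        # "2014_SRS.tar.gz" -> "<dst_dir>/2014_SRS"
        name = os.path.basename(archive)[:-len(".tar.gz")]
        target = os.path.join(dst_dir, name)
        os.makedirs(target, exist_ok=True)
        try:
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(path=target)
        except (tarfile.TarError, EOFError, OSError) as exc:
            # A truncated or corrupt download typically ends up here.
            print("could not extract {}: {}".format(archive, exc))
            failed.append(archive)
        else:
            extracted.append(archive)
    return extracted, failed
```

Any archive that ends up in the failed list is probably best downloaded again before retrying.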

How to extract selected files from a sample.tar.gz file using a Python script

I want to extract some selected files from a sample.tar.gz file (not all of the files inside sample.tar.gz need to be extracted). The file structure looks like this:
sample.tar.gz is the base file, and it contains the following files:
hello_usb.tar.gz
hello_usb1.tar.gz
hello_usb2.tar.gz
world_usb1.tar.gz
world_usb2.tar.gz
world_usb3.tar.gz
I want to extract only the "hello_*" files. While extracting, it should create a folder with the same name as the file; for example, extracting hello_usb.tar.gz should create a folder called hello_usb.tar.gz/...
I have tried the following code, but no luck.
import tarfile
tar = tarfile.open(r"D:\Python34\Testing\sample.tar.gz", "r:gz")
only_names = tar.getnames()
for count in range(len(only_names)):
    print(only_names[count][:9])
    if only_names[count][:9] == "hello_usb":
        tar.extract(only_names[count], path=r"D:\Python34\Testing")
        #tar.extractfile(only_names[count])
Your help would be highly appreciated
Something like this may work.
import shutil
import glob
import os
import tarfile
with tarfile.open(r'D:\Python34\Testing\sample.tar.gz', 'r:gz') as tar:
    tar.extractall()
os.chdir('sample')
for arch in glob.glob('hello_*'):
    shutil.unpack_archive(arch, r'D:\Python34\Testing')
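An alternative that avoids extracting everything is to pass only the wanted members to extractall(). The following is a sketch under stated assumptions: the function name extract_matching, the prefix parameter, and the temporary _staging folder are hypothetical, and each matching inner archive is unpacked into a folder named after the archive file, as the question requests.

```python
import os
import shutil
import tarfile


def extract_matching(outer_archive, dest_dir, prefix="hello_"):
    """Unpack only the inner archives whose base name starts with prefix.

    Hypothetical helper: each matching inner archive ends up in a folder
    named after the archive file, e.g. dest_dir/hello_usb.tar.gz/.
    """
    staging = os.path.join(dest_dir, "_staging")
    os.makedirs(staging, exist_ok=True)
    with tarfile.open(outer_archive, "r:gz") as tar:
        wanted = [
            m for m in tar.getmembers()
            if m.isfile() and os.path.basename(m.name).startswith(prefix)
        ]
        # extractall() accepts a subset of members; the rest are skipped.
        tar.extractall(path=staging, members=wanted)
    unpacked = []
    for member in wanted:
        inner = os.path.join(staging, member.name)
        target = os.path.join(dest_dir, os.path.basename(member.name))
        os.makedirs(target, exist_ok=True)
        shutil.unpack_archive(inner, target)
        unpacked.append(target)
    # The inner archives themselves are no longer needed.
    shutil.rmtree(staging)
    return unpacked
```

The staging folder exists only because the final folder and the inner archive would otherwise share the same name in the same directory.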
