Why did the extract function stop extracting? - python

Can someone explain and help me resolve why my function stopped extracting .tgz files when I added a counter to create folders with different names to keep the extracted folder from overwriting the previous one when I extracted another .tgz file in the same directory? What am I doing wrong? Thanks! Below are the two functions ... the first function extracts files properly; the second function extracts a numbered folder and quits.
Works:
def extract(tar_url, extract_path='.'):
    print tar_url
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])
Does not work:
global counter
counter=1
def extract(tar_url, extract_path='.'):
    global counter
    print tar_url
    tar = tarfile.open(tar_url, 'wb')# changed from r to wb 6/12
    for item in tar:
        tar.extract(item, extract_path+"_%d"%counter)
        counter+=1
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])
Here is how I call it in main (I'm using easygui):
direct = diropenbox(msg="Choose path to place extracted files!", title='SQA Extractor', default='c:\\Extracted')
msg = "Are you sure you want to extract?"
title = "Confirm"
os.chdir(direct)
try:
    for root, dirname, files in os.walk(directory):
        for file1 in files:
            if file1.endswith(".tgz") or file1.endswith(".tar"):
                extract(os.path.join(root, file1))

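Perhaps it was this change that broke your code:
tar = tarfile.open(tar_url, 'r')
Changed to:
tar = tarfile.open(tar_url, 'wb')# changed from r to wb 6/12
An archive must be opened for reading ('r') to be extracted, and 'wb' is not a mode that tarfile.open accepts.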

Does the extract path with the counter exist?
for item in tar:
    os.mkdir(extract_path + "_%d" % counter)
    tar.extract(item, extract_path+"_%d" % counter)
    counter+=1
    if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
        extract(item.name, "./" + item.name[:item.name.rfind('/')])

The original version relies on the created folder names matching the relative paths specified in the archive. In the new version, the recursive call tries to put the files into a folder without a 'tag' number, after extracting the other files at that level into one that does.
Try adding the tag to the path name used for the recursive call as well.
BTW, the Python-idiomatic spelling of item.name.find(".tar") != -1 is '.tar' in item.name.
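To illustrate the point about carrying the tag into the recursive call, here is a minimal sketch, not the asker's code. It is written for Python 3 (the thread uses Python 2), and the names extract_numbered, _make_tar and demo are hypothetical helpers made up for this example. It assumes each archive gets its own numbered folder and that nested archives are looked up inside the tagged folder they were just extracted into.

```python
import itertools
import os
import tarfile
import tempfile


def extract_numbered(tar_path, dest_base, counter):
    """Extract tar_path into '<dest_base>_<n>' and recurse into nested archives.

    Returns the list of directories that were created (hypothetical helper).
    """
    target = "%s_%d" % (dest_base, next(counter))
    os.makedirs(target)
    created = [target]
    # Mode 'r' opens the archive for reading, with compression detected automatically.
    with tarfile.open(tar_path, "r") as tar:
        # Newer Python versions provide extraction filters; use the safe one if present.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(target, **extra)
        names = [m.name for m in tar.getmembers() if m.isfile()]
    for name in names:
        if name.endswith((".tgz", ".tar", ".tar.gz")):
            # Build the nested archive's path from the *tagged* folder,
            # so the recursive call looks where the file was actually written.
            nested = os.path.join(target, name)
            nested_base = os.path.splitext(nested)[0]
            created.extend(extract_numbered(nested, nested_base, counter))
    return created


def _make_tar(path, files):
    """Create an uncompressed tar at path containing the given files."""
    with tarfile.open(path, "w") as tar:
        for f in files:
            tar.add(f, arcname=os.path.basename(f))


def demo():
    """Build outer.tar containing inner.tar containing a.txt, then extract it."""
    work = tempfile.mkdtemp()
    text_file = os.path.join(work, "a.txt")
    with open(text_file, "w") as fh:
        fh.write("hello")
    inner_tar = os.path.join(work, "inner.tar")
    _make_tar(inner_tar, [text_file])
    outer_tar = os.path.join(work, "outer.tar")
    _make_tar(outer_tar, [inner_tar])
    return extract_numbered(outer_tar, os.path.join(work, "out"), itertools.count(1))
```

Passing one shared counter (here itertools.count) instead of a global keeps the numbering unique across the recursion without module-level state.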

Related

Need to upload sub-dirs and their contents, not just files in current dir

A script was supplied to me in order to upload files to a cloud bucket. You input the dir where the files you want to upload are and bingo bango, done.
What needs to happen is that there are additional sub dirs with their own files in them that I would like to transfer as well based on the input of the root dir. They would need to retain their tree structure relative to the root dir input.
Using the current code I get a write error/access denied fail. I know this is because the for loop is using os.listdir which can't parse the extra sub dirs and files but I'm not sure how to modify.
I attempted to get all the information I needed using os.walk and parsing that out. I verified with some print tests that it was looking in the right place for everything. However I hit a wall when I got this error when running the script:
folder\folder\lib\ntpath.py", line 76, in join
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not list
I understand that something is being generated as a list when it shouldn't be but I'm not sure how to go about this...
This is the original script provided to me below. I have added the variable at the top just to be a little less abstract.
local_directory_path = 'C:\folder\folder\sync\FROM_LOCAL_UPLOAD'
def upload_folder_to_cloud(self, mount_id, local_directory_path):
    ''' This method will list every file at the local_directory_path and then for each,
    it will call the api method athera.sync.upload_file for every file in your local directory
    '''
    _, destination_folder = os.path.split(local_directory_path)
    if not destination_folder:
        self.logger.error("Make sure the provided 'local_directory_path' does not end with a '/' or a '\\'")
        sys.exit(2)
    destination_folder = destination_folder + "/"
    self.logger.info("Folder = {}".format(destination_folder))
    for filename in os.listdir(local_directory_path):
        destination_path = destination_folder + filename
        filepath = os.path.join(local_directory_path, filename)
        with open(filepath, "rb") as f:
            _, err = self.client.upload_file(self.group_id, mount_id, f, destination_path=destination_path,)
        if err != None:
            self.logger.error(err)
            sys.exit(4)
    return destination_folder
This is what I modified it to as a test:
for root, dirs, files in os.walk(local_directory_path):
    srcFile = (os.path.join(files))
    srcRoot = (os.path.join(root))
    rootSplit = os.path.normpath(srcRoot).split(os.path.sep)
    srcDirs = '/'.join(rootSplit[4:])
    src = str('fixLocalFolder') + '/' + str(srcDirs) +'/'+ (files)
    dst = str(srcDirs) + '/' + (files)
    destination_folder = str(srcRoot) + "/"
    destination_path = str(destination_folder) + str(srcFile)
    filepath = os.path.join((str(srcDirs), str(srcFile)))
    with open(filepath, "rb") as f:
        _, err = self.client.upload_file(
            self.group_id,
            mount_id,
            f,
            destination_path=destination_path,
        )
    if err != None:
        self.logger.error(err)
        sys.exit(4)
return destination_folder
I do not code for a living so I am sure I am not going about this the right way. I apologize for any code atrocities in advance. Thank you!
I do see some issues in that code, even without testing it. Something like the following might work for that loop. (Note! Untested!).
for root, dirs, files in os.walk(local_directory_path):
    # Iterate through files in the currently processed directory
    for current_file in files:
        # Full path to file
        src_file = os.path.join(root, current_file)
        # Get the sub-path relative to the original root
        # (relative to the directory being walked, not to destination_folder).
        sub_path = os.path.relpath(root, start=local_directory_path)
        # Get the destination path
        destination_path = os.path.join(sub_path, current_file)
        with open(src_file, "rb") as f:
            _, err = self.client.upload_file(
                self.group_id,
                mount_id,
                f,
                destination_path=destination_path,
            )
        if err is not None:
            self.logger.error(err)
            sys.exit(4)
I believe your central problem was misunderstanding what os.walk gives you. It gives you a listing of each directory (and subdirectory), one after another.
Thus the values in successive iterations might look like this (when listing /mydir):
# First iteration:
root = "/mydir"
dirs = ["subdir", ...]
files = ["something.doc", "something else.txt"]
# Second iteration:
root = "/mydir/subdir"
dirs = ["sub-sub-dir1", ...]
files = ["file1.txt", "file2.txt", ...]
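As a rough sketch of how those per-directory values can be turned into upload paths that keep the tree structure, the following computes (source, destination) pairs without calling any upload API. The names plan_uploads and demo are made up for this example, and it assumes the remote side wants forward slashes and the top folder name as a prefix, which the original script suggests but does not confirm for sub-directories.

```python
import os
import posixpath
import tempfile


def plan_uploads(local_directory_path):
    """Return sorted (source_file, destination_path) pairs for every file under the root."""
    local_directory_path = os.path.normpath(local_directory_path)
    destination_folder = os.path.basename(local_directory_path)
    pairs = []
    for root, dirs, files in os.walk(local_directory_path):
        # Path of the current directory relative to the root; "." for the root itself.
        sub_path = os.path.relpath(root, start=local_directory_path)
        parts = [] if sub_path == os.curdir else sub_path.split(os.sep)
        for current_file in files:
            src_file = os.path.join(root, current_file)
            # Assumption: the remote store expects '/'-separated paths.
            destination_path = posixpath.join(
                destination_folder, *(parts + [current_file])
            )
            pairs.append((src_file, destination_path))
    return sorted(pairs)


def demo():
    """Build a small tree in a temporary directory and plan its uploads."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "FROM_LOCAL_UPLOAD")
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")
    return [dest for _, dest in plan_uploads(root)]
```

Each source path is always built from root plus a single file name, which avoids passing the whole files list to os.path.join (the cause of the TypeError above).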

traverse directory structure in python recursively without os.walk

I am trying to write a python2 function that will recursively traverse through the whole directory structure of a given directory, and print out the results.
All without using os.walk
This is what I have got so far:
test_path = "/home/user/Developer/test"
def scanning(sPath):
    output = os.path.join(sPath, 'output')
    if os.path.exists(output):
        with open(output) as file1:
            for line in file1:
                if line.startswith('Final value:'):
                    print line
    else:
        for name in os.listdir(sPath):
            path = os.path.join(sPath, name)
            if os.path.isdir(path):
                print "'", name, "'"
                print_directory_contents(path)
scanning(test_path)
This is what I currently get, the script doesn't enter the new folder:
' test2'
'new_folder'
The issue is that it does not go further down than one directory. I would also like to be able to indicate visually what is a directory and what is a file.
Try this:
import os
test_path = "YOUR_DIRECTORY"
def print_directory_contents(dir_path):
    for child in os.listdir(dir_path):
        path = os.path.join(dir_path, child)
        if os.path.isdir(path):
            print("FOLDER: " + "\t" + path)
            print_directory_contents(path)
        else:
            print("FILE: " + "\t" + path)
print_directory_contents(test_path)
I tested this on Windows; verify that it also works on Unix.
Adapted from:
http://codegists.com/snippet/python/print_directory_contentspy_skobnikoff_python
Try this recursive version; it is much simpler and needs less code:
import os
def getFiles(path="/var/log", files=None):
    # Use None as the default: a mutable default list would be shared between calls
    if files is None:
        files = []
    if os.path.isfile(path):
        files.append(path)
        return files
    for item in os.listdir(path):
        item = os.path.join(path, item)
        if os.path.isfile(item):
            files.append(item)
        else:
            # Recurse into the sub-directory, collecting into the same list
            getFiles(item, files)
    return files
for f in getFiles("/home/afouda/test", []):
    print(f)
Try using a recursive function:
import os
def lastline(fil):
    # Print every line of the file that starts with the marker
    with open(fil) as f:
        for li in f:
            if li.startswith("Final value:"):
                print(li)
def lookforfiles(basepath):
    dirs = []
    for n in os.listdir(basepath):
        f = os.path.join(basepath, n)
        if os.path.isfile(f):
            print("\n\nfile %s" % n)
            lastline(f)
        elif os.path.isdir(f):
            # Remember the directory so we can descend into it afterwards
            dirs.append(f)
    for x in dirs:
        print("dir %s" % x)
        lookforfiles(x)
Sorry if this doesn't fit your example precisely, but I had a hard time understanding what you were trying to do.
This question is a duplicate of Print out the whole directory tree.
TL;DR: Use os.listdir.

How to Refactor this code to not nest more than 3 "if"

This Python function downloads files from an AWS S3 bucket. I would like to avoid nesting the three "if" statements so that the code is clearer and more readable:
for fileinfo in response['Contents']:
    if key in fileinfo['Key']:
        if '/' in fileinfo['Key']:
            filekeysplit = fileinfo['Key'].rsplit('/', 1)
            if filekeysplit[1] == '':
                continue
            if not os.path.exists(file):
                os.makedirs(file)
            fileout = os.path.join(file, filekeysplit[1])
            self._s3.download_file(bucket, fileinfo['Key'], fileout)
        else:
            self._s3.download_file(bucket, fileinfo['Key'], file)
How can I do that? Thank you.
You can always invert a test and use continue to skip the iteration:
for fileinfo in response['Contents']:
    if key not in fileinfo['Key']:
        continue
    if '/' not in fileinfo['Key']:
        self._s3.download_file(bucket, fileinfo['Key'], file)
        continue
    filekeysplit = fileinfo['Key'].rsplit('/', 1)
    if filekeysplit[1] == '':
        continue
    if not os.path.exists(file):
        os.makedirs(file)
    fileout = os.path.join(file, filekeysplit[1])
    self._s3.download_file(bucket, fileinfo['Key'], fileout)
We can pull out the double download_file() call; skip keys that end in / early. You only need to create directories once, outside the loop (I'd rename file to directory here too). I'd use str.rpartition() here instead of str.rsplit():
# file has been renamed to directory; no need to test for existence,
# as `os.makedirs(..., exist_ok=True)` does this for us
os.makedirs(directory, exist_ok=True)
for fileinfo in response['Contents']:
    if key not in fileinfo['Key']:
        continue
    __, slash, basename = fileinfo['Key'].rpartition('/')
    if not basename and slash: # ended in "/"
        continue
    target = directory
    if slash: # there was a partition
        target = os.path.join(target, basename)
    self._s3.download_file(bucket, fileinfo['Key'], target)
I would like to suggest using some features of the standard library. As Martijn Pieters said, you should rename your file variable to target_directory or something similar, because otherwise it could confuse the reader of your code:
for fileinfo in response['Contents']:
    filepath_retrieved = fileinfo['Key']
    if key in filepath_retrieved:
        pathname_retrieved, filename_retrieved = os.path.split(filepath_retrieved)
        if pathname_retrieved:
            if filename_retrieved:
                os.makedirs(target_directory, exist_ok=True)
                output_filepath = os.path.join(target_directory, filename_retrieved)
                self._s3.download_file(bucket, filepath_retrieved, output_filepath)
        else:
            output_filepath = target_directory
            self._s3.download_file(bucket, filepath_retrieved, output_filepath)
The features used are:
os.path.split() instead of str.rsplit() or str.rpartition() because it looks like you wanted to retrieve a filename at the end of a filepath when you tried to do fileinfo['Key'].rsplit('/', 1)
exist_ok argument of os.makedirs() so you don't have to worry about the existence of directory before you need to create it.
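To make the key-splitting step concrete, here is a small sketch of how str.rpartition() separates an S3-style key and how "folder" keys ending in "/" can be skipped. The function name local_target is hypothetical, and it assumes every downloaded object should land inside the target directory under its base name, which differs slightly from the original code for keys without a slash.

```python
import os


def local_target(key, directory):
    """Map an S3-style key to a local path, or None for keys ending in '/'."""
    # rpartition splits on the LAST '/': "a/b/c.txt" -> ("a/b", "/", "c.txt").
    # Without any '/', the separator is empty: "c.txt" -> ("", "", "c.txt").
    _, slash, basename = key.rpartition("/")
    if slash and not basename:
        # A key such as "logs/" is a folder placeholder; nothing to download.
        return None
    return os.path.join(directory, basename)
```

A caller could then write "target = local_target(fileinfo['Key'], directory)" and skip the iteration with continue when the result is None, which keeps the loop body flat.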

Move file to a folder or make a renamed copy if it exists in the destination folder

I have a piece of code I wrote for school:
import os
import shutil
source = "/home/pi/lab"
dest = os.environ["HOME"]
for file in os.listdir(source):
    if file.endswith(".c"):
        shutil.move(file,dest+"/c")
    elif file.endswith(".cpp"):
        shutil.move(file,dest+"/cpp")
    elif file.endswith(".sh"):
        shutil.move(file,dest+"/sh")
This code looks for files in a source directory and, if a file has a certain extension, moves it to the matching directory. This part works. If a file with the same name already exists in the destination folder, I want to add 1 at the end of the file name, before the extension, and keep incrementing the number ("1++") if there are multiple copies.
Like this: test1.c, test2.c, test3.c
I tried using os.isfile(filename), but this only looks at the source directory, and I just get true or false.
To test whether the file exists in the destination folder, you should os.path.join the dest folder with the file name:
import os
import shutil
source = "/home/pi/lab"
dest = os.environ["HOME"]
# Avoid shadowing the built-in name 'file' with a variable - renamed it to 'filename' instead
for filename in os.listdir(source):
    # os.path.splitext does exactly what its name suggests - split the name and extension of the file including the '.'
    name, extension = os.path.splitext(filename)
    if extension == ".c":
        dest_filename = os.path.join(dest, filename)
        if not os.path.isfile(dest_filename):
            # We copy the file as is
            shutil.copy(os.path.join(source, filename) , dest)
        else:
            # We rename the file with a number in the name incrementing the number until we find one that is not used.
            # This should be moved to a separate function to avoid code duplication when handling the different file extensions
            # Start at 1 so the copies are named test1.c, test2.c, ... as requested
            i = 1
            dest_filename = os.path.join(dest, "%s%d%s" % (name, i, extension))
            while os.path.isfile(dest_filename):
                i += 1
                dest_filename = os.path.join(dest, "%s%d%s" % (name, i, extension))
            shutil.copy(os.path.join(source, filename), dest_filename)
    elif extension == ".cpp":
        ...
    # Handle other extensions
If you want to put the renaming logic in a separate function using glob and re, this is one way:
import glob
import re
...
def rename_file(source_filename, source_ext):
    filename_pattern = os.path.join(dest, "%s[0-9]*%s"
                                    % (source_filename, source_ext))
    # Contains files such as 'a1.c', 'a2.c', etc...
    existing_files = glob.glob(filename_pattern)
    # Escape the name and extension so that '.' is matched literally
    regex = re.compile("^%s([0-9]+)%s$"
                       % (re.escape(source_filename), re.escape(source_ext)))
    # Retrieve the indexes already used for this file, matching on the base name only
    indexes = [int(match.group(1))
               for match in map(regex.search, map(os.path.basename, existing_files))
               if match]
    # Fall back to 0 when no numbered copy exists yet (max() fails on an empty list)
    max_index = max(indexes) if indexes else 0
    source_full_path = os.path.join(source, "%s%s"
                                    % (source_filename, source_ext))
    # Rebuild the destination filename with the max index + 1
    dest_full_path = os.path.join(dest, "%s%d%s"
                                  % (source_filename,
                                     (max_index + 1),
                                     source_ext))
    shutil.copy(source_full_path, dest_full_path)
...
# If the file already exists i.e. replace the while loop in the else statement
rename_file(name, extension)
I didn't test the code, but something like this should do the job:
import os
filename = "a.txt"
fname, ext = os.path.splitext(filename)
i = 0
candidate = filename
# Keep incrementing the suffix until the name is not taken
while os.path.isfile(candidate):
    i += 1
    candidate = fname + str(i) + ext
filename = candidate
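The approaches above share one idea: build the candidate path inside the destination folder, then bump a counter until the name is free. A minimal sketch of that idea as a reusable helper follows; unique_destination and demo are hypothetical names for this example, and the numbering starts at 1 to match the test1.c, test2.c pattern asked for in the question.

```python
import os
import tempfile


def unique_destination(dest_dir, filename):
    """Return a path in dest_dir that does not exist yet.

    If filename is taken, 1, 2, ... is inserted before the extension.
    """
    name, extension = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    i = 0
    # Check the DESTINATION path, not the bare file name.
    while os.path.exists(candidate):
        i += 1
        candidate = os.path.join(dest_dir, "%s%d%s" % (name, i, extension))
    return candidate


def demo():
    """Create test.c twice in a temporary folder and return the chosen base names."""
    dest_dir = tempfile.mkdtemp()
    chosen = []
    for _ in range(3):
        path = unique_destination(dest_dir, "test.c")
        with open(path, "w") as fh:
            fh.write("int main(void) { return 0; }\n")
        chosen.append(os.path.basename(path))
    return chosen
```

The returned path can be passed straight to shutil.move or shutil.copy as the destination. Note that the check and the later move are not atomic, so this is only suitable when no other process writes to the folder at the same time.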

Passing a relative path in a function

Can someone tell me if the following function declaration is the correct way to pass a relative path to a function? The call is only taking one variable. When I include a second variable (absolute path), my function does not work.
def extract(tar_url, extract_path='.'):
The call that does not work:
extract(chosen, path)
This works, but does not extract:
extract(chosen)
Full Code:
def do_fileExtract(self, line):
    defaultFolder = "Extracted"
    if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted'):
        os.mkdir('c:\\Extracted')
        raw_input("PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!")
    else:
        pass
    def extract(tar_url, extract_path='.'):
        print tar_url
        tar = tarfile.open(tar_url, 'r')
        for item in tar:
            tar.extract(item, extract_path)
            if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
                extract(item.name, "./" + item.name[:item.name.rfind('/')])
    userpath = "Extracted"
    directory = os.path.join("c:\\", userpath)
    os.chdir(directory)
    path=os.getcwd() #Set log path here
    dirlist=os.listdir(path)
    files = [fname for fname in os.listdir(path)
             if fname.endswith(('.tgz','.tar'))]
    for item in enumerate(files):
        print "%d- %s" % item
    try:
        idx = int(raw_input("\nEnter the file's number:\n"))
    except ValueError:
        print "You fail at typing numbers."
    try:
        chosen = files[idx]
    except IndexError:
        print "Try a number in range next time."
    newDir = raw_input('\nEnter a name to create a folder a the c: root directory:\n')
    selectDir = os.path.join("c:\\", newDir)
    path=os.path.abspath(selectDir)
    if not newDir.endswith(':') and not os.path.exists(selectDir):
        os.mkdir(selectDir)
    try:
        extract(chosen, path)
        print 'Done'
    except:
        name = os.path.basename(sys.argv[0])
        print chosen
It looks like you missed an escape character in "PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!"
I don't think raw_input treats the prompt string as a raw string, only the user input.
But this shouldn't affect the functionality of your program.
Are you on Unix or Windows? I was under the impression that on Unix you use a forward slash (/) instead of a backslash (\\) as the separator.
I tested some code on this file:
http://simkin.asu.edu/geowall/mars/merpano0.tar.gz
The following code:
>>> from os import chdir
>>> import tarfile
>>> chdir(r'C:\Users\Acer\Downloads')
>>> tar_url = 'merpano0.tar.gz'
>>> print tar_url
merpano0.tar.gz
>>> tar = tarfile.open(tar_url, 'r')
>>> extract_path = 'C:\\Users\\Acer\\Downloads\\test\\'
>>> for item in tar:
...     tar.extract(item, extract_path)
executed cleanly with no problems on my end. In the test directory I got a single folder with some files, exactly as in the original tar file. Can you explain what you're doing differently in your code that might be causing the problem?
