Zip Structure in Python Trying to Read Folders Only - python

I'm new to Python and ran into an interesting problem: I want to inspect a zip file and identify its top-level directory by looking only at folders. I'm not sure how to put it into words, so here is some pseudo-code to give you a feel for it.
for folders in Zipfile.namelist():
    if /zipfile/1folder1/:
        return PASS
    elif:
        /zipfile/1folder1/
        /zipfile/1folder2/
        /zipfile/1folder1/2folder1/
        return FAIL
I am interested in reading the folder names, not the file names. I tried the reduce() method, but to no avail, because it goes all the way down to the lowest-level folder, which I do not want. I want only the top-level (first) folder, as in /zipfile/1folder1/. There has to be one and only one folder at the top level, not several.
I cannot figure out how to read the folders and write a for loop that counts the number of folders at the top level.
Thanks!

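I found a workaround. Splitting each name on os.sep with .split and tracking the counts in a dictionary did the trick:
import os
dir_count = {}  # number of path parts -> how many entries have that many parts
for folders in Zipfile.namelist()[1:]:
    # zip member names use '/', so os.sep only matches where it is also '/'
    seps = len(folders.split(os.sep))
    if seps > 2:
        if seps not in dir_count.keys():
            dir_count[seps] = 0
        dir_count[seps] += 1
        if dir_count[seps] > 1:
            raise Exception("more than one entry at depth %d" % seps)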
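For anyone landing here later, a more direct way to get at "how many folders are at the top level" might look like the sketch below. It is only a sketch under a few assumptions: the helper names top_level_dirs and has_single_top_level_dir are made up for illustration, the archive is already opened as a zipfile.ZipFile, and a name counts as a folder only if it contains a '/' (which is how zip member names separate path parts).

```python
import zipfile


def top_level_dirs(zf):
    """Return the set of first path components that are folders.

    `zf` is an open zipfile.ZipFile. A member like 'a/b/c.txt' or 'a/'
    contributes 'a'; a bare top-level file like 'readme.txt' contributes
    nothing because it has no '/' in its name.
    """
    tops = set()
    for name in zf.namelist():
        first, sep, _rest = name.partition("/")
        if sep and first:
            tops.add(first)
    return tops


def has_single_top_level_dir(zf):
    """True when exactly one folder exists at the top level of the archive."""
    return len(top_level_dirs(zf)) == 1
```

With this, the PASS/FAIL check from the question becomes a single call such as has_single_top_level_dir(zipfile.ZipFile("archive.zip")), where "archive.zip" stands in for your own file.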

Related

Finding File Path in Python

I'd like to find the full path of any given file, but when I tried to use
os.path.abspath("file")
it would only give me the file location as being in the directory where the program is running. Does anyone know why this is or how I can get the true path of the file?
What you are looking to accomplish here is ultimately a search on your filesystem. This does not work out too well, because it is extremely likely you might have multiple files of the same name, so you aren't going to know with certainty whether the first match you get is in fact the file that you want.
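As for the "why": os.path.abspath does not look anything up on disk. It only resolves the name you give it against the current working directory, so a bare "file" always ends up under wherever the program is running. A tiny sketch that shows this (the name "file" is just the placeholder from the question and does not need to exist):

```python
import os

relative_name = "file"  # placeholder name from the question; need not exist

# abspath only joins the name onto the current working directory and
# normalizes the result; it never searches the filesystem.
resolved = os.path.abspath(relative_name)
expected = os.path.normpath(os.path.join(os.getcwd(), relative_name))

print(resolved)
print(resolved == expected)
```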
I will give you an example of how you can start with something simple that will allow you to traverse directories and search them.
You will have to give some kind of base path to initiate the search for the path where this file resides. Keep in mind that the broader the base path, the more expensive your search is going to be.
You can do this with the os.walk method.
Here is a simple example of using os.walk. It collects all the file paths with a matching filename:
Using os.walk
from os import walk
from os.path import join
d = 'some_file.txt'
paths = []
for i in walk('/some/base_path'):
    if d in i[2]:
        paths.append(join(i[0], d))
So, for each iteration over os.walk you are going to get a tuple that holds:
(path, directories, files)
So that is why I am checking against location i[2] to look at files. Then I join with i[0], which is the path, to put together the full filepath name.
Finally, you can actually put the above code all into one line and do:
paths = [join(i[0], d) for i in walk('/some/base_path') if d in i[2]]
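If the i[0] / i[2] indexing is hard to read, the same idea can be written with the tuple unpacked into names. This is just a sketch of the approach above wrapped in a function; find_paths is a name chosen here for illustration, and the base path is whatever directory you decide to search from.

```python
import os


def find_paths(filename, base_path):
    """Return the full path of every file named `filename` under `base_path`."""
    matches = []
    # os.walk yields (path, directories, files) for each directory it visits
    for root, _directories, files in os.walk(base_path):
        if filename in files:
            matches.append(os.path.join(root, filename))
    return matches
```

Calling find_paths('some_file.txt', '/some/base_path') returns a list, which may be empty or may hold several entries if the same filename exists in more than one directory.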

Using a list to find and move specific files - python 2.7

I've seen a lot of people asking questions about searching through folders and creating a list of files, but I haven't found anything that has helped me do the opposite.
I have a csv file with a list of files and their extensions (xxx0.laz, xxx1.laz, xxx2.laz, etc). I need to read through this list and then search through a folder for those files. Then I need to move those files to another folder.
So far, I've taken the csv and created a list. At first I was having trouble with the list: each line had a "\n" at the end, so I removed those. Following the only other example I've found ("How do I find and move certain files based on a list in excel?"), I created a set from the list. However, I'm not really sure why or if I need it.
So here's what I have:
id = open('file.csv','r')
list = list(id)
list_final = ''.join([item.rstrip('\n') for item in list])
unique_identifiers = set(list_final)
os.chdir(r'working_dir') # I set this as the folder to look through
destination_folder = 'folder_loc' # Folder to move files to
for identifier in unique_identifiers:
    for filename in glob.glob('%s_*' % identifier):
        shutil.move(filename, destination_folder)
I've been wondering about this ('%s_*' % identifier) with the glob function. I haven't found any examples with this, perhaps that needs to be changed?
When I do all that, I don't get anything. No errors and no actual files moved...
Perhaps I'm going about this the wrong way, but that is the only thing I've found so far anywhere.
It's really not hard:
import shutil
for fname in open("my_file.csv").read().split(","):
    shutil.move(fname.strip(), dest_dir)
You don't need a whole lot of things...
Also, if you just want all the *.laz files in a source directory, you don't need a CSV at all:
import glob, os
for fname in glob.glob(os.path.join(src_dir, "*.laz")):
    shutil.move(fname, dest_dir)
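As far as I can tell, the reason the question's code moved nothing is that ''.join(...) glues every filename into one long string, and set() of a string gives a set of single characters, so the glob patterns never match the real names. Below is a slightly more defensive sketch of the "read the list, move what exists" idea. It assumes Python 3 (the question is tagged 2.7), that the CSV holds bare filenames either one per line or comma-separated, and that dest_dir already exists; move_listed_files is a hypothetical name.

```python
import csv
import os
import shutil


def move_listed_files(csv_path, src_dir, dest_dir):
    """Move every file named in the CSV from src_dir to dest_dir.

    Returns (moved, missing): names that were moved and names that were
    listed in the CSV but not found in src_dir.
    """
    moved, missing = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            for cell in row:
                name = cell.strip()
                if not name:
                    continue  # skip empty cells and blank lines
                src = os.path.join(src_dir, name)
                if os.path.isfile(src):
                    shutil.move(src, os.path.join(dest_dir, name))
                    moved.append(name)
                else:
                    missing.append(name)
    return moved, missing
```

Returning the missing names makes it easier to see why "nothing happened" when the list and the folder contents do not line up.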

Python: how to discern if a path is within another path?

I need to know if pathA is a subset of, or is contained within pathB.
I'm making a little script that will walk some old volumes and find duplicate files. My general approach (and even if it's a bad one for its inefficiency, it's just for me and it works, so I'm OK with the brute-forceness of it) has been:
Map all the files to a log
Create a hash for all the files in the log
Sort the hash list for duplicates
Move the duplicates somewhere for inspection prior to deletion
I want to be able to exclude certain directories, though (ie. System files). This is what I've written:
#self.search_dir = top level directory to be searched for duplicates
#self.mfl = master_file_list, being built by this func, a list of all files in search_dir
#self.no_crawl_list = list of files and directories to be excluded from the search
def build_master_file_list(self):
    for root, directories, files in os.walk(self.search_dir):
        files = [f for f in files if not f[0] == '.']
        directories[:] = [d for d in directories if not d[0] == '.']
        for filename in files:
            filepath = os.path.join(root, filename)
            if [root, filepath] in self.no_crawl_list:
                pass
            else:
                self.mfl.write(filepath + "\n")
    self.mfl.close()
But I'm pretty sure this isn't going to do what I'd intended. My goal is to have all subdirectories of anything in self.no_crawl_list excluded as well, such that:
if
/path/to/excluded_dir is added to self.no_crawl_list
then paths like /path/to/excluded_dir/sub_dir/implicitly_excluded_file.txt
will be skipped as well. I think my code is currently being entirely literal about what to skip. Short of exploding the path parts and comparing them to every possible combination in self.no_crawl_list, however, I don't know how to do this. 'Lil help? :)
With the help of Lukas Graf in the comments above, I was able to build this, and it works like a charm:
def is_subpath(self, path, of_paths):
    # basestring exists only on Python 2; use str on Python 3
    if isinstance(of_paths, basestring):
        of_paths = [of_paths]
    abs_of_paths = [os.path.abspath(of_path) for of_path in of_paths]
    return any(os.path.abspath(path).startswith(subpath) for subpath in abs_of_paths)
Also, this currently doesn't account for symlinks and assumes a UNIX filesystem; see the comments on the original question for advice on extending this.
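One thing to watch with a plain startswith check is that it compares characters, not path components, so /path/to/excluded_dir2 also "starts with" /path/to/excluded_dir. A possible variant, sketched here for Python 3 as a standalone function rather than a method, compares whole components with os.path.commonpath instead. Like the version above, it does not resolve symlinks.

```python
import os


def is_subpath(path, of_paths):
    """True if `path` equals or lies inside any directory in `of_paths`."""
    if isinstance(of_paths, str):
        of_paths = [of_paths]
    path = os.path.abspath(path)
    for parent in of_paths:
        parent = os.path.abspath(parent)
        try:
            # commonpath works on whole components, so 'excluded_dir2'
            # is not mistaken for a child of 'excluded_dir'
            if os.path.commonpath([path, parent]) == parent:
                return True
        except ValueError:
            # raised e.g. for paths on different Windows drives
            continue
    return False
```

Inside the os.walk loop from the question, the same check could be used to skip a file when is_subpath(filepath, self.no_crawl_list) is true.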

Best Practices when matching large number of files against large number of regex strings

I have a directory with several thousand files. I want to sort them into directories based on file name, but many of the file names are very similar.
My thinking is that I'm going to have to write up a bunch of regex strings and then do some sort of looping. This is my question:
Is one of these two options more optimal than the other? Do I loop over all my files and, for each file, check it against my regexes, keeping track of how many match? Or do I do the opposite and loop over the regexes and touch each file?
I had thought to do it in Python, as that's my strongest language, but I'm open to other ideas.
This is some code I use for a program of mine, which I have modified for your purposes. It takes a directory (sort_dir), goes through every file there, creates directories based on the filenames, and then moves the files into those directories. Since you have not provided any information as to where or how you want to sort your files, you will have to add that part where I have mentioned:
import os
import shutil

def sort_files(sort_dir):
    for f in os.listdir(sort_dir):
        if not os.path.isfile(os.path.join(sort_dir, f)):
            continue
        # this is the folder names to be created, what do you want them to be?
        # NOTE: as written this equals the file's own path; replace it with the real folder name
        destinationPath = os.path.join(sort_dir, f)  # right now its just the filename...
        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        if os.path.exists(os.path.join(destinationPath, f)):
            at = True
            # retry until the move succeeds (this loops forever if it never does)
            while at:
                try:
                    shutil.move(os.path.join(sort_dir, f),
                                os.path.join(destinationPath, f))
                    at = False
                except:
                    continue
        else:
            shutil.move(os.path.join(sort_dir, f), destinationPath)
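On the loop-order question itself: either order makes the same number of regex checks in the worst case, but looping over files on the outside lets you stop at the first matching pattern and move each file exactly once. Here is a rough sketch of that order. The patterns and folder names in RULES are invented purely for illustration, and destination_for and sort_by_rules are hypothetical helper names; files that match no rule are left where they are.

```python
import os
import re
import shutil

# Hypothetical rules: (compiled pattern, destination folder). First match wins.
RULES = [
    (re.compile(r"^invoice_\d+"), "invoices"),
    (re.compile(r"^report_"), "reports"),
]


def destination_for(filename, rules=RULES):
    """Return the folder name of the first rule that matches, else None."""
    for pattern, folder in rules:
        if pattern.search(filename):
            return folder
    return None


def sort_by_rules(sort_dir, rules=RULES):
    """Move each regular file in sort_dir into the folder its name maps to."""
    moved = {}
    for name in os.listdir(sort_dir):
        src = os.path.join(sort_dir, name)
        if not os.path.isfile(src):
            continue
        folder = destination_for(name, rules)
        if folder is None:
            continue  # no rule matched; leave the file alone
        target_dir = os.path.join(sort_dir, folder)
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(src, os.path.join(target_dir, name))
        moved[name] = folder
    return moved
```

Because many of the file names are very similar, the order of RULES matters: put the most specific patterns first so a broad pattern does not claim files meant for a narrower one.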

In python, how do I copy files into a directory and stop once that directory reaches a certain size

I am still very new to Python, but I am trying to create a program which will, among other things, copy the contents of a directory into a set of directories that will each fit onto a disc. (I have set up the following variables to hold the capacities I want, plus an input statement to choose which one applies.)
BluRayCap = 25018184499
DVDCap = 4617089843
CDCap = 681574400
So basically I want to copy the contents of a beginning directory into another directory, and as needed, create another directory in order for the contents to fit into discs.
I kind of hit a roadblock here. Thanks!
You can use os.path.getsize to get the size of a file, and you can use os.walk to walk a directory tree, so something like the following (I'll let you implement CreateOutputDirectory and CopyFileToDirectory):
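import os
current_destination = CreateOutputDirectory()
current_size = 0  # bytes copied into current_destination so far
for root, folders, files in os.walk(input_directory):
    for file in files:
        # os.walk yields bare file names, so join with root to get a usable path
        file_size = os.path.getsize(os.path.join(root, file))
        # getsize on a directory does not total its contents, so track the sum ourselves
        if current_size + file_size > limit:
            current_destination = CreateOutputDirectory()
            current_size = 0
        CopyFileToDirectory(root, file, current_destination)
        current_size += file_size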
Also, you may find the Python Search extension for Chrome helpful for looking up this documentation.
Michael Aaron Safyan's answer is good.
In addition, you can use the shutil module (together with os.makedirs) to implement CreateOutputDirectory and CopyFileToDirectory.
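To make that concrete, here is one way the two helpers could be filled in with os.makedirs and shutil.copy2. Treat it as a sketch under assumptions: copy_into_volumes and the disc_001, disc_002, ... folder names are made up here, output_root is assumed to sit outside input_directory, and a single file larger than the limit simply ends up alone in its own folder.

```python
import os
import shutil


def copy_into_volumes(input_directory, output_root, limit):
    """Copy files into numbered folders, each holding at most `limit` bytes.

    Returns the list of folders that were created, in order.
    """
    volumes = []
    current_destination = None
    current_size = 0  # bytes copied into the current folder so far
    for root, _folders, files in os.walk(input_directory):
        for name in files:
            src = os.path.join(root, name)
            size = os.path.getsize(src)
            if current_destination is None or current_size + size > limit:
                # start a new folder: disc_001, disc_002, ...
                current_destination = os.path.join(
                    output_root, "disc_%03d" % (len(volumes) + 1))
                os.makedirs(current_destination, exist_ok=True)
                volumes.append(current_destination)
                current_size = 0
            # keep the file's relative location inside the new folder
            rel = os.path.relpath(src, input_directory)
            dst = os.path.join(current_destination, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            current_size += size
    return volumes
```

A call such as copy_into_volumes(source, target, DVDCap) would use the DVD capacity from the question as the limit. Note that this only counts file contents, not filesystem overhead on the disc, so leaving some headroom below the nominal capacity is probably wise.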
