I'd like to find the full path of any given file, but when I tried to use
os.path.abspath("file")
it only gave me the file location as if it were in the directory where the program is running. Does anyone know why this is, or how I can get the true path of the file?
What you are looking to accomplish here is ultimately a search on your filesystem. This does not work out too well, because it is quite likely that you have multiple files of the same name, so you aren't going to know with certainty whether the first match you get is in fact the file you want.
I will give you an example of something simple to start with that will let you traverse directories in order to search.
You will have to give some kind of base path to initiate the search for the directory where this file resides. Keep in mind that the broader the base path, the more expensive your search is going to be.
You can do this with the os.walk method.
Here is a simple example of using os.walk. What this does is collect all the file paths with a matching filename.
Using os.walk
from os import walk
from os.path import join

d = 'some_file.txt'
paths = []
for i in walk('/some/base_path'):
    if d in i[2]:
        paths.append(join(i[0], d))
So, for each iteration over os.walk you are going to get a tuple that holds:
(path, directories, files)
So that is why I am checking against location i[2] to look at files. Then I join with i[0], which is the path, to put together the full filepath name.
Finally, you can actually put the above code all into one line and do:
paths = [join(i[0], d) for i in walk('/some/base_path') if d in i[2]]
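As an aside, the same loop reads a little more clearly if you unpack the tuple into named variables instead of indexing it (same assumed base path and filename as above):

from os import walk
from os.path import join

d = 'some_file.txt'
paths = []
for dirpath, dirnames, filenames in walk('/some/base_path'):
    if d in filenames:
        paths.append(join(dirpath, d))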
Related
How to get Absolute file path within a specified directory and ignore dot(.) directories and dot(.)files
I have the solution below, which will provide full paths within the directory recursively.
Help me find the fastest way to list files with their full paths while ignoring .directories/ and .files.
(The directory may contain 100 to 500 million files.)
import os

def absoluteFilePath(directory):
    for dirpath, _, filenames in os.walk(directory):
        for f in filenames:
            yield os.path.abspath(os.path.join(dirpath, f))

for files in absoluteFilePath("/my-huge-files"):
    # use some "starts with dot" logic? or any better solution
Example:
/my-huge-files/project1/file{1..100} # Consider all files from file1 to 100
/my-huge-files/.project1/file{1..100} # ignore .project1 directory and its files (Do not need any files under .(dot) directories)
/my-huge-files/project1/.file1000 # ignore .file1000, since it starts with a dot
os.walk by definition visits every file in a hierarchy, but you can select which ones you actually print with a simple textual filter.
for file in absoluteFilePath("/my-huge-files"):
    if '/.' not in file:
        print(file)
When your starting path is already absolute, calling os.path.abspath on it is redundant, but in the grand scheme of things you can just leave it in.
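If you would rather not descend into dot directories at all (instead of filtering the results afterwards), one alternative sketch prunes dirnames in place during the walk; the function name here is hypothetical, and the root is the same assumed /my-huge-files:

import os

def absolute_file_paths_no_dots(directory):
    for dirpath, dirnames, filenames in os.walk(directory):
        # pruning dirnames in place stops os.walk from descending into dot directories
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for f in filenames:
            if not f.startswith('.'):
                yield os.path.abspath(os.path.join(dirpath, f))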
Don't use os.walk() as it will visit every file
Instead, fall back to .scandir() or .listdir() and write your own implementation
You can use pathlib.Path(test_path).expanduser().resolve() to fully expand a path
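For example, a quick sketch of that expansion (the ~/projects/../data path here is only an assumed placeholder):

from pathlib import Path

# expanduser() expands "~"; resolve() makes the path absolute and collapses ".."
full_path = Path("~/projects/../data").expanduser().resolve()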
import os
from pathlib import Path

def walk_ignore(search_root, ignore_prefixes=(".",)):
    """Recursively walk directories, ignoring files with some prefix.

    Pass search_root as an absolute directory to get absolute results.
    """
    for dir_entry in os.scandir(Path(search_root)):
        if dir_entry.name.startswith(ignore_prefixes):
            continue
        if dir_entry.is_dir():
            yield from walk_ignore(dir_entry, ignore_prefixes=ignore_prefixes)
        else:
            yield Path(dir_entry)
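A possible usage, assuming the same /my-huge-files root from the question:

for path in walk_ignore("/my-huge-files"):
    print(path)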
You may be able to save some overhead with a closure, coercing to Path once, yielding only .name, etc., but that's really up to your needs.
Also, not your question but related to it: if the files are very small, you'll likely find that packing them together (several files in one) or tuning the filesystem block size gives tremendously better performance.
Finally, some filesystems come with bizarre caveats specific to them, and you can likely break this with oddities like symlink loops.
I'm using glob.glob to search a folder, and the sub-folders therein, for all the invoices I have. To simplify that, I'm going to add the program to the context menu and have it take the path as the first part of its input.
import glob

for filename in glob.glob(path + "/**/*.pdf", recursive=True):
    print(filename)
In a later version I'll have it keep the list and send those files to a printer, but for now just writing out the names is a good enough test.
So my question is twofold:
Is there anything fundamentally wrong with the way I'm writing this?
Can anyone point me in the direction of how to actually capture folder path and provide it as path-variable?
You should have a look at this question: Python script on selected file. It shows how to set up a "Send To" command in the context menu. This command calls a Python script and provides the file name sent via sys.argv[1]. I assume that also works for a directory.
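Combined with your glob snippet, the receiving script might look roughly like this (a sketch, assuming the context-menu entry passes the folder as the first argument):

import sys
import glob

path = sys.argv[1]  # folder path handed over by the "Send To" entry
for filename in glob.glob(path + "/**/*.pdf", recursive=True):
    print(filename)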
I do not have Python 3.5, so I cannot set the flag recursive=True. Instead, I will provide a solution that you can run on any Python version available to date.
The solution consists of calling os.walk() to explore the directories and using the built-in set type.
It is better to use a set instead of a list, because with a list you would need extra code to check whether the directory you want to add is already present.
So basically you can keep two sets: one for the names of the files you want to print, and the other for the directories and their sub-folders.
You can then adapt this solution to your class/method:
import os

path = '.'      # any path you want
exten = '.pdf'

directories_list = set()
files_list = set()

# Loop over directories
for dirpath, dirnames, files in os.walk(path):
    for name in files:
        # Check if the extension matches
        if name.lower().endswith(exten):
            files_list.add(name)
            directories_list.add(dirpath)
You can then loop over directories_list and files_list to print them out.
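For example:

for directory in sorted(directories_list):
    print(directory)
for name in sorted(files_list):
    print(name)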
I need to know if pathA is a subset of, or is contained within, pathB.
I'm making a little script that will walk some old volumes and find duplicate files. My general approach (and even if it's a bad one because of its inefficiency, it's just for me and it works, so I'm OK with the brute-forceness of it) has been:
Map all the files to a log
Create a hash for all the files in the log (a rough sketch of this step follows the list)
Sort the hash list for duplicates
Move the duplicates somewhere for inspection prior to deletion
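A possible sketch of the hashing step, assuming hashlib with md5 and an arbitrary 64 KB chunk size:

import hashlib

def file_hash(filepath, chunk_size=65536):
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.md5()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()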
I want to be able to exclude certain directories, though (i.e. system files). This is what I've written:
# self.search_dir = top-level directory to be searched for duplicates
# self.mfl = master_file_list, being built by this func, a list of all files in search_dir
# self.no_crawl_list = list of files and directories to be excluded from the search
def build_master_file_list(self):
    for root, directories, files in os.walk(self.search_dir):
        files = [f for f in files if not f[0] == '.']
        directories[:] = [d for d in directories if not d[0] == '.']
        for filename in files:
            filepath = os.path.join(root, filename)
            if [root, filepath] in self.no_crawl_list:
                pass
            else:
                self.mfl.write(filepath + "\n")
    self.mfl.close()
But I'm pretty sure this isn't going to do what I'd intended. My goal is to have all subdirectories of anything in self.no_crawl_list excluded as well, such that:
if /path/to/excluded_dir is added to self.no_crawl_list, then paths like /path/to/excluded_dir/sub_dir/implicitly_excluded_file.txt will be skipped as well. I think my code is currently being entirely literal about what to skip. Short of exploding the path into its parts and comparing them to every possible combination in self.no_crawl_list, however, I don't know how to do this. 'Lil help? :)
With the assistance of Lukas Graf in the comments above, I was able to build this, and it works like a charm:
def is_subpath(self, path, of_paths):
    if isinstance(of_paths, basestring):  # basestring is Python 2; use str on Python 3
        of_paths = [of_paths]
    abs_of_paths = [os.path.abspath(of_path) for of_path in of_paths]
    return any(os.path.abspath(path).startswith(subpath) for subpath in abs_of_paths)
Also, this currently doesn't account for symlinks and assumes a UNIX filesystem, see comments in original question for advice on extending this.
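Plugged back into build_master_file_list, the inner loop might then look roughly like this (a sketch; root, files, and self.mfl come from the surrounding method above):

for filename in files:
    filepath = os.path.join(root, filename)
    # skip anything that lives under an excluded directory
    if self.is_subpath(filepath, self.no_crawl_list):
        continue
    self.mfl.write(filepath + "\n")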
I have a semi-complicated directory traversal that I need to perform. I have a program that I wrote that requires a SOURCE directory as a parameter. I then need to get a list of FIRST LEVEL directories, as I really only care about the first level. I then need to take that list of FIRST LEVEL directories and look to see whether each contains a specific file. If it does, I want to add that first-level directory to a list. So far I have done the following to get a list of the first-level directories under the source. I am just unsure how I would look under each of these directories to find the existence of a file (something like *.proj). Any help you could provide to point me in the right direction would be awesome!
for name in os.listdir(args.source):
    if os.path.isdir(args.source):
        self.tempSource.append(os.path.join(args.source, name))
You can use glob to search for files:
import glob
import os

files = []
for name in os.listdir(args.source):
    full_dir = os.path.join(args.source, name)
    if os.path.isdir(full_dir):
        # match any files ending with .proj in this first-level directory
        files.append(glob.glob(os.path.join(full_dir, "*.proj")))
The * in *.proj is a wildcard; the pattern will match any file ending with the extension .proj.
You're checking whether the source directory is a directory repeatedly. You need to check the items inside that directory to see if they are directories. You also need to join the paths because listdir only returns names.
subpaths = (os.path.join(args.source, name) for name in os.listdir(args.source))
subdirs = list(filter(os.path.isdir, subpaths))
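To then keep only the first-level directories that actually contain a .proj file, a hedged follow-up (reusing the subdirs list from above, and glob as in the other answer) could be:

import glob
import os

# keep only first-level directories that contain at least one *.proj file
proj_dirs = [d for d in subdirs if glob.glob(os.path.join(d, "*.proj"))]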
I am using os.walk to run through a tree of directories, check for some input files, and then run a program if the proper inputs are there. I notice I am having a problem because of the way that os.walk is evaluating the root variable in the loop:
import os
import subprocess

# I use '.' because I want the walk to start where I run the script,
# and that will/can change.
for root, dirs, files in os.walk('.'):
    if "input.file" in files:
        # if an input file is found, store the path to it and run the program
        infile = os.path.join(root, "input.file")
        subprocess.check_output("myprog input.file", shell=True)
This is giving me an issue because the infile string looks like this
./path/to/input.file
When it needs to look like this for the program to be able to find it
/home/start/of/walk/path/to/input.file
I want to know if there is a better method/ a different way to use os.walk such that I can leave the starting directory arbitrary, but still be able to use the full path to any files that it finds for me. Thanks
The program I am using is one I wrote myself in C++, and I suppose I could modify it as well. But just to clarify, I am not asking how to do that in this question; this question is about Python's os.walk and related topics, which is why there are no examples of my C++ code here.
Instead of using ., convert it to the absolute path by using os.path.abspath("."). That will convert your current path to an absolute path before you begin.
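Applied to the loop from the question, that might look like this (a sketch; the rest of the loop body is unchanged):

import os

start = os.path.abspath(".")  # e.g. /home/start/of/walk
for root, dirs, files in os.walk(start):
    if "input.file" in files:
        infile = os.path.join(root, "input.file")  # now an absolute path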