Let's say I have a string that is a root directory that has been entered
'C:/Users/Me/'
Then I use os.listdir() and join with it to create a list of subdirectories.
I end up with a list of strings that are like below:
'C:/Users/Me/Adir\Asubdir\'
and so on.
I want to split the subdirectories and capture each directory name as its own element. Below is one attempt. I am seemingly having issues with the \ and / characters. I assume \ is escaping, so '[\\/]' to me that says look for \ or / so then '[\\/]([\w\s]+)[\\/]' as a match pattern should look for any word between two slashes... but the output is only ['/Users/'] and nothing else is matched. So I then I add a escape for the forward slash.
'[\\\/]([\w\s]+)[\\\/]'
However, my output then only becomes ['Users','ADir'] so that is confusing the crud out of me.
My question is namely how do I tokenize each directory from a string using both \ and / but maybe also why is my RE not working as I expect?
Minimal Example:
import re, os
info = re.compile('[\\\/]([\w ]+)[\\\/]')
root = 'C:/Users/i12500198/Documents/Projects/'
def getFiles(wdir=os.getcwd()):
files = (os.path.join(wdir,file) for file in os.listdir(wdir)
if os.path.isfile(os.path.join(wdir,file)))
return list(files)
def getDirs(wdir=os.getcwd()):
dirs = (os.path.join(wdir,adir) for adir in os.listdir(wdir)
if os.path.isdir(os.path.join(wdir,adir)))
return list(dirs)
def walkSubdirs(root,below=[]):
subdirs = getDirs(root)
for aDir in subdirs:
below.append(aDir)
walkSubdirs(aDir,below)
return below
subdirs = walkSubdirs(root)
for aDir in subdirs:
files = getFiles(aDir)
for f in files:
finfo = info.findall(f)
print(f)
print(finfo)
I want to split the subdirectories and capture each directory name as its own element
Instead of regular expressions, I suggest you use one of Python's standard functions for parsing filesystem paths.
Here is one using pathlib:
from pathlib import Path
p = Path("C:/Users/Me/ADir\ASub Dir\2 x 2 Dir\\")
p.parts
#=> ('C:\\', 'Users', 'Me', 'ADir', 'ASub Dir\x02 x 2 Dir')
Note that the behaviour of pathlib.Path depends on the system running Python. Since I'm on a Linux machine, I actually used pathlib.PureWindowsPath here. I believe the output should be accurate for those of you on Windows.
I'm working on a program that needs to split and rejoin some file paths, and I'm not sure why os.path.join(*list) and os.path.sep.join(list) produce different results when there is a drive letter present in the separated path.
import os
path = 'C:\\Users\\choglan\\Desktop'
separatedPath = path.split(os.path.sep)
# ['C:', 'Users', 'choglan', 'Desktop']
path = os.path.sep.join(separatedPath)
# C:\\Users\\choglan\\Desktop
print(path)
path = os.path.join(*separatedPath)
# C:Users\\choglan\\Desktop
print(path)
Why does this happen? And should I just use os.path.sep.join(list) for my program even though os.path.join(*list) seems to be more commonly used?
os.path.join is not intended to be the inverse of path.split(os.path.sep). If you read the docs, you'll find a description of a much more complicated process than just sticking os.path.sep between the arguments. The most relevant part is the following:
On Windows... Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
You should probably be using pathlib.PurePath(path).parts rather than path.split(os.path.sep).
os.path.sep isn't an independent object with its own methods, it's a string. So the join method on it just pastes together the strings with that character between them.
>>> type(os.path.sep)
<class 'str'>
You can use join from any string.
>>> '|'.join(separatedPath)
'C:|Users|choglan|Desktop'
In Python what command should I use to get the name of the folder which contains the file I'm working with?
"C:\folder1\folder2\filename.xml"
Here "folder2" is what I want to get.
The only thing I've come up with is to use os.path.split twice:
folderName = os.path.split(os.path.split("C:\folder1\folder2\filename.xml")[0])[1]
Is there any better way to do it?
You can use dirname:
os.path.dirname(path)
Return the directory name of pathname path. This is the first element
of the pair returned by passing path to the function split().
And given the full path, then you can split normally to get the last portion of the path. For example, by using basename:
os.path.basename(path)
Return the base name of pathname path. This is the second element of
the pair returned by passing path to the function split(). Note that
the result of this function is different from the Unix basename
program; where basename for '/foo/bar/' returns 'bar', the basename()
function returns an empty string ('').
All together:
>>> import os
>>> path=os.path.dirname("C:/folder1/folder2/filename.xml")
>>> path
'C:/folder1/folder2'
>>> os.path.basename(path)
'folder2'
You are looking to use dirname. If you only want that one directory, you can use os.path.basename,
When put all together it looks like this:
os.path.basename(os.path.dirname('dir/sub_dir/other_sub_dir/file_name.txt'))
That should get you "other_sub_dir"
The following is not the ideal approach, but I originally proposed,using os.path.split, and simply get the last item. which would look like this:
os.path.split(os.path.dirname('dir/sub_dir/other_sub_dir/file_name.txt'))[-1]
this is pretty old, but if you are using Python 3.4 or above use PathLib.
# using OS
import os
path=os.path.dirname("C:/folder1/folder2/filename.xml")
print(path)
print(os.path.basename(path))
# using pathlib
import pathlib
path = pathlib.PurePath("C:/folder1/folder2/filename.xml")
print(path.parent)
print(path.parent.name)
os.path.dirname is what you are looking for -
os.path.dirname(r"C:\folder1\folder2\filename.xml")
Make sure you prepend r to the string so that its considered as a raw string.
Demo -
In [46]: os.path.dirname(r"C:\folder1\folder2\filename.xml")
Out[46]: 'C:\\folder1\\folder2'
If you just want folder2 , you can use os.path.basename with the above, Example -
os.path.basename(os.path.dirname(r"C:\folder1\folder2\filename.xml"))
Demo -
In [48]: os.path.basename(os.path.dirname(r"C:\folder1\folder2\filename.xml"))
Out[48]: 'folder2'
you can use pathlib
from pathlib import Path
Path(r"C:\folder1\folder2\filename.xml").parts[-2]
The output of the above was this:
'folder2'
You could get the full path as a string then split it into a list using your operating system's separator character.
Then you get the program name, folder name etc by accessing the elements from the end of the list using negative indices.
Like this:
import os
strPath = os.path.realpath(__file__)
print( f"Full Path :{strPath}" )
nmFolders = strPath.split( os.path.sep )
print( "List of Folders:", nmFolders )
print( f"Program Name :{nmFolders[-1]}" )
print( f"Folder Name :{nmFolders[-2]}" )
print( f"Folder Parent:{nmFolders[-3]}" )
The output of the above was this:
Full Path :C:\Users\terry\Documents\apps\environments\dev\app_02\app_02.py
List of Folders: ['C:', 'Users', 'terry', 'Documents', 'apps', 'environments', 'dev', 'app_02', 'app_02.py']
Program Name :app_02.py
Folder Name :app_02
Folder Parent:dev
I made an improvement on the solutions available, namely the snippet that works with all of,
File
Directory with a training slash
Directory without a training slash
My solution is,
from pathlib import Path
def path_lastname(s):
Path(s).with_name("foo").parts[-2]
Explanation
Path(s) - Creates a custom Path object out of s without resolving it.
.with_name("foo") - Adds a fake file foo to the path
.parts[-2] returns second last part of the string. -1 part will be foo
I'm using 2 ways to get the same response:
one of them use:
os.path.basename(filename)
due to errors that I found in my script I changed to:
Path = filename[:(len(filename)-len(os.path.basename(filename)))]
it's a workaround due to python's '\\'
When I run the following script:
c:\Program Files\foo\bar\scripy.py
How can I refer to directory 'foo'?
Is there a convenient way of using relative paths?
I've done it before with the string module, but there must be a better way (I couldn't find it in os.path).
The os.path module includes various functions for working with paths like this. The convention in most operating system is to use .. to go "up one level", so to get the outside directory you could do this:
import os
import os.path
current_dir = os.getcwd() # find the current directory
print current_dir # c:\Program Files\foo\bar\scripy.py
parent = os.path.join(current_dir, "..") # construct a path to its parent
print parent # c:\Program Files\foo\bar\..
normal_parent = os.path.normpath(parent) # "normalize" the path
print normal_parent # c:\Program Files\foo
# or on one line:
print os.path.normpath(os.path.join(os.getcwd(), ".."))
os.path.dirname(path)
Will return the second half of a SPLIT that is performed on the path parameter. (head - the directory and tail, the file) Put simply it returns the directory the path is in. You'll need to do it twice but this is probably the best way.
Python Docs on path functions:
http://docs.python.org/library/os.path#os.path.expanduser
I have recently started using the unipath library instead of os.path. Its object-oriented representations of paths are much simpler:
from unipath import Path
original = Path(__file__) # .absolute() # r'c:\Program Files\foo\bar\scripy.py'
target = original.parent.parent
print target # Path(u'c:\\Program Files\\foo')
Path is a subclass of str so you can use it with standard filesystem functions, but it also provides alternatives for many of them:
print target.isdir() # True
numbers_dir = target.child('numbers')
print numbers_dir.exists() # False
numbers_dir.mkdir()
print numbers_dir.exists() # True
for n in range(10):
file_path = numbers_dir.child('%s.txt' % (n,))
file_path.write_file("Hello world %s!\n" % (n,), 'wt')
This is a bit tricky. For instance, the following code:
import sys
import os
z = sys.argv[0]
p = os.path.dirname(z)
f = os.path.abspath(p)
print "argv[0]={0} , dirname={1} , abspath={2}\n".format(z,p,f)
gives this output on Windows
argv[0]=../zzz.py , dirname=.. , abspath=C:\Users\michael\Downloads
First of all, notice that argv has the slash which I typed in the command python ../zzz.py and the absolute path has the normal Windows backslashes. If you need to be cross platform you should probably refrain from putting regular slashes on Python command lines, and use os.sep to refer to the character that separated pathname components.
So far I have only partly answered your question. There are a couple of ways to use the value of f to get what you want. Brute force is to use something like:
targetpath = f + os.sep + ".." + os.sep + ".."
which would result in something like C:\Users\michael\Downloads\..\.. on Windows and /home/michael/../.. on Unix. Each .. goes back one step and is the equivalent of removing the pathname component.
But you could do better by breaking up the path:
target = f.split(os.sep)
targetpath = os.sep.join(target[:-2]
and rejoining all but the last two bits to get C:\Users on Windows and / on Unix. If you do that it might be a good idea to check that there are enough pathname components to remove.
Note that I ran the program above by typing python ../xxx.py. In other words I was not in the same working directory as the script, therefore getcwd() would not be useful.
The below code will not join, when debugged the command does not store the whole path but just the last entry.
os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/')
When I test this it only stores the /new_sandbox/ part of the code.
The latter strings shouldn't start with a slash. If they start with a slash, then they're considered an "absolute path" and everything before them is discarded.
Quoting the Python docs for os.path.join:
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
Note on Windows, the behaviour in relation to drive letters, which seems to have changed compared to earlier Python versions:
On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
The idea of os.path.join() is to make your program cross-platform (linux/windows/etc).
Even one slash ruins it.
So it only makes sense when being used with some kind of a reference point like
os.environ['HOME'] or os.path.dirname(__file__).
os.path.join() can be used in conjunction with os.path.sep to create an absolute rather than relative path.
os.path.join(os.path.sep, 'home','build','test','sandboxes',todaystr,'new_sandbox')
Do not use forward slashes at the beginning of path components, except when refering to the root directory:
os.path.join('/home/build/test/sandboxes', todaystr, 'new_sandbox')
see also: http://docs.python.org/library/os.path.html#os.path.join
To help understand why this surprising behavior isn't entirely terrible, consider an application which accepts a config file name as an argument:
config_root = "/etc/myapp.conf/"
file_name = os.path.join(config_root, sys.argv[1])
If the application is executed with:
$ myapp foo.conf
The config file /etc/myapp.conf/foo.conf will be used.
But consider what happens if the application is called with:
$ myapp /some/path/bar.conf
Then myapp should use the config file at /some/path/bar.conf (and not /etc/myapp.conf/some/path/bar.conf or similar).
It may not be great, but I believe this is the motivation for the absolute path behaviour.
It's because your '/new_sandbox/' begins with a / and thus is assumed to be relative to the root directory. Remove the leading /.
Try combo of split("/") and * for strings with existing joins.
import os
home = '/home/build/test/sandboxes/'
todaystr = '042118'
new = '/new_sandbox/'
os.path.join(*home.split("/"), todaystr, *new.split("/"))
How it works...
split("/") turns existing path into list: ['', 'home', 'build', 'test', 'sandboxes', '']
* in front of the list breaks out each item of list its own parameter
To make your function more portable, use it as such:
os.path.join(os.sep, 'home', 'build', 'test', 'sandboxes', todaystr, 'new_sandbox')
or
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
do it like this, without too the extra slashes
root="/home"
os.path.join(root,"build","test","sandboxes",todaystr,"new_sandbox")
Try with new_sandbox only
os.path.join('/home/build/test/sandboxes/', todaystr, 'new_sandbox')
os.path.join("a", *"/b".split(os.sep))
'a/b'
a fuller version:
import os
def join (p, f, sep = os.sep):
f = os.path.normpath(f)
if p == "":
return (f);
else:
p = os.path.normpath(p)
return (os.path.join(p, *f.split(os.sep)))
def test (p, f, sep = os.sep):
print("os.path.join({}, {}) => {}".format(p, f, os.path.join(p, f)))
print(" join({}, {}) => {}".format(p, f, join(p, f, sep)))
if __name__ == "__main__":
# /a/b/c for all
test("\\a\\b", "\\c", "\\") # optionally pass in the sep you are using locally
test("/a/b", "/c", "/")
test("/a/b", "c")
test("/a/b/", "c")
test("", "/c")
test("", "c")
Note that a similar issue can bite you if you use os.path.join() to include an extension that already includes a dot, which is what happens automatically when you use os.path.splitext(). In this example:
components = os.path.splitext(filename)
prefix = components[0]
extension = components[1]
return os.path.join("avatars", instance.username, prefix, extension)
Even though extension might be .jpg you end up with a folder named "foobar" rather than a file called "foobar.jpg". To prevent this you need to append the extension separately:
return os.path.join("avatars", instance.username, prefix) + extension
you can strip the '/':
>>> os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/'.strip('/'))
'/home/build/test/sandboxes/04122019/new_sandbox'
I'd recommend to strip from the second and the following strings the string os.path.sep, preventing them to be interpreted as absolute paths:
first_path_str = '/home/build/test/sandboxes/'
original_other_path_to_append_ls = [todaystr, '/new_sandbox/']
other_path_to_append_ls = [
i_path.strip(os.path.sep) for i_path in original_other_path_to_append_ls
]
output_path = os.path.join(first_path_str, *other_path_to_append_ls)
The problem is your laptop maybe running Window. And Window annoyingly use back lash instead of forward slash'/'. To make your program cross-platform (linux/windows/etc).
You shouldn't provide any slashes (forward or backward) in your path if you want os.path.join to handle them properly. you should using:
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
Or throw some Path(__file__).resolve().parent (path to parent of current file) or anything so that you don't use any slash inside os.path.join
Please refer following code snippet for understanding os.path.join(a, b)
a = '/home/user.name/foo/'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
OR
a = '/home/user.name/foo'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
But, when
a = '/home/user.name/foo/'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
OR
a = '/home/user.name/foo'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /home/user.name/foo/bar/file_name.extension