Split string using delimiter "\" in python [duplicate] - python

This question already has answers here:
Splitting path strings into drive, path and file name parts
(2 answers)
Closed 8 years ago.
I need to split the string using the delimiter "\".
The string can be in either of the following formats:
file://C:\Users\xyz\filename.txt
C:\Users\xyz\filename.txt
I need my script to give the output as "filename.txt"
I tried to use split('\\\\'), but it does not work. Which function is better to use?

Suppose your string is pathName; then you can use fileName = pathName.split('\\')[-1].
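A short sketch of why split('\\\\') finds nothing. Written in a normal string literal, '\\\\' is two backslash characters, while a path contains single backslashes. The path below is a raw string, so its backslashes are literal; the variable names are illustrative only.

```python
# Raw string literal: every backslash below is a single character.
path = r"C:\Users\xyz\filename.txt"

one = "\\"     # ONE backslash character
two = "\\\\"   # TWO backslash characters

# No double backslash occurs in the path, so split() returns the whole string.
no_match = path.split(two)

# Splitting on a single backslash and taking the last piece gives the file name.
file_name = path.split(one)[-1]

# The "file://" variant from the question behaves the same way.
url_like_name = r"file://C:\Users\xyz\filename.txt".split(one)[-1]

print(len(one), len(two))  # 1 2
print(file_name)           # filename.txt
```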

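Try the following steps. Note that a backslash inside a normal string literal must be doubled (or the literal written as a raw string); otherwise sequences such as \x are treated as escape sequences and can raise an error.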
>>> file = 'file://C:\\Users\\xyz\\filename.txt'
>>> file.split('\\')[-1]
'filename.txt'
>>> file = 'C:\\Users\\xyz\\filename.txt'
>>> file.split('\\')[-1]
'filename.txt'

Two issues here.
Path splitting
You'd normally use os.path.split to work with paths:
>>> import os.path
>>> p=r'C:\Users\xyz\filename.txt'
>>> head, tail = os.path.split(p)
>>> head
'C:\\Users\\xyz'
>>> tail
'filename.txt'
Caveat: os.path works with the path format of the operating system it's used on. If you know you specifically want to work with Windows paths (even when your program is run on Linux or OS X), then instead of os.path you'd use the ntpath module. See the note:
Note Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
macpath for old-style MacOS paths
os2emxpath for OS/2 EMX paths
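A minimal sketch of the ntpath route applied to the question's two inputs. ntpath applies Windows rules even when the script runs on Linux or macOS; treating the file:// variant as an ordinary Windows path is an assumption.

```python
import ntpath  # Windows path rules, importable on any OS

samples = [
    r"file://C:\Users\xyz\filename.txt",
    r"C:\Users\xyz\filename.txt",
]

for p in samples:
    head, tail = ntpath.split(p)  # split at the last separator
    print(repr(head), repr(tail))

# basename() returns just the tail part.
names = [ntpath.basename(p) for p in samples]
```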
Format support
You have 2 formats to support:
file://C:\Users\xyz\filename.txt
C:\Users\xyz\filename.txt
The second is a normal Windows path, and the first is... Frankly, I have no idea what that is. It kind of looks like a file URI, but uses Windows-style delimiters (backslashes). This is strange. When I open a PDF in Chrome on Windows, the URI looks different:
file:///C:/Users/kos/Downloads/something.pdf
and I'll assume that's the format you're interested in. If not, then I can't vouch for what you're dealing with, and you'll have to make an educated guess about how to interpret it (drop the file:// prefix and treat the rest as a Windows path?).
You can split a URI into meaningful parts using the urlparse module (urllib.parse in Python 3). Once you've extracted the path part of the URI, you can just .split('/') it (the URI grammar is simple enough to allow that). Here's what happens if you use this module (Python 2) on a file:// URI:
>>> import urlparse
>>> r = urlparse.urlparse(r'file:///C:/Users/xyz/filename.txt')
>>> r
ParseResult(scheme='file', netloc='', path='/C:/Users/xyz/filename.txt', params='', query='', fragment='')
>>> r.path
'/C:/Users/xyz/filename.txt'
>>> r.path.lstrip('/').split('/')
['C:', 'Users', 'xyz', 'filename.txt']
Please read this URI scheme description for a better idea of what this format looks like and why there are three slashes after file:.
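A Python 3 sketch that handles all three shapes discussed here. The helper name file_name_from is hypothetical. Treating the non-standard "file://C:\..." form as a Windows path behind a prefix follows the guess above and is an assumption. unquote() is added to decode percent-escapes such as %20 in real file URIs.

```python
import ntpath
from urllib.parse import unquote, urlparse


def file_name_from(location: str) -> str:
    """Hypothetical helper: return the last path component of the given location."""
    if location.startswith("file:///"):
        # Standard file URI: the path part uses forward slashes and may be percent-encoded.
        return unquote(urlparse(location).path).rsplit("/", 1)[-1]
    if location.startswith("file://"):
        # Assumption: "file://C:\..." is a Windows path with a prefix glued on.
        location = location[len("file://"):]
    # Plain Windows path, parsed with Windows rules on any OS.
    return ntpath.basename(location)


print(file_name_from("file:///C:/Users/xyz/filename.txt"))   # filename.txt
print(file_name_from(r"file://C:\Users\xyz\filename.txt"))   # filename.txt
print(file_name_from(r"C:\Users\xyz\filename.txt"))          # filename.txt
```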

Related

Is it possible to make pathlib to treat trailing slash in a Path as significant?

There have been multiple discussions about trailing slashes in pathlib.Path, in particular on Unix systems, such as https://bugs.python.org/issue21039 and https://bugs.python.org/issue39140.
Given a pathlib.Path constructed from a string, what would be the best way to make sure a trailing slash is preserved in the Path object, the same way the os module does?
>>> os.path.dirname("/a/b/")
'/a/b'
>>> os.path.dirname("/a/b")
'/a'
The os module understands the difference between "/a/b/" and "/a/b", but pathlib doesn't:
>>> Path("/a/b/").parent
PosixPath('/a')
Is there any way to be able to differentiate between paths that are pointing to a file (without a trailing slash) and to a directory (that has a trailing slash)? Or I'd have to switch to using os module in this particular case?
If it's not possible, what would be a reasonable workaround to take advantage of pathlib and deal with the trailing slash issue?
This looks like low-level path manipulation, so I would go with the os module (as suggested by the pathlib documentation).
This adds the trailing slash in an OS-independent way:
os.path.join(os.path.abspath("/a/b/"), "")
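If you want to keep pathlib for everything else, one possible workaround (a sketch, not a pathlib feature) is to record the trailing slash yourself when parsing the string and re-attach it with the same join(..., "") trick when a string is needed. Both helper names are hypothetical; the sketch uses PurePosixPath and posixpath so it behaves the same on any OS.

```python
import posixpath
from pathlib import PurePosixPath


def parse_keeping_slash(raw: str):
    """Hypothetical helper: pathlib drops a trailing slash, so return it as a separate flag."""
    return PurePosixPath(raw), raw.endswith("/")


def to_dir_string(p: PurePosixPath) -> str:
    """Re-attach the trailing separator, same idea as os.path.join(path, "")."""
    return posixpath.join(str(p), "")


p, is_dir = parse_keeping_slash("/a/b/")
print(p, is_dir)                              # /a/b True
print(to_dir_string(p))                       # /a/b/
print(posixpath.dirname(to_dir_string(p)))    # /a/b
```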

How to correctly decode window path in python

I have a question about correctly decoding a Windows path in Python. I tried several methods found online but didn't find a solution. I assigned the path (folder directory) to a variable and would like to read it as raw. However, the path contains '\' followed by digits, and Python doesn't read it correctly. Any suggestions? Thanks
fld_dic = 'D:TestData\20190917_DT19_HigherFlowRate_StdCooler\DM19_Data'
I would like to have:
r'D:TestData\20190917_DT19_HigherFlowRate_StdCooler\DM19_Data'
And I tried:
fr'{fld_dic}', but it gives me: 'D:TestData\x8190917_DT19_HigherFlowRate_StdCooler\\DM19_Data'
which is not what I want. Any idea how to turn an already-assigned variable that contains '\' followed by digits into a raw string?
Thanks
The root cause is the string assignment. When you assign a path like path='c:\202\data', Python treats \202 as an octal escape sequence while parsing the literal, so the backslash and the three digits become a single character before your code ever sees the string. Converting the variable afterwards (for example with fr'{...}') cannot bring the original text back. You need to change the assignment and write it as a raw string. Also, hard-coding paths like this is not best practice and will keep causing problems.
You should not handle paths as plain strings; doing so throws away Python's cross-platform advantage.
You should use pathlib or os.path; I recommend pathlib. It has pure Windows and POSIX path classes, so use a path object wherever you obtain a path. If you get the path from input, it is read as raw text (no escape processing happens), so you can convert it directly to a pathlib instance.
Check this link:
https://docs.python.org/3/library/pathlib.html
The following works, although it is not best practice: just assign the path as a raw string.
import os

def fcn(path=r'C:\202\data'):  # raw string keeps every backslash
    print(path)
    os.chdir(path)

fcn()
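A small sketch of what the escape actually does to the string, and how a raw literal plus PureWindowsPath avoids it. The sample values are illustrative; question_piece mirrors the digits from the question's folder name.

```python
from pathlib import PureWindowsPath

escaped = "\202"   # octal escape: ONE character, chr(0o202) == chr(130)
raw = r"\202"      # raw string: backslash, '2', '0', '2'

# What the question's literal turns into: "\201" is consumed as an octal
# escape (0o201 == 0x81), leaving the remaining digits behind it.
question_piece = "\20190917"

# A raw literal keeps the backslashes, and PureWindowsPath splits on them on any OS.
folder = PureWindowsPath(r"C:\202\data")

print(len(escaped), ord(escaped))  # 1 130
print(len(raw))                    # 4
print(repr(question_piece))        # '\x8190917'
print(folder.parts)                # ('C:\\', '202', 'data')
```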

Convert WindowsPath to PosixPath

I am using pathlib to manage my paths in my Python project using the Path class.
When I am using Linux, everything works fine. But on Windows, I have a little issue.
At some point in my code, I have to write a JavaScript file which lists the references to several other files. These paths have to be written in POSIX format, but when I do str(my_path_instance) on Windows, the path is written in Windows format.
Do you know a simple way to convert a WindowsPath to a PosixPath with pathlib?
pathlib has an as_posix method to convert from Windows to POSIX paths:
pathlib.Path(r'foo\bar').as_posix()
Apart from this, you can generally construct system-specific paths by calling the appropriate constructor. The documentation states that
You cannot instantiate a WindowsPath when running on Unix, but you can instantiate PureWindowsPath. [or vice versa]
So use the Pure* class constructor:
str(pathlib.PurePosixPath(your_posix_path))
However, this won't do what you want if your_posix_path contains backslashes, since \ (the Windows path separator) is just a regular character as far as POSIX is concerned. So a\b is a valid POSIX filename, not a path denoting a file b inside a directory a, and PurePosixPath will preserve this interpretation:
>>> str(pathlib.PurePosixPath(r'a\b'))
'a\\b'
To convert Windows to POSIX paths, use the PureWindowsPath class and convert via as_posix:
>>> pathlib.PureWindowsPath(r'a\b').as_posix()
'a/b'
With pathlib, if you want to manipulate Windows paths on a Unix machine (or vice versa), note that you cannot instantiate a WindowsPath when running on Unix, but you can instantiate PureWindowsPath or PurePosixPath.
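A sketch of the JavaScript use case from the question: normalise each path with PureWindowsPath(...).as_posix() before writing it out. The helper name js_import_list and the output shape (a single `const files = [...]` line) are assumptions for illustration; json.dumps is used only to quote the strings safely.

```python
import json
from pathlib import PureWindowsPath


def js_import_list(paths):
    """Hypothetical helper: build a JS array literal of POSIX-style paths.

    Accepts strings or Windows path objects; PureWindowsPath parses
    backslashes as separators regardless of the OS running the script.
    """
    posix_paths = [PureWindowsPath(p).as_posix() for p in paths]
    return "const files = " + json.dumps(posix_paths) + ";"


js_line = js_import_list([r"static\js\app.js", r"static\css\site.css"])
print(js_line)  # const files = ["static/js/app.js", "static/css/site.css"];
```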

Why is glob ignoring some directories?

I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') returns an empty list, even though the given directories contain such files. This is especially true if path is all lower-case or numeric.
As a minimal example I have two folders a and A on my C: drive both holding one Test.txt file.
import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')
yields
files1 = []
files2 = ['C:\\A\\Test.txt']
If this is by design, is there any other directory name that leads to such unexpected behaviour?
(I'm working on Windows 7 with Python 2.7.10, 32-bit.)
EDIT: (2019) Added an answer for Python 3 using pathlib.
The problem is that \a has a special meaning in string literals (bell char).
Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").
Python is different from C because when you use backslash with a character that doesn't have a special meaning (e.g. "\s") Python keeps both the backslash and the letter (in C instead you would get just the "s").
This sometimes hides the issue, because things just work anyway even with a single backslash (depending on the first letter of the directory name)...
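A minimal sketch showing exactly what the three spellings of the folder part of the pattern contain. Only valid escapes are used, so the snippet runs without warnings.

```python
bad = "C:\a"        # "\a" is the bell escape, so this is 'C:' + chr(7)
doubled = "C:\\a"   # escaped backslash: 'C', ':', '\', 'a'
raw = r"C:\a"       # raw string: identical to `doubled`

print(bad == "C:" + chr(7))    # True
print(len(bad), len(doubled))  # 3 4
print(doubled == raw)          # True
```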
I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:
import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')
Notice the r at the beginning of the string.
As already mentioned, the \a is a special character in Python. Here's a link to a list of Python's string literals:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
My original answer attracted more views than expected and some time has passed, so I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It was written with Python 3 on Windows 10, but should also work on *nix systems.
from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))
# -> [WindowsPath('C:/a/Test.txt')]
I like this solution better, as I can copy and paste paths directly from windows explorer, without the need to add or double backslashes etc.
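A self-contained variant of the pathlib approach that can be run anywhere: it builds a throwaway folder "a" with a Test.txt in a temporary directory instead of using C:\a, which is an assumption made only so the sketch is reproducible. The helper name list_txt is hypothetical.

```python
import tempfile
from pathlib import Path


def list_txt(folder):
    """Return the *.txt files directly inside `folder`, sorted for stable output."""
    return sorted(Path(folder).glob("*.txt"))


# Throwaway folder "a" holding Test.txt plus a file that should not match.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "a"
    folder.mkdir()
    (folder / "Test.txt").write_text("hello")
    (folder / "notes.log").write_text("ignored")
    names = [p.name for p in list_txt(folder)]

print(names)  # ['Test.txt']
```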

how to simplify use of pathlib objects to work with functions looking for strings

I quite like using pathlib for path management in Python, but the drawback of using this package is that a lot of commands, like shutil.copy, shutil.move and the builtin open, require a string and not a PosixPath object, giving the error
TypeError: coercing to Unicode: need string or buffer, PosixPath found
The logical solution is of course to use str().
My question is how would it be possible (if it would be) to modify pathlib objects such that a call like open(pathlib.PosixPath) would work without the use of str().
The answer by @Navith is what you should do in Python 3.4. However, PEP 519 was proposed and accepted for Python 3.6 to address this valid concern:
This PEP proposes a protocol for classes which represent a file system path to be able to provide a str or bytes representation. Changes to Python's standard library are also proposed to utilize this protocol where appropriate to facilitate the use of path objects where historically only str and/or bytes file system paths are accepted.
So from Python 3.6 on, the standard library functions you refer to accept Path objects, and the answer to your question is to use Python 3.6 or later.
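A sketch of the PEP 519 protocol in action on Python 3.6+: open() and shutil.copy take Path objects directly, and any object that implements __fspath__ is accepted too (os.fspath() is the function that performs the conversion). The ProjectFile class is hypothetical and exists only to show a custom path-like type.

```python
import os
import shutil
import tempfile
from pathlib import Path


class ProjectFile:
    """Hypothetical path-like class: defining __fspath__ is all PEP 519 requires."""

    def __init__(self, root, name):
        self._path = Path(root) / name

    def __fspath__(self):
        return str(self._path)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    with open(src, "w", encoding="utf-8") as f:   # builtin open() accepts a Path
        f.write("hello")
    shutil.copy(src, Path(tmp) / "b.txt")         # so does shutil.copy
    custom = ProjectFile(tmp, "b.txt")
    with open(custom, encoding="utf-8") as f:     # and any __fspath__ object
        copied = f.read()
    fs = os.fspath(custom)                        # explicit conversion to str

print(copied)  # hello
```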
Path objects have open, rmdir, chmod, ... methods that work the way you'd expect.
>>> import pathlib
>>> a_path = pathlib.Path("a.txt")
>>> a_txt = a_path.open("w", encoding="UTF-8")
>>> a_txt
<_io.TextIOWrapper name='a.txt' mode='w' encoding='UTF-8'>
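The same idea as a complete sketch: Path.open() used as a context manager so the file gets closed, plus the whole-file helpers read_text() and unlink(). The file is created in a temporary directory (an assumption, to avoid touching the working directory).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    a_path = Path(tmp) / "a.txt"
    # Path.open() mirrors the builtin open(); the with-block closes the file.
    with a_path.open("w", encoding="UTF-8") as a_txt:
        a_txt.write("hello\n")
    # Whole-file convenience method.
    text = a_path.read_text(encoding="UTF-8")
    a_path.unlink()                 # delete the file through the Path object
    still_there = a_path.exists()

print(repr(text), still_there)  # 'hello\n' False
```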
