Python simple string method - python

I basically want to get only the part of my string that falls before the ".".
For example, if my filename is sciPHOTOf105w0.fits, I want to get "sciPHOTOf105w0" as its own string so that I can use it to name a new file that's related to it. How do you do this? I can't just use numeric indices (e.g. if my file name is stored in file, file[5:10]). I need to be able to collect everything up to the dot without having to count, because the file names can be of different lengths.

You can also use os.path like so:
>>> from os.path import splitext
>>> splitext('sciPHOTOf105w0.fits') # separates extension from file name
('sciPHOTOf105w0', '.fits')
>>> splitext('sciPHOTOf105w0.fits')[0]
'sciPHOTOf105w0'
If your file happens to have a longer path, this approach will also account for your full path.
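To tie this back to the original goal of naming a related file, here is a minimal sketch. The helper name `related_name` and the suffixes `_processed` and `_mask` are made up for illustration; only `os.path.splitext` comes from the answer above.

```python
import os.path

def related_name(path, suffix, new_ext=None):
    """Build a new file name from `path` by inserting `suffix` before the extension.

    `related_name` is a hypothetical helper, not a standard-library function.
    """
    root, ext = os.path.splitext(path)  # splits at the last dot of the final component
    return root + suffix + (ext if new_ext is None else new_ext)

print(related_name("sciPHOTOf105w0.fits", "_processed"))
print(related_name("data/sciPHOTOf105w0.fits", "_mask", ".txt"))
```

Because `splitext` only looks at the last path component, any directory part of the input is carried over unchanged.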

import os.path
filename = "sciPHOTOf105w0.fits"
root, ext = os.path.splitext(filename)
print("root is: %s" % root)
print("ext is: %s" % ext)
result:
>root is: sciPHOTOf105w0
>ext is: .fits

In [33]: filename = "sciPHOTOf105w0.fits"
In [34]: filename.rpartition('.')[0]
Out[34]: 'sciPHOTOf105w0'
In [35]: filename.rsplit('.', 1)[0]
Out[35]: 'sciPHOTOf105w0'

You can use .index() on a string to find the first occurrence of a substring.
>>> filename = "sciPHOTOf105w0.fits"
>>> filename.index('.')
14
>>> filename[:filename.index('.')]
'sciPHOTOf105w0'
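Note that `.index()` raises `ValueError` when the string has no dot, and it stops at the first dot rather than the last. A small sketch of the differences, using only built-in string methods (the sample names are invented):

```python
name = "archive.tar.gz"   # invented example with two dots
no_dot = "README"         # invented example with no dot

# partition splits at the FIRST dot and never raises
first = name.partition(".")[0]
# rsplit with maxsplit=1 splits at the LAST dot
last = name.rsplit(".", 1)[0]
# with no dot, partition returns the whole string unchanged
whole = no_dot.partition(".")[0]

try:
    no_dot.index(".")
    raised = False
except ValueError:
    raised = True  # index() raises when the substring is absent

print(first, last, whole, raised)
```

Which dot to split on depends on whether names like `archive.tar.gz` can occur in your data.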

Related

Remove file name and extension from path and just keep path

I have a string that is dynamic and can take any of the following forms:
'new/file.csv'
'new/mainfolder/file.csv'
'new/mainfolder/subfolder/file.csv'
'new/mainfolder/subfolder/secondsubfolder/file.csv'
In every case I want to split the string into two parts: the path and the file name. The path should not include the file name.
Expected result:
'new'
'new/mainfolder'
'new/mainfolder/subfolder'
'new/mainfolder/subfolder/secondsubfolder'
So far I have tried several things, including:
path = 'new/mainfolder/file.csv'
final_path = path.split('/', 1)[-1]
I also tried rstrip(), but nothing has worked so far.
You can use pathlib for this.
For example,
>>> import pathlib
>>> path = pathlib.Path('new/mainfolder/file.csv')
>>> path.name
'file.csv'
>>> str(path.parent)
'new/mainfolder'
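Applied to all four example strings, the same idea might look like the sketch below. It uses `PurePosixPath` so the forward-slash output is the same on every operating system; that choice is an assumption about the desired result, not part of the answer above.

```python
from pathlib import PurePosixPath

paths = [
    "new/file.csv",
    "new/mainfolder/file.csv",
    "new/mainfolder/subfolder/file.csv",
    "new/mainfolder/subfolder/secondsubfolder/file.csv",
]

# (directory, file name) for each input string
pairs = [(str(PurePosixPath(p).parent), PurePosixPath(p).name) for p in paths]

for directory, name in pairs:
    print(directory, name)
```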
paths = ['new/file.csv',
         'new/mainfolder/file.csv',
         'new/mainfolder/subfolder/file.csv',
         'new/mainfolder/subfolder/secondsubfolder/file.csv']
output = []
for p in paths:
    parts = p.split("/")[:-1]  # drop the file name
    output.append("/".join(parts))
print(output)
Output:
['new', 'new/mainfolder', 'new/mainfolder/subfolder', 'new/mainfolder/subfolder/secondsubfolder']
An alternative to pathlib is os.path:
import os
fullPath = 'new/mainfolder/file.csv'
parent = os.path.dirname(fullPath) # get path only
file = os.path.basename(fullPath) # get file name
print (parent)
Output:
new/mainfolder
path.dirname:
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split(). Source: https://docs.python.org/3.3/library/os.path.html#os.path.dirname
path.basename:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar', the basename() function returns an empty string ('').
Source: https://docs.python.org/3.3/library/os.path.html#os.path.basename
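The relationship between `split()`, `dirname()` and `basename()` described in those excerpts can be checked directly. This sketch uses `posixpath` (the POSIX flavour of `os.path`) so the results do not depend on the operating system:

```python
import posixpath  # the POSIX implementation behind os.path on Unix-like systems

full_path = "new/mainfolder/file.csv"
head, tail = posixpath.split(full_path)

# dirname() and basename() are the two halves of split()
same_head = head == posixpath.dirname(full_path)
same_tail = tail == posixpath.basename(full_path)
print(head, tail, same_head, same_tail)

# A trailing slash leaves basename() empty, unlike the Unix basename program
trailing = posixpath.basename("/foo/bar/")
print(repr(trailing))
```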
You almost had it; just use rsplit, like this:
path = 'new/mainfolder/file.csv'
file_path, file_name = path.rsplit('/', 1)
print(f'{file_path=}\n{file_name=}')
Output:
file_path='new/mainfolder'
file_name='file.csv'

Python removes parts at address [duplicate]

In python, suppose I have a path like this:
/folderA/folderB/folderC/folderD/
How can I get just the folderD part?
Use os.path.normpath, then os.path.basename:
>>> os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
The first strips off any trailing slashes, the second gives you the last part of the path. Using only basename gives everything after the last slash, which in this case is ''.
With python 3 you can use the pathlib module (pathlib.PurePath for example):
>>> import pathlib
>>> path = pathlib.PurePath('/folderA/folderB/folderC/folderD/')
>>> path.name
'folderD'
If you want the last folder name where a file is located:
>>> path = pathlib.PurePath('/folderA/folderB/folderC/folderD/file.py')
>>> path.parent.name
'folderD'
You could do
>>> import os
>>> os.path.basename('/folderA/folderB/folderC/folderD')
'folderD'
UPDATE1: If you give this approach a file path such as /folderA/folderB/folderC/folderD/xx.py, it returns xx.py as the basename, which is probably not what you want. So you could check that the path is a directory first:
>>> import os
>>> path = "/folderA/folderB/folderC/folderD"
>>> if os.path.isdir(path):
...     dirname = os.path.basename(path)
...
UPDATE2: As lars pointed out, this version accommodates a trailing '/':
>>> from os.path import normpath, basename
>>> basename(normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
Here is my approach:
>>> import os
>>> print(os.path.basename(os.path.dirname('/folderA/folderB/folderC/folderD/test.py')))
folderD
>>> print(os.path.basename(os.path.dirname('/folderA/folderB/folderC/folderD/')))
folderD
>>> print(os.path.basename(os.path.dirname('/folderA/folderB/folderC/folderD')))
folderC
I was searching for a way to get the name of the last folder containing a file, and I just used split twice to get the right part. It's not exactly the question, but Google sent me here.
import os
pathname = "/folderA/folderB/folderC/folderD/filename.py"
head, tail = os.path.split(os.path.split(pathname)[0])
print(head + " " + tail)
I like the parts method of Path for this:
grandparent_directory, parent_directory, filename = Path(export_filename).parts[-3:]
log.info(f'{t: <30}: {num_rows: >7} Rows exported to {grandparent_directory}/{parent_directory}/{filename}')
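The snippet above depends on names from the answerer's own program (`export_filename`, `log`, `t`, `num_rows`). A self-contained sketch of the same `parts` unpacking, with an invented path and `PurePosixPath` chosen so the result is the same on any system:

```python
from pathlib import PurePosixPath

export_filename = "/folderA/folderB/folderC/folderD/file.py"  # invented example path

# parts is a tuple of path components; the last three are
# grandparent directory, parent directory and file name
grandparent, parent, filename = PurePosixPath(export_filename).parts[-3:]
print(grandparent, parent, filename)
```

The three-way unpacking raises `ValueError` if the path has fewer than three components.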
If you use the standard-library pathlib module, it's really simple.
>>> from pathlib import Path
>>> your_path = Path("/folderA/folderB/folderC/folderD/")
>>> your_path.stem
'folderD'
Suppose you have the path to a file in folderD.
>>> from pathlib import Path
>>> your_path = Path("/folderA/folderB/folderC/folderD/file.txt")
>>> your_path.name
'file.txt'
>>> your_path.parent.name
'folderD'
In my current projects I often pass the trailing parts of a path to a function, so I use pathlib's Path. To get the n-th part counting from the end, I use:
from typing import Union
from pathlib import Path
def get_single_subpath_part(base_dir: Union[Path, str], n: int) -> str:
    base_dir = Path(base_dir)
    # walk up n levels; n == 0 leaves the path unchanged
    for _ in range(n):
        base_dir = base_dir.parent
    return base_dir.name
path = "/folderA/folderB/folderC/folderD/"
# the last part:
print(get_single_subpath_part(path, 0))
# yields "folderD"
# the second-to-last part:
print(get_single_subpath_part(path, 1))
# yields "folderC"
Furthermore, to get the last n+1 parts of a path as a single relative path, I use:
from typing import Union
from pathlib import Path
def get_n_last_subparts_path(base_dir: Union[Path, str], n: int) -> Path:
    return Path(*Path(base_dir).parts[-n-1:])
path = "/folderA/folderB/folderC/folderD/"
# the last part:
print(get_n_last_subparts_path(path, 0))
# yields a `Path` object of "folderD"
# the second-to-last and last parts together:
print(get_n_last_subparts_path(path, 1))
# yields a `Path` object of "folderC/folderD"
Note that this function returns a Path object, which can easily be converted to a string (e.g. str(path)).
path = "/folderA/folderB/folderC/folderD/"
# strip the trailing slash first, otherwise the last element is ''
last = path.rstrip('/').split('/').pop()
path = "/folderA/folderB/folderC/folderD/"
# with a trailing slash the last element is '', so take the one before it
print(path.split("/")[-2])

How to append the grand-parent folder name to the filename?

I have a directory that contains multiple folders, and within each folder a file sits inside another folder. The structure is shown below:
C:\users\TPCL\New\20190919_xz.txt
C:\users\TPCH\New\20190919_abc.txt
Objective:
I want to rename the file names like below:
C:\users\TPCL\New\20190919_xz_TPCL.txt
C:\users\TPCH\New\20190919_abc_TPCH.txt
My Approach:
for root, dirs, filename in os.walk(r'C:\users\TPCL\New'):
    prefix = os.path.basename(root)
    for f in filename:
        os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(f, prefix)))
The above approach is yielding the following result:
C:\users\TPCL\New\20190919_xz_New.txt
C:\users\TPCH\New\20190919_abc_New.txt
So the question is: how do I append the grandparent folder name instead of the parent folder name?
You need to use both dirname and basename to do this.
Use os.path.dirname to get the directory name (everything except the last part), and
then use os.path.basename to get the last part of that path.
Replace
prefix = os.path.basename(root)
with
prefix = os.path.basename(os.path.dirname(root))
See the documentation:
https://docs.python.org/3.7/library/os.path.html#os.path.basename
https://docs.python.org/3.7/library/os.path.html#os.path.dirname
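Putting that replacement into the loop from the question might look like the sketch below. It only collects the planned renames so they can be inspected first, and it splits off the extension so the prefix lands before `.txt`. `plan_renames` is a hypothetical helper name, not part of `os`.

```python
import os

def plan_renames(top):
    """Return (old_path, new_path) pairs; nothing is renamed here.

    `plan_renames` is a hypothetical helper name used for illustration.
    """
    pairs = []
    for root, dirs, filenames in os.walk(top):
        # root is the folder holding the files, so the folder above it
        # is the grandparent of each file
        prefix = os.path.basename(os.path.dirname(root))
        for f in filenames:
            stem, ext = os.path.splitext(f)
            new_name = "{}_{}{}".format(stem, prefix, ext)
            pairs.append((os.path.join(root, f), os.path.join(root, new_name)))
    return pairs

# To apply the plan after inspecting it:
# for old, new in plan_renames(r"C:\users\TPCL\New"):
#     os.rename(old, new)
```

Note that `os.walk` also descends into subfolders of `top`, so files found deeper would get the name of their own grandparent folder.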
Using PurePath from pathlib you can get the parts of the path. If the path contains the filename its grand-parent folder will be at index -3.
In [23]: from pathlib import PurePath
In [24]: p = r'C:\users\TPCL\New\20190919_xz_TPCL.txt'
In [25]: g = PurePath(p)
In [26]: g.parts
Out[26]: ('C:\\', 'users', 'TPCL', 'New', '20190919_xz_TPCL.txt')
In [27]: g.parts[-3]
Out[27]: 'TPCL'
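The output above is what you get on Windows. On other systems `PurePath` does not treat backslashes as separators, so the whole string becomes a single part. A sketch that parses Windows-style paths on any platform, assuming the paths are always Windows-style:

```python
from pathlib import PureWindowsPath

p = r"C:\users\TPCL\New\20190919_xz.txt"
g = PureWindowsPath(p)  # parses drive letters and backslashes on any OS

grandparent = g.parts[-3]  # same value as g.parent.parent.name
# build the new file name next to the original one
new_name = g.with_name("{}_{}{}".format(g.stem, grandparent, g.suffix))
print(grandparent)
print(new_name)
```

`PureWindowsPath` only manipulates the string; it never touches the file system, so the rename itself would still be done with `os.rename` or `Path.rename`.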
If the path does not contain the filename, the grandparent would be at index -2.
Your process would look something like this:
import os.path
from pathlib import PurePath
for root, dirs, fnames in os.walk(topdirectory):
    #print(root)
    try:
        addition = PurePath(root).parts[-2]
        for name in fnames:
            n, ext = os.path.splitext(name)
            newname = n + '_' + addition + ext
            print(name, os.path.join(root, newname))
    except IndexError:
        pass
I added the try/except to filter out paths that don't have grandparents; it isn't necessary if you know every path has one.
You can split the path string on '\' and then count back to whichever component is the grandparent
directory of a given file, and then append it. For example, if you have
filename = r"dir1\dir2\file.txt"
splitPaths = filename.split('\\')  # gives you ['dir1', 'dir2', 'file.txt']
The last entry is the file name, the second to last is the parent, the third to last is the grandparent, and so on. You can then append whichever directory you want to the end of the string.

How to get part of filename into a variable?

I have a lot of .csv files and I'd like to parse the file names.
The file names are in this format:
name.surname.csv
How can I write a function that populates two variables with the components of the file name?
A = name
B = surname
Use str.split and unpack the result in A, B and another "anonymous" variable to store (and ignore) the extension.
filename = 'name.surname.csv'
A, B, _ = filename.split('.')
Try this: the name is split on '.' and stored in A and B (C holds the extension).
a="name.surname.csv"
A,B,C=a.split('.')
Of course, this assumes that your file name is in the form first.second.csv
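If some names might not follow that form, the three-way unpacking raises `ValueError`. A hedged sketch that checks the shape first; the function name `parse_name` and the choice to return `None` for non-matching names are assumptions for illustration:

```python
def parse_name(filename):
    """Return (name, surname) for 'name.surname.csv', or None if the shape differs."""
    parts = filename.split(".")
    if len(parts) != 3 or parts[2].lower() != "csv":
        return None
    return parts[0], parts[1]

print(parse_name("john.doe.csv"))
print(parse_name("john.csv"))
```

Returning `None` lets the caller skip or log unexpected files instead of stopping on the first one.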
If the file names always have the exact same form, with exactly two periods, then you can do:
>>> name, surname, ext = "john.doe.csv".split(".")
>>> name
'john'
>>> surname
'doe'
>>> ext
'csv'
>>>
Simply use the str.split() method in a function like this:
def split_names(filename: str):
    parts = filename.split(".")
    return parts[0], parts[1]
A, B = split_names("name.surname.csv")
First find all the files in your directory with the extension '.csv', then split each name on '.':
import os
for file in os.listdir("/mydir"):
    if file.endswith(".csv"):
        # print the file name
        print(os.path.join("/mydir", file))
        # split the file name by '.'
        name, surname, ext = file.split(".")
        # print or append or whatever you will do with the result here
If the file is saved at a specific location on the system, you first have to extract just the file name:
# if filename is already just name.surname.csv, skip the next two lines
filename = "C://CSVFolder//name.surname.csv"
absfilename = filename.split('//')[-1]
# unpack the three parts
A,B,ext = absfilename.split('.')
Otherwise you can just write
A,B,ext = "name.surname.csv".split('.')
print(A, B, ext)
Happy coding :)

Python split url to find image name and extension

I am looking for a way to extract a filename and extension from a particular URL using Python.
Let's say a URL looks as follows:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
How would I go about getting the following?
filename = "da4ca3509a7b11e19e4a12313813ffc0_7"
file_ext = ".jpg"
try:
    # Python 3
    from urllib.parse import urlparse
except ImportError:
    # Python 2
    from urlparse import urlparse
from os.path import splitext, basename
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
disassembled = urlparse(picture_page)
filename, file_ext = splitext(basename(disassembled.path))
Note that basename is what strips the leading / from the URL path; without it, the filename would keep a preceding /, which you would have to remove yourself.
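Parsing the URL first also keeps query strings and fragments out of the file name, which plain string splitting would not. A Python 3 sketch; the function name `url_file_parts` and the second URL are invented for illustration:

```python
from posixpath import basename, splitext  # URL paths always use forward slashes
from urllib.parse import urlsplit

def url_file_parts(url):
    """Return (filename, extension) for the last path segment of `url`."""
    path = urlsplit(url).path  # drops scheme, host, query and fragment
    return splitext(basename(path))

print(url_file_parts("http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"))
print(url_file_parts("http://example.com/img/photo.large.png?size=2#top"))
```

Only the last dot is treated as the extension separator, so a name like `photo.large.png` keeps `photo.large` as the file name.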
Try urlsplit (urllib.parse.urlsplit in Python 3, urlparse.urlsplit in Python 2) to split the URL, then os.path.basename to keep only the last path component and os.path.splitext to separate the file name from the extension:
from urllib.parse import urlsplit
import os.path
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
print(os.path.splitext(os.path.basename(urlsplit(picture_page).path)))
Output:
('da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')
filename = picture_page.split('/')[-1].split('.')[0]
file_ext = '.'+picture_page.split('.')[-1]
# Here's your link:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
#Here's your filename and ext:
filename, ext = (picture_page.split('/')[-1].split('.'))
When you call picture_page.split('/'), it returns a list of the strings in your URL that were separated by /.
Index -1 gives you the last element of that list.
In your case, that is the file name: da4ca3509a7b11e19e4a12313813ffc0_7.jpg
Splitting that on the delimiter ., you get two values:
da4ca3509a7b11e19e4a12313813ffc0_7 and jpg, as expected, because they are separated by the period you used as the delimiter in your split() call.
Since the last split returns exactly two values, you can unpack the resulting list into two variables.
So the result is equivalent to:
filename,ext = ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
os.path.splitext will help you extract the filename and extension once you have extracted the relevant string from the URL using urlparse:
fName, ext = os.path.splitext('yourImage.jpg')
Here is one way to find the image name and extension using a regular expression.
import re
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
# raw string so that \w and \. reach the regex engine unchanged
regex = re.compile(r'(.*/(?P<name>\w+)\.(?P<ext>\w+))')
match = regex.search(picture_page)
print(match.group('name'))
print(match.group('ext'))
>>> import re
>>> s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'
>>> re.findall(r'\/([a-zA-Z0-9_]*)\.[a-zA-Z]*\"$',s)[0]
'da4ca3509a7b11e19e4a12313813ffc0_7'
>>> re.findall(r'([a-zA-Z]*)\"$',s)[0]
'jpg'
