Replacing parts of a string containing directory paths using Python - python

I have a large string with potentially many paths in it resembling this structure:
dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn
and I need to replace everything before the a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn part of the string with "local/" such that the
result will look like this:
local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn
The string could contain more than just dirA/dirB/ at
the start of the string too.
How can I do this string manipulation in Python?

Using regular expressions, you can replace everything up to and including the last "/" with "local/":
import re
s = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
re.sub(r'.*(\/.*)',r'local\1',s)
and you obtain:
'local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn'

Use the os module, for example:
import os
path = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
# os.path.join uses the platform separator, so on Windows this prints local\...
print(os.path.join("local", os.path.basename(path)))

Another alternative is to split the string on "/" and then concatenate "local/" with the last element of the resulting list.
s = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
print("local/" + s.split("/")[-1])
#'local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn'

How does this look?
import os
inputstring = 'dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn'
filename = os.path.basename(inputstring)
localname = 'local'
os.path.join(localname, filename)
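The answers above handle a single path, while the question mentions a large string that may contain many paths. A minimal sketch for that case follows; it assumes paths use "/" as the separator and contain no whitespace, and the helper name relocate_paths is made up for illustration. Anything else in the text that contains a slash (a URL, for instance) would be rewritten too, so the pattern may need tightening for real data.

```python
import re

# Matches the directory part of a path: an optional leading "/" followed by
# one or more "name/" components. Assumes no whitespace inside paths.
_DIR_PART = re.compile(r'/?(?:[^\s/]+/)+')

def relocate_paths(text, prefix="local/"):
    """Replace the directory part of every path in text with prefix."""
    return _DIR_PART.sub(prefix, text)

text = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn and x/y/z/second-id"
print(relocate_paths(text))
# local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn and local/second-id
```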

Related

Python Pathlib escaping string stored in variable

I'm on windows and have an api response that includes a key value pair like
object = {'path':'/my_directory/my_subdirectory/file.txt'}
I'm trying to use pathlib to open and create that structure relative to the current working directory as well as a user supplied directory name in the current working directory like this:
output = "output_location"
path = pathlib.Path.cwd().joinpath(output,object['path'])
print(path)
What this gives me is this
c:\directory\my_subdirectory\file.txt
Whereas I'm looking for it to output something like:
'c:\current_working_directory\output_location\directory\my_subdirectory\file.txt'
The issue is because the object['path'] is a variable I'm not sure how to escape it as a raw string. And so I think the escapes are breaking it. I can't guarantee there will always be a leading slash in the object['path'] value so I don't want to simply trim the first character.
I was hoping there was an elegant way to do this using pathlib that didn't involve ugly string manipulation.
Try lstrip('/').
You want to remove the leading slash whenever it's there, because joining a rooted path onto another path makes pathlib discard everything that came before it (on Windows only the drive letter survives). Unlike slicing off the first character, lstrip('/') does nothing when there is no leading slash.
import pathlib
object = {'path': '/my_directory/my_subdirectory/file.txt'}
output = "output_location"
# lstrip('/') removes the leading '/' if present and is a no-op otherwise
path = pathlib.Path.cwd().joinpath(output, object['path'].lstrip('/'))
print(path)
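To see why the leading slash matters, here is a small sketch using PureWindowsPath, which reproduces Windows joining rules on any platform. The base directory is a stand-in for pathlib.Path.cwd(), and join_under is a hypothetical helper name, not part of pathlib.

```python
from pathlib import PureWindowsPath

def join_under(base, output, api_path):
    """Join api_path below base/output, treating it as relative."""
    # Strip leading slashes so the segment is not treated as rooted.
    return PureWindowsPath(base).joinpath(output, api_path.lstrip('/'))

base = r'c:\current_working_directory'  # stand-in for pathlib.Path.cwd()
api_path = '/my_directory/my_subdirectory/file.txt'

# Without stripping, the rooted segment discards everything except the drive.
print(PureWindowsPath(base).joinpath('output_location', api_path))
# c:\my_directory\my_subdirectory\file.txt

print(join_under(base, 'output_location', api_path))
# c:\current_working_directory\output_location\my_directory\my_subdirectory\file.txt
```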

renaming the filename with regex in python using re

I have a folder containing many files with names like this one:
_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001_ICGCDBDE20130916001.rsem.bam
I want to rename each file to just its last underscore-separated part, here ICGCDBDE20130916001.rsem.bam; that part differs from file to file. I thought of using a regular expression and came up with this pattern:
pat=r'_(.*)_(.*)_(.*)_(.*)_(.\w+)'
This splits the filename as desired, and I could rename the files using only the last group (pat[4]). I would like to do this in Python, since I am learning it for small tasks such as file renaming and want to move my workflows to Python over time, but I cannot get it to work. I would also like to know what the corresponding bash regex would be, since I am new to filenames this long. Below is my code; it does not rename anything yet and only checks whether the pattern matches. How should I change it to actually rename the files?
import re
import os
_src = "path/bam/test/"
_ext = ".rsem.bam"
endsWithNumber = re.compile(r'_(.*)_(.*)_(.*)_(.*)_(.\w+)'+(re.escape(_ext))+'$')
print(endsWithNumber)
for filename in os.listdir(_src):
    m = endsWithNumber.search(filename)
    print(m)
I would appreciate both in python and bash, however, I would prefer python for my own understanding and future learning.
You can use rpartition, which separates the part you want from the rest and returns a three-part tuple.
Given:
>>> fn
'_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001_ICGCDBDE20130916001.rsem.bam'
You can do:
>>> fn.rpartition('_')
('_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001', '_', 'ICGCDBDE20130916001.rsem.bam')
Then:
>>> _,sep,new_name=fn.rpartition('_')
>>> new_name
'ICGCDBDE20130916001.rsem.bam'
If you want to use a regex:
>>> re.search(r'_([^_]+$)', fn).group(1)
'ICGCDBDE20130916001.rsem.bam'
As a practical matter, you would test to see if there was a match before using group(1):
>>> m=re.search(r'_([^_]+$)', fn)
>>> new_name = m.group(1) if m else fn
For sed you can do:
$ echo "$fn" | sed -E 's/.*_([^_]*)$/\1/'
ICGCDBDE20130916001.rsem.bam
Or in Bash, same regex:
$ [[ $fn =~ _([^_]*)$ ]] && echo "${BASH_REMATCH[1]}"
ICGCDBDE20130916001.rsem.bam
You can use a list comprehension:
import re
import os
_src = "path/bam/test/"
filenames = os.listdir(_src)
matches = [re.search(r"[a-zA-Z0-9]+\.rsem\.bam", filename) for filename in filenames]
for old_name, match in zip(filenames, matches):
    if match is not None:
        # join with _src so the rename works from any working directory
        os.rename(os.path.join(_src, old_name), os.path.join(_src, match.group(0)))
Too much work.
newname = oldname.rsplit('_', 1)[1]
import os
fname = 'YOUR_FILENAME.avi'
fname1 = fname.split('.')
fname2 = str(fname1[0]) + '.mp4'
os.rename('path to your source file' + str(fname), 'path to your destination file' + str(fname2))
fname = fname2
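Putting the rpartition idea together with an actual rename, a minimal sketch might look like the following. The function name rename_to_last_token and the choice to skip files whose target name already exists are assumptions made for illustration; try it on a copy of the directory first.

```python
import os

def rename_to_last_token(src_dir, ext=".rsem.bam"):
    """Rename files ending in ext to the part after their last underscore."""
    renamed = []
    for old_name in os.listdir(src_dir):
        if not old_name.endswith(ext):
            continue
        _, sep, new_name = old_name.rpartition("_")
        if not sep:
            continue  # no underscore in the name, nothing to strip
        old_path = os.path.join(src_dir, old_name)
        new_path = os.path.join(src_dir, new_name)
        if os.path.exists(new_path):
            continue  # skip rather than overwrite an existing file
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

Calling rename_to_last_token("path/bam/test/") returns a list of (old, new) name pairs, which is handy for logging what was changed.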

Extract a name substring from a filename and store it in a variable in Python

I have a tar file whose name I am successfully able to read and store in a variable,
tarname = 'esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar'
But how do I extract just "Mona" from this file name and store it in a variable?
(The filename structure will be the same as above for all tar files, with the name occurring after "esarchive--" and before "-AB", so I need a solution that returns any name obeying this format.)
Thanks!
parse module is good for this kind of stuff. You may think of it as the inverse of str.format.
from parse import parse
pattern = 'esarchive--{Name}-AB-{otherstuff}.tar'
result = parse(pattern, tarname)
Demo:
>>> result = parse(pattern, tarname)
>>> result['Name']
'Mona'
>>> result.named
{'Name': 'Mona',
'otherstuff': 'Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4'}
Easiest way I can think of:
Split the filename on the - character.
Get the 3rd item from the resulting list (index 2); the double hyphen after "esarchive" produces an empty string at index 1.
In code:
tarname.split('-')[2]
Simple one-liner. This is of course working off your example. I would need more sample filenames to account for possible variations and know for certain if this will always work.
>>> import re
>>> tarname = "esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar"
>>> s = re.match(r"esarchive--(\w+)-AB", tarname).group(1)
>>> s
'Mona'
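If some filenames might not follow the format, or the name itself might contain hyphens, a slightly more defensive variant of the regex approach could look like this. The function name extract_name and the decision to return None on a mismatch are assumptions for illustration.

```python
import re

# Lazy match up to the first "-AB-" so that names containing hyphens still work.
_NAME = re.compile(r'^esarchive--(?P<name>.+?)-AB-')

def extract_name(tarname):
    """Return the name part of tarname, or None if the format does not match."""
    m = _NAME.match(tarname)
    return m.group('name') if m else None

tarname = 'esarchive--Mona-AB-Test226-8037affd-06d1-4c61-a91f-816ec9cb825f-05222017-4.tar'
print(extract_name(tarname))  # Mona
```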

How to get the substring from a String in python

I have a string path='/home/user/Desktop/My_file.xlsx'.
I want to extract the "My_file" substring. I am using Django framework for python.
I have tried to get it with:
re.search('/(.+?).xlsx', path).group(1)
but it returns the whole path again.
Can someone please help.
If you know that the file extension is always the same (e.g. ".xlsx"), I would suggest doing it this way:
import os
filename_full = os.path.basename(path)
filename = filename_full.split(".xlsx")[0]
Hope it helps
More generally:
import os
filename = os.path.basename(os.path.splitext(path)[0])
If you need to match the exact extension:
import re
# (?<=/) ensures that the match is preceded by /
# [^/]*\.xlsx$ matches anything but / followed by a literal .xlsx at the end
mo1 = re.search(r'(?<=/)[^/]*\.xlsx$', path).group(0)
print(mo1)
My_file.xlsx
otherwise:
path='/home/user/Desktop/My_file.xlsx'
with regex:
mo = re.search(r'(?<=/)([\w.]+$)',path)
print(mo.group(1))
My_file.xlsx
with rsplit:
my_file = path.rsplit('/', 1)[-1]
print(my_file)
My_file.xlsx
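Note that most of the snippets above return My_file.xlsx, while the question asks for My_file without the extension. The sketch below shows why the original attempt returned nearly the whole path, an anchored regex that avoids it, and the pathlib stem attribute as a regex-free option. PurePosixPath is used so the result is the same on any platform; the variable names are illustrative only.

```python
import re
from pathlib import PurePosixPath

path = '/home/user/Desktop/My_file.xlsx'

# The attempt from the question: the match starts at the FIRST "/", so even a
# lazy group has to span all the directories to reach ".xlsx".
attempt = re.search('/(.+?).xlsx', path).group(1)
print(attempt)  # home/user/Desktop/My_file

# Excluding "/" from the group and anchoring at the end keeps only the file name.
fixed = re.search(r'([^/]+)\.xlsx$', path).group(1)
print(fixed)  # My_file

# Without regex: stem is the final component minus its last suffix.
stem = PurePosixPath(path).stem
print(stem)  # My_file
```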

remove part of path

I have the following data:
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA
However, I don't want to have "Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/" in it; I want the path to be like:
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA
With regular expressions I basically want to match anything that holds *.torrent./, how can I accomplish this in regexp?
Thanks!
You don't even need regular expressions. You can use os.path.dirname and os.path.basename:
os.path.join(os.path.dirname(os.path.dirname(path)),
os.path.basename(path))
where path is the original path to the file.
Alternatively, you can also use os.path.split as follows:
dirname, filename = os.path.split(path)
os.path.join(os.path.dirname(dirname), filename)
Note This will work under the assumption that what you want to remove is the directory name that contains the file from the path as in the example in the question.
You can do this without using regexp (this works here only because the file name is the directory name minus ".torrent"):
>>> x = '/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA'
>>> x.rfind('.torrent')
66
>>> x[:x.rfind('.torrent')]
'/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA'
I basically want to match anything that holds *.torrent./, how can I accomplish this in regexp?
You can use:
[^/]*\.torrent/
Assuming the last . was a typo.
Given path='/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA'
You can do it with regular expression as
re.sub(r"/[^/]*\.torrent/", "/", path)
You can also do it without regex as
'/'.join(x for x in path.split("/") if x.find("torrent") == -1)
Your question is a bit vague and unclear, but here's one way how to strip off what you want:
import re
s = "/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA"
c = re.compile("(/.*/).*?torrent/(.*)")
m = re.match(c, s)
path = m.group(1)
file = m.group(2)
print(path + file)
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA
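For a variant that only touches the path when the containing directory really ends in ".torrent", pathlib can express the same idea as the os.path answers above. This is a sketch under that assumption; drop_torrent_dir is a hypothetical name, and PurePosixPath is used because the example path is POSIX-style.

```python
from pathlib import PurePosixPath

def drop_torrent_dir(path):
    """Remove the parent directory from path if its name ends in .torrent."""
    p = PurePosixPath(path)
    if p.parent.name.endswith('.torrent'):
        return str(p.parent.parent / p.name)
    return path  # leave other paths untouched

path = ('/share/Downloads/Videos/Movies/'
        'Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/'
        'Big.Buck.Bunny.720p.Bluray.x264-BLA')
print(drop_torrent_dir(path))
# /share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA
```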
