How to get the substring from a String in python

I have a string path='/home/user/Desktop/My_file.xlsx'.
I want to extract the "My_file" substring. I am using the Django framework for Python.
I have tried to get it with:
re.search('/(.+?).xlsx', path).group(1)
but it returns the whole path again.
Can someone please help?

If you know that the file extension is always the same (e.g. ".xlsx"), I would suggest doing it this way:
import os
filename_full = os.path.basename(path)
filename = filename_full.split(".xlsx")[0]
Hope it helps

More generally:
import os
filename = os.path.basename(os.path.splitext(path)[0])
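The same idea can also be written with pathlib from the standard library. This is only a minimal sketch, not something the answers above used; PurePosixPath is chosen here so that the result does not depend on the operating system, and the variable names are illustrative.

```python
from pathlib import PurePosixPath

path = '/home/user/Desktop/My_file.xlsx'

# .name is the last path component; .stem is that name without its final suffix
p = PurePosixPath(path)
filename_full = p.name
filename = p.stem
print(filename_full, filename)
```

Note that .stem removes only the last suffix, so a name such as "archive.tar.gz" would keep its ".tar" part.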

If you need to match the exact extension:
import re
path = '/home/user/Desktop/My_file.xlsx'
# (?<=/) ensures that the match is preceded by /
# [^/]*\.xlsx matches anything but / followed by a literal .xlsx
mo1 = re.search(r'(?<=/)[^/]*\.xlsx', path).group(0)
print(mo1)
My_file.xlsx
otherwise:
with regex:
mo = re.search(r'(?<=/)([\w.]+$)', path)
print(mo.group(1))
My_file.xlsx
with rsplit:
my_file = path.rsplit('/', 1)[-1]
print(my_file)
My_file.xlsx
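As for why the pattern in the question returns almost the whole path: re.search starts matching at the first "/", and the "." in ".+?" is allowed to match further "/" characters, so the group keeps growing until ".xlsx" is reached. Below is a sketch of a pattern that captures only the last component without its extension, assuming the extension is always ".xlsx". The names from_question and fixed are illustrative.

```python
import re

path = '/home/user/Desktop/My_file.xlsx'

# Pattern from the question: the group may span several "/" separators
from_question = re.search('/(.+?).xlsx', path).group(1)

# [^/]+ cannot cross a "/", \. is a literal dot, $ anchors at the end of the string
fixed = re.search(r'/([^/]+)\.xlsx$', path).group(1)

print(from_question)
print(fixed)
```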

Related

Python: filename matching using regex

I have written this code
import os
from datetime import datetime
import re
now = datetime.now()
filename = now.strftime("%Y%m%d%H%M") #For example 202006191839
for fname in os.listdir(downloadPath):
    if re.match('export_' + filename + '[0-9]{2}.xlsx', fname):
        print(fname)
In downloadPath I have these files
export_20200619183900.xlsx
export_20200619183921.xlsx
export_20200619183930.xlsx
But re.match does not match as desired.
However, if I replace
filename = now.strftime("%Y%m%d%H%M")
with a simple assignment
filename = "202006191839"
the code works.
The problem is that I need the data to be dynamic.
Can anyone help me?

I think it is because you are matching 'export_' + filename, but you said the file was excel_20200619183900

OK.
I solved the problem; I was probably blind.
The file I search for is downloaded just before the code above runs. Despite the file being very small, the search starts before the download has finished.
I added a simple time.sleep(2) before the search.
Thanks to all.
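A fixed sleep can be too short on a slow download and needlessly long on a fast one. One possible alternative, sketched here under stated assumptions, is to poll the directory until a matching name appears or a timeout expires. The function name wait_for_match and its parameters are hypothetical and are not part of the original code. The sketch only checks that the name exists, not that the download is complete.

```python
import os
import re
import time


def wait_for_match(directory, pattern, timeout=10.0, interval=0.5):
    """Poll directory until a file name fully matches pattern or timeout expires.

    Returns the sorted list of matching names (empty if none appeared in time).
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(f for f in os.listdir(directory) if regex.fullmatch(f))
        if matches or time.monotonic() >= deadline:
            return matches
        time.sleep(interval)
```

With the names from the question, a call could look like wait_for_match(downloadPath, 'export_' + filename + r'[0-9]{2}\.xlsx').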

Replacing parts of a string containing directory paths using Python

I have a large string with potentially many paths in it resembling this structure:
dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn
and I need to replace everything before the a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn part of the string with "local/" such that the result will look like this:
local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn
The string could contain more than just dirA/dirB/ at the start of the string too.
How can I do this string manipulation in Python?

Using regular expressions, you can replace everything up to and including the last "/" with "local/":
import re
s = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
re.sub(r'.*(\/.*)',r'local\1',s)
and you obtain:
'local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn'

Use the os module, for example:
import os
path = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
print(os.path.join("local", os.path.basename(path)))

Another alternative is to split the string on "/" and then concatenate "local/" with the last element of the resulting list.
s = "dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn"
print("local/" + s.split("/")[-1])
# local/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn

How does this look?
import os
inputstring = 'dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn'
filename = os.path.basename(inputstring)
localname = 'local'
os.path.join(localname, filename)
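The question mentions a large string that may contain many paths, while the answers above handle one path at a time. Below is a minimal sketch for the multi-path case, assuming the paths are relative, use "/" as separator, and contain no whitespace. The function name localize_paths and the sample text are illustrative.

```python
import re


def localize_paths(text, prefix='local/'):
    """Replace the directory part of every slash-separated path in text."""
    # (?:[^\s/]+/)+ matches one or more "segment/" runs, i.e. everything
    # up to and including the last "/" of each whitespace-delimited path.
    # A function is used as replacement so prefix is inserted literally.
    return re.sub(r'(?:[^\s/]+/)+', lambda m: prefix, text)


text = "copy dirA/dirB/a1ed4f3b-a046-4fbf-bb70-0774bd7bfcn to x/y/z/other-id"
print(localize_paths(text))
```

Words without a "/" are left untouched, because the pattern requires at least one complete "segment/" run.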

renaming the filename with regex in python using re

I have a folder that contains many files with names like the following example:
_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001_ICGCDBDE20130916001.rsem.bam
I want to rename them so that only the last part remains, here ICGCDBDE20130916001.rsem.bam; this part differs from file to file. The new name should be the string ending in .rsem.bam that follows the last "_". All files in the directory should be renamed this way. I am thinking of using a regular expression, so I came up with the pattern below:
pat=r'_(.*)_(.*)_(.*)_(.*)_(.\w+)'
This splits my filename as desired, and I could rename the files by using a variable in which I keep only pat[4]. I want to use Python because I am learning it, starting with small tasks such as file renaming and later converting my workflows to Python, but I am unable to get this working. How should I make this work in Python? I am also unsure what the corresponding Bash regex would be, since the filename is long and I am new to this. Below is my code, which does not rename anything yet and only checks whether the pattern matches. How should I change it to actually rename the files?
import re
import os
_src = "path/bam/test/"
_ext = ".rsem.bam"
endsWithNumber = re.compile(r'_(.*)_(.*)_(.*)_(.*)_(.\w+)'+(re.escape(_ext))+'$')
print(endsWithNumber)
for filename in os.listdir(_src):
    m = endsWithNumber.search(filename)
    print(m)
I would appreciate answers in both Python and Bash; however, I would prefer Python for my own understanding and future learning.

You can use rpartition, which separates the part you want from the rest into a three-part tuple.
Given:
>>> fn
'_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001_ICGCDBDE20130916001.rsem.bam'
You can do:
>>> fn.rpartition('_')
('_EGAZ00001018697_2014_ICGC_130906_D81P8DQ1_0153_C2704ACXX.nopd.AOCS_001', '_', 'ICGCDBDE20130916001.rsem.bam')
Then:
>>> _,sep,new_name=fn.rpartition('_')
>>> new_name
'ICGCDBDE20130916001.rsem.bam'
If you want to use a regex:
>>> re.search(r'_([^_]+$)', fn).group(1)
'ICGCDBDE20130916001.rsem.bam'
As a practical matter, you would test to see if there was a match before using group(1):
>>> m=re.search(r'_([^_]+$)', fn)
>>> new_name = m.group(1) if m else fn
For sed you can do:
$ echo "$fn" | sed -E 's/.*_([^_]*)$/\1/'
ICGCDBDE20130916001.rsem.bam
Or in Bash, same regex:
$ [[ $fn =~ _([^_]*)$ ]] && echo "${BASH_REMATCH[1]}"
ICGCDBDE20130916001.rsem.bam

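You can use a list comprehension:
import re
import os
_src = "path/bam/test/"
names = os.listdir(_src)
# Match the trailing "<id>.rsem.bam" part of each filename (None if absent)
new_s = [re.search(r"[a-zA-Z0-9]+\.rsem\.bam$", filename) for filename in names]
for first, second in zip(names, new_s):
    if second is not None:
        # os.listdir returns bare names, so join them with the directory
        os.rename(os.path.join(_src, first), os.path.join(_src, second.group(0)))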

Too much work.
newname = oldname.rsplit('_', 1)[1]
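To turn the one-liner into an actual rename over a directory, something along these lines may work. It is a sketch under assumptions: the helper names short_name and rename_all, the ext parameter, and the dry_run flag are hypothetical and not from the answers above. It uses rpartition so that names without an underscore are left unchanged instead of raising an IndexError.

```python
import os


def short_name(filename):
    """Return the part after the last underscore, or the name unchanged."""
    return filename.rpartition('_')[2]


def rename_all(src_dir, ext='.rsem.bam', dry_run=True):
    """Rename every file ending in ext to the part after its last underscore.

    Returns the list of (old, new) pairs; nothing is renamed while dry_run is True.
    """
    planned = []
    for old in sorted(os.listdir(src_dir)):
        new = short_name(old)
        if not old.endswith(ext) or new == old:
            continue
        planned.append((old, new))
        if not dry_run:
            os.rename(os.path.join(src_dir, old), os.path.join(src_dir, new))
    return planned
```

Running it once with dry_run=True and inspecting the returned pairs is a cheap way to spot two files that would end up with the same short name before anything is overwritten.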

import os
fname = 'YOUR_FILENAME.avi'
# splitext removes only the last extension, so dots inside the name are safe
base, old_ext = os.path.splitext(fname)
fname2 = base + '.mp4'
os.rename('path to your source file' + fname, 'path to your destination file' + fname2)
fname = fname2

Python and regex - how to find anytext_NUMBER_svm.pkl

I have file names that are in this format:
anytext_NUMBER_svm.pkl
I need to loop through all files in a directory and find the ones in that format. The directory looks like this:
file1.txt
file2.txt
anytext_1_svm.pkl
anytext_2_svm.pkl
anytext_3_svm.pkl
The matched files will be this:
anytext_1_svm.pkl
anytext_2_svm.pkl
anytext_3_svm.pkl
How do I use a Python regex to do this?

An option that:
doesn't use re
makes sure the comparison is only on the filename part - not part of a path
restricts the number of filename patterns to validate further using iglob
Code:
from glob import iglob
import os.path
for fname in iglob('*_*_svm.pkl'):
    path, name = os.path.split(fname)
    anytext, digit, rest = name.split('_', 2)
    if digit.isdigit(): # add criteria for anytext if required...
        # ....

This regex should solve your problem:
>>> import re
>>> regex = re.compile(r'.+_\d+_svm\.pkl')
>>> regex.search('anytext_1_svm.pkl') is not None
True
But you should definitely take a look at the documentation: http://docs.python.org/library/re.html

I would suggest a review of this page:
http://docs.python.org/py3k/library/re.html#module-re
It will help you understand how to write regular expressions and ensure that you are matching things properly. For the number, use [0-9]+ (one or more digits), use _ to separate your groups, and add a small conditional that checks for a match; this will be a quick project.

import glob
# Note: [0-9] matches exactly one digit, so anytext_10_svm.pkl would not be found
file_list = glob.glob('anytext_[0-9]_svm.pkl')

regex to catch "anytext_NUMBER_svm.pkl" is very simple.
r'.+_\d+_svm\.pkl'
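A short sketch of how this pattern could be applied to a list of names, for instance the result of os.listdir. The names SVM_PICKLE and matching_files are illustrative. fullmatch is used instead of search so that names with extra trailing text, such as a ".bak" suffix, are rejected.

```python
import re

SVM_PICKLE = re.compile(r'.+_\d+_svm\.pkl')


def matching_files(names):
    """Keep only the names of the form anytext_NUMBER_svm.pkl."""
    return [name for name in names if SVM_PICKLE.fullmatch(name)]


names = ['file1.txt', 'file2.txt',
         'anytext_1_svm.pkl', 'anytext_2_svm.pkl', 'anytext_3_svm.pkl']
print(matching_files(names))
```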

remove part of path

I have the following data:
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA
However, I don't want to have "Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/" in it; I want the path to be like:
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA
With regular expressions I basically want to match anything that holds *.torrent./. How can I accomplish this with a regexp?
Thanks!

You don't even need regular expressions. You can use os.path.dirname and os.path.basename:
os.path.join(os.path.dirname(os.path.dirname(path)),
os.path.basename(path))
where path is the original path to the file.
Alternatively, you can also use os.path.split as follows:
dirname, filename = os.path.split(path)
os.path.join(os.path.dirname(dirname), filename)
Note: this works under the assumption that what you want to remove is the name of the directory that contains the file, as in the example in the question.
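The os.path.split variant can be wrapped in a small function that only drops the parent directory when it really ends in ".torrent". This is a sketch rather than part of the answer above: the function name drop_torrent_dir is hypothetical, and posixpath is used in place of os.path so that "/" is the separator on every platform.

```python
import posixpath


def drop_torrent_dir(path):
    """Remove the parent directory from path if its name ends in '.torrent'."""
    parent, name = posixpath.split(path)
    if parent.endswith('.torrent'):
        # Go one level further up and re-attach the file name
        return posixpath.join(posixpath.dirname(parent), name)
    return path


path = ('/share/Downloads/Videos/Movies/'
        'Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/'
        'Big.Buck.Bunny.720p.Bluray.x264-BLA')
print(drop_torrent_dir(path))
```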

You can do this without using regexp:
>>> x = '/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA'
>>> x.rfind('.torrent')
66
>>> x[:x.rfind('.torrent')]
'/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA'

I basically want to match anything that holds *.torrent./, how can I accomplish this in regexp?
You can use:
[^/]*\.torrent/
Assuming the last . was a typo.

Given path='/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA'
You can do it with a regular expression. The pattern consumes the "/" on both sides of the directory name, so one "/" has to be put back:
re.sub(r"/[^/]*\.torrent/", "/", path)
You can also do it without a regex:
'/'.join(x for x in path.split("/") if not x.endswith(".torrent"))

Your question is a bit vague, but here's one way to strip off what you want:
import re
s = "/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA.torrent/Big.Buck.Bunny.720p.Bluray.x264-BLA"
c = re.compile("(/.*/).*?torrent/(.*)")
m = re.match(c, s)
path = m.group(1)
file = m.group(2)
print(path + file)
/share/Downloads/Videos/Movies/Big.Buck.Bunny.720p.Bluray.x264-BLA
