How would I delete everything after a certain character of a string in Python? For example, I have a string containing a file path and some extra characters. How would I delete everything after .zip? I've tried rsplit and split, but neither included the .zip when deleting the extra characters.
Any suggestions?
Just take the first portion of the split, and add '.zip' back:
s = 'test.zip.zyz'
s = s.split('.zip', 1)[0] + '.zip'
Alternatively you could use slicing, here is a solution where you don't need to add '.zip' back to the result (the 4 comes from len('.zip')):
s = s[:s.index('.zip')+4]
Or another alternative with regular expressions:
import re
s = re.match(r'^.*?\.zip', s).group(0)
You can use str.partition:
>>> s='abc.zip.blech'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.zip'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.py'
>>> ''.join(s.partition('.zip')[0:2])
'abc.py'
Use slices:
s = 'test.zip.xyz'
s[:s.index('.zip') + len('.zip')]
=> 'test.zip'
And it's easy to pack the above in a little helper function:
def removeAfter(string, suffix):
    # Keep everything up to and including the first occurrence of suffix
    return string[:string.index(suffix) + len(suffix)]
removeAfter('test.zip.xyz', '.zip')
=> 'test.zip'
I think it's easy to create a simple lambda function for this.
mystrip = lambda s, ss: s[:s.index(ss) + len(ss)]
Can be used like this:
mystr = "this should stay.zipand this should be removed."
mystrip(mystr, ".zip") # 'this should stay.zip'
You can use the re module:
import re
re.sub(r'\.zip.*', '.zip', 'test.zip.blah')
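Note that the str.index-based versions above raise ValueError when the marker is missing, while str.partition does not. A minimal sketch of a helper built on str.partition (the name truncate_after is hypothetical, not taken from any of the answers) that returns the input unchanged in that case:

```python
def truncate_after(text, marker):
    """Return text up to and including the first occurrence of marker.

    If marker does not occur, text is returned unchanged.
    """
    # partition returns (head, sep, tail); sep is '' when marker is absent
    head, sep, _tail = text.partition(marker)
    return head + sep


print(truncate_after('test.zip.xyz', '.zip'))  # test.zip
print(truncate_after('abc.py', '.zip'))        # abc.py
```

Whether a missing marker should be silently ignored or reported as an error depends on the caller; the slicing versions are the better fit if you want the exception.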
I want to match x.py from a/b/c/x.py, when I use re:
s = 'a/b/c/x.py'
res = re.search('/(.*.py)?', s).group(1)
>>> res = b/c/x.py
This is not what I need. Any ideas?
You don't need regex, just use str.rsplit, with maxsplit=1, and take the last item:
>>> s.rsplit('/',1)[-1]
'x.py'
When you want to extract a filename from a path, you should use os.path.split. It splits the path into a (head, tail) pair using the path conventions of the OS you are running on. Here, tail is the last path component and head is everything leading up to it.
import os
path = 'a/b/c/x.py'
res = os.path.split(path)
print(res[1])
You can also use normpath and os.sep for this solution:
import os
path = 'a/b/c/x.py'
path = os.path.normpath(path)
res = path.split(os.sep)
print(res[-1])
You can use rsplit, as @ThePyGuy said, to avoid unnecessary splits by changing the line to:
res = path.rsplit(os.sep,1)
If you need to ensure that the element follows a / in the path, you can prepend (?<=\/), a positive lookbehind:
>>> s = 'a/b/c/x.py'
>>> el = re.search(r"(?<=\/)(\w+\.py)", s).group(1)
>>> el
'x.py'
Otherwise, if you also need to match a bare filename.py, you need to remove the lookbehind:
>>> s2 = 'file.py'
>>> el = re.search(r"(\w+\.py)", s2).group(1)
>>> el
'file.py'
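The two patterns above can arguably be merged into one. A sketch, assuming file names consist only of word characters plus the .py extension (the names LAST_PY and last_py are made up for illustration): the alternation (?:^|/) accepts either the start of the string or a slash before the name, and $ anchors the match to the end of the string.

```python
import re

# Either start-of-string or a slash, then the name, then end-of-string
LAST_PY = re.compile(r'(?:^|/)(\w+\.py)$')


def last_py(path):
    """Return the trailing *.py name of path, or None if there is none."""
    m = LAST_PY.search(path)
    return m.group(1) if m else None


print(last_py('a/b/c/x.py'))  # x.py
print(last_py('file.py'))     # file.py
print(last_py('a/b/c'))       # None
```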
I prefer a splitting approach here:
s = 'a/b/c/x.py'
last = s.split('/')[-1]
print(last) # x.py
import re
s = 'a/b/c/x.py'
res = re.search(r'\w*\.py', s).group() # \w matches letters, digits and underscore
# res = re.search(r'[\w&.-]+$', s).group()
# The above regex will match alphanumeric and the given special characters
EDIT
To match everything after the last / you can use the following regex:
res = re.search('[^/]+$', s).group()
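For completeness, the standard library also has ready-made functions for "last path component". A small sketch, assuming the path always uses forward slashes; posixpath and PurePosixPath are used here (rather than os.path and Path) so that / is treated as the separator regardless of the OS the script runs on:

```python
import posixpath
from pathlib import PurePosixPath

s = 'a/b/c/x.py'

# basename returns the tail that os.path.split would return
name_from_basename = posixpath.basename(s)

# .name is the final component of a pure (filesystem-independent) path
name_from_pathlib = PurePosixPath(s).name

print(name_from_basename)  # x.py
print(name_from_pathlib)   # x.py
```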
Example strings:
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
myString2 = "/desktop/51232132561/Screenshots/photo_3501.png"
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
myString4 = "/desktop/5234324324/Screenshots/photo_11501.png"
I had a look around, and couldn't really figure out a proper way to do this. I want to be able to also retrieve the last numbers of my strings after the photo_ part, and store them in another variable (string, not int or float). Furthermore, I don't need the number before /Screenshots. It would also be nice if it can work for any number length. The photo_ will always remain inside the string too.
You can write a regex that only matches the end of the string
>>> import re
>>> myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
>>> re.search(r"photo_(\d+)\.png$", myString1).group(1)
'0000'
This calls for a regex solution:
import re
mystring = "/desktop/2512754353/Screenshots/photo_0000.png"
your_value = re.findall(r'(photo_[0-9]+)', mystring)[0]
print(your_value) # photo_0000
Using regex:
import re
data = [
"/desktop/2512754353/Screenshots/photo_0000.png",
"/desktop/51232132561/Screenshots/photo_3501.png",
"/desktop/12321516123/Screenshots/photo_7501.png",
"/desktop/5234324324/Screenshots/photo_11501.png"
]
id_regex = re.compile(r".+photo_(\d+)\.png")
ids = [int(id_regex.match(d).groups()[0]) for d in data]
print(ids) # [0, 3501, 7501, 11501]
A simple way to do it with the split function:
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
first_split = myString1.split('photo_')
number = first_split[1].split('.')[0]
print(number)
Another way, using regex:
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
number = re.findall(r'\d+', myString1)[1]
print(number)
Proper way would be to use pathlib in conjunction with re.
import re
from pathlib import Path
pattern = re.compile(r"(?<=photo_)[0-9]*")
pattern.search(Path(myString1).name).group(0)
> '0000'
Try this:
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
x=re.findall(r'photo_\d+',myString1.split("/")[-1])
print(x)
Another way using inbuilt string functions would be to slice the string between "photo" and ".png":
>>> strings = [myString1, myString2, myString3, myString4]
>>> [s[s.rfind("photo")+6:s.rfind(".png")] for s in strings]
['0000', '3501', '7501', '11501']
I suggest:
import re
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
s = re.findall(r'_\d+', myString3)[0]  # '_7501'
int(s[1:])  # drop the leading underscore
output: 7501
You can use pathlib and split:
from pathlib import Path
fn="/desktop/5234324324/Screenshots/photo_11501.png"
Path(fn).stem.split('_')[-1]
# 11501
The pathlib property .stem is the name of the path stripped of the path to it and the extension:
>>> Path(fn).stem
'photo_11501'
Then either split or partition on the '_' delimiter:
>>> Path(fn).stem.partition('_')
('photo', '_', '11501')
>>> Path(fn).stem.split('_')
['photo', '11501']
You can use split or partition entirely on strings that represent paths:
>>> fn.partition('.png')[0].partition('_')[-1]
'11501'
But using pathlib allows you to produce those paths as the result of a glob or other method and is likely more robust and certainly more cross platform.
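Putting the pathlib and regex ideas together, here is a sketch that runs over all four example strings and keeps the numbers as strings, as the question asks. The names PHOTO_RE and photo_number are hypothetical; PurePosixPath is an assumption that the paths always use forward slashes.

```python
import re
from pathlib import PurePosixPath

# Digits after "photo_" at the very end of the stem (name without extension)
PHOTO_RE = re.compile(r'photo_(\d+)$')


def photo_number(path):
    """Return the digits after 'photo_' as a string, or None if absent."""
    stem = PurePosixPath(path).stem  # e.g. 'photo_0000'
    m = PHOTO_RE.search(stem)
    return m.group(1) if m else None


paths = [
    "/desktop/2512754353/Screenshots/photo_0000.png",
    "/desktop/51232132561/Screenshots/photo_3501.png",
    "/desktop/12321516123/Screenshots/photo_7501.png",
    "/desktop/5234324324/Screenshots/photo_11501.png",
]
numbers = [photo_number(p) for p in paths]
print(numbers)  # ['0000', '3501', '7501', '11501']
```

Returning None instead of raising keeps the list comprehension usable when some paths do not follow the photo_<digits> naming; filter the None values out afterwards if needed.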
how to split the below string after 2nd occurrence of '/' from the end:
/u01/dbms/orcl/product/11.2.0.4/db_home
Expected output is :
/u01/dbms/orcl/product/
Thanks.
Do not use split, use rsplit instead! It's much simpler and faster.
s = '/u01/dbms/orcl/product/11.2.0.4/db_home'
result = s.rsplit('/', 2)[0] + '/'
string = "/u01/dbms/orcl/product/11.2.0.4/db_home"
split_string = string.split('/')
expected_output = "/".join(split_string[:-2]) + "/"
You're also free to change "-2" to minus however many trailing path components you need clipped.
If you can parse it as a filepath, I recommend pathlib, try:
from pathlib import Path
p = Path('/u01/dbms/orcl/product/11.2.0.4/db_home')
p.parent.parent # Returns a Path object for /u01/dbms/orcl/product (no trailing slash)
path = '/u01/dbms/orcl/product/11.2.0.4/db_home'  # avoid shadowing the built-in input()
output = '/'.join(path.split('/')[:-2]) + '/'
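The rsplit approach generalizes to any number of trailing components. A minimal sketch (the function name drop_last_components is made up here), assuming the path contains at least n separators; the pathlib equivalent via .parents is shown for comparison and, unlike the string version, yields no trailing slash.

```python
from pathlib import PurePosixPath


def drop_last_components(path, n, sep='/'):
    """Remove the last n sep-delimited components, keeping a trailing sep.

    Assumes path contains at least n separators.
    """
    # rsplit splits from the right at most n times; [0] is what remains
    return path.rsplit(sep, n)[0] + sep


s = '/u01/dbms/orcl/product/11.2.0.4/db_home'
trimmed = drop_last_components(s, 2)
print(trimmed)  # /u01/dbms/orcl/product/

# parents[0] is the direct parent, parents[1] the grandparent, and so on
grandparent = str(PurePosixPath(s).parents[1])
print(grandparent)  # /u01/dbms/orcl/product
```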
f_output.write('\n{}, {}\n'.format(filename, summary))
I am printing the output as the name of the file. I am getting the output as VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I am only interested in printing ARW, CZC and so on. Can someone please tell me how to split this text?
filename.split('_')[-1].split('.')[0]
this will give you : 'ARW'
summary.split('_')[-1].split('.')[0]
and this will give you: 'CZC'
If you are only interested in CZC and ARW without the .log, then you can do it with the re.search method:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or better, compile your pattern p once, then call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>>
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
Also, it might be helpful to use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you cannot or do not want to use the re module, you can instead compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>>
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
But keep in mind that the above is valid as long as you have those common strings in all of your string variables.
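That caveat can be checked explicitly before slicing. A sketch, assuming the fixed prefix and suffix from the question (the names PREFIX, SUFFIX and short_name are hypothetical): names that do not have the expected shape yield None instead of a silently wrong slice.

```python
PREFIX = 'VCALogParser_output_'
SUFFIX = '.log'


def short_name(name):
    """Return the part between PREFIX and SUFFIX, or None if name lacks either."""
    if name.startswith(PREFIX) and name.endswith(SUFFIX):
        return name[len(PREFIX):-len(SUFFIX)]
    return None


print(short_name('VCALogParser_output_ARW.log'))  # ARW
print(short_name('VCALogParser_output_CZC.log'))  # CZC
print(short_name('unrelated.txt'))                # None
```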
fname.split('_')[-1]
is rough, but it will give you 'CZC.log', 'ARW.log' and so on, assuming that all files have the same underscore-delimited format.
If the format of the file is always such that it ends with _ARW.log or _CZC.log this is really easy to do just using the standard string split() method, with two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module:
import os
shortname = os.path.splitext(filename)[0].split("_")[-1]
You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'
To parse the file name, my guess is that you want to strip the file extension .log and the prefix VCALogParser_output_. For that, str.replace is enough; there is no need for str.split.
Use os.linesep when writing to the file to get platform-specific line endings (note that for files opened in text mode, the Python documentation recommends writing '\n' instead, since it is translated automatically).
The code below produces the desired result after applying the steps listed above:
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
.format(fname=fname, summary=summary, linesep=linesep))
# or if vars in execution scope strictly controlled pass locals() into format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))
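One caveat with str.replace is that it removes the substring wherever it occurs, not only at the ends. A sketch of an alternative using str.removeprefix and str.removesuffix, which assumes Python 3.9 or newer; the second file name is an invented example chosen to show the difference.

```python
PREFIX = 'VCALogParser_output_'
SUFFIX = '.log'

filename = 'VCALogParser_output_SOME_NAME.log'
fname = filename.removeprefix(PREFIX).removesuffix(SUFFIX)
print(fname)  # SOME_NAME

# Invented name where '.log' also appears in the middle
tricky = 'VCALogParser_output_my.log_file.log'
with_replace = tricky.replace(PREFIX, '').replace(SUFFIX, '')
with_remove = tricky.removeprefix(PREFIX).removesuffix(SUFFIX)
print(with_replace)  # my_file
print(with_remove)   # my.log_file
```

For file names that always follow the VCALogParser_output_<NAME>.log shape the two approaches give the same result.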
Suppose I have a string , text2='C:\Users\Sony\Desktop\f.html', and I want to separate "C:\Users\Sony\Desktop" and "f.html" and store them in different variables then what should I do ? I tried out regular expressions but I wasn't successful.
os.path.split does what you want:
>>> import os
>>> help(os.path.split)
Help on function split in module ntpath:
split(p)
Split a pathname.
Return tuple (head, tail) where tail is everything after the final slash.
Either part may be empty.
>>> os.path.split(r'c:\users\sony\desktop\f.html')
('c:\\users\\sony\\desktop', 'f.html')
>>> path,filename = os.path.split(r'c:\users\sony\desktop\f.html')
>>> path
'c:\\users\\sony\\desktop'
>>> filename
'f.html'
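Two side notes, offered as a sketch rather than a definitive answer. First, the literal in the question needs the r prefix: without it, Python 3 rejects '\U...' as an invalid Unicode escape, and '\f' would become a form feed. Second, os.path.split only understands backslashes when running on Windows; ntpath and pathlib.PureWindowsPath apply Windows rules on any OS. The variable names directory and filename are chosen here for illustration.

```python
import ntpath
from pathlib import PureWindowsPath

# Raw string so the backslashes are kept literally
text2 = r'C:\Users\Sony\Desktop\f.html'

# ntpath is the Windows flavour of os.path and is importable everywhere
head, tail = ntpath.split(text2)

# The same split with pathlib, independent of the host OS
p = PureWindowsPath(text2)
directory = str(p.parent)
filename = p.name

print(directory)  # C:\Users\Sony\Desktop
print(filename)   # f.html
```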