Finding last char appearance in string - python

If I have this input:
/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff
and I want to find the last time the character "/" appeared and get the string BasicTest. What is a good way of doing that?
Thank you!

The os.path module provides basic pathname manipulations:
>>> from os.path import basename, dirname, splitext
>>> file = '/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'
>>> splitext(basename(dirname(file)))[0]
'BasicTest'

>>> s = "/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff"
>>> ind = s.rfind('/')
>>> ind1 = s[:ind].rfind('/')
>>> print(s[ind1+1:ind].split('.')[0])
BasicTest
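str.rfind also accepts optional start and end arguments, so the second search can be limited to the text before the last slash without building an intermediate slice. A minimal sketch, assuming '/'-separated paths; the helper name parent_stem_rfind is hypothetical:

```python
def parent_stem_rfind(path: str) -> str:
    """Hypothetical helper: parent directory name without its extension."""
    last = path.rfind('/')
    if last == -1:
        raise ValueError("path contains no '/'")
    # Look for the previous '/' only inside path[0:last]
    prev = path.rfind('/', 0, last)
    name = path[prev + 1:last]  # if prev == -1, the slice simply starts at 0
    # Cut at the last '.', roughly like os.path.splitext (leading dots are kept)
    dot = name.rfind('.')
    return name[:dot] if dot > 0 else name

print(parent_stem_rfind('/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'))  # BasicTest
```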

Here is an example with os:
>>> import os
>>> p = '/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'
>>> os.path.dirname(p)
'/Users/myMac/Desktop/MemoryAccess/BasicTest.asm'
>>> os.path.splitext(os.path.dirname(p))
('/Users/myMac/Desktop/MemoryAccess/BasicTest', '.asm')
>>> os.path.basename(os.path.splitext(os.path.dirname(p))[0])
'BasicTest'
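The same steps map onto pathlib, where .parent plays the role of dirname and .stem combines basename and splitext. A sketch using PurePosixPath, so it behaves the same on any OS, assuming the input is always a POSIX-style path:

```python
from pathlib import PurePosixPath

p = PurePosixPath('/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff')
# .parent drops the last component; .stem is the final component minus its suffix
print(p.parent)       # /Users/myMac/Desktop/MemoryAccess/BasicTest.asm
print(p.parent.stem)  # BasicTest
```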

Strictly speaking, "BasicTest" follows the next-to-last appearance of "/", but beyond that, try str.rfind.

The following will return BasicTest.asm, which is half the battle:
'/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'.split('/')[-2]
The same trick can be used to split on the '.':
'BasicTest.asm'.split('.')[0]
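Splitting the whole path is not necessary when only the last components matter: str.rsplit with maxsplit=2 splits at the last two '/' only, and str.partition stops at the first '.'. A minimal sketch, assuming the path has at least two slashes (otherwise the three-way unpacking fails):

```python
path = '/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'

# At most two splits, counted from the right
head, parent, last = path.rsplit('/', 2)
stem = parent.partition('.')[0]  # text before the first '.'
print(stem)  # BasicTest
```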

With re in Python:
import re
s = "/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff"
pattern = re.compile(r"/(\w+)\.\w+/\w*$")
match = pattern.search(s)
print(match.group(1))
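\w matches only letters, digits and underscores, so the pattern above finds nothing for a directory such as my-test.asm, and match.group then fails on None. A hedged variant, assuming the directory name contains no '/' and that its "extension" starts at the first '.'; the helper name parent_stem is hypothetical:

```python
import re
from typing import Optional

# [^/.]+ : the directory name up to its first '.', any characters except '/' and '.'
DIR_STEM = re.compile(r"/([^/.]+)\.[^/]*/[^/]*$")

def parent_stem(path: str) -> Optional[str]:
    m = DIR_STEM.search(path)
    return m.group(1) if m else None  # None instead of an AttributeError

print(parent_stem("/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff"))  # BasicTest
print(parent_stem("/tmp/my-test.asm/x"))  # my-test
```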

Related

Extract numbers from string from specific point

Example strings:
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
myString2 = "/desktop/51232132561/Screenshots/photo_3501.png"
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
myString4 = "/desktop/5234324324/Screenshots/photo_11501.png"
I had a look around and couldn't figure out a proper way to do this. I also want to retrieve the numbers at the end of my strings, after the photo_ part, and store them in another variable (as a string, not an int or float). I don't need the number before /Screenshots. It would also be nice if it worked for numbers of any length. The photo_ part will always be present in the string.
You can write a regex that only matches the end of the string:
>>> import re
>>> myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
>>> re.search(r"photo_(\d+)\.png$", myString1).group(1)
'0000'
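Because the question asks for strings rather than numbers, keeping the captured group as-is preserves leading zeros such as 0000. A sketch applying the same anchored pattern to all four example strings:

```python
import re

PHOTO_NUM = re.compile(r"photo_(\d+)\.png$")  # anchored at the end of the string

strings = [
    "/desktop/2512754353/Screenshots/photo_0000.png",
    "/desktop/51232132561/Screenshots/photo_3501.png",
    "/desktop/12321516123/Screenshots/photo_7501.png",
    "/desktop/5234324324/Screenshots/photo_11501.png",
]
numbers = [PHOTO_NUM.search(s).group(1) for s in strings]
print(numbers)  # ['0000', '3501', '7501', '11501']
```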
This calls for a regex solution:
import re
mystring = "/desktop/2512754353/Screenshots/photo_0000.png"
your_value = re.findall(r'photo_([0-9]+)', mystring)[0]
print(your_value) # 0000
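re.findall returns whole matches when the pattern has no groups, but only the group's text when it has exactly one capturing group. A short comparison:

```python
import re

s = "/desktop/2512754353/Screenshots/photo_0000.png"
whole = re.findall(r'photo_[0-9]+', s)     # no group: the whole match
digits = re.findall(r'photo_([0-9]+)', s)  # one group: just the group
print(whole, digits)  # ['photo_0000'] ['0000']
```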
Using regex:
import re
data = [
"/desktop/2512754353/Screenshots/photo_0000.png",
"/desktop/51232132561/Screenshots/photo_3501.png",
"/desktop/12321516123/Screenshots/photo_7501.png",
"/desktop/5234324324/Screenshots/photo_11501.png"
]
id_regex = re.compile(r".+photo_(\d+)\.png")
ids = [id_regex.match(d).group(1) for d in data]  # keep strings so leading zeros survive
print(ids) # ['0000', '3501', '7501', '11501']
A simple way to do it with the split method:
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
first_split = myString1.split('photo_')
number = first_split[1].split('.')[0]
print(number)
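If a directory name could itself contain 'photo_', splitting at the last occurrence is safer; str.rpartition does that and never raises. A minimal sketch (the second path is made up to show the difference):

```python
s = "/desktop/2512754353/Screenshots/photo_0000.png"
# rpartition splits at the LAST 'photo_'; partition('.') then cuts off the extension
number = s.rpartition('photo_')[2].partition('.')[0]
print(number)  # 0000

tricky = "/photo_dir/photo_42.png"  # hypothetical path
print(tricky.rpartition('photo_')[2].partition('.')[0])  # 42
```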
Another way, using a regex:
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
number = re.findall(r'\d+', myString1)[1]
print(number)
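Index [1] depends on how many numbers appear earlier in the path; with no number in the directory part it raises IndexError. Taking the last run of digits with [-1] avoids that, assuming the extension contains no digits. A sketch (the second path is hypothetical):

```python
import re

strings = [
    "/desktop/2512754353/Screenshots/photo_0000.png",
    "/desktop/Screenshots/photo_3501.png",  # hypothetical path with no other number
]
# The photo number is the last run of digits, as long as the extension has none
numbers = [re.findall(r'\d+', s)[-1] for s in strings]
print(numbers)  # ['0000', '3501']
```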
A more robust way is to use pathlib in conjunction with re:
>>> import re
>>> from pathlib import Path
>>> myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
>>> pattern = re.compile(r"(?<=photo_)[0-9]*")
>>> pattern.search(Path(myString1).name).group(0)
'0000'
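Working on the path's .stem also removes the extension, so re.fullmatch can validate the whole file name in one step and return None for anything else. A sketch, assuming names always have the form photo_<digits>.png; photo_number is a hypothetical helper:

```python
import re
from pathlib import PurePosixPath
from typing import Optional

def photo_number(path: str) -> Optional[str]:
    stem = PurePosixPath(path).stem         # e.g. 'photo_0000'
    m = re.fullmatch(r"photo_(\d+)", stem)  # the whole stem must match
    return m.group(1) if m else None

print(photo_number("/desktop/2512754353/Screenshots/photo_0000.png"))  # 0000
print(photo_number("/desktop/2512754353/Screenshots/notes.png"))       # None
```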
Try this:
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
x=re.findall(r'photo_\d+',myString1.split("/")[-1])
print(x)
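On Python 3.9 and later, str.removeprefix gives the number without a regex once the directory and extension have been stripped. A sketch, assuming every stem starts with photo_:

```python
from pathlib import PurePosixPath

s = "/desktop/5234324324/Screenshots/photo_11501.png"
number = PurePosixPath(s).stem.removeprefix("photo_")  # requires Python 3.9+
print(number)  # 11501
```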
Another way, using built-in string methods, is to slice the string between "photo_" and ".png" (the 6 is len("photo_")):
>>> strings = [myString1, myString2, myString3, myString4]
>>> [s[s.rfind("photo")+6:s.rfind(".png")] for s in strings]
['0000', '3501', '7501', '11501']
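Naming the markers avoids the magic number 6. A sketch with a hypothetical helper, assuming both markers are present in every string:

```python
def between_markers(s: str, start: str, end: str) -> str:
    """Hypothetical helper: text between the last `start` and the last `end`."""
    i = s.rfind(start) + len(start)  # first index after the start marker
    j = s.rfind(end)
    return s[i:j]

strings = [
    "/desktop/2512754353/Screenshots/photo_0000.png",
    "/desktop/51232132561/Screenshots/photo_3501.png",
    "/desktop/12321516123/Screenshots/photo_7501.png",
    "/desktop/5234324324/Screenshots/photo_11501.png",
]
print([between_markers(s, "photo_", ".png") for s in strings])  # ['0000', '3501', '7501', '11501']
```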
I suggest:
import re
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
s = re.findall(r'_\d+', myString3)[0]
int(s[1:])
output: 7501
You can use pathlib and split:
from pathlib import Path
fn="/desktop/5234324324/Screenshots/photo_11501.png"
Path(fn).stem.split('_')[-1]
# 11501
The pathlib property .stem is the final path component with its directory and its extension stripped:
>>> Path(fn).stem
'photo_11501'
Then either split or partition on the '_' delimiter:
>>> Path(fn).stem.partition('_')
('photo', '_', '11501')
>>> Path(fn).stem.split('_')
['photo', '11501']
You can also use split or partition directly on the strings that represent paths:
>>> fn.partition('.png')[0].partition('_')[-1]
'11501'
But using pathlib allows you to produce those paths as the result of a glob or other method and is likely more robust and certainly more cross platform.
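As an illustration of the glob case, a sketch that creates a few placeholder files in a temporary directory (the file names are invented for the example) and reads the numbers from the resulting Path objects:

```python
import tempfile
from pathlib import Path
from typing import List

def photo_numbers(folder: Path) -> List[str]:
    # glob() yields Path objects, so .stem is available on each result
    return sorted(p.stem.partition('_')[2] for p in folder.glob('photo_*.png'))

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Placeholder files invented for this example
    for name in ['photo_0000.png', 'photo_3501.png', 'notes.txt']:
        (folder / name).touch()
    result = photo_numbers(folder)

print(result)  # ['0000', '3501']
```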

How to split the text using Python?

f_output.write('\n{}, {}\n'.format(filename, summary))
I am writing the name of the file to the output, and I get VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I am only interested in printing ARW, CZC and so on. Can someone please tell me how to split this text?
filename.split('_')[-1].split('.')[0]
For filename = 'VCALogParser_output_ARW.log' this gives 'ARW', and for 'VCALogParser_output_CZC.log' it gives 'CZC'.
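In the context of the question's write call, the split goes on the file name inside the loop. A sketch with made-up file names and summaries, using io.StringIO as a stand-in for the real f_output file:

```python
import io

# Hypothetical input: file names mapped to made-up summaries
results = {
    'VCALogParser_output_ARW.log': 'summary A',
    'VCALogParser_output_CZC.log': 'summary B',
}

f_output = io.StringIO()  # stands in for the real output file
for filename, summary in results.items():
    short = filename.split('_')[-1].split('.')[0]  # 'ARW', then 'CZC'
    f_output.write('\n{}, {}\n'.format(short, summary))

print(f_output.getvalue())
```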
If you are only interested in CZC and ARW without the .log, you can use the re.search method:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or, better, compile your pattern p once and then call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
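p.search returns None when a name does not fit the pattern, and calling .group(1) on None raises AttributeError. A guarded sketch; short_name is a hypothetical helper:

```python
import re
from typing import Optional

LOG_NAME = re.compile(r'.*_(.*)\.log')

def short_name(filename: str) -> Optional[str]:
    """Hypothetical helper: the text after the last '_' and before '.log', or None."""
    m = LOG_NAME.search(filename)
    return m.group(1) if m else None

print(short_name('VCALogParser_output_ARW.log'))  # ARW
print(short_name('summary.txt'))                  # None
```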
You could also use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you can't or don't want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
But keep in mind that this only works as long as all of your strings share exactly that prefix and suffix.
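On Python 3.9 and later, str.removeprefix and str.removesuffix do the same job without computing lengths, and they return the string unchanged when the prefix or suffix is absent instead of cutting off the wrong characters. A sketch; short is a hypothetical helper:

```python
s1 = 'VCALogParser_output_ARW.log'
s2 = 'VCALogParser_output_CZC.log'

def short(name: str) -> str:
    # Each call is a no-op if the prefix/suffix is not there (Python 3.9+)
    return name.removeprefix('VCALogParser_output_').removesuffix('.log')

print(repr('\n{}, {}\n'.format(short(s1), short(s2))))  # '\nARW, CZC\n'
```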
fname.split('_')[-1]
is rough, but it will give you 'CZC.log', 'ARW.log' and so on, assuming that all files have the same underscore-delimited format.
If the format of the file is always such that it ends with _ARW.log or _CZC.log this is really easy to do just using the standard string split() method, with two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module:
import os
shortname = os.path.splitext(filename)[0].split("_")[-1]
You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'
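Parse the file name correctly: my guess is that you want to strip the .log extension and the VCALogParser_output_ prefix, and for that str.replace is enough, so there is no need for str.split.
Use os.linesep when writing to the file to get cross-platform line endings. (Note that the os documentation recommends a plain '\n' for files opened in text mode, the default, because Python translates it to the platform separator automatically.)
The code below produces the desired result after applying the steps above: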
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
.format(fname=fname, summary=summary, linesep=linesep))
# or, if the variables in scope are strictly controlled, pass locals() into format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))
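In text mode (the default for open()), Python translates each '\n' written to os.linesep, so a plain '\n' already yields platform line endings. A sketch that checks this by reading the raw bytes back from a temporary file; the values written are placeholders:

```python
import os
import tempfile

fname, summary = 'ARW', 'some summary'  # placeholder values
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    # Text mode with the default newline=None: each '\n' is written as os.linesep
    with open(path, 'w', encoding='utf-8') as f_output:
        f_output.write('\n{}, {}\n'.format(fname, summary))
    # Binary mode shows the bytes actually stored on disk
    with open(path, 'rb') as f:
        raw = f.read()

print(raw)
```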

Extracting a substring of a string in Python based on presence of another string

The word common is always present, regardless of the string. Using that information, I'd like to grab the substring that comes just before it; in this case, "banana":
string = "apple_orange_banana_common_fruit"
In this case, "fruit":
string = "fruit_common_apple_banana_orange"
How would I go about doing this in Python?
You can use re.search() to extract the substring:
>>> import re
>>> s = 'apple_orange_banana_common_fruit'
>>> re.search(r'([a-zA-Z]+)_common', s).group(1)
'banana'
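The same result is possible without a regex by combining str.partition and str.rpartition. A sketch, assuming '_common' is present (otherwise it returns the last word of the whole string); word_before_common is a hypothetical helper:

```python
def word_before_common(s: str) -> str:
    # Everything before the first '_common', then the text after its last '_'
    return s.partition('_common')[0].rpartition('_')[2]

print(word_before_common('apple_orange_banana_common_fruit'))  # banana
print(word_before_common('fruit_common_apple_banana_orange'))  # fruit
```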
This will return a list of matches:
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.findall("[A-Za-z]+(?=_common)", string)
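With several occurrences of _common, findall returns one word per occurrence, and an empty list when there is none. A short sketch with made-up strings:

```python
import re

s = "apple_orange_banana_common_fruit_kiwi_common_lime"  # made-up input
# The lookahead checks for '_common' without consuming it
print(re.findall(r"[A-Za-z]+(?=_common)", s))  # ['banana', 'kiwi']
print(re.findall(r"[A-Za-z]+(?=_common)", "apple_orange"))  # []
```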
If common only occurs once per string, you might be better off using hwnd's solution.
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.search(r'([a-zA-Z]+)(?=_common)', string)
print(preceding_word.group(1))
>>> string = "fruit_common_apple_banana_orange"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
fruit
>>> string = "apple_orange_banana_common_fruit"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
banana
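When common is the first word, parts.index('common') - 1 is -1, and parts[-1] silently returns the last word instead of signalling that nothing precedes it. A guarded sketch; word_before is a hypothetical helper:

```python
from typing import Optional

def word_before(s: str, marker: str = 'common') -> Optional[str]:
    parts = s.split('_')
    i = parts.index(marker)  # raises ValueError if the marker is missing
    # Guard against i == 0, where parts[i - 1] would wrap around to the end
    return parts[i - 1] if i > 0 else None

print(word_before('apple_orange_banana_common_fruit'))  # banana
print(word_before('common_apple'))                      # None
```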

How to delete everything after a certain character in a string?

How would I delete everything after a certain character of a string in Python? For example, I have a string containing a file path and some extra characters. How would I delete everything after .zip? I've tried rsplit and split, but neither kept the .zip when deleting the extra characters.
Any suggestions?
Just take the first portion of the split, and add '.zip' back:
s = 'test.zip.zyz'
s = s.split('.zip', 1)[0] + '.zip'
Alternatively you could use slicing, here is a solution where you don't need to add '.zip' back to the result (the 4 comes from len('.zip')):
s = s[:s.index('.zip')+4]
Or another alternative with regular expressions:
import re
s = re.match(r'^.*?\.zip', s).group(0)
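The first two versions behave differently when '.zip' is missing: the split version appends '.zip' anyway, while the index version raises ValueError. A quick check:

```python
s = 'abc.py'  # no '.zip' in it

appended = s.split('.zip', 1)[0] + '.zip'  # '.zip' gets added regardless
try:
    s[:s.index('.zip') + 4]
    raised = False
except ValueError:  # str.index raises when the substring is not found
    raised = True

print(appended, raised)  # abc.py.zip True
```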
str.partition:
>>> s='abc.zip.blech'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.zip'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.py'
>>> ''.join(s.partition('.zip')[0:2])
'abc.py'
Use slices:
s = 'test.zip.xyz'
s[:s.index('.zip') + len('.zip')]
=> 'test.zip'
And it's easy to pack the above in a little helper function:
def removeAfter(string, suffix):
    return string[:string.index(suffix) + len(suffix)]
removeAfter('test.zip.xyz', '.zip')
=> 'test.zip'
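str.find returns -1 instead of raising, which makes it easy to leave strings without the marker untouched. A sketch of that variant; remove_after is a hypothetical name:

```python
def remove_after(s: str, marker: str) -> str:
    """Keep everything up to and including the first `marker`; return s unchanged if absent."""
    i = s.find(marker)  # -1 when the marker is missing
    return s if i == -1 else s[:i + len(marker)]

print(remove_after('test.zip.xyz', '.zip'))  # test.zip
print(remove_after('test.py', '.zip'))       # test.py
```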
I think it's easy to create a simple lambda function for this.
mystrip = lambda s, ss: s[:s.index(ss) + len(ss)]
Can be used like this:
mystr = "this should stay.zipand this should be removed."
mystrip(mystr, ".zip") # 'this should stay.zip'
You can use the re module:
import re
re.sub(r'\.zip.*', '.zip', 'test.zip.blah')
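By default '.' does not match a newline, so text after '.zip' on later lines survives the substitution; the re.DOTALL flag changes that. A sketch with a made-up two-line string:

```python
import re

s = 'test.zip.blah\nsecond line'  # made-up input spanning two lines
# Without DOTALL, '.*' stops at the newline
kept = re.sub(r'\.zip.*', '.zip', s)
# With DOTALL, '.' also matches '\n', so everything after '.zip' goes
dropped = re.sub(r'\.zip.*', '.zip', s, flags=re.DOTALL)
print(repr(kept), repr(dropped))  # 'test.zip\nsecond line' 'test.zip'
```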

Regex to find position between two markers in string

I need to find anything between
show_detail&
and
;session_id=1445045
in
https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0
using a regex in Python.
I know I need to use lookbehind/lookahead, but I can't seem to make it work!
Please help!
Thanks :)
Why use a regex?
>>> url = 'https://ww.site.gov.....'
>>> start = url.index('show_detail&') + len('show_detail&')
>>> end = url.index(';session_id=')
>>> url[start:end]
'id=4035219;num=1'
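The same slice can be written with str.partition, which never raises: it yields an empty string if the first marker is missing and the rest of the URL if the second one is. A sketch:

```python
url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
# Text after the first marker, then text before the second marker
result = url.partition('show_detail&')[2].partition(';session_id=')[0]
print(result)  # id=4035219;num=1
```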
>>> s= "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> s.split(";session_id=1445045")[0].split("show_detail&")[-1]
'id=4035219;num=1'
You can use a non-greedy match (.*?) in between your markers.
>>> import re
>>> url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> m = re.search("show_detail&(.*?);session_id=1445045", url)
>>> m.group(1)
'id=4035219;num=1'
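Since the session id presumably varies between requests, it may be safer not to hard-code it. A sketch that accepts any digits after session_id= (an assumption about the id's format):

```python
import re

url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
# \d+ accepts any numeric session id instead of hard-coding 1445045
BETWEEN = re.compile(r"show_detail&(.*?);session_id=\d+")
print(BETWEEN.search(url).group(1))  # id=4035219;num=1
```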
regex = re.compile(r"(?<=show_detail&).*?(?=;session_id=1445045)")
should work. See the Python re module documentation for more information on lookaround assertions.
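Lookarounds are not part of the match, so with this pattern group(0) is exactly the text between the markers. Python also requires a lookbehind to be fixed-width, which holds for the literal "show_detail&". A sketch applying it to the URL from the question:

```python
import re

url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
regex = re.compile(r"(?<=show_detail&).*?(?=;session_id=1445045)")
# The lookbehind and lookahead only assert context; they consume nothing
print(regex.search(url).group(0))  # id=4035219;num=1
```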
import re
url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
pattern = "([^>].+)(show_detail&)([^>].+)(session_id=1445045)([^>].+)"
reg = re.compile(pattern, flags=re.S)
match = reg.search(url)
print(match.group(3))
This should work, although group 3 also includes the trailing ';' ('id=4035219;num=1;').
