If I have this input:
/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff
and I want to find the last occurrence of the character "/" and get the string BasicTest,
what is a good way of doing that?
Thank you!
The os.path module provides basic pathname manipulations.
>>> from os.path import *
>>> file = '/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'
>>> splitext(basename(dirname(file)))[0]
'BasicTest'
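For comparison, here is a minimal sketch of the same idea with pathlib instead of os.path. It assumes Python 3 and uses PurePosixPath so that "/" is treated as the separator on every platform; the variable names are only illustrative.

```python
from pathlib import PurePosixPath

# PurePosixPath always splits on "/", regardless of the host OS.
path = PurePosixPath("/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff")

parent_name = path.parent.name  # last component of the parent: 'BasicTest.asm'
stem = path.parent.stem         # same component without its final extension
print(stem)
```

Here .parent plays the role of dirname, .name of basename, and .stem of splitext(...)[0].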
>>> s = "/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff"
>>> ind = s.rfind('/')
>>> ind1 = s[:ind].rfind('/')
>>> print(s[ind1+1:ind].split('.')[0])
BasicTest
Here is an example with os:
>>> import os
>>> p = '/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'
>>> os.path.dirname(p)
'/Users/myMac/Desktop/MemoryAccess/BasicTest.asm'
>>> os.path.splitext(os.path.dirname(p))
('/Users/myMac/Desktop/MemoryAccess/BasicTest', '.asm')
>>> os.path.basename(os.path.splitext(os.path.dirname(p))[0])
'BasicTest'
Well, "BasicTest" follows the next-to-last appearance of "/", but beyond that, try rfind.
The following will return BasicTest.asm which is half the battle:
'/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff'.split('/')[-2]
The same trick can be used to split on the '.'
'BasicTest.asm'.split('.')[0]
With re in Python:
import re
s = "/Users/myMac/Desktop/MemoryAccess/BasicTest.asm/someStuff"
pattern = re.compile(r"/(\w+)\.\w+/\w*$")
match = re.search(pattern,s)
print(match.group(1))  # BasicTest
Example strings:
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
myString2 = "/desktop/51232132561/Screenshots/photo_3501.png"
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
myString4 = "/desktop/5234324324/Screenshots/photo_11501.png"
I had a look around, and couldn't really figure out a proper way to do this. I want to be able to also retrieve the last numbers of my strings after the photo_ part, and store them in another variable (string, not int or float). Furthermore, I don't need the number before /Screenshots. It would also be nice if it can work for any number length. The photo_ will always remain inside the string too.
You can write a regex that only matches the end of the string
>>> import re
>>> myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
>>> re.search(r"photo_(\d+)\.png$", myString1).group(1)
'0000'
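One thing to watch for: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on unexpected input. A small sketch that guards against this, assuming you would rather get None back (PHOTO_RE and photo_number are made-up names for illustration):

```python
import re

# Compile once; "$" anchors the match to the end of the string.
PHOTO_RE = re.compile(r"photo_(\d+)\.png$")

def photo_number(path):
    """Return the digits after 'photo_' as a string, or None if absent."""
    match = PHOTO_RE.search(path)
    return match.group(1) if match else None

numbers = [photo_number(p) for p in (
    "/desktop/2512754353/Screenshots/photo_0000.png",
    "/desktop/5234324324/Screenshots/photo_11501.png",
    "/desktop/no_photo_here.txt",  # no match, so None
)]
print(numbers)
```

The result stays a string, so leading zeros such as '0000' are preserved.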
This calls for a regex solution:
import re
mystring = "/desktop/2512754353/Screenshots/photo_0000.png"
your_value = re.findall(r'(photo_[0-9]+)', mystring)[0]
print(your_value) # photo_0000
Using regex:
import re
data = [
"/desktop/2512754353/Screenshots/photo_0000.png",
"/desktop/51232132561/Screenshots/photo_3501.png",
"/desktop/12321516123/Screenshots/photo_7501.png",
"/desktop/5234324324/Screenshots/photo_11501.png"
]
id_regex = re.compile(r".+photo_(\d+)\.png")
ids = [int(id_regex.match(d).groups()[0]) for d in data]
print(ids) # [0, 3501, 7501, 11501]
Simple way to do it with split function :
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
first_split = myString1.split('photo_')
number = first_split[1].split('.')[0]
print(number)
Other way by using regex :
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
number = re.findall(r'\d+', myString1)[1]
print(number)
Proper way would be to use pathlib in conjunction with re.
import re
from pathlib import Path
pattern = re.compile(r"(?<=photo_)[0-9]*")
pattern.search(Path(myString1).name).group(0)
> '0000'
Try this:
import re
myString1 = "/desktop/2512754353/Screenshots/photo_0000.png"
x=re.findall(r'photo_\d+',myString1.split("/")[-1])
print(x)
Another way, using built-in string methods, would be to slice the string between "photo_" and ".png" (the 6 is len("photo_")):
>>> strings = [myString1, myString2, myString3, myString4]
>>> [s[s.rfind("photo")+6:s.rfind(".png")] for s in strings]
['0000', '3501', '7501', '11501']
I suggest:
import re
myString3 = "/desktop/12321516123/Screenshots/photo_7501.png"
s = re.findall(r'_\d+', myString3)[0]  # raw string; '_' needs no escaping
int(s[1:])  # drop the leading underscore
output: 7501
You can use pathlib and split:
from pathlib import Path
fn="/desktop/5234324324/Screenshots/photo_11501.png"
Path(fn).stem.split('_')[-1]
# 11501
The pathlib property .stem is the name of the path stripped of the path to it and the extension:
>>> Path(fn).stem
'photo_11501'
Then either split or partition on the '_' delimiter:
>>> Path(fn).stem.partition('_')
('photo', '_', '11501')
>>> Path(fn).stem.split('_')
['photo', '11501']
You can use split or partition entirely on strings that represent paths:
>>> fn.partition('.png')[0].partition('_')[-1]
'11501'
But using pathlib allows you to produce those paths as the result of a glob or other method, and is likely more robust and certainly more cross-platform.
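To illustrate the glob point, here is a rough sketch that creates a few empty files in a temporary directory (purely so the example can run anywhere) and extracts the number from each path that glob returns. The file names are invented for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create some empty demo files; only the photo_*.png ones should match.
    for name in ("photo_0000.png", "photo_3501.png", "notes.txt"):
        (folder / name).touch()

    # glob yields Path objects, so .stem and partition work directly.
    numbers = sorted(p.stem.partition("_")[2] for p in folder.glob("photo_*.png"))

print(numbers)
```

The numbers are kept as strings and sorted as strings here; convert with int only if you need numeric ordering.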
f_output.write('\n{}, {}\n'.format(filename, summary))
I am printing the file name as part of the output, and I get VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I am only interested in printing ARW, CZC and so on. Can someone please tell me how to split this text?
filename.split('_')[-1].split('.')[0]
this will give you : 'ARW'
summary.split('_')[-1].split('.')[0]
and this will give you: 'CZC'
If you are only interested in CZC and ARW without the .log then, you can do it with re.search method:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or, better, compile your pattern p and then call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>>
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
Also, it might be helpful to use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>>
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>>
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you cannot or do not want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>>
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
But keep in mind that the above is valid as long as you have those common strings in all of your string variables.
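That caveat can be made explicit in code. A small sketch, assuming the prefix and suffix from the question; strip_affixes is a hypothetical helper name, and it raises instead of silently slicing a name that has a different shape:

```python
def strip_affixes(name, prefix="VCALogParser_output_", suffix=".log"):
    """Return the part of name between prefix and suffix.

    Raises ValueError if name does not start with prefix and end with suffix.
    """
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError("unexpected file name: {!r}".format(name))
    # Slice by length, exactly like s1[i1:-i2] above, but only after checking.
    return name[len(prefix):len(name) - len(suffix)]

codes = [strip_affixes(n) for n in ("VCALogParser_output_ARW.log",
                                    "VCALogParser_output_CZC.log")]
print(codes)
```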
fname.split('_')[-1]
is rough, but this will give you 'CZC.log', 'ARW.log' and so on, assuming that all files have the same underscore-delimited format.
If the format of the file is always such that it ends with _ARW.log or _CZC.log this is really easy to do just using the standard string split() method, with two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module:
shortname = os.path.splitext(filename)[0].split("_")[-1]
You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'
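Parse the file name correctly. My guess is that you want to strip the file extension .log and the prefix VCALogParser_output_; for that, str.replace is enough and there is no need for str.split.
Use os.linesep when writing to the file to get cross-platform line endings. Note that this only makes sense for a file opened with newline=''; in the default text mode, '\n' is already translated to the platform's line ending, and the Python documentation advises against writing os.linesep there.
The code below produces the desired result (after applying the steps listed above):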
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
.format(fname=fname, summary=summary, linesep=linesep))
# or if vars in execution scope strictly controlled pass locals() into format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))
common is always present regardless of string. Using that information, I'd like to grab the substring that comes just before it, in this case, "banana":
string = "apple_orange_banana_common_fruit"
In this case, "fruit":
string = "fruit_common_apple_banana_orange"
How would I go about doing this in Python?
You can use re.search() to extract the substring:
>>> import re
>>> s = 'apple_orange_banana_common_fruit'
>>> re.search(r'([a-zA-Z]+)_common', s).group(1)
'banana'
This will return a list of matches:
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.findall("[A-Za-z]+(?=_common)", string)
If common only occurs once per string, you might be better off using hwnd's solution.
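To see the difference between the two approaches, here is a short sketch on a made-up string in which common appears twice. findall collects every word that is followed by _common, while search stops at the first one.

```python
import re

# Hypothetical input with two occurrences of "common".
text = "apple_common_pear_banana_common_fruit"

# The lookahead (?=_common) checks for the marker without consuming it.
all_words = re.findall(r"[A-Za-z]+(?=_common)", text)

# search returns only the first match; group(1) is the captured word.
first_word = re.search(r"([A-Za-z]+)_common", text).group(1)

print(all_words, first_word)
```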
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.search('([a-zA-Z]+)(?=_common)', string)
print(preceding_word.group(1))  # banana
>>> string = "fruit_common_apple_banana_orange"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
fruit
>>> string = "apple_orange_banana_common_fruit"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
banana
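Two edge cases are worth knowing with the split/index approach: list.index raises ValueError if common is missing, and if common is the first part, index - 1 is -1, which silently returns the last part. A guarded sketch (word_before is a hypothetical helper name; returning None is just one possible choice):

```python
def word_before(text, marker="common", sep="_"):
    """Return the part just before marker, or None if there is none."""
    parts = text.split(sep)
    if marker not in parts:
        return None          # marker missing entirely
    idx = parts.index(marker)
    if idx == 0:
        return None          # marker is first; parts[-1] would be wrong
    return parts[idx - 1]

print(word_before("apple_orange_banana_common_fruit"))
print(word_before("fruit_common_apple_banana_orange"))
print(word_before("common_apple"))
```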
How would I delete everything after a certain character of a string in Python? For example, I have a string containing a file path and some extra characters. How would I delete everything after .zip? I've tried rsplit and split, but neither kept the .zip when removing the extra characters.
Any suggestions?
Just take the first portion of the split, and add '.zip' back:
s = 'test.zip.zyz'
s = s.split('.zip', 1)[0] + '.zip'
Alternatively you could use slicing, here is a solution where you don't need to add '.zip' back to the result (the 4 comes from len('.zip')):
s = s[:s.index('.zip')+4]
Or another alternative with regular expressions:
import re
s = re.match(r'^.*?\.zip', s).group(0)
str.partition:
>>> s='abc.zip.blech'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.zip'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.py'
>>> ''.join(s.partition('.zip')[0:2])
'abc.py'
Use slices:
s = 'test.zip.xyz'
s[:s.index('.zip') + len('.zip')]
=> 'test.zip'
And it's easy to pack the above in a little helper function:
def removeAfter(string, suffix):
    return string[:string.index(suffix) + len(suffix)]
removeAfter('test.zip.xyz', '.zip')
=> 'test.zip'
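Note that str.index raises ValueError when the marker is not in the string. If you would rather leave such strings untouched (as the partition version does), a possible variant uses str.find, which returns -1 instead of raising. remove_after is just an illustrative name:

```python
def remove_after(text, marker):
    """Cut text right after the first occurrence of marker.

    If marker does not occur, text is returned unchanged.
    """
    pos = text.find(marker)   # -1 when marker is absent
    if pos == -1:
        return text
    return text[:pos + len(marker)]

print(remove_after("test.zip.xyz", ".zip"))
print(remove_after("abc.py", ".zip"))
```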
I think it's easy to create a simple lambda function for this.
mystrip = lambda s, ss: s[:s.index(ss) + len(ss)]
Can be used like this:
mystr = "this should stay.zipand this should be removed."
mystrip(mystr, ".zip") # 'this should stay.zip'
You can use the re module:
import re
re.sub(r'\.zip.*', '.zip', 'test.zip.blah')  # 'test.zip'
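If the marker is not hard-coded, it is safer to escape it first, since characters like "." have a special meaning in a regex. A minimal sketch, with truncate_after as a made-up helper name:

```python
import re

def truncate_after(text, marker):
    """Remove everything after the first occurrence of marker (regex-based)."""
    # re.escape makes "." and other metacharacters match literally.
    pattern = re.escape(marker) + r".*"
    # A function as replacement avoids backslash processing in marker;
    # re.S lets ".*" also swallow any following lines.
    return re.sub(pattern, lambda m: marker, text, count=1, flags=re.S)

print(truncate_after("test.zip.blah", ".zip"))
print(truncate_after("testXzip.blah", ".zip"))  # no literal ".zip": unchanged
```

Without the escaping, the pattern .zip.* would also match "Xzip" in the second string, because an unescaped dot matches any character.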
I need to find anything between
show_detail&
and
;session_id=1445045
in
https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0
using a regex in Python.
I know I need to use lookbehind/lookahead, but I can't seem to make it work!
Please help!
Thanks :)
Why use a regex?
>>> url = 'https://ww.site.gov.....'
>>> start = url.index('show_detail&') + len('show_detail&')
>>> end = url.index(';session_id=')
>>> url[start:end]
'id=4035219;num=1'
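The same idea can be wrapped in a small reusable function. This is only a sketch; between is a hypothetical name, and it uses str.find so that a missing marker gives None instead of a ValueError:

```python
def between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]

url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")
result = between(url, "show_detail&", ";session_id=")
print(result)
```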
>>> s= "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> s.split(";session_id=1445045")[0].split("show_detail&")[-1]
'id=4035219;num=1'
>>>
You can use a non greedy match (.*?) in between your markers.
>>> import re
>>> url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> m = re.search("show_detail&(.*?);session_id=1445045", url)
>>> m.group(1)
'id=4035219;num=1'
regex = re.compile(r"(?<=show_detail&).*?(?=;session_id=1445045)")
should work. See here for more info on lookaround assertions.
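As a variation, here is a sketch that does not hard-code the session number. It assumes the session id always consists of digits, which the question does not state, so adjust \d+ if that is not true for your URLs.

```python
import re

url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")

# Lookbehind: must be preceded by "show_detail&".
# Lookahead: must be followed by ";session_id=" and digits (assumed format).
pattern = re.compile(r"(?<=show_detail&).*?(?=;session_id=\d+)")

match = pattern.search(url)
value = match.group(0) if match else None
print(value)
```

Because both assertions are zero-width, group(0) contains only the text between the markers.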
import re
url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
pattern = "([^>].+)(show_detail&)([^>].+)(session_id=1445045)([^>].+)"
reg = re.compile(pattern, flags=re.S)
match = reg.search(url)
print(match.group(3))  # id=4035219;num=1; (note the trailing semicolon)
This should work, I think.