How do I separate a string that contains a number - Python

So, I have this string:
a='test32'
I want to separate this string so I get the text and the number in two separate variables, in Python.

>>> import re
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match('test32')
>>> m.group(1)
'test'
>>> m.group(2)
'32'
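A minimal sketch of how the same pattern could be wrapped in a function that also copes with input that does not match. The helper name split_text_number and the choice to return the number as an int are assumptions for illustration, not part of the original answer.

```python
import re

# Letters followed by digits, and nothing else (same pattern as above).
_TEXT_NUM = re.compile(r"([a-zA-Z]+)([0-9]+)")

def split_text_number(s):
    """Return (text, number) or None if s is not letters followed by digits."""
    # fullmatch rejects trailing junk such as 'test32x', which match() would accept.
    m = _TEXT_NUM.fullmatch(s)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

print(split_text_number("test32"))  # ('test', 32)
print(split_text_number("32test"))  # None
```

Checking for None before calling group() avoids the AttributeError you would otherwise get on a non-matching string.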

Related

How to remove carriage return characters from string as if it was printed?

I would like to remove all occurrences of \r from a string as if it was printed via print() and store the result in another variable.
Example:
>>> s = "hello\rworld"
>>> print(s)
world
In this example, how do I "print" s to a new variable which then contains the string "world"?
Background:
I am using the subprocess module to capture the stdout which contains a lot of \r characters. In order to effectively analyze the string I would like to only have the resulting output.
Using a regex:
import re
s = "hello\rworld"
out = re.sub(r'([^\r]+)\r([^\r\n]+)',
lambda m: m.group(2)+m.group(1)[len(m.group(2)):],
s)
Output: 'world'
More complex example:
import re
s = "hello\r..\nworld"
out = re.sub(r'([^\r]+)\r([^\r\n]+)',
lambda m: m.group(2)+m.group(1)[len(m.group(2)):],
s)
Output:
..llo
world
I guess one very simple way to get the same result would be to split the string on every occurrence of the carriage return (\r) and then return only the last result.
>>> s = "hello\rworld"
>>> res = s.split("\r")[-1]
>>> res
'world'
You could use regex:
import re
s = "Example\n of\r text \r\nwith \\r!"
s2 = re.sub("\r\n", "\n", s)
s2 = re.sub("[^\n]*\r", "", s2)
print(s)
print(s2)
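If the goal is to approximate what a terminal would show, one option is to process the text line by line and let each chunk after a carriage return overwrite the start of the line. This is only a sketch: the function name render_carriage_returns is made up for illustration, and it ignores other control characters such as backspace or tabs.

```python
def render_carriage_returns(text):
    """Approximate terminal output: a carriage return moves the cursor to
    column 0, and later characters overwrite earlier ones on that line."""
    rendered = []
    # Treat Windows line endings as plain newlines first.
    for line in text.replace("\r\n", "\n").split("\n"):
        buf = []
        for chunk in line.split("\r"):
            # Overwrite the first len(chunk) characters of the line buffer.
            buf[:len(chunk)] = chunk
        rendered.append("".join(buf))
    return "\n".join(rendered)

print(render_carriage_returns("hello\rworld"))       # world
print(render_carriage_returns("hello\r..\nworld"))   # ..llo, then world
```

Unlike split("\r")[-1], this keeps the tail of a longer earlier chunk (as in "..llo") and handles multi-line input.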

how to split the text using python?

f_output.write('\n{}, {}\n'.format(filename, summary))
I am printing the file name as part of the output. I get VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I am only interested in printing ARW, CZC and so on. Can someone please tell me how to split this text?
filename.split('_')[-1].split('.')[0]
this will give you : 'ARW'
summary.split('_')[-1].split('.')[0]
and this will give you: 'CZC'
If you are only interested in CZC and ARW without the .log, you can use the re.search method:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or better, compile your pattern p once, then call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
Also, it might be helpful to use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you can't or don't want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
But keep in mind that the above is valid as long as you have those common strings in all of your string variables.
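To make that caveat explicit, the slicing could be guarded with startswith/endswith checks. This is a sketch only; the helper name short_name and the decision to raise ValueError on unexpected names are assumptions for illustration.

```python
PREFIX = "VCALogParser_output_"
SUFFIX = ".log"

def short_name(filename, prefix=PREFIX, suffix=SUFFIX):
    """Strip a known prefix and suffix, refusing names that lack either."""
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError("unexpected file name: {!r}".format(filename))
    # Slice away the fixed-length prefix and suffix.
    return filename[len(prefix):len(filename) - len(suffix)]

print(short_name("VCALogParser_output_ARW.log"))  # ARW
```

Failing loudly here is a design choice; returning the name unchanged would be another reasonable option.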
fname.split('_')[-1]
is rough, but it will give you 'CZC.log', 'ARW.log' and so on, assuming that all files have the same underscore-delimited format.
If the format of the file is always such that it ends with _ARW.log or _CZC.log this is really easy to do just using the standard string split() method, with two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module:
import os
shortname = os.path.splitext(filename)[0].split("_")[-1]
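If filename may include a directory part, the same idea could be sketched as below. The function name short_name_from_path and the example path are hypothetical; the sketch assumes the wanted part never contains an underscore itself.

```python
import os

def short_name_from_path(path):
    """Return the part between the last underscore and the extension."""
    base = os.path.basename(path)        # drop any directory part
    stem, _ext = os.path.splitext(base)  # e.g. 'VCALogParser_output_ARW', '.log'
    # rsplit with maxsplit=1 only looks at the last underscore.
    return stem.rsplit("_", 1)[-1]

print(short_name_from_path("logs/VCALogParser_output_ARW.log"))  # ARW
```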
You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'
To parse the file name correctly, my guess is that you want to strip the file extension .log and the prefix VCALogParser_output_. For that, str.replace is enough; there is no need for str.split.
Use os.linesep when writing to the file to get cross-platform line endings. (If the file is opened in text mode, Python already translates '\n' to the platform's line ending, so a plain '\n' is enough there.)
The code below produces the desired result (after applying the steps listed above):
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
.format(fname=fname, summary=summary, linesep=linesep))
# or if vars in execution scope strictly controlled pass locals() into format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))

Extracting a substring of a string in Python based on presence of another string

common is always present regardless of string. Using that information, I'd like to grab the substring that comes just before it, in this case, "banana":
string = "apple_orange_banana_common_fruit"
In this case, "fruit":
string = "fruit_common_apple_banana_orange"
How would I go about doing this in Python?
You can use re.search() to extract the substring:
>>> import re
>>> s = 'apple_orange_banana_common_fruit'
>>> re.search(r'([a-zA-Z]+)_common', s).group(1)
'banana'
This will return a list of matches:
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.findall("[A-Za-z]+(?=_common)", string)
If common only occurs once per string, you might be better off using hwnd's solution.
import re
string = "apple_orange_banana_common_fruit"
preceding_word = re.search('([a-zA-Z]+)(?=_common)', string)
print(preceding_word.group(1))
>>> string = "fruit_common_apple_banana_orange"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
fruit
>>> string = "apple_orange_banana_common_fruit"
>>> parts = string.split('_')
>>> print(parts[parts.index('common') - 1])
banana
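One thing to watch with the split/index approach: if common is the first part, index - 1 is -1 and Python silently returns the last part instead. A guarded version might look like the sketch below; the helper name word_before and returning None for the edge cases are assumptions for illustration.

```python
def word_before(string, marker="common", sep="_"):
    """Return the part just before marker, or None if there is none."""
    parts = string.split(sep)
    try:
        i = parts.index(marker)
    except ValueError:
        return None  # marker not present at all
    if i == 0:
        return None  # nothing precedes the marker; avoid parts[-1] wrapping around
    return parts[i - 1]

print(word_before("apple_orange_banana_common_fruit"))  # banana
print(word_before("fruit_common_apple_banana_orange"))  # fruit
print(word_before("common_apple"))                      # None
```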

how to detect a repeated pattern in a string using re module of python

I'm trying to match a string using the re module of Python, in which a pattern may or may not repeat. The string starts with three alphabetical parts separated by :, then there is a = followed by another alphabetical part. The string can finish here or continue with repeated patterns of alphabetical_part=alphabetical_part, separated by commas. Both samples are as below:
Finishes with just one repeat ==> aa:bb:cc=dd
Finishes with more than one repeat ==> aa:bb:cc=dd,ee=ff,gg=hh
As you can see, there can't be a comma at the end of the string. I have written a pattern for matching this:
>>> pt = re.compile(r'\S+:\S+:[\S+=\S+$|,]+')
re.match returns a match object for this, but when I group the repeat pattern, I got something strange, see:
>>> st = 'xx:zz:rr=uu,ii=oo,ff=ee'
>>> pt = re.compile(r'\S+:\S+:([\S+=\S+$|,]+)')
>>> pt.findall(st)
['e']
I'm not sure if I wrote the right pattern or not; how can I check it? If it's wrong, what is the right answer though?
I think you want something like this,
>>> import re
>>> s = """ foo bar bar foo
xx:zz:rr=uu,ii=oo,ff=ee
aa:bb:cc=dd
xx:zz:rr=uu,ii=oo,ff=ee
bar foo"""
>>> m = re.findall(r'^[^=: ]+?[=:](?:[^=:,]+?[:=][^,\n]+?)(?:,[^=:,]+?[=:][^,\n]+?)*$', s, re.M)
>>> m
['xx:zz:rr=uu,ii=oo,ff=ee', 'aa:bb:cc=dd', 'xx:zz:rr=uu,ii=oo,ff=ee']
>>> st = 'xx:zz:rr=uu,ii=oo,ff=ee'
>>> m = re.findall(r'\w+:\w+:(\w+=\w+)((?:,\w+=\w+)*)', st)
>>> m
[('rr=uu', ',ii=oo,ff=ee')]
Don't use \S because this will also match :. It's better to use \w.
Or:
re.findall(r'\w+:\w+:(\w+=\w+(?:,\w+=\w+)*)', st )[0].split(',')
# This will return: ['rr=uu', 'ii=oo', 'ff=ee']
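Building on the \w-based pattern, here is a sketch that validates the whole string and returns the key=value pairs as a dict. The function name parse_record and the returned tuple shape are assumptions for illustration; fullmatch is used so that a trailing comma is rejected, as the question requires.

```python
import re

_PAIR = r"\w+=\w+"
# Two leading parts, then one or more comma-separated key=value pairs.
_LINE = re.compile(rf"(\w+):(\w+):({_PAIR}(?:,{_PAIR})*)")

def parse_record(st):
    """Return (first, second, {key: value, ...}) or None if st is malformed."""
    m = _LINE.fullmatch(st)
    if m is None:
        return None
    first, second, pairs = m.groups()
    return first, second, dict(p.split("=") for p in pairs.split(","))

print(parse_record("xx:zz:rr=uu,ii=oo,ff=ee"))
# ('xx', 'zz', {'rr': 'uu', 'ii': 'oo', 'ff': 'ee'})
print(parse_record("aa:bb:cc=dd,"))  # None (trailing comma)
```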
Here's a more readable regex that should work for you:
\S+?:\S+?:(?:\S+?=\S+?.)+
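It uses a non-capturing group (?:...) and the + quantifier to match one or more "alphabetical_part=alphabetical_part" pairs. Note that the trailing . matches any character, so it consumes either the separating comma or the last character of the final value; the pattern does not check that the separator is actually a comma.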
Example:
>>> import re
>>> str = """ foo bar
... foo bar bar foo
... xx:zz:rr=uu,ii=oo,ff=ee
... aa:bb:cc=dd
... xx:zz:rr=uu,ii=oo,ff=ee
... bar foo """
>>> pat = re.compile(r'\S+?:\S+?:(?:\S+?=\S+?.)+')
>>> re.findall(pat, str)
['xx:zz:rr=uu,ii=oo,ff=ee', 'aa:bb:cc=dd', 'xx:zz:rr=uu,ii=oo,ff=ee']

Wildcard matching substring in Python

I am completely new to Python and don't know how to get a sub-string which matches some wildcard condition from a string.
I am trying to get a timestamp from the following string:
sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data
I want to get only "1360922654.97671" part out of the string.
Please help.
Because you mentioned wildcards, you can use re:
In [77]: import re
In [78]: s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
In [79]: re.findall(r"\d+\.\d+", s)
Out[79]: ['1360922654.97671']
If the dots and dashes have their specific function within your string, you can use this:
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
Step by step:
>>> s.rsplit('.', 1)
['sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671', 'data']
>>> s.rsplit('.', 1)[0]
'sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671'
>>> s.rsplit('.', 1)[0].split('-')
['sdc4', '251504', '7f5', 'f59c349f0e516894fc89d2686a0d57f5', '1360922654.97671']
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
This will work for any strings in the form:
anything-WHATYOUWANT.stringwithoutdots
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.split('-')[-1][:-5]
'1360922654.97671'
Slightly fewer characters, but this only works when the string ends in .data or another suffix of exactly five characters (including the dot).
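If you want something less dependent on the suffix length, a regex anchored at the end of the string is one option. This is a sketch under the assumption that the timestamp is always the last dash-separated field and is followed by a single extension; the helper name extract_timestamp is made up for illustration.

```python
import re

# A dash, then digits.digits, then one extension with no dots or dashes, at the end.
_TS = re.compile(r"-(\d+\.\d+)\.[^.-]+$")

def extract_timestamp(name):
    """Return the trailing 'digits.digits' field as a string, or None."""
    m = _TS.search(name)
    return m.group(1) if m else None

s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
print(extract_timestamp(s))  # 1360922654.97671
```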
