How do I split the following string?


I have the following string where I need to extract only the first digits from it.
string = '50.2000\xc2\xb0 E'
How do I extract 50.2000 from string?

If the number can be followed by any kind of character, try using a regex:
>>> import re
>>> r = re.compile(r'(\d+\.\d+)')
>>> r.match('50.2000\xc2\xb0 E').group(1)
'50.2000'

mystring = '50.2000\xc2\xb0 E'
print(mystring.split("\xc2", 1)[0])
Output
50.2000

If you just want to drop a fixed number of leading characters, slice the string:
start = 10  # skip the first 10 characters
print(mystring[start:])
Demo:
>>> my_string = 'abcasdkljf23u109842398470ujw{}{\\][\\['
>>> start = 10
>>> print(my_string[start:])
23u109842398470ujw{}{\][\[
You can split the string at the first \ (this works only when the string contains a literal backslash, as the raw string here does):
>>> s = r'50.2000\xc2\xb0 E'
>>> s.split('\\', 1)
['50.2000', 'xc2\\xb0 E']

You could solve this using a regular expression:
In [1]: import re
In [2]: string = '50.2000\xc2\xb0 E'
In [3]: m = re.match(r'^([0-9]+\.?[0-9]*)', string)
In [4]: m.group(0)
Out[4]: '50.2000'
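
The regex answers above can be wrapped in a small helper. This is only a sketch: the name leading_number is hypothetical, and it assumes the number sits at the very start of the string and that a float is the desired result.

```python
import re

# Hypothetical helper, not from any of the answers above.
# Matches digits, optionally followed by a dot and more digits.
_LEADING_NUMBER = re.compile(r'\d+(?:\.\d+)?')

def leading_number(text):
    """Return the number at the start of text as a float, or None."""
    m = _LEADING_NUMBER.match(text)  # match() only looks at the start
    return float(m.group(0)) if m else None

print(leading_number('50.2000\xc2\xb0 E'))  # 50.2
print(leading_number('E 50.2000'))          # None
```

Returning None instead of raising keeps the caller from having to catch an AttributeError when the string does not start with a number.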

Related

search substrings and replace them after analyzing

I've got multiple occurrences of strings like this in my file:
%na^me%
%name^%
%^name%
....
I want to search every string like this in my file and replace it after analyzing the string.
For example
string `%^name%` will be replaced with `Data`
string `%name^%` will be replaced with `DATA`
....
To find my substring I use this function with regex
re.findall('(?<=\%)(.*?)(?=\%)', data)
It finds the substrings fine, but how do I replace them?
The solution I see is to create a map and iterate over it to replace every occurrence with some value.
But is there a better way?

You don't need to go for re.findall. Just re.sub would be fine.
>>> s = '''%na^me%
%name^%
%^name%'''
>>> m = re.sub(r'(?<=%)\^.*?(?=%)', r'Data', s)
>>> f = re.sub(r'(?<=%).*?\^(?=%)', r'DATA', m)
>>> print(f)
%na^me%
%DATA%
%Data%
Update:
>>> m = re.sub(r'(?<=%)\^.*?(?=%)', r'Data', s)
>>> f = re.sub(r'(?<=%).*?\^(?=%)', r'DATA', m)
>>> j = re.sub(r'(?<=%).*?.\^..*(?=%)', r'datA', f)
>>> print(j)
%datA%
%DATA%
%Data%
If you also want to replace the % signs, you could try this:
>>> m = re.sub(r'%\^.*?%', r'Data', s)
>>> f = re.sub(r'%.*?\^%', r'DATA', m)
>>> j = re.sub(r'%.*?.\^..*%', r'datA', f)
>>> print(j)
datA
DATA
Data

You can use the following pattern. Note that you need to escape ^ with a backslash, and instead of look-arounds you can use a capturing group. You should also put r before the pattern so that Python treats it as a raw string and passes the backslashes through to the regex engine unchanged:
>>> s="""%na^me%
... %name^%
... %^name%"""
>>> l = re.findall(r'%([a-zA-Z\^]+)%', s)
>>> l
['na^me', 'name^', '^name']
To replace the strings you can use a dictionary like the following and substitute each match with str.replace():
>>> d = {'^name': 'Data', 'name^': 'DATA', 'na^me': 'DAta'}
>>> for i in l:
...     s = s.replace(i, d[i])
...
>>> s
'%DAta%\n%DATA%\n%Data%'
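
Since the question asks to replace each match "after analyzing" it, another option is to pass a function to re.sub, which calls it once per match. The sketch below is illustrative: the function name classify is hypothetical, and the 'datA' result for a caret in the middle is an assumed mapping borrowed from the first answer.

```python
import re

def classify(match):
    """Pick a replacement based on where the caret sits in the name."""
    name = match.group(1)
    if name.startswith('^'):
        return 'Data'
    if name.endswith('^'):
        return 'DATA'
    return 'datA'  # caret somewhere in the middle (assumed mapping)

s = '%na^me%\n%name^%\n%^name%'
# [^%\n]* keeps a match from running across a '%' or a line break;
# the pattern only matches placeholders that contain a caret.
result = re.sub(r'%([^%\n]*\^[^%\n]*)%', classify, s)
print(result)
```

This does the search and the replacement in one pass, and the percent signs are removed along with the name. To keep them, return '%' + replacement + '%' from the function.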

Regex how to check last 4 numbers from long number

I would like to check only the last 4 digits of a number with Python.
For example, given the following numbers, I want to check whether the last four digits start with 10
or 02:
201600001057 ( I want to get 1057)
201600000216 ( I want to get 0216)
Thanks in advance

Why would you use regex for this?
last4 = str(number)[-4:]
if last4.startswith(('10', '02')):
    print("yes, actually")

You can do it without regexp
>>> s="201600001057"
>>> s[-4:]
'1057'
>>> s[-4:].isdigit()
True
>>> s="201600001057a"
>>> s[-4:].isdigit()
False

(?=(?:10|02))\d{4}$
This should do it. See the demo:
http://regex101.com/r/kP4pZ2/18
print(re.findall(r"(?=(?:10|02))\d{4}$", x, re.M))
Here x is your string.

You could use re.search (re.match also works if you prefix the pattern with .*, because re.match only matches at the start of the string). The pattern matches only if the last four digits start with 10 or 02:
>>> s = "201600001057"
>>> s1 = "201600000216"
>>> re.search(r'(?:10|02)\d{2}$', s)
<_sre.SRE_Match object at 0x7fdbb2b6d3d8>
>>> re.search(r'(?:10|02)\d{2}$', s).group()
'1057'
>>> re.search(r'(?:10|02)\d{2}$', s1).group()
'0216'
>>> if re.search(r'(?:10|02)\d{2}$', s1):
...     print('Matches')
...
Matches
>>> if re.search(r'(?:10|02)\d{2}$', s):
...     print('Matches')
...
Matches

The findall function in the re module can be used to get the last four digits:
>>> import re
>>> x="201600001057"
>>> re.findall(r'\d{4}$', x)
['1057']
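
The slicing and validation ideas from the answers above can be combined into one check. This is a sketch under assumptions: the function name last_four_starts_with and its prefixes parameter are hypothetical, and inputs are taken to be ints or digit strings.

```python
def last_four_starts_with(number, prefixes=('10', '02')):
    """True if the last four characters are digits starting with a prefix."""
    last4 = str(number)[-4:]
    # str.startswith accepts a tuple of alternatives, so convert first.
    return (len(last4) == 4
            and last4.isdigit()
            and last4.startswith(tuple(prefixes)))

print(last_four_starts_with(201600001057))    # True  (1057)
print(last_four_starts_with('201600000216'))  # True  (0216)
print(last_four_starts_with(201600003057))    # False (3057)
```

The length and isdigit checks guard against short inputs and trailing non-digit characters, which plain slicing would silently accept.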

Building regular expression for Python

\b(?:AN|AcntNumber) : (\w+)
The above regex prints 'AcntNumber' as well:
AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f
but I want only to print c422731c7c2a4f9cbe98fbfbf410265f. Can anyone help me please?

Split the string on the colon and you have your account number.
>>> string = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
>>> frags = string.split(':')
>>> number = frags[1].strip()
>>> number
'c422731c7c2a4f9cbe98fbfbf410265f'
Or:
>>> import re
>>> string = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
>>> e = r"\b(?:AN|AcntNumber) : (\w+)"
>>> ext = re.findall(e, string)
>>> ext[0]
'c422731c7c2a4f9cbe98fbfbf410265f'
>>>
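
The label shows up because group(0) of a match (or printing the whole match) covers everything the pattern consumed, while group(1) holds only the parenthesised part. A minimal sketch using the pattern from the question; the variable names are illustrative:

```python
import re

line = "AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f"
m = re.search(r'\b(?:AN|AcntNumber) : (\w+)', line)

# group(0) is the whole match, group(1) is just the captured value.
whole = m.group(0) if m else None
account = m.group(1) if m else None
print(whole)    # AcntNumber : c422731c7c2a4f9cbe98fbfbf410265f
print(account)  # c422731c7c2a4f9cbe98fbfbf410265f
```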

Python Regular expression repeat

I have a string like this
--x123-09827--x456-9908872--x789-267504
I am trying to get all value like
123:09827
456:9908872
789:267504
I've tried (--x([0-9]+)-([0-9])+)+
but it only gives me the last pair. I am testing it in Python:
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "(--x([0-9]+)-([0-9]+))+"
>>> re.match(p,x)
>>> re.match(p,x).groups()
('--x789-267504', '789', '267504')
How should I write a nested repeat pattern?
Thanks a lot!
David

Code it like this:
x = "--x123-09827--x456-9908872--x789-267504"
p = "--x(?:[0-9]+)-(?:[0-9]+)"
print(re.findall(p, x))

Just use the .findall method instead, it makes the expression simpler.
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> r = re.compile(r"--x(\d+)-(\d+)")
>>> r.findall(x)
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
You can also use .finditer which might be helpful for longer strings.
>>> [m.groups() for m in r.finditer(x)]
[('123', '09827'), ('456', '9908872'), ('789', '267504')]

Use re.finditer or re.findall. Then you don't need the extra pair of parentheses that wrap the entire expression. For example,
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "--x([0-9]+)-([0-9]+)"
>>> for m in re.finditer(p, x):
...     print('{0} {1}'.format(m.group(1), m.group(2)))

try this
p='--x([0-9]+)-([0-9]+)'
re.findall(p,x)

No need to use regex :
>>> "--x123-09827--x456-9908872--x789-267504".replace('--x',' ').replace('-',':').strip()
'123:09827 456:9908872 789:267504'

You don't need regular expressions for this. Here is a simple one-liner, non-regex solution:
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> [ x.replace("-", ":") for x in input.split("--x")[1:] ]
['123:09827', '456:9908872', '789:267504']
If this is an exercise on regex, here is a solution that uses the repetition (technically), though the findall(...) solution may be preferred:
>>> import re
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> regex = '--x(.+)'
>>> [ x.replace("-", ":") for x in re.match(regex*3, input).groups() ]
['123:09827', '456:9908872', '789:267504']
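
The attempt in the question returns only the last pair because a capturing group that is repeated keeps just its final iteration. Dropping the outer repetition and letting findall walk the string yields every pair, which can then be formatted as asked. A sketch; the variable names are illustrative:

```python
import re

x = "--x123-09827--x456-9908872--x789-267504"

# findall returns one (key, value) tuple per non-overlapping match.
pairs = re.findall(r'--x(\d+)-(\d+)', x)
lines = ['{}:{}'.format(key, value) for key, value in pairs]
print('\n'.join(lines))
# 123:09827
# 456:9908872
# 789:267504
```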

How do I coalesce a sequence of identical characters into just one?

Suppose I have this:
My---sun--is------very-big---.
I want to replace all multiple hyphens with just one hyphen.

import re
astr='My---sun--is------very-big---.'
print(re.sub('-+','-',astr))
# My-sun-is-very-big-.

If you want to replace any run of consecutive characters, you can use
>>> import re
>>> a = "AA---BC++++DDDD-EE$$$$FF"
>>> print(re.sub(r"(.)\1+",r"\1",a))
A-BC+D-E$F
If you only want to coalesce non-word-characters, use
>>> print(re.sub(r"(\W)\1+",r"\1",a))
AA-BC+DDDD-EE$FF
If it's really just hyphens, I recommend unutbu's solution.

If you really only want to coalesce hyphens, use the other suggestions. Otherwise you can write your own function, something like this:
>>> def coalesce(x):
...     n = []
...     for c in x:
...         if not n or c != n[-1]:
...             n.append(c)
...     return ''.join(n)
...
>>> coalesce('My---sun--is------very-big---.')
'My-sun-is-very-big-.'
>>> coalesce('aaabbbccc')
'abc'

As usual, there's a nice itertools solution, using groupby:
>>> from itertools import groupby
>>> s = 'aaaaa----bbb-----cccc----d-d-d'
>>> ''.join(key for key, group in groupby(s))
'a-b-c-d-d-d'

How about:
>>> import re
>>> re.sub("-+", "-", "My---sun--is------very-big---.")
'My-sun-is-very-big-.'
the regular expression "-+" will look for 1 or more "-".

re.sub('-+', '-', "My---sun--is------very-big---")

How about an alternate without the re module:
'-'.join(filter(lambda w: len(w) > 0, 'My---sun--is------very-big---.'.split("-")))
Or going with Tim and FogleBird's previous suggestion, here's a more general method:
def coalesce_factory(x):
    return lambda sent: x.join(filter(lambda w: len(w) > 0, sent.split(x)))
hyphen_coalesce = coalesce_factory("-")
hyphen_coalesce('My---sun--is------very-big---.')
Though personally, I would use the re module first :)
mcpeterson

Another simple solution is the str object's replace method, applied until no double hyphen is left:
while '--' in astr:
    astr = astr.replace('--', '-')

if you don't want to use regular expressions:
my_string = my_string.split('-')
my_string = filter(None, my_string)
my_string = '-'.join(my_string)

I have
my_str = 'a, b,,,,, c, , , d'
I want
'a,b,c,d'
Remove all the blanks (the "replace" bit), then split on the comma, and join the non-empty pieces with a comma in between:
my_str_2 = ','.join([i for i in my_str.replace(" ", "").split(',') if i])
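
The re.sub answers above hard-code the hyphen. A small generalisation, sketched here with a hypothetical function name squeeze, collapses runs of any single chosen character; re.escape is needed so that characters such as + or . are treated literally.

```python
import re

def squeeze(text, char):
    """Collapse each run of char in text to a single occurrence."""
    pattern = '(?:{})+'.format(re.escape(char))
    # A function is used as the replacement so that a backslash in
    # char is not interpreted as an escape in the replacement string.
    return re.sub(pattern, lambda m: char, text)

print(squeeze('My---sun--is------very-big---.', '-'))  # My-sun-is-very-big-.
print(squeeze('a++b+++c', '+'))                        # a+b+c
```

Unlike the split/filter/join variants, this keeps a leading or trailing run as a single character instead of dropping it.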
