Regex for wrapping digits with curly braces?

I am trying to use Python's re.sub() to match a string containing an e character and insert curly braces immediately after the e character and after the last digit. For example:
12.34e56 to 12.34e{56}
1e10 to 1e{10}
I can't seem to find the correct regex to insert the desired curly braces. For example, I can properly insert the left brace like this:
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e)')
>>> sub = z = re.sub(pattern, "\1e{", x)
>>> print(sub)
12.34e{10 # this is the correct placement for the left brace
My problem arises when using two back references.
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e).+($)')
>>> sub = z = re.sub(pattern, "\1e{\2}", x)
>>> print(sub)
12.34e{} # this is not what I want, digits 10 have been removed
Can anyone point out my problem? Thanks for the help.

re.sub(r'e(\d+)', r'e{\1}', '12.34e56')
returns '12.34e{56}'
or, the same result but different logic (don't replace e with e):
re.sub(r'(?<=e)(\d+)', r'{\1}', '12.34e56')
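For reference, here is a minimal sketch of why the attempt in the question loses the digits, using only the standard re module. It assumes the exponent is a run of digits at the end of the string, as in the question's examples; the variable name fixed is only for illustration.

```python
import re

x = '12.34e10'

# In a normal (non-raw) string literal, "\1" is the control character \x01,
# not a backreference. A raw string keeps the backslash so re.sub sees \1.
assert "\1" == "\x01"
assert r"\1" == "\\1"

# In the question's pattern, .+ sits outside any group, so the digits it
# matches are replaced but never put back. Capturing them fixes that.
fixed = re.sub(r'e(\d+)$', r'e{\1}', x)
print(fixed)  # 12.34e{10}
```

The key points are the raw-string prefix on the replacement and the capturing group around the digits.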

Your brace placement is incorrect.
Here's a solution ensuring that there's a number with an optional decimal part before the e:
import re
samples = ['12.34e56', '1e10']
for s in samples:
    print(re.sub(r'(\d+(?:\.\d+)?)e([0-9]+)', r'\g<1>e{\g<2>}', s))
Yields:
12.34e{56}
1e{10}

Related

I want to implement OR operator in find() in python

The following code runs without a syntax error, but it does not produce all the results I expect. The program should scan a sequence string for certain substrings and, for each occurrence, print the substring together with the 19 characters that follow it. It should print one such result for every occurrence.
Here is the code:
x=raw_input('GET STRING:: ');
m=len(x);
k=0;
while(k<m):
    if('AAT'in x or 'AAC' in x or 'AAG' in x):
        start = x.find('AAT') or x.find('AAC') or x.find('AAG')
        end=start+19
        print x[start:end]
When I input a string like ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT, it only searches for AAT and prints the resulting substring, but not AAG or AAC. Can anyone help me implement the OR operator?

In your example, it's probably better to use a regular expression.
>>> import re
>>> text = 'ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT'
>>> re.search('(?:AA[TCG])(.{19})', text).group(1)
'CTTGTGATTGCATTGACAC'
You could switch to re.findall if you want multiple matches from the string. But this won't work well if you want overlapping matches (i.e., one of your 3-letter strings appears again within the 19 characters).

Search for the first occurrence starting from k:
mystring=raw_input('GET STRING:: ')
m=len(mystring)
k=0
while(k<m):
    x=mystring[k:]
    start=min(x.find('AAT'),x.find('AAC'),x.find('AAG'))
    end=min(start+19,m)
    print(x[start:end])
    k+=start+1

You should set start to the minimum non-negative value of the three find statements.
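As a rough sketch of the "minimum non-negative value" advice: str.find returns -1 when nothing is found, so the -1 results have to be filtered out before taking the minimum. The helper name find_first and the window length of 22 (the 3-letter match plus the 19 characters after it) are assumptions made for illustration.

```python
def find_first(text, needles, start=0):
    """Return the lowest index >= start where any needle occurs, or -1."""
    hits = [i for i in (text.find(n, start) for n in needles) if i != -1]
    return min(hits) if hits else -1

seq = 'ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT'
needles = ('AAT', 'AAC', 'AAG')

windows = []
pos = find_first(seq, needles)
while pos != -1:
    # Slicing past the end of the string is safe; the last window is just shorter.
    windows.append(seq[pos:pos + 22])
    # Resume one character later so overlapping occurrences are found too.
    pos = find_first(seq, needles, pos + 1)

for w in windows:
    print(w)
```

Passing the start offset to find avoids re-slicing the string on every iteration.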

You can handle overlapping matches with regular expressions that use lookahead assertions together with a capturing group:
>>> import re
>>> regex = re.compile("(?=(AA[TCG].{19}))")
>>> regex.findall("ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT")
['AATCTTGTGATTGCATTGACAC', 'AAGAACTCTTAGTGAAATATCA', 'AACTCTTAGTGAAATATCAGTA']
>>>

How about this:
import re
str= "ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT"
alist = ['AAT','AAC','AAG']
newlist= [re.findall(e,str) for e in alist]
Output: [['AAT','AAT'],['AAC'],['AAG']].
Here is a slightly heavier version with indexes:
import re
astr= "ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT"
def find_triple_base(astr, nth_sub):
    return [(m.end(), m.group(), astr[m.end(0):m.end(0)+nth_sub]) for m in re.finditer(r'AA[TCG]', astr)]
for e in find_triple_base(astr, 19):
    print(e)
Output:
(7, 'AAT', 'CTTGTGATTGCATTGACAC')
(43, 'AAG', 'AACTCTTAGTGAAATATCA')
(46, 'AAC', 'TCTTAGTGAAATATCAGTA')
(58, 'AAT', 'ATCAGTATATCT')
What the first snippet does: findall finds all occurrences of the base triples you are looking for (alist) and generates a new list of three lists, one per triple, e.g. [['AAT','AAT'],['AAC'],['AAG']]. It's straightforward to print this out.
I hope this helps!

Have a look at this: http://ideone.com/U70n4y
Code:
x=raw_input('GET STRING:: ');
m=len(x);
k=0
if('AAT'in x ):
    start = x.find('AAT')
    end=start+19
    print(x[start:end])
elif('AAC' in x ):
    start = x.find('AAC')
    end=start+19
    print(x[start:end])
elif('AAG' in x):
    start = x.find('AAG')
    end=start+19
    print(x[start:end])
Edit: try this regexp code
import re
y=r"(?:AA[TCG]).{19}"
x=raw_input('GET STRING:: ');
l= re.findall(y,x)
for x in l:
    print(x)
    print(len(x))

Regular expression to replace a character on odd repeated occurrences in Python

I can't work out a regular expression that doubles a character wherever it appears in a run of odd length in Python.
Example:
char = "`...```.....``...`....`````...`"
should become
``...``````.....``...``....``````````...``
Runs of even length should be left unchanged.

for example:
>>> import re
>>> s = "`...```.....``...`....`````...`"
>>> re.sub(r'((?<!`)(``)*`(?!`))', r'\1\1', s)
'``...``````.....``...``....``````````...``'

Maybe I'm old fashioned (or my regex skills aren't up to par), but this seems to be a lot easier to read:
import re
def double_odd(regex,string):
    """
    Look for groups that match the regex. Double every second one.
    """
    count = [0]
    def _double(match):
        count[0] += 1
        return match.group(0) if count[0]%2 == 0 else match.group(0)*2
    return re.sub(regex,_double,string)
s = "`...```.....``...`....`````...`"
print(double_odd('`',s))
print(double_odd('`+',s))
It seems that I might have been a little confused about what you were actually looking for. Based on the comments, this becomes even easier:
def odd_repl(match):
    """
    double a match (all of the matched text) when the length of the
    matched text is odd
    """
    g = match.group(0)
    return g*2 if len(g)%2 == 1 else g
re.sub(regex,odd_repl,your_string)
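Put together as a runnable sketch, with the regex and your_string placeholders filled in: the pattern is assumed to be one or more consecutive backticks, and the input is the sample string from the question.

```python
import re

def odd_repl(match):
    """Double the matched run when its length is odd; leave even runs alone."""
    g = match.group(0)
    return g * 2 if len(g) % 2 == 1 else g

s = "`...```.....``...`....`````...`"

# `+ matches each maximal run of backticks, so odd_repl sees one run per call.
result = re.sub(r'`+', odd_repl, s)
print(result)
```

Because the callback receives each whole run, no lookbehind or lookahead is needed to find run boundaries.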

This may be not as good as the regex solution, but works:
In [101]: s1=re.findall(r'`{1,}',char)
In [102]: s2=re.findall(r'\.{1,}',char)
In [103]: fill=s1[-1] if len(s1[-1])%2==0 else s1[-1]*2
In [104]: "".join("".join((x if len(x)%2==0 else x*2,y)) for x,y in zip(s1,s2))+fill
Out[104]: '``...``````.....``...``....``````````...``'

Retrieving a full number

Assume I have a string as follows: expression = '123 + 321'.
I am walking over the string character by character as follows: for p in expression. I am checking if p is a digit using p.isdigit(). If p is a digit, I'd like to grab the whole number (so grab 123 and 321, not just p, which initially would be 1).
How can I do that in Python?
In C (coming from a C background), the equivalent would be:
int x = 0;
sscanf(p, "%d", &x);
// the full number is now in x
EDIT:
Basically, I am accepting a mathematical expression from a user that may contain positive integers, +, -, *, /, as well as brackets: '(' and ')'. I am walking the string character by character and I need to be able to determine whether each character is a digit or not. Using isdigit(), I can do that. If it is a digit, however, I need to grab the whole number. How can that be done?

>>> from itertools import groupby
>>> expression = '123 + 321'
>>> expression = ''.join(expression.split()) # strip whitespace
>>> for k, g in groupby(expression, str.isdigit):
...     if k: # it's a digit
...         print('digit')
...         print(list(g))
...     else:
...         print('non-digit')
...         print(list(g))
...
digit
['1', '2', '3']
non-digit
['+']
digit
['3', '2', '1']

This is one of those problems that can be approached from many different directions. Here's what I think is an elegant solution based on itertools.takewhile:
>>> from itertools import chain, takewhile
>>> def get_numbers(s):
...     s = iter(s)
...     for c in s:
...         if c.isdigit():
...             yield ''.join(chain(c, takewhile(str.isdigit, s)))
...
>>> list(get_numbers('123 + 456'))
['123', '456']
This even works inside a list comprehension:
>>> def get_numbers(s):
...     s = iter(s)
...     return [''.join(chain(c, takewhile(str.isdigit, s)))
...             for c in s if c.isdigit()]
...
>>> get_numbers('123 + 456')
['123', '456']
Looking over other answers, I see that this is not dissimilar to jamylak's groupby solution. I would recommend that if you don't want to discard the extra symbols. But if you do want to discard them, I think this is a bit simpler.

The Python documentation includes a section on simulating scanf, which gives you some idea of how you can use regular expressions to simulate the behavior of scanf (or sscanf, it's all the same in Python). In particular, r'\-?\d+' is the Python string that corresponds to the regular expression for an integer. (r'\d+' for a nonnegative integer.) So you could embed this in your loop as
integer = re.compile(r'\-?\d+')
for p in expression:
    if p.isdigit():
        # somehow find the current position in the string
        integer.match(expression, curpos)
But that still reflects a very C-like way of thinking. In Python, your iterator variable p is really just an individual character that has actually been pulled out of the original string and is standing on its own. So in the loop, you don't naturally have access to the current position within the string, and trying to calculate it is going to be less than optimal.
What I'd suggest instead is using Python's built in regexp matching iteration method:
integer = re.compile(r'\-?\d+') # only do this once in your program
all_the_numbers = integer.findall(expression)
and now all_the_numbers is a list of string representations of all the integers in the expression. If you wanted to actually convert them to integers, then you could do this instead of the last line:
all_the_numbers = [int(m.group()) for m in integer.finditer(expression)]
Here I've used finditer instead of findall because you don't have to make a list of all the strings before iterating over them again to convert them to integers.
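For the expression-parsing use case described in the question's edit, one possible sketch is to let a single regular expression produce all tokens at once instead of walking character by character. The TOKEN pattern and the tokenize helper are illustrative names, not part of any library. The sketch assumes the only valid tokens are non-negative integers, the operators + - * / and parentheses; anything else (including whitespace) is silently skipped by findall.

```python
import re

# One alternative for a run of digits, one for a single operator or bracket.
TOKEN = re.compile(r'\d+|[-+*/()]')

def tokenize(expression):
    """Return integers as int and operators/brackets as one-character strings."""
    return [int(t) if t.isdigit() else t for t in TOKEN.findall(expression)]

print(tokenize('123 + 321'))           # [123, '+', 321]
print(tokenize('(92831 * 948) / 32'))  # ['(', 92831, '*', 948, ')', '/', 32]
```

If invalid characters should raise an error rather than be skipped, the pattern would need an extra catch-all alternative; that is left out here.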

Though I'm not familiar with sscanf, I'm no C developer, it looks like it's using format strings in a way not dissimilar to what I'd use python's re module for. Something like this:
import re
nums = re.compile(r'\d+')
found = nums.findall('123 + 321')
# if you know you're only looking for two values.
left, right = found

You can use shlex http://docs.python.org/library/shlex.html
>>> from shlex import shlex
>>> expression = '123 + 321'
>>> for e in shlex(expression):
...     print(e)
...
123
+
321
>>> expression = '(92831 * 948) / 32'
>>> for e in shlex(expression):
...     print(e)
...
(
92831
*
948
)
/
32

I'd split the string up on the ' + ' string, giving you what's outside of them:
>>> expression = '123 + 321'
>>> ex = expression.split(' + ')
>>> ex
['123', '321']
>>> int_ex = map(int, ex)
>>> int_ex
[123, 321]
>>> sum(int_ex)
444
It's dangerous, but you could use eval:
>>> eval('123 + 321')
444
I'm just taking a stab at you parsing the string, and doing raw calculations on it.

e_array = expression.split('+')
i_array = map(int, e_array)
And i_array holds all integers in the expression.
UPDATE
If you already know all the special characters in your expression and you want to eliminate them all:
import re
e_array = re.split(r'[*/+\-() ]', expression) # the characters here are mult, div, plus, minus, left and right parenthesis, and space
i_array = map(int, filter(lambda x: len(x), e_array))

python string manipulation [duplicate]

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK

You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'

Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'

Nested brackets (or tags, ...) are something that cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that can handle two levels of nesting, but it is already ugly, and one for three levels will be quite long. And you don't want to think about four levels. ;-)
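Short of a full parser, arbitrary nesting depth can be handled with a simple depth counter, sketched below. The function name strip_parenthesized is made up for illustration, and the sketch assumes that unmatched closing brackets should simply be dropped.

```python
def strip_parenthesized(text):
    """Keep only the characters that are outside every pair of brackets."""
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            # Ignore a stray ')' instead of letting depth go negative.
            if depth:
                depth -= 1
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)

print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```

This does iterate over each character, which the question hoped to avoid, but it runs in a single pass regardless of nesting depth.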

You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?

You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E

this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # max alphabet from user string
print(min(user)) # min alphabet from user string
print(user.join(['1','2','3','4'])) # join needs strings, not integers
input()

How to print string in this way

For every string, I need to insert a # after every 6 characters.
For example:
example_string = "this is an example string. ok ????"
myfunction(example_string)
"this i#s an e#xample# strin#g. ok #????"
What is the most efficient way to do that?

How about this?
'#'.join( [example_string[a:a+6] for a in range(0,len(example_string),6)])
It runs pretty quickly, too. On my machine, five microseconds per 100-character string:
>>> import timeit
>>> timeit.Timer( "'#'.join([s[a:a+6] for a in range(0,len(s),6)])", "s='x'*100").timeit()
4.9556539058685303

>>> str = "this is an example string. ok ????"
>>> import re
>>> re.sub("(.{6})", r"\1#", str)
'this i#s an e#xample# strin#g. ok #????'
Update:
Normally dot matches all characters except new-lines. Use re.S to make dot match all characters including new-line chars.
>>> pattern = re.compile("(.{6})", re.S)
>>> str = "this is an example string with\nmore than one line\nin it. It has three lines"
>>> print pattern.sub(r"\1#", str)
this i#s an e#xample# strin#g with#
more #than o#ne lin#e
in i#t. It #has th#ree li#nes
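The join-based and regex-based answers above agree on the question's example but differ in one edge case, which the following sketch illustrates. The function name insert_every and its parameters are hypothetical, chosen only for this example.

```python
import re

def insert_every(text, n=6, mark='#'):
    """Join consecutive n-character chunks of text with mark."""
    return mark.join(text[i:i + n] for i in range(0, len(text), n))

example_string = "this is an example string. ok ????"
print(insert_every(example_string))  # this i#s an e#xample# strin#g. ok #????

# When the length is an exact multiple of n, join adds no trailing mark,
# while the re.sub approach appends one after the final full chunk.
print(insert_every('abcdef'))              # abcdef
print(re.sub('(.{6})', r'\1#', 'abcdef'))  # abcdef#
```

Which behaviour is wanted depends on whether the # is meant as a separator between chunks or as a terminator after each full chunk.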

import itertools  # Python 2; on Python 3, izip_longest is named zip_longest
def every6(sin, c='#'):
    r = itertools.izip_longest(*([iter(sin)] * 6 + [c * (len(sin) // 6)]))
    return ''.join(''.join(y for y in x if y is not None) for x in r)
print(every6(example_string))
