Replacing consecutive symbol with number digit in python using regex

Here's the case:
str = 'myfile.#.#####-########.ext'
I want to replace each run of # with the number 456, zero-padded to the length of the run, so it should become:
str = 'myfile.456.00456-00000456.ext'
The second case:
str = 'myfile.%012d.tga'
Replace the pattern with the number 456 so that it becomes:
str = 'myfile.000000000456.tga'
I can solve this with plain string replacement by grabbing the pattern, counting the digits, and padding with zeros. What I want to know is how to do it using a regex in Python. Can anyone help? Thanks a lot.

The second case is straightforward and does not require a regex; a regex would be overkill. I would suggest using string formatting instead:
>>> 'myfile.%012d.tga' % 456
'myfile.000000000456.tga'
The first case is trickier, but possible:
>>> import re
>>> st = 'myfile.#.#####-########.ext'
>>> def repl(m):
...     return "{{0:0{}}}".format(len(m.group(1)))
...
>>> re.sub(r"(#+)", repl, st).format(456)
'myfile.456.00456-00000456.ext'
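To see why this works, it can help to look at the intermediate string: re.sub turns each run of # into a format field whose width is the length of the run, and .format then fills every field with the same number. A minimal sketch (the helper name to_template is made up for illustration, and it assumes the file name contains no literal { or }):

```python
import re

def to_template(pattern):
    # Replace each run of '#' with a zero-padded format field of the same width.
    # '{{' and '}}' are literal braces, so a run of 5 becomes '{0:05}'.
    return re.sub(r"#+", lambda m: "{{0:0{}}}".format(len(m.group())), pattern)

template = to_template("myfile.#.#####-########.ext")
print(template)              # myfile.{0:01}.{0:05}-{0:08}.ext
print(template.format(456))  # myfile.456.00456-00000456.ext
```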

Through re.sub, without the format function:
>>> import re
>>> s = 'myfile.#.#####-########.ext'
>>> re.sub(r'#+', lambda m: '456'.zfill(len(m.group())), s)
'myfile.456.00456-00000456.ext'
For the second case:
>>> s = 'myfile.%012d.tga'
>>> re.sub(r'%(\d+)d', lambda m: '456'.zfill(int(m.group(1))), s)
'myfile.000000000456.tga'

Finally, thanks to everyone who answered my question. That lambda idea really helped. Here is the answer to my question.
My first case, using '#':
s = 'myfile.##.####.########.ext'
print(re.sub('#+', lambda m: '456'.zfill(len(m.group())), s))
---> myfile.456.0456.00000456.ext
My second case, using the %0Nd format:
s = 'myfile.%06d--%012d--%02d.ext'
print(re.sub('%0[0-9]+d', lambda m: m.group() % 456, s))
---> myfile.000456--000000000456--456.ext
Here is a simple combination of both cases:
s = 'myfile.##.####.########.%06d---%02d.ext'
padded = re.sub('#+', lambda m: '456'.zfill(len(m.group())), s)
print(re.sub('%0[0-9]+d', lambda m: m.group() % 456, padded))
---> myfile.456.0456.00000456.000456---456.ext
Don't forget to 'import re' to use the regex functions.
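The two substitutions could also be wrapped in one small function so that the number is no longer hard-coded. This is only a sketch; the name fill_number and its parameters are made up for illustration:

```python
import re

def fill_number(pattern, number):
    # Hypothetical helper: fills both '#' runs and '%0Nd' fields with `number`.
    # Runs of '#': pad the number with zeros to the length of the run.
    result = re.sub(r"#+", lambda m: str(number).zfill(len(m.group())), pattern)
    # printf-style fields such as '%06d': let the % operator do the padding.
    return re.sub(r"%0\d+d", lambda m: m.group() % number, result)

print(fill_number("myfile.##.####.########.%06d---%02d.ext", 456))
# myfile.456.0456.00000456.000456---456.ext
```

As in the code above, a number that is longer than the run of # is kept whole rather than truncated.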

Related

Using string.replace(x, y) to replace all

I have just started to learn Python and would like help using string.replace(x, y).
Specifically, I want to replace every letter with X or x, depending on whether the letter was originally capitalized or not.
e.g.
John S. Smith -> Xxxx X. Xxxxx
What I have created currently is below.
print("Enter text to translate:", end ="")
sentence = input ()
replaced = sentence.replace("", "x")
print(replaced)
However, when I input text like "John S. Smith", I get back "xJxoxhxnx xSx.x xSxmxixtxhx".
Thank you in advance!
Edit: Although string.replace(x,y) may be longer to perform, I'd like to slowly build on my knowledge before finding faster and shorter ways to perform the same operation. I'd highly appreciate it if it was explained in terms of string.replace(x, y) instead of re.sub
Edit2: I have been notified that string.replace is the wrong tool to use. Thank you for your help! I will be reading into re.sub instead.

If you insist on using replace even though it's the wrong tool for the job (because it can only replace one letter at a time and has to go through the whole string every time), here's a way:
>>> s = 'John S. Smith'
>>> for c in s:
...     if c.islower():
...         s = s.replace(c, 'x')
...     if c.isupper():
...         s = s.replace(c, 'X')
...
>>> s
'Xxxx X. Xxxxx'
And a somewhat neater, more efficient way:
>>> ''.join('x' * c.islower() or 'X' * c.isupper() or c for c in s)
'Xxxx X. Xxxxx'
And a regex way:
>>> re.sub('[A-Z]', 'X', re.sub('[a-z]', 'x', s))
'Xxxx X. Xxxxx'

import re
print("Enter text to translate:", end="")
sentence = input()
replaced = re.sub("[A-Z]", 'X', re.sub("[a-z]", 'x', sentence))
print(replaced)
Use re.sub to replace the individual characters of the string, or iterate through the string.

This is not the right use case for string replace.
There are two things that you can do:
Loop through the string and perform the operation
Use re.sub to replace using regex. (How to input a regex in string.replace?)

Plain way:
>>> my_string = "John S. Smith"
>>> replaced = ''
>>> for character in my_string:
...     if character.isupper():
...         replaced += 'X'
...     elif character.islower():
...         replaced += 'x'
...     else:
...         replaced += character
...
>>> replaced
'Xxxx X. Xxxxx'
One liner:
>>> ''.join('x' if c.islower() else 'X' if c.isupper() else c for c in my_string)
'Xxxx X. Xxxxx'

print("Enter text to translate:", end ="")
sentence = input ()
replaced = ''.join(['x' if (i>='a' and i<='z') else 'X' if (i>='A' and i<='Z') else i for i in sentence])
print(replaced)

You can use the re module, and provide re.sub with a callback function to achieve this.
import re
def get_capital(ch):
    if ch.isupper():
        return "X"
    return "x"

def get_capitals(s):
    return re.sub("[a-zA-Z]", lambda ch: get_capital(ch.group(0)), s)

print(get_capitals("John S. Smith"))
Has the output:
Xxxx X. Xxxxx
As a matter of style, I've not condensed any of this into lambdas or one-liners, although you probably could.
This will be significantly faster than just repeated concatenation.
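Whether the re.sub version is actually faster depends on the input and the Python version, so it may be worth measuring. A rough sketch using timeit (the function names and the sample string are made up for illustration; no timings are claimed here):

```python
import re
import timeit

def with_sub(s):
    # One pass over the string; the callback picks 'X' or 'x' per letter.
    return re.sub("[a-zA-Z]", lambda m: "X" if m.group(0).isupper() else "x", s)

def with_concat(s):
    # Builds the result one character at a time by concatenation.
    out = ""
    for ch in s:
        if ch.isupper():
            out += "X"
        elif ch.islower():
            out += "x"
        else:
            out += ch
    return out

sample = "John S. Smith " * 100
assert with_sub(sample) == with_concat(sample)

for func in (with_sub, with_concat):
    seconds = timeit.timeit(lambda: func(sample), number=200)
    print(func.__name__, round(seconds, 4))
```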

** edit fixed code **
Try this instead. It is a simple and easily breakable example, but it shows what you need to achieve what you want:
s = "Testing This"
r = ""
for c in s:
    if c.isalpha():
        if c.islower():
            r += "x"
        else:
            r += "X"
    else:
        r += c
print(r)

How to efficiently match regex in python

I am writing code to match the US phone number format.
So it should match:
123-333-1111
(123)111-2222
123-2221111
But should not match
1232221111
matchThreeDigits = r"(?:\s*\(?[\d]{3}\)?\s*)"
matchFourDigits = r"(?:\s*[\d]{4}\s*)"
phoneRegex = '(' + '(' + matchThreeDigits + ')' + '-?' + '(' + matchThreeDigits + ')' + '-?' + '(' + matchFourDigits + ')' + ')'
matches = re.findall(re.compile(phoneRegex), line)
The problem is that I need to make sure at least one of () or '-' is present in the pattern (otherwise it could be a ten-digit number rather than a phone number). I don't want to do another pattern search, for efficiency reasons. Is there any way to build this requirement into the regex pattern itself?

Something like this?
pattern = r'(\(?(\d{3})\)?(?P<A>-)?(\d{3})(?(A)-?|-)(\d{4}))'
Using it:
import re
regex = re.compile(pattern)
check = ['123-333-1111', '(123)111-2222', '123-2221111', '1232221111']
for number in check:
    match = regex.match(number)
    print(number, bool(match))
    if match:
        # show only the digit groups (skips the full match, the '-' group and None)
        print('nums:', tuple(g for g in match.groups() if g and g.isalnum()))
Output:
123-333-1111 True
nums: ('123', '333', '1111')
(123)111-2222 True
nums: ('123', '111', '2222')
123-2221111 True
nums: ('123', '222', '1111')
1232221111 False
Note:
You requested an explanation of (?P<A>-) and (?(A)-?|-):
(?P<A>-) is a named capture group with the name A. The general form is (?P<NAME>...).
(?(A)-?|-) is a conditional group: it checks whether the named group A captured something. If so, it matches the YES pattern (-?); otherwise it matches the NO pattern (-). The general form is (?(NAME)YES|NO).
All of this can be learned from help(re) in the Python interpreter, or from a search for Python regular expressions.
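The conditional group may be easier to follow on a smaller pattern. The sketch below is a made-up example, not part of the pattern above: it accepts an area code written as "(123)" or "123-", and rejects mixed forms.

```python
import re

# Hypothetical, smaller pattern: an area code written as "(123)" or "123-".
# (?P<open>\()?  optionally captures an opening parenthesis as group "open".
# (?(open)\)|-)  requires ")" if "open" matched, otherwise requires "-".
area_code = re.compile(r"(?P<open>\()?\d{3}(?(open)\)|-)")

for text in ["(123)", "123-", "(123-", "123)"]:
    print(text, bool(area_code.fullmatch(text)))
# (123) True
# 123- True
# (123- False
# 123) False
```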

You can use the following regex:
regex = r'(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}'
assuming that (123)1112222 is acceptable.
The | acts as an or, and \( and \) escape ( and ), respectively.
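A quick way to check this pattern against the numbers from the question, plus the (123)1112222 form mentioned above, might look like the following sketch. It uses fullmatch so that the whole string has to be a phone number:

```python
import re

regex = re.compile(r'(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}')

numbers = ['123-333-1111', '(123)111-2222', '123-2221111',
           '1232221111', '(123)1112222']
for number in numbers:
    # fullmatch requires the entire string to match the pattern
    print(number, bool(regex.fullmatch(number)))
# 123-333-1111 True
# (123)111-2222 True
# 123-2221111 True
# 1232221111 False
# (123)1112222 True
```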

import re
phoneRegex = re.compile(r"(\({0,1}[\d]{3}\)(?=[\d]{3})|[\d]{3}-)([\d]{3}[-]{0,1}[\d]{4})")
numbers = ["123-333-1111", "(123)111-2222", "123-2221111", "1232221111", "(123)-111-2222"]
for number in numbers:
    print(bool(re.match(phoneRegex, number)))
Output
True
True
True
False
False
You can see an explanation to this regular expression here : http://regex101.com/r/bA4fH8

python capitalize first letter only

I am aware that .capitalize() capitalizes the first letter of a string, but what if the first character is a digit?
this
1bob
5sandy
to this
1Bob
5Sandy

Only because no one else has mentioned it:
>>> 'bob'.title()
'Bob'
>>> 'sandy'.title()
'Sandy'
>>> '1bob'.title()
'1Bob'
>>> '1sandy'.title()
'1Sandy'
However, this would also give
>>> '1bob sandy'.title()
'1Bob Sandy'
>>> '1JoeBob'.title()
'1Joebob'
i.e. it doesn't just capitalize the first alphabetic character. But then .capitalize() has the same issue, at least in that 'joe Bob'.capitalize() == 'Joe bob', so meh.

If the first character is a digit, capitalize() will not capitalize the first letter.
>>> '2s'.capitalize()
'2s'
If you want that functionality, skip past the leading digits; you can use isdigit() (for example '2'.isdigit()) to check each character.
>>> s = '123sa'
>>> for i, c in enumerate(s):
...     if not c.isdigit():
...         break
...
>>> s[:i] + s[i:].capitalize()
'123Sa'

This is similar to @Anon's answer in that it keeps the rest of the string's case intact, without the need for the re module.
def sliceindex(x):
    i = 0
    for c in x:
        if c.isalpha():
            i = i + 1
            return i
        i = i + 1

def upperfirst(x):
    i = sliceindex(x)
    return x[:i].upper() + x[i:]

x = '0thisIsCamelCase'
y = upperfirst(x)
print(y)
# 0ThisIsCamelCase
As @Xan pointed out, the function could use more error checking (such as checking that x is a sequence); I'm omitting edge cases to illustrate the technique.
Updated per @normanius's comment (thanks!)
Thanks to @GeoStoneMarten for pointing out that I didn't answer the question! Fixed that.

Here is a one-liner that will uppercase the first letter and leave the case of all subsequent letters unchanged:
import re
key = 'wordsWithOtherUppercaseLetters'
key = re.sub('([a-zA-Z])', lambda x: x.groups()[0].upper(), key, count=1)
print(key)
This will result in WordsWithOtherUppercaseLetters

As answered here by Chen Houwu, it's possible to use the string module:
>>> import string
>>> string.capwords("they're bill's friends from the UK")
"They're Bill's Friends From The Uk"

a one-liner: ' '.join(sub[:1].upper() + sub[1:] for sub in text.split(' '))

You can replace the first letter (preceded by a digit) of each word using regex:
re.sub(r'(\d\w)', lambda w: w.group().upper(), '1bob 5sandy')
output:
1Bob 5Sandy

def solve(s):
    for i in s[:].split():
        s = s.replace(i, i.capitalize())
    return s
This is the code that actually works. .title() does not work for a case like '12name', because it returns '12Name'.

I came up with this:
import re
regex = re.compile("[A-Za-z]") # matches a single letter
text = "1st str"
s = regex.search(text).group() # find the first letter
text = text.replace(s, s.upper(), 1) # replace only the first instance
print(text)

def solve(s):
    names = list(s.split(" "))
    return " ".join([i.capitalize() for i in names])
Takes an input such as a name: john doe
Returns it with the first letter of each word capitalized (if the first character of a word is a digit, no capitalization occurs).
Works for any name length.

Analyzing string input until it reaches a certain letter on Python

I need help in trying to write a certain part of a program.
The idea is that a person would input a bunch of gibberish and the program will read it till it reaches an "!" (exclamation mark) so for example:
input("Type something: ")
Person types: wolfdo65gtornado!salmontiger223
If I ask the program to print the input, it should only print wolfdo65gtornado and cut everything once it reaches the "!". The rest of the program analyzes and counts the letters, but those parts I already know how to do. I just need help with the first part. I've been looking through the book, but it seems I'm missing something.
I'm thinking of using a for loop and then placing a restriction on it, but I can't figure out how to check the inputted string for a certain character and then get rid of the rest.
If you could help, I'll truly appreciate it. Thanks!

The built-in str.partition() method will do this for you. Unlike str.split(), it won't bother to cut the rest of the string into pieces.
text = input("Type something:")
left_text = text.partition("!")[0]
Explanation
str.partition() returns a 3-tuple containing the beginning, separator, and end of the string. The [0] gets the first item which is all you want in this case. Eg.:
"wolfdo65gtornado!salmontiger223".partition("!")
returns
('wolfdo65gtornado', '!', 'salmontiger223')
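When the separator does not occur at all, partition() returns the whole string followed by two empty strings, so the middle element can be used to tell the two cases apart. A small sketch (the helper name before_delimiter is made up for illustration):

```python
def before_delimiter(text, delimiter="!"):
    # Hypothetical helper: returns (prefix, found).
    prefix, sep, _rest = text.partition(delimiter)
    # sep is the delimiter itself when it was found, and '' when it is absent.
    return prefix, sep == delimiter

print(before_delimiter("wolfdo65gtornado!salmontiger223"))  # ('wolfdo65gtornado', True)
print(before_delimiter("wolfdo65gtornado"))                 # ('wolfdo65gtornado', False)
```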

>>> s = "wolfdo65gtornado!salmontiger223"
>>> s.split('!')[0]
'wolfdo65gtornado'
>>> s = "wolfdo65gtornadosalmontiger223"
>>> s.split('!')[0]
'wolfdo65gtornadosalmontiger223'
If it doesn't encounter a "!" character, it will just grab the entire text. If you would like to output an error when there is no "!", you can do something like this:
s = "something!something"
if "!" in s:
    print("there is a '!' character in the context")
else:
    print("blah, you aren't using it right :(")

You want itertools.takewhile().
>>> import itertools
>>> s = "wolfdo65gtornado!salmontiger223"
>>> '-'.join(itertools.takewhile(lambda x: x != '!', s))
'w-o-l-f-d-o-6-5-g-t-o-r-n-a-d-o'
>>> s = "wolfdo65gtornado!salmontiger223!cvhegjkh54bgve8r7tg"
>>> i = iter(s)
>>> '-'.join(itertools.takewhile(lambda x: x != '!', i))
'w-o-l-f-d-o-6-5-g-t-o-r-n-a-d-o'
>>> '-'.join(itertools.takewhile(lambda x: x != '!', i))
's-a-l-m-o-n-t-i-g-e-r-2-2-3'
>>> '-'.join(itertools.takewhile(lambda x: x != '!', i))
'c-v-h-e-g-j-k-h-5-4-b-g-v-e-8-r-7-t-g'

Try this:
s = "wolfdo65gtornado!salmontiger223"
m = s.index('!')
l = s[:m]
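Note that s.index('!') raises ValueError when the string contains no "!". If that case should fall back to the whole string, one option is str.find, which returns -1 instead. A minimal sketch (the helper name text_before is made up for illustration):

```python
def text_before(s, delimiter="!"):
    # Hypothetical helper: returns the part of s before the first delimiter,
    # or all of s when the delimiter does not occur.
    position = s.find(delimiter)  # find() returns -1 instead of raising
    return s if position == -1 else s[:position]

print(text_before("wolfdo65gtornado!salmontiger223"))  # wolfdo65gtornado
print(text_before("wolfdo65gtornado"))                 # wolfdo65gtornado
```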

To explain the accepted answer.
Splitting
The partition() method splits a string into a tuple with 3 elements:
mystring = "123splitABC"
x = mystring.partition("split")
print(x)
will give:
('123', 'split', 'ABC')
Access them like list elements:
print(x[0])  # 123
print(x[1])  # split
print(x[2])  # ABC

Suppose we have:
s = "wolfdo65gtornado!salmontiger223" + some_other_string
s.partition("!")[0] and s.split("!")[0] can both be wasteful if some_other_string is huge, say a million strings, each a million characters long, separated by exclamation marks: split() cuts up the whole string, and partition() still copies everything after the first "!". The following stops reading at the first delimiter instead, which may be more efficient in that situation.
import itertools as itts
get_start_of_string = lambda stryng, last, *, itts=itts:\
    "".join(itts.takewhile(lambda ch: ch != last, stryng))
###########################################################
s = "wolfdo65gtornado!salmontiger223"
start_of_string = get_start_of_string(s, "!")
Why the itts=itts?
Inside the body of a function such as get_start_of_string, a name like itts would otherwise refer to the global variable.
Global names are looked up when the function is called, not when the function is defined. Binding itts as a keyword-only default captures its value at definition time.
Consider the following example:
color = "white"
get_fleece_color = lambda shoop: shoop + ", whose fleece was as " + color + " as snow."
print(get_fleece_color("Igor"))
# [... many lines of code later...]
color = "pink polka-dotted"
print(get_fleece_color("Igor's cousin, 3 times removed"))
The output is:
Igor, whose fleece was as white as snow.
Igor's cousin, 3 times removed, whose fleece was as pink polka-dotted as snow.
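For contrast, here is a small sketch of the same idea with and without a default argument. The names late and early are made up for illustration; the point is only that a default value is evaluated once, when the lambda is defined:

```python
color = "white"

# Looks up the global `color` each time the function is called.
late = lambda name: name + " has " + color + " fleece"

# Captures the current value of `color` once, when the lambda is defined.
early = lambda name, *, color=color: name + " has " + color + " fleece"

color = "pink polka-dotted"

print(late("Igor"))   # Igor has pink polka-dotted fleece
print(early("Igor"))  # Igor has white fleece
```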

You can extract the beginning of a string, up until the first delimiter is encountered, by using regular expressions.
import re
slash_if_special = lambda ch:\
    "\\" if ch in "\\^$.|?*+()[{" else ""
prefix_slash_if_special = lambda ch, *, _slash=slash_if_special: \
    _slash(ch) + ch
make_pattern_from_char = lambda ch, *, c=prefix_slash_if_special:\
    "^([^" + c(ch) + "]*)"
def get_string_up_untill(x_stryng, x_ch):
    i_stryng = str(x_stryng)
    i_ch = str(x_ch)
    assert(len(i_ch) == 1)
    pattern = make_pattern_from_char(i_ch)
    m = re.match(pattern, i_stryng)
    return m.groups()[0]
An example of the code above being used:
s = "wolfdo65gtornado!salmontiger223"
result = get_string_up_untill(s, "!")
print(result)
# wolfdo65gtornado

We can use itertools:
import itertools
s = "wolfdo65gtornado!salmontiger223"
result = "".join(itertools.takewhile(lambda x: x != '!', s))
# result == "wolfdo65gtornado"

python string manipulation [duplicate]

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store the result in a new string, like this: new_string = "AX&E"
I tried this:
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this, rather than iterating over each character in the string?

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK

You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'

Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'

Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and three levels will be quite long. And you don't want to think about four levels. ;-)
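For this particular task (dropping everything inside brackets), a full parser may be more than is needed. One possible alternative is a simple depth counter, which handles any nesting level without a regex. A minimal sketch, with a made-up function name:

```python
def strip_parenthesized(text):
    # Hypothetical helper: drops everything inside (possibly nested) parentheses.
    depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # A stray ")" is dropped and depth never goes below zero.
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Only characters outside all parentheses are kept.
            kept.append(ch)
    return "".join(kept)

print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```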

You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?

You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E

this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # largest character in the string
print(min(user)) # smallest character in the string
print(user.join(["1", "2", "3", "4"])) # join() needs strings, not ints
input()
