Python: how to replace characters from i-th to j-th matches?

For example, if I have:
"+----+----+---+---+--+"
and I want to have:
"+-----------------+--+"
I have to replace the 2nd through 4th + with -. Is it possible to achieve this with a regex, and how?

If you can assume the first character is always a +:
string = '+' + re.sub(r'\+', r'-', string[1:], count=3)
Lop off the first character of your string and sub() the first three + characters, then add the initial + back on.
If you can't assume the first + is the first character of the string, find it first:
prefix = string.index('+') + 1
string = string[:prefix] + re.sub(r'\+', r'-', string[prefix:], count=3)
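The same idea can be generalized to any i-th through j-th match by counting matches inside a replacement callback. This is a minimal Python 3 sketch, not part of the original answer; the function name replace_match_range and its parameters are made up for illustration.

```python
import re

def replace_match_range(text, pattern, repl, first, last):
    """Replace the first-th through last-th matches (1-based, inclusive)."""
    seen = 0  # number of matches encountered so far

    def _sub(match):
        nonlocal seen
        seen += 1
        # Only matches inside the requested range are replaced;
        # every other match is put back unchanged.
        return repl if first <= seen <= last else match.group(0)

    return re.sub(pattern, _sub, text)

print(replace_match_range("+----+----+---+---+--+", r"\+", "-", 2, 4))
```

Because re.sub() calls the function once per match, from left to right, no slicing or prefix bookkeeping is needed.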

I would rather iterate over the string, and then replace the pluses according to what I found.
secondIndex = 0
fourthIndex = 0
count = 0
for i, c in enumerate(string):
    if c == '+':
        count += 1
        if count == 2 and secondIndex == 0:
            secondIndex = i
        elif count == 4 and fourthIndex == 0:
            fourthIndex = i
string = string[:secondIndex] + '-'*(fourthIndex-secondIndex+1) + string[fourthIndex+1:]
Test:
+----+----+---+---+--+
+-----------------+--+

I split the string into an array of strings using the character to replace as the separator.
Then rejoin the array, in sections, using the required separators.
example_str="+----+----+---+---+--+"
swap_char="+"
repl_char='-'
ith_match=2
jth_match=4
list_of_strings = example_str.split(swap_char)
# slice ends are exclusive, so jth_match + 1 is needed to include the j-th match
new_string = ( swap_char.join(list_of_strings[0:ith_match]) + repl_char +
               repl_char.join(list_of_strings[ith_match:jth_match + 1]) +
               swap_char + swap_char.join(list_of_strings[jth_match + 1:]) )
print (example_str)
print (new_string)
Running it gives:
$ python ./python_example.py
+----+----+---+---+--+
+-----------------+--+

With regex? Yes, that's possible.
^(\+-+){1}((?:\+[^+]+){3})
Explanation:
^
(\+-+){1}            # read + and some -'s until 2nd +
(                    # group 2 start
  (?:\+[^+]+){3}     # read +, followed by non-pluses, in total 3 times
)                    # group 2 end
testing:
$ cat test.py
import re
pattern = r"^(\+-+){1}((?:\+[^+]+){3})"
tests = ["+----+----+---+---+--+"]
for test in tests:
    m = re.search(pattern, test)
    if m:
        print (test[0:m.start(2)] +
               "-" * (m.end(2) - m.start(2)) +
               test[m.end(2):])
Adjusting is simple:
^(\+-+){1}((?:\+[^+]+){3})
        ^              ^
the '1' indicates that you're reading up to the 2nd '+'
the '3' indicates that you're reading up to the 4th '+'
These are the only 2 changes you need to make; the group number stays the same.
Run it:
$ python test.py
+-----------------+--+
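To make the "adjust the two numbers" step concrete, the pattern can be built from i and j. This is only a sketch in Python 3 under the same assumptions as the pattern above (the string starts with +, and every replaced + is followed by at least one non-plus character); the helper name dash_out_plus_range is hypothetical.

```python
import re

def dash_out_plus_range(text, i, j):
    """Turn the i-th through j-th '+' (1-based) and everything between into '-'."""
    # {i-1}: skip the pluses before the i-th one.
    # {j-i+1}: group 2 covers the i-th through j-th plus and their trailing dashes.
    pattern = r"^(\+-+){%d}((?:\+[^+]+){%d})" % (i - 1, j - i + 1)
    m = re.search(pattern, text)
    if not m:
        return text  # pattern does not apply; leave the input untouched
    return text[:m.start(2)] + "-" * (m.end(2) - m.start(2)) + text[m.end(2):]

print(dash_out_plus_range("+----+----+---+---+--+", 2, 4))
```

Note that the pattern cannot reach a + that is the last character of the string, because [^+]+ needs at least one character after it.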

This is Pythonic:
import re
s = "+----+----+---+---+--+"
idx = [ i.start() for i in re.finditer(r'\+', s) ][1:4]  # positions of the 2nd to 4th +
''.join([ j if i not in idx else '-' for i,j in enumerate(s) ])
However, if your string is constant and you want to keep it simple:
print (s)
print ('+' + re.sub(r'\+---', '----', s)[1:])
Output:
+----+----+---+---+--+
+-----------------+--+

Using only list comprehensions:
s1="+----+----+---+---+--+"
indexes = [i for i,x in enumerate(s1) if x=='+'][1:4]
s2 = ''.join([e if i not in indexes else '-' for i,e in enumerate(s1)])
print(s2)
+-----------------+--+
I saw you already found a solution, but I don't like regex much, so maybe this will help someone else! :-)

Related

How to format a string of nine digits in Python?

I have a set of strings such as "024764108", "002231531", "005231329"; each has exactly 9 digits. I want to insert a - after each group of 3 digits. The result I want is:
"024-764-108", "002-231-531", "005-231-329".
How can I express this in Python?
Here is a dynamic solution:
In [41]: df
Out[41]:
               num
0        024764108
1        002231531
2        005231329
3  012345678901234
In [42]: df.num.str.extractall(r'(\d{3})').groupby(level=0)[0].apply('-'.join)
Out[42]:
0            024-764-108
1            002-231-531
2            005-231-329
3    012-345-678-901-234
Name: 0, dtype: object
If you are using Python 3.6 or later, you could consider f-strings, which let you evaluate expressions inside the string literal.
f'{string[:3]}-{string[3:6]}-{string[6:]}'
Another option would be to split your string into three parts and then join the resulting list.
split_string = [string[i: i + 3] for i in range(0, len(string), 3)]
formatted_number = '-'.join(split_string)
The first line creates a list of substrings of length 3; the second joins the elements of that list with a '-' character in between.
There is probably a better way to do this, but you can use slicing ([]) to split the string into sections of 3.
old_str = "024764108"
new_str = old_str[:3] + '-' + old_str[3:6] + '-' + old_str[6:]
Easy solution:
number = "024764108"
new_number = number[:3] + '-' + number[3:6]+ '-' + number[6:]
Consider this code, using string slicing. The expression that converts the string to your format is string[0:3] + "-" + string[3:6] + "-" + string[6:9]
Here is the updated function and some test cases. It only accepts inputs that are exactly 9 characters long and returns None otherwise.
def format_digitstring(string:str):
    if len(string) != 9:
        return None
    return string[0:3] + "-" + string[3:6] + "-" + string[6:9]
s1 = "024764108"
s2 = "002231531"
s3 = "005231329"
s4 = "00112341"
print(format_digitstring(s1))
print(format_digitstring(s2))
print(format_digitstring(s3))
print(format_digitstring(s4))
Output:
024-764-108
002-231-531
005-231-329
None
This also works:
import re
s='024764108'
print(('{}-'*2+'{}').format(*re.findall('(...)',s)))
Or, if you want to do it on every row, you can use pandas' apply function.
This uses a positive lookahead: (\d{3}) matches three digits, (?=\d) requires that they are followed by another digit, and the replacement '\1-' inserts a - after them.
import re
number="024764108"
re.sub(r'(\d{3})(?=\d)',r'\1-',number)
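As a small illustration of how the lookahead behaves on inputs of other lengths, here is a Python 3 sketch that wraps the substitution in a function. The name group_digits and its size and sep parameters are assumptions made for this example; sep is assumed to be a plain string without backslashes.

```python
import re

def group_digits(digits, size=3, sep="-"):
    """Insert sep after every full group of `size` digits, except at the end."""
    # (\d{size}) captures one full group; (?=\d) only succeeds when another
    # digit follows, so nothing is appended after the last group.
    pattern = r"(\d{%d})(?=\d)" % size
    return re.sub(pattern, r"\1" + sep, digits)

for value in ["024764108", "012345678901234", "0123456789"]:
    print(group_digits(value))
```

A trailing group shorter than size (the final 9 in the last input) is simply left as a short group after the last separator.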

Python find similar sequences in string

I want code that returns the total length of all similar sequences in two strings. I wrote the following code, but it only counts one of them:
from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    return sum( [c[i].size if c[i].size>1 else 0 for i in range(0,len(c)) ] )
print similar(a,b)
and the output will be
6
I expect it to be: 11
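get_matching_blocks() returns a list of non-overlapping matching blocks that occur in the same order in both strings. It first finds the longest contiguous match, here 'banana' (length 6), and then only looks for further matches to the left and to the right of it. Because 'apple' comes before 'banana' in one string and after it in the other, it is never matched. Hence the result is 6.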
Try this instead:
def similar(a,b):
    c = 'something' # Initialize this to anything to make the while loop condition pass for the first time
    sum = 0
    while(len(c) != 1):
        c = SequenceMatcher(lambda x: x == ' ',a.lower(),b.lower()).get_matching_blocks()
        sizes = [i.size for i in c]
        i = sizes.index(max(sizes))
        sum += max(sizes)
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return sum
This "subtracts" the matching part of the strings, and matches them again, until len(c) is 1, which would happen when there are no more matches left.
However, this script doesn't ignore spaces. In order to do that, I used the suggestion from this other SO answer: just preprocess the strings before you pass them to the function like so:
a = 'Apple Banana'.replace(' ', '')
b = 'Banana Apple'.replace(' ', '')
You can include this part inside the function too.
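For readers on Python 3, the "subtract the longest match and repeat" idea can also be sketched with find_longest_match(), which avoids the dummy initial value for c. This is an illustrative variant, not the answer's code; the function name total_common_length and the min_size parameter (mirroring the size > 1 filter in the question) are assumptions.

```python
from difflib import SequenceMatcher

def total_common_length(a, b, min_size=2):
    """Sum the lengths of common substrings, removing each one once counted."""
    # Normalize case and drop spaces up front, as suggested above.
    a = a.lower().replace(" ", "")
    b = b.lower().replace(" ", "")
    total = 0
    while a and b:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size < min_size:
            break  # nothing worth counting is left
        total += m.size
        # Cut the matched block out of both strings before searching again.
        a = a[:m.a] + a[m.a + m.size:]
        b = b[:m.b] + b[m.b + m.size:]
    return total

print(total_common_length("Apple Banana", "Banana Apple"))  # 11
```

One caveat of this greedy approach: cutting a block out joins its neighbours, which can create substrings that were not contiguous in the original input.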
If we change your code as follows, it shows where the 6 comes from:
from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    for block in c:
        print "a[%d] and b[%d] match for %d elements" % block
print similar(a,b)
a[6] and b[0] match for 6 elements
a[12] and b[12] match for 0 elements
I made a small change to your code and it is working like a charm, thanks @Antimony
def similar(a,b):
    a=a.replace(' ', '')
    b=b.replace(' ', '')
    c = 'something' # Initialize this to anything to make the while loop condition pass for the first time
    sum = 0
    i = 2
    while(len(c) != 1):
        c = SequenceMatcher(lambda x: x == ' ',a.lower(),b.lower()).get_matching_blocks()
        sizes = [i.size for i in c]
        i = sizes.index(max(sizes))
        sum += max(sizes)
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return sum

Python: Count character in string which are following each other

I have a string in which I want to count the runs of consecutive # characters and replace each run with a zero-padded number, to create an increment.
For example:
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
    output = doConvertMyString(rawString)
    print output
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
Assuming that the number of # is not fixed and that rawString is user input containing only string.ascii_letters + string.digits + '_' + '#', how can I do that?
Here is my test so far:
rawString = 'MyString1_test##_edit####'
incrDatas = {}
key = '#'
counter = 1
for x in xrange(len(rawString)):
    if rawString[x] != key:
        counter = 1
        continue
    else:
        if x > 0:
            if rawString[x - 1] == key:
                counter += 1
            else:
                pass
                # ???
You may use zfill in the re.sub replacement to pad any amount of # chunks. #+ regex pattern matches 1 or more # symbols. The m.group() stands for the match the regex found, and thus, we replace all #s with the incremented x converted to string padded with the same amount of 0s as there are # in the match.
import re
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
    output = re.sub(r"#+", lambda m: str(x+1).zfill(len(m.group())), rawString)
    print output
Result of the demo:
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
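The same substitution can be wrapped in a reusable function. This is a minimal Python 3 sketch; the name fill_counter and its parameters are chosen here for illustration and are not from the answer.

```python
import re

def fill_counter(template, number):
    """Replace every run of '#' with `number`, zero-padded to the run's length."""
    # m.group() is the matched run of '#', so its length is the padding width.
    return re.sub(r"#+", lambda m: str(number).zfill(len(m.group())), template)

for n in range(1, 4):
    print(fill_counter("MyString1_test##_edit####", n))
```

Note that zfill() only pads and never truncates, so a number with more digits than there are # characters makes the result longer than the template, e.g. fill_counter('a##', 123) gives 'a123'.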
The code below converts the rawString to a format string, using groupby in a list comprehension to find groups of hashes. Each run of hashes is converted into a format directive to print a zero-padded integer of the appropriate width, runs of non-hashes are simply joined back together.
This code works on Python 2.6 and later.
from itertools import groupby
def convert(template):
    return ''.join(['{{x:0{0}d}}'.format(len(list(g))) if k else ''.join(g)
        for k, g in groupby(template, lambda c: c == '#')])
rawString = 'MyString1_test##_edit####'
fmt = convert(rawString)
print(repr(fmt))
for x in range(5):
    print(fmt.format(x=x))
Output:
'MyString1_test{x:02d}_edit{x:04d}'
MyString1_test00_edit0000
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
How about this:
rawString = 'MyString1_test##_edit####'
splitString = rawString.split('_')
for i in xrange(10): # you may put any count
    print '%s_%s%02d_%s%04d' % (splitString[0], splitString[1][0:4], i, splitString[2][0:4], i, )
You can try this naive (and probably not most efficient) solution. It assumes that the number of '#' is fixed.
rawString = 'MyString1_test##_edit####'
for i in range(1, 6):
temp = rawString.replace('####', str(i).zfill(4)).replace('##', str(i).zfill(2))
print(temp)
>> MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
test_string = 'MyString1_test##_edit####'
def count_hash(raw_string):
    str_list = list(raw_string)
    hash_count = str_list.count("#") + 1
    for num in xrange(1, hash_count):
        new_string = raw_string.replace("####", "000" + str(num))
        new_string = new_string.replace("##", "0" + str(num))
        print new_string
count_hash(test_string)
It's a bit clunky, and only works for # counts of less than 10, but seems to do what you want.
EDIT: By "only works" I mean that you'll get extra characters with the fixed number of # symbols inserted
EDIT2: amended code

Python: replace every letter except the nth letters in a string with a period(or another character)

So I have looked at the replace every nth letter and could not figure out the reverse. I started with this and quickly realized it would not work:
s = input("Enter a word or phrase: ")
l = len(s)
n = int(input("choose a number between 1 and %d: " %l))
print (s[0] + "." * (n-1)+ s[n]+ "." * (n-1) + s[n*2])
any help would be appreciated.
Let s be the original string and n the position not to be replaced.
''.join (c if i == n else '.' for i, c in enumerate (s) )
If the user enters 3, I'm assuming you want to replace the third, sixth, ninth...letter, right? Remember that indices are counted from 0:
>>> s = "abcdefghijklmnopqrstuvwxyz"
>>> remove = 3
>>> "".join(c if (i+1)%remove else "." for i,c in enumerate(s))
'ab.de.gh.jk.mn.pq.st.vw.yz'
Or, if you meant the opposite:
>>> "".join("." if (i+1)%remove else c for i,c in enumerate(s))
'..c..f..i..l..o..r..u..x..'
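The "keep only every n-th letter" variant can also be written with slice assignment instead of a modulo test. This is an alternative Python 3 sketch rather than the answer's method; the function name mask_except_every_nth and the fill parameter are hypothetical.

```python
def mask_except_every_nth(s, n, fill="."):
    """Keep the n-th, 2n-th, 3n-th... characters (1-based) and mask the rest."""
    chars = [fill] * len(s)        # start with everything masked
    # Copy the characters to keep back into place; both slices select the
    # same positions, so their lengths always match.
    chars[n - 1::n] = s[n - 1::n]
    return "".join(chars)

print(mask_except_every_nth("abcdefghijklmnopqrstuvwxyz", 3))
```

This assumes n is at least 1; a string shorter than n comes back fully masked.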
You can use reduce following way:
>>> s = "abcdefghijklmnopqrstuvwxyz"
>>> n=3
>>> print reduce(lambda i,x: i+x[1] if (x[0]+1)%n else i+".", enumerate(s), "")
ab.de.gh.jk.mn.pq.st.vw.yz
>>> print reduce(lambda i,x: i+"." if (x[0]+1)%n else i+x[1], enumerate(s), "")
..c..f..i..l..o..r..u..x..
Build from what you already know. You know how to find every nth character, and your result string will have all of those in it and no other character from the original string, so we can use that. We want to insert things between those, which is exactly what the str.join method does. You've already worked out that what to insert is '.' * (n-1). So, you can do this:
>>> s = "abcdefghi"
>>> n = 3
>>> ('.' * (n-1)).join(s[::n])
'a..d..g'
The only trick is that you need to account for any characters after the last one that you want to leave in place. The number of those is the remainder when the highest valid index of s is divided by n - or, (len(s) - 1) % n. Which gives this slightly ugly result:
>>> ('.' * (n-1)).join(s[::n]) + '.' * ((len(s) - 1) % n)
'a..d..g..'
You probably want to use variables for the two sets of dots to help readability:
>>> dots = '.' * (n - 1)
>>> end_dots = '.' * ((len(s) - 1) % n)
>>> dots.join(s[::n]) + end_dots
'a..d..g..'
My tricky solution (I'll let you add comments):
s = 'abcdefghijklmnopqrstuvwxyz'
n = 3
single_keep_pattern = [False] * (n - 1) + [True]
keep_pattern = single_keep_pattern * ( len(s) // n + 1)  # // keeps this an integer on Python 3 as well
result_list = [(letter if keep else '.') for letter, keep in zip(s, keep_pattern)]
result = ''.join(result_list)
print result
gives:
..c..f..i..l..o..r..u..x..

Pythonic way to eval all octal values in a string as integers

So I've got a string that looks like "012 + 2 - 01 + 24" for example. I want to be able to quickly (less code) evaluate that expression...
I could use eval() on the string, but I don't want 012 to be represented in octal form (10), I want it to be represented as an int (12).
My solution for this works, but it is not elegant. I am sort of assuming that there is a really good pythonic way to do this.
My solution:
#expression is some string that looks like "012 + 2 - 01 + 24"
atomlist = []
for atom in expression.split():
    if "+" not in atom and "-" not in atom:
        atomlist.append(int(atom))
    else:
        atomlist.append(atom)
#print atomlist
evalstring = ""
for atom in atomlist:
    evalstring+=str(atom)
#print evalstring
num = eval(evalstring)
Basically, I tear apart the string, find the numbers in it and turn them into ints, and then I rebuild the string with the ints (essentially removing leading 0's except where 0 is a number on its own).
How can this be done better?
I'd be tempted to use regular expressions to remove the leading zeroes:
>>> re.sub(r'\b0+(?!\b)', '', '012 + 2 + 0 - 01 + 204 - 0')
'12 + 2 + 0 - 1 + 204 - 0'
This removes zeroes at the start of every number, except when the number consists entirely of zeroes:
the first \b matches a word (token) boundary;
the 0+ matches one or more consecutive zeroes;
the (?!\b) (negative lookahead) inhibits matches where the sequence of zeroes is followed by a token boundary.
One advantage of this approach over split()-based alternatives is that it doesn't require spaces in order to work:
>>> re.sub(r'\b0+(?!\b)', '', '012+2+0-01+204-0')
'12+2+0-1+204-0'
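You can do this in one line using lstrip() to strip off any leading zeros (here s is the expression string, e.g. "012 + 2 - 01 + 24"). Note that a token consisting only of zeros, such as a lone 0, is stripped to an empty string, so this only works when no operand is zero: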
>>> eval("".join(token.lstrip('0') for token in s.split()))
37
I'd like to do it this way:
>>> s = '012 + 2 + 0 - 01 + 204 - 0'
>>> ' '.join(str(int(x)) if x.isdigit() else x for x in s.split())
'12 + 2 + 0 - 1 + 204 - 0'
Use float() instead of int() if you want to handle floating-point numbers too :)
int does not assume that a leading zero indicates an octal number:
In [26]: int('012')
Out[26]: 12
Accordingly, you can safely evaluate the expression with the following code:
from operator import add, sub
from collections import deque
def mapper(item, opmap = {'+': add, '-': sub}):
    try: return int(item)
    except ValueError: pass
    return opmap[item]
stack = deque()
# if item filters out empty strings between whitespace sequences
for item in (mapper(item) for item in "012 + 2 - 01 + 24".split(' ') if item):
    if stack and callable(stack[-1]):
        f = stack.pop()
        stack.append(f(stack.pop(), item))
    else: stack.append(item)
print stack.pop()
Not a one-liner, but it is safe, because you control all of the functions which can be executed.
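If the expression may lack spaces, the same eval()-free idea can be combined with a regex tokenizer. The following Python 3 sketch is an assumption-laden variant, not the answer's code: it only understands decimal integers with + and -, silently ignores any other characters, and the name eval_sum is made up for this example.

```python
import re

def eval_sum(expression):
    """Evaluate integers joined by + and -, reading 012 as decimal 12."""
    total = 0
    sign = 1
    # Tokens are either a single operator or a run of digits.
    for token in re.findall(r"[+-]|\d+", expression):
        if token == "-":
            sign = -sign          # a minus flips the pending sign
        elif token == "+":
            pass                  # a plus leaves the pending sign unchanged
        else:
            total += sign * int(token)  # int() ignores leading zeros
            sign = 1              # reset for the next operand
    return total

print(eval_sum("012 + 2 - 01 + 24"))  # 37
```

Since int() parses the digits directly, no leading-zero stripping is needed and a lone 0 is handled correctly.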
