Python: Count character in string which are following each other - python

I have a string in which I want to count the occurrences of # following each other to replace them by numbers to create a increment.
For example:
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
output = doConvertMyString(rawString)
print output
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
Assuming that the number of # is not fixed and that rawString is a user input containing only string.ascii_letters + string.digits + '_' + '#, how can I do that?
Here is my test so far:
rawString = 'MyString1_test##_edit####'
incrDatas = {}
key = '#'
counter = 1
for x in xrange(len(rawString)):
if rawString[x] != key:
counter = 1
continue
else:
if x > 0:
if rawString[x - 1] == key:
counter += 1
else:
pass
# ???

You may use zfill in the re.sub replacement to pad any amount of # chunks. #+ regex pattern matches 1 or more # symbols. The m.group() stands for the match the regex found, and thus, we replace all #s with the incremented x converted to string padded with the same amount of 0s as there are # in the match.
import re
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
output = re.sub(r"#+", lambda m: str(x+1).zfill(len(m.group())), rawString)
print output
Result of the demo:
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005

The code below converts the rawString to a format string, using groupby in a list comprehension to find groups of hashes. Each run of hashes is converted into a format directive to print a zero-padded integer of the appropriate width, runs of non-hashes are simply joined back together.
This code works on Python 2.6 and later.
from itertools import groupby
def convert(template):
return ''.join(['{{x:0{0}d}}'.format(len(list(g))) if k else ''.join(g)
for k, g in groupby(template, lambda c: c == '#')])
rawString = 'MyString1_test##_edit####'
fmt = convert(rawString)
print(repr(fmt))
for x in range(5):
print(fmt.format(x=x))
output
'MyString1_test{x:02d}_edit{x:04d}'
MyString1_test00_edit0000
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004

How about this-
rawString = 'MyString1_test##_edit####'
splitString = rawString.split('_')
for i in xrange(10): # you may put any count
print '%s_%s%02d_%s%04d' % (splitString[0], splitString[1][0:4], i, splitString[2][0:4], i, )

You can try this naive (and probably not most efficient) solution. It assumes that the number of '#' is fixed.
rawString = 'MyString1_test##_edit####'
for i in range(1, 6):
temp = rawString.replace('####', str(i).zfill(4)).replace('##', str(i).zfill(2))
print(temp)
>> MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005

test_string = 'MyString1_test##_edit####'
def count_hash(raw_string):
str_list = list(raw_string)
hash_count = str_list.count("#") + 1
for num in xrange(1, hash_count):
new_string = raw_string.replace("####", "000" + str(num))
new_string = new_string.replace("##", "0" + str(num))
print new_string
count_hash(test_string)
It's a bit clunky, and only works for # counts of less than 10, but seems to do what you want.
EDIT: By "only works" I mean that you'll get extra characters with the fixed number of # symbols inserted
EDIT2: amended code

Related

Python Inserting a string

I need to insert a string (character by character) into another string at every 3rd position
For example:- string_1:-wwwaabkccgkll
String_2:- toadhp
Now I need to insert string2 char by char into string1 at every third position
So the output must be wwtaaobkaccdgkhllp
Need in Python.. even Java is ok
So i tried this
Test_str="hiimdumbiknow"
challenge="toadh"
new_st=challenge [k]
Last=list(test_str)
K=0
For i in range(Len(test_str)):
if(i%3==0):
last.insert(i,new_st)
K+=1
and the output i get
thitimtdutmbtiknow
You can split test_str into sub-strings to length 2, and then iterate merging them with challenge:
def concat3(test_str, challenge):
chunks = [test_str[i:i+2] for i in range(0,len(test_str),2)]
result = []
i = j = 0
while i<len(chunks) or j<len(challenge):
if i<len(chunks):
result.append(chunks[i])
i += 1
if j<len(challenge):
result.append(challenge[j])
j += 1
return ''.join(result)
test_str = "hiimdumbiknow"
challenge = "toadh"
print(concat3(test_str, challenge))
# hitimoduambdikhnow
This method works even if the lengths of test_str and challenge are mismatching. (The remaining characters in the longest string will be appended at the end.)
You can split Test_str in to groups of two letters and then re-join with each letter from challenge in between as follows;
import itertools
print(''.join(f'{two}{letter}' for two, letter in itertools.zip_longest([Test_str[i:i+2] for i in range(0,len(Test_str),2)], challenge, fillvalue='')))
Output:
hitimoduambdikhnow
*edited to split in to groups of two rather than three as originally posted
you can try this, make an iter above the second string and iterate over the first one and select which character should be part of the final string according the position
def add3(s1, s2):
def n():
try:
k = iter(s2)
for i,j in enumerate(s1):
yield (j if (i==0 or (i+1)%3) else next(k))
except:
try:
yield s1[i+1:]
except:
pass
return ''.join(n())
def insertstring(test_str,challenge):
result = ''
x = [x for x in test_str]
y = [y for y in challenge]
j = 0
for i in range(len(x)):
if i % 2 != 0 or i == 0:
result += x[i]
else:
if j < 5:
result += y[j]
result += x[i]
j += 1
get_last_element = x[-1]
return result + get_last_element
print(insertstring(test_str,challenge))
#output: hitimoduambdikhnow

How to change uppercase & lowercase alternatively in a string?

I want to create a new string from a given string with alternate uppercase and lowercase.
I have tried iterating over the string and changing first to uppercase into a new string and then to lower case into another new string again.
def myfunc(x):
even = x.upper()
lst = list(even)
for itemno in lst:
if (itemno % 2) !=0:
even1=lst[1::2].lowercase()
itemno=itemno+1
even2=str(even1)
print(even2)
Since I cant change the given string I need a good way of creating a new string alternate caps.
Here's a onliner
"".join([x.upper() if i%2 else x.lower() for i,x in enumerate(mystring)])
You can simply randomly choose for each letter in the old string if you should lowercase or uppercase it, like this:
import random
def myfunc2(old):
new = ''
for c in old:
lower = random.randint(0, 1)
if lower:
new += c.lower()
else:
new += c.upper()
return new
Here's one that returns a new string using with alternate caps:
def myfunc(x):
seq = []
for i, v in enumerate(x):
seq.append(v.upper() if i % 2 == 0 else v.lower())
return ''.join(seq)
This does the job also
def foo(input_message):
c = 0
output_message = ""
for m in input_message:
if (c%2==0):
output_message = output_message + m.lower()
else:
output_message = output_message + m.upper()
c = c + 1
return output_message
Here's a solution using itertools which utilizes string slicing:
from itertools import chain, zip_longest
x = 'inputstring'
zipper = zip_longest(x[::2].lower(), x[1::2].upper(), fillvalue='')
res = ''.join(chain.from_iterable(zipper))
# 'iNpUtStRiNg'
Using a string slicing:
from itertools import zip_longest
s = 'example'
new_s = ''.join(x.upper() + y.lower()
for x, y in zip_longest(s[::2], s[1::2], fillvalue=''))
# ExAmPlE
Using an iterator:
s_iter = iter(s)
new_s = ''.join(x.upper() + y.lower()
for x, y in zip_longest(s_iter, s_iter, fillvalue=''))
# ExAmPlE
Using the function reduce():
def func(x, y):
if x[-1].islower():
return x + y.upper()
else:
return x + y.lower()
new_s = reduce(func, s) # eXaMpLe
This code also returns alternative caps string:-
def alternative_strings(strings):
for i,x in enumerate(strings):
if i % 2 == 0:
print(x.upper(), end="")
else:
print(x.lower(), end= "")
return ''
print(alternative_strings("Testing String"))
def myfunc(string):
# Un-hash print statements to watch python build out the string.
# Script is an elementary example of using an enumerate function.
# An enumerate function tracks an index integer and its associated value as it moves along the string.
# In this example we use arithmetic to determine odd and even index counts, then modify the associated variable.
# After modifying the upper/lower case of the character, it starts adding the string back together.
# The end of the function then returns back with the new modified string.
#print(string)
retval = ''
for space, letter in enumerate(string):
if space %2==0:
retval = retval + letter.upper()
#print(retval)
else:
retval = retval + letter.lower()
#print(retval)
print(retval)
return retval
myfunc('Thisisanamazingscript')

Add a start index to a string index generator

I'm currently learning to create generators and to use itertools. So I decided to make a string index generator, but I'd like to add some parameters such as a "start index" allowing to define where to start generating the indexes.
I came up with this ugly solution which can be very long and not efficient with large indexes:
import itertools
import string
class StringIndex(object):
'''
Generator that create string indexes in form:
A, B, C ... Z, AA, AB, AC ... ZZ, AAA, AAB, etc.
Arguments:
- startIndex = string; default = ''; start increment for the generator.
- mode = 'lower' or 'upper'; default = 'upper'; is the output index in
lower or upper case.
'''
def __init__(self, startIndex = '', mode = 'upper'):
if mode == 'lower':
self.letters = string.ascii_lowercase
elif mode == 'upper':
self.letters = string.ascii_uppercase
else:
cmds.error ('Wrong output mode, expected "lower" or "upper", ' +
'got {}'.format(mode))
if startIndex != '':
if not all(i in self.letters for i in startIndex):
cmds.error ('Illegal characters in start index; allowed ' +
'characters are: {}'.format(self.letters))
self.startIndex = startIndex
def getIndex(self):
'''
Returns:
- string; current string index
'''
startIndexOk = False
x = 1
while True:
strIdMaker = itertools.product(self.letters, repeat = x)
for stringList in strIdMaker:
index = ''.join([s for s in stringList])
# Here is the part to simpify
if self.startIndex:
if index == self.startIndex:
startIndexOk = True
if not startIndexOk:
continue
###
yield index
x += 1
Any advice or improvement is welcome. Thank you!
EDIT:
The start index must be a string!
You would have to do the arithmetic (in base 26) yourself to avoid looping over itertools.product. But you can at least set x=len(self.startIndex) or 1!
Old (incorrect) answer
If you would do it without itertools (assuming you start with a single letter), you could do the following:
letters = 'abcdefghijklmnopqrstuvwxyz'
def getIndex(start, case):
lets = list(letters.lower()) if case == 'lower' else list(letters.upper())
# default is 'upper', but can also be an elif
for r in xrange(0,10):
for l in lets[start:]:
if l.lower() == 'z':
start = 0
yield ''.join(lets[:r])+l
I run until max 10 rows of letters are created, but you could ofcourse use an infinite while loop such that it can be called forever.
Correct answer
I found the solution in a different way: I used a base 26 number translator (based on (and fixxed since it didn't work perfectly): http://quora.com/How-do-I-write-a-program-in-Python-that-can-convert-an-integer-from-one-base-to-another)
I uses itertools.count() to count and just loops over all the possibilities.
The code:
import time
from itertools import count
def toAlph(x, letters):
div = 26
r = '' if x > 0 else letters[0]
while x > 0:
r = letters[x % div] + r
if (x // div == 1) and (x % div == 0):
r = letters[0] + r
break
else:
x //= div
return r
def getIndex(start, case='upper'):
alphabet = 'abcdefghijklmnopqrstuvwxyz'
letters = alphabet.upper() if case == 'upper' else alphabet
started = False
for num in count(0,1):
l = toAlph(num, letters)
if l == start:
started = True
if started:
yield l
iterator = getIndex('AA')
for i in iterator:
print(i)
time.sleep(0.1)

Python: how to replace characters from i-th to j-th matches?

For example, if I have:
"+----+----+---+---+--+"
is it possible to replace from second to fourth + to -?
If I have
"+----+----+---+---+--+"
and I want to have
"+-----------------+--+"
I have to replace from 2-nd to 4-th + to -. Is it possible to achieve this by regex? and how?
If you can assume the first character is always a +:
string = '+' + re.sub(r'\+', r'-', string[1:], count=3)
Lop off the first character of your string and sub() the first three + characters, then add the initial + back on.
If you can't assume the first + is the first character of the string, find it first:
prefix = string.index('+') + 1
string = string[:prefix] + re.sub(r'\+', r'-', string[prefix:], count=3)
I would rather iterate over the string, and then replace the pluses according to what I found.
secondIndex = 0
fourthIndex = 0
count = 0
for i, c in enumerate(string):
if c == '+':
count += 1
if count == 2 and secondIndex == 0:
secondIndex = i
elif count == 4 and fourthIndex == 0:
fourthIndex = i
string = string[:secondIndex] + '-'*(fourthIndex-secondIndex+1) + string[fourthIndex+1:]
Test:
+----+----+---+---+--+
+-----------------+--+
I split the string into an array of strings using the character to replace as the separator.
Then rejoin the array, in sections, using the required separators.
example_str="+----+----+---+---+--+"
swap_char="+"
repl_char='-'
ith_match=2
jth_match=4
list_of_strings = example_str.split(swap_char)
new_string = ( swap_char.join(list_of_strings[0:ith_match]) + repl_char +
repl_char.join(list_of_strings[ith_match:jth_match]) +
swap_char + swap_char.join(list_of_strings[jth_match:]) )
print (example_str)
print (new_string)
running it gives :
$ python ./python_example.py
+----+----+---+---+--+
+-------------+---+--+
with regex? Yes, that's possible.
^(\+-+){1}((?:\+[^+]+){3})
explanation:
^
(\+-+){1} # read + and some -'s until 2nd +
( # group 2 start
(?:\+[^+]+){3} # read +, followed by non-plus'es, in total 3 times
) # group 2 end
testing:
$ cat test.py
import re
pattern = r"^(\+-+){1}((?:\+[^+]+){3})"
tests = ["+----+----+---+---+--+"]
for test in tests:
m = re.search(pattern, test)
if m:
print (test[0:m.start(2)] +
"-" * (m.end(2) - m.start(2)) +
test[m.end(2):])
Adjusting is simple:
^(\+-+){1}((?:\+[^+]+){3})
^ ^
the '1' indicates that you're reading up to the 2nd '+'
the '3' indicates that you're reading up to the 4th '+'
these are the only 2 changes you need to make, the group number stays the same.
Run it:
$ python test.py
+-----------------+--+
This is pythonic.
import re
s = "+----+----+---+---+--+"
idx = [ i.start() for i in re.finditer('\+', s) ][1:-2]
''.join([ j if i not in idx else '-' for i,j in enumerate(s) ])
However, if your string is constant and want it simple
print (s)
print ('+' + re.sub('\+---', '----', s)[1:])
Output:
+----+----+---+---+--+
+-----------------+--+
Using only comprehension lists:
s1="+----+----+---+---+--+"
indexes = [i for i,x in enumerate(s1) if x=='+'][1:4]
s2 = ''.join([e if i not in indexes else '-' for i,e in enumerate(s1)])
print(s2)
+-----------------+--+
I saw you already found a solution but I do not like regex so much, so maybe this will help another! :-)

Python find and replace upon condition / with a function

String = n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l
I want the script to look at a pair at a time meaning:
evaluate n76a+q80a. if abs(76-80) < 10, then replace '+' with a '_':
else don't change anything.
Then evaluate q80a+l83a next and do the same thing.
The desired output should be:
n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
What i tried is,
def aa_dist(x):
if abs(int(x[1:3]) - int(x[6:8])) < 10:
print re.sub(r'\+', '_', x)
with open(input_file, 'r') as alex:
oligos_list = alex.read()
aa_dist(oligos_list)
This is what I have up to this point. I know that my code will just replace all '+' into '_' because it only evaluates the first pair and and replace all. How should I do this?
import itertools,re
my_string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
#first extract the numbers
my_numbers = map(int,re.findall("[0-9]+",my_string))
#split the string on + (useless comment)
parts = my_string.split("+")
def get_filler((a,b)):
'''this method decides on the joiner'''
return "_" if abs(a-b) < 10 else '+'
fillers = map(get_filler,zip(my_numbers,my_numbers[1:])) #figure out what fillers we need
print "".join(itertools.chain.from_iterable(zip(parts,fillers)))+parts[-1] #it will always skip the last part so gotta add it
is one way you might accomplish this... and is also an example of worthless comments
Through re module only.
>>> s = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
>>> m = re.findall(r'(?=\b([^+]+\+[^+]+))', s) # This regex would helps to do a overlapping match. See the demo (https://regex101.com/r/jO6zT2/13)
>>> m
['n76a+q80a', 'q80a+l83a', 'l83a+i153a', 'i153a+l203f', 'l203f+r207a', 'r207a+s211a', 's211a+s215w', 's215w+f216a', 'f216a+e283l']
>>> l = []
>>> for i in m:
if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
l.append(i.replace('+', '_'))
else:
l.append(i)
>>> re.sub(r'([a-z0-9]+)\1', r'\1',''.join(l))
'n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l'
By defining a separate function.
import re
def aa_dist(x):
l = []
m = re.findall(r'(?=\b([^+]+\+[^+]+))', x)
for i in m:
if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
l.append(i.replace('+', '_'))
else:
l.append(i)
return re.sub(r'([a-z0-9]+)\1', r'\1',''.join(l))
string = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
print aa_dist(string)
Output:
n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l

Categories

Resources