I am trying to remove consecutive duplicate characters from a string. For example:
abb --> ab
aaab --> ab
ababa --> ababa (since no two consecutive characters are the same)
My code:
T=int(input())
l=[0]
S1=""
for i in range(T):
    S=input()
    for j in range(len(S)-1):
        if S[j]!=S[j+1]:
            if S[j] != l[len(l)-1]:
                l=[]
                l.append(S[j])
                l.append(S[j+1])
    print(l)
    for k in l:
        S1+=k
    print(S1)
    S1=""
    l=[0]
The code doesn't work for the third case (ababa). How do I fix this?
One concise approach would use itertools.groupby:
from itertools import groupby
def clean(s):
    return ''.join(k for k, _ in groupby(s))
>>> clean("abb")
'ab'
>>> clean("aaab")
'ab'
>>> clean("ababa")
'ababa'
A simpler loop-based approach (quadratic because of the repeated string concatenation; the linear, list-based variant is shown in the comments):
def clean(s):
    res = ""  # res = []
    for c in s:
        if not res or res[-1] != c:
            res += c  # res.append(c)
    return res  # return ''.join(res)
A verbose way of doing it, which may not be the most efficient for large strings:
value = 'aaaaaabbbbaaaaaacdeeeeefff'
def no_dups(value):
    r = ''
    for i in value:
        if not r or r[-1] != i:
            r += i
    return r
print(no_dups(value))
# abacdef
Using regex, we could do re.sub(r'([a-z])\1+', r'\1', string_data):
import re
test_data = 'abb aaab ababa'.split()
for data in test_data:
    print(f"{data} -->", re.sub(r'([a-z])\1+', r'\1', data))
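The pattern above only covers lowercase letters. If other characters can repeat too, a possible variation is to capture any character instead. This is only a sketch: the function name squeeze is made up for illustration, and it assumes that runs of any character, including spaces and newlines, should be collapsed.

```python
import re

def squeeze(text):
    # (.) captures any single character; \1+ matches one or more repeats of it.
    # re.DOTALL lets "." match newlines too, so runs of "\n" are collapsed as well.
    return re.sub(r'(.)\1+', r'\1', text, flags=re.DOTALL)

for sample in ['abb', 'aaab', 'ababa', 'AA--11  x']:
    print(f"{sample!r} --> {squeeze(sample)!r}")
```

If only letters should be collapsed, the original [a-z] character class (or a wider one such as [A-Za-z]) is the safer choice.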
Came up with this code, which works for the examples above:
T=int(input())  # number of test cases, for testing multiple strings
S1=""
for i in range(T):
    S=input()
    for j in range(0,len(S),2):
        if j!=len(S)-1:
            if S[j]!=S[j+1]:
                S1+=S[j]
                S1+=S[j+1]
        else:
            if S1[len(S1)-1]!=S[j]:
                S1+=S[j]
    print(S1)
    S1=""
You can use regex as:
for char in set(string):
    string = re.sub(f'{char}+', char, string)
string
results in
abb --> ab
aaab --> ab
ababa --> ababa
I need to insert a string, character by character, into another string at every third position.
For example: string_1: wwwaabkccgkll
string_2: toadhp
I need to insert string_2 character by character into string_1 at every third position,
so the output must be wwtaaobkaccdgkhllp
I need this in Python, but Java is fine too.
So I tried this:
test_str = "hiimdumbiknow"
challenge = "toadh"
k = 0
new_st = challenge[k]
last = list(test_str)
for i in range(len(test_str)):
    if i % 3 == 0:
        last.insert(i, new_st)
        k += 1
print(''.join(last))
and the output I get is
thitimtdutmbtiknow
You can split test_str into substrings of length 2, and then iterate, merging them with challenge:
def concat3(test_str, challenge):
    chunks = [test_str[i:i+2] for i in range(0,len(test_str),2)]
    result = []
    i = j = 0
    while i<len(chunks) or j<len(challenge):
        if i<len(chunks):
            result.append(chunks[i])
            i += 1
        if j<len(challenge):
            result.append(challenge[j])
            j += 1
    return ''.join(result)
test_str = "hiimdumbiknow"
challenge = "toadh"
print(concat3(test_str, challenge))
# hitimoduambdikhnow
This method works even if the lengths of test_str and challenge do not match. (The remaining characters of the longer string are appended at the end.)
You can split test_str into groups of two letters and then re-join them with each letter from challenge in between, as follows:
import itertools
chunks = [test_str[i:i+2] for i in range(0, len(test_str), 2)]
print(''.join(f'{two}{letter}' for two, letter in itertools.zip_longest(chunks, challenge, fillvalue='')))
Output:
hitimoduambdikhnow
*edited to split in to groups of two rather than three as originally posted
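To see how the chunk-and-merge idea behaves when the two strings have mismatched lengths, here is a small sketch. The helper name interleave and its step parameter are made up for illustration and do not come from the answers above.

```python
from itertools import zip_longest

def interleave(text, extra, step=2):
    # Hypothetical helper: split `text` into chunks of `step` characters,
    # then follow each chunk with the next character of `extra`.
    chunks = [text[i:i + step] for i in range(0, len(text), step)]
    # zip_longest pads the shorter sequence with '' so nothing is dropped.
    return ''.join(chunk + ch for chunk, ch in zip_longest(chunks, extra, fillvalue=''))

print(interleave("hiimdumbiknow", "toadh"))  # hitimoduambdikhnow
print(interleave("ab", "wxyz"))              # abwxyz
print(interleave("abcdefgh", "x"))           # abxcdefgh
```

Whichever string runs out first, the leftover part of the other one is simply appended unchanged.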
You can try this: make an iterator over the second string, iterate over the first one, and select which character should be part of the final string according to its position:
def add3(s1, s2):
    def n():
        try:
            k = iter(s2)
            for i,j in enumerate(s1):
                yield (j if (i==0 or (i+1)%3) else next(k))
        except:
            try:
                yield s1[i+1:]
            except:
                pass
    return ''.join(n())
def insertstring(test_str,challenge):
    result = ''
    x = [x for x in test_str]
    y = [y for y in challenge]
    j = 0
    for i in range(len(x)):
        if i % 2 != 0 or i == 0:
            result += x[i]
        else:
            if j < 5:
                result += y[j]
                result += x[i]
                j += 1
    get_last_element = x[-1]
    return result + get_last_element
print(insertstring(test_str,challenge))
#output: hitimoduambdikhnow
This code (adapted from a Prefix-Suffix code) is quite slow for larger corpora:
s1 = 'gafdggeg'
s2 = 'adagafrd'
Output: gaf
def pref_also_substr(s):
    n = len(s)
    for res in range(n, 0, -1):
        prefix = s[0: res]
        if (prefix in s1):
            return res
    # if no prefix and string2 match occurs
    return 0
Any option for an efficient alternative?
Here is another approach. First, build a list of all substrings of s2:
s2 = 'adagafrd'
# Get all substrings of the string
# using a list comprehension + string slicing
substrings = [s2[i: j] for i in range(len(s2))
              for j in range(i + 1, len(s2) + 1)]
Now you can use the startswith() method to check which of these substrings are prefixes of s1, keeping only the longest one found so far in the dictionary d:
s1 = 'gafdggeg'
d={}
for substring in substrings:
    if s1.startswith(substring):
        if not d:
            d[substring]=len(substring)
        else:
            if len(substring)>list(d.values())[0]:
                d={}
                d[substring]=len(substring)
print(d)
Output:
{'gaf': 3}
def f(s1, s2):
    for i in range(len(s1)):
        i += 1
        p = s1[:i]
        if p in s2:
            s2 = s2[s2.index(p):]
        else:
            return i - 1
    return len(s1)  # every prefix matched, so all of s1 occurs in s2
Check the prefixes starting from length 1.
If a prefix is found, discard the characters of s2 before the match and continue searching with the next longer prefix.
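The same idea can be written without slicing s2 at all, by remembering where the last match started. This is a sketch only: the name longest_prefix_in is made up, and it relies on the observation that any occurrence of a longer prefix also contains the shorter prefix at the same position, so the search never needs to look before the previous match.

```python
def longest_prefix_in(s1, s2):
    # Hypothetical helper: length of the longest prefix of s1 that occurs in s2.
    start = 0  # earliest position in s2 where the current prefix can still match
    for length in range(1, len(s1) + 1):
        # Resume the search at the previous match instead of rescanning s2 from 0.
        start = s2.find(s1[:length], start)
        if start == -1:
            return length - 1
    return len(s1)

s1 = 'gafdggeg'
s2 = 'adagafrd'
n = longest_prefix_in(s1, s2)
print(s1[:n])  # gaf
```

Returning a length rather than the prefix itself matches the functions above; slice s1 with it when the text is needed.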
I want to create alphabetically ascending names like the column names in Excel. That is, I want something like a,b,c,...,z,aa,ab,...az,...zz,aaa,aab,....
I have tried:
import string
for i in range(1000):
    mod = int(i%26)
    div = int(i/26)
    print(string.ascii_lowercase[div]+string.ascii_lowercase[mod])
This works until zz but then fails because the index runs out of range:
aa
ab
ac
ad
ae
af
ag
ah
ai
aj
ak
al
.
.
.
zz
IndexError
You could make use of itertools.product():
from itertools import product
from string import ascii_lowercase
for i in range(1, 4):
    for x in product(ascii_lowercase, repeat=i):
        print(''.join(x))
First, you want all letters, then all pairs, then all triplets, etc. This is why we first need to iterate through all the string lengths you want (for i in range(...)).
Then, we need all possible combinations of i letters, so we can use product(ascii_lowercase, repeat=i), which is equivalent to a for loop nested i times.
This generates the required tuples of size i; finally, just join() them to obtain a string.
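To make the "nested for loop" equivalence concrete, here is a small check on a three-letter alphabet. It is just an illustration; the variable names are made up.

```python
from itertools import product

letters = 'abc'  # a short alphabet keeps the output readable

# product(letters, repeat=2) ...
pairs_from_product = [''.join(t) for t in product(letters, repeat=2)]

# ... yields the same sequence, in the same order, as two nested for loops.
pairs_from_loops = [x + y for x in letters for y in letters]

print(pairs_from_product)
print(pairs_from_product == pairs_from_loops)  # True
```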
To continuously generate names without limit, replace the for loop with a while loop:
def generate():
    i = 0
    while True:
        i += 1
        for x in product(ascii_lowercase, repeat=i):
            yield ''.join(x)
generator = generate()
next(generator) # 'a'
next(generator) # 'b'
...
For a general solution we can use a generator and islice from itertools:
import string
from itertools import islice
def generate():
    base = ['']
    while True:
        next_base = []
        for b in base:
            for i in range(26):
                next_base.append(b + string.ascii_lowercase[i])
                yield next_base[-1]
        base = next_base
print('\n'.join(islice(generate(), 1000)))
And the output:
a
b
c
...
z
aa
ab
...
zz
aaa
aab
...
And you can use islice to take as many strings as you need.
Try:
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> len(string.ascii_lowercase)
26
ascii_lowercase has only 26 characters, so an exception is raised as soon as the index computed in the line below reaches 26:
div = int(i/26)
But you can limit the range:
for i in range(26*26): # <--- 26 is len(string.ascii_lowercase)
    mod = int(i%26)
    div = int(i/26)
    print(string.ascii_lowercase[div]+string.ascii_lowercase[mod])
EDIT:
or you can use:
import string
n = 4 # number of chars
small_limit = len(string.ascii_lowercase)
limit = small_limit ** n
i = 0
while i < limit:
    s = ''
    for c in range(n):
        index = int(i/(small_limit**c))%small_limit
        s += string.ascii_lowercase[index]
    print(s)
    i += 1
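The div/mod idea from the question can also be carried past zz by converting the index directly, one letter at a time. The sketch below assumes 0-based numbering (0 gives 'a'); the function name column_name is made up for illustration.

```python
from string import ascii_lowercase

def column_name(n):
    # Hypothetical helper: map 0 -> 'a', 25 -> 'z', 26 -> 'aa', 701 -> 'zz', 702 -> 'aaa'.
    name = ''
    n += 1  # switch to 1-based numbering, as spreadsheet columns use
    while n > 0:
        # Subtracting 1 before dividing is what lets 'z' be followed by 'aa' rather than 'ba'.
        n, rem = divmod(n - 1, 26)
        name = ascii_lowercase[rem] + name
    return name

print([column_name(i) for i in (0, 25, 26, 701, 702)])
# ['a', 'z', 'aa', 'zz', 'aaa']
```

Unlike plain base 26 there is no zero digit here, which is why the remainder is taken from n - 1 at every step.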
You can use:
from string import ascii_lowercase
l = list(ascii_lowercase) + [letter1+letter2 for letter1 in ascii_lowercase for letter2 in ascii_lowercase]+ [letter1+letter2+letter3 for letter1 in ascii_lowercase for letter2 in ascii_lowercase for letter3 in ascii_lowercase]
There's an answer to this question provided on Code Review SE
A slight modification to the answer in the link gives the following which works for an arbitrary number of iterations.
def increment_char(c):
    return chr(ord(c) + 1) if c != 'z' else 'a'
def increment_str(s):
    lpart = s.rstrip('z')
    num_replacements = len(s) - len(lpart)
    new_s = lpart[:-1] + increment_char(lpart[-1]) if lpart else 'a'
    new_s += 'a' * num_replacements
    return new_s
s = ''
for _ in range(1000):
    s = increment_str(s)
    print(s)
I have the following string and I split it:
>>> st = '%2g%k%3p'
>>> l = filter(None, st.split('%'))
>>> print l
['2g', 'k', '3p']
Now I want to print the g letter two times, the k letter one time and the p letter three times:
ggkppp
How is it possible?
You could use a generator expression with isdigit() to check whether the first symbol is a digit, and then repeat the rest of the string the appropriate number of times. Then you could use join to get your output:
''.join(i[1:]*int(i[0]) if i[0].isdigit() else i for i in l)
Demonstration:
In [70]: [i[1:]*int(i[0]) if i[0].isdigit() else i for i in l ]
Out[70]: ['gg', 'k', 'ppp']
In [71]: ''.join(i[1:]*int(i[0]) if i[0].isdigit() else i for i in l)
Out[71]: 'ggkppp'
EDIT
Using the re module when the leading number has several digits:
''.join(re.search('(\d+)(\w+)', i).group(2)*int(re.search('(\d+)(\w+)', i).group(1)) if re.search('(\d+)(\w+)', i) else i for i in l)
Example:
In [144]: l = ['12g', '2kd', 'h', '3p']
In [145]: ''.join(re.search('(\d+)(\w+)', i).group(2)*int(re.search('(\d+)(\w+)', i).group(1)) if re.search('(\d+)(\w+)', i) else i for i in l)
Out[145]: 'ggggggggggggkdkdhppp'
EDIT2
For your input like:
st = '%2g_%3k%3p'
You could replace _ with an empty string and then add _ to the end if the word from the list ends with the _ symbol:
st = '%2g_%3k%3p'
l = list(filter(None, st.split('%')))
''.join((re.search('(\d+)(\w+)', i).group(2)*int(re.search('(\d+)(\w+)', i).group(1))).replace("_", "") + '_' * i.endswith('_') if re.search('(\d+)(\w+)', i) else i for i in l)
Output:
'gg_kkkppp'
EDIT3
Solution without the re module, using ordinary loops, that works for counts of up to 2 digits. You could define these functions:
def add_str(ind, st):
    if not st.endswith('_'):
        return st[ind:] * int(st[:ind])
    else:
        return st[ind:-1] * int(st[:ind]) + '_'
def collect(l):
    final_str = ''
    for i in l:
        if i[0].isdigit():
            if i[1].isdigit():
                final_str += add_str(2, i)
            else:
                final_str += add_str(1, i)
        else:
            final_str += i
    return final_str
And then use them as:
l = ['12g_', '3k', '3p']
print(collect(l))
gggggggggggg_kkkppp
One-liner Regex way:
>>> import re
>>> st = '%2g%k%3p'
>>> re.sub(r'%|(\d*)(\w+)', lambda m: int(m.group(1))*m.group(2) if m.group(1) else m.group(2), st)
'ggkppp'
The regex %|(\d*)(\w+) matches either a % on its own, or zero or more digits (captured in group 1) followed by one or more word characters (captured in group 2). Every match is replaced with the value produced by the replacement function. For a bare % neither group takes part in the match, so the % character is simply dropped.
or
>>> re.sub(r'%(\d*)(\w+)', lambda m: int(m.group(1))*m.group(2) if m.group(1) else m.group(2), st)
'ggkppp'
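To see what the two groups capture for each %-item, here is a small sketch built on the second pattern. It assumes Python 3; the names pattern, expand and repeat are made up for illustration.

```python
import re

pattern = re.compile(r'%(\d*)(\w+)')

def expand(st):
    def repeat(m):
        digits, word = m.groups()
        # An empty digits group means no count was given, so keep the text once.
        return word * int(digits) if digits else word
    return pattern.sub(repeat, st)

for m in pattern.finditer('%2g%k%13p'):
    print(m.groups())
# ('2', 'g')
# ('', 'k')
# ('13', 'p')

print(expand('%2g%k%3p'))  # ggkppp
```

Naming the replacement function instead of using a lambda makes the "no digits" case a little easier to read, but the behaviour is meant to be the same.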
This assumes you are always printing a single letter, but the preceding number may be longer than a single digit in base 10.
seq = ['2g', 'k', '3p']
result = ''.join(int(s[:-1] or 1) * s[-1] for s in seq)
assert result == "ggkppp"
LATE FOR THE SHOW BUT READY TO GO
Another way is to define a function that converts nC into CC...C (n times), pass it to map to apply it to every element of the list produced by splitting on %, and finally join the results, as follows:
>>> def f(s):
        x = 0
        if s:
            if len(s) == 1:
                out = s
            else:
                for i in s:
                    if i.isdigit():
                        x = x*10 + int(i)
                out = x*s[-1]
        else:
            out = ''
        return out
>>> st
'%4g%10k%p'
>>> ''.join(map(f, st.split('%')))
'ggggkkkkkkkkkkp'
>>> st = '%2g%k%3p'
>>> ''.join(map(f, st.split('%')))
'ggkppp'
Or if you want to put all of these into one single function definition:
>>> def f(s):
        out = ''
        if s:
            l = filter(None, s.split('%'))
            for item in l:
                x = 0
                if len(item) == 1:
                    repl = item
                else:
                    for c in item:
                        if c.isdigit():
                            x = x*10 + int(c)
                    repl = x*item[-1]
                out += repl
        return out
>>> st
'%2g%k%3p'
>>> f(st)
'ggkppp'
>>>
>>> st = '%4g%10k%p'
>>>
>>> f(st)
'ggggkkkkkkkkkkp'
>>> st = '%4g%101k%2p'
>>> f(st)
'ggggkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkpp'
>>> len(f(st))
107
EDIT :
In case _ is present and the OP does not want this character to be repeated, the best way in my opinion is to go with re.sub, which makes things easier:
>>> import re
>>> def f(s):
        pat = re.compile(r'%(\d*)([a-zA-Z]+)')
        out = pat.sub(lambda m:int(m.group(1))*m.group(2) if m.group(1) else m.group(2), s)
        return out
>>> st = '%4g_%12k%p__%m'
>>> f(st)
'gggg_kkkkkkkkkkkkp__m'
Loop over the list; single-character entries are appended as-is, otherwise the first character is taken as the count and the rest of the entry is appended that many times:
string=''
l = ['2g', 'k', '3p']
for entry in l:
    if len(entry) ==1:
        string += (entry)
    else:
        number = int(entry[0])
        for i in range(number):
            string += (entry[1:])
Let's say I have a string that looks like "1000101"
I want to iterate over all possible ways to insert "!" where "1" is:
1000101
100010!
1000!01
1000!0!
!000101
!00010!
!000!01
!000!0!
scalable to any string and any number of "1"s
As (almost) always, itertools.product to the rescue:
>>> from itertools import product
>>> s = "10000101"
>>> all_poss = product(*(['1', '!'] if c == '1' else [c] for c in s))
>>> for x in all_poss:
...     print(''.join(x))
...
10000101
1000010!
10000!01
10000!0!
!0000101
!000010!
!0000!01
!0000!0!
(Since we're working with one-character strings here we could even get away with
product(*('1!' if c == '1' else c for c in s))
if we wanted.)
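Since the question asks for something scalable to any string, the product call can be wrapped in a small function. This is a sketch; the name variants and its target and replacement parameters are made up for illustration.

```python
from itertools import product

def variants(s, target='1', replacement='!'):
    # Hypothetical helper: each target character contributes two choices,
    # every other character contributes exactly one.
    choices = [(c, replacement) if c == target else (c,) for c in s]
    return [''.join(combo) for combo in product(*choices)]

result = variants('1000101')
print(len(result))  # 8, i.e. 2 ** 3 for the three '1' characters
print(result[:3])   # ['1000101', '100010!', '1000!01']
```

The number of results doubles with every target character, so for long strings it may be better to iterate over product directly instead of building the full list.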
Here you go. The recursive structure is that I can generate all the subcombos of s[1:], and then for each one of those combos prepend ! if s[0] is 1, and either way prepend s[0].
def subcombs(s):
    if not s:
        return ['']
    char = s[0]
    res = []
    for combo in subcombs(s[1:]):
        if char == '1':
            res.append('!' + combo)
        res.append(char + combo)
    return res
print(subcombs('1000101'))
['!000!0!', '1000!0!', '!00010!', '100010!', '!000!01', '1000!01', '!000101', '1000101']
An approach with generator:
def possibilities(s):
    if not s:
        yield ""
    else:
        for s_next in possibilities(s[1:]):
            yield "".join([s[0], s_next])
            if s[0] == '1':
                yield "".join(['!', s_next])
print(list(possibilities("1000101")))
Output:
['1000101', '!000101', '1000!01', '!000!01', '100010!', '!00010!', '1000!0!', '!000!0!']