Python regex: How to increase only one number in string? - python

I have a string of following types:
a1 = 'images1subimages1/folder100/hello1.png'
a1 = 'images1subimages1 folder100 hello1.png'
a1 = 'images1subimages1folder100hello1.png'
a1 = 'images1b100d1.png'
The first integer in the string is num0, and it is the only one we care about. We want to increase every occurrence of num0 by one and keep all other numbers the same.
Required:
a2 = 'images2subimages2/folder100/hello2.png'
a2 = 'images2subimages2 folder100 hello2.png'
a2 = 'images2subimages2folder100hello2.png'
a2 = 'images2b100d2.png'
My attempt:
import re
a1 = 'images1subimages1/folder100/hello1.png'
nums = list(map(int, re.findall(r'\d+', a1)))
nums0 = nums[0]
nums_changed = [j+1 if j==nums[0] else j for i,j in enumerate(nums)]
parts = re.findall(r'(\w*\d+)',a1)
for i in range(len(parts)):
    num_parts = list(map(int, re.findall(r'\d+', parts[i])))
    for num_part in num_parts:
        if num_part == nums0:
            parts[i] = parts[i].replace(str(nums0), str(nums0+1))
ans = '/'.join(parts)
ans
This has the following result:
a1 = 'images1subimages1/folder100/hello1.png' # good
a1 = 'images1subimages1 folder100 hello1.png' # bad
Is there a general way to solve the problem using regex in python?

I suggest first extracting the first number and then incrementing every occurrence of that number that is not adjacent to other digits, using re.sub:
import re
a1 = 'images1subimages1/folder100/hello1.png'
num0_m = re.search(r'\d+', a1) # Extract the first chunk of 1+ digits
if num0_m: # If there is a match
    rx = r'(?<!\d){}(?!\d)'.format(num0_m.group()) # Match the number only when it is not part of a longer run of digits
    print(re.sub(rx, lambda x: str(int(x.group())+1), a1)) # Increment the matched numbers
# => images2subimages2/folder100/hello2.png
See the Python demo
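To illustrate why the lookarounds matter, here is a minimal sketch that compares a plain str.replace with the lookaround-based substitution. The function name bump_first_number is made up for this example, and leading zeros (such as '007') are not preserved.

```python
import re

def bump_first_number(s):
    """Increment every standalone occurrence of the first number in s."""
    m = re.search(r'\d+', s)
    if m is None:
        return s  # no digits at all: nothing to change
    first = m.group()
    # Match the number only when no digit touches it on either side.
    pattern = r'(?<!\d)' + re.escape(first) + r'(?!\d)'
    return re.sub(pattern, str(int(first) + 1), s)

s = 'images1b100d1.png'
print(s.replace('1', '2'))   # images2b200d2.png (the 100 is damaged)
print(bump_first_number(s))  # images2b100d2.png
```

The plain replace touches the leading digit of 100, while the lookarounds restrict the match to whole numbers.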

You can split the string on numbers, increment the ones equal to the first one, and rebuild the string:
import re
def increment_first(s):
    parts = re.split(r'(\d+)', s)
    nums = list(map(int, parts[1::2]))
    num0 = nums[0]
    nums = [num + (num == num0) for num in nums]
    parts[1::2] = map(str, nums)
    return ''.join(parts)
Testing it on your data:
tests = ['images1subimages1/folder100/hello1.png',
         'images1subimages1 folder100 hello1.png',
         'images1subimages1folder100hello1.png',
         'images1b100d1.png']
for test in tests:
    print(test, increment_first(test))
Output:
images1subimages1/folder100/hello1.png images2subimages2/folder100/hello2.png
images1subimages1 folder100 hello1.png images2subimages2 folder100 hello2.png
images1subimages1folder100hello1.png images2subimages2folder100hello2.png
images1b100d1.png images2b100d2.png
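The split-based approach relies on re.split keeping the captured separators, so the numbers always land at the odd indexes of the result. A small sketch of that behaviour on one of the sample strings:

```python
import re

text = 'images1b100d1.png'
# The capturing group makes re.split keep the matched digits in the result.
parts = re.split(r'(\d+)', text)
print(parts)        # ['images', '1', 'b', '100', 'd', '1', '.png']
print(parts[1::2])  # ['1', '100', '1']
```

Joining the parts back together with ''.join(parts) reproduces the original string, which is why the numbers can be edited in place and the string rebuilt.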

Alas, I'm not as fast as some of these regex gurus. Here is my solution anyway.
Find the first occurrence of a number: re.search(r'\d+', st).group(0)
Substitute every occurrence of that number that is not preceded or followed by another digit: r'(?<!\d)' + re.escape(first) + r'(?!\d)'
import re
def increment_all_of_first_occurring_number(st):
    first = re.search(r'\d+', st).group(0)
    return re.sub(
        r'(?<!\d)' + re.escape(first) + r'(?!\d)',
        str(int(first) + 1),
        st
    )
if __name__ == '__main__':
    a1 = 'images1subimages1/folder100/hello1.png'
    a2 = 'images1subimages1 folder100 hello1.png'
    a3 = 'images1subimages1folder100hello1.png'
    a4 = 'images1b100d1.png'
    b1 = 'images10subimages10/folder10101/hello10.png'
    b2 = 'images10subimages10 folder10101 hello10.png'
    b3 = 'images10subimages10folder10101hello10.png'
    b4 = 'images10b10101d10.png'
    print(increment_all_of_first_occurring_number(a1))
    print(increment_all_of_first_occurring_number(a2))
    print(increment_all_of_first_occurring_number(a3))
    print(increment_all_of_first_occurring_number(a4))
    print(increment_all_of_first_occurring_number(b1))
    print(increment_all_of_first_occurring_number(b2))
    print(increment_all_of_first_occurring_number(b3))
    print(increment_all_of_first_occurring_number(b4))
Results
images2subimages2/folder100/hello2.png
images2subimages2 folder100 hello2.png
images2subimages2folder100hello2.png
images2b100d2.png
images11subimages11/folder10101/hello11.png
images11subimages11 folder10101 hello11.png
images11subimages11folder10101hello11.png
images11b10101d11.png

Related

Python - Counting Letter Frequency in a String

I want to write out the letter frequencies of each string. My inputs and expected outputs look like this.
"aaaa" -> "a4"
"abb" -> "a1b2"
"abbb cc a" -> "a1b3 c2 a1"
"bbbaaacddddee" -> "b3a3c1d4e2"
"a b" -> "a1 b1"
I found this solution but it gives the frequencies in random order. How can I do this?
Does this satisfy your needs?
from itertools import groupby
s = "bbbaaac ddddee aa"
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
res1 = "".join("{}{}".format(label, count) for label, count in result)
# 'b3a3c1 1d4e2 1a2'
# spaces just as spaces, do not include their count
import re
re.sub(' [0-9]+', ' ', res1)
'b3a3c1 d4e2 a2'
For me, it is a little bit trickier than it looks at first. For example, "bbbaaacddddee" -> "b3a3c1d4e2" suggests that the counts need to be output in the order in which the letters first appear in the input string:
import re
def unique_elements(t):
    # Keep each letter once, in order of first appearance
    l = []
    for w in t:
        if w not in l:
            l.append(w)
    return l
def splitter(s):
    res = []
    tokens = re.split("[ ]+", s)
    for token in tokens:
        s1 = unique_elements(token) # or s1 = sorted(set(token))
        this_count = "".join(k + str(token.count(k)) for k in s1)
        res.append(this_count)
    return " ".join(res)
print(splitter("aaaa"))
print(splitter("abb"))
print(splitter("abbb cc a"))
print(splitter("bbbaaacddddee"))
print(splitter("a b"))
OUTPUT
a4
a1b2
a1b3 c2 a1
b3a3c1d4e2
a1 b1
If the order of appearance does not matter, you can drop the unique_elements function and simply use something like s1 = sorted(set(token)) within splitter, as indicated in the comment.
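As a possible alternative, collections.Counter can do the per-word counting while keeping the order of first appearance, since it remembers insertion order on Python 3.7 and later. This is a minimal sketch; the name letter_counts is made up for this example, and words are assumed to be separated by single spaces.

```python
from collections import Counter

def letter_counts(text):
    """Per-word letter counts, letters listed in order of first appearance."""
    return " ".join(
        "".join(f"{ch}{n}" for ch, n in Counter(word).items())
        for word in text.split(" ")
    )

print(letter_counts("abbb cc a"))      # a1b3 c2 a1
print(letter_counts("bbbaaacddddee"))  # b3a3c1d4e2
```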
Here is your answer:
test_str = "here is your answer"
words = test_str.split()
for word in words:
    res = {}  # letter counts for the current word
    for ch in word:
        res[ch] = res.get(ch, 0) + 1
    for key, value in res.items():
        print(f"{key}{value}", end="")
    print(end=" ")
There is no need to iterate every character in every word.
This is an alternate solution. (If you don't want to use itertools, that looked pretty tidy.)
def word_stats(data: str=""):
    stats = []
    for word in data.split(" "):
        res = []
        while len(word)>0:
            # Count the first remaining letter, then remove all of its occurrences
            res.append(word[:1] + str(word.count(word[:1])))
            word = word.replace(word[:1],"")
        res.sort()
        stats.append("".join(res))
    return " ".join(stats)
print(word_stats("asjssjbjbbhsiaiic ifiaficjxzjooro qoprlllkskrmsnm mmvvllvlxjxj jfnnfcncnnccnncsllsdfi"))
print(word_stats("abbb cc a"))
print(word_stats("bbbaaacddddee"))
This would output:
a2b3c1h1i3j3s4 a1c1f2i3j2o3r1x1z1 k2l3m2n1o1p1q1r2s2 j2l3m2v3x2 c5d1f3i1j1l2n7s2
a1b3 c2 a1
a3b3c1d4e2

How to get Count of discrepancy values from two given string?

I have two strings, Paula and Pole. If we check Paula against Pole, we get three discrepancies: a, u and a are present in Paula but not in Pole, so the function should return 3.
Input:
enter string1: Paula
enter string2: Pole
Expected Output:
3
String 1 is always correct here for a row of names.
I have tried something like below so far
def compare(string1, string2, no_match_c=' ', match_c='|'):
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    # The built-in zip replaces Python 2's itertools.izip
    for c1, c2 in zip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)
def main():
    string1 = 'paula'
    string2 = 'pole'
    result, n_diff = compare(string1, string2, no_match_c='_')
    print(n_diff)
main()
Answer should be in a function
Example of other string
string1 = Michelle
string2 = Michele
Output : 1
This is a simple approach to do what you want, assuming you always want to count the characters in string 1 that do not appear in string 2.
def compare(str1, str2):
    # Return the count of chars in str1 that never occur in str2
    s1 = set(str1)
    s2 = set(str2)
    ms = s1 - s2  # chars in str1 not in str2 (same result as s1 ^ s2 & s1)
    rslt = 0
    for v in ms:
        rslt += str1.count(v)
    # Note: repeated letters are ignored, so compare('Michelle', 'Michele') returns 0, not 1
    return rslt
You may try Counter: it counts the occurrences of each character in both strings and supports subtraction between the counts.
from collections import Counter
def diff(s1, s2):
    c1 = Counter(s1)
    c2 = Counter(s2)
    return sum((c1 - c2).values())
print(diff("Paula", "Pole")) # Output: 3
print(diff("Pole", "Paula")) # Output: 2
print(diff("Michelle", "Michele")) # Output: 1
print(diff("Michele", "Michelle")) # Output: 0
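The reason the subtraction works here is that Counter drops any count that would become zero or negative. A short sketch of that behaviour follows; the helper diff_ignore_case is a hypothetical addition for inputs that differ only in letter case.

```python
from collections import Counter

left = Counter("Michelle")
right = Counter("Michele")

# Subtraction keeps only the counts that stay positive.
print(left - right)   # Counter({'l': 1})
print(right - left)   # Counter()

# Hypothetical helper: same idea, but ignoring letter case.
def diff_ignore_case(s1, s2):
    return sum((Counter(s1.lower()) - Counter(s2.lower())).values())

print(diff_ignore_case("paula", "Pole"))  # 3
```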
You can try this, using a list of 256 zeros (one slot per 8-bit character code) as a counter for all characters.
def compare(string1, string2):
    chars_counter = [0]*256
    # Count every character of string1
    for c1 in string1:
        chars_counter[ord(c1)] += 1
    # Cancel out one occurrence for each matching character of string2
    for c2 in string2:
        if chars_counter[ord(c2)] != 0:
            chars_counter[ord(c2)] -= 1
    return sum(chars_counter)

Contract words in python with set length

I'm currently trying to make a sort of "word mixer": for two given words and the desired length specified, the program should return the "mix" of the two words. However, it can be any sort of mix: it can be the first half of the first word combined with the second half of the second word, it can be a random mix, anything really.
Examples:
fish + cake, length 5: fiske
dog + cat, length 4: doga
late + cross, length 6: losste
I've written some very sloppy code (as seen below), and I'd appreciate some tips on what I am doing wrong (since my outputs aren't really good) and whether there's anything that can be improved.
from random import randint
name1 = "domingues"
name2 = "signorelli"
names = [name1,name2]
# a list of the desired lengths
lengths = [5,6,7]
mixes = []
def sizes(size):
    if size == 5:
        letters1 = randint(2,3)
    else:
        letters1 = randint(2,size-2)
    letters2 = size-letters1
    return letters1, letters2
def mix(letters1, letters2):
    n = randint(0,1)
    if n == 1:
        a = 0
    else:
        a = 1
    n1 = names[n]
    n2 = names[a]
    result = n1[0:letters2]+n2[-letters1::]
    return result
file = open("results.txt","w+")
for leng in lengths:
    file.write("RESULTS WITH "+str(leng)+" LETTERS \n")
    file.write("\n")
    for i in range(10):
        let1, let2 = sizes(leng)
        result = mix(let1,let2)
        while result == name1 or result == name2:
            result = mix(let2)
        if result not in mixes:
            mixes.append(result)
    for m in mixes:
        if m not in file:
            file.write(m+" \n")
    file.write("\n")
file.close()
(Thanks for taking your time to help me btw, I appreciate it!)
In general, this is an AI-related problem, because we implicitly want the mixed words to be readable.
I just wrote some simple (and dirty) code that tries to capture sequences of vowels and consonants from training data and builds mixed words according to the captured patterns.
import random
consonants_pat = 'BCDFGHJKLMNPQRSTVXZ'.lower()
vowels_pat = 'aeiouy'
train_data = '''
This our sentence to be used as a training dataset
It should be longer
'''
def build_mixer(train_data, num=3, mixed_len=(2, 4)):
    def _get_random_pattern(td, wlen):
        # Yield vowel (0) / consonant (1) patterns of length wlen taken from random training words
        td_splitted = td.lower().split()
        while True:
            w = random.choice(list(filter(lambda x: len(x)>=wlen, td_splitted)))
            for j in range(len(w)-wlen):
                yield tuple(map(lambda x: 0 if x in vowels_pat else 1, w[j:j + wlen]))
    def _select_vowels(w):
        return
    def _mixer(w1, w2, num=num, mixed_len=mixed_len):
        allowed_letters = w1.lower().strip() + w2.lower().strip()
        ind = 1
        for j in range(num):
            wlen = random.choice(range(*mixed_len))
            pattern = _get_random_pattern(train_data, wlen)
            _aux = allowed_letters
            word = ''
            try:
                for pat in pattern:
                    for k in pat:
                        if k == 0:
                            choiced = random.choice(list(filter(lambda x: x in vowels_pat, _aux)))
                            word += choiced
                        else:
                            choiced = random.choice(list(filter(lambda x: x in consonants_pat, _aux)))
                            word += choiced
                        # Each letter of the two input words can be used only once
                        l = list(_aux)
                        l.remove(choiced)
                        _aux = ''.join(l)
                    ind += 1
                    yield word
                    if ind>num:
                        return  # raising StopIteration inside a generator is an error since Python 3.7
            except IndexError:
                # Ran out of vowels or consonants: start a new word
                continue
    return _mixer
mixer = build_mixer(train_data, num=6, mixed_len=(3,6))
for mixed in mixer('this', 'horse'):
    print(mixed)
I got the following words:
het hetihs hetihssro sheo hsio tohir
I recommend taking a random slice of the word string and combining it with another random slice from the second word. Get the len(word) and take a slice of the word randomly using random.randrange().
import random
def word_mixer(word1, word2):
    slice1 = word1[:random.randrange(2, len(word1))]
    slice2 = word2[:random.randrange(2, len(word2))]
    return slice1 + slice2
mixed = word_mixer('weasel', 'snake')
print(mixed)
Output:
wesnak
weasesna
weassnak
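The random-slice idea above does not control the length of the result. As a rough sketch of how it could respect the requested length, the following takes a prefix of the first word and a suffix of the second so that together they have exactly the desired size. The function name mix_words and the prefix-plus-suffix rule are assumptions made for this example, not part of the original answer.

```python
import random

def mix_words(word1, word2, length):
    """Prefix of word1 plus suffix of word2, exactly `length` letters long."""
    # k letters come from word1; the remaining length - k come from word2.
    lo = max(1, length - len(word2))
    hi = min(len(word1), length - 1)
    if lo > hi:
        raise ValueError("length is not reachable with these two words")
    k = random.randint(lo, hi)
    tail = length - k
    return word1[:k] + word2[len(word2) - tail:]

print(mix_words("fish", "cake", 5))  # one of: fcake, fiake, fiske, fishe
```

Each word contributes at least one letter, so a length below 2 or above the combined length of both words raises a ValueError.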
Here's one way to do it.
import random
w1 = 'dog'
w2 = 'cat'
w3 = 'fish'
w4 = 'wolf'
def word_mixer(w1, w2, length):
    new_word = w1 + w2
    x = random.sample(range(len(new_word)), length)
    result = []
    for i in x:
        result.append(new_word[i])
    return "".join(result)
print(word_mixer(w3,w4,4))
print(word_mixer(w2,w4,5))
Output:
lfwi
falwc
A slightly shorter version of @AkshayNevrekar's post:
import random
w1 = 'dog'
w2 = 'cat'
w3 = 'fish'
w4 = 'wolf'
def word_mixer(w1, w2, length):
    return ''.join(random.sample(w1 + w2, length))
print(word_mixer(w3, w4, 4))
print(word_mixer(w2, w4, 5))
We can also use random.sample and pass the combined string to it like this:
import random
w1 = input("Enter first word")
w2 = input("Enter second word")
length = int(input("Enter length"))  # named length so the built-in len is not shadowed
mixed = w1 + w2
def wordmixer(mixed, length):
    return ''.join(random.sample(mixed, length))
print(wordmixer(mixed, length))

Create alphabetically ascending list

I want to create alphabetically ascending names like the column names in Excel. That is, I want something like a,b,c,...,z,aa,ab,...az,...zz,aaa,aab,....
I have tried:
import string
for i in range(1000):
    mod = int(i%26)
    div = int(i/26)
    print(string.ascii_lowercase[div]+string.ascii_lowercase[mod])
Which works until zz but then fails with an IndexError because div runs past the last index:
aa
ab
ac
ad
ae
af
ag
ah
ai
aj
ak
al
.
.
.
zz
IndexError
You could make use of itertools.product():
from itertools import product
from string import ascii_lowercase
for i in range(1, 4):
    for x in product(ascii_lowercase, repeat=i):
        print(''.join(x))
First, you want all letters, then all pairs, then all triplets, etc. This is why we first need to iterate through all the string lengths you want (for i in range(...)).
Then, we need all possible combinations of i letters, so we can use product(ascii_lowercase, repeat=i), which is equivalent to a for loop nested i times.
This will generate the tuples of size i required, finally just join() them to obtain a string.
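To make the "nested for loop" equivalence concrete, here is a small sketch on a two-letter alphabet, chosen only to keep the output short:

```python
from itertools import product

# product with repeat=2 ...
pairs = [''.join(t) for t in product('ab', repeat=2)]
# ... yields the same strings, in the same order, as two nested loops.
nested = [x + y for x in 'ab' for y in 'ab']

print(pairs)            # ['aa', 'ab', 'ba', 'bb']
print(pairs == nested)  # True
```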
To continuously generate names without limit, replace the for loop with while:
def generate():
    i = 0
    while True:
        i += 1
        for x in product(ascii_lowercase, repeat=i):
            yield ''.join(x)
generator = generate()
next(generator) # 'a'
next(generator) # 'b'
...
For a general solution we can use a generator and islice from itertools:
import string
from itertools import islice
def generate():
    base = ['']
    while True:
        next_base = []
        for b in base:
            for i in range(26):
                next_base.append(b + string.ascii_lowercase[i])
                yield next_base[-1]
        base = next_base
print('\n'.join(islice(generate(), 1000)))
And the output:
a
b
c
...
z
aa
ab
...
zz
aaa
aab
...
And you can use islice to take as many strings as you need.
Try:
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> len(string.ascii_lowercase)
26
When div in the line below reaches 26, the lookup raises an IndexError, because ascii_lowercase has only 26 characters (valid indexes 0 to 25):
div = int(i/26)
But you can limit the range:
for i in range(26*26): # 26 is len(string.ascii_lowercase)
    mod = int(i%26)
    div = int(i/26)
    print(string.ascii_lowercase[div]+string.ascii_lowercase[mod])
EDIT:
or you can use:
import string
n = 4 # number of chars
small_limit = len(string.ascii_lowercase)
limit = small_limit ** n
i = 0
while i < limit:
    s = ''
    for c in range(n):
        index = int(i/(small_limit**c))%small_limit
        s += string.ascii_lowercase[index]
    print(s)
    i += 1
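The div/mod idea from the question can also be carried past zz by treating the names as a "bijective base 26" numbering, where there is no zero digit. The sketch below is one way to do that; the function name column_name and the zero-based indexing are assumptions made for this example.

```python
from string import ascii_lowercase

def column_name(n):
    """Map 0 -> 'a', 25 -> 'z', 26 -> 'aa', 701 -> 'zz', 702 -> 'aaa', ..."""
    name = ''
    n += 1  # switch to 1-based numbering
    while n > 0:
        # Subtracting 1 before dividing is what makes 'z' roll over to 'aa'.
        n, rem = divmod(n - 1, 26)
        name = ascii_lowercase[rem] + name
    return name

print([column_name(i) for i in (0, 25, 26, 701, 702)])
# ['a', 'z', 'aa', 'zz', 'aaa']
```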
You can use:
from string import ascii_lowercase
l = list(ascii_lowercase) + [letter1+letter2 for letter1 in ascii_lowercase for letter2 in ascii_lowercase]+ [letter1+letter2+letter3 for letter1 in ascii_lowercase for letter2 in ascii_lowercase for letter3 in ascii_lowercase]
There's an answer to this question provided on Code Review SE
A slight modification to the answer in the link gives the following which works for an arbitrary number of iterations.
def increment_char(c):
    return chr(ord(c) + 1) if c != 'z' else 'a'
def increment_str(s):
    lpart = s.rstrip('z')
    num_replacements = len(s) - len(lpart)
    new_s = lpart[:-1] + increment_char(lpart[-1]) if lpart else 'a'
    new_s += 'a' * num_replacements
    return new_s
s = ''
for _ in range(1000):
    s = increment_str(s)
    print(s)

How can I get the next string, in alphanumeric ordering, in Python?

I need a simple program that given a string, returns to me the next one in the alphanumeric ordering (or just the alphabetic ordering).
f("aaa")="aab"
f("aaZ")="aba"
And so on.
Is there a function for this in one of the modules already?
I don't think there's a built-in function to do this. The following should work:
def next_string(s):
    strip_zs = s.rstrip('z')
    if strip_zs:
        return strip_zs[:-1] + chr(ord(strip_zs[-1]) + 1) + 'a' * (len(s) - len(strip_zs))
    else:
        return 'a' * (len(s) + 1)
Explanation: you find the last character which is not a z, increment it, and replace all of the characters after it with a's. If the entire string is z's, then return a string of all a's that is one longer.
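The same rule can be generalised to other orderings by passing the alphabet in explicitly. This is a sketch under that assumption; the name next_in_alphabet and the alphabet parameter are hypothetical, and characters outside the alphabet raise a ValueError.

```python
import string

def next_in_alphabet(s, alphabet=string.ascii_lowercase):
    """Next string of the same style, counting in the given alphabet."""
    first, last = alphabet[0], alphabet[-1]
    head = s.rstrip(last)  # drop trailing "maximal" characters
    if not head:
        # Every character was maximal: grow by one position.
        return first * (len(s) + 1)
    bumped = alphabet[alphabet.index(head[-1]) + 1]
    return head[:-1] + bumped + first * (len(s) - len(head))

print(next_in_alphabet('aaa'))  # aab
print(next_in_alphabet('aaz'))  # aba
print(next_in_alphabet('zz'))   # aaa

# Digits sorted before letters gives one possible alphanumeric ordering.
alnum = string.digits + string.ascii_lowercase
print(next_in_alphabet('az', alnum))  # b0
```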
Are the answers at "How would you translate this from Perl to Python?" sufficient? Not 100% what you're asking, but close...
A different, longer, but perhaps more readable and flexible solution:
def toval(s):
    """Converts an 'azz' string into a number"""
    v = 0
    for c in s.lower():
        v = v * 26 + ord(c) - ord('a')
    return v
def tostr(v, minlen=0):
    """Converts a number into 'azz' string"""
    s = ''
    while v or len(s) < minlen:
        s = chr(ord('a') + v % 26) + s
        v //= 26  # integer division (plain / would produce a float on Python 3)
    return s
def next(s, minlen=0):
    return tostr(toval(s) + 1, minlen)
s = ""
for i in range(100):
    s = next(s, 5)
    print(s)
You convert the string into a number where each letter represents a digit in base 26, increase the number by one and convert the number back into the string. This way you can do arbitrary math on values represented as strings of letters.
The minlen parameter controls how many digits the result will have (since 0 == a == aaaaa).
Sucks that Python doesn't have what Ruby has: String#next. So here's a shitty solution to deal with alphanumeric strings:
def next_string(s):
    a1 = range(65, 91) # capital letters
    a2 = range(97, 123) # letters
    a3 = range(48, 58) # numbers
    char = ord(s[-1])
    for a in [a1, a2, a3]:
        if char in a:
            if char + 1 in a:
                return s[:-1] + chr(char + 1)
            else:
                # Carry over: bump the rest of the string, or grow it by one character
                ns = next_string(s[:-1]) if s[:-1] else chr(a[0])
                return ns + chr(a[0])
print(next_string('abc')) # abd
print(next_string('123')) # 124
print(next_string('ABC')) # ABD
# all together now
print(next_string('a0')) # a1
print(next_string('1a')) # 1b
print(next_string('9A')) # 9B
# with carry-over
print(next_string('9')) # 00
print(next_string('z')) # aa
print(next_string('Z')) # AA
# cascading carry-over
print(next_string('a9')) # b0
print(next_string('0z')) # 1a
print(next_string('Z9')) # AA0
print(next_string('199')) # 200
print(next_string('azz')) # baa
print(next_string('Zz9')) # AAa0
print(next_string('$a')) # $b
print(next_string('$_')) # None... fix it yourself
Not great. Kinda works for me.
