I have a string like this:
"foo 15 bar -2hello 4 asdf+2"
I'd like to get:
"foo 14 bar -3hello 3 asdf+1"
I would like to replace every number (a sequence of digits read as a signed base-10 integer) with the result of a subtraction, applied to each number in turn.
I've written a ~50 LOC function that iterates over the characters, separates signs, digits and other text, applies the function and recombines the parts. It has one known issue, but I'm not asking for a review of it. What I'm asking is: what is the Pythonic way to solve this, and is there an easier way?
For reference, my function with the known issue is included below.
edit to answer the wise comment of Janne Karila:
preferred: retain sign if given: +2 should become +1
preferred: zero has no sign: +1 should become 0
preferred: no spaces: asdf - 4 becomes asdf - 3
required: only one sign: -+-2 becomes -+-3
Edit: by popular demand, here is my buggy code :)
DISCLAIMER: Please note I'm not interested in fixing this code. I'm asking if there is a better approach than something like mine.
def apply_to_digits(some_str,handler):
    sign = "+"
    started = 0
    number = []
    tmp = []
    result = []
    for idx,char in enumerate(some_str):
        if started:
            if not char.isdigit():
                if number:
                    ss = sign + "".join(number)
                    rewritten = str(handler(int(ss)))
                    result.append(rewritten)
                elif tmp:
                    result.append("".join(tmp))
                number = []
                tmp = []
                sign = "+"
                started = 0
                # char will be dealt later
            else:
                number.append(char)
                continue
        if char in "-+":
            sign = char
            started = 1
            if tmp:
                result.append("".join(tmp))
                tmp = []
            tmp.append(char)
            continue
        elif char.isdigit():
            started = 1
            if tmp:
                result.append("".join(tmp))
                tmp = []
            number.append(char)
        else:
            tmp.append(char)
    if number:
        ss = sign + "".join(number)
        rewritten = str(handler(int(ss)))
        result.append(rewritten)
    if tmp:
        result.append("".join(tmp)), tmp
    return "".join(result)
You could try using a regex with re.sub:
>>> import re
>>> pattern = r"(-?\d+)|(\+1(?!\d))"
>>> def sub_one(match):
...     return str(int(match.group(0)) - 1)
...
>>> text = "foo 15 bar -2hello 4 asdf+2"
>>> re.sub(pattern, sub_one, text)
'foo 14 bar -3hello 3 asdf+1'
The regex (-?\d+)|(\+1(?!\d)) captures either an optional - sign followed by one or more digits, or the literal sequence +1 when it is not followed by another digit. That way, all of your requirements for converting the numbers are met.
The regex (-?\d+) by itself does the right thing most of the time; the (\+1(?!\d)) part exists to make sure that the string +1 always converts to zero, without a sign. The (?!\d) lookahead keeps it from matching the start of a longer number such as +15, which would otherwise come out as 04. If you change your mind and want +1 to convert to +0, you can use only the first part of the regex: (-?\d+).
You could probably compress this all into a one-liner if you wanted:
def replace_digits(text):
    return re.sub(r"(-?\d+)|(\+1(?!\d))", lambda m: str(int(m.group(0)) - 1), text)
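A variant of the same re.sub idea, offered only as a rough sketch: instead of special-casing +1 in the pattern, the callback can look at the matched text and decide whether to keep an explicit + sign. The names SIGNED_INT and shift_numbers are made up for this illustration and do not come from the question or the answer.

```python
import re

# An optional sign directly followed by digits (hypothetical helper name).
SIGNED_INT = re.compile(r"[+-]?\d+")

def shift_numbers(text, delta=-1):
    """Add delta to every signed integer found in text."""
    def repl(match):
        token = match.group(0)
        value = int(token) + delta
        # Keep an explicit '+' only while the result is still positive,
        # so '+2' becomes '+1' but '+1' becomes a bare '0'.
        if token.startswith("+") and value > 0:
            return "+" + str(value)
        return str(value)
    return SIGNED_INT.sub(repl, text)

print(shift_numbers("foo 15 bar -2hello 4 asdf+2"))
```

Because the sign handling lives in the callback, the same function can be reused with a different delta without touching the pattern.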
I need to write a program which, given the input 3[a]2[b], prints "aaabb", or given 3[ab]2[c], prints "abababcc" (basically, it prints each bracketed string the given number of times, in the given order). I tried using a for loop to iterate over the input and detect the "[" characters so it knows what to print repeatedly, but I don't know how to make it also understand where that string ends.
This is as far as I got, which probably isn't too useful:
string=input()
string=string[::-1]
bulundu=6
lst=[]
for i in string:
    if i!="]":
        if i!="[":
            lst.append(i)
    if i=="[":
        break
The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
    # If item is a number, print that many of the next item
    if item.isdigit():
        result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb
A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall(r'(\d+)\[([a-zA-Z]+)\]', data)
# [('3', 'a'), ('2', 'b')]
for x in matches:
    print(x[1] * int(x[0]), end='')
#aaabb
Lengthy and documented version using no regex, only simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombine them
I opted to document with inline comments.
This could be done like so:
# testcases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
              ("3[ab]2[c]","abababcc"),
              ("5[12]6[c]","1212121212cccccc"),
              ("22[a]","a"*22)]
# now we use our algo for all those testcases
for inp,res in testcases:
    split_inp = []     # list that takes the split values of the input
    num = 0            # accumulator variable for more-than-1-digit numbers
    in_text = False    # bool that tells us if we are currently collecting letters
    # go over all letters : O(n)
    for c in inp:
        # when a [ is reached our num is complete and we need to store it
        # we collect all further letters until next ] in a list that we
        # add at the end of your split_inp
        if c == "[":
            split_inp.append(num)    # add the completed number
            num = 0                  # and reset it to 0
            in_text = True           # now in text
            split_inp.append([])     # add a list to collect letters
        # done collecting letters
        elif c == "]":
            in_text = False          # no longer collecting, convert letters
            split_inp[-1] = ''.join(split_inp[-1])   # to text
        # between [ and ] ... simply add letter to list at end
        elif in_text:
            split_inp[-1].append(c)  # add letter
        # currently collecting numbers
        else:
            num *= 10                # increase current number by factor 10
            num += int(c)            # add newest number
    print(repr(inp), split_inp, sep="\n")   # debugging output for parsing part
    # now we need to build the string from our parsed data
    amount = 0
    result = []    # intermediate list to join ['aaa','bb']
    # iterate the list, if int remember it, if text, build composite
    for part in split_inp:
        if isinstance(part, int):
            amount = part
        else:
            result.append(part*amount)
    # join the parts
    result = ''.join(result)
    # check if all worked out
    if result == res:
        print("CORRECT: ", result + "\n")
    else:
        print (f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases like '5[12]', which some of the other solutions won't.
You can capture both the number of repetitions n and the string to repeat v in one go using the pattern (\d+)\[(.*?)\]. It matches any sequence of digits, which is the first group we need to capture and the reason \d+ is between parentheses (...), followed by a [, followed by anything, which is the second group of interest and hence also between parentheses (...), followed by a ].
findall will find all these matches in the passed line; the first group of each match, the number, is then cast to an int and used as a multiplier for the string. The list of int(n) * v is then joined with an empty string. Malformed input may raise exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile(r"(\d+)\[(.*?)\]")
def func(x): return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
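Since malformed input can slip through silently with findall, here is a hedged sketch of one way to make that explicit: walk the matches with finditer and check that they cover the input without gaps. The names TOKEN and expand are invented for this example, and raising ValueError is just one possible design choice, not something the answers above prescribe.

```python
import re

# A count followed by a bracketed body that contains no further brackets.
TOKEN = re.compile(r"(\d+)\[([^\[\]]*)\]")

def expand(encoded):
    """Expand '3[ab]2[c]' style input, rejecting anything left unmatched."""
    pos = 0
    parts = []
    for match in TOKEN.finditer(encoded):
        if match.start() != pos:
            # Something between two tokens did not fit the pattern.
            raise ValueError("unexpected text at index %d" % pos)
        count, body = match.groups()
        parts.append(body * int(count))
        pos = match.end()
    if pos != len(encoded):
        raise ValueError("unexpected text at index %d" % pos)
    return "".join(parts)

print(expand("3[ab]2[c]"))
```

Nested input such as 2[a3[b]] is deliberately rejected here; supporting it would need a stack or recursion, which goes beyond what the question asks.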
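I am not much more than a beginner, and looking at the other answers I thought regex might be a challenge for a new contributor such as yourself, since I haven't really dealt with regex myself.
The beginner-friendly way to do this might be to loop through the input string and use string methods like isnumeric() and isalpha(). Note that the version below pairs each digit with one letter, so it only handles single-digit counts and single-letter groups such as 3[a]; an input like 3[ab]2[c] or 12[a] needs one of the other approaches.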
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
    if char.isnumeric():
        nums.append(char)
    if char.isalpha():
        chars.append(char)
for i, char in enumerate(chars):
    substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string methods and are fairly self-explanatory. They are used as <your_string_here>.isalpha(), which returns True if and only if the string is non-empty and every character in it is alphabetic (whitespace, digits and symbols make it return False).
The same goes for isnumeric() with numeric characters.
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
If you need any clarification on a function used, please do not hesitate to leave a comment.
Coming from the C/C++ world and being a Python newb, I wrote this simple string function that takes an input string (guaranteed to be ASCII) and returns the last four characters. If there are fewer than four characters, I want to fill the leading positions with the letter 'A'. (This was not an exercise, but a valuable part of another complex function.)
There are dozens of methods of doing this, from brute force, to simple, to elegant. My approach below, while functional, didn’t seem "Pythonic".
NOTE: I’m presently using Python 2.6 — and performance is NOT an issue. The input strings are short (2-8 characters), and I call this function only a few thousand times.
def copyFourTrailingChars(src_str):
    four_char_array = bytearray("AAAA")
    xfrPos = 4
    for x in src_str[::-1]:
        xfrPos -= 1
        four_char_array[xfrPos] = x
        if xfrPos == 0:
            break
    return str(four_char_array)
input_str = "7654321"
print("The output of {0} is {1}".format(input_str, copyFourTrailingChars(input_str)))
input_str = "21"
print("The output of {0} is {1}".format(input_str, copyFourTrailingChars(input_str)))
The output is:
The output of 7654321 is 4321
The output of 21 is AA21
Suggestions from Pythoneers?
I would use simple slicing and then str.rjust() to right-justify the result, using A as the fillchar. Example -
def copy_four(s):
    return s[-4:].rjust(4,'A')
Demo -
>>> copy_four('21')
'AA21'
>>> copy_four('1233423')
'3423'
You can simply prepend four sentinel 'A' characters to the original string, then take the last four characters:
def copy_four(s):
    return ('AAAA'+s)[-4:]
That's simple enough!
How about something with string formatting?
def copy_four(s):
    return '{}{}{}{}'.format(*('A'*(4-len(s[-4:])) + s[-4:]))
Result:
>>> copy_four('abcde')
'bcde'
>>> copy_four('abc')
'Aabc'
Here's a nicer, more canonical option (the unnumbered {} field needs Python 2.7 or later; on 2.6 write '{0:A>4}' instead):
def copy_four(s):
    return '{:A>4}'.format(s[-4:])
Result:
>>> copy_four('abcde')
'bcde'
>>> copy_four('abc')
'Aabc'
You could use slicing to get the last 4 characters, then string repetition (* operator) and concatenation (+ operator) as below:
def trailing_four(s):
    s = s[-4:]
    s = 'A' * (4 - len(s)) + s
    return s
You can try this
def copy_four_trailing_chars(input_string):
    list_a = ['A','A','A','A']
    # take the last four characters (the slice is [-4:], not [:-4])
    str1 = input_string[-4:]
    if len(str1) < 4:
        str1 = "%s%s" % (''.join(list_a[:4-len(str1)]), str1)
    return str1
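As a rough cross-check of the suggestions above, the sketch below generalises three of them to an arbitrary width and fill character and confirms they agree on a few inputs. The function names and the width and fill parameters are assumptions made for this illustration; it targets Python 2.7+ or Python 3 and assumes width is at least 1.

```python
def pad_rjust(s, width=4, fill="A"):
    # Slice off the tail, then right-justify it with the fill character.
    return s[-width:].rjust(width, fill)

def pad_prefix(s, width=4, fill="A"):
    # Prepend enough fill characters, then keep only the tail.
    return (fill * width + s)[-width:]

def pad_format(s, width=4, fill="A"):
    # Same idea through the format mini-language: fill, '>' for right-align, width.
    return "{:{fill}>{width}}".format(s[-width:], fill=fill, width=width)

for sample in ("", "2", "21", "4321", "7654321"):
    results = {f(sample) for f in (pad_rjust, pad_prefix, pad_format)}
    print(repr(sample), results)
```

All three produce the same string for each sample, so the choice between them is mostly a matter of readability.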
I'm looking for a way to take a string that looks like the following:
(a,1),(b,1),(a,1),(b,5),(a,1),(b,2),(a,1),(b,1),(a,2),(b,6),(a,2)
And replace the first "a" with an even number, the second with the next even number up, and so on for however long the string is. Then I'd like to take the first "b" and assign it an odd number, then the next "b" gets the next highest odd number, and so on for however long the string is. I'm working primarily in Python 2.7, but would be willing to look into other languages if a solution exists in one of them.
The following regular expression substitution should work:
import re
def odd_even(x):
    global a,b
    if x.group(1) == 'a':
        a += 2
        return str(a)
    else:
        b += 2
        return str(b)
a = 0
b = -1
source = "(a,1),(b,1),(a,1),(b,5),(a,1),(b,2),(a,1),(b,1),(a,2),(b,6),(a,2)"
print(re.sub("([ab])", odd_even, source))
This prints:
(2,1),(1,1),(4,1),(3,5),(6,1),(5,2),(8,1),(7,1),(10,2),(9,6),(12,2)
Alternatively, replace one occurrence at a time with str.replace (string holds the input):
even = 2
while "a" in string:
    string = string.replace("a", str(even), 1)
    even += 2
odd = 1
while "b" in string:
    string = string.replace("b", str(odd), 1)
    odd += 2
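For comparison, a sketch of the same substitution without global variables: each letter gets its own itertools.count iterator, and the re.sub callback pulls the next value from it. The name number_labels is hypothetical and exists only for this example.

```python
import re
from itertools import count

def number_labels(source):
    """Replace each 'a' with 2, 4, 6, ... and each 'b' with 1, 3, 5, ..."""
    # One independent counter per letter: count(start, step).
    counters = {"a": count(2, 2), "b": count(1, 2)}
    return re.sub(r"[ab]", lambda m: str(next(counters[m.group(0)])), source)

source = "(a,1),(b,1),(a,1),(b,5),(a,1),(b,2),(a,1),(b,1),(a,2),(b,6),(a,2)"
print(number_labels(source))
```

Because the counters are created inside the function, every call starts numbering from the beginning again.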
Hello I am fairly new at programming,
I would like to know is there a function or a method that allows us to find out how many letters have been changed in a string..
example:
input:
"Cold"
output:
"Hold"
Hence only 1 letter was changed
or the example:
input:
"Deer"
output:
"Dial"
Hence 3 letters were changed
I spoke too soon. First result googling:
https://pypi.python.org/pypi/python-Levenshtein/
This should be able to measure the minimum number of changes needed to get from one string to another.
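To give a sense of what such a library computes, here is a plain-Python sketch of the textbook dynamic-programming edit distance (insertions, deletions and substitutions each cost 1). It is an illustration only, not the linked package's implementation, and the function name levenshtein is chosen just for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

print(levenshtein("Cold", "Hold"))
print(levenshtein("Deer", "Dial"))
```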
If you don't need to consider character insertions or deletions, the problem is reduced to simply counting the number of characters that are different between the strings.
Since you're new to programming, an imperative-style program would be:
def differences(string1,string2):
    i=0
    different=0
    for i in range(len(string1)):
        if string1[i]!=string2[i]:
            different= different+1
    return different
Something slightly more Pythonic would be:
def differences(string1,string2):
    different=0
    for a,b in zip(string1,string2):
        if a!=b:
            different+= 1
    return different
or, if you want to go fully functional:
def differences(string1,string2):
    # lambda (x,y): ... is Python 2 only, so index into the pair instead
    return sum(map(lambda pair: pair[0] != pair[1], zip(string1,string2)))
which, as @DSM suggested, is equivalent to the more readable generator expression:
def differences(string1,string2):
    return sum(x != y for x,y in zip(string1, string2))
Use the itertools library as follows (Python 3.x)
from itertools import zip_longest
def change_count(string1, string2):
    count = 0
    for char1, char2 in zip_longest(string1, string2):
        if char1 != char2:
            count = count + 1
    return count
string1 = input("Enter one string: ")
string2 = input("Enter another string: ")
changed = change_count(string1, string2)
print("Times changed: ", changed)
Check out the difflib library, particularly the ndiff function. Note: this is kind of overkill for the required job, but it is really great for seeing the differences between two files (you can see which lines are new, which are changed, and so on).
import difflib
word1 = "Cold"
word2 = "Waldo"
i = 0
differences = difflib.ndiff(word1, word2)
for line in differences:
    if line[0] != " ":
        i += 1
print(i)
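Another way to use difflib for this, sketched here under the assumption that a rough count is enough: SequenceMatcher.get_opcodes() reports which slices were replaced, deleted or inserted, and the lengths of the non-equal slices can be summed. The helper name changed_chars is made up for this example, and the result is not guaranteed to be the minimal edit distance.

```python
from difflib import SequenceMatcher

def changed_chars(before, after):
    """Count characters covered by non-equal opcodes between two strings."""
    total = 0
    matcher = SequenceMatcher(None, before, after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, deleted or inserted slice counts by its longer side.
            total += max(i2 - i1, j2 - j1)
    return total

print(changed_chars("Cold", "Hold"))
print(changed_chars("Deer", "Dial"))
```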
I have the following problem: I would like to write a function in Python which, given a string, returns a string where every group of two characters is swapped.
For example given "ABCDEF" it returns "BADCFE".
The length of the string would be guaranteed to be an even number.
How can I do this in Python?
To add another option:
>>> s = 'abcdefghijkl'
>>> ''.join([c[1] + c[0] for c in zip(s[::2], s[1::2])])
'badcfehgjilk'
import re
print(re.sub(r'(.)(.)', r'\2\1', "ABCDEF"))
from itertools import chain, izip_longest  # Python 2 name; on Python 3 it is zip_longest
''.join(chain.from_iterable(izip_longest(s[1::2], s[::2], fillvalue = '')))
You can also use islices instead of regular slices if you have very large strings or just want to avoid the copying.
Works for odd length strings even though that's not a requirement of the question.
While the above solutions work, there is a simpler one in layman's terms. Someone still learning Python and strings can use the other answers, but may not understand how they work or what each part of the code does without a fuller explanation from the poster than "this works". The following swaps every pair of characters in a string and should be easy for beginners to follow.
It iterates through the string in steps of two (starting from index 0) and builds a new string (swapped_pair) by appending the character at index i + 1 and then the character at index i; so index 1 goes to position 0, index 0 goes to position 1, and so on through the string.
The pair swap itself only works on complete pairs, so the code also handles strings of odd length.
string = "abcdefghijklmnopqrstuvwxyz123"
# use this prior to the iteration below if the string needs to be even but is possibly odd
# (it drops the unpaired last character; leave it out to keep that character)
if len(string) % 2 != 0:
    string = string[:-1]
# iteration to swap every pair of characters; stopping at len(string) - 1
# means an unpaired last character is never indexed past the end
swapped_pair = ""
for i in range(0, len(string) - 1, 2):
    swapped_pair += string[i + 1] + string[i]
# use this after the iteration for strings of any length, even or odd:
# it appends the unpaired last character, if there still is one
if len(string) % 2 != 0:
    swapped_pair += string[-1]
print(swapped_pair)
badcfehgjilknmporqtsvuxwzy21 # output if the "needs to be even" code used
badcfehgjilknmporqtsvuxwzy213 # output if the "even or odd" code used
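The same index-based idea can also be sketched with a list and tuple swapping, which avoids building the result by repeated string concatenation. The function name swap_pairs is an assumption for this example; an unpaired last character is simply left in place.

```python
def swap_pairs(s):
    """Swap each pair of adjacent characters; a trailing odd character stays put."""
    chars = list(s)
    # Step through the start index of every complete pair.
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(swap_pairs("ABCDEF"))
print(swap_pairs("abcde"))
```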
Here's a nifty solution:
def swapem (s):
    if len(s) < 2: return s
    return "%s%s%s"%(s[1], s[0], swapem (s[2:]))
for text in ("", "a", "ab", "abcdefgh", "abcdefghi"):
    print("[%s] -> [%s]"%(text, swapem (text)))
though possibly not suitable for large strings :-)
Output is:
[] -> []
[a] -> [a]
[ab] -> [ba]
[abcdefgh] -> [badcfehg]
[abcdefghi] -> [badcfehgi]
If you prefer one-liners:
''.join(reduce(lambda x,y: x+y,[[s[1+(x<<1)],s[x<<1]] for x in range(0,len(s)>>1)]))
Here's another simple solution:
"".join([(s[i:i+2])[::-1]for i in range(0,len(s),2)])