I'm trying to build some kind of "dynamic substring" that is build out of a loop in a given string. The rule is that I need to find the longest substring in alphabetic order and in case I have a tide, I need to evaluate both and print the one with the bigger value.
I read that in python characters are already given a numeric value, so a is lower than b; knowing this I wrote the following:
s = "abcsaabcpaosdjaf"
ans = []
# Loop over the string
for i in range(len(s)-1):
if s[i] < s[i+1]:
#evaluate if it is in order and build the new string
ans = s[i]+s[i+1]
#print the result
print(ans)
the problem I have is that I don't know how to dynamically - I am not sure if this is the right way to say it - build the substring ans, right now I have the s[i]+s[i+1] but that only gives me a list of two characters that are in fact in alphabetic order, and it is fixed to only two. How can I do it in a way that it builds it as it goes?
Try this. Comments hopefully explain enough, but ask away if you don't understand.
s= "abcsaabcpaosdjaf"
best_answer = ''
current_answer = s[0]
# Loop over the string
for i in s[1:]:
# look to see if this letter is after the
# last letter in the current answer.
if ord(i) > ord(current_answer[-1]):
# if it is, add the letter to the current
# answer
current_answer += i
else:
# if it is not, we check if the current
# answer is longer than the best
# answer, and update it to the best
# answer if it is.
if len(current_answer) > len(best_answer):
best_answer = current_answer
# We then set the current answer
# to just the last letter read.
current_answer = i
import itertools
s= "abcsaabcpaosdjaf"
result = max(
(
list(next(sub)) + [b for a, b in sub]
for ascending, sub in itertools.groupby(zip(s,s[1:]), lambda x: x[0] <= x[1])
if ascending
),
key=len
)
print (''.join(result))
Credits to this
Related
so i need to code a program which, for example if given the input 3[a]2[b], prints "aaabb" or when given 3[ab]2[c],prints "abababcc"(basicly prints that amount of that letter in the given order). i tried to use a for loop to iterate the first given input and then detect "[" letters in it so it'll know that to repeatedly print but i don't know how i can make it also understand where that string ends
also this is where i could get it to,which probably isnt too useful:
string=input()
string=string[::-1]
bulundu=6
for i in string:
if i!="]":
if i!="[":
lst.append(i)
if i=="[":
break
The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
# If item is a number, print that many of the next item
if item.isdigit():
result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb
A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall('(\d+)\[([a-zA-Z]+)\]',data)
# [(3, 'a'), (2, 'b')]
for x in matches:
print(x[1] * int(x[0]), end='')
#aaabb
Lenghty and documented version using NO regex but simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombinate them again
I opted to document with inline comments
This could be done like so:
# testcases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
("3[ab]2[c]","abababcc"),
("5[12]6[c]","1212121212cccccc"),
("22[a]","a"*22)]
# now we use our algo for all those testcases
for inp,res in testcases:
split_inp = [] # list that takes the splitted values of the input
num = 0 # accumulator variable for more-then-1-digit numbers
in_text = False # bool that tells us if we are currently collecting letters
# go over all letters : O(n)
for c in inp:
# when a [ is reached our num is complete and we need to store it
# we collect all further letters until next ] in a list that we
# add at the end of your split_inp
if c == "[":
split_inp.append(num) # add the completed number
num = 0 # and reset it to 0
in_text = True # now in text
split_inp.append([]) # add a list to collect letters
# done collecting letters
elif c == "]":
in_text = False # no longer collecting, convert letters
split_inp[-1] = ''.join(split_inp[-1]) # to text
# between [ and ] ... simply add letter to list at end
elif in_text:
split_inp[-1].append(c) # add letter
# currently collecting numbers
else:
num *= 10 # increase current number by factor 10
num += int(c) # add newest number
print(repr(inp), split_inp, sep="\n") # debugging output for parsing part
# now we need to build the string from our parsed data
amount = 0
result = [] # intermediate list to join ['aaa','bb']
# iterate the list, if int remember it, it text, build composite
for part in split_inp:
if isinstance(part, int):
amount = part
else:
result.append(part*amount)
# join the parts
result = ''.join(result)
# check if all worked out
if result == res:
print("CORRECT: ", result + "\n")
else:
print (f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases of '5[12]' wich some of the other solutions wont.
You can capture both the number of repetitions n and the pattern to repeat v in one go using the described pattern. This essentially matches any sequence of digits - which is the first group we need to capture, reason why \d+ is between brackets (..) - followed by a [, followed by anything - this anything is the second pattern of interest, hence it is between backets (...) - which is then followed by a ].
findall will find all these matches in the passed line, then the first match - the number - will be cast to an int and used as a multiplier for the string pattern. The list of int(n) * v is then joined with an empty space. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile("(\d+)\[(.*?)\]")
def func(x): return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
I am not much more than a beginner and looking at the other answers, I thought understanding regex might be a challenge for a new contributor such as yourself since I myself haven't really dealt with regex.
The beginner friendly way to do this might be to loop through the input string and use string functions like isnumeric() and isalpha()
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
if char.isnumeric():
nums.append(char)
if char.isalpha():
chars.append(char)
for i, char in enumerate(chars):
substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string functions, which are fairly self-explanatory. They are used as <your_string_here>.isalpha() which returns true if and only if the string is an alphabet (whitespace, numerics, and symbols return false
And, similarly for isnumeric()
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
IF YOU NEED ANY CLARIFICATION ON A FUNCTION USED, PLEASE DO NOT HESITATE TO LEAVE A COMMENT
I'm trying to compress a string in a way that any sequence of letters in strict alphabetical order is swapped with the first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT :
I've managed to get the function working, thanks to your suggestion of using the alphabet string as comparison; now I'm very much stuck on the inverse function: given "a6xyl4" expand it back into "abcdefxylmno".
After quite some time I managed to split the string every time there's a number and I made a function that expands a 2 char string, but it fails to work when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
L=[]
ind = abc.index(start)
newAbc = abc[ind:]
for i in range(len(newAbc)):
while i < n:
L.append(newAbc[i])
i+=1
res = ''.join(L)
return res
def unpack(S):
for i in range(len(S)-1):
if S[i] in abc and S[i+1] not in abc:
lett = str(S[i])
num = int(S[i+1])
return subString(lett,num)
def separate(S):
lst = []
for i in S:
lst.append(i)
for el in lst:
if el.isnumeric():
ind = lst.index(el)
lst.insert(ind+1,"-")
a = ''.join(lst)
L = a.split("-")
if S[-1].isnumeric():
L.remove(L[-1])
return L
else:
return L
def inverse(S):
L = separate(S)
for i in L:
return unpack(i)
Each of these functions work singularly, but inverse(S) doesn't output anything. What's the mistake?
You can use the ord() function which returns an integer representing the Unicode character. Sequential letters in alphabetical order differ by 1. Thus said you can implement a simple funtion:
def is_successor(a,b):
# check for marginal cases if we dont ensure
# input restriction somewhere else
if ord(a) not in range(ord('a'), ord('z')) and ord(a) not in range(ord('A'),ord('Z')):
return False
if ord(b) not in range(ord('a'), ord('z')) and ord(b) not in range(ord('A'),ord('Z')):
return False
# returns true if they are sequential
return ((ord(b) - ord(a)) == 1)
You can use chr(int) method for your reversing stage as it returns a string representing a character whose Unicode code point is an integer given as argument.
This builds on the idea that acceptable subsequences will be substrings of the ABC:
from string import ascii_lowercase as abc # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
if cache + char in abc:
cache += char
else:
stack.append(cache)
cache = char
# if present, append the last sequence
if cache:
stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)
As described here:
https://leetcode.com/problems/first-unique-character-in-a-string/description/
I attempted one here but couldn't quite finish:
https://paste.pound-python.org/show/JuPLgdgqceMQYh5kk0Sf/
#Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
#xamples:
#s = "leetcode"
#return 0.
#s = "loveleetcode",
#return 2.
#Note: You may assume the string contain only lowercase letters.
class Solution(object):
def firstUniqChar(self, s):
"""
:type s: str
:rtype: int
"""
for i in range(len(s)):
for j in range(i+1,len(s)):
if s[i] == s[j]:
break
#But now what. let's say i have complete loop of j where there's no match with i, how do I return i?
I'm ONLY interested in the brute force N^2 solution, nothing fancier. The idea in the above solution is to start a double loop, where inner loop searches for a match with the outer loop's char, and if there's match, break the inner loop and continue onto the next char on the outer loop.
But the question is, how do I handle when there's NO match, which is when I need to return the outer loop's index as the first unique one.
Can't quite figure out a graceful way to do it, and can handle edge case like a single char string.
Iterate over each char, and check if it appears in any of the following chars. We need to keep track of the characters we've already seen, to avoid falling into edge cases. Try this, it's an O(n^2) solution:
def firstUniqChar(s):
# store already seen chars
seen = []
for i, c in enumerate(s):
# return if char not previously seen and not in rest
if c not in seen and c not in s[i+1:]:
return i
# mark char as seen
seen.append(c)
# no unique chars were found
return -1
For completeness' sake, here's an O(n) solution:
def firstUniqChar(s):
# build frequency table
freq = {}
for i, c in enumerate(s):
if c not in freq:
# store [frequency, index]
freq[c] = [1, i]
else:
# update frequency
freq[c][0] += 1
# find leftmost char with frequency == 1
# it's more efficient to traverse the freq table
# instead of the (potentially big) input string
leftidx = float('+inf')
for f, i in freq.values():
if f == 1 and i < leftidx:
leftidx = i
# handle edge case: no unique chars were found
return leftidx if leftidx != float('+inf') else -1
For example:
firstUniqChar('cc')
=> -1
firstUniqChar('ccdd')
=> -1
firstUniqChar('leetcode')
=> 0
firstUniqChar('loveleetcode')
=> 2
Add an else to the for loop where you return.
for j ...:
...
else:
return i
I'd first like to note that your current algorithm for finding unique characters doesn't work correctly. That's because you can't assume the character at index i is unique just because none of the indexes j found the same character later in the string. The character at index i could be a repeat of an earlier character (which you'd have skipped when the previous j was equal to the current i).
You could fix the algorithm by letting j iterate over the whole range of indexes, and adding an extra check to ignore the matches when the indexes are the same to your if:
for i in range(len(s)):
for j in range(len(s)):
if i != j and s[i] == s[j]:
break
As Ignacio Vazquez-Abrams suggests in his answer, you can then add an else block to the inner for loop to make the code return when no match was found:
else: # this line should be indented to match the "for j" loop
return i
There are also a few ways you can solve this problem more simply if you use the builtin functions and types available in Python.
For instance, you can implement an O(n^2) solution equivalent to the one above using only one explicit loop, and using str.count to replace the inner one:
def firstUniqChar(s):
for i, c in enumerate(s):
if s.count(c) == 1:
return i
return None
I'm also using enumerate to get the character values and indexes together in one step, rather than iterating over a range and indexing later.
There's also a very easy way to make an O(n) solution using collections.Counter, which can do all the counting in one pass before you start checking the characters in order to try to find the first one that is unique:
from collections import Counter
def firstUniqChar(s):
count = Counter(s)
for i, c in enumerate(s):
if count[c] == 1:
return i
return None
I'm not sure your approach will work on an even palindrome, e.g. "redder" (note the second d). Try this instead:
s1 = "leetcode"
s2 = "loveleetcode"
s3 = "redder"
def unique_index(s):
ahead, behind = list(s), set()
for idx, char in enumerate(s):
ahead = ahead[1:]
if (char not in ahead) and (char not in behind):
return idx
behind.add(s[idx])
return -1
assert unique_index(s1) == 0
assert unique_index(s2) == 2
assert unique_index(s3) == -1
For each character, we look ahead and behind. Only characters disjoint from both groups will return an index. As iteration progresses, the list of what is observed ahead shortens, while what is seen behind extends. The default is -1 as stated in the actual leetcode challenge.
A second list is not required. #Óscar López's answer is the simplified answer.
I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
if s.isalpha(): # Defines if unpacked
stack = []
for i in s:
if s.count(i) > 1:
if (i + str(s.count(i))) not in stack:
stack.append(i + str(s.count(i)))
else:
stack.append(i)
print "".join(stack)
else:
print "Something's not quite right.."
return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into it's original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics what my code needs to do -
between the " if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
Create a for loop and inside of it an if sentence, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the number times the digit
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
if s.isalpha(): # Defines if unpacked
groups= []
last_char = None
for c in s:
if c == last_char:
groups[-1].append(c)
else:
groups.append([c])
last_char = c
return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
else: # Seems to be packed
stack = ""
for i in range(len(s)):
if s[i].isalpha():
if i+1 < len(s) and s[i+1].isdigit():
digit = s[i+1]
char = s[i]
i += 2
while i < len(s) and s[i].isdigit():
digit +=s[i]
i+=1
stack += char * int(digit)
else:
stack+= s[i]
else:
""
return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
groups = []
last_char = None
for c in s:
if c == last_char:
groups[-1].append(c)
else:
groups.append([c])
last_char = c
return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:
regex = re.compile(r'(.)\1+')
def replacer(match):
return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think You can use `itertools.grouby' function
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
of course one can make this code more readable using generator instead of generator statement
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice, izip
def consume(items):
from collections import deque
deque(items, maxlen=0)
def ilen(items):
result = count()
consume(izip(items, result))
return next(result)
def pack_or_unpack(data):
start = 0
result = []
while start < len(data):
if data[start].isdigit():
# `data` is packed, bail
return unpack(data)
run = run_len(data, start)
# append the character that might repeat
result.append(data[start])
if run > 1:
# append the length of the run of characters
result.append(str(run))
start += run
return ''.join(result)
def run_len(data, start):
"""Return the end index of the run of identical characters starting at
`start`"""
return start + ilen(takewhile(lambda c: c == data[start],
islice(data, start, None)))
def unpack(data):
result = []
for i in range(len(data)):
if data[i].isdigit():
# skip digits, we'll look for them below
continue
# packed character
c = data[i]
# number of repetitions
n = 1
if (i+1) < len(data) and data[i+1].isdigit():
# if the next character is a digit, grab all the digits in the
# substring starting at i+1
n = int(''.join(takewhile(str.isdigit, data[i+1:])))
# append the repeated character
result.append(c*n) # multiplying a string with a number repeats it
return ''.join(result)
print pack_or_unpack('aaabbc')
print pack_or_unpack('a3b2c')
print pack_or_unpack('a10')
print pack_or_unpack('b5c5')
print pack_or_unpack('abc')
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
matches = UNPACK_RE.finditer(data)
pairs = ((m.group('char'), m.group('count')) for m in matches)
return ''.join(char * (int(count) if count else 1)
for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)
I'm new to Python and am currently reading a chapter on String manipulation in "Dive into Python."
I was wondering what are some of the best (or most clever/creative) ways to do the following:
1) Extract from this string: "stackoverflow.com/questions/ask" the word 'questions.' I did string.split(/)[0]-- but that isn't very clever.
2) Find the longest palindrome in a given number or string
3) Starting with a given word (i.e. "cat")-- find all possible ways to get from that to another three- letter word ("dog"), changing one letter at a time such that each change in letters forms a new, valid word.
For example-- cat, cot, dot, dog
As personal exercise, here's to you, (hopefully) well commented code with some hints.
#!/usr/bin/env python2
# Let's take this string:
a = "palindnilddafa"
# I surround with a try/catch block, explanation following
try:
# In this loop I go from length of a minus 1 to 0.
# range can take 3 params: start, end, increment
# This way I start from the thow longest subsring,
# the one without the first char and without the last
# and go on this way
for i in range(len(a)-1, 0, -1):
# In this loop I want to know how many
# Palidnrome of i length I can do, that
# is len(a) - i, and I take all
# I start from the end to find the largest first
for j in range(len(a) - i):
# this is a little triky.
# string[start:end] is the slice operator
# as string are like arrays (but unmutable).
# So I take from j to j+i, all the offsets
# The result of "foo"[1:3] is "oo", to be clear.
# with string[::-1] you take all elements but in the
# reverse order
# The check string1 in string2 checks if string1 is a
# substring of string2
if a[j:j+i][::-1] in a:
# If it is I cannot break, 'couse I'll go on on the first
# cycle, so I rise an exception passing as argument the substring
# found
raise Exception(a[j:j+i][::-1])
# And then I catch the exception, carrying the message
# Which is the palindrome, and I print some info
except Exception as e:
# You can pass many things comma-separated to print (this is python2!)
print e, "is the longest palindrome of", a
# Or you can use printf formatting style
print "It's %d long and start from %d" % (len(str(e)), a.index(str(e)))
After the discussion, and I'm little sorry if it goes ot. I've written another implementation of palindrome-searcher, and if sberry2A can, I'd like to know the result of some benchmark tests!
Be aware, there are a lot of bugs (i guess) about pointers and the hard "+1 -1"-problem, but the idea is clear. Start from the middle and then expand until you can.
Here's the code:
#!/usr/bin/env python2
def check(s, i):
mid = s[i]
j = 1
try:
while s[i-j] == s[i+j]:
j += 1
except:
pass
return s[i-j+1:i+j]
def do_all(a):
pals = []
mlen = 0
for i in range(len(a)/2):
#print "check for", i
left = check(a, len(a)/2 + i)
mlen = max(mlen, len(left))
pals.append(left)
right = check(a, len(a)/2 - i)
mlen = max(mlen, len(right))
pals.append(right)
if mlen > max(2, i*2-1):
return left if len(left) > len(right) else right
string = "palindnilddafa"
print do_all(string)
No 3:
If your string is s:
max((j-i,s[i:j]) for i in range(len(s)-1) for j in range(i+2,len(s)+1) if s[i:j]==s[j-1:i-1:-1])[1]
will return the answer.
For #2 - How to find the longest palindrome in a given string?