Python: Clever ways at string manipulation - python

I'm new to Python and am currently reading a chapter on String manipulation in "Dive into Python."
I was wondering what are some of the best (or most clever/creative) ways to do the following:
1) Extract the word 'questions' from this string: "stackoverflow.com/questions/ask". I did string.split('/')[1], but that isn't very clever.
2) Find the longest palindrome in a given number or string.
3) Starting with a given word (e.g. "cat"), find all possible ways to get from that to another three-letter word ("dog"), changing one letter at a time such that each change forms a new, valid word.
For example-- cat, cot, dot, dog
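For tasks 1 and 3, here is a minimal sketch rather than a definitive answer. The function name word_ladder and the tiny word set are made up for illustration; a real solution would load a dictionary file. It uses a breadth-first search, so it returns one shortest chain, not every possible chain.

```python
from collections import deque
from string import ascii_lowercase

# Task 1: split on "/" and take the second piece.
url = "stackoverflow.com/questions/ask"
middle = url.split("/")[1]

def word_ladder(start, goal, words):
    """Return one shortest chain from start to goal, or None if there is none.

    `words` is a hypothetical set of valid words supplied by the caller.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        # Try every single-letter change of the current word.
        for pos in range(len(word)):
            for letter in ascii_lowercase:
                candidate = word[:pos] + letter + word[pos + 1:]
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None

# Illustrative word list only (an assumption, not a real dictionary).
words = {"cat", "cot", "cog", "dot", "dog"}
print(middle)
print(word_ladder("cat", "dog", words))
```

Collecting all chains would mean continuing the search after the first hit, which can grow quickly with a real dictionary.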

As a personal exercise, here is some (hopefully) well-commented code with a few hints.
#!/usr/bin/env python2
# Let's take this string:
a = "palindnilddafa"
# I wrap the loops in a try/except block; the explanation follows.
try:
    # The outer loop goes from len(a) down to 1.
    # range can take 3 params: start, end, step.
    # This way I try the longest substrings first
    # and then move on to shorter ones.
    for i in range(len(a), 0, -1):
        # There are len(a) - i + 1 substrings of length i,
        # and I check all of them.
        for j in range(len(a) - i + 1):
            # This is a little tricky.
            # string[start:end] is the slice operator,
            # as strings are like arrays (but immutable).
            # So I take the offsets from j up to, but not including, j+i.
            # The result of "foo"[1:3] is "oo", to be clear.
            # With string[::-1] you take all elements, but in
            # reverse order.
            # A substring is a palindrome if it equals its own reverse.
            sub = a[j:j+i]
            if sub == sub[::-1]:
                # A plain break would only leave the inner loop and the
                # outer loop would carry on, so I raise an exception,
                # passing the substring found as its argument.
                raise Exception(sub)
# Then I catch the exception, which carries the palindrome
# as its message, and print some info.
except Exception as e:
    # You can pass many comma-separated things to print (this is python2!)
    print e, "is the longest palindrome of", a
    # Or you can use printf-style formatting
    print "It's %d long and starts at %d" % (len(str(e)), a.index(str(e)))
Following the discussion (sorry if this drifts off topic), I've written another implementation of the palindrome searcher, and if sberry2A is able to, I'd like to know the result of some benchmark tests!
Be aware that there are probably a lot of bugs (I guess) around the indices and the hard "+1 -1" problem, but the idea is clear: start from the middle and expand for as long as you can.
Here's the code:
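#!/usr/bin/env python2
def check(s, i):
    # Expand around index i while the characters on both sides match.
    # The explicit bounds check matters: a negative index would silently
    # wrap around to the end of the string instead of raising IndexError.
    j = 1
    while i - j >= 0 and i + j < len(s) and s[i-j] == s[i+j]:
        j += 1
    return s[i-j+1:i+j]
def do_all(a):
    pals = []
    mlen = 0
    for i in range(len(a) // 2):
        #print "check for", i
        left = check(a, len(a) // 2 + i)
        mlen = max(mlen, len(left))
        pals.append(left)
        right = check(a, len(a) // 2 - i)
        mlen = max(mlen, len(right))
        pals.append(right)
        if mlen > max(2, i*2-1):
            return left if len(left) > len(right) else right
    # No early exit: fall back to the longest candidate collected so far.
    return max(pals, key=len) if pals else a
string = "palindnilddafa"
print do_all(string)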
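The "start from the middle and expand" idea can also be sketched without the early-exit heuristic. The version below is only an illustration (the names longest_palindrome and expand are mine, not from the answer above): it tries every index as a centre, and it also tries the gap between two characters so that even-length palindromes are found.

```python
def longest_palindrome(s):
    """Return the longest palindromic substring of s (the first one on ties)."""
    def expand(lo, hi):
        # Grow outwards while we stay in bounds and both ends match.
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        # The loop overshoots by one step on each side.
        return s[lo + 1:hi]

    best = ""
    for center in range(len(s)):
        # (center, center) covers odd lengths, (center, center + 1) even ones.
        for candidate in (expand(center, center), expand(center, center + 1)):
            if len(candidate) > len(best):
                best = candidate
    return best

print(longest_palindrome("palindnilddafa"))
```

This is O(n^2) in the worst case, but it does no slicing or reversing inside the inner loop.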

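No 2:
If your string is s (and s is not empty):
max((j-i, s[i:j]) for i in range(len(s)) for j in range(i+1, len(s)+1) if s[i:j] == s[i:j][::-1])[1]
will return the answer. Comparing against s[i:j][::-1] avoids the slice s[j-1:i-1:-1], which comes out empty when i is 0 because the stop index -1 is read as the last character.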

For #2 - How to find the longest palindrome in a given string?

Related

How to implement a brute force solution to "Finding first unique character in a string"

As described here:
https://leetcode.com/problems/first-unique-character-in-a-string/description/
I attempted one here but couldn't quite finish:
https://paste.pound-python.org/show/JuPLgdgqceMQYh5kk0Sf/
#Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.
#Examples:
#s = "leetcode"
#return 0.
#s = "loveleetcode",
#return 2.
#Note: You may assume the string contains only lowercase letters.
class Solution(object):
    def firstUniqChar(self, s):
        """
        :type s: str
        :rtype: int
        """
        for i in range(len(s)):
            for j in range(i+1,len(s)):
                if s[i] == s[j]:
                    break
            #But now what. let's say i have complete loop of j where there's no match with i, how do I return i?
I'm ONLY interested in the brute force N^2 solution, nothing fancier. The idea in the above solution is to start a double loop, where inner loop searches for a match with the outer loop's char, and if there's match, break the inner loop and continue onto the next char on the outer loop.
But the question is, how do I handle when there's NO match, which is when I need to return the outer loop's index as the first unique one.
I can't quite figure out a graceful way to do it that also handles edge cases like a single-char string.
Iterate over each char, and check if it appears in any of the following chars. We need to keep track of the characters we've already seen, to avoid falling into edge cases. Try this, it's an O(n^2) solution:
def firstUniqChar(s):
    # store already seen chars
    seen = []
    for i, c in enumerate(s):
        # return if char not previously seen and not in rest
        if c not in seen and c not in s[i+1:]:
            return i
        # mark char as seen
        seen.append(c)
    # no unique chars were found
    return -1
For completeness' sake, here's an O(n) solution:
def firstUniqChar(s):
    # build frequency table
    freq = {}
    for i, c in enumerate(s):
        if c not in freq:
            # store [frequency, index]
            freq[c] = [1, i]
        else:
            # update frequency
            freq[c][0] += 1
    # find leftmost char with frequency == 1
    # it's more efficient to traverse the freq table
    # instead of the (potentially big) input string
    leftidx = float('+inf')
    for f, i in freq.values():
        if f == 1 and i < leftidx:
            leftidx = i
    # handle edge case: no unique chars were found
    return leftidx if leftidx != float('+inf') else -1
For example:
firstUniqChar('cc')
=> -1
firstUniqChar('ccdd')
=> -1
firstUniqChar('leetcode')
=> 0
firstUniqChar('loveleetcode')
=> 2
Add an else to the for loop where you return.
for j ...:
    ...
else:
    return i
I'd first like to note that your current algorithm for finding unique characters doesn't work correctly. That's because you can't assume the character at index i is unique just because none of the indexes j found the same character later in the string. The character at index i could be a repeat of an earlier character (which you'd have skipped when the previous j was equal to the current i).
You could fix the algorithm by letting j iterate over the whole range of indexes, and adding an extra check to ignore the matches when the indexes are the same to your if:
for i in range(len(s)):
    for j in range(len(s)):
        if i != j and s[i] == s[j]:
            break
As Ignacio Vazquez-Abrams suggests in his answer, you can then add an else block to the inner for loop to make the code return when no match was found:
    else: # this line should be indented to match the "for j" loop
        return i
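Put together, the two fragments above might look like the sketch below. The function name first_uniq_char and the final return -1 (taken from the problem statement) are my additions; the loop body is the one described in this answer.

```python
def first_uniq_char(s):
    """Brute-force O(n^2): index of the first non-repeating character, or -1."""
    for i in range(len(s)):
        for j in range(len(s)):
            # Compare against every other position, before and after i.
            if i != j and s[i] == s[j]:
                break
        else:
            # The inner loop finished without a break: s[i] has no duplicate.
            return i
    # Every character had a duplicate (or the string was empty).
    return -1

print(first_uniq_char("loveleetcode"))
```

The else clause of a for loop runs only when the loop was not left via break, which is exactly the "no match found" case the question asks about. A single-character string works too: the inner loop never breaks, so index 0 is returned.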
There are also a few ways you can solve this problem more simply if you use the builtin functions and types available in Python.
For instance, you can implement an O(n^2) solution equivalent to the one above using only one explicit loop, and using str.count to replace the inner one:
def firstUniqChar(s):
    for i, c in enumerate(s):
        if s.count(c) == 1:
            return i
    # the problem statement asks for -1 when no unique character exists
    return -1
I'm also using enumerate to get the character values and indexes together in one step, rather than iterating over a range and indexing later.
There's also a very easy way to make an O(n) solution using collections.Counter, which can do all the counting in one pass before you start checking the characters in order to try to find the first one that is unique:
from collections import Counter
def firstUniqChar(s):
    count = Counter(s)
    for i, c in enumerate(s):
        if count[c] == 1:
            return i
    # the problem statement asks for -1 when no unique character exists
    return -1
I'm not sure your approach will work on an even palindrome, e.g. "redder" (note the second d). Try this instead:
s1 = "leetcode"
s2 = "loveleetcode"
s3 = "redder"
def unique_index(s):
    ahead, behind = list(s), set()
    for idx, char in enumerate(s):
        ahead = ahead[1:]
        if (char not in ahead) and (char not in behind):
            return idx
        behind.add(s[idx])
    return -1
assert unique_index(s1) == 0
assert unique_index(s2) == 2
assert unique_index(s3) == -1
For each character, we look ahead and behind. Only characters disjoint from both groups will return an index. As iteration progresses, the list of what is observed ahead shortens, while what is seen behind extends. The default is -1 as stated in the actual leetcode challenge.
A second list is not required. @Óscar López's answer is the simpler version.

Create a dynamic string

I'm trying to build some kind of "dynamic substring" that is built up by looping over a given string. The rule is that I need to find the longest substring in alphabetical order, and in case of a tie I need to evaluate both and print the one with the bigger value.
I read that in Python characters already have a numeric value, so a is lower than b; knowing this, I wrote the following:
s = "abcsaabcpaosdjaf"
ans = []
# Loop over the string
for i in range(len(s)-1):
    if s[i] < s[i+1]:
        #evaluate if it is in order and build the new string
        ans = s[i]+s[i+1]
        #print the result
        print(ans)
The problem is that I don't know how to build the substring ans dynamically (I'm not sure that's the right way to say it). Right now I have s[i]+s[i+1], but that only gives me pairs of characters that are in alphabetical order, and it is fixed at two characters. How can I make it build the substring as it goes?
Try this. The comments hopefully explain enough, but ask away if you don't understand.
s= "abcsaabcpaosdjaf"
best_answer = ''
current_answer = s[0]
# Loop over the string
for i in s[1:]:
    # look to see if this letter is after the
    # last letter in the current answer.
    if ord(i) > ord(current_answer[-1]):
        # if it is, add the letter to the current
        # answer
        current_answer += i
    else:
        # if it is not, we check if the current
        # answer is longer than the best
        # answer, and update it to the best
        # answer if it is.
        if len(current_answer) > len(best_answer):
            best_answer = current_answer
        # We then set the current answer
        # to just the last letter read.
        current_answer = i
# The loop ends without checking the final run,
# so compare it against the best answer once more.
if len(current_answer) > len(best_answer):
    best_answer = current_answer
print(best_answer)
import itertools
s= "abcsaabcpaosdjaf"
result = max(
    (
        list(next(sub)) + [b for a, b in sub]
        for ascending, sub in itertools.groupby(zip(s,s[1:]), lambda x: x[0] <= x[1])
        if ascending
    ),
    key=len
)
print (''.join(result))
Credits to this

counting the number of substrings in a string

I am working on a Python assignment and I am stuck.
I have to write code that counts the number of occurrences of a given substring within a string.
I thought I had it right, but then I got stuck here.
def count(substr,theStr):
    # your code here
    num = 0
    i = 0
    while substr in theStr[i:]:
        i = i + theStr.find(substr)+1
        num = num + 1
    return num
substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
if I run this, I expect to get 2 as the result, rather, I get 3...
See, other examples such as ana and banana works fine, but this specific example keeps making the error. I don't know what I did wrong here.
Would you PLEASE help me out.
In your code
while substr in theStr[i:]:
correctly advances over the target string theStr, however the
i = i + theStr.find(substr)+1
keeps looking from the start of theStr.
The str.find method accepts optional start and end arguments to limit the search:
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found
within the slice s[start:end]. Optional arguments start and end
are interpreted as in slice notation. Return -1 if sub is not found.
We don't really need to use in here: we can just check that find doesn't return -1. It's a bit wasteful performing an in search when we then need to repeat the search using find to get the index of the substring.
I assume that you want to find overlapping matches, since the str.count method can find non-overlapping matches, and since it's implemented in C it's more efficient than implementing it yourself in Python.
def count(substr, theStr):
    num = i = 0
    while True:
        j = theStr.find(substr, i)
        if j == -1:
            break
        num += 1
        i = j + 1
    return num
print(count('is', 'mississipi'))
print(count('ana', 'bananana'))
output
2
3
The core of this code is
j = theStr.find(substr, i)
i is initialised to 0, so we start searching from the beginning of theStr, and because of i = j + 1 subsequent searches start looking from the index following the last found match.
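To make the overlapping versus non-overlapping distinction concrete, here is a small comparison. It is only a sketch: count_overlapping is a name I made up, and it uses a regex lookahead rather than the find loop above, so treat it as an alternative technique and not as this answer's method.

```python
import re

def count_overlapping(substr, text):
    # A lookahead matches at a position without consuming characters,
    # so matches that share characters are all counted.
    # Assumes substr is non-empty; an empty pattern matches at every position.
    return len(re.findall("(?=" + re.escape(substr) + ")", text))

text = "bananana"
non_overlapping = text.count("ana")          # built-in, skips past each match
overlapping = count_overlapping("ana", text)
print(non_overlapping)
print(overlapping)
```

For 'bananana' the built-in count finds 'ana' at indexes 1 and 5 only, while the overlapping count also sees the occurrence at index 3.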
The code change you need is -
i = i + theStr[i:].find(substr)+ 1
instead of
i = i + theStr.find(substr)+ 1
In your code the substring is always found until i reaches position 5 or more. But when looking up the index of the substring, you were searching the original (whole) string, which always returns position 1.
In your example of banana, after the first iteration i becomes 2. So in the next iteration theStr[i:] becomes nana, and the position of the substring ana is 1 in both this sliced string and the original string. So the bug in the code is just masked and the code seems to work fine.
If your code is purely for learning purposes, then you can do it this way. Otherwise you may want to use the functions Python provides (like count()) to do the job.
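A side-by-side run shows why 'banana' hides the bug while 'mississipi' exposes it. The names count_buggy and count_fixed are mine; the bodies are the asker's loop and the one-line change described above.

```python
def count_buggy(substr, text):
    num = i = 0
    while substr in text[i:]:
        # Bug: find() searches the whole string, so it always
        # returns the position of the very first match.
        i = i + text.find(substr) + 1
        num += 1
    return num

def count_fixed(substr, text):
    num = i = 0
    while substr in text[i:]:
        # Search only the remaining slice, then step past the match start.
        i = i + text[i:].find(substr) + 1
        num += 1
    return num

for sub, text in (("ana", "banana"), ("is", "mississipi")):
    print(sub, text, count_buggy(sub, text), count_fixed(sub, text))
```

With 'mississipi' the buggy version advances i by 2 each time (to 2, 4, 6) and so finds 'is' in three slices, while the fixed version jumps to 2 and then 5.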
Counting the number of substrings:
def count(substr,theStr):
    num = 0
    for i in range(len(theStr)):
        if theStr[i:i+len(substr)] == substr:
            num += 1
    return num
substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
O/P : 2
where theStr[i:i+len(substr)] is the string slice, i is the starting index and i+len(substr) is the ending index.
Eg.
i = 0
substr length = 2
first-time compare substring is => mi
String slice more details

Reversing encryption algorithm that XORs each character with another in the string, using parity to control offset

I've reversed the following algorithm from a challenge binary I'm investigating:
def xor(x, y):
    # Assumed helper (not shown in the question): XOR the code points
    # of two single characters.
    return ord(x) ^ ord(y)
def encrypt(plain):
    l = len(plain)
    a = 10
    cipher = ""
    for i in range(0, l):
        if i + a < l - 1:
            cipher += chr( xor(plain[i], plain[i+a]) )
        else:
            cipher += chr( xor(plain[i], plain[a]) )
        if ord(plain[i]) % 2 == 0: a += 1 # even
        else: a -= 1 # odd
    return cipher
from binascii import hexlify
print hexlify(encrypt("this is a test string"))
Essentially, it XORs each character with another character in the string, offset by a. The initial value of a is 10; as the function iterates over the characters in the string, a += 1 if the character's value is even, or a -= 1 if it's odd.
I've worked out in my head how to reverse this cipher and retrieve the plain text. It would require a recursive function to find out which character offsets are even/odd in the original string. That is, given the properties of XOR % 2, we know that if cipher[0] is odd then either plain[0] or plain[10] is odd, but not both. Similarly, if cipher[0] is even then plain[0] and plain[10] are either both even or both odd. From there a recursive algorithm should be able to work out the rest.
Once we know which characters in plaintext are even/odd, reversing the rest is trivial. I've spent a few hours working this out, but now I'm at loss implementing it.
I've used basic recursive algorithms in the past but never anything that "branches out" to solve something like this.
Given a cipher string resulting from this function, how could we use a recursive algorithm to determine the parity of each character in the original plain string?
EDIT: Sorry just to be clear and in response to a comment, after scratching my head on this for a few hours I thought the recursion strategy outlined above would be the only way to solve this. If not I'm open to any hints/assistance to solving the title question.
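The parity property the question relies on is easy to check numerically. This is just a sanity-check sketch (the helper names are mine): the lowest bit of x XOR y is the XOR of the lowest bits of x and y, so the parity of a cipher character tells you whether the two plain characters had equal or different parity.

```python
def parity(ch):
    # 0 for an even code point, 1 for an odd one
    return ord(ch) % 2

def xor_parity_holds(x, y):
    combined = ord(x) ^ ord(y)
    # The low bit of the XOR equals the XOR of the two low bits.
    return combined % 2 == (parity(x) ^ parity(y))

plain = "this is a test string"
print(all(xor_parity_holds(x, y) for x in plain for y in plain))
```

An odd cipher byte therefore means exactly one of the two plain characters is odd, and an even cipher byte means both share the same parity.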
You can solve this problem with what is known as recursive backtracking. Make an assumption then go down that path until you have decrypted the string or you've reached a contradiction. When you reach a contradiction you then return failure and the calling function will try the next possibility. If you return success, then return success to the caller.
I'm sorry but I couldn't resist trying to solve this. Here's what I came up with:
# Define constants for even/odd/notset so we can use them in a list of
# assumptions about parity.
even = 0
odd = 1
notset = 2
# Define success and failure so that success and failure can be passed
# as a result.
success = 1
failure = 0
def tryParity(i, cipher, a, parities, parityToSet):
    newParities = list(parities)
    for j, p in parityToSet:
        try:
            if parities[j] == notset:
                newParities[j] = p
            elif parities[j] != p:
                # Failure due to contradiction.
                return failure, []
        except IndexError:
            # If we get an IndexError then this can't be a valid set of values for the parity.
            # Error caused by a bad value for "a".
            return failure, []
    # Update "a" based on parity of i
    new_a = a+1 if newParities[i] == even else a-1
    return findParities(i+1,cipher,new_a,newParities)
def findParities(i, cipher, a, parities):
    # Start returning when you've reached the end of the cipher text.
    # This is when success starts bubbling back up through the call stack.
    if i >= len(cipher):
        return success, [parities] # list of parities
    # o stands for the index of the other char that would have been XORed.
    # "o" for "other"
    o = i+a if i + a < len(cipher)-1 else a
    result = None
    resultParities = []
    toTry = []
    # Determine if cipher[index] is even or odd
    if ord(cipher[i]) % 2 == 0:
        # Try both even and both odd
        toTry = (((i,even),(o,even)),
                 ((i,odd),(o,odd)))
    else:
        # Try one or the other even, one or the other odd
        toTry = (((i,odd),(o,even)),
                 ((i,even),(o,odd)))
    # Try first possibility, if success add parities it came up with to result
    resultA, resultParA = tryParity(i, cipher, a, parities, toTry[0])
    if resultA == success:
        result = success
        resultParities.extend(resultParA)
    # Try second possibility, if success add parities it came up with to result
    resultB, resultParB = tryParity(i, cipher, a, parities, toTry[1])
    if resultB == success:
        result = success
        resultParities.extend(resultParB)
    return result, resultParities
def decrypt(cipher):
    a = 10
    parities = list([notset for _ in range(len(cipher))])
    # When done, possible parities will contain a list of lists,
    # where the inner lists have the parity of each character in the cipher.
    # Comes back with multiple results because each
    result, possibleParities = findParities(0,cipher,a,parities)
    # A print for me to check that the parities that come back match the real parities
    print(possibleParities)
    print(list(map(lambda x: 0 if ord(x) % 2 == 0 else 1, "this is a test string")))
    # Finally, armed with the parities, decrypt the cipher. I'll leave that to you.
    # Maybe more recursion is needed
# test call
decrypt(encrypt("this is a test string"))
It seems to work, but I didn't try it on any other inputs.
This solution only gives you the parities, I left the decryption of the characters up to you. They could probably be done together, but I wanted to concentrate on answering your question as asked. I used Python 3 because it's what I have installed.
I'm a beginner in this area also. I recommend reading a Peter Norvig book. Thanks for the tough question.

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha(): # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print "".join(stack)
    else:
        print "Something's not quite right.."
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics what my code needs to do -
between the " if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
Create a for loop and inside of it an if sentence, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the number times the digit
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha(): # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else: # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think you can use the `itertools.groupby` function
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
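As one possible reading of that suggestion, the run-length step can be moved into a small generator function. The names runs and pack are made up for this sketch.

```python
import itertools

def runs(data):
    """Yield (character, run_length) for each run of identical characters."""
    for char, group in itertools.groupby(data):
        # groupby yields an iterator per run; count its items without a list.
        yield char, sum(1 for _ in group)

def pack(data):
    # Only write the count when the run is longer than one character.
    return ''.join(c + (str(n) if n > 1 else '') for c, n in runs(data))

print(pack('aaassaaasssddee'))
print(pack('aaabbkka'))
```

Because groupby only groups adjacent equal items, the trailing a in 'aaabbkka' stays a separate run, which is the behaviour the assignment asks for.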
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice, izip
def consume(items):
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    # count the items of an iterator without building a list
    result = count()
    consume(izip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print pack_or_unpack('aaabbc')
print pack_or_unpack('a3b2c')
print pack_or_unpack('a10')
print pack_or_unpack('b5c5')
print pack_or_unpack('abc')
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)
