Python - packing/unpacking by letters - python

I'm just starting to learn Python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following:
def packer(s):
    if s.isalpha():        # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print "".join(stack)
    else:
        print "Something's not quite right.."
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work properly. But the assignment says that
if a letter (for example a) shows up again after b or c, the packed
string must still unpack into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I therefore figured that I cannot use count(), since
it counts all occurrences of the item in the whole string, correct?
What would my options be here, then?
On to the unpacking -
I've thought through the basics of what my code needs to do:
1. Between the "if s.isalpha():" and the else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked.)
2. Create a for loop and inside of it an if statement, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the element repeated that number of times
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether the thing following an element in the list
is a number or just another alphabetical element? I guess this must be done with
slicing, but slices only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha():        # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else:                   # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!

A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
The regex r'(.)\1+' matches a run of two or more identical characters, and with re.sub you can easily encode it:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'

I think you can use the `itertools.groupby` function,
for example:
import itertools
data = 'aaassaaasssddee'
grouped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in grouped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
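As a rough sketch of that last remark, the generator expression can be pulled out into a small generator function. The names runs and pack are only illustrative and not part of the original answer:

```python
from itertools import groupby

def runs(data):
    """Yield (character, run_length) for each run of identical characters."""
    for char, group in groupby(data):
        # groupby hands back an iterator per run; count its items
        yield char, sum(1 for _ in group)

def pack(data):
    """Join each run as char + count, leaving the count out for single characters."""
    return ''.join(c + (str(n) if n > 1 else '') for c, n in runs(data))

print(pack('aaassaaasssddee'))  # a3s2a3s3d2e2
print(pack('aaabbkka'))         # a3b2k2a
```

Because groupby only groups adjacent equal items, the trailing a in 'aaabbkka' stays a separate run, which is the behaviour the question asks for.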

This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
def consume(items):
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    # count the items of an iterator without building a list
    result = count()
    consume(zip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach to implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be efficient if written in C, but Python has caveats: indexing a string copies the character into a new string, and an algorithm that seems to copy data excessively might still be faster than one that avoids the copying, if the copying is done in C and the workaround is implemented with a Python loop.)

Related

How do I check if the next item in a string is the alphabetical successor of the one before? + Inverse

I'm trying to compress a string in a way that any sequence of letters in strict alphabetical order is swapped with the first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT :
I've managed to get the function working, thanks to your suggestion of using the alphabet string as comparison; now I'm very much stuck on the inverse function: given "a6xyl4" expand it back into "abcdefxylmno".
After quite some time I managed to split the string every time there's a number and I made a function that expands a 2 char string, but it fails to work when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
    L=[]
    ind = abc.index(start)
    newAbc = abc[ind:]
    for i in range(len(newAbc)):
        while i < n:
            L.append(newAbc[i])
            i+=1
        res = ''.join(L)
        return res
def unpack(S):
    for i in range(len(S)-1):
        if S[i] in abc and S[i+1] not in abc:
            lett = str(S[i])
            num = int(S[i+1])
            return subString(lett,num)
def separate(S):
    lst = []
    for i in S:
        lst.append(i)
    for el in lst:
        if el.isnumeric():
            ind = lst.index(el)
            lst.insert(ind+1,"-")
    a = ''.join(lst)
    L = a.split("-")
    if S[-1].isnumeric():
        L.remove(L[-1])
        return L
    else:
        return L
def inverse(S):
    L = separate(S)
    for i in L:
        return unpack(i)
Each of these functions work singularly, but inverse(S) doesn't output anything. What's the mistake?
You can use the ord() function, which returns an integer representing the Unicode code point of a character. Sequential letters in alphabetical order differ by 1. With that, you can implement a simple function:
def is_successor(a,b):
    # check for marginal cases if we don't ensure
    # input restriction somewhere else
    # (range() excludes its end point, hence the + 1 so that 'z' and 'Z' count)
    if ord(a) not in range(ord('a'), ord('z') + 1) and ord(a) not in range(ord('A'), ord('Z') + 1):
        return False
    if ord(b) not in range(ord('a'), ord('z') + 1) and ord(b) not in range(ord('A'), ord('Z') + 1):
        return False
    # returns true if they are sequential
    return ((ord(b) - ord(a)) == 1)
You can use the chr() function for your reversing stage, as it returns the one-character string whose Unicode code point is the integer given as argument.
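To make the reversing stage concrete, here is a minimal sketch of an inverse built on ord() and chr(). It assumes the packed string contains only lowercase letters, each optionally followed by a decimal run length; the function name expand and the use of re.findall to tokenise are my own choices, not something from the question:

```python
import re

def expand(packed):
    """Expand e.g. 'a6xyl4' back into 'abcdefxylmno'."""
    out = []
    # each token is one lowercase letter plus an optional run length
    for letter, count in re.findall(r'([a-z])(\d*)', packed):
        n = int(count) if count else 1
        # walk n consecutive code points starting at the letter
        out.append(''.join(chr(ord(letter) + k) for k in range(n)))
    return ''.join(out)

print(expand('a6xyl4'))  # abcdefxylmno
```

The sketch does not check that a run stays inside the alphabet, so an input such as 'y5' would run past 'z'; a real implementation would probably want to reject that.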
This builds on the idea that acceptable subsequences will be substrings of the ABC:
from string import ascii_lowercase as abc  # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
    if cache + char in abc:
        cache += char
    else:
        stack.append(cache)
        cache = char
# if present, append the last sequence
if cache:
    stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)

Scatter palindrome - How to parse through a dictionary to figure out combinations of substrings that form a palindrome

I was given this HC problem at an interview. I was able to come up with what I'll call a brute force method.
Here is the problem statement:
Find all the scatter palindromes in a given string, "aabb". The
substrings can be scattered but rearranged to form a palindrome.
example: a, aa, aab, aabb, a, abb, b, bb, bba and b are the substrings
that satisfy this criteria.
My logic:
divide the str into substrings
counter = 0
if the len(substr) is even:
    and substr == reverse(substr)
        increment counter
else:
    store all the number of occurrences of each str of substr in a dict
    process dict somehow to figure out if it can be arranged into a palindrome
    ###### this is where I ran out of ideas ######
My code:
class Solution:
    def countSubstrings(self, s: str) -> int:
        n = len(s)
        c=0
        for i in range(0,n-1): #i=0
            print('i:',i)
            for j in range(i+1,n+1): #j=1
                print('j',j)
                temp = s[i:j]
                if len(temp) == 1:
                    c+=1
                # if the len of substring is even,
                # check if the reverse of the string is same as the string
                elif(len(temp)%2 == 0):
                    if (temp == temp[::-1]):
                        c+=1
                        print("c",c)
                else:
                    # create a dict to check how many times
                    # each value has occurred
                    d = {}
                    for l in range(len(temp)):
                        if temp[l] in d:
                            d[temp[l]] = d[temp[l]]+1
                        else:
                            d[temp[l]] = 1
                    print(d)
        return c
op = Solution()
op.countSubstrings('aabb')
By now, it must be obvious I'm a beginner. I'm sure there are better, more complicated ways to solve this. Some of my code is adapted from visleck's logic here and I wasn't able to follow the second half of it. If someone can explain it, that would be great as well.
As a partial answer, the test for a string being a scattered palindrome is simple: if the number of letters which occur an odd number of times is at most 1, it is a scattered palindrome. Otherwise it isn't.
It can be implemented like this:
from collections import Counter
def scattered_palindrome(s):
    counts = Counter(s)
    return len([count for count in counts.values() if count % 2 == 1]) <= 1
For example,
>>> scattered_palindrome('abb')
True
>>> scattered_palindrome('abbb')
False
Note that at no stage is it necessary to compare a string with its reverse. Also, note that I used a Counter object to keep track of the letter counts. This is a streamlined way of creating a dictionary-like collection of letter counts.
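To turn that test into a count, one possible brute-force sketch is to apply it to every contiguous slice of the input. The function names are illustrative, and "substring" is assumed here to mean a contiguous slice:

```python
from collections import Counter

def is_scatter_palindrome(s):
    """True if the letters of s can be rearranged into a palindrome."""
    counts = Counter(s)
    # at most one letter may occur an odd number of times
    return sum(1 for n in counts.values() if n % 2 == 1) <= 1

def count_scatter_palindromes(s):
    """Count the contiguous substrings of s that pass the test."""
    total = 0
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            if is_scatter_palindrome(s[i:j]):
                total += 1
    return total

print(count_scatter_palindromes('aabb'))  # 9
```

For 'aabb' this gives 9: the list in the question also names bba, which is not a contiguous slice of 'aabb', and the only slice that fails the test is 'ab'. Rebuilding a Counter for every slice is wasteful, but it keeps the sketch close to the test described above.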

How to implement a brute force solution to "Finding first unique character in a string"

As described here:
https://leetcode.com/problems/first-unique-character-in-a-string/description/
I attempted one here but couldn't quite finish:
https://paste.pound-python.org/show/JuPLgdgqceMQYh5kk0Sf/
#Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.
#Examples:
#s = "leetcode"
#return 0.
#s = "loveleetcode",
#return 2.
#Note: You may assume the string contains only lowercase letters.
class Solution(object):
    def firstUniqChar(self, s):
        """
        :type s: str
        :rtype: int
        """
        for i in range(len(s)):
            for j in range(i+1,len(s)):
                if s[i] == s[j]:
                    break
            #But now what. let's say i have complete loop of j where there's no match with i, how do I return i?
I'm ONLY interested in the brute force N^2 solution, nothing fancier. The idea in the above solution is to start a double loop, where the inner loop searches for a match with the outer loop's char, and if there's a match, break the inner loop and continue on to the next char in the outer loop.
But the question is, how do I handle the case where there's NO match, which is when I need to return the outer loop's index as the first unique one?
I can't quite figure out a graceful way to do it that also handles edge cases like a single-char string.
Iterate over each char, and check if it appears in any of the following chars. We need to keep track of the characters we've already seen, to avoid falling into edge cases. Try this, it's an O(n^2) solution:
def firstUniqChar(s):
    # store already seen chars
    seen = []
    for i, c in enumerate(s):
        # return if char not previously seen and not in rest
        if c not in seen and c not in s[i+1:]:
            return i
        # mark char as seen
        seen.append(c)
    # no unique chars were found
    return -1
For completeness' sake, here's an O(n) solution:
def firstUniqChar(s):
    # build frequency table
    freq = {}
    for i, c in enumerate(s):
        if c not in freq:
            # store [frequency, index]
            freq[c] = [1, i]
        else:
            # update frequency
            freq[c][0] += 1
    # find leftmost char with frequency == 1
    # it's more efficient to traverse the freq table
    # instead of the (potentially big) input string
    leftidx = float('+inf')
    for f, i in freq.values():
        if f == 1 and i < leftidx:
            leftidx = i
    # handle edge case: no unique chars were found
    return leftidx if leftidx != float('+inf') else -1
For example:
firstUniqChar('cc')
=> -1
firstUniqChar('ccdd')
=> -1
firstUniqChar('leetcode')
=> 0
firstUniqChar('loveleetcode')
=> 2
Add an else to the for loop where you return.
for j ...:
    ...
else:
    return i
I'd first like to note that your current algorithm for finding unique characters doesn't work correctly. That's because you can't assume the character at index i is unique just because none of the indexes j found the same character later in the string. The character at index i could be a repeat of an earlier character (which you'd have skipped when the previous j was equal to the current i).
You could fix the algorithm by letting j iterate over the whole range of indexes, and adding an extra check to your if so that matches are ignored when the two indexes are the same:
for i in range(len(s)):
    for j in range(len(s)):
        if i != j and s[i] == s[j]:
            break
As Ignacio Vazquez-Abrams suggests in his answer, you can then add an else block to the inner for loop to make the code return when no match was found:
    else: # this line should be indented to match the "for j" loop
        return i
There are also a few ways you can solve this problem more simply if you use the builtin functions and types available in Python.
For instance, you can implement an O(n^2) solution equivalent to the one above using only one explicit loop, and using str.count to replace the inner one:
def firstUniqChar(s):
    for i, c in enumerate(s):
        if s.count(c) == 1:
            return i
    return -1  # the problem statement asks for -1 when nothing is unique
I'm also using enumerate to get the character values and indexes together in one step, rather than iterating over a range and indexing later.
There's also a very easy way to make an O(n) solution using collections.Counter, which can do all the counting in one pass before you start checking the characters in order to try to find the first one that is unique:
from collections import Counter
def firstUniqChar(s):
    count = Counter(s)
    for i, c in enumerate(s):
        if count[c] == 1:
            return i
    return -1  # the problem statement asks for -1 when nothing is unique
I'm not sure your approach will work on an even palindrome, e.g. "redder" (note the second d). Try this instead:
s1 = "leetcode"
s2 = "loveleetcode"
s3 = "redder"
def unique_index(s):
    ahead, behind = list(s), set()
    for idx, char in enumerate(s):
        ahead = ahead[1:]
        if (char not in ahead) and (char not in behind):
            return idx
        behind.add(s[idx])
    return -1
assert unique_index(s1) == 0
assert unique_index(s2) == 2
assert unique_index(s3) == -1
For each character, we look ahead and behind. Only characters disjoint from both groups will return an index. As iteration progresses, the list of what is observed ahead shortens, while what is seen behind extends. The default is -1 as stated in the actual leetcode challenge.
A second list is not required. @Óscar López's answer is the simpler one.

Counting vowels in a string using recursion

I understand that recursion is when a function calls itself, however I can't figure out how exactly to get my function to call it self to get the desired results. I need to simply count the vowels in the string given to the function.
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    vowelcount = 0
    vowels = "aEiou".lower()
    if s[0] in vowels:
        vowelcount += 1
    else:
        ???
I came up with this in the end, thanks to some insight from here.
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    vowels = "aeiouAEIOU"
    if s == "":
        return 0
    elif s[0] in vowels:
        return 1 + recVowelCount(s[1:])
    else:
        return 0 + recVowelCount(s[1:])
Try this, it's a simple solution:
def recVowelCount(s):
    if not s:
        return 0
    return (1 if s[0] in 'aeiouAEIOU' else 0) + recVowelCount(s[1:])
It takes into account the case when the vowels are in either uppercase or lowercase. It might not be the most efficient way to traverse recursively a string (because each recursive call creates a new sliced string) but it's easy to understand:
Base case: if the string is empty, then it has zero vowels.
Recursive step: if the first character is a vowel add 1 to the solution, otherwise add 0. Either way, advance the recursion by removing the first character and continue traversing the rest of the string.
The second step will eventually reduce the string to zero length, therefore ending the recursion. Alternatively, the same procedure can be implemented using tail recursion - not that it makes any difference regarding performance, given that CPython doesn't implement tail recursion elimination.
def recVowelCount(s):
    def loop(s, acc):
        if not s:
            return acc
        return loop(s[1:], (1 if s[0] in 'aeiouAEIOU' else 0) + acc)
    # the result of the helper has to be returned to the caller
    return loop(s, 0)
Just for fun, if we remove the restriction that the solution has to be recursive, this is how I'd solve it:
def iterVowelCount(s):
    vowels = frozenset('aeiouAEIOU')
    return sum(1 for c in s if c in vowels)
Anyway this works:
recVowelCount('murcielago')
> 5
iterVowelCount('murcielago')
> 5
Your function probably needs to look generally like this:
if the string is empty, return 0.
if the string isn't empty and the first character is a vowel, return 1 + the result of a recursive call on the rest of the string
if the string isn't empty and the first character is not a vowel, return the result of a recursive call on the rest of the string.
Use a slice to remove the first character and test the rest. You don't need an else block, because the function has to be called in every case. If you put the recursive call in an else block, it will not be reached when the character you just tested is a vowel:
### Improved Code
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    # You should also declare your `vowels` string as class variable
    vowels = "aEiou".lower()
    if not s:
        return 0
    if s[0] in vowels:
        return 1 + recVowelCount(s[1:])
    return recVowelCount(s[1:])
# Invoke the function
print(recVowelCount("rohit")) # Prints 2
Each recursive call receives a new string with the first character sliced off.
This is the straightforward approach:
VOWELS = 'aeiouAEIOU'
def count_vowels(s):
    if not s:
        return 0
    elif s[0] in VOWELS:
        return 1 + count_vowels(s[1:])
    else:
        return 0 + count_vowels(s[1:])
Here is the same with less code:
def count_vowels_short(s):
    if not s:
        return 0
    return int(s[0] in VOWELS) + count_vowels_short(s[1:])
Here is another one:
def count_vowels_tailrecursion(s, count=0):
    return count if not s else count_vowels_tailrecursion(s[1:], count + int(s[0] in VOWELS))
Unfortunately, this will fail for long strings.
>>> medium_sized_string = str(range(1000))
>>> count_vowels(medium_sized_string)
...
RuntimeError: maximum recursion depth exceeded while calling a Python object
if this is something of interest, look at this blog article.
Here's a functional programming approach for you to study:
map_ = lambda func, lst: [func(lst[0])] + map_(func, lst[1:]) if lst else []
reduce_ = lambda func, lst, init: reduce_(func, lst[1:], func(init, lst[0])) if lst else init
add = lambda x, y: int(x) + int(y)
is_vowel = lambda a: a in 'aeiou'
s = 'How razorback-jumping frogs can level six piqued gymnasts!'
num_vowels = reduce_(add, map_(is_vowel, s), 0)
The idea is to divide the problem into two steps, where the first ("map") converts the data into another form (a letter -> 0/1) and the second ("reduce") collects converted items into one single value (the sum of 1's).
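For comparison, the same two steps can be sketched with Python's built-in map and functools.reduce rather than the hand-written recursive helpers. The function name below is illustrative, and unlike is_vowel above it also counts uppercase vowels:

```python
from functools import reduce

def count_vowels_mapreduce(s):
    # "map": turn every character into True/False (vowel or not)
    flags = map(lambda ch: ch in 'aeiouAEIOU', s)
    # "reduce": fold the flags into one number, starting from 0
    return reduce(lambda acc, flag: acc + int(flag), flags, 0)

print(count_vowels_mapreduce('murcielago'))  # 5
```

This version is iterative under the hood, so it does not run into the recursion limit on long strings.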
References:
http://en.wikipedia.org/wiki/Map_(higher-order_function)
http://en.wikipedia.org/wiki/Reduce_(higher-order_function)
http://en.wikipedia.org/wiki/MapReduce
Another, more advanced solution is to convert the problem into tail recursive and use a trampoline to eliminate the recursive call:
def count_vowels(s):
    f = lambda s, n: lambda: f(s[1:], n + (s[0] in 'aeiou')) if s else n
    t = f(s, 0)
    while callable(t): t = t()
    return t
Note that unlike naive solutions this one can work with very long strings without causing "recursion depth exceeded" errors.

Python: Clever ways at string manipulation

I'm new to Python and am currently reading a chapter on String manipulation in "Dive into Python."
I was wondering what are some of the best (or most clever/creative) ways to do the following:
1) Extract from this string: "stackoverflow.com/questions/ask" the word 'questions'. I did string.split('/')[1] -- but that isn't very clever.
2) Find the longest palindrome in a given number or string
3) Starting with a given word (e.g. "cat") -- find all possible ways to get from that to another three-letter word ("dog"), changing one letter at a time such that each change in letters forms a new, valid word.
For example-- cat, cot, dot, dog
As a personal exercise, here is some (hopefully) well-commented code with a few hints.
#!/usr/bin/env python2
# Let's take this string:
a = "palindnilddafa"
# I surround the search with a try/except block, explanation following
try:
    # In this loop I go from the length of a down to 1.
    # range can take 3 params: start, end, increment
    # This way I start from the longest substring (the whole string)
    # and move on to shorter and shorter ones
    for i in range(len(a), 0, -1):
        # In this loop I want to know how many
        # substrings of length i there are, that
        # is len(a) - i + 1, and I try them all,
        # starting from the leftmost offset
        for j in range(len(a) - i + 1):
            # This is a little tricky.
            # string[start:end] is the slice operator,
            # as strings are like arrays (but immutable).
            # So I take the characters from offset j up to (not including) j+i
            # The result of "foo"[1:3] is "oo", to be clear.
            # With string[::-1] you take all elements but in
            # reverse order
            # A substring is a palindrome if it equals its own reverse
            sub = a[j:j+i]
            if sub == sub[::-1]:
                # If it is, I cannot just break, because that would only leave
                # the inner loop, so I raise an exception passing the substring
                # found as argument
                raise Exception(sub)
# And then I catch the exception, carrying the message
# Which is the palindrome, and I print some info
except Exception as e:
    # You can pass many things comma-separated to print (this is python2!)
    print e, "is the longest palindrome of", a
    # Or you can use printf formatting style
    print "It's %d long and starts from %d" % (len(str(e)), a.index(str(e)))
After the discussion (and I'm a little sorry if it goes OT), I've written another implementation of the palindrome searcher, and if sberry2A can, I'd like to know the result of some benchmark tests!
Be aware, there are (I guess) a lot of bugs around the indices and the hard "+1 -1" problem, but the idea is clear: start from the middle and then expand for as long as you can.
Here's the code:
#!/usr/bin/env python2
def check(s, i):
    mid = s[i]
    j = 1
    try:
        while s[i-j] == s[i+j]:
            j += 1
    except:
        pass
    return s[i-j+1:i+j]
def do_all(a):
    pals = []
    mlen = 0
    for i in range(len(a)/2):
        #print "check for", i
        left = check(a, len(a)/2 + i)
        mlen = max(mlen, len(left))
        pals.append(left)
        right = check(a, len(a)/2 - i)
        mlen = max(mlen, len(right))
        pals.append(right)
        if mlen > max(2, i*2-1):
            return left if len(left) > len(right) else right
string = "palindnilddafa"
print do_all(string)
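For reference, the "start from the middle and expand" idea can also be written with explicit bounds checks instead of relying on exceptions and negative indices. This is only a sketch in Python 3 of the usual expand-around-centre technique, with names of my own choosing, and not a corrected version of the code above; it tries every position as a centre rather than working outwards from the middle of the string:

```python
def longest_palindrome(s):
    """Return the longest palindromic substring of s (the first one on ties)."""
    def expand(lo, hi):
        # grow the window while it stays inside s and both ends match
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        # the loop overshoots by one step on each side
        return s[lo + 1:hi]

    best = ''
    for i in range(len(s)):
        # odd-length palindromes are centred on i, even-length ones between i and i+1
        for cand in (expand(i, i), expand(i, i + 1)):
            if len(cand) > len(best):
                best = cand
    return best

print(longest_palindrome('palindnilddafa'))  # lindnil
```

Checking lo >= 0 explicitly avoids the wrap-around that negative indices would otherwise cause at the left edge of the string.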
No 2:
If your string is s:
max((j-i,s[i:j]) for i in range(len(s)-1) for j in range(i+2,len(s)+1) if s[i:j]==s[i:j][::-1])[1]
will return the answer.
For #2 - How to find the longest palindrome in a given string?
