Python algorithm in list - python

Given a list of N strings, implement an algorithm that outputs the largest n such that the first n characters of every string are identical (i.e., print how many leading characters all the given strings have in common).
My code:
def solution(a):
    import numpy as np
    for index in range(0,a):
        if np.equal(a[index], a[index-1]) == True:
            i += 1
            return solution
        else:
            break
    return 0
# Test code
print(solution(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(solution(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0

Some comments on your code:
There is no need to use numpy if it is only used for string comparison.
i is undefined when i += 1 is about to be executed, so that statement will fail. i is also never actually used in your code.
index-1 is -1 in the first iteration of the loop, which Python treats as the last element of the list rather than a preceding one.
solution is your function, so return solution will return a function object. You need to return a number.
The if condition is only comparing complete words, so there is no attempt to only compare a prefix.
A possible way to do this is to be optimistic and assume that the first word is a prefix of all other words. Then, whenever you find a word for which this is not the case, shorten the prefix until it is again a valid prefix of that word. Continue like that until all words have been processed. If at any moment the prefix is reduced to an empty string, you can exit and return 0, as it cannot get any shorter than that.
Here is how you could code it:
def solution(words):
    prefix = words[0] # if there was only one word, this would be the prefix
    for word in words:
        while not word.startswith(prefix):
            prefix = prefix[:-1] # reduce the size of the prefix
            if not prefix: # is there any sense in continuing?
                return 0 # ...: no.
    return len(prefix)

The description is somewhat convoluted but it does seem that you're looking for the length of the longest common prefix.
You can get the length of the common prefix between two strings using the next() function. It can find the first index where characters differ which will correspond to the length of the common prefix:
def maxCommon(S):
    cp = S[0] if S else "" # first string is common prefix (cp)
    for s in S[1:]: # go through other strings (s)
        cs = next((i for i,(a,b) in enumerate(zip(s,cp)) if a!=b),len(cp))
        cp = cp[:cs] # truncate to new common size (cs)
    return len(cp) # return length of common prefix
output:
print(maxCommon(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(maxCommon(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0
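As a cross-check on the two answers above, here is a minimal sketch of the same idea using zip: it walks the strings column by column and stops at the first position where the characters are not all equal. The name common_prefix_len is hypothetical (it does not appear in the question or the answers). os.path.commonprefix from the standard library, which compares character by character rather than by path component, is used only to confirm the result.

```python
import os.path

def common_prefix_len(words):
    """Length of the longest prefix shared by every string in `words`."""
    n = 0
    # zip(*words) yields one tuple per character position and stops at the
    # shortest string, so no index can run past the end of any word.
    for column in zip(*words):
        if len(set(column)) != 1:  # characters differ at this position
            break
        n += 1
    return n

samples = [
    ['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'],
    ['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'],
]
for words in samples:
    # os.path.commonprefix works character by character on any strings
    assert common_prefix_len(words) == len(os.path.commonprefix(words))
    print(common_prefix_len(words))
```

An empty list simply yields 0 here, because zip() with no arguments produces nothing to iterate over.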

Related

How do I check if the next item in a string is the alphabetical successor of the one before? + Inverse

I'm trying to compress a string in a way that any sequence of letters in strict alphabetical order is swapped with the first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT :
I've managed to get the function working, thanks to your suggestion of using the alphabet string as comparison; now I'm very much stuck on the inverse function: given "a6xyl4" expand it back into "abcdefxylmno".
After quite some time I managed to split the string every time there's a number and I made a function that expands a 2 char string, but it fails to work when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
L=[]
ind = abc.index(start)
newAbc = abc[ind:]
for i in range(len(newAbc)):
while i < n:
L.append(newAbc[i])
i+=1
res = ''.join(L)
return res
def unpack(S):
for i in range(len(S)-1):
if S[i] in abc and S[i+1] not in abc:
lett = str(S[i])
num = int(S[i+1])
return subString(lett,num)
def separate(S):
lst = []
for i in S:
lst.append(i)
for el in lst:
if el.isnumeric():
ind = lst.index(el)
lst.insert(ind+1,"-")
a = ''.join(lst)
L = a.split("-")
if S[-1].isnumeric():
L.remove(L[-1])
return L
else:
return L
def inverse(S):
L = separate(S)
for i in L:
return unpack(i)
Each of these functions works on its own, but inverse(S) doesn't output anything. What's the mistake?
You can use the ord() function, which returns an integer representing the Unicode code point of a character. Consecutive letters of the alphabet differ by exactly 1. With that in mind, you can implement a simple function:
def is_successor(a,b):
    # reject anything that is not an ASCII letter, in case the input
    # is not restricted somewhere else
    # (range() excludes its stop value, hence the + 1 to include 'z' and 'Z')
    if ord(a) not in range(ord('a'), ord('z') + 1) and ord(a) not in range(ord('A'), ord('Z') + 1):
        return False
    if ord(b) not in range(ord('a'), ord('z') + 1) and ord(b) not in range(ord('A'), ord('Z') + 1):
        return False
    # returns True if they are sequential
    return ((ord(b) - ord(a)) == 1)
You can use the chr() function for the reverse step, as it returns the one-character string whose Unicode code point is the integer given as argument.
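To make the chr() suggestion concrete, here is a minimal sketch of the inverse function the question asks for. It assumes lowercase ASCII input in which a letter may be followed by a decimal count; the name expand is hypothetical and not taken from the question's code.

```python
def expand(packed):
    """Expand a packed string such as 'a6xyl4' back into its original form."""
    out = []
    i = 0
    while i < len(packed):
        start = packed[i]
        i += 1
        # collect every digit that follows the letter (handles counts >= 10)
        digits = ''
        while i < len(packed) and packed[i].isdigit():
            digits += packed[i]
            i += 1
        count = int(digits) if digits else 1
        # chr(ord(start) + k) is the k-th successor of the starting letter
        out.append(''.join(chr(ord(start) + k) for k in range(count)))
    return ''.join(out)

print(expand('a6xyl4'))
```

Because a letter without a count expands to itself, both 'a6xyl4' and 'a6x2l4' expand to the same original string.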
This builds on the idea that acceptable subsequences will be substrings of the ABC:
from string import ascii_lowercase as abc # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
    if cache + char in abc:
        cache += char
    else:
        stack.append(cache)
        cache = char
# if present, append the last sequence
if cache:
    stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)

Is there a way to select a group of indexes and remove them?

I am making a decode method that selects a set of index values within a string to remove. Right now the problem is that I don't understand how to select a set of indices to remove.
I have tried making a list of items to designate and remove if found in the string, but this would only work for only a few types of string sets.
def decode(string, n):
    for i in range(0,len(string), n):
        string = string.replace(string[i],'')
        return string
Here n is both the number of characters to remove at a given index and the index from which to start removing them.
I understand how to step through an index, but I am not sure how to remove string values according to the index.
print(decode('#P#y#t#h#o#n#',1)) #this works out to be Python
print(decode('AxYLet1x3’s T74codaa7e!',3 )) #this does not, this is supposed to be 'Let's Code'
With "switcher" flag:
def decode(inp_str, n):
    s = ''
    flag = True
    for i in range(0, len(inp_str), n):
        flag = not flag
        if flag: s += inp_str[i: i + n]
    return s
print(decode('#P#y#t#h#o#n#', 1)) # Python
print(decode('AxYLet1x3’s T74codaa7e!', 3)) # Let’s code!
A different approach would be to pick the characters at the right positions:
def decode(string, n):
    res = ''
    for i in range(len(string)//(2*n)+1):
        res += string[2*n*i+n:2*n*i+2*n]
    return res
Don't change the size of an iterable when going through it!
The best approach would be to replace each character you want to delete with a placeholder that can't occur in the string, and then strip the placeholders afterwards.
E.g. for your first example you already have that string format. Removing them outside the loop (remember, loop is for marking the characters for deletion) would be:
return ''.join(c for c in string if c!='#')
As for the loop itself in this approach, I'll leave it up to you to debug it now. ;) See how index moves in the loop, see what your replace in fact does! E.g. as I said in the comment, n=1 would go through literally every character, not every second character!
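One way the mark-then-strip idea could look in full is sketched below. It assumes the "remove n, keep n" pattern suggested by the two examples in the question, and it uses None as the marker so that it can never clash with a real character. The name decode_marked is hypothetical.

```python
def decode_marked(string, n):
    """Remove n characters, keep n characters, and repeat."""
    chars = list(string)
    # Pass 1: mark positions for deletion. The blocks starting at
    # 0, 2n, 4n, ... are the ones to drop.
    for start in range(0, len(chars), 2 * n):
        for i in range(start, min(start + n, len(chars))):
            chars[i] = None
    # Pass 2: strip the marked positions, outside the marking loop.
    return ''.join(c for c in chars if c is not None)

print(decode_marked('#P#y#t#h#o#n#', 1))
print(decode_marked('AxYLet1x3’s T74codaa7e!', 3))
```

The length of the list never changes while it is being traversed, which is the point of the advice above.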
Another solution is smart slicing with indexes. Assuming from your examples that you want to 'remove n, keep n' code:
def decode(string, n):
    result = ""
    for i in range(n,len(string), 2*n): # first index we want to keep is n, next is 3n, 5n... so we're jumping by 2n each time
        result += string[i: i+n]
    return result
First, you're returning right after the first iteration. Second, you're only replacing character at n'th position with "".
This should do what you require, it'll replace every 'n' number of characters after every 'n' index:
def decode(string, n):
    for i in range(0,len(string)-1,n):
        string = string[:i] + string[i+n:] # Remove elements at index "i", till "i+n"
    return string
Output:
print(decode('#P#y#t#h#o#n#',1)) # Output = Python
print(decode('AxYLet1x3’s T74codaa7e!',3 )) # Output = Let’s code!

How to implement a brute force solution to "Finding first unique character in a string"

As described here:
https://leetcode.com/problems/first-unique-character-in-a-string/description/
I attempted one here but couldn't quite finish:
https://paste.pound-python.org/show/JuPLgdgqceMQYh5kk0Sf/
#Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.
#Examples:
#s = "leetcode"
#return 0.
#s = "loveleetcode",
#return 2.
#Note: You may assume the string contains only lowercase letters.
class Solution(object):
    def firstUniqChar(self, s):
        """
        :type s: str
        :rtype: int
        """
        for i in range(len(s)):
            for j in range(i+1,len(s)):
                if s[i] == s[j]:
                    break
            #But now what. let's say i have complete loop of j where there's no match with i, how do I return i?
I'm ONLY interested in the brute force N^2 solution, nothing fancier. The idea in the above solution is to start a double loop, where inner loop searches for a match with the outer loop's char, and if there's match, break the inner loop and continue onto the next char on the outer loop.
But the question is, how do I handle when there's NO match, which is when I need to return the outer loop's index as the first unique one.
I can't quite figure out a graceful way to do it that also handles edge cases such as a single-character string.
Iterate over each char, and check if it appears in any of the following chars. We need to keep track of the characters we've already seen, to avoid falling into edge cases. Try this, it's an O(n^2) solution:
def firstUniqChar(s):
    # store already seen chars
    seen = []
    for i, c in enumerate(s):
        # return if char not previously seen and not in rest
        if c not in seen and c not in s[i+1:]:
            return i
        # mark char as seen
        seen.append(c)
    # no unique chars were found
    return -1
For completeness' sake, here's an O(n) solution:
def firstUniqChar(s):
    # build frequency table
    freq = {}
    for i, c in enumerate(s):
        if c not in freq:
            # store [frequency, index]
            freq[c] = [1, i]
        else:
            # update frequency
            freq[c][0] += 1
    # find leftmost char with frequency == 1
    # it's more efficient to traverse the freq table
    # instead of the (potentially big) input string
    leftidx = float('+inf')
    for f, i in freq.values():
        if f == 1 and i < leftidx:
            leftidx = i
    # handle edge case: no unique chars were found
    return leftidx if leftidx != float('+inf') else -1
For example:
firstUniqChar('cc')
=> -1
firstUniqChar('ccdd')
=> -1
firstUniqChar('leetcode')
=> 0
firstUniqChar('loveleetcode')
=> 2
Add an else to the for loop where you return.
for j ...:
    ...
else:
    return i
I'd first like to note that your current algorithm for finding unique characters doesn't work correctly. That's because you can't assume the character at index i is unique just because none of the indexes j found the same character later in the string. The character at index i could be a repeat of an earlier character (which you'd have skipped when the previous j was equal to the current i).
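A small demonstration of that failure is sketched below. The function is a hypothetical completion of the question's loop (the inner loop only looks at later indexes, and a for/else returns i when no match was found); it is not code from the question itself.

```python
def first_uniq_forward_only(s):
    # Hypothetical completion of the question's approach: j only scans
    # the indexes after i, so earlier duplicates of s[i] go unnoticed.
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                break
        else:
            return i
    return -1

# In "aa" no character is unique, yet index 1 is reported, because
# nothing after the second 'a' matches it.
print(first_uniq_forward_only("aa"))
print(first_uniq_forward_only("leetcode"))
```

The second call still gives the right answer, which is why the flaw is easy to miss with the examples from the problem statement.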
You could fix the algorithm by letting j iterate over the whole range of indexes, and adding an extra check to ignore the matches when the indexes are the same to your if:
for i in range(len(s)):
    for j in range(len(s)):
        if i != j and s[i] == s[j]:
            break
As Ignacio Vazquez-Abrams suggests in his answer, you can then add an else block to the inner for loop to make the code return when no match was found:
    else: # this line should be indented to match the "for j" loop
        return i
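Putting the two fragments above together gives a complete brute-force function. This is only a sketch of how the pieces might be assembled; the name first_uniq_char and the final return -1 (taken from the problem statement) are additions that the fragments themselves do not show.

```python
def first_uniq_char(s):
    for i in range(len(s)):
        for j in range(len(s)):
            if i != j and s[i] == s[j]:
                break      # s[i] occurs somewhere else, so it is not unique
        else:
            return i       # the inner loop finished without hitting break
    return -1              # every character has a duplicate

for text in ("leetcode", "loveleetcode", "aa", "z", ""):
    print(repr(text), first_uniq_char(text))
```

A single-character string works without special handling: the inner loop never finds a match at a different index, so the else branch returns 0.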
There are also a few ways you can solve this problem more simply if you use the builtin functions and types available in Python.
For instance, you can implement an O(n^2) solution equivalent to the one above using only one explicit loop, and using str.count to replace the inner one:
def firstUniqChar(s):
    for i, c in enumerate(s):
        if s.count(c) == 1:
            return i
    return -1 # no unique character, as the problem statement requires
I'm also using enumerate to get the character values and indexes together in one step, rather than iterating over a range and indexing later.
There's also a very easy way to make an O(n) solution using collections.Counter, which can do all the counting in one pass before you start checking the characters in order to try to find the first one that is unique:
from collections import Counter
def firstUniqChar(s):
    count = Counter(s)
    for i, c in enumerate(s):
        if count[c] == 1:
            return i
    return -1 # no unique character, as the problem statement requires
I'm not sure your approach will work on an even palindrome, e.g. "redder" (note the second d). Try this instead:
s1 = "leetcode"
s2 = "loveleetcode"
s3 = "redder"
def unique_index(s):
    ahead, behind = list(s), set()
    for idx, char in enumerate(s):
        ahead = ahead[1:]
        if (char not in ahead) and (char not in behind):
            return idx
        behind.add(s[idx])
    return -1
assert unique_index(s1) == 0
assert unique_index(s2) == 2
assert unique_index(s3) == -1
For each character, we look ahead and behind. Only characters disjoint from both groups will return an index. As iteration progresses, the list of what is observed ahead shortens, while what is seen behind extends. The default is -1 as stated in the actual leetcode challenge.
A second list is not required. @Óscar López's answer is the simplified answer.

Create Your Own Find String Function

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
index = 0
length = len (substring)
for ch in string:
if ch == substring [0]:
subindex1 = 0
subindex2 = index
for i in range (length):
if ch == substring [i]:
subindex1 +=1
if subindex1 == length:
return index
ch = string [(subindex2)+1]
subindex2 +=1
index += 1
return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be an easier way to do it. Any input or help would be great! Thanks.
Using a sub-function to clear your thoughts often helps.
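def find_str (string, substring):
    # only try start positions where the whole substring still fits,
    # so is_next_sub never indexes past the end of string
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1
def is_next_sub(string, substring, index):
    # compare substring with string character by character, starting at index
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True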
I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1
I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
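A minimal sketch of find_str built on that range is shown below. It uses only len, indexing and comparison, on the assumption that these do not count as string methods for the assignment.

```python
def find_str(string, substring):
    # Only start positions where the whole substring still fits are tried,
    # so string[start + i] can never run past the end of the string.
    for start in range(0, len(string) + 1 - len(substring)):
        for i in range(len(substring)):
            if string[start + i] != substring[i]:
                break          # mismatch, try the next start position
        else:
            return start       # every character of substring matched
    return -1

print(find_str("hello", "llo"))
print(find_str("hello", "el"))
print(find_str("hello", "xyz"))
```

If the substring is longer than the string, the range is empty and the function falls straight through to return -1.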

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha(): # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print("".join(stack))
    else:
        print("Something's not quite right..")
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought through the basics of what my code needs to do:
1. Between the "if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
2. Create a for loop and inside of it an if statement, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the element repeated that number of times
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha(): # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else: # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
The regex r'(.)\1+' matches any run of two or more identical characters. With re.sub you can easily encode such runs:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think you can use the `itertools.groupby` function
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
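A sketch of what that more readable variant might look like follows. The function names runs and pack are hypothetical; only itertools.groupby comes from the answer above.

```python
import itertools

def runs(data):
    """Yield (character, run length) for each run of equal characters."""
    for char, group in itertools.groupby(data):
        # group is an iterator over the run, so count it without a list
        yield char, sum(1 for _ in group)

def pack(data):
    # a run of length 1 is written as the bare character
    return ''.join(c + (str(n) if n > 1 else '') for c, n in runs(data))

print(pack('aaassaaasssddee'))
print(pack('aaabbkka'))
```

Because groupby only groups adjacent equal items, the trailing a in 'aaabbkka' is kept as its own run, which is the behaviour the question asks for.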
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
def consume(items):
    # exhaust an iterator without storing its items
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    # count the items of an iterator without building a list
    result = count()
    consume(zip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)
