Can someone explain this LeetCode string manipulation question to me? - python

I just started doing some LeetCode questions, and I'm not quite sure why this problem has to consider the case where the two words are equal. This is the problem statement:
Given two strings A and B of lowercase letters, return true if you can swap two letters in A so the result is equal to B, otherwise, return false.
Swapping letters is defined as taking two indices i and j (0-indexed) such that i != j and swapping the characters at A[i] and A[j]. For example, swapping at indices 0 and 2 in "abcd" results in "cbad"
And this is a solution:
def buddyStrings(self, A, B):
    if len(A) != len(B): return False
    if A == B and len(set(A)) < len(A): return True
    dif = [(a, b) for a, b in zip(A, B) if a != b]
    return len(dif) == 2 and dif[0] == dif[1][::-1]
I can't see why we need the second if condition, or how the list comprehension on the third line works. I would appreciate any help.

I think this is a slightly simplified version:
class Solution:
    def buddyStrings(self, A, B):
        if len(A) != len(B):
            return False
        if A == B and len(set(A)) < len(A):
            return True
        diff = []
        for i in list(zip(A, B)):
            if i[0] != i[1]:
                diff.append(i)
        return len(diff) == 2 and diff[0] == diff[1][::-1]
for i in list(zip(A, B)) walks through A and B in parallel, so each i is a pair (A[k], B[k]) of the characters at the same position k.

Here are the checks:
def buddyStrings(self, A, B):
    if len(A) != len(B): return False # can't be true if the lengths are not equal
    if A == B and len(set(A)) < len(A): return True # equal strings work only if some letter appears twice, so swapping the two copies changes nothing
    dif = [(a, b) for a, b in zip(A, B) if a != b] # pairs of letters that don't match at the same position
    return len(dif) == 2 and dif[0] == dif[1][::-1] # exactly two mismatches, and the first pair is the reverse of the second

dif = [(a, b) for a, b in zip(A, B) if a != b]
It finds all pairs of characters that differ at the same position.
For ABCDE and FGCDE:
dif = [('A','F'), ('B','G')]
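To see the three checks at work, here is a minimal standalone sketch. The name buddy_strings and the sample inputs are mine, not from the LeetCode template; the logic is meant to match the solution above.

```python
def buddy_strings(a: str, b: str) -> bool:
    # Different lengths can never be fixed by a single swap.
    if len(a) != len(b):
        return False
    # Identical strings: a swap is still mandatory, so it has to exchange
    # two equal letters, which requires some letter to occur twice.
    if a == b:
        return len(set(a)) < len(a)
    # Collect the (a_char, b_char) pairs at mismatching positions.
    diff = [(x, y) for x, y in zip(a, b) if x != y]
    # Exactly two mismatches that mirror each other, e.g. ('a', 'b') and ('b', 'a').
    return len(diff) == 2 and diff[0] == diff[1][::-1]


print(buddy_strings("ab", "ba"))        # one swap fixes it
print(buddy_strings("ab", "ab"))        # equal, but no repeated letter to swap
print(buddy_strings("aa", "aa"))        # equal, swapping the two a's changes nothing
print(buddy_strings("ABCDE", "FGCDE"))  # two mismatches, but not mirrored
```

The second case is the reason for the A == B check: without a repeated letter, any swap would change A, so the answer has to be False even though the strings already match.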

Related

string match string index out of range

Could someone please explain to me why this won't work? I could always copy the answer, but then I wouldn't learn why this didn't work.
Given 2 strings, a and b, return the number of the positions where they contain the same length 2 substring. So "xxcaazz" and "xxbaaz" yields 3, since the "xx", "aa", and "az" substrings appear in the same place in both strings.
def string_match(a, b):
    count = 0
    if len(a) or len(b) == 0:
        return count
    else:
        if len(a)> len(b):
            for n in range(len(a)-2):
                if a[n]==b[n]:
                    count +=1
        else:
            for n in range(len(b)-2):
                if b[n]==a[n]:
                    count+=1
        return count
Thank you so much!!
I tried using index locations to find matching characters in 2 strings
You should use the length of the shorter string in the loop, so switch your condition
from
if len(a) > len(b):
    for n in range(len(a)-1):
to
if len(a) < len(b):
    for n in range(len(a)-1):
That fixes the index out of range error, but as mentioned in the comments there are some other bugs in your code. The first condition, len(a) or len(b) == 0, is parsed as len(a) or (len(b) == 0), so it is true for every non-empty a. It should be:
# first condition
if len(a) == 0 or len(b) == 0:
    return 0
Full code
def string_match(a, b):
    count = 0
    if len(a) == 0 or len(b) == 0:
        return 0
    if len(a) < len(b):
        for n in range(len(a)):
            if a[n] == b[n]:
                count += 1
    else:
        for n in range(len(b)):
            if a[n] == b[n]:
                count += 1
    return count
NOTE: this code compares single characters, not length-2 substrings, so it does not yet solve the given task. Take your time to develop a better solution.
Thank you for your help! I made the following adjustments and the code works!
def string_match(a, b):
    count = 0
    n=0
    if len(a) == 0 or len(b) == 0:
        return 0
    else:
        if len(a) < len(b):
            for n in range(len(a)-1):
                if a[n:n+2] == b[n:n+2]:
                    count += 1
        else:
            for n in range(len(b)-1):
                if a[n:n+2] == b[n:n+2]:
                    count += 1
        return count
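For what it's worth, the two branches do the same work once the loop bound is the shorter length, so they can probably be merged. This is only a sketch of that idea, not the reference solution:

```python
def string_match(a: str, b: str) -> int:
    # Only positions that exist in both strings can match.
    shortest = min(len(a), len(b))
    # There are shortest - 1 length-2 windows; range() is empty when
    # either string is shorter than 2, so no special case is needed.
    return sum(a[n:n + 2] == b[n:n + 2] for n in range(shortest - 1))


print(string_match("xxcaazz", "xxbaaz"))  # 3, the example from the task
```

Each comparison yields True or False, and sum() counts the True values.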

What does [b, c][a < b < c]+'000'[a < c:] expression do?

This code works out the prize amount for a dice game:
a, b, c = map(int, input().split())
if a == b and b == c: # all three dice show the same number
    print(10000 + (a * 1000))
elif a == b or b == c: # two dice show the same number
    print(1000 + (b * 100))
elif a == c: # two dice show the same number
    print(1000 + (a * 100))
else: # all numbers differ
    print(max(a, b, c)*100)
This is equivalent:
*_, a, b, c=sorted(input())
print(['1'+b, c][a < b < c]+'000'[a < c:])
But I can't understand what
['1'+b, c][a < b < c]
and
'000'[a < c:]
do.
So I tried to work out the meaning of
`['1'+b, c][a < b < c]`
I found that it behaves like
`c if a<b<c else '1'+b`
but I can't be sure about that.
As for
`'000'[a < c:]`
I tried an input with a == c in
`print('000'[a < c:])`
and it shows 000.
With an input where a < c, it shows 00.
Can anyone explain this expression to me?
The original is unnecessarily cryptic. It uses the fact that:
int(False) == 0
int(True) == 1
For instance,
'000'[False:] == '000'[0:] == '000'
'000'[True:] == '000'[1:] == '00'
Similarly,
['1' + b, c][False] == ['1' + b, c][0] == '1' + b
['1' + b, c][True] == ['1' + b, c][1] == c
Here's an equivalent rewrite:
prefix = c if a < b < c else '1' + b
suffix = '00' if a < c else '000'
print(prefix + suffix)
a < c will evaluate to either True or False. So you are effectively getting either print('000'[True:]) or print('000'[False:]).
Square brackets after a string index it or slice it. In practice that looks something like the following:
'abcde'[2] # returns 'c', which, starting from zero, is 2nd item
'abcde'[0] # returns 'a', which is the zero-th item
'abcde'[1:3] # returns 'bc', starting from the 1st item and going to the 3rd, not inclusive of the third
If you use booleans as an index or in a slice, False acts as 0 and True acts as 1, because bool is a subclass of int in Python:
"abcde"[True] == "abcde"[1] # evaluates to true, these are the same
'abcde'[True] # evaluates to 'b'
"abcde"[False] == "abcde"[2] # evaluates to `False`
"abcde"[True] == "abcde"[2] # evaluates to `False`
"abcde"[False] == "abcde"[0] # evaluates to True, because False is effectively 0 here
So, having [True:] or [False:] in that string slice is the same as having [1:] or [0:], which means "give all characters starting with the second (for True, i.e. 1:) or the first (for False, i.e. 0:) in this string".
The string '000' has length 3. The boolean a < c has value False or True. The operation s[i] for a string s and integer i refers to the i'th element of s, and the operation s[i:] takes the substring of s from index i through the end of s. A boolean value will be converted to an integer in Python as follows: False becomes 0 and True becomes 1.
So, '000'[a < c:] is the same as '000'[0:] if a < c is False, and this is the same as '000'. If a < c is True, then '000'[a < c:] is the same as '000'[1:] which is '00'.
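If you want to convince yourself that the cryptic one-liner really matches the if/elif version, you can brute-force all 216 rolls. This is just a sketch: the function names are mine, and I pass the dice as arguments instead of reading input().

```python
from itertools import product


def prize_explicit(a, b, c):
    # The readable version from the question.
    if a == b == c:
        return 10000 + a * 1000
    if a == b or b == c:
        return 1000 + b * 100
    if a == c:
        return 1000 + a * 100
    return max(a, b, c) * 100


def prize_golfed(a, b, c):
    # Sorting the digit characters mirrors sorted(input()) in the one-liner.
    a, b, c = sorted(str(v) for v in (a, b, c))
    prefix = ['1' + b, c][a < b < c]  # bool used as list index 0 or 1
    suffix = '000'[a < c:]            # bool used as slice start 0 or 1
    return int(prefix + suffix)


mismatches = [roll for roll in product(range(1, 7), repeat=3)
              if prize_explicit(*roll) != prize_golfed(*roll)]
print(mismatches)  # an empty list means both versions agree on every roll
```

After sorting, a pair of equal dice always includes the middle value b, which is why '1' + b is enough for both the "two equal" and "three equal" cases.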

Optimizing if-elif expressions in Python

I'm trying to optimize my code.
I've read that replacing if-elif statements with a dictionary can speed things up, but I don't know how to do that here. I'd like to use the logical expressions below in the dictionary somehow. (The code is called while iterating over a and b.)
def e_ha(n, t, a, b, E):
    if a == b:
        return 6
    elif (a%n == 0, a != n**2, b == a + 1) == (True, True, True):
        return 0
    elif ((a-1)%n == 0, (a-1) != n**2, b == a - 1) == (True, True, True):
        return 0
    elif (a%n == 0, b == a-(n-1)) == (True, True):
        return 1
    elif (b%n == 0, a == b-(n-1)) == (True, True):
        return 1
    elif abs(a-b) == 1:
        return 1
    elif abs(a-b) == n:
        return 1
    else:
        return 0
One naive approach to get the best performance is to build a big table storing the results for all possible (a, b) pairs. However, this could consume a lot of memory and becomes impractical for large n.
Here is how the code can be optimized with a more conventional approach, step by step.
1. Using Explicit and for Logical Expressions
As suggested in the comments, this is much more readable and also more efficient because of the short circuiting behavior of and. This change alone reduces the runtime by 60% in my tests.
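As a rough sketch of step 1, the function could look like this. The name e_ha_and is mine, and I dropped the t and E parameters because the body shown in the question never uses them (that is an assumption about the real code).

```python
def e_ha_and(n, a, b):
    # Same branch order as the question; each tuple comparison is replaced
    # by `and`, which stops evaluating at the first False operand.
    if a == b:
        return 6
    elif a % n == 0 and a != n**2 and b == a + 1:
        return 0
    elif (a - 1) % n == 0 and (a - 1) != n**2 and b == a - 1:
        return 0
    elif a % n == 0 and b == a - (n - 1):
        return 1
    elif b % n == 0 and a == b - (n - 1):
        return 1
    elif abs(a - b) == 1:
        return 1
    elif abs(a - b) == n:
        return 1
    else:
        return 0
```

With the tuple form, all three sub-expressions are computed and two tuples are built before anything is compared, even when the first test is already False.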
2. Remove Redundant Conditions
Since both a and b range from 1 to n**2, if a == n**2, then b == a + 1 can never be fulfilled. Therefore the a != n**2 check in the condition a%n == 0 and a != n**2 and b == a + 1 is redundant. The same applies to the third condition. Eliminating them simplifies these conditions to:
...
elif a % n == 0 and b == a + 1:
elif (a - 1) % n == 0 and b == a - 1:
...
3. Avoid Repeated Computations in Conditions
Note that the two improved conditions above, (a % n == 0 and b == a + 1) and ((a - 1) % n == 0 and b == a - 1), are special cases of abs(a - b) == 1. Therefore these conditions can be rewritten using nested if-else as follows.
if abs(a - b) == 1:
    if a % n == 0 and b > a: return 0
    elif b % n == 0 and a > b: return 0 # b equals a - 1 here, so (a - 1) % n is replaced by b % n to save one computation
    else: return 1
Also note that the value abs(a - b) is related to all the conditions. Therefore it can be computed before all conditions are checked. With this change, the code becomes
d = abs(a - b)
if d == 0: return 6
elif d == 1:
    if a % n == 0 and b > a: return 0
    elif b % n == 0 and a > b: return 0
    else: return 1
elif d == n - 1:
    if a % n == 0 and a > b: return 1
    elif b % n == 0 and b > a: return 1
    else: return 0
elif d == n: return 1
else: return 0
4. Simplify Logic
For example, the first nested if-else above can be simplified to
if min(a, b) % n == 0: return 0
else: return 1
A more compact syntax is:
return 0 if min(a, b) % n == 0 else 1
5. Apply Python-specific Optimizations
In Python, the number 0 is regarded as having a falsy value. So for numbers if d != 0: and if d == 0: are equivalent to if d: and if not d: respectively. The latter is a bit faster. Applying this change results in the following optimized code (here a more compact syntax is used to shorten the answer).
d = abs(b - a)
if not d: return 6
elif d == 1: return 1 if min(a, b) % n else 0
elif d == n - 1: return 0 if max(a, b) % n else 1
else: return 1 if d == n else 0
Applying steps 2 to 5 above reduces the runtime by another 50%.
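Put together as a complete function, steps 2 to 5 might look like the sketch below. The name e_ha_fast is mine and the unused t and E parameters are left out. It assumes a and b are both within [1, n**2] and n >= 3 (so that 1, n - 1 and n are distinct), as discussed in step 2.

```python
def e_ha_fast(n, a, b):
    d = abs(b - a)
    if not d:
        # a == b
        return 6
    elif d == 1:
        # Covers the original conditions 2, 3 and 6: the result is 0 only
        # when the smaller of the two values is a multiple of n.
        return 1 if min(a, b) % n else 0
    elif d == n - 1:
        # Covers the original conditions 4 and 5: the result is 1 only
        # when the larger of the two values is a multiple of n.
        return 0 if max(a, b) % n else 1
    else:
        return 1 if d == n else 0


print(e_ha_fast(5, 5, 6))  # 0: d == 1 and min(a, b) is a multiple of n
print(e_ha_fast(5, 6, 7))  # 1: d == 1 and min(a, b) is not a multiple of n
print(e_ha_fast(5, 1, 5))  # 1: d == n - 1 and max(a, b) is a multiple of n
```

A cheap way to gain confidence in a rewrite like this is to loop over every (a, b) pair for a few small n and compare against the original function.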
6. Adjust Order of Conditions based on Input Distribution
This change relies on the knowledge of the actual input distribution in the application. The target is to make the more frequently seen inputs return faster. In this example, assume the inputs a and b are uniformly distributed within [1, n**2] and n >= 10. In this case, the most frequent scenario is that the value d does not match any of the if conditions and 0 is returned at the end after all conditions are checked. In order to speedup, we can make it fail faster by first checking whether d can possibly lead to a non-zero return value.
d = abs(a - b)
if 1 < d < n - 1 or d > n: return 0 # Return 0 if d is not in [0, 1, n - 1, n]
elif d == 1: return 1 if min(a, b) % n else 0
elif d == n - 1: return 0 if max(a, b) % n else 1
else: return 1 if d == n else 6 # d == 0 case is moved to the last "else" since it is least frequently seen
7. Using Lookup Tables
Further speedup can be achieved by using lookup tables. Here, the values [0, 1, n - 1, n] for the first conditional check can be stored to speedup the check. In this case, there are two primary options for this: a set or a length-n+1 list of boolean values. The former uses less memory while the latter has better performance. Note that the lookup table should be constructed once outside the function and passed into it. The code using a boolean list as a lookup is as follows:
def func(n, a, b, lookup):
    d = abs(a - b)
    if not (d <= n and lookup[d]): return 0
    ...
Applying steps 6 and 7 (with boolean list lookup) reduces the runtime by another 15%.
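The answer leaves out how the boolean list is built, so here is one possible way to do it. make_lookup and e_ha_lookup are hypothetical names of mine; the body after the lookup check simply reuses the branches from step 6, and n >= 3 is assumed.

```python
def make_lookup(n):
    # lookup[d] is True only for the differences that can produce a
    # non-zero result: 0, 1, n - 1 and n. Build it once per n.
    lookup = [False] * (n + 1)
    for d in (0, 1, n - 1, n):
        lookup[d] = True
    return lookup


def e_ha_lookup(n, a, b, lookup):
    d = abs(a - b)
    # Fail fast: most differences are not in the table at all.
    if not (d <= n and lookup[d]):
        return 0
    elif d == 1:
        return 1 if min(a, b) % n else 0
    elif d == n - 1:
        return 0 if max(a, b) % n else 1
    else:
        return 1 if d == n else 6


lookup = make_lookup(5)
print(e_ha_lookup(5, 2, 20, lookup))  # 0, rejected by the table
print(e_ha_lookup(5, 2, 7, lookup))   # 1, d == n
```

The d <= n test has to come first, otherwise lookup[d] would raise an IndexError for large differences.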
Note that in this example a 2D lookup table (implemented as nested lists or dictionaries) can also be applied using (min(a, b) % n, d) as indices. However, under the same assumption of input distribution in step 6, this is slightly slower than a 1D lookup because of the overhead of one extra level of indexing.
The runtime above is the total time of applying the function to all possible (a, b) values within [1, n**2] for n=20.
Using a dictionary, where the keys are boolean expressions is not going to work the way you hope it does. There is no such thing as a boolean-expression-object that could take the place of the key, only booleans. In the end, boolean expressions evaluate to either True or False, so at most you could only have two key-value pairs.
I would suggest, however, you make things a bit more readable/pythonic:
if a%n == 0 and a != n**2 and b == a + 1:
or
if all((a%n == 0, a != n**2, b == a + 1)):
You can just use a list of tuples and loop through it:
def e_ha(n, t, a, b, E):
    checks = [
        (a == b, 6),
        (all((a%n == 0, a != n**2, b == a + 1)), 0),
        (all(((a-1)%n == 0, (a-1) != n**2, b == a - 1)), 0),
        (all((a%n == 0, b == a-(n-1))), 1),
        (all((b%n == 0, a == b-(n-1))), 1),
        (abs(a-b) == 1, 1),
        (abs(a-b) == n, 1),
        (True, 0)
    ]
    for valid, return_value in checks:
        if valid:
            return return_value
Caveats:
This is most certainly not faster in any way: every condition is evaluated up front when the list is built, so nothing short-circuits. I timed it multiple times and it was always slower.
It is less readable than the alternative.

Finding Greatest Common Divisor through iterative solution (python 3)

I am trying to find the greatest common divisor by using a function and solving it iteratively. For some reason, though, I am not getting the right output.
The greatest common divisor of 30 and 15 should be 15, but my output is always the wrong number. I have a strong feeling that my "if" statement is incorrect. Please help!
def square(a,b):
    '''
    x: int or float.
    '''
    c = a + b
    while c > 0:
        c -= 1
        if a % c == 0 and b % c == 0:
            return c
        else:
            return 1
obj = square(30,15)
print (obj)
You should return the fallback value only after you have finished iterating over all candidates and found none that divides both numbers. In your loop the else branch returns 1 on the very first candidate that fails, and c is decremented before it is tested:
def square(a, b):
    c = a + b
    while c > 0:
        if a % c == 0 and b % c == 0:
            return c
        c -= 1
    return 1
However, the last return is unneeded in this case: c goes from a + b down to 1, and every number is divisible by 1, so in the worst case the loop always terminates by returning 1.
Also, a number greater than a or b cannot be a common divisor of them (x mod y for y > x yields x), and gcd is the usual name for the task, so I would go with
def gcd(a, b):
    for c in range(min(a, b), 0, -1):
        if a % c == b % c == 0:
            return c
for an iterative solution.
You might be interested to know that there is a common recursive solution to the GCD problem based on the Euclidean algorithm.
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
print(gcd(30, 15))
# 15
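Since the question asked for an iterative solution, the same Euclidean idea can also be written as a loop. This is a small sketch with a name of my own choosing; the standard library's math.gcd does the same job and is used here only to cross-check.

```python
import math


def gcd_iterative(a: int, b: int) -> int:
    # Repeatedly replace (a, b) with (b, a % b); when b reaches 0,
    # a holds the greatest common divisor.
    while b:
        a, b = b, a % b
    return a


print(gcd_iterative(30, 15))                     # 15
print(gcd_iterative(48, 18) == math.gcd(48, 18))  # True
```

This needs far fewer steps than counting down from min(a, b), because each step shrinks the numbers by a remainder instead of by 1.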

Check if a list is a rotation of another list that works with duplicates

I have this function for determining if a list is a rotation of another list:
def isRotation(a,b):
    if len(a) != len(b):
        return False
    c=b*2
    i=0
    while a[0] != c[i]:
        i+=1
    for x in a:
        if x!= c[i]:
            return False
        i+=1
    return True
e.g.
>>> a = [1,2,3]
>>> b = [2,3,1]
>>> isRotation(a, b)
True
How do I make this work with duplicates? e.g.
a = [3,1,2,3,4]
b = [3,4,3,1,2]
And can it be done in O(n) time?
The following meta-algorithm will solve it (after checking that a and b have the same length).
Build a concatenation of a with itself, e.g., a = [3,1,2,3,4] => aa = [3,1,2,3,4,3,1,2,3,4].
Run any list adaptation of a string-matching algorithm, e.g., Boyer-Moore, to find b in aa.
One particularly easy implementation, which I would try first, is to use Rabin-Karp as the underlying algorithm. In this, you would
calculate the Rabin fingerprint for b
calculate the Rabin fingerprint for aa[: len(b)], aa[1: len(b) + 1], ..., and compare the lists only when the fingerprints match
Note that
The Rabin fingerprint for a sliding window can be calculated iteratively very efficiently (read about it in the Rabin-Karp link)
If your list is of integers, you actually have a slightly easier time than for strings, as you don't need to think about what the numerical hash value of a letter is
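The steps above can be sketched roughly as follows for lists of integers. The function name, the base and the modulus are arbitrary choices of mine, not something prescribed by Rabin-Karp, and the running time is linear only in expectation (hash collisions trigger extra list comparisons).

```python
def is_rotation_rk(a, b, base=1_000_003, mod=(1 << 61) - 1):
    n = len(a)
    if n != len(b):
        return False
    if n == 0:
        return True
    aa = a + a
    # Weight of the element that leaves the window on each slide.
    high = pow(base, n - 1, mod)

    def fingerprint(seq):
        # Polynomial hash of the whole sequence.
        h = 0
        for x in seq:
            h = (h * base + x) % mod
        return h

    target = fingerprint(b)
    window = fingerprint(aa[:n])
    for start in range(n):
        # Compare the actual elements only when the fingerprints agree.
        if window == target and aa[start:start + n] == b:
            return True
        # Slide the window: drop aa[start], append aa[start + n].
        window = ((window - aa[start] * high) * base + aa[start + n]) % mod
    return False


print(is_rotation_rk([3, 1, 2, 3, 4], [3, 4, 3, 1, 2]))  # True
print(is_rotation_rk([3, 1, 2, 3, 4], [3, 4, 3, 2, 1]))  # False
```

Only the n windows starting at positions 0 to n - 1 are checked, since those already cover every rotation of a.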
You can do it in O(n) time and O(1) space using a modified version of a maximal suffixes algorithm:
From Jewels of Stringology:
Cyclic equality of words
A rotation of a word u of length n is any word of the form u[k + 1...n]u[1...k], denoted by u^(k). Let u, w be two words of the same length n. They are said to be cyclic-equivalent if u^(i) == w^(j) for some i, j.
If words u and w are written as circles, they are cyclic-equivalent if the circles coincide after appropriate rotations.
There are several linear-time algorithms for testing the cyclic-equivalence of two words. The simplest one is to apply any string matching algorithm to pattern pat = u and text = ww, because words u and w are cyclic-equivalent if pat occurs in text.
Another algorithm is to find maximal suffixes of uu and ww and check if they are identical on prefixes of size n. We have chosen this problem because there is a simpler, interesting algorithm, working in linear time and constant space simultaneously, which deserves presentation.
Algorithm Cyclic-Equivalence(u, w)
{ checks cyclic equality of u and w of common length n }
    x := uu; y := ww;
    i := 0; j := 0;
    while (i < n) and (j < n) do begin
        k := 1;
        while x[i + k] = y[j + k] do k := k + 1;
        if k > n then return true;
        if x[i + k] > y[j + k] then i := i + k else j := j + k;
        { invariant }
    end;
    return false;
Which translated to python becomes:
def cyclic_equiv(u, v):
    n, i, j = len(u), 0, 0
    if n != len(v):
        return False
    while i < n and j < n:
        k = 1
        while k <= n and u[(i + k) % n] == v[(j + k) % n]:
            k += 1
        if k > n:
            return True
        if u[(i + k) % n] > v[(j + k) % n]:
            i += k
        else:
            j += k
    return False
Running a few examples:
In [4]: a = [3,1,2,3,4]
In [5]: b =[3,4,3,1,2]
In [6]: cyclic_equiv(a,b)
Out[6]: True
In [7]: b =[3,4,3,2,1]
In [8]: cyclic_equiv(a,b)
Out[8]: False
In [9]: b =[3,4,3,2]
In [10]: cyclic_equiv(a,b)
Out[10]: False
In [11]: cyclic_equiv([1,2,3],[1,2,3])
Out[11]: True
In [12]: cyclic_equiv([3,1,2],[1,2,3])
Out[12]: True
A more naive approach would be to use a collections.deque to rotate the elements:
from collections import deque

def rot(l1, l2):
    if l1 == l2:
        return True
    # if length is different we cannot get a match
    if len(l2) != len(l1):
        return False
    # if any elements are different we cannot get a match
    if set(l1).difference(l2):
        return False
    l2, l1 = deque(l2), deque(l1)
    for i in range(len(l1)):
        l2.rotate() # same as l2.appendleft(l2.pop())
        if l1 == l2:
            return True
    return False
I think you could use something like this:
a1 = [3,4,5,1,2,4,2]
a2 = [4,5,1,2,4,2,3]
# Array a2 is a rotation of array a1 if it's a sublist of a1+a1
def is_rotation(a1, a2):
    if len(a1) != len(a2):
        return False
    double_array = a1 + a1
    return check_sublist(double_array, a2)
def check_sublist(a1, a2):
    # Compare a2 against every window of a1 that has the same length.
    # (Resetting a single match counter on a mismatch misses overlapping
    # candidates, e.g. a1 = [1,1,2], a2 = [1,2].)
    m = len(a2)
    for i in range(len(a1) - m + 1):
        if a1[i:i + m] == a2:
            return True
    return False
Some common-sense advice if we are talking about interview questions:
The solution should be easy to code and to describe.
Do not try to memorize a solution for the interview. It's better to remember the core principle and re-implement it.
Alternatively (I couldn't get the b in aa solution to work), you can 'rotate' your list and check if the rotated list is equal to b:
def is_rotation(a, b):
    for n in range(len(a)):
        c = a[-n:] + a[:-n]
        if b == c:
            return True
    return False
Note that this is O(n^2) in the worst case rather than O(n): there is only one for loop, but each iteration builds a new list by slicing and compares it with b, and both of those steps take O(n). Hope it helps.
This seems to work.
def func(a,b):
    if len(a) != len(b):
        return False
    elif a == b:
        return True
    indices = [i for i, x in enumerate(b) if x == a[0] and i > 0]
    for i in indices:
        if a == b[i:] + b[:i]:
            return True
    return False
And this also:
def func(a, b):
    length = len(a)
    if length != len(b):
        return False
    i = 0
    while i < length:
        if a[0] == b[i]:
            j = i
            for x in a:
                if x != b[j]:
                    break
                j = (j + 1) % length
            else:
                # the loop finished without a break: every element matched
                return True
        i += 1
    return False
You could try testing the performance of just using the rotate() function in the deque collection:
from collections import deque
def is_rotation(a, b):
    if len(a) == len(b):
        da = deque(a)
        db = deque(b)
        for offset in range(len(a)):
            if da == db:
                return True
            da.rotate(1)
    return False
In terms of performance, do you need to make this calculation many times for small arrays, or for few times on very large arrays? This would determine whether or not special case testing would speed it up.
If you can represent these as strings instead, just do:
def cyclically_equivalent(a, b):
    return len(a) == len(b) and a in 2 * b
Otherwise, one should get a sublist searching algorithm, such as Knuth-Morris-Pratt (Google gives some implementations) and do
def cyclically_equivalent(a, b):
    return len(a) == len(b) and sublist_check(a, 2 * b)
Knuth-Morris-Pratt algorithm is a string search algorithm that runs in O(n) where n is the length of a text S (assuming the existence of preconstructed table T, which runs in O(m) where m is the length of the search string). All in all it is O(n+m).
You could do a similar pattern matching algorithm inspired by KMP.
Concatenate a list to itself, like a+a or b+b - this is the searched text/list with 2*n elements
Build the table T based on the other list (be it b or a) - this is done in O(n)
Run the KMP inspired algorithm - this is done in O(2*n) (because you concatenate a list to itself)
Overall time complexity is O(2*n+n) = O(3*n) which is in O(n)
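Here is one way those three steps could look in code. Treat it as a sketch: the function names are mine, and instead of actually building a + a it indexes a modulo n, which gives the same text without the extra copy.

```python
def build_failure_table(pattern):
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def is_rotation_kmp(a, b):
    n = len(a)
    if n != len(b):
        return False
    if n == 0:
        return True
    failure = build_failure_table(b)
    k = 0  # number of elements of b matched so far
    # Scan the text a + a (2 * n elements) via a[i % n].
    for i in range(2 * n):
        x = a[i % n]
        while k and x != b[k]:
            k = failure[k - 1]
        if x == b[k]:
            k += 1
        if k == n:
            return True
    return False


print(is_rotation_kmp([3, 1, 2, 3, 4], [3, 4, 3, 1, 2]))  # True
print(is_rotation_kmp([3, 1, 2, 3, 4], [3, 4, 3, 2, 1]))  # False
```

Unlike the hashing approach, this only needs the elements to support ==, so it also works for lists of strings or tuples.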
