How is str.find so fast?

I had an earlier problem where I was looking for a substring while iterating the string and using slicing. It turns out that's a really bad idea performance-wise; str.find is much faster. But I don't understand why?
import random
import string
import timeit

# Generate 1 MB of random string data
haystack = "".join(random.choices(string.ascii_lowercase, k=1_000_000))

def f():
    return [i for i in range(len(haystack)) if haystack[i : i + len(needle)] == needle]

def g():
    return [i for i in range(len(haystack)) if haystack.startswith(needle, i)]

def h():
    def find(start=0):
        while True:
            position = haystack.find(needle, start)
            if position < 0:
                return
            start = position + 1
            yield position
    return list(find())

number = 100
needle = "abcd"
expectation = f()
for func in "fgh":
    assert eval(func + "()") == expectation
    t = timeit.timeit(func + "()", globals=globals(), number=number)
    print(func, t)
Results:
f 26.46937609199813
g 16.11952730899793
h 0.07721933699940564

f and g are slow since they check whether needle can be found at every possible location of haystack, resulting in O(n*m) complexity. f is slower because of the slicing operation, which creates a new string object (as pointed out by Barmar in the comments).
h is fast because it can skip many locations. For example, if the needle string is not found at all, only one find call is performed. The built-in find function is highly optimized C and thus faster than interpreted pure-Python code. Additionally, find uses an efficient algorithm: Crochemore and Perrin's Two-Way string matching. This algorithm is much faster than testing needle at every possible location of haystack when the string is relatively big. The related CPython code is available here.
If the number of occurrences is relatively small, your implementation should already be good. Otherwise, it may be better to use a custom variant based on the Two-Way algorithm, or possibly the KMP algorithm, but doing that in pure Python will be very inefficient. You could do it in C or with Cython. That being said, this is not trivial to do and not great to maintain.
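For illustration, here is a rough pure-Python KMP sketch (my addition, not the CPython implementation) that returns all overlapping match positions like h above. As noted, expect it to be far slower than str.find in practice:

def kmp_find_all(haystack, needle):
    # assumes a non-empty needle
    # Failure table: fail[i] is the length of the longest proper prefix
    # of needle[:i+1] that is also a suffix of it.
    fail = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k and needle[i] != needle[k]:
            k = fail[k - 1]
        if needle[i] == needle[k]:
            k += 1
        fail[i] = k
    # Scan the haystack, reusing the matched prefix length on mismatch,
    # so each haystack character is examined O(1) amortized times.
    matches = []
    k = 0
    for i, c in enumerate(haystack):
        while k and c != needle[k]:
            k = fail[k - 1]
        if c == needle[k]:
            k += 1
        if k == len(needle):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches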

The built-in Python functions are implemented in C, which allows them to be much faster. It's not possible to write a pure-Python function that performs just as well.

Related

Why is '==' faster than manually traversing and comparing two strings in Python 3?

I'm trying to solve problem 28, Implement strStr(), on LeetCode.
However, I have some questions about the time complexity of the two versions of my code.
# Version 1
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            found = True
            if haystack[i] == needle[0]:
                for j in range(len_n):
                    if i + j >= len_h or haystack[i + j] != needle[j]:
                        found = False
                        break
                if found:
                    return i
            i += 1
        return -1
In this version, I try to find the needle substring in the haystack using two nested loops.
I think the time complexity of the code is O(mn), where m is the length of the haystack and n is the length of the needle.
Unfortunately, the code cannot pass the tests because it exceeds the time limit.
Then I tried to optimize my code and got version 2.
# Version 2
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            if haystack[i] == needle[0]:
                if haystack[i:i+len_n] == needle:
                    return i
            i += 1
        return -1
I compare needle with a substring of haystack using a string slice and '==' instead of the manual comparison. Then the code passes the tests.
Now I have some questions:
What is the time complexity of a string slice?
What is the time complexity of the equality check (==) between two strings?
Why is version 2 faster than version 1 if the time complexity of the equality check is O(n)?
Thanks for any advice.
str.__eq__(self, other) (that is, equality for strings) is implemented in C and is lightning fast (as fast as any other language once it starts).
Your Python-implemented character-wise string comparison is slow for two reasons. First, the looping logic is implemented in Python, and Python loops are never very fast. Second, needle[j] indexes one string to construct a new one-character string. That by itself is slow, and you do it in a nested loop, so the overall runtime is disastrous. You end up calling str.__eq__ once per character, and every time it's called it has to check the lengths of the strings on both sides (it does not know you just extracted a single character).
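To see the gap for yourself, here is a rough timeit sketch (the inputs and iteration counts are illustrative choices, not from the original post):

import timeit

a = "x" * 100_000
b = "x" * 100_000  # equal content, so both approaches must scan everything

def eq_loop(s, t):
    # character-wise comparison in pure Python
    if len(s) != len(t):
        return False
    for c1, c2 in zip(s, t):
        if c1 != c2:
            return False
    return True

print(timeit.timeit(lambda: a == b, number=1000))         # C comparison, very fast
print(timeit.timeit(lambda: eq_loop(a, b), number=1000))  # Python loop, much slower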

Does one for loop mean a time complexity of n in this case?

So, I've run into this problem in the daily coding problem challenge, and I've devised two solutions. However, I am unsure if one is better than the other in terms of time complexity (Big O).
# Given a list of numbers and a number k,
# return whether any two numbers from the list add up to k.
#
# For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
#
# Bonus: Can you do this in one pass?
# The above part seemed to denote this can be done in O(n).

def can_get_value(lst=[10, 15, 3, 7], k=17):
    for x in lst:
        for y in lst:
            if x + y == k:
                return True
    return False

def optimized_can_get_value(lst=[10, 15, 3, 7], k=17):
    temp = lst
    for x in lst:
        if k - x in temp:
            return True
        else:
            return False

def main():
    print(can_get_value())
    print(optimized_can_get_value())

if __name__ == "__main__":
    main()
I think the second is better than the first since it has one for loop, but I'm not sure if it is O(n), since I'm still running through two lists. Another solution I had in mind that was apparently a O(n) solution was using the python equivalent of "Java HashSets". Would appreciate confirmation, and explanation of why/why not it is O(n).
The first solution can_get_value() is textbook O(n^2). You know this.
The second solution is as well. That's because elem in list has O(n) complexity, and you're executing it n times: O(n) * O(n) = O(n^2).
The O(n) solution here is to convert from a list into a set (or, well, any type of hash table - dict would work too). The following code runs through the list exactly twice, which is O(n):
def can_get_value(lst, k):
    st = set(lst)  # make a hash table (set) where each key is the same as its value
    for x in st:   # this executes n times --> O(n)
        if k - x in st:  # unlike for lists, `in` is O(1) for hash tables
            return True
    return False
This is thus O(n) * O(1) = O(n) in most cases.
In order to analyze the asymptotic runtime of your code, you need to know the runtime of each of the functions which you call as well. We generally think of arithmetic expressions like addition as being constant time (O(1)), so your first function has two for loops over n elements and the loop body only takes constant time, coming out to O(n * n * 1) = O(n^2).
The second function has only one for loop, but checking membership for a list is an O(n) function in the length of the list, so you still have O(n * n) = O(n^2). The latter option may still be faster (Python probably has optimized code for checking list membership), but it won't be asymptotically faster (the runtime still increases quadratically in n).
EDIT - as @Mark_Meyer pointed out, your second function is actually O(1) because there's a bug in it (it returns inside the first loop iteration); sorry, I skimmed it and didn't notice. This answer assumes a corrected version of the second function like
def optimized_can_get_value(lst, k=17):
    for x in lst:
        if k - x in lst:
            return True
    return False
(Note - don't give your function a mutable default value; see this SO question for the troubles that can bring. I also removed the temporary list because there's no need for it; it was just pointing to the same list object anyway.)
EDIT 2: for fun, here are a couple of O(n) solutions to this (both use the fact that checking containment for a set is O(1)).
A one-liner which still stops as soon as a solution is found:
def get_value_one_liner(lst, k):
    return any(k - x in set(lst) for x in lst)
EDIT 3: I think this is actually O(n^2) because we call set(lst) for each x. Using Python 3.8's assignment expressions could, I think, give us a one-liner that is still efficient. Does anybody have a good Python <3.8 one-liner?
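One pre-3.8 trick (a sketch added here, with the caveat that it reads as a hack) binds the set once through a single-element inner loop, so set(lst) is built only once and the whole thing stays O(n) while still stopping early:

def get_value_one_liner(lst, k):
    # `for s in (set(lst),)` binds s exactly once; `k - x in s` is then O(1)
    return any(k - x in s for s in (set(lst),) for x in lst)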
And a version which tries not to do extra work by building up a set as it goes (not sure if this is actually faster in practice than creating the whole set at the start; it probably depends on the actual input data):
def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        if x in values:
            return True
        values.add(k - x)
    return False

Reverse slice from end of list to a specific index

Let's say I need a slice from the end of a sequence seq to the first occurrence of a given item x (inclusive). The naive attempt to write seq[-1:seq.index(x)-1:-1] creates a subtle bug:
seq = 'abc'
seq[-1:seq.index('b')-1:-1] # 'cb' as expected
seq[-1:seq.index('a')-1:-1] # '' because -1 is interpreted as end of seq
Is there any idiomatic way to write this?
seq[seq.index(x):][::-1] works fine, but it is presumably inefficient for large sequences since it creates an extra copy. (I do need a sequence in the end, so one copy is necessary; I just don't want to create a second copy.)
On a side note, this is a really easy bug to introduce, it can pass many tests, and is undetectable to any static analyzer (unless it warns about every slice with a negative step).
Update
There seems to be no perfect / idiomatic solution. I agree that it may not be the bottleneck as often as I thought, so I'll use [pos:][::-1] in most cases. When performance is important, I'd use the normal if check. However, I'll accept the solution that I found interesting even though it's hard to read; it's probably usable in certain rare cases (where I really need to fit the whole thing into an expression, and I don't want to define a new function).
Also, I tried timing this. For lists it seems there's always a 2x penalty for an extra slice even if they are as short as 2 items. For strings, the results are extremely inconsistent, to the point that I can't say anything:
import timeit

for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
    c = list(range(n))
    # c = 'x' * n
    pos = n // 2  # pretend the item was found in the middle
    exprs = 'c[pos:][::-1]', 'c[:pos:-1] if pos else c[::-1]'
    results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
    times = [t / loops for loops, t in results]
    print(n, times[0] / times[1])
Results for lists (ratio of extra slice / no extra slice times):
2 2.667782437753884
5 2.2672817613246914
10 1.4275235266754878
100 1.6167102119737584
1000 1.7309116253903338
10000 3.606259720606781
100000 2.636049703318956
1000000 1.9915776615090277
Of course, this ignores the fact that whatever it is we're doing with the resulting slice is much more costly, in relative terms, when the slice is short. So still, I agree that for sequences of small size, [::-1] is usually perfectly fine.
If an iterator result is okay, use a forward slice and call reversed on it:
reversed(seq[seq.index(whatever):])
If it isn't, subtract an extra len(seq) from the endpoint:
seq[:seq.index(whatever)-len(seq)-1:-1]
Or just take a forward slice, slice it again to reverse it, and eat the cost of the extra copy. It's probably not your bottleneck.
Whatever you do, leave a comment explaining it so people don't reintroduce the bug when editing, and write a unit test for this case.
IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here's a way that's a little more efficient.
def sliceback(seq, key):
    pos = seq.index(key)
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in seq:
    print(k, sliceback(seq, k))
output
a cba
b cb
c c
As Budo Zindovic mentions in the comments, .index will raise an exception if the character isn't found in the string. Depending on the context, the code may never be called with a character that's not in seq, but if that's possible we need to handle it. The simplest way is to catch the exception:
def sliceback(seq, key):
    try:
        pos = seq.index(key)
    except ValueError:
        return ''
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in 'abcd':
    print(k, sliceback(seq, k))
output
a cba
b cb
c c
d
Python exception handling is very efficient. When the exception isn't actually raised it's faster than equivalent if-based code, but if the exception is raised more than 5-10% of the time it's faster to use an if.
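You can check that claim yourself with a small benchmark along these lines (the inputs here are made up for illustration):

import timeit

seq = 'abcdefghij'

def with_try(key):
    try:
        return seq.index(key)
    except ValueError:
        return -1

def with_if(key):
    return seq.index(key) if key in seq else -1

# hit: the exception is never raised, so try/except tends to win
print(timeit.timeit(lambda: with_try('b'), number=100_000))
print(timeit.timeit(lambda: with_if('b'), number=100_000))
# miss: the exception is raised every time, so the if version tends to win
print(timeit.timeit(lambda: with_try('z'), number=100_000))
print(timeit.timeit(lambda: with_if('z'), number=100_000))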
Rather than testing for key before calling seq.index, it's more efficient to use find. Of course, that will only work if seq is a string; it won't work if seq is a list because (annoyingly) lists don't have a .find method.
def sliceback(seq, key):
    pos = seq.find(key)
    return '' if pos < 0 else seq[:pos-1 if pos else None:-1]
You can check pos while assigning the result, for example:
result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]
Input:
pos = seq.index('a')
Output:
cba

How to decrease running time of this particular solution?

I am trying to write a piece of code that will generate a permutation, or some series of characters that are all different from one another, in a recursive fashion.
def getSteps(length, res=[]):
    if length == 1:
        if res == []:
            res.append("l")
            res.append("r")
            return res
        else:
            for i in range(0, len(res)):
                res.append(res[i] + "l")
                res.append(res[i] + "r")
            print(res)
            return res
    else:
        if res == []:
            res.append("l")
            res.append("r")
            return getSteps(length - 1, res)
        else:
            for i in range(0, len(res)):
                res.append(res[i] + "l")
                res.append(res[i] + "r")
            print(res)
            return getSteps(length - 1, res)

def sanitize(length, res):
    return [i for i in res if len(str(i)) == length]

print(sanitize(2, getSteps(2)))
So this would return
"LL", "LR", "RR, "RL" or some permutation of the series.
I can see right off the bat that this function probably runs quite slowly, seeing as I have to loop through an entire array. I tried to make the process as efficient as I could, but this is as far as I can get. I know that some unnecessary things happen during the run, but I don't know how to make it much better. So my question is this: what would I do to increase the efficiency and decrease the running time of this code?
Edit: I want to be able to port this code to Java or some other language in order to understand the concept of recursion, rather than use external libraries and have my problem solved without understanding it.
Your design is broken. If you call getSteps again, res won't be an empty list, it will have garbage left over from the last call in it.
I think you want to generate permutations recursively, but I don't understand where you are going with the getSteps function.
Here is a simple recursive function
def fn(x):
    if x == 1:
        return 'LR'
    return [j + i for i in fn(x-1) for j in "LR"]
Is there a way to combine the binary approach and a recursive approach?
Yes, and @gribbler came very close to that in the post to which that comment was attached. He just put the pieces together in "the other order".
How can you construct all the bitstrings of length n, in increasing order (when viewed as binary integers)? Well, if you already have all the bitstrings of length n-1, you can prefix them all with 0, and then prefix them all again with 1. It's that easy.
def f(n):
    if n == 0:
        return [""]
    return [a + b for a in "RL" for b in f(n-1)]

print(f(3))
prints
['RRR', 'RRL', 'RLR', 'RLL', 'LRR', 'LRL', 'LLR', 'LLL']
Replace R with 0, and L with 1, and you have the 8 binary integers from 0 through 7 in increasing order.
You should look into itertools. Its product function, with repeat=n, generates exactly the sequences you want here (itertools.permutations would not repeat characters, so it is not quite the right tool).
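A minimal sketch of that approach:

from itertools import product

def steps(n):
    # product("lr", repeat=n) yields every length-n tuple of 'l'/'r'
    return ["".join(p) for p in product("lr", repeat=n)]

print(steps(2))  # ['ll', 'lr', 'rl', 'rr']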

Python lists, dictionary optimization

I have been attending a couple of hackathons. I am beginning to understand that writing code is not enough. The code has to be optimized. That brings me to my question. Here are two questions that I faced.
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is k"""
    for i, j in numbers:
        if i != j:
            if i + j == k:
                return i, j
I wrote this function. And I was kind of stuck with optimization.
Next problem.
string = "ksjdkajsdkajksjdalsdjaksda"
def dedup(string):
""" write a function to remove duplicates in the variable string"""
output = []
for i in string:
if i not in output:
output.append(i)
These are two very simple programs that I wrote, but I am stuck on optimizing them any further. More to the point: when we optimize code, how does the complexity reduce? Any pointers will help. Thanks in advance.
Knowing the most efficient Python idioms and also designing code that can reduce iterations and bail out early with an answer is a major part of optimization. Here are a few examples:
List comprehensions and generators are usually fastest:
With a straightforward nested approach, a generator is faster than a for loop:
def pairsum(numbers, k):
    """Returns two unique values in numbers whose sum is k"""
    return next((i, j) for i in numbers for j in numbers if i + j == k and i != j)
This is probably faster on average since it only goes through one iteration at most and does not check whether a possible result is in numbers unless k-i != i:
def pairsum(numbers, k):
    """Returns two unique values in numbers whose sum is k"""
    return next((k - i, i) for i in numbers if k - i != i and k - i in numbers)
Output:
>>> pairsum([1,2,3,4,5,6], 8)
(6, 2)
Note: I assumed numbers was a flat list since the doc string did not mention tuples and it makes the problem more difficult which is what I would expect in a competition.
For the second problem, if you are to create your own function as opposed to just using ''.join(set(s)) you were close:
def dedup(s):
    """Returns a string with duplicate characters removed from string s"""
    output = ''
    for c in s:
        if c not in output:
            output += c
    return output
Tip: do not use string as a name; it shadows the standard string module.
You can also do:
def dedup(s):
    for c in s:
        s = c + s.replace(c, '')
    return s
or a much faster recursive version:
def dedup(s, out=''):
    s0, s = s[0], s.replace(s[0], '')
    return dedup(s, out + s0) if s else out + s0
but not as fast as set for strings without lots of duplicates:
def dedup(s):
    return ''.join(set(s))
Note: set() will not preserve the order of the remaining characters while the other approaches will preserve the order based on first occurrence.
Your first program is a little vague. I assume numbers is a list of tuples or something? Like [(1,2), (3,4), (5,6)]? If so, your program is pretty good, from a complexity standpoint - it's O(n). Perhaps you want a little more Pythonic solution? The neatest way to clean this up would be to join your conditions:
if i != j and i + j == k:
But this simply increases readability. I think it may also add an additional boolean operation, so it might not be an optimization.
I am not sure if you intended for your program to return the first pair of numbers which sum to k, but if you wanted all pairs which meet this requirement, you could write a comprehension:
def pairsum(numbers, k):
    return list((i, j) for i, j in numbers if i != j and i + j == k)
In that example, I used a generator comprehension instead of a list comprehension so as to conserve resources - generators are functions which act like iterators, meaning that they can save memory by only giving you data when you need it. This is called lazy iteration.
You can also use a filter, which is a function which returns only the elements from a set for which a predicate returns True. (That is, the elements which meet a certain requirement.)
import itertools

def pairsum(numbers, k):
    # itertools.ifilter is Python 2; in Python 3 use the built-in filter
    return list(itertools.ifilter(lambda t: t[0] != t[1] and t[0] + t[1] == k, ((i, j) for i, j in numbers)))
But this is less readable in my opinion.
Your second program can be optimized using a set. If you recall from any discrete mathematics you may have learned in grade school or university, a set is a collection of unique elements - in other words, a set has no duplicate elements.
def dedup(mystring):
    return set(mystring)
Finding the unique elements of a collection generally takes O(n^2) time if you only allow O(1) extra space. If you allow yourself to allocate more memory, a balanced search tree brings that down to O(n log n), and a hash table brings it down to O(n) on average; Python's sets are implemented as hash tables.
Your solution took O(n^2) time but also O(n) space, because you created a new list which could, if the input string already had only unique characters, take up the same amount of space; and for every character in the string, you iterated over the output. That's essentially O(n^2) (strictly it's O(n*m), where m is the number of unique characters). I hope you see why this is. Read up on binary search trees to see how they can improve your code. I don't want to re-implement one again... freshman year was so grueling!
The key to optimization is basically to figure out a way to make the code do less work, in terms of the total number of primitive steps that need to be performed. Code that employs control structures like nested loops quickly inflates the number of primitive steps needed. Optimization is therefore often about replacing loops that iterate over the full list with something more clever.
I had to change the unoptimized pairsum() method slightly to make it usable:
def pairsum(numbers, k):
    """
    Write a function that returns two values in numbers whose sum is k
    """
    for i in numbers:
        for j in numbers:
            if i != j:
                if i + j == k:
                    return i, j
Here we see two loops, one nested inside the other. When describing the time complexity of a method like this, we often say that it is O(n²): as the length of the numbers array grows proportionally to n, the number of primitive steps grows proportionally to n². Specifically, the inner i != j check is evaluated exactly len(numbers)**2 times.
The clever thing we can do here is to presort the array, at a cost of O(n log(n)), which allows us to home in on the right answer by evaluating each element of the sorted array at most once.
def fast_pairsum(numbers, k):
    sortedints = sorted(numbers)
    low = 0
    high = len(numbers) - 1
    i = sortedints[0]
    j = sortedints[-1]
    while low < high:
        diff = i + j - k
        if diff > 0:
            # Too high, let's lower
            high -= 1
            j = sortedints[high]
        elif diff < 0:
            # Too low, let's increase
            low += 1
            i = sortedints[low]
        else:
            # Just right
            return i, j
    raise Exception('No solution')
These kinds of optimization only begin to really matter when the size of the problem becomes large. On my machine the break-even point between pairsum() and fast_pairsum() is with a numbers array containing 13 integers. For smaller arrays pairsum() is faster, and for larger arrays fast_pairsum() is faster. As the size grows fast_pairsum() becomes drastically faster than the unoptimized pairsum().
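For reference, that kind of break-even measurement can be reproduced with a harness along these lines (the sizes and iteration counts are illustrative, and it assumes the pairsum and fast_pairsum functions above are defined):

import random
import timeit

for n in (5, 13, 100, 1000):
    numbers = random.sample(range(10 * n), n)
    k = numbers[0] + numbers[-1]  # guarantee a solution exists
    for fn in (pairsum, fast_pairsum):
        t = timeit.timeit(lambda: fn(numbers, k), number=1000)
        print(n, fn.__name__, t)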
The clever thing to do for dedup() is to avoid having to linearly scan through the output list to find out whether you've already seen a character. This can be done by storing the characters you've seen in a set, which has O(1) average look-up cost, rather than the O(n) look-up cost of a regular list.
With the outer loop, the total cost becomes O(n) on average rather than O(n²).
def fast_dedup(string):
    # if we didn't care about the order of the characters in the
    # returned string we could simply do
    # return set(string)
    seen = set()
    output = []
    seen_add = seen.add
    output_append = output.append
    for i in string:
        if i not in seen:
            seen_add(i)
            output_append(i)
    return output
On my machine the break-even point between dedup() and fast_dedup() is with a string of length 30.
The fast_dedup() method also shows another simple optimization trick: move as much of the code out of the loop bodies as possible. Since looking up the add and append members of the seen and output objects takes time, it is cheaper to do it once outside the loop bodies and store references to the members in variables that are used repeatedly inside the loop bodies.
To properly optimize Python, one needs to find a good algorithm for the problem and a Python idiom close to that algorithm. Your pairsum example is a good case. First, your implementation appears wrong: numbers is most likely a sequence of numbers, not a sequence of pairs of numbers. A naive implementation would thus look like this:
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is k"""
    for i in numbers:
        for j in numbers:
            if i != j and i + j == k:
                return i, j
This will perform n^2 iterations, n being the length of numbers. For small ns this is not a problem, but once n gets into hundreds, the nested loops will become visibly slow, and once n gets into thousands, they will become unusable.
An optimization would be to recognize the difference between the inner and the outer loops: the outer loop traverses over numbers exactly once, and is unavoidable. The inner loop, however, is only used to verify that the other number (which has to be k - i) is actually present. This is a mere lookup, which can be made extremely fast by using a dict, or even better, a set:
def pairsum(numbers, k):
    """Write a function that returns two values in numbers whose sum is k"""
    numset = set(numbers)
    for i in numbers:
        if k - i in numset:
            return i, k - i
This is faster not only by a constant factor, because we're using a built-in operation (set lookup) instead of a Python-coded loop; it actually does less work, because set uses a smarter algorithm for the lookup and performs it in constant time on average.
Optimizing dedup in the analogous fashion is left as an exercise for the reader.
For your string problem, an order-preserving version is most easily (and fairly efficiently) written as:
from collections import OrderedDict
new_string = ''.join(OrderedDict.fromkeys(old_string))
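Since Python 3.7, plain dicts preserve insertion order too, so the same idea works without the import:

# Python 3.7+: plain dicts preserve insertion order
new_string = ''.join(dict.fromkeys(old_string))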
