I'm trying to match "necklaces" of symbols in Python by looking up their linear representations, for which I use normal strings. For example, the strings "AABC", "ABCA", "BCAA" and "CAAB" all represent the same necklace.
To get an overview, I store only one of the equivalent strings of a given necklace as its "representative". To check whether I've already stored a candidate necklace, I need a function that normalizes any given string representation. As a kind of pseudo code, I wrote the following function in Python:
import collections

def normalized(s):
    q = collections.deque(s)
    l = list()
    l.append(''.join(q))
    for i in range(len(s) - 1):
        q.rotate(1)
        l.append(''.join(q))
    l.sort()
    return l[0]
For all the string representations in the above example necklace, this function returns "AABC", which comes first alphabetically.
Since I'm relatively new to Python, I wonder: if I were to implement an application in Python, would this function already be "good enough" for production code? In other words: would an experienced Python programmer use this function, or are there obvious flaws?
If I understand you correctly, you first need to construct all circular permutations of the input sequence and then determine the lexicographically smallest one; that is the canonical starting point of your symbol loop.
Try this:
def normalized(s):
    L = [s[i:] + s[:i] for i in range(len(s))]
    return sorted(L)[0]
This code works only with strings; there are no conversions between a deque and a string as in your code. A quick timing test shows it running in 30-50% of the time of your version.
It would be interesting to know the length of s in your application. Since all rotations have to be stored temporarily, about len(s)^2 bytes are needed for the temporary list L. Hopefully this is not a constraint in your case.
Edit:
Today I stumbled upon the observation that if you concatenate the original string to itself it will contain all rotations as substrings. So the code will be:
def normalized4(s):
    ss = s + s  # contains all rotations of s as substrings
    n = len(s)
    return min(ss[i:i+n] for i in range(n))
This will indeed run faster, as there is only one concatenation left plus n slicings. Using string lengths from 10 to 10**5, the runtime on my machine is between 55% and 66% of that of the min() version with a generator.
Please note that you trade speed for memory consumption (2x), which doesn't matter here but might in a different setting.
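As a quick sanity check (my addition, not part of the original answer), all four representations from the question should normalize to the same representative:

rotations = ["AABC", "ABCA", "BCAA", "CAAB"]
assert all(normalized4(r) == "AABC" for r in rotations)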
You could use min rather than sorting:
def normalized2(s):
    return min(s[i:] + s[:i] for i in range(len(s)))
But it still needs to copy the string len(s) times. A faster way is to filter the starting indexes of the smallest character until only one remains, effectively searching for the smallest rotation:
def normalized3(s):
    ssize = len(s)
    minchar = min(s)
    minindexes = [i for i in range(ssize) if minchar == s[i]]
    for offset in range(1, ssize):
        if len(minindexes) == 1:
            break
        minchar = min(s[(i + offset) % ssize] for i in minindexes)
        minindexes = [i for i in minindexes if minchar == s[(i + offset) % ssize]]
    return s[minindexes[0]:] + s[:minindexes[0]]
For long strings this is much faster:
In [143]: loop = [ random.choice("abcd") for i in range(100) ]
In [144]: timeit normalized(loop)
1000 loops, best of 3: 237 µs per loop
In [145]: timeit normalized2(loop)
10000 loops, best of 3: 91.3 µs per loop
In [146]: timeit normalized3(loop)
100000 loops, best of 3: 16.9 µs per loop
But if there is a lot of repetition, this method is not efficient:
In [147]: loop = "abcd" * 25
In [148]: timeit normalized(loop)
1000 loops, best of 3: 245 µs per loop
In [149]: timeit normalized2(loop)
100000 loops, best of 3: 18.8 µs per loop
In [150]: timeit normalized3(loop)
1000 loops, best of 3: 612 µs per loop
We could also scan the string forward, but I doubt it would be any faster without some fancy algorithm.
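For completeness, such a "fancy algorithm" does exist: Booth's algorithm finds the index of the lexicographically minimal rotation in O(n). The following is a minimal sketch of my own, following the standard formulation; it is not part of the original answers:

def least_rotation(s):
    # Booth's algorithm: index of the lexicographically smallest rotation, O(n)
    ss = s + s
    f = [-1] * len(ss)   # failure function, as in KMP
    k = 0                # start index of the best rotation found so far
    for j in range(1, len(ss)):
        sj = ss[j]
        i = f[j - k - 1]
        while i != -1 and sj != ss[k + i + 1]:
            if sj < ss[k + i + 1]:
                k = j - i - 1
            i = f[i]
        if sj != ss[k + i + 1]:
            if sj < ss[k]:   # i == -1 here
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1
    return k

def normalized_booth(s):
    k = least_rotation(s)
    return s[k:] + s[:k]

assert normalized_booth("CAAB") == "AABC"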
How about something like this:
patterns = ['abc', 'bca', 'cab']
normalized = lambda p: ''.join(sorted(p))
normalized_patterns = set(normalized(p) for p in patterns)
Example output:
In [1]: normalized = lambda p: ''.join(sorted(p))
In [2]: normalized('abba')
Out[2]: 'aabb'
In [3]: normalized('CBAE')
Out[3]: 'ABCE'
I know this subject is well discussed, but I've come across a case where I don't really understand why the recursive method is "slower" than a method using reduce, lambda and xrange.
def factorial2(x, rest=1):
    if x <= 1:
        return rest
    else:
        return factorial2(x - 1, rest * x)

def factorial3(x):
    if x <= 1:
        return 1
    return reduce(lambda a, b: a * b, xrange(1, x + 1))
I know Python doesn't optimize tail recursion, so the question isn't about that. To my understanding, the generator should still produce n numbers using the +1 operator, so technically fact(n) should add a number n times, just like the recursive version. The lambda in the reduce will be called n times, just like the recursive method... Since we have no tail-call optimization in either case, stack frames will be created, destroyed and returned from n times, and an if in the generator has to check when to raise a StopIteration exception.
This makes me wonder why the recursive method is still slower than the other one, since the recursive one uses simple arithmetic and doesn't use generators.
In a test, I replaced rest*x by x in the recursive method, and the time spent dropped to roughly that of the method using reduce.
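(Roughly speaking, the modified version would presumably look like the following sketch; this is a reconstruction, not the exact code used in the test.)

def factorial2_nomul(x, rest=1):
    # same call structure as factorial2, but the accumulator is never
    # multiplied, so the cost of big-integer arithmetic disappears
    if x <= 1:
        return rest
    else:
        return factorial2_nomul(x - 1, x)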
Here are my timings for fact(400), 1000 times
factorial3 : 1.22370505333
factorial2 : 1.79896998405
Edit:
Making the method count up from 1 to n instead of down from n to 1 doesn't help either, so it's not overhead from the -1.
Also, can we make the recursive method faster? I tried multiple things, like global variables that I can change... using a mutable context by placing variables in cells that I can modify, like an array, and keeping the recursive method without parameters; passing the method used for recursion as a parameter so we don't have to "dereference" it in our scope... But nothing makes it faster.
I'll point out that I have a version of fact that uses a for loop and is much faster than both of these methods, so there is clearly room for improvement, but I wouldn't expect anything faster than the for loop.
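(For reference, a plain for-loop version would presumably look something like the following sketch, written for the Python 2 context of the question; it is not the exact code mentioned above.)

def factorial_loop(x):
    result = 1
    for i in xrange(2, x + 1):  # xrange, matching the Python 2 code above
        result *= i
    return result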
The slowness of the recursive version comes from the need to resolve the named (argument) variables on each call. I have provided a different recursive implementation that has only one argument, and it works slightly faster.
$ cat fact.py
def factorial_recursive1(x):
    if x <= 1:
        return 1
    else:
        return factorial_recursive1(x - 1) * x

def factorial_recursive2(x, rest=1):
    if x <= 1:
        return rest
    else:
        return factorial_recursive2(x - 1, rest * x)

def factorial_reduce(x):
    if x <= 1:
        return 1
    return reduce(lambda a, b: a * b, xrange(1, x + 1))

# Ignore the rest of the code for now, we'll get back to it later in the answer
def range_prod(a, b):
    if a + 1 < b:
        c = (a + b) // 2
        return range_prod(a, c) * range_prod(c, b)
    else:
        return a

def factorial_divide_and_conquer(n):
    return 1 if n <= 1 else range_prod(1, n + 1)
$ ipython -i fact.py
In [1]: %timeit factorial_recursive1(400)
10000 loops, best of 3: 79.3 µs per loop
In [2]: %timeit factorial_recursive2(400)
10000 loops, best of 3: 90.9 µs per loop
In [3]: %timeit factorial_reduce(400)
10000 loops, best of 3: 61 µs per loop
Since in your example very large numbers are involved, initially I suspected that the performance difference might be due to the order of multiplication. Multiplying a large partial product by the next number on every iteration is proportional to the number of digits/bits in the product, so the time complexity of such a method is O(n²), where n is the number of bits in the final product. Instead it is better to use a divide-and-conquer technique, where the final result is obtained as a product of two approximately equally long values, each of which is computed recursively in the same manner. So I implemented that version too (see factorial_divide_and_conquer(n) in the above code). As you can see below, it still loses to the reduce()-based version for small arguments (due to the same problem with named parameters) but outperforms it for large arguments.
In [4]: %timeit factorial_divide_and_conquer(400)
10000 loops, best of 3: 90.5 µs per loop
In [5]: %timeit factorial_divide_and_conquer(4000)
1000 loops, best of 3: 1.46 ms per loop
In [6]: %timeit factorial_reduce(4000)
100 loops, best of 3: 3.09 ms per loop
UPDATE
Trying to run the factorial_recursive?() versions with x=4000 hits the default recursion limit, so the limit must be increased:
In [7]: sys.setrecursionlimit(4100)
In [8]: %timeit factorial_recursive1(4000)
100 loops, best of 3: 3.36 ms per loop
In [9]: %timeit factorial_recursive2(4000)
100 loops, best of 3: 7.02 ms per loop
Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1. You may assume the string contains only lowercase letters.
My idea is to use a hash that tracks the occurrence of characters: traverse the string from left to right, and if the current character is already in the hash, continue; otherwise, in another loop, traverse the rest of the string to see whether the current character occurs again. If it doesn't, return its index; if it does, update the hash and continue.
def firstUniqChar(s):
    track = {}
    for index, i in enumerate(s):
        if i in track:
            continue
        elif i in s[index+1:]:  # for the last character, i in '' is False
            track[i] = 1
            continue
        else:
            return index
    return -1

firstUniqChar('timecomplexity')
What's the time complexity (average and worst) of my algorithm?
Your algorithm has time complexity O(kn), where k is the number of unique characters in the string. If k is a constant, then it is O(n). Since the problem description clearly bounds the number of possible elements ("assume lowercase (ASCII) letters"), k is constant, and your algorithm runs in O(n) time on this problem. Even as n grows towards infinity, you will only make O(1) slices of the string, and your algorithm will remain O(n). If you removed track, it would be O(n²):
In [36]: s = 'abcdefghijklmnopqrstuvwxyz' * 10000
In [37]: %timeit firstUniqChar(s)
100 loops, best of 3: 18.2 ms per loop
In [38]: s = 'abcdefghijklmnopqrstuvwxyz' * 20000
In [37]: %timeit firstUniqChar(s)
10 loops, best of 3: 36.3 ms per loop
In [38]: s = 'timecomplexity' * 40000 + 'a'
In [39]: %timeit firstUniqChar(s)
10 loops, best of 3: 73.3 ms per loop
It pretty much holds that T(n) is still of O(n) complexity - it scales exactly linearly with the number of characters in the string, even though this is the worst-case scenario for your algorithm: there is no single character that is unique.
I will present a not-that-efficient but simple and smart method here: count the character histogram first with collections.Counter, then iterate over the characters to find the first one with a count of 1:
from collections import Counter

def first_uniq_char_ultra_smart(s):
    counts = Counter(s)
    for i, c in enumerate(s):
        if counts[c] == 1:
            return i
    return -1

first_uniq_char_ultra_smart('timecomplexity')
This has time complexity O(n): Counter counts the histogram in O(n) time, and we need to enumerate the string again, over O(n) characters. However, in practice I believe my algorithm has low constants, because it uses a standard dictionary via Counter.
And let's make a very stupid brute-force algorithm. Since you can assume that the string contains only lowercase letters, use that assumption:
import string

def first_uniq_char_very_stupid(s):
    indexes = []
    for c in string.ascii_lowercase:
        if s.count(c) == 1:
            indexes.append(s.find(c))
    # default=-1 is Python 3 only
    return min(indexes, default=-1)
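As a side note (my addition), on Python 2, where min() has no default keyword, the last line could be written as a conditional expression instead; a variant of the same function:

import string

def first_uniq_char_very_stupid_py2(s):
    indexes = []
    for c in string.ascii_lowercase:
        if s.count(c) == 1:
            indexes.append(s.find(c))
    return min(indexes) if indexes else -1  # works on Python 2 as well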
Let's test my algorithm and some algorithms found in the other answers, on Python 3.5. I've chosen a case that is pathologically bad for my algorithm:
In [30]: s = 'timecomplexity' * 10000 + 'a'
In [31]: %timeit first_uniq_char_ultra_smart(s)
10 loops, best of 3: 35 ms per loop
In [32]: %timeit karin(s)
100 loops, best of 3: 11.7 ms per loop
In [33]: %timeit john(s)
100 loops, best of 3: 9.92 ms per loop
In [34]: %timeit nicholas(s)
100 loops, best of 3: 10.4 ms per loop
In [35]: %timeit first_uniq_char_very_stupid(s)
1000 loops, best of 3: 1.55 ms per loop
So, my stupid algorithm is the fastest, because it finds the a at the end and bails out early; and my smart algorithm is the slowest. Besides this being its worst case, another reason for the bad performance of my algorithm is that OrderedDict is written in C in Python 3.5, while Counter is written in Python.
Let's make a better test here:
In [60]: s = string.ascii_lowercase * 10000
In [61]: %timeit nicholas(s)
100 loops, best of 3: 18.3 ms per loop
In [62]: %timeit karin(s)
100 loops, best of 3: 19.6 ms per loop
In [63]: %timeit john(s)
100 loops, best of 3: 18.2 ms per loop
In [64]: %timeit first_uniq_char_very_stupid(s)
100 loops, best of 3: 2.89 ms per loop
So it appears that my "stupid" algorithm isn't all that stupid after all; it exploits the speed of C while minimizing the number of iterations of Python code being run, and clearly wins in this problem.
As others have noted, your algorithm looks O(n²) due to the nested linear search, but as discovered by @Antti, it is actually linear, bounded by O(kn) with k the number of possible lowercase letters.
My proposition for an O(n) solution:
from collections import OrderedDict

def first_unique_char(string):
    # ordered dict of char to boolean indicating duplicate existence
    duplicated = OrderedDict()
    for s in string:
        duplicated[s] = s in duplicated
    for char, is_duplicate in duplicated.items():
        if not is_duplicate:
            return string.find(char)
    return -1

print(first_unique_char('timecomplexity'))  # 4
Your algorithm is O(n²), because you have a "hidden" iteration over a slice of s inside the loop over s.
A faster algorithm would be:
def first_unique_character(s):
    good = {}     # char: idx
    bad = set()   # char
    for index, ch in enumerate(s):
        if ch in bad:
            continue
        if ch in good:  # new repeat
            bad.add(ch)
            del good[ch]
        else:
            good[ch] = index
    if not good:
        return -1
    return min(good.values())
This is O(n) because the in lookups use hash tables, and the number of distinct characters should be much less than len(s).
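A quick usage check (my addition):

print(first_unique_character('timecomplexity'))  # 4, the index of 'c'
print(first_unique_character('aabb'))            # -1, no unique character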
I'm trying to make my code faster by removing some for loops and using arrays. The slowest step right now is the generation of the random lists.
Context: I have a number of mutations in a chromosome, and I want to generate 1000 random "chromosomes" with the same length and the same number of mutations, but with their positions randomized.
Here is what I'm currently running to generate these randomized mutation positions:
import numpy as np

iterations = 1000
Chr_size = 1000000
num_mut = 500

randbps = []
for k in range(iterations):
    listed = np.random.choice(range(Chr_size), num_mut, replace=False)
    randbps.append(listed)
I want to do something similar to what is covered in this question:
np.random.choice(range(Chr_size), size=(num_mut, iterations), replace=False)
However, there "without replacement" applies to the array as a whole.
Further context: later in the script I go through each randomized chromosome and count the number of mutations in a given window:
for l in range(len(randbps)):
    arr = np.asarray(randbps[l])
    for i in range(chr_last_window[f])[::step]:
        counter = ((i < arr) & (arr < i + window)).sum()
I don't know how np.random.choice is implemented, but I am guessing it is optimized for the general case. With your numbers, on the other hand, collisions are unlikely, so sets may be more efficient for this case, building each sample from scratch:
import random
import numpy as np

def gen_2d(iterations, Chr_size, num_mut):
    randbps = set()
    while len(randbps) < iterations:
        listed = set()
        while len(listed) < num_mut:
            listed.add(random.choice(range(Chr_size)))
        randbps.add(tuple(sorted(listed)))
    return np.array(list(randbps))
This function starts with an empty set, generates a single number in range(Chr_size) and adds that number to the set. Because of the properties of sets, it cannot add the same number again. It does the same thing for randbps as well, so each element of randbps is also unique.
For only one iteration of np.random.choice vs gen_2d:
iterations=1000
Chr_size=1000000
num_mut=500
%timeit np.random.choice(range(Chr_size),num_mut,replace=False)
10 loops, best of 3: 141 ms per loop
%timeit gen_2d(1, Chr_size, num_mut)
1000 loops, best of 3: 647 µs per loop
Based on the trick used in this solution, here's an approach that uses argsort/argpartition on an array of random elements to simulate numpy.random.choice without replacement to give us randbps as a 2D array -
np.random.rand(iterations,Chr_size).argpartition(num_mut)[:,:num_mut]
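A quick way to convince yourself that each row really is a sample without replacement (my check, not part of the original answer):

import numpy as np

iterations, Chr_size, num_mut = 100, 100000, 50
randbps = np.random.rand(iterations, Chr_size).argpartition(num_mut)[:, :num_mut]

# every row holds num_mut distinct positions in [0, Chr_size)
assert randbps.shape == (iterations, num_mut)
assert all(len(set(row)) == num_mut for row in randbps)
assert randbps.min() >= 0 and randbps.max() < Chr_size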
Runtime test -
In [2]: def original_app(iterations,Chr_size,num_mut):
...: randbps=[]
...: for k in range(iterations):
...: listed=np.random.choice(range(Chr_size),num_mut,replace=False)
...: randbps.append(listed)
...: return randbps
...:
In [3]: # Input params (scaled down version of params listed in question)
...: iterations=100
...: Chr_size=100000
...: num=50
...:
In [4]: %timeit original_app(iterations,Chr_size,num)
1 loops, best of 3: 1.53 s per loop
In [5]: %timeit np.random.rand(iterations,Chr_size).argpartition(num)[:,:num]
1 loops, best of 3: 424 ms per loop
I have a Python function that takes a list and returns a generator yielding 2-tuples of each adjacent pair, e.g.
>>> list(pairs([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]
I've considered an implementation using 2 slices:
def pairs(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p
and one written in a more procedural style:
def pairs(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x
Testing using range(2 ** 15) as input yields the following times (you can find my testing code and output here):
2 slices: 100 loops, best of 3: 4.23 msec per loop
0 slices: 100 loops, best of 3: 5.68 msec per loop
Part of the performance hit for the sliceless implementation is the comparison in the loop (if last is not dummy). Removing that (making the output incorrect) improves its performance, but it's still slower than the zip-a-pair-of-slices implementation:
2 slices: 100 loops, best of 3: 4.48 msec per loop
0 slices: 100 loops, best of 3: 5.2 msec per loop
So, I'm stumped. Why is zipping together 2 slices, effectively iterating over the list twice in parallel, faster than iterating once, updating last and x as you go?
EDIT
Dan Lenski proposed a third implementation:
def pairs(xs):
    for ii in range(1, len(xs)):
        yield xs[ii-1], xs[ii]
Here's its comparison to the other implementations:
2 slices: 100 loops, best of 3: 4.37 msec per loop
0 slices: 100 loops, best of 3: 5.61 msec per loop
Lenski's: 100 loops, best of 3: 6.43 msec per loop
It's even slower! Which is baffling to me.
EDIT 2:
ssm suggested using itertools.izip instead of zip, and it's even faster than zip:
2 slices, izip: 100 loops, best of 3: 3.68 msec per loop
So, izip is the winner so far! But still for difficult-to-inspect reasons.
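(The izip-based implementation timed above is presumably just the zip version with itertools.izip swapped in, i.e. something like the sketch below; the exact code used appears in one of the answers further down.)

from itertools import izip  # Python 2

def pairs_izip(xs):
    for p in izip(xs[:-1], xs[1:]):
        yield p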
Lots of interesting discussion elsewhere in this thread. Basically, we started out comparing two versions of this function, which I'm going to describe with the following dumb names:
The "zip-py" version:
def pairs(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p
The "loopy" version:
def pairs(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x
So why does the loopy version turn out to be slower? Basically, I think it comes down to a couple things:
The loopy version explicitly does extra work: it compares two objects' identities (if last is not dummy: ...) on every pair-generating iteration of the inner loop. @mambocab's edit shows that skipping this comparison does make the loopy version slightly faster, but doesn't fully close the gap.
The zippy version does more stuff in compiled C code than the loopy version does in Python code:
Combining two objects into a tuple. The loopy version does yield last,x, while in the zippy version the tuple p comes straight from zip, so it just does yield p.
Binding variable names to objects: the loopy version does this twice in every loop, assigning x in the for loop and last = x. The zippy version does this just once, in the for loop.
Interestingly, there is one way in which the zippy version is actually doing more work: it uses two listiterators, iter(xs[:-1]) and iter(xs[1:]), which get passed to zip. The loopy version only uses one listiterator (for x in xs).
Creating a listiterator object (the output of iter([])) is likely a very highly optimized operation since Python programmers use it so frequently.
Iterating over list slices, xs[:-1] and xs[1:], is a very lightweight operation which adds almost no overhead compared to iterating over the whole list. Essentially, it just means moving the starting or ending point of the iterator, but not changing what happens on each iteration.
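(One way to see this for yourself is to time iterating over a slice against iterating over the whole list; this is a sketch of my own, not part of the original answer, and it prints no results here so that none have to be invented.)

import timeit

setup = "xs = list(range(2 ** 15))"
# iterating over the whole list vs. over a slice of it;
# the slice pays once for the copy, but each iteration step costs the same
print(timeit.timeit("for x in xs: pass", setup=setup, number=1000))
print(timeit.timeit("for x in xs[1:]: pass", setup=setup, number=1000))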
These are the results for izip, which is actually the closest to your implementation. They look like what you would expect. The zip version creates the entire list in memory within the function, so it is the fastest. The loop version just loops through the list, so it is a little slower. The izip version is the closest in resemblance to your code, but I am guessing there are some memory-management backend processes which increase the execution time.
In [11]: %timeit pairsLoop([1,2,3,4,5])
1000000 loops, best of 3: 651 ns per loop
In [12]: %timeit pairsZip([1,2,3,4,5])
1000000 loops, best of 3: 637 ns per loop
In [13]: %timeit pairsIzip([1,2,3,4,5])
1000000 loops, best of 3: 655 ns per loop
The versions of the code are shown below, as requested:
from itertools import izip

def pairsIzip(xs):
    for p in izip(xs[:-1], xs[1:]):
        yield p

def pairsZip(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairsLoop(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x
I suspect the main reason that the second version is slower is because it does a comparison operation for every single pair that it yields:
# pair-generating loop
for x in xs:
    if last is not dummy:
        yield last, x
    last = x
The first version does not do anything but spit out values. With the loop variables renamed, it's equivalent to this:
# pair-generating loop
for last, x in zip(xs[:-1], xs[1:]):
    yield last, x
It's not especially pretty or Pythonic, but you could write a procedural version without a comparison in the inner loop. How fast does this one run?
def pairs(xs):
    for ii in range(1, len(xs)):
        yield xs[ii-1], xs[ii]
Regardless of ease of use, which is more computationally efficient? Constantly slicing lists and appending to them? Or taking substrings and doing the same?
As an example, let's say I have two binary strings "11011" and "01001". If I represent these as lists, I'll be choosing a random "slice" point. Let's say I get 3. I'll take the first 3 characters of the first string and the remaining characters of the second string (so I'd have to slice both) and create a new string out of it.
Would this be more efficiently done by cutting the substrings or by representing it as a list ( [1, 1, 0, 1, 1] ) rather than a string?
>>> a = "11011"
>>> b = "01001"
>>> import timeit
>>> def strslice():
return a[:3] + b[3:]
>>> def lstslice():
return list(a)[:3] + list(b)[3:]
>>> c = list(a)
>>> d = list(b)
>>> def lsts():
return c[:3] + d[3:]
>>> timeit.timeit(strslice)
0.5103488475836432
>>> timeit.timeit(lstslice)
2.4350100538824613
>>> timeit.timeit(lsts)
1.0648406858527295
timeit is a good tool for micro-benchmarking, but it needs to be used with the utmost care when the operations you want to compare may involve in-place alterations -- in this case, you need to include extra operations designed to make needed copies. Then, first time just the "extra" overhead:
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.01 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.06 usec per loop
So making the two brand-new lists we need (to avoid alteration) costs a tad more than 5 microseconds (when focused on small differences, run things at least 2-3 times to eyeball the uncertainty range). After which:
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.5 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.47 usec per loop
string slicing and concatenation in this case can be seen to cost another 410-490 nanoseconds. And:
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
in-place list splicing can be seen to cost 930-980 nanoseconds. The difference is safely above the noise/uncertainty levels, so you can reliably state that for this use case working with strings is going to take roughly half as much time as working in-place with lists. Of course, it's also crucial to measure a range of use cases that are relevant and representative of your typical bottleneck tasks!
In general, modifying lists is more efficient than modifying strings, because strings are immutable.
It really depends on actual use cases, and as others have said, profile it, but in general, appending to lists will be better, because it can be done in place, whereas "appending to strings" actually creates a new string that concatenates the old strings. This can rapidly eat up memory. (Which is a different issue from computational efficiency, really).
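To illustrate the point, a small sketch of my own: growing a result character by character, once via a list and once via string concatenation, and joining the list at the end (the usual idiom).

chunks = []
result_str = ""
for bit in "11011" * 1000:
    chunks.append(bit)   # amortized O(1): the list grows in place
    result_str += bit    # conceptually builds a new string each time
result_from_list = "".join(chunks)  # join once at the end
assert result_from_list == result_str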
Edit: If you want computational efficiency with binary values, don't use strings or lists. Use integers and bitwise operations. With recent versions of python, you can use binary representations when you need them:
>>> bin(42)
'0b101010'
>>> 0b101010
42
>>> int('101010')
101010
>>> int('101010', 2)
42
>>> int('0b101010')
...
ValueError: invalid literal for int() with base 10: '0b101010'
>>> int('0b101010', 2)
42
Edit 2:
def strslice(a, b):
    return a[:3] + b[3:]
might be better written something like:
def binsplice(a, b):
    mask = 0b11100
    return (a & mask) + (b & ~mask)
>>> a = 0b11011
>>> b = 0b1001
>>> bin(binsplice(a, b))
'0b11001'
>>>
It might need to be modified if your binary numbers are different sizes.
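For example, if the numbers have a known bit width, the mask could be built from the split point; this is a sketch of my own, where width and keep_top are hypothetical parameters for the total bit width and the number of leading bits to keep from a:

def binsplice_general(a, b, width, keep_top):
    # mask selects the top `keep_top` bits out of `width` total bits
    mask = ((1 << keep_top) - 1) << (width - keep_top)
    return (a & mask) | (b & ~mask)

assert binsplice_general(0b11011, 0b01001, width=5, keep_top=3) == 0b11001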