I know that there are easy ways to generate lists of unique random integers (e.g. random.sample(range(1, 100), 10)).
I wonder whether there is a better way of generating a list of unique random floats, apart from writing a function that acts like range() but accepts floats, like this:
import random

def float_range(start, stop, step):
    vals = []
    i = 0
    current_val = start
    while current_val < stop:
        vals.append(current_val)
        i += 1
        current_val = start + i * step
    return vals

unique_floats = random.sample(float_range(0, 2, 0.2), 3)
Is there a better way to do this?
Answer
One easy way is to keep a set of all random values seen so far and reselect if there is a repeat:
import random

def sample_floats(low, high, k=1):
    """ Return a k-length list of unique random floats
        in the range of low <= x <= high
    """
    result = []
    seen = set()
    for i in range(k):
        x = random.uniform(low, high)
        while x in seen:
            x = random.uniform(low, high)
        seen.add(x)
        result.append(x)
    return result
Notes
This technique is how Python's own random.sample() is implemented.
The function uses a set to track previous selections because searching a set is O(1) while searching a list is O(n).
Computing the probability of a duplicate selection is equivalent to the famous Birthday Problem.
Given 2**53 distinct possible values from random(), duplicates are infrequent.
On average, you can expect a duplicate float at about 120,000,000 samples.
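That estimate follows from the birthday-problem approximation sqrt(pi/2 * d) for d equally likely values; here is a quick sketch of the arithmetic (illustrative only, assuming random() returns one of 2**53 equally likely doubles):
import math

# Expected number of draws before the first repeat is roughly sqrt(pi/2 * d)
# for d equally likely values (standard birthday-problem approximation).
d = 2 ** 53
expected_draws = math.sqrt(math.pi / 2 * d)
print(f"{expected_draws:,.0f}")   # about 119,000,000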
Variant: Limited float range
If the population is limited to just a range of evenly spaced floats, then it is possible to use random.sample() directly. The only requirement is that the population be a Sequence:
from __future__ import division
try:
    from collections.abc import Sequence   # Python 3.3+
except ImportError:
    from collections import Sequence       # Python 2

class FRange(Sequence):
    """ Lazily evaluated floating point range of evenly spaced floats
        (inclusive at both ends)

        >>> list(FRange(low=10, high=20, num_points=5))
        [10.0, 12.5, 15.0, 17.5, 20.0]
    """

    def __init__(self, low, high, num_points):
        self.low = low
        self.high = high
        self.num_points = num_points

    def __len__(self):
        return self.num_points

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if index < 0 or index >= len(self):
            raise IndexError('Out of range')
        p = index / (self.num_points - 1)
        return self.low * (1.0 - p) + self.high * p
Here is an example of choosing ten random samples without replacement from a range of 41 evenly spaced floats from 10.0 to 20.0.
>>> import random
>>> random.sample(FRange(low=10.0, high=20.0, num_points=41), k=10)
[13.25, 12.0, 15.25, 18.5, 19.75, 12.25, 15.75, 18.75, 13.0, 17.75]
You can easily use your list of integers to generate floats:
int_list = random.sample(range(1, 100), 10)
float_list = [x/10 for x in int_list]
Check out this Stack Overflow question about generating random floats.
If you want it to work with python2, add this import:
from __future__ import division
If you need to guarantee uniqueness, it may be more efficient to:
1. Try to generate n random floats in [lo, hi] at once.
2. If the number of unique floats is not n, generate however many floats are still needed.
Continue like that until you have enough, as opposed to generating them one by one in a Python-level loop that checks against a set.
If you can afford NumPy, doing so with np.random.uniform can be a huge speed-up.
import numpy as np

def gen_uniq_floats(lo, hi, n):
    out = np.empty(n)
    needed = n
    while needed != 0:
        arr = np.random.uniform(lo, hi, needed)
        uniqs = np.setdiff1d(np.unique(arr), out[:n-needed])
        out[n-needed: n-needed+uniqs.size] = uniqs
        needed -= uniqs.size
    np.random.shuffle(out)
    return out.tolist()
If you cannot use NumPy, it may still be more efficient, depending on your needs, to apply the same concept: generate batches, check for duplicates afterwards, and maintain a set.
def no_depend_gen_uniq_floats(lo, hi, n):
    seen = set()
    needed = n
    while needed != 0:
        uniqs = {random.uniform(lo, hi) for _ in range(needed)}
        seen.update(uniqs)
        needed = n - len(seen)  # recount in case a batch repeated an earlier value
    return list(seen)
Rough benchmark
Extreme degenerate case
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 2**-50, 1000)
153 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 2**-50, 1000)
495 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 2**-50, 1000)
618 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
More "normal" case (with larger sample)
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 1, 10**5)
15.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 1, 10**5)
65.7 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 1, 10**5)
78.8 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
You could just use random.uniform(start, stop). With double-precision floats, you can be relatively sure that they are unique if your set is small. If you want to generate a large number of random floats and need to ensure that no number appears twice, check before adding them to the list.
However, if you are looking for a selection of specific numbers, this is not the solution.
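A minimal sketch of that check-before-adding idea (the function name is illustrative, not from the answer above):
import random

def unique_uniforms(start, stop, k):
    # Keep drawing until we have k distinct floats; membership is checked in a set.
    values = []
    seen = set()
    while len(values) < k:
        x = random.uniform(start, stop)
        if x not in seen:
            seen.add(x)
            values.append(x)
    return values

print(unique_uniforms(0.0, 2.0, 3))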
import numpy

min_val = -5
max_val = 15
numpy.random.random_sample(15) * (max_val - min_val) + min_val
or use uniform
numpy.random.uniform(min_val,max_val,size=15)
As stated in the documentation, Python has the random.random() function:
import random
random.random()
This gives you a float value such as 0.672807098390448.
So all you need to do is make a for loop and print out random.random():
>>> for i in range(10):
...     print(random.random())
more_itertools has a generic numeric_range that handles both integers and floats.
import random
import more_itertools as mit
random.sample(list(mit.numeric_range(0, 2, 0.2)), 3)
# [0.8, 1.0, 0.4]
random.sample(list(mit.numeric_range(10.0, 20.0, 0.25)), 10)
# [17.25, 12.0, 19.75, 14.25, 15.25, 12.75, 14.5, 15.75, 13.5, 18.25]
random.uniform generates float values:
import random

def get_random(low, high, length):
    lst = []
    while len(lst) < length:
        lst.append(random.uniform(low, high))
        lst = list(set(lst))
    return lst
Related
I have an array, X, which I want to make monotonic. Specifically, I want to do
y = x.copy()
for i in range(1, len(x)):
    y[i] = np.max(x[:i])
This is extremely slow for large arrays, but it feels like there should be a more efficient way of doing this. How can this operation be sped up?
The OP implementation is very inefficient because it does not use the information acquired on the previous iteration, resulting in O(n²) complexity.
def max_acc_OP(arr):
    result = np.empty_like(arr)
    for i in range(len(arr)):
        result[i] = np.max(arr[:i + 1])
    return result
Note that I fixed the OP code (which otherwise throws a ValueError: zero-size array to reduction operation maximum which has no identity) so that it takes the largest value among those up to and including position i.
It is easy to adapt that so that values at position i are excluded, but it leaves the first value of the result undefined, and it would never use the last value of the input. The first value of the result can be taken to be equal to the first value of the input, e.g.:
def max_acc2_OP(arr):
    result = np.empty_like(arr)
    result[0] = arr[0]  # uses first value of input
    for i in range(1, len(arr)):
        result[i] = np.max(arr[:i])
    return result
It is equally easy to have similar adaptations for the code below, and I do not think it is particularly relevant to cover both cases of the value at position i included and excluded. Henceforth, only the "included" case is covered.
Back to the efficiency of the solution: if you keep track of the current maximum and use that to fill your output array, instead of re-computing the maximum of all values up to i at each iteration, you can easily get to O(n) complexity:
def max_acc(arr):
    result = np.empty_like(arr)
    curr_max = arr[0]
    for i, x in enumerate(arr):
        if x > curr_max:
            curr_max = x
        result[i] = curr_max
    return result
However, this is still relatively slow because of the explicit looping.
Luckily, one can either rewrite this in vectorized form combining np.fmax() (or np.maximum() -- depending on how you need NaNs to be handled) and np.ufunc.accumulate():
np.fmax.accumulate()
# or
np.maximum.accumulate()
or, accelerating the solution above with Numba:
import numba as nb

max_acc_nb = nb.njit(max_acc)
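As a quick sanity check (illustrative, not part of the original answer), the vectorized and compiled versions agree with the loop version on a small array:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print(max_acc(arr))                 # [3 3 4 4 5 9 9 9]
print(np.maximum.accumulate(arr))   # [3 3 4 4 5 9 9 9]
print(max_acc_nb(arr))              # [3 3 4 4 5 9 9 9]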
Some timings on relatively large inputs are provided below:
n = 10000
arr = np.random.randint(0, n, n)
%timeit -n 4 -r 4 max_acc_OP(arr)
# 97.5 ms ± 14.2 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.fmax.accumulate(arr)
# 112 µs ± 134 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.maximum.accumulate(arr)
# 88.4 µs ± 107 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc(arr)
# 2.32 ms ± 146 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc_nb(arr)
# 9.11 µs ± 3.01 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
indicating that max_acc() is already much faster than max_acc_OP(), but np.maximum.accumulate() / np.fmax.accumulate() is even faster, and max_acc_nb() comes out as the fastest. As always, it is important to take these kinds of numbers with a grain of salt.
I think it will work faster to just keep track of the maximum rather than calculating it each time for each sub-array:
y = x.copy()
_max = y[0]
for i in range(1, len(x)):
    y[i] = _max
    _max = max(x[i], _max)
You can use a list comprehension for this, but with the slice x[:i] you need to start your loop from 1, not from 0. Alternatively, if you want to loop from 0, include the current element in the slice:
y=[np.max(x[:i+1]) for i in range(len(x))]
or, starting the loop from 1:
y=[np.max(x[:i]) for i in range(1,len(x)+1)]
Here is my code:
def seperateInputs(inp):
    temp = inp.split()
    n = int(temp[0])
    wires = []
    temp[1] = temp[1].replace('),(', ') (')
    storeys = temp[1][1:len(temp[1])-1].split()
    for each in storeys:
        each = each[1:len(each)-1]
        t = each.split(',')
        wires.append((int(t[0]), int(t[1])))
    return n, wires

def findCrosses(n, wires):
    cross = 0
    for i in range(len(wires)-1):
        for j in range(i+1, len(wires)):
            if (wires[i][0] < wires[j][0] and wires[i][1] > wires[j][1]) or (wires[i][0] > wires[j][0] and wires[i][1] < wires[j][1]):
                cross += 1
    return cross

def main():
    m = int(input())
    for i in range(m):
        inp = input()
        n, wires = seperateInputs(inp)
        print(findCrosses(n, wires))

main()
I also tested my own sample input, which gave the correct output:
Sample input:
3
20 [(1,8),(10,18),(17,19),(13,16),(4,1),(8,17),(2,10),(11,0),(3,2),(12,3),(18,14),(7,7),(19,5),(0,6)]
20 [(3,4),(10,7),(6,11),(7,17),(13,9),(15,19),(19,12),(16,14),(12,8),(0,3),(8,15),(4,18),(18,6),(5,5),(9,13),(17,1),(1,0)]
20 [(15,8),(0,14),(1,4),(6,5),(3,0),(13,15),(7,10),(5,9),(19,7),(17,13),(10,3),(16,16),(14,2),(11,11),(8,18),(9,12),(4,1)]
Sample output:
38
57
54
However, although small inputs work, medium to large inputs give me a TimeLimitExceeded error.
How do I optimize this? Is there a way to use far fewer operations than what I already have? TIA.
There are a handful of things you can do.
First, things are easier to compute if you sort the list by the left building first. This costs a little up-front, but makes things easier and faster as you process, because for each wire you only need to count how many of the previously seen second elements are greater than the current one. The code is nice and simple for this:
l = [(3,4),(10,7),(6,11),(7,17),(13,9),(15,19),(19,12),(16,14),(12,8),(0,3),(8,15),(4,18),(18,6),(5,5),(9,13),(17,1),(1,0)]
def count_crossings(l):
    s = sorted(l, key=lambda p: p[0])
    endpoints = []
    count = 0
    for i, j in s:
        count += sum(e > j for e in endpoints)
        endpoints.append(j)
    return count
count_crossings(l)
# 57
This is a little inefficient because you are looping through endpoints for every point. If you could also keep endpoints sorted, you would only need to count how many of them are greater than the given right-hand endpoint. Anytime you think of keeping a list sorted, you should consider the amazing built-in library bisect. This will make things an order of magnitude faster:
import bisect

def count_crossings_b(l):
    s = sorted(l, key=lambda p: p[0])
    endpoints = []
    count = 0
    for i, j in s:
        bisect.insort_left(endpoints, j)
        count += len(endpoints) - bisect.bisect_right(endpoints, j)
    return count
count_crossings_b(l)
# 57
The various timings on my laptop look like:
l = [(random.randint(1, 200), random.randint(1, 200)) for _ in range(1000)]
%timeit findCrosses(len(l), l)  # original
# 179 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit count_crossings(l)
# 38.1 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_crossings_b(l)
# 1.08 ms ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Basically, I have:
An array giving indexes "I", e.g. (1, 2),
And a list of the same length giving the corresponding number of repetitions "N", e.g. [1, 3]
And I want to create an array containing the indexes I repeated N times, i.e. (1, 2, 2, 2) here, where 1 is repeated one time and 2 is repeated 3 times.
The best solution I've come up with uses the np.repeat and np.concatenate functions:
import numpy as np
list_index = np.arange(2)
list_no_repetition = [1, 3]
result = np.concatenate([np.repeat(index, no_repetition)
                         for index, no_repetition in zip(list_index, list_no_repetition)])
print(result)
I wonder if there is a "prettier"/"more efficient solution".
Thank you for your help.
Not sure about prettier, but you could solve it completely with a list comprehension:
[x for i,l in zip(list_index, list_no_repetition) for x in [i]*l]
Hello, this is the alternative that I propose:
import numpy as np
list_index = np.arange(2)
list_no_repetition = [1, 3]
result = np.array([])
for i in range(len(list_index)):
    tempA = np.empty(list_no_repetition[i])
    tempA.fill(list_index[i])
    result = np.concatenate([result, tempA])
result
You could also use a dictionary with the key as the index and the value as the number of repetitions. I think that Andreas had it right with the list comprehension.
import numpy as np
repeatdict = {
    1: 1,
    2: 3,
    3: 6
}
result = [x for key, value in repeatdict.items() for x in [key]*value]
print(result)
If by "efficiency" you mean speed, you can use timeit. Here are some results for some arbitrary, larger data.
First, define the functions and data:
# generate some data (list values/indices and number of reps)
N = 1000
li_2 = np.arange(N)
lnr_2 = np.random.randint(low=0, high=10, size=N)
# three functions produce the same result
def by_range(items, rep_cts):
    x = np.full(sum(rep_cts), np.nan)
    i = 0
    for val, reps in zip(items, rep_cts):
        x[i:i + reps] = val
        i = i + reps
    return x

def by_comp(items, reps):
    return np.array([val for val, rep in zip(items, reps) for i in range(rep)])

def by_cat(list_index, list_no_repetition):
    return np.concatenate([np.repeat(index, no_repetition)
                           for index, no_repetition in zip(list_index, list_no_repetition)])
About the same speed: first allocating an array and then filling it in, vs. doing a one-line double-for comprehension.
# 820 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit by_range(li_2, lnr_2)
# 829 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit by_comp(li_2, lnr_2)
The original concatenation method is slightly slower:
# 2.19 ms ± 98.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit by_cat(li_2, lnr_2)
Note that the results will differ depending on where/how you run this, and the specific data you're dealing with.
I am trying to interleave the zeroes and ones arrays according to the order array. The expected output is what I am trying to get to, preferably without using a list comprehension.
import numpy as np
order = np.array([0,1,0,1,0])
zeroes= np.array([10,55, 30])
ones = np.array([3,8])
Expected Output
[10, 3, 55, 8, 30]
How about this (no Python loops: 750x faster than a list comprehension, when tested on 200k elements):
# note: updated version: faster and more robust to faulty input
def altcat(zeroes, ones, order):
    i0 = np.nonzero(order == 0)[0][:len(zeroes)]
    i1 = np.nonzero(order == 1)[0][:len(ones)]
    z = np.zeros_like(order, dtype=zeroes.dtype)
    z[i0] = zeroes[:len(i0)]
    z[i1] = ones[:len(i1)]
    return z
On your example:
>>> altcat(zeroes=np.array([10,55, 30]), ones=np.array([3,8]),
... order=np.array([0,1,0,1,0]))
array([10, 3, 55, 8, 30])
Speed
# set up
n = 200_000
np.random.seed(0)
order = np.random.randint(0, 2, size=n)
n1 = order.sum()
n0 = n - n1
ones = np.random.randint(100, size=n1)
zeroes = np.random.randint(100, size=n0)
# for comparison, a method proposed elsewhere, based on lists
def altcat_list(zeroes, ones, order):
    zeroes = list(zeroes)
    ones = list(ones)
    return [zeroes.pop(0) if i == 0 else ones.pop(0) for i in order]
Test:
a = %timeit -o altcat(zeroes, ones, order)
# 2.38 ms ± 573 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
b = %timeit -o altcat_list(zeroes, ones, order)
# 1.84 s ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
b.average / a.average
# 773.59
Note: I initially tried with n = 1_000_000, but while altcat does that in 12.4ms, the list-based version would take forever and I had to stop it.
It seems that the list-based method is worse than O(n) (100K: 0.4s; 200K: 1.84s; 400K: 10.4s), which makes sense: each pop(0) is itself O(n), so the whole approach is roughly O(n²).
Addendum
If you really want to do it with a list comprehension and not in pure numpy, then at least consider this:
def altcat_list_mod(zeroes, ones, order):
    it = [iter(zeroes), iter(ones)]
    return [next(it[i]) for i in order]
That's faster than altcat_list(), but still almost 25x slower than altcat():
# on 200k elements
c = %timeit -o altcat_list_mod(zeroes, ones, order)
# 60 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
c.average / a.average
# 24.93
I would like to create a numpy array where the first element is a defined constant, and every subsequent element is defined as a function of the previous element, in the following way:
import numpy as np
def build_array_recursively(length, V_0, function):
    returnList = np.empty(length)
    returnList[0] = V_0
    for i in range(1, length):
        returnList[i] = function(returnList[i-1])
    return returnList
d_t = 0.05
print(build_array_recursively(20, 0.3, lambda x: x-x*d_t+x*x/2*d_t*d_t-x*x*x/6*d_t*d_t*d_t))
The print call above outputs
[0.3 0.28511194 0.27095747 0.25750095 0.24470843 0.23254756 0.22098752
0.20999896 0.19955394 0.18962586 0.18018937 0.17122037 0.16269589
0.15459409 0.14689418 0.13957638 0.13262186 0.1260127 0.11973187 0.11376316]
Is there a fast way of doing this in numpy without a for loop?
If so is there a way to handle two elements before the current one, e.g. can a Fibonacci array be constructed similarly?
I found a similar question here
Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?
but it was not answered in general. In my example, the difference equation is difficult to solve manually.
This is faster for what you want to do. You don't have to use recursion for the function: calculate each element based on the previous one, append it to a list, and then convert the list to a numpy array.
def method2(length, V_0, d_t):
    k = [V_0]
    x = V_0
    for i in range(1, length):
        x = x - x * d_t + x * x / 2 * d_t * d_t - x * x * x / 6 * d_t * d_t * d_t
        k.append(x)
    return np.asarray(k)

print(method2(20, 0.3, 0.05))
Running your existing method 10000 times takes 0.438 seconds, while method2 takes 0.097 seconds.
Using a function to make the code clearer (instead of the inline lambda):
def fn(x):
    return x-x*d_t+x*x/2*d_t*d_t-x*x*x/6*d_t*d_t*d_t
And a function that combines elements of build_array_recursively and method2:
def foo1(length, V_0, function):
    returnList = np.empty(length)
    returnList[0] = x = V_0
    for i in range(1, length):
        returnList[i] = x = function(x)
    return returnList
In [887]: timeit build_array_recursively(20,0.3, fn);
61.4 µs ± 63 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [888]: timeit method2(20,0.3, fn);
16.9 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [889]: timeit foo1(20,0.3, fn);
13 µs ± 29.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The main time saver in method2 and foo1 is carrying over x, the last value, from one iteration to the next, rather than indexing with returnList[i-1].
The accumulation method (assigning to a preallocated array vs. appending to a list) matters less; performance is usually similar. Here the calculation is simple enough that the details of what you do in the loop make a big difference in the overall time.
All of these are loops. Some ufuncs have a reduce (and accumulate) method that applies the function repeatedly to the elements of the input array; np.sum, np.cumsum, etc. make use of this, as the short illustration below shows. But you can't do that with a general Python function.
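A brief illustration of ufunc accumulate for the built-in cases (not part of the original answer):
import numpy as np

arr = np.array([3, 1, 4, 1, 5])
print(np.add.accumulate(arr))      # [ 3  4  8  9 14]  -- same as np.cumsum(arr)
print(np.maximum.accumulate(arr))  # [3 3 4 4 5]       -- running maximum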
You have to use some sort of compilation tool like numba to perform this sort of loop much faster.
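As a rough sketch of that numba route (assuming numba is installed; the name build_array_nb is illustrative), the same sequential recurrence can be jit-compiled so the Python-level loop runs at native speed:
import numpy as np
import numba as nb

d_t = 0.05  # picked up as a compile-time constant by numba

@nb.njit
def build_array_nb(length, V_0):
    # Same recurrence as build_array_recursively above, compiled with numba.
    out = np.empty(length)
    x = V_0
    out[0] = x
    for i in range(1, length):
        x = x - x * d_t + x * x / 2 * d_t * d_t - x * x * x / 6 * d_t * d_t * d_t
        out[i] = x
    return out

print(build_array_nb(20, 0.3))  # matches the output of build_array_recursively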