How to probabilistically populate a list in python? - python

I want to use a basic for loop to populate a list of values in Python, but I would like the values to be calculated probabilistically, such that p% of the time the values are calculated with (toy) equation 1 and (100-p)% of the time with equation 2.
Here's what I've got so far:
# generate list of random probabilities
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# loop through but where to put 'p'? append() should probably only appear once
for p in p_list:
    calc1 = x*y # equation 1
    calc2 = (x-y) # equation 2
    my_list.append(calc1)
    my_list.append(calc2)

You've already generated a list of probabilities, p_list, with one entry for each value you want in my_list. The Pythonic way to use it is a ternary operator inside a list comprehension:
from random import random
my_list = [(x*y if random() < p else x-y) for p in p_list]
If we were to expand this into a proper for loop:
my_list = []
for p in p_list:
    if random() < p:
        my_list.append(x*y)
    else:
        my_list.append(x-y)
If we wanted to be even more pythonic, regarding calc1 and calc2, we could make them into lambdas:
calc1 = lambda x,y: x*y
calc2 = lambda x,y: x-y
...
my_list = [calc1(x,y) if random() < p else calc2(x,y) for p in p_list]
or, depending on how x and y vary for your function (assuming they're not static), you could even do the comprehension in two steps:
calc_list = [calc1 if random() < p else calc2 for p in p_list]
my_list = [calc(x,y) for calc in calc_list]

I took the approach of making minimal changes to the original code and keeping the syntax easy to understand:
import numpy as np
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# uncomment the 2 lines below to make this code runnable
#x = 1
#y = 2
for p in p_list:
    # randoms are uniformly distributed over the half-open interval [low, high)
    # so check if p is in [0, 0.5) for equation 1 or [0.5, 1) for equation 2
    if p < 0.5:
        calc1 = x*y # equation 1
        my_list.append(calc1)
    else:
        calc2 = (x-y) # equation 2
        my_list.append(calc2)

The other answers seem to assume you want to keep the calculated chances around. If all you are after is a list of results for which equation 1 was used p% of the time and equation 2 100-p% of the time, this is all you need:
from random import random, seed
inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# change the seed to see different 'random' outcomes
seed(1)
results = [x * x if random() > 0.5 else 2 * x for x in inputs]
print(results)

If you are OK with using numpy, it is worth trying the choice method:
https://docs.scipy.org/doc/numpy-1.14.1/reference/generated/numpy.random.choice.html
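As a minimal sketch of that suggestion, assuming x and y are fixed scalars and p is a single probability (the concrete values below are made up for illustration), numpy's Generator.choice can draw all 500 results in one call:

```python
import numpy as np

# Assumed toy inputs; the question leaves x, y and p unspecified.
x, y = 3.0, 2.0
p = 0.3   # probability of using equation 1
n = 500

rng = np.random.default_rng(seed=1)

# The two candidate results: equation 1 (x*y) and equation 2 (x-y).
candidates = [x * y, x - y]

# Draw n results, picking equation 1 with probability p and equation 2 otherwise.
my_list = rng.choice(candidates, size=n, p=[p, 1 - p]).tolist()

print(my_list[:10])
```

If x and y vary per element, something like np.where(rng.random(n) < p, x * y, x - y) on arrays may be a better fit than choice.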

Related

Convert a loop from Pinescript to Python

I'm trying to convert this formula (WMA moving average) from Pinescript to Python, but Python has no for i to x loop. I tried for i in range(x), but it doesn't seem to return the same result.
What exactly does to mean? The Pinescript documentation says it means from i to x, but I can't find the equivalent in Python.
pine_wma(x, y) =>
    norm = 0.0
    sum = 0.0
    for i = 0 to y - 1
        weight = (y - i) * y
        norm := norm + weight
        sum := sum + x[i] * weight
    sum / norm
plot(pine_wma(close, 15))
Python Code:
import pandas as pd
dataframe = pd.read_csv('dataframe.csv')
def formula_wma(x, y):
    list = []
    norm = 0.0
    sum = 0.0
    i = 0
    for i in range(y - 1):
        weight = (y - i) * y
        norm = norm + weight
        sum = sum + x[i] * weight
        _wma = sum / norm
        list.append(_wma)
        i += 1
    return list
wma_slow = formula_wma(dataframe['close'],45)
dataframe['wma_slow'] = pd.Series(wma_slow, index=dataframe.index[:len(wma_slow)])
print(dataframe['wma_slow'].to_string())
Output:
0 317.328133
[Skipping lines]
39 317.589010
40 317.449259
41 317.421662
42 317.378052
43 317.328133
44 NaN
45 NaN
[Skipping Lines]
2999 NaN
3000 NaN
First of all, don't reassign built-in names!
sum is a built-in function that adds up a sequence of numbers, and list is a built-in class (calling it constructs a list).
For example:
sum(range(10)) returns 45.
The above is equivalent to:
numbers = (0,1,2,3,4,5,6,7,8,9)
s = 0
for i in numbers: s += i
Second, don't increment the loop variable inside the loop unless you have a good reason to.
That i += 1 at the end of the loop has no effect whatsoever: the for loop automatically rebinds i to the next item in the sequence on every iteration, and here the next item is always one larger.
Further, any code that uses i after that line within the same iteration will see the incremented value and may break.
Lastly, the reason you are not getting the same result is that Python uses zero-based indexing and range excludes its stop value.
I don't know Pine Script, but from what you have written, from x to y must include y.
For example, 0 to 10 in Pine Script will give you 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
But using range(10):
print(list(range(10)))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Why? Because there are exactly ten numbers in the range you specified.
In the first example, there are actually eleven numbers. If you know your math, the number of terms in an arithmetic sequence is (last term - first term) / increment + 1; here that is (10 - 0) / 1 + 1 = 11.
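To make the off-by-one concrete, here is a small sketch. The helper name inclusive_range is made up for illustration; it mimics an inclusive "start to stop" loop, which is what the Pine Script loop in the question appears to be, using Python's range:

```python
def inclusive_range(start, stop):
    """Return a range covering start, start + 1, ..., stop (stop included).

    Hypothetical helper that mirrors an inclusive 'for i = start to stop' loop.
    """
    return range(start, stop + 1)


y = 5
# An inclusive loop from 0 to y - 1 visits y values: 0 .. y-1
print(list(inclusive_range(0, y - 1)))
# That is exactly what range(y) produces.
print(list(range(y)))
# range(y - 1) stops one element short.
print(list(range(y - 1)))
```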
So how to solve your problem?
Remove - 1 after y in range!
Fixed code:
import pandas as pd
dataframe = pd.read_csv('dataframe.csv')
def formula_wma(x, y):
    lst = []
    norm = 0.0
    sum_ = 0.0
    i = 0
    for i in range(y):
        weight = (y - i) * y
        norm = norm + weight
        sum_ = sum_ + x[i] * weight
        _wma = sum_ / norm
        lst.append(_wma)
    return lst
wma_slow = formula_wma(dataframe['close'],45)
dataframe['wma_slow'] = pd.Series(wma_slow, index=dataframe.index[:len(wma_slow)])
print(dataframe['wma_slow'].to_string())

Is there a pythonic way to sample N consecutive elements from a list or numpy array

Is there a Pythonic way to select N consecutive elements from a list or numpy array?
Suppose:
Choice = [1,2,3,4,5,6]
I would like to create a new list of length N by randomly selecting the element at index X in Choice along with the N-1 consecutive elements following it, wrapping around at the end of the list.
So if:
X = 4
N = 4
The resulting list would be:
Selection = [5,6,1,2]
I think something similar to the following would work.
S = []
for i in range(X,X+N):
    S.append(Choice[i%6])
But I was wondering if there is a Python or numpy function that can select the elements all at once and more efficiently.
Use itertools, specifically islice and cycle.
import random
from itertools import cycle, islice
start = random.randint(0, len(Choice) - 1)
list(islice(cycle(Choice), start, start + N))
cycle(Choice) is an infinite iterator that repeats your original list, so the slice start:start + N wraps around if necessary.
You could use a list comprehension, using modulo operations on the index to keep it in range of the list:
Choice = [1,2,3,4,5,6]
X = 4
N = 4
L = len(Choice)
Selection = [Choice[i % L] for i in range(X, X+N)]
print(Selection)
Output
[5, 6, 1, 2]
Note that if N is less than or equal to len(Choice), you can greatly simplify the code:
Choice = [1,2,3,4,5,6]
X = 4
N = 4
L = len(Choice)
Selection = Choice[X:X+N] if X+N <= L else Choice[X:] + Choice[:X+N-L]
print(Selection)
Since you are asking for the most efficient way I created a little benchmark to test the solutions proposed in this thread.
I rewrote your current solution as:
def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection
Where choice is the input list and x is the random index.
These are the results if choice contains 1_000_000 random numbers:
chepner: 0.10840400000000017 s
nick: 0.2066781999999998 s
op: 0.25887470000000024 s
fountainhead: 0.3679908000000003 s
Full code
import random
from itertools import cycle, islice
from time import perf_counter as pc
import numpy as np
def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection
def nick(choice, x):
    n = len(choice)
    return [choice[i % n] for i in range(x, x + n)]
def fountainhead(choice, x):
    n = len(choice)
    return np.take(choice, range(x, x + n), mode='wrap')
def chepner(choice, x):
    n = len(choice)
    return list(islice(cycle(choice), x, x + n))
results = []
n = 1_000_000
choice = random.sample(range(n), n)
x = random.randint(0, n - 1)
# Correctness
assert op(choice, x) == nick(choice,x) == chepner(choice,x) == list(fountainhead(choice,x))
# Benchmark
for f in op, nick, chepner, fountainhead:
    t0 = pc()
    f(choice, x)
    t1 = pc()
    results.append((t1 - t0, f))
for t, f in sorted(results):
    print(f'{f.__name__}: {t} s')
If using a numpy array as the source, we could of course use numpy "fancy indexing".
So, if ChoiceArray is the numpy array equivalent of the list Choice, and if L is len(Choice) or len(ChoiceArray):
Selection = ChoiceArray [np.arange(X, N+X) % L]
Here's a numpy approach:
import numpy as np
Selection = np.take(Choice, range(X,N+X), mode='wrap')
Works even if Choice is a Python list rather than a numpy array.
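A short sketch putting the two numpy variants side by side on the question's example data (the variable names are chosen here for illustration):

```python
import numpy as np

choice = [1, 2, 3, 4, 5, 6]
X, N = 4, 4

arr = np.asarray(choice)

# Fancy indexing: wrap the indices manually with modulo.
by_index = arr[np.arange(X, X + N) % len(arr)]

# np.take with mode='wrap' does the wrapping itself and accepts a plain list.
by_take = np.take(choice, np.arange(X, X + N), mode='wrap')

print(by_index.tolist())
print(by_take.tolist())
```

Both calls return numpy arrays, so call .tolist() if a plain Python list is needed.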

Is there a way to get the index of a percentage of a cumulative sum from a sorted list?

Given a sorted list of real numbers, e.g.
x = range(20)
The task is to find the first index of the X% of the cumulative sum of the list, e.g.
def compute_cumpercent(lint, percent):
    break_point = sum(lint) * percent
    mass = 0
    for i, c in enumerate(lint):
        if mass > break_point:
            return i
        mass += c
To find the index of the number in the input list that is less than and closest to 25% of the cumulative sum:
>>> compute_cumpercent(x, 0.25)
11
Firstly, is there a mathematical name for such a function?
Other than doing it with the simple loop as above, is there a way to do the same with numpy or some bisect or otherwise?
Assume that input list is always sorted.
Something like this maybe?
import numpy as np
x = range(20)
percent = 0.25
cumsum = np.cumsum(x)
break_point = cumsum[-1] * percent
np.argmax(cumsum >= break_point) + 1 # 11
import numpy as np
x = np.arange(20)
Percent = 25
CumSumArray = np.cumsum(x)
ValueToFind = CumSumArray[-1] * Percent / 100
# np.argmax returns a scalar, so no [0] indexing is needed
Idx = np.argmax(CumSumArray > ValueToFind) - 1 # 9
Following this hint, one can use searchsorted to find the index of the element that is closest to (and below) a percentile/quantile value.
See the example below:
import numpy as np
def find_index_left(xs, v):
    return np.searchsorted(xs, v, side='left') - 1
def find_index_quantile(xs, q):
    v = np.quantile(xs, q)
    return find_index_left(xs, v)
xs = [5, 10, 11, 15, 20]
assert np.quantile(xs, 0.9) == 18.0
assert find_index_left(xs, 18) == 3 # zero-based index of the fourth element
assert find_index_quantile(xs, 0.9) == 3
Note that xs has to be sorted.
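The same searchsorted idea can be applied directly to the cumulative sum from the question. The sketch below assumes the values are non-negative (so the cumulative sum is itself sorted); the function name cumpercent_index is made up for illustration:

```python
import numpy as np


def cumpercent_index(values, percent):
    """Index at which the question's loop would stop.

    Assumes values are non-negative, so their cumulative sum is sorted.
    """
    cumsum = np.cumsum(values)
    break_point = cumsum[-1] * percent
    # First position whose cumulative sum strictly exceeds break_point ...
    first_over = np.searchsorted(cumsum, break_point, side='right')
    # ... and the loop in the question reports the index one past that element.
    return int(first_over) + 1


print(cumpercent_index(range(20), 0.25))
```

Unlike the loop, this sketch never returns None: if the threshold is never exceeded it returns an index past the end of the input, so callers may want to check for that case.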

Randomly generating 3-tuples with distinct elements in python

I am trying to generate 3-tuples (x, y, z) in Python such that no two of x, y and z have the same value. Furthermore, the variables x, y and z can be defined over separate ranges (0, p), (0, q) and (0, r). I would like to be able to generate n such tuples. One obvious way is to call random.random() for each variable and check every time whether any two of the values are equal. Is there a more efficient way to do this?
You can write a generator that yields the desired elements, for example:
import itertools
import random
def product_no_repeats(*args):
    for p in itertools.product(*args):
        if len(set(p)) == len(p):
            yield p
and apply reservoir sampling to it:
def reservoir(it, k):
    ls = [next(it) for _ in range(k)]
    for i, x in enumerate(it, k + 1):
        # x is the i-th item overall; keep it with probability k/i
        j = random.randrange(i)
        if j < k:
            ls[j] = x
    return ls
xs = range(0, 3)
ys = range(0, 4)
zs = range(0, 5)
size = 4
print(reservoir(product_no_repeats(xs, ys, zs), size))
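For comparison, the "obvious" approach from the question can be sketched with plain rejection sampling. This is a different technique from the reservoir sampling above and, unlike it, may return the same tuple more than once. The function name and the use of integer ranges are assumptions made for illustration:

```python
import random


def random_distinct_triples(p, q, r, n):
    """Return n tuples (x, y, z) with x in [0, p), y in [0, q), z in [0, r)
    whose three entries are pairwise different. Tuples may repeat.

    Assumes at least one valid tuple exists; otherwise this loops forever.
    """
    triples = []
    while len(triples) < n:
        t = (random.randrange(p), random.randrange(q), random.randrange(r))
        # Reject any draw in which two coordinates coincide.
        if len(set(t)) == 3:
            triples.append(t)
    return triples


random.seed(1)
print(random_distinct_triples(3, 4, 5, 4))
```

When the ranges are large, rejections are rare and this avoids enumerating the whole product, which the generator-based version has to walk through once.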

How do you make this code more pythonic?

Could you please tell me how I can make the following code more Pythonic?
The code is correct. Full disclosure: it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use Newton's algorithm on the two data sets to fit a logistic hypothesis. But they use MATLAB and I'm using scipy.
For example, one question I have: the matrices kept rounding to integers until I initialized one value to 0.0. Is there a better way?
Thanks
import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig
x = matrix( '0.0;0;1' )
y = 11
grad = matrix( '0.0;0;0' )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0' )
# run until convergence=6or7
for i in range(1, 6):
    #reset
    grad = matrix( '0.0;0;0' )
    hess = matrix('0.0,0,0;0,0,0;0,0,0')
    xfile = open("q1x.dat", "r")
    yfile = open("q1y.dat", "r")
    #over whole set=99 items
    for i in range(1, 100):
        xline = xfile.readline()
        s= xline.split(" ")
        x[0] = float(s[1])
        x[1] = float(s[2])
        y = float(yfile.readline())
        hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))
        for j in range(0,3):
            grad[j] = grad[j] + (y-hypoth)* x[j]
            for k in range(0,3):
                hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])
    theta = theta - inv(hess)*grad #update theta after construction
    xfile.close()
    yfile.close()
print "done"
print theta
One obvious change is to get rid of the "for i in range(1, 100):" and just iterate over the file lines. To iterate over both files (xfile and yfile), zip them, i.e. replace that block with something like:
import itertools
for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split(" ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...
This assumes the file is 100 lines long, i.e. you want the whole file. If you're deliberately restricting to the first 100 lines, you could use something like:
for i, xline, yline in itertools.izip(range(100), xfile, yfile):
However, it's also inefficient to iterate over the same file 6 times. It is better to load it into memory in advance and loop over it there, i.e. outside your loop, have:
xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
data = zip([line.split(" ")[1:3] for line in xfile], map(float, yfile))
And inside just:
for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
    ...
from numpy import matrix, zeros
x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
    grad = matrix(zeros([3,1]))
    hess = matrix(zeros([3,3]))
    [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
    for xline, yline in zip(xfile, yfile):
        x.transpose()[0,:2] = [map(float, xline.split(" ")[1:3])]
        y = float(yline)
        # keep the minus sign of the question's sigmoid
        hypoth = 1 / (1 + math.exp(-(theta.transpose() * x)))
        grad += (y - hypoth) * x
        hess -= hypoth * (1 - hypoth) * x * x.transpose()
    # same update as in the question: theta - inv(hess)*grad
    theta -= inv(hess) * grad
print "done"
print theta
the matrixes kept rounding to integers until I initialized one value
to 0.0. Is there a better way?
At the top of your code:
from __future__ import division
In Python 2 without this import, dividing two integers always returns an integer; you only get a float if at least one operand is a floating point number. In Python 3 (and with the __future__ division import in Python 2), division works more how we humans might expect it to.
If you want integer division to return an integer after importing division from __future__, use a double slash, //. That is
from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2 # prints 0.5
print 5/2 # prints 2.5
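On the rounding itself: a likely cause (an assumption, since the thread does not confirm it) is that a matrix built only from integer literals gets an integer dtype, so floats assigned into it are truncated. A minimal sketch using plain numpy arrays with an explicit float dtype:

```python
import numpy as np

# Built from integer literals only: numpy infers an integer dtype.
int_vec = np.array([[0], [0], [1]])
int_vec[0, 0] = 0.7   # the fractional part is silently dropped

# Requesting float explicitly avoids the truncation.
float_vec = np.array([[0], [0], [1]], dtype=float)
float_vec[0, 0] = 0.7

# np.zeros defaults to float64, which suits grad, hess and theta.
theta = np.zeros((3, 1))

print(int_vec.dtype, int_vec[0, 0])
print(float_vec.dtype, float_vec[0, 0])
```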
You could make use of the with statement.
The code that reads the files into lists could also be drastically simpler:
for line in open("q1x.dat", "r"):
    x = map(float,line.split(" ")[1:])
y = map(float, open("q1y.dat", "r").readlines())
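A sketch of how the with statement could be combined with zip to read both files in one pass. The function name load_data is made up for illustration, and the file layout (whitespace-separated floats per line in the x file, one float per line in the y file) is an assumption based on the question's parsing code:

```python
def load_data(x_path, y_path):
    """Read features and labels from two parallel text files.

    Assumes whitespace-separated floats on each line of x_path and a single
    float per line in y_path.
    """
    xs, ys = [], []
    # Both files are closed automatically when the with block exits.
    with open(x_path) as xfile, open(y_path) as yfile:
        for xline, yline in zip(xfile, yfile):
            # split() without arguments ignores leading and repeated spaces.
            xs.append([float(v) for v in xline.split()])
            ys.append(float(yline))
    return xs, ys
```

It would be called once, before the Newton iterations, for example as xs, ys = load_data("q1x.dat", "q1y.dat"), so that the files are not reopened on every pass.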
