Using numpy array as test cases for looped Monte Carlo - python

I have what I'm sure is a very simple problem to solve but I can't seem to get it right and have not been able to search for an answer, likely because I am using the wrong vocabulary, etc.
My goal is to have an array, called array_1, which has different 'test cases'. For each of the elements in that array, I want to run a Monte Carlo, with the current element being the input to a function. I would like to get the mean of all of the results (num_samples) and store that into another array, which will be an array of the 'means' to be easily visualized. Hard coding for each of the conditions is easy but I am looking for a more automated method. Any help would be appreciated. What I'm currently working with is below:
import numpy as np
num_samples = 5
array_1 = ([1,2,3])
array_2 = np.zeros(num_samples)
array_3 = ([])
def func_add(a, b):
    return a + b + 2
#def func_append(c):
for j in array_1:
    for i in range(num_samples):
        r = np.random.randint(1,2)
        array_2[i] = func_add( j, r)
    c = np.mean(array_2) #this value I want to put in a new array to have an 'array of means'
    #print(b)
    array_3 = np.append(array_3, c)
print(array_2)
print(np.mean(array_2))
print(c)
print(array_3)
Which returns:
[6. 6. 6. 6. 6.]
6.0
6.0
[4. 5. 6.]
EDIT 2: The results for array_3 seem to make sense, but now I'm curious why array_2 only contains 6's. In the first pass of the loops, j = 1 and r = 1, so the function should return 4 and place that in array_2. Or do the values all get overwritten by the last pass of the for loop? That would also make sense, I think.
Thank you in advance.
EDIT: I think the problem may be that I'm pulling the value from array_1 but want to put the mean from the processing into the first index of some array (meaning I might have to create a third array to hold those values?).

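Outside the loop, you can instantiate c as an empty numpy array. Inside the for loop, you append the mean to the end of c. Note that np.append returns a new array rather than modifying c in place, so the result has to be assigned back:
c = np.append(c, np.mean(array_2))
Thus, the array c grows with each iteration until it contains all results.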
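For what it's worth, here is a minimal sketch of the whole loop, assuming the goal is one mean per test case. The helper name monte_carlo_means and the use of np.random.default_rng are my own choices, not from the question. It draws from integers(1, 3) on the assumption that a random 1 or 2 was intended: np.random.randint(1, 2) excludes the upper bound, so it always returns 1, which is why the means in the question came out as exactly 4, 5 and 6. As for EDIT 2: array_2 is reused on every pass of the outer loop, so after the loop it only holds the samples for the last test case.

```python
import numpy as np

def func_add(a, b):
    return a + b + 2

def monte_carlo_means(test_cases, num_samples, rng):
    """Return one sample mean per test case (hypothetical helper)."""
    means = []
    for case in test_cases:
        # The upper bound is exclusive, so this draws 1 or 2.
        draws = rng.integers(1, 3, size=num_samples)
        # func_add broadcasts over the whole array of draws at once.
        samples = func_add(case, draws)
        means.append(samples.mean())
    # Convert once at the end instead of calling np.append in the loop.
    return np.array(means)

rng = np.random.default_rng(0)  # seeded only to make runs repeatable
array_3 = monte_carlo_means([1, 2, 3], num_samples=5, rng=rng)
print(array_3)
```

Collecting the means in a Python list and converting once avoids copying the array on every iteration, which is what np.append does.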

Related

Building for loops on 2 arrays

Let's say I have 2 arrays like this:
array_1 = [2,4,5,1,4]
array_2 = [9,5,6,7, 4]
If I want to perform an operation on each pair of elements, I would proceed as below:
import numpy as np
Res = [np.nan] * (len(array_1) * len(array_2))
for i in range(len(array_1)):
    for j in range(len(array_2)):
        some_cal = i+j
        Res[i + j * len(array_1)] = some_cal
However, my actual calculation is a little different.
Let's say the anchor element of the 1st array is 5 and that of the second array is 7. So I want to perform calculations only on elements as highlighted below
Can you please suggest how I can modify my earlier loop to efficiently perform calculations only on the highlighted combinations?
Thanks for your time.

Appending values from one array to another array of unknown dimension

I have an array A of dimension (654 x 2). Within a loop, I have an 'if' statement. If the condition is true for a value of 'i', I need to append the values of the 'i'th row of A to a separate array B. That means the dimension of B is not known to me beforehand. So how do I initialize such an array B in Python? If this is not the right procedure, please suggest alternative ways to do the same.
You could initialize B as empty array:
B = []
Then, when needed, just append the value to it:
B.append( [x,y] )
You do not provide any code to start from; please read "How to create a Minimal, Complete, and Verifiable example" to learn how to ask a question.
From the little information you've provided, you could try something like this:
B = []
for i in range(len(A)):
    if i % 2 == 0: # example of condition
        B += [ A[i] ]
print(B)
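To make the two approaches concrete, here is a small sketch, assuming A is a NumPy array and the condition can be evaluated per row. The stand-in data and the condition (first column divisible by 4) are made up for illustration. It shows the list-then-convert pattern from the answers above, and a boolean mask that selects the same rows without an explicit loop.

```python
import numpy as np

# Stand-in for the (654, 2) array from the question.
A = np.arange(20).reshape(10, 2)

# Approach 1: collect matching rows in a list, convert once at the end.
rows = []
for i in range(len(A)):
    if A[i, 0] % 4 == 0:  # example condition, not from the question
        rows.append(A[i])
B = np.array(rows)

# Approach 2: build a boolean mask over the rows and index with it.
mask = A[:, 0] % 4 == 0
B_masked = A[mask]

print(B.shape)
print(np.array_equal(B, B_masked))
```

Either way, the size of B never has to be known in advance.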

Update numpy array with calculations done on a list of arrays

I have a list of length 50 filled with arrays of length 5. I am trying to calculate the distance between each array in the list and update a numpy array with the values.
The distance calculation is just taking the square root of the sum of the squared distance between each element in the arrays.
When I try:
primaryCustomer = np.zeros(shape = (50,50))
for customer in range(0,50):
    for pair in range(0,50):
        thisCustomer = [0 for i in range(51)]
        if customer == pair:
            thisCustomer[pair] = 999
        else:
            calculateScores = (((Customer[customer][0]-Customer[pair][0])**2
                              + (Customer[customer][1]-Customer[pair][1])**2
                              + (Customer[customer][2]-Customer[pair][2])**2
                              + (Customer[customer][3]-Customer[pair][3])**2
                              + (Customer[customer][4]-Customer[pair][4])**2 )**(0.5))
            thisCustomer[pair] = calculateScores
    np.append(primaryCustomer, thisCustomer)
a couple of things happen:
The final iteration of thisCustomer returns a list of all zeros, except the final element of 999 (corresponding to the 'if' portion of the statement above). So, I know it can update the list, but it doesn't do it in the 'else' portion.
I want the 'primaryCustomer' array to update with the Customer as the index and all of the calculated scores with each pair as the row values, but it doesn't seem to update at all
Any changes I make, like trying to treat thisCustomer in the loop as an array instead of a list and append to it, end up fixing one area but screwing up other ones even worse.
Here's how I'm getting the Customer data:
Customer = [[0,0,0,0,0] for i in range(51)]
for n in range(51):
    Customer[n] = np.ones(5)
    Customer[n][randint(2,4):5] = 0
    np.random.shuffle(Customer[n])
I know there might be packaged ways to do this, but I'm trying to understand how things like KNN work in the background, so I'd like to keep to figuring out the logic in loops like above. Beyond that, any help would be greatly appreciated.
I think this is what you are going for, but correct me if I'm wrong:
import numpy as np
from random import randint
Customer = [[0, 0, 0, 0, 0] for i in range(51)]
for n in range(51):
    Customer[n] = np.ones(5)
    Customer[n][randint(2, 4):5] = 0
    np.random.shuffle(Customer[n])
primaryCustomer = np.zeros(shape=(50, 50))
for customer in range(0, 50):
    thisCustomer = [0 for i in range(51)]
    for pair in range(0, 50):
        if customer == pair:
            primaryCustomer[customer][pair] = 999
        else:
            calculateScores = (((Customer[customer][0] - Customer[pair][0]) ** 2
                                + (Customer[customer][1] - Customer[pair][1]) ** 2
                                + (Customer[customer][2] - Customer[pair][2]) ** 2
                                + (Customer[customer][3] - Customer[pair][3]) ** 2
                                + (Customer[customer][4] - Customer[pair][4]) ** 2) ** 0.5)
            primaryCustomer[customer][pair] = calculateScores
print(primaryCustomer)
I think the main issue with your loops was the location of thisCustomer = [0 for i in range(51)]; I think you meant to have it up one more level, like in mine. I don't see any need for this line, though, so I changed thisCustomer[pair] to write directly to primaryCustomer[customer][pair] instead. That removes the need to rebuild thisCustomer on every loop, and taking the line out entirely would speed up your program and reduce memory usage.
Sample output:
[[999. 2.23606798 1. ... 2. 0.
1.73205081]
[ 2.23606798 999. 2. ... 1. 2.23606798
1.41421356]
[ 1. 2. 999. ... 1.73205081 1.
2. ]
...
[ 2. 1. 1.73205081 ... 999. 2.
1.73205081]
[ 0. 2.23606798 1. ... 2. 999.
1.73205081]
[ 1.73205081 1.41421356 2. ... 1.73205081 1.73205081
999. ]]
A couple things to notice at first.
primaryCustomer[a][b] = primaryCustomer[b][a] because you are using a distance metric. This means that the ranges on your two for loops can be reset:
numCustomers = 51
primaryCustomer = np.zeros(shape = (numCustomers, numCustomers))
for customerA in range(numCustomers-1):
    for customerB in range(customerA+1, numCustomers):
        primaryCustomer[customerA][customerB] = dist(customerA,customerB)
primaryCustomer += np.transpose(primaryCustomer)
Note: you can change the second for loop's range to also start from 0 to keep your original loop structure, but then you will need to remove the transposition line. You can also write primaryCustomer[a][b] = primaryCustomer[b][a] = dist(a,b) if you'd rather not use the transposition but still avoid unnecessary calculations.
primaryCustomer = np.zeros(shape = (50,50)) I'm assuming is meant to store the distance between two customers. However, it looks like you have 51 customers, not 50?
You should think about calculating the distances in a more general way. i.e. how could you make the distance calculation work independent of the list size?
Why are you creating an initial 2D array of 0s to store the distance, and then appending to it? The creation of the thisCustomer list doesn't seem necessary, and in fact the solution posted by Reedinationer initializes it but never even uses it. Also, as someone stated already, that's not how np.append works: it returns a new array and leaves its input unchanged. You're best off directly modifying the distance matrix you create originally.
Why is primaryCustomer[a][a] = 999? Shouldn't the distance between a list and itself be 0? If you really do want to have it be 999, I encourage you to figure out how to modify the code block above to account for that.
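To tie these points together, here is one possible sketch, still loop-based as the question asks. The names dist and distance_matrix and the diagonal parameter are my own; note that this dist takes the two vectors themselves rather than two indices. It works for vectors of any length, fills both halves of the symmetric matrix in one pass, and sizes the matrix from the data instead of hard-coding 50 or 51. The random 0/1 customers are stand-in data, not the question's generator.

```python
import numpy as np

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.sqrt(np.sum((u - v) ** 2))

def distance_matrix(points, diagonal=0.0):
    """Pairwise distances; `diagonal` is the value for self-distance."""
    n = len(points)
    out = np.zeros((n, n))
    for a in range(n - 1):
        for b in range(a + 1, n):
            # The metric is symmetric, so compute once and store twice.
            out[a, b] = out[b, a] = dist(points[a], points[b])
    np.fill_diagonal(out, diagonal)
    return out

rng = np.random.default_rng(0)
customers = rng.integers(0, 2, size=(51, 5))  # stand-in 0/1 data
primary = distance_matrix(customers, diagonal=999)
print(primary.shape)
```

Passing diagonal=999 reproduces the sentinel from the question; leaving it at the default gives the usual zero self-distance.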

Modify a part of numpy array based upon a condition

I have a numpy array with zeros and non-zeros and shape (10,10).
To a subpart of this array I need to add a certain value, where initial value is not zero.
a[2:7,2:7] += 0.5 #But with a condition that a[a!=0]
Currently, I do it in a rather cumbersome way, by first making a copy of the array and modifying the second array consistently and then copying back to the first.
b = a.copy()
b[b!=0] = 1
b[2:7,2:7] *= 0.5
b[b ==1] =0
a += b
Is there more elegant way to achieve this?
As Thomas Kühn correctly wrote in the comment, it's good enough to create a reference to that subpart of the array and modify it. So the following does the job:
b = a[2:7,2:7]
b[b!=0] += 0.5
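A small check of why this works, assuming a has a float dtype: basic slicing returns a view onto the same memory, so writing through the view changes a itself. The array contents here are made up for illustration.

```python
import numpy as np

a = np.zeros((10, 10))
a[3, 3] = 1.0   # non-zero, inside the 2:7 block
a[8, 8] = 2.0   # non-zero, outside the block

sub = a[2:7, 2:7]      # a view, not a copy
sub[sub != 0] += 0.5   # only non-zero entries inside the block change

print(np.shares_memory(a, sub))
print(a[3, 3], a[4, 4], a[8, 8])
```

The zero entry at a[4, 4] and the non-zero entry outside the block are both left untouched.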

Min-Max difference in continuous part of certain length within a np.array

I have a numpy array of values like this:
a = np.array((1, 3, 4, 5, 10))
In this case the array has length 5. Now I want to know the difference between the lowest and highest value in the array, but only within a certain continuous part of the array, for example with length 3.
So in this case it would be the difference between 4 and 10, so 6. It would also be nice to have the index of the starting point of the continuous part (in the above example that would be 2). So something like this:
def f(a, length_of_part):
    ...
    return (max_difference, starting_index)
I know I could iterate over sliced parts of the array, but for my actual purpose I have ~150k arrays of length 1500, so that would take too long.
What would be an easy and quick way of doing this?
Thanks in advance!
This is a bit tricky to do in a vectorised way in NumPy. One option is to use numpy.lib.stride_tricks.as_strided, which requires care because it allows access to arbitrary memory. Here's an example for a window size of k = 3:
>>> import numpy
>>> k = 3
>>> shape = (len(a) - k + 1, k)
>>> b = numpy.lib.stride_tricks.as_strided(
...     a, shape=shape, strides=(a.itemsize, a.itemsize))
>>> moving_ptp = numpy.ptp(b, axis=1)
>>> start_index = moving_ptp.argmax()
>>> moving_ptp[start_index]
6
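As a possibly safer alternative, here is a sketch using numpy.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or newer is available. It builds the same windows as the as_strided call but checks the shape for you. The function name max_window_range is my own.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_window_range(a, length_of_part):
    """Return (max_difference, starting_index) over all contiguous windows."""
    # Shape (len(a) - length_of_part + 1, length_of_part), a read-only view.
    windows = sliding_window_view(a, length_of_part)
    ranges = windows.max(axis=1) - windows.min(axis=1)
    start = int(ranges.argmax())
    return ranges[start].item(), start

a = np.array((1, 3, 4, 5, 10))
max_difference, starting_index = max_window_range(a, 3)
print(max_difference, starting_index)  # 6 2
```

For many arrays stacked into one 2-D array, the same idea should carry over by passing axis=1 to sliding_window_view and reducing over the last axis, though I have not timed it on data of the size mentioned in the question.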
