Randomly select a length of numbers from numpy array - python

I have a number of data files, each containing a large number of data points.
After loading the file with numpy, I get a numpy array:
f=np.loadtxt("...-1.txt")
How do I randomly select a run of x consecutive numbers, keeping them in their original order?
For example:
f = [1,5,3,7,4,8]
if I wanted to select a random run of 3 data points, the output could be:
1,5,3, or
3,7,4, or
5,3,7, etc.

Pure logic will get you there.
For a list f and a slice length x, the valid starting points of your random slices are limited to 0 through len(f)-x. With x = 3:
f = [1,5,3,7,4,8]   # valid starting indices: 0, 1, 2, 3
So a valid starting point can be selected with random.randrange(len(f)-x+1) (the +1 is needed because randrange, like range, excludes its upper bound).
Store the random starting point in a variable start and slice your array with [start:start+x], or be creative and apply a second slice after the first:
import random
result = f[random.randrange(len(f)-x+1):][:x]
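The same idea written for a NumPy array, as a minimal sketch: it assumes f is 1-D and uses NumPy's Generator API (np.random.default_rng) instead of the random module. The helper name random_window is hypothetical. Because basic slicing returns a view, no copy of f is made.

```python
import numpy as np

def random_window(f, x, rng=None):
    """Return a random contiguous run of x elements from 1-D array f, order kept.

    random_window is a hypothetical helper name used for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f)
    if not 0 < x <= len(f):
        raise ValueError("x must be between 1 and len(f)")
    # valid starts are 0 .. len(f) - x; integers() excludes its upper bound
    start = rng.integers(0, len(f) - x + 1)
    return f[start:start + x]  # basic slice: a view, no copy

f = np.array([1, 5, 3, 7, 4, 8])
print(random_window(f, 3))
```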

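Building on usr2564301's answer, you can also take out only the elements you need in one go by indexing with a range:
start = random.randrange(len(f)-x+1)
result = f[range(start, start+x)]
Note that this only works when f is a NumPy array (a Python list cannot be indexed with a range), and that NumPy turns the range into an index array and returns a copy. With a NumPy array, the plain slice f[start:start+x] already returns a view without building any intermediate array, so it is usually the cheaper option.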

Related

Random partitioning given array with given bin sizes

How to randomly partition given array with given bin sizes?
Is there an inbuilt function for that? For example, I want something like
function(12,(2,3,3,2,2)) to output five partitions of the numbers from 1 to 12 (or 0 to 11, doesn't matter). So the output may be a list like [[3,4],[7,8,11],[12,1,2],[5,9],[6,10]] (or some other efficient data structure). The first argument of the function may be just a number n, in which case it will consider np.arange(n) as the input; otherwise it may be any other ndarray.
Of course we can randomly permute the list and then pick the first 2, next 3, next 3, next 2 and last 2 elements. But does there exist something more efficient?
The numpy.partition() function means something different (it performs a single partitioning step, as in quicksort), and I also couldn't find any such function in the numpy.random submodule.
Try the following solution:
import numpy as np
from typing import List

def func(a: int, b: List[int]):
    # a is the number of elements, b the list of bin sizes (they should sum to a)
    indx = np.random.permutation(a)  # randomly arranged indices 0..a-1
    # split points are the cumulative bin sizes, without the final total
    return np.split(indx, np.cumsum(b)[:-1])
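The question also allows an arbitrary ndarray as the first argument. A minimal sketch covering both cases, assuming the bin sizes add up to the number of elements; the name random_partition is hypothetical. Generator.permutation accepts either an int (it permutes np.arange(n)) or an array (it returns a shuffled copy), and np.split cuts at the running totals of the bin sizes, so the work is one shuffle plus one split.

```python
import numpy as np

def random_partition(a, sizes, rng=None):
    """Randomly split arange(a) (if a is an int) or the array a into parts of the given sizes.

    random_partition is a hypothetical helper name used for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    shuffled = rng.permutation(a)  # int -> shuffled arange(a); array -> shuffled copy
    sizes = np.asarray(sizes)
    if sizes.sum() != len(shuffled):
        raise ValueError("bin sizes must add up to the number of elements")
    # split points are the running totals, excluding the final total
    return np.split(shuffled, np.cumsum(sizes)[:-1])

parts = random_partition(12, (2, 3, 3, 2, 2))
print([p.tolist() for p in parts])
```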

Efficient block bootstrap of integer sequences

I'm trying to block bootstrap samples for Monte-Carlo simulation and need to generate a large array of index values (integers) containing blocks in Python. I need this to be very fast but cannot figure out how to vectorize it.
I want to generate a large number of paths, where each path contains a sequence of integers of length L. Suppose I have an array of integers (representing an index) from 0 to N, from which I will sample randomly to construct each path. When I sample, I choose a random integer i from 0 to N, and then populate the path with i, i+1, ..., i+w-1 for some window length w. I then choose another random starting index value and continue to populate the path with the new window, repeating until the path is fully populated. I do this for all paths.
I'm wondering if there is a way to speed this method up without having to loop over each path, since I intend to generate a very large number of paths (millions).
An example of my for loop method is below:
import random
import numpy as np

paths = 10000
path_length = 500
window_length = 5
index = np.arange(0, 5000)
simulated_values = np.zeros([paths, path_length])
n_windows = int(np.ceil(path_length / window_length))
for i in range(0, paths):
    temp = []
    for n in range(0, n_windows):
        # last valid start is len(index) - window_length, so the window stays inside index
        random_start = random.randint(0, len(index) - window_length)
        temp.extend(range(random_start, random_start + window_length))
    # trim in case path_length is not a multiple of window_length
    simulated_values[i, :] = temp[:path_length]
print(simulated_values)
I found a solution in a Python package called recombinator. It seems to be fast enough, and it supports running on a GPU for further speed:
https://pypi.org/project/recombinator/
import numpy as np
from recombinator.block_bootstrap import circular_block_bootstrap
index = np.arange(0,5000)
path_length = 500
window_length = 5
temp = circular_block_bootstrap(index,block_length=window_length,replications=1000000,replace=True, sub_sample_length=path_length)
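If adding a dependency is not an option, the loop in the question can also be vectorized in plain NumPy: draw every window start at once, add the offsets np.arange(window_length) by broadcasting, and reshape to one row per path. This is a sketch of that swapped-in technique, not what recombinator does internally; it samples non-circular blocks (starts are limited so each window stays inside index), and the function name block_bootstrap_paths is hypothetical. Memory grows as paths × n_windows × window_length, so millions of paths may need to be generated in chunks.

```python
import numpy as np

def block_bootstrap_paths(index, paths, path_length, window_length, rng=None):
    """Build a (paths, path_length) array of values from index, sampled in contiguous blocks.

    block_bootstrap_paths is a hypothetical helper name used for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index)
    n_windows = -(-path_length // window_length)  # ceiling division
    # one random start per window per path; the whole window must fit inside index
    starts = rng.integers(0, len(index) - window_length + 1, size=(paths, n_windows))
    # (paths, n_windows, 1) + (window_length,) -> (paths, n_windows, window_length)
    positions = starts[:, :, None] + np.arange(window_length)
    # flatten the windows into one row per path and trim any overshoot
    positions = positions.reshape(paths, -1)[:, :path_length]
    return index[positions]

index = np.arange(0, 5000)
simulated_values = block_bootstrap_paths(index, paths=10000, path_length=500, window_length=5)
print(simulated_values.shape)  # (10000, 500)
```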

Fredo and Array Update in python

I have an interview coming up with a company that uses a platform like hackerearth.com. I'm not sure how these tests work or how to write the code correctly. Could you help me with the following example?
This is an example problem from hackerearth.com. Should I account for the constraints in my code? Can I use a package like NumPy, or should I only use basic built-in calculations? Could you check my response and let me know what is wrong with it? Thank you so much!
Input Format:
First line of input consists of an integer N denoting the number of elements in the array A.
Second line consists of N space separated integers denoting the array elements.
Output Format:
The only line of output consists of the value of x.
Input Constraints:
1<N<100
1<A[i]<100
Explanation:
The initial sum of the array is 1+2+3+4+5 = 15.
When we update all elements to 4, the sum of the array is 20, which is greater than 15.
Note that if we had updated the array elements to 3, the sum would be 15, which is not greater than 15. So, 4 is the minimum value to which the array elements need to be updated.
# Write your code here
import numpy as np
A = [1, 2, 3, 4, 5]
for i in range(1, max(A)+1):
    old = sum(A)
    new = sum(i*np.ones(len(A)))
    diff = new-old
    if diff > 0:
        print(i)
        break
Well, this isn't Code Review Stack Exchange, but:
You don't say how to calculate x. It seems to have something to do with finding an average value, but no one can judge your code without knowing what it's trying to do. A web search suggests it is this:
Fredo is assigned a new task today. He is given an array A containing N integers. His task is to update all elements of the array to some minimum value x, that is, A[i] = x for 1 ≤ i ≤ N, such that the sum of this new array is strictly greater than the sum of the initial array. Note that x should be as small as possible such that the sum of the new array is greater than the sum of the initial array.
Given that the task starts by accepting input, it's important that your program does this part.
N = int(input()) # you can put a prompt string in here, but may conflict with limited output
A = list(map(int,input().split()))
# might need input checks
# might need range checks
# might check that A has exactly N values
you don't need to recalculate old = sum(A) every time around your search loop
calculation of new doesn't need a sum at all - it's just new = i * len(A)
there's no point in checking values of i at or below min(A)
your search will fail if all values of A are the same (try it), because you never look above max(A)
These remarks apply to your approach; a more efficient search would be binary chop, and there is also a mathematical way to go straight to the answer from sum(A) without any searching:
x = sum(A) // len(A) + 1
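A sketch of the binary chop mentioned above, cross-checked against the closed-form answer; both function names are hypothetical. The search keeps lo as a value that never works (min(A), since min(A) * N ≤ sum(A)) and hi as a value that always works (max(A) + 1, since every element is at most max(A)), then narrows the gap.

```python
def min_update_value_search(A):
    """Smallest integer x with x * len(A) > sum(A), found by binary chop (hypothetical name)."""
    total, n = sum(A), len(A)
    lo, hi = min(A), max(A) + 1  # lo never works, hi always works
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * n > total:
            hi = mid  # mid works: the answer is mid or smaller
        else:
            lo = mid  # mid fails: the answer is above mid
    return hi

def min_update_value_formula(A):
    """Closed form: one more than the floor of the average (hypothetical name)."""
    return sum(A) // len(A) + 1

print(min_update_value_search([1, 2, 3, 4, 5]))   # 4
print(min_update_value_formula([1, 2, 3, 4, 5]))  # 4
```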
You don't need numpy or looping for this. Get the average of the array elements, then get the next higher integer from this.
from math import floor

N = 5
A = [1, 2, 3, 4, 5]
total = sum(A)
avg = total / N  # not checking for zero-divide because the constraints say N > 1
x = floor(avg + 1)
print(x)
Adding 1 is necessary to make the new sum greater than the original sum when the average is an exact integer (e.g. 15/5 == 3).

Python how to assign values to a certain row of a matrix?

In Python, I created a 10 x 20 zero matrix called X:
X = numpy.zeros((10, 20))
I have another 50 x 20 matrix called A.
I want to let the 4th row of matrix X take the value of the 47th row of matrix A.
How can I write this in Python?
Note: if X were a list, I could just write X.append(). However, X is not a list here, so how can I do this?
Or, if I just have a list that contains 20 numbers, how can I let the 4th row of matrix X equal to that list of 20 numbers?
Thank you!
I'll try to answer this. The syntax for selecting an entire row in NumPy is
M[row_number, :]
The : part is shorthand for selecting every column of that row.
You can also select from some index to the end of the row with m:, where m is a known index.
If you want to select between two known indices, use
M[row_number, m:n]
where m < n (this selects columns m up to, but not including, n).
You can assign one row or column of a 2D array to another only if they have the same length.
I won't give you the exact piece of code that you'll need, but hopefully now you can figure it out using the above piece of code.
I also suggest playing around with all kinds of matrices and their operations, such as replacing elements, columns, and rows, as well as matrix multiplication, until you get the hang of it.
Some useful commands include:
numpy.random.rand(m, n)  # creates an m x n matrix of pseudo-random numbers in [0, 1)
2 * numpy.random.rand(m, n) - 1  # creates an m x n matrix of pseudo-random numbers in [-1, 1)
numpy.eye(m)  # creates an m x m identity matrix
numpy.ones((m, n))  # creates an m x n matrix of ones
And make sure to read through the docs.
Good luck! And let your Python journey be a fun one. :)
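For reference, a minimal sketch of both assignments from the question. NumPy indexing is zero-based, so the 4th row of X is X[3] and the 47th row of A is A[46]. A is filled with placeholder random values here because the question does not give its contents.

```python
import numpy as np

X = np.zeros((10, 20))
A = np.random.rand(50, 20)  # placeholder data for the 50 x 20 matrix

# 4th row of X (index 3) takes the values of the 47th row of A (index 46)
X[3, :] = A[46, :]
print(np.array_equal(X[3], A[46]))  # True

# a plain list of 20 numbers can be assigned to a row the same way
values = list(range(20))
X[3, :] = values
print(X[3])
```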

Calculate a discrete mean in python

I have a set of data points, and I have written a program that takes every n points from the data set, sums them, and puts the sums in a new list. From that I can make a simple bar plot.
Now I'd like to calculate a discrete mean for my new list.
The formula I'm using is: $t_{av} = \frac{1}{\mathrm{nsmp}} \sum_{i=n_l}^{n_u} N_i t_i$
Basically, I have nsmp bins, bin i contains N_i counts, t_i is the time of bin i, n_l is the first bin, and n_u is the last bin.
So if my list is this: [373, 156, 73, 27, 16],
I have 5 bins, and I have: t_av=1/5 (373*1+156*2+73*3+27*4+16*5)=218.4
Now I have run into a problem. I tried with this:
for i in range(0,len(L)):
    sr_vr = L[i]*i
    tsr=sr_vr/nsmp
Here nsmp is the number of bins, which I can set, and L is already calculated. Since range goes 0, 1, 2, 3, 4, I don't get the correct answer, because my first bin is multiplied by 0. If I use range(1,len(L)+1) instead, I get IndexError: list index out of range, because L[i]*i then starts at the second element of the list (index 1) and the last iteration asks for an element past the end of the list.
How do I correct this?
You can just use L[i]*(i+1) (assuming you stick with zero-based indexing).
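Applied to the loop in the question, a sketch of that fix also has to accumulate the products, because sr_vr = L[i]*i overwrites the previous value on every pass. Here nsmp is assumed to equal len(L), matching the worked example.

```python
L = [373, 156, 73, 27, 16]
nsmp = len(L)  # number of bins (assumed equal to len(L), as in the example)

total = 0
for i in range(len(L)):
    # zero-based index i corresponds to bin time t_i = i + 1
    total += L[i] * (i + 1)
tsr = total / nsmp
print(tsr)  # 218.4
```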
However, you can also use enumerate() to loop over indices and values together, and you can even pass 1 as the second argument so that the counting starts at 1 instead of 0.
Here is how I would write this:
tsr = sum(x * i for i, x in enumerate(L, 1)) / len(L)
Note that if you are on Python 2.x and L contains only integers, this will perform integer division. To get a float, just convert one of the arguments to a float (for example float(len(L))). You can also use from __future__ import division.
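If the counts are already in a NumPy array, the same weighted mean can be written as a dot product of the counts with the bin times 1..n. This is a sketch that assumes t_i = i with bins numbered from 1, as in the example.

```python
import numpy as np

L = np.array([373, 156, 73, 27, 16])
t = np.arange(1, len(L) + 1)  # bin times 1, 2, ..., n
tsr = np.dot(L, t) / len(L)   # (1/nsmp) * sum(N_i * t_i) with nsmp = len(L)
print(tsr)  # 218.4
```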
