python random sample for samples larger than population

I have a list of integers which represent the number of applications submitted per day for a 60 day period. I need to randomly generate a list of 288 integers that sum to the number of applications per day. I have the following code:
import random as r
issued = [1000,200,344...]
def random_sum_to(n, num_terms = None):
    num_terms = (num_terms or r.randint(2, n)) - 1
    a = r.sample(range(1, n), num_terms) + [0, n]
    list.sort(a)
    return [a[i+1] - a[i] for i in range(len(a) - 1)]
for i in issued:
    print(random_sum_to(i,288))
Here issued is the list of daily application totals. The code works well for totals greater than 288 but crashes for totals below 288. Reading other answers here, I saw that random.choice should be used instead, but I cannot figure out how to implement it correctly. Looking at the results, 0 never appears in the output, so that is likely the source of the problem. Any suggestions?

Frankly, all this zip and list-comprehension stuff might look clever, but why not use multinomial sampling?
It is a one-liner, really, and the sum is automatically, by definition, equal to N:
import numpy as np
t = np.random.multinomial(200, [1/288.]*288, size=1) # sample 288 numbers summed to 200
print(t)
print(sum(t[0]))
t = np.random.multinomial(1000, [1/288.]*288, size=1) # sample 288 numbers summed to 1000
print(t)
print(sum(t[0]))
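If NumPy is not available, the same idea can be sketched with the standard library alone: each of the N applications independently picks one of the 288 slots, and the per-slot counts then sum to N by construction. The helper name multinomial_counts and its rng parameter are illustrative assumptions, not part of the answer above.

```python
import random
from collections import Counter

def multinomial_counts(total, num_bins, rng=random):
    """Drop `total` items into `num_bins` equally likely bins and count them.

    A plain-Python stand-in for a uniform multinomial draw; the function
    name and the `rng` parameter are illustrative assumptions.
    """
    # Each item independently picks one bin index, with replacement.
    picks = rng.choices(range(num_bins), k=total)
    counts = Counter(picks)
    # Bins that were never picked get an explicit 0.
    return [counts[b] for b in range(num_bins)]

counts = multinomial_counts(200, 288)
print(len(counts), sum(counts))  # 288 200
```

Because the total is smaller than the number of bins here, many entries are necessarily 0, which is exactly the case that r.sample cannot handle.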

Obviously, if you want more than n numbers to add up to n, some of those numbers will have to be zero. Thus, you cannot pick distinct numbers as the separators, as you do with sample. Instead, draw the separators with replacement (here with randrange rather than choice), so that duplicates are allowed, add 0 and n as the start and end points, and take the differences.
def random_sum_to(n, num_terms = None):
    num_terms = (num_terms or r.randint(2, n)) - 1
    # randrange(n + 1) lets a separator equal n, so the last term can be 0 too
    a = sorted([r.randrange(n + 1) for _ in range(num_terms)])
    return [y-x for x, y in zip([0]+a, a+[n])]
This way, a possible result for random_sum_to(200, 288) might look like this:
[0, 2, 1, 0, 0, 0, 1, 0, 2, 2, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 2, 0, 0, 2, 0, 2, 1, 0, 1, 0, 0, 1, 0, 1, 0, 2, 0, 0, 1, 0, 1, 0, 2, 0, 0, 1, 4, 0, 1, 2, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 2, 0, 2, 1, 0, 0, 1, 1, 2, 0, 0, 4, 1, 0, 1, 1, 0, 2, 0, 1, 0, 2, 0, 2, 0, 1, 2, 0, 1, 2, 1, 1, 0, 0, 1, 0, 1, 1, 1, 2, 1, 1, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2, 2, 0, 0, 3, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 3, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 2, 1, 0, 0, 1, 1, 1, 0, 0, 2, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 1, 0, 1, 0, 1, 2, 0, 0, 0, 1, 2, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 2, 0, 1, 0, 0, 0, 4, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 3, 1, 0, 2, 0, 0, 1, 0, 1, 2, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 3, 3, 0, 1, 1, 0, 1, 0, 2, 0, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 2]

Related

How do i replace multiple consecutive parts of an array?

So the question revolves around character segmentation. My problem is the following:
I want to segment characters based on y-axis pixel counts, following this approach (in Python): source
What I have already done to get here:
read the image with io.imread
swap the axes with np.swapaxes
sum the values of each column (now row) -> this gives the y array
I got to the point where I have two arrays (both of them are exactly what I use):
x = [94, 72, 2, 2, 1, 66, 1, 13, 1, 16, 1, 8, 1, 5, 1, 47, 1, 1, 1, 3, 1, 17, 14, 87, 100]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y is the thresholded binary array of the y-axis (0 if the pixel count is < 1275, 1 otherwise).
x is the itertools groupby version of the y array, i.e. the lengths of its consecutive runs of 0s and 1s.
I have the average distance of the letters too, so I know which parts are wrongly segmented (according to x, the average is 28).
And this is the image I would like to segment; it has 4 letters, "a", "l", "m", "a":
(image: picture which I would like to segment)
So in theory, if I could somehow merge the parts where the number of ones is lower than the average, and turn the "separating" zeros into ones, I should get a list that is as long as the image is wide and has zeros only where it should.
If I use cv.line on the y array, it does segment the characters, drawing a red line where the array is 0, but it oversegments them.
(image: oversegmented image)
What I would like to do is modify, or just rebuild, the y array based on x.
I tried a lot of methods, but I just can't find an algorithm that goes over x, finds the wrong values, deletes the zeros in between, and modifies a list accordingly.
My best shot is this simple piece, which is nothing like my original idea:
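num = 0
betterarray = []
for i in range(len(y)):
    # fill a single 0 that sits between two 1s; the bounds check avoids an IndexError on the last element
    if num == 1 and y[i] == 0 and i + 1 < len(y) and y[i+1] == 1:
        betterarray.append(1)
    else:
        betterarray.append(y[i])
    num = y[i]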
It removes most of the bad segmentations that are only one column wide but, as I guessed, it also removes some good segmentations.

You should identify the wrongly segmented letters by comparing each peak segment (run of 1s) to the average segment length, and then modify the x array by combining adjacent peak segments that are smaller than the average.
def locate_oversegmentation(array, mask, avg):
    # note: mask is modified in place and also returned
    length = len(array)
    for i in range(length):
        # less than average peak
        if mask[i] == 1 and array[i] <= avg:
            # previous peak is less than avg
            if i - 2 >= 0 and array[i-2] <= avg:
                mask[i-1] = 1
            # next peak is less than avg (i + 2 must stay inside the array)
            if i + 2 < length and array[i+2] <= avg:
                mask[i+1] = 1
    return mask
This function takes the array x and a compact version of the array y, obtained by collapsing each run of consecutive 0s or 1s into a single value: compact_y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]. It returns the mask with the 0s between below-average peaks changed to 1s. The output array is a guide for combining peaks in array x.
Example:
x = [94, 72, 2, 2, 1, 66, 1, 13, 1, 16, 1, 8, 1, 5, 1, 47, 1, 1, 1, 3, 1, 17, 14, 87, 100]
compact_y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
avg = 28
guide = locate_oversegmentation(x, compact_y, avg)
>> guide = [0,1,0,1,0,1,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,0,1,0]
Apply the guide to array x by summing the entries of x that fall under each run of consecutive 1s in the guide.
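That last step could be sketched as follows. The helpers merge_by_guide and expand are hypothetical names introduced for illustration; the sample data are entries 6 to 14 of x and of the guide from the example above, where four short peaks (13, 16, 8, 5) and the one-column gaps between them collapse into a single segment.

```python
from itertools import groupby

def merge_by_guide(lengths, guide):
    """Sum the run lengths under each run of consecutive 1s in the guide.

    Returns (flag, length) pairs. `merge_by_guide` is a hypothetical helper,
    not part of the answer above.
    """
    merged = []
    for flag, group in groupby(zip(guide, lengths), key=lambda pair: pair[0]):
        values = [length for _, length in group]
        if flag == 1:
            # A run of 1s is one letter: peaks plus the gaps absorbed between them.
            merged.append((1, sum(values)))
        else:
            # Each remaining 0 entry stays a separate gap.
            merged.extend((0, value) for value in values)
    return merged

def expand(merged):
    """Rebuild the per-column 0/1 array from (flag, length) pairs."""
    return [flag for flag, length in merged for _ in range(length)]

# Entries 6..14 of x and of the guide from the example above.
lengths = [1, 13, 1, 16, 1, 8, 1, 5, 1]
guide = [0, 1, 1, 1, 1, 1, 1, 1, 0]
merged = merge_by_guide(lengths, guide)
print(merged)  # [(0, 1), (1, 45), (0, 1)]
```

Calling expand(merged) gives back a per-column array of the same total width, which can then replace the corresponding part of y.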

How to find the difference of elements within an array of 0 and 1

I currently have this array.
idx_binary = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
I'm trying to find all the instances where the difference between two elements in this array is 1 (so when it goes from 0 to 1).
This is what I currently have for that:
threshCross_idx = np.where(np.diff(idx_binary) == 1)
However, this is giving me all the instances of 1 ((array([48], dtype=int64))). Can anyone shed light on what I'm doing incorrectly? I'd expect the output to be 1, since there's only one instance where the difference between two elements is 1.
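For context, np.where called with only a condition returns the indices at which the condition holds, not a count, so array([48]) says that the single 0-to-1 step sits between positions 48 and 49. A plain-Python sketch of the same computation is below; the names steps and crossings are illustrative, and the list mirrors the 49 leading zeros of idx_binary (the exact number of trailing ones does not affect the result).

```python
# Same shape as idx_binary above: 49 zeros followed by ones (51 here).
idx_binary = [0] * 49 + [1] * 51

# Equivalent of np.diff: difference between each element and its predecessor.
steps = [b - a for a, b in zip(idx_binary, idx_binary[1:])]

# Equivalent of np.where(steps == 1)[0]: the positions of the 0 -> 1 steps.
crossings = [i for i, step in enumerate(steps) if step == 1]

print(crossings)       # [48]
print(len(crossings))  # 1
```

With NumPy itself, the count would be threshCross_idx[0].size, or np.count_nonzero(np.diff(idx_binary) == 1).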

Cplex Error: Adding trivial infeasible linear constraint

I want to solve an integer programming model with CPLEX in Python. I have this model:
a and h are matrices of 0s and 1s. p is a set of numbers.
Here is part of my CPLEX code for this model:
p=[i for i in range (len(h))]
x=mdl.binary_var_dict(p,name='x')
#objective
mdl.minimize(0)
#constraints
#1
mdl.add_constraints(mdl.sum(h[i][k]*x[i] for i in p)==4 for k in T)
#2
mdl.add_constraints(mdl.sum(a[i][k]*x[i] for i in p)==4 for k in T)
mdl.print_information()
Solution = mdl.solve(log_output=False)
mdl.get_solve_status()
print(Solution)
When I run the program I get this error:
Error: Adding trivial infeasible linear constraint: 0 == 4, rank: 1
Error: Adding trivial infeasible linear constraint: 0 == 4, rank: 1
Error: Adding trivial infeasible linear constraint: 0 == 4, rank: 23
Error: Adding trivial infeasible linear constraint: 0 == 4, rank: 23
'h' is a 600*22 matrix and 'a' is the complement of h (where h has a 1, a has a 0, and vice versa). A sample of h:
[1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]]
I don't understand where the problem is.

The error message tells you what happens: you added a constraint that is trivially infeasible, i.e., one that obviously cannot be satisfied. From the error message, it seems you added some == 4 constraints with an empty left-hand side.
From your code, it looks like this would happen if p is empty.
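Another way the left-hand side can collapse to 0, offered here as an assumption rather than a diagnosis, is a column k in which every coefficient h[i][k] or a[i][k] is 0, so that no term of the sum survives. In the sample rows the first two columns of h are all 1s, which would make those columns of a all 0s if the pattern holds for all 600 rows. A small check in plain Python (zero_columns is a hypothetical helper and does not use the CPLEX API; the matrices are toy data):

```python
def zero_columns(matrix):
    """Return the column indices k for which every matrix[i][k] is 0.

    Hypothetical diagnostic helper; it does not use the CPLEX API.
    """
    if not matrix:
        return []
    return [k for k in range(len(matrix[0]))
            if all(row[k] == 0 for row in matrix)]

# Toy data: h has an all-ones column 0, so its complement a has an all-zero one.
h = [[1, 1, 0],
     [1, 0, 0]]
a = [[1 - value for value in row] for row in h]

print(zero_columns(h))  # [2]
print(zero_columns(a))  # [0]
```

Any index reported for h or a corresponds to a constraint whose sum has only zero coefficients and therefore cannot equal 4.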

How to increase a grid world's size by 1000 times

I'm using a program in which I have to input the environment's map. The input form looks like this.
self.map=[ [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 1, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
I want to enlarge this structure a thousand times while keeping its form, so that the result is 18000x6000.
Can someone suggest a way to achieve this, or an alternative?

If you really want to use Python's lists (NumPy arrays are better for large matrices), you could use the following, which tiles the map repeatfactor times in each direction:
repeatfactor = 1000
mat = self.map # copy reference, not data
m = len(mat)
n = len(mat[0])
newmatrix = [[mat[r % m][c % n]
              for c in range(n * repeatfactor)]
             for r in range(m * repeatfactor)]
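If the goal is to make every cell 1000 times larger rather than to repeat the whole map, integer division of the target indices is one option. The sketch below uses a hypothetical helper scale_grid and a small factor so that the output stays readable; the result has the same layout that repeating each element along both axes would give.

```python
def scale_grid(grid, factor):
    """Enlarge a 2-D list so that each cell becomes a factor x factor block."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[r // factor][c // factor]
             for c in range(cols * factor)]
            for r in range(rows * factor)]

small = [[1, 0],
         [2, 3]]
for row in scale_grid(small, 2):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [2, 2, 3, 3]
# [2, 2, 3, 3]
```

With factor = 1000 on the 6x18 map this builds 6000 rows of 18000 entries each, which is a lot of memory for nested Python lists.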

Try np.repeat twice, once along each axis. Not the prettiest, but it should work. Something like this:
import numpy as np
map_array = np.array(self.map)
map_array = np.repeat(map_array, 1000, axis=0)
map_array = np.repeat(map_array, 1000, axis=1)

Numpy: check whether a bit is set to 1 or 0 in an array?

Suppose the following:
bitstring = numpy.random.random_integers(0,2**32,size=8).astype(numpy.uint32)
How can I find out which of the 256 bits are set to 1? I've got this... but this is crazy, isn't it?
maximum = (2**32)-1
for checkbit in range(256):
    yes = bool(numpy.bitwise_and((2**checkbit)%maximum, bitstring[ ( (checkbit // maximum) + checkbit % maximum ) // 32 ] ) )
    print('bit', checkbit, 'set to', yes, 'in string', ( (checkbit // maximum) + checkbit % maximum ) // 32)
I believe the answer may be extremely simple, yet google hasn't helped at all, and this related question is referring only to bytes.
Since I need to do this op billions of times, I wonder if there's a pythonic way to make it work as fast as possible.

I'm not sure if you want to count the number of "1" bits or to check whether a specific bit is set.
To check, I guess the easier way is: bool(n&(1<<b)), where n is the number being tested and b is the bit (starting from 0).
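For a 256-bit value stored as eight 32-bit words, the same test can be applied after splitting the bit number into a word index and an offset. This is a minimal sketch with plain Python integers; the helper bit_is_set and the convention that bit 0 is the least significant bit of the first word are assumptions made for illustration.

```python
def bit_is_set(words, b, word_bits=32):
    """Return True if bit b is set in a sequence of fixed-width words.

    Assumes bit 0 is the least significant bit of words[0]; that numbering
    convention, like the helper itself, is an illustrative choice.
    """
    word_index, offset = divmod(b, word_bits)
    return bool(words[word_index] & (1 << offset))

# Three 32-bit words: bits 0 and 2 of the first, the top bit of the third.
words = [0b101, 0, 1 << 31]
print([b for b in range(96) if bit_is_set(words, b)])  # [0, 2, 95]
```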
To count the number of "1" bits, I guess there is nothing faster than a lookup table.
For instance, you can use a table of 65,536 (1<<16) entries and split the 256 bits into 16 groups of 16 bits. Then you look up each group's bit count in the table.
To generate the table, you can use any of the other methods mentioned. For instance:
table = [bin(i).count('1') for i in range(1<<16)]
Then, to count the bits, you just sum up the values from the table, for instance:
n = 0x123456789123456789
cnt = 0
while n > 0:
    # mask off the low 16 bits (n & 0xFFFF), not n % 0xFFFF
    cnt += table[n & ((1<<16)-1)]
    n >>= 16
If you have enough memory, you can increase your table. For a 32 bit table you will need 4GB of memory. It is the classical tradeoff of processing vs memory consumption.

You can use np.unpackbits, although you will first have to view your array as np.uint8, and work out for yourself how the endianness of your system affects the result you get:
>>> np.unpackbits(bitstring.view(np.uint8))
array([1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0,
0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0,
1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 0], dtype=uint8)

You can convert a number in Python to a binary string with bin:
n = 4187390046
binary_str = bin(n)
Which yields
Out[7]: '0b11111001100101101000000001011110'
Then you can find all indexes of 1 in that string with something like
def find_ones(s):
    return [i - 2 for i, bit in enumerate(s) if bit == '1']
Because the binary string has the leading 0b, the indexes are shifted by 2 to compensate. Note that they count from the most significant bit.
