Why doesn't Python's np.random.choice match Matlab's randsample?

I'm converting a Matlab program to Python.
I'm trying to draw random samples from an array with np.random.choice, but the result doesn't match Matlab's randsample.
For example,
I did this with Python,
np.random.seed(100)
a = np.arange(10, 110, 10)
np.random.choice(a, 2, True)
>> Output: array([90, 90])
And the following is Matlab,
rng(100)
a = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
randsample(a, 2, true)
>> Output: 60 30
Values in both arrays are different.
Am I doing something wrong?
Any help will be appreciated,
Thanks!
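For what it's worth, this is expected: np.random.choice and Matlab's randsample are different implementations that consume the generator stream differently, so identical seeds cannot be expected to yield identical samples even though both environments default to the Mersenne Twister. If the two programs must agree, one hedged workaround (assuming, as is often reported, that rng(seed, 'twister') and NumPy's legacy RandomState(seed) yield the same raw uniform stream) is to draw raw uniforms and map them to indices with the same arithmetic on both sides:

```python
import numpy as np

a = np.arange(10, 110, 10)
rs = np.random.RandomState(100)  # legacy MT19937 generator

# Matlab side would be: rng(100, 'twister'); u = rand(1, 2)
u = rs.rand(2)

# identical index arithmetic in both environments:
# sampling with replacement via floor(u * n)
idx = np.floor(u * len(a)).astype(int)
sample = a[idx]
print(sample)
```

With the sampling arithmetic written out explicitly, any remaining mismatch narrows down to whether the raw uniform streams agree, which is easy to check by printing u on both sides.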

Related

How to split a list into 2 unsorted groupings based on the median

I am aiming to split a list into two subsections that don't themselves need to be sorted.
Imagine I have a list of length 10 (indices 0-9).
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
I would want to sort it in a way that indices 0 through 4 contain values 10, 20, 30, 40, and 50 in any ordering.
For example:
# SPLIT HERE V
[40, 30, 20, 50, 10, 70, 60, 80, 90, 100]
I've looked into various divide and conquer sorting algorithms, but I'm uncertain which one would be the best to use in this case.
My current thought is to use quicksort, but I believe there is a better way: nothing needs to be sorted exactly, only partitioned in a "general" sense so that all values land on their respective side of the median, in any ordering.
To me this seems to do the trick, unless you need the output to be truly unordered:
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
sorted_arr = sorted(arr)
median_index = len(arr)//2
sub_list1, sub_list2 = sorted_arr[:median_index], sorted_arr[median_index:]
this outputs :
[10, 20, 30, 40, 50] [60, 70, 80, 90, 100]
The statistics package has a method for finding the median of a list of numbers. From there, you can use a for loop to separate the values into two separate lists based on whether or not it is greater than the median:
from statistics import median
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
med = median(arr)
result1 = []
result2 = []
for item in arr:
    if item <= med:
        result1.append(item)
    else:
        result2.append(item)

print(result1)
print(result2)
This outputs:
[50, 30, 20, 10, 40]
[90, 100, 70, 60, 80]
If you would like to solve the problem from scratch, you could implement the median-of-medians algorithm to find the median of an unsorted array in linear time. What to do next depends on your goal.
If you would like to do the reordering in place, you could use the result of median-of-medians to select the pivot for the partition step of quicksort.
On the other hand, in Python you could then just iterate through the array and append each value to a left or right list accordingly.
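This select-then-partition step is also available directly in NumPy as np.partition, which uses introselect to place the k-th smallest element at index k, with everything smaller before it and everything larger after it (each half in arbitrary order), without fully sorting; a minimal sketch:

```python
import numpy as np

arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
k = len(arr) // 2  # median position

# element k lands in its sorted position; smaller values end up
# before it and larger values after it, each side in arbitrary order
part = np.partition(arr, k)
left, right = part[:k], part[k:]
print(left, right)
```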
The other current answers split the list into two lists, but based on your example I am under the impression that you want two groupings within a single output list.
import numpy as np
# setup
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
# output array
unsorted_grouping = []
# get median
median = np.median(arr)
# loop over the array: values >= median are appended at the end,
# values < median are inserted at position 0 (the left side)
for val in arr:
    if val >= median:
        unsorted_grouping.append(val)
    else:
        unsorted_grouping.insert(0, val)

# output
unsorted_grouping
[40, 10, 20, 30, 50, 90, 100, 70, 60, 80]
You can use the statistics module to calculate the median, and then use it to add each value to one group or the other:
import statistics
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
median = statistics.median(arr)
bins = [], [] # smaller and bigger values
for value in arr:
    bins[value > median].append(value)
print(bins[0]) # -> [50, 30, 20, 10, 40]
print(bins[1]) # -> [90, 100, 70, 60, 80]
You can do this with numpy (which is significantly faster if arr is large):
import numpy as np
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
arr = np.array(arr)
median = np.median(arr)
result1 = arr[arr <= median]
result2 = arr[arr > median]
Output:
array([50, 30, 20, 10, 40])
array([ 90, 100, 70, 60, 80])
And if you want one list as the output, you can do:
[*result1, *result2]
Output:
[50, 30, 20, 10, 40, 90, 100, 70, 60, 80]
My first Python program, so please bear with me.
It basically does quicksort, as you suggest, but only recurses into the partition that holds the median index.
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]

def partition(a, left, right):
    pivot = (left + right) // 2
    a[left], a[pivot] = a[pivot], a[left]  # move pivot value to the front
    pivot = left
    left += 1
    while right >= left:
        while left <= right and a[left] <= a[pivot]:
            left += 1
        while left <= right and a[right] > a[pivot]:
            right -= 1
        if left <= right:
            a[left], a[right] = a[right], a[left]
            left += 1
            right -= 1
        else:
            break
    a[pivot], a[right] = a[right], a[pivot]
    return right

def medianSplit(array):
    left = 0
    right = len(array) - 1
    med = len(array) // 2
    while left < right:
        pivot = partition(array, left, right)
        if pivot > med:
            right = pivot - 1
        else:
            left = pivot + 1

def main():
    medianSplit(arr)
    print(arr)

main()

Capping the sum of array values at a set number with Numpy in Python

I want to make a function that caps the sum of the Arrays and Arrays2 arrays at val. The function should modify Arrays and Arrays2 so that their values sum to exactly val: the last contributing element is trimmed and everything after it is dropped. How can I get the expected output?
import numpy as np
Arrays = np.array([50, 30, 25, 87, 44, 68, 45])
Arrays2 = np.array([320])
val = 300
Expected output:
[50, 30, 25, 87, 44, 64]
[300]
something like this?
import numpy as np
Arrays = np.array([50, 30, 25, 87, 44, 68, 45])
Arrays2 = np.array([320])
val = 300
def thisRareFunction(arr):
    outArrays = []
    acum = 0
    for x in arr:
        acum += x
        if acum <= val:
            outArrays.append(x)
        else:
            outArrays.append(x - (acum - val))
            break
    return outArrays

print(thisRareFunction(Arrays))
print(thisRareFunction(Arrays2))
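The same capping can also be done without an explicit Python loop by clipping the running totals. This is a sketch of an alternative, not the in-place modification the question asked for, and note one caveat: a genuine zero element before the cap would also be dropped by the final filter.

```python
import numpy as np

Arrays = np.array([50, 30, 25, 87, 44, 68, 45])
Arrays2 = np.array([320])
val = 300

def cap_to_val(arr, val):
    # cap the running totals at val, then recover the
    # (possibly trimmed) per-element contributions by differencing
    capped = np.minimum(np.cumsum(arr), val)
    out = np.diff(np.concatenate(([0], capped)))
    return out[out > 0]  # drop elements past the cap

print(cap_to_val(Arrays, val))   # -> [50 30 25 87 44 64]
print(cap_to_val(Arrays2, val))  # -> [300]
```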

Numpy array not getting updated

I have a numpy array data3 of size 640X480.
I have written this code to update a specific condition which works well.
data4=np.where((data3<=119) & (data3>110),13,data3)
Following is the list:-
b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]
To update the data, following is the code
for i in range(2,17):
    data4 = np.where((data3 <= b[i]) & (data3 > b[i-1]), i+1, data3)
any pointers why data doesn't get updated?
When i=16, the condition becomes:
data4 = np.where((data3 <= 150) & (data3 > 150), i+1, data3)
The mask (data3 <= 150) & (data3 > 150) is False everywhere, since no value can be both at most 150 and greater than 150, so every element falls back to data3.
More generally, each iteration builds data4 from the original data3 rather than from the previous data4, so every pass discards the updates of the one before it. At the end of the loop you are left with data4 = data3.
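A minimal sketch of the fix: keep the conditions on the untouched data3, but fall back to the accumulated data4 so earlier bins survive (shown here on a small hypothetical array in place of the 640x480 image):

```python
import numpy as np

# hypothetical stand-in for the 640x480 image
data3 = np.array([5, 15, 95, 105, 115, 145])
b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]

data4 = data3.copy()
for i in range(2, 17):
    # fall back to the accumulated data4, not the original data3
    data4 = np.where((data3 <= b[i]) & (data3 > b[i-1]), i + 1, data4)

print(data4)  # -> [ 5  3 11 12 13 16]
```

np.digitize(data3, b) computes a similar binning in a single call (with slightly different edge semantics), which may be the more idiomatic route.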

Find index of min value in a matrix

I've a 2-Dim array containing the residual sum of squares of a given fit (unimportant here).
RSS[i,j] = np.sum((spectrum_theo - sp_exp_int) ** 2)
I would like to find the matrix element with the minimum value AND its position (i, j) in the matrix. Finding the minimum element is OK:
RSS_min = RSS[RSS != 0].min()
but for the index, I've tried:
ij_min = np.where(RSS == RSS_min)
which gives me:
ij_min = (array([3]), array([20]))
I would like to obtain instead:
ij_min = (3,20)
If I try :
ij_min = RSS.argmin()
I obtain:
ij_min = 0,
which is a wrong result.
Is there a function, in SciPy or elsewhere, that can do this? I've searched the web, but I've only found answers dealing with 1-D arrays, not 2-D or N-D.
Thanks!
The easiest fix based on what you have right now would just be to extract the elements from the array as a final step:
# ij_min = (array([3]), array([20]))
ij_min = np.where(RSS == RSS_min)
ij_min = tuple([i.item() for i in ij_min])
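Since the question also excludes zero entries when taking the minimum, one hedged sketch (with a small hypothetical RSS) is to mask the zeros out before locating the minimum:

```python
import numpy as np

RSS = np.array([[0.0, 2.5, 0.7],
                [0.9, 0.0, 3.1]])  # hypothetical residuals; 0 marks unfit cells

masked = np.where(RSS == 0, np.inf, RSS)  # zeros can never be the minimum now
ij_min = np.unravel_index(masked.argmin(), RSS.shape)
print(ij_min)  # -> (0, 2)
```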
Does this work for you?
import numpy as np
array = np.random.rand(1000).reshape(10, 10, 10)
print(np.array(np.where(array == array.min())).flatten())
In the case of multiple minimums you could try something like
import numpy as np
array = np.array([[1, 1, 2, 3], [1, 1, 4, 5]])
print(list(zip(*np.where(array == array.min()))))
You can combine argmin with unravel_index.
For example, here's an array RSS:
In [123]: np.random.seed(123456)
In [124]: RSS = np.random.randint(0, 99, size=(5, 8))
In [125]: RSS
Out[125]:
array([[65, 49, 56, 43, 43, 91, 32, 87],
       [36,  8, 74, 10, 12, 75, 20, 47],
       [50, 86, 34, 14, 70, 42, 66, 47],
       [68, 94, 45, 87, 84, 84, 45, 69],
       [87, 36, 75, 35, 93, 39, 16, 60]])
Use argmin (which returns an integer that is the index in the flattened array), and then pass that to unravel_index along with the shape of RSS to convert the index of the flattened array into the indices of the 2D array:
In [126]: ij_min = np.unravel_index(RSS.argmin(), RSS.shape)
In [127]: ij_min
Out[127]: (1, 1)
ij_min itself can be used as an index into RSS to get the minimum value:
In [128]: RSS_min = RSS[ij_min]
In [129]: RSS_min
Out[129]: 8

Numpy Array Slicing

I have a 1D numpy array, and some offset/length values. I would like to extract from this array all entries which fall within offset, offset+length, which are then used to build up a new 'reduced' array from the original one, that only consists of those values picked by the offset/length pairs.
For a single offset/length pair this is trivial with standard array slicing [offset:offset+length]. But how can I do this efficiently (i.e. without any loops) for many offset/length values?
Thanks,
Mark
>>> import numpy as np
>>> a = np.arange(100)
>>> ind = np.concatenate((np.arange(5),np.arange(10,15),np.arange(20,30,2),np.array([8])))
>>> a[ind]
array([ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 8])
There is the naive method; just doing the slices:
>>> import numpy as np
>>> a = np.arange(100)
>>>
>>> offset_length = [(3,10),(50,3),(60,20),(95,1)]
>>>
>>> np.concatenate([a[offset:offset+length] for offset,length in offset_length])
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
The following might be faster, but you would have to test/benchmark.
It works by constructing a list of the desired indices, which is a valid method of indexing a numpy array.
>>> indices = [offset + i for offset,length in offset_length for i in range(length)]
>>> a[indices]
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
It's not clear if this would actually be faster than the naive method but it might be if you have a lot of very short intervals. But I don't know.
(This last method is basically the same as #fraxel's solution, just using a different method of making the index list.)
Performance testing
I've tested a few different cases: a few short intervals, a few long intervals, lots of short intervals. I used the following script:
import timeit
setup = 'import numpy as np; a = np.arange(1000); offset_length = %s'
for title, ol in [('few short', '[(3,10),(50,3),(60,10),(95,1)]'),
                  ('few long', '[(3,100),(200,200),(600,300)]'),
                  ('many short', '[(2*x,1) for x in range(400)]')]:
    print('**', title, '**')
    print('dbaupp 1st:', timeit.timeit('np.concatenate([a[offset:offset+length] for offset,length in offset_length])', setup % ol, number=10000))
    print('dbaupp 2nd:', timeit.timeit('a[[offset + i for offset,length in offset_length for i in range(length)]]', setup % ol, number=10000))
    print('    fraxel:', timeit.timeit('a[np.concatenate([np.arange(offset,offset+length) for offset,length in offset_length])]', setup % ol, number=10000))
This outputs:
** few short **
dbaupp 1st: 0.0474979877472
dbaupp 2nd: 0.190793991089
fraxel: 0.128381967545
** few long **
dbaupp 1st: 0.0416231155396
dbaupp 2nd: 1.58000087738
fraxel: 0.228138923645
** many short **
dbaupp 1st: 3.97210478783
dbaupp 2nd: 2.73584890366
fraxel: 7.34302687645
This suggests that my first method is the fastest when you have a few intervals (and it is significantly faster), and my second is the fastest when you have lots of intervals.
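Since the question asks for a loop-free approach, here is one more sketch (not part of the benchmarks above) that builds the index array entirely with NumPy primitives: repeat each offset length times, then add a per-run ramp 0..length-1 derived from the cumulative lengths.

```python
import numpy as np

a = np.arange(100)
offsets = np.array([3, 50, 60, 95])
lengths = np.array([10, 3, 20, 1])

# starting position of each run within the output array
starts = np.cumsum(lengths) - lengths
# per-run ramp: 0..length-1 for each (offset, length) pair
ramp = np.arange(lengths.sum()) - np.repeat(starts, lengths)
result = a[np.repeat(offsets, lengths) + ramp]
print(result)
```

This avoids the Python-level loop entirely, so it may fare better in the "many short intervals" case, though that would need benchmarking like the variants above.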
