Numpy array not getting updated - python

I have a numpy array data3 of size 640x480.
I have written this code to update values matching a specific condition, and it works well:
data4 = np.where((data3 <= 119) & (data3 > 110), 13, data3)
Here is the list:
b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]
To update the data, I use the following loop:
for i in range(2, 17):
    data4 = np.where((data3 <= b[i]) & (data3 > b[i-1]), i + 1, data3)
Any pointers on why the data doesn't get updated?

When i=16, the condition becomes:
data4 = np.where((data3 <= 150) & (data3 > 150), i + 1, data3)
Of course "(data3 <= 150) & (data3 > 150)" is False everywhere (b ends with 150 twice), so every element is simply replaced with the corresponding element of data3. And since each iteration builds data4 from the original data3 rather than from the previous data4, every earlier update is discarded anyway.
So, at the end of the loop, you get data4 = data3.
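The loop can be fixed by accumulating into data4 while still testing the conditions against the original data3, and by stopping before the duplicate 150 at the end of b. A minimal sketch, using a small 1-D array as a stand-in for the 640x480 one:

```python
import numpy as np

b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]
data3 = np.array([5, 15, 105, 115, 125, 149])  # stand-in for the 640x480 array

data4 = data3.copy()
for i in range(2, 16):  # stop before the duplicate 150, so every bin is non-empty
    # test against the original data3, but write into the accumulator data4,
    # so updates from earlier iterations are kept
    data4 = np.where((data3 <= b[i]) & (data3 > b[i - 1]), i + 1, data4)
```

With this, 115 falls in the bin (110, 120] and maps to 13, matching the hand-written example at the top of the question.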


How to split a list into 2 unsorted groupings based on the median

I am aiming to sort a list into two subsections that don't themselves need to be sorted.
Imagine I have a list of length 10 (indices 0-9):
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
I would want to rearrange it so that indices 0 through 4 contain the values 10, 20, 30, 40, and 50, in any order.
For example:
# SPLIT HERE V
[40, 30, 20, 50, 10, 70, 60, 80, 90, 100]
I've looked into various divide-and-conquer sorting algorithms, but I'm uncertain which would be best here.
My current thought is to use quicksort, but I believe there is a better way, since nothing needs to be sorted exactly: the values only need to end up on their respective side of the median, in any order.
To me this seems to do the trick, unless you specifically need the output to stay unordered:
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
sorted_arr = sorted(arr)
median_index = len(arr) // 2
sub_list1, sub_list2 = sorted_arr[:median_index], sorted_arr[median_index:]
This outputs:
[10, 20, 30, 40, 50] [60, 70, 80, 90, 100]
The statistics package has a method for finding the median of a list of numbers. From there, you can use a for loop to separate the values into two lists based on whether or not each value is greater than the median:
from statistics import median
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
med = median(arr)
result1 = []
result2 = []
for item in arr:
    if item <= med:
        result1.append(item)
    else:
        result2.append(item)
print(result1)
print(result2)
This outputs:
[50, 30, 20, 10, 40]
[90, 100, 70, 60, 80]
If you would like to solve the problem from scratch, you could implement the Median of Medians algorithm to find the median of an unsorted array in linear time. What comes next depends on your goal.
If you would like to do the reordering in place, you could use the result of Median of Medians to select the pivot for the partition step of quicksort.
On the other hand, in Python you could simply iterate through the array and append each value to a left or right list.
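For reference, NumPy ships this exact partition step as a built-in: np.partition(a, k) puts the k-th smallest value at index k, with everything smaller before it and everything larger after it, neither side sorted. A minimal sketch on the question's list:

```python
import numpy as np

arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
k = len(arr) // 2

# partial sort: element k ends up in its sorted position; the two halves
# land on the correct side of it but stay internally unordered
part = np.partition(arr, k)
left, right = part[:k], part[k:]
```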
The other current answers split the list into two lists, but based on your example I am under the impression that while there are two groupings, the output should be a single list.
import numpy as np
# setup
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
# output array
unsorted_grouping = []
# get median
median = np.median(arr)
# loop over the array: if a value is >= the median, append it
# (append always adds at the end / right side),
# else insert it at position 0, the beginning / left side
for val in arr:
    if val >= median:
        unsorted_grouping.append(val)
    else:
        unsorted_grouping.insert(0, val)
# output
unsorted_grouping
[40, 10, 20, 30, 50, 90, 100, 70, 60, 80]
You can use the statistics module to calculate the median, and then use it to add each value to one group or the other:
import statistics
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
median = statistics.median(arr)
bins = [], [] # smaller and bigger values
for value in arr:
    bins[value > median].append(value)
print(bins[0]) # -> [50, 30, 20, 10, 40]
print(bins[1]) # -> [90, 100, 70, 60, 80]
You can do this with numpy (which is significantly faster if arr is large):
import numpy as np
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
arr = np.array(arr)
median = np.median(arr)
result1 = arr[arr <= median]
result2 = arr[arr > median]
Output:
array([50, 30, 20, 10, 40])
array([ 90, 100, 70, 60, 80])
And if you want one list as the output, you can do:
[*result1, *result2]
Output:
[50, 30, 20, 10, 40, 90, 100, 70, 60, 80]
My first Python program, so please bear with me.
It basically does quicksort, as you suggest, but only sub-sorts the partition that holds the median index.
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]

def partition(a, left, right):
    pivot = (left + right) // 2
    a[left], a[pivot] = a[pivot], a[left]  # swap pivot value to the front
    pivot = left
    left += 1
    while right >= left:
        while left <= right and a[left] <= a[pivot]:
            left += 1
        while left <= right and a[right] > a[pivot]:
            right -= 1
        if left <= right:
            a[left], a[right] = a[right], a[left]
            left += 1
            right -= 1
        else:
            break
    a[pivot], a[right] = a[right], a[pivot]
    return right

def medianSplit(array):
    left = 0
    right = len(array) - 1
    med = len(array) // 2
    while left < right:
        pivot = partition(array, left, right)
        if pivot > med:
            right = pivot - 1
        else:
            left = pivot + 1

def main():
    medianSplit(arr)
    print(arr)

main()

Why python np.random.choice does not match with matlab randsample?

I'm converting a Matlab program to Python.
I'm trying to randomly sample from an array using np.random.choice, but the result doesn't match Matlab's randsample.
For example,
I did this with Python,
np.random.seed(100)
a = np.arange(10, 110, 10)
np.random.choice(a, 2, True)
>> Output: array([90, 90])
And the following is Matlab,
rng(100)
a = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
randsample(a, 2, true)
>> Output: 60 30
Values in both arrays are different.
Am I doing something wrong?
Any help will be appreciated,
Thanks!
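This is expected behavior rather than a mistake: fixing the seed makes each library's own results reproducible, but randsample and np.random.choice turn the underlying random stream into sample indices with different algorithms, so the two libraries will not produce the same draws even when both default to a Mersenne Twister generator. Within NumPy itself, the seeded draw is reproducible:

```python
import numpy as np

def sample_with_seed(seed):
    # fix the legacy global seed, then draw 2 values with replacement
    np.random.seed(seed)
    a = np.arange(10, 110, 10)
    return np.random.choice(a, 2, replace=True)

# the same seed always reproduces the same draw within NumPy
s1 = sample_with_seed(100)
s2 = sample_with_seed(100)
```

If the Matlab and Python versions must agree draw-for-draw, the usual approach is to implement the index selection identically on both sides from raw uniform numbers, instead of relying on each library's built-in sampler.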

calculate the df.describe() for each value in a column and recreate a dataframe

Imagine the following data frame:
import pandas as pd
d = {'cluster': [1, 1, 3, 4, 2, 2],
     'Weight': [65, 70, 68, 75, 78, 62],
     'Height': [170, 173, 174, 180, 184, 167]}
df = pd.DataFrame(d)
Now, how can I use a for loop to return a dataframe with the average weight and height for each value in cluster?
If I write it naively, the code looks like this:
# creating subsets and concat
a = pd.DataFrame(df[df['cluster'] == 1].describe().loc['mean'])
b = pd.DataFrame(df[df['cluster'] == 2].describe().loc['mean'])
....
DF = pd.concat([a, b], axis=1)
This gets ridiculous when a column has many clusters.
Thank you.
import pandas as pd

d = {'cluster': [1, 1, 3, 4, 2, 2],
     'Weight': [65, 70, 68, 75, 78, 62],
     'Height': [170, 173, 174, 180, 184, 167]}
df = pd.DataFrame(d)
df.groupby('cluster').agg(['mean'])
This implementation also has the benefit that you can add further aggregation-based functions (e.g. median) in the future if necessary.
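If you only need the means (not the full describe() table), a plain .mean() on the groupby gives exactly the requested shape, one row per cluster. A quick sketch on the question's data:

```python
import pandas as pd

d = {'cluster': [1, 1, 3, 4, 2, 2],
     'Weight': [65, 70, 68, 75, 78, 62],
     'Height': [170, 173, 174, 180, 184, 167]}
df = pd.DataFrame(d)

# one row per cluster, columns Weight and Height, values are the means
means = df.groupby('cluster')[['Weight', 'Height']].mean()
# e.g. cluster 1 -> Weight 67.5, Height 171.5
```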
Try:
import pandas as pd

d = {'cluster': [1, 1, 3, 4, 2, 2],
     'Weight': [65, 70, 68, 75, 78, 62],
     'Height': [170, 173, 174, 180, 184, 167]}
df = pd.DataFrame(d)
newdf = df.groupby('cluster').describe().iloc[:,1]
print(newdf)
EDIT: WeNYoBen does it better if you want only the means/don't need to pick anything else from describe()

Python - cut only the descending part of the dataset

I have a timeseries with various downcasts. My question is: how do I slice a pandas dataframe (or, in this case, the array, just to keep it simple) to get the data and the indexes of the descending parts of the timeseries?
import matplotlib.pyplot as plt
import numpy as np
b = np.asarray([ 1.3068586 , 1.59882279, 2.11291473, 2.64699527,
3.23948166, 3.81979878, 4.37630243, 4.97740025,
5.59247254, 6.18671493, 6.77414586, 7.43078595,
8.02243495, 8.59612224, 9.22302662, 9.83263379,
10.43125902, 11.0956864 , 11.61107838, 12.09616684,
12.63973254, 12.49437955, 11.6433792 , 10.61083269,
9.50534291, 8.47418827, 7.40571742, 6.56611512,
5.66963658, 4.89748187, 4.10543794, 3.44828054,
2.76866318, 2.24306623, 1.68034463, 1.26568186,
1.44548443, 2.01225076, 2.60715524, 3.21968562,
3.8622007 , 4.57035958, 5.14021305, 5.77879484,
6.42776897, 7.09397923, 7.71722028, 8.30860725,
8.96652218, 9.66157193, 10.23469208, 10.79889453,
10.5788411 , 9.38270646, 7.82070643, 6.74893389,
5.68200335, 4.73429009, 3.78358222, 3.05924946,
2.30428171, 1.78052369, 1.27897065, 1.16840532,
1.59452726, 2.13085096, 2.70989933, 3.3396291 ,
3.97318058, 4.62429262, 5.23997774, 5.91232803,
6.5906609 , 7.21099657, 7.82936331, 8.49636247,
9.15634983, 9.76450244, 10.39680729, 11.04659976,
11.69287237, 12.35692643, 12.99957563, 13.66228386,
14.31806385, 14.91871927, 15.57212978, 16.22288287,
16.84697357, 17.50502002, 18.15907842, 18.83068151,
19.50945548, 20.18020639, 20.84441358, 21.52792846,
22.17933087, 22.84614545, 23.51212887, 24.18308399,
24.8552263 , 25.51709528, 26.18724379, 26.84531493,
27.50690265, 28.16610365, 28.83394822, 29.49621179,
30.15118676, 30.8019521 , 31.46714114, 32.1213546 ,
32.79366952, 33.45233007, 34.12158193, 34.77502197,
35.4532211 , 36.11018053, 36.76540453, 37.41746323])
plt.plot(-b)
plt.show()
You can just set the values where the diff is non-negative to NaN and then plot:
import pandas as pd
bb = pd.Series(-b)
bb[bb.diff().ge(0)] = np.nan
bb.plot()
To get the indexes of descending values, use:
bb.index[bb.diff().lt(0)]
Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 65, 66, 67, 68,
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119],
dtype='int64')
Create a second dataframe shifted by one index, then subtract the two term by term and keep only the rows with a negative diff. You should get what you want:
import pandas as pd
df = pd.DataFrame(b)
df = pd.concat([df.shift(1), df], axis=1)
df.columns = ['t-1', 't']
df = df.drop(df.index[0])
df['diff'] = df['t'] - df['t-1']
res = df[df['diff'] < 0]
There is also an easy numpy-only solution (the question is tagged pandas, but the code uses only numpy) using np.where. You want the points where the graph is descending, which means the data is ascending.
# the indices where the data is ascending.
ix, = np.where(np.diff(b) > 0)
# the values
c = b[ix]
Note that this will give you the first value in each ascending pair of consecutive values, while the pandas-based solution gives the second one. To get the same indices just add 1 to ix.
s = pd.Series(b)
assert np.all(s[s.diff() > 0].index == ix + 1)
assert np.all(s[s.diff() > 0] == b[ix + 1])
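If you also want each descending stretch of the plot as its own slice, the index array from np.where can be split wherever consecutive indices jump by more than one. A small sketch on a short stand-in series (the same recipe applies to the question's b):

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 2.5, 1.5, 2.0, 3.5, 3.0])  # short stand-in series

# indices where the data is ascending, i.e. where the plot of -b descends
ix, = np.where(np.diff(b) > 0)

# break the index array into contiguous runs: split after every gap > 1
runs = np.split(ix, np.where(np.diff(ix) > 1)[0] + 1)
segments = [b[r] for r in runs]  # the values of each stretch
```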

Find index of min value in a matrix

I have a 2-D array containing the residual sum of squares of a given fit (unimportant here).
RSS[i, j] = np.sum((spectrum_theo - sp_exp_int) ** 2)
I would like to find the matrix element with the minimum value AND its position (i, j) in the matrix. Finding the minimum element is OK:
RSS_min = RSS[RSS != 0].min()
but for the index, I've tried:
ij_min = np.where(RSS == RSS_min)
which gives me:
ij_min = (array([3]), array([20]))
I would like to obtain instead:
ij_min = (3,20)
If I try :
ij_min = RSS.argmin()
I obtain:
ij_min = 0,
which is a wrong result.
Is there a function, in SciPy or elsewhere, that can do this? I've searched the web, but only found answers dealing with 1-D arrays, not 2- or N-D ones.
Thanks!
The easiest fix based on what you have right now would just be to extract the elements from the array as a final step:
# ij_min = (array([3]), array([20]))
ij_min = np.where(RSS == RSS_min)
ij_min = tuple([i.item() for i in ij_min])
Does this work for you?
import numpy as np
array = np.random.rand(1000).reshape(10, 10, 10)
print(np.array(np.where(array == array.min())).flatten())
In the case of multiple minimums you could try something like:
import numpy as np
array = np.array([[1, 1, 2, 3], [1, 1, 4, 5]])
print(list(zip(*np.where(array == array.min()))))
You can combine argmin with unravel_index.
For example, here's an array RSS:
In [123]: np.random.seed(123456)
In [124]: RSS = np.random.randint(0, 99, size=(5, 8))
In [125]: RSS
Out[125]:
array([[65, 49, 56, 43, 43, 91, 32, 87],
       [36,  8, 74, 10, 12, 75, 20, 47],
       [50, 86, 34, 14, 70, 42, 66, 47],
       [68, 94, 45, 87, 84, 84, 45, 69],
       [87, 36, 75, 35, 93, 39, 16, 60]])
Use argmin (which returns an integer that is the index in the flattened array), and then pass that to unravel_index along with the shape of RSS to convert the index of the flattened array into the indices of the 2D array:
In [126]: ij_min = np.unravel_index(RSS.argmin(), RSS.shape)
In [127]: ij_min
Out[127]: (1, 1)
ij_min itself can be used as an index into RSS to get the minimum value:
In [128]: RSS_min = RSS[ij_min]
In [129]: RSS_min
Out[129]: 8
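To also honor the RSS != 0 exclusion from the question (an assumption about intent: zeros mark entries to skip), one option is to mask the zeros with inf before taking argmin, then unravel as above:

```python
import numpy as np

RSS = np.array([[0.0, 5.0, 2.0],
                [3.0, 0.0, 1.0]])

# replace the excluded zero entries with inf so argmin never picks them
masked = np.where(RSS != 0, RSS, np.inf)
ij_min = np.unravel_index(masked.argmin(), masked.shape)
# ij_min is a plain tuple, usable directly as an index: RSS[ij_min]
```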
