Finding highest values "zone" in a 2d matrix in Python - python

I have a 2d matrix in Python like this (a 10 rows/20 columns list I use to later do an imshow):
[[-20.17 -12.88 -20.7 -25.69 -21.69 -34.22 -32.65 -31.74 -36.36 -37.65
-41.42 -41.14 -44.01 -43.19 -41.85 -39.25 -40.15 -41.31 -39.73 -28.66]
[ 14.18 53.86 70.03 64.39 72.37 39.95 30.44 28.14 20.77 17.98
25.74 25.66 27.56 37.61 42.39 42.39 35.79 41.65 41.65 41.84]
[ 33.71 68.35 69.39 66.7 59.99 40.08 40.08 40.8 26.19 19.82
19.82 18.07 20.32 19.51 24.77 22.81 21.45 21.45 21.45 23.7 ]
[103.72 55.11 32.3 29.47 16.53 15.54 9.4 8.11 5.06 5.06
13.07 13.07 12.99 13.47 13.47 13.47 12.92 12.92 14.27 20.63]
[ 59.02 18.6 37.53 24.5 13.01 34.35 8.16 13.66 12.57 8.11
8.11 8.11 8.11 8.11 8.11 5.66 5.66 5.66 5.66 7.41]
[ 52.69 14.17 7.25 -5.79 3.19 -1.75 -2.43 -3.98 -4.92 -6.68
-6.68 -6.98 -6.98 -8.89 -8.89 -9.15 -9.15 -9.15 -9.15 -9.15]
[ 29.24 10.78 0.6 -3.15 -12.55 3.04 -1.68 -1.68 -1.41 -6.15
-6.15 -6.15 -10.59 -10.59 -10.59 -10.59 -10.59 -9.62 -10.29 -10.29]
[ 6.6 0.11 2.42 0.21 -5.68 -10.84 -10.84 -13.6 -16.12 -14.41
-15.28 -15.28 -15.28 -18.3 -5.55 -13.16 -13.16 -13.16 -13.16 -14.15]
[ 3.67 -11.69 -6.99 -16.75 -19.31 -20.28 -21.5 -21.5 -34.02 -37.16
-25.51 -25.51 -26.36 -26.36 -26.36 -26.36 -29.38 -29.38 -29.59 -29.38]
[ 31.36 -2.87 0.34 -8.06 -12.14 -22.7 -24.39 -25.51 -26.36 -27.37
-29.38 -31.54 -31.54 -31.54 -32.41 -33.26 -33.26 -15.54 -15.54 -15.54]]
I'm trying to find a way to detect the "zone" of this matrix that contains the highest density of high values. This means it might not contain the single highest value of the whole list.
I suppose that to do so I should define how big this zone is, so let's say it should be 2x2 (so I want to find the 2x2 'square' of items containing the highest values).
I always think I have a logical solution for this, but then I fail to follow the logic of how it could work!
Does anyone have a suggestion I could start from?

I know there might be some easier ways to do so, but this is the easiest for me. I've created the following function to perform this task which takes two arguments:
arr: a 2D numpy array.
zone_size: the size of the square zone.
And the function goes like so:
import numpy as np

def get_heighest_zone(arr, zone_size):
    max_sum = float("-inf")
    row_idx, col_idx = 0, 0
    # +1 so that the windows touching the last row and last column are checked too
    for row in range(arr.shape[0] - zone_size + 1):
        for col in range(arr.shape[1] - zone_size + 1):
            # sum of the zone_size x zone_size window starting at (row, col)
            curr_sum = np.sum(arr[row:row+zone_size, col:col+zone_size])
            if curr_sum > max_sum:
                row_idx, col_idx = row, col
                max_sum = curr_sum
    return arr[row_idx:row_idx+zone_size, col_idx:col_idx+zone_size]
Assuming arr is the numpy array posted in your question, applying this function over different zone_sizes will return these values:
>>> get_heighest_zone(arr, 2)
[[70.03 64.39]
[69.39 66.7 ]]
>>> get_heighest_zone(arr, 3)
[[53.86 70.03 64.39]
[68.35 69.39 66.7 ]
[55.11 32.3 29.47]]
>>> get_heighest_zone(arr, 4)
[[ 14.18 53.86 70.03 64.39]
[ 33.71 68.35 69.39 66.7 ]
[103.72 55.11 32.3 29.47]
[ 59.02 18.6 37.53 24.5 ]]
If the zone doesn't have to be square, you will need to modify the code a little. Also, you should assert that zone_size is no larger than either dimension of the array.
Hopefully, this is what you were looking for!
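For the non-square case mentioned above, here is a minimal sketch of a vectorised variant. It assumes NumPy 1.20 or newer (for sliding_window_view); the name highest_zone and the zone_shape parameter are hypothetical and not part of the answer's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def highest_zone(arr, zone_shape):
    """Return ((row, col), zone) for the window of shape zone_shape with the largest sum."""
    h, w = zone_shape
    # windows has shape (rows - h + 1, cols - w + 1, h, w)
    windows = sliding_window_view(arr, (h, w))
    # sum over each window's own two axes
    sums = windows.sum(axis=(2, 3))
    # position of the largest sum, converted back to 2D coordinates
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return (int(row), int(col)), arr[row:row + h, col:col + w]

demo = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
corner, zone = highest_zone(demo, (2, 2))
```

The function also returns the top-left corner of the zone, which is handy if you want to draw a rectangle over the imshow plot.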

Related

how to subtract first value with all the rest value

I have data like this:
timestamp high windSpeed windDir windU windV
04/05/2019 10:02 100 4.39 179.1 -0.14 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:03 100 4.39 179.1 -0.15 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:04 100 4.52 179.1 -0.16 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
04/05/2019 10:05 100 4.52 179.1 -0.17 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
and I want to subtract the value at the lower level from the value at the higher level for each timestamp. This is what I have so far, but it only gives me one value. Can anyone help me, please? Thank you.
for timestamp, group in grouped:
    HeightIndices = group["high"].keys()
    for heightIndex in range(HeightIndices[0], HeightIndices[0] + len(HeightIndices) - 1):
        windMag = sqrt(group["windU"] ** 2 + group["windV"] ** 2)
        diffMag = windMag[heightIndex+1]-windMag[heightIndex]
I'm not sure if I'm accomplishing what you're asking, but based on your code, it seems you are trying to get the difference between the i-th and (i+1)-th entries in the column "high" and call that variable diffMag. If that's the case, you can probably use one of these two methods.
Solution 1:
diff_mag = []
for i in range(len(wind['high'])-1):
    # append, because assigning to an index of an empty list raises IndexError
    diff_mag.append(wind['high'][i+1] - wind['high'][i])
Solution 2:
Use numpy diff.
np.diff(wind['high'])
I made the assumption you're using pandas here based on what your code block looks like. Hope that helps.
EDIT
Okay..I think I understand what you are saying now.
I think this should work:
diffMag = []
for timestamp, group in grouped:
    # wind magnitude at every height level for this timestamp
    windMag = np.sqrt(group["windU"] ** 2 + group["windV"] ** 2)
    # difference between each level and the level below it
    diffMag.append(np.diff(windMag))
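As a rough sketch of the same idea without an explicit loop, the per-timestamp differences can be computed with a grouped diff. The small frame below reuses a few values from the question, and it assumes every row carries its timestamp; if the timestamp cell is blank on the continuation rows, it would need to be forward-filled first.

```python
import numpy as np
import pandas as pd

# Small sample in the shape of the question's data (timestamp repeated on every row)
wind = pd.DataFrame({
    "timestamp": ["04/05/2019 10:02"] * 3 + ["04/05/2019 10:03"] * 3,
    "high": [100, 150, 200] * 2,
    "windU": [-0.14, -1.26, 0.12, -0.15, -1.26, 0.12],
    "windV": [8.53, 4.57, 8.32, 8.53, 4.57, 8.32],
})

# Wind magnitude per row
wind["windMag"] = np.hypot(wind["windU"], wind["windV"])

# Difference to the previous (lower) level within each timestamp;
# the first level of every timestamp has no lower level, so it becomes NaN
wind["diffMag"] = wind.groupby("timestamp")["windMag"].diff()
```

Each timestamp keeps its own row of NaN for the lowest level, so differences never leak from one timestamp into the next.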

Dataframe split columns value, how to solve error message?

I have a pandas dataframe with the following columns:
Stock ROC5 ROC20 ROC63 ROCmean
0 IBGL.SW -0.59 3.55 6.57 3.18
0 EHYA.SW 0.98 4.00 6.98 3.99
0 HIGH.SW 0.94 4.22 7.18 4.11
0 IHYG.SW 0.56 2.46 6.16 3.06
0 HYGU.SW 1.12 4.56 7.82 4.50
0 IBCI.SW 0.64 3.57 6.04 3.42
0 IAEX.SW 8.34 18.49 14.95 13.93
0 AGED.SW 9.45 24.74 28.13 20.77
0 ISAG.SW 7.97 21.61 34.34 21.31
0 IAPD.SW 0.51 6.62 19.54 8.89
0 IASP.SW 1.08 2.54 12.18 5.27
0 RBOT.SW 10.35 30.53 39.15 26.68
0 RBOD.SW 11.33 30.50 39.69 27.17
0 BRIC.SW 7.24 11.08 75.60 31.31
0 CNYB.SW 1.14 4.78 8.36 4.76
0 FXC.SW 5.68 13.84 19.29 12.94
0 DJSXE.SW 3.11 9.24 6.44 6.26
0 CSSX5E.SW -0.53 5.29 11.85 5.54
How can I add a new column "Symbol" to the dataframe that contains the stock name without ".SW"?
Example: the first row's result should be IBGL (original value IBGL.SW).
Example: the last row's result should be CSSX5E (original value CSSX5E.SW).
If I send the following command:
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
Then I receive an error message:
:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
How can I solve this problem?
Thanks a lot for your support.
METHOD 1:
You can do a vectorized operation by str.get(0) -
df['SYMBOL'] = df['Stock'].str.split('.').str.get(0)
METHOD 2:
You can do another vectorized operation by using expand=True in str.split() and then getting the first column.
df['SYMBOL'] = df['Stock'].str.split('.', expand = True)[0]
METHOD 3:
Or you can write a custom lambda function with apply (for more complex processes). Note, this is slower but good if you have your own UDF.
df['SYMBOL'] = df['Stock'].apply(lambda x:x.split('.')[0])
This is not an error but a warning; as you have probably noticed, your script still finishes its execution.
Edit: Given your comments, it seems the issue originates earlier in your code, so I suggest you use the following:
new_df = new_df.copy(deep=False)
And then proceed to solve it with:
new_df.loc[:, 'Symbol'] = new_df['Stock'].str.split('.').str[0]
new_df = new_df.copy()
new_df['Symbol'] = new_df.Stock.str.replace('.SW', '', regex=False)
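To make the cause of the warning concrete, here is a small sketch under the assumption that new_df was created by filtering another dataframe; the question does not show that step, so the filter on ROC5 below is purely hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Stock": ["IBGL.SW", "EHYA.SW", "CSSX5E.SW"],
    "ROC5": [-0.59, 0.98, -0.53],
})

# Hypothetical earlier step: new_df is a filtered slice of df.
# The explicit .copy() makes it an independent dataframe, so adding
# a column to it later does not trigger SettingWithCopyWarning.
new_df = df[df["ROC5"] > -1].copy()

# Everything before the first "." is the symbol
new_df["Symbol"] = new_df["Stock"].str.split(".").str[0]
```

The original df is left untouched; only new_df gains the Symbol column.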

Using an array as input into a function?

X =
[[14.23 3.06 5.64 2.43]
[13.2 2.76 4.38 2.14]
[13.16 3.24 5.68 2.67]
[14.37 3.49 7.8 2.5 ]
[13.24 2.69 4.32 2.87]
[14.2 3.39 6.75 2.45]
[14.39 2.52 5.25 2.45]
[14.06 2.51 5.05 2.61]
[14.83 2.98 5.2 2.17]
[13.86 3.15 7.22 2.27]
[14.1 3.32 5.75 2.3 ]
[14.12 2.43 5. 2.32]
[13.75 2.76 5.6 2.41]
[14.75 3.69 5.4 2.39]
[14.38 3.64 7.5 2.38]
[13.63 2.91 7.3 2.7 ]
[14.3 3.14 6.2 2.72]
[13.83 3.4 6.6 2.62]
[14.19 3.93 8.7 2.48]
[13.64 3.03 5.1 2.56]]
Here is my dataset. Now I want to calculate the Euclidean distance between two of the vectors (rows).
Row1 = X[1]
Row2 = X[2]
My function:
def Edistance (v1, v2):
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)
Edistance(Row1,Row2)
I then get a TypeError: NumPy array is not callable. Can I not use an array as input to my function?
You can pass any object as a function argument, so you can pass arrays, but as @xdurch0 mentioned earlier, your syntax is wrong.
def Edistance (v1: dict, v2: dict): # You
    distance = 0.0
    for i in range(len(v1)-1):
        distance += (v1(i)) - (v2(i))**2
    return sqrt(distance)
What you are doing here is calling v1 and v2 as if they were functions, since () is used to call something. What you want to do, as far as I understand, is use [] to reference an element inside the array.
So, basically, you want v1[i] and v2[i] (instead of v1(i) and v2(i) respectively).
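Putting that together, a corrected version might look like the sketch below. Besides the brackets, it makes two further changes that the answer above does not mention and that are my reading of the intent: the loop runs over every element (range(len(v1)) rather than len(v1)-1), and the difference is squared as a whole. The name euclidean_distance is hypothetical.

```python
import math
import numpy as np

def euclidean_distance(v1, v2):
    """Euclidean distance between two equal-length vectors."""
    distance = 0.0
    for i in range(len(v1)):                 # visit every component
        distance += (v1[i] - v2[i]) ** 2     # index with [], square the difference
    return math.sqrt(distance)

# Rows X[1] and X[2] from the question's dataset
row1 = np.array([13.2, 2.76, 4.38, 2.14])
row2 = np.array([13.16, 3.24, 5.68, 2.67])
d = euclidean_distance(row1, row2)
```

With NumPy arrays, np.linalg.norm(row1 - row2) gives the same distance without a loop.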

How to keep pandas group by column when applying transform function?

This is what my pandas dataframe looks like:
sampling_time MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL MQ2_CH4 MQ2_H2 MQ2_PROPANE
0 2018-07-15 08:41:49.028 4.41 32.87 19.12 7.70 10.29 7.59 4.49
1 2018-07-15 08:41:49.028 2.98 19.08 12.47 4.72 6.34 5.15 3.02
2 2018-07-15 08:41:49.028 2.73 16.88 11.33 4.22 5.69 4.72 2.76
3 2018-07-15 08:41:49.028 2.69 16.47 11.11 4.13 5.57 4.64 2.71
4 2018-07-15 08:41:49.028 2.66 16.26 11.00 4.09 5.50 4.60 2.69
When I do a group by (split-apply-combine method), my sampling_time column is removed.
transformed = dataframe.groupby('sampling_time').transform(lambda x: (x - x.mean()) / x.std())
transformed.head()
MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL MQ2_CH4 MQ2_H2 MQ2_PROPANE
0 15.710127 15.975636 15.773724 15.876433 15.874190 15.694674
1 3.519619 3.313661 3.494836 3.408578 3.404160 3.563717
2 1.388411 1.293621 1.389884 1.316656 1.352130 1.425885
3 1.047418 0.917159 0.983665 0.940110 0.973294 1.028148
4 0.791673 0.724337 0.780556 0.772756 0.752306 0.829280
Any help or suggestion about how to keep the sampling_time column would be much appreciated.
You can do this by setting 'sampling_time' as the index; then, when you run groupby with transform, you will get your transformed columns back together with the index.
df1 = df.set_index('sampling_time')
df1.groupby('sampling_time').transform(lambda x: x-x.std())
output:
MQ2_LPG MQ2_CO MQ2_SMOKE MQ2_ALCOHOL \
sampling_time
2018-07-15 08:41:49.028 3.663522 25.760508 15.652432 6.154209
2018-07-15 08:41:49.028 2.233522 11.970508 9.002432 3.174209
2018-07-15 08:41:49.028 1.983522 9.770508 7.862432 2.674209
2018-07-15 08:41:49.028 1.943522 9.360508 7.642432 2.584209
2018-07-15 08:41:49.028 1.913522 9.150508 7.532432 2.544209
MQ2_CH4 MQ2_H2 MQ2_PROPANE
sampling_time
2018-07-15 08:41:49.028 8.243523 6.313227 3.7205
2018-07-15 08:41:49.028 4.293523 3.873227 2.2505
2018-07-15 08:41:49.028 3.643523 3.443227 1.9905
2018-07-15 08:41:49.028 3.523523 3.363227 1.9405
2018-07-15 08:41:49.028 3.453523 3.323227 1.9205
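If you would rather keep sampling_time as an ordinary column, another option is to transform only the value columns and join the result back onto the key column. The sketch below uses made-up sample values and the z-score lambda from the question; the names value_cols and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "sampling_time": ["t0", "t0", "t0", "t1", "t1", "t1"],
    "MQ2_LPG": [4.41, 2.98, 2.73, 2.69, 2.66, 2.60],
    "MQ2_CO": [32.87, 19.08, 16.88, 16.47, 16.26, 16.00],
})

# Every column except the grouping key
value_cols = [c for c in df.columns if c != "sampling_time"]

# transform returns a frame aligned with df's index, so it can be joined back
transformed = df.groupby("sampling_time")[value_cols].transform(
    lambda x: (x - x.mean()) / x.std()
)
result = pd.concat([df[["sampling_time"]], transformed], axis=1)
```

Because transform preserves the original row order and index, the concat lines up row by row without any merge.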

Remove values when reading a csv and return to a list

I have a csv file of subjects' XY coordinates. An XY pair is removed whenever its X-coordinate is less than 5. This can happen for any player and changes over time (see the example dataset).
At the start of this file, P2, P7, P12 and P17 have removed data, but every player has missing data at some point in the file. For about 90% of the file, at least 4 players have missing data at any given time point.
Frame Time P1_X P2_Y P2_X P2_Y P3_X P3_Y P4_X P4_Y P5_X P5_Y P6_X P6_Y P7_X P7_Y P8_X P8_Y P9_X P9_Y P10_X P10_Y P11_X P11_Y P12_X P12_Y
0 10:39.2 65.75 45.10 73.74 -3.52 61.91 41.80 67.07 -24.62 77.14 -22.98 93.95 3.51 56.52 28.44 70.21 11.06 73.08 -35.54 69.79 45.73 73.34 29.26 64.73 -40.69 70.90 6.11 70.94 -45.11 42.78 3.00 61.77 -1.05 72.07 38.62
1 10:39.3 65.77 45.16 73.69 -3.35 61.70 41.79 67.19 -24.59 77.17 -23.03 93.90 3.53 56.54 28.38 70.20 11.00 73.15 -35.48 69.79 45.86 73.20 29.30 64.96 -40.77 70.91 6.10 71.04 -45.29 42.84 3.02 61.82 -0.99 72.12 38.71
2 10:39.4 65.78 45.24 73.63 -3.17 61.70 41.79 67.32 -24.56 77.20 -23.05 93.83 3.55 56.59 28.31 70.20 10.92 73.20 -35.41 69.79 45.86 73.03 29.36 65.19 -40.84 70.91 6.10 71.15 -45.50 42.91 3.04 61.89 -0.91 72.16 38.80
3 10:39.5 65.78 45.33 73.57 -3.00 61.49 41.78 67.45 -24.50 77.25 -23.07 93.75 3.57 56.59 28.31 70.21 10.83 73.25 -35.33 69.77 46.01 72.86 29.43 65.45 -40.86 70.90 6.09 71.15 -45.50 43.01 3.08 61.98 -0.81 72.19 38.86
4 10:39.6 65.78 45.33 73.51 -2.86 61.32 41.76 67.45 -24.50 77.31 -23.09 93.64 3.60 56.65 28.22 70.23 10.72 73.29 -35.22 69.72 46.17 72.69 29.51 65.75 -40.84 70.88 6.08 71.24 -45.71 43.11 3.12 62.06 -0.70 72.22 38.90
5 10:39.7 65.75 45.44 73.51 -2.86 61.20 41.73 67.59 -24.37 77.38 -23.10 93.52 3.63 56.73 28.09 70.25 10.59 73.29 -35.22 69.68 46.33 72.49 29.60 66.06 -40.84 70.86 6.05 71.31 -45.91 43.22 3.14 62.13 -0.59 72.26 38.92
6 10:39.8 65.72 45.56 73.45 -2.72 61.08 41.71 67.72 -24.19 77.44 -23.12 93.39 3.69 56.80 27.91 70.27 10.45 73.34 -35.08 69.66 46.48 72.27 29.67 66.36 -40.87 70.86 6.01 71.39 -46.09 43.35 3.17 62.20 -0.47 72.29 38.93
7 10:39.9 65.72 45.56 73.34 -2.48 60.97 41.72 67.92 -23.76 77.51 -23.13 93.23 3.75 56.80 27.91 70.30 10.31 73.40 -34.76 69.64 46.63 72.01 29.74 66.62 -40.93 70.85 5.96 71.39 -46.09 43.51 3.18 62.27 -0.35 72.31 38.93
8 10:40.0 65.73 45.90 73.34 -2.48 60.86 41.72 67.92 -23.76 77.51 -23.13 93.05 3.80 56.91 27.47 70.30 10.31 73.40 -34.76 69.63 46.76 72.01 29.74 66.82 -41.06 70.83 5.88 71.53 -46.45 43.68 3.20 62.27 -0.35 72.29 38.92
9 10:40.1 65.73 46.09 73.29 -2.39 60.74 41.70 68.00 -23.52 77.60 -23.12 92.83 3.86 56.99 27.23 70.35 10.17 73.43 -34.58 69.64 46.88 71.72 29.80 66.99 -41.22 70.80 5.79 71.60 -46.63 43.86 3.23 62.34 -0.22 72.22 38.89
10 10:40.2 65.76 46.27 73.22 -2.32 60.60 41.65 68.07 -23.24 77.71 -23.05 92.83 3.86 57.14 26.98 70.43 10.05 73.47 -34.38 69.68 46.96 71.42 29.85 67.16 -41.38 70.77 5.70 71.64 -46.80 44.04 3.28 62.43 -0.08 72.13 38.86
11 10:40.3 65.81 46.43 73.12 -2.28 60.43 41.60 68.12 -22.93 77.83 -22.94 92.58 3.89 57.32 26.72 70.54 9.92 73.50 -34.16 69.75 46.99 71.08 29.89 67.16 -41.38 70.74 5.62 71.67 -46.96 44.21 3.33 62.54 0.09 72.03 38.84
12 10:40.4 65.87 46.58 72.98 -2.29 60.24 41.55 68.15 -22.57 77.94 -22.76 92.30 3.93 57.52 26.45 70.67 9.78 73.50 -33.91 69.85 47.00 70.72 29.91 67.31 -41.57 70.70 5.52 71.73 -47.15 44.37 3.40 62.66 0.24 72.03 38.84
13 10:40.5 65.91 46.69 72.80 -2.32 60.07 41.49 68.17 -22.18 78.01 -22.53 91.99 3.98 57.71 26.18 70.81 9.60 73.49 -33.68 69.97 47.03 70.33 29.92 67.45 -41.78 70.64 5.38 71.81 -47.35 44.37 3.40 62.80 0.40 71.96 38.81
14 10:40.6 65.94 46.80 72.60 -2.34 59.93 41.43 68.19 -21.77 78.05 -22.27 91.69 4.03 57.89 25.90 70.96 9.42 73.47 -33.47 70.10 47.09 69.93 29.93 67.54 -41.96 70.56 5.20 71.86 -47.53 44.54 3.50 62.98 0.58 71.91 38.77
15 10:40.7 65.95 46.93 72.36 -2.36 59.80 41.38 68.18 -21.32 78.08 -21.99 91.38 4.09 58.11 25.63 71.11 9.26 73.41 -33.26 70.24 47.15 69.50 29.91 67.58 -42.15 70.56 5.20 71.86 -47.69 44.54 3.50 63.16 0.77 71.91 38.77
16 10:40.8 65.93 47.09 72.10 -2.34 59.65 41.36 68.16 -20.86 78.11 -21.68 91.09 4.17 58.35 25.38 71.23 9.13 73.31 -33.05 70.38 47.20 69.07 29.84 67.56 -42.32 70.44 5.00 71.81 -47.84 45.00 3.79 63.34 0.97 71.80 38.60
17 10:40.9 65.92 47.23 71.85 -2.28 59.47 41.37 68.11 -20.41 78.11 -21.37 90.81 4.27 58.59 25.12 71.33 9.00 73.22 -32.84 70.52 47.26 68.63 29.75 67.47 -42.51 70.28 4.78 71.75 -47.97 45.26 3.94 63.52 1.14 71.73 38.46
Because there is missing data, I tried to read the csv file as shown below. If I remove the try/except, I receive an error stating that a string couldn't be converted to float.
with open('NoBench.csv') as csvfile :
    readCSV = csv.reader(csvfile, delimiter=',')
    n=0
    for row in readCSV :
        if n == 0 :
            n+=1
        try:
            visuals[0].append([float(row[3]),float(row[5]),float(row[7]),float(row[9]),float(row[11]),float(row[13]),float(row[15]),float(row[17]),float(row[19]),float(row[21]),float(row[23]),float(row[25]),float(row[27]),float(row[29]),float(row[31]),float(row[33]),float(row[35]),float(row[37]),float(row[39]),float(row[41]),float(row[43])])
            visuals[1].append([float(row[2]),float(row[4]),float(row[6]),float(row[8]),float(row[10]),float(row[12]),float(row[14]),float(row[16]),float(row[18]),float(row[20]),float(row[22]),float(row[24]),float(row[26]),float(row[28]),float(row[30]),float(row[32]),float(row[34]),float(row[36]),float(row[38]),float(row[40]),float(row[42])])
        except ValueError:
            continue
However, when I use this code, it only appends values to the list for rows where every player's data is present. As mentioned, this is the case for only about 10% of the file. I am using the XYs to create a scatter plot at each time point, so I cannot change the missing values to 0,0, as that would create a false data point. How do I alter the code so that it returns the XY values of the players whose data hasn't been removed?
You can define your own convert before the loop:
def convert_float(x):
    if x:  # equivalent to if x != ''
        return float(x)
    else:
        return 0.0  # or set the default value you expect to replace the missing data with.
In combination with @juanpa.arrivillaga's excellent suggestion, change the visuals.append lines to these:
visuals[0].append(list(map(convert_float, row[3::2])))
visuals[1].append(list(map(convert_float, row[2::2])))
Also, I'm not sure what your n+=1 line is supposed to do. If you merely wanted to skip the first row (headers), simply do this:
def convert_float(x):
    if x:
        return float(x)
    else:
        return 0.0
for i, row in enumerate(readCSV):
    if i > 0:  # row 0 is the header
        visuals[0].append(list(map(convert_float, row[3::2])))
        visuals[1].append(list(map(convert_float, row[2::2])))
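Since the question says that 0,0 would create a false data point, one possible variation is to use NaN as the default instead of 0.0; as far as I know, matplotlib's scatter leaves out points whose coordinates are NaN, but that is worth checking against your plot. The sketch below is self-contained and reads from an in-memory sample rather than NoBench.csv; the helper name to_float_or_nan and the two-player sample are hypothetical.

```python
import csv
import io
import math

def to_float_or_nan(text):
    """Convert a CSV cell to float, using NaN for empty (removed) cells."""
    text = text.strip()
    return float(text) if text else math.nan

# In-memory stand-in for the real file: player 2 is missing in the first frame
sample = io.StringIO(
    "Frame,Time,P1_X,P1_Y,P2_X,P2_Y\n"
    "0,10:39.2,65.75,45.10,,\n"
    "1,10:39.3,65.77,45.16,73.69,-3.35\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

xs, ys = [], []
for row in reader:
    xs.append([to_float_or_nan(v) for v in row[2::2]])  # X columns
    ys.append([to_float_or_nan(v) for v in row[3::2]])  # Y columns
```

Every frame keeps one slot per player, so the lists stay aligned even when some players are missing.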
