Clark-Evans aggregation index in Python
The Clark-Evans index is one of the most basic statistics for measuring point aggregation in spatial analysis. However, I can't find any implementation in Python, so I adapted the R code from the hyperlink above. I want to ask whether the statistic and p-value are correct for such irregular study areas:
Function
import math

import numpy as np
from shapely.geometry import Polygon
from sklearn.neighbors import KDTree
from statistics import NormalDist


def clarkEvans(X, Y, roi):
    """Clark-Evans index: takes point x, y coordinates and a polygon for the
    cell shape (roi) and returns the Clark-Evans index R and a p-value.
    R ~ 1 suggests spatial randomness, while R << 1 suggests clustering
    and R >> 1 suggests ordering."""
    # Build the cell boundary polygon from the roi vertices
    pgon = Polygon(roi)
    # Calculate intensity as points/area
    areaW = pgon.area
    npts = len(X)
    intensity = npts / areaW
    if npts < 2:
        return np.nan, np.nan
    pts = list(zip(X, Y))
    # Get the nearest neighbour of each observation (k=2 because the nearest
    # point to each query point is the point itself, at distance 0)
    kdt = KDTree(pts, leaf_size=30, metric='euclidean')  # very good for large datasets, but maybe bad for small ones?
    dists, ids = kdt.query(pts, k=2)
    dists = dists[:, 1]
    # Clark-Evans index: mean NN distance / expected mean NN distance under Poisson
    Dobs = np.mean(dists)
    Dpois = 1 / (2 * math.sqrt(intensity))
    Rnaive = Dobs / Dpois
    # Standard error of the mean NN distance under complete spatial randomness
    SE = math.sqrt(((4 - math.pi) * areaW) / (4 * math.pi)) / npts
    Z = (Dobs - Dpois) / SE
    # The difference between observed and expected NN distances should be
    # normally distributed according to the Central Limit Theorem (CLT)
    p_val = NormalDist().pdf(Z)  # p_val for clustering
    # Return the Clark-Evans index and p-value
    return round(Rnaive, 3), round(p_val, 3)
Output
The image shows my Clark-Evans index applied to, and plotted with, two different datasets. The index is similar for both patterns, although one of them looks much more obviously clustered. The p-values also seem switched: I would expect the second plot, being the clustered one, to be the one with a significant p-value.
Input data
# The xy coordinates of observations plus the point vertices of the study area (roi)
x1 = [123, 105, 71, 109, 96, 49, 86, 80, 120, 98, 59, 100, 118, 69, 84, 21, 95, 77, 158, 118, 87, 77, 87, 77, 82, 106, 120, 125, 61, 24, 53, 106, 52, 103, 89, 99, 111, 58, 97, 83, 51, 45, 64, 112, 114, 73, 55, 111, 110, 102, 116, 107, 84, 97, 118, 96, 116, 45, 102, 145, 126, 50, 103, 98, 20, 79, 113, 99, 90, 143, 36, 120, 106, 91, 95, 15, 122, 69, 28, 71, 66, 119, 78, 75, 113, 44, 85, 60, 88, 68, 116, 40, 59, 105, 65, 94, 79, 95, 120, 67, 78, 59, 89, 84, 111, 78, 72, 156, 162, 134, 157, 120, 126, 86, 58, 137, 32, 91, 68, 119, 112, 70, 120, 62, 118, 114, 66, 55, 99, 72, 91, 109, 53, 94, 71, 145, 146, 106, 15, 83, 104, 61, 129, 51, 58, 59, 113, 107, 94, 94, 69, 118, 74, 124, 107, 99, 66, 115, 159, 71, 115, 122, 76, 68, 79, 107, 81, 104, 87, 106, 105, 112, 111, 79, 54, 108, 62, 115, 36, 74, 84, 75, 64, 92, 64, 82, 77, 56, 75, 69, 88, 105, 96, 61, 84, 106, 31, 53, 173, 102, 99, 124, 87, 70, 25, 19, 122, 101, 126, 60, 94, 78, 97, 64, 45, 92, 114, 87, 96, 160, 88, 66, 40, 124, 103, 60, 129, 120, 35, 95, 56, 76, 116, 65, 7, 103, 160, 63, 134, 101, 56, 50, 89, 92, 99, 89, 120, 47, 58, 47, 74, 124, 8, 93, 121, 53, 66, 63, 90, 114, 91, 71, 123, 55, 142, 97, 69, 141, 92, 76, 69, 74, 66, 90, 81, 96, 110, 61, 58, 62, 50, 125, 106, 115, 79, 94, 118, 117, 64, 99, 55, 53, 93, 57, 116, 61, 125, 10, 119, 74, 64, 77, 127, 115, 59, 53, 99, 81, 68, 101, 43, 122, 129, 109, 108, 84, 103, 59, 105, 76, 122, 101, 101, 108, 79, 75, 60, 111, 97, 104, 82, 67, 96, 70, 96, 104, 103, 66, 89, 114, 121, 119, 104, 93, 156, 108, 88, 98, 52, 112, 65, 99, 107, 90, 107, 115, 73, 106, 100, 120, 128, 66, 116, 69, 113, 69, 103, 62, 124, 110, 124, 72, 76, 115, 73, 84, 95, 100, 51, 61, 82, 97, 106, 68, 112, 69, 115, 67, 80, 72, 63, 123, 92, 101, 61, 69, 103, 112, 70, 59, 91, 90, 102, 111, 41, 101, 90, 33, 122, 161, 161]
y1 = [37, 51, 35, 67, 94, 114, 62, 24, 64, 92, 55, 11, 74, 38, 79, 77, 90, 77, 70, 70, 41, 46, 81, 83, 81, 65, 63, 43, 56, 95, 26, 8, 68, 82, 44, 78, 77, 72, 45, 68, 83, 99, 100, 58, 91, 89, 115, 34, 46, 68, 79, 71, 41, 43, 48, 83, 67, 69, 42, 55, 63, 69, 47, 67, 102, 72, 33, 77, 67, 1, 123, 59, 69, 47, 73, 79, 89, 48, 55, 97, 56, 92, 121, 70, 48, 47, 114, 62, 84, 78, 54, 55, 79, 76, 62, 63, 83, 71, 74, 83, 50, 67, 84, 81, 75, 59, 12, 77, 97, 6, 26, 55, 10, 74, 58, 59, 77, 76, 77, 68, 60, 50, 53, 89, 76, 87, 67, 86, 86, 73, 79, 74, 62, 54, 67, 58, 23, 76, 95, 63, 38, 76, 117, 18, 52, 46, 98, 62, 44, 36, 86, 52, 74, 51, 85, 100, 75, 73, 63, 38, 64, 91, 47, 70, 77, 88, 70, 88, 88, 39, 52, 45, 79, 56, 74, 60, 59, 69, 116, 44, 55, 48, 70, 83, 66, 87, 78, 73, 58, 76, 46, 50, 43, 81, 102, 45, 115, 88, 80, 34, 55, 55, 97, 103, 112, 122, 111, 97, 90, 81, 22, 36, 87, 86, 48, 39, 42, 83, 57, 16, 100, 89, 115, 75, 69, 86, 69, 69, 74, 39, 52, 23, 63, 49, 92, 96, 71, 105, 10, 75, 84, 80, 30, 30, 59, 52, 32, 119, 107, 74, 79, 101, 106, 99, 77, 66, 89, 83, 102, 94, 97, 78, 91, 93, 16, 11, 33, 16, 78, 50, 30, 26, 79, 34, 32, 86, 64, 40, 63, 51, 58, 52, 92, 98, 35, 36, 34, 47, 86, 88, 60, 80, 92, 96, 94, 94, 98, 111, 49, 54, 56, 36, 72, 94, 92, 102, 105, 32, 40, 30, 73, 59, 107, 39, 46, 40, 53, 57, 93, 92, 63, 59, 65, 68, 81, 69, 56, 53, 53, 85, 56, 55, 93, 45, 40, 68, 101, 93, 29, 44, 93, 93, 46, 67, 38, 34, 97, 93, 72, 90, 62, 68, 32, 31, 74, 71, 59, 38, 51, 95, 73, 82, 5, 53, 50, 34, 49, 43, 82, 77, 65, 88, 87, 89, 30, 38, 45, 36, 79, 89, 88, 100, 98, 45, 41, 20, 35, 51, 77, 64, 60, 63, 33, 44, 78, 82, 83, 70, 74, 78, 41, 61, 71, 40, 124, 82, 67, 121, 5, 65, 66]
roi1 = [[152.5078125, 3.7060546875], [158.8408203125, 12.455078125], [165.5126953125, 25.3154296875], [170.796875, 38.787109375], [171.013671875, 46.02734375], [172.6083984375, 53.0615234375], [172.6083984375, 63.9306640625], [174.419921875, 70.9169921875], [174.419921875, 85.41015625], [175.947265625, 92.4296875], [175.7998046875, 103.2626953125], [169.52734375, 116.3212890625], [166.9765625, 118.89453125], [159.7451171875, 119.2177734375], [152.7265625, 121.01953125], [138.2333984375, 121.029296875], [131.21875, 122.8408203125], [73.248046875, 122.8408203125], [66.2119140625, 124.5546875], [58.966796875, 124.65234375], [51.9990234375, 126.4638671875], [23.013671875, 126.4638671875], [19.42578125, 125.958984375], [16.5361328125, 123.7734375], [10.20703125, 115.0283203125], [0.5068359375, 95.57421875], [0.5537109375, 91.951171875], [9.0869140625, 80.318359375], [12.552734375, 73.9599609375], [18.884765625, 65.2119140625], [25.89453125, 56.994140625], [35.611328125, 41.7626953125], [42.345703125, 33.296875], [45.7568359375, 26.90625], [53.634765625, 14.7744140625], [58.1103515625, 9.078125], [64.916015625, 6.8984375], [86.654296875, 6.8984375], [93.6904296875, 5.291015625], [100.89453125, 4.57421875], [104.2763671875, 3.275390625], [122.3837890625, 3.025390625], [129.3935546875, 1.4638671875], [143.8857421875, 1.4638671875], [147.376953125, 0.4931640625]]
clarkEvans(x1, y1, roi1)
x2 = [94, 111, 79, 95, 86, 46, 30, 34, 53, 17, 44, 20, 42, 56, 23, 21, 50, 16, 50, 52, 47, 132, 44, 40, 43, 33, 29, 52, 24, 125, 86, 84]
y2 = [17, 71, 94, 88, 108, 132, 116, 115, 121, 132, 120, 121, 123, 116, 116, 139, 121, 124, 116, 140, 141, 33, 119, 118, 125, 130, 123, 122, 40, 23, 80, 107]
roi2 = [[129.4560546875, 3.6552734375], [132.3408203125, 5.84765625], [134.4638671875, 12.7744140625], [134.4638671875, 45.3828125], [132.65234375, 56.0302734375], [132.65234375, 66.8994140625], [131.4169921875, 70.3056640625], [130.7021484375, 77.5029296875], [129.029296875, 84.5419921875], [129.029296875, 88.1650390625], [127.2177734375, 95.1728515625], [127.16796875, 106.04296875], [125.40625, 113.0712890625], [125.40625, 116.6943359375], [123.896484375, 119.9873046875], [120.6533203125, 121.6025390625], [110.0654296875, 123.9130859375], [99.6181640625, 126.8427734375], [89.896484375, 131.7041015625], [83.8388671875, 135.638671875], [77.03515625, 138.134765625], [73.4228515625, 138.40625], [56.181640625, 143.8408203125], [45.568359375, 142.029296875], [31.076171875, 142.029296875], [27.4736328125, 141.6455078125], [20.626953125, 139.302734375], [15.5029296875, 134.1787109375], [11.0546875, 128.5234375], [4.537109375, 115.5849609375], [0.40625, 98.0078125], [0.513671875, 72.646484375], [4.927734375, 55.0859375], [11.4091796875, 42.123046875], [15.333984375, 36.0859375], [21.94921875, 27.4912109375], [34.7587890625, 14.6806640625], [43.416015625, 8.205078125], [57.9013671875, 7.970703125], [61.28125, 6.666015625], [71.9052734375, 4.541015625], [82.7705078125, 4.34765625], [89.7578125, 2.5361328125], [107.873046875, 2.5361328125], [114.8916015625, 1.015625], [122.126953125, 0.724609375]]
clarkEvans(x2, y2, roi2)
Using the original R function yields similar but not equal results:
clarkevans.test(X, alternative = "clustered")
> R = 0.87719, p-value = 9.542e-07  # first dataset
> R = 0.83365, p-value = 0.03591    # second dataset
I'm not sure the statistic and p-value calculation are valid, since my study areas are irregularly shaped. The variable SE is calculated with pi, which makes it look as though a random distribution within a circular study area is being assumed. Should I do Monte Carlo simulations instead? Is there a way of avoiding that?
Cheers!
I have not worked with the Clark-Evans (CE) index before, but having read the information you linked to and studied your code, my interpretation is this:
The index value for Dataset2 is less than the index value for Dataset1. This correctly reflects the visual difference in clusteredness, that is, the smaller index value is associated with data that is more clustered.
It is probably not meaningful to say that two CE index values are similar, beyond special cases such as observing that two values are both smaller than 1 or both greater than 1, or that if A < B < C, then A and B are more similar than A and C.
The p-value and the index value measure different things. The index value measures degree of clusteredness (if less than 1) or regularity (if greater than 1). The p-value (inversely) measures how certain it is that the data are more clustered than would be expected by chance, or more regular than would be expected by chance. The p-value in particular is sensitive to the sample size as well as the distribution of points.
The use of pi in calculating SE reflects the assumption of Euclidean distances between points (rather than, say, city block distances). That is, the nearest neighbour of a point is the one at the smallest radial distance. The use of pi in calculating SE does not make any assumptions about the shape of the region of interest.
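One detail worth double-checking in the code itself: NormalDist().pdf(Z) evaluates the normal density at Z rather than a tail probability, which could account for your results being similar to but not equal to R's. For the one-sided "clustered" alternative, the usual choice is the lower-tail CDF of the standard normal. A minimal sketch (the numbers are hypothetical stand-ins for the Dobs, Dpois and SE computed inside your function):

from statistics import NormalDist

# Hypothetical stand-ins for the values computed inside clarkEvans
Dobs, Dpois, SE = 0.45, 0.50, 0.02
Z = (Dobs - Dpois) / SE

# One-sided p-value for the "clustered" alternative: probability of a
# Z this small or smaller under complete spatial randomness
p_clustered = NormalDist().cdf(Z)

# Two-sided alternative: clustering or regularity
p_two_sided = 2 * min(p_clustered, 1 - p_clustered)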
Particularly for small datasets (like Dataset2) you will want to track down information about the potential impact of boundary effects on the index value or the p-value.
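On the Monte Carlo question: simulating complete spatial randomness inside the actual polygon sidesteps both the normal approximation and, to some degree, the shape of the region. A rough sketch, reusing the clarkEvans function above (the helper names and nsim=999 are my own choices):

import numpy as np
from shapely.geometry import Point, Polygon

def csr_points(pgon, n, rng):
    # Rejection-sample n points uniformly inside a shapely polygon
    minx, miny, maxx, maxy = pgon.bounds
    xs, ys = [], []
    while len(xs) < n:
        x, y = rng.uniform(minx, maxx), rng.uniform(miny, maxy)
        if pgon.contains(Point(x, y)):
            xs.append(x)
            ys.append(y)
    return xs, ys

def clark_evans_mc(X, Y, roi, nsim=999, seed=0):
    # Monte Carlo p-value for the "clustered" alternative
    rng = np.random.default_rng(seed)
    pgon = Polygon(roi)
    r_obs, _ = clarkEvans(X, Y, roi)
    r_sim = [clarkEvans(*csr_points(pgon, len(X), rng), roi)[0]
             for _ in range(nsim)]
    # Fraction of simulations at least as "clustered" as the observed pattern
    p = (1 + sum(r <= r_obs for r in r_sim)) / (nsim + 1)
    return r_obs, p

clark_evans_mc(x2, y2, roi2)  # e.g. the small second dataset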
More speculatively, I wonder if it would be useful to use a convex hull to help determine the region of interest rather than do this subjectively.
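For what it's worth, shapely can build that hull directly from the points. Note that a hull fitted to the observations shrinks the study area and therefore pushes the intensity estimate up, so whether it is an appropriate region is itself a judgment call. A sketch:

from shapely.geometry import MultiPoint

# Candidate region of interest: convex hull of the observations themselves
hull = MultiPoint(list(zip(x1, y1))).convex_hull
roi_hull = list(hull.exterior.coords)
clarkEvans(x1, y1, roi_hull)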
Related
How to convert an array into a new array based on a lookup dictionary
I'm trying to convert a numpy array into a new array by using each value in the existing array and finding its corresponding key in a dictionary. The new array should consist of the corresponding dictionary keys. Here is what I have:

# dictionary where values are lists
available_weights = {
    0.009174311926605505: [7, 14, 21, 25, 31, 32, 35, 45, 52, 82, 83, 96, 112, 119, 142],
    0.009523809523809525: [33, 37, 43, 44, 69, 73, 75, 78, 79, 80, 102, 104, 110, 115, 150],
    0.1111111111111111: [91],
    0.019230769230769232: [36, 50, 127, 139],
    0.010869565217391304: [10, 48, 55, 62, 77, 88, 103, 124, 131, 137, 147],
    0.014084507042253521: [2, 3, 4, 22, 27, 30, 41, 53, 87, 122, 123, 132, 143],
    0.011494252873563218: [20, 34, 99, 125, 135, 138, 141],
    0.045454545454545456: [0, 109],
    0.01818181818181818: [49, 64, 72, 90, 146, 148],
    0.07142857142857142: [106],
    0.01282051282051282: [16, 63, 68, 98, 114, 130, 145],
    0.010638297872340425: [8, 28, 40, 57, 61, 66, 71, 74, 76, 84, 85, 86, 128, 144],
    0.02040816326530612: [6, 65],
    0.021739130434782608: [29, 67, 92, 93],
    0.02127659574468085: [47, 118, 120],
    0.011111111111111112: [1, 13, 19, 24, 42, 54, 70, 89, 94, 107, 117, 126, 129, 140],
    0.015625: [38, 60, 101, 133, 134, 136],
    0.03333333333333333: [56, 58, 97, 121],
    0.016666666666666666: [5, 26, 105, 113],
    0.014705882352941176: [17, 46, 95]}

# existing numpy array
train_idx = [134, 45, 137, 140, 79, 98, 128, 80, 99, 71, 145, 35, 94, 122, 77, 23, 113, 44, 68, 21, 20, 125, 74, 139, 29, 109, 25, 34, 6, 81, 22, 114, 12, 95, 150, 106, 84, 19, 58, 59, 88, 143, 136, 43, 72, 132, 117, 13, 65, 111, 39, 14, 56, 11, 26, 90, 119, 112, 27, 57, 46, 147, 123, 16, 36, 100, 141, 38, 62, 32, 75, 146, 89, 37, 31, 40, 64, 87, 3, 103, 102, 104, 78, 53, 1, 142, 47, 130, 105, 4, 93, 52, 42, 10, 9, 115, 76, 54, 49, 116, 69, 5, 86, 66, 101, 107, 96, 110, 8, 73, 121, 138, 67, 124, 108, 97, 120, 2, 148, 127, 135, 18, 149, 82, 41, 144, 129, 118, 51, 126, 33, 85, 24, 0, 61, 92, 70, 15, 17, 50, 83, 30, 28, 91, 60, 48, 133, 55, 63, 7, 131]

So I want to use each value in train_idx to find the corresponding dictionary key in available_weights. The expected output should look like this (with a length of all 150 values):

new_array = [0.015625, 0.009174311926605505, 0.010869565217391304, ... , 0.01282051282051282, 0.009174311926605505, 0.010869565217391304]

Any help would be appreciated!
result = []
flipped = dict()
for value in train_idx:
    flipped[value] = []
    for key in available_weights:
        if value in available_weights[key]:
            flipped[value].append(key)
            result.append(key)
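A note on cost: this rescans the whole dictionary for every element of train_idx. Inverting the dictionary once gives constant-time lookups afterwards; a sketch of that alternative (using .get so that indices absent from every list map to None rather than raising):

index_to_weight = {idx: weight
                   for weight, indices in available_weights.items()
                   for idx in indices}
result = [index_to_weight.get(value) for value in train_idx]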
How to generate combinations from multiple variables?
I have the following variables:

D8 = [22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
D9 = [79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
D10 = [61, 64, 66, 70, 72, 76, 81, 86, 87]

Using these variables, I tried to generate all the possible combinations as follows:

import itertools

stuff = [D8, D9, D10]
for L in range(0, len(stuff) + 1):
    for subset in itertools.combinations(stuff, L):
        print(subset)

The result is as follows:

Output>>>
()
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45],)
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109],)
([61, 64, 66, 70, 72, 76, 81, 86, 87],)
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [61, 64, 66, 70, 72, 76, 81, 86, 87])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109], [61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [79, 80, 90, 92, 93, 97, 98, 104, 105, 109], [61, 64, 66, 70, 72, 76, 81, 86, 87])

I want to know whether there is a cleaner method to generate this result, because with the current output I cannot flatten each result into a single individual outcome. The expected result should look like this:

Expected Result>>>
([])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 61, 64, 66, 70, 72, 76, 81, 86, 87])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87])

Thank you in advance!
The itertools documentation contains the flatten recipe:

def flatten(list_of_lists):
    "Flatten one level of nesting"
    return chain.from_iterable(list_of_lists)

which you need to apply twice to get the desired result:

from itertools import combinations, chain

D8 = [22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
D9 = [79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
D10 = [61, 64, 66, 70, 72, 76, 81, 86, 87]
stuff = [D8, D9, D10]

def flatten(list_of_lists):
    """Flatten one level of nesting"""
    return chain.from_iterable(list_of_lists)

result = flatten(map(flatten, combinations(stuff, length))
                 for length in range(len(stuff) + 1))

for xs in result:
    print(list(xs))

producing:

[]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
[79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
[61, 64, 66, 70, 72, 76, 81, 86, 87]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 61, 64, 66, 70, 72, 76, 81, 86, 87]
[79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87]
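If you would rather skip the helper, the same output can be produced in a single comprehension by concatenating each combination with chain.from_iterable:

from itertools import chain, combinations

result = [list(chain.from_iterable(subset))
          for length in range(len(stuff) + 1)
          for subset in combinations(stuff, length)]
for xs in result:
    print(xs)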
How to sketch a curve and convert it to numpy
I am in real need of a tool that does the following: you draw a curve with your mouse from a starting point to a finish point, and it then exports this to an object which can be interpolated as a numpy array with a given number of points. Is anybody aware of such a tool, or of a way to achieve something similar? Thanks
This might get you started. Draw your curve with Photoshop, GIMP or your favourite painting/drawing program and save it as a white line on a black background in PNG/GIF/JPEG/TIFF format, or any other format that PIL/Pillow understands.

Load the image with PIL/Pillow, blur it a little to allow for discontinuities in your hand-drawn curve, then find the brightest pixel in each column. Save the result as a ".npy" file that you can load and interpolate to any dimension of array you want:

#!/usr/bin/env python3
import sys
import numpy as np
from PIL import Image, ImageFilter

# Check an image file was supplied
if len(sys.argv) != 2:
    print("Usage: curve2array IMAGE", file=sys.stderr)
    exit(1)

# Assign filename and open image as greyscale
filename = sys.argv[1]
im = Image.open(filename).convert('L')

# OPTIONAL - Blur the image a little to allow for discontinuities in hand-sketched curves
im = im.filter(ImageFilter.GaussianBlur(1))

# Convert image to Numpy array and get row number of brightest pixel in each column
na = np.array(im)
yvals = np.argmax(na, axis=0)
yvals = na.shape[0] - yvals  # Make origin at bottom-left instead of top-left

# Write result as ".npy" file
np.save('result.npy', yvals)

My input image is 600 px wide by 400 px tall, so there are 600 y-values, and the last is the largest one at 394 because it is near the top of a 400 px high image. The array looks like this:

array([ 1, 1, 3, 6, 10, 13, 16, 20, 23, 26, 29, 32, 36, 39, 42, 45, 48, 51, 54, 57, 59, 62, 65, 68, 71, 74, 76, 79, 82, 84, 87, 90, 92, 95, 97, 100, 102, 104, 107, 109, 111, 114, 116, 118, 120, 123, 125, 127, 129, 131, 133, 135, 137, 139, 140, 142, 144, 146, 148, 149, 151, 153, 154, 156, 157, 159, 160, 162, 163, 164, 166, 167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 180, 181, 182, 182, 183, 184, 184, 185, 185, 186, 186, 187, 187, 187, 188, 188, 188, 188, 188, 188, 188, 188, 188, 188, 188, 187, 187, 187, 186, 186, 185, 185, 184, 184, 183, 182, 181, 180, 180, 179, 178, 177, 176, 175, 173, 172, 171, 170, 169, 167, 166, 165, 163, 162, 160, 159, 157, 156, 155, 153, 152, 150, 149, 147, 145, 144, 142, 141, 139, 138, 136, 135, 133, 132, 130, 129, 127, 126, 124, 123, 121, 120, 119, 117, 116, 114, 113, 112, 110, 109, 108, 106, 105, 104, 103, 101, 100, 99, 98, 97, 96, 95, 94, 92, 91, 90, 89, 88, 87, 86, 85, 85, 84, 83, 82, 81, 80, 79, 79, 78, 77, 76, 75, 75, 74, 73, 72, 72, 71, 70, 70, 69, 68, 68, 67, 66, 66, 65, 64, 64, 63, 63, 62, 61, 61, 60, 60, 59, 58, 58, 57, 57, 56, 56, 55, 55, 54, 54, 53, 53, 52, 52, 51, 51, 50, 50, 49, 49, 48, 48, 47, 47, 47, 46, 46, 45, 45, 45, 44, 44, 43, 43, 43, 42, 42, 42, 41, 41, 41, 40, 40, 40, 39, 39, 39, 38, 38, 38, 38, 37, 37, 37, 37, 36, 36, 36, 36, 36, 35, 35, 35, 35, 35, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 39, 39, 39, 40, 40, 40, 41, 41, 41, 42, 42, 42, 43, 43, 43, 44, 44, 45, 45, 46, 46, 46, 47, 47, 48, 48, 49, 49, 50, 50, 51, 51, 52, 52, 53, 53, 54, 55, 55, 56, 56, 57, 57, 58, 59, 59, 60, 60, 61, 62, 62, 63, 64, 64, 65, 66, 66, 67, 68, 69, 69, 70, 71, 72, 72, 73, 74, 75, 75, 76, 77, 78, 79, 79, 80, 81, 82, 83, 84, 85, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 115, 116, 117, 118, 119, 120, 121, 123, 124, 125, 126, 128, 129, 130, 131, 133, 134, 135, 136, 138, 139, 140, 142, 143,
145, 146, 147, 149, 150, 152, 153, 155, 156, 157, 159, 160, 162, 163, 165, 167, 168, 170, 171, 173, 175, 176, 178, 179, 181, 183, 185, 186, 188, 190, 192, 193, 195, 197, 199, 201, 203, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 229, 231, 233, 235, 237, 239, 242, 244, 246, 248, 251, 253, 256, 258, 261, 263, 266, 268, 271, 273, 276, 279, 281, 284, 287, 290, 292, 295, 298, 301, 304, 307, 310, 314, 317, 320, 323, 327, 330, 334, 337, 341, 344, 348, 352, 356, 360, 364, 368, 372, 376, 381, 385, 390, 394])

I saved the script above as curve2array, so I run it with:

./curve2array image.png
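The interpolation step the question asks about can then be done with numpy's interp; a minimal sketch (the target length of 100 points is an arbitrary choice):

import numpy as np

yvals = np.load('result.npy')

# Resample the sketched curve to any desired number of points
n_points = 100
x_old = np.linspace(0.0, 1.0, num=len(yvals))
x_new = np.linspace(0.0, 1.0, num=n_points)
curve = np.interp(x_new, x_old, yvals)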
How do you use a for loop to find the minimum without using something like "min" in Python?
This is what I have so far, but I would like to do it without min and with a for loop:

Numbers = [100, 97, 72, 83, 84, 78, 89, 84, 83, 75, 54, 98, 70, 88, 99, 69, 70, 79, 55, 82, 81, 75, 54, 82, 56, 73, 90, 100, 94, 89, 56, 64, 51, 72, 64, 94, 63, 82, 77, 68, 60, 93, 95, 60, 77, 78, 74, 67, 72, 99, 93, 79, 76, 86, 87, 74, 82]

for i in range(len(Numbers)):
    print(min(Numbers))
I do not know why you do not want to use min (you should), but if you do not want to, you can loop over the numbers and keep track of the smallest seen so far:

min_ = None
for n in Numbers:
    if min_ is None or n < min_:
        min_ = n

min_ is now the minimum of the list Numbers.
Another, essentially identical, method:

numbers = [100, 97, 72, 83, 84, 78, 89, 84, 83, 75, 54, 98, 70, 88, 99, 69, 70, 79, 55, 82, 81, 75, 54, 82, 56, 73, 90, 100, 94, 89, 56, 64, 51, 72, 64, 94, 63, 82, 77, 68, 60, 93, 95, 60, 77, 78, 74, 67, 72, 99, 93, 79, 76, 86, 87, 74, 82]

smallest = None
for x in range(len(numbers)):
    if smallest is None or numbers[x] < smallest:
        smallest = numbers[x]
print(smallest)

Output:

51
Averaging Vertically in Nested Lists
I am writing a grade book program that sets up a nested list with assignments as columns and individual students along the rows. The program must calculate the average for each assignment and the average for each student. I've got the average by student, but now I can't figure out how to calculate the average by assignment. Any help would be appreciated!

# gradebook.py
# Display the average of each student's grades.
# Display the average for each assignment.

gradebook = [[61, 74, 69, 62, 72, 66, 73, 65, 60, 63, 69, 63, 62, 61, 64],
             [73, 80, 78, 76, 76, 79, 75, 73, 76, 74, 77, 79, 76, 78, 72],
             [90, 92, 93, 92, 88, 93, 90, 95, 100, 99, 100, 91, 95, 99, 96],
             [96, 89, 94, 88, 100, 96, 93, 92, 94, 98, 90, 90, 92, 91, 94],
             [76, 76, 82, 78, 82, 76, 84, 82, 80, 82, 76, 86, 82, 84, 78],
             [93, 92, 89, 84, 91, 86, 84, 90, 95, 86, 88, 95, 88, 84, 89],
             [63, 66, 55, 67, 66, 68, 66, 56, 55, 62, 59, 67, 60, 70, 67],
             [86, 92, 93, 88, 90, 90, 91, 94, 90, 86, 93, 89, 94, 94, 92],
             [89, 80, 81, 89, 86, 86, 85, 80, 79, 90, 83, 85, 90, 79, 80],
             [99, 73, 86, 77, 87, 99, 71, 96, 81, 83, 71, 75, 91, 74, 72]]

# make a variable for assignment averages

# make a variable for student averages
stu_avg = [sum(row) / len(row) for row in gradebook]
print(stu_avg)

# Assignment class
class Assignment:
    def __init__(self, name, average):
        self.average = average
        self.name = name

    def print_grade(self):
        print("Assignment", self.name, ":", self.average)

# Student class
class Student:
    def __init__(self, name, average):
        self.average = average
        self.name = name

    def print_grade(self):
        print("Student", self.name, ":", self.average)

s1 = Student("1", stu_avg[0])
s2 = Student("2", stu_avg[1])
s3 = Student("3", stu_avg[2])
s4 = Student("4", stu_avg[3])
s5 = Student("5", stu_avg[4])
s6 = Student("6", stu_avg[5])
s7 = Student("7", stu_avg[6])
s8 = Student("8", stu_avg[7])
s9 = Student("9", stu_avg[8])
s10 = Student("10", stu_avg[9])

s1.print_grade()
s2.print_grade()
s3.print_grade()
s4.print_grade()
s5.print_grade()
s6.print_grade()
s7.print_grade()
s8.print_grade()
s9.print_grade()
s10.print_grade()
Instead of using loops, let's use matrices. They make the calculation much, much faster, especially when dealing with large datasets. The idea is that multiplying by a column of ones sums along one dimension. As an example:

Per student (row sums):

[1, 2, 3, 4]       [1]
[4, 5, 6, 6]   x   [1]
[1, 1, 3, 1]       [1]
                   [1]

Per assignment (column sums, using the transpose):

[1, 2, 3, 4]^T     [1]
[4, 5, 6, 6]   x   [1]
[1, 1, 3, 1]       [1]

The first operation returns the per-student sums, and the second returns the per-assignment sums. Divide appropriately to get the averages.

Using numpy:

import numpy as np

gradebook = [[61, 74, 69, 62, 72, 66, 73, 65, 60, 63, 69, 63, 62, 61, 64],
             [73, 80, 78, 76, 76, 79, 75, 73, 76, 74, 77, 79, 76, 78, 72],
             [90, 92, 93, 92, 88, 93, 90, 95, 100, 99, 100, 91, 95, 99, 96],
             [96, 89, 94, 88, 100, 96, 93, 92, 94, 98, 90, 90, 92, 91, 94],
             [76, 76, 82, 78, 82, 76, 84, 82, 80, 82, 76, 86, 82, 84, 78],
             [93, 92, 89, 84, 91, 86, 84, 90, 95, 86, 88, 95, 88, 84, 89],
             [63, 66, 55, 67, 66, 68, 66, 56, 55, 62, 59, 67, 60, 70, 67],
             [86, 92, 93, 88, 90, 90, 91, 94, 90, 86, 93, 89, 94, 94, 92],
             [89, 80, 81, 89, 86, 86, 85, 80, 79, 90, 83, 85, 90, 79, 80],
             [99, 73, 86, 77, 87, 99, 71, 96, 81, 83, 71, 75, 91, 74, 72]]

def get_student_average(gradebook):
    number_of_students = len(gradebook[0])
    number_of_assignments = len(gradebook)
    matrix = [1] * number_of_students  # [1, 1, 1, ...]. This is 1 x 15; transpose to make it 15 x 1
    # Convert both to numpy arrays
    matrix = np.array(matrix)
    gradebook = np.array(gradebook)
    # Transpose the ones vector and multiply
    print(gradebook.dot(matrix.T))

def get_assignment_average(gradebook):
    number_of_students = len(gradebook[0])
    number_of_assignments = len(gradebook)
    matrix = [1] * number_of_assignments  # [1, 1, 1, ...]. This is 1 x 10; transpose to make it 10 x 1
    matrix = np.array(matrix)
    gradebook = np.array(gradebook)
    gradebook = gradebook.T
    matrix = matrix.T
    print(gradebook.dot(matrix))

get_student_average(gradebook)
get_assignment_average(gradebook)

Results:

student_avg -> [ 984 1142 1413 1397 1204 1334  947 1362 1262 1235]
test_avg -> [826 814 820 801 838 839 812 823 810 823 806 820 830 814 804]
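For completeness, numpy's mean with an axis argument produces the averages directly, without building a ones vector; a sketch over the same gradebook:

import numpy as np

gb = np.array(gradebook)
student_avg = gb.mean(axis=1)     # one average per row (student)
assignment_avg = gb.mean(axis=0)  # one average per column (assignment)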