pyplot.savefig with empty export - python

I'm working with pyplot.subplot() to make a bar chart and then save as an image. Currently the savefig is saving an empty image. I guess I'm a little stuck since there's not many examples I have found for bar and subplot.
Here's the code:
nums = [57, 83, 66, 90, 84, 83, 83, 64, 58, 94, 94, 79, 95, 61, 67, 79, 70, 86, 84, 57, 80, 98, 54, 82, 74, 87, 69, 55, 59, 87, 76, 53, 80, 89, 66, 68, 95, 89, 65, 69, 81, 55, 53, 77, 54, 88, 90, 73, 77, 63, 75, 98, 78, 62, 61, 91, 90, 69, 93, 80, 69, 63, 71, 57, 77, 76, 77, 67, 91, 66, 51, 98, 66, 96, 62, 98, 78, 86, 60, 93, 60, 62, 72, 71, 55, 55, 82, 99, 83, 61, 69, 58, 68, 71, 67, 87, 64, 50, 60]
gradeA = []
gradeB = []
gradeC = []
gradeD = []
gradeF = []
for a in nums:
if a in range(90,100):
gradeA.append(a)
if a in range(80,89):
gradeB.append(a)
if a in range(70,79):
gradeC.append(a)
if a in range(60,69):
gradeD.append(a)
if a in range(0,59):
gradeF.append(a)
import numpy as np
import matplotlib.pyplot as plt
gradeCounts = (len(gradeF), len(gradeD), len(gradeC), len(gradeB), len(gradeA))
# the x locations for the bars
ind = np.arange(5)
#bar width
width = 0.35
#draw graph
ax = plt.subplot()
#set up colors for different bars
colors = ['r','orange','y','g','b']
ax.bar(ind+.5*width, gradeCounts, width, color=colors)
# add some text for labels, title and axes ticks
ax.set_title('Grade Range')
ax.set_xticks(ind+width)
ax.set_xticklabels( ('F', 'D', 'C', 'B', 'A') )
plt.show()
plt.savefig('test.png')

Get rid of the penultimate line, plt.show()
Here is the result of running your code without that line

Related

how to generate combinations from multiple variables?

I have following variables;
D8 =[22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
D9 = [79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
D10=[61, 64, 66, 70, 72, 76, 81, 86, 87]
By using above variables, I tried to generate all the possible combinations as follows;
import itertools
stuff = [D8, D9, D10]
for L in range(0, len(stuff)+1):
for subset in itertools.combinations(stuff, L):
print(subset)
The result depicts as follows;
Output>>>
()
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45],)
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109],)
([61, 64, 66, 70, 72, 76, 81, 86, 87],)
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [61, 64, 66, 70, 72, 76, 81, 86, 87])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109], [61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45], [79, 80, 90, 92, 93, 97, 98, 104, 105, 109], [61, 64, 66, 70, 72, 76, 81, 86, 87])
I want to know is there any other cleaner method to generate the above result, because in the current output I cannot flat each result into each individual outcome? Expected result should be like this?
Expected Result>>>
([])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 61, 64, 66, 70, 72, 76, 81, 86, 87])
([79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87])
([22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87])
Thank you in advanced!!!
The itertools documentation contains the recipe flatten:
def flatten(list_of_lists):
"Flatten one level of nesting"
return chain.from_iterable(list_of_lists)
which you need to apply twice to get the desired result:
from itertools import combinations, chain
D8 = [22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
D9 = [79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
D10 = [61, 64, 66, 70, 72, 76, 81, 86, 87]
stuff = [D8, D9, D10]
def flatten(list_of_lists):
"""Flatten one level of nesting"""
return chain.from_iterable(list_of_lists)
result = flatten(map(flatten, combinations(stuff, length)) for length in range(len(stuff) + 1))
for xs in result:
print(list(xs))
producing:
[]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45]
[79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
[61, 64, 66, 70, 72, 76, 81, 86, 87]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 61, 64, 66, 70, 72, 76, 81, 86, 87]
[79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87]
[22, 27, 28, 30, 31, 40, 41, 42, 43, 45, 79, 80, 90, 92, 93, 97, 98, 104, 105, 109, 61, 64, 66, 70, 72, 76, 81, 86, 87]

Clark evans aggregation index in Python

The Clark-Evans index is one of the most basic statistics to measure point aggregation in spatial analysis. However, I can't find any implementation in Python. So I adapted the R code from the hyperlink above. I want to ask if the statistic and p-value are correct with such irregular study areas:
Function
import numpy as np
import os, math
from shapely.geometry import Polygon, Point
from sklearn.neighbors import KDTree
from statistics import NormalDist
def clarkEvans (X, Y, roi):
""" Clark evans index takes point x,y coordinates and a polygon for cell shape (roi) and outputs Clark-Evans index:
R ~1 suggests spatial randomness, while R<<1 suggests clustering and R>>1 suggests ordering"""
# Import cell boundries from roi file
pgon = Polygon(roi)
# Calculate intensity from points/area
areaW = pgon.area
npts = len(X)
intensity = npts/areaW
if npts <2:
return(np.nan)
tmp_df = list(zip(X, Y))
# Get nearest neighbours for each observation
kdt = KDTree(tmp_df, leaf_size=30, metric='euclidean') # This is very good for large datasets, but maybe bad for small ones?
dists, ids = kdt.query(tmp_df, k=2)
dists = [x[1] for x in dists]
# Clark-Evans Index (mean NN distances/mean NN distances under poisson)
Dobs = np.mean(dists)
Dpoison = 1/(2 * math.sqrt(intensity))
Rnaive = Dobs/Dpoison
# Calculate p-value under normal distribution
SE = math.sqrt(((4 - math.pi) * areaW)/(4 * math.pi))/npts
Z = (Dobs - Dpoison)/SE
# Diff between observed and expected NN distances should have Normal Distribution according to Central Limit Theorem (CLT)
p_val = NormalDist().pdf(Z) # p_val for clustering
# Return the ClarkEvans Index and p-value
return(round(Rnaive,3), round(p_val, 3))
Output
In the image is my Clark-Evans Index being applied to and plotted with two different datasets. The index is similar for both patterns, one of which seems more obviously clustered. The p-values seem switched, I would think the second plot would have a significant p-val, being the clustered one.
Input data
# The xy coordinates of observations plus the point vertices of the study area (roi)
x1 = [123, 105, 71, 109, 96, 49, 86, 80, 120, 98, 59, 100, 118, 69, 84, 21, 95, 77, 158, 118, 87, 77, 87, 77, 82, 106, 120, 125, 61, 24, 53, 106, 52, 103, 89, 99, 111, 58, 97, 83, 51, 45, 64, 112, 114, 73, 55, 111, 110, 102, 116, 107, 84, 97, 118, 96, 116, 45, 102, 145, 126, 50, 103, 98, 20, 79, 113, 99, 90, 143, 36, 120, 106, 91, 95, 15, 122, 69, 28, 71, 66, 119, 78, 75, 113, 44, 85, 60, 88, 68, 116, 40, 59, 105, 65, 94, 79, 95, 120, 67, 78, 59, 89, 84, 111, 78, 72, 156, 162, 134, 157, 120, 126, 86, 58, 137, 32, 91, 68, 119, 112, 70, 120, 62, 118, 114, 66, 55, 99, 72, 91, 109, 53, 94, 71, 145, 146, 106, 15, 83, 104, 61, 129, 51, 58, 59, 113, 107, 94, 94, 69, 118, 74, 124, 107, 99, 66, 115, 159, 71, 115, 122, 76, 68, 79, 107, 81, 104, 87, 106, 105, 112, 111, 79, 54, 108, 62, 115, 36, 74, 84, 75, 64, 92, 64, 82, 77, 56, 75, 69, 88, 105, 96, 61, 84, 106, 31, 53, 173, 102, 99, 124, 87, 70, 25, 19, 122, 101, 126, 60, 94, 78, 97, 64, 45, 92, 114, 87, 96, 160, 88, 66, 40, 124, 103, 60, 129, 120, 35, 95, 56, 76, 116, 65, 7, 103, 160, 63, 134, 101, 56, 50, 89, 92, 99, 89, 120, 47, 58, 47, 74, 124, 8, 93, 121, 53, 66, 63, 90, 114, 91, 71, 123, 55, 142, 97, 69, 141, 92, 76, 69, 74, 66, 90, 81, 96, 110, 61, 58, 62, 50, 125, 106, 115, 79, 94, 118, 117, 64, 99, 55, 53, 93, 57, 116, 61, 125, 10, 119, 74, 64, 77, 127, 115, 59, 53, 99, 81, 68, 101, 43, 122, 129, 109, 108, 84, 103, 59, 105, 76, 122, 101, 101, 108, 79, 75, 60, 111, 97, 104, 82, 67, 96, 70, 96, 104, 103, 66, 89, 114, 121, 119, 104, 93, 156, 108, 88, 98, 52, 112, 65, 99, 107, 90, 107, 115, 73, 106, 100, 120, 128, 66, 116, 69, 113, 69, 103, 62, 124, 110, 124, 72, 76, 115, 73, 84, 95, 100, 51, 61, 82, 97, 106, 68, 112, 69, 115, 67, 80, 72, 63, 123, 92, 101, 61, 69, 103, 112, 70, 59, 91, 90, 102, 111, 41, 101, 90, 33, 122, 161, 161]
y1 = [37, 51, 35, 67, 94, 114, 62, 24, 64, 92, 55, 11, 74, 38, 79, 77, 90, 77, 70, 70, 41, 46, 81, 83, 81, 65, 63, 43, 56, 95, 26, 8, 68, 82, 44, 78, 77, 72, 45, 68, 83, 99, 100, 58, 91, 89, 115, 34, 46, 68, 79, 71, 41, 43, 48, 83, 67, 69, 42, 55, 63, 69, 47, 67, 102, 72, 33, 77, 67, 1, 123, 59, 69, 47, 73, 79, 89, 48, 55, 97, 56, 92, 121, 70, 48, 47, 114, 62, 84, 78, 54, 55, 79, 76, 62, 63, 83, 71, 74, 83, 50, 67, 84, 81, 75, 59, 12, 77, 97, 6, 26, 55, 10, 74, 58, 59, 77, 76, 77, 68, 60, 50, 53, 89, 76, 87, 67, 86, 86, 73, 79, 74, 62, 54, 67, 58, 23, 76, 95, 63, 38, 76, 117, 18, 52, 46, 98, 62, 44, 36, 86, 52, 74, 51, 85, 100, 75, 73, 63, 38, 64, 91, 47, 70, 77, 88, 70, 88, 88, 39, 52, 45, 79, 56, 74, 60, 59, 69, 116, 44, 55, 48, 70, 83, 66, 87, 78, 73, 58, 76, 46, 50, 43, 81, 102, 45, 115, 88, 80, 34, 55, 55, 97, 103, 112, 122, 111, 97, 90, 81, 22, 36, 87, 86, 48, 39, 42, 83, 57, 16, 100, 89, 115, 75, 69, 86, 69, 69, 74, 39, 52, 23, 63, 49, 92, 96, 71, 105, 10, 75, 84, 80, 30, 30, 59, 52, 32, 119, 107, 74, 79, 101, 106, 99, 77, 66, 89, 83, 102, 94, 97, 78, 91, 93, 16, 11, 33, 16, 78, 50, 30, 26, 79, 34, 32, 86, 64, 40, 63, 51, 58, 52, 92, 98, 35, 36, 34, 47, 86, 88, 60, 80, 92, 96, 94, 94, 98, 111, 49, 54, 56, 36, 72, 94, 92, 102, 105, 32, 40, 30, 73, 59, 107, 39, 46, 40, 53, 57, 93, 92, 63, 59, 65, 68, 81, 69, 56, 53, 53, 85, 56, 55, 93, 45, 40, 68, 101, 93, 29, 44, 93, 93, 46, 67, 38, 34, 97, 93, 72, 90, 62, 68, 32, 31, 74, 71, 59, 38, 51, 95, 73, 82, 5, 53, 50, 34, 49, 43, 82, 77, 65, 88, 87, 89, 30, 38, 45, 36, 79, 89, 88, 100, 98, 45, 41, 20, 35, 51, 77, 64, 60, 63, 33, 44, 78, 82, 83, 70, 74, 78, 41, 61, 71, 40, 124, 82, 67, 121, 5, 65, 66]
roi1 = [[152.5078125, 3.7060546875], [158.8408203125, 12.455078125], [165.5126953125, 25.3154296875], [170.796875, 38.787109375], [171.013671875, 46.02734375], [172.6083984375, 53.0615234375], [172.6083984375, 63.9306640625], [174.419921875, 70.9169921875], [174.419921875, 85.41015625], [175.947265625, 92.4296875], [175.7998046875, 103.2626953125], [169.52734375, 116.3212890625], [166.9765625, 118.89453125], [159.7451171875, 119.2177734375], [152.7265625, 121.01953125], [138.2333984375, 121.029296875], [131.21875, 122.8408203125], [73.248046875, 122.8408203125], [66.2119140625, 124.5546875], [58.966796875, 124.65234375], [51.9990234375, 126.4638671875], [23.013671875, 126.4638671875], [19.42578125, 125.958984375], [16.5361328125, 123.7734375], [10.20703125, 115.0283203125], [0.5068359375, 95.57421875], [0.5537109375, 91.951171875], [9.0869140625, 80.318359375], [12.552734375, 73.9599609375], [18.884765625, 65.2119140625], [25.89453125, 56.994140625], [35.611328125, 41.7626953125], [42.345703125, 33.296875], [45.7568359375, 26.90625], [53.634765625, 14.7744140625], [58.1103515625, 9.078125], [64.916015625, 6.8984375], [86.654296875, 6.8984375], [93.6904296875, 5.291015625], [100.89453125, 4.57421875], [104.2763671875, 3.275390625], [122.3837890625, 3.025390625], [129.3935546875, 1.4638671875], [143.8857421875, 1.4638671875], [147.376953125, 0.4931640625]]
clarkEvans (x1, y1, roi1)
x2 = [94, 111, 79, 95, 86, 46, 30, 34, 53, 17, 44, 20, 42, 56, 23, 21, 50, 16, 50, 52, 47, 132, 44, 40, 43, 33, 29, 52, 24, 125, 86, 84]
y2 = [17, 71, 94, 88, 108, 132, 116, 115, 121, 132, 120, 121, 123, 116, 116, 139, 121, 124, 116, 140, 141, 33, 119, 118, 125, 130, 123, 122, 40, 23, 80, 107]
roi2 = [[129.4560546875, 3.6552734375], [132.3408203125, 5.84765625], [134.4638671875, 12.7744140625], [134.4638671875, 45.3828125], [132.65234375, 56.0302734375], [132.65234375, 66.8994140625], [131.4169921875, 70.3056640625], [130.7021484375, 77.5029296875], [129.029296875, 84.5419921875], [129.029296875, 88.1650390625], [127.2177734375, 95.1728515625], [127.16796875, 106.04296875], [125.40625, 113.0712890625], [125.40625, 116.6943359375], [123.896484375, 119.9873046875], [120.6533203125, 121.6025390625], [110.0654296875, 123.9130859375], [99.6181640625, 126.8427734375], [89.896484375, 131.7041015625], [83.8388671875, 135.638671875], [77.03515625, 138.134765625], [73.4228515625, 138.40625], [56.181640625, 143.8408203125], [45.568359375, 142.029296875], [31.076171875, 142.029296875], [27.4736328125, 141.6455078125], [20.626953125, 139.302734375], [15.5029296875, 134.1787109375], [11.0546875, 128.5234375], [4.537109375, 115.5849609375], [0.40625, 98.0078125], [0.513671875, 72.646484375], [4.927734375, 55.0859375], [11.4091796875, 42.123046875], [15.333984375, 36.0859375], [21.94921875, 27.4912109375], [34.7587890625, 14.6806640625], [43.416015625, 8.205078125], [57.9013671875, 7.970703125], [61.28125, 6.666015625], [71.9052734375, 4.541015625], [82.7705078125, 4.34765625], [89.7578125, 2.5361328125], [107.873046875, 2.5361328125], [114.8916015625, 1.015625], [122.126953125, 0.724609375]]
clarkEvans (x2, y2, roi2)
Using the original R function yields similar but not equal results:
clarkevans.test(X, alternative = "clustered")
>R= 0.87719, p-value = 9.542e-07 # First dataset
>R= 0.83365, p-value = 0.03591 # Second dataset
I'm not sure if the statistic and p-value calculation are valid since my study areas are irregularly shaped. The variable SE is calculated with pi, which seems like it is estimating a random distribution in a circular study area. Should I do Monte Carlo simulations instead? Is there a way of avoiding that?
Cheers!
I have not worked with the Clark-Evans (CE) index before, but having read the information you linked to and studied your code, my interpretation is this:
The index value for Dataset2 is less than the index value for Dataset1. This correctly reflects the visual difference in clusteredness, that is, the smaller index value is associated with data that is more clustered.
It is probably not meaningful to say that two CE index values are similar, other than special cases like observing that two CE index values are both smaller than 1 or both greater than 1, or if A < B < C then AB are more similar than AC.
The p-value and the index value measure different things. The index value measures degree of clusteredness (if less than 1) or regularity (if greater than 1). The p-value (inversely) measures how certain it is that the data are more clustered than would be expected by chance, or more regular than would be expected by chance. The p-value in particular is sensitive to the sample size as well as the distribution of points.
The use of pi in calculating SE reflects the assumption of Euclidean distances between points (rather than, say, city block distances). That is, the nearest neighbour of a point is the one at the smallest radial distance. The use of pi in calculating SE does not make any assumptions about the shape of the region of interest.
Particularly for small datasets (like Dataset2) you will want to track down information about the potential impact of boundary effects on the index value or the p-value.
More speculatively, I wonder if it would be useful to use a convex hull to help determine the region of interest rather than do this subjectively.

How do you use for loop to find the minimum without using something similar to "min" on Python?

This is what I have so far, but I would like to use it without min and with for loop:
Numbers = [100, 97, 72, 83, 84, 78, 89, 84, 83, 75, 54, 98, 70, 88, 99, 69, 70, 79, 55, 82, 81, 75, 54, 82, 56, 73, 90, 100, 94, 89, 56, 64, 51, 72, 64, 94, 63, 82, 77, 68, 60, 93, 95, 60, 77, 78, 74, 67, 72, 99, 93, 79, 76, 86, 87, 74, 82]
for i in range(len(Numbers)):
print(min(Numbers))
I do not know why you do not want to use min (you should) - but if you do not want to you can loop over the numbers and keep track of the smallest.
min_ = None
for n in Numbers:
if min_ is None or n < min_:
min_ = n
min_ is now the minimum in the list Numbers.
Another identical method is:
numbers = [100, 97, 72, 83, 84, 78, 89, 84, 83, 75, 54, 98, 70, 88, 99, 69, 70, 79, 55, 82, 81, 75, 54, 82, 56, 73, 90, 100, 94, 89, 56, 64, 51, 72, 64, 94, 63, 82, 77, 68, 60, 93, 95, 60, 77, 78, 74, 67, 72, 99, 93, 79, 76, 86, 87, 74, 82]
smallest = None
for x in range(len(numbers)):
if (smallest == None or numbers[x] < smallest):
smallest = numbers[x]
print(smallest)
Output:
51

Averaging Vertically in Nested Lists

I am writing a grade book program that sets a nested list with assignments as columns and individual students along the rows. The program must calculate the average for each assignment and the average for each student. I've got the average by student, but now I can't figure out how to calculate the average by assignment. Any help would be appreciated!
# gradebook.py
# Display the average of each student's grade.
# Display tthe average for each assignment.
gradebook = [61, 74, 69, 62, 72, 66, 73, 65, 60, 63, 69, 63,
62, 61, 64],
[73, 80, 78, 76, 76, 79, 75, 73, 76, 74, 77, 79, 76,
78, 72],
[90, 92, 93, 92, 88, 93, 90, 95, 100, 99, 100, 91, 95, 99, 96],
[96, 89, 94, 88, 100, 96, 93, 92, 94, 98, 90, 90, 92, 91, 94],
[76, 76, 82, 78, 82, 76, 84, 82, 80, 82, 76, 86, 82, 84, 78],
[93, 92, 89, 84, 91, 86, 84, 90, 95, 86, 88, 95, 88, 84, 89],
[63, 66, 55, 67, 66, 68, 66, 56, 55, 62, 59, 67, 60, 70, 67],
[86, 92, 93, 88, 90, 90, 91, 94, 90, 86, 93, 89, 94, 94, 92],
[89, 80, 81, 89, 86, 86, 85, 80, 79, 90, 83, 85, 90, 79, 80],
[99, 73, 86, 77, 87, 99, 71, 96, 81, 83, 71, 75, 91, 74, 72]]
#make variable for assingment averages
#make a variable for student averages
stu_avg = [sum(row)/len(row) for row in gradebook]
print(stu_avg)
#Assignment Class
class Assignment:
def __init__(self, name, average):
self.average = average
self.name = name
def print_grade(self):
print("Assignment", self.name, ":", self.average)
#Student Class
class Student:
def __init__(self, name, average):
self.average = average
self.name = name
def print_grade(self):
print("Student", self.name, ":", self.average)
s1 = Student("1", stu_avg[0])
s2 = Student("2", stu_avg[1])
s3 = Student("3", stu_avg[2])
s4 = Student("4", stu_avg[3])
s5 = Student("5", stu_avg[4])
s6 = Student("6", stu_avg[5])
s7 = Student("7", stu_avg[6])
s8 = Student("8", stu_avg[7])
s9 = Student("9", stu_avg[8])
s10 = Student("10", stu_avg[9])
s1.print_grade()
s2.print_grade()
s3.print_grade()
s4.print_grade()
s5.print_grade()
s6.print_grade()
s7.print_grade()
s8.print_grade()
s9.print_grade()
s10.print_grade()
Instead of using loops, let's use matrices. They make calculation much much faster, especially when dealing with large datasets.
As an example,
Per student:
[1, 2, 3, 4] [1]
[4, 5, 6, 6] x [1]
[1, 1, 3, 1] [1]
Per assignment
[1, 2, 3, 4]T [1]
[4, 5, 6, 6] x [1]
[1, 1, 3, 1] [1]
The first operation returns per student sum, and the second returns per test sum. Divide appropriately to get the average.
Using numpy
import numpy as np
gradebook = [[61, 74, 69, 62, 72, 66, 73, 65, 60, 63, 69, 63, 62, 61, 64],
[73, 80, 78, 76, 76, 79, 75, 73, 76, 74, 77, 79, 76, 78, 72],
[90, 92, 93, 92, 88, 93, 90, 95, 100, 99, 100, 91, 95, 99, 96],
[96, 89, 94, 88, 100, 96, 93, 92, 94, 98, 90, 90, 92, 91, 94],
[76, 76, 82, 78, 82, 76, 84, 82, 80, 82, 76, 86, 82, 84, 78],
[93, 92, 89, 84, 91, 86, 84, 90, 95, 86, 88, 95, 88, 84, 89],
[63, 66, 55, 67, 66, 68, 66, 56, 55, 62, 59, 67, 60, 70, 67],
[86, 92, 93, 88, 90, 90, 91, 94, 90, 86, 93, 89, 94, 94, 92],
[89, 80, 81, 89, 86, 86, 85, 80, 79, 90, 83, 85, 90, 79, 80],
[99, 73, 86, 77, 87, 99, 71, 96, 81, 83, 71, 75, 91, 74, 72]]
def get_student_average(gradebook):
number_of_students = len(gradebook[0])
number_of_assignments = len(gradebook)
matrix = [1] * number_of_students
# [1, 1, 1, 1, ...]. This is 1 * 15. Need to transpose to make it 15*1
# Converting both to numpy matrices
matrix = np.array(matrix)
gradebook = np.array(gradebook)
# Transposing matrix and multiplying them
print(gradebook.dot(matrix.T))
def get_assignment_average(gradebook):
number_of_students = len(gradebook[0])
number_of_assignments = len(gradebook)
matrix = [1] * number_of_assignments
# [1, 1, 1, ...] . This is 1 * 10. Need to transpose to make it 10*1
matrix = np.array(matrix)
gradebook = np.array(gradebook)
gradebook = gradebook.T
matrix = matrix.T
print(gradebook.dot(matrix))
get_student_average(gradebook)
get_assignment_average(gradebook)
Results
student_avg -> [ 984 1142 1413 1397 1204 1334 947 1362 1262 1235]
test_avg -> [826 814 820 801 838 839 812 823 810 823 806 820 830 814 804]

How to select specific column ranges in pandas? [duplicate]

I have to read several files some in Excel format and some in CSV format. Some of the files have hundreds of columns.
Is there a way to select several ranges of columns without specifying all the column names or positions? For example something like selecting columns 1 -10, 15, 17 and 50-100:
df = df.ix[1:10, 15, 17, 50:100]
I need to know how to do this both when creating dataframe from Excel files and CSV files and after the data framers created.
use np.r_
np.r_[1:10, 15, 17, 50:100]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 17, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
so you can do
df.iloc[:, np.r_[1:10, 15, 17, 50:100]]
I find #piRSquared 's answer straightforward.
You may also use:
Locs = list(range(0,10)) + [14, 16] + list(range(49, 100))
# columns 1 -10, 15, 17 and 50-100
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 16, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
df = df.iloc[:Locs]
use inner join
like
result = pd.concat([df1, df4], axis=1, join="inner")

Categories

Resources