Is there a simpler way for finding a number - python

I'm writing a python script.
I have a list of numbers:
b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0,
45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
I need to check whether the list contains 50. If it does not, I have to look for the next lower number, 49; if that is not there, I look for 48, and so on down to 47.
In Python, is there a one-liner for this, or can I use a lambda?

You could use min() and abs():
>>> b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0, 45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
>>> min(b, key=lambda x:abs(x-50))
50.0
>>> min(b, key=lambda x:abs(x-20.1))
20.0

max(i for i in b if i <= 50)
It will raise a ValueError if there are no elements that match the condition.

max(filter(lambda i: i<=50, b))
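or, to handle a list with all elements above 50 (in Python 3, filter() returns an iterator that is always truthy, so an 'or [None]' fallback never takes effect; pass default instead):
max(filter(lambda i: i <= 50, b), default=None)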

You can do this with a generator expression and max.
max(n for n in b if n >= 47 and n <= 50)

highestValue = max(b)
lowestValue = min(b)
if 50 in b:
    pass
Three different ways of examining the numbers: the highest, the lowest, and whether 50 is in the mix.
And if you need to check whether multiple numbers are in your huge list, say 50, 30 and 40:
set(b).issuperset(set([50, 40, 30]))

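One-liner without any lambda (raises ValueError if no value is found):
max(x for x in b if 46 < x <= 50)
or a version that returns None in that case (the default argument needs Python 3.4+; chaining a trailing None onto the values instead fails in Python 3, where None cannot be compared with numbers):
max((x for x in b if 46 < x <= 50), default=None)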
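The fallback search described in the question (try 50, then 49, 48 and 47) can also be written directly with next() over the candidate values. This is a minimal sketch rather than one of the posted answers; the function name first_present and the shortened sample lists are hypothetical, and it assumes the candidates are whole numbers.

```python
def first_present(values, start, stop):
    """Return the first of start, start-1, ..., stop found in values, else None."""
    present = set(values)  # set membership avoids rescanning the list per candidate
    return next((n for n in range(start, stop - 1, -1) if n in present), None)


b = [55.0, 54.0, 51.0, 49.0, 48.0, 45.0]    # hypothetical sample without 50
print(first_present(b, 50, 47))             # 49 is the first candidate present
print(first_present([55.0, 45.0], 50, 47))  # no candidate present, so None
```

Note that the match is returned as the integer candidate (49), not the float stored in the list (49.0); the two compare equal.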

Related

How can I put the result of a calculation in a list?

while seed != 1.0:
    if seed % 2 == 0:
        seed = seed / 2
    else:
        seed = seed * 3 + 1
I want to put each result of the calculation in a list.
Could I use return?
If yes, how?
You can use the list method append:
results = []
while seed != 1.0:
    if seed % 2 == 0:
        seed = seed / 2
    else:
        seed = seed * 3 + 1
    results.append(seed)
print(results)
Output for seed = 50: [25.0, 76.0, 38.0, 19.0, 58.0, 29.0, 88.0, 44.0, 22.0, 11.0, 34.0, 17.0, 52.0, 26.0, 13.0, 40.0, 20.0, 10.0, 5.0, 16.0, 8.0, 4.0, 2.0, 1.0]
Note: the return keyword can only be used inside a function. It is only useful in this example if the loop is wrapped in a function, with return results at the end in place of print(results):
def three_n_plus_one(seed):
    results = []
    while seed != 1.0:
        if seed % 2 == 0:
            seed = seed / 2
        else:
            seed = seed * 3 + 1
        results.append(seed)
    return results
You can call the function like this:
print(three_n_plus_one(50))
It gives the same output - [25.0, 76.0, 38.0, 19.0, 58.0, 29.0, 88.0, 44.0, 22.0, 11.0, 34.0, 17.0, 52.0, 26.0, 13.0, 40.0, 20.0, 10.0, 5.0, 16.0, 8.0, 4.0, 2.0, 1.0]
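The values in the output are floats because / produces a float in Python 3. If whole numbers are preferred, floor division keeps every value an integer. This is only a sketch of that variant; the function name collatz_steps is hypothetical, and it assumes seed is a positive integer.

```python
def collatz_steps(seed):
    """Return the values visited after seed until the sequence reaches 1."""
    results = []
    while seed != 1:
        if seed % 2 == 0:
            seed = seed // 2   # floor division keeps the value an int
        else:
            seed = seed * 3 + 1
        results.append(seed)
    return results


print(collatz_steps(50)[:5])  # [25, 76, 38, 19, 58]
```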

How to plot grouped bars in the correct order

I am making a grouped bar chart of proficiency levels on a standardized test. Here is my code:
bush_prof_boy = bush.groupby(['BOY Prof'])['BOY Prof'].count()
bush_prof_pct_boy = bush_prof_boy/bush['BOY Prof'].count() * 100
bush_prof_eoy = bush.groupby(['EOY Prof'])['EOY Prof'].count()
bush_prof_pct_eoy = bush_prof_eoy/bush['EOY Prof'].count() * 100
labels = ['Remedial', 'Below Proficient', 'Proficient', 'Advanced']
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, bush_prof_pct_boy, width, label='BOY',
color='mediumorchid')
rects2 = ax.bar(x + width/2, bush_prof_pct_eoy, width, label='EOY', color='teal')
ax.set_ylabel('% of Students at Proficiency Level', fontsize=18)
ax.set_title('Bushwick Middle Change in Proficiency Levels', fontsize=25)
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=25)
ax.legend(fontsize=25)
plt.yticks(fontsize=15)
plt.figure(figsize=(5,15))
plt.show()
"BOY" stands for "Beginning of Year" and "EOY" "End of Year" so the bar graph is intended to show percent of students who fell into each proficiency level at the beginning and end of the year. The graph looks alright but when I drill into the numbers, I can see that the labels for EOY are incorrect. Here is my graph:
The percentages for BOY are graphed correctly, but the EOY ones are with the wrong labels. Here are the actual percentages, which I am certain are correct:
BOY %
Advanced            14.0
Below Proficient    38.0
Proficient          34.0
Remedial            14.0
EOY %
Advanced            39.0
Below Proficient    18.0
Proficient          32.0
Remedial            11.0
Using data from Kaggle: Brooklyn NY Schools
Calculating the bar groups separately can be problematic. It is better to make the calculations within one dataframe, shape the dataframe, and then plot, because this will ensure the bars are plotted in the correct groups.
Since no data is provided, this begins with wide form numeric data and then cleans and shapes the dataframe.
Numeric values are converted to categorical with pd.cut.
The dataframe is converted to long form with .melt, and .groupby is then used to calculate the percentage within each test period ('BOY' or 'EOY').
The result is reshaped with .pivot and plotted with pandas.DataFrame.plot.
Tested in python 3.8, pandas 1.3.1, and matplotlib 3.4.2
Imports, Load and Clean the DataFrame
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
# data
data = {'BOY': [11.0, 11.0, 11.0, 11.0, 11.0, 8.0, 11.0, 14.0, 12.0, 13.0, 11.0, 14.0, 10.0, 9.0, 10.0, 10.0, 10.0, 12.0, 12.0, 13.0, 12.0, 11.0, 9.0, 12.0, 16.0, 12.0, 12.0, 12.0, 15.0, 10.0, 10.0, 10.0, 8.0, 11.0, 12.0, 14.0, 10.0, 8.0, 11.0, 12.0, 14.0, 12.0, 13.0, 15.0, 13.0, 8.0, 8.0, 11.0, 10.0, 11.0, 13.0, 11.0, 13.0, 15.0, 10.0, 8.0, 10.0, 9.0, 8.0, 11.0, 13.0, 11.0, 8.0, 11.0, 15.0, 11.0, 12.0, 17.0, 12.0, 11.0, 18.0, 14.0, 15.0, 16.0, 7.0, 11.0, 15.0, 16.0, 13.0, 13.0, 13.0, 0.0, 11.0, 15.0, 14.0, 11.0, 13.0, 16.0, 14.0, 12.0, 8.0, 13.0, 13.0, 14.0, 7.0, 10.0, 16.0, 10.0, 13.0, 10.0, 14.0, 8.0, 16.0, 13.0, 12.0, 14.0, 12.0, 14.0, 16.0, 15.0, 13.0, 13.0, 10.0, 14.0, 8.0, 10.0, 10.0, 11.0, 12.0, 10.0, 12.0, 14.0, 17.0, 13.0, 14.0, 16.0, 15.0, 13.0, 16.0, 9.0, 16.0, 15.0, 11.0, 11.0, 15.0, 14.0, 12.0, 15.0, 11.0, 16.0, 14.0, 14.0, 15.0, 14.0, 14.0, 14.0, 16.0, 15.0, 12.0, 12.0, 14.0, 15.0, 13.0, 14.0, 13.0, 17.0, 14.0, 13.0, 14.0, 13.0, 13.0, 12.0, 10.0, 15.0, 14.0, 12.0, 12.0, 14.0, 12.0, 14.0, 13.0, 15.0, 13.0, 14.0, 14.0, 12.0, 11.0, 15.0, 14.0, 14.0, 10.0], 'EOY': [16.0, 16.0, 16.0, 14.0, 10.0, 14.0, 16.0, 14.0, 15.0, 15.0, 15.0, 11.0, 11.0, 15.0, 10.0, 14.0, 17.0, 14.0, 9.0, 15.0, 14.0, 16.0, 14.0, 13.0, 11.0, 13.0, 12.0, 14.0, 15.0, 13.0, 14.0, 15.0, 12.0, 19.0, 9.0, 13.0, 11.0, 14.0, 17.0, 17.0, 14.0, 13.0, 14.0, 10.0, 16.0, 15.0, 12.0, 11.0, 12.0, 14.0, 15.0, 10.0, 15.0, 14.0, 14.0, 15.0, 18.0, 15.0, 10.0, 10.0, 15.0, 15.0, 13.0, 15.0, 19.0, 13.0, 18.0, 20.0, 21.0, 17.0, 18.0, 17.0, 18.0, 17.0, 12.0, 16.0, 15.0, 18.0, 19.0, 17.0, 20.0, 11.0, 18.0, 19.0, 11.0, 12.0, 17.0, 20.0, 17.0, 15.0, 13.0, 18.0, 14.0, 17.0, 12.0, 12.0, 16.0, 12.0, 14.0, 15.0, 14.0, 10.0, 20.0, 13.0, 18.0, 20.0, 11.0, 20.0, 17.0, 20.0, 13.0, 17.0, 15.0, 18.0, 14.0, 13.0, 13.0, 18.0, 10.0, 13.0, 12.0, 18.0, 20.0, 20.0, 16.0, 18.0, 15.0, 20.0, 22.0, 18.0, 21.0, 18.0, 18.0, 18.0, 17.0, 16.0, 19.0, 16.0, 20.0, 19.0, 19.0, 20.0, 20.0, 14.0, 18.0, 20.0, 20.0, 18.0, 16.0, 21.0, 20.0, 
18.0, 15.0, 14.0, 17.0, 19.0, 21.0, 14.0, 18.0, 15.0, 18.0, 21.0, 19.0, 17.0, 16.0, 16.0, 15.0, 20.0, 19.0, 16.0, 21.0, 17.0, 19.0, 15.0, 18.0, 20.0, 18.0, 20.0, 18.0, 16.0, 16.0]}
df = pd.DataFrame(data)
# replace numbers with categorical labels; could also create new columns
labels = ['Remedial', 'Below Proficient', 'Proficient', 'Advanced']
bins = [1, 11, 13, 15, np.inf]
df['BOY'] = pd.cut(x=df.BOY, labels=labels, bins=bins, right=True)
df['EOY'] = pd.cut(x=df.EOY, labels=labels, bins=bins, right=True)
# melt the relevant columns into a long form
dfm = df.melt(var_name='Tested', value_name='Proficiency')
# set the categorical label order, which makes the xaxis labels print in the specific order
dfm['Proficiency'] = pd.Categorical(dfm['Proficiency'], labels, ordered=True)
Groupby, Percent Calculation, and Shape for Plotting
# groupby and get the value counts
dfg = dfm.groupby('Tested')['Proficiency'].value_counts().reset_index(level=1, name='Size').rename({'level_1': 'Proficiency'}, axis=1)
# divide by the Tested value counts to get the percent
dfg['percent'] = dfg['Size'].div(dfm.Tested.value_counts()).mul(100).round(1)
# reshape to plot
dfp = dfg.reset_index().pivot(index='Proficiency', columns='Tested', values='percent')
# display(dfp)
Tested              BOY   EOY
Proficiency
Remedial           34.8   9.9
Below Proficient   28.7  12.7
Proficient         27.1  25.4
Advanced            8.8  51.9
Plot
ax = dfp.plot(kind='bar', figsize=(15, 5), rot=0, color=['orchid', 'teal'])
# formatting
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_ylabel('Students at Proficiency Level', fontsize=18)
ax.set_xlabel('')
ax.set_title('Bushwick Middle Change in Proficiency Levels', fontsize=25)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=25)
ax.legend(fontsize=25)
_ = plt.yticks(fontsize=15)
# add bar labels
for p in ax.containers:
    ax.bar_label(p, fmt='%.1f%%', label_type='edge', fontsize=12)
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.2)
Note that the bar labels match dfp.
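The mismatch in the question follows from ordering: groupby returns its groups sorted by key (alphabetically here), while the tick labels are listed from 'Remedial' to 'Advanced'. A small pandas-free sketch of computing the percentages in an explicit label order is given here; the function name percent_by_label and the sample list are hypothetical, built only to reproduce the EOY percentages quoted in the question.

```python
from collections import Counter

# Display order used for the x-axis tick labels in the question
labels = ['Remedial', 'Below Proficient', 'Proficient', 'Advanced']


def percent_by_label(levels, labels):
    """Return the percentage of each label, in the order given by labels."""
    counts = Counter(levels)
    total = len(levels)
    return [round(100 * counts[label] / total, 1) for label in labels]


# Hypothetical sample matching the EOY percentages from the question
eoy_levels = (['Advanced'] * 39 + ['Below Proficient'] * 18
              + ['Proficient'] * 32 + ['Remedial'] * 11)

print(percent_by_label(eoy_levels, labels))  # [11.0, 18.0, 32.0, 39.0]
```

Looking each label up by name, rather than relying on the position of the grouped result, keeps the bar heights and the tick labels aligned.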

Sorting python dictionary with tuples of keys and using lambda

Here is what my dictionary looks like:
{((0,), (16,)): 12.0, ((0,), (17,)): 14.0, ((1,), (16,)): 10.0, ((1,), (17,)): 13.0, ((2,), (15,)): 9.0, ((2,), (16,)): 21.0, ((2,), (17,)): 42.0, ((3,), (16,)): 13.0, ((3,), (17,)): 21.0, ((4,), (16,)): 9.0, ((4,), (17,)): 12.0, ((8,), (15,)): 7.0, ((8,), (16,)): 16.0, ((8,), (17,)): 20.0, ((9,), (16,)): 9.0, ((9,), (17,)): 13.0, ((10,), (15,)): 10.0, ((10,), (16,)): 18.0, ((10,), (17,)): 35.0, ((11,), (15,)): 10.0, ((11,), (16,)): 20.0, ((11,), (17,)): 37.0, ((12,), (15,)): 8.0, ((12,), (16,)): 16.0, ((12,), (17,)): 21.0, ((13,), (15,)): 7.0, ((13,), (16,)): 20.0, ((13,), (17,)): 25.0, ((14,), (16,)): 8.0, ((14,), (17,)): 8.0}
I'm trying to sort the dictionary based on the recommendation here:
sorted_items = [i[0] for i in sorted(sorted_items.items(),
                                     key=lambda kv: (kv, -kv[0][0], -kv[1][0]),
                                     reverse=True)]
This is the error I get:
TypeError: bad operand type for unary -: 'tuple'
How do I address this?
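The error comes from the shape of the items: each one is (key, value), and the key is a pair of one-element tuples, so kv[0][0] is a tuple such as (0,) and cannot be negated, while kv[1] is a float and cannot be indexed. One possible sketch indexes a level deeper to reach the integers. The intended order is not stated in the question, so sorting by value descending and then by the two key integers ascending is purely an assumption, and the three-entry dictionary is a shortened sample.

```python
# Shortened sample of the dictionary from the question
d = {((0,), (16,)): 12.0, ((0,), (17,)): 14.0, ((2,), (17,)): 42.0}

# kv[0] is the key, e.g. ((0,), (16,)); kv[0][0][0] and kv[0][1][0] are ints.
# Assumed order: largest value first, ties broken by the key integers.
sorted_keys = [
    k for k, v in sorted(d.items(),
                         key=lambda kv: (-kv[1], kv[0][0][0], kv[0][1][0]))
]
print(sorted_keys)
```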

Python: Linregress slope and y-intercept

I'm working on a program that calculates the slope using SciPy's linregress function, but I'm getting two errors (depending on how I try to fix it). Both lists are two-dimensional: xs holds the x values and ys the y values.
from __future__ import division
from scipy.stats import linregress
import matplotlib.pyplot as mplot
import numpy as np
xs=[[20.0, 80.0, 45.0, 42.0, 93.0, 98.0, 65.0, 43.0, 72.0, 36.0, 9.0, 60.0, 47.0, 84.0, 31.0, 46.0, 57.0, 76.0, 27.0, 85.0, 0.0, 39.0, 2.0, 56.0, 68.0, 6.0, 41.0, 28.0, 61.0, 12.0, 32.0, 1.0, 54.0, 77.0, 18.0, 86.0, 62.0, 23.0, 30.0, 69.0, 4.0, 71.0, 64.0, 92.0, 24.0, 79.0, 8.0, 35.0, 49.0, 53.0, 7.0, 59.0, 70.0, 37.0, 13.0, 15.0, 73.0, 89.0, 96.0, 83.0, 22.0, 95.0, 19.0, 67.0, 5.0, 88.0, 38.0, 50.0, 55.0, 52.0, 81.0, 58.0, 11.0, 51.0, 99.0, 78.0, 25.0, 33.0, 40.0, 75.0, 3.0, 91.0, 48.0, 90.0, 82.0, 26.0, 10.0, 16.0, 21.0, 66.0, 14.0, 87.0, 74.0, 97.0, 94.0, 44.0, 29.0, 17.0, 63.0, 34.0], [87.0, 17.0, 69.0, 72.0, 76.0, 62.0, 20.0, 77.0, 5.0, 49.0, 81.0, 3.0, 24.0, 36.0, 44.0, 91.0, 99.0, 35.0, 43.0, 50.0, 12.0, 54.0, 46.0, 30.0, 37.0, 45.0, 90.0, 85.0, 70.0, 83.0, 38.0, 22.0, 23.0, 0.0, 60.0, 47.0, 26.0, 1.0, 95.0, 73.0, 65.0, 94.0, 84.0, 8.0, 34.0, 56.0, 66.0, 13.0, 75.0, 52.0, 19.0, 55.0, 67.0, 39.0, 21.0, 80.0, 98.0, 33.0, 11.0, 68.0, 40.0, 32.0, 2.0, 79.0, 82.0, 93.0, 96.0, 88.0, 14.0, 92.0, 41.0, 89.0, 28.0, 29.0, 42.0, 6.0, 86.0, 74.0, 58.0, 16.0, 31.0, 64.0, 15.0, 53.0, 25.0, 59.0, 61.0, 78.0, 51.0, 7.0, 57.0, 9.0, 97.0, 63.0, 48.0, 71.0, 18.0, 10.0, 4.0, 27.0]]
ys=[[155.506, 50.592, 104.447, 111.318, 36.148, 36.87, 74.266, 106.413, 58.341, 122.563, 180.555, 85.202, 96.84, 50.726, 126.56, 100.686, 88.303, 54.797, 138.487, 44.946, 200.9, 116.524, 193.652, 82.8, 65.823, 184.436, 113.738, 133.458, 83.765, 167.408, 129.491, 200.469, 89.238, 51.799, 159.217, 49.382, 78.443, 146.051, 129.045, 63.805, 185.564, 65.614, 74.243, 43.408, 140.863, 53.446, 182.767, 127.373, 94.494, 91.079, 187.194, 81.254, 68.702, 121.368, 164.756, 169.696, 59.483, 45.978, 33.057, 47.12, 154.755, 33.872, 160.754, 70.256, 190.393, 38.398, 113.188, 100.493, 84.511, 88.635, 49.353, 81.821, 178.876, 95.307, 32.2, 54.715, 141.389, 132.337, 109.673, 57.611, 189.251, 39.283, 97.31, 41.173, 47.529, 140.03, 173.058, 160.288, 154.773, 67.903, 164.718, 42.032, 60.739, 28.656, 34.302, 107.022, 137.344, 160.195, 73.636, 123.797], [14.138, 100.87, 30.287, 28.675, 21.826, 42.445, 97.938, 29.574, 125.976, 59.404, 26.609, 125.743, 95.329, 75.467, 59.497, 15.342, 9.834, 77.402, 65.019, 54.468, 112.64, 45.466, 55.197, 79.992, 71.146, 55.39, 14.795, 15.971, 28.535, 25.862, 73.239, 92.455, 87.635, 137.6, 38.59, 53.718, 86.26, 130.567, 11.274, 33.867, 40.035, 11.07, 16.109, 114.732, 76.552, 45.85, 31.827, 110.877, 26.292, 55.738, 101.801, 48.601, 33.632, 66.647, 98.39, 23.904, 11.172, 78.215, 109.417, 31.653, 68.368, 79.593, 124.548, 21.513, 19.828, 13.48, 9.993, 22.043, 108.229, 16.904, 66.704, 12.262, 79.947, 85.012, 66.754, 124.114, 17.548, 25.872, 45.392, 101.775, 78.085, 36.358, 101.795, 52.045, 87.637, 42.784, 37.011, 26.036, 50.146, 119.666, 42.514, 113.313, 9.125, 42.394, 51.954, 26.898, 96.678, 112.108, 125.252, 86.296]]
slope, intercept, r_value, std_err = linregress(xs,ys)
print(slope)
My error is:
in linregress
ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
ValueError: too many values to unpack (expected 4)
I've tried changing my code to something like this:
slope, intercept, r_value, std_err = linregress(xs[:,0], ys[:,0])
But then my error becomes a TypeError:
TypeError: list indices must be integers or slices, not tuple
Does anyone have any suggestions? Perhaps there's something I don't understand about the use of the linregress function. I'm sure my first error has to do with my lists being 2D. For the second error, I'm lost.
You have two problems:
When interpreted as arrays, your variables xs and ys are two-dimensional with shape (2, 100). When linregress is given both arguments x and y, it expects them to be one-dimensional arrays.
As you can see in the "Returns" section of the docstring, linregress returns five values, not four.
You'll have to call linregress twice, and handle the five return values. For example,
In [144]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[0], ys[0])
In [145]: slope, intercept, rvalue
Out[145]: (-1.7059670627062702, 187.5658196039604, -0.9912859597363385)
In [146]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[1], ys[1])
In [147]: slope, intercept, rvalue
Out[147]: (-1.2455432103210327, 121.51968891089112, -0.9871123119133126)
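To make the one-dimensional requirement concrete, here is a sketch of an ordinary least-squares fit for a single pair of 1-D sequences in plain Python. It is not scipy's implementation; the function name least_squares_line and the toy data are hypothetical. It also shows why xs[:, 0] fails: a plain list of lists is indexed row by row as xs[0], and only a NumPy array accepts the comma form.

```python
def least_squares_line(x, y):
    """Fit y = slope * x + intercept to two equal-length 1-D sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical two-row data shaped like xs and ys in the question
xs = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]
ys = [[1.0, 3.0, 5.0, 7.0], [5.0, 4.0, 3.0, 2.0]]

# One fit per row: each call receives a single 1-D x and a single 1-D y
for x_row, y_row in zip(xs, ys):
    print(least_squares_line(x_row, y_row))
```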

How to take value from one cell and add to list over multiple excel files

I'm trying to read the same cell from multiple Excel files and add the values to a list, but each number shows up twice. How do I solve this?
I'm using xlrd, os, and numpy libraries to do this.
for root, dirs, files in os.walk("/Users/Isaac/Experiment"):
    xlsfiles = [_ for _ in files if _.endswith('xlsx')]
    my_matrix = []
    my_matrix_2 = []
    for xlsfile in xlsfiles:
        workbook = xlrd.open_workbook(os.path.join(root,xlsfile))
        worksheet = workbook.sheet_by_index(0)
        for col in range(worksheet.ncols):
            my_matrix_2.append(worksheet.cell_value(4,1))
    print my_matrix_2
What I get as a result is
[4.0, 4.0, 40.0, 40.0, 44.0, 44.0, 48.0, 48.0, 52.0, 52.0, 56.0, 56.0, 60.0, 60.0, 64.0, 64.0, 68.0, 68.0, 72.0, 72.0, 76.0, 76.0, 8.0, 8.0, 80.0, 80.0, 84.0, 84.0, 88.0, 88.0, 92.0, 92.0, 96.0, 96.0, 100.0, 100.0, 12.0, 12.0, 16.0, 16.0, 20.0, 20.0, 24.0, 24.0, 28.0, 28.0, 32.0, 32.0, 36.0, 36.0]
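The repeated values are consistent with the append sitting inside the loop over columns: the same cell (4, 1) is appended once per column, so a sheet with two columns contributes its value twice. The sketch here reproduces that with a hypothetical FakeSheet class standing in for an xlrd worksheet (the two-column width is an assumption), and then builds the list without the inner loop.

```python
class FakeSheet:
    """Hypothetical stand-in for an xlrd worksheet with a fixed cell value."""

    def __init__(self, value, ncols=2):
        self._value = value
        self.ncols = ncols

    def cell_value(self, row, col):
        return self._value


sheets = [FakeSheet(4.0), FakeSheet(40.0), FakeSheet(44.0)]

# As in the question: one append per column, so each value repeats ncols times
duplicated = []
for sheet in sheets:
    for col in range(sheet.ncols):
        duplicated.append(sheet.cell_value(4, 1))

# Without the inner loop: one value per sheet
once_each = [sheet.cell_value(4, 1) for sheet in sheets]

print(duplicated)  # [4.0, 4.0, 40.0, 40.0, 44.0, 44.0]
print(once_each)   # [4.0, 40.0, 44.0]
```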
