I am using Python. I computed these numpy.float64 values, which give the Chicago Cubs' average number of wins per decade.
yr1874to1880 = np.mean(wonArray[137:143])
yr1881to1890 = np.mean(wonArray[127:136])
yr1891to1900 = np.mean(wonArray[117:126])
yr1901to1910 = np.mean(wonArray[107:116])
yr1911to1920 = np.mean(wonArray[97:106])
yr1921to1930 = np.mean(wonArray[87:96])
yr1931to1940 = np.mean(wonArray[77:86])
yr1941to1950 = np.mean(wonArray[67:76])
yr1951to1960 = np.mean(wonArray[57:66])
yr1961to1970 = np.mean(wonArray[47:56])
yr1971to1980 = np.mean(wonArray[37:46])
yr1981to1990 = np.mean(wonArray[27:36])
yr1991to2000 = np.mean(wonArray[17:26])
yr2001to2010 = np.mean(wonArray[7:16])
yr2011to2016 = np.mean(wonArray[0:6])
I want to combine them, but I don't know how. I tried putting them in a list, but it did not work. Does anyone know how to combine them so that I can plot them? I want to make a scatter plot with matplotlib. Thank you.
With what you've shown, each variable you set is a single float value. You can collect them into a list by declaring:
list_of_values = [yr1874to1880, yr1881to1890, ...]
Adding all of the declared values this way results in a list of floats. For example, with just the two values above added:
>>> print(list_of_values)
[139.5, 131.0]
That explains how to obtain a list with the data from np.mean(). However, I'm guessing the other question being asked is "how do I scatter plot this?" What is provided here is one axis of data, but a plot needs another (you can't have a graph without both x and y). Decide what the average wins will be plotted against, then iterate over the values. For example, I'll use a simple integer, decade, as the x axis:
import matplotlib.pyplot as plt
decade = 1
for i in list_of_values:
    y = i
    x = decade
    decade += 1
    plt.scatter(x, y)
plt.show()
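As a rough sketch of the same idea without fifteen separate variables, the decade means can be built in a loop. The names `won_array`, `decade_slices` and `decade_means` are made up for illustration, and the data is a placeholder, not real win totals. One thing worth checking: Python slices exclude the stop index, so if each decade is meant to cover ten consecutive entries, a slice such as `6:16` does that, while `7:16` covers only nine.

```python
# Placeholder data standing in for wonArray (NOT real win totals).
won_array = [float(i) for i in range(143)]

# (label, start, stop) per decade; these indices are illustrative assumptions.
decade_slices = [
    ("2011-2016", 0, 6),
    ("2001-2010", 6, 16),
    ("1991-2000", 16, 26),
]


def decade_means(values, slices):
    """Return parallel lists of decade labels and mean values."""
    labels, means = [], []
    for label, start, stop in slices:
        chunk = values[start:stop]  # stop index is excluded
        labels.append(label)
        means.append(sum(chunk) / len(chunk))
    return labels, means


labels, means = decade_means(won_array, decade_slices)
print(labels)
print(means)
```

The two lists can then go straight into `plt.scatter(range(len(means)), means)`, with `plt.xticks(range(len(means)), labels)` to label the x axis.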
I am working with IronPython in the Revit application.
Below is the code I was trying in Python. Help would be appreciated.
From the list of points, there is a first point and a second point. I have created functions for them.
The script should check whether the y coordinates are the same and draw a line if so.
It's not working and returns an unexpected error: a new line error.
# The inputs to this node will be stored as a list in the IN variables.
points = IN[0]
# Place your code below this line
lines = []
def fp(x)
    firstpoint = points[x]
    return firstpoint
def sp(x)
    secondpoint = points[x+1]
    return secondpoint
x = 0
while x <= points.Count:
    if (fp(x).Y == sp(x).Y) or (fp(x).Z == sp(x).Z):
        setlines = Line.ByStartPointEndPoint(fp(x), sp(x))
        lines.append(setlines)
    x = x + 1
# Assign your output to the OUT variable.
OUT = lines
As @itprorh66 points out, there's really not enough info here to definitively answer your question, but one issue is that you're comparing what I assume are floats for exact equality:
fp(x).Y == sp(x).Y
Instead of comparing for direct equality, you'll need to compare for equality within a tolerance. For discussion on how to do that, see "What is the best way to compare floats for almost-equality in Python?"
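A minimal sketch of a tolerance-based comparison, assuming an absolute tolerance of 1e-6 is acceptable for your model units. `Point`, `nearly_equal` and `aligned_pairs` are hypothetical names used only for illustration; `Point` stands in for the Dynamo point type. Python 3.5+ also offers `math.isclose`, but a plain `abs()` check works on older interpreters too. The sketch also puts a colon after each `def` line and stops the loop one element early so that `points[i + 1]` always exists.

```python
from collections import namedtuple

# Hypothetical stand-in for the Dynamo/Revit point type; only .Y and .Z are used.
Point = namedtuple("Point", ["X", "Y", "Z"])


def nearly_equal(a, b, tol=1e-6):
    """Return True when a and b differ by at most tol (assumed absolute tolerance)."""
    return abs(a - b) <= tol


def aligned_pairs(points, tol=1e-6):
    """Return consecutive point pairs whose Y or Z coordinates match within tol."""
    pairs = []
    # Stop one short of the end so points[i + 1] is always valid.
    for i in range(len(points) - 1):
        first, second = points[i], points[i + 1]
        if nearly_equal(first.Y, second.Y, tol) or nearly_equal(first.Z, second.Z, tol):
            pairs.append((first, second))
    return pairs


pts = [Point(0.0, 0.1 + 0.2, 0.0), Point(1.0, 0.3, 5.0), Point(2.0, 9.0, 7.0)]
print(0.1 + 0.2 == 0.3)         # False: exact comparison fails
print(len(aligned_pairs(pts)))  # 1: the first two points match within tolerance
```

In the Dynamo script, each returned pair would be passed to Line.ByStartPointEndPoint instead of being collected as a tuple.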
I'm comparing NBA betting lines between different sportsbooks over time.
Procedure:
Open pickle file of scraped data
Plot the scraped data
The pickle file is a dictionary of NBA betting lines over time. Each of the two teams has its own nested dictionary. Each key in these team-specific dictionaries represents a different sportsbook. The values for these sportsbook keys are lists of tuples, representing time-series data. It looks roughly like this:
dicto = {
    'Time': <time that the game starts>,
    'Team1': {
        Market1: [ (time1, value1), (time2, value2), etc...],
        Market2: [ (time1, value1), (time2, value2), etc...],
        etc...
    },
    'Team2': {
        <SAME FORM AS TEAM1>
    }
}
There are no issues with scraping or manipulating this data. The issue comes when I plot it. Here is the code for the script that unpickles and plots these dictionaries:
import matplotlib.pyplot as plt
import pickle, datetime, os, time, re
IMAGEPATH = 'Images'
reg = re.compile(r'[A-Z]+#[A-Z]+[0-9|-]+')
noDate = re.compile(r'[A-Z]+#[A-Z]+')
# Turn 1 into '01'
def zeroPad(num):
    if num < 10:
        return '0' + str(num)
    else:
        return num
# Turn list of time-series tuples into an x list and y list
def unzip(lst):
    x = []
    y = []
    for i in lst:
        x.append(f'{i[0].hour}:{zeroPad(i[0].minute)}')
        y.append(i[1])
    return x, y
# Make exactly 5, evenly spaced xticks
def prune(xticks):
    last = len(xticks)
    first = 0
    mid = int(len(xticks) / 2) - 1
    upMid = int( mid + (last - mid) / 2)
    downMid = int( (mid - first) / 2)
    out = []
    count = 0
    for i in xticks:
        if count in [last, first, mid, upMid, downMid]:
            out.append(i)
        else:
            out.append('')
        count += 1
    return out
def plot(filename, choice):
    IMAGEPATH = 'Images'
    IMAGEPATH = os.path.join(IMAGEPATH, choice)
    with open(filename, 'rb') as pik:
        dicto = pickle.load(pik)
    fig, axs = plt.subplots(2)
    gameID = noDate.search(filename).group(0)
    tm = dicto['Time']
    fig.suptitle(gameID + '\n' + str(tm))
    i = 0
    for team in dicto.keys():
        axs[i].set_title(team)
        if team == 'Time':
            continue
        for market in dicto[team].keys():
            lst = dicto[team][market]
            x, y = unzip(lst)
            axs[i].plot(x, y, label= market)
        axs[i].set_xticks(prune(x))
        axs[i].set_xticklabels(rotation=45, labels = x)
        i += 1
    plt.tight_layout()
    #Finish
    outputFile = reg.search(filename).group(0)
    date = (datetime.datetime.today() - datetime.timedelta(hours = 6)).date()
    fig.savefig(os.path.join(IMAGEPATH, str(date), f'{outputFile}.png'))
    plt.close()
Here is the image that results from calling the plot function on one of the dictionaries that I described above. It is pretty much exactly as I intended it, except for one very strange and bothersome problem.
You will notice that the bottom right tick looks haunted, demonic, jpeggy, whatever you want to call it. I am highly suspicious that this problem occurs in the prune function, which I use to set the xtick values of the plot.
The reason that I prune the values with a function like this is because these dictionaries are continuously updated, so setting a static number of xticks would not work. And if I don't prune the xticks, they end up becoming unreadable due to overlapping one another.
I am quite confused as to what could cause an xtick to look like this. It happens consistently, for every dictionary, every time. Before I added the prune function (when the xticks were unbounded and overlapped one another), this issue did not occur. So when I say I'm suspicious that the prune function is the cause, I am really quite certain.
I will be happy to share an instance of one of these dictionaries, but they are saved as .pickle files, and I'm pretty sure it's bad practice to share pickle files over the internet. I have been warned about potential malware, so I'll just stay away from that. But if you need to see the dictionary, I can take the time to prettily print one and share a screenshot. Any help is greatly appreciated!
Matplotlib does this when many xticks or yticks are drawn at the same position: the labels are rendered on top of one another, which produces the smeared look. This is normal. If you limit the number of ticks that land on that position, the tick will look indistinguishable from the rest of the xticks.
Plot a simple example to test this out and you will see for yourself.
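One way to avoid stacking ticks, sketched here under the assumption that the data is plotted against integer positions (`range(len(labels))`) rather than against the label strings: pick a handful of evenly spaced indices and set ticks only there, instead of passing a list padded with empty strings. The helper names `evenly_spaced_indices` and `label_sparse_ticks` are invented for this example.

```python
def evenly_spaced_indices(n, count=5):
    """Return up to `count` evenly spaced indices into a sequence of length n.

    Assumes count >= 2. Integer division keeps every index within 0..n-1.
    """
    if n <= count:
        return list(range(n))
    return [k * (n - 1) // (count - 1) for k in range(count)]


def label_sparse_ticks(ax, labels, count=5):
    """Place ticks only at the chosen positions of a matplotlib Axes.

    Assumes the data was plotted against range(len(labels)).
    """
    positions = evenly_spaced_indices(len(labels), count)
    ax.set_xticks(positions)
    ax.set_xticklabels([labels[i] for i in positions], rotation=45)


print(evenly_spaced_indices(11))  # [0, 2, 5, 7, 10]
```

Because each tick position is used exactly once, no label is drawn on top of another, however long the time series grows.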
I'm trying to plot a demand profile for heating energy for a specific building with Python and matplotlib.
But instead of being a single line, it looks like this:
Has anyone ever had plotting results like this?
Or does anyone have an idea what's going on here?
The corresponding code fragment is:
for b in list_of_buildings:
    print(b.label, b.Q_Heiz_a, b.Q_Heiz_TT, len(b.lp.heating_list))
    heating_datalist=[]
    for d in range(timesteps):
        b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
        heating_datalist.append((d, b.lp.heating_list[d]))
        xs_heat = [x[0] for x in heating_datalist]
        ys_heat = [x[1] for x in heating_datalist]
        pyplot.plot(xs_heat, ys_heat, lw=0.5)
pyplot.title(TT)
#get legend entries from list_of_buildings
list_of_entries = []
for b in list_of_buildings:
    list_of_entries.append(b.label)
pyplot.legend(list_of_entries)
pyplot.xlabel("[min]")
pyplot.ylabel("[kWh]")
Additional info:
timesteps is a list like [0.00, 0.01, 0.02, ... , 23.59] - the minutes of the day (24*60 values)
b.lp.heating_list is a list containing some float values
b.Q_Heiz_TT is a constant
Based on your information, I have created a minimal example that should reproduce your problem (if not, you may have not explained the problem/parameters in sufficient detail). I'd urge you to create such an example yourself next time, as your question is likely to get ignored without it. The example looks like this:
import numpy as np
import matplotlib.pyplot as plt
N = 24*60
Q_Heiz_TT = 0.5
lp_heating_list = np.random.rand(N)
lp_heating_list = lp_heating_list*Q_Heiz_TT
heating_datalist = []
for d in range(N):
    heating_datalist.append((d, lp_heating_list[d]))
    xs_heat = [x[0] for x in heating_datalist]
    ys_heat = [x[1] for x in heating_datalist]
    plt.plot(xs_heat, ys_heat)
plt.show()
What is going on here? For each d in range(N) (with N = 24*60, i.e. each minute of the day), you plot all values up to and including lp_heating_list[d] versus d. This is because, on every iteration, heating_datalist is extended with the current value of d and the corresponding value in lp_heating_list, and then the whole list is plotted again. What you get is 24*60 = 1440 lines that partially overlap one another. Depending on how your backend handles things, it may be very slow and start to look messy.
A much better approach would be to simply use
plt.plot(range(N), lp_heating_list)
plt.show()
which plots only one line instead of 1440 of them.
I suspect there is an indentation problem in your code.
Try this:
heating_datalist=[]
for d in range(timesteps):
    b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
    heating_datalist.append((d, b.lp.heating_list[d]))
xs_heat = [x[0] for x in heating_datalist] # <<<<<<<<
ys_heat = [x[1] for x in heating_datalist] # <<<<<<<<
pyplot.plot(xs_heat, ys_heat, lw=0.5) # <<<<<<<<
That way you'll plot only one line per building, which is probably what you want.
Besides, you can use zip to generate x values and y values like this:
xs_heat, ys_heat = zip(*heating_datalist)
This works because zip is its own inverse!
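A small illustration of that property, using made-up sample pairs rather than real heating data:

```python
# Made-up (x, y) pairs for illustration only.
pairs = [(0, 0.5), (1, 0.7), (2, 0.2)]

# Unpacking the list with * hands each pair to zip as a separate argument,
# so zip regroups the first elements together and the second elements together.
xs, ys = zip(*pairs)
print(xs)  # (0, 1, 2)
print(ys)  # (0.5, 0.7, 0.2)

# Zipping the two tuples again restores the original pairs.
print(list(zip(xs, ys)) == pairs)  # True
```

Note that zip yields tuples, not lists; matplotlib accepts either.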
I have a Python dictionary containing for each variable a tuple with an array of points in time and an array of numbers (1/0) representing the Boolean values that the variable holds at a certain point in time. For example:
dictionary["a"] = ([0,1,3], [1,1,0])
means that the variable "a" is true at both point in time 0 and 1, at point in time 2 "a" holds an arbitrary value and at point in time 3 it is false.
I would like to generate a plot using matplotlib.pyplot that will look something like this:
I already tried something like:
import matplotlib.pyplot as plt
plt.figure(1)
graphcount = 1
for x in dictionary:
    plt.subplot(len(dictionary), 1, graphcount)
    plt.step(dictionary[x][0], dictionary[x][1])
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()
but it does not give me the right results. For example, if dictionary["a"] = ([2], [1]) no line is shown at all. Can someone please point me in the right direction on how to do this? Thank you!
According to your description the line should start at the first point and end at the last point. If the first and last points are the same then your line will be made of only one point. In order to see a line with only one point you need to use a visible marker.
Regarding the location of the jumps, the docstring says:
where: [ ‘pre’ | ‘post’ | ‘mid’ ]
If ‘pre’ (the default), the interval from x[i] to x[i+1] has level y[i+1].
If ‘post’, that interval has level y[i].
If ‘mid’, the jumps in y occur half-way between the x-values.
So I guess you want 'mid'.
import matplotlib.pyplot as plt
dictionary = {}
dictionary['a'] = ([0,1,3], [1,1,0])
dictionary['b'] = ([2], [1])
plt.figure(1)
graphcount = 1
for x in dictionary:
    plt.subplot(len(dictionary), 1, graphcount)
    # 'o-' adds a visible marker, so a single-point series still shows up
    plt.step(dictionary[x][0], dictionary[x][1], 'o-', where='mid')
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()
I have a polygon shapefile of the U.S. in which each state is a separate polygon, identified by its attribute value. In addition, I have arrays storing latitude and longitude values of point events that I am also interested in. Essentially, I would like to 'spatial join' the points and polygons (or perform a check to see which polygon [i.e., state] each point is in), then sum the number of points in each state to find out which state has the most 'events'.
I believe the pseudocode would be something like:
Read in US.shp
Read in lat/lon points of events
Loop through each state in the shapefile and find number of points in each state
print 'Here is a list of the number of points in each state: '
Any libraries or syntax would be greatly appreciated.
Based on what I can tell, the OGR library is what I need, but I am having trouble with the syntax:
dsPolygons = ogr.Open('US.shp')
polygonsLayer = dsPolygons.GetLayer()
#Iterating all the polygons
polygonFeature = polygonsLayer.GetNextFeature()
k=0
while polygonFeature:
    k = k + 1
    print "processing " + polygonFeature.GetField("STATE") + "-" + str(k) + " of " + str(polygonsLayer.GetFeatureCount())
    geometry = polygonFeature.GetGeometryRef()
    #Read in some points?
    geomcol = ogr.Geometry(ogr.wkbGeometryCollection)
    point = ogr.Geometry(ogr.wkbPoint)
    point.AddPoint(-122.33,47.09)
    point.AddPoint(-110.11,33.33)
    #geomcol.AddGeometry(point)
    print point.ExportToWkt()
    print point
    numCounts=0.0
    while pointFeature:
        if pointFeature.GetGeometryRef().Within(geometry):
            numCounts = numCounts + 1
        pointFeature = pointsLayer.GetNextFeature()
    polygonFeature = polygonsLayer.GetNextFeature()
#Loop through to see how many events in each state
I like the question. I doubt I can give you the best answer, and definitely can't help with OGR, but FWIW I'll tell you what I'm doing right now.
I use GeoPandas, a geospatial extension of pandas. I recommend it: it's high-level and does a lot, giving you everything in Shapely and fiona for free. It is in active development by twitter/@kajord and others.
Here's a version of my working code. It assumes you have everything in shapefiles, but it's easy to generate a geopandas.GeoDataFrame from a list.
import geopandas as gpd
# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')
# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
pts = points.copy()
# We're going to keep a list of how many points we find.
pts_in_polys = []
# Loop over polygons with index i.
for i, poly in polygons.iterrows():
    # Keep a list of points in this poly
    pts_in_this_poly = []
    # Now loop over all points with index j.
    for j, pt in pts.iterrows():
        if poly.geometry.contains(pt.geometry):
            # Then it's a hit! Add it to the list,
            # and drop it so we have less hunting.
            pts_in_this_poly.append(pt.geometry)
            pts = pts.drop([j])
    # We could do all sorts, like grab a property of the
    # points, but let's just append the number of them.
    pts_in_polys.append(len(pts_in_this_poly))
# Add the number of points for each poly to the dataframe.
polygons['number of points'] = gpd.GeoSeries(pts_in_polys)
The developer tells me that spatial joins are 'new in the dev version', so if you feel like poking around in there, I'd love to hear how that goes! The main problem with my code is that it's slow.
import geopandas as gpd
# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')
# Spatial join (newer GeoPandas releases name this argument predicate= instead of op=)
pointsInPolygon = gpd.sjoin(points, polygons, how="inner", op='intersects')
# Add a field with 1 as a constant value
pointsInPolygon['const'] = 1
# Group according to the column by which you want to aggregate data
pointsInPolygon.groupby(['statename']).sum()
# The 'const' column now gives the number of points in each of your multipolygons.
# If you want to see other columns as well, use something like this:
pointsInPolygon = pointsInPolygon.groupby('statename').agg({'columnA':'first', 'columnB':'first', 'const':'sum'}).reset_index()