I have a Python dictionary containing for each variable a tuple with an array of points in time and an array of numbers (1/0) representing the Boolean values that the variable holds at a certain point in time. For example:
dictionary["a"] = ([0,1,3], [1,1,0])
means that the variable "a" is true at both point in time 0 and 1, at point in time 2 "a" holds an arbitrary value and at point in time 3 it is false.
I would like to generate a plot using matplotlib.pyplot that will look something like this:
I already tried something like:
import matplotlib.pyplot as plt
plt.figure(1)
graphcount = 1
for x in dictionary:
    plt.subplot(len(dictionary), 1, graphcount)
    plt.step(dictionary[x][0], dictionary[x][1])
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()
but it does not give me the right results. For example, if dictionary["a"] = ([2], [1]) no line is shown at all. Can someone please point me in the right direction on how to do this? Thank you!
According to your description the line should start at the first point and end at the last point. If the first and last points are the same then your line will be made of only one point. In order to see a line with only one point you need to use a visible marker.
Regarding the location of the jumps, the docstring says:
where: [ ‘pre’ | ‘post’ | ‘mid’ ]
If ‘pre’ (the default), the interval from x[i] to x[i+1] has level y[i+1].
If ‘post’, that interval has level y[i].
If ‘mid’, the jumps in y occur half-way between the x-values.
So I guess you want 'mid'.
import matplotlib.pyplot as plt

dictionary = {}
dictionary['a'] = ([0,1,3], [1,1,0])
dictionary['b'] = ([2], [1])
plt.figure(1)
graphcount = 1
for x in dictionary:
    # one subplot per variable, stacked vertically
    plt.subplot(len(dictionary), 1, graphcount)
    # 'o-' adds a marker so that a single point is still visible
    plt.step(dictionary[x][0], dictionary[x][1], 'o-', where='mid')
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()
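To make the three `where` options from the quoted docstring concrete, here is a small pure-Python sketch that returns the level a step plot would show at a given time. The function name `step_level` is made up for illustration and is not part of matplotlib; it only mirrors the rules quoted above, and it assumes the time points are sorted.

```python
import bisect

def step_level(xs, ys, t, where="pre"):
    """Return the level shown at time t, or None outside the data range.

    Hypothetical helper, not a matplotlib function. Assumes xs is sorted.
    """
    if not xs or t < xs[0] or t > xs[-1]:
        return None
    if where == "pre":
        # the interval (x[i], x[i+1]] has level y[i+1]
        return ys[bisect.bisect_left(xs, t)]
    if where == "post":
        # the interval [x[i], x[i+1]) has level y[i]
        return ys[bisect.bisect_right(xs, t) - 1]
    if where == "mid":
        # jumps happen half-way between x-values, so take the nearest x
        # (an exact tie falls to the earlier point in this sketch)
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - t))
        return ys[nearest]
    raise ValueError("where must be 'pre', 'post' or 'mid'")

times, levels = [0, 1, 3], [1, 1, 0]   # the "a" example from the question
for mode in ("pre", "post", "mid"):
    print(mode, step_level(times, levels, 2.5, where=mode))
```

For the example variable "a", the time 2.5 lies between the known points 1 and 3, so the three modes disagree about which neighbouring value to show there. That is the choice the `where` argument controls.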
So I'm comparing NBA betting lines between different sportsbooks over time
Procedure:
Open pickle file of scraped data
Plot the scraped data
The pickle file is a dictionary of NBA betting lines over time. Each of the two teams has its own nested dictionary. Each key in these team-specific dictionaries represents a different sportsbook. The values for these sportsbook keys are lists of tuples, representing time-series data. It looks roughly like this:
dicto = {
    'Time': <time that the game starts>,
    'Team1': {
        Market1: [ (time1, value1), (time2, value2), etc...],
        Market2: [ (time1, value1), (time2, value2), etc...],
        etc...
    },
    'Team2': {
        <SAME FORM AS TEAM1>
    }
}
There are no issues with scraping or manipulating this data. The issue comes when I plot it. Here is the code for the script that unpickles and plots these dictionaries:
import matplotlib.pyplot as plt
import pickle, datetime, os, time, re
IMAGEPATH = 'Images'
reg = re.compile(r'[A-Z]+#[A-Z]+[0-9|-]+')
noDate = re.compile(r'[A-Z]+#[A-Z]+')
# Turn 1 into '01'
def zeroPad(num):
    if num < 10:
        return '0' + str(num)
    else:
        return num
# Turn list of time-series tuples into an x list and y list
def unzip(lst):
    x = []
    y = []
    for i in lst:
        x.append(f'{i[0].hour}:{zeroPad(i[0].minute)}')
        y.append(i[1])
    return x, y
# Make exactly 5, evenly spaced xticks
def prune(xticks):
    last = len(xticks)
    first = 0
    mid = int(len(xticks) / 2) - 1
    upMid = int( mid + (last - mid) / 2)
    downMid = int( (mid - first) / 2)
    out = []
    count = 0
    for i in xticks:
        if count in [last, first, mid, upMid, downMid]:
            out.append(i)
        else:
            out.append('')
        count += 1
    return out
def plot(filename, choice):
    IMAGEPATH = 'Images'
    IMAGEPATH = os.path.join(IMAGEPATH, choice)
    with open(filename, 'rb') as pik:
        dicto = pickle.load(pik)
    fig, axs = plt.subplots(2)
    gameID = noDate.search(filename).group(0)
    tm = dicto['Time']
    fig.suptitle(gameID + '\n' + str(tm))
    i = 0
    for team in dicto.keys():
        axs[i].set_title(team)
        if team == 'Time':
            continue
        for market in dicto[team].keys():
            lst = dicto[team][market]
            x, y = unzip(lst)
            axs[i].plot(x, y, label= market)
        axs[i].set_xticks(prune(x))
        axs[i].set_xticklabels(rotation=45, labels = x)
        i += 1
    plt.tight_layout()
    #Finish
    outputFile = reg.search(filename).group(0)
    date = (datetime.datetime.today() - datetime.timedelta(hours = 6)).date()
    fig.savefig(os.path.join(IMAGEPATH, str(date), f'{outputFile}.png'))
    plt.close()
Here is the image that results from calling the plot function on one of the dictionaries that I described above. It is pretty much exactly as I intended it, except for one very strange and bothersome problem.
You will notice that the bottom right tick looks haunted, demonic, jpeggy, whatever you want to call it. I am highly suspicious that this problem occurs in the prune function, which I use to set the xtick values of the plot.
The reason that I prune the values with a function like this is because these dictionaries are continuously updated, so setting a static number of xticks would not work. And if I don't prune the xticks, they end up becoming unreadable due to overlapping one another.
I am quite confused as to what could cause an xtick to look like this. It happens consistently, for every dictionary, every time. Before I added the prune function (when the xticks were unbounded and overlapped one another), this issue did not occur. So when I say I'm suspicious that the prune function is the cause, I am really quite certain.
I will be happy to share an instance of one of these dictionaries, but they are saved as .pickle files, and I'm pretty sure it's bad practice to share pickle files over the internet. I have been warned about potential malware, so I'll just stay away from that. But if you need to see the dictionary, I can take the time to prettily print one and share a screenshot. Any help is greatly appreciated!
Matplotlib produces this when many xtick or ytick labels are drawn at the same position: the labels are printed on top of one another, which makes the text look smeared. This is expected behavior. If you limit how many ticks land on that one value, the label will look like the rest of the xticks.
Plot a simple example to test this, and you will see for yourself.
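One way to keep every tick at its own position is to choose a handful of index positions and label only those, rather than passing a list that is mostly empty strings. The sketch below is pure Python; `evenly_spaced_indices` and `pick_ticks` are hypothetical helper names, and the time labels are made-up sample data.

```python
def evenly_spaced_indices(n, count=5):
    """Return up to `count` distinct indices spread evenly over range(n)."""
    if n <= 0:
        return []
    if count < 2:
        return [0]
    if n <= count:
        return list(range(n))
    # integer arithmetic keeps the first and last index exact
    return [(i * (n - 1)) // (count - 1) for i in range(count)]

def pick_ticks(labels, count=5):
    """Return (positions, tick_labels) for a list of x-axis labels."""
    positions = evenly_spaced_indices(len(labels), count)
    return positions, [labels[i] for i in positions]

# made-up sample labels in the same 'hour:minute' style as unzip() produces
labels = [f"19:{minute:02d}" for minute in range(11)]
positions, tick_labels = pick_ticks(labels)
print(positions)
print(tick_labels)
```

Assuming the x labels are unique strings, so that the k-th label sits at position k on the axis, the result could then be passed on with `axs[i].set_xticks(positions)` followed by `axs[i].set_xticklabels(tick_labels, rotation=45)`. Each label then gets its own position, and nothing is stacked on a single tick.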
As part of parsing a PDB file, I've extracted a set of coordinates (x, y, z) for particular atoms that I want to exist as floats. However, I also need to know how many sets of coordinates I have extracted.
Below is my code through the coordinate extraction, and what I thought would produce the count of how many sets of three coordinates I've extracted.
When using len(coordinates), I unfortunately get back that each set of coordinates contains 3 values (the x, y, and z coordinates).
Any insight into how to properly count the number of sets would be helpful. I'm quite new to Python and am still in the stage of being unsure about if I am even asking this correctly!
from sys import argv
with open(argv[1]) as pbd:
    print()
    for line in pbd:
        if line[:4] == 'ATOM':
            atom_type = line[13:16]
            if atom_type == "CA" or "N" or "C":
                x = float(line[31:38])
                y = float(line[39:46])
                z = float(line[47:54])
                coordinates = (x, y, z)
                # printing (coordinates) gives
                # (36.886, 53.177, 21.887)
                # (38.323, 52.817, 21.996)
                # (38.493, 51.553, 22.83)
                # (37.73, 51.314, 23.77)
                print(len(coordinates))
                # printing len(coordinates) gives
                # 3
                # 3
                # 3
                # 3
Thank you for any insight!
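If you want to count the number of specific atoms in your file, try this:
from sys import argv
with open(argv[1]) as pbd:
    print()
    atomCount = 0
    for line in pbd:
        if line[:4] == 'ATOM':
            # strip the padding so that "CA " compares equal to "CA"
            atom_type = line[13:16].strip()
            # `atom_type == "CA" or "N" or "C"` is always true, so test membership instead
            if atom_type in ("CA", "N", "C"):
                atomCount += 1
    print(atomCount)
Basically, you traverse your whole PDB file and check the type of each atom (it seems to be the fourth column in your data). Each time you encounter one of your desired atom types, you increase a counter variable by 1. Note that the original condition, atom_type == "CA" or "N" or "C", is always true because the non-empty strings "N" and "C" are truthy, so it would count every ATOM line.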
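Your coordinates variable is a single tuple that is overwritten on every pass through the loop, so len(coordinates) is always 3. Tuples are ordered and immutable; to count the sets, append each one to a list instead:
coordinates = []
for ....:
    coordinates.append([x, y, z])
len(coordinates) # should be 4 I guess.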
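A runnable sketch of the list approach, under a few assumptions: the slices follow the standard fixed-column ATOM record layout (atom name in columns 13-16, x/y/z in columns 31-54), which differs slightly from the slices in the question; `make_atom_line` is a hypothetical helper that only builds demo input; and the atom names attached to the four coordinate sets are invented, since the question does not show them.

```python
WANTED = {"CA", "N", "C"}

def make_atom_line(serial, name, x, y, z):
    # Hypothetical helper: builds a fixed-column ATOM record for this demo only.
    return "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, "ALA", "A", 1, x, y, z)

def extract_coordinates(lines):
    """Collect one (x, y, z) tuple of floats per wanted atom."""
    coordinates = []
    for line in lines:
        if line[:4] != "ATOM":
            continue
        atom_type = line[12:16].strip()   # strip padding before comparing
        if atom_type in WANTED:           # membership test, not `== "CA" or "N"`
            coordinates.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coordinates

# Coordinates taken from the question's output; the atom names are assumed.
lines = [
    make_atom_line(1, "N", 36.886, 53.177, 21.887),
    make_atom_line(2, "CA", 38.323, 52.817, 21.996),
    make_atom_line(3, "C", 38.493, 51.553, 22.830),
    make_atom_line(4, "O", 37.730, 51.314, 23.770),
]
coords = extract_coordinates(lines)
print(len(coords))   # number of coordinate sets, not the length of one tuple
```

Because every set is appended to one list, `len(coords)` counts the sets, while `len(coords[0])` is still 3.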
I'm trying to plot a demand profile for heating energy for a specific building with Python and matplotlib.
But instead of being a single line it looks like this:
Has anyone ever had plotting results like this?
Or does anyone have an idea what's going on here?
The corresponding code fragment is:
for b in list_of_buildings:
    print(b.label, b.Q_Heiz_a, b.Q_Heiz_TT, len(b.lp.heating_list))
    heating_datalist=[]
    for d in range(timesteps):
        b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
        heating_datalist.append((d, b.lp.heating_list[d]))
        xs_heat = [x[0] for x in heating_datalist]
        ys_heat = [x[1] for x in heating_datalist]
        pyplot.plot(xs_heat, ys_heat, lw=0.5)
pyplot.title(TT)
#get legend entries from list_of_buildings
list_of_entries = []
for b in list_of_buildings:
    list_of_entries.append(b.label)
pyplot.legend(list_of_entries)
pyplot.xlabel("[min]")
pyplot.ylabel("[kWh]")
Additional info:
timesteps is a list like [0.00, 0.01, 0.02, ... , 23.59] - the minutes of the day (24*60 values)
b.lp.heating_list is a list containing some float values
b.Q_Heiz_TT is a constant
Based on your information, I have created a minimal example that should reproduce your problem (if not, you may have not explained the problem/parameters in sufficient detail). I'd urge you to create such an example yourself next time, as your question is likely to get ignored without it. The example looks like this:
import numpy as np
import matplotlib.pyplot as plt
N = 24*60
Q_Heiz_TT = 0.5
lp_heating_list = np.random.rand(N)
lp_heating_list = lp_heating_list*Q_Heiz_TT
heating_datalist = []
for d in range(N):
    heating_datalist.append((d, lp_heating_list[d]))
    xs_heat = [x[0] for x in heating_datalist]
    ys_heat = [x[1] for x in heating_datalist]
    plt.plot(xs_heat, ys_heat)
plt.show()
What is going on here? For each d in range(N) (with N = 24*60, i.e. each minute of the day), you plot all values up to and including lp_heating_list[d] versus d. This is because heating_datalist is appended with the current value of d and the corresponding value in lp_heating_list, and the plot call sits inside the loop. What you get is 24*60 = 1440 lines that partially overlap one another. Depending on how your backend handles this, it may be very slow and start to look messy.
A much better approach would be to simply use
plt.plot(range(N), lp_heating_list)
plt.show()
Which plots only one line, instead of 1440 of them.
I suspect there is an indentation problem in your code.
Try this:
heating_datalist=[]
for d in range(timesteps):
    b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
    heating_datalist.append((d, b.lp.heating_list[d]))
xs_heat = [x[0] for x in heating_datalist] # <<<<<<<<
ys_heat = [x[1] for x in heating_datalist] # <<<<<<<<
pyplot.plot(xs_heat, ys_heat, lw=0.5) # <<<<<<<<
That way you'll plot only one line per building, which is probably what you want.
Besides, you can use zip to generate x values and y values like this:
xs_heat, ys_heat = zip(*heating_datalist)
This works because zip is its own inverse!
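A short illustration of that round trip, using a few made-up (timestep, value) pairs in place of the real heating data:

```python
# made-up sample pairs standing in for heating_datalist
heating_datalist = [(0, 0.5), (1, 0.25), (2, 0.75)]

# unpacking the pairs into zip() splits them into one tuple per column
xs_heat, ys_heat = zip(*heating_datalist)
print(xs_heat)
print(ys_heat)

# zipping the columns again restores the original pairs
pairs_again = list(zip(xs_heat, ys_heat))
print(pairs_again == heating_datalist)
```

Note that zip yields tuples rather than lists; matplotlib accepts either, but wrap them in list() if you need to modify them afterwards.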
I am using Python. I computed these numpy.float64 values, which show the Chicago Cubs' average wins by decade.
yr1874to1880 = np.mean(wonArray[137:143])
yr1881to1890 = np.mean(wonArray[127:136])
yr1891to1900 = np.mean(wonArray[117:126])
yr1901to1910 = np.mean(wonArray[107:116])
yr1911to1920 = np.mean(wonArray[97:106])
yr1921to1930 = np.mean(wonArray[87:96])
yr1931to1940 = np.mean(wonArray[77:86])
yr1941to1950 = np.mean(wonArray[67:76])
yr1951to1960 = np.mean(wonArray[57:66])
yr1961to1970 = np.mean(wonArray[47:56])
yr1971to1980 = np.mean(wonArray[37:46])
yr1981to1990 = np.mean(wonArray[27:36])
yr1991to2000 = np.mean(wonArray[17:26])
yr2001to2010 = np.mean(wonArray[7:16])
yr2011to2016 = np.mean(wonArray[0:6])
I want to put them together but I don't know how. I tried using a list, but it did not work. Does anyone know how to combine them so that I can plot them? I want to make a scatter plot with matplotlib. Thank you.
So with what you've shown, each variable you're setting becomes a float value. You can make them into a list by declaring:
list_of_values = [yr1874to1880, yr1881to1890, ...]
Adding all of the declared values to this results in a list of floats. For example, with just the two values above added:
>>> print(list_of_values)
[139.5, 131.0]
So that should explain how to obtain a list with the data from np.mean(). However, I'm guessing another question being asked is "how do I scatter plot this?" Using what is provided here, we have one axis of data, but to plot we need another (can't have a graph without x and y). Decide what the average wins is going to be compared against, and then that can be iterated over. For example, I'll use a simple integer in "decade" to act as the x axis:
import matplotlib.pyplot as plt
decade = 1
for i in list_of_values:
    y = i
    x = decade
    decade += 1
    plt.scatter(x, y)
plt.show()
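The list of means can also be built in a loop instead of one variable per decade. One detail worth checking in the question's slices: a Python slice excludes its end index, so something like `wonArray[7:16]` averages nine values, not ten. The sketch below uses the standard-library `statistics.mean` and stand-in data, because the real `wonArray` is not shown; `chunk_means` is a hypothetical helper name.

```python
from statistics import mean

def chunk_means(values, size=10):
    """Average consecutive blocks of `size` values; the last block may be shorter."""
    return [mean(values[start:start + size])
            for start in range(0, len(values), size)]

values = list(range(25))      # stand-in data, not real win totals

print(len(values[7:16]))      # the end index is excluded
print(len(values[7:17]))      # one more index gives a full ten values

list_of_values = chunk_means(values)
print(list_of_values)
```

The resulting `list_of_values` can be plotted in a single call, for example with `plt.scatter(range(1, len(list_of_values) + 1), list_of_values)`.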
I have a simple, stupid Python problem. Given a graph, I'm trying to sample from a random variable whose distribution is the same as that of the degree distribution of the graph.
This seems like it should be pretty straightforward. Yet somehow I am still managing to mess this up. My code looks like this:
import numpy as np
import scipy as sp
import graph_tool.all as gt
G = gt.random_graph(500, deg_sampler=lambda: np.random.poisson(1), directed=False)
deg = gt.vertex_hist(G,"total",float_count=False)
# Extract counts and values
count = list(deg[0])
value = list(deg[1])
# Generate vector of probabilities for each node
p = [float(x)/sum(count) for x in count]
# Load into a random variable for sampling
x = sp.stats.rv_discrete(values=(value,p))
print x.rvs(1)
However, upon running this it returns an error:
Traceback (most recent call last):
File "temp.py", line 16, in <module>
x = sp.stats.rv_discrete(values=(value,p))
File "/usr/lib/python2.7/dist-packages/scipy/stats/distributions.py", line 5637, in __init__
self.pk = take(ravel(self.pk),indx, 0)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 103, in take
return take(indices, axis, out, mode)
IndexError: index out of range for array
I'm not sure why this is. If in the code above I write instead:
x = sp.stats.rv_discrete(values=(range(len(count)),p))
Then the code runs fine, but it gives a weird result--clearly the way I've specified this distribution, a value of "0" ought to be most common. But this code gives "1" with high probability and never returns a "0," so something is getting shifted over somehow.
Can anyone clarify what is going on here? Any help would be greatly appreciated!
I believe the first positional argument of x.rvs() is loc. By calling x.rvs(1) you set loc=1, which adds 1 to every sampled value.
Instead, you want
x.rvs(size=1)
As an aside, I'd recommend that you replace this:
# Extract counts and values
count = list(deg[0])
value = list(deg[1])
# Generate vector of probabilities for each node
p = [float(x)/sum(count) for x in count]
With:
count, value = deg # automatically unpacks along first axis
p = count.astype(float) / count.sum() # count is an array, so you can divide all elements at once
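To illustrate the idea of sampling degrees in proportion to their counts without relying on scipy or graph-tool, here is a sketch that swaps in the standard library's `random.choices`. This is a different technique from `rv_discrete`, shown only to make the weighting explicit; `sample_from_counts` is a hypothetical helper and the counts are made up.

```python
import random

def sample_from_counts(values, counts, size=1, seed=None):
    """Draw `size` samples from `values`, weighted by `counts`."""
    rng = random.Random(seed)   # local generator, so the seed stays contained
    return rng.choices(values, weights=counts, k=size)

values = [0, 1, 2, 3]           # made-up degree values
counts = [180, 190, 90, 40]     # made-up number of vertices per degree

samples = sample_from_counts(values, counts, size=1000, seed=0)
print(samples[:10])

# a value with zero count is never drawn
print(sample_from_counts([0, 1], [0, 5], size=10, seed=1))
```

Here the sample size is passed by keyword (`size=...`), which mirrors the fix above: in `x.rvs(size=1)` the keyword keeps the number from being interpreted as `loc`.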