Weird plot with matplotlib

I'm trying to plot a demand profile for heating energy for a specific building with Python and matplotlib.
But instead of a single line, the plot looks like this:
Has anyone ever had plotting results like this?
Or does anyone have an idea what's going on here?
The corresponding code fragment is:
for b in list_of_buildings:
    print(b.label, b.Q_Heiz_a, b.Q_Heiz_TT, len(b.lp.heating_list))
    heating_datalist=[]
    for d in range(timesteps):
        b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
        heating_datalist.append((d, b.lp.heating_list[d]))
        xs_heat = [x[0] for x in heating_datalist]
        ys_heat = [x[1] for x in heating_datalist]
        pyplot.plot(xs_heat, ys_heat, lw=0.5)
pyplot.title(TT)
#get legend entries from list_of_buildings
list_of_entries = []
for b in list_of_buildings:
    list_of_entries.append(b.label)
pyplot.legend(list_of_entries)
pyplot.xlabel("[min]")
pyplot.ylabel("[kWh]")
Additional info:
timesteps is a list like [0.00, 0.01, 0.02, ... , 23.59] - the minutes of the day (24*60 values)
b.lp.heating_list is a list containing some float values
b.Q_Heiz_TT is a constant

Based on your information, I have created a minimal example that should reproduce your problem (if not, you may not have explained the problem or the parameters in sufficient detail). I'd urge you to create such an example yourself next time, as your question is likely to be ignored without one. The example looks like this:
import numpy as np
import matplotlib.pyplot as plt
N = 24*60
Q_Heiz_TT = 0.5
lp_heating_list = np.random.rand(N)
lp_heating_list = lp_heating_list*Q_Heiz_TT
heating_datalist = []
for d in range(N):
    heating_datalist.append((d, lp_heating_list[d]))
    xs_heat = [x[0] for x in heating_datalist]
    ys_heat = [x[1] for x in heating_datalist]
    plt.plot(xs_heat, ys_heat)
plt.show()
What is going on here? For each d in range(N) (with N = 24*60, i.e. each minute of the day), you plot all values up to and including lp_heating_list[d] versus d. This is because heating_datalist is appended with the current value of d and the corresponding value in lp_heating_list, and the plot call runs inside the loop. What you get is 24*60 = 1440 lines that partially overlap one another. Depending on how your backend handles this, it may be very slow and start to look messy.
A much better approach would be to simply use
plt.plot(range(N), lp_heating_list)
plt.show()
This plots only one line instead of 1440 of them.
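As a rough sketch of the same idea with several buildings (the building labels, random profiles and scale factors below are made up for illustration, they are not from the question), each profile is scaled into a new array and plotted exactly once:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

N = 24 * 60  # one value per minute of the day
rng = np.random.default_rng(0)

# Hypothetical stand-ins for list_of_buildings: label -> (profile, Q_Heiz_TT)
buildings = {
    "house_a": (rng.random(N), 0.5),
    "house_b": (rng.random(N), 0.8),
}

fig, ax = plt.subplots()
minutes = np.arange(N)
for label, (profile, q_heiz_tt) in buildings.items():
    scaled = profile * q_heiz_tt  # new array, the stored profile is not modified
    ax.plot(minutes, scaled, lw=0.5, label=label)  # one call, one line per building

ax.set_xlabel("[min]")
ax.set_ylabel("[kWh]")
ax.legend()
```

Scaling into a new array also avoids multiplying the stored profile by Q_Heiz_TT again each time the plotting code is re-run, which the in-place update in the question would do.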

I suspect there is an indentation problem in your code.
Try this:
heating_datalist=[]
for d in range(timesteps):
    b.lp.heating_list[d] = b.lp.heating_list[d]*b.Q_Heiz_TT
    heating_datalist.append((d, b.lp.heating_list[d]))
xs_heat = [x[0] for x in heating_datalist] # <<<<<<<<
ys_heat = [x[1] for x in heating_datalist] # <<<<<<<<
pyplot.plot(xs_heat, ys_heat, lw=0.5) # <<<<<<<<
That way you'll plot only one line per building, which is probably what you want.
Besides, you can use zip to generate x values and y values like this:
xs_heat, ys_heat = zip(*heating_datalist)
This works because zip is its own inverse!
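A tiny illustration of that round trip, using three made-up (d, value) pairs in place of the real heating_datalist:

```python
# Made-up sample data standing in for the real heating_datalist
heating_datalist = [(0, 0.25), (1, 0.5), (2, 0.75)]

# Unzip: the * operator passes each pair as a separate argument to zip
xs_heat, ys_heat = zip(*heating_datalist)
print(xs_heat)  # (0, 1, 2)
print(ys_heat)  # (0.25, 0.5, 0.75)

# Zip again: applying zip to the two sequences rebuilds the original pairs
pairs_again = list(zip(xs_heat, ys_heat))
print(pairs_again == heating_datalist)  # True
```

Note that zip yields tuples, not lists; matplotlib accepts either.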

Related

How do I reduce a set of columns along another set of columns, holding all other columns?

I think this is a simple operation, but for some reason I'm not finding immediate indicators in my quick perusal of the Pandas docs.
I have working prototype code below, but it seems kind of dumb to me. I'm sure there are much better ways to do this, and better concepts to describe it.
Is there a better way? If not, is there at least a better way to describe it?
Abstract Problem
Basically, I have columns p0, p1, y0, y1, .... The "..." columns are just things I'd like held constant (they remain as separate columns in the table). p0, p1 are the columns I'd like to reduce along. y0, y1 are the columns I'd like to be reduced.
DataFrame.groupby didn't seem like what I wanted. When perusing the code, I wasn't sure if anything else was what I wanted. Multi-indexing also seemed like a possible context, but I didn't immediately see an example of what I desired.
Here's the code that does what I want:
from collections import defaultdict
import pandas as pd

def merge_into(*, from_, to_):
    for k, v in from_.items():
        to_[k] = v

def reduce_along(df, along_cols, reduce_cols, df_reduce=pd.DataFrame.mean):
    # a list (in column order) rather than a set, so it can index a row
    hold_cols = [c for c in df.columns if c not in along_cols and c not in reduce_cols]
    # dumb way to remember Dict[HeldValues, ValuesToReduce]
    to_reduce_map = defaultdict(list)
    for i in range(len(df)):
        row = df.iloc[i]
        # can I instead use a series? is that hashable?
        key = tuple(row[hold_cols])
        to_reduce = row[reduce_cols]
        to_reduce_map[key].append(to_reduce)
    rows = []
    for key, to_reduce_list in to_reduce_map.items():
        # ... yuck?
        row = pd.Series({k: v for k, v in zip(hold_cols, key)})
        reduced = df_reduce(pd.DataFrame(to_reduce_list))
        merge_into(from_=reduced, to_=row)
        rows.append(row)
    return pd.DataFrame(rows)

reducto = reduce_along(summary, ["p0", "p1"], ["y0", "y1"])
display(reducto)
Background
I am running some sweeps for ML stuff; in it, I sweep on some model architecture param, as well as dataset size and the seed that controls random initialization of the model parameters.
I'd like to reduce along the seed to get a "feel" for what architectures are possibly more robust to initialization; for now, I'd like to see what dataset size helps the most. In the future, I'd like to do (heuristic) reduction along dataset size as well.

Actually, looks like DataFrame.groupby(hold_cols).agg({k: ["mean"] for k in reduce_cols}) is what I want. Source: https://jamesrledoux.com/code/group-by-aggregate-pandas
import functools
import numpy as np
# See: https://stackoverflow.com/a/47699378/7829525
std = functools.partial(np.std)
def reduce_along(df, along_cols, reduce_cols, agg=[np.mean, std]):
    hold_cols = list(set(df.columns) - set(along_cols) - set(reduce_cols))
    hold_cols = [x for x in df.columns if x in hold_cols] # Preserve order
    # From: https://jamesrledoux.com/code/group-by-aggregate-pandas
    df = df.groupby(hold_cols).agg({k: agg for k in reduce_cols})
    df = df.reset_index()
    return df
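For a quick check of the groupby approach, here is a minimal sketch on a made-up table (the column names arch and seed, and all values, are assumptions for illustration). It uses pandas named aggregation instead of the dict form above, which gives flat output column names:

```python
import pandas as pd

# Made-up sweep results: "arch" is held, "seed" is reduced along, "y0" is reduced
summary = pd.DataFrame({
    "arch": ["a", "a", "b", "b"],
    "seed": [0, 1, 0, 1],
    "y0": [1.0, 3.0, 10.0, 20.0],
})

# Group by the held column and aggregate y0 over the seeds
reduced = (
    summary.groupby(["arch"])
    .agg(y0_mean=("y0", "mean"), y0_std=("y0", "std"))
    .reset_index()
)
print(reduced)
```

Each row of reduced corresponds to one value of arch, with the mean and sample standard deviation of y0 taken over the seeds.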

Weird "demonic" xtick in matplotlib (jpeg artifacts? No way...)

So I'm comparing NBA betting lines between different sportsbooks over time
Procedure:
Open pickle file of scraped data
Plot the scraped data
The pickle file is a dictionary of NBA betting lines over time. Each of the two teams has its own nested dictionary. Each key in these team-specific dictionaries represents a different sportsbook. The values for these sportsbook keys are lists of tuples, representing time-series data. It looks roughly like this:
dicto = {
    'Time': <time that the game starts>,
    'Team1': {
        Market1: [ (time1, value1), (time2, value2), etc...],
        Market2: [ (time1, value1), (time2, value2), etc...],
        etc...
    }
    'Team2': {
        <SAME FORM AS TEAM1>
    }
}
There are no issues with scraping or manipulating this data. The issue comes when I plot it. Here is the code for the script that unpickles and plots these dictionaries:
import matplotlib.pyplot as plt
import pickle, datetime, os, time, re
IMAGEPATH = 'Images'
reg = re.compile(r'[A-Z]+#[A-Z]+[0-9|-]+')
noDate = re.compile(r'[A-Z]+#[A-Z]+')
# Turn 1 into '01'
def zeroPad(num):
    if num < 10:
        return '0' + str(num)
    else:
        return num
# Turn list of time-series tuples into an x list and y list
def unzip(lst):
    x = []
    y = []
    for i in lst:
        x.append(f'{i[0].hour}:{zeroPad(i[0].minute)}')
        y.append(i[1])
    return x, y
# Make exactly 5, evenly spaced xticks
def prune(xticks):
    last = len(xticks)
    first = 0
    mid = int(len(xticks) / 2) - 1
    upMid = int( mid + (last - mid) / 2)
    downMid = int( (mid - first) / 2)
    out = []
    count = 0
    for i in xticks:
        if count in [last, first, mid, upMid, downMid]:
            out.append(i)
        else:
            out.append('')
        count += 1
    return out
def plot(filename, choice):
    IMAGEPATH = 'Images'
    IMAGEPATH = os.path.join(IMAGEPATH, choice)
    with open(filename, 'rb') as pik:
        dicto = pickle.load(pik)
    fig, axs = plt.subplots(2)
    gameID = noDate.search(filename).group(0)
    tm = dicto['Time']
    fig.suptitle(gameID + '\n' + str(tm))
    i = 0
    for team in dicto.keys():
        axs[i].set_title(team)
        if team == 'Time':
            continue
        for market in dicto[team].keys():
            lst = dicto[team][market]
            x, y = unzip(lst)
            axs[i].plot(x, y, label= market)
        axs[i].set_xticks(prune(x))
        axs[i].set_xticklabels(rotation=45, labels = x)
        i += 1
    plt.tight_layout()
    #Finish
    outputFile = reg.search(filename).group(0)
    date = (datetime.datetime.today() - datetime.timedelta(hours = 6)).date()
    fig.savefig(os.path.join(IMAGEPATH, str(date), f'{outputFile}.png'))
    plt.close()
Here is the image that results from calling the plot function on one of the dictionaries that I described above. It is pretty much exactly as I intended it, except for one very strange and bothersome problem.
You will notice that the bottom right tick looks haunted, demonic, jpeggy, whatever you want to call it. I am highly suspicious that this problem occurs in the prune function, which I use to set the xtick values of the plot.
The reason that I prune the values with a function like this is because these dictionaries are continuously updated, so setting a static number of xticks would not work. And if I don't prune the xticks, they end up becoming unreadable due to overlapping one another.
I am quite confused as to what could cause an xtick to look like this. It happens consistently, for every dictionary, every time. Before I added the prune function (when the xticks were unbounded and overlapped one another), this issue did not occur. So when I say I'm suspicious that the prune function is the cause, I am really quite certain.
I will be happy to share an instance of one of these dictionaries, but they are saved as .pickle files, and I'm pretty sure it's bad practice to share pickle files over the internet. I have been warned about potential malware, so I'll just stay away from that. But if you need to see the dictionary, I can take the time to prettily print one and share a screenshot. Any help is greatly appreciated!

Matplotlib does this when many xticks or yticks are drawn at the same value: their labels are rendered on top of one another, which produces the smeared look. This is normal behaviour. If you limit the number of times that specific value gets a tick, it will look indistinguishable from the rest of the xticks.
Plot a simple example to test this out and you will see for yourself.
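One plausible reading of the question's code is that prune replaces most ticks with '' and, on a string (categorical) axis, all of those empty strings land on a single position, where the full label list then gets stacked. A minimal sketch of one way around that, assuming made-up time labels and values: plot against integer positions and place at most five ticks at evenly spaced indices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up time labels (every 5 minutes from 18:00 to 19:55) and betting-line values
labels = [f"{h}:{m:02d}" for h in range(18, 20) for m in range(0, 60, 5)]
values = np.linspace(-110, -130, len(labels))

fig, ax = plt.subplots()
positions = np.arange(len(labels))  # numeric x positions instead of strings
ax.plot(positions, values)

# Choose at most 5 evenly spaced indices, each used only once
tick_idx = np.unique(
    np.linspace(0, len(labels) - 1, num=min(5, len(labels)), dtype=int)
)
ax.set_xticks(positions[tick_idx])
ax.set_xticklabels([labels[i] for i in tick_idx], rotation=45)
```

Because every tick position appears once and gets exactly one label, no labels are drawn on top of each other. The f-string format `{m:02d}` also does the zero padding that zeroPad handles by hand.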

How do you make a list of numpy.float64?

I am using Python. I computed these numpy.float64 values, which give the Chicago Cubs' average wins by decade.
yr1874to1880 = np.mean(wonArray[137:143])
yr1881to1890 = np.mean(wonArray[127:136])
yr1891to1900 = np.mean(wonArray[117:126])
yr1901to1910 = np.mean(wonArray[107:116])
yr1911to1920 = np.mean(wonArray[97:106])
yr1921to1930 = np.mean(wonArray[87:96])
yr1931to1940 = np.mean(wonArray[77:86])
yr1941to1950 = np.mean(wonArray[67:76])
yr1951to1960 = np.mean(wonArray[57:66])
yr1961to1970 = np.mean(wonArray[47:56])
yr1971to1980 = np.mean(wonArray[37:46])
yr1981to1990 = np.mean(wonArray[27:36])
yr1991to2000 = np.mean(wonArray[17:26])
yr2001to2010 = np.mean(wonArray[7:16])
yr2011to2016 = np.mean(wonArray[0:6])
I want to put them together, but I don't know how. I tried making a list, but it did not work. Does anyone know how to combine them so that I can plot them? I want to make a scatter plot with matplotlib. Thank you.

So with what you've shown, each variable you're setting becomes a float value. You can make them into a list by declaring:
list_of_values = [yr1874to1880, yr1881to1890, ...]
Adding all of the declared values to this results in a list of floats. For example, with just the two values above added:
>>> print(list_of_values)
[139.5, 131.0]
So that should explain how to obtain a list with the data from np.mean(). However, I'm guessing another question being asked is "how do I scatter plot this?" Using what is provided here, we have one axis of data, but to plot we need another (can't have a graph without x and y). Decide what the average wins is going to be compared against, and then that can be iterated over. For example, I'll use a simple integer in "decade" to act as the x axis:
import matplotlib.pyplot as plt
decade = 1
for i in list_of_values:
    y = i
    x = decade
    decade += 1
    plt.scatter(x, y)
plt.show()
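As a more compact sketch of the same idea, the list can be built directly from slice bounds and passed to a single scatter call. The array contents and the three slice bounds below are made up for illustration; they are not the real win data or the question's decade slices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for wonArray
won_array = np.arange(20, dtype=float)

# Hypothetical (start, stop) slice bounds, one pair per period
bounds = [(0, 5), (5, 15), (15, 20)]

# Each np.mean returns a numpy.float64; the comprehension collects them in a list
decade_means = [np.mean(won_array[start:stop]) for start, stop in bounds]

# Simple integer x axis: 1, 2, 3, ... one per period
decades = list(range(1, len(decade_means) + 1))

fig, ax = plt.subplots()
ax.scatter(decades, decade_means)  # one call plots all points
ax.set_xlabel("period")
ax.set_ylabel("mean wins")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to view the result.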

Python: Plot step function for true/false signals

I have a Python dictionary containing for each variable a tuple with an array of points in time and an array of numbers (1/0) representing the Boolean values that the variable holds at a certain point in time. For example:
dictionary["a"] = ([0,1,3], [1,1,0])
means that the variable "a" is true at times 0 and 1, holds an arbitrary value at time 2, and is false at time 3.
I would like to generate a plot using matplotlib.pyplot that looks something like this:
I already tried something like:
import matplotlib.pyplot as plt
plt.figure(1)
graphcount = 1
for x in dictionary:
    plt.subplot(len(dictionary), 1, graphcount)
    plt.step(dictionary[x][0], dictionary[x][1])
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()
but it does not give me the right results. For example, if dictionary["a"] = ([2], [1]) no line is shown at all. Can someone please point me in the right direction on how to do this? Thank you!

According to your description the line should start at the first point and end at the last point. If the first and last points are the same then your line will be made of only one point. In order to see a line with only one point you need to use a visible marker.
Regarding the location of the jumps, the docstring says:
where: [ ‘pre’ | ‘post’ | ‘mid’ ]
If ‘pre’ (the default), the interval from x[i] to x[i+1] has level y[i+1].
If ‘post’, that interval has level y[i].
If ‘mid’, the jumps in y occur half-way between the x-values.
So I guess you want 'mid'.
import matplotlib.pyplot as plt
dictionary = {}
dictionary['a'] = ([0,1,3], [1,1,0])
dictionary['b'] = ([2], [1])
plt.figure(1)
graphcount = 1
for x in dictionary:
    plt.subplot(len(dictionary), 1, graphcount)
    # 'o-' adds a marker so a single-point series is still visible
    plt.step(dictionary[x][0], dictionary[x][1], 'o-', where='mid')
    plt.xlabel("time")
    plt.ylabel(x)
    graphcount += 1
plt.show()

Python Matplotlib How to plot a line chart in weekly intervals

I am working on a project where I need to plot data in a line chart. The problem is that I don't have the X values, only the Y values.
Here is the list of values that I want to plot:
testlist =['278264', '322823', '287298', '295212', '299174', '277271', '352717', '583802', '1167864', '1622965', '1759879', '1779014', '174791']
The result that I am looking for is something like this screenshot,
but with my code I am getting this result.
I tried the following code, but I get an error: ValueError: x and y must have same first dimension
testlist =['278264', '322823', '287298', '295212', '299174', '277271', '352717', '583802', '1167864', '1622965', '1759879', '1779014', '174791']
last = len(testlist)
for i in range (0,last):
    intValue= int(testlist[i])
    testlist[i]=intValue
x = [1,7,13,19,25,31]
y = testlist
plt.plot(x,y)
Any idea how I can solve this issue?
Thank you.

You got the error because you have 13 y values but only 6 x values.
Points are (x, y) pairs, right? So you need len(x) == len(y).
Where did you get the screenshot you want to reproduce? Can you retrieve its data? As you said, you don't have the x axis, but you have to get it from somewhere, otherwise the x values will be arbitrary. That makes sense, doesn't it?
Regards,
Paul
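If the real x values cannot be recovered, one option is to generate one x value per y value. The sketch below assumes the samples are exactly 7 days apart starting at day 1; that spacing is an assumption for illustration, not something stated in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt

testlist = ['278264', '322823', '287298', '295212', '299174', '277271', '352717',
            '583802', '1167864', '1622965', '1759879', '1779014', '174791']

y = [int(v) for v in testlist]  # convert the strings to integers

# Assumed weekly spacing: day 1, 8, 15, ... with exactly one x per y
x = [1 + 7 * i for i in range(len(y))]

fig, ax = plt.subplots()
ax.plot(x, y)      # works because len(x) == len(y)
ax.set_xticks(x)   # one tick per weekly sample
ax.set_xlabel("day")
```

Since x is derived from len(y), the two lists always have the same length and the ValueError cannot occur.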
