How to add custom line to BoxWhisker holoviews plot? - python

I need to add a horizontal line to a box plot. I looked through the HoloViews manual, and it seems that HLine is meant for this case. Unfortunately, I get an error:
ValueError: all the input arrays must have same number of dimensions
Example:
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
groups = [chr(65+g) for g in np.random.randint(0, 3, 200)]
boxwhisker = hv.BoxWhisker(
    (groups, np.random.randint(0, 5, 200), np.random.randn(200)),
    ['Group', 'Category'],
    'Value'
).sort() * hv.HLine(1)
boxwhisker.opts(
    opts.BoxWhisker(
        box_color='white',
        height=400,
        show_legend=False,
        whisker_color='gray',
        width=600
    ),
    opts.HLine(color='green', line_width=2)
)
layout = hv.Layout(boxwhisker)
hv.save(layout, 'boxplot.html')
Traceback:
File "/home/python3.6/site-packages/holoviews/plotting/renderer.py", line 545, in save
plot = self_or_cls.get_plot(obj)
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/renderer.py", line 135, in get_plot
plot = super(BokehRenderer, self_or_cls).get_plot(obj, renderer, **kwargs)
File "/home/python3.6/site-packages/holoviews/plotting/renderer.py", line 207, in get_plot
plot.update(init_key)
File "/home/python3.6/site-packages/holoviews/plotting/plot.py", line 595, in update
return self.initialize_plot()
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/plot.py", line 995, in initialize_plot
subplots = subplot.initialize_plot(ranges=ranges, plots=shared_plots)
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/plot.py", line 1115, in initialize_plot
adjoined_plots.append(subplot.initialize_plot(ranges=ranges, plots=passed_plots))
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/element.py", line 2058, in initialize_plot
self._update_ranges(element, ranges)
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/element.py", line 747, in _update_ranges
xfactors, yfactors = self._get_factors(element, ranges)
File "/home/python3.6/site-packages/holoviews/plotting/bokeh/element.py", line 2031, in _get_factors
xfactors = np.concatenate(xfactors)
ValueError: all the input arrays must have same number of dimensions

Yes, HLine is the right way to do this, but unfortunately the support for categorical axes in HoloViews is currently limited and will not allow such an overlay. There's been some work on an unfinished alternative implementation for categorical axes that would fix this. In the meantime, I'd assume you could add a custom hook, but that would be awkward to figure out.
Note that hv.Layout(boxwhisker) won't work as you have it above, in any case; it would need to be hv.Layout([boxwhisker]) or just boxwhisker (as Layout takes a list, but here you don't even need a layout since you have only one item).

Related

Plot for every criterion

I am trying to plot every serial number in Power Bi using Python. I want to do it with a for loop.
I tried this:
x = dataset['mrwSmpVWi']
c = dataset['c']
a = dataset['a']
b = dataset['b']
y = (c / (1 + (a) * np.exp(-b*(x))))
for number in dataset['Seriennummer']:
    plt.plot(x,y, linewidth = 4)
    plt.title("TEST")
    plt.xlabel('Wind in m/s')
    plt.ylabel('Leistung in kWh')
    plt.xlim(0,25)
    plt.ylim(0,1900)
    plt.show()
Do I need to define number or can I just say plot the graph for every serial number?
This is my Error:
Error Message:
Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 33, in <module>
for number in dataset['Seriennummer']:
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Python27\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Seriennummer'
Your error tells you that your dataset has no column called "Seriennummer". Make sure such a column actually exists in your dataset. See "How to debug small programs".
Also, you seem to plot the same thing in every iteration. I'm not sure whether this is what you want or whether you simplified it for your minimal reproducible example, but it is something to keep in mind. You usually want different things in each plot, so you'd be calculating x and y inside the loop.
for index, row in dataset.iterrows():
    number = row['Seriennummer']
    x = row['mrwSmpVWi']
    c = row['c']
    a = row['a']
    b = row['b']
    y = (c / (1 + (a) * np.exp(-b*(x))))
    plt.plot(x, y, linewidth = 4)
    # plt.whatever else...
You can open a new figure using plt.figure().
You can create a new subplot using plt.subplot()
You can save an existing figure using plt.savefig()
You can clear the current figure using plt.clf()
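pyplot keeps drawing onto the current axes by default (the old hold = True behaviour; the hold setting itself was removed in Matplotlib 3.0), so plotting repeatedly without plt.show() will add more lines to the same plot.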
for number in dataset['Seriennummer']:
    # plt.whatever...
    # REMEMBER, NO PLT.SHOW()
# show the plot AFTER you've plotted everything
plt.show()
So, to overwrite the same figure, but save it to png before creating the new one, you'd do this:
for number in dataset['Seriennummer']:
    # plt.whatever...
    plt.savefig(f"./plot{number}.png") # Saves as a png
    plt.clf() # Clears figure
To create a new figure every time (so you don't need to overwrite the previous one), you'd do this:
for number in dataset['Seriennummer']:
    plt.figure()
    # plt.whatever...
    plt.savefig(f"./plot{number}.png") # Saves as a png
To create a grid of subplots, you'd do this:
import math
nrows = 3
ncols = math.ceil(len(dataset['Seriennummer']) / nrows)
for plotnum, number in enumerate(dataset['Seriennummer']):
    plt.subplot(nrows, ncols, plotnum + 1) # subplot indices start at 1
    # plt.whatever...
# after loop is done, save
plt.savefig("./plot_all.png") # Saves as a png
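Putting the pieces together, a self-contained sketch of the subplot-grid variant might look like this. The dataset below is a made-up stand-in for the Power BI dataset, and the wind grid replaces the mrwSmpVWi column purely for illustration; both are assumptions, not the asker's data.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Power BI `dataset` DataFrame.
dataset = pd.DataFrame({
    "Seriennummer": ["A1", "A2", "A3", "A4"],
    "c": [1800.0, 1700.0, 1600.0, 1500.0],
    "a": [50.0, 60.0, 70.0, 80.0],
    "b": [0.50, 0.45, 0.40, 0.35],
})

wind = np.linspace(0, 25, 200)  # assumed wind speeds in m/s

nrows = 2
ncols = math.ceil(len(dataset) / nrows)
fig, axes = plt.subplots(nrows, ncols,
                         figsize=(4 * ncols, 3 * nrows), squeeze=False)

# One subplot per serial number; the curve is computed inside the loop.
for ax, (_, row) in zip(axes.flat, dataset.iterrows()):
    power = row["c"] / (1 + row["a"] * np.exp(-row["b"] * wind))
    ax.plot(wind, power, linewidth=2)
    ax.set_title(f"Seriennummer {row['Seriennummer']}")
    ax.set_xlabel("Wind in m/s")
    ax.set_ylabel("Leistung in kWh")
    ax.set_xlim(0, 25)
    ax.set_ylim(0, 1900)

fig.tight_layout()
fig.savefig("plot_all.png")
```

Using plt.subplots with explicit Axes objects avoids the 1-based index bookkeeping that plt.subplot needs.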

Scatterplot of pandas DataFrame ends in KeyError: 0

After updating pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when trying to do something like the following:
import pandas as pd
import matplotlib.pyplot as plt
clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({'x':[20,30,30,40],
                        'y':[25,20,30,25],
                        's':[100,200,300,400],
                        'l':[1,2,3,4]})
df_full['c'] = df_full['l'].replace(clrdict)
df_part = df_full[(df_full.x == 30)]
fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()
fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()
The scatterplot of the original DataFrame (df_full) is shown without problems. But the plot of the partial DataFrame raises the following error:
Traceback (most recent call last):
File "G:\data\project\test.py", line 27, in <module>
c=df_part['c'])
File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
is not None else {}), **kwargs)
File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
return func(ax, *args, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
isinstance(c[0], str))):
File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
result = self.index.get_value(self, key)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
This is due to the color option c=df_part['c']. When you leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which one causes it).
In my project, the df_part = df_full[(df_full.x == i)] line is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to subset the DataFrame.
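This is a bug which got fixed by https://github.com/matplotlib/matplotlib/pull/12673. As the traceback shows, matplotlib evaluates c[0], which pandas treats as a label lookup; the filtered DataFrame keeps its original index labels (1 and 2), so the label 0 does not exist.
The fix should be available in the next bugfix release, 3.0.2, which should be out within the next few days.
In the meantime, you can pass the NumPy array underlying the pandas Series, series.values (for example c=df_part['c'].values).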
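Applied to the example from the question, the series.values workaround could be sketched as follows. The Agg backend and the output file name are assumptions added so the snippet runs without a display, and .map is used in place of .replace to build the color column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({"x": [20, 30, 30, 40],
                        "y": [25, 20, 30, 25],
                        "s": [100, 200, 300, 400],
                        "l": [1, 2, 3, 4]})
df_full["c"] = df_full["l"].map(clrdict)

# The filtered frame keeps index labels 1 and 2, so there is no label 0.
df_part = df_full[df_full.x == 30]

fig, ax = plt.subplots()
# Passing plain NumPy arrays sidesteps pandas' label-based indexing.
points = ax.scatter(x=df_part["x"].values,
                    y=df_part["y"].values,
                    s=df_part["s"].values,
                    c=df_part["c"].values)
fig.savefig("scatter_part.png")
```

Calling df_part.reset_index(drop=True) before plotting would be another way to make position 0 a valid label again.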

Python Matplotlib Streamplot providing start points

I am trying to add start points to a streamline plot. I found an example code using start points here; at this link a different issue is discussed but the start_points argument works. From here I grabbed the streamline example code (images_contours_and_fields example code: streamplot_demo_features.py). I don't understand why I can define start points in one code and not the other. I get the following error when I try to define start points in the example code (streamplot_demo_features.py):
Traceback (most recent call last):
File "<ipython-input-79-981cad64cff6>", line 1, in <module>
runfile('C:/Users/Admin/.spyder/StreamlineExample.py', wdir='C:/Users/Admin/.spyder')
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/Admin/.spyder/StreamlineExample.py", line 28, in <module>
ax1.streamplot(X, Y, U, V,start_points=start_points)
File "C:\ProgramData\Anaconda2\lib\site-packages\matplotlib\__init__.py", line 1891, in inner
return func(ax, *args, **kwargs)
File "C:\ProgramData\Anaconda2\lib\site-packages\matplotlib\axes\_axes.py", line 4620, in streamplot
zorder=zorder)
File "C:\ProgramData\Anaconda2\lib\site-packages\matplotlib\streamplot.py", line 144, in streamplot
sp2[:, 0] += np.abs(x[0])
ValueError: non-broadcastable output operand with shape (1,) doesn't match the broadcast shape (100,)
I've noticed there isn't much on the web about using start_points, so any additional information would be helpful.
The main difference between the example that successfully uses start_points and the example from the matplotlib page is that the first uses 1D arrays as x and y grid, whereas the official example uses 2D arrays.
Since the documentation explicitly states
x, y : 1d arrays, an evenly spaced grid.
we might stick to 1D arrays. It's unclear why the example contradicts the docstring, but we can simply ignore that.
Now, using 1D arrays as the grid, start_points works as expected in that it takes a 2-column array (first column x-coords, second column y-coords).
A complete example:
import numpy as np
import matplotlib.pyplot as plt
x,y = np.linspace(-3,3,100),np.linspace(-3,3,100)
X,Y = np.meshgrid(x,y)
U = -1 - X**2 + Y
V = 1 + X - Y**2
speed = np.sqrt(U*U + V*V)
start = [[0,0], [1,2]]
fig0, ax0 = plt.subplots()
strm = ax0.streamplot(x,y, U, V, color=(.75,.90,.93))
strmS = ax0.streamplot(x,y, U, V, start_points=start, color="crimson", linewidth=2)
plt.show()
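If the grid already exists as 2D arrays, as in the official demo, one way to recover the 1D axes is to slice one row and one column out of the mesh. This is a sketch, assuming the mesh was built with np.mgrid as shown here; the start points and the output file name are arbitrary illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# 2D grid in the style of the official streamplot demo.
Y, X = np.mgrid[-3:3:100j, -3:3:100j]
U = -1 - X**2 + Y
V = 1 + X - Y**2

# Recover the 1D axes: X varies along columns, Y varies along rows.
x = X[0, :]
y = Y[:, 0]

# One (x, y) pair per row.
start = np.array([[0, 0], [1, 2]])

fig, ax = plt.subplots()
ax.streamplot(x, y, U, V, color=(.75, .90, .93))
strm = ax.streamplot(x, y, U, V, start_points=start,
                     color="crimson", linewidth=2)
fig.savefig("streamplot_start_points.png")
```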

Plotting data from csv using matplotlib.pyplot

I am following a tutorial on YouTube in which they plot some standard text files using matplotlib.pyplot. I can do that easily enough, but I am now trying to do the same thing with some CSV files of real data.
The code I am using is:
import matplotlib.pyplot as plt
import csv
#import numpy as np
with open(r"Example RFI regression axis\Delta RFI.csv") as x, open(r"Example RFI regression axis\strikerate.csv") as y:
    readx = csv.reader(x)
    ready = csv.reader(y)
    plt.plot(readx,ready)
plt.title ('Test graph')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
The traceback I receive is long
Traceback (most recent call last):
File "C:\V4 code snippets\matplotlib_test.py", line 11, in <module>
plt.plot(readx,ready)
File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2832, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 3997, in plot
self.add_line(line)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1507, in add_line
self._update_line_limits(line)
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1516, in _update_line_limits
path = line.get_path()
File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 677, in get_path
self.recache()
File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 401, in recache
x = np.asarray(xconv, np.float_)
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number
Please advise what I need to do, I realise this is probably very easy to most seasoned coders. Kind regards SMNALLY
csv.reader() returns strings (technically, the .next() method of the reader object returns lists of strings). Without converting them to float or int, you won't be able to plt.plot() them.
To save the trouble of converting, I suggest using genfromtxt() from numpy. (http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html)
For example, there are two files:
data1.csv:
data1
2
3
4
3
6
6
4
and data2.csv:
data2
92
73
64
53
16
26
74
Both of them have one line of header. We can do:
import numpy as np
import matplotlib.pyplot as plt
data1=np.genfromtxt('data1.csv', skip_header=1) #suppose it is in the current working directory
data2=np.genfromtxt('data2.csv', skip_header=1)
plt.plot(data1, data2,'o-')
plt.show()
The result is a line plot of data2 against data1 with circle markers.
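If you would rather stay with the csv module, the conversion to float has to be done by hand. The following is a minimal sketch, assuming one-column files with a single header row like the samples above; the helper name read_column and the temporary directory are made up for the illustration.

```python
import csv
import os
import tempfile


def read_column(path):
    """Read a one-column CSV with one header row into a list of floats."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        # Each row is a list of strings; convert the first field to float.
        return [float(row[0]) for row in reader if row]


# Recreate the two sample files in a temporary directory.
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "data1.csv")
path2 = os.path.join(tmpdir, "data2.csv")
with open(path1, "w", newline="") as f:
    f.write("data1\n2\n3\n4\n3\n6\n6\n4\n")
with open(path2, "w", newline="") as f:
    f.write("data2\n92\n73\n64\n53\n16\n26\n74\n")

data1 = read_column(path1)
data2 = read_column(path2)
```

The two lists can then be passed to plt.plot(data1, data2, 'o-') exactly like the genfromtxt arrays.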

date2num , ValueError: ordinal must be >= 1

I'm using the matplotlib candlestick module, which requires the time to be passed in float days format. I'm using date2num to convert it beforehand.
This is my code :
import csv
import sys
import math
import numpy as np
import datetime
from optparse import OptionParser
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
import matplotlib.mlab as mlab
import matplotlib.dates as mdates
from matplotlib.finance import candlestick
from matplotlib.dates import date2num
datafile = 'historical_data/AUD_Q10_1D_500.csv'
print 'loading', datafile
r = mlab.csv2rec(datafile, delimiter=';')
quotes = [date2num(r['date']),r['open'],r['close'],r['max'],r['min']]
candlestick(ax, quotes, width=0.6)
plt.show()
( here is the csv file : http://db.tt/MIOqFA0 )
This is what the doc says :
candlestick(ax, quotes, width=0.20000000000000001, colorup='k', colordown='r', alpha=1.0)
quotes is a list of (time, open, close, high, low, ...) tuples. As long as the first 5 elements of the tuples are these values, the tuple can be as long as you want (eg it may store volume).
time must be in float days format - see date2num
Here is the full error log :
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/matplotlib/backends/backend_qt4agg.py", line 83, in paintEvent
FigureCanvasAgg.draw(self)
File "/usr/lib/python2.6/site-packages/matplotlib/backends/backend_agg.py", line 394, in draw
self.figure.draw(self.renderer)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/figure.py", line 798, in draw
func(*args)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/axes.py", line 1946, in draw
a.draw(renderer)
File "/usr/lib/python2.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/matplotlib/axis.py", line 971, in draw
tick_tups = [ t for t in self.iter_ticks()]
File "/usr/lib/python2.6/site-packages/matplotlib/axis.py", line 904, in iter_ticks
majorLocs = self.major.locator()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 743, in __call__
self.refresh()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 752, in refresh
dmin, dmax = self.viewlim_to_dt()
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 524, in viewlim_to_dt
return num2date(vmin, self.tz), num2date(vmax, self.tz)
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 289, in num2date
if not cbook.iterable(x): return _from_ordinalf(x, tz)
File "/usr/lib/python2.6/site-packages/matplotlib/dates.py", line 203, in _from_ordinalf
dt = datetime.datetime.fromordinal(ix)
ValueError: ordinal must be >= 1
If I run a quick :
for x in r['date']:
    print str(x) + "is :" + str(date2num(x))
it outputs something like :
2010-06-12is :733935.0
2010-07-12is :733965.0
2010-08-12is :733996.0
which sounds OK to me :)
Read the docstring a bit more carefully :)
quotes is a list of (time, open, close, high, low, ...) tuples.
What's happening is that it expects each item of quotes to be a sequence of (time, open, close, high, low).
You're passing in 5 long arrays; it expects one long sequence of 5-item tuples.
You just need to zip your input.
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
from matplotlib.finance import candlestick
from matplotlib.dates import date2num
datafile = 'Downloads/AUD_Q10_1D_500.csv'
r = mlab.csv2rec(datafile, delimiter=';')
quotes = zip(date2num(r['date']),r['open'],r['close'],r['max'],r['min'])
fig, ax = plt.subplots()
candlestick(ax, quotes, width=0.6)
plt.show()
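To see what zip does to the shape of the data, here is a small sketch with made-up prices (the values are purely illustrative, not taken from the CSV file). It is written for Python 3, where zip returns an iterator and needs to be wrapped in list, and it leaves out the candlestick call itself.

```python
import datetime

import numpy as np
from matplotlib.dates import date2num

# Made-up sample columns standing in for the fields of the record array.
dates = [datetime.datetime(2010, 6, 12),
         datetime.datetime(2010, 7, 12),
         datetime.datetime(2010, 8, 12)]
opens = np.array([0.85, 0.87, 0.91])
closes = np.array([0.86, 0.90, 0.89])
highs = np.array([0.88, 0.92, 0.93])
lows = np.array([0.84, 0.86, 0.88])

# Column layout: 5 arrays, each with one entry per day.
columns = [date2num(dates), opens, closes, highs, lows]

# Row layout: one (time, open, close, high, low) tuple per day.
quotes = list(zip(*columns))
```

Here columns has length 5 and quotes has length 3, which is the row-per-day layout that the candlestick docstring describes.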
Seems like you're passing it a float. And in the error message you provide (full message next time please!) it appears that matplotlib is simply delegating the conversion to datetime.datetime.fromordinal.
I don't have a Python 3 installation to test this with, but when I tried to convert a float to a datetime object using datetime.datetime.fromordinal in 2.6, I got a deprecation warning. Then I tried it on ideone and got this:
Traceback (most recent call last):
File "prog.py", line 2, in <module>
print(datetime.datetime.fromordinal(5.5))
TypeError: integer argument expected, got float
So perhaps it's choking on the float.
I think your problem is here:
r = mlab.csv2rec(datafile, delimiter=';')
You need to skip the first line of the csv, which means you need:
r = mlab.csv2rec(datafile, delimiter=';', skiprows=1)
Technically this is incorrect: Ubuntu has an older version of the library, and the OP's version already has the two lines below, but this was my original answer:
I would make sure you're using the most recent version of matplotlib.
So that I could reproduce this issue, I downloaded and installed the latest version and I noticed that the line number of the offending piece of code had been changed to 179. I also noticed that the value is cast to int immediately before fromordinal is called (this gives a lot of credence to senderle's answer).
(line 178-179 of most recent matplotlib in Ubuntu repository)
ix = int(x)
dt = datetime.datetime.fromordinal(ix)
If upgrading is not an option, then you should cast to an int first.
