How to plot a boxplot using aggregates in plotly? - python

I need to plot a series of boxplots based on the results of a numerical air quality model. Since this is a significant amount of data, I trigger the calculation of aggregates (min, max, quartiles, etc.) every time new model results become ready and store them in PostgreSQL. For visualization purposes, I load the aggregates into pandas and plot them using Dash. I am able to plot line plots of time series; however, I would like to get something like this example, but interactive.
As I went through the plotly examples, it looks like they always require the raw data for plotting boxplots ( https://plot.ly/python/box-plots/#basic-box-plot ). I really like the separation of presentation and logic. Is it possible to get a plotly box plot based on aggregated data?

You can pass your aggregate values to a Plotly box plot in Python in the following format:
plotly.graph_objs.Box(y=[val_min,
val_lower_box,
val_lower_box,
val_median,
val_upper_box,
val_upper_box,
val_max])
e.g.
import plotly
plotly.offline.init_notebook_mode()
val_min = 1
val_lower_box = 2
val_median = 3
val_upper_box = 4.5
val_max = 6
box_plot = plotly.graph_objs.Box(y=[val_min,
val_lower_box,
val_lower_box,
val_median,
val_upper_box,
val_upper_box,
val_max])
plotly.offline.iplot([box_plot])
This renders an interactive box plot built from the five aggregate values.
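Why the quartile values are listed twice: with seven points laid out this way, an interpolating quartile estimate falls between two identical values, so it returns exactly the aggregate that was supplied. A minimal sketch of that check using only the standard library. The helper name `seven_point_summary` is hypothetical, and `statistics.quantiles` with `method="inclusive"` is only a stand-in for Plotly's own quartile computation, which may differ in detail:

```python
import statistics


def seven_point_summary(val_min, val_lower_box, val_median, val_upper_box, val_max):
    """Build the 7-value list used in the answer (hypothetical helper)."""
    return [val_min,
            val_lower_box, val_lower_box,
            val_median,
            val_upper_box, val_upper_box,
            val_max]


y = seven_point_summary(1, 2, 3, 4.5, 6)

# Quartiles recomputed from the 7 points (stand-in for what a plotting
# library would do internally; not Plotly's actual code).
q1, median, q3 = statistics.quantiles(y, n=4, method="inclusive")
print(min(y), q1, median, q3, max(y))
```

Recent Plotly releases also document `q1`, `median`, `q3`, `lowerfence` and `upperfence` arguments on `Box` for precomputed statistics, which may remove the need for this workaround where they are available.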

Related

Plot a graph in matplotlib with two different scales on one axis

I'm trying to plot a graph with time data on X-Axis. My data has daily information, but I want to create something that has two different date scales on X-Axis.
I want to start it from 2005 and it goes to 2014, but after 2014, I want that, the data continues by months of 2015. Is this possible to do? If so: how can I create this kind of plot?
Thanks.
I provided an image below:
Yes, you can. Both series use dates on the x-axis, so plotting them on the same axes simply places the second series to the right of the first.
For a dataframe:
import numpy
import pandas as pd
import matplotlib.pyplot as plt
# weekly data starting in 2005
data = numpy.array([45,63,83,91,101])
df1 = pd.DataFrame(data, index=pd.date_range('2005-10-09', periods=5, freq='W'), columns=['events'])
# monthly data starting in 2015 ('M' = month end; newer pandas versions prefer 'ME')
df2 = pd.DataFrame(numpy.arange(10,21,2), index=pd.date_range('2015-01-09', periods=6, freq='M'), columns=['events'])
# both series share one date axis
plt.plot(df1.index, df1.events)
plt.plot(df2.index, df2.events)
plt.show()
You can change the parameters according to your convenience.

How to align bars with tick labels in plt or pandas histogram (when plotting multiple columns)

I have started using python for lots of data problems at work and the datasets are always slightly different. I'm trying to explore more efficient ways of plotting data using the inbuilt pandas function rather than individually writing out the code for each column and editing the formatting to get a nice result.
Background: I'm using Jupyter notebook and looking at histograms where the values are all unique integers.
Problem: I want the xtick labels to align with the centers of the histogram bars when plotting multiple columns of data with the one function e.g. df.hist() to get histograms of all columns at once.
Does anyone know if this is possible?
Or is it recommended to do each graph on its own vs. using the inbuilt function applied to all columns?
I can modify them individually following this post: Matplotlib xticks not lining up with histogram
which gives me what I would like but only for one graph and with some manual processing of the values.
Desired outcome example for one graph:
Basic example of data I have:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create list of datapoints
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
# print dataframe.
df
Code that displays the graphs in the problem statement
df.hist(figsize=(5,5))
plt.show()
Code that displays the graph for weight how I would like it to be for all
df.hist(column='weight',bins=[175,185,195,205,215])
plt.xticks([180,190,200,210])
plt.yticks([0,1,2,3,4,5])
plt.xlim([170, 220])
plt.show()
Any tips or help would be much appreciated!
Thanks
I hope this helps. Take the column and count the frequency of each value (value_counts), then call sort_index so the bars are ordered by value rather than by frequency, and draw the result as a bar plot.
import pandas as pd
import matplotlib.pyplot as plt
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
df.weight.value_counts().sort_index().plot(kind = 'bar')
plt.show()
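If a histogram is preferred over a bar chart, the tick labels line up with the bar centers when each bin is centered on a data value, which is what the hand-written `bins=[175,185,195,205,215]` in the question does for `weight`. A minimal sketch of computing such edges, assuming the values sit on a regular grid with a known spacing. The helper `centered_bin_edges` and its `step` parameter are hypothetical names, not part of pandas or matplotlib:

```python
def centered_bin_edges(values, step):
    """Return bin edges so that every grid value from min to max is a bin center.

    Assumes the values lie on a regular grid with spacing `step`.
    """
    lo, hi = min(values), max(values)
    n_bins = round((hi - lo) / step) + 1
    # First edge is half a step below the smallest value.
    return [lo - step / 2 + i * step for i in range(n_bins + 1)]


weights = [210, 200, 210, 180, 190, 190, 190]  # 'weight' column from the question
edges = centered_bin_edges(weights, step=10)
print(edges)
```

The resulting list could be passed as `bins=` for one column, with the grid values themselves used as tick positions. Each column would need its own `step`, so columns on different scales probably still have to be plotted one at a time or in a loop.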

Plot standard deviation from external datasource using seaborn

I am trying to visualize a lineplot via seaborn, in which I want to plot the average and standard deviation of a column. As I am using large files (with millions of rows) the plot takes a while to load.
To reduce computation time, I pre-computed the average of each column and the corresponding standard deviation. I then use this pre-computed data as input for the lineplot, instead of supplying the complete pandas dataframe.
This is the code I currently use:
df = open_pickle("data/experiment")
sns.lineplot(x="rho", y="wait_time_mean", hue="c", style="service_type", data=df)
This will only show the average. I was wondering if it would be possible to manually supply values for the standard deviation to seaborn.
sns.lineplot returns the Axes object of the plot, which can then be used to draw on it. Assuming your standard deviation is also in df, you can adapt your code as follows, using the matplotlib function fill_between:
df = open_pickle("data/experiment")
ax = sns.lineplot(x="rho", y="wait_time_mean", hue="c", style="service_type", data=df)
ax.fill_between(df["rho"], y1=df["wait_time_mean"] - df["wait_time_std"], y2=df["wait_time_mean"] + df["wait_time_std"], alpha=.5)
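One caveat: because the plot is split by `hue` and `style`, the frame holds several curves, and a single `fill_between` over the whole frame would mix them together. A minimal sketch that draws one band per group with plain matplotlib, assuming the columns `rho`, `c`, `service_type`, `wait_time_mean` and `wait_time_std` from the question. The small frame below is made-up illustrative data, not the asker's results:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up aggregates standing in for the pre-computed frame.
df = pd.DataFrame({
    "rho": [0.5, 0.7, 0.9, 0.5, 0.7, 0.9],
    "c": [1, 1, 1, 2, 2, 2],
    "service_type": ["a", "a", "a", "a", "a", "a"],
    "wait_time_mean": [1.0, 2.0, 4.0, 0.5, 1.0, 2.0],
    "wait_time_std": [0.2, 0.4, 0.8, 0.1, 0.2, 0.4],
})

fig, ax = plt.subplots()
for (c, service_type), group in df.groupby(["c", "service_type"]):
    group = group.sort_values("rho")
    # Mean curve for this group.
    (line,) = ax.plot(group["rho"], group["wait_time_mean"],
                      label=f"c={c}, {service_type}")
    # Band of mean +/- one standard deviation, in the same color as the curve.
    ax.fill_between(
        group["rho"],
        group["wait_time_mean"] - group["wait_time_std"],
        group["wait_time_mean"] + group["wait_time_std"],
        color=line.get_color(),
        alpha=0.3,
    )
ax.legend()
```

The same loop should also work on the Axes returned by `sns.lineplot`, in which case the `ax.plot` call can be dropped and the band color chosen separately.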

How to show the value of the error bars generated by seaborn?

I am trying to visualize some data using seaborn. I am using a catplot set to be a bar plot, with the error bars showing the standard deviation. I want to know which values it uses for the mean and the standard deviation in the visualization, but I do not know how to retrieve that information from the plot. How would I go about getting it?
bar_graph = seaborn.catplot(x="x", y="y", hue="z", data=data, ci="sd", capsize=0.1, kind="bar")
Trying to get that data from the plot generated by seaborn would not be impossible, but would be very cumbersome, as seaborn does not return the artists that it creates and catplot() can generate a number of subplots, etc.
However, I expect you don't need to get the data from the plot, you can get them directly from the dataframe, can't you? This simple demonstration shows that the plot and the calculated values do match:
import seaborn as sns
titanic = sns.load_dataset("titanic")
# ci="sd" matches the question; newer seaborn releases prefer errorbar="sd"
sns.catplot(x='sex',y='age',hue="class", data=titanic, ci="sd", capsize=0.1, kind="bar")
titanic.groupby(['sex','class'])['age'].describe()[['mean','std']]
                    mean        std
sex    class
female First   34.611765  13.612052
       Second  28.722973  12.872702
       Third   21.750000  12.729964
male   First   41.281386  15.139570
       Second  30.740707  14.793894
       Third   26.507589  12.159514

Plotting data with matplot and python to graph

I'm currently trying to plot 7 days with varying small to large numbers.
The first set of data may look like this
dates = ['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23', '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27']
values = [107.660514, 107.550403, 107.435041, 107.435003, 107.574965, 107.449961, 107.650052, 107.649974]
whereas another set of data may have the same dates, but values that change only in much smaller increments
dates = ['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23', '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27']
values = [0.849215, 0.849655, 0.849655, 0.851095, 0.850885, 0.850135, 0.851203, 0.851865]
When I use this
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
plt.plot_date(x=dates, y=values, fmt="r--")
plt.ylabel(c)
plt.grid(True)
plt.savefig('static/%s.png' % c)
The resulting image for the first set of values comes out as a dashed line connecting the dots for each day. But the second set of data produces an image of 7 parallel lines stacked on top of each other.
Should I be plotting this differently?
I assume you would like a comparison between the two sets of data you provided.
However, with such a large gap between the two sets of values, the result could be fairly unclear if you show both in the same plot.
You could use plt.subplots() to do that, and you'll probably get a plot like this
A better way is to show the two plots separately, which gives a much clearer picture.
If you want to just show two plots, you can do something like this.
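The code that originally followed is not preserved here. A minimal sketch of the separate-plots idea, assuming the two value lists from the question and parsing the date strings into `datetime` objects so the x-axis is treated as dates. The variable names `values_a` and `values_b` are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for saving to a file
import matplotlib.pyplot as plt
from datetime import datetime

dates = ['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23',
         '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27']
values_a = [107.660514, 107.550403, 107.435041, 107.435003,
            107.574965, 107.449961, 107.650052, 107.649974]
values_b = [0.849215, 0.849655, 0.849655, 0.851095,
            0.850885, 0.850135, 0.851203, 0.851865]

# Convert the strings to real dates so matplotlib uses a date axis.
x = [datetime.strptime(d, "%Y-%m-%d") for d in dates]

# Two stacked axes: shared dates, but each with its own y scale.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax_top.plot(x, values_a, "r--")
ax_bottom.plot(x, values_b, "b--")
for ax in (ax_top, ax_bottom):
    ax.grid(True)
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Because each axes scales its y range to its own data, the small changes in the second series stay visible. Calling `fig.savefig(...)` afterwards would write the figure to a file, as in the question.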
