Adding multiple tables to plotly plot - python

I'm trying my best to figure out a way of plotting two tables on the same plotly subplots, but can't seem to get past this part. Searching has not been fruitful.
fig = tools.make_subplots(rows=2, cols=1, subplot_titles=('Continuous Features', 'Categorical Features'))
cont_table = FF.create_table(cont_data_matrix, index= True,index_title='Feature')
cat_table = FF.create_table(cont_data_matrix, index= True,index_title='Feature')
cont_table['figure'].extend()
fig.append_trace(cont_table, 1,1)
fig.append_trace(cont_table, 2,1)
py.plot(fig)

I highly recommend using the cufflinks library which ties together pandas and plotly, giving pandas dataframes the iplot() command. https://plot.ly/ipython-notebooks/cufflinks/
import cufflinks as cf
import pandas as pd
# table: a pandas DataFrame with one column per trace and the x values as its index
table.iplot(kind='scatter', subplots=True, shape=(2,1), filename='cufflinks_test')
So now the trick is to get the table set up. You want the data for each trace to be in different columns, and the index to correspond to the X values. pivot_table() is extremely useful for getting your data into the right shape.
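As a minimal sketch of that reshaping step, assuming long-format data with hypothetical columns `x`, `series` and `value`, `pivot_table()` turns each series into its own column indexed by the x values:

```python
import pandas as pd

# Hypothetical long-format data: one row per (x, series) observation
long_df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3],
    "series": ["a", "b", "a", "b", "a", "b"],
    "value": [10, 20, 11, 21, 12, 22],
})

# One column per series, index = x values; aggfunc="mean" resolves duplicate (x, series) pairs
table = long_df.pivot_table(index="x", columns="series", values="value", aggfunc="mean")
print(table)
```

The resulting `table` has the shape the cufflinks call above expects: one column per trace, with the x values in the index.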
If you include some dummy data for cont_table and cat_table I could provide code to combine them so that the above code works.

Related

plot graphs horizontally when using df.groupby.plot.bar

I want to display 3 plots horizontally, side by side.
Three graphs are generated using the code below:
df.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.bar()
df1.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.bar()
df2.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.bar()
I'm not sure where to set axes in this case. Any help would be appreciated.
You may simply use pandas' barh function.
df.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.barh()
Here is an example that uses a dataframe of random samples:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.groupby(pd.cut(df.A, [0,10,20,30,40,50,60,70,80,90,100])).A.mean().plot.barh()
This snippet outputs the following plot:
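barh changes the orientation of the bars. If the goal is instead three separate plots next to each other, one common matplotlib pattern (not part of the answer above) is to create a row of axes with `plt.subplots(1, 3)` and hand each one to `plot.bar()` through its `ax` parameter. A sketch, where the three random DataFrames are hypothetical stand-ins for `df`, `df1` and `df2`, each with `col1` and `col2`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical frames with col1 in (0, 2) and an integer col2
frames = [
    pd.DataFrame({"col1": rng.uniform(0.01, 2, 50), "col2": rng.integers(0, 100, 50)})
    for _ in range(3)
]

# One row, three columns of axes, sharing the y scale
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, frame, title in zip(axes, frames, ["df", "df1", "df2"]):
    # Bin col1 into (0, 1] and (1, 2], then draw mean col2 per bin on this axis
    binned = frame.groupby(pd.cut(frame.col1, [0, 1, 2]), observed=False).col2.mean()
    binned.plot.bar(ax=ax)
    ax.set_title(title)
fig.tight_layout()
```

Passing `ax=` is what tells pandas where to draw; without it, each call draws on the current (or a new) figure.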

Plotly: How to update / redraw a plotly express figure with new data?

During debugging or computationally heavy loops, I would like to see how my data processing evolves (for example, in a line plot or an image).
In matplotlib, the code can redraw / update the figure with plt.cla() and then plt.draw() or plt.pause(0.001), so that I can follow the progress of my computation in real time or while debugging. How do I do that in plotly express (or plotly)?
So I think I essentially figured it out. The trick is not to use go.Figure() to create a figure, but go.FigureWidget(), which looks the same but behaves differently behind the scenes.
documentation
youtube video demonstration
FigureWidgets are designed to be updated as new data comes in. They stay dynamic, and later calls can modify them.
A FigureWidget can be made from a Figure:
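import plotly.graph_objects as go
figure = go.Figure(data=data, layout=layout)  # data and layout defined elsewhere
f2 = go.FigureWidget(figure)
f2  # display the figure (as the last expression in a Jupyter cell)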
This is practical because it lets you use the simplified plotly express interface to create a Figure and then construct a FigureWidget from it. Unfortunately, plotly express does not seem to have its own simplified FigureWidget module, so one needs to use the more verbose go.FigureWidget.
I'm not sure whether identical functionality exists for plotly. But you can at least build a figure, expand your data source, and then replace the figure's data without touching any of its other elements, like this:
for i, col in enumerate(fig.data):
    fig.data[i]['y'] = df[df.columns[i]]
    fig.data[i]['x'] = df.index
It should not matter whether your figure comes from plotly.express or go.Figure, since both produce a figure structure that the snippet above can edit. You can test this yourself by running the following two snippets in separate cells in JupyterLab.
Code for cell 1
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
# code and plot setup
# settings
pd.options.plotting.backend = "plotly"
# sample dataframe of a wide format
np.random.seed(5); cols = list('abc')
X = np.random.randn(50,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
# plotly figure
fig = df.plot(template = 'plotly_dark')
fig.show()
Code for cell 2
# create or retrieve new data
Y = np.random.randn(1,len(cols))
# organize new data in a df
df2 = pd.DataFrame(Y, columns = cols)
# add the last row of df to the new values
# this step can be skipped if your real world
# data is not a cumulative process like
# in this example
df2.iloc[-1] = df2.iloc[-1] + df.iloc[-1]
# append new data to existing df
df = pd.concat([df, df2], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
# replace old data in fig with new data
for i, col in enumerate(fig.data):
    fig.data[i]['y'] = df[df.columns[i]]
    fig.data[i]['x'] = df.index
fig.show()
Running the first cell will put together some data and build a figure like this:
Running the second cell will produce a new dataframe with only one row, append it to your original dataframe, replace the data in your existing figure, and show the figure again. You can run the second cell as many times as you like to redraw your figure with an expanding dataset. After 50 runs, your figure will look like this:

How to use two columns in x-axis

I'm using the code below to put Segment and Year on the x-axis and Final_Sales on the y-axis, but it throws an error.
CODE
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
order = pd.read_excel("Sample.xls", sheet_name = "Orders")
order["Year"] = pd.DatetimeIndex(order["Order Date"]).year
result = order.groupby(["Year", "Segment"]).agg(Final_Sales=("Sales", sum)).reset_index()
bar = plt.bar(x = result["Segment","Year"], height = result["Final_Sales"])
ERROR
Can someone help me correct my code to get the output below?
Required Output
Try adding another pair of brackets: result[["Segment","Year"]].
What you tried to do was retrieve a single column named ("Segment", "Year"),
but what you actually want is to retrieve a list of columns: ["Segment","Year"].
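Note that plt.bar expects one x position or label per bar, so a two-column selection still cannot be passed to it directly. A minimal sketch that instead joins the two columns into a single label per bar, using a small hypothetical `result` frame in place of the Excel data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aggregated data shaped like `result` in the question
result = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "Segment": ["Consumer", "Corporate", "Consumer", "Corporate"],
    "Final_Sales": [100.0, 80.0, 120.0, 90.0],
})

# Combine the two columns into one string label per bar
labels = result["Segment"] + " " + result["Year"].astype(str)

fig, ax = plt.subplots()
bars = ax.bar(labels, result["Final_Sales"])
ax.set_ylabel("Final_Sales")
ax.tick_params(axis="x", labelrotation=45)
```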
There are several problems with your code:
When indexing a dataframe with several columns, you need to pass a list of columns to [] (see the docs), as follows:
result[["Segment","Year"]]
From the figure you provide, it looks like you want to use Year as the hue. matplotlib's plt.bar doesn't have a hue argument; you would have to build it manually as described here. Instead, you can use the seaborn library, which you are already importing anyway (see https://seaborn.pydata.org/generated/seaborn.barplot.html):
sns.barplot(x = 'Segment', y = 'Final_Sales', hue = 'Year', data = result)

How to plot time series graph in jupyter?

I have tried to plot the data in order to achieve something like this:
But I could not; I only managed to produce this graph with plotly:
Here is a small sample of my data:
Does anyone know how to achieve that graph?
Thanks in advance
You'll find plenty of good material on time series at plotly.ly/python. Still, I'd like to share some practical details that I find very useful:
1. Organize your data in a pandas dataframe.
2. Set up a basic plotly structure using fig=go.Figure(go.Scatter()).
3. Make your desired additions to that structure using fig.add_traces(go.Scatter()).
Plot:
Code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# random data or other data sources
np.random.seed(123)
observations = 200
timestep = np.arange(0, observations/10, 0.1)
dates = pd.date_range('1/1/2020', periods=observations)
val1 = np.sin(timestep)
val2=val1+np.random.uniform(low=-1, high=1, size=observations)#.tolist()
# organize data in a pandas dataframe
df= pd.DataFrame({'Timestep':timestep, 'Date':dates,
'Value_1':val1,
'Value_2':val2})
# Main plotly figure structure
fig = go.Figure([go.Scatter(x=df['Date'], y=df['Value_2'],
                            marker_color='black',
                            opacity=0.6,
                            name='Value 2')])
# One of many possible additions
fig.add_traces([go.Scatter(x=df['Date'], y=df['Value_1'],
                           marker_color='blue',
                           name='Value 1')])
# plot figure
fig.show()

Pyplot Stacked histogram - amount of occurences in column

I'm trying to present a data table collected from firewall logs as a histogram, so that I have one bar for each date in the file, with the number of occurrences of each value in a certain column stacked in the bar.
I looked at several examples here, but they all seemed to assume that I know which values occur in that column. What I'm trying to achieve is a way to draw the histogram without needing to know all possible values in advance.
In the example i have used protocol as the column:
#!/usr/bin/python
import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt
csvs = glob.glob("*log-export.csv")
dfs = [pd.read_csv(csv, sep="\xff", engine="python") for csv in csvs]
df_merged = pd.concat(dfs).fillna("")
data = df_merged[['date', 'proto']]
np_data = np.array(data)
plt.hist(np_data, stacked=True)
plt.show()
But this shows the following diagram:
histogram
and I would like to accomplish something like this:
stacked
Any suggestions how to achieve this?
Setup
I had to make up data because you didn't provide any.
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
Date=pd.date_range(end=pd.to_datetime('now'), periods=100, freq='H'),
Proto=np.random.choice('UDP TCP ICMP'.split(), 100, p=(.3, .5, .2))
))
Solution
Use pd.crosstab then plot
pd.crosstab(df.Date.dt.date, df.Proto).plot.bar(stacked=True)
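To see what actually gets plotted, here is a sketch on a tiny hand-made frame (the rows are hypothetical): pd.crosstab counts each protocol per date, producing one row per date and one column per protocol that occurs in the data, so the protocol values never need to be known in advance.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd

# Hypothetical log rows: two dates, three protocols
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 09:00", "2020-01-01 10:00",
        "2020-01-02 08:00", "2020-01-02 12:00",
    ]),
    "Proto": ["TCP", "UDP", "TCP", "ICMP", "TCP"],
})

# Rows: calendar dates; columns: every protocol that occurs; cells: counts
counts = pd.crosstab(df.Date.dt.date, df.Proto)
print(counts)

# Each row becomes one bar, with one stacked segment per protocol column
ax = counts.plot.bar(stacked=True)
```

Because the columns come from the data itself, a new protocol in tomorrow's log simply appears as an extra stacked segment.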
