I have tried to plot the data in order to achieve something like this:
But I could not and I just achieved this graph with plotly:
Here is the small sample of my data
Does anyone know how to achieve that graph?
Thanks in advance
You'll find a lot of good stuff on timeseries on plotly.ly/python. Still, I'd like to share some practical details that I find very useful:
organize your data in a pandas dataframe
set up a basic plotly structure using fig=go.Figure(go.Scatter())
Make your desired additions to that structure using fig.add_traces(go.Scatter())
Plot:
Code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# random data or other data sources
np.random.seed(123)
observations = 200
timestep = np.arange(0, observations/10, 0.1)
dates = pd.date_range('1/1/2020', periods=observations)
val1 = np.sin(timestep)
val2=val1+np.random.uniform(low=-1, high=1, size=observations)#.tolist()
# organize data in a pandas dataframe
df= pd.DataFrame({'Timestep':timestep, 'Date':dates,
'Value_1':val1,
'Value_2':val2})
# Main plotly figure structure
fig = go.Figure([go.Scatter(x=df['Date'], y=df['Value_2'],
marker_color='black',
opacity=0.6,
name='Value 1')])
# One of many possible additions
fig.add_traces([go.Scatter(x=df['Date'], y=df['Value_1'],
marker_color='blue',
name='Value 2')])
# plot figure
fig.show()
Related
I'm trying to create a scatterplot in plotly, but have some difficulties. I think I need to rearrange my data table to be able to work with it, but am note sure.
This is how my data table looks:
table structure
The "Average Price" is the "real" data and the prices in the "Predictions" column are what my model predicted.
I want to display it in a scatterplot, showing both the predicted and real prices as dots, like this:
scatterplot created through matplotlib
This, I created with pyplot
plt.scatter(x_axis, result['Average Price'], label='Real')
plt.scatter(x_axis, result['Predictions'], label='Predictions')
plt.xlabel('YYY-MM-DD')
plt.ylabel('Average Price')
plt.legend(loc='lower right')
plt.show()
However, I wanted to do the same with plotly, which I can't seem to figure out. I have no problems with one column, but don't know how to access both. Do I need to rearrange the table so that I have all prices (predicted and real) in one column and an additional column labeling the data as "real" or "predicted"?
chart_model = px.scatter(result, x='YYYY-MM-DD', y='Predictions', title='Predictions')
chart_model.update_layout(title_x=0.5, plot_bgcolor='#ecf0f1', yaxis_title='Average Price Predicted',
font_color='#2c3e50')
chart_model.update_traces(marker=dict(color='blue'))
Thanks in advance for any tips on how to proceed!
have simulated dataframe of same structure as your question
have used pandas melt() to reshape in line to long dataframe that is then simple to use with plotly
import pandas as pd
import numpy as np
import plotly.express as px
# simulate data frame
df = pd.DataFrame(
{
"YYYY-MM-DD": pd.date_range("4-jan-2015", freq="7D", periods=300),
"Average Price": np.random.uniform(1.2, 1.4, 300),
}
).pipe(
lambda d: d.assign(
Predictions=d["Average Price"] * np.random.uniform(0.9, 1.1, 300)
)
)
# simple inline restructure of data frame
px.scatter(df.set_index("YYYY-MM-DD").melt(ignore_index=False), y="value", color="variable")
alternate
just move data into index and define columns to be plotted
px.scatter(df.set_index("YYYY-MM-DD"), y=["Average Price", "Predictions"])
Is there a simple way of creating histograms for a continuous variable (mpg) that is filtered by a categorical variable (cyl=4,8)? So essentially I need two histograms for mpg grouped by cyl, one for cyl=4 and one for cyl=8.
Here is an example from a different dataset:
import numpy as np
import pandas as pd
import seaborn as sns
data = pd.DataFrame()
data[4] = np.random.normal(0,10,300)
data[8] = np.random.normal(20,11,300)
sns.distplot(data[4], color="skyblue")
sns.distplot(data[8], color="orange")
I just used my random sample.
I am just being a little lazy here, but all you need to do is a seaborn package.
There are much more options you can handle, so please read it more here [https://python-graph-gallery.com/]
During debugging or computationally heavy loops, i would like to see how my data processing evolves (for example in a line plot or an image).
In matplotlib the code can redraw / update the figure with plt.cla() and then plt.draw() or plt.pause(0.001), so that i can follow the progress of my computation in real time or while debugging. How do I do that in plotly express (or plotly)?
So i think i essentially figured it out. The trick is to not use go.Figure() to create a figure, but go.FigureWidget() Which is optically the same thing, but behind the scenes it's not.
documentation
youtube video demonstration
Those FigureWidgets are exactly there to be updated as new data comes in. They stay dynamic, and later calls can modify them.
A FigureWidget can be made from a Figure:
figure = go.Figure(data=data, layout=layout)
f2 = go.FigureWidget(figure)
f2 #display the figure
This is practical, because it makes it possible to use the simplified plotly express interface to create a Figure and then use this to construct a FigureWidget out of it. Unfortunately plotly express does not seem to have it's own simplified FigureWidget module. So one needs to use the more complicated go.FigureWidget.
I'm not sure if an idential functionality exists for plotly. But you can at least build a figure, expand your data source, and then just replace the data of the figure without touching any other of the figure elements like this:
for i, col in enumerate(fig.data):
fig.data[i]['y'] = df[df.columns[i]]
fig.data[i]['x'] = df.index
It should not matter if your figure is a result of using plotly.express or go.Figure since both approaches will produce a figure structure that can be edited by the code snippet above. You can test this for yourself by setting the two following snippets up in two different cells in JupyterLab.
Code for cell 1
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
# code and plot setup
# settings
pd.options.plotting.backend = "plotly"
# sample dataframe of a wide format
np.random.seed(5); cols = list('abc')
X = np.random.randn(50,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
# plotly figure
fig = df.plot(template = 'plotly_dark')
fig.show()
Code for cell 2
# create or retrieve new data
Y = np.random.randn(1,len(cols))
# organize new data in a df
df2 = pd.DataFrame(Y, columns = cols)
# add last row to df to new values
# this step can be skipped if your real world
# data is not a cumulative process like
# in this example
df2.iloc[-1] = df2.iloc[-1] + df.iloc[-1]
# append new data to existing df
df = df.append(df2, ignore_index=True)#.reset_index()
# replace old data in fig with new data
for i, col in enumerate(fig.data):
fig.data[i]['y'] = df[df.columns[i]]
fig.data[i]['x'] = df.index
fig.show()
Running the first cell will put together some data and build a figure like this:
Running the second cell will produce a new dataframe with only one row, append it to your original dataframe, replace the data in your existing figure, and show the figure again. You can run the second cell as many times as you like to redraw your figure with an expanding dataset. After 50 runs, your figure will look like this:
I'm trying to present datatable collected from firewall logs in a histogram so that i would have one bar for each date in the file, and the number of occurences in a certain column stacked in the bar.
I looked into several examples here but they all seemed to be based on the fact that i would know what values there are in the particular column - and what i'm trying to achieve here is the way to present histogram without needing to know all possible fields.
In the example i have used protocol as the column:
#!/usr/bin/python
import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt
csvs = glob.glob("*log-export.csv")
dfs = [pd.read_csv(csv, sep="\xff", engine="python") for csv in csvs]
df_merged = pd.concat(dfs).fillna("")
data = df_merged[['date', 'proto']]
np_data = np.array(data)
plt.hist(np_data, stacked=True)
plt.show()
But this shows following diagram:
histogram
and i would like to accomplish something like this:
stacked
Any suggestions how to achieve this?
Setup
I had to make up data because you didn't provide any.
df = pd.DataFrame(dict(
Date=pd.date_range(end=pd.to_datetime('now'), periods=100, freq='H'),
Proto=np.random.choice('UDP TCP ICMP'.split(), 100, p=(.3, .5, .2))
))
Solution
Use pd.crosstab then plot
pd.crosstab(df.Date.dt.date, df.Proto).plot.bar(stacked=True)
Im trying my best to figure out a way of plotting two tables on the same plotly subplots but cant seem to get past this part. Searching has has not been fruitful.
fig = tools.make_subplots(rows = 2, cols=1, subplot_titles=('Continious
Features','Catagorical Features'))
cont_table = FF.create_table(cont_data_matrix, index= True,index_title='Feature')
cat_table = FF.create_table(cont_data_matrix, index= True,index_title='Feature')
cont_table['figure'].extend()
fig.append_trace(cont_table, 1,1)
fig.append_trace(cont_table, 2,1)
py.plot(fig)
I highly recommend using the cufflinks library which ties together pandas and plotly, giving pandas dataframes the iplot() command. https://plot.ly/ipython-notebooks/cufflinks/
import cufflinks as cf
import pandas as pd
table.iplot(kind='scatter', subplots=True, shape=(2,1), filename='cufflinks_test')
So now the trick is to get the table set up. You want the data for each trace to be in different columns, and the index to correspond to the X values. pivot_table() is extremely useful for getting your data into the right shape.
If you include some dummy data for cont_table and cat_table I could provide code to combine them so that the above code works.