Scatterplot with plotly vs pyplot / different approach in data table needed?

I'm trying to create a scatterplot in plotly, but I'm having some difficulties. I think I need to rearrange my data table to be able to work with it, but I'm not sure.
This is how my data table looks:
table structure
The "Average Price" is the "real" data and the prices in the "Predictions" column are what my model predicted.
I want to display it in a scatterplot, showing both the predicted and real prices as dots, like this:
scatterplot created through matplotlib
I created this with pyplot:
plt.scatter(x_axis, result['Average Price'], label='Real')
plt.scatter(x_axis, result['Predictions'], label='Predictions')
plt.xlabel('YYYY-MM-DD')
plt.ylabel('Average Price')
plt.legend(loc='lower right')
plt.show()
However, I wanted to do the same with plotly, which I can't seem to figure out. I have no problems with one column, but don't know how to access both. Do I need to rearrange the table so that I have all prices (predicted and real) in one column and an additional column labeling the data as "real" or "predicted"?
chart_model = px.scatter(result, x='YYYY-MM-DD', y='Predictions', title='Predictions')
chart_model.update_layout(title_x=0.5, plot_bgcolor='#ecf0f1', yaxis_title='Average Price Predicted',
                          font_color='#2c3e50')
chart_model.update_traces(marker=dict(color='blue'))
Thanks in advance for any tips on how to proceed!

I have simulated a dataframe with the same structure as the one in your question.
I have used pandas melt() to reshape it inline into a long dataframe, which is then simple to use with plotly.
import pandas as pd
import numpy as np
import plotly.express as px
# simulate data frame
df = pd.DataFrame(
    {
        "YYYY-MM-DD": pd.date_range("4-jan-2015", freq="7D", periods=300),
        "Average Price": np.random.uniform(1.2, 1.4, 300),
    }
).pipe(
    lambda d: d.assign(
        Predictions=d["Average Price"] * np.random.uniform(0.9, 1.1, 300)
    )
)
# simple inline restructure of data frame
px.scatter(df.set_index("YYYY-MM-DD").melt(ignore_index=False), y="value", color="variable")
Alternatively, just move the date column into the index and list the columns to be plotted:
px.scatter(df.set_index("YYYY-MM-DD"), y=["Average Price", "Predictions"])
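To answer the "do I need to rearrange the table" part directly: the reshaping that melt() does can be looked at on its own. This is only a minimal sketch on a made-up three-row frame that reuses the question's column names; the names "Series" and "Price" passed to var_name and value_name are my own choice, not something from the question.

```python
import pandas as pd

# Tiny made-up frame using the column names from the question
wide = pd.DataFrame(
    {
        "YYYY-MM-DD": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
        "Average Price": [1.22, 1.35, 1.30],
        "Predictions": [1.25, 1.31, 1.28],
    }
)

# Wide -> long: one row per (date, series) pair
long_df = wide.melt(
    id_vars="YYYY-MM-DD",
    var_name="Series",   # hypothetical name for the label column
    value_name="Price",  # hypothetical name for the value column
)

print(long_df)
```

A long frame like this can then be passed to px.scatter(long_df, x="YYYY-MM-DD", y="Price", color="Series"), which draws one coloured trace per label.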

Related

Stacked Area Chart in Python

I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id   Year   Period   Value
LNS140000   1948   M01      3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01 and there are M01-M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing it to the stackplot function. I took a look at this link to work on an example that could be suitable for your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
    [1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
d.sort_values(by='Year',inplace=True) # sort first so that years and values line up
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, in year sequence
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
             columns = 'Period',
             values = 'Value')
df
Then I set up seaborn
plt.style.use('seaborn')  # this style was renamed to 'seaborn-v0_8' in matplotlib 3.6
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the format of the data. Look at how the values are laid out in this matplotlib example. I would try grouping the data by period, or pivoting it into the right shape, and then graphing it again.
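For what it's worth, a pivoted frame can also be fed straight to stackplot. A minimal sketch, assuming the question's Year, Period and Value columns and using made-up numbers:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values in the question's long layout (one row per year and period)
d = pd.DataFrame(
    {
        "Year": [1948, 1948, 1948, 1949, 1949, 1949],
        "Period": ["M01", "M02", "M03", "M01", "M02", "M03"],
        "Value": [3.4, 2.5, 1.6, 4.3, 6.7, 7.8],
    }
)

# One row per year, one column per period
wide = d.pivot(index="Year", columns="Period", values="Value")

fig, ax = plt.subplots()
# stackplot wants one row of y values per layer, hence the transpose
layers = ax.stackplot(wide.index, wide.T.to_numpy(), labels=list(wide.columns))
ax.legend(loc="upper left")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
```

Call plt.show() afterwards to display it. With the real data there would be twelve layers, one per period from M01 to M12.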

Any way to correctly make weekly time series line chart in matplotlib?

I am trying to make a line chart that visualizes product export and sales activity using weekly data. Basically, I want to see how the export numbers of different commodities change from week to week. I was able to aggregate the data for a line chart of the export trends of different commodities for the top-5 countries, but the resulting plot is not what I expected. Can anyone point out how to make this right? Is there a better way to make a product export trend line chart using matplotlib or seaborn in Python?
my current attempt
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['weekEndingDate','country', 'commodity'])['weeklyExports'].sum().unstack().reset_index()
df_grp = df_grp.fillna(0)
for c in df_grp[['FCF_Beef', 'FCF_Pork']]:
    fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
    df_grp_new = df_grp.groupby(['country', 'weekEndingDate'])[c].sum().unstack().fillna(0)
    df_grp_new = df_grp_new.T
    df_grp_new.drop([col for col, val in df_grp_new.sum().iteritems() if val < 1000], axis=1, inplace=True)
    for col in df_grp_new.columns:
        sns.lineplot(x='WeekEndingDate', y='weekly export', ci=None, data=df_grp_new, label=col)
    ax.relim()
    ax.autoscale_view()
    ax.xaxis.label.set_visible(False)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    plt.ylabel('weekly export')
    plt.margins(x=0)
    plt.title(c)
    plt.tight_layout()
    plt.grid(True)
    plt.show()
    plt.close()
But this attempt didn't give me the output I expected. Essentially, I want to see the weekly exports of different commodities, like beef and pork, for different countries as a weekly time series. Can anyone suggest what went wrong in my code? How can I get the desired line chart from the above data?
desired output
Here is an example of the desired plots (just the style) that I want to make:
Plenty of ways to do it. If you convert your time column to datetime, seaborn will handle formatting the axis for you.
You could use a FacetGrid to split by commodity or, if you want finer control over the individual charts, filter the df by commodity first and plot each one using lineplot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df['weekEndingDate'] = pd.to_datetime(df['weekEndingDate'])
# sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='commodity', height=8, sharex=False, sharey=False, legend_out=True)
g.map_dataframe(sns.lineplot, x='weekEndingDate',y='weeklyExports', hue='country', ci=None)
g.add_legend()
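If you go the filtering route instead of the FacetGrid, the idea looks roughly like this. It is only a sketch: the rows below are made up (they merely reuse the column names from the CSV), and I use plain pandas plotting in place of sns.lineplot to keep it short.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows that reuse the column names of the question's CSV
df = pd.DataFrame(
    {
        "weekEndingDate": ["2020-01-02", "2020-01-09", "2020-01-16"] * 4,
        "country": (["Japan"] * 3 + ["Mexico"] * 3) * 2,
        "commodity": ["FCF_Beef"] * 6 + ["FCF_Pork"] * 6,
        "weeklyExports": [1200, 1500, 1400, 900, 1100, 1000,
                          2000, 1800, 2100, 700, 950, 800],
    }
)
df["weekEndingDate"] = pd.to_datetime(df["weekEndingDate"])

commodities = sorted(df["commodity"].unique())
fig, axes = plt.subplots(len(commodities), 1, figsize=(7, 6),
                         sharex=True, squeeze=False)

for ax, commodity in zip(axes.ravel(), commodities):
    # Filter to one commodity, then sum exports per week and country
    subset = df[df["commodity"] == commodity]
    weekly = subset.pivot_table(
        index="weekEndingDate",
        columns="country",
        values="weeklyExports",
        aggfunc="sum",
    )
    weekly.plot(ax=ax, title=commodity)  # one line per country
    ax.set_ylabel("weekly exports")

fig.tight_layout()
```

Each commodity gets its own axes, so titles, limits and legends can be adjusted per chart before calling plt.show().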

Plotly Express seemingly only taking color attributes from first frame of animation slider

When I use px.scatter with a pandas dataframe, Plotly Express only seems to pick up the values of the color column that appear in the first frame of the animation slider.
If no color parameter is passed, all values show up, but of course they are not split by colour.
CSV data file: https://github.com/nightingaleandrew/olympic_visualisations
I read the CSV into a dataframe (df).
import plotly.express as px
import pandas as pd
fig = px.scatter(df,
                 x="% of New Nations That Take Up The Sport",
                 y="Age of Sport",
                 animation_frame="Year",
                 animation_group="Sport ",
                 size="Number of Participating Nations",
                 color="Sport ", #only shows data from first frame values of Sport column
                 hover_name="Sport ",
                 title="I wonder if new nations take up younger sports?",
                 range_x=[0,100],
                 range_y=[0,100],
                 )
fig.show()
Further, color_discrete_map with keys (values of column) & values (colours) does not seem to work either.
Any suggestions? Thanks!

How to plot time series graph in jupyter?

I have tried to plot the data in order to achieve something like this:
But I could not and I just achieved this graph with plotly:
Here is the small sample of my data
Does anyone know how to achieve that graph?
Thanks in advance
You'll find a lot of good stuff on timeseries on plotly.ly/python. Still, I'd like to share some practical details that I find very useful:
1. Organize your data in a pandas dataframe.
2. Set up a basic plotly structure using fig=go.Figure(go.Scatter()).
3. Make your desired additions to that structure using fig.add_traces(go.Scatter()).
Plot:
Code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# random data or other data sources
np.random.seed(123)
observations = 200
timestep = np.arange(0, observations/10, 0.1)
dates = pd.date_range('1/1/2020', periods=observations)
val1 = np.sin(timestep)
val2 = val1 + np.random.uniform(low=-1, high=1, size=observations)
# organize data in a pandas dataframe
df = pd.DataFrame({'Timestep': timestep, 'Date': dates,
                   'Value_1': val1,
                   'Value_2': val2})
# Main plotly figure structure
fig = go.Figure([go.Scatter(x=df['Date'], y=df['Value_2'],
                            marker_color='black',
                            opacity=0.6,
                            name='Value 2')])
# One of many possible additions
fig.add_traces([go.Scatter(x=df['Date'], y=df['Value_1'],
                           marker_color='blue',
                           name='Value 1')])
# plot figure
fig.show()

Plotting a Pandas DataSeries.GroupBy

I am new to python and pandas, and have the following DataFrame.
How can I plot the DataFrame where each ModelID is a separate plot, saledate is the x-axis and MeanToDate is the y-axis?
Attempt
data[40:76].groupby('ModelID').plot()
DataFrame
You can make the plots by looping over the groups from groupby:
import matplotlib.pyplot as plt
for title, group in df.groupby('ModelID'):
    group.plot(x='saleDate', y='MeanToDate', title=title)
For more information on plotting with pandas dataframes, see:
http://pandas.pydata.org/pandas-docs/stable/visualization.html
and for looping over a groupby-object:
http://pandas.pydata.org/pandas-docs/stable/groupby.html#iterating-through-groups
Example with aggregation:
I wanted to do something like the following, if pandas had a colour aesthetic like ggplot:
aggregated = df.groupby(['model', 'training_examples']).aggregate(np.mean)
aggregated.plot(x='training_examples', y='accuracy', label='model')
(columns: model is a string, training_examples is an integer, accuracy is a decimal)
But that just produces a mess.
Thanks to joris's answer, I ended up with:
for index, group in df.groupby('model'):  # a plain string key, so index is the model name
    group_agg = group.groupby(['training_examples']).aggregate(np.mean)
    group_agg.plot(y='accuracy', label=index)
I found that title= was just replacing the single title of the plot on each loop iteration, but label= does what you'd expect -- after running plt.legend(), of course.
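For the colour-per-group case, one option is to draw every group onto the same axes and let label= feed the legend. This is a sketch with made-up numbers, assuming the model, training_examples and accuracy columns mentioned in the post; it selects the accuracy column before averaging so that the string column is left out of the mean.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up results; two runs per (model, training_examples) pair
df = pd.DataFrame(
    {
        "model": ["svm"] * 4 + ["tree"] * 4,
        "training_examples": [100, 100, 200, 200] * 2,
        "accuracy": [0.70, 0.74, 0.80, 0.82, 0.60, 0.64, 0.75, 0.77],
    }
)

fig, ax = plt.subplots()
for model, group in df.groupby("model"):
    # Mean accuracy per training-set size for this model
    mean_acc = group.groupby("training_examples")["accuracy"].mean()
    mean_acc.plot(ax=ax, label=model, marker="o")

ax.set_xlabel("training_examples")
ax.set_ylabel("mean accuracy")
ax.legend()
```

Passing the same ax to every plot call is what keeps all the lines in one chart; without it each group would open its own figure.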
