Multi-indexing plotting with Matplotlib - python

I am trying to draw a multi-index bar plot using Matplotlib, but I could not find working code in previously answered questions. Can anyone help me produce a similar graph?
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
xls_filename = "abc.xlsx"
f = pd.ExcelFile(xls_filename)
df = f.parse("Sheet1", index_col='Year' and 'Month')
f.close()
matplotlib.rcParams.update({'font.size': 18})  # font size of x- and y-axis labels
df.plot(kind='bar', alpha=0.70)
The index is not built the way I wanted, and the graph does not come out as expected. Any help is appreciated.

I created a DataFrame from some of the values I see on your attached plot and plotted it.
index = pd.MultiIndex.from_tuples(tuples=[(2011, ), (2012, ), (2016, 'M'), (2016, 'J')], names=['year', 'month'])
df = pd.DataFrame(index=index, data={'1': [10, 140, 6, 9], '2': [23, 31, 4, 5], '3': [33, 23, 1, 1]})
df.plot(kind='bar')
This is the outcome
where the DataFrame is this
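A likely cause of the indexing problem in the question is the expression index_col='Year' and 'Month': in Python, `and` between two non-empty strings returns the second one, so only 'Month' is used. The sketch below shows one way around it, reading the sheet without an index and then calling set_index with a list. The column names and values are hypothetical stand-ins for the contents of Sheet1, which are not shown in the question.

```python
import pandas as pd

# `and` between two non-empty strings returns the second string,
# so index_col='Year' and 'Month' is the same as index_col='Month'.
single = 'Year' and 'Month'

# Hypothetical stand-in for the data read from Sheet1 of abc.xlsx.
raw = pd.DataFrame({
    'Year': [2015, 2015, 2016, 2016],
    'Month': ['J', 'M', 'J', 'M'],
    'A': [10, 14, 6, 9],
    'B': [23, 31, 4, 5],
})

# Passing a list of column names builds a two-level (Year, Month) index.
df = raw.set_index(['Year', 'Month'])

# Each (Year, Month) pair becomes one group of bars on the x-axis.
ax = df.plot(kind='bar', alpha=0.70)
```

With the real file, the same set_index call can be applied to the frame returned by f.parse("Sheet1") once index_col is dropped.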

Related

Seaborn lineplot - connecting dots of scatterplot

I have a problem with sns lineplot and scatterplot. What I'm trying to do is connect the dots of a scatterplot with a line that joins the closest mapped points. Somehow the lineplot changes width where points share the same x value. I want the lineplot to be the same solid line all the way.
The code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
plt.figure(figsize=(10,10))
sns.scatterplot(data=cts, x='X', y='Y', size='NumberOfPlanets', sizes=(50,500), legend=False)
sns.lineplot(data=cts, x='X', y='Y',estimator='max', color='red')
plt.show()
The outcome:
Any ideas?
EDIT:
If I try using pyplot it doesn't work either:
Code:
plt.plot(cts['X'], cts['Y'])
Outcome:
I need one line, which connects closest points (basically what is presented on image one but with same solid line).
OK, I have finally figured it out. The line was messy because the data was not properly sorted: plt.plot connects the points in row order. When I sorted the DataFrame by 'Y' values, the outcome was satisfactory.
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
cts = cts.sort_values('Y')
plt.figure(figsize=(10,10))
plt.scatter(cts['X'], cts['Y'], zorder=1)
plt.plot(cts['X'], cts['Y'], zorder=2)
plt.show()
Now it works. Tested it also on other similar scatter points. Everything is fine :)
Thanks!

Plotly: Annotate marker at the last value in line chart

I am trying to mark the last value of a line chart with a big red dot using Plotly Express in Python. Could someone please help me?
I have managed to build the line chart, but I am not able to annotate the dot.
Below is my dataframe; I want its last value to be annotated.
Below is the line chart I created; I want my chart to look like the second image in the screenshot.
Code I am working with:
fig = px.line(gapdf, x='gap', y='clusterCount', text="clusterCount")
fig.show()
The suggestion from gflavia works perfectly well.
But you can also set up an extra trace and associated text by addressing the elements in the figure directly instead of the data source like this:
fig.add_scatter(x = [fig.data[0].x[-1]], y = [fig.data[0].y[-1]])
Plot 1
Complete code:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
gapdf = pd.DataFrame({
'clusterCount': [1, 2, 3, 4, 5, 6, 7, 8],
'gap': [-15.789, -14.489, -13.735, -13.212, -12.805, -12.475, -12.202, -11.965]
})
fig = px.line(gapdf, x='gap', y='clusterCount')
fig.add_scatter(x=[fig.data[0].x[-1]], y=[fig.data[0].y[-1]],
                mode='markers+text',
                marker={'color': 'red', 'size': 14},
                showlegend=False,
                text=[fig.data[0].y[-1]],
                textposition='middle right')
fig.show()
You could overlay an additional trace for the last data point with plotly.graph_objects:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
gapdf = pd.DataFrame({
'clusterCount': [1, 2, 3, 4, 5, 6, 7, 8],
'gap': [-15.789, -14.489, -13.735, -13.212, -12.805, -12.475, -12.202, -11.965]
})
fig = px.line(gapdf, x='gap', y='clusterCount')
fig.add_trace(go.Scatter(x=[gapdf['gap'].iloc[-1]],
                         y=[gapdf['clusterCount'].iloc[-1]],
                         text=[gapdf['clusterCount'].iloc[-1]],
                         mode='markers+text',
                         marker=dict(color='red', size=10),
                         textfont=dict(color='green', size=20),
                         textposition='top right',
                         showlegend=False))
fig.update_layout(plot_bgcolor='white',
                  xaxis=dict(linecolor='gray', mirror=True),
                  yaxis=dict(linecolor='gray', mirror=True))
fig.show()

Why my Seaborn line plot x-axis shifts one unit?

I am trying to compare two small, summarized pandas dataframes in a line plot with the Seaborn library, but one of the lines is shifted by one unit along the x-axis. What is wrong?
The dataframes are:
Here is my code:
df = pd.read_csv('/home/gazelle/Documents/m3inference/m3_result.csv',index_col='id')
df = df.drop("Unnamed: 0",axis=1)
for i, v in df.iterrows():
    if str(i) not in result:
        df.drop(i, inplace=True)
    else:
        df.loc[i, 'estimated'] = result[str(i)]
m3 = pd.read_csv('plot_result.csv').set_index('id')
ids = list(m3.index.values)
m3 = m3['age'].value_counts().to_frame().reset_index().sort_values('index')
m3 = m3.rename(columns={m3.columns[0]:'bucket', m3.columns[1]:'age'})
df_estimated = df[df.index.isin(ids)]['estimated'].value_counts().to_frame().reset_index().sort_values('index')
df_estimated = df_estimated.rename(columns={df_estimated.columns[0]:'bucket', df_estimated.columns[1]:'age'})
sns.lineplot(x='bucket', y='age', data=m3)
sns.lineplot(x='bucket', y='age', data=df_estimated)
And the result is:
As has been pointed out in the comments, the data and code you provide appear to produce the correct result:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
m3 = pd.DataFrame({"index": [2, 3, 4, 1], "age": [123, 116, 66, 33]})
df_estimated = pd.DataFrame({"index": [3, 2, 4, 1], "estimated": [200, 100, 37, 1]})
sns.lineplot(x="index", y="age", data=m3)
sns.lineplot(x="index", y="estimated", data=df_estimated)
plt.show()
This gives a plot which is different from the one you posted above:
From your screenshots it looks like you are working in a Jupyter notebook. You are probably suffering from the issue that at the time you plot, the dataframe m3 no longer has the values you printed above, but has been modified.
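One way to make the summarizing step less sensitive to notebook state is to build each summary in a small function that returns a new frame instead of overwriting m3 in place. The sketch below is only an illustration: the ages series is a hypothetical stand-in for the 'age' column of plot_result.csv, and the column names are set explicitly rather than relying on the default 'index' name produced by reset_index.

```python
import pandas as pd

# Hypothetical stand-in for the 'age' column of plot_result.csv.
ages = pd.Series([2, 2, 3, 1, 3, 2, 4], name="age")

def bucket_counts(series):
    """Count occurrences per bucket and return a new frame sorted by bucket."""
    return (
        series.value_counts()
        .rename_axis("bucket")      # name the index explicitly
        .reset_index(name="age")    # put the counts in a column called 'age'
        .sort_values("bucket")
        .reset_index(drop=True)
    )

# The input series is left untouched, so re-running a cell gives the same result.
m3_counts = bucket_counts(ages)
```

The resulting frame can be passed to sns.lineplot(x='bucket', y='age', data=m3_counts) as in the question.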

How to show the plot with correlations

I have a data frame and I want to plot a figure like this. I tried in both R and Python, but could not manage it. Can anybody help me plot this data?
Thank you.
This is my simple data and code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame([[1, 4, 5, 12, 5, 2,2], [-5, 8, 9, 0,2,1,8],[-6, 7, 11, 19,1,2,5],[-5, 1, 3, 7,5,2,5],[-5, 7, 3, 7,6,2,9],[2, 7, 9, 7,6,2,8]])
sns.pairplot(data)
plt.show()
sns.pairplot() is a simple function aimed at creating a pair plot easily with the default settings. If you want more flexibility in the kinds of plots in the figure, you have to use PairGrid directly:
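import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame(np.random.normal(size=(1000, 4)))
def remove_ax(*args, **kwargs):
    # hide the axes of the current subplot
    plt.axis('off')
g = sns.PairGrid(data=data)
g.map_diag(plt.hist)        # histograms on the diagonal
g.map_lower(sns.kdeplot)    # density contours below the diagonal
g.map_upper(remove_ax)      # blank panels above the diagonal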

Plotting graphs from different years in one graph (datetime64)

I have the following problem: I want to combine graphs from various years in one plot. To explain the problem, I made the simplified example below.
# packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Here I make a simplified dataframe which I want to plot:
data = {'Dates': ['02-04-2014', '18-08-2014', '05-03-2014', '06-06-2014', '05-08-2013', '06-11-2013', '03-01-2013', '12-02-2013'], 'Values': [7, 8, 11, 3, 6, 1, 8, 13]}
df = pd.DataFrame.from_dict(data)
This is important for me because it is the format I work with in my problem:
df['Dates'] = pd.to_datetime(df['Dates'])
Here I do the plotting
years = sorted([i for i in df['Dates'].apply(lambda x: x.year).unique()])
for i in years:
    df1 = df[(df['Dates'].apply(lambda x: x.year) == i)]
    df1 = df1.sort_values(by = ['Dates'])
    plt.show()
In this case it returns two separate line plots, one for 2013 and one for 2014. I want them combined in one graph, so that I get a single graph with a legend for the years.
Hope you can help!
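One possible approach, sketched under two assumptions: the dates are day-first (as '18-08-2014' suggests), and the years should share an x-axis that runs over the day of the year. Each year is drawn as its own line on the same axes, and the figure is shown once after the loop rather than once per year. The Year and DayOfYear columns are helper columns introduced here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Dates': ['02-04-2014', '18-08-2014', '05-03-2014', '06-06-2014',
                  '05-08-2013', '06-11-2013', '03-01-2013', '12-02-2013'],
        'Values': [7, 8, 11, 3, 6, 1, 8, 13]}
df = pd.DataFrame(data)

# Assumption: the strings are day-month-year.
df['Dates'] = pd.to_datetime(df['Dates'], format='%d-%m-%Y')
df['Year'] = df['Dates'].dt.year
df['DayOfYear'] = df['Dates'].dt.dayofyear

fig, ax = plt.subplots()
# One line per year, all on the same axes.
for year, group in df.sort_values('Dates').groupby('Year'):
    ax.plot(group['DayOfYear'], group['Values'], marker='o', label=str(year))
ax.set_xlabel('Day of year')
ax.set_ylabel('Values')
ax.legend(title='Year')
# A single plt.show() after the loop displays the combined figure.
```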
