How to show the plot with correlations - python

I have a data frame and I want to plot a figure like this. I tried in both R and Python, but could not get it to work. Can anybody help me plot this data?
Thank you.
This is my simple data and code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame([[1, 4, 5, 12, 5, 2,2], [-5, 8, 9, 0,2,1,8],[-6, 7, 11, 19,1,2,5],[-5, 1, 3, 7,5,2,5],[-5, 7, 3, 7,6,2,9],[2, 7, 9, 7,6,2,8]])
sns.pairplot(data)
plt.show()

sns.pairplot() is a convenience function for creating a pair plot quickly with the default settings. If you want more control over which kind of plot goes into each part of the figure, use PairGrid directly:
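import numpy as np
data = pd.DataFrame(np.random.normal(size=(1000, 4)))
def remove_ax(*args, **kwargs):
    # hide the axes of whichever panel is current
    plt.axis('off')
g = sns.PairGrid(data=data)
g.map_diag(plt.hist)      # histograms on the diagonal
g.map_lower(sns.kdeplot)  # density contours below the diagonal
g.map_upper(remove_ax)    # blank panels above the diagonal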
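Since the question asks about correlations, one possible extension (not part of the original answer) is to write the correlation coefficient into the upper panels instead of blanking them. The sketch below assumes that the Pearson coefficient from np.corrcoef is what is wanted. annotate_corr is a hypothetical helper name, and it is exercised on a plain matplotlib axes so that the snippet runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

def annotate_corr(x, y, **kwargs):
    """Write the Pearson correlation of x and y in the centre of the current axes."""
    r = np.corrcoef(x, y)[0, 1]
    ax = plt.gca()
    ax.annotate(f"r = {r:.2f}", xy=(0.5, 0.5), xycoords="axes fraction",
                ha="center", va="center")
    ax.set_axis_off()  # hide ticks and frame, as remove_ax does
    return r

# Stand-alone check on an ordinary axes with perfectly correlated data
fig, ax = plt.subplots()
r = annotate_corr(np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.0, 4.0, 6.0, 8.0]))
```

With the PairGrid above, this would be hooked in as g.map_upper(annotate_corr) in place of remove_ax. PairGrid makes the target panel the current axes before calling the function, which is why plt.gca() is used, and any extra keyword arguments it passes are simply ignored here.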

Related

Seaborn lineplot - connecting dots of scatterplot

I have a problem with sns.lineplot and sns.scatterplot. Basically, I'm trying to connect the dots of a scatterplot with a line that joins each point to its closest neighbour. Somehow lineplot changes its width when it meets points with the same x-axis value. I want the line to stay the same solid line all the way.
The code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
plt.figure(figsize=(10,10))
sns.scatterplot(data=cts, x='X', y='Y', size='NumberOfPlanets', sizes=(50,500), legend=False)
sns.lineplot(data=cts, x='X', y='Y',estimator='max', color='red')
plt.show()
The outcome:
Any ideas?
EDIT:
If I try using pyplot it doesn't work either:
Code:
plt.plot(cts['X'], cts['Y'])
Outcome:
I need one line, which connects closest points (basically what is presented on image one but with same solid line).
Ok, I have finally figured it out. The lineplot was so messy because the data was not properly sorted. Once I sorted the dataframe by the 'Y' values, the outcome was satisfactory.
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
cts = cts.sort_values('Y')
plt.figure(figsize=(10,10))
plt.scatter(cts['X'], cts['Y'], zorder=1)
plt.plot(cts['X'], cts['Y'], zorder=2)
plt.show()
Now it works. Tested it also on other similar scatter points. Everything is fine :)
Thanks!
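To make the effect of row order concrete, here is a small sketch (an illustration added to the answer, not part of it) that measures the length of the polyline plt.plot would draw before and after sorting. path_length is a hypothetical helper. Sorting by 'Y' works for this data because the points happen to follow one another in Y order; it is not a general nearest-neighbour ordering.

```python
import numpy as np
import pandas as pd

cts = pd.DataFrame({'X': [13, 13, 13, 12, 11], 'Y': [14, 11, 13, 15, 20]})

def path_length(df):
    """Total length of the polyline that joins the rows of df in row order."""
    steps = np.diff(df[['X', 'Y']].to_numpy(dtype=float), axis=0)
    return float(np.hypot(steps[:, 0], steps[:, 1]).sum())

# plt.plot connects points in the order they appear in the frame
unsorted_len = path_length(cts)

# After sorting by 'Y' the line no longer doubles back along x = 13
sorted_cts = cts.sort_values('Y')
sorted_len = path_length(sorted_cts)
sorted_points = list(zip(sorted_cts['X'].tolist(), sorted_cts['Y'].tolist()))

print(unsorted_len, sorted_len)
print(sorted_points)
```

The sorted path is shorter because the three points at X = 13 are visited bottom to top instead of back and forth.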

Plotly: Annotate marker at the last value in line chart

I am trying to mark the last value of the line chart with a big red dot in Plotly Express (Python). Could someone please help me?
I have built the line chart successfully, but I am not able to annotate the dot.
Below is my dataframe and I want the last value in the dataframe to be annotated.
Below is the line chart created and I want my chart to be similar to the second image in the screenshot
Code I am working with:
fig = px.line(gapdf, x='gap', y='clusterCount', text="clusterCount")
fig.show()
The suggestion from gflavia works perfectly well.
But you can also set up an extra trace and associated text by addressing the elements in the figure directly instead of the data source like this:
fig.add_scatter(x = [fig.data[0].x[-1]], y = [fig.data[0].y[-1]])
Plot 1
Complete code:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
gapdf = pd.DataFrame({
'clusterCount': [1, 2, 3, 4, 5, 6, 7, 8],
'gap': [-15.789, -14.489, -13.735, -13.212, -12.805, -12.475, -12.202, -11.965]
})
fig = px.line(gapdf, x='gap', y='clusterCount')
fig.add_scatter(x=[fig.data[0].x[-1]], y=[fig.data[0].y[-1]],
                mode='markers+text',
                marker={'color': 'red', 'size': 14},
                showlegend=False,
                text=[fig.data[0].y[-1]],
                textposition='middle right')
fig.show()
You could overlay an additional trace for the last data point with plotly.graph_objects:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
gapdf = pd.DataFrame({
'clusterCount': [1, 2, 3, 4, 5, 6, 7, 8],
'gap': [-15.789, -14.489, -13.735, -13.212, -12.805, -12.475, -12.202, -11.965]
})
fig = px.line(gapdf, x='gap', y='clusterCount')
fig.add_trace(go.Scatter(x=[gapdf['gap'].iloc[-1]],
                         y=[gapdf['clusterCount'].iloc[-1]],
                         text=[gapdf['clusterCount'].iloc[-1]],
                         mode='markers+text',
                         marker=dict(color='red', size=10),
                         textfont=dict(color='green', size=20),
                         textposition='top right',
                         showlegend=False))
fig.update_layout(plot_bgcolor='white',
                  xaxis=dict(linecolor='gray', mirror=True),
                  yaxis=dict(linecolor='gray', mirror=True))
fig.show()
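Both answers build the same kind of one-point trace, so the shared part can be factored into a small helper. The sketch below is only an illustration: last_point_trace is a hypothetical name, it uses pandas alone so that it runs without plotly installed, and the keyword names it returns are the ones already used in the add_scatter call above.

```python
import pandas as pd

gapdf = pd.DataFrame({
    'clusterCount': [1, 2, 3, 4, 5, 6, 7, 8],
    'gap': [-15.789, -14.489, -13.735, -13.212, -12.805, -12.475, -12.202, -11.965]
})

def last_point_trace(df, x_col, y_col):
    """Keyword arguments for a one-point marker trace at the last row of df."""
    x_last = df[x_col].iloc[-1]
    y_last = df[y_col].iloc[-1]
    return dict(x=[x_last], y=[y_last], text=[y_last],
                mode='markers+text',
                marker=dict(color='red', size=14),
                textposition='middle right',
                showlegend=False)

trace_kwargs = last_point_trace(gapdf, 'gap', 'clusterCount')
print(trace_kwargs['x'], trace_kwargs['y'])
```

With a figure from px.line, fig.add_scatter(**trace_kwargs) should then be equivalent to the explicit call shown earlier. Note that "last" here means the last row of the frame, not the largest x value; sort the frame first if those differ.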

Altair mark_line plots noisier than matplotlib?

I am learning Altair to add interactivity to my plots. I am trying to recreate a plot I make in matplotlib, but Altair is adding noise to my curves.
This is my dataset, df1, linked here from GitHub: https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv
This is the code:
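import matplotlib.pyplot as plt
# df1 is the DataFrame loaded from the CSV linked above
fig, ax = plt.subplots(figsize=(8, 6))
# one curve per value of 'Name'
for key, grp in df1.groupby('Name'):
    y = grp.logabsID
    x = grp.VG
    ax.plot(x, y, label=key)
plt.legend(loc='best')
plt.show()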
#doing it directly from link
df1='https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv'
import altair as alt
alt.Chart(df1).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N'
)
Here is the image of the plots I am generating:
matplotlib vs altair plot
How do I remove the noise from altair?
Altair sorts the x axis before drawing lines, so if you have multiple lines in one group it will often lead to "noise", as you call it. This is not noise, but rather an accurate representation of all the points in your dataset shown in the default sort order. Here is a simple example:
import numpy as np
import pandas as pd
import altair as alt
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q'
)
The best way to fix this is to set the detail encoding to a column that distinguishes between the different lines that you would like to be drawn individually:
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q',
detail='group:N'
)
If it is not the grouping that is important, but rather the order of the points, you can specify that by instead providing an order channel:
alt.Chart(df.reset_index()).mark_line().encode(
x='x:Q',
y='y:Q',
order='index:Q'
)
Notice that the two lines are connected on the right end. This is effectively what matplotlib does by default: it maintains the index order even if there is repeated data. Using the order channel for your data produces the result you're looking for:
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
alt.Chart(df1.reset_index()).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N',
order='index:Q'
)
The multiple lines in each group are drawn in order connected at the ends, just as they are in matplotlib.
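The effect of the default x ordering can be reproduced without Altair. The sketch below sorts the small example frame by hand in pandas; it only mimics the ordering, and the stable tie-breaking used here is an assumption about how equal x values end up arranged, not a statement about Altair's internals.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})

# One line through all points, ordered by x: the y values jump back and forth
zigzag_y = df.sort_values('x', kind='stable')['y'].tolist()

# Ordered by x within each group (what the detail encoding separates):
# every group is now a single smooth run
per_group_y = {
    g: grp.sort_values('x', kind='stable')['y'].tolist()
    for g, grp in df.groupby('group')
}

# Original row order (what order='index:Q' preserves)
index_order_y = df['y'].tolist()

print(zigzag_y)
print(per_group_y)
print(index_order_y)
```

The first list alternates between the two branches of the curve, which is what shows up as "noise" in the chart; the other two orderings do not.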

Using MaxNLocator for pandas bar plot results in wrong labels

I have a pandas dataframe and I want to create a plot of it:
import pandas as pd
from matplotlib.ticker import MultipleLocator, FormatStrFormatter, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
df.plot(kind='barh')
Nice, everything works as expected:
Now I want to hide some of the ticks on the y axis. Looking at the docs, I thought I could achieve this with:
MaxNLocator: Finds up to a max number of intervals with ticks at nice
locations. MultipleLocator: Ticks and range are a multiple of base;
either integer or float.
But neither of them plots what I expected (the values on the y axis do not show the correct numbers):
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MaxNLocator(3))
What am I doing wrong?
Problem
The problem occurs because pandas bar plots are categorical. Each bar is positioned at a successive integer value starting at 0. Only the labels are adjusted to show the actual dataframe index. So here you have a FixedLocator with values 0, 1, 2, 3, ... and a FixedFormatter with values -5, -4, -3, .... Changing the locator alone does not change the formatter, hence you get the labels -5, -4, -3, ... but at different locations (one tick is not shown, hence the plot starts at -4 here).
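The mismatch between positions and labels can be inspected directly. The following sketch (an added illustration, assuming pandas' default bar placement and a headless Agg backend) prints where the ticks sit and what text they carry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import pandas as pd

df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
ax = df.plot(kind='barh')
ax.figure.canvas.draw()  # make sure the tick label texts are populated

# Numeric tick locations versus the text shown at each location
tick_positions = [float(t) for t in ax.get_yticks()]
tick_labels = [t.get_text() for t in ax.get_yticklabels()]

print(tick_positions)
print(tick_labels)
```

The positions run from 0 upwards while the labels start at the first index value, which is why a new locator on its own pairs the old label texts with the wrong places.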
A. Pandas solution
In addition to setting the locator you would need to set a formatter, which returns the correct values as function of the location. In the case of a dataframe index with successive integers as used here, this can be done by adding the minimum index to the location using a FuncFormatter. For other cases, the function for the FuncFormatter may become more complicated.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import (MultipleLocator, MaxNLocator,
                               FuncFormatter, ScalarFormatter)
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
sf = ScalarFormatter()
sf.create_dummy_axis()
sf.set_locs((df.index.max(), df.index.min()))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x,p: sf(x+df.index[0])))
plt.show()
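For an index that is not made of successive integers, adding an offset no longer works. One way to handle that case, sketched here under the assumption that each bar sits at the integer position of its row, is to look the label up in the index. label_from_position and the index values are hypothetical, chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Made-up, unevenly spaced index
df = pd.DataFrame([1, 3, 3, 5, 10, 20], index=[-5, -2, 0, 4, 9, 15])

def label_from_position(x, pos=None):
    """Return the index value of the bar at tick location x, or '' if there is none."""
    i = int(round(x))
    if abs(x - i) < 1e-9 and 0 <= i < len(df.index):
        return str(df.index[i])
    return ""

ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax.yaxis.set_major_formatter(FuncFormatter(label_from_position))
```

Tick locations outside the range of bars, or between two bars, get an empty label instead of a misleading number.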
B. Matplotlib solution
Using matplotlib, the solution is potentially easier. Since matplotlib bar plots are numeric in nature, they position the bars at the locations given to the first argument. Here, setting a locator alone is sufficient.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MultipleLocator, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
fig, ax = plt.subplots()
ax.barh(df.index, df.values[:,0])
ax.yaxis.set_major_locator(MultipleLocator(2))
plt.show()

Multi-indexing plotting with Matplotlib

I am trying to plot a multi-indexed DataFrame with matplotlib, but I could not work out the exact code from previously answered questions. Can anyone show me how to produce a similar graph?
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
import numpy as np
import pandas
xls_filename = "abc.xlsx"
f = pandas.ExcelFile(xls_filename)
df = f.parse("Sheet1", index_col='Year' and 'Month')
f.close()
matplotlib.rcParams.update({'font.size': 18}) # Font size of x and y-axis
df.plot(kind= 'bar', alpha=0.70)
It is not indexing as I wanted, and it does not produce the graph I expected either. Any help is appreciated.
I created a DataFrame from some of the values I see on your attached plot and plotted it.
index = pd.MultiIndex.from_tuples(tuples=[(2011, ), (2012, ), (2016, 'M'), (2016, 'J')], names=['year', 'month'])
df = pd.DataFrame(index=index, data={'1': [10, 140, 6, 9], '2': [23, 31, 4, 5], '3': [33, 23, 1, 1]})
df.plot(kind='bar')
This is the outcome
where the DataFrame is this
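One likely reason the index in the question comes out wrong is the expression 'Year' and 'Month': in Python this is an ordinary boolean expression that evaluates to 'Month', so only one column is used as the index. The sketch below shows this and builds a two-level index from flat columns instead. The numbers and the column names 'A' and 'B' are made up for illustration, since the contents of abc.xlsx are not shown.

```python
import pandas as pd

# Made-up numbers, only for illustration; the real data lives in abc.xlsx
flat = pd.DataFrame({
    'Year': [2015, 2015, 2016, 2016],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'A': [10, 14, 6, 9],
    'B': [23, 31, 4, 5],
})

# `and` returns its second operand when the first is truthy,
# so this is just the string 'Month'
single_name = 'Year' and 'Month'

# Passing a list of column names builds a two-level (Year, Month) index
df = flat.set_index(['Year', 'Month'])

print(single_name)
print(df.index.nlevels)
# df.plot(kind='bar') would then label each bar with its (Year, Month) pair
```

When reading from Excel, passing a list such as index_col=[0, 1] (assuming Year and Month are the first two columns of the sheet) should have the same effect as the set_index call.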
