Seaborn lineplot - connecting dots of scatterplot - python

I have a problem with seaborn's lineplot and scatterplot. I am trying to connect the dots of a scatterplot with a line that joins each point to its closest neighbour. However, the lineplot changes width where several points share the same x value. I want the line to be the same solid line all the way.
The code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
plt.figure(figsize=(10,10))
sns.scatterplot(data=cts, x='X', y='Y', size='NumberOfPlanets', sizes=(50,500), legend=False)
sns.lineplot(data=cts, x='X', y='Y',estimator='max', color='red')
plt.show()
The outcome:
Any ideas?
EDIT:
If I try using pyplot it doesn't work either:
Code:
plt.plot(cts['X'], cts['Y'])
Outcome:
I need one line, which connects closest points (basically what is presented on image one but with same solid line).

OK, I have finally figured it out. The lineplot was messy because the data was not properly sorted: plt.plot connects the points in the order in which the rows appear. When I sorted the dataframe by the 'Y' values, the outcome was satisfactory.
data = {'X': [13, 13, 13, 12, 11], 'Y':[14, 11, 13, 15, 20], 'NumberOfPlanets':[2, 5, 2, 1, 2]}
cts = pd.DataFrame(data=data)
cts = cts.sort_values('Y')
plt.figure(figsize=(10,10))
plt.scatter(cts['X'], cts['Y'], zorder=1)
plt.plot(cts['X'], cts['Y'], zorder=2)
plt.show()
Now it works. Tested it also on other similar scatter points. Everything is fine :)
Thanks!
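To see why the sort matters, here is a minimal sketch (not part of the original answer) that draws the same points twice, once in row order and once sorted by 'Y'. The Agg backend, the output file name and the two-panel layout are my own assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

cts = pd.DataFrame({'X': [13, 13, 13, 12, 11],
                    'Y': [14, 11, 13, 15, 20]})

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 5))

# Row order as given: along x = 13 the line goes 14 -> 11 -> 13 and doubles back.
(raw_line,) = ax_raw.plot(cts['X'], cts['Y'], marker='o')
ax_raw.set_title('row order')

# Sorted by Y: consecutive rows are now neighbouring points for this data set.
ordered = cts.sort_values('Y')
(sorted_line,) = ax_sorted.plot(ordered['X'], ordered['Y'], marker='o')
ax_sorted.set_title('sorted by Y')

fig.savefig('row_order_vs_sorted.png')  # hypothetical file name
```

Sorting by 'Y' happens to give the nearest-neighbour path for these five points; for other point clouds a different ordering may be needed.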

Related

Why is my gridline above x-axis and how can I correct it(matplotlib)?

In the figure (see the link below the code), you can see that the bottom horizontal gridline is above the x-axis whereas I would prefer it to be overlapping the x-axis to make the graph look more accurate. Could anyone please tell me how to achieve that? Also, it would be amazing if someone could tell me how I can start my graph from 0 at the bottom left corner. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
x_coordinates = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
y_coordinates = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40,45 ])
plt.xlabel("extension/mm")
plt.ylabel("tension/ N")
plt.title("extension vs tension correlation")
plt.xticks([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
plt.minorticks_on()
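# The first argument is passed positionally because the keyword b= was removed in newer matplotlib releases.
plt.grid(True, which="minor", color="black")
plt.grid(True, which="major", color="black")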
plt.plot(x_coordinates, y_coordinates)
plt.show()
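The minor gridlines are being drawn as well, and the lowest one looks confusing next to the x-axis. If the plot range ends on a major tick, it looks cleaner. One possible solution is to set the axis limits to the data range, which also makes the graph start at 0 in the bottom left corner: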
plt.ylim([min(y_coordinates),max(y_coordinates)])
plt.xlim([min(x_coordinates),max(x_coordinates)])
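Put together with the question's data, the suggestion above might look like the sketch below. It uses the object-oriented Axes interface instead of the pyplot calls in the question; the Agg backend and the output file name are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
y = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45])

fig, ax = plt.subplots()
ax.plot(x, y)

# Clamp both axes to the data range so the outermost gridlines
# coincide with the axes and the origin sits in the bottom left corner.
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())

ax.set_xticks(x)
ax.minorticks_on()
ax.grid(True, which="both", color="black")
ax.set_xlabel("extension/mm")
ax.set_ylabel("tension/N")

fig.savefig("tension_grid.png")  # hypothetical file name
```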

How to show the plot with correlations

I have a data frame and I want to plot a figure like this. I tried in both R and Python, but could not get it to work. Can anybody help me plot this data?
Thank you.
This is my simple data and code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame([[1, 4, 5, 12, 5, 2,2], [-5, 8, 9, 0,2,1,8],[-6, 7, 11, 19,1,2,5],[-5, 1, 3, 7,5,2,5],[-5, 7, 3, 7,6,2,9],[2, 7, 9, 7,6,2,8]])
sns.pairplot(data)
plt.show()
sns.pairplot() is a simple function aimed at creating a pair plot easily using the default settings. If you want more flexibility in the kinds of plots shown in the figure, you have to use PairGrid directly:
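import numpy as np
data = pd.DataFrame(np.random.normal(size=(1000,4)))
def remove_ax(*args, **kwargs):
    # Hide the current axes; PairGrid makes each panel current before calling this.
    plt.axis('off')
g = sns.PairGrid(data=data)
g.map_diag(plt.hist)
g.map_lower(sns.kdeplot)
g.map_upper(remove_ax)
plt.show()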
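The target figure is not shown, so the following is only a guess at what "with correlations" means: a grid with histograms on the diagonal, scatter plots below it and the Pearson coefficient written in the otherwise empty upper panels. This sketch deliberately swaps PairGrid for plain matplotlib subplots; the random data, column names, Agg backend and file name are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three independent normal columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=['a', 'b', 'c'])
corr = data.corr()  # Pearson correlation matrix

n = len(data.columns)
fig, axes = plt.subplots(n, n, figsize=(8, 8))
for i, row_name in enumerate(data.columns):
    for j, col_name in enumerate(data.columns):
        ax = axes[i, j]
        if i == j:
            # Diagonal: distribution of a single column.
            ax.hist(data[col_name], bins=20)
        elif i > j:
            # Lower triangle: pairwise scatter plot.
            ax.scatter(data[col_name], data[row_name], s=5)
        else:
            # Upper triangle: hide the axes and print the coefficient instead.
            ax.axis('off')
            ax.text(0.5, 0.5, f"r = {corr.iloc[i, j]:.2f}",
                    ha='center', va='center', transform=ax.transAxes)

fig.savefig('pair_grid_with_correlations.png')  # hypothetical file name
```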

Show all lines in matplotlib line plot

How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))
If the plotted data overlap, one way to see both series is to increase the linewidth of one of them and make it semi-transparent, as shown:
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
Subplots are another good option.
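The subplot suggestion could look like the following sketch. plot_yield_df is not shown in the question, so the two-column frame here is a hypothetical stand-in with identical values in both columns; the Agg backend and file name are assumptions too.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for plot_yield_df: both columns coincide exactly.
df = pd.DataFrame({"Original": [5, 8, 6, 9, 4],
                   "Predicted": [5, 8, 6, 9, 4]})

# subplots=True gives every column its own axes, so no line can hide another.
axes = df.plot(subplots=True, sharex=True, figsize=(8, 6))

axes[0].figure.savefig("one_axes_per_column.png")  # hypothetical file name
```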
Problem
The lines are plotted in the order their columns appear in the dataframe. So for example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order
df[df.columns[::-1]].plot()
That has also changed the order in the legend and the color coding.
Solution 2: Reverse z-order
So if that is not desired, you can instead play with the zorder.
ax = df.plot()
lines = ax.get_lines()
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)

Using MaxNLocator for pandas bar plot results in wrong labels

I have a pandas dataframe and I want to create a plot of it:
import pandas as pd
from matplotlib.ticker import MultipleLocator, FormatStrFormatter, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
df.plot(kind='barh')
Nice, everything works as expected:
Now I want to hide some of the ticks on the y-axis. Looking at the docs, I thought I could achieve this with one of:
MaxNLocator: Finds up to a max number of intervals with ticks at nice locations.
MultipleLocator: Ticks and range are a multiple of base; either integer or float.
But neither of them plots what I was expecting (the values on the y-axis do not show the correct numbers):
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MaxNLocator(3))
What am I doing wrong?
Problem
The problem occurs because pandas bar plots are categorical. Each bar is positioned at a successive integer value starting at 0. Only the labels are adjusted to show the actual dataframe index. So here you have a FixedLocator with values 0, 1, 2, 3, ... and a FixedFormatter with values -5, -4, -3, .... Changing the locator alone does not change the formatter, hence you get the numbers -5, -4, -3, ... but at different locations (one tick is not shown, hence the plot starts at -4 here).
A. Pandas solution
In addition to setting the locator you would need to set a formatter, which returns the correct values as function of the location. In the case of a dataframe index with successive integers as used here, this can be done by adding the minimum index to the location using a FuncFormatter. For other cases, the function for the FuncFormatter may become more complicated.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import (MultipleLocator, MaxNLocator,
                               FuncFormatter, ScalarFormatter)
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
sf = ScalarFormatter()
sf.create_dummy_axis()
sf.set_locs((df.index.max(), df.index.min()))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x,p: sf(x+df.index[0])))
plt.show()
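For an index that is not made of successive integers, one possible variant (my own sketch, not from the answer above) is to look the label up by bar position instead of adding an offset. The helper name label_from_position is hypothetical, as are the Agg backend and the file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], index=range(-5, 6))

def label_from_position(pos, _tick_number=None):
    """Return the index label of the bar drawn at integer position pos."""
    i = int(round(pos))
    if abs(pos - i) > 1e-9 or not 0 <= i < len(df.index):
        return ""  # no bar at this location, so no label
    return str(df.index[i])

ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
# The formatter translates bar positions 0, 1, 2, ... back to index labels.
ax.yaxis.set_major_formatter(FuncFormatter(label_from_position))

ax.figure.savefig("barh_index_labels.png")  # hypothetical file name
```

Because the lookup goes through df.index, the same function should also work when the index holds strings or unevenly spaced numbers.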
B. Matplotlib solution
Using matplotlib, the solution is potentially easier. Since matplotlib bar plots are numeric in nature, they position the bars at the locations given to the first argument. Here, setting a locator alone is sufficient.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MultipleLocator, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
fig, ax = plt.subplots()
ax.barh(df.index, df.values[:,0])
ax.yaxis.set_major_locator(MultipleLocator(2))
plt.show()

Multi-indexing plotting with Matplotlib

I am trying to plot a multi-indexed DataFrame using matplotlib. However, I could not work out the exact code from previously answered questions. Can anyone help me produce a similar graph?
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
import numpy as np
import pandas
import matplotlib  # needed for matplotlib.rcParams below
xls_filename = "abc.xlsx"
f = pandas.ExcelFile(xls_filename)
df = f.parse("Sheet1", index_col='Year' and 'Month')
f.close()
matplotlib.rcParams.update({'font.size': 18}) # Font size of x and y-axis
df.plot(kind= 'bar', alpha=0.70)
The index is not built the way I wanted, and the graph does not come out as expected either. Any help is appreciated.
I created a DataFrame from some of the values I see on your attached plot and plotted it.
index = pd.MultiIndex.from_tuples(tuples=[(2011, ), (2012, ), (2016, 'M'), (2016, 'J')], names=['year', 'month'])
df = pd.DataFrame(index=index, data={'1': [10, 140, 6, 9], '2': [23, 31, 4, 5], '3': [33, 23, 1, 1]})
df.plot(kind='bar')
This is the outcome
where the DataFrame is this
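One likely cause of the indexing problem in the question is the expression index_col='Year' and 'Month': in Python, 'Year' and 'Month' evaluates to 'Month', so only one column is requested as the index. A minimal sketch of building the two-level index explicitly follows; the sample values stand in for the unseen spreadsheet and are purely hypothetical, as are the Agg backend and file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the contents of "Sheet1" in abc.xlsx.
raw = pd.DataFrame({
    'Year':  [2015, 2015, 2016, 2016],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'A': [10, 14, 6, 9],
    'B': [23, 31, 4, 5],
})

# `'Year' and 'Month'` is a boolean expression whose value is 'Month';
# a list of column names is what builds a MultiIndex.
df = raw.set_index(['Year', 'Month'])

ax = df.plot(kind='bar', alpha=0.70)
ax.figure.tight_layout()
ax.figure.savefig("multiindex_bar.png")  # hypothetical file name
```

Each bar group is then labelled with its (Year, Month) pair on the x-axis.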
