Let's say I have two arrays:
x = [1,2,3,4,5,6,7]
y = [1,2,2,2,3,4,5]
Its scatter plot looks like this.
What I want is for the x-axis ticks in the plot to look like this:
0,4,8
so that the y values within each segment of x are drawn closer together.
I've seen similar behaviour in bar plots, where it is called clustering. How do I do the same for a scatter plot, or is there another kind of plot I should be using?
I hope my question is clear.
All help is appreciated.
With your plot, try this before you display it:
plt.xticks([0,4,8])
or
import numpy as np
plt.xticks(np.arange(0, 8+1, step=4))
Then, to change the scale, you can try something like this:
plt.xticks([0,4,8])
plt.rcParams["figure.figsize"] = (10,5)
I got this with my example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
# Set the figure size first: rcParams only affects figures created afterwards
plt.rcParams["figure.figsize"] = (7,3)
plt.xticks([0,4,8])
plt.plot(x, y, 'o', color='black')
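For the arrays in the question, the same idea can be sketched with matplotlib's object-oriented interface. This is only an illustration: the figure size and the axis limits below are assumptions, not values taken from the question.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 2, 2, 3, 4, 5]

# figsize is an assumed value; passing it at creation time applies it to this figure
fig, ax = plt.subplots(figsize=(7, 3))
ax.scatter(x, y, color="black")

# Show ticks only at 0, 4 and 8, and widen the limits so both end ticks are visible
ax.set_xticks([0, 4, 8])
ax.set_xlim(-0.5, 8.5)

# Call plt.show() to display the figure
```

Note that this only changes which tick labels are drawn; the points keep their original positions along x.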
I think what you are looking for is close to swarmplots and stripplots in Seaborn. However, Seaborn's swarmplot and stripplot are purely categorical on one of the axes, which means that they wouldn't preserve the relative x-axis order of your elements inside each category.
One way to do what you want would be to increase the space in your x-axis between categories ([0,4,8]) and modify your xticks accordingly.
Below is an example of this where I assign the data to 3 different categories: [-2, 2), [2, 6) and [6, 10). Each category is shifted dil_k further along the x-axis than the one before it.
import matplotlib.pyplot as plt
import numpy as np
# Generate the data
x = np.random.choice(8, size=100)
y = np.random.choice(8, size=100)
dil_k = 20
# Create the spacing between categories by shifting each one further right
x[np.logical_and(x < 6, x >= 2)] += dil_k
x[np.logical_and(x < 10, x >= 6)] += 2 * dil_k
# Plot
plt.scatter(x, y)
# Relabel the ticks so each shifted position shows its original x value
plt.xticks([0, 2, 22, 24, 26, 46, 48, 50], [0, 2, 2, 4, 6, 6, 8, 10])
plt.show()
And the output gives:
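The spacing step can also be written as a small reusable helper, so that the bin edges are not hard-coded in each line. This is a sketch only: `spread_bins`, `edges` and `gap` are hypothetical names introduced here, and `np.digitize` is used in place of the explicit `np.logical_and` masks.

```python
import numpy as np

def spread_bins(x, edges, gap):
    """Shift every value by gap * (index of its bin) so the bins are drawn apart.

    edges are the bin boundaries, e.g. [-2, 2, 6, 10] for the bins
    [-2, 2), [2, 6) and [6, 10).
    """
    x = np.asarray(x, dtype=float)
    # np.digitize returns 1 for the first bin, 2 for the second, and so on
    bin_index = np.digitize(x, edges) - 1
    return x + gap * bin_index

x = np.array([1, 2, 3, 4, 5, 6, 7])
shifted = spread_bins(x, edges=[-2, 2, 6, 10], gap=20)
print(shifted)
```

With gap=20 this reproduces the shifts used above: values in [2, 6) move by 20 and values in [6, 10) move by 40.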
Alternatively, if you don't care about keeping the order of your elements along the x-axis inside each category, then you can use swarmplot directly.
The code can be seen below:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
#Generating data
x= np.random.choice(8,size=(100))
y= np.random.choice(8,size=(100))
#Collapsing every x value onto its category label (0, 4 or 8)
x[np.logical_and(x<2,x>=-2)]=0
x[np.logical_and(x<6, x>=2)]=4
x[np.logical_and(x<10, x>=6)]=8
#Plotting
sns.swarmplot(x=x,y=y)
plt.show()
And the output gives:
First time user so apologies for any mistakes.
I have some code (pasted below) which is used to analyse and gain values/graphs from a simulation I have run.
This results in the following image:
I would now like to plot a line graph on top of this, showing the value of the colour map along r = 0 (on the y-axis) at every point along the x-axis. However, I'm completely lost on where to begin. I've tried looking into KDE and similar techniques, but I don't know how to get at the numerical values that were used to generate the colour map.
from openpmd_viewer import OpenPMDTimeSeries
from openpmd_viewer.addons import LpaDiagnostics
import numpy as np
from scipy.constants import c, e, m_e
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Replace the string below, to point to your data
ts = OpenPMDTimeSeries(r"/Users/bentorrance/diags/hdf5/")
ts_2d = LpaDiagnostics(r"/Users/bentorrance/diags/hdf5/")
plt.figure(1)
# get_field returns the field array together with a metadata object
Ez, info = ts.get_field(iteration=5750, field='E', coord='z', plot=True, cmap='inferno')
plt.title(r'Electric Field Density $E_{z}$')
plt.show()
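One possible starting point, assuming the field is available as a 2D array with the radial coordinate along the first axis and z along the second: pick the row whose r value is closest to 0 and plot that row against z. The arrays `field`, `r` and `z` below are synthetic stand-ins for the simulation output, and `lineout_at_r0` is a hypothetical helper name, so treat this as a sketch rather than a tested recipe for openPMD-viewer.

```python
import numpy as np

def lineout_at_r0(field, r):
    """Return the row of a 2D (r, z) field that lies closest to r = 0."""
    row = np.argmin(np.abs(r))
    return field[row, :]

# Synthetic stand-in data; with real data these would come from the simulation output
r = np.linspace(-10e-6, 10e-6, 41)
z = np.linspace(0.0, 50e-6, 200)
field = np.exp(-(r[:, None] / 5e-6) ** 2) * np.sin(2 * np.pi * z[None, :] / 10e-6)

on_axis = lineout_at_r0(field, r)
# Plot with e.g. plt.plot(z, on_axis), on the same figure or on a twin axis
```

Because the line values and the r coordinate have different units, a secondary y-axis (for example via `ax.twinx()`) is probably the cleanest way to overlay the line on the colour map.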
I am creating several scatter plot graphs in matplotlib. For these I want to plot trend lines for the scatter plots. I am using the numpy polyfit and poly1d methods to create the trendline.
My problem is as follows: There are only positive y values in my dataset (I have also removed all 0 values), but my trendlines are going below 0. The reason why I think it's going below 0 is that I have some very large outlier values that skew the trendline.
Is there a way I can prevent my graph trendlines from going below 0 without removing data points? Perhaps using a method or parameter for a method in the numpy or matplotlib libraries?
Removing outliers helps some trendlines, but not all of the graphs I'm making.
Graph example with scatter points: https://imgur.com/a/bwIFJw7
Graph example without scatter points (same data as above graph): https://imgur.com/a/k5TyNjt
Changing the degree of the trend line doesn't solve the issue.
Code for reproducibility:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
plt.figure(figsize=(20,150))
loc = mdates.AutoDateLocator()
dataset = {'time':['4/5/2014','4/10/2014','4/21/2014','5/3/2014','5/8/2014','5/19/2014','6/7/2014','6/12/2014','6/16/2014','12/6/2014','12/11/2014','12/15/2014','2/7/2015','2/12/2015','2/16/2015','7/20/2015','8/1/2015','8/13/2015','8/17/2015','9/5/2015','9/10/2015','9/21/2015','10/3/2015','12/10/2015','1/18/2016','8/6/2016','8/11/2016','8/15/2016','9/3/2016','9/8/2016','9/19/2016','10/1/2016','10/13/2016','10/17/2016','11/10/2016','11/5/2016','8/10/2017','9/14/2017','9/18/2017','10/7/2017','2/8/2018','2/19/2018','3/3/2018','3/8/2018','3/19/2018','4/12/2018','4/7/2018','4/16/2018','5/5/2018','5/10/2018','5/21/2018','11/3/2018','11/8/2018','11/19/2018','12/1/2018','12/13/2018','12/17/2018','1/5/2019','1/10/2019','1/21/2019','2/2/2019','2/14/2019','2/18/2019','3/2/2019','3/14/2019','3/18/2019','4/6/2019','4/11/2019','4/15/2019'],'yval':[1714.6,996.32,1638.4,1293.47,744.73,1843.2,1009.97,2168.47,819.2,2949.12,2730.67,2106.51,14745.6,3880.42,73728,792.77,538.16,585.14,571.53,580.54,933.27,460.8,646.74,4336.94,36864,190.51,206.89,199.02,197.54,219.84,210.27,223.75,201.96,212.23,223.6,211.48,1568.68,418.91,837.82,5671.38,217.18,189.74,192.59,192.04,196.74,197.8,196.47,200.69,193.69,210.79,349.42,222.5,209.17,191.37,192.91,197.57,207.23,192.48,189.7,199.44,187.57,186.85,187.99,189.19,196.34,196.11,192.61,196.39,190.05,]}
dataset['time'] = pd.to_datetime(dataset['time'])
dataset['yval'] = pd.to_numeric(dataset['yval'])
x = mdates.date2num(dataset['time'])
y = dataset['yval']
z = np.polyfit(x,y,3)
p = np.poly1d(z)
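# `type` is not defined in this snippet (it would resolve to Python's built-in), so placeholder text is used
plt.plot(x,p(x),'#00FFFF', label='Trendline')
plt.title('Trendline example')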
plt.xlabel('Time')
plt.ylabel('Weight')
#comment out the next line to see plot without scatter points
plt.scatter(x,y)
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
plt.grid(which='major',axis='both')
plt.show()
The desired output is a graph whose trendline does not go below the horizontal axis at 0.
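One way to keep a trend line positive, as long as every y value is strictly positive, is to fit the polynomial to log(y) and exponentiate the result. This is a different model from the plain `np.polyfit(x, y, 3)` fit in the question (it minimises error in log space, so large outliers pull on it less), and it is offered as a sketch only. `positive_trend` is a hypothetical helper name and the sample data below is made up.

```python
import numpy as np

def positive_trend(x, y, degree=1):
    """Fit a polynomial to log(y) and map it back, so the trend is always > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centre and scale x: raw date numbers are large and make polyfit ill-conditioned
    x_scaled = (x - x.mean()) / x.std()
    coeffs = np.polyfit(x_scaled, np.log(y), degree)
    return np.exp(np.polyval(coeffs, x_scaled))

# Small made-up sample with one large outlier, for illustration only
x = np.arange(10)
y = np.array([200.0, 190.0, 210.0, 73728.0, 195.0, 188.0, 192.0, 205.0, 198.0, 190.0])
trend = positive_trend(x, y, degree=3)
```

With the question's data, `plt.plot(x, positive_trend(x, y, 3))` would take the place of `plt.plot(x, p(x), ...)`. Plotting on a logarithmic y-axis (`plt.yscale('log')`) may also make the result easier to read.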
I have a data set with two independent variables and one dependent variable. I thought the best way to represent it would be a checkerboard-type plot in which the color of each cell represents a range of values, like this:
I can't seem to find code that does this automatically.
You need to use a plotting package to do this. For example, with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = 100*np.random.rand(6,6)
fig, ax = plt.subplots()
i = ax.imshow(X, cmap=cm.jet, interpolation='nearest')
fig.colorbar(i)
plt.show()
For those who come across this years later, as I did: what the original poster wants is a heatmap.
Matplotlib's documentation includes an example of this.
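If the data comes as one row per combination of the two independent variables, it first has to be arranged into a 2D grid before `imshow` can draw it. A minimal sketch, assuming made-up arrays `a`, `b` and `value` (none of these names come from the question):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up long-form data: one entry per (a, b) combination
a = np.array([1, 1, 1, 2, 2, 2])
b = np.array([10, 20, 30, 10, 20, 30])
value = np.array([5.0, 7.0, 9.0, 6.0, 8.0, 4.0])

# Map each distinct a / b value to a row / column index
a_levels, row = np.unique(a, return_inverse=True)
b_levels, col = np.unique(b, return_inverse=True)
grid = np.full((a_levels.size, b_levels.size), np.nan)
grid[row, col] = value

fig, ax = plt.subplots()
image = ax.imshow(grid, cmap="viridis", interpolation="nearest")
# Label the cells with the original variable values instead of array indices
ax.set_xticks(np.arange(b_levels.size))
ax.set_xticklabels(b_levels)
ax.set_yticks(np.arange(a_levels.size))
ax.set_yticklabels(a_levels)
fig.colorbar(image)
```

Combinations that are missing from the data stay as NaN and are left blank in the plot.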
I like this particular plot and the ability to pass a function to the stat_func keyword to quickly plot up and visualize relationships between variables, but there's one thing. How do I 'turn off' or not plot the marginal distribution axes?
It looks nice, but sometimes I don't want this feature.
For example, using this code:
import numpy as np
import seaborn as sns
x = np.arange(100) + np.random.randn(100)*20
y = np.arange(100) + np.random.randn(100)*20
sns.jointplot(x=x, y=y, kind='reg')
How can I remove the kde subplots on the top and right hand side of the main axes?
You could use JointGrid directly:
from scipy import stats
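# A large ratio makes the marginal axes very thin compared with the joint axes
g = sns.JointGrid(x=x, y=y, ratio=100)
g.plot_joint(sns.regplot)
# JointGrid.annotate exists only in older seaborn releases; drop this line if it raises AttributeError
g.annotate(stats.pearsonr)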
g.ax_marg_x.set_axis_off()
g.ax_marg_y.set_axis_off()
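If the marginal axes are not needed at all, another option is to skip the grid and draw the regression on an ordinary matplotlib axes with `sns.regplot`. This is a sketch of that alternative, not the approach from the answer above; the figure size, the text position and the use of `np.corrcoef` for the Pearson coefficient are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
x = np.arange(100) + rng.standard_normal(100) * 20
y = np.arange(100) + rng.standard_normal(100) * 20

# A single plain axes, so there are no marginal plots to hide
fig, ax = plt.subplots(figsize=(6, 6))
sns.regplot(x=x, y=y, ax=ax)

# Pearson correlation from NumPy, written into the top-left corner of the axes
r = np.corrcoef(x, y)[0, 1]
ax.text(0.05, 0.95, f"pearsonr = {r:.2f}", transform=ax.transAxes, va="top")
```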