I am reading the documentation of seaborn.barplot and I read the following.
estimator : string or callable that maps vector -> scalar, optional
Statistical function to estimate within each categorical bin.
I could not understand what callable that maps vector -> scalar means. What does this statement convey?
When I passed estimator = 'mean', I got this error.
TypeError: 'str' object is not callable
What should we pass as a string?
Callable means a function, either a named function or a lambda. Here it must be a function that takes a vector as its argument and returns a scalar. Functions are "first-class citizens" in Python, so they can be passed as arguments and generally treated like any other object.
See example here:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
penguin_data = sns.load_dataset("penguins")
f = plt.figure(figsize=(6, 4))
fig = sns.barplot(x="species", y="body_mass_g", palette="flare",
                  estimator=np.mean, data=penguin_data)
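To make "maps vector -> scalar" concrete, here is a minimal sketch in plain Python, without seaborn. The sample values and the value_range helper are made up for illustration. The only point is that each estimator is a callable that takes a sequence of numbers and returns one number, while the string "mean" is not callable.

```python
from statistics import mean, median


def value_range(values):
    """Hypothetical custom estimator: largest value minus smallest value."""
    return max(values) - min(values)


# Made-up measurements standing in for one categorical bin
sample = [3.0, 1.0, 4.0, 1.0, 5.0]

# Each entry is a callable that maps a vector to a scalar
estimators = {"mean": mean, "median": median, "range": value_range}
results = {name: func(sample) for name, func in estimators.items()}
print(results)

# A function object is callable; a plain string is not
print(callable(mean), callable("mean"))
```

Any function of this shape, including a lambda, should be usable wherever the documentation asks for a callable estimator, for example estimator=np.median.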
I want to plot a seaborn regplot.
my code:
x=data['Healthy life expectancy']
y=data['max_dead']
sns.regplot(x,y)
plt.show()
However, this gives me a FutureWarning. How do I fix it?
FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid
positional argument will be 'data', and passing other arguments without an explicit keyword will
result in an error or misinterpretation.
Seaborn 0.12
Following is a non-exhaustive list of potential errors for incorrect use of positional and keyword arguments with seaborn:
sns.regplot(tips.total_bill, tips.tip): TypeError: regplot() takes from 0 to 1 positional arguments but 2 were given.
sns.lmplot('petal_width', 'petal_length', data=iris): TypeError: lmplot() got multiple values for argument 'data'
sns.kdeplot(x, y): TypeError: kdeplot() takes from 0 to 1 positional arguments but 2 positional arguments (and 1 keyword-only argument) were given
Only data may be specified as the first positional argument for seaborn plots. All other arguments must use keywords (e.g. x= and y=).
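The errors listed above are what Python raises for any function whose parameters after the first are keyword-only. The sketch below uses a hypothetical regplot_like function, which is not seaborn's actual signature, to show the mechanism: a bare * in the parameter list makes everything after it keyword-only.

```python
def regplot_like(data=None, *, x=None, y=None):
    """Hypothetical stand-in: only `data` may be passed positionally."""
    return {"data": data, "x": x, "y": y}


# Keyword arguments are accepted
ok = regplot_like(x=[1, 2, 3], y=[2, 4, 6])
print(ok)

# Two positional arguments are rejected with a TypeError
error_message = None
try:
    regplot_like([1, 2, 3], [2, 4, 6])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```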
Seaborn 0.11
Technically, it's a warning, not an error, and can be ignored for now, as shown in the bottom section of this answer.
I recommend doing as the warning says, specify the x and y parameters for seaborn.regplot, or any of the other seaborn plot functions with this warning.
sns.regplot(x=x, y=y), where x and y are parameters for regplot, to which you are passing x and y variables.
Beginning in version 0.12, passing any positional arguments, except data, will result in an error or misinterpretation.
For those concerned with backward compatibility, write a script to fix existing code, or don't update to 0.12 (once available).
x and y are used as the data variable names because that is what is used in the OP. Data can be assigned to any variable name (e.g. a and b).
This also applies to FutureWarning: Pass the following variable as a keyword arg: x, which can be generated by plots only requiring x or y, such as:
sns.countplot(pen['sex']), but should be sns.countplot(x=pen['sex']) or sns.countplot(y=pen['sex'])
import seaborn as sns
import pandas as pd
pen = sns.load_dataset('penguins')
x = pen.bill_depth_mm  # named culmen_depth_mm in older copies of the dataset
y = pen.bill_length_mm  # named culmen_length_mm in older copies of the dataset
# plot without specifying the x, y parameters
sns.regplot(x, y)
# plot with specifying the x, y parameters
sns.regplot(x=x, y=y)
# or use
sns.regplot(data=pen, x='bill_depth_mm', y='bill_length_mm')
Ignore the warnings
I do not advise using this option.
Once seaborn v0.12 is available, this option will not be viable.
From version 0.12, the only valid positional argument will be data, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
# plot without specifying the x, y parameters
sns.regplot(x, y)
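Instead of silencing the warning globally, it may be more useful to record it so that the offending calls can be found and fixed. The following is a minimal sketch using only the standard library; legacy_call is a hypothetical stand-in for a plotting call that emits the FutureWarning quoted above, and collect_future_warnings is an illustrative helper, not part of seaborn.

```python
import warnings


def legacy_call(x, y):
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    warnings.warn(
        "Pass the following variables as keyword args: x, y.",
        FutureWarning,
    )
    return (x, y)


def collect_future_warnings(func, *args, **kwargs):
    """Run func and return the text of every FutureWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        func(*args, **kwargs)
    return [str(w.message) for w in caught
            if issubclass(w.category, FutureWarning)]


messages = collect_future_warnings(legacy_call, [1, 2], [3, 4])
print(messages)
```

Because catch_warnings is a context manager, the filter change is undone when the block exits, so the rest of the program still sees its warnings.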
I use sns.distplot to plot a univariate distribution of observations. However, I need not only the chart but also the data points. How do I get the data points from the matplotlib Axes returned by distplot?
You can read the data back from the matplotlib Axes that distplot returns. For instance, to get the first line:
sns.distplot(x).get_lines()[0].get_data()
This returns two numpy arrays containing the x and y values for the line.
For the bars, information is stored in:
sns.distplot(x).patches
You can access the bar's height via the function patches.get_height():
[h.get_height() for h in sns.distplot(x).patches]
If you want to obtain the KDE values for a histogram, you can use scikit-learn's KernelDensity class instead:
import numpy as np
import pandas as pd
from sklearn.neighbors import KernelDensity
ds=pd.read_csv('data-to-plot.csv')
X=ds.loc[:,'Money-Spent'].values[:, np.newaxis]
# you can supply a bandwidth parameter
kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(X)
x=np.linspace(0,5,100)[:, np.newaxis]
log_density_values=kde.score_samples(x)
density=np.exp(log_density_values)
array([1.88878660e-05, 2.04872903e-05, 2.21864649e-05, 2.39885206e-05,
2.58965064e-05, 2.79134003e-05, 3.00421245e-05, 3.22855645e-05,
3.46465903e-05, 3.71280791e-05, 3.97329392e-05, 4.24641320e-05,
4.53246933e-05, 4.83177514e-05, 5.14465430e-05, 5.47144252e-05,
5.81248850e-05, 6.16815472e-05, 6.53881807e-05, 6.92487062e-05,
7.32672057e-05, 7.74479375e-05, 8.17953578e-05, 8.63141507e-05,
..........................
..........................
3.93779919e-03, 4.15788216e-03, 4.38513011e-03, 4.61925890e-03,
4.85992626e-03, 5.10672757e-03, 5.35919187e-03, 5.61677855e-03])
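For intuition about what a Gaussian kernel density estimate computes, here is a minimal pure-Python sketch of the textbook formula, density(x) = 1/(n*h) * sum of K((x - x_i)/h) with K the standard normal density. It is not scikit-learn's implementation, and the sample values are made up; the bandwidth of 0.75 matches the example above.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a single point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centred on that sample
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [1.2, 1.9, 2.4, 2.5, 3.1]  # made-up observations
density = gaussian_kde(samples, bandwidth=0.75)

# Evaluate on a grid and check that the estimate integrates to roughly 1
step = 0.05
grid = [i * step for i in range(-100, 201)]  # from -5.0 to 10.0
values = [density(x) for x in grid]
area = sum(values) * step
print(area)
```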
With the newer version of seaborn this is not the case anymore. First of all, distplot has been replaced with displot. Secondly, when calling get_lines() an error message comes up AttributeError: 'FacetGrid' object has no attribute 'get_lines'.
This will get the kde curve you want
line = sns.distplot(data).get_lines()[0]
plt.plot(line.get_xdata(), line.get_ydata())
Let's say I have the following function:
def f(x):
    return log(3*exp(3*x) + 7*exp(7*x))
I want to do two things:
1) plot the function over a range of x-values
2) find the root of the function using the Newton method from scipy
My problem is that plotting seems best done with a numpy array, x=np.linspace(-2,2,1000), but evaluating the function on it raises TypeError: only size-1 arrays can be converted to Python scalars. I can fix this by changing log and exp to np.log and np.exp, respectively.
But doing so then makes scipy.optimize.newton unhappy.
It seems like I need to define the function twice, once for use in plotting (with np. ...) and once for optimizing in the form given above.
I can't imagine that this is actually the case. Any hints would be greatly appreciated.
Seems legit, you just need to use numpy functions instead of base math functions:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
%matplotlib inline
def f(x):
    return np.log(3*np.exp(3*x) + 7*np.exp(7*x))
x = np.linspace(-2,2,1000)
y = f(x)
plt.scatter(x, y)
optimize.root(f, 1)
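To illustrate that a single NumPy-based definition can serve both purposes, the sketch below evaluates f on an array and also feeds it scalars in a root search. The newton helper and the analytic derivative f_prime are hand-written for illustration and are only a stand-in for scipy.optimize.newton, which should accept the same f.

```python
import numpy as np


def f(x):
    # NumPy ufuncs accept Python floats as well as arrays
    return np.log(3 * np.exp(3 * x) + 7 * np.exp(7 * x))


def f_prime(x):
    # Analytic derivative of f, used for the Newton step
    num = 9 * np.exp(3 * x) + 49 * np.exp(7 * x)
    den = 3 * np.exp(3 * x) + 7 * np.exp(7 * x)
    return num / den


def newton(func, dfunc, x0, tol=1e-12, max_iter=50):
    """Hand-written Newton iteration (illustrative, not SciPy's code)."""
    x = x0
    for _ in range(max_iter):
        step = func(x) / dfunc(x)
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


xs = np.linspace(-2, 2, 1000)
ys = f(xs)                          # array in, array out (for plotting)
root = newton(f, f_prime, x0=1.0)   # scalar in, scalar out (for root finding)
print(ys.shape, root, f(root))
```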
I am trying to plot several points using Matplotlib on a plot that has lines following the function defined in energy(). The points are plasma parameters and the lines follow the function that connects them using multiple values of the Debye length.
import matplotlib.pyplot as plt
import numpy as np
n_pts = [10**21,10**19,10**23,10**11,10**15,10**14,10**17,10**6]
KT_pts = [10000,100,1000,0.05,2,0.1,0.2,0.01]
n_set = np.logspace(6,25)
debye_set = 7.43*np.logspace(-1,-7,10)
def energy(n,debye):
    return n*(debye/7430)**2
fig,ax=plt.subplots()
ax.scatter(n_pts,KT_pts)
for debye in debye_set:
    ax.loglog(n_set,energy(n_set,debye))
plt.show()
This gives the following error:
AttributeError: 'int' object has no attribute 'log'
Python integers have arbitrary precision, so a value like 10**21 is a valid int even though it is too large to be held as a 64-bit integer. NumPy has no native integer dtype for such a value, so it falls back to the object dtype for the array. This, in turn, does not support ufuncs like np.log:
> np.log([10**3])
array([ 6.90775528])
> np.log([10**30])
AttributeError: 'int' object has no attribute 'log'
One easy solution here is to make sure that numpy converts n_pts, the array with the large numbers, into a dtype it can actually use, like float:
import matplotlib.pyplot as plt
import numpy as np
n_pts = np.array([10**21,10**19,10**23,10**11,10**15,10**14,10**17,10**6], dtype='float')
KT_pts = [10000,100,1000,0.05,2,0.1,0.2,0.01]
n_set = np.logspace(6,25)
debye_set = 7.43*np.logspace(-1,-7,10)
def energy(n,debye):
    return n*(debye/7430)**2
fig,ax=plt.subplots()
ax.scatter(n_pts,KT_pts)
for debye in debye_set:
    ax.loglog(n_set,energy(n_set,debye))
plt.show()
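The dtype fallback described above can be checked directly, without any plotting. This is a minimal sketch; the variable names are illustrative, and the exact exception type raised by np.log on an object array depends on the NumPy version, so both are caught.

```python
import numpy as np

small = np.array([10**3])                    # fits in a native integer dtype
big = np.array([10**21])                     # too large for int64: object dtype
big_float = np.array([10**21], dtype=float)  # explicit conversion to float64

print(small.dtype, big.dtype, big_float.dtype)

try:
    np.log(big)
    log_failed = False
except (AttributeError, TypeError) as exc:
    # Older NumPy raises AttributeError, newer versions raise TypeError
    log_failed = True
    print(type(exc).__name__)

# The float array works with ufuncs as usual
print(np.log(big_float))
```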
Trying to evaluate scipy's multivariate_normal.pdf function, but keep getting errors. MWE:
import numpy as np
from scipy.stats import multivariate_normal as mvnorm
x = np.random.rand(5)
mvnorm.pdf(x)
gives
TypeError: pdf() takes at least 4 arguments (2 given)
The docs say both the mean and cov arguments are optional, and that the last axis of x labels the components. Since x.shape = (5L,), it seems like all is kosher. I am expecting a single number as output.
It looks like these parameters aren't optional.
If I pass the default values for mean and cov like:
import numpy as np
from scipy.stats import multivariate_normal as mvnorm
x = np.random.rand(5)
mvnorm.pdf(x, mean=0, cov=1)
I get the following output:
array([ 0.35082878, 0.27012396, 0.26986049, 0.39887847, 0.36116341])
While using:
import numpy as np
from scipy.stats import multivariate_normal as mvnorm
x = np.random.rand(5)
mvnorm.pdf(x)
gives me the same error:
TypeError: pdf() takes at least 4 arguments (2 given)
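The array output above is consistent with each entry of x being treated as its own one-dimensional point when mean and cov are scalars. The pure-Python sketch below reproduces that behaviour with the standard normal density and then shows how a single number arises if x is instead read as one five-dimensional point with zero mean and identity covariance. The x values are made up, and this is an illustration of the formulas rather than SciPy's code.

```python
import math


def std_normal_pdf(value):
    """Density of the standard normal distribution at a single point."""
    return math.exp(-0.5 * value * value) / math.sqrt(2.0 * math.pi)


x = [0.1, 0.5, 0.9, 0.0, 0.3]  # made-up stand-in for np.random.rand(5)

# Scalar mean and cov: every entry is a separate one-dimensional point
per_point = [std_normal_pdf(v) for v in x]

# One five-dimensional point, zero mean, identity covariance:
# the joint density is the product of the per-coordinate densities
joint = math.prod(per_point)

print(per_point)
print(joint)
```

With SciPy itself, passing mean=np.zeros(5) and cov=np.eye(5) should give the single joint value, assuming x is meant as one five-dimensional point.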