My name is Luis Francisco Gomez and I am in the course Intermediate Python > 1 Matplotlib > Sizes that belongs to the Data Scientist with Python in DataCamp. I am reproducing the exercises of the course where in this part you have to make a scatter plot in which the size of the points are equivalent to the population of the countries. I try to reproduce the results of DataCamp with this code:
# load subpackage
import matplotlib.pyplot as plt
## load other libraries
import pandas as pd
import numpy as np
## import data
gapminder = pd.read_csv("https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv")
gdp_cap = gapminder["gdp_cap"].tolist()
life_exp = gapminder["life_exp"].tolist()
# create an np array that contains the population
pop = gapminder["population"].tolist()
pop_np = np.array(pop)
plt.scatter(gdp_cap, life_exp, s = pop_np*2)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
# Display the plot
plt.show()
However a get this:
But in theory you need to get this:
I don't understand what is the problem with the argument s in plt.scatter .
You need to scale your s,
plt.scatter(gdp_cap, life_exp, s = pop_np*2/1000000)
The marker size in points**2.
Per docs
This is because your sizes are too large, scale it down. Also, there's no need to create all the intermediate arrays:
plt.scatter(gapminder.gdp_cap,
gapminder.life_exp,
s=gapminder.population/1e6)
Output:
I think you should use
plt.scatter(gdp_cap, life_exp, s = gdp_cap*2)
or maybe reduce or scale pop_np
I am aware that there are a few questions about a similar subject, although I couldn't find a proper answer.
I would like to fit some data with a function (called Bastenaire) and iget the parameters values. Here is the code:
import numpy as np
from matplotlib import pyplot as plt
from scipy import optimize
def bastenaire(s, A,B, C,sd):
logNB=np.log(A)-C*(s-sd)-np.log(s-sd)
return np.exp(logNB)-B
S=np.array([659,646,634,623,613,595,580,565,551,535,515,493,473,452,432,413,394,374,355,345])
N=np.array([46963,52934,59975,65522,74241,87237,101977,116751,133665,157067,189426,233260,281321,355558,428815,522582,630257,768067,902506,1017280])
fitmb,fitmob=optimize.curve_fit(bastenaire,S,N,p0=(30000,2000000000,0.2,250))
plt.scatter(N,S)
plt.plot(bastenaire(S,*fitmb),S,label='bastenaire')
plt.legend()
plt.show()
However, the curve fit cannot identify the correct parameters and I get: OptimizeWarning: Covariance of the parameters could not be estimated.
Same results when I give no input parameters values.
Figure
Is there any way to tweak something and get results? Should my dataset cover a wider range and values?
Thank you!
Broc
Fitting is tough, you need to restrain the parameter space using bounds and (often) check a bit your initial values.
To make it work, I search for an initial value where the function had the correct look, then estimated some constraints:
bounds = np.array([(1e4, 1e12), (-np.inf, np.inf), (1e-20, 1e-2), (-2000., 20000)]).T
fitmb, fitmob = optimize.curve_fit(bastenaire,S, N,p0=(1e7,-100.,1e-5,250.), bounds=bounds)
returns
(array([ 1.00000000e+10, 1.03174824e+04, 7.53169772e-03, -7.32901325e+01]), array([[ 2.24128391e-06, 6.17858390e+00, -1.44693602e-07,
-5.72040842e-03],
[ 6.17858390e+00, 1.70326029e+07, -3.98881486e-01,
-1.57696515e+04],
[-1.44693602e-07, -3.98881486e-01, 1.14650323e-08,
4.68707940e-04],
[-5.72040842e-03, -1.57696515e+04, 4.68707940e-04,
1.93358414e+01]]))
I need the values of the autocorrelation coefficients coming from the autocorrelation_plot(). The problem is that the output coming from this function is not accessible, so I need another function to get such values. That's why I used acf() from statsmodels but it didn't get the same plot as autocorrelation_plot() does. Here is my code:
from statsmodels.tsa.stattools import acf
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt
import numpy as np
y = np.sin(np.arange(1,6*np.pi,0.1))
plt.plot(acf(y))
plt.show()
So the result is not the same as this:
autocorrelation_plot(y)
plt.show()
This seems to be related to the nlags parameter of acf:
nlags: int, optional
Number of lags to return autocorrelation for.
I don't know what exactly this does but in the source of acf there is a slicing
that shortens the array:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
acf = avf[:nlags + 1] / avf[0]
If you use statsmodels.tsa.stattools.acovf directly the result is the same as with autocorrelation_plot:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
So you can call it like
plt.plot(acf(y, nlags=len(y)))
to make it work.
An explanation of lag: https://math.stackexchange.com/questions/2548314/what-is-lag-in-a-time-series/2548350
I want to transform data
[0.54667, 0.471447, 0.826591, 0.330514, 0.7263, 0.496063, 0.520698, 0.321594, 0.351358, 0.894333]
to distribution
'dgamma(a=0.91, loc=0.48, scale=0.15)'
How to do in python
First of all, you don't need to generate an distribution object in advance. All you need would be the distribution params using codes below.
from scipy.stats import gamma
import numpy as np
data = [1,2,3,4,5] # your data
fit_alpha, fit_loc, fit_beta = gamma.fit(np.array(data), floc=0, fscale=1)
Then, you can use scipy.stats.gamma funtions to get PDF/CDF/ec. Like:
print(gamma.pdf(0.9, fit_alpha))
Check out the documentations to find the useful calls.
I had x,y,height vars to build a contour in python.
I created a Triangulation grid using
x,y,height and traing are numpy arrays
tri = Tri.Triangulation(x, y, triang)
then i did a contour using tricontourf
tricontourf(tri,height)
how can i get the output of the tricontourf into a numpy array. I can display the image using pyplot but I dont want to.
when I tried this:
triout = tricontourf(tri,height)
print triout
I got:
<matplotlib.tri.tricontour.TriContourSet instance at 0xa9ab66c>
I need to get the image data and if I could get numpy array its easy for me.
Is it possible to do this?
if its not possible can I do what tricontourf does without matplotlib in python?
You should try this :
cs = tricontourf(tri,height)
for collection in cs.collections:
for path in collection.get_paths():
print path.to_polygons()
as I learned on:
https://github.com/matplotlib/matplotlib/issues/367
(it is better to use path.to_polygons() )