Geopandas plotting by specifying column at plot time - python

I am reading GeoJSON data from here into a GeoDataFrame named gdf.
I have also calculated the centroids of each polygon using gdf['centroid'] = gdf.centroid.
I can individually plot either the centroids or the polygons by setting the column as the geometry column using gdf.set_geometry("<centroid | geometry>"). So, the following code works:
gdf.plot() #By default the geometry column is the column to plot
gdf = gdf.set_geometry("centroid")
gdf.plot()
However, when I try to run the following code:
gdf['geometry'].plot() #Geometry column has been set as centroid before
Or,
gdf = gdf.set_geometry("geometry")
gdf["centroid"].plot()
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-22-0330c435e2c9> in <module>
1 gdf = gdf.set_geometry("centroid")
----> 2 ax = gdf['geometry'].plot()
3 #gdf["centroid"].plot(ax=ax, color="black")
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\plotting\_core.py in __call__(self, *args, **kwargs)
953 data.columns = label_name
954
--> 955 return plot_backend.plot(data, kind=kind, **kwargs)
956
957 __call__.__doc__ = __doc__
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\plotting\_matplotlib\__init__.py in plot(data, kind, **kwargs)
59 kwargs["ax"] = getattr(ax, "left_ax", ax)
60 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 61 plot_obj.generate()
62 plot_obj.draw()
63 return plot_obj.result
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\plotting\_matplotlib\core.py in generate(self)
276 def generate(self):
277 self._args_adjust()
--> 278 self._compute_plot_data()
279 self._setup_subplots()
280 self._make_plot()
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\plotting\_matplotlib\core.py in _compute_plot_data(self)
439 # no non-numeric frames or series allowed
440 if is_empty:
--> 441 raise TypeError("no numeric data to plot")
442
443 self.data = numeric_data.apply(self._convert_to_ndarray)
TypeError: no numeric data to plot
Even though I can set the column as the geometry and then do the plotting, plotting by specifying the particular column at plot time is needed to overlay multiple geometries.
--FULL CODE--
import geopandas
import geoplot
gdf = geopandas.read_file("<path to file>.geojson")
print(gdf.head())
print(gdf.crs)
gdf.plot(legend=True)
gdf['centroid'] = gdf.centroid
gdf = gdf.set_geometry("centroid")
gdf.plot() #Works
gdf['centroid'].plot() #Works
gdf['geometry'].plot() #Error is thrown here
type(gdf)

Can you try it without the line import geoplot? It seems to be working fine for me.
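For the overlay itself, one approach is to draw both layers onto a shared Matplotlib axes. The traceback goes through pandas' plotting code, which suggests that in that environment gdf['geometry'] came back as a plain pandas Series rather than a GeoSeries. Wrapping the column in geopandas.GeoSeries before plotting is one way to sidestep that. This is a minimal sketch with two made-up squares standing in for the GeoJSON file; the polygons, colors, and the name centroids are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import geopandas
from shapely.geometry import Polygon

# Two invented squares in place of the real GeoJSON data.
gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(3, 0), (5, 0), (5, 2), (3, 2)]),
    ],
)
gdf["centroid"] = gdf.centroid

# Wrap the column explicitly so the geopandas plot method is used,
# whichever column is currently the active geometry.
centroids = geopandas.GeoSeries(gdf["centroid"])

fig, ax = plt.subplots()
gdf.plot(ax=ax, color="lightgrey", edgecolor="black")  # polygons (active geometry)
centroids.plot(ax=ax, color="black", markersize=10)    # centroids on the same axes
```

Passing the same ax to both calls is what puts the two layers in one figure; the active geometry column does not need to be switched between the calls.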

Related

Python: why do I get an error when I try to interpolate an xarray between dates?

I am trying to interpolate the values of an xarray called pop
pop
I am using the function xarray.interp
dates = pd.date_range('1990-01-01', '2020-01-01', freq='1Y')
popI = pop.interp(time=dates, kwargs={"fill_value": "extrapolate"})
but I get the following error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-1393bc257da7> in <module>
----> 1 popI = pop.interp(time=dates, kwargs={"fill_value": "extrapolate"})
/usr/lib/python3/dist-packages/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, **coords_kwargs)
3163 if method in ["linear", "nearest"]:
3164 for k, v in validated_indexers.items():
-> 3165 obj, newidx = missing._localize(obj, {k: v})
3166 validated_indexers[k] = newidx[k]
3167
/usr/lib/python3/dist-packages/xarray/core/missing.py in _localize(var, indexes_coords)
561 indexes = {}
562 for dim, [x, new_x] in indexes_coords.items():
--> 563 minval = np.nanmin(new_x.values)
564 maxval = np.nanmax(new_x.values)
565 index = x.to_index()
<__array_function__ internals> in nanmin(*args, **kwargs)
/usr/lib/python3/dist-packages/numpy/lib/nanfunctions.py in nanmin(a, axis, out, keepdims)
319 # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
320 res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
--> 321 if np.isnan(res).any():
322 warnings.warn("All-NaN slice encountered", RuntimeWarning,
323 stacklevel=3)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
You're calling interp on a Dataset, which always applies the function to all data variables. One of your data variables, mollewide, is a string array, which can't be interpolated. So you can either set it as a coordinate:
popI = pop.set_coords('mollewide').interp(time=dates, kwargs={"fill_value": "extrapolate"})
or you can only operate on the popDensity data variable:
popI = pop["popDensity"].interp(time=dates, kwargs={"fill_value": "extrapolate"})
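To find out which data variables would trip up a whole-Dataset operation, one option is to check the dtypes first. The sketch below uses a plain dict of NumPy arrays as a stand-in for the Dataset's data variables. The array contents are invented; only the names popDensity and mollewide come from the question, and is_numeric is a hypothetical helper, not an xarray function.

```python
import numpy as np

# Stand-in for the Dataset's data variables; the values are invented.
data_vars = {
    "popDensity": np.array([10.5, 11.2, 12.8]),
    "mollewide": np.array(["proj-a", "proj-a", "proj-a"]),
}

def is_numeric(values):
    """Return True if the array has a numeric dtype (hypothetical helper)."""
    return bool(np.issubdtype(values.dtype, np.number))

numeric_names = [name for name, values in data_vars.items() if is_numeric(values)]
non_numeric_names = [name for name in data_vars if name not in numeric_names]

print("can interpolate:", numeric_names)
print("set as coordinate or leave out:", non_numeric_names)
```

With a real Dataset, the same dtype check could be applied to each data variable before calling interp.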

Healpy mollview() ValueError for colormap

I need to learn how to use Healpy and so I was trying to reproduce the results of the basic tutorial. I use Anaconda on Ubuntu 22.04 and I think I have all the pre-requisites (I have Python 3.9.13, Numpy, Matplotlib, Astropy, python3-dev and python-dev-is-python3 installed).
I have tried many variations of what is shown in the tutorial notebook, including a literal copy and paste of the code. I have tried this in IPython in a terminal, in a Jupyter notebook, and in Spyder, both with and without %matplotlib inline (after importing matplotlib). In every case I end up with the exact same error message (the full error message is at the end of the post):
ValueError: Passing a Normalize instance simultaneously with vmin/vmax
is not supported. Please pass vmin/vmax directly to the norm when
creating it.
Everything works except for the plot. I've tried setting min and max in the hp.mollview() command according to the documentation, but that didn't work either. It seems like a bug to me, so I thought about opening an issue on GitHub, but the tutorial is very up to date and I don't think this kind of bug would go unnoticed, so I suspect I missed some minor detail and I hope someone here can help me identify what it is. In the meantime, I'll probably try to learn some other version of Healpix.
Here is the full error message when I run the code in a jupyter notebook (by the way, sorry if my question is not very well organized, this is my first post):
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [5], line 2
1 m = np.arange(NPIX)
----> 2 hp.mollview(m, title="Mollview image RING")
3 hp.graticule()
File ~/anaconda3/lib/python3.9/site-packages/healpy/visufunc.py:250,
in mollview(map, fig, rot, coord, unit, xsize, title, nest, min, max,
flip, remove_dip, remove_mono, gal_cut, format, format2, cbar, cmap,
badcolor, bgcolor, notext, norm, hold, reuse_axes, margins, sub,
nlocs, return_projected_map)
246 elif remove_mono:
247 map = pixelfunc.remove_monopole(
248 map, gal_cut=gal_cut, nest=nest, copy=True, verbose=True
249 )
--> 250 img = ax.projmap(
251 map,
252 nest=nest,
253 xsize=xsize,
254 coord=coord,
255 vmin=min,
256 vmax=max,
257 cmap=cmap,
258 badcolor=badcolor,
259 bgcolor=bgcolor,
260 norm=norm,
261 )
262 if cbar:
263 im = ax.get_images()[0]
File ~/anaconda3/lib/python3.9/site-packages/healpy/projaxes.py:736,
in HpxMollweideAxes.projmap(self, map, nest, **kwds)
734 nside = pixelfunc.npix2nside(pixelfunc.get_map_size(map))
735 f = lambda x, y, z: pixelfunc.vec2pix(nside, x, y, z, nest=nest)
--> 736 return super(HpxMollweideAxes, self).projmap(map, f, **kwds)
File ~/anaconda3/lib/python3.9/site-packages/healpy/projaxes.py:726,
in MollweideAxes.projmap(self, map, vec2pix_func, xsize, **kwds)
724 def projmap(self, map, vec2pix_func, xsize=800, **kwds):
725 self.proj.set_proj_plane_info(xsize=xsize)
--> 726 img = super(MollweideAxes, self).projmap(map, vec2pix_func, **kwds)
727 self.set_xlim(-2.01, 2.01)
728 self.set_ylim(-1.01, 1.01)
File ~/anaconda3/lib/python3.9/site-packages/healpy/projaxes.py:202,
in SphericalProjAxes.projmap(self, map, vec2pix_func, vmin, vmax,
badval, badcolor, bgcolor, cmap, norm, rot, coord, **kwds)
200 ext = self.proj.get_extent()
201 img = np.ma.masked_values(img, badval)
--> 202 aximg = self.imshow(
203 img,
204 extent=ext,
205 cmap=cm,
206 norm=nn,
207 interpolation="nearest",
208 origin="lower",
209 vmin=vmin,
210 vmax=vmax,
211 **kwds
212 )
213 xmin, xmax, ymin, ymax = self.proj.get_extent()
214 self.set_xlim(xmin, xmax)
File ~/anaconda3/lib/python3.9/site-packages/matplotlib/_api/deprecation.py:454, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
448 if len(args) > name_idx:
449 warn_deprecated(
450 since, message="Passing the %(name)s %(obj_type)s "
451 "positionally is deprecated since Matplotlib %(since)s; the "
452 "parameter will become keyword-only %(removal)s.",
453 name=name, obj_type=f"parameter of {func.__name__}()")
--> 454 return func(*args, **kwargs)
File ~/anaconda3/lib/python3.9/site-packages/matplotlib/__init__.py:1423, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
1420 @functools.wraps(func)
1421 def inner(ax, *args, data=None, **kwargs):
1422 if data is None:
-> 1423 return func(ax, *map(sanitize_sequence, args), **kwargs)
1425 bound = new_sig.bind(ax, *args, **kwargs)
1426 auto_label = (bound.arguments.get(label_namer)
1427 or bound.kwargs.get(label_namer))
File ~/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_axes.py:5577, in Axes.imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, interpolation_stage, filternorm, filterrad, resample, url, **kwargs)
5574 if im.get_clip_path() is None:
5575 # image does not already have clipping set, clip to axes patch
5576 im.set_clip_path(self.patch)
-> 5577 im._scale_norm(norm, vmin, vmax)
5578 im.set_url(url)
5580 # update ax.dataLim, and, if autoscaling, set viewLim
5581 # to tightly fit the image, regardless of dataLim.
File ~/anaconda3/lib/python3.9/site-packages/matplotlib/cm.py:405, in
ScalarMappable._scale_norm(self, norm, vmin, vmax)
403 self.set_clim(vmin, vmax)
404 if isinstance(norm, colors.Normalize):
--> 405 raise ValueError(
406 "Passing a Normalize instance simultaneously with "
407 "vmin/vmax is not supported. Please pass vmin/vmax "
408 "directly to the norm when creating it.")
410 # always resolve the autoscaling so we have concrete limits
411 # rather than deferring to draw time.
412 self.autoscale_None()
ValueError: Passing a Normalize instance simultaneously with vmin/vmax
is not supported. Please pass vmin/vmax directly to the norm when
creating it.
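The last frames of the traceback show healpy's projmap forwarding norm together with vmin and vmax to Matplotlib's imshow, and Matplotlib rejecting that combination. A minimal sketch in plain Matplotlib of the form the message asks for, with the limits set on the Normalize object itself, might look like this. The array and the limits are made up, and this does not show how to change what mollview passes internally.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Invented 3x4 image in place of a projected HEALPix map.
data = np.arange(12).reshape(3, 4)

fig, ax = plt.subplots()

# The limits live on the norm; vmin/vmax are not passed to imshow as well.
norm = Normalize(vmin=0, vmax=11)
image = ax.imshow(data, norm=norm, origin="lower", interpolation="nearest")

print(image.norm.vmin, image.norm.vmax)
```

Since the conflicting call sits inside healpy rather than in the tutorial code, the installed healpy and Matplotlib versions may simply not match each other; the post does not establish which versions are involved.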

Annotate csv column in scatter plot

I have two datasets in CSV format:
df2
type prediction 100000 155000
0 0 2.60994 3.40305
1 1 10.82100 34.68900
0 0 4.29470 3.74023
0 0 7.81339 9.92839
0 0 28.37480 33.58000
df
TIMESTEP id type y z v_acc
100000 8054 1 -0.317192 -0.315662 15.54430
100000 669 0 0.352031 -0.008087 2.60994
100000 520 0 0.437786 0.000325 5.28670
100000 2303 1 0.263105 0.132615 7.81339
105000 8055 1 0.113863 0.036407 5.94311
I am trying to match the values of df2['100000'] to df['v_acc']. Where a value matches, I make a scatter plot from df using columns y and z, and then I want to annotate the matching scatter point with the matched value.
What I want is:
(I want all annotations in the same plot.)
I tried to code this in Python, but instead of all annotation points in a single plot I get multiple plots, each with a single annotation.
I am also getting this error:
TypeError Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/IPython/core/formatters.py:339, in BaseFormatter.__call__(self, obj)
337 pass
338 else:
--> 339 return printer(obj)
340 # Finally look for special method names
341 method = get_real_method(obj, self.print_method)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/IPython/core/pylabtools.py:151, in print_figure(fig, fmt, bbox_inches, base64, **kwargs)
148 from matplotlib.backend_bases import FigureCanvasBase
149 FigureCanvasBase(fig)
--> 151 fig.canvas.print_figure(bytes_io, **kw)
152 data = bytes_io.getvalue()
153 if fmt == 'svg':
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/matplotlib/backend_bases.py:2295, in FigureCanvasBase.print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, pad_inches, bbox_extra_artists, backend, **kwargs)
2289 renderer = _get_renderer(
2290 self.figure,
2291 functools.partial(
2292 print_method, orientation=orientation)
2293 )
2294 with getattr(renderer, "_draw_disabled", nullcontext)():
-> 2295 self.figure.draw(renderer)
2297 if bbox_inches:
...
189 if len(self) == 1:
190 return converter(self.iloc[0])
--> 191 raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
Can I get some help to make a plot as I want?
Thank you.
My code is here:
df2 = pd.read_csv('./result.csv')
print(df2.columns)
#print(df2.head(10))
df = pd.read_csv('./main.csv')
df = df[df['TIMESTEP'] == 100000]
for i in df['v_acc']:
    for j in df2['100000']:
        # sometimes numbers are long and differ after the decimals, so match on 0.2f only
        if "{0:0.2f}".format(i) == "{0:0.2f}".format(j):
            plt.figure(figsize = (10,8))
            sns.scatterplot(data = df, x = "y", y = "z", hue = "type", palette=['red','dodgerblue'], legend='full')
            plt.annotate(i, (df['y'][df['v_acc'] == i], df['z'][df['v_acc'] == i]))
            plt.grid(False)
            plt.show()
            break
The reason for the multiple plots is that you are calling plt.figure() inside the loop, which creates a new figure on every iteration. You need to create the figure once, outside the loop, and keep only the individual scatter and annotate calls inside it. Here is the updated code, which ran for the data you provided. Other than that, I think your code is fine...
fig, ax=plt.subplots(figsize = (7,7)) ### Keep this before the loop and call it as subplot
for i in df['v_acc']:
    for j in df2[100000]:
        # sometimes numbers are long and differ after the decimals, so match on 0.2f only
        if "{0:0.2f}".format(i) == "{0:0.2f}".format(j):
            #plt.figure(figsize = (10,8))
            ax=sns.scatterplot(data = df, x = "y", y = "z", hue = "type", palette=['red','dodgerblue'], legend='full')
            ax.annotate(i, (df['y'][df['v_acc'] == i], df['z'][df['v_acc'] == i]))
            break
plt.grid(False) ### Keep these two after the loop, just one show for one plot
plt.show()
Output plot
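The TypeError in the question is raised when annotate receives a pandas Series for each coordinate and the boolean mask selects anything other than exactly one row, as the last traceback lines show. A possible variation, sketched here with plain Matplotlib instead of seaborn, is to find the matching rows first, draw the scatter once, and annotate each matching row with scalar coordinates. The data is copied from the sample rows in the question; the names wanted and matched are introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the question's sample data (TIMESTEP == 100000 only).
df = pd.DataFrame({
    "type": [1, 0, 0, 1],
    "y": [-0.317192, 0.352031, 0.437786, 0.263105],
    "z": [-0.315662, -0.008087, 0.000325, 0.132615],
    "v_acc": [15.54430, 2.60994, 5.28670, 7.81339],
})
df2 = pd.DataFrame({"100000": [2.60994, 10.82100, 4.29470, 7.81339, 28.37480]})

# Compare values formatted to two decimals, as the original loop does.
wanted = set(df2["100000"].map("{:.2f}".format))
matched = df[df["v_acc"].map("{:.2f}".format).isin(wanted)]

fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(df["y"], df["z"], c=df["type"], cmap="coolwarm")  # one scatter, drawn once

# One annotation per matching row, using scalar coordinates.
for row in matched.itertuples(index=False):
    ax.annotate(f"{row.v_acc:.2f}", (row.y, row.z))

ax.grid(False)
```

Because each annotation gets a single (y, z) pair, duplicate v_acc values no longer matter, and the scatter is not redrawn for every match.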

How to plot a windrose when the wind direction is a categorical value

Using the Australia Rainfall dataset, I'm trying to predict RainTomorrow. Here is my code:
Downloading dataset directly from Kaggle using opendatasets library
import opendatasets as od
dataset_url = 'https://www.kaggle.com/jsphyg/weather-dataset-rattle-package'
od.download(dataset_url)
Importing necessary libraries
import os
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10,6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Loading Dataset
data_dir = './weather-dataset-rattle-package'
os.listdir(data_dir)
train_csv = data_dir + '/weatherAUS.csv'
raw_df = pd.read_csv(train_csv)
Explore WindGustDir variable
print('WindGustDir contains', len(raw_df['WindGustDir'].unique()), 'labels')
raw_df['WindGustDir'].unique()
raw_df.WindGustDir.value_counts()
pd.get_dummies(raw_df.WindGustDir, drop_first=True, dummy_na=True).head()
pd.get_dummies(raw_df.WindGustDir, drop_first=True, dummy_na=True).sum(axis=0)
Plotting Windrose
from windrose import WindroseAxes
ax = WindroseAxes.from_ax()
ax.bar(raw_df.WindGustDir, raw_df.Rainfall, normed=True, opening=0.8,
edgecolor='white')
ax.set_legend()
I am unable to figure out which column I should use with WindGustDir, or whether there is another way to compare RainTomorrow and WindGustDir.
Error Message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
e:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
57 try:
---> 58 return bound(*args, **kwds)
59 except TypeError:
TypeError: '<' not supported between instances of 'float' and 'str'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-253-1a1f0fa6bf7a> in <module>
1 ax = WindroseAxes.from_ax()
----> 2 ax.bar(direction=df.WindGustDir, var=df.Rainfall, normed=True, opening=0.8, edgecolor='white')
3 ax.set_legend()
e:\Anaconda3\lib\site-packages\windrose\windrose.py in bar(self, direction, var, **kwargs)
547 """
548
--> 549 bins, nbins, nsector, colors, angles, kwargs = self._init_plot(
550 direction, var, **kwargs
551 )
e:\Anaconda3\lib\site-packages\windrose\windrose.py in _init_plot(self, direction, var, **kwargs)
359
360 # Set the global information dictionnary
--> 361 self._info["dir"], self._info["bins"], self._info["table"] = histogram(
362 direction, var, bins, nsector, normed, blowto
363 )
e:\Anaconda3\lib\site-packages\windrose\windrose.py in histogram(direction, var, bins, nsector, normed, blowto)
746 direction[direction >= 360.] = direction[direction >= 360.] - 360
747
--> 748 table = histogram2d(x=var, y=direction, bins=[var_bins, dir_bins], normed=False)[0]
749 # add the last value to the first to have the table of North winds
750 table[:, 0] = table[:, 0] + table[:, -1]
<__array_function__ internals> in histogram2d(*args, **kwargs)
e:\Anaconda3\lib\site-packages\numpy\lib\twodim_base.py in histogram2d(x, y, bins, range, normed, weights, density)
742 xedges = yedges = asarray(bins)
743 bins = [xedges, yedges]
--> 744 hist, edges = histogramdd([x, y], bins, range, normed, weights, density)
745 return hist, edges[0], edges[1]
746
<__array_function__ internals> in histogramdd(*args, **kwargs)
e:\Anaconda3\lib\site-packages\numpy\lib\histograms.py in histogramdd(sample, bins, range, normed, weights, density)
1071
1072 # Compute the bin number each sample falls into.
-> 1073 Ncount = tuple(
1074 # avoid np.digitize to work around gh-11022
1075 np.searchsorted(edges[i], sample[:, i], side='right')
e:\Anaconda3\lib\site-packages\numpy\lib\histograms.py in <genexpr>(.0)
1073 Ncount = tuple(
1074 # avoid np.digitize to work around gh-11022
-> 1075 np.searchsorted(edges[i], sample[:, i], side='right')
1076 for i in _range(D)
1077 )
<__array_function__ internals> in searchsorted(*args, **kwargs)
e:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in searchsorted(a, v, side, sorter)
1346
1347 """
-> 1348 return _wrapfunc(a, 'searchsorted', v, side=side, sorter=sorter)
1349
1350
e:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
65 # Call _wrapit from within the except clause to ensure a potential
66 # exception has a traceback chain.
---> 67 return _wrapit(obj, method, *args, **kwds)
68
69
e:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in _wrapit(obj, method, *args, **kwds)
42 except AttributeError:
43 wrap = None
---> 44 result = getattr(asarray(obj), method)(*args, **kwds)
45 if wrap:
46 if not isinstance(result, mu.ndarray):
TypeError: '<' not supported between instances of 'float' and 'str'
It seems that the direction parameter must be numeric.
Create a dict where each key is a direction in 'WindGustDir' and the corresponding value is a float in degrees.
.map the dict to df.WindGustDir and plot
Alternatively, create and plot a new column
df.insert(loc=8, column='WindGustDirDeg', value=df.WindGustDir.map(wind_dir_deg))
import pandas as pd
from windrose import WindroseAxes
import numpy as np
# load the downloaded data and dropna
df = pd.read_csv('weatherAUS/weatherAUS.csv').dropna(subset=['WindGustDir'])
# create a dict for WindGustDir to numeric values
wind_dir = ['E', 'ENE', 'NE', 'NNE', 'N', 'NNW', 'NW', 'WNW', 'W', 'WSW', 'SW', 'SSW', 'S', 'SSE', 'SE', 'ESE']
degrees = np.arange(0, 360, 22.5)
wind_dir_deg = dict((zip(wind_dir, degrees)))
# plot and map WindGustDir to the dict
ax = WindroseAxes.from_ax()
ax.bar(direction=df.WindGustDir.map(wind_dir_deg), var=df.Rainfall, normed=True, opening=0.8, edgecolor='white')
ax.set_legend()
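The dict above starts at east and increases counter-clockwise. If the plotting code expects compass bearings instead (0 degrees at north, increasing clockwise, which is my understanding of how windrose reads direction, though that is worth checking against its documentation), the mapping could be built as follows. This is a plain-Python sketch; compass_points and sample are names introduced here for illustration.

```python
# Sixteen compass points, clockwise from north, 22.5 degrees apart.
compass_points = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]
wind_dir_deg = {name: index * 22.5 for index, name in enumerate(compass_points)}

# A duplicated or misspelled label would silently shrink the mapping.
assert len(wind_dir_deg) == 16

# Unknown or missing labels come back as None instead of raising.
sample = ["W", "WNW", "N", "SSE", None]
degrees = [wind_dir_deg.get(label) for label in sample]
print(degrees)
```

The length check is there because a dict built from a list with a repeated label keeps only the last value for that label, and the missing direction then maps to NaN when passed through .map.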

Why is statsmodels throwing an IndexError when I try to fit a linear mixed-effect model?

Given the code:
import statsmodels.api as sm
import statsmodels.formula.api as smf
df.reset_index(drop=True, inplace=True)
display(df.describe())
md = smf.mixedlm("c ~ iscorr", df, groups=df.subnum)
mdf = md.fit()
Where df is a pandas.DataFrame, I get the following error out of smf.mixedlm:
IndexError Traceback (most recent call last)
<ipython-input-34-5373fe9b774a> in <module>()
4 df.reset_index(drop=True, inplace=True)
5 display(df.describe())
----> 6 md = smf.mixedlm("c ~ iscorr", df, groups=df.subnum)
7 # mdf = md.fit()
/home/lthibault/.pyenv/versions/3.5.0/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py in from_formula(cls, formula, data, re_formula, subset, *args, **kwargs)
651 subset=None,
652 exog_re=exog_re,
--> 653 *args, **kwargs)
654
655 # expand re names to account for pairs of RE
/home/lthibault/.pyenv/versions/3.5.0/lib/python3.5/site-packages/statsmodels/base/model.py in from_formula(cls, formula, data, subset, *args, **kwargs)
148 kwargs.update({'missing_idx': missing_idx,
149 'missing': missing})
--> 150 mod = cls(endog, exog, *args, **kwargs)
151 mod.formula = formula
152
/home/lthibault/.pyenv/versions/3.5.0/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py in __init__(self, endog, exog, groups, exog_re, use_sqrt, missing, **kwargs)
537
538 # Split the data by groups
--> 539 self.endog_li = self.group_list(self.endog)
540 self.exog_li = self.group_list(self.exog)
541 self.exog_re_li = self.group_list(self.exog_re)
/home/lthibault/.pyenv/versions/3.5.0/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py in group_list(self, array)
671 if array.ndim == 1:
672 return [np.array(array[self.row_indices[k]])
--> 673 for k in self.group_labels]
674 else:
675 return [np.array(array[self.row_indices[k], :])
/home/lthibault/.pyenv/versions/3.5.0/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py in <listcomp>(.0)
671 if array.ndim == 1:
672 return [np.array(array[self.row_indices[k]])
--> 673 for k in self.group_labels]
674 else:
675 return [np.array(array[self.row_indices[k], :])
IndexError: index 7214 is out of bounds for axis 1 with size 7214
Why is this error occurring? len(df) reports that there are 7296 rows, so there should be no issue indexing the 7214th, and the explicit re-indexing ensures that the indices span from zero to 7295.
You may download df here to fiddle around with it if you'd like.
You have 82 null values in iscorr:
>>> df.iscorr.isnull().sum()
82
Drop them and you will be fine:
df = df[df.iscorr.notnull()]
Per the function's docstring:
Notes
------
`data` must define __getitem__ with the keys in the formula
terms args and kwargs are passed on to the model
instantiation. E.g., a numpy structured or rec array, a
dictionary, or a pandas DataFrame.
If `re_formula` is not provided, the default is a random
intercept for each group.
This method currently does not correctly handle missing
values, so missing values should be explicitly dropped from
the DataFrame before calling this method.
"""
Output:
>>> mdf.params
Intercept 0.032000
iscorr[T.True] 0.030670
Intercept RE -0.057462
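The numbers in the question are consistent with this: 7296 rows minus 82 nulls leaves 7214, the size reported in the IndexError. A small sketch of checking for and dropping incomplete rows before building the model, using pandas only and an invented six-row frame (only the column names c, iscorr and subnum come from the question):

```python
import numpy as np
import pandas as pd

# Invented stand-in for df; only the column names come from the question.
df = pd.DataFrame({
    "c": [0.1, 0.4, 0.2, 0.5, 0.3, 0.6],
    "iscorr": [True, False, np.nan, True, np.nan, False],
    "subnum": [1, 1, 2, 2, 3, 3],
})

model_columns = ["c", "iscorr", "subnum"]
missing_per_column = df[model_columns].isnull().sum()

# Keep only complete rows, then renumber the index from zero.
clean = df.dropna(subset=model_columns).reset_index(drop=True)

print(missing_per_column)
print(len(df), "->", len(clean))
```

The cleaned frame, and its subnum column for groups, would then be what gets passed to smf.mixedlm, so that the data and the groups have the same length.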
