Why am I getting a ValueError while doing sentiment analysis? - python

I was trying to do sentiment analysis of Amazon product reviews and plot a pie chart and a bar graph, but instead of the charts I got this error:
ValueError Traceback (most recent call last)
<ipython-input-90-2089ce8a5ab8> in <module>
----> 1 categorical_variable_summary(df,"overall")
<ipython-input-87-29535c4328ba> in categorical_variable_summary(df, column_name)
3 fig = make_subplots(rows = 1, cols = 2,
4 subplot_titles=('Countplot', 'Percentage'),
----> 5 specs=[[{'types' : 'xy'}],[{'types': 'domain'}]])
6
7 fig.add_trace(go.Bar( y = df[column_name].value_counts().values.tolist(),
/usr/local/lib/python3.7/dist-packages/plotly/subplots.py in make_subplots(rows, cols, shared_xaxes, shared_yaxes, start_cell, print_grid, horizontal_spacing, vertical_spacing, subplot_titles, column_widths, row_heights, specs, insets, column_titles, row_titles, x_title, y_title, figure, **kwargs)
448 dimensions ({rows} x {cols}).
449 Received value of type {typ}: {val}""".format(
--> 450 rows=rows, cols=cols, typ=type(specs), val=repr(specs)
451 )
452 )
ValueError:
The 'specs' argument to make_subplots must be a 2D list of dictionaries with dimensions (1 x 2).
Received value of type <class 'list'>: [[{'types': 'xy'}], [{'types': 'domain'}]]

Related

How to transform a Pandas Dataframe with irregular coordinates into a xarray Dataset

I'm working with a pandas DataFrame in Python, but to plot my data as a map I have to transform it into an xarray Dataset, since the library I'm using to plot (salem) works best with that class. The problem is that the grid of my data isn't regular, so I can't seem to create the Dataset.
My DataFrame has the latitude and longitude, as well as the value at each point:
lon lat value
0 -104.936302 -51.339233 7.908411
1 -104.827377 -51.127686 7.969049
2 -104.719154 -50.915470 8.036676
3 -104.611641 -50.702595 8.096765
4 -104.504814 -50.489056 8.163690
... ... ... ...
65995 -32.911377 15.359591 25.475702
65996 -32.957718 15.579139 25.443994
65997 -33.004040 15.798100 25.429346
65998 -33.050335 16.016472 25.408105
65999 -33.096611 16.234255 25.383844
[66000 rows x 3 columns]
To create the Dataset using lat and lon as coordinates and fill all of the missing values with NaN, I tried the following:
ds = xr.Dataset({
'ts': xr.DataArray(
data = value, # enter data here
dims = ['lon','lat'],
coords = {'lon': lon, 'lat':lat},
attrs = {
'_FillValue': np.nan,
'units' : 'K'
}
)},
attrs = {'attr': 'RegCM output'}
)
ds
But I got the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [41], in <cell line: 1>()
1 ds = xr.Dataset({
----> 2 'ts': xr.DataArray(
3 data = value, # enter data here
4 dims = ['lon','lat'],
5 coords = {'lon': lon, 'lat':lat},
6 attrs = {
7 '_FillValue': np.nan,
8 'units' : 'K'
9 }
10 )},
11 attrs = {'example_attr': 'this is a global attribute'}
12 )
14 # ds = xr.Dataset(
15 # data_vars=dict(
16 # variable=(["lon", "lat"], value)
(...)
25 # }
26 # )
27 ds
File ~\anaconda3\lib\site-packages\xarray\core\dataarray.py:406, in DataArray.__init__(self, data, coords, dims, name, attrs, indexes, fastpath)
404 data = _check_data_shape(data, coords, dims)
405 data = as_compatible_data(data)
--> 406 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
407 variable = Variable(dims, data, attrs, fastpath=True)
408 indexes = dict(
409 _extract_indexes_from_coords(coords)
410 ) # needed for to_dataset
File ~\anaconda3\lib\site-packages\xarray\core\dataarray.py:123, in _infer_coords_and_dims(shape, coords, dims)
121 dims = tuple(dims)
122 elif len(dims) != len(shape):
--> 123 raise ValueError(
124 "different number of dimensions on data "
125 f"and dims: {len(shape)} vs {len(dims)}"
126 )
127 else:
128 for d in dims:
ValueError: different number of dimensions on data and dims: 1 vs 2
I would really appreciate any insights to solve this.
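The "1 vs 2" in the error means value is a single column of 66000 numbers (one dimension), while dims names two dimensions; xarray needs exactly one name per axis of the data. Because the points are not on a regular grid, the column also can't simply be reshaped into a lon x lat array. A minimal sketch, assuming lon, lat and value are the three DataFrame columns, keeps the data along one dimension and attaches lon/lat as per-point coordinates (the dimension name "point" and the function points_dataset are hypothetical). This avoids the error, but it is still not a regular grid; whether salem can plot it directly is not something this sketch establishes.

```python
import numpy as np
import xarray as xr


def points_dataset(lon, lat, value):
    """Keep scattered data 1-D: one 'point' dimension, lon/lat as per-point coordinates."""
    lon, lat, value = (np.asarray(a) for a in (lon, lat, value))
    ts = xr.DataArray(
        value,  # shape (n,) -> exactly one dimension name below
        dims=["point"],
        coords={"lon": ("point", lon), "lat": ("point", lat)},
        attrs={"units": "K"},
    )
    return xr.Dataset({"ts": ts}, attrs={"attr": "RegCM output"})
```

With the DataFrame above this would be called as points_dataset(df["lon"], df["lat"], df["value"]).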
If you really require a rectangularly gridded dataset, you need to resample your data onto a regular grid (rasterio, pyresample, etc. provide useful functionality for that). However, if you just want to plot the data, this is not necessary!
I'm not sure about salem (I've never used it so far), but I've tried my best to simplify plotting of irregularly sampled data in EOmaps, the visualization library I'm developing!
You can get a "contour-plot"-like appearance by using a Delaunay triangulation to visualize the data:
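As a swapped-in alternative to rasterio or pyresample, scipy.interpolate.griddata can linearly interpolate the scattered points onto a regular lon/lat grid; after that the data really is 2-D, so dims=["lat", "lon"] matches its shape. A minimal sketch, assuming the DataFrame has lon, lat and value columns; the function name regrid_to_dataset and the grid sizes n_lon/n_lat are hypothetical. Grid cells outside the convex hull of the input points come out as NaN, which matches the goal of filling missing values with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr
from scipy.interpolate import griddata


def regrid_to_dataset(df, n_lon=200, n_lat=200, method="linear"):
    """Interpolate scattered (lon, lat, value) rows onto a regular grid."""
    # Regular 1-D target axes spanning the scattered points
    lon_axis = np.linspace(df["lon"].min(), df["lon"].max(), n_lon)
    lat_axis = np.linspace(df["lat"].min(), df["lat"].max(), n_lat)
    lon2d, lat2d = np.meshgrid(lon_axis, lat_axis)  # both shaped (n_lat, n_lon)

    # Points outside the convex hull get griddata's default fill_value (NaN)
    grid = griddata(
        (df["lon"].to_numpy(), df["lat"].to_numpy()),
        df["value"].to_numpy(),
        (lon2d, lat2d),
        method=method,
    )
    ts = xr.DataArray(
        grid,
        dims=["lat", "lon"],
        coords={"lat": lat_axis, "lon": lon_axis},
        attrs={"units": "K"},
    )
    return xr.Dataset({"ts": ts}, attrs={"attr": "RegCM output"})
```

Linear interpolation is the simplest choice here; the resampling tools mentioned above offer more control (nearest-neighbour radii, projections) for real model output.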
import pandas as pd
df = pd.read_csv("... path-to df.csv ...", index_col=0)
from eomaps import Maps
m = Maps()
m.add_feature.preset.coastline()
m.set_data(df, x="lon", y="lat", crs=4326, parameter="value")
m.set_shape.delaunay_triangulation()
m.plot_map()
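If EOmaps is not an option, plain matplotlib can draw a similar triangulated plot without regridding: Axes.tricontourf builds a Delaunay triangulation of the scattered (lon, lat) points internally. This is a swapped-in alternative, not part of the answer above, and it draws in plain lon/lat axes with no map projection or coastlines. The function name plot_scattered is hypothetical, and the lon/lat/value column names are assumed from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_scattered(df, levels=20):
    """Filled contours of irregularly spaced points via a Delaunay triangulation."""
    fig, ax = plt.subplots()
    # tricontourf triangulates (lon, lat) itself, so no regular grid is needed
    cs = ax.tricontourf(df["lon"], df["lat"], df["value"], levels=levels)
    fig.colorbar(cs, ax=ax, label="value")
    ax.set_xlabel("lon")
    ax.set_ylabel("lat")
    return fig, ax
```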

TypeError: Invalid shape (1,) for image data

When I tried to run my program, it didn't work as I expected. Here is the code:
warnings.filterwarnings('always')
warnings.filterwarnings('ignore')
count=0
fig,ax=plt.subplots(4,2)
fig.set_size_inches(15,15)
for i in range (4):
    for j in range (2):
        ax[i,j].imshow(x_test[prop_class[count]])
        ax[i,j].set_title("Predicted Flower : "+str(le.inverse_transform([pred_digits[prop_class[count]]]))+"\n"+"Actual Flower : "+str(le.inverse_transform(np.argmax([y_test[prop_class[count]]]))))
        plt.tight_layout()
        count+=1
I am attaching the full error as well; please let me know how I can fix this.
TypeError Traceback (most recent call last)
<ipython-input-72-17e30716b99c> in <module>
7 for i in range (4):
8 for j in range (2):
----> 9 ax[i,j].imshow(x_test[prop_class[count]])
10 ax[i,j].set_title("Predicted Flower : "+str(le.inverse_transform([pred_digits[prop_class[count]]]))+"\n"+"Actual Flower : "+str(le.inverse_transform(np.argmax([y_test[prop_class[count]]]))))
11 plt.tight_layout()
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py in set_data(self, A)
697 or self._A.ndim == 3 and self._A.shape[-1] in [3, 4]):
698 raise TypeError("Invalid shape {} for image data"
--> 699 .format(self._A.shape))
700
701 if self._A.ndim == 3:
TypeError: Invalid shape (1,) for image data
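The check in the traceback only accepts 2-D arrays, or 3-D arrays whose last axis has 3 (RGB) or 4 (RGBA) channels. A shape of (1,) therefore means x_test[prop_class[count]] evaluates to a single value rather than an image. The post doesn't show how x_test or prop_class are built, so the cause can't be pinned down here; printing x_test.shape and prop_class before the loop is one way to narrow it down. A minimal sketch of a hypothetical helper, as_displayable, that applies the same shape rule before calling imshow and also drops size-1 axes (for example a leading batch axis):

```python
import numpy as np


def as_displayable(img):
    """Return img as an array imshow can draw, or raise with the offending shape."""
    arr = np.squeeze(np.asarray(img))  # drop size-1 axes, e.g. (1, H, W, 3) -> (H, W, 3)
    # Same rule as matplotlib's check: 2-D, or 3-D with 3 (RGB) or 4 (RGBA) channels
    if arr.ndim == 2 or (arr.ndim == 3 and arr.shape[-1] in (3, 4)):
        return arr
    raise ValueError(
        f"shape {np.shape(img)} is not an image; expected (H, W), (H, W, 3) or (H, W, 4)"
    )
```

In the loop this would be used as ax[i,j].imshow(as_displayable(x_test[prop_class[count]])); a (1,) input still raises, but with a message that points at the indexing rather than at matplotlib.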

Error when trying to use seasonal_decompose from Statsmodels

I have the following dataframe called 'data':
Month        Revenue Index
1920-01-01   1.72
1920-02-01   1.83
1920-03-01   1.94
...          ...
2021-10-01   114.20
2021-11-01   115.94
2021-12-01   116.01
This is essentially a monthly revenue index on which I am trying to use seasonal_decompose with the following code:
result = seasonal_decompose(data['Revenue Index'], model='multiplicative')
But unfortunately I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-08e3139bbf77> in <module>()
----> 1 result = seasonal_decompose(data['Consumptieprijsindex'], model='multiplicative')
2 rcParams['figure.figsize'] = 12, 6
3 plt.rc('lines', linewidth=1, color='r')
4
5 fig = result.plot()
/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/seasonal.py in seasonal_decompose(x, model, filt, freq, two_sided, extrapolate_trend)
125 freq = pfreq
126 else:
--> 127 raise ValueError("You must specify a freq or x must be a "
128 "pandas object with a timeseries index with "
129 "a freq not set to None")
ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
Does anyone know how to solve this issue? Thanks!
The following code in the comments answered my question:
result = seasonal_decompose(data['Revenue Index'], model='multiplicative', period=12)

Binning a series returns a seemingly unrelated TypeError

I am trying to slice a dataframe I created into bins:
# create bins and labels
bins = [575, 600, 625, 650]
labels = [
"$575-$599",
"$600-$624",
"$625-$649",
"$650-$675"
]
schoolSummary["Spending Range"] = pd.cut(schoolSummary["Per Student Budget"], bins, labels = labels)
For some reason, I receive this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-73-b938397739fa> in <module>()
9
10 #schoolSummary["Spending Range"] =
---> 11 pd.cut(schoolSummary["Per Student Budget"], bins, labels = labels)
~\Anaconda3\envs\py36\lib\site-packages\pandas\core\reshape\tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates)
232 include_lowest=include_lowest,
233 dtype=dtype,
--> 234 duplicates=duplicates)
235
236 return _postprocess_for_cut(fac, bins, retbins, x_is_series,
~\Anaconda3\envs\py36\lib\site-packages\pandas\core\reshape\tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates)
335
336 side = 'left' if right else 'right'
--> 337 ids = _ensure_int64(bins.searchsorted(x, side=side))
338
339 if include_lowest:
TypeError: '<' not supported between instances of 'int' and 'str'
I'm confused, because I did not use '<' in the code at all. I also used
print(type(schoolSummary["Per Student Budget"]))
and it is a series object, so I don't know what 'int' and 'str' it's referring to. Is it a problem with my bins or labels?
Due to low rep I can't comment on your question, but you could try the following:
bins = [575, 600, 625, 650]
labels = [
"$575-$599",
"$600-$624",
"$625-$649",
"$650-$675"
]
for bin_ in bins:
    schoolSummary["Spending Range"] = pd.cut(schoolSummary["Per Student Budget"], bin_, labels = labels)
This is because the bins argument takes an int instead of a list.
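For reference, pd.cut accepts either an integer number of equal-width bins or a sequence of bin edges, so passing the list is allowed. The traceback fails inside searchsorted, where the integer edges are compared with the column's values; "'<' not supported between instances of 'int' and 'str'" suggests "Per Student Budget" holds strings (for example formatted values like "$578.00"), even though the column itself is a Series. Once the values are numeric, pd.cut also requires one fewer label than edges, so four labels need five edges. A minimal sketch under those assumptions (the function name add_spending_range and the extra edge 675, taken from the last label, are illustrative); right=False makes the intervals [575, 600), [600, 625), ... to match labels such as "$575-$599".

```python
import pandas as pd


def add_spending_range(school_summary):
    """Bin 'Per Student Budget' into labeled spending ranges."""
    # Five edges define four intervals, one per label (len(labels) == len(bins) - 1)
    bins = [575, 600, 625, 650, 675]
    labels = ["$575-$599", "$600-$624", "$625-$649", "$650-$675"]

    budget = school_summary["Per Student Budget"]
    if not pd.api.types.is_numeric_dtype(budget):
        # Assumption: values are strings such as "$578.00"; strip "$" and "," first
        budget = pd.to_numeric(budget.astype(str).str.replace(r"[$,]", "", regex=True))

    out = school_summary.copy()
    out["Spending Range"] = pd.cut(budget, bins, labels=labels, right=False)
    return out
```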

Having trouble with seaborn module in python

I am trying to draw some basic plots using the seaborn's jointplot() method.
My pandas data frame looks like this:
Out[250]:
YEAR Yields avgSumPcpn avgMaxSumTemp avgMinSumTemp
1970 5000 133.924981 30.437124 19.026974
1971 5560 107.691316 31.161974 19.278186
1972 5196 116.830066 31.454192 19.443712
1973 4233 181.550733 30.373581 19.097679
1975 5093 112.137538 30.428966 18.863224
I am trying to plot 'Yields' against 'YEAR' (a simple plot to see how 'Yields' varies over time).
But when I do this:
sns.jointplot(x='YEAR',y='Yeilds', data = summer_pcpn_temp_yeild, kind = 'reg', size = 10)
I am getting the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-251-587582a746b8> in <module>()
3 #ax = plt.axes()
4 #sns_sum_reg_min_temp_pcpn = sns.regplot(x='avgSumPcpn',y='avgMaxSumTemp', data = df_sum_temp_pcpn)
----> 5 sns.jointplot(x='Yeilds',y='YEAR', data = summer_pcpn_temp_yeild, kind = 'reg', size = 10)
6 plt.title('Avg Summer Precipitation vs Yields of Wharton TX', fontsize = 10)
7
//anaconda/lib/python2.7/site-packages/seaborn/distributions.pyc in jointplot(x, y, data, kind, stat_func, color, size, ratio, space, dropna, xlim, ylim, joint_kws, marginal_kws, annot_kws, **kwargs)
793 grid = JointGrid(x, y, data, dropna=dropna,
794 size=size, ratio=ratio, space=space,
--> 795 xlim=xlim, ylim=ylim)
796
797 # Plot the data using the grid
//anaconda/lib/python2.7/site-packages/seaborn/axisgrid.pyc in __init__(self, x, y, data, size, ratio, space, dropna, xlim, ylim)
1637 if dropna:
1638 not_na = pd.notnull(x) & pd.notnull(y)
-> 1639 x = x[not_na]
1640 y = y[not_na]
1641
TypeError: string indices must be integers, not Series
So I printed out the types of each column. Here is how:
for i in summer_pcpn_temp_yeild.columns.values.tolist():
print type(summer_pcpn_temp_yeild[[i]])
print type(summer_pcpn_temp_yeild.index.values)
which gives me:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<type 'numpy.ndarray'>
So I don't understand how to fix it.
Any help would be greatly appreciated.
Thanks
Check that the YEAR and Yields columns hold integer (not string) values.
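One way to do that check and conversion in pandas, sketched under the assumption that the columns might have been read as text (the helper name ensure_numeric is hypothetical): df.dtypes shows what each column holds, and pd.to_numeric converts strings like "1970" to numbers, raising if a value can't be parsed.

```python
import pandas as pd


def ensure_numeric(df, columns):
    """Return a copy of df with the given columns converted to numeric dtypes."""
    out = df.copy()
    for col in columns:
        # errors="raise" surfaces any value that is not a valid number
        out[col] = pd.to_numeric(out[col], errors="raise")
    return out
```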
Try changing 'Yeilds' to 'Yields' in your call to jointplot (it is the y= argument in the code you posted and the x= argument in the traceback):
sns.jointplot(x='YEAR',y='Yields', data = summer_pcpn_temp_yeild, kind = 'reg', size = 10)
The error message is misleading. Seaborn can't find the column named "Yeilds" in your summer_pcpn_temp_yeild dataframe, because the dataframe column is spelled "Yields".
I had the same problem, and fixed it by correcting the x= argument to sns.jointplot()
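Since the misleading TypeError comes from a misspelled column name, checking the names before plotting gives a clearer error. A minimal sketch using pandas and the standard-library difflib to suggest the closest existing column; the helper name require_columns is hypothetical and not part of seaborn.

```python
import difflib

import pandas as pd


def require_columns(df, *names):
    """Raise a KeyError naming any missing columns, with close-match suggestions."""
    missing = [n for n in names if n not in df.columns]
    if missing:
        existing = [str(c) for c in df.columns]
        hints = {n: difflib.get_close_matches(n, existing, n=1) for n in missing}
        raise KeyError(f"Missing columns {missing}; did you mean {hints}?")
```

Calling require_columns(summer_pcpn_temp_yeild, "YEAR", "Yeilds") would report "Yields" as the likely intended name. Note also that in seaborn 0.9 and later the size parameter of jointplot is called height, so on a current install the call becomes sns.jointplot(x="YEAR", y="Yields", data=summer_pcpn_temp_yeild, kind="reg", height=10).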
