Why do I keep getting this error? I tried different versions of Anaconda 3 but did not manage to get it working. What should I install to make it work properly? I used scikit-learn versions from 0.20 to 0.23.
Error message:
Code:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn.feature_extraction.text import CountVectorizer
from collections import Counter
from wordcloud import WordCloud
# Count word bigrams in the first 2000 documents
vectorizer = CountVectorizer(ngram_range=(2, 2), analyzer='word')
sparse_matrix = vectorizer.fit_transform(df['content'][:2000])
# Sum the counts over all documents into a single frequency vector
frequencies = sum(sparse_matrix).toarray()[0]
ngrams = pd.DataFrame(frequencies, index=vectorizer.get_feature_names_out(), columns=['frequency'])
ngrams = ngrams.sort_values(by='frequency', ascending=False)
ngrams
You are using an old version of scikit-learn. If I'm not mistaken, get_feature_names_out() was only introduced in version 1.0.
Upgrade to a newer version, or, to get similar functionality in an earlier version, you can use get_feature_names().
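For example, a minimal sketch that works across versions (the try/except fallback is my suggestion, not part of the original answer):
try:
    # scikit-learn >= 1.0
    feature_names = vectorizer.get_feature_names_out()
except AttributeError:
    # older scikit-learn (e.g. 0.20-0.23)
    feature_names = vectorizer.get_feature_names()

ngrams = pd.DataFrame(frequencies, index=feature_names, columns=['frequency'])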
It is really difficult to find anyone using R Markdown in a Python IDE (I am using PyCharm) with both R and Python chunks.
Here is my code so far. I am just trying to set up my R Markdown document to use both R and Python code, but my Python chunk doesn't seem to work. Any idea why? Thanks!
R environment
library(readODS) # excel data
library(glmmTMB) # mixed models
library(car) # ANOVA on mixed models
library(DHARMa) # goodness of fit of the model
library(emmeans) # post hoc
library(ggplot2) # plots
library(reticulate) # link between R and python
use_python('C:/Users/saaa/anaconda3/envs/Python_projects/python.exe')
Python environment
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
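For reference, here is a minimal sketch of what an .Rmd file combining both chunk types might look like (the chunk contents are placeholders, not the original analysis):

```{r setup}
library(reticulate)
use_python('C:/Users/saaa/anaconda3/envs/Python_projects/python.exe')
```

```{python}
import pandas as pd
print(pd.__version__)  # confirm the Python chunk runs
```

Note that use_python() should run before the first Python chunk is evaluated; otherwise reticulate may already be bound to a different interpreter.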
Here is the example code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pickle5 as pickle
# Read the pickled DataFrame
output = pd.read_pickle("Energy_data.pkl")
plt.figure()
# print(output)
output.plot()
I am using Python 3.7, and that is probably the reason for the error message, because these .pkl files were created in Python 3.8. If my colleague (who created the .pkl files) runs it, it works.
I tried to use the solution shown here (maybe I did not apply it correctly), but it did not work. Can someone show me how to import the .pkl files using the example above in Python 3.7?
Thank you very much in advance!
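A minimal sketch of how the pickle5 backport is typically used on Python 3.7 (assuming the file was written with pickle protocol 5; note that pd.read_pickle does not go through the pickle5 import, which is why importing it alone has no effect):
import pickle5 as pickle

# Load the protocol-5 pickle with the backport instead of pd.read_pickle
with open("Energy_data.pkl", "rb") as fh:
    output = pickle.load(fh)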
I am working with the classic Titanic dataset and trying to apply NNs. My data comes already split into train and dev sets. However, I want to merge the datasets together for several purposes (for example, doing my own splitting).
Is there a way I can merge both datasets?
I have looked around and only found information about how to split a dataset, but I was unable to find how to merge them back together.
Any help?
An MWE is provided below!
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
import seaborn as sns
# URL address of data
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
# Downloading data
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
# Reading data
data_train = pd.read_csv(train_file_path)
data_test = pd.read_csv(test_file_path)
MY_DATA= MERGE HERE????? # merge(data_train,data_test)??
I assume data_train and data_test have the same number of columns and the same column names. Then just do
merged_df = pd.concat([data_train, data_test], axis=0)
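If you also want a clean 0..n-1 index on the merged frame, and your own split afterwards as mentioned in the question, here is a hedged sketch (the test_size value is an arbitrary example):
from sklearn.model_selection import train_test_split

# ignore_index=True renumbers the rows so the two original indices don't collide
merged_df = pd.concat([data_train, data_test], ignore_index=True)

# Example of doing your own split on the merged data
my_train, my_dev = train_test_split(merged_df, test_size=0.2, random_state=42)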
I am trying to use the Gensim packages as written below:
import re, numpy as np, pandas as pd
from pprint import pprint
# Gensim
import gensim, spacy, logging, warnings
import gensim.corpora as corpora
from gensim.utils import lemmatize, simple_preprocess
from gensim.models import CoherenceModel
import matplotlib.pyplot as plt
But I keep getting the error:
ImportError: cannot import name 'lemmatize' from 'gensim.utils' (/Users/xxx/opt/anaconda3/envs/virt_env/lib/python3.9/site-packages/gensim/utils.py)
I am using gensim v4.0.1, Python 3.8, and numpy 1.20.0.
Has anyone encountered this kind of problem lately? Thank you
Gensim only ever wrapped the lemmatization routines of another library (Pattern), which was not a particularly modern or well-maintained option, so the wrapper was removed in Gensim 4.0.
Users should choose & apply their own lemmatization operations, if any, as a preprocessing step before applying Gensim's algorithms. Some Python libraries offering lemmatization include:
Pattern (Gensim's previously-included option): https://github.com/clips/pattern
NLTK: https://www.nltk.org/api/nltk.stem.html#nltk.stem.wordnet.WordNetLemmatizer
UDPipe: https://ufal.mff.cuni.cz/udpipe
Spacy: https://spacy.io/api/lemmatizer
Stanza: https://stanfordnlp.github.io/stanza/
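For instance, a minimal sketch of lemmatizing as a preprocessing step with NLTK before handing tokens to Gensim (this assumes the WordNet data has been downloaded; the example sentence is made up):
import nltk
from nltk.stem import WordNetLemmatizer
from gensim.utils import simple_preprocess

nltk.download('wordnet')  # one-time download of the WordNet data

lemmatizer = WordNetLemmatizer()

def preprocess(doc):
    # Tokenize with Gensim, then lemmatize each token with NLTK
    return [lemmatizer.lemmatize(token) for token in simple_preprocess(doc)]

texts = [preprocess(doc) for doc in ["The striped bats were hanging on their feet"]]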
I'm trying to plot some meteorological data in NetCDF format accessed via the Unidata siphon package.
I've imported what the MetPy docs suggest are the relevant libraries
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
from netCDF4 import num2date
import numpy as np
import xarray as xr
from siphon.catalog import TDSCatalog
from datetime import datetime
import metpy.calc as mpcalc
from metpy.units import units
and I've constructed a query for data as per the Siphon docs
# Open the "Best GFS" dataset from the Unidata THREDDS catalog
best_gfs = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml?dataset=grib/NCEP/GFS/Global_0p25deg/Best')
best_ds = best_gfs.datasets[0]
ncss = best_ds.subset()
# Build an NCSS query: a lat/lon box over the western Atlantic at the current time
query = ncss.query()
query.lonlat_box(north=55, south=20, east=-60, west=-90).time(datetime.utcnow())
query.accept('netcdf4')
query.variables('Vertical_velocity_pressure_isobaric', 'Relative_humidity_isobaric', 'Temperature_isobaric', 'u-component_of_wind_isobaric', 'v-component_of_wind_isobaric', 'Geopotential_height_isobaric')
data = ncss.get_data(query)
Unfortunately, when I attempt to parse the dataset using the code from the Metpy docs
data = data.metpy.parse_cf()
I get an error: "AttributeError: NetCDF: Attribute not found"
When attempting to fix this problem, I came across another SO post that seems to have the same issue, but the solution suggested there (updating MetPy to the latest version) did not work for me. I updated MetPy using conda but got the same problem as before. Any other ideas on how to get this resolved?
Right now the following code in Siphon
data = ncss.get_data(query)
will return a Dataset object from netcdf4-python. You need one extra step to hand this to xarray, which will make MetPy's parse_cf available:
from xarray.backends import NetCDF4DataStore

# Wrap the netCDF4 Dataset so xarray can open it, then parse the CF metadata
ds = xr.open_dataset(NetCDF4DataStore(data))
data = ds.metpy.parse_cf()
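As a usage follow-up, once parse_cf has run you can pull individual variables from the parsed dataset; a short sketch (Temperature_isobaric is one of the variables requested in the query above, and the coordinate accessor is MetPy's standard xarray helper):
# Pull one parsed variable and locate its horizontal coordinates
temp = data['Temperature_isobaric']
x, y = temp.metpy.coordinates('x', 'y')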