I am trying to use Gensim packages as written below:
import re, numpy as np, pandas as pd
from pprint import pprint
# Gensim
import gensim, spacy, logging, warnings
import gensim.corpora as corpora
from gensim.utils import lemmatize, simple_preprocess
from gensim.models import CoherenceModel
import matplotlib.pyplot as plt
But I keep getting the error:
ImportError: cannot import name 'lemmatize' from 'gensim.utils' (/Users/xxx/opt/anaconda3/envs/virt_env/lib/python3.9/site-packages/gensim/utils.py)
I am using gensim v4.0.1, Python 3.8, numpy 1.20.0.
Has anyone encountered this kind of problem lately? Thank you.
Gensim only ever wrapped the lemmatization routines of another library (Pattern), which was not a particularly modern or well-maintained option, so it was removed in Gensim 4.0.
Users should choose and apply their own lemmatization operations, if any, as a preprocessing step before applying Gensim's algorithms. Some Python libraries offering lemmatization include the following (a minimal spaCy-based sketch follows the list):
Pattern (Gensim's previously-included option): https://github.com/clips/pattern
NLTK: https://www.nltk.org/api/nltk.stem.html#nltk.stem.wordnet.WordNetLemmatizer
UDPipe: https://ufal.mff.cuni.cz/udpipe
Spacy: https://spacy.io/api/lemmatizer
Stanza: https://stanfordnlp.github.io/stanza/
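For instance, here is a minimal sketch of lemmatizing as a separate preprocessing step with spaCy before handing tokens to Gensim. It assumes the en_core_web_sm model is installed (python -m spacy download en_core_web_sm):

import spacy
from gensim.utils import simple_preprocess

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatized_tokens(text):
    # Lemmatize with spaCy first, then tokenize/normalize with Gensim
    lemmas = " ".join(tok.lemma_ for tok in nlp(text) if not tok.is_punct)
    return simple_preprocess(lemmas, deacc=True)

texts = ["The striped bats were hanging on their feet."]
tokenized_texts = [lemmatized_tokens(t) for t in texts]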
It is really difficult to find anyone using R Markdown in a Python IDE (I am using PyCharm) with both R and Python chunks.
Here is my code so far; I am just trying to set up my markdown file to use both R and Python code, but my Python chunk doesn't seem to work. Any idea why? Thanks!
R environment
library(readODS) # read .ods spreadsheet data
library(glmmTMB) # mixed models
library(car) # ANOVA on mixed models
library(DHARMa) # goodness of fit of the model
library(emmeans) # post hoc
library(ggplot2) # plots
library(reticulate) # link between R and python
use_python('C:/Users/saaa/anaconda3/envs/Python_projects/python.exe')
Python environment
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
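For reference, a minimal .Rmd layout mixing the two kinds of chunks looks like the sketch below; the chunk headers are standard knitr syntax, and the interpreter path is the one from the question:

```{r setup}
library(reticulate) # link between R and Python
use_python("C:/Users/saaa/anaconda3/envs/Python_projects/python.exe")
```

```{python}
# This chunk runs in the Python interpreter selected above
import pandas as pd
print(pd.__version__)
```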
Why do I keep getting this error? I tried different versions of Anaconda 3 but did not manage to fix it. What should I install to make it work properly? I used scikit-learn versions from 0.20 to 0.23.
Error message:
Code:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn.feature_extraction.text import CountVectorizer
from collections import Counter
from wordcloud import WordCloud
vectorizer = CountVectorizer(ngram_range=(2,2), analyzer='word')  # count word bigrams
sparse_matrix = vectorizer.fit_transform(df['content'][:2000])
frequencies = sum(sparse_matrix).toarray()[0]  # total count per bigram across all rows
ngrams = pd.DataFrame(frequencies, index=vectorizer.get_feature_names_out(), columns=['frequency'])
ngrams = ngrams.sort_values(by='frequency', ascending=False)
ngrams
You are using an old version of scikit-learn. If I'm not mistaken, get_feature_names_out() was only introduced in version 1.0.
Upgrade to a newer version, or, to get similar functionality in an earlier version, you can use get_feature_names().
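A version-tolerant sketch of the last few lines, falling back to the older method name when get_feature_names_out() is unavailable (pre-1.0 scikit-learn):

try:
    feature_names = vectorizer.get_feature_names_out()  # scikit-learn >= 1.0
except AttributeError:
    feature_names = vectorizer.get_feature_names()      # older scikit-learn

ngrams = pd.DataFrame(frequencies, index=feature_names, columns=['frequency'])
ngrams = ngrams.sort_values(by='frequency', ascending=False)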
I am working on audio files and need to use the spafe library for LFCC, LPC, etc., and I installed the library as mentioned on its site: https://spafe.readthedocs.io/en/latest/
But when I try to extract some features, like LFCC, MFCC, or LPC, I get an import error, for example when I use this code:
import scipy.io.wavfile
import spafe.utils.vis as vis
from spafe.features.mfcc import lfcc
I get this error:
ImportError: cannot import name 'lfcc'
I don't understand, because I can import spafe and I have all the dependencies the library requires, with the correct versions (numpy, scipy...).
There seems to be a typo in the docs example (which I guess you are trying to follow); it should be
from spafe.features.lfcc import lfcc
i.e. lfcc lives in the spafe.features.lfcc module, not spafe.features.mfcc (which indeed does not contain lfcc, hence the error).
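In context, a minimal sketch of the corrected import; the WAV path is a placeholder, and the keyword names (sig, fs) follow the spafe docs but should be checked against your installed version:

import scipy.io.wavfile
from spafe.features.lfcc import lfcc  # note: lfcc module, not mfcc

fs, sig = scipy.io.wavfile.read("your_audio.wav")  # sample rate, samples
lfccs = lfcc(sig=sig, fs=fs)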
I am running the following code in Kaggle; the language is set to "Python". I am trying to perform NLP on a paragraph and create visuals using the code below.
import matplotlib
# Force matplotlib to not use any Xwindows backend.
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import itertools
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.tokenize import RegexpTokenizer
from nltk import RegexpParser
document= """The BBC has been testing a new service called SoundIndex, which
lists the top 1,000 artists based on discussions crawled from Bebo,
Last.fm, Google Groups, iTunes, MySpace and YouTube. The top five
bands according to SoundIndex right now are Coldplay, Rihanna, The
Ting Tings, Duffy and Mariah Carey, but the index is refreshed
every six hours. SoundIndex also lets users sort by popular tracks,
search by artist, or create customized charts based on music
preferences or filters by age range, sex or location. Results can
also be limited to just one data source (such as Last.fm).
"""
sentences = sent_tokenize(document)
sentences = [word_tokenize(sent) for sent in sentences]
sentences = [pos_tag(sent) for sent in sentences]
sentence = list(itertools.chain(*sentences))
#grammar = "NP: {<DT>?<JJ>*<NN>}"
grammar = """
NOUN_VERB_NOUN: {<DT>?<VB>*<NN.*>+}
GRUND_NOUN: {<VBG.><NN.*>+}
VN:{<VBN><NN>+}
NOUN_AND_ADJ: {<NN>?<JJ>*<NN.*>+}
{<N.*|JJ.*>*<N.*>} # Nouns and Adjectives, terminated with Nouns
NOUN_PHRASE: {<DT>?<JJ>*<NN>}
ADJ_PHRASE: {}
KEYPHRASE: {<DT>?<JJ>*<NN>}
KEYWORDS: {<NN.*>}
VERB_PHRASE: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>}
"""
cp = RegexpParser(grammar)
result = cp.parse(sentence)
#print(result)
result.draw()
for subtree in result.subtrees():
    #if subtree.label() == "NOUN_VERB_NOUN":
    #    print("NOUN_VERB_NOUN: " + str(subtree.leaves()))
    print(str(subtree.label()) + " " + str(subtree.leaves()))
result.draw()
The error I am getting on execution is:
TclError: no display name and no $DISPLAY environment variable
Could anyone please advise on how I can resolve this issue?
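Note that Tree.draw() opens a Tkinter window, so it needs a display; matplotlib.use('Agg') only affects matplotlib, not NLTK's Tk-based viewer. On a headless host like Kaggle, a minimal workaround sketch is to render the tree as text instead of drawing it:

result.pretty_print()  # ASCII rendering of the parse tree
print(result)          # bracketed one-line form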
I am a novice coder, self-taught through Codecademy. I am wondering what the easiest way is to import PNG files into Python (2.7.14), with the goal of using these files in a deep-learning program.
So far I have tried these two codes:
import scipy
from scipy import misc
import glob
for image_path in glob.glob("/E:/_SAMM_Projects/gemini_hikai_DM_hack_complete/export/contact_frames/boat/*.png"):
    image = misc.imread(image_path)
    print image.shape
    print image.dtype
import scipy
from scipy import misc
import glob
import numpy as np

png = []
for image_path in glob.glob("/E:/_SAMM_Projects/gemini_hikai_DM_hack_complete/export/contact_frames/boat/*.png"):
    png.append(misc.imread(image_path))

im = np.asarray(png)
print "importing done...", im.shape
Both are based on templates I found online, but neither seems to work.
In the context of deep learning, I understand that you would like to read an image into a numpy array so you can use deep-learning models (such as a ConvNet) on it.
I suggest using OpenCV for your purpose.
import cv2

# cv2.imread returns the image as a numpy array in BGR channel order,
# or None if the file cannot be read
image = cv2.imread("yourimg.png")
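Extending that to a whole folder, as in the question, might look like this sketch ("path/to/frames/*.png" is a placeholder for your directory, and stacking assumes all images share the same dimensions):

import glob
import cv2
import numpy as np

images = [cv2.imread(p) for p in glob.glob("path/to/frames/*.png")]
stack = np.asarray(images)  # shape: (n_images, height, width, 3)
print(stack.shape)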