while importing the below lines Jupyter compiler result in an error.
ImportError: cannot import name 'deprecated' from 'gensim.utils
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords**
Error as follows:
~\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\summarization\summarizer.py in <module>
54
55 import logging
---> 56 from gensim.utils import deprecated
57 from gensim.summarization.pagerank_weighted import pagerank_weighted as _pagerank
58 from gensim.summarization.textcleaner import clean_text_by_sentences as _clean_text_by_sentences
ImportError: cannot import name 'deprecated' from 'gensim.utils' (C:\Users\PavanKumar\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\utils.py)
The summarization code was removed from Gensim 4.0. See:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#12-removed-gensimsummarization
12. Removed gensim.summarization
Despite its general-sounding name, the module will not satisfy the
majority of use cases in production and is likely to waste people's
time. See this Github
ticket for
more motivation behind this.
If you need it, you could try:
installing the older gensim version; or…
copy the source code out to your own local module
However, I expect you'd likely be disappointed by its inflexibility and how little it can do. It's only extractive summarization - choosing a few key sentences from those that already exist – which only gives impressive results when the source text was already well-written in an expository style mixing high-level summaries with details. And its method of analyzing/ranking words is very crude & hard-to-customize.
Related
I'm new to using rdkit but can't find any forum posts online addressing this but I've been trying to use the functions in rdkit.Chem.Descriptors as follows:
from rdkit import Chem
mol = Chem.MolFromSmiles('CC')
Descriptors.MolWt(mol)
this should just return the molecular weight = 30.07 of the molecule simply and I took this from the documentation but it just gives me an error saying:
NameError: name 'Descriptors' is not defined
I have no idea why - I tried changing versions but this has made no difference. It seems to be an error inherent to all functions in the Descriptor class.
I've also tried explicitly importing the Descriptor submodule/class and this doesn't work either.
from rdkit.Chem import Descriptors
is the correct way to import module for this code to work
When I run the code below it shows me the error.
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
I have been searching solution on the online. The problem is the code below working in old version of torch (0.4.1). I want to know whether it is possible to modify or replace this code for working in the new version of pytorch.
from torch.utils.ffi import _wrap_function
from ._nms import lib as _lib, ffi as _ffi
__all__ = []
def _import_symbols(locals):
for symbol in dir(_lib):
fn = getattr(_lib, symbol)
if callable(fn):
locals[symbol] = _wrap_function(fn, _ffi)
else:
locals[symbol] = fn
__all__.append(symbol)
_import_symbols(locals())
I am facing the same problem have just seen some useful information in:
https://pytorch.org/tutorials/advanced/cpp_extension.html
https://pytorch.org/docs/stable/cpp_extension.html
To avoid downgrade the version of PyTorch, you should consider to use the following libraries while finding more details in the above links:
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension
This is my first attempt to use xgboost in pyspark so my experience with Java and Pyspark is still in learning phase.
I saw an awesome article in towards datascience with title PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset where the author goes through use case of xgboost in pyspark.
I tried to follow the steps but was hit with ImportError.
Installation
I have downloaded two jar files from maven and put them in the same directory where my notebook is.
xgboost4j version 0.72
xgboost4j-spark version 0.72
I have also downloaded the xgboost wrapper file sparkxgb.zip to the path ~/Softwares/sparkxgb.zip.
My jupyter notebook first cell
import xgboost
print(xgboost.__version__) # 1.2.0
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars xgboost4j-spark-0.72.jar,xgboost4j-0.72.jar pyspark-shell'
HOME = os.path.expanduser('~')
import findspark
findspark.init(HOME + "/Softwares/spark-3.0.0-bin-hadoop2.7")
import pyspark
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import col
spark = SparkSession\
.builder\
.appName("PySpark XGBOOST Titanic")\
.getOrCreate()
spark.sparkContext.addPyFile(HOME + "/Softwares/sparkxgb.zip")
print(pyspark.__version__) # 3.0.0
# this does not give any error
# Computer: MacOS
This cell gives errror
from sparkxgb import XGBoostEstimator
Error
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-cf2ff39c26f4> in <module>
----> 1 from sparkxgb import XGBoostEstimator
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/__init__.py in <module>
18
19 from sparkxgb.pipeline import XGBoostPipeline, XGBoostPipelineModel
---> 20 from sparkxgb.xgboost import XGBoostEstimator, XGBoostClassificationModel, XGBoostRegressionModel
21
22 __all__ = ["XGBoostEstimator", "XGBoostClassificationModel", "XGBoostRegressionModel",
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/xgboost.py in <module>
19 from pyspark.ml.param import Param
20 from pyspark.ml.param.shared import HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol, HasCheckpointInterval
---> 21 from pyspark.ml.util import JavaMLWritable, JavaPredictionModel
22 from pyspark.ml.wrapper import JavaEstimator, JavaModel
23 from sparkxgb.util import XGBoostReadable
ImportError: cannot import name 'JavaPredictionModel' from 'pyspark.ml.util' (/Users/poudel/Softwares/spark-3.0.0-bin-hadoop2.7/python/pyspark/ml/util.py)
Questions
How to fix the error and run xgboost in pyspark?
Maybe I have not placed downloaded jar files to correct path. (I have them placed in my working directory where I have jupyter notebook file). Do I need to place these files somewhere else? I assume jupyter automatically loads the path . and sees these jar files but I may be wrong.
If any good samaritan has already ran xgboost in pyspark, their help is much appreciated.
This question was asked almost 6 months ago but still no solution were provided.
Even I was facing the same issue for past few days, and finally I got solution so would like to share with all my folks.
By now you might have also got solution but I thought it would be better if I share so that you or in future any one can get benefits from this solution.
You can get rid of this error in two ways,
JavaPredictionModel has been removed from latest version of pyspark, so your can downgrad pyspark to let say version 2.4.0, then error will resolve.
But by doing this you might have to follow all the structure of old pyspark version only like OneHotEncoder can not be used for multiple features at same time you have to do that one-by-one.
!pip install pyspark==2.4.0
The second and best solution is to modify sparkxgb codes, like you can import JavaPredictionModel from pyspark.ml.wrapper, so you don't need to downgrad your pyspark.
from pyspark.ml.wrapper import JavaPredictionModel
P.S. Pardon me for not following the answer standards.
There are some problem with your versions. I know decision of similar problem fot catboost_spark. So i had a problem with versions (catboost_spark_version)
You need to go to https://catboost.ai/en/docs/installation/spark-installation-pyspark
Get the appropriate catboost_spark_version (see available versions at Maven central).
Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).
Just add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to spark.jar.packages Spark config parameter and import the catboost_spark package:
So you go to https://search.maven.org/search?q=catboost-spark and
choose version (for example catboost-spark_3.3_2.12)
Then copy "Artifact ID". In this case is "catboost-spark_3.3_2.12:1.1.1"
Then paste it to your config parameter
And you will get something like this:
sparkSession = (SparkSession.builder
.master('local[*]')
.config("spark.jars.packages", "ai.catboost:catboost-spark_3.3_2.12:1.1.1")
.getOrCreate())
import catboost_spark
and it will works :)
some months ago I stumbled upon this article on SEJ explaining how to generate captions from images using Python and Pythia framework.
The script used to work fine until some months ago. You can find the original copy here in Colab. However, at some point, Pythia repository in Git was redirect to MMF framework and the script stopped working. Colab returns the following error:
<ipython-input-9-b64f6c325603> in <module>()
34
35
---> 36 from pythia.utils.configuration import ConfigNode
37 from pythia.tasks.processors import VocabProcessor, CaptionProcessor
38 from pythia.models.butd import BUTD
ModuleNotFoundError: No module named 'pythia'
Replacing this line of code with:
from mmf.utils.configuration import ConfigNode
from mmf.datasets.processors import BaseProcessor
from mmf.models.base_model import BaseModel
from mmf.common.registry import registry
from mmf.common.sample import Sample, SampleList
The script goes on for a little bit, but it stops again on these two errors:
---> 32 from maskrcnn_benchmark.config import cfg
33 from maskrcnn_benchmark.layers import nms
34 from maskrcnn_benchmark.modeling.detector import build_detection_model
----> 4 from yacs.config import CfgNode as CN
Right now I am stucked at this point.
Does anyone has an idea of which workaround might be the right resolution for this script?
Thank you
I'm certainly missing something very obvious here, but why does this work:
a = [0.2635,0.654654,0.365,0.4545,1.5465,3.545]
import statsmodels.robust as rb
print rb.scale.mad(a)
0.356309343367
but this doesn't:
import statsmodels as sm
print sm.robust.scale.mad(a)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-1ce0c872b0be> in <module>()
----> 1 print statsmodels.robust.scale.mad(a)
AttributeError: 'module' object has no attribute 'robust'
Long answer see http://www.statsmodels.org/stable/importpaths.html
Statsmodels has intentionally mostly empty __init__.py but has a parallel import collection through the api.py.
The recommended import for interactive work import statsmodels.api as sm imports almost all of statsmodels, numpy, pandas and patsy, and large parts of scipy. This is slooow on cold start.
If we want to import just a specific part of statsmodels, then we don't need to import all these extras. Having empty __init__.py means that we can import just a single module (which of course imports the dependencies of that module).
e.g. from statsmodels.robust.scale import mad or
import statmodels.robust scale as smscale
smscale.mad(...)
(Small caveat: Some of the very low level imports might not remain always backwards compatible if the internal structure changes. However, the general policy is to deprecate functions over one or two releases while maintaining the old access structure.)
You can, you just have to import robust as well:
import statsmodels as sm
import statsmodels.robust
Then:
>>> sm.robust.scale.mad(a)
0.35630934336679576
robust is a subpackage of statsmodels, and importing a package does not in general automatically import subpackages (unless the package is written to do so explicitly).