Some months ago I stumbled upon this article on SEJ explaining how to generate captions from images using Python and the Pythia framework.
The script worked fine until a few months ago. You can find the original copy here in Colab. However, at some point the Pythia Git repository was redirected to the MMF framework, and the script stopped working. Colab returns the following error:
<ipython-input-9-b64f6c325603> in <module>()
34
35
---> 36 from pythia.utils.configuration import ConfigNode
37 from pythia.tasks.processors import VocabProcessor, CaptionProcessor
38 from pythia.models.butd import BUTD
ModuleNotFoundError: No module named 'pythia'
I replaced these import lines with:
from mmf.utils.configuration import ConfigNode
from mmf.datasets.processors import BaseProcessor
from mmf.models.base_model import BaseModel
from mmf.common.registry import registry
from mmf.common.sample import Sample, SampleList
The script then runs a little further, but it stops again on these two errors:
---> 32 from maskrcnn_benchmark.config import cfg
33 from maskrcnn_benchmark.layers import nms
34 from maskrcnn_benchmark.modeling.detector import build_detection_model
----> 4 from yacs.config import CfgNode as CN
Right now I am stuck at this point.
Does anyone have an idea which workaround might resolve this for the script?
Thank you
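One way to narrow this down is to check which of the packages named in those tracebacks are actually installed in the runtime. This is only a sketch: it assumes the two truncated errors above are missing-module errors like the first one, and the helper name missing_modules is made up for illustration.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level packages taken from the import lines and tracebacks in the question.
print(missing_modules(["mmf", "maskrcnn_benchmark", "yacs"]))
```

Anything the helper prints would still need to be installed in the Colab runtime before the script can get past those imports.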
I am attempting to use a Jupyter notebook in Visual Studio Code. I cloned the GitHub repo https://github.com/dibgerge/ml-coursera-python-assignments.
This is the Python version of a Coursera ML course. In the exercise1.ipynb file I attempt to run the following code.
# appends the implemented function in part 1 to the grader object
grader[1] = warmUpExercise
# send the added functions to coursera grader for getting a grade on this part
grader.grade()
After entering my email and token for the grading process, I receive this error message:
FileNotFoundError Traceback (most recent call last)
Cell In [3], line 5
2 grader[1] = warmUpExercise
4 # send the added functions to coursera grader for getting a grade on this part
----> 5 grader.grade()
File c:\Users\jerry\Documents\mlcourse\ml-coursera-python-assignments\Exercise1\..\submission.py:26, in SubmissionBase.grade(self)
24 def grade(self):
25 print('\nSubmitting Solutions | Programming Exercise %s\n' % self.assignment_slug)
26 self.login_prompt()
28 # Evaluate the different parts of exercise
29 parts = OrderedDict()
File c:\Users\jerry\Documents\mlcourse\ml-coursera-python-assignments\Exercise1\..\submission.py:71, in SubmissionBase.login_prompt(self)
69 # Save the entered credentials
70 if not os.path.isfile(self.save_file):
71 with open(self.save_file, 'wb') as f:
72 pickle.dump((self.login, self.token), f)
FileNotFoundError: [Errno 2] No such file or directory: 'token.pkl'
The repo code can be seen here https://github.com/dibgerge/ml-coursera-python-assignments/blob/master/submission.py.
It appears I cannot create or access the 'token.pkl' file.
I have checked similar issues on this site and the repo's issue page. I am not sure whether it has to do with the os or pickle module. This is the first cell of the notebook:
# used for manipulating directory paths
import os
# Scientific and vector computation for python
import numpy as np
# Plotting library
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D # needed to plot 3-D surfaces
# library written for this exercise providing additional functions for assignment submission, and others
import utils
# define the submission/grader object for this exercise
grader = utils.Grader()
# tells matplotlib to embed plots within the notebook
%matplotlib inline
It appears to run fine, but I get a message from Windows saying that controlled folder access has blocked python3.8.exe from making changes.
How can I fix this issue?
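A quick way to test whether Python is allowed to create files in the notebook's working directory at all is sketched below. It assumes, without confirmation from the traceback, that the FileNotFoundError is a side effect of the write being blocked; the helper name can_write_to is made up for illustration.

```python
import os
import tempfile

def can_write_to(directory="."):
    """Return True if a file can be created (and removed again) in `directory`."""
    try:
        # NamedTemporaryFile creates a real file in `directory` and deletes it on close.
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".pkl") as handle:
            handle.write(b"probe")
        return True
    except OSError:
        # Covers FileNotFoundError and PermissionError, among others.
        return False

print(os.getcwd(), "writable:", can_write_to("."))
```

If this prints False for the exercise folder but True for another location, the problem is with write access to that folder rather than with the os or pickle modules.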
Good morning,
I want to test HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) on a GPU, so I need to use the RAPIDS framework.
When I tried to follow the steps described here https://colab.research.google.com/drive/1rY7Ln6rEE1pOlfSHCYOVaqt8OvDO35J0#forceEdit=true&sandboxMode=true&scrollTo=EwaJSKuswsNi
taken from the RAPIDS website: https://rapids.ai/start.html
I get the following error when I run the cuDF example code:
import cudf
import io, requests
# download CSV file from GitHub
url="https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')
# read CSV from memory
tips_df = cudf.read_csv(io.StringIO(content))
tips_df['tip_percentage'] = tips_df['tip']/tips_df['total_bill']*100
# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
ValueError Traceback (most recent call last)
<ipython-input-1-a95ca25217db> in <module>()
----> 1 import cudf
2 import io, requests
3
4 # download CSV file from GitHub
5 url="https://github.com/plotly/datasets/raw/master/tips.csv"
2 frames
/usr/local/lib/python3.7/site-packages/cudf/_lib/__init__.py in <module>()
2 import numpy as np
3
----> 4 from . import (
5 avro,
6 binaryop,
cudf/_lib/avro.pyx in init cudf._lib.avro()
cudf/_lib/column.pyx in init cudf._lib.column()
cudf/_lib/scalar.pyx in init cudf._lib.scalar()
cudf/_lib/interop.pyx in init cudf._lib.interop()
ValueError: pyarrow.lib.Codec size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject
Could you please help me?
Thanks in advance.
Colab made some enhancements this week that affected the RAPIDS installation process. Work toward a resolution is active, and progress is being tracked in this issue (which includes a potential workaround)
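Since the error message points at a binary mismatch involving pyarrow, it can help to record which versions are actually installed in the runtime before and after trying any workaround. The sketch below is only a diagnostic; the helper name installed_versions is made up, the distribution names are assumptions, and importlib.metadata needs Python 3.8 or newer (on the Python 3.7 shown in the traceback, the importlib_metadata backport would be needed instead).

```python
from importlib import metadata

def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names are assumed to match the import names here.
for name, version in installed_versions(["cudf", "pyarrow", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```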
Importing the lines below in a Jupyter notebook results in an error:
ImportError: cannot import name 'deprecated' from 'gensim.utils'
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords
The full error is as follows:
~\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\summarization\summarizer.py in <module>
54
55 import logging
---> 56 from gensim.utils import deprecated
57 from gensim.summarization.pagerank_weighted import pagerank_weighted as _pagerank
58 from gensim.summarization.textcleaner import clean_text_by_sentences as _clean_text_by_sentences
ImportError: cannot import name 'deprecated' from 'gensim.utils' (C:\Users\PavanKumar\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\utils.py)
The summarization code was removed from Gensim 4.0. See:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#12-removed-gensimsummarization
12. Removed gensim.summarization
Despite its general-sounding name, the module will not satisfy the majority of use cases in production and is likely to waste people's time. See this Github ticket for more motivation behind this.
If you need it, you could try:
installing an older gensim version (before 4.0); or
copying the source code out into your own local module
However, I expect you'd likely be disappointed by its inflexibility and how little it can do. It's only extractive summarization - choosing a few key sentences from those that already exist – which only gives impressive results when the source text was already well-written in an expository style mixing high-level summaries with details. And its method of analyzing/ranking words is very crude & hard-to-customize.
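To make "extractive" concrete, here is a toy sketch that only selects sentences already present in the text. It is not gensim's algorithm: it scores sentences by plain word frequency, and the function name extractive_summary is made up for illustration.

```python
import re
from collections import Counter

def extractive_summary(text, max_sentences=2):
    """Pick the highest-scoring existing sentences; nothing is rewritten."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text (lowercased, letters and apostrophes only).
    frequencies = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    # Rank by score, then restore the original order of the chosen sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

print(extractive_summary("Cats sleep a lot. Cats eat fish. Dogs bark."))
```

The output can only ever be a subset of the input sentences, which is why this style of summarization depends so heavily on how the source text was written.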
This is my first attempt to use xgboost in pyspark, and I am still learning both Java and PySpark.
I saw an awesome article on Towards Data Science titled "PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset", where the author goes through a use case of xgboost in pyspark.
I tried to follow the steps but was hit with an ImportError.
Installation
I have downloaded two jar files from maven and put them in the same directory where my notebook is.
xgboost4j version 0.72
xgboost4j-spark version 0.72
I have also downloaded the xgboost wrapper file sparkxgb.zip to the path ~/Softwares/sparkxgb.zip.
My jupyter notebook first cell
import xgboost
print(xgboost.__version__) # 1.2.0
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars xgboost4j-spark-0.72.jar,xgboost4j-0.72.jar pyspark-shell'
HOME = os.path.expanduser('~')
import findspark
findspark.init(HOME + "/Softwares/spark-3.0.0-bin-hadoop2.7")
import pyspark
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import col
spark = SparkSession\
.builder\
.appName("PySpark XGBOOST Titanic")\
.getOrCreate()
spark.sparkContext.addPyFile(HOME + "/Softwares/sparkxgb.zip")
print(pyspark.__version__) # 3.0.0
# this does not give any error
# Computer: MacOS
This cell gives an error
from sparkxgb import XGBoostEstimator
Error
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-cf2ff39c26f4> in <module>
----> 1 from sparkxgb import XGBoostEstimator
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/__init__.py in <module>
18
19 from sparkxgb.pipeline import XGBoostPipeline, XGBoostPipelineModel
---> 20 from sparkxgb.xgboost import XGBoostEstimator, XGBoostClassificationModel, XGBoostRegressionModel
21
22 __all__ = ["XGBoostEstimator", "XGBoostClassificationModel", "XGBoostRegressionModel",
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/xgboost.py in <module>
19 from pyspark.ml.param import Param
20 from pyspark.ml.param.shared import HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol, HasCheckpointInterval
---> 21 from pyspark.ml.util import JavaMLWritable, JavaPredictionModel
22 from pyspark.ml.wrapper import JavaEstimator, JavaModel
23 from sparkxgb.util import XGBoostReadable
ImportError: cannot import name 'JavaPredictionModel' from 'pyspark.ml.util' (/Users/poudel/Softwares/spark-3.0.0-bin-hadoop2.7/python/pyspark/ml/util.py)
Questions
How can I fix the error and run xgboost in pyspark?
Maybe I have not placed the downloaded jar files in the correct path. (I have placed them in my working directory, next to the Jupyter notebook file.) Do I need to place these files somewhere else? I assume Jupyter automatically uses the current directory and sees these jar files, but I may be wrong.
If any good samaritan has already run xgboost in pyspark, their help is much appreciated.
This question was asked almost 6 months ago, but no solution has been provided yet.
I was facing the same issue for the past few days and finally found a solution, so I would like to share it.
By now you may have found a solution too, but I thought it would be better to share it so that you or anyone else in the future can benefit from it.
You can get rid of this error in two ways.
JavaPredictionModel has been removed from pyspark.ml.util in the latest version of pyspark, so you can downgrade pyspark to, say, version 2.4.0, and the error will be resolved.
However, by doing this you will have to follow the conventions of the old pyspark version; for example, OneHotEncoder cannot be applied to multiple features at once, so you have to encode them one by one.
!pip install pyspark==2.4.0
The second and better solution is to modify the sparkxgb code: import JavaPredictionModel from pyspark.ml.wrapper instead, so you don't need to downgrade pyspark.
from pyspark.ml.wrapper import JavaPredictionModel
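If the patched wrapper should keep working on both old and new pyspark versions, the import can be made tolerant of either location. The helper below is a generic sketch, not part of sparkxgb or pyspark; the name import_first is made up for illustration.

```python
import importlib

def import_first(attr, module_names):
    """Return `attr` from the first module in `module_names` that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not available in this environment, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {module_names}")

# In a patched sparkxgb/xgboost.py this could replace the failing import line:
# JavaPredictionModel = import_first(
#     "JavaPredictionModel", ["pyspark.ml.wrapper", "pyspark.ml.util"])
```

The order of the module list decides which location wins when both provide the name.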
P.S. Pardon me for not following the answer standards.
There is a problem with your versions. I know the solution to a similar problem with catboost_spark, where I also had a version mismatch (catboost_spark_version).
You need to go to https://catboost.ai/en/docs/installation/spark-installation-pyspark
Get the appropriate catboost_spark_version (see available versions at Maven central).
Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).
Just add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to the spark.jars.packages Spark config parameter and import the catboost_spark package.
So you go to https://search.maven.org/search?q=catboost-spark and
choose a version (for example catboost-spark_3.3_2.12).
Then copy the "Artifact ID" together with the version. In this case it is "catboost-spark_3.3_2.12:1.1.1".
Then paste it into your config parameter.
You will get something like this:
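from pyspark.sql import SparkSession

# spark.jars.packages takes Maven coordinates in the form groupId:artifactId:version
sparkSession = (SparkSession.builder
    .master('local[*]')
    .config("spark.jars.packages", "ai.catboost:catboost-spark_3.3_2.12:1.1.1")
    .getOrCreate())
import catboost_spark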
and it will work :)
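The coordinate string is just the three version choices above joined in a fixed pattern, which can be sketched as a small helper. The function name catboost_spark_coordinate is made up for illustration; the group ID ai.catboost is taken from the config line above.

```python
def catboost_spark_coordinate(spark_compat_version, scala_compat_version, catboost_spark_version):
    """Build the Maven coordinate string for the spark.jars.packages setting."""
    artifact_id = f"catboost-spark_{spark_compat_version}_{scala_compat_version}"
    return f"ai.catboost:{artifact_id}:{catboost_spark_version}"

print(catboost_spark_coordinate("3.3", "2.12", "1.1.1"))
# prints ai.catboost:catboost-spark_3.3_2.12:1.1.1
```

Whether a given combination actually exists still has to be checked on Maven Central.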
I am working on some NLP task, trying to import normalize_corpus from the normalization module.
I am getting the below error.
from normalization import normalize_corpus
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-13-5919bba55473> in <module>()
----> 1 from normalization import normalize_corpus
ImportError: cannot import name 'normalize_corpus'
The normalization module you have installed (probably https://pypi.org/project/normalization/) does not correspond to the code you are trying to run (possibly from "Text Analytics with Python").
Uninstall the normalization module and track down the code from the book. (A place to start: https://github.com/dipanjanS/text-analytics-with-python)
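To confirm which normalization module Python would pick up, you can look at where it would be loaded from. This is a small sketch using only the standard library; the helper name module_origin is made up for illustration.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

print(module_origin("normalization"))
```

A path inside site-packages suggests the PyPI package is being imported; a path next to your notebook suggests the local script from the book is being used.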
For whoever lands here because they are reading "Text Analytics with Python": the author of the book is referring to a custom script they built, which is hosted in the book's repository (chapter 4 or 5 of the first edition):
https://github.com/dipanjanS/text-analytics-with-python/blob/master/Chapter-4/normalization.py
https://github.com/dipanjanS/text-analytics-with-python/blob/b4f6cefc9dd96e2a3e74e01bda391019bd7fb053/Old-First-Edition/Ch05_Text_Summarization/normalization.py