Please help, I get the error below when running a Jupyter notebook.
import numpy as np
import pandas as pd
from helper import boston_dataframe
np.set_printoptions(precision=3, suppress=True)
Error:
ImportError Traceback (most recent call last)
<ipython-input-3-a6117bd64450> in <module>
1 import numpy as np
2 import pandas as pd
----> 3 from helper import boston_dataframe
4
5
ImportError: cannot import name 'boston_dataframe' from 'helper' (/Users/irina/opt/anaconda3/lib/python3.8/site-packages/helper/__init__.py)
Since you did not say where you got the notebook, I have to guess that it is from the IBM course Supervised Learning: Regression.
The zip folder in week 1 of that course provides helper.py.
What you need to do is change the working directory to where this file is; see Change IPython/Jupyter notebook working directory.
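For example, from inside the notebook (the path below is a placeholder; point it at wherever you unzipped the course files):
import os
os.chdir("/path/to/course/week1")  # hypothetical path, adjust to your unzipped folder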
Alternatively, you can load the Boston data from sklearn and then load it into a pandas DataFrame.
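A minimal sketch of that approach; note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this only works on older versions:
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
import pandas as pd

boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['MEDV'] = boston.target  # median house value, the usual target column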
Advice for you:
Learn how to use Jupyter notebooks.
Learn how Python imports work.
Learn how to provide enough information in a question that no one needs to guess.
I'd seen previous errors importing from JAX from several years ago (https://github.com/google/jax/issues/372), but the post implied an update would fix it. I just installed JAX and am trying to get set up in a Jupyter notebook. Could you let me know what might be going wrong?
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [1], in <cell line: 4>()
1 ########## JAX ON MNIST #####################
2 # Import some additional JAX and dataloader helpers
3 from jax.scipy.special import logsumexp
----> 4 from jax.experimental import optimizers
6 import torch
7 from torchvision import datasets, transforms
ImportError: cannot import name 'optimizers' from 'jax.experimental' (/Users/XXX/opt/anaconda3/lib/python3.9/site-packages/jax/experimental/__init__.py)
I saw that the similar error was from 2019, and the thread implied a version difference would fix it, but I did not know where to go from there.
According to the CHANGELOG:
jax 0.3.16
Deprecations:
Removed jax.experimental.optimizers; it has long been a deprecated alias of jax.example_libraries.optimizers.
So it sounds like if you're using JAX version 0.3.16 or newer, you should do
from jax.example_libraries import optimizers
But as noted in the jax.example_libraries.optimizers documentation, this is not well-supported code and you'll probably have a better experience with something like Optax or JAXopt.
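For reference, a minimal sketch of a training step with Optax (the loss, shapes, and parameter names here are made up for illustration):
import jax
import jax.numpy as jnp
import optax

params = {'w': jnp.zeros(3)}
optimizer = optax.adam(learning_rate=1e-3)
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    # toy least-squares loss, purely illustrative
    return jnp.mean((x @ params['w'] - y) ** 2)

grads = jax.grad(loss_fn)(params, jnp.ones((4, 3)), jnp.zeros(4))
updates, opt_state = optimizer.update(grads, opt_state)
params = optax.apply_updates(params, updates)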
Problem
I use Jupyter a lot, and every notebook starts with the same long, cumbersome list of imports, something like:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.core.display import display, HTML
from ipywidgets import interact, IntSlider
from IPython.display import display
pd.options.display.max_columns = 35
pd.options.display.max_rows = 300
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 170 # 200 e.g. is really fine, but slower
import IPython.display as ipd
plt.ion()
display(HTML("<style>.container { width:95% !important; }</style>"))
def plot_frozen(df, num_rows=30, num_columns=30, step_rows=1,
                step_columns=1):
    """
    Freeze the headers (column and index names) of a Pandas DataFrame. A widget
    enables sliding through the rows and columns.
    Parameters
    ----------
    df : Pandas DataFrame
        DataFrame to display
    num_rows : int, optional
        Number of rows to display
    num_columns : int, optional
        Number of columns to display
    step_rows : int, optional
        Step in the rows
    step_columns : int, optional
        Step in the columns
    Returns
    -------
    Displays the DataFrame with the widget
    """
    @interact(last_row=IntSlider(min=min(num_rows, df.shape[0]),
                                 max=df.shape[0],
                                 step=step_rows,
                                 description='rows',
                                 readout=False,
                                 disabled=False,
                                 continuous_update=True,
                                 orientation='horizontal',
                                 slider_color='purple'),
              last_column=IntSlider(min=min(num_columns, df.shape[1]),
                                    max=df.shape[1],
                                    step=step_columns,
                                    description='columns',
                                    readout=False,
                                    disabled=False,
                                    continuous_update=True,
                                    orientation='horizontal',
                                    slider_color='purple'))
    def _freeze_header(last_row, last_column):
        display(df.iloc[max(0, last_row - num_rows):last_row,
                        max(0, last_column - num_columns):last_column])
It's just imports and a bunch of plotting/display helper functions.
Is there a way for me to bundle all of this up into a single pip package so that I only need a line or two?
I'm imagining running:
pip install Genesis
then inside my jupyter notebook have:
import Genesis
and nothing else.
What I've tried:
I've tried making a genesis package that is basically a copy of this guide, but with a single file called jupyter.py that contains the setup code above.
Then I run the following:
from Genesis import jupyter
jupyter.setup()
But it doesn't import pandas, numpy, and matplotlib.pyplot for me. That makes sense, because those packages are imported within the scope of the package. But is there any way to avoid that? Is it even possible in Python?
You can make a package with all your imports, no problem; you just need to be careful with namespaces.
Say I have a file:
# genesis/__init__.py
import pandas as pd
import numpy as np
...
Importing that genesis package will run that code, but the names won't be accessible directly:
>>> import genesis
>>> help(np)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'np' is not defined
>>> help(genesis.np) # This should succeed
...
You could address this with from genesis import *, which would bring everything into the namespace you expect, e.g.:
>>> from genesis import *
>>> help(np) # This should succeed
...
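Putting it together, a minimal sketch of such a package (the option values mirror the question; pick your own):
# genesis/__init__.py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.options.display.max_columns = 35
pd.options.display.max_rows = 300
plt.rcParams['figure.figsize'] = [12, 8]

# controls what `from genesis import *` brings into the notebook namespace
__all__ = ['np', 'pd', 'plt']
A notebook then only needs from genesis import * to get np, pd, and plt along with the display settings.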
This is my first attempt to use xgboost in pyspark, so my experience with Java and Pyspark is still in the learning phase.
I saw an awesome article on Towards Data Science titled PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset, where the author goes through a use case of xgboost in pyspark.
I tried to follow the steps but was hit with an ImportError.
Installation
I have downloaded two jar files from maven and put them in the same directory where my notebook is.
xgboost4j version 0.72
xgboost4j-spark version 0.72
I have also downloaded the xgboost wrapper file sparkxgb.zip to the path ~/Softwares/sparkxgb.zip.
My Jupyter notebook's first cell:
import xgboost
print(xgboost.__version__) # 1.2.0
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars xgboost4j-spark-0.72.jar,xgboost4j-0.72.jar pyspark-shell'
HOME = os.path.expanduser('~')
import findspark
findspark.init(HOME + "/Softwares/spark-3.0.0-bin-hadoop2.7")
import pyspark
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import col
spark = SparkSession\
    .builder\
    .appName("PySpark XGBOOST Titanic")\
    .getOrCreate()
spark.sparkContext.addPyFile(HOME + "/Softwares/sparkxgb.zip")
print(pyspark.__version__) # 3.0.0
# this does not give any error
# Computer: MacOS
This cell gives an error:
from sparkxgb import XGBoostEstimator
Error
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-cf2ff39c26f4> in <module>
----> 1 from sparkxgb import XGBoostEstimator
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/__init__.py in <module>
18
19 from sparkxgb.pipeline import XGBoostPipeline, XGBoostPipelineModel
---> 20 from sparkxgb.xgboost import XGBoostEstimator, XGBoostClassificationModel, XGBoostRegressionModel
21
22 __all__ = ["XGBoostEstimator", "XGBoostClassificationModel", "XGBoostRegressionModel",
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/xgboost.py in <module>
19 from pyspark.ml.param import Param
20 from pyspark.ml.param.shared import HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol, HasCheckpointInterval
---> 21 from pyspark.ml.util import JavaMLWritable, JavaPredictionModel
22 from pyspark.ml.wrapper import JavaEstimator, JavaModel
23 from sparkxgb.util import XGBoostReadable
ImportError: cannot import name 'JavaPredictionModel' from 'pyspark.ml.util' (/Users/poudel/Softwares/spark-3.0.0-bin-hadoop2.7/python/pyspark/ml/util.py)
Questions
How to fix the error and run xgboost in pyspark?
Maybe I have not placed the downloaded jar files in the correct path. (I have them in my working directory, where the Jupyter notebook file is.) Do I need to place these files somewhere else? I assume Jupyter automatically loads the path . and sees these jar files, but I may be wrong.
If any good samaritan has already run xgboost in pyspark, their help is much appreciated.
This question was asked almost 6 months ago, but no solution was provided.
I was facing the same issue for the past few days and finally found a solution, so I would like to share it.
By now you might have found a solution yourself, but I thought it would be better to share so that you, or anyone in the future, can benefit from it.
You can get rid of this error in two ways:
JavaPredictionModel has been removed from the latest versions of pyspark, so you can downgrade pyspark to, say, version 2.4.0, and the error will be resolved.
But by doing this you have to follow the structure of the old pyspark version; for example, OneHotEncoder cannot be applied to multiple features at the same time, so you have to encode them one by one.
!pip install pyspark==2.4.0
The second and better solution is to modify the sparkxgb code: you can import JavaPredictionModel from pyspark.ml.wrapper instead, so you don't need to downgrade your pyspark.
from pyspark.ml.wrapper import JavaPredictionModel
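Concretely, that means unzipping sparkxgb.zip and editing the import at the top of sparkxgb/xgboost.py (line 21 in the traceback above), roughly like this:
# sparkxgb/xgboost.py -- before:
# from pyspark.ml.util import JavaMLWritable, JavaPredictionModel
# after:
from pyspark.ml.util import JavaMLWritable
from pyspark.ml.wrapper import JavaPredictionModel
Then re-zip the package before passing it to addPyFile.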
P.S. Pardon me for not following the answer standards.
There is a problem with your versions. I know the solution to a similar problem for catboost_spark, where I also had a version mismatch (catboost_spark_version).
You need to go to https://catboost.ai/en/docs/installation/spark-installation-pyspark
Get the appropriate catboost_spark_version (see available versions at Maven central).
Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).
Just add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to the spark.jars.packages Spark config parameter and import the catboost_spark package:
So you go to https://search.maven.org/search?q=catboost-spark and
choose a version (for example catboost-spark_3.3_2.12).
Then copy the "Artifact ID". In this case it is "catboost-spark_3.3_2.12:1.1.1".
Then paste it into your config parameter.
And you will get something like this:
sparkSession = (SparkSession.builder
                .master('local[*]')
                .config("spark.jars.packages", "ai.catboost:catboost-spark_3.3_2.12:1.1.1")
                .getOrCreate())
import catboost_spark
and it will work :)
I am working in Python and searching for an implementation of a random subspace ensemble classifier, and I found the following code on GitHub:
https://github.com/mwygoda/randomSubspaceImplementation/blob/master/solution.py
The author depends on these two imports:
from utils import prepare_data_from_file
from utils import plot_results
I tried to install utils using pip3; it installed, and import utils as ut worked, but I still get the error cannot import name 'plot_results' or 'prepare_data_from_file'.
Can anyone help me fix it?
That file is not in the repo. You will have to implement it yourself.
From the code, it looks like it returns a feature vector and target labels, e.g.:
def prepare_data_from_file(file):
    import pandas as pd
    df = pd.read_csv(file)
    return df['A'], df['B']
But this is mere speculation. Now get off Stack Overflow and go do your assignment.
I'm getting the following error when importing modules in Python. I'm using a Jupyter notebook (Python 2). I've searched the internet but still can't quite figure out why. Any help would be much appreciated.
Here's the code:
import numpy as np
from pandas import Series,DataFrame
import pandas as pd
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-e4e9959b303a> in <module>()
----> 1 import numpy as np
2 from pandas import Series,DataFrame
3 import pandas as pd
/Users/...filepath.../Python/data_analysis/numpy.pyc in <module>()
17
18 import numpy as np
---> 19 from numpy.random import randn
20
21
ImportError: No module named random
I've tried adding import random to the above code (before the other modules) and it still gives the same error. Could this be due to the version of gfortran on my system? I have version 4.9.2.
Since I don't have the complete code, I just tried the import statements.
If we use np as per #John:
import numpy as np
from np.random import randn
I am getting:
from np.random import randn
ImportError: No module named np.random
I am not getting any error if I import randn from numpy.random:
import numpy as np
from numpy.random import randn
print "randn1= ", randn()
from numpy.random import rand
print "rand1= ", rand()
It's working for me, with the output below:
randn1= 0.147667079884
rand1= 0.243935746205
You can also try to use np.random.randn() and np.random.rand() directly.
import numpy as np
print "randn2= ", np.random.randn()
print "rand2= ", np.random.rand()
I get:
randn2= -0.22571513741
rand2= 0.486507681046