Executing PySpark on Windows giving error - python

To run PySpark I installed it with pip install pyspark. Now, to initialize the session, after going through many blogs I am running the command below:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()
The above code gives me the error:
Exception: Java gateway process exited before sending the driver its port number
This will be my first Spark program. I want your advice on whether "pip install pyspark" is enough to run Spark on my Windows laptop or whether I need to do something else.
I have Java 8 installed on my laptop and I am using conda with Python 3.6.
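For reference, the usual first thing to check for this error is whether PySpark can locate the Java 8 JDK. Below is a minimal sketch of that check, assuming the problem is a missing JAVA_HOME; the JDK folder name is hypothetical and needs to match the actual install.
import os
from pyspark.sql import SparkSession
# Tell PySpark where the Java 8 JDK lives before the session is created.
# The install path below is a guess; adjust it to the real JDK folder.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_202"
spark = SparkSession.builder.appName('test').getOrCreate()
print(spark.version)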

Related

How to use remote Spark in local VS Code?

I am starting to learn Spark but am now stuck at the first step.
I downloaded Spark from the Apache website and have finished the configuration. Now, if I run the pyspark command in WSL, a Jupyter server starts, I can open it in my Windows browser, and import pyspark works just fine. But if I connect to WSL with VS Code and create a new notebook there, the pyspark module can't be found.
I didn't install the pyspark module through pip or conda because I thought it was already included in the full distribution I downloaded, so installing it again seemed redundant.
Is there any way I can use the remotely installed Spark in VS Code without installing it separately again?
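One common way to reuse an already downloaded Spark distribution from another Python environment is the small findspark helper package, which adds the distribution's bundled pyspark to sys.path. A minimal sketch, assuming findspark is installed in the notebook's environment; the unpack location is hypothetical.
import findspark
# Point this notebook's Python at the existing Spark download.
# The path is hypothetical; use wherever the Apache tarball was unpacked.
findspark.init("/home/you/spark-2.4.5-bin-hadoop2.7")
import pyspark
print(pyspark.__version__)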

I get an error when trying to run a Glue job locally in cmd

I followed this tutorial to run Glue scripts locally. I checked in cmd whether Spark and Maven are installed, and it confirmed this by showing the installed versions.
Now when I try to run a Glue script by typing:
./bin/gluesparksubmit "path_to_python_script"
I get the error that the dot operator could not be found. How can I solve this? I have already googled it and could not find a solution.

Trouble encountered when launching pyspark with command prompt

When I attempt to launch Spark in the command prompt with 'spark-shell', a new command prompt window simply appears and does not launch Spark. I used 'pip install pyspark' to install Spark. Thank you for any help.
I had malware on my PC. I downloaded the Malwarebytes anti-virus. It cleaned up all the malware and now pyspark works fine.

Running pyspark in (Anaconda - Spyder) on Windows

Dear all,
I am using Windows 10 and I am familiar with testing my Python code in Spyder.
However, when I try to run the "import pyspark" command, Spyder shows "No module named 'pyspark'".
PySpark is installed on my PC and I can also do import pyspark in the command prompt without any error.
I found many blogs explaining how to do this on Ubuntu, but I did not find how to solve it on Windows.
Well, to use packages in Spyder, you have to install them through Anaconda. You can open the
"Anaconda Prompt" and then enter the code below:
conda install pyspark
That will make the package available in Spyder.
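A quick way to confirm the install worked is to import the package in the Spyder console and start a small session, along these lines:
# Run in the Spyder console after the conda install finishes.
import pyspark
from pyspark.sql import SparkSession
print(pyspark.__version__)             # confirms the module is importable
spark = SparkSession.builder.appName('spyder-test').getOrCreate()
print(spark.range(5).count())          # prints 5 if the session started correctly
spark.stop()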
Hi, I installed PySpark on Windows 10 a few weeks back. Let me tell you how I did it.
I followed "https://changhsinlee.com/install-pyspark-windows-jupyter/".
After following each step precisely, you should be able to run pyspark either from the command prompt or by saving a Python file and running it.
When you run it via a notebook (download Anaconda), start the Anaconda shell and type pyspark. Now you don't need to do "import pyspark";
run your program without it and it will work fine. You can also use spark-submit, but for that I found you need to remove the PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS environment variables.
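To illustrate that last point: in a notebook launched through the pyspark command, the shell already provides the spark and sc variables, so the program can use them directly without building a session. A small sketch (the DataFrame contents are just example data):
# `spark` and `sc` already exist in a notebook started via the pyspark command,
# so there is no import or SparkSession.builder call here.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()
print(sc.version)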

pyspark interpreter not found in Apache Zeppelin

I am having an issue using pyspark in an Apache Zeppelin (version 0.6.0) notebook. Running the following simple code gives me a "pyspark interpreter not found" error:
%pyspark
a = 1+3
Running sc.version gave me res2: String = 1.6.0, which is the version of Spark installed on my machine, and running z returns res0: org.apache.zeppelin.spark.ZeppelinContext = {}.
Pyspark works from the CLI (using Spark 1.6.0 and Python 2.6.6).
The default Python on the machine is 2.6.6, while Anaconda Python 3.5 is also installed but not set as the default.
Based on this post, I updated the zeppelin-env.sh file located at /usr/hdp/current/zeppelin-server/lib/conf and added the Anaconda Python 3 path:
export PYSPARK_PYTHON=/opt/anaconda3/bin/python
export PYTHONPATH=/opt/anaconda3/bin/python
After that I stopped and restarted Zeppelin many times using
/usr/hdp/current/zeppelin-server/lib/bin/zeppelin-daemon.sh
But I still can't get the pyspark interpreter to work in Zeppelin.
For anyone who finds that pyspark is not responding, please try restarting your Spark interpreter in Zeppelin; it may resolve the "pyspark is not responding" error.
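Once the interpreter does start, a quick sanity-check paragraph like the one below (not part of the original post) can confirm that Zeppelin picked up the Anaconda Python configured in zeppelin-env.sh:
%pyspark
import sys
print(sys.version)   # should show the Anaconda 3.5 interpreter
print(sc.version)    # the Spark version bound to the interpreter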
