connect local python script to remote spark master - python

I am using Python 2.7 with a Spark standalone cluster.
When I start the master on the same machine that runs the Python script, it works smoothly.
When I start the master on a remote machine and try to start a Spark context on the local machine to access the remote Spark master, nothing happens and I get a message saying that the task did not get any resources.
When I access the master's UI, I see the job, but nothing happens with it; it just sits there.
How do I access a remote Spark master from a local Python script?
Thanks.
EDIT:
I read that in order to do this I need to run the application in cluster mode (not client mode), and I found that standalone mode does not currently support this for Python applications.
Ideas?

In order to do this, I need to run the application in cluster mode (not client mode), and I found here that standalone mode does not currently support this for Python applications.
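For anyone hitting the same wall in client mode, here is a minimal sketch of pointing a local PySpark driver at a remote standalone master; the master host, driver address, and resource sizes are placeholder assumptions. A common cause of the "did not get any resources" symptom is that the workers cannot connect back to the driver, or that the requested memory/cores exceed what the workers offer, so both are set explicitly below.

    from pyspark import SparkConf, SparkContext

    # Placeholder host names, addresses, and sizes -- adjust to your cluster.
    conf = (SparkConf()
            .setAppName("remote-master-test")
            .setMaster("spark://master-host:7077")    # the remote standalone master
            .set("spark.driver.host", "10.0.0.5")     # address the workers can reach the driver on
            .set("spark.executor.memory", "512m")     # keep below what each worker advertises
            .set("spark.cores.max", "2"))

    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(100)).sum())           # tiny job to confirm executors were granted
    sc.stop()

If the job still just sits in the master's UI, check that the worker machines can open connections back to the driver's ports (e.g. spark.driver.port), not only the other way around.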

Related

How can I keep an AWS EC2 VM alive until a long-term script ends?

I have a Selenium Python (3.9) renderer script that performs a bunch of fetching tasks, and I'm trying to use an AWS EC2 virtual machine as a runtime in the cloud. I am running it by using SSH to access the VM, adding the scripts and dependencies, and then running it with python3 <script-name>.py.
But I need help preserving the runtime instance until my script is complete (or indefinitely, until I manually delete the instance). Currently, the script seems tied to my local CLI, and when I leave it alone for a while or shut my laptop lid, the AWS VM runtime quits with the error:
Client Loop: Send Disconnect: Broken Pipe
How can I preserve the runtime indefinitely, or until the end of the script, and untie it from any local runtimes? Apologies for any idiosyncrasy; I'm new to DevOps and deploying things outside of local runtimes.
I used ClientAliveInterval=30, ServerAliveInterval=30, ClientAliveCountMax=(arbitrarily high amount), and ServerAliveCountMax=(arbitrarily high amount). I tried to use nohup inside my VM, but it did not prevent the process from ending. I have observed that the script runs while ps -a shows my SSH session, and stops otherwise.
I am using an M1 Mac on Ventura. The AMI I am using to create the VM is ami-08e9419448399d936. This is Selenium-Webdriver-on-Headless-Ubuntu, using Ubuntu 20.04.
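No answer was recorded for this one. One common workaround (my assumption, not from the question) is to start the script on the VM detached from the SSH session so it no longer dies with the connection; a minimal paramiko sketch, where the host, user, key path, and script name are all hypothetical:

    import os
    import paramiko

    # Hypothetical host, user, key, and script names.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("ec2-host", username="ubuntu",
                   key_filename=os.path.expanduser("~/.ssh/my-key.pem"))

    # nohup + '&' detaches the process from the SSH session; output goes to run.log on the VM.
    stdin, stdout, stderr = client.exec_command(
        "nohup python3 my_script.py > run.log 2>&1 &")
    stdout.channel.recv_exit_status()   # wait only for the shell to background the job
    client.close()                      # safe to disconnect; the script keeps running

Running the script inside tmux or screen on the VM achieves the same thing interactively.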

Running script using local Python and packages but executing certain commands on remote server

I know how to run a Python script made locally on a remote server and have seen a lot of questions in that regard. But I am in a situation where I cannot install python packages on the remote server I am accessing. Specifically, I need to use pypostal, which requires libpostal to be installed and I cannot do so. Moreover, I need pyspark to play with Hive tables.
Therefore, I need the script to run locally, where I can manage my packages and everything executes fine, but certain commands need to access the server in order to grab data. For example, using pyspark to get Hive tables on the server into a local dataframe. Essentially, I need all the Python to be executed using my local distribution with my local packages but perform its actions on the remote server.
I have looked into things like paramiko. But as far as I can work out, it is just like an SSH client, which would use the Python distribution on the remote server and not the local one. Though perhaps I don't understand how to use it properly.
I am running python 3.6 on Ubuntu 18.04 using WSL. The packages I am using are pandas, numpy, pyspark, and postal (subsequently libpostal).
TLDR;
Is it possible to run a script locally and have parts of it execute remotely while still using my local Python? If there are other possible solutions, I would be grateful to hear them.
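For the pyspark part specifically, a hedged sketch of one way this can work: run the driver locally with your own packages and point it at the cluster's Hive metastore, so the tables are read remotely but land in a local dataframe. The metastore URI and table name below are placeholders, and this assumes the metastore and HDFS ports are reachable from your machine.

    from pyspark.sql import SparkSession

    # Placeholder host and table names.
    spark = (SparkSession.builder
             .appName("local-driver-remote-hive")
             .config("hive.metastore.uris", "thrift://remote-host:9083")
             .enableHiveSupport()
             .getOrCreate())

    # Pull a remote Hive table into a local pandas dataframe for local pandas/pypostal work.
    df = spark.sql("SELECT * FROM some_db.some_table LIMIT 1000").toPandas()
    print(df.head())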

Run a script located on remote server using python

I have a Python script located on a remote server with SSH enabled. That script prints a lot of debug messages while executing. I want to trigger this script using another Python script on my local system and, depending on the output of the remote script, proceed further. While doing all this, I want the messages printed on the remote server to be displayed on my local system as well. Basically, I want to view whatever output the remote script produces during its run, on my local system. I am able to trigger the script using paramiko, but I am neither able to check whether the script on the remote server is running nor able to view its output. Is there any way to do it? I already tried conn.recv(65535), but to no avail.
In my experience, I found the Python fabric module easier than using paramiko. If you want to execute a local script on a remote machine using fabric, you just need to upload it using put() and then call the run() API.
http://docs.fabfile.org/en/1.14/api/core/operations.html#fabric.operations.put
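A rough illustration of that suggestion with Fabric 1.x (matching the linked docs); the host, key path, and script names are placeholders:

    from fabric.api import env, put, run

    env.host_string = "user@remote-host"
    env.key_filename = "/home/user/.ssh/id_rsa"

    put("local_script.py", "/tmp/local_script.py")    # upload the local script
    result = run("python /tmp/local_script.py")       # remote debug output streams back live
    if result.succeeded:                              # `result` also behaves like a string
        print("remote script finished; captured output:\n%s" % result)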

How to submit a pyspark job to remote cluster from windows client?

We are using a remote Spark cluster with YARN (on Hortonworks). Developers want to use Spyder to implement Spark applications on Windows. SSHing to the cluster and using IPython Notebook or Jupyter works well. Is there any other way to communicate with the Spark cluster from Windows?
Question 1: I am struggling to submit a Spark job (written in Python) from a Windows machine that has no Spark installed. Could anyone help me out with this? Specifically, how should I phrase the command line to submit the job?
We can SSH to a YARN node in the cluster, in case that is relevant to a solution. The Windows client is also pingable from the cluster.
Question 2: What do we need on the client side (e.g., Spark libraries) if we want to debug in an environment like this?
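No answer was recorded here either, but as a hedged sketch of the usual client-mode setup (assuming Spark 2.x libraries plus winutils.exe are installed on the Windows box, and the cluster's Hadoop/YARN XML config files are copied locally; all paths are hypothetical):

    import os
    from pyspark import SparkConf, SparkContext

    # Hypothetical local directory holding core-site.xml, yarn-site.xml, hdfs-site.xml
    # copied from the Hortonworks cluster.
    os.environ["HADOOP_CONF_DIR"] = r"C:\hadoop-conf"
    os.environ["YARN_CONF_DIR"] = r"C:\hadoop-conf"

    conf = (SparkConf()
            .setAppName("windows-client-test")
            .setMaster("yarn")                          # executors run on the cluster
            .set("spark.submit.deployMode", "client"))  # driver stays on the Windows box

    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(10)).count())
    sc.stop()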

Execute a Hadoop Job in remote server and get its results from python webservice

I have a Hadoop job packaged in a jar file that I can execute on a server from the command line, storing the results in that server's HDFS.
Now I need to create a web service in Python (Tornado) that must execute the Hadoop job and fetch the results to present them to the user. The web service is hosted on another server.
I googled a lot about calling the job from outside the server in a Python script, but unfortunately found no answers.
Anyone have a solution for this?
Thanks
One option could be to install the Hadoop binaries on your web-service server using the same configuration as in your Hadoop cluster. You will need that to be able to talk to the cluster. You don't need to launch any Hadoop daemons there. At a minimum, configure HADOOP_HOME, HADOOP_CONFIG_DIR, HADOOP_LIBS, and set the PATH environment variable properly.
You need the binaries because you will use them to submit the job, and the configuration files to tell the Hadoop client where the cluster is (the NameNode and the ResourceManager).
Then, in Python, you can execute the hadoop jar command using subprocess: https://docs.python.org/2/library/subprocess.html
You can configure the job to notify your server when the job has finished using a callback: https://hadoopi.wordpress.com/2013/09/18/hadoop-get-a-callback-on-mapreduce-job-completion/
And finally, you can read the results from HDFS using WebHDFS (the HDFS web API) or a Python HDFS package like: https://pypi.python.org/pypi/hdfs/
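A rough sketch of the subprocess and HDFS-read pieces of that answer; the jar name, main class, HDFS paths, and NameNode address are placeholders:

    import subprocess
    from hdfs import InsecureClient   # the PyPI `hdfs` package linked above

    # Submit the packaged job; assumes the hadoop binaries and cluster configuration
    # are set up on the web-service host as described above.
    subprocess.check_call([
        "hadoop", "jar", "my-job.jar", "com.example.MyJob",
        "/user/me/input", "/user/me/output",
    ])

    # Read the results back over WebHDFS once the job has finished.
    client = InsecureClient("http://namenode-host:50070", user="hadoop")
    with client.read("/user/me/output/part-r-00000") as reader:
        print(reader.read())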
