I have recently been reading about deep learning with TensorFlow. I successfully installed TensorFlow on Ubuntu 16 in a Python environment on my laptop and validated the installation at the end by following the procedure on the TensorFlow site.
To get the most out of the available processing capacity, I have obtained an account on a cluster running CentOS (a Linux-based operating system) that my school operates so its students can run large jobs such as simulations. The cluster comprises about 16 nodes with 6 processors each. That is all I know about the cluster environment.
Note: this is not a GPU-based cluster.
The admin has given me a user ID and password, along with the host IP address, to access the cluster from any system. I installed a piece of software called Xmanager Enterprise 5 under the Windows operating system, and I can now access my account.
What I need to know is this:
I am unable to install anything without using the sudo command, and if I do use sudo it shows me a privilege error message. How can I install the required packages on the cluster machine so that I can run my program?
Or is there another way, such as installing everything locally on my Ubuntu laptop and then transferring it to the cluster?
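The closest I have found so far is installing packages into my home directory with pip's --user flag, or into a virtual environment there, so that no sudo is needed (a sketch, assuming a reasonably recent Python is available on the cluster):

# install into my home directory instead of system-wide; no sudo needed
python -m pip install --user tensorflow

# or keep everything isolated in a virtual environment under my home directory
python -m venv ~/tf-env
source ~/tf-env/bin/activate
pip install tensorflow

I am not sure whether this is the right approach on a shared cluster, which is partly why I am asking.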
In fact, I would like to understand more about this, but I do not know where to start. Please guide me.
Here is the situation.
I am trying to run a Python Flask API on Kubernetes hosted on a Raspberry Pi cluster; the nodes are running Ubuntu 20. The API is containerized into a Docker image built on the Raspberry Pi control node to account for architecture differences (ARM).
When the API and Mongo are run outside K8s on the Raspberry Pi, just using the docker run command, the API works correctly; however, when the API is applied as a Deployment on Kubernetes, the pod for the API fails with a CrashLoopBackOff and the logs show 'standard_init_linux.go:211: exec user process caused "exec format error"'.
Investigation suggests that the exec format error is usually associated with building against a different CPU architecture. However, having built the Docker image on a Raspberry Pi, and having successfully run the API on that architecture, I am not sure this is the source of the problem.
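For reference, one way to check this (a sketch; the image name is illustrative) is to compare the architecture recorded in the image with the architecture of the machine it runs on:

# architecture the image was built for, e.g. amd64 vs arm64
docker image inspect my-api:1.0 --format '{{.Architecture}}'
# architecture of the current machine, e.g. aarch64 on a Raspberry Pi
uname -m

If these disagree on any node, the exec format error is expected there.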
It has been two days and all attempts have failed. Can anyone help?
Fixed; however, something doesn't seem right.
The Kubernetes Deployment was always scheduled onto the same node. I connected to that node and ran the Docker container, and it wouldn't run; the "exec format error" occurred. So it looks like it was a node-specific problem.
I copied the API and Dockerfile onto the node and ran docker build to create the image. It now runs. That does not make sense, as the Docker image should have everything it needs to run.
Maybe it's because a previous image built against x86 (the development machine) remained in that node's Docker cache/repository. Maybe the image on the node is not overwritten by newer images that have the same name and version number (the version number didn't increment). That would seem to be the case, as the spin-up time of the image on the remote node is fast, suggesting the new image isn't copied to the remote node. That is likely what it is.
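If that is the cause, Kubernetes' default pull behaviour would explain it: for a fixed tag, the kubelet reuses whatever image is already cached on the node. A minimal sketch of forcing a fresh pull each time (names and tag are illustrative; this only helps once there is a registry to pull from, see the edit below):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: my-api:1.0          # hypothetical image name and tag
        imagePullPolicy: Always    # always pull; never reuse a stale copy cached on the node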
I will post this anyway as it might be useful.
Edit: allow me to clarify some more. The root of this problem was ultimately that there was no shared image repository in the cluster. Images were being manually copied onto each RPi (running ARM64) from a laptop (not running ARM64), and this manual process caused the problem.
An image built on the laptop was based on a base image incompatible with ARM64; this was manually copied to all the RPis in the cluster. This caused the exec format error.
Building the image on an RPi pulled a base image that supported ARM64; however, this build had to be done on every RPi, because there was no central repository in the cluster from which Kubernetes could pull the newly built ARM64-compatible image onto the other RPi nodes.
Solution: a shared repository
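As a sketch, one way to provide that is to run a plain Docker registry on one node and push the ARM64 image to it (host name, port and image name are illustrative; an HTTP registry like this also has to be added to each node's insecure-registries configuration):

# run a registry container on one node in the cluster
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# tag and push the ARM64 image built on the RPi
docker tag my-api:1.0 registry-host:5000/my-api:1.0
docker push registry-host:5000/my-api:1.0

# the Deployment then references image: registry-host:5000/my-api:1.0,
# and every node pulls the same ARM64-compatible build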
Hope this helps.
I've installed a Cloudera CDH cluster with Spark 2 on 7 hosts (2 masters, 4 workers and 1 edge node).
I installed a Jupyter server on the edge node. I want to set PySpark to run in cluster mode, so I run this in a notebook:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master yarn --deploy-mode=cluster pyspark-shell'
It gives me "Error: Cluster deploy mode is not applicable to Spark shells."
Can someone help me with this?
Thanks
The answer here is that you can't. Jupyter, as configured, launches a PySpark shell session behind the scenes, and a shell cannot run in cluster mode: in cluster mode the driver runs somewhere on the cluster, which is incompatible with an interactive shell.
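If you just need the notebook to use the cluster, YARN with client deploy mode works, since the driver stays inside the notebook process while the executors run on YARN; a minimal sketch:

import os
# client mode keeps the driver in the notebook process; executors still run on YARN
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master yarn --deploy-mode client pyspark-shell'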
If you really need jobs to run on the cluster itself, one solution I can think of for your problem is:
Livy + sparkmagic + Jupyter
Livy can run in YARN mode and serve job requests as REST calls, while sparkmagic resides in Jupyter.
You can follow the link below for more info on this:
https://blog.chezo.uno/livy-jupyter-notebook-sparkmagic-powerful-easy-notebook-for-data-scientist-a8b72345ea2d
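To illustrate the idea, here is a rough sketch of submitting a job through Livy's REST API from Python (host and file path are illustrative; Livy listens on port 8998 by default):

import json
import requests

# submit a batch job; the script path must be reachable by the cluster, e.g. on HDFS
livy_url = 'http://livy-host:8998/batches'
payload = {'file': 'hdfs:///jobs/my_job.py'}

resp = requests.post(livy_url, data=json.dumps(payload),
                     headers={'Content-Type': 'application/json'})
print(resp.json())  # contains the batch id and its current state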
Major update.
I have succeeded in deploying JupyterHub with CDH 5.13, and it works with no problems.
One thing to pay attention to: install Python 3 as the default language. With Python 2, multiple jobs will fail because of incompatibilities with the Cloudera packages.
I have VMware Workstation Pro 12, and I can open multiple virtual machines at a time. All I want is to connect them in a virtual network. This would allow me to create a server (using Python sockets) in one virtual machine while the other VMs act as clients. Is my idea possible? If so, how can I do it?
I'm not sure if this helps, but your question doesn't give much to go on either.
The last time I used VMware it was, I think, VMware Workstation 12. I used the free version, which is licensed for noncommercial use. If that is what you are using, then this most likely applies.
Because it is not the Pro or commercial version, you can only open one virtual machine at a time. From your question it seems like you're using Python; I'm not sure what that changes, but my point is that the free version may limit you to one open virtual machine at a time.
This may be the problem you're having.
I hope this helps, if not you then someone else.
EDIT
Here are a few YouTube videos I have found that will help you make a virtual network. You need to make a host-only network, and you may wish to turn on DHCP. Once you have created the virtual network, all the VMs need to use that same virtual network. Now that your VMs are on the same network and can communicate with each other, your Python script should work. I don't know Python myself, otherwise I would have provided code to open a simple socket and test it from the client side. You may need to use ipconfig (Windows cmd) / ifconfig (Unix terminal) to find the IP address of the server machine.
https://www.youtube.com/watch?v=8VPkRC0mKF4
https://youtu.be/vKoFSmy3agM?t=131
Here is a link to a simple Python server:
https://www.tutorialspoint.com/python/python_networking.htm
The host variable in the client code should be the IP of the server, not gethostname(). Use ifconfig/ipconfig on the server to find the server's IP.
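Building on that tutorial, here is a minimal sketch of such a server and client (the port is arbitrary and 192.168.56.101 is an illustrative server IP; substitute the address ipconfig/ifconfig reports):

# server.py: run inside the server VM; listens on all interfaces
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('0.0.0.0', 5000))
server.listen(1)
conn, addr = server.accept()
print('connected:', addr)
conn.sendall(b'hello from server')
conn.close()
server.close()

# client.py: run inside another VM; connect to the server VM's IP, not gethostname()
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('192.168.56.101', 5000))
print(client.recv(1024))
client.close()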
I know Ansible supports Windows clients/nodes. What I really enjoy about Ansible is that I can create a Linux VM, pull a git repo that contains Ansible playbooks, and, without any configuration or setup of a control server, run the playbooks on the local machine.
Since you can execute Python on Windows, would it be possible to run roles/playbooks on localhost on Windows?
This would be the first step toward running Ansible in a Windows-only datacenter where it is not even possible to run Linux in VirtualBox.
Ansible won't run on a Windows control machine, as stated in the documentation:
Reminder: You Must Have a Linux Control Machine
Note running Ansible from a Windows control machine is NOT a goal of the project. Refrain from asking for this feature, as it limits what technologies, features, and code we can use in the main project in the future. A Linux control machine will be required to manage Windows hosts.
Cygwin is not supported, so please do not ask questions about Ansible running from Cygwin.
I just want clarification on the following. I am currently transferring a Bamboo plan to Jenkins, and everything was working fine until I ran a Python script on my CentOS virtual machine. The reason is that the script wants to import a library called winreg, which is Windows-only and not available on Red Hat distributions.
In order to fix this, I wanted to have my Master be a CentOS machine and my slave be a Windows 10 machine. Is that how it works? Will the plan be built on the Windows 10 machine while the output is handled by the CentOS machine?
Thanks
Yes, it will. This is standard Jenkins usage; see the documentation.
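For illustration, a minimal declarative pipeline sketch that pins the build to the Windows machine (the 'windows10' label and the script name are hypothetical; the Windows 10 machine must be registered as an agent with that label on the CentOS master):

pipeline {
    agent { label 'windows10' }  // hypothetical label given to the Windows 10 agent
    stages {
        stage('Build') {
            steps {
                // bat runs on the Windows agent; the CentOS master only schedules
                // the job and stores the build output and logs
                bat 'python my_script.py'
            }
        }
    }
}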