I recently installed Apache Zeppelin 0.6.2 on macOS Sierra 10.12. I am able to run the Spark and Python examples, but when I try to run R code using either %r or %spark.r I get an error. I have already set SPARK_HOME and SCALA_HOME in .bash_profile. Attaching the error log:
INFO [2016-10-31 19:48:10,806] ({pool-2-thread-5} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1477923480756 finished by scheduler org.apache.zeppelin.spark.SparkRInterpreter314730576
INFO [2016-10-31 19:48:10,804] ({pool-1-thread-5} ZeppelinR.java[createRScript]:366) - File /var/folders/_b/2cr99z410sddt8km9p9b9fs80000gn/T/zeppelin_sparkr-6402261059466053567.R created
ERROR [2016-10-31 19:48:20,836] ({pool-1-thread-5} TThreadPoolServer.java[run]:296) - Error occurred during processing of message.
org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding
Just figured out that setting SPARK_HOME in .bash_profile is not enough; one also needs to update conf/zeppelin-env.sh and set SPARK_HOME there as well (e.g. export SPARK_HOME=/path/to/spark).
The HdfsCLI docs say that it can be configured to connect to multiple hosts by adding URLs separated with a semicolon (;) (https://hdfscli.readthedocs.io/en/latest/quickstart.html#configuration).
I use the Kerberos client, and this is my code:
from hdfs.ext.kerberos import KerberosClient
hdfs_client = KerberosClient('http://host01:50070;http://host02:50070')
And when I try to makedirs, for example, I get the following error:
requests.exceptions.InvalidURL: Failed to parse: http://host01:50070;http://host02:50070/webhdfs/v1/path/to/create
Apparently the version of the hdfs package I had installed was old: the code didn't work with version 2.0.8, but it did work with version 2.5.7.
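For reference, a minimal sketch of the working setup after the upgrade (the host names and target path are the ones from the question above):
from hdfs.ext.kerberos import KerberosClient

# Semicolon-separated URLs let the client fall back to host02 if host01 is
# unreachable; this needs a recent hdfs package (2.5.7 worked here, 2.0.8 did not).
hdfs_client = KerberosClient('http://host01:50070;http://host02:50070')

# The call that previously raised InvalidURL now parses the URLs and picks one host.
hdfs_client.makedirs('/path/to/create')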
I'm trying to back up Postgres from Python on Win10.
I'm working with Anaconda Python 3.8 on a Win10 machine with Postgres 12 installed locally. On the PATH environment variable I have Postgres (lib and bin) and Python 3.8 (not the Anaconda one), but no Anaconda.
I'm able to correctly back up the database using the following on Windows' command shell:
pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump
but when I run it from the Anaconda interpreter:
os.system("pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump" )
I get 1 as output, which is an error code. It creates the file, but it's empty.
Using:
import subprocess
stk = 'pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump'
try:
    subprocess.check_output(stk, shell=True, stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as e:
    raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
I get:
RuntimeError: command 'pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump' return with error (code 1): b"'pg_dump' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n"
If I use subprocess.run or subprocess.call, it doesn't produce an error, but the created file is empty.
It seems that neither os.system nor subprocess, run from the Anaconda interpreter, gets access to the environment variables that the command shell sees. How is this possible, and how can I overcome it? Is a different user invoking the shell?
Thanks in advance.
The computer was restarted, and that solved the issue. There was no change in the paths; I believe the machine hadn't been restarted since the moment these things (Python, Postgres, ...) were installed.
import os
os.system("pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump" )
worked! And
import subprocess
subprocess.call(r"C:\some\path\backup.bat")
also worked! Inside backup.bat is:
pg_dump --dbname=postgresql://postgres:password#127.0.0.1:5432/test > C:\backup\dumpfile3.dump
I imagine that the issue was that the Anaconda interpreter needed a system restart to get access to the environment variables (where the Postgres entry was), which makes very little sense, since return with error (code 1): b"'pg_dump' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n" looks like a console message.
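For anyone debugging something similar, a quick way to see what the interpreter actually inherited is a small check like this (not part of the original fix, just a diagnostic sketch):
import os
import shutil

# Show the PATH this Python process inherited from its parent.
print(os.environ.get("PATH", ""))

# shutil.which() returns the full path to pg_dump if this interpreter can see it,
# or None if it cannot -- which matches the "not recognized" error above.
print(shutil.which("pg_dump"))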
If anyone has a better explanation, it is welcome.
I am trying to run an mlflow server locally.
To do so, I am using:
mlflow server --backend-store-uri="sqlite:///C:\\path\\to\\project_folder\\backend\\mlflow_data.db"
--default-artifact-root="file:///C:\\path\\to\\project_folder\\artifact_store\\"
Where
- backend-store-uri: URI to which to persist experiment and run data (sqlite database in our case).
- default-artifact-root: Local or S3 URI to store artifacts, for new experiments (local folder in our case).
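(For reference, once the server is up, runs would be pointed at it roughly as below; a minimal sketch assuming the default address http://127.0.0.1:5000 and made-up experiment/parameter names.)
import mlflow

# Assumed default address of the server started above; adjust if --host/--port differ.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("demo-experiment")  # hypothetical experiment name, created on first use

with mlflow.start_run():
    mlflow.log_param("lr", 0.01)     # made-up parameter
    mlflow.log_metric("loss", 0.42)  # made-up metric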
I have already installed the packages:
numpy==1.17.3
pandas==0.25.3
jupyterlab==1.0.10
scikit-learn==0.21.3
matplotlib==3.1.2
mlflow==1.4.0
torch==1.3.1+cpu
torchvision==0.4.2+cpu
xgboost==0.90
The problem is that I am getting this error:
Fatal error in launcher: Unable to create process using "d:\bld\mlflow_1572494804636\_h_env\python.exe" "C:\Users\user\AppData\Local\Continuum\miniconda3\envs\mlflow-tut orial\Scripts\mlflow.exe" server --backend-store-uri=sqlite:///C:\\projects\\user\\mlflow_tutorial\\backend\\mlflow_ui_data.db --default-artifact-root=file:///C:\\projects \\user\\mlflow_tutorial\\artifact_store\\
when I run the mlflow server command. Any ideas?
It seems that there is a space in the path; note the environment name in the error message above (mlflow-tut orial).
I'm running JupyterLab from Anaconda, and installed a JupyterLab plotly extension using:
conda install -c conda-forge jupyterlab-plotly-extension
Apparently, the installation was successful, but something is still wrong.
When launching JupyterLab, I'm getting this prompt:
Clicking BUILD gives me this:
And clicking RELOAD reloads JupyterLab, but I'm getting this again:
And on and on it spins. Does anyone know why?
Clicking CANCEL does not help either because plotly won't produce any plots, only blank spaces:
Solution:
Deactivate the firewall and run the following command in a Windows command prompt:
jupyter lab build
The details:
This turned out to be a firewall problem, and I'm not sure why it is not reported as such in the JupyterLab interface. The following command in a Windows command prompt returned the error message below:
Command:
jupyter lab build
Output:
C:\>jupyter labextension list
JupyterLab v0.34.9
Known labextensions:
   app dir: C:\Users\*******\AppData\Local\Continuum\anaconda3\share\jupyter\lab
        @jupyterlab/plotly-extension v0.18.2 enabled ok
Build recommended, please run jupyter lab build:
    @jupyterlab/plotly-extension needs to be included in build

C:\>jupyter lab build
[LabBuildApp] JupyterLab 0.34.9
[LabBuildApp] Building in C:\Users\*******\AppData\Local\Continuum\anaconda3\share\jupyter\lab
[LabBuildApp] > node C:\Users\*******\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js install
yarn install v1.9.4
info No lockfile found.
[1/4] Resolving packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/@jupyterlab%2fapplication: self signed certificate in certificate chain".
info If you think this is a bug, please open a bug report with the information provided in "C:\Users\*******\AppData\Local\Continuum\anaconda3\share\jupyter\lab\staging\yarn-error.log".
What pointed me towards suspecting a firewall problem was this part:
self signed certificate in certificate chain
Running the same command with less rigid firewall settings triggers this output (shortened):
WARNING in d3-array
  Multiple versions of d3-array found:
1.2.4 ./~/d3-scale/~/d3-array from ./~/d3-scale/~/d3-array\src\index.js
2.2.0 ./~/d3-array from ./~/d3-array\src\index.js
Check how you can resolve duplicate packages:
https://github.com/darrenscerri/duplicate-package-checker-webpack-plugin#resolving-duplicate-packages-in-your-bundle
Child html-webpack-plugin for "index.html":
1 asset
Entrypoint undefined = index.html
[KTNU] ./node_modules/html-loader!./templates/partial.html 567 bytes {0} [built]
[YuTi] (webpack)/buildin/module.js 497 bytes {0} [built]
[aS2v] ./node_modules/html-webpack-plugin/lib/loader.js!./templates/template.html
1.22 KiB {0} [built]
[yLpj] (webpack)/buildin/global.js 489 bytes {0} [built]
+ 1 hidden module
And despite some warning messages, JupyterLab now produces plotly figures without any problems.
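As a quick sanity check that the extension renders, a minimal figure along the following lines should display inline. This is just a sketch: it assumes a plotly 3.x install (the generation this extension targets) and uses made-up data.
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)  # required before iplot() in offline mode

# Made-up data, just to confirm the lab extension renders plotly output.
fig = go.Figure(data=[go.Scatter(x=[1, 2, 3], y=[3, 1, 2])])
iplot(fig)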
I tried executing a Python command using ksh in a SAP BODS script, to run a program called "zzz.py" on the BODS server:
print(exec('ksh', '-c "python --version"', 8));
print(exec('ksh', '-c "python zzz.py"', 8));
However, upon executing the script, I got the following output:
3850 2990602048 PRINTFN 11/2/2017 4:26:17 PM 0: Python 2.7.9
3850 2990602048 PRINTFN 11/2/2017 4:26:17 PM 1: Could not find platform independent libraries <prefix> Could not find platform dependent libraries <exec_prefix>
3850 2990602048 PRINTFN 11/2/2017 4:26:17 PM Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>] ImportError: No module named site
I proceeded to add export PYTHONHOME=/usr/bin/python, but when I executed the printenv command, the PYTHONHOME path was not shown.
I went ahead and used SSH to access the server via PuTTY, and executing the command there works perfectly. However, running the python --version command shows that my version is 2.7.5, as opposed to the one shown in BODS. I tried adding the PYTHONHOME path there as well, but it did not help in BODS (and instead I could not run the python command in my SSH session, so of course I unset it and the SSH session works normally now).
May I seek some help on this? Thanks!
Managed to solve this:
When executing from BODS, a different user is being used (as opposed to root, which was being used for SSH). I had to set export LD_LIBRARY_PATH=/usr/local/lib before executing python, and it works.