I installed Hadoop, Pig, Hive, HBase and ZooKeeper successfully.
I installed Apache Phoenix to access HBase. Below are my environment variables.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$PATH:$JAVA_HOME/bin"
export PATH="/home/vijee/anaconda3/bin:$PATH"
export HADOOP_HOME=/home/vijee/hadoop-2.7.7
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export ZOOKEEPER_HOME=/home/vijee/apache-zookeeper-3.6.2-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HBASE_HOME=/home/vijee/hbase-1.4.13-bin
export PATH=$PATH:$HBASE_HOME/bin
export PHOENIX_HOME=/home/vijee/apache-phoenix-4.15.0-HBase-1.4-bin
export PATH=$PATH:$PHOENIX_HOME/bin
I copied phoenix-4.15.0-HBase-1.4-client.jar, phoenix-4.15.0-HBase-1.4-server.jar, and phoenix-core-4.15.0-HBase-1.4.jar to the HBase lib directory and restarted HBase and ZooKeeper.
When I run the Phoenix command below, it throws an error:
(base) vijee@vijee-Lenovo-IdeaPad-S510p:~/apache-phoenix-4.15.0-HBase-1.4-bin/bin$ psql.py localhost $PHOENIX_HOME/examples/WEB_STAT.sql $PHOENIX_HOME/examples/WEB_STAT.csv $PHOENIX_HOME/examples/WEB_STAT_QUERIES.sql
Traceback (most recent call last):
File "/home/vijee/apache-phoenix-4.15.0-HBase-1.4-bin/bin/psql.py", line 57, in <module>
if hbase_env.has_key('JAVA_HOME'):
AttributeError: 'dict' object has no attribute 'has_key'
My Python Version
$ python --version
Python 3.8.3
I know it is a Python compatibility issue and that psql.py is written for Python 2.x.
How do I resolve this issue?
A brief search shows that HBase 1.4 is from 2017, while the latest stable release is 2.2.5; the release notes imply that it works with Python 3.
Consider simply using the newer jars (see the Apache archive for the stable files).
At least psql.py in the latest Apache Phoenix code does appear to support Python 3 (https://github.com/apache/phoenix/blob/master/bin/psql.py), so you should be able to get a newer version than the one you have that will work with it.
This can be seen in the file's commit history on GitHub: https://github.com/apache/phoenix/commits/master/bin/psql.py
It includes the commit that added Python 3 support, in which this is fixed.
If you must use 1.4.x, you may be able to run psql.py with Python 2 instead. Most operating systems allow both to be installed in parallel, though that can make dependency management confusing and it is not a maintainable long-term solution.
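A minimal sketch of forcing the Python 2 interpreter for the same command from the question (assuming a python2 binary is on your PATH):
python2 $PHOENIX_HOME/bin/psql.py localhost $PHOENIX_HOME/examples/WEB_STAT.sql $PHOENIX_HOME/examples/WEB_STAT.csv $PHOENIX_HOME/examples/WEB_STAT_QUERIES.sql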
has_key() was removed in Python 3; use in instead!
See also Should I use 'has_key()' or 'in' on Python dicts?
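For reference, a sketch of the kind of change involved (hbase_env is the dict from the traceback above; the assignment line is only illustrative):
# Python 2 only:
if hbase_env.has_key('JAVA_HOME'):
    java = hbase_env['JAVA_HOME']
# Works on both Python 2 and Python 3:
if 'JAVA_HOME' in hbase_env:
    java = hbase_env['JAVA_HOME']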
After installing the Google Cloud SDK on my computer, I open the terminal and type "gcloud --version", but it says "python was not found".
Note: I unchecked the box saying "Install Python bundle" when installing the Google Cloud SDK because I already have Python 3.10.2 installed.
So, how do I fix this?
Thanks in advance.
As mentioned in the documentation:
Cloud SDK requires Python; supported versions are Python 3 (preferred, 3.5 to 3.8) and Python 2 (2.7.9 or later). By default, the Windows version of Cloud SDK comes bundled with Python 3 and Python 2. To use Cloud SDK, your operating system must be able to run a supported version of Python.
As suggested by @John Hanley, the CLI cannot find the Python installation that is already present. Try reinstalling the CLI and selecting the "Install Python bundle" option. If you are still facing the issue, another workaround is to try Python version 2.x.x.
You can follow the steps below:
1. Uninstall all Python versions 3 and above.
2. Install Python version 2.x.x (I installed 2.7.17).
3. Create an environment variable CLOUDSDK_PYTHON and set its value to C:\Python27\python.exe.
4. Run GoogleCloudSDKInstaller.exe again.
On Ubuntu Linux, you can define this variable in the .bashrc file:
export CLOUDSDK_PYTHON=/usr/bin/python3
On Windows, setting the CLOUDSDK_PYTHON environment variable fixes this, but when I first tried it I pointed the variable at the folder containing the Python executable, and that did not work. The variable apparently must point to the executable file itself.
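For example, from a Command Prompt (the path below is only an illustration; point it at wherever your python.exe actually lives):
set CLOUDSDK_PYTHON=C:\Users\<you>\AppData\Local\Programs\Python\Python310\python.exe
Use setx instead of set if you want the variable to persist across sessions.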
I'm trying to set up pyspark on my desktop and interact with it via the terminal.
I'm following this guide:
http://jmedium.com/pyspark-in-python/
When I run 'pyspark' in the terminal, it says:
/home/jacob/spark-2.1.0-bin-hadoop2.7/bin/pyspark: line 45: python:
command not found
env: ‘python’: No such file or directory
I've followed several guides, which all lead to this same issue (some have different details on setting up the .profile; thus far none have worked correctly).
I have java, python3.6, and Scala installed.
My .profile is configured as follows:
#Spark and PySpark Setup
PATH="$HOME/bin:$HOME/.local/bin:$PATH"
export SPARK_HOME='/home/jacob/spark-2.1.0-bin-hadoop2.7'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3.6.5
Note that the Jupyter notebook lines are commented out because I want to launch pyspark in the shell right now without the notebook starting.
Interestingly, spark-shell launches just fine.
I'm using Ubuntu 18.04.1 and
Spark 2.1
See Images
I've tried every guide I can find, and since this is my first time setting up Spark I'm not sure how to troubleshoot it from here.
Thank you
[Screenshots: attempting to execute pyspark; .profile; versions]
You should have set export PYSPARK_PYTHON=python3 instead of export PYSPARK_PYTHON=python3.6.5 in your .profile (there is usually no executable named python3.6.5 on the PATH; the binary is python3 or python3.6).
Then source .profile, of course.
That worked for me.
As for other options: installing Python via sudo apt install python (which is for 2.x) is not appropriate.
For those who may come across this, I figured it out!
I specifically chose to use an older version of Spark in order to follow along with a tutorial I was watching - Spark 2.1.0. I did not know that the latest version of Python (3.5.6 at the time of writing this) is incompatible with Spark 2.1. Thus PySpark would not launch.
I solved this by using Python 2.7 and setting the path accordingly in .bashrc
export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7
export PYSPARK_PYTHON=python2.7
People using Python 3.8 and Spark <= 2.4.5 will have the same problem.
In this case, the only solution I found is to update Spark to version 3.0.0.
Look at https://bugs.python.org/issue38775
For GNU/Linux users who have the python3 package installed (especially Ubuntu/Debian distros), there is a package called "python-is-python3" that makes the python command point to python3.
# apt install python-is-python3
Python 2.7 is deprecated now (as of Ubuntu 20.10 in 2020), so do not try installing it.
I have already solved this issue. Just type this command:
sudo apt install python
I tried to follow the instructions in this book:
Large Scale Machine Learning with Python
It uses a VM image to run Spark involving Oracle VM VirtualBox and Vagrant. I almost managed to get the VM working but am blocked by not having the permission to switch virtualization on in the BIOS (I do not have the password and doubt my employer's IT department will allow me to switch this on). See also discussion here.
Anyway, what other options do I have to play with pyspark locally (I have installed it locally)? My first aim is to get this Scala code:
scala> val file = sc.textFile("C:\\war_and_peace.txt")
scala> val warsCount = file.filter(line => line.contains("war"))
scala> val peaceCount = file.filter(line => line.contains("peace"))
scala> warsCount.count()
res0: Long = 1218
scala> peaceCount.count()
res1: Long = 128
running in Python. Any pointers would be very much appreciated.
So you can set up Spark with the Python and Scala shells on Windows, but the caveat is that, in my experience, performance on Windows is inferior to that of OS X and Linux. If you want to go the route of setting everything up on Windows, I made a short write-up of the instructions for this not too long ago that you can check out here. I am pasting the text below just in case I ever move the file from that repo or the link breaks for some other reason.
Download and Extract Spark
Download the latest release of Spark from Apache.
Be aware that it is critical that you get the right Hadoop binaries for the version of Spark you choose. See the section on Hadoop binaries below.
Extract with 7-zip.
Install Java and Python
Install latest version of 64-bit Java.
Install Anaconda3 Python 3.5, 64-bit (or other version of your choice) for all users. Restart server.
Test Java and Python
Open command line and type java -version. If it is installed properly you will see an output like this:
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
Type either python or python --version.
The first will open the python shell after showing the version information. The second will show only version information similar to this:
Python 3.5.2 :: Anaconda 4.2.0 (64-bit)
Download Hadoop binary for Windows 64-bit
You likely don't have Hadoop installed on Windows, but deep within its core Spark will look for this file and possibly other binaries. Thankfully a Hadoop contributor has compiled these and has a repository with binaries for Hadoop 2.6. These binaries will work for Spark version 2.0.2, but will not work with 2.1.0. To use Spark 2.1.0, download the binaries from here.
The best tactic for this is to clone the repo, keep the Hadoop folder corresponding to your version of Spark, and set HADOOP_HOME to the hadoop-%version% folder.
Add Java and Spark to Environment
Add the paths to Java and Spark as the environment variables JAVA_HOME and SPARK_HOME respectively.
Test pyspark
In command line, type pyspark and observe output. At this point spark should start in the python shell.
Setup pyspark to use Jupyter notebook
Instructions for using interactive Python shells with pyspark exist within the pyspark code and can be accessed through your editor. In order to use the Jupyter notebook, type the following two commands before launching pyspark:
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS='notebook'
Once those variables are set pyspark will launch in the Jupyter notebook with the default SparkContext initialized as sc and the SparkSession initialized as spark. ProTip: Open http://127.0.0.1:4040 to view the spark UI which includes lots of useful information about your pipeline and completed processes. Any additional notebooks open with spark running will be in consecutive ports i.e. 4041, 4042, etc...
The gist is that getting the right versions of the Hadoop binary files for your version of Spark is critical. The rest is making sure your path and environment variables are properly configured.
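To answer the original question directly, a rough PySpark sketch of the Scala snippet from the question (the file path is the questioner's example and is assumed to exist) looks like this:
from pyspark.sql import SparkSession

# Start a local Spark session (assumes pyspark is installed and importable).
spark = SparkSession.builder.master("local[*]").appName("WarAndPeace").getOrCreate()
sc = spark.sparkContext

# Same filtering logic as the Scala version.
file = sc.textFile("C:\\war_and_peace.txt")
wars_count = file.filter(lambda line: "war" in line)
peace_count = file.filter(lambda line: "peace" in line)
print(wars_count.count())
print(peace_count.count())

spark.stop()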
I'm trying to write a function in PostgreSQL on Windows with a Python script in the body, and I'm running into an error message when trying to create the plpythonu extension. The command I'm running is:
CREATE EXTENSION plpythonu;
Which produces the following error message:
ERROR: could not access file "$libdir/plpython2": No such file or directory
SQL state: 58P01
I also tried running:
CREATE EXTENSION plpython3u;
which results in this error:
ERROR: could not load library "C:/Program Files (x86)/PostgreSQL/9.2/lib/plpython3.dll": The specified module could not be found.
SQL state: 58P01
The plpython3.dll file exists at this location, but apparently is missing some critical dependency. I've searched everywhere and found nothing helpful on this. I have both Python 2 and 3 installed on the machine...
The newest (9.4 or later) binary installations from EnterpriseDB contain only plpython3u.dll. In versions 9.4 to 9.6 I had to install Python 3.3 to get plpython3u to run.
You can check which version of Python is needed by plpython3u.dll using Dependency Walker.
A full answer can be found here:
https://postgresrocks.enterprisedb.com/t5/PostgreSQL/unable-to-install-pl-python-extension/m-p/4090
It assumes you have used StackBuilder to install the EDB language pack.
Do check the commands for correctness in your installation, e.g. the path to the PostgreSQL data directory, the EDB install path, and the Python version.
When you use Dependency Walker (depends.exe), only pay attention to the pythonxx.dll. With older PG versions, this may or may not agree with the version installed by the EDB language pack. For version 10.7, Python 3.4 is required. For Windows, the later 3.4 Python versions do not appear to have an MSI installer. You may have to install 3.4.4, or try to upgrade PG 10 to the latest version (10.11) first. That version requires Python 3.7, so then you can use the EDB download.
But the required Python version may already be present and found.
"Could not load library plpython3.dll" (here on Stack Overflow) was somewhat close, but did not detail the environment variables needed.
The solution proposed does not require you to change environment variables permanently, which is a great help when using several Python installations.
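Once the matching Python runtime is in place, a quick sanity check (a sketch; py_version is just an illustrative function name) is to create the extension and ask PL/Python which interpreter it actually loaded:
CREATE EXTENSION plpython3u;

CREATE OR REPLACE FUNCTION py_version() RETURNS text AS $$
import sys
return sys.version
$$ LANGUAGE plpython3u;

SELECT py_version();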
I'm configuring a new development server that came with Windows 7 64-bit.
It must have Trac installed with Subversion integration.
I installed Subversion with VisualSVN 2.1.1, and clients with TortoiseSVN 1.6.7 and AnkhSVN 2.1.7 for Visual Studio 2008 SP1 integration.
All works fine! My problem began with the Trac installation.
I installed Python 2.6 just fine.
Trac doesn't have an x64 Windows installer, so I installed it manually by compiling it from the Python console (C:\Python26\python.exe C:/TRAC/setup.py install).
After that, I can create Trac projects normally; the Trac core is working fine. And so the problem begins. Let's take a look at the Trac INSTALL file:
Requirements
To install Trac, the following software packages must be installed:
Python, version >= 2.3.
Subversion, version >= 1.0. (>= 1.1.x recommended)
Subversion SWIG Python bindings (not PySVN).
PySQLite, version 1.x (for SQLite 2.x) or version 2.x (for SQLite 3.x)
Clearsilver, version >= 0.9.3 (0.9.14 recommended)
Python: OK
Subversion: OK
Subversion SWIG Python bindings (not PySVN):
Here I face the first issue: it asks me to 'cd' into the SWIG directory and run the 'configure' file, and the result is:
C:\swigwin-1.3.40> c:\python26\python.exe configure
File "configure", line 16
DUALCASE=1; export DUALCASE # for MKS sh
^
SyntaxError: invalid syntax
PySQLite, version 1.x (for SQLite 2.x) or version 2.x (for SQLite 3.x):
Not needed, as Python 2.6 comes with SQLite.
Clearsilver, version >= 0.9.3 (0.9.14 recommended):
Second issue: Clearsilver only has a 32-bit installer, which does not recognize the Python installation (as registry keys are in different places for 32-bit and 64-bit).
So I tried to install it manually with the Python console. It returns an error of the same kind as SWIG:
C:\clearsilver-0.10.5>C:\python26\python.exe ./configure
File "./configure", line 13
if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
^
SyntaxError: invalid syntax
When I simulate a web server using the "tracd" command, it runs fine when I disable SVN support, but when I try to open the web page it shows me an error saying ClearSilver is not installed for generating the HTML content.
And (to make me even happier) this Trac will run over IIS7; I must not install Apache...
I'm nearly crazy with this issue... HELP!!!
Just export the registry key from [HKEY_LOCAL_MACHINE\SOFTWARE\Python] and import it under [HKEY_CURRENT_USER\Software\Python].
This happens because Trac only looks at [HKEY_CURRENT_USER\Software\Python], and you installed Python "for all users".
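A minimal sketch of doing that from an elevated command prompt (reg copy mirrors the key and its subkeys; /s copies recursively, /f forces the copy without prompting):
reg copy "HKLM\SOFTWARE\Python" "HKCU\Software\Python" /s /f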
I would expect that the Trac on Windows instructions should work on x64, even if they're 32-bit packages. Have you tried this and failed?
Subversion SWIG Python bindings:
configure is not meant to be run by Python; it's meant to be run with a POSIX sh, like Bash or ksh. However, if you read subversion/bindings/swig/INSTALL you'll find that the installation instructions for Windows do not use configure; instead, Visual Studio and gen-make.py are used.
Note that your bindings should match your installed Subversion.
Clearsilver:
Likewise, configure is meant for a sh, not Python. Clearsilver compilation instructions for Windows can be found in clearsilver/python/README.txt.
Looks like I'm not the only one trying to install Trac on Win 7 64-bit, only to see the install crash and burn.
One problem is the lack of installed registry entries for Python on Win x64, which I was able to find via a web search. The problem was identified months ago, but unfortunately no patch release has been made available.
I was almost ready to give up on Trac, but the information here has given me new hope. Thanks all!