HADOOP_CONF_DIR not found error in Python Pydoop program

I am using Pydoop to connect to the HDFS file system from inside a Python program. The program tries to read and write files in HDFS. When I try to execute it, I get an error.
The command used to execute:
hadoop jar /usr/share/bigdata/hadoop-1.2.0/contrib/streaming/hadoop-streaming-1.2.0.jar -file ./Methratio.py -mapper './Methratio.py -d /user/hadoop/gnome.fa -r -g -o hdfs://ai-ole6-main.ole6.com:54311/user/hadoop/bsmapout.txt hdfs://ai-ole6-main.ole6.com:54311/user/hadoop/Example.bam ' -input sampleinput.txt -output outfile
The error:
Traceback (most recent call last):
File "/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201501251859_0001/attempt_201501251859_0001_m_000000_1/work/./Methratio.py", line 2, in <module>
import sys, time, os, array, optparse,pydoop.hdfs as hdfs
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/hdfs/__init__.py", line 98, in <module>
init()
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/hdfs/__init__.py", line 92, in init
pydoop.hadoop_classpath(), _ORIG_CLASSPATH, pydoop.hadoop_conf()
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/__init__.py", line 103, in hadoop_classpath
return _PATH_FINDER.hadoop_classpath(hadoop_home)
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/hadoop_utils.py", line 551, in hadoop_classpath
jars.extend([self.hadoop_native(), self.hadoop_conf()])
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/hadoop_utils.py", line 493, in hadoop_conf
PathFinder.__error("hadoop conf dir", "HADOOP_CONF_DIR")
File "/usr/local/lib/python2.7/site-packages/pydoop-1.0.0_rc1-py2.7.egg/pydoop/hadoop_utils.py", line 385, in __error
raise ValueError("%s not found, try setting %s" % (what, env_var))
ValueError: hadoop conf dir not found, try setting HADOOP_CONF_DIR
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The code:
with hdfs.open(options.reffile) as hdfsfile:
    for line in hdfsfile:
        if line[0] == '>':
            # some processing

The HADOOP_CONF_DIR environment variable must be set to the appropriate location, i.e. the path to the folder containing files such as core-site.xml, mapred-site.xml, and hdfs-site.xml. These files can generally be found in the hadoop/etc/ folder.
In my case, I installed Hadoop 2.6 from a tarball and placed the extracted folder in /usr/local.
I added the following line to ~/.bashrc:
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
Then run source ~/.bashrc from the terminal.
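Note that ~/.bashrc only affects interactive shells, and a streaming mapper launched by the TaskTracker may not inherit the variable. A minimal sketch of a workaround, assuming the conf directory exists at the same path on every node (the path below is a guess based on the install location in the question), is to set the variable inside the script before importing pydoop.hdfs, since the traceback above shows the lookup happens at import time:
import os
# pydoop's init() reads HADOOP_CONF_DIR when pydoop.hdfs is imported,
# so the variable must be set before the import below.
# This path is an assumption; point it at your cluster's actual conf directory.
os.environ.setdefault("HADOOP_CONF_DIR", "/usr/share/bigdata/hadoop-1.2.0/conf")
import pydoop.hdfs as hdfs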

Related

Getting an error while running an mrjob Python script on a Hadoop cluster

Hi, I want to sort movie ratings with a Python script, but I am getting an error:
[root@sandbox-hdp maria_dev]# python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar u.data
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/bin/hadoop
Using Hadoop version 3.1.1.3.0.1.0
Creating temp directory /tmp/RatingsBreakdown.maria_dev.20190830.233300.332634
STDERR: mkdir: Permission denied: user=root, access=WRITE, inode="/user/maria_dev":maria_dev:hdfs:drwxr-xr-x
Traceback (most recent call last):
File "RatingsBreakdown.py", line 19, in <module>
RatingsBreakdown.run()
File "/usr/lib/python2.7/site-packages/mrjob/job.py", line 446, in run
mr_job.execute()
File "/usr/lib/python2.7/site-packages/mrjob/job.py", line 473, in execute
super(MRJob, self).execute()
File "/usr/lib/python2.7/site-packages/mrjob/launch.py", line 202, in execute
self.run_job()
File "/usr/lib/python2.7/site-packages/mrjob/launch.py", line 247, in run_job
return self._handle(name, path, path)
File "/usr/lib/python2.7/site-packages/mrjob/fs/composite.py", line 118, in _han dle
return getattr(fs, name)(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 298, in mkdir
raise IOError("Could not mkdir %s" % path)
IOError: Could not mkdir hdfs:///user/maria_dev/tmp/mrjob/RatingsBreakdown.maria_dev.20190830.233300.332634/files/wd
Can you please describe what the problem is here?
Please take a look at these 2 references:
Permission denied at hdfs
Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:dr
I found that the Hortonworks sandbox takes a long time to boot up, about an hour in my case. Once it had booted completely, the job worked fine.
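For the permission error itself: the trace shows the job running as root, while /user/maria_dev in HDFS is owned by maria_dev, so root cannot write there. A hedged sketch of the usual fix, assuming the HDFS superuser account is named hdfs as in the trace:
# give root its own HDFS home directory so mrjob can create its temp dirs there
sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root
Alternatively, simply run the script as maria_dev instead of root.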

Error in creating a conda recipe from a Bitbucket package

I have a package on Bitbucket which contains code files in Python, R, and bash.
I'm using a laptop running Linux CentOS 7.
I'd like to create a conda package for it. I've started by creating a conda recipe, but I have probably made some mistakes. I'm using conda 4.3.18.
I tried to build my conda recipe with the following command, but it generated several errors that I cannot interpret:
$ conda build behst_conda_recipe/
BUILD START: behst--0
pulling from https://bitbucket.org/PROJECT_ADDRESS
searching for changes
no changes found
checkout: 'tip'
updating to branch default
108 files updated, 0 files merged, 0 files removed, 0 files unresolved
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Package: behst--0
source tree in: /home/davide/miniconda3/conda-bld/behst_1495134385344/work
+ source /home/davide/miniconda3/bin/activate /home/davide/miniconda3/conda-bld/behst_1495134385344/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac
+ set -o nounset -o pipefail -o errexit
+ set -o xtrace
+ echo 'Running build.sh'
Running build.sh
INFO conda_build.build:bundle_conda(861): Packaging behst--0
number of files: 0
Fixing permissions
Fixing permissions
Traceback (most recent call last):
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/utils.py", line 133, in _copy_with_shell_fallback
stderr=subprocess.PIPE, stdout=subprocess.PIPE)
File "/home/davide/miniconda3/lib/python3.5/subprocess.py", line 581, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cp -a /home/davide/miniconda3/conda-bld/behst_1495134385344/work/LICENSE /home/davide/miniconda3/conda-bld/behst_1495134385344/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/info/LICENSE.txt' returned non-zero exit status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/davide/miniconda3/bin/conda-build", line 6, in <module>
sys.exit(conda_build.cli.main_build.main())
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/cli/main_build.py", line 334, in main
execute(sys.argv[1:])
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/cli/main_build.py", line 325, in execute
noverify=args.no_verify)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/api.py", line 97, in build
need_source_download=need_source_download, config=config)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/build.py", line 1518, in build_tree
config=config)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/build.py", line 1154, in build
built_package = bundlers[output_dict.get('type', 'conda')](output_dict, m, config, env)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/build.py", line 893, in bundle_conda
create_info_files(metadata, files, config=config, prefix=config.build_prefix)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/build.py", line 494, in create_info_files
copy_license(m, config)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/build.py", line 272, in copy_license
locking=config.locking)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/utils.py", line 177, in copy_into
_copy_with_shell_fallback(src, dst_fn)
File "/home/davide/miniconda3/lib/python3.5/site-packages/conda_build/utils.py", line 136, in _copy_with_shell_fallback
raise OSError("Failed to copy {} to {}. Error was: {}".format(src, dst, e))
OSError: Failed to copy /home/davide/miniconda3/conda-bld/behst_1495134385344/work/LICENSE to /home/davide/miniconda3/conda-bld/behst_1495134385344/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/info/LICENSE.txt. Error was: Command 'cp -a /home/davide/miniconda3/conda-bld/behst_1495134385344/work/LICENSE /home/davide/miniconda3/conda-bld/behst_1495134385344/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/info/LICENSE.txt' returned non-zero exit status 1
Exception ignored in: <bound method BaseFileLock.__del__ of <filelock.UnixFileLock object at 0x7f30df7349e8>>
Traceback (most recent call last):
File "/home/davide/miniconda3/lib/python3.5/site-packages/filelock.py", line 305, in __del__
File "/home/davide/miniconda3/lib/python3.5/site-packages/filelock.py", line 292, in release
File "/home/davide/miniconda3/lib/python3.5/site-packages/filelock.py", line 371, in _release
AttributeError: 'NoneType' object has no attribute 'flock'
Does anyone know what these errors mean?
EDIT: Here's the meta.yaml file:
package:
  name: behst
source:
  hg_url: https://bitbucket.org/PROJECT_ADDRESS
about:
  home: https://bitbucket.org/PROJECT_ADDRESS
  license: BSD
  license_file: LICENSE
The build.sh, at the moment, is just an echo command:
#!/bin/bash
#
#$ -cwd
#$ -S /bin/bash
#
set -o nounset -o pipefail -o errexit
set -o xtrace
echo "Running build.sh"
Do you have your home directory encrypted? There are related issues about this on GitHub.
The solution is to use the --croot argument, pointing somewhere outside your home directory, e.g. /tmp/_conda_build_.
You seem to have your home directory encrypted. The maximum filename length on an encrypted folder is reduced from the normal 255 characters. Hence the only solution, even according to the conda contributors, is to use --croot with a non-encrypted location like /tmp/whateverfolder:
conda config --prepend pkgs_dirs /tmp/temp_conda_recipe
and then
conda build behst_conda_recipe/ --croot /tmp/temp_conda_recipe
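To confirm the diagnosis before changing anything, one quick check (a sketch; it assumes eCryptfs, the usual home-directory encryption mechanism on Linux) is to look at the mount table:
mount | grep -i ecryptfs    # any output here means the home directory is encrypted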

TensorFlow : Error building pip package

I am trying to build TensorFlow from source. After configuring the installation, when I try to build the pip package with the following command,
$ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
I get the following error message:
ERROR: /workspace/tensorflow/core/BUILD:1312:1: Executing genrule //tensorflow/core:version_info_gen failed: bash failed: error executing command
(cd /root/.cache/bazel/_bazel_root/eab0d61a99b6696edb3d2aff87b585e8/execroot/workspace && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; tensorflow/tools/git/gen_git_source.py --generate tensorflow/tools/git/gen/spec.json tensorflow/tools/git/gen/head tensorflow/tools/git/gen/branch_ref "bazel-out/host/genfiles/tensorflow/core/util/version_info.cc"'): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
Traceback (most recent call last):
File "tensorflow/tools/git/gen_git_source.py", line 260, in <module>
generate(args.generate)
File "tensorflow/tools/git/gen_git_source.py", line 212, in generate
git_version = get_git_version(data["path"])
File "tensorflow/tools/git/gen_git_source.py", line 152, in get_git_version
str("--work-tree=" + git_base_path), "describe", "--long", "--tags"
File "/usr/lib/python2.7/subprocess.py", line 566, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 8.567s, Critical Path: 7.90s
What's going wrong?
(Ubuntu 14.04, CPU only)
Your build appears to be encountering an error in
tensorflow/tools/git/gen_git_source.py
at line 152. At this stage in the build, the script is trying to get the git version number of your TensorFlow repo. Have you used git to check out your TensorFlow repo? Are the .git files present in the tensorflow/ root dir? Maybe you need to update your version of git?
Looks similar to this question: Build Error Tensorflow
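A quick way to test this by hand (a sketch, run from the repository root; it is the same git invocation the script makes at line 152):
cd tensorflow
git describe --long --tags    # should print something like v1.0.0-65-gabcdef; an error means git or the .git metadata is missing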
I encountered this error even though I had git in my PATH variable. I got a hint from https://stackoverflow.com/a/5659249/212076 that the launched subprocess was not getting the PATH variable.
The solution was to hard-code the git command in <ROOT>\tensorflow\tensorflow\tools\git\gen_git_source.py by replacing
val = bytes(subprocess.check_output([
    "git", str("--git-dir=%s/.git" % git_base_path),
    str("--work-tree=" + git_base_path), "describe", "--long", "--tags"
]).strip())
with
val = bytes(subprocess.check_output([
    r"C:\Program Files (x86)\Git\cmd\git.cmd", str("--git-dir=%s/.git" % git_base_path),
    str("--work-tree=" + git_base_path), "describe", "--long", "--tags"
]).strip())
Once this was fixed, I got another error: fatal: Not a git repository: './.git'.
I figured the tensorflow root folder was the one that should have been referenced, so I edited <ROOT>\tensorflow\tensorflow\tools\git\gen_git_source.py to replace
git_version = get_git_version(".")
with
git_version = get_git_version("../../../../")
After that the build was successful.
NOTE: Unlike OP, my build platform was Windows 7 64 bit

Conceptnet5 python setup mac OS

I am trying to install ConceptNet locally on a Mac.
I followed "The high-bandwidth, low-computation way" from https://github.com/commonsense/conceptnet5/wiki/Running-your-own-copy and did the following:
git clone https://github.com/commonsense/conceptnet5
pyvenv-3.4 conceptnet-env
source conceptnet-env/bin/activate
cd conceptnet5
python setup.py develop
make download_db
Then, at the last step, I get the following error:
command
ln -s `readlink -f data` ~/.conceptnet5
output
readlink: illegal option -- f
usage: readlink [-n] [file ...]
So I ignored this step, because I had seen a discussion which proposed another method (as the ln command was not working for me).
From this discussion, https://github.com/commonsense/conceptnet5/issues/33, I tried the following:
CONCEPTNET_DATA=~/conceptnet5/data/
export CONCEPTNET_DATA
But this didn't work, because when doing the following:
from conceptnet5.query import lookup
lookup('/c/en/examples')
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/arj/Desktop/app/conceptnet5/conceptnet5/query.py", line 75, in lookup
self.load_index()
File "/Users/arj/Desktop/app/conceptnet5/conceptnet5/query.py", line 58, in load_index
self._db_filename, self._edge_dir, self.nshards
File "/Users/arj/Desktop/app/conceptnet5/conceptnet5/formats/sql.py", line 211, in __init__
self._connect()
File "/Users/arj/Desktop/app/conceptnet5/conceptnet5/formats/sql.py", line 216, in _connect
self.dbs[i] = sqlite3.connect(filename)
sqlite3.OperationalError: unable to open database file
What is the macOS equivalent of the ln command?
Or what else can I do so that ConceptNet can locate the database file?
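For reference: readlink on macOS is the BSD version, which has no -f flag. A sketch of an equivalent symlink command that resolves the data directory to an absolute path first (assuming it is run from inside the conceptnet5 checkout):
ln -s "$(cd data && pwd)" ~/.conceptnet5
Alternatively, installing GNU coreutils (e.g. via Homebrew) provides greadlink, which does support -f.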

Change Mapreduce intermediate output location using MRJob

I am trying to run a Python script using MRJob on a cluster where I don't have admin permissions, and I get the error pasted below. What I think is happening is that the job is trying to write intermediate files to the default /tmp... directory, and since this is a protected directory to which I don't have write permission, the job receives an error and exits. I would like to know how to change this temporary output location to somewhere in my local filesystem, for example /home/myusername/some_path_in_my_local_filesystem_on_the_cluster. Basically, what additional parameters would I have to pass to change the intermediate output location from /tmp/... to some place local where I have write permission?
I invoke my script as:
python myscript.py input.txt -r hadoop > output.txt
The error:
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232
writing wrapper script to /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232/setup-wrapper.sh
STDERR: mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=myusername, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Traceback (most recent call last):
File "/home/myusername/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
MRWordFreqCount.run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
mr_job.execute()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
self.run_job()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job
runner.run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run
self._run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run
self._upload_local_files_to_hdfs()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
self._mkdir_on_hdfs(self._upload_mgr.prefix)
Are you running mrjob as a "local" job, or trying to run it on your Hadoop cluster?
If you are actually trying to use it on Hadoop, you can control the "scratch" HDFS location (where mrjob will store intermediate files) using the --base-tmp-dir flag:
python mr.py -r hadoop -o hdfs:///user/you/output_dir --base-tmp-dir hdfs:///user/you/tmp hdfs:///user/you/data.txt
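Equivalently, the same setting can live in a mrjob.conf file so that every run picks it up (a sketch; base_tmp_dir mirrors the flag above, and the path is a placeholder to replace with a location you can write to):
runners:
  hadoop:
    base_tmp_dir: hdfs:///user/you/tmp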
