Getting an error while running an mrjob Python script on a Hadoop cluster

Hi, I want to sort movie ratings with a Python script, but I am getting an error:
[root@sandbox-hdp maria_dev]# python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar u.data
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/bin/hadoop
Using Hadoop version 3.1.1.3.0.1.0
Creating temp directory /tmp/RatingsBreakdown.maria_dev.20190830.233300.332634
STDERR: mkdir: Permission denied: user=root, access=WRITE, inode="/user/maria_dev":maria_dev:hdfs:drwxr-xr-x
Traceback (most recent call last):
File "RatingsBreakdown.py", line 19, in <module>
RatingsBreakdown.run()
File "/usr/lib/python2.7/site-packages/mrjob/job.py", line 446, in run
mr_job.execute()
File "/usr/lib/python2.7/site-packages/mrjob/job.py", line 473, in execute
super(MRJob, self).execute()
File "/usr/lib/python2.7/site-packages/mrjob/launch.py", line 202, in execute
self.run_job()
File "/usr/lib/python2.7/site-packages/mrjob/launch.py", line 247, in run_job
return self._handle(name, path, path)
File "/usr/lib/python2.7/site-packages/mrjob/fs/composite.py", line 118, in _han dle
return getattr(fs, name)(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 298, in mkdir
raise IOError("Could not mkdir %s" % path)
IOError: Could not mkdir hdfs:///user/maria_dev/tmp/mrjob/RatingsBreakdown.maria_dev.20190830.233300.332634/files/wd
Can you please describe what the problem is here?

Please take a look at these 2 references:
Permission denied at hdfs
Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:dr
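Both references describe the same underlying cause visible in the log above: the job is being submitted as root, but root has no write access to /user/maria_dev in HDFS (the directory is owned by maria_dev:hdfs with mode drwxr-xr-x). A sketch of two typical ways to resolve it on the HDP sandbox, assuming you can switch to maria_dev or to the hdfs superuser:

# Option 1 (preferred): submit the job as the user that owns the HDFS home directory
su - maria_dev
python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar u.data

# Option 2 (crude, sandbox use only): as the HDFS superuser, open the directory up
sudo -u hdfs hdfs dfs -chmod -R 777 /user/maria_dev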

I found that the Hortonworks sandbox takes a long time to boot up. Once it had booted up completely, the job worked fine; it took about an hour to boot.

Related

Django API raises an error "No such file or directory: 'manage.py'" when requested

This might seem like a question that has already been asked, but I have been searching for an answer for a week now and found nothing.
The problem is that I have developed an API using Django which is hosted on a server. Now, when I run the following command to start the server:
python manage.py runserver 0.0.0.0:9000
The server starts as usual. It's only when I send a request to the server via Postman that I see the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'manage.py'
The strange thing is that there is no error when running the server, only when I send a request to it. Also, I have many more Django APIs running on the same server with the same Python version (3.4.3) and the same virtual environment (but a different port) that are running just fine.
Full error traceback :
Traceback (most recent call last):
File "manage.py", line 15, in <module>
execute_from_command_line(sys.argv)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
utility.execute()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/__init__.py", line 365, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/base.py", line 288, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 61, in execute
super().execute(*args, **options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/base.py", line 335, in execute
output = self.handle(*args, **options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 98, in handle
self.run(**options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 105, in run
autoreload.main(self.inner_run, None, options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 317, in main
python_reloader(wrapped_main_func, args, kwargs)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 296, in python_reloader
reloader_thread()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 274, in reloader_thread
change = fn()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 204, in code_changed
stat = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'manage.py'
Things I have tried:
I have tried changing the shebang (#!) as suggested in various posts.
I have tried using dos2unix to convert the file to Unix format (the server on which my API is hosted is Linux based).
I even have tried to create a new Django project.
And yes I'm running manage.py from the correct directory.
I have also tried making manage.py executable with:
chmod +x manage.py
Nothing worked for me so far. Am I missing something?
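One detail of the traceback is worth spelling out: the autoreloader in django/utils/autoreload.py polls every watched file with os.stat(), and because the server was started as python manage.py runserver, manage.py appears to be recorded by its relative path. If anything executed while handling a request changes the process working directory, the next poll can no longer resolve 'manage.py' and raises exactly this error. A minimal sketch of that mechanism only (the temp paths are just for illustration, not part of the question's setup):

import os
import tempfile

# Stand-in for the project directory that manage.py was started from.
demo_dir = tempfile.mkdtemp()
os.chdir(demo_dir)
open('manage.py', 'w').close()     # placeholder for the real manage.py
os.stat('manage.py')               # fine while the cwd is the project directory

# Something run while handling a request changes the working directory...
os.chdir(tempfile.gettempdir())

# ...and the next autoreload-style poll of the relative path fails.
try:
    os.stat('manage.py')
except FileNotFoundError as exc:
    print(exc)                     # [Errno 2] No such file or directory: 'manage.py'

So it may be worth checking whether any code reached by the request (or one of its dependencies) calls os.chdir(), or starting the server with an absolute path, e.g. python /full/path/to/manage.py runserver 0.0.0.0:9000.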

Unable to deploy to EB using Bitbucket pipelines

We have established pipeline scripts that work very well. Lately, we decided to deploy to Elastic Beanstalk automatically using Bitbucket Pipelines, following the tutorial that uses the command eb deploy. Apparently, this command fails on Pipelines. The config files seem legitimate, because the deploy runs locally, runs from inside a container of the same image specified in the pipelines file, and also runs when using docker exec from the local machine to execute the command inside a container of that image. Below are the pipelines file and the error we get with eb deploy --verbose. I am obviously missing something here. Any help or direction would be appreciated. Thank you in advance.
feature/KKLT-1065-deploy-via-pipelines:
  - step:
      deployment: staging
      caches:
        - composer
      script:
        - php -r "file_exists('.env') || copy('.env.example', '.env');"
        - cat .env
        - composer install
        - php artisan cache:clear
        - php artisan migrate
        - php artisan db:seed
        - eb init KMLT-staging-ttl -r eu-central-1 -p "64bit Amazon Linux 2017.09 v2.6.4 running PHP 7.1"
        - eb deploy --verbose
      services:
        - postgres
+ eb deploy --verbose
INFO: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ebcli/core/ebrun.py", line 41, in run_app
app.run()
File "/usr/lib/python2.7/site-packages/cement/core/foundation.py", line 797, in run
return_val = self.controller._dispatch()
File "/usr/lib/python2.7/site-packages/cement/core/controller.py", line 472, in _dispatch
return func()
File "/usr/lib/python2.7/site-packages/cement/core/controller.py", line 475, in _dispatch
self._parse_args()
File "/usr/lib/python2.7/site-packages/cement/core/controller.py", line 452, in _parse_args
self.app._parse_args()
File "/usr/lib/python2.7/site-packages/cement/core/foundation.py", line 1076, in _parse_args
for res in self.hook.run('post_argument_parsing', self):
File "/usr/lib/python2.7/site-packages/cement/core/hook.py", line 150, in run
res = hook[2](*args, **kwargs)
File "/usr/lib/python2.7/site-packages/ebcli/core/hooks.py", line 35, in pre_run_hook
set_profile(app.pargs.profile)
File "/usr/lib/python2.7/site-packages/ebcli/core/hooks.py", line 47, in set_profile
profile = commonops.get_default_profile()
File "/usr/lib/python2.7/site-packages/ebcli/operations/commonops.py", line 973, in get_default_profile
profile = get_config_setting_from_branch_or_default('profile')
File "/usr/lib/python2.7/site-packages/ebcli/operations/commonops.py", line 1008, in get_config_setting_from_branch_or_default
setting = get_setting_from_current_branch(key_name)
File "/usr/lib/python2.7/site-packages/ebcli/operations/commonops.py", line 991, in get_setting_from_current_branch
branch_name = source_control.get_current_branch()
File "/usr/lib/python2.7/site-packages/ebcli/objects/sourcecontrol.py", line 184, in get_current_branch
stdout, stderr, exitcode = self._run_cmd(revparse_command, handle_exitcode=False)
File "/usr/lib/python2.7/site-packages/ebcli/objects/sourcecontrol.py", line 480, in _run_cmd
stdout, stderr, exitcode = exec_cmd(cmd)
File "/usr/lib/python2.7/site-packages/cement/utils/shell.py", line 40, in exec_cmd
proc = Popen(cmd_args, *args, **kw)
File "/usr/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
INFO: OSError - [Errno 2] No such file or directory
Try the Python 3 version of the EB CLI instead of the Python 2.7 one. You might have more success.
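A sketch of what that could look like in the Pipelines script above, assuming the build image is Debian/Ubuntu based (the apt package names are assumptions about that image). Note that the OSError in the traceback is raised while the EB CLI shells out to git rev-parse, so git also has to be present inside the pipeline container:

script:
  - apt-get update && apt-get install -y git python3-pip   # assumes a Debian/Ubuntu-based image
  - pip3 install --upgrade awsebcli                         # Python 3 build of the EB CLI
  - eb init KMLT-staging-ttl -r eu-central-1 -p "64bit Amazon Linux 2017.09 v2.6.4 running PHP 7.1"
  - eb deploy --verbose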

mrjob virtualenv error in Hadoop cluster: Permission denied

I work at a large corporate organization where we have a Hadoop cluster. I got the admin to install virtualenv on all the Hadoop worker nodes so that I can submit mrjobs with standard Python dependencies that may not exist on the worker nodes. As per the documentation here, this is what my mrjob.conf file looks like:
runners:
  hadoop:
    setup:
      - virtualenv venv
      - . venv/bin/activate
      - pip install nltk
I have a simple job that uses the nltk package. I can verify that this setup script runs on the worker nodes (I can add simple commands, such as writing some data to a file in /tmp, and they work). However, I get the following error:
New python executable in venv/bin/python
Installing setuptools............done.
Installing pip...
Error [Errno 13] Permission denied while executing command /storage5/hadoop/map...env/bin/easy_install /usr/share/python-virtualenv/pip-1.1.tar.gz
...Installing pip...done.
Traceback (most recent call last):
File "/usr/bin/virtualenv", line 3, in <module>
virtualenv.main()
File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 938, in main
never_download=options.never_download)
File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 1054, in create_environment
install_pip(py_executable, search_dirs=search_dirs, never_download=never_download)
File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 643, in install_pip
filter_stdout=_filter_setup)
File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 976, in call_subprocess
cwd=cwd, env=env)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
What may be causing this error?
Thanks for this idea for deploying packages to the cluster.
As for your problem, it looks like the process doesn't have permission to write to the directory it is installing into.
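Since the setup commands demonstrably run on the workers and /tmp is writable, one way to narrow this down is to add a few diagnostic lines to the setup block and inspect what they report about the task's working directory. Note also that the Errno 13 above is raised while executing venv/bin/easy_install, so it could just as well be a missing execute permission (for example, a noexec mount on the /storage5 volume) as a write failure. A diagnostic sketch only; the debug file path is arbitrary:

runners:
  hadoop:
    setup:
      # Record the task user, the permissions on the task's working directory,
      # and how its filesystem is mounted, then proceed as before.
      - whoami > /tmp/mrjob_setup_debug.txt 2>&1
      - ls -ld "$PWD" >> /tmp/mrjob_setup_debug.txt 2>&1
      - mount >> /tmp/mrjob_setup_debug.txt 2>&1
      - virtualenv venv
      - . venv/bin/activate
      - pip install nltk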

Change Mapreduce intermediate output location using MRJob

I am trying to run a Python script using MRJob on a cluster on which I don't have admin permissions, and I got the error pasted below. What I think is happening is that the job is trying to write the intermediate files to the default /tmp... directory, and since this is a protected directory that I don't have permission to write to, the job receives an error and exits. I would like to know how I can change this temporary output location to somewhere in my local filesystem, for example /home/myusername/some_path_in_my_local_filesystem_on_the_cluster. Basically, I would like to know what additional parameters I would have to pass to change the intermediate output location from /tmp/... to some place where I have write permission.
I invoke my script as:
python myscript.py input.txt -r hadoop > output.txt
The error:
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232
writing wrapper script to /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232/setup-wrapper.sh
STDERR: mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=myusername, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Traceback (most recent call last):
File "/home/myusername/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
MRWordFreqCount.run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
mr_job.execute()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
super(MRJob, self).execute()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
self.run_job()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job
runner.run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run
self._run()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run
self._upload_local_files_to_hdfs()
File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
self._mkdir_on_hdfs(self._upload_mgr.prefix)
Are you running mrjob as a "local" job, or trying to run it on your Hadoop cluster?
If you are actually trying to use it on Hadoop, you can control the "scratch" HDFS location (where mrjob will store intermediate files) using the --base-tmp-dir flag:
python mr.py -r hadoop -o hdfs:///user/you/output_dir --base-tmp-dir hdfs:///user/you/tmp hdfs:///user/you/data.txt
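If that helps, the same setting can also go in mrjob.conf so it does not have to be passed on every run. A minimal sketch mirroring the flag above; option names have changed between mrjob releases, so treat the key name as an assumption and check the documentation for the installed version:

runners:
  hadoop:
    # mirrors --base-tmp-dir from the command line above
    base_tmp_dir: hdfs:///user/you/tmp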

Google AppEngine Python OSError "too many files open" at launch

I'm using Ubuntu 13.10 running in a VM on OS X, Python 2.7, and GAE 1.8.8.
Launching dev_appserver.py results in the following error:
INFO 2013-12-10 03:53:30,046 api_server.py:527] Saving search indexes
Traceback (most recent call last):
File "/home/ubuntu/xxxxxx/google_appengine/dev_appserver.py", line 197, in <module>
File "/home/ubuntu/xxxxxx/google_appengine/dev_appserver.py", line 193, in _run_file
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/tools/devappserver2/devappserver2.py", line 872, in <module>
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/tools/devappserver2/devappserver2.py", line 868, in main
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/tools/devappserver2/devappserver2.py", line 707, in stop
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/tools/devappserver2/api_server.py", line 141, in quit
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/tools/devappserver2/api_server.py", line 528, in cleanup_stubs
File "/home/ubuntu/xxxxxx/google_appengine/google/appengine/api/search/simple_search_stub.py", line 984, in Write
File "/usr/lib/python2.7/tempfile.py", line 304, in mkstemp
File "/usr/lib/python2.7/tempfile.py", line 239, in _mkstemp_inner
OSError: [Errno 24] Too many open files: '/tmp/appengine.xxxxxx-hr-dev.ubuntu/tmpMVVXrH'
Any ideas?
Check the shared memory parameter, kern.sysv.shmseg, on your Linux system and increase it.
To view the shared memory parameters, use:
sysctl -A | grep shm
To update those parameters, edit the file:
sudo nano /etc/sysctl.conf
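A minimal sketch of what that edit could contain; the values below are placeholders only, not recommendations:

# example shared-memory limits in /etc/sysctl.conf (placeholder values)
kernel.shmmax = 268435456
kernel.shmall = 2097152

After saving, reload the file without rebooting:
sudo sysctl -p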
Refer to this SO answer for more information.
