spark-submit python file '/home/.python-eggs' permission denied - python

I have a problem when I use spark-submit to run a Python file. When the 'map' code runs in the executor, I get the following error:
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 151, in _run_module_as_main
mod_name, loader, code, fname = _get_module_details(mod_name)
File "/usr/lib64/python2.7/runpy.py", line 101, in _get_module_details
loader = get_loader(mod_name)
File "/usr/lib64/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/usr/lib64/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/usr/lib64/python2.7/pkgutil.py", line 430, in iter_importers
__import__(pkg)
File "/data8/yarn/local-dir/usercache/bo.feng/appcache/application_1448854352032_70810/container_1448854352032_70810_01_000002/pyspark.zip/pyspark/__init__.py", line 41, in <module>
File "/data8/yarn/local-dir/usercache/bo.feng/appcache/application_1448854352032_70810/container_1448854352032_70810_01_000002/pyspark.zip/pyspark/context.py", line 35, in <module>
File "/data8/yarn/local-dir/usercache/bo.feng/appcache/application_1448854352032_70810/container_1448854352032_70810_01_000002/pyspark.zip/pyspark/rdd.py", line 51, in <module>
File "/data8/yarn/local-dir/usercache/bo.feng/appcache/application_1448854352032_70810/container_1448854352032_70810_01_000002/pyspark.zip/pyspark/shuffle.py", line 33, in <module>
File "build/bdist.linux-x86_64/egg/psutil/__init__.py", line 89, in <module>
File "build/bdist.linux-x86_64/egg/psutil/_pslinux.py", line 24, in <module>
File "build/bdist.linux-x86_64/egg/_psutil_linux.py", line 7, in <module>
File "build/bdist.linux-x86_64/egg/_psutil_linux.py", line 4, in __bootstrap__
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 945, in resource_filename
self, resource_name
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1633, in get_resource_filename
self._extract_resource(manager, self._eager_to_zip(name))
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1661, in _extract_resource
self.egg_name, self._parts(zip_path)
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1025, in get_cache_path
self.extraction_error()
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 991, in extraction_error
raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg
cache:
[Errno 13] Permission denied: '/home/.python-eggs'
The Python egg cache directory is currently set to:
/home/.python-eggs
Perhaps your account does not have write access to this directory? You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.
I set the PYTHON_EGG_CACHE environment variable on every executor, and I also wrote 'os.environ['PYTHON_EGG_CACHE'] = "/tmp/"' in my program, but the problem still happens.
My code:
import os, sys
print "env::::" + os.environ['PYTHON_EGG_CACHE']
from pyspark import SparkConf, SparkContext

# Load and parse the data
def parsePoint(line):
    import os
    print "env::::" + os.environ['PYTHON_EGG_CACHE']
    os.environ['PYTHON_EGG_CACHE'] = "/tmp/"
    values = [float(x) for x in line.split(' ')]
    return line

if __name__ == "__main__":
    os.environ['PYTHON_EGG_CACHE'] = "/tmp/"
    print "env::::" + os.environ['PYTHON_EGG_CACHE']
    conf = SparkConf()
    sc = SparkContext(conf=conf)
    data = sc.textFile(sys.argv[1])
    parsedData = data.map(parsePoint)
    parsedData.collect()
I ran this Python program in 'standalone' mode and it succeeded.
This is my submit command:
spark-submit --name test_py --master yarn-client testpy.py input/sample_svm_data.txt
Is this a YARN problem?

This is late, but it's the first Google result I found for this problem. The previous answer is helpful (I wanted to know which environment variables I had to modify), but please DON'T edit the Spark sources; just set the environment variables with the proper tools. Add this to your Spark configuration variables:
spark.executorEnv.PYTHON_EGG_CACHE="./.python-eggs/"
spark.executorEnv.PYTHON_EGG_DIR="./.python-eggs/"
spark.driverEnv.PYTHON_EGG_CACHE="./.python-eggs/"
spark.driverEnv.PYTHON_EGG_DIR="./.python-eggs/"
(I prefer not to use /tmp/, because . is the job's working directory, which gets deleted after the job ends, so the eggs should disappear with it, IMO.)
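spark.executorEnv.[EnvironmentVariableName] is the Spark property for setting an environment variable on executors. If you launch jobs from a script, a minimal Python sketch like the one below builds the matching --conf pairs as a list, which sidesteps shell quoting. The helper name env_conf_args is hypothetical; it uses only the spark.executorEnv prefix by default, so pass any other prefixes (such as the driver-side ones above) explicitly through the prefixes parameter.

```python
import shlex


def env_conf_args(env_vars, prefixes=("spark.executorEnv",)):
    """Build spark-submit --conf arguments that set each variable in
    env_vars under every property prefix in prefixes (hypothetical helper)."""
    args = []
    for prefix in prefixes:
        for name, value in env_vars.items():
            args += ["--conf", "%s.%s=%s" % (prefix, name, value)]
    return args


if __name__ == "__main__":
    conf_args = env_conf_args({"PYTHON_EGG_CACHE": "./.python-eggs/"})
    # Job name, master, script and input are copied from the question
    cmd = (["spark-submit", "--name", "test_py", "--master", "yarn-client"]
           + conf_args
           + ["testpy.py", "input/sample_svm_data.txt"])
    # Print a shell-safe version of the command; pass `cmd` to
    # subprocess.run() instead if you want to launch the job directly
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because each --conf value is a single list element, the quotes used in the shell examples are not needed here.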

I solved this problem as follows:
Unzip pyspark.zip, then find the rdd.py file.
Open the file and, below the "import os" line, add this code:
os.environ['PYTHON_EGG_CACHE'] = '/tmp/.python-eggs/'
os.environ['PYTHON_EGG_DIR'] = '/tmp/.python-eggs/'
Save the file and zip pyspark again.
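The manual unzip/edit/rezip steps above can also be scripted. A minimal Python sketch, assuming pyspark.zip contains pyspark/rdd.py with a top-level "import os" line; patch_pyspark_zip is a hypothetical helper, and it writes a new archive rather than modifying the original in place. Note that the other answers recommend setting the variables through Spark configuration instead of patching Spark itself.

```python
import zipfile

# Lines from the answer above, inserted right after "import os" in rdd.py
PATCH_LINES = [
    "os.environ['PYTHON_EGG_CACHE'] = '/tmp/.python-eggs/'",
    "os.environ['PYTHON_EGG_DIR'] = '/tmp/.python-eggs/'",
]


def patch_pyspark_zip(src_zip, dst_zip, member="pyspark/rdd.py", patch_lines=PATCH_LINES):
    """Copy src_zip to dst_zip, inserting patch_lines after the first
    top-level 'import os' line of `member`; other entries are copied as-is."""
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == member:
                lines = data.decode("utf-8").splitlines(True)  # keep line endings
                for i, line in enumerate(lines):
                    if line.rstrip("\r\n") == "import os":
                        if not line.endswith("\n"):
                            lines[i] = line + "\n"
                        lines[i + 1:i + 1] = [p + "\n" for p in patch_lines]
                        break
                else:
                    raise ValueError("no top-level 'import os' line in " + member)
                data = "".join(lines).encode("utf-8")
            zout.writestr(info, data)
```

Usage: patch_pyspark_zip("pyspark.zip", "pyspark-patched.zip"), then ship the patched archive in place of the original.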

I solved this problem with the help of BiS's answer: adding the four configuration values when running spark-submit fixed the egg problem.
Here's an example of what passing the four parameters to spark-submit looks like:
spark-submit \
--conf spark.executorEnv.PYTHON_EGG_CACHE="./.python-eggs/" \
--conf spark.executorEnv.PYTHON_EGG_DIR="./.python-eggs/" \
--conf spark.driverEnv.PYTHON_EGG_CACHE="./.python-eggs/" \
--conf spark.driverEnv.PYTHON_EGG_DIR="./.python-eggs/" \

Related

No such file or directory: 'GoogleNews-vectors-negative300.bin'

I have this code:
import gensim
filename = 'GoogleNews-vectors-negative300.bin'
model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)
and this is how my folders are organized:
[Image: my folder tree, showing that the .bin file is in the same directory as the file calling it, ai_functions]
Sadly, I'm not sure why I get an error saying that it can't find the file. By the way, I checked, and I'm sure the file is not corrupted. Any thoughts?
Full traceback:
File "/Users/Ile-Maurice/Desktop/Flask/flaskapp/run.py", line 1, in <module>
from serv import app
File "/Users/Ile-Maurice/Desktop/Flask/flaskapp/serv/__init__.py", line 13, in <module>
from serv import routes
File "/Users/Ile-Maurice/Desktop/Flask/flaskapp/serv/routes.py", line 7, in <module>
from serv.ai_functions import checkplagiarism
File "/Users/Ile-Maurice/Desktop/Flask/flaskapp/serv/ai_functions.py", line 31, in <module>
model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gensim/models/keyedvectors.py", line 1629, in load_word2vec_format
return _load_word2vec_format(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gensim/models/keyedvectors.py", line 1955, in _load_word2vec_format
with utils.open(fname, 'rb') as fin:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/smart_open/smart_open_lib.py", line 188, in open
fobj = _shortcut_open(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/smart_open/smart_open_lib.py", line 361, in _shortcut_open
return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'GoogleNews-vectors-negative300.bin'
The 'current working directory' that the Python process considers active, and therefore uses as the expected location for your plain relative filename GoogleNews-vectors-negative300.bin, depends on how you launched Flask.
You could print out the directory to be sure (see some ways at How do you properly determine the current script directory?), but I suspect it may just be the /Users/Ile-Maurice/Desktop/Flask/flaskapp/ directory.
If so, you could reference your file with a path relative to that directory...
serv/GoogleNews-vectors-negative300.bin
...or you could use a full 'absolute' path...
/Users/Ile-Maurice/Desktop/Flask/flaskapp/serv/GoogleNews-vectors-negative300.bin
...or you could move the file up to its parent directory, so that it is alongside your Flask run.py.
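Another option that does not depend on how Flask was launched is to build the path from the module's own location, one of the techniques covered in the linked question. A minimal sketch, assuming the model file sits next to ai_functions.py; HERE and data_path are hypothetical names.

```python
import os

# Directory containing this module (e.g. serv/ for serv/ai_functions.py),
# independent of the working directory Flask was started from
HERE = os.path.dirname(os.path.abspath(__file__))


def data_path(filename):
    """Return an absolute path to `filename` located next to this module."""
    return os.path.join(HERE, filename)


if __name__ == "__main__":
    print("cwd:    " + os.getcwd())
    print("module: " + HERE)
    print("model:  " + data_path("GoogleNews-vectors-negative300.bin"))
```

In ai_functions.py you would then pass data_path('GoogleNews-vectors-negative300.bin') to gensim.models.KeyedVectors.load_word2vec_format(..., binary=True) instead of the bare filename.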

How to solve toml.decoder.TomlDecodeError: Key group not on a line by itself. (line 1 column 1 ...) error when calling streamlit package?

I just installed the streamlit package. When I try to run 'streamlit hello' I get the following error:
(base) C:\>streamlit hello
Traceback (most recent call last):
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\s158539\AppData\Local\Continuum\anaconda3\Scripts\streamlit.exe\__main__.py", line 5, in <module>
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\__init__.py", line 121, in <module>
from streamlit.DeltaGenerator import DeltaGenerator as _DeltaGenerator
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\DeltaGenerator.py", line 33, in <module>
from streamlit import caching
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\caching.py", line 38, in <module>
from streamlit.hashing import CodeHasher
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\hashing.py", line 36, in <module>
from streamlit.folder_black_list import FolderBlackList
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\folder_black_list.py", line 39, in <module>
if config.get_option("global.developmentMode"):
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\config.py", line 94, in get_option
parse_config_file()
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\config.py", line 877, in parse_config_file
_update_config_with_toml(file_contents, filename)
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\streamlit\config.py", line 799, in _update_config_with_toml
parsed_config_file = toml.loads(raw_toml)
File "c:\users\s158539\appdata\local\continuum\anaconda3\lib\site-packages\toml\decoder.py", line 379, in loads
original, pos)
toml.decoder.TomlDecodeError: Key group not on a line by itself. (line 1 column 1 char
Does anyone know how to solve this error?
Thank you in advance!
Just delete the config.toml file which can be found in the directory where you have installed streamlit.
I also got the same error when I tried to run the 'streamlit' command.
So I traced through the code to find where this 'config.toml' is loaded from, and simply deleted the file.
The path of the 'config.toml' file on Windows is: C:\Users\{username}\.streamlit\config.toml
Delete this file and it will solve the error.
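Instead of deleting the file outright, you could rename it so its contents can be inspected and repaired later. A minimal Python 3 sketch, assuming the config lives at the per-user location given above (~/.streamlit/config.toml); backup_streamlit_config is a hypothetical helper, not part of Streamlit.

```python
from pathlib import Path


def backup_streamlit_config(home=None):
    """Rename <home>/.streamlit/config.toml to config.toml.bak.

    Returns the backup path, or None if there was no config.toml.
    `home` defaults to the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    config = home / ".streamlit" / "config.toml"
    if not config.is_file():
        return None
    backup = config.with_name(config.name + ".bak")
    config.replace(backup)  # overwrites an older config.toml.bak if present
    return backup
```

Calling backup_streamlit_config() once and rerunning streamlit hello should show whether the config file was the cause.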
If you are not able to locate your .streamlit directory, run:
streamlit cache clear
Output:
Nothing to clear at {Username}\{path}\.streamlit\cache.
You will get output similar to this, which tells you exactly where your .streamlit directory is.
Take the part of the path before \cache:
cd {Username}\{path}\.streamlit
You'll be able to see config.toml there; just delete that file.
The Streamlit forum has this discussion topic: https://discuss.streamlit.io/t/toml-docoder-error/1400/10 that discusses this. Hope this helps!
Delete the contents of the config.toml file, which is
in C:\Users\username\.streamlit
Do you have a setup.sh file? What's inside it?
Maybe you just have to put the entire content of setup.sh on one line, like this:
[server]\nheadless = true\nenableCORS=false\nport = \n
My problem was similar, though not exactly the same, so I hope it works!

Setting up caffe on Ubuntu 14.04 but facing errors when running classify.py

I'm installing Caffe on an Ubuntu 14.04 virtual server with CUDA installed (without driver) using https://github.com/BVLC/caffe/wiki/Ubuntu-14.04-VirtualBox-VM as inspiration. I've installed all the necessary dependencies and have followed all the instructions step by step but get the error below when I try to test the installation.
Regarding the step:
" Modify python/classify.py to add the --print_results option"
I amended the code in classify.py to be identical to the official Caffe distribution. I'm not sure whether that step is causing the problem, but I thought I'd add that extra piece of information just in case.
The error I get is below:
vagrant@vagrant-ubuntu-trusty-64:~/caffe$ sudo python python/classify.py --print_results examples/images/cat.jpg foo
libdc1394 error: Failed to initialize libdc1394
Traceback (most recent call last):
File "python/classify.py", line 14, in <module>
import caffe
File "/home/vagrant/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/vagrant/caffe/python/caffe/pycaffe.py", line 15, in <module>
import caffe.io
File "/home/vagrant/caffe/python/caffe/io.py", line 2, in <module>
import skimage.io
File "/usr/local/lib/python2.7/dist-packages/skimage/io/__init__.py", line 15, in <module>
reset_plugins()
File "/usr/local/lib/python2.7/dist-packages/skimage/io/manage_plugins.py", line 89, in reset_plugins
_load_preferred_plugins()
File "/usr/local/lib/python2.7/dist-packages/skimage/io/manage_plugins.py", line 69, in _load_preferred_plugins
_set_plugin(p_type, preferred_plugins['all'])
File "/usr/local/lib/python2.7/dist-packages/skimage/io/manage_plugins.py", line 81, in _set_plugin
use_plugin(plugin, kind=plugin_type)
File "/usr/local/lib/python2.7/dist-packages/skimage/io/manage_plugins.py", line 251, in use_plugin
_load(name)
File "/usr/local/lib/python2.7/dist-packages/skimage/io/manage_plugins.py", line 295, in _load
fromlist=[modname])
File "/usr/local/lib/python2.7/dist-packages/skimage/io/_plugins/matplotlib_plugin.py", line 4, in <module>
import matplotlib.pyplot as plt
File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1131, in <module>
rcParams = rc_params()
File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 975, in rc_params
return rc_params_from_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
config_from_file = _rc_params_in_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
with _open_file_or_url(fname) as fd:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
encoding = locale.getdefaultlocale()[1]
File "/usr/lib/python2.7/locale.py", line 543, in getdefaultlocale
return _parse_localename(localename)
File "/usr/lib/python2.7/locale.py", line 475, in _parse_localename
raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8
Any input would be greatly appreciated. Thank you.
Check these variables in your current shell environment: LC_ALL, LC_CTYPE, LANG, LANGUAGE:
set | grep -a '^LC_ALL=.'; set | grep -a '^LC_CTYPE=.';
set | grep -a '^LANG=.'; set | grep -a '^LANGUAGE=.'
Likely the first printed line will contain one of these variables equal to 'UTF-8'. This is wrong.
A proper locale should have language and optional country and encoding specification, like 'en' or 'en_US' or 'ru_RU.UTF-8'. There's a special locale 'C' used as fallback.
So you may redefine your locale permanently in /etc/default/locale (don't forget to reload your settings; the easiest way is to log out and log back in), or simply override it for a particular command:
> LC_ALL=C python python/classify.py --print_results examples/images/cat.jpg foo
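Python's locale.getdefaultlocale() consults LC_ALL, LC_CTYPE, LANG and LANGUAGE in that order and uses the first non-empty one, so that is the variable to fix. A small Python 3 sketch that reports which variable wins and flags bare encodings such as UTF-8; effective_locale and looks_valid are hypothetical helpers, and the regular expression is a rough heuristic, not the parser Python itself uses.

```python
import os
import re

# Order in which locale.getdefaultlocale() consults the environment
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")

# Rough heuristic: language[_COUNTRY][.encoding][@modifier], or C / POSIX
_LOCALE_RE = re.compile(r"^(C|POSIX|[a-z]{2,3}(_[A-Z]{2})?)(\.[\w-]+)?(@\w+)?$")


def effective_locale(env=None):
    """Return (variable, value) for the first non-empty locale variable, or None."""
    env = os.environ if env is None else env
    for name in LOCALE_VARS:
        value = env.get(name)
        if value:
            if name == "LANGUAGE":
                value = value.split(":")[0]  # LANGUAGE may hold a colon-separated list
            return name, value
    return None


def looks_valid(value):
    """False for bare encodings such as 'UTF-8' that lack a language part."""
    return bool(_LOCALE_RE.match(value))


if __name__ == "__main__":
    found = effective_locale()
    if found is None:
        print("No locale variable is set")
    else:
        name, value = found
        verdict = "looks valid" if looks_valid(value) else "suspicious"
        print("%s=%s (%s)" % (name, value, verdict))
```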
Regarding libdc1394: if you don't need to interact with FireWire, see the question "ctypes error: libdc1394 error: Failed to initialize libdc1394".

GAE, sqlite3.OperationalError: unable to open database file

OK, I've read a lot. Many people have the same issue, but none of the answers helped me.
I'm trying to do this: http://googlecloudplatform.github.io/appengine-php-wordpress-starter-project/ , but each time I run the app, I get the same message:
> 2014-09-22 10:12:10 Running command: "['C:\\Python27\\pythonw.exe', 'C:\\Program Files (x86)\\Google\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=8080', '--admin_port=8090', 'C:\\gae\\wp39']"
INFO 2014-09-22 10:12:12,089 devappserver2.py:725] Skipping SDK update check.
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 82, in <module>
_run_file(__file__, globals())
File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 78, in _run_file
execfile(_PATHS.script_file(script_name), globals_)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 970, in <module>
main()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 963, in main
dev_server.start(options)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 768, in start
request_data, storage_path, options, configuration)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 867, in _create_api_server
default_gcs_bucket_name=options.default_gcs_bucket_name)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\api_server.py", line 364, in setup_stubs
auto_id_policy=datastore_auto_id_policy)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\datastore\datastore_sqlite_stub.py", line 604, in __init__
factory=sql_conn)
sqlite3.OperationalError: unable to open database file
2014-09-22 10:12:12 (Process exited with code 1)
Windows 8, Python27, GAE 1.9.11
I checked all permissions and started GAE as administrator.
I tried Compatibility Mode (XP & Me, Win7): nothing.
I tried setting the TMP variable in app.yaml.
I searched the whole C: drive for "datastore.db" and found nothing.
I tried starting the app from CMD (as Administrator), like this:
C:\gae\wp39>dev_appserver.py C:\gae\wp39
C:\gae\wp39>dev_appserver.py --datastore_path C:\temp\data.db C:\gae\wp39
C:\gae\wp39>dev_appserver.py --clear_datastore=yes --datastore_path C:\temp\data.db C:\gae\wp39
The same result.
When I run the app from the console with the attribute "--datastore_path C:\temp\data.db", the system creates that file (about 9 KB), but still can't open the database.
The folder "C:\Users\\AppData\Local\Temp\appengine.levalult" exists, but it is empty. I don't know what else to do.
Thanks. I will be grateful for any advice.
Solution:
Change the username to one without Unicode characters, or
change the TMP and TEMP environment variables to e:\, or
in a cmd prompt, do:
1: change the environment variable values
set temp=e:\
set tmp=e:\
2: run GAE
D:\Program Files (x86)\Google\google_appengine\launcher\GoogleAppEngineLauncher.exe
Reason:
In datastore_sqlite_stub.py,
in def __init__,
before the self.__connection = sqlite3.connect line,
add the following code:
# Debugging aid: log the datastore path and every environment variable
with open('e:/tmp/a.log', 'w') as f:
    f.write(self.__datastore_file + '\n')
    for name, v in os.environ.items():
        f.write('\n' + name + ' ' + v)
# Point the datastore at a path without Unicode characters
self.__datastore_file = 'e:/tmp/datastore.db'
According to the code, the database file is located in:
c:\users\%username%\appdata\local\temp\appengine.xgogo\datastore.db
which is equal to:
%TEMP%\appengine.xgogo\datastore.db
where %TEMP% is the environment variable.
When the username contains Unicode characters, opening this file fails.
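To check whether you are affected, a small Python 3 sketch that reports non-ASCII characters in the temporary directory Python resolves. non_ascii_chars is a hypothetical helper, and tempfile.gettempdir() is only an approximation of the %TEMP% path the SDK uses.

```python
import tempfile


def non_ascii_chars(path):
    """Return the non-ASCII characters in `path` (empty string if none)."""
    return "".join(ch for ch in path if ord(ch) > 127)


if __name__ == "__main__":
    temp_dir = tempfile.gettempdir()  # resolved from TMPDIR, TEMP or TMP when set
    bad = non_ascii_chars(temp_dir)
    if bad:
        print("Temp path %r contains non-ASCII characters: %r" % (temp_dir, bad))
        print("Consider pointing TEMP and TMP at an ASCII-only directory.")
    else:
        print("Temp path %r is ASCII-only" % temp_dir)
```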

django and mod_wsgi having database connection issues

I've noticed that whenever I enable the database settings on my Django project (starting to notice a trend in my questions?), it gives me an internal server error. Setting the database settings to blank makes the error go away. Here is the output from the Apache error log:
mod_wsgi (pid=770): Exception occurred processing WSGI script '/Users/teifionjordan/rob2/apache/django.wsgi'.
Traceback (most recent call last):
File "/Library/Python/2.5/site-packages/django/core/handlers/wsgi.py", line 239, in __call__
response = self.get_response(request)
File "/Library/Python/2.5/site-packages/django/core/handlers/base.py", line 67, in get_response
response = middleware_method(request)
File "/Library/Python/2.5/site-packages/django/contrib/sessions/middleware.py", line 9, in process_request
engine = __import__(settings.SESSION_ENGINE, {}, {}, [''])
File "/Library/Python/2.5/site-packages/django/contrib/sessions/backends/db.py", line 2, in <module>
from django.contrib.sessions.models import Session
File "/Library/Python/2.5/site-packages/django/contrib/sessions/models.py", line 4, in <module>
from django.db import models
File "/Library/Python/2.5/site-packages/django/db/__init__.py", line 16, in <module>
backend = __import__('%s%s.base' % (_import_path, settings.DATABASE_ENGINE), {}, {}, [''])
File "/Library/Python/2.5/site-packages/django/db/backends/mysql/base.py", line 10, in <module>
import MySQLdb as Database
File "build/bdist.macosx-10.5-i386/egg/MySQLdb/__init__.py", line 19, in <module>
File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 7, in <module>
File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 4, in __bootstrap__
File "/Library/Python/2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 841, in resource_filename
self, resource_name
File "/Library/Python/2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1310, in get_resource_filename
self._extract_resource(manager, self._eager_to_zip(name))
File "/Library/Python/2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1332, in _extract_resource
self.egg_name, self._parts(zip_path)
File "/Library/Python/2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 921, in get_cache_path
self.extraction_error()
File "/Library/Python/2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 887, in extraction_error
raise err
ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg
cache:
[Errno 20] Not a directory: '/Library/WebServer/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.5-i386.egg-tmp'
The Python egg cache directory is currently set to:
/Library/WebServer/.python-eggs
Perhaps your account does not have write access to this directory? You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.
And here is the django.wsgi file:
import os
import sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'rob2.settings'
sys.path.append('/Users/teifionjordan')
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
I have several other scripts that connect to a MySQL database just fine, and if I run the tutorial server, it displays the admin panel correctly. I have tried changing the environment variables for eggs, but nothing changes.
You need to set the PYTHON_EGG_CACHE environment variable. Apache/mod_wsgi is trying to extract the egg into a directory that Apache doesn't have write access to... or that doesn't exist.
It's explained in the Django docs here.
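A minimal sketch of doing that from django.wsgi itself, assuming the chosen directory is writable by the user Apache runs as. ensure_egg_cache is a hypothetical helper, written so it also runs on old Python 2 interpreters like the one in the question.

```python
import os


def ensure_egg_cache(path):
    """Create `path` if needed and point PYTHON_EGG_CACHE at it.

    Call this at the top of the WSGI script, before importing anything
    (such as MySQLdb via Django) that may extract files from an egg.
    """
    if not os.path.isdir(path):
        os.makedirs(path)
    os.environ["PYTHON_EGG_CACHE"] = path
    return path
```

In django.wsgi, the call would go right after import os and before import django.core.handlers.wsgi, with a directory of your choosing in place of a placeholder path.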
Does /Library/WebServer/.python-eggs exist? What does your Apache config file look like?
