I've been using Fabric to run setup commands on an EC2 instance during some automated unit tests. These tests ran fine for months, then a few days ago they suddenly started failing with this error:
Traceback (most recent call last):
File "tests.py", line 207, in test_setup
run('./setup_server')
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/network.py", line 682, in host_prompting_wrapper
return func(*args, **kwargs)
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/operations.py", line 1091, in run
shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/operations.py", line 934, in _run_command
capture_buffer_size=capture_buffer_size)
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/operations.py", line 816, in _execute
worker.raise_if_needed()
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/thread_handling.py", line 26, in raise_if_needed
six.reraise(e[0], e[1], e[2])
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/thread_handling.py", line 13, in wrapper
callable(*args, **kwargs)
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/io.py", line 31, in output_loop
OutputLooper(*args, **kwargs).loop()
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/io.py", line 152, in loop
self._flush(end_of_line + "\n")
File "/usr/local/myproject/src/buildbot/worker3/myproject_runtests/build/.env/local/lib/python2.7/site-packages/fabric/io.py", line 57, in _flush
self.stream.flush()
IOError: [Errno 32] Broken pipe
The command being run at that point is a pip install -r requirements.txt, and the install itself appears to run just fine. If I run the test locally, it completes without error; however, when run against AWS, it now fails halfway through every time.
What would cause this? Since it's an IOError, and these can be caused by virtually any kind of minor network interruption, I'm not sure how to diagnose it. If Fabric lost connection temporarily, that would explain it, but it wouldn't explain why it's repeatable. I've re-run this script several dozen times, and it fails each time after initially connecting to the EC2 instance perfectly.
Is there some sort of configuration in Fabric I can change to improve connection error handling?
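For what it's worth, the traceback above ends in self.stream.flush(), i.e. Fabric writing captured output to a local stream, so the pipe that broke may be the local stdout the test runner handed to Fabric rather than the SSH channel itself. That said, Fabric 1.x does expose a few env settings that make the SSH side more tolerant of flaky connections; a minimal sketch (the values are arbitrary examples):
from fabric.api import env

env.timeout = 30              # seconds to wait when establishing a connection (default 10)
env.connection_attempts = 3   # retry the initial connect this many times (default 1)
env.keepalive = 30            # send an SSH keepalive packet every 30 seconds (default 0 = off)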
Related
I am trying to build a SLAM algorithm using Python scripts and ROS. I have a top-level workspace with two packages: the rplidar_ros package from GitHub, and a slam package I created with catkin_create_pkg, which has an src directory containing two Python files (icp.py and mapping.py), a msg directory with a custom message defined in Custom.msg, and a launch directory with icp.launch. My problem is that when I use roslaunch to launch all three nodes, rplidar_ros and icp_node launch fine and stay alive, but map_node dies as soon as it starts, respawns, dies again, and repeats this cycle.
When each node is run independently, not using roslaunch: if icp_node is run before map_node, map_node goes through the same cycle of shutting down and restarting; but if map_node is started before icp_node, it gives this error:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 154, in run
(client_sock, client_addr) = self.server_sock.accept()
File "/usr/lib/python2.7/socket.py", line 206, in accept
sock, addr = self._sock.accept()
File "/usr/lib/python2.7/socket.py", line 174, in _dummy
raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
Any idea what could be causing this error?
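One common reason a roslaunch-managed Python node dies immediately and keeps respawning is that its script simply runs to completion instead of blocking; whether that applies here I cannot tell from the question, but as a hedged sketch (the topic name and callback are placeholders, the node and message names are taken from the question), a long-running mapping node usually ends in rospy.spin():
#!/usr/bin/env python
import rospy
from slam.msg import Custom  # custom message from the question's msg/ directory

def callback(msg):
    # process the incoming message; real mapping logic omitted
    rospy.loginfo("map_node received a Custom message")

if __name__ == '__main__':
    rospy.init_node('map_node')
    rospy.Subscriber('icp_output', Custom, callback)  # topic name is a placeholder
    rospy.spin()  # without spin() the script exits and roslaunch respawns the node
The [Errno 9] Bad file descriptor raised from the accept() loop is typically a side effect of the node shutting down while its TCPROS server socket is still accepting connections, so it usually points back at whatever is terminating map_node rather than being the root cause itself.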
This might seem like an already asked question, but I have searched for an answer for a week now and found nothing.
The problem is that I have developed an API using Django which is hosted on a server. When I run the following command to start the server:
python manage.py runserver 0.0.0.0:9000
The server starts as usual. It's only when I send a request to the server via Postman that I see the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'manage.py'
The strange thing is that there is no error when starting the server, only when I send a request to it. I also have many more Django APIs running on the same server, with the same Python version (3.4.3) and the same virtual environment (but different ports), and they run just fine.
Full error traceback :
Traceback (most recent call last):
File "manage.py", line 15, in <module>
execute_from_command_line(sys.argv)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
utility.execute()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/__init__.py", line 365, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/base.py", line 288, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 61, in execute
super().execute(*args, **options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/base.py", line 335, in execute
output = self.handle(*args, **options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 98, in handle
self.run(**options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/core/management/commands/runserver.py", line 105, in run
autoreload.main(self.inner_run, None, options)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 317, in main
python_reloader(wrapped_main_func, args, kwargs)
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 296, in python_reloader
reloader_thread()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 274, in reloader_thread
change = fn()
File "/home/ubuntu/py3env/lib/python3.4/site-packages/django/utils/autoreload.py", line 204, in code_changed
stat = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'manage.py'
Things I have tried:
I have tried changing the shebang (#!) as suggested in various posts.
I have tried using dos2unix to convert the file to Unix format (the server on which my API is hosted is Linux-based).
I have even tried creating a new Django project.
And yes, I'm running manage.py from the correct directory.
I have also tried making manage.py executable by :
chmod +x manage.py
Nothing worked for me so far. Am I missing something?
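One thing the traceback does show is that the crash happens inside Django's autoreloader thread while it stats the watched files, and manage.py is watched by the relative path it was started with. Purely as an illustration of one way that can break (this is an assumption about the cause, not a diagnosis): if any code imported or executed while serving a request changes the process's working directory, the relative path stops resolving on the reloader's next check.
import os

os.stat('manage.py')   # fine while the current working directory is the project root
os.chdir('/tmp')       # e.g. some application code changes the working directory
os.stat('manage.py')   # FileNotFoundError: [Errno 2] No such file or directory: 'manage.py'
Starting the server with python manage.py runserver 0.0.0.0:9000 --noreload would at least confirm whether the autoreloader is the component that fails, and searching the project for os.chdir would test the working-directory theory.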
I had code that worked fine. I shut down my PC (Ubuntu), then built the same software (ns-3) again, and now I get this error:
Traceback (most recent call last):
File "./waf", line 148, in <module>
Scripting.prepare(t, cwd, VERSION, wafdir)
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Scripting.py", line 102, in prepare
prepare_impl(t,cwd,ver,wafdir)
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Scripting.py", line 95, in prepare_impl
main()
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Scripting.py", line 130, in main
fun(ctx)
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Scripting.py", line 269, in build
bld=check_configured(bld)
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Scripting.py", line 219, in check_configured
bld.load_dirs(proj[SRCDIR],proj[BLDDIR])
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Build.py", line 245, in load_dirs
self.load()
File "/home/ns-allinone-3.6/ns-3.6/.waf-1.5.8-12763e767c863088b8579dbeeb8265b6/wafadmin/Build.py", line 78, in load
if f:data=cPickle.load(f)
EOFError
I am just amazed at this: two minutes ago everything was fine, and now I am stuck on this error.
What should I do? I am totally bewildered. I have a deadline, and suddenly this code, which worked fine before, stops building; the only change I made was switching off my PC.
ns-3.6 is a very, very old ns-3 release, which you are trying to build on a new system (gcc compiler, Python, etc.). I think the only way to solve the problem is to update the simulator to a newer release.
Try rebuilding it: first run rm -rf build, then build from scratch.
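As an illustration of why wiping the build directory can help (this is an assumption about what happened, based on the traceback ending in cPickle.load): waf caches its build state with cPickle, and a state file left empty or truncated, for example by an unclean shutdown, makes that load raise a bare EOFError just like the one above.
import cPickle  # Python 2; the waf bundled with ns-3.6 runs under Python 2

with open('truncated_cache.bin', 'wb'):
    pass  # create an empty file, standing in for a damaged waf state file
with open('truncated_cache.bin', 'rb') as f:
    cPickle.load(f)  # raises EOFError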
I ran a Python program in Cygwin to connect to AWS, but it consistently fails with a connection timeout. My connection to AWS using the aws CLI in Cygwin always works, and if I run the same Python code on Windows, it also works. I have checked all the connection credentials, which are the same in every case.
Here is the error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/boto-2.38.0-py2.7.egg/boto/ec2/connection.py", line 585, in get_all_instances
max_results=max_results)
File "/usr/lib/python2.7/site-packages/boto-2.38.0-py2.7.egg/boto/ec2/connection.py", line 681, in get_all_reservations
[('item', Reservation)], verb='POST')
File "/usr/lib/python2.7/site-packages/boto-2.38.0-py2.7.egg/boto/connection.py", line 1170, in get_list
response = self.make_request(action, params, path, verb)
File "/usr/lib/python2.7/site-packages/boto-2.38.0-py2.7.egg/boto/connection.py", line 1116, in make_request
return self._mexe(http_request)
File "/usr/lib/python2.7/site-packages/boto-2.38.0-py2.7.egg/boto/connection.py", line 1030, in _mexe
raise ex
socket.error: [Errno 116] Connection timed out
I have found out that the culprit lies in the proxy settings.
I had put HTTP_PROXY and HTTPS_PROXY as Windows environment variables. However, when run in Cygwin, boto matches the literal lowercase name 'http_proxy' and does not do a case-insensitive lookup
(see handle_proxy() in boto/connection.py,
line 669: if 'http_proxy' in os.environ and not self.proxy:).
When I changed the upper-case HTTP_PROXY to lower-case http_proxy, it worked. I am not sure why this isn't a problem when I run the same code with Python on Windows.
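A small workaround along the same lines, assuming boto really does check only the lowercase names: mirror the Windows-style variables into the lowercase names before any boto connection object is created.
import os

# copy HTTP_PROXY/HTTPS_PROXY into the lowercase names boto looks for
for upper, lower in (('HTTP_PROXY', 'http_proxy'), ('HTTPS_PROXY', 'https_proxy')):
    if upper in os.environ and lower not in os.environ:
        os.environ[lower] = os.environ[upper]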
Questions with similar issue:
Parallel Python - too many files and Python too many open files (subprocesses)
I am using Parallel Python [V1.6.2] to run tasks. Each task processes an input file and writes out a log/report. There are 10 folders, each with 5,000 to 20,000 files, which are read in parallel, processed, and their logs written out; each file is approximately 50 KB to 250 KB.
After ~6 Hours of running, Parallel Python fails with the following error.
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
ImportError: No module named fileutils
Original exception was:
Traceback (most recent call last):
File "PARALLEL_TEST.py", line 746, in <module>
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files
While I understand this could be the subprocess issue pointed out at http://bugs.python.org/issue2320, it seems the fix only shipped in Python 3.2, and I am currently tied to Python 2.7.
I would like to know if the following suggestions help:
[1]http://www.parallelpython.com/component/option,com_smf/Itemid,1/topic,313.0
*) Adding worker.t.close() in destroy() method of /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py
*) Increasing BROADCAST_INTERVAL in /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/ppauto.py
I would like to know if there is a fix or workaround available for this issue in Python 2.7.
Thanks in Advance
My team recently stumbled upon a similar file-handle exhaustion issue while running celeryd task queue jobs. I believe the OP has nailed it: it is most likely the messy code in the subprocess.py lib in Python 2.7 and Python 3.1.
As suggested in Python Bug #2320, pass close_fds=True everywhere you call subprocess.Popen(). In fact, they made that the default in Python 3.2, while also fixing the underlying race condition. See that ticket for more details.
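A minimal sketch of that suggestion on Python 2.7 (the command itself is a placeholder):
import subprocess

proc = subprocess.Popen(
    ['python', 'process_one_file.py', 'input.log'],  # placeholder command
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    close_fds=True,   # do not leak the parent's file descriptors into the child
)
out, err = proc.communicate()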
I had left out the lines that destroy the job servers; calling job_server.destroy() fixed the issue.
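For context, a minimal usage sketch of that fix, assuming the standard Parallel Python API (process_file and file_paths are placeholders):
import pp

job_server = pp.Server(ncpus=4)
jobs = [job_server.submit(process_file, (path,)) for path in file_paths]
results = [job() for job in jobs]
job_server.destroy()  # release the worker subprocesses and their pipes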