Ansible, Juniper CLI commands. Timeout Error? - python

I am trying to transfer an automation script I made in Python to Ansible (company request), and I have NEVER worked with Ansible before. I have tried "wait_for:", but I have not gotten that to work either. In the script I could set dev.timeout=None or whatever I needed, but I am finding it hard to figure out where to do this in Ansible. I have tried setting the timeout in the "ansible.cfg" file, but that doesn't work. I can do simple commands, like:
cli="show version", or
cli="show system firmware".
The following is my playbook:
- hosts: local
  roles:
    - Juniper.junos
  connection: local
  gather_facts: no
  tasks:
    - junos_cli:
        host={{ inventory_hostname }}
        user=root
        passwd=Hardware1
        cli="request system snapshot slice alternate"
        dest="{{ inventory_hostname }}.txt"
After I run that, around 120 seconds later, I get the following error:
fatal: [192.168.2.254]: FAILED! => {"changed": false, "failed": true, "module_stderr": "/usr/local/lib/python2.7/dist-packages/jnpr/junos/device.py:652: RuntimeWarning: CLI command is for debug use only!\n warnings.warn(\"CLI command is for debug use only!\", RuntimeWarning)\nTraceback (most recent call last):\n File \"/home/pkb/.ansible/tmp/ansible-tmp-1457428640.58-63826456311723/junos_cli\", line 2140, in <module>\n main()\n File \"/home/pkb/.ansible/tmp/ansible-tmp-1457428640.58-63826456311723/junos_cli\", line 177, in main\n dev.close()\n File \"/usr/local/lib/python2.7/dist-packages/jnpr/junos/device.py\", line 504, in close\n self._conn.close_session()\n File \"/usr/local/lib/python2.7/dist-packages/ncclient/manager.py\", line 158, in wrapper\n return self.execute(op_cls, *args, **kwds)\n File \"/usr/local/lib/python2.7/dist-packages/ncclient/manager.py\", line 228, in execute\n raise_mode=self._raise_mode).request(*args, **kwds)\n File \"/usr/local/lib/python2.7/dist-packages/ncclient/operations/session.py\", line 28, in request\n return self._request(new_ele(\"close-session\"))\n File \"/usr/local/lib/python2.7/dist-packages/ncclient/operations/rpc.py\", line 342, in _request\n raise TimeoutExpiredError('ncclient timed out while waiting for an rpc reply.')\nncclient.operations.errors.TimeoutExpiredError: ncclient timed out while waiting for an rpc reply.\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed": false}
I think it is the timeout, but I could be wrong. It is doing my head in that such a simple task is eluding me.

Okay, I managed to fix the problem.
The Ansible module that runs the CLI (junos_cli) doesn't support a timeout. I therefore went into:
/etc/ansible/roles/Juniper.junos/library/junos_cli
and below the line:
dev = Device(args['host'], user=args['user'], password=args['passwd'],
             port=args['port'], gather_facts=False).open()
I added:
dev.timeout = None
This sets the timeout to infinity, so I have enough time for the formatting done by "request system snapshot slice alternate".
Hope this helps anyone else doing something with the junos cli through ansible automation.

It's suggested to increase the timeout to some decent value (say 300 seconds) that should be enough for the call, rather than making it infinite.
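For reference, a minimal sketch of the same patch with a finite timeout instead of an infinite one (the variable names follow the junos_cli snippet above, and 300 is just the suggested value; PyEZ's Device.timeout takes a number of seconds):

dev = Device(args['host'], user=args['user'], password=args['passwd'],
             port=args['port'], gather_facts=False).open()
dev.timeout = 300  # enough headroom for "request system snapshot slice alternate"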

Related

Stackstorm 2.7 webhook "AccessRefused: 403" error

I have successfully created a webhook in stackstorm and it is visible in the webhook lists.
[centos@ip- ~]$ sudo st2 webhook list
+------------+------------------+-------------+
| url | type | description |
+------------+------------------+-------------+
| wfcreation | core.st2.webhook | |
+------------+------------------+-------------+
[centos@ip- ~]$
I triggered the webhook with a payload and the proper headers, using a StackStorm API key. The webhook gets triggered and returns a status code of 200, but the underlying StackStorm workflow fails with the error below.
{
"traceback": " File \"/opt/stackstorm/st2/lib/python2.7/site-packages/st2actions/container/base.py\", line 119, in _do_run
(status, result, context) = runner.run(action_params)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py\", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py\", line 206, in call
return attempt.get(self._wrap_exception)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py\", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py\", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File \"/opt/stackstorm/runners/mistral_v2/mistral_v2/mistral_v2.py\", line 247, in run
result = self.start_workflow(action_parameters=action_parameters)
File \"/opt/stackstorm/runners/mistral_v2/mistral_v2/mistral_v2.py\", line 284, in start_workflow
**options)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/v2/executions.py\", line 65, in create
return self._create('/executions', data)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/base.py\", line 100, in _create
self._raise_api_exception(resp)
File \"/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/base.py\", line 160, in _raise_api_exception
error_message=error_data)
",
"error": "AccessRefused: 403"
}
The official StackStorm documentation does not have any reference for troubleshooting this error.
Any help would be greatly appreciated as I am blocked on this right now.
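For reference, the webhook was triggered roughly along these lines (the host, API key, and payload below are placeholders):

import requests

resp = requests.post(
    'https://stackstorm.example.com/api/v1/webhooks/wfcreation',
    json={'some': 'payload'},
    headers={'St2-Api-Key': 'MY_API_KEY'},
    verify=False,  # only needed with a self-signed certificate
)
print(resp.status_code)  # returns 200 even though the workflow later fails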
Finally I was able to figure out that the problem was with the mistral-server service running on the StackStorm host.
The mistral-server service was not able to connect to the RabbitMQ service due to a misconfiguration during StackStorm installation, but the misleading "AccessRefused: 403" message did not point toward a RabbitMQ connection issue.
So here is what I found in the logs.
The error message in the mistral logs (/var/log/mistral/):
2018-06-25 15:30:19.309 10767 ERROR oslo_service.service AccessRefused: (0,0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
On digging the rabbitmq logs (/var/log/rabbitmq/):
=ERROR REPORT==== 25-Jun-2018::16:34:23 ===
closing AMQP connection <0.5118.0> (127.0.0.1:41248 -> 127.0.0.1:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"AMQPLAIN login refused: user 'st2' - invalid credentials",
'connection.start_ok'}}
Clearly the st2 user credentials were wrongly configured during installation and that led to this whole issue.
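If you want to double-check the broker credentials outside of Mistral, a quick sketch using kombu (which StackStorm ships with) reproduces the failure; the user, password, and host here are placeholders:

from kombu import Connection

# Raises an access-refused error if the st2 credentials configured for StackStorm
# do not match what RabbitMQ has set up.
conn = Connection('amqp://st2:WRONG_PASSWORD@127.0.0.1:5672//')
conn.connect()
conn.release()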
Hope this helps somebody in future.
Another thing that could cause this is not having Nginx configured; in that case the system expects you to use the local ports. Here's an example that might help you: https://github.com/StackStorm/st2/tree/master/conf/st2.conf.sample and here's the nginx config.

ssh-copy-id hanging sometimes

This is a "follow up" to my question from last week. Basically I am seeing that some python code that ssh-copy-id using pexpect occasionally hangs.
I thought it might be a pexect problem, but I am not so sure any more after I was able to collect a "stacktrace" from such a hang.
Here you can see some traces from my application; followed by the stack trace after running into the timeout:
2016-07-01 13:23:32 DEBUG copy command: ssh-copy-id -i /yyy/.ssh/id_rsa.pub someuser@some.ip
2016-07-01 13:23:33 DEBUG first expect: 1
2016-07-01 13:23:33 DEBUG sending PASSW0RD
2016-07-01 13:23:33 DEBUG consuming output from remote side ...
2016-07-01 13:24:03 INFO Timeout occured ... stack trace information ...
2016-07-01 13:24:03 INFO Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/pexpect-3.3-py3.5.egg/pexpect/__init__.py", line 1535, in expect_loop
c = self.read_nonblocking(self.maxread, timeout)
File "/usr/local/lib/python3.5/site-packages/pexpect-3.3-py3.5.egg/pexpect/__init__.py", line 968, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "xxx/PrepareSsh.py", line 28, in execute
self.copy_keys(context, user, timeout)
File "xxx/PrepareSsh.py", line 83, in copy_keys
child.expect('[#\$]')
File "/usr/local/lib/python3.5/site-packages/pexpect-3.3-py3.5.egg/pexpect/__init__.py", line 1451, in expect
timeout, searchwindowsize)
File "/usr/local/lib/python3.5/site-packages/pexpect-3.3-py3.5.egg/pexpect/__init__.py", line 1466, in expect_list
timeout, searchwindowsize)
File "/usr/local/lib/python3.5/site-packages/pexpect-3.3-py3.5.egg/pexpect/__init__.py", line 1568, in expect_loop
raise TIMEOUT(str(err) + '\n' + str(self))
pexpect.TIMEOUT: Timeout exceeded.
<pexpect.spawn object at 0x2b74694995c0>
version: 3.3
command: /usr/bin/ssh-copy-id
args: ['/usr/bin/ssh-copy-id', '-i', '/yyy/.ssh/id_rsa.pub', 'someuser@some.ip']
searcher: <pexpect.searcher_re object at 0x2b746ae1c748>
buffer (last 100 chars): b'\r\n/usr/bin/xauth: creating new authority file /home/hmcmanager/.Xauthority\r\n'
before (last 100 chars): b'\r\n/usr/bin/xauth: creating new authority file /home/hmcmanager/.Xauthority\r\n'
after: <class 'pexpect.TIMEOUT'>
So, what I find kinda strange: xauth showing up in the messages that pexpect received.
You see, today, I created another VM for testing; and did all the setup manually. This is what I see when doing so:
> ssh-copy-id -i ~/.ssh/id_rsa.pub someuser@some.ip
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/xxx/.ssh/id_rsa.pub"
The authenticity of host 'some.ip (some.ip)' can't be established.
ECDSA key fingerprint is SHA256:7...
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
someuser@some.ip's password:
Number of key(s) added: 1
Now try logging into the machine, with: ....
So, let's recap:
- when I run ssh-copy-id manually, everything works, and the string "xauth" doesn't show up in the output coming back
- when I run ssh-copy-id programmatically, it works most of the time; but sometimes there are timeouts, and that message about xauth is sent to my client
This is driving me crazy. Any ideas are welcome.
The xauth reference smells like you are requesting X11 forwarding, which would be configured in your ~/.ssh/config. That might be the difference between the two setups that causes the hangs.
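A minimal sketch of one way to work around it, assuming your ssh-copy-id accepts -o (so X11 forwarding can be switched off for this one call); the path, user, host, and password are the placeholders from the question:

import pexpect

child = pexpect.spawn(
    '/usr/bin/ssh-copy-id',
    ['-o', 'ForwardX11=no', '-i', '/yyy/.ssh/id_rsa.pub', 'someuser@some.ip'],
    timeout=60,
)
index = child.expect(['(?i)password:', pexpect.EOF, pexpect.TIMEOUT])
if index == 0:
    child.sendline('PASSW0RD')
    child.expect(pexpect.EOF)  # ssh-copy-id exits once the key is installed
child.close()

Alternatively, set "ForwardX11 no" for that host in ~/.ssh/config so the xauth step never runs at all.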

ImageKit async error - can't decode message body

I'm using Django 1.6 and Django-ImageKit 3.2.1.
I'm trying to generate images asynchronously with ImageKit. Async image generation works locally but not on the production server.
I'm using Celery and I've tried both:
IMAGEKIT_DEFAULT_CACHEFILE_BACKEND = 'imagekit.cachefiles.backends.Async'
IMAGEKIT_DEFAULT_CACHEFILE_BACKEND = 'imagekit.cachefiles.backends.Celery'
Using the Simple backend (synchronous) instead of Async or Celery works fine on the production server. So I don't understand why the asynchronous backend gives me the following ImportError (pulled from the Celery log):
[2014-04-05 21:51:26,325: CRITICAL/MainProcess] Can't decode message body: DecodeError(ImportError('No module named s3utils',),) [type:u'application/x-python-serialize' encoding:u'binary' headers:{}]
body: '\x80\x02}q\x01(U\x07expiresq\x02NU\x03utcq\x03\x88U\x04argsq\x04cimagekit.cachefiles.backends\nCelery\nq\x05)\x81q\x06}bcimagekit.cachefiles\nImageCacheFile\nq\x07)\x81q\x08}q\t(U\x11cachefile_backendq\nh\x06U\x12ca$
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/messaging.py", line 585, in _receive_callback
decoded = None if on_m else message.decode()
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/message.py", line 142, in decode
self.content_encoding, accept=self.accept)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 184, in loads
return decode(data)
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
self.gen.throw(type, value, traceback)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 59, in _reraise_errors
reraise(wrapper, wrapper(exc), sys.exc_info()[2])
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 55, in _reraise_errors
yield
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 184, in loads
return decode(data)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 64, in pickle_loads
return load(BytesIO(s))
DecodeError: No module named s3utils
s3utils is what defines my AWS S3 bucket paths. I'll post it if need be, but the strange thing, I think, is that the synchronous backend has no problem importing s3utils while the asynchronous one does, and only on the production server, not locally.
I'd be SO grateful for any help debugging this. I've been wrestling with this for days. I'm still learning Django and Python, so I'm hoping this is a stupid mistake on my part. My Google-fu has failed me.
As I hinted at in my comment above, this kind of thing is usually caused by forgetting to restart the worker.
It's a common gotcha with Celery. The workers are a separate process from your web server so they have their own versions of your code loaded. And just like with your web server, if you make a change to your code, you need to reload so it sees the change. The web server talks to your worker not by directly running code, but by passing serialized messages via the broker, which will say something like "call the function do_something()". Then the worker will read that message and—and here's the tricky part—call its version of do_something(). So even if you restart your webserver (so that it has a new version of your code), if you forget to reload the worker (which is what actually calls the function), the old version of the function will be called. In other words, you need to restart the worker any time you make a change to your tasks.
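A tiny illustration of why this matters (the module name, broker URL, and task are made up):

from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')

@app.task
def do_something():
    # The worker runs whatever version of this body was loaded when the worker
    # process started, not whatever is currently on disk.
    return 'done'

# The web process never executes do_something() itself; it only serializes a
# message saying "run proj.do_something" and hands it to the broker.
do_something.delay()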
You might want to check out the autoreload option for development. It could save you some headaches.

python redis client fails to get existing hash values using .hgetall(key)

I'm encountering a circumstance where a hash that demonstrably exists in db 2 of our Redis cache fails when requested with .hgetall(key). I'm hoping for some insight! Thank you.
Right, so... first, a sliver of code:
def from_cache(self, cachekey):
    """ pull oft needed material from our persistent redis memory cache, ensuring of course that we have a connection """
    try:
        log.debug('trying to get \'%s\' from cache' % cachekey)
        return self.redis.hgetall(cachekey)
    except Exception, e:
        self.connect_to_cache()
        return self.redis.get(cachekey)
resulting in:
2013-05-21 14:45:26,035 23202 DEBUG trying to get 'fax:1112223333' from cache
2013-05-21 14:45:26,036 23202 DEBUG initializing connection to redis/cache memory localhost, port 6379, db 2...
2013-05-21 14:45:26,039 23202 ERROR stopping with an exception
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/simpledaemon/base.py", line 165, in start
self.run()
File "newgov.py", line 51, in run
if self.ready_for_queue(fax):
File "newgov.py", line 61, in ready_for_queue
if self.too_many_already_queued(fax):
File "newgov.py", line 116, in too_many_already_queued
rules = self.from_cache(key)
File "newgov.py", line 142, in from_cache
return self.redis.get(cachekey)
File "/usr/lib/python2.6/site-packages/redis/client.py", line 588, in get
return self.execute_command('GET', name)
File "/usr/lib/python2.6/site-packages/redis/client.py", line 378, in execute_command
return self.parse_response(connection, command_name, **options)
File "/usr/lib/python2.6/site-packages/redis/client.py", line 388, in parse_response
response = connection.read_response()
File "/usr/lib/python2.6/site-packages/redis/connection.py", line 309, in read_response
raise response
ResponseError: Operation against a key holding the wrong kind of value
And here is what is in redis:
$ redis-cli
redis 127.0.0.1:6379> SELECT 2
OK
redis 127.0.0.1:6379[2]> type fax:1112223333
hash
redis 127.0.0.1:6379[2]> hgetall fax:1112223333
1) "delay"
2) "0"
3) "concurrent"
4) "20"
5) "queued"
6) "20"
7) "exclude"
8) ""
Look at your Python stack trace: it fails on "return self.execute_command('GET', name)". It means that:
- the hgetall command failed (probably because the connection was not established before)
- an exception was raised and caught in your method
- the Redis connection is established (I suppose by calling connect_to_cache())
- then you try to run "self.redis.get(cachekey)"
- it fails of course because the content of cachekey is the key of a hash (not a string), and here I imagine you should use hgetall instead
- another exception is raised; the Redis error is a type error (Operation against a key holding the wrong kind of value)
With redis-cli, try to run "GET fax:1112223333" and you will see the same error.
SET and HSET store different types of data. With HSET the value is a hash map, and when you try to GET a hash you get this same error. Just use HGETALL and Redis will return the hash map, or use HGET with the cache key and one of the fields you set, and enjoy.
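A minimal sketch of the fix both answers point to: if you have to retry after reconnecting, retry with hgetall(), not get(), because the key holds a hash. (Written with current redis-py / Python 3 syntax; adapt it to the Python 2.6 code above.)

import redis

def from_cache(r, cachekey):
    try:
        return r.hgetall(cachekey)
    except redis.ConnectionError:
        r = redis.StrictRedis(host='localhost', port=6379, db=2)  # reconnect
        return r.hgetall(cachekey)  # same hash command as before, not GET

r = redis.StrictRedis(host='localhost', port=6379, db=2)
print(from_cache(r, 'fax:1112223333'))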

web2py error : Connection reset by peer

I've been googling for days trying to find a straight answer for why this is happening, but can't find anything useful. I have a web2py application that simply reads a database and makes some requests to a REST api. It is a healthcheck monitor so it refreshes itself every minute. There are about 20 or so users at any given time. Here is the error I'm seeing very consistently in the log file:
ERROR:Rocket.Errors.Port8080:Traceback (most recent call last):
File "/opt/apps/web2py/gluon/rocket.py", line 562, in listen
sock = self.wrap_socket(sock)
File "/opt/apps/web2py/gluon/rocket.py", line 506, in wrap_socket
ssl_version = ssl.PROTOCOL_SSLv23)
File "/usr/local/lib/python2.7/ssl.py", line 342, in wrap_socket
ciphers=ciphers)
File "/usr/local/lib/python2.7/ssl.py", line 121, in __init__
self.do_handshake()
File "/usr/local/lib/python2.7/ssl.py", line 281, in do_handshake
self._sslobj.do_handshake()
error: [Errno 104] Connection reset by peer
Based on some googling, the most promising piece of information is that someone is trying to connect through a firewall which kills the connection. However, I don't understand why that takes the actual application down: the process is still running, but no one can connect and I have to restart web2py.
I will be very appreciative of any input here. I'm beyond frustrated.
Thanks!
The most common source of "Connection reset by peer" errors is that the remote client decides it doesn't want to talk to you anymore and cancels the interaction (with a shutdown/RST packet). This happens, for example, if the user navigates to a different page while the site is loading.
In your case, the remote host gave up on the connection even before you got to read or write anything on it. With the current web2py, this should only output the warning you're seeing, and not terminate anything.
If you have the current web2py, the error of not being able to connect is unrelated to these error messages. If you have an old version of web2py, you should update.
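To illustrate the last point, here is a rough Python 3 sketch (not web2py's actual code) of how a server loop can treat a reset during the TLS handshake as a per-connection warning instead of a fatal error:

import socket
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain('server.crt', 'server.key')  # placeholder file names

listener = socket.socket()
listener.bind(('0.0.0.0', 8080))
listener.listen(5)

while True:
    conn, addr = listener.accept()
    try:
        tls_conn = ctx.wrap_socket(conn, server_side=True)  # handshake happens here
    except (ssl.SSLError, OSError) as exc:  # "Connection reset by peer" lands here
        print('handshake failed from %s: %s' % (addr, exc))
        conn.close()
        continue
    # ... hand tls_conn off to the request handler ...
    tls_conn.close()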
