Raise Airflow Exception to Fail Task from CURL request

I am using Airflow to schedule and automate Python scripts housed on an Ubuntu server. The DAG triggers a curl request that hits a Flask API on the same machine, and the API actually runs the script. Here is a high-level overview of the flow:
Airflow --> Curl request --> Flask API --> Python Script
DAG Task:
t2 = BashOperator(
    task_id='extract_pcty_data',
    bash_command=f"""curl -d '{dataset}' -H 'Content-Type: application/json' -X POST {base_url}{endpoint}""",
)
Endpoint Registration:
api.add_resource(paylocity, "/api/v1/application/paylocity")
Resource Object:
class paylocity(Resource):
    def __init__(self):
        self.reqparse = reqparse.RequestParser()
    def get(self):
        return 200
    def post(self):
        try:
            if request.json:
                data = request.json
                query = data['dataset']
                pcty = PaylocityAPI()
                pcty.auth()
                pcty.get_employees()
                pcty.get_paystatements()
                pcty.load_dataset()
                pcty.clean_up()
                return 200
        except Exception as e:
            # print_exc() takes no exception argument and prints the active traceback itself
            traceback.print_exc()
            raise ValueError(e)
The issue I am running into is that when the script fails, the exception is caught by the try/except block, which then raises a ValueError. This does not cause the Airflow task to fail, because curl still exits successfully even though the HTTP response is 500 - Internal Server Error. What I am looking for is a simple and elegant way to interpret any HTTP response other than 200 - OK as a failure and raise something like a ValueError or AirflowException so that the task fails. Any guidance or support would be greatly appreciated!

For those of you arriving from Google looking for a simple and elegant answer to this or a similar question: curl has a few flags that control how it behaves when a request fails. For my specific scenario, --fail was the most appropriate. There is also --fail-with-body, which lets you see the content of the failed response in addition to the non-zero exit code. From the curl docs:
-f, --fail
(HTTP) Fail fast with no output at all on server errors. This is useful to enable scripts and users to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22.
This method is not fail-safe and there are occasions where non-successful response codes will slip through, especially when authentication is involved (response codes 401 and 407).
Example:
curl --fail https://example.com
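Applied to the DAG from the question, a minimal sketch might look like the following. The helper name build_curl_command, the localhost URL and port, and the sample payload are assumptions for illustration only; --silent and --show-error are optional extras that the answer above does not mention. The idea is that BashOperator marks a task as failed when its command exits with a non-zero status, which is what --fail produces for an HTTP error such as the 500.

```python
import json
import shlex


def build_curl_command(url, payload):
    """Build a curl command that exits non-zero when the server returns an HTTP error."""
    # shlex.quote keeps the JSON body intact when the shell parses the command.
    body = shlex.quote(json.dumps(payload))
    return (
        "curl --fail --silent --show-error "
        "-H 'Content-Type: application/json' "
        f"-X POST -d {body} {shlex.quote(url)}"
    )


# Hypothetical values; the real base_url, endpoint and dataset come from the DAG file.
command = build_curl_command(
    "http://localhost:5000/api/v1/application/paylocity",
    {"dataset": "employees"},
)
print(command)
```

In the DAG, the resulting string would be passed as bash_command to the existing BashOperator.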

Related

requests.exceptions.HTTPError: 401 Client Error atlassian-python-api

I am trying to connect to a Confluence page using the python wrapper on the API (as I am not familiar with any of this) but I keep getting the following error:
requests.exceptions.HTTPError: 401 Client Error
I know that people talk about this being caused by the necessity of using an API token but the page runs on an old version of Confluence and I have been told that we cannot use access tokens.
Does anyone have any other ideas? Here's a small code sample:
from atlassian import Confluence
confluence = Confluence(
    url='https://address',
    username='name',
    password='pwd'
)
confluence.create_page(
    space='Test',
    title='A title',
    body='something')
I have tried to use an older version of atlassian-python-api just in case there was some conflict but it got me the same error.

Your code looks ok. Authenticating to Confluence using Basic Auth should work without generating an API token, afaik.
The 401 status definitely suggests a problem with the authentication though. The obvious reason for this would be of course wrong credentials, but I assume that you have double checked that the credentials work when interactively logging into confluence with a browser.
To get a better sense of the error, you can import logging to debug your requests and response:
from atlassian import Confluence
import logging
logging.basicConfig(filename='conf_connect.log', filemode='w', level=logging.DEBUG)
try:
    c = Confluence(url='https://conf.yoursystem.com', username='name', password='pwd')
    # atlassian API does not raise error on init if credentials are wrong, this only happens on the first request
    c.get_user_details_by_username('name')
except Exception as e:
    logging.error(e)
The Confluence module internally also uses logging, so the requests and responses will appear in your conf_connect.log logfile:
DEBUG:atlassian.rest_client:curl --silent -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' 'https://conf.yoursystem.com/rest/api/user?username=name'
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): conf.yoursystem.com:443
DEBUG:urllib3.connectionpool:https://conf.yoursystem.com:443 "GET /rest/api/user?username=name HTTP/1.1" 401 751
DEBUG:atlassian.rest_client:HTTP: GET rest/api/user -> 401
DEBUG:atlassian.rest_client:HTTP: Response text -> <!doctype html><html lang="en"><head><title>HTTP Status 401 – Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Basic Authentication Failure - Reason : AUTHENTICATED_FAILED</p><p><b>Description</b> The request has not been applied because it lacks valid authentication credentials for the target resource.</p><hr class="line" /><h3>Apache Tomcat/9.0.33</h3></body></html>
ERROR:root:401 Client Error: for url: https://conf.yoursystem.com/rest/api/user?username=name
The response body may include some information on the reason:
HTTP Status 401 – Unauthorized
Type: Status Report
Message: Basic Authentication Failure - Reason : AUTHENTICATED_FAILED
Description: The request has not been applied because it lacks valid authentication credentials for the target resource.
The reason AUTHENTICATED_FAILED suggests something is likely wrong with your credentials. If you want to dig deeper into that, you can use this SO answer to also display the headers that are being sent with your request.
However, if your reason is AUTHENTICATION_DENIED the problem is likely the following: If you have too many failed authentication attempts in a row, a CAPTCHA challenge is triggered, and this error will occur until the Failed Login Count is reset. This can easily happen when you are developing a script and test it frequently. To remedy this, either open a browser and manually (re-)logon to Confluence, completing the CAPTCHA, or resolve it from the Confluence User Management.
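To tell the two cases apart in a script, the reason can be pulled out of the response text with a regular expression. This is only a sketch: the helper name extract_auth_failure_reason is made up here, and the pattern assumes the body has the same "Reason : ..." wording as the log output above.

```python
import re


def extract_auth_failure_reason(response_text):
    """Return the token that follows 'Reason :' in a 401 body, or None if absent."""
    match = re.search(r"Reason\s*:\s*([A-Z_]+)", response_text)
    return match.group(1) if match else None


# Fragment copied from the logged response body above.
sample = "<p><b>Message</b> Basic Authentication Failure - Reason : AUTHENTICATED_FAILED</p>"
reason = extract_auth_failure_reason(sample)
print(reason)
```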

How to get status of a job running on another machine?

I have 2 machines A and B and A can send restful request to B as follows:
curl -XPOST -H "Content-type: application/json" -d '{"data":"python /tmp/demo.py","action":"demo"}' 'http://192.168.95.8:51888/api/host'
I have deployed an api service on B and when such request is received, B will execute the python script /tmp/demo.py and the execution may last 0.5-3 hours.
My question is:
1) How do I write a job on A that keeps tracking the status of the task running on B and ends itself when the task finishes, whether it succeeded or failed?
2) In the tracking job, how do I add a module that kills the job once it exceeds a pre-set time threshold?

Treat the job as an HTTP resource. When you do POST /api/host, that request creates a new id for that job and returns it. For good use of HTTP, the response would contain a Location header with the URL of the resource where the job's status can be checked, e.g.:
POST /api/host
Content-type: application/json
{"data":"python /tmp/demo.py","action":"demo"}
HTTP/1.1 201 Created
Location: /api/host/jobs/c2de232b-f63e-4178-a053-d3f3459ab538
You can now GET /api/host/jobs/c2de232b-f63e-4178-a053-d3f3459ab538 at any time and see what status the job has, e.g.:
{"status": "pending"}
You may POST commands to that resource, e.g. for cancelling it.
How exactly your HTTP API would get the status of that Python script is obviously up to you. Perhaps it can communicate with it over a socket, or the job itself will periodically write its status to some database or file.
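On machine A, the tracking job could then be a simple polling loop with a deadline. The sketch below makes several assumptions: the function names are invented, the job resource returns JSON with a "status" field as in the example above, and "succeeded" and "failed" are the terminal status values (only "pending" appears in the answer).

```python
import json
import time
import urllib.request


def fetch_status(job_url):
    """GET the job resource and return its 'status' field."""
    with urllib.request.urlopen(job_url, timeout=10) as response:
        return json.load(response)["status"]


def wait_for_job(get_status, timeout_s, poll_interval_s=30.0):
    """Poll get_status() until a terminal status is seen or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("succeeded", "failed"):  # assumed terminal values
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        time.sleep(poll_interval_s)
```

A call such as wait_for_job(lambda: fetch_status(job_url), timeout_s=3 * 3600) would cover the 0.5 to 3 hour run time from the question, where job_url is built from the Location header returned by the POST. Passing the status getter in as a callable keeps the loop easy to test without a network.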

Cannot access the request json_body when using Chalice

I'm attempting to make a curl request to my python api that is using the AWS package Chalice.
When I try to access the app.current_request.json_body a JSON Parse error is thrown. Cannot figure out why this is happening. My JSON is formatted properly as far as I can tell.
Here is the curl request:
(echo -n '{"data": "test"}') |
curl -H "Content-Type: application/json" -d @- $URL
Here is the python Chalice code:
from chalice import Chalice
app = Chalice(app_name='predictor')
@app.route('/', methods=['POST'], content_types=['application/json'])
def index():
    try:
        body = app.current_request.json_body
    except Exception as e:
        return {'error': str(e)}
When I invoke the route using the above curl request I get the following error:
{"error": "BadRequestError: Error Parsing JSON"}
Note: When I remove the .json_body from the app.current_request. I no longer get the error.
Any thoughts?

The documentation indeed indicates that the problem is Content-Type:
The default behavior of a view function supports a request body of application/json. When a request is made with a Content-Type of application/json, the app.current_request.json_body attribute is automatically set for you. This value is the parsed JSON body.
You can also configure a view function to support other content types. You can do this by specifying the content_types parameter value to your app.route function. This parameter is a list of acceptable content types.
It suggests that changing the Content-Type might make json_body work, but I didn't manage to have any success with it.
However, using app.current_request.raw_body.decode() instead of app.current_request.json_body solves the problem.
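Since raw_body is the undecoded request body, the JSON still has to be parsed by hand. A minimal sketch of that step, with a hypothetical helper name and a literal byte string standing in for app.current_request.raw_body:

```python
import json


def parse_json_body(raw_body):
    """Decode a bytes request body and parse it as JSON."""
    return json.loads(raw_body.decode("utf-8"))


# Stand-in for app.current_request.raw_body inside a Chalice view.
body = parse_json_body(b'{"data": "test"}')
print(body["data"])
```

This works around the error rather than explaining it; json.loads will still raise if the body that actually arrives is not valid JSON, which can help show what curl really sent.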

Set the HTTP status text in a Flask response

How can I set the HTTP status text for a response in Flask?
I know I can return a string with a status code
@app.route('/knockknock')
def knockknock():
    return "sleeping, try later", 500
But that sets the body, I'd like to change the HTTP status text from "INTERNAL SERVER ERROR" to "sleeping, try later".
It seems this is not possible in Flask/Werkzeug. It's not mentioned anywhere. But maybe I'm missing something?

The following will work in Flask. When returning a response, Flask accepts either a numeric status code or a status string. The string can contain the code as well as the reason text (from the client side the result is identical). Note, however, that Flask then does not generate its default error description.
from flask import Flask
app = Flask(__name__)
@app.route('/knockknock')
def knockknock():
    return "", "500 sleeping, try later"
Here is the output from the curl command,
curl -i http://127.0.0.1:5000/knockknock
HTTP/1.0 500 sleeping, try later
Content-Type: text/html; charset=utf-8

I'm not an expert on this, but I'm afraid this will only be possible through monkeypatching.
Flask returns werkzeug Response objects, and it seems that the status code reasons are hardcoded in http.py.
Again, some monkeypatching might be enough to change this for your application however.

Maybe this document will be helpful; you can customize your own error page.
http://flask.pocoo.org/docs/0.12/patterns/errorpages/
Reviewing the source code, you can find the following in app.py:
if status is not None:
    if isinstance(status, string_types):
        rv.status = status
    else:
        rv.status_code = status
so using
@app.route('/test')
def tt():
    return "status error", "500 msg"
works.
Sorry for the misunderstanding.

python-requests making a GET instead of POST request

I have a daily cron job that handles some of the recurring events in my app, and from time to time I notice a weird error popping up in the logs. Among other things, the cron job validates some codes using the web app running on the same server, so the validation is made via a POST request with some data.
url = 'https://example.com/validate/'
payload = {'pin': pin, 'sku': sku, 'phone': phone, 'AR': True}
validation_post = requests.post(url, data=payload)
So, this makes the actual request and I log the response. From time to time, and recently for up to 50% of the requests, the response contains the following message from nginx:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method GET is not allowed for the requested URL.</p>
So, the actual request was made using the GET method, not the POST as it was instructed in the code. In the nginx access.log I can see that entry:
123.123.123.123 - - [18/Feb/2015:12:26:50 -0500] "GET /validate/ HTTP/1.1" 405 182 "-" "python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-37-generic"
And the uwsgi log for the app shows the similar thing:
[pid: 6888|app: 0|req: 1589/58763] 123.123.123.123 () {40 vars in 613 bytes} [Mon Apr 6 11:42:41 2015] GET /validate/ => generated 182 bytes in 1 msecs (HTTP/1.1 405) 4 headers in 234 bytes (1 switches on core 0)
So, everything points out that the actual request was not made using the POST. The app route that handles this code is simple, and this is an excerpt:
@app.route('/validate/', methods=['POST'])
@login_required
def validate():
    if isinstance(current_user.user, Sales):
        try:
            pass  # do the stuff here
        except Exception as e:
            app.logger.exception(str(e))
            return 0
    abort(403)
The app route can fail, and there are some returns inside the try block, but even if those fail or there is an exception, nothing in this block could raise the 405 error code, only 403, which rarely happens since I construct and log in the user manually from the cron job.
I have found a similar issue here, but the solution there was that a redirect from the HTTP to the HTTPS version was involved. I also have that redirect on my server, but the URL the request is made to already uses HTTPS, so I doubt this is the cause.
The stack I am running this on is uwsgi+nginx+flask. Can anyone see what might be causing this? To repeat, it does not always happen: sometimes it works as expected, sometimes not. I recently migrated from apache and mod_wsgi to this new stack, and since then I have started encountering this error; I can't recall ever seeing it in the apache environment.
Thanks!

The only time we ever change a POST request to a GET is when we're handling a redirect. Depending on the redirect code, we will change the request method. If you want to be sure that we don't follow redirects, you need to pass allow_redirects=False. That said, you need to figure out why your application is generating redirects (including if it's redirecting to HTTP or to a different domain, or using a specific status code).
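To find out whether a redirect is involved, one option is to disable redirect following and fail loudly when the server answers with one. The sketch below is an assumption-laden illustration: the helper name and the set of status codes checked are choices made here, and session is expected to be a requests.Session (or anything with a compatible post method).

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def post_without_redirects(session, url, payload):
    """POST once; raise instead of following a redirect so the cause is visible."""
    response = session.post(url, data=payload, allow_redirects=False)
    if response.status_code in REDIRECT_CODES:
        target = response.headers.get("Location")
        raise RuntimeError(
            f"POST {url} was redirected ({response.status_code}) to {target}"
        )
    return response
```

With the code from the question, this would be called as post_without_redirects(requests.Session(), url, payload). The Location value in the error message shows where the application or nginx is redirecting to, for example a URL that differs only in a trailing slash or in the scheme.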

Not sure if it's by design, but removing the forward slash at the end of the URL fixed it for me:
url = 'https://example.com/validate/' # remove the slash
payload = {'pin': pin, 'sku': sku, 'phone': phone, 'AR': True}
validation_post = requests.post(url, data=payload)
