I am working with multiple applications that communicate asynchronously using Kafka. These applications are managed by several departments and contract testing is appropriate to ensure that the messages used during communication follow the expected schema and will evolve according to the contract specification.
The pact library for Python sounded like a good fit because it helps create contract tests for both HTTP and message integrations.
What I wanted to do is send an HTTP request and then listen on the appropriate, dedicated Kafka topic immediately afterwards. But it seems the test forces me to specify an HTTP status code, even though what I am expecting is a message from a queue with no HTTP status code at all. Furthermore, it seems the HTTP request is sent before the consumer starts listening. Here is some sample code.
import atexit
import unittest

from pact.consumer import Consumer as p_Consumer
from pact.provider import Provider as p_Provider
from confluent_kafka import Consumer as k_Consumer

pact = p_Consumer('Consumer').has_pact_with(p_Provider('Provider'))
pact.start_service()
atexit.register(pact.stop_service)

config = {'bootstrap.servers': 'server', 'group.id': '0', 'auto.offset.reset': 'latest'}
consumer = k_Consumer(config)
consumer.subscribe(['usertopic'])

def user():
    while True:
        msg = consumer.poll(timeout=1)
        if msg is None:
            continue
        return msg.value().decode()

class ContractTesting(unittest.TestCase):
    def test_get_user(self):
        expected = {
            'username': 'UserA',
            'id': 123,
            'groups': ['Editors']
        }
        (pact
         .given('UserA exists and is not an administrator')
         .upon_receiving('a request for UserA')
         .with_request(method='GET', path='/user/')
         .will_respond_with(200, body=expected))
        with pact:
            result = user()
            self.assertEqual(result, expected)
How would I carry out contract testing in Python using Kafka? It feels like I am jumping through a lot of hoops just to run this test.
With Pact message testing, you write tests against a different API. You don't use the standard HTTP one; in fact, the transport itself is ignored altogether, and it's just the payload, the message, that we're interested in capturing and verifying. This allows us to test any queue without having to build specific interfaces for each one.
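As a rough sketch of what that looks like with pact-python's message API (the names here are illustrative, and handle_user_message is a hypothetical handler in your consumer code, not part of the library):

from pact import MessageConsumer, Provider

# Pact only captures and verifies the message payload; the Kafka
# transport itself never appears in the test.
pact = MessageConsumer('UserConsumer').has_pact_with(
    Provider('UserProvider'), pact_dir='pacts')

expected = {'username': 'UserA', 'id': 123, 'groups': ['Editors']}

(pact
 .given('UserA exists and is not an administrator')
 .expects_to_receive('a user event for UserA')
 .with_content(expected)
 .with_metadata({'topic': 'usertopic'}))

with pact:
    # If the handler processes the payload without raising, the pact
    # file is written when the block exits.
    handle_user_message(expected)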
See this example: https://github.com/pact-foundation/pact-python/blob/02643d4fb89ff7baad63e6436f6a929256c6bf12/examples/message/tests/consumer/test_message_consumer.py#L65
You can read more about message pact testing here: https://docs.pact.io/getting_started/how_pact_works#non-http-testing-message-pact
And finally here are some Kafka examples for other languages that may be helpful: https://docs.pactflow.io/docs/examples/kafka/js/consumer
Related
I'm looking for some advice, or a relevant tutorial regarding the following:
My task is to set up a flask route that POSTs to API endpoint X, receives a new endpoint Y in X's response, then GETs from endpoint Y repeatedly until it receives a certain status message in the body of Y's response, and then returns Y's response.
The code below (irrelevant data redacted) accomplishes that goal in, I think, a very stupid way. It returns the appropriate data occasionally, but not reliably. (It times out 60% of the time.) When I console log very thoroughly, it seems as though I have bogged down my server with multiple while loops running constantly, interfering with each other.
I'll also receive this error occasionally:
SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /book
import sys, requests, time, json
from flask import Flask, request

# create the Flask app
app = Flask(__name__)

# main booking route
@app.route('/book', methods=['POST'])  # GET requests will be blocked
def book():
    # defining the api-endpoints
    PRICING_ENDPOINT = ...

    # data to be sent to api
    data = {...}

    # sending post request and saving response as response object
    try:
        r_pricing = requests.post(url=PRICING_ENDPOINT, data=data)
    except requests.exceptions.RequestException as e:
        return e

    # extracting response text
    POLL_ENDPOINT = r_pricing.headers['location']

    # setting data for poll
    data_for_poll = {...}
    r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)

    # poll loop, looking for 'UpdatesComplete'
    j = 1
    poll_json = r_poll.json()
    update_status = poll_json['Status']
    while update_status == 'UpdatesPending':
        time.sleep(2)
        j = float(j) + float(1)
        r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)
        poll_json = r_poll.json()
        update_status = poll_json['Status']
    return r_poll.text
This is more of an architectural issue than a Flask issue. Long-running tasks in Flask views are always a poor design choice. In this case, the route's response depends on two endpoints of another server. In effect, besides carrying the responsibility of your own app, you are also carrying the responsibility of another server.
Since the application's design seems to be a proxy for another service, I would recommend building the proxy the right way. Just as book() proxies the PRICING_ENDPOINT POST request, create another route that proxies the POLL_ENDPOINT GET request, and move the polling loop into the client code (JS); see the sketch below.
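A minimal sketch of that split (the route shapes and the pass-through of the poll URL are assumptions, not a drop-in implementation):

from flask import Flask, jsonify, request
import requests

app = Flask(__name__)

PRICING_ENDPOINT = 'https://api.example.com/pricing'  # placeholder URL

@app.route('/book', methods=['POST'])
def book():
    # Proxy only the initial POST; return the poll URL to the client
    # instead of blocking this view while the other server works.
    r_pricing = requests.post(PRICING_ENDPOINT, data=request.form)
    return jsonify({'poll_url': r_pricing.headers['location']})

@app.route('/poll', methods=['GET'])
def poll():
    # Proxy a single poll; the client JS calls this repeatedly (e.g. with
    # setInterval) until Status is no longer 'UpdatesPending'.
    r_poll = requests.get(request.args['url'])
    return jsonify(r_poll.json())

In production you would validate or sign poll_url rather than proxying arbitrary URLs.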
Update:
If for some reason you cannot trust the client (browser -> JS) with the POLL_ENDPOINT information, as in a hidden-proxy situation, then move the polling to a task runner like Celery or Python RQ. Although it introduces additional components to your application, it is the right way to go.
You probably get that error because the HTTP connection times out while your API server sits in the loop. HTTP connections have standard timeout limits, and the loop takes more time than the connection allows. The first (straightforward) solution is to "play" with the Apache config and increase the HTTP connection timeout for your WSGI app. You could also open a socket connection, check the update status through it, and close it once the goal is achieved. Or you can move the logic to the client side.
We have a single-threaded Flask server running that receives requests from a Python app client. In this Flask server we use RabbitMQ with the pika library to distribute messages to other clients.
What is happening is that the program crashes in the get function with the error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected
content header for class 60, got non content header frame instead')
I've searched a lot of topics about this on Stack Overflow and elsewhere, but they all address problems with multithreading, which is not the case here: Flask should serve with only one thread unless it is run with app.run(threaded=True).
The program normally crashes when multiple messages are sent in a short interval (e.g. 5 per second). It's also important to note that messages are fetched every second via a request to this function:
# (json, jwt, flask's request/jsonify and the shared pika `channel`
#  are set up elsewhere in the app)
@app.route('/api/users/getMessages', methods=['POST'])
def get_Messages():
    data = json.loads(request.data)
    token = data['token']
    payload = jwt.decode(token, 'SECRET', algorithms=['HS256'])
    istid = payload['istid']
    print('istid: ' + istid)

    messages = []
    queue = channel.queue_declare(queue=istid)
    for i in range(queue.method.message_count):
        method_frame, header_frame, body = channel.basic_get(queue=istid, no_ack=True)
        if method_frame:
            #print(method_frame, header_frame, body)
            messages.append(body)
        else:
            print('No message returned')

    res = {'messages': messages, 'error': 0}
    return jsonify(res)
In this code it normally crashes at the line:
queue = channel.queue_declare(queue=istid)
We also tried changing the for loop to a while loop that ends when the body is None; in that case it crashes at the line:
method_frame, header_frame, body = channel.basic_get(queue=istid, no_ack=True)
Also important: the crashes are random; it can work a few times and then crash after a get request while messages are being sent. If anyone knows anything related to this, we would appreciate any help.
Another note: we thought about using basic_consume with a callback instead of basic_get, but we didn't find a way to make this work, since we have to send the messages back and have several users making requests to this same function.
EDIT #1:
In the pika documentation, if you search for the function def basic_get, you will notice there are some TODO comments and also a reference to this:
Due to implementation details, this cannot be called a second time
until the callback is executed.
So I suspected that this could be what was happening, but even if it is, I don't know how it could be solved.
For anyone interested in the solution: as noted in the other comments, the program was not thread-safe, since Flask as of version 1.0 uses threaded=True by default.
The solution is either:
1) running Flask with app.run(threaded=False), or
2) making the program thread-safe by taking a lock whenever the pika channel/connection is accessed; a sketch of this option follows.
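A rough sketch of option 2, assuming the shared pika channel from the question (get_istid is a hypothetical helper wrapping the jwt.decode logic already shown):

import threading

from flask import request, jsonify

# One lock guards every use of the shared pika channel so that
# concurrent request threads cannot interleave AMQP frames on the
# single connection, which is what produces UNEXPECTED_FRAME.
channel_lock = threading.Lock()

@app.route('/api/users/getMessages', methods=['POST'])
def get_messages():
    istid = get_istid(request.data)  # hypothetical helper
    messages = []
    with channel_lock:
        queue = channel.queue_declare(queue=istid)
        for _ in range(queue.method.message_count):
            method_frame, header_frame, body = channel.basic_get(
                queue=istid, no_ack=True)
            if method_frame:
                messages.append(body)
    return jsonify({'messages': messages, 'error': 0})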
I am trying to build an application using Google Cloud Platform AutoML using Python. My overall code flow looks like this:
User interacts --> data sent to PubSub --> callback invokes my AutoML --> result
The snippet that calls pubsub looks like this:
blob = blob + bytes(doc_type, 'utf-8')
publisher.publish(topic, blob)
future = subscriber.subscribe(subscription, callback=callback)
#flash("The object is "+future,'info')
try:
    future.result()
except Exception as ex:
    subscriber.close()
In the PubSub callback:
def callback(message):
    new_message = message.data
    display_name, score = predict_value(new_message, "modelID", "projectid", 'us-central1')
    message.ack()
My predict_value function takes the model ID, project ID, and compute region and performs the prediction.
When I call predict_value directly, without PubSub, it works fine. When I invoke it through the callback as above, I get the error below:
google.api_core.exceptions.PermissionDenied: 403 Permission 'automl.models.predict' denied on resource 'projects/projectID/locations/us-central1/models/' (or it may not exist).
Please help me resolve the issue.
Thank you so much for all your responses. I have just fixed the issue using the example snippet below:
from google.cloud import pubsub_v1

def receive_messages_synchronously(project, subscription_name):
    """Pulling messages synchronously."""
    # [START pubsub_subscriber_sync_pull]
    # project = "Your Google Cloud Project ID"
    # subscription_name = "Your Pubsub subscription name"
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_name)

    # Builds a pull request with a specific number of messages to return.
    # `return_immediately` is set to False so that the system waits (for a
    # bounded amount of time) until at least one message is available.
    response = subscriber.pull(
        subscription_path,
        max_messages=3,
        return_immediately=False)

    ack_ids = []
    for received_message in response.received_messages:
        print("Received: {}".format(received_message.message.data))
        ack_ids.append(received_message.ack_id)

    # Acknowledges the received messages so they will not be sent again.
    subscriber.acknowledge(subscription_path, ack_ids)
    # [END pubsub_subscriber_sync_pull]
The reason is that the subscription I created uses pull delivery. I guess the callback approach is mainly for push subscriptions, which may be why it failed, since I didn't provide an endpoint and token for pushing the messages. I hope my guess is correct; let me know your views as well.
This is likely due to one of two factors:
invalid credentials being used when sending the request to the AutoML API - it is very likely that the PubSub callback executes in another context and can't pick up the default credentials
invalid model resource name (make sure it is correct) - it should be something like: "projects/12423534/locations/us-central1/models/23432423"
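For the second factor, a quick sanity check (a sketch; the IDs are placeholders) is to build the resource name with the client library's path helper and compare it with what your code actually sends. Note that the error in the question ends in models/ with no model ID at all:

from google.cloud import automl_v1beta1 as automl

# Placeholders; substitute your real project and model IDs.
model_full_id = automl.AutoMlClient.model_path(
    'my-project', 'us-central1', 'my-model-id')
print(model_full_id)
# projects/my-project/locations/us-central1/models/my-model-id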
I am researching what it would take to make a web app that interacts with e-mail directly. For example, you would send to something@myapp.com and the app would tear the message apart and determine who it's from, whether the sender is in the DB, what the subject line is, etc.
I am working with, and am most familiar with, Python and Flask.
Could anyone point me in the right direction on how to get e-mail to interface with my Flask app code?
There are several approaches you can take:
write some code which uses IMAP or POP to retrieve emails and process them. Either run it from a crontab (or something similar), or add it to your Flask app and trigger it there, either through a crontab that requests a magic URL or by setting up a custom timer thread.
configure your MTA to deliver email for something@myapp.com by feeding it to a program you write (for example, in Exim you could use a pipe transport). In that program you can either process the message directly, or do something like POSTing it to your Flask app; see the sketch below.
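Here is a minimal sketch of that second approach (the /incoming-email route and URL are assumptions; the MTA pipes the raw message to this script's stdin):

#!/usr/bin/env python3
import sys

import requests

# Read the raw RFC 822 message the MTA pipes to us and forward it
# to the Flask app for parsing and processing.
raw_email = sys.stdin.read()
requests.post('http://localhost:5000/incoming-email',
              data=raw_email,
              headers={'Content-Type': 'message/rfc822'})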
I've done something along these lines recently, with a simple bookmarking web-app. I have the usual bookmarklet way of bookmarking something to it, but I also wanted to be able to e-mail links to it from apps like Reeder on my iPhone and whatever. You can see what I ended up with on GitHub: subMarks.
I use Google Apps for your Domain for my email, so I created a special address for my app to look at - I really didn't want to try building/configuring my own e-mail server.
The mail_daemon.py file from above is run as a cron job every 5 minutes. It connects to the email server using the poplib package, processes the emails that are there, and then disconnects (one part I feel compelled to point out: I check that the emails are from me before they are processed :) ).
My Flask app then provides the front end to the bookmarks, displaying them from the database.
I decided not to put the email handling code into the actual flask app, because it can be rather slow and would only run when the page was visited, but you could do this if you wanted.
Here's some barebones code to get things going:
import poplib
from email import parser
from email.header import decode_header
import os
import sys

pop_conn = poplib.POP3_SSL('pop.example.com')
pop_conn.user('my-app@example.com')
pop_conn.pass_('password')

# Get messages from server:
messages = [pop_conn.retr(i) for i in range(1, len(pop_conn.list()[1]) + 1)]
# Concat message pieces (poplib returns bytes, so decode for Python 3):
messages = ["\n".join(line.decode() for line in mssg[1]) for mssg in messages]
# Parse each message into an email object:
messages = [parser.Parser().parsestr(mssg) for mssg in messages]

for message in messages:
    # check the message is from a safe sender
    if 'me@example.com' in message['from']:
        # Get the message body text
        if message['Content-Type'][:4] == 'text':
            text = message.get_payload()  # plain text messages only have one payload
        else:
            text = message.get_payload()[0].get_payload()  # HTML messages have more payloads

        # decode the subject (odd symbols cause it to be encoded sometimes)
        subject = decode_header(message['subject'])[0]
        if subject[1]:
            bookmark_title = subject[0].decode(subject[1]).encode('ascii', 'ignore')  # icky
        else:
            bookmark_title = subject[0]

        # in my system, you can use google's address+tag@gmail.com feature to specify
        # where something goes, a useful feature.
        project = message['to'].split('@')[0].split('+')

        ### Do something here with the message ###

pop_conn.quit()
Say a user requests /foo on my server; I send the following HTTP response (not closing the connection):
Content-Type: multipart/x-mixed-replace; boundary=-----------------------
-----------------------
Content-Type: text/html
foo
When the user goes to /bar (which will return 204 No Content, so the view doesn't change), I want the following data to be sent on the still-open initial response.
-----------------------
Content-Type: text/html
bar
How would I get the second request to trigger this from the initial response? I'm planning on possibly creating a fancy [engines that support multipart/x-mixed-replace (currently only Gecko)]-only email webapp that does server-push and Ajax effects without any JavaScript, just for fun.
No complete answer, but:
In your question, you're describing a Comet-style architecture. Regarding support of Comet-style techniques in Python/WSGI, there is a StackOverflow question, which talks about various Python servers with support for long-running requests a la Comet.
Also interesting is this mail thread in the Python Web-SIG: "Could WSGI handle Asynchronous response?". In May 2008, there was a broad discussion in the Web-SIG about the topic of asynchronous requests in WSGI.
A recent development is evserver, a lightweight WSGI server, which implements the Asynchronous WSGI extension proposed by Christopher Stawarz in the Web-SIG in May 2008.
Finally, the Tornado web server supports non-blocking asynchronous requests. It has a chat example application using long polling, which has similarities with your requirements.
If the problem is to pass some command from the /bar application to the /foo application, and you are using some servlet-like approach (the Python code is loaded once, not once per request as in CGI), you can just change some class property of the /foo application and have the /foo instance react to the change (by checking the property's state).
Obviously the /foo application should not return right after the first request; it should yield content line by line and keep the connection open.
Though this is just theory; I have not tried it myself.
I have created a small example (just for fun, you know :))
import threading

num = 0
cond = threading.Condition()

def app(environ, start_response):
    global num
    cond.acquire()
    num += 1
    cond.notify_all()
    cond.release()
    start_response("200 OK", [("Content-Type", "multipart/x-mixed-replace; boundary=xxx")])
    while True:
        n = num
        s = "--xxx\r\nContent-Type: text/html\r\n\r\n%s\n" % n
        yield s
        # wait for num to change:
        cond.acquire()
        while num == n:
            cond.wait()
        cond.release()

from cherrypy.wsgiserver import CherryPyWSGIServer
server = CherryPyWSGIServer(("0.0.0.0", 3000), app)
try:
    server.start()
except KeyboardInterrupt:
    server.stop()

# Now whenever you visit http://127.0.0.1:3000/, the number increases.
# It also automatically increases in all previously opened windows/tabs.
The idea of a shared variable plus thread synchronization (using a condition variable) relies on the fact that the WSGI server provided by CherryPyWSGIServer is threaded.
Not sure if this is quite what you're looking for, but there is a fairly old way of doing server push using the MIME content type multipart/x-mixed-replace.
Basically you compose the response as a mime object with content type multipart/x-mixed-replace, and send the first "version" of a document down. The browser will keep the socket open.
Then as the server decides to push more data, a new "version" of the document gets sent from the server, and the browser will intelligently replace (within whatever frame/iframe contains the content) the content.
This was an early way of doing webcams, where the server would send down (push) image after image, and the browser would just keep replacing the image in the document over and over. This is also a way of doing a "Loading..." message over a single HTTP request.
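As a minimal modern sketch of the same idea (using Flask purely for brevity; the route and boundary name are arbitrary), each yielded part replaces the previous one in the browser:

import time

from flask import Flask, Response

app = Flask(__name__)

@app.route('/push')
def push():
    def generate():
        # Each part after the boundary replaces the whole document,
        # so the page appears to update without any JavaScript.
        for n in range(10):
            yield ('--frame\r\n'
                   'Content-Type: text/html\r\n\r\n'
                   '<p>version %d</p>\r\n' % n)
            time.sleep(1)
    return Response(generate(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(port=3000)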