I was wondering if I could get some help with Amazon SQS. In my example I am trying to set up a queue on Server A and query it from Server B. The issue I’m having is that when I create a queue on server A it provides me with a URL like this:
https://sqs.us-east-1.amazonaws.com/599169622985/test-topic-queue
Then on my other server I apparently need to query this URL for information on the queue. The trouble is, my server B doesn't know the URL that I created on server A. This seems like a bit of a flaw: do I really need to find a way to also communicate the URL to server B before it can connect to the queue, and if so, does anyone have any good solutions for this?
I have tried asking on Amazon and didn’t get any replies.
Servers A and B do need to share some kind of information about the queue, but it doesn't have to be the full URL: you can share just the queue name and retrieve the queue URL on server B using the GetQueueUrl API action:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/Query_QueryGetQueueUrl.html
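For example, a minimal sketch with boto3, assuming server B has credentials for the same account and region and that the queue name "test-topic-queue" was agreed on out of band:

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="test-topic-queue")["QueueUrl"]

# the resolved URL can then be used for normal queue operations on server B
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)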
Queues should be treated like any other resource (caches, datastores, etc.) and defined ahead of time in some kind of application configuration file.
If your use case involves queue endpoints that change on a regular basis, then you might want to store the queue endpoint somewhere that both instances can check. It could be a database, or it could be a config file pulled from S3.
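As a rough illustration, both servers could read the queue name (or the full URL) from a shared config file; the file name and keys here are made up:

import json

# hypothetical config baked into the deploy, or pulled from S3 at startup
with open("app_config.json") as f:
    config = json.load(f)

queue_name = config["sqs"]["queue_name"]   # resolve it with GetQueueUrl, or store the full URL directly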
What would be the best way to solve the following problem with Python?
I have a real-time data stream coming into my object storage from a user application (JSON files being stored in Amazon S3).
Upon receiving each JSON file, I have to process the data in the file within a certain time (1 s in this instance) and generate a response that is sent back to the user. The data is processed by a simple Python script.
My issue is that the real-time stream can generate a few hundred JSON files from user applications at the same time, all of which need to be run through my Python script, and I don't know the best way to approach this.
I understand that one way to tackle this would be to use trigger-based Lambdas that execute a job on top of every file as it is uploaded from the real-time stream, in a serverless environment; however, this option seems quite expensive compared to having a single server instance running and somehow triggering the jobs inside it.
Any advice is appreciated. Thanks.
Serverless can actually be cheaper than using a server. It is much cheaper when there are periods of no activity because you don't need to pay for a server doing nothing.
The hardest part of your requirement is sending the response back to the user. If an object is uploaded to S3, there is no easy way to send back a response, and it isn't even obvious which user sent the file.
You could process the incoming file and then store a response back in a similarly-named object, and the client could then poll S3 for the response. That requires the upload to use a unique name that is somehow generated.
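A rough sketch of that pattern with boto3, assuming hypothetical requests/ and responses/ key prefixes and a client-generated unique id:

import json
import time
import uuid

import boto3

s3 = boto3.client("s3")
bucket = "my-data-bucket"                      # hypothetical bucket name

# client side: upload the request under a unique key the processor can mirror
request_id = uuid.uuid4().hex
s3.put_object(Bucket=bucket, Key="requests/%s.json" % request_id,
              Body=json.dumps({"payload": "..."}))

# client side: poll for the response object written back by the processor
response_key = "responses/%s.json" % request_id
while True:
    try:
        obj = s3.get_object(Bucket=bucket, Key=response_key)
        result = json.loads(obj["Body"].read())
        break
    except s3.exceptions.NoSuchKey:
        time.sleep(0.1)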
An alternative would be for the data to be sent to AWS API Gateway, which can trigger an AWS Lambda function and then directly return the response to the requester. No server required, automatic scaling.
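A minimal Lambda handler sketch for an API Gateway proxy integration; the processing step is a stand-in for the "simple Python script" mentioned in the question:

import json

def lambda_handler(event, context):
    data = json.loads(event["body"])            # the JSON the client POSTed via API Gateway

    result = {"status": "ok", "echo": data}     # placeholder for the real processing

    # API Gateway proxy integrations expect the response in this shape
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }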
If you wanted to use a server, then you'd need a way for the client to send a message to the server with a reference to the JSON object in S3 (or with the data itself). The server would need to be running a web server that can receive the request, perform the work and provide back the response.
Bottom line: Think about the data flow first, rather than the processing.
I'm using Django to develop a website. On the server side, I need to transfer some data that must be processed on a second server (a different machine). I then need a way to retrieve the processed data. I figured the simplest approach would be for the second server to send a POST request back to the Django server, which would then be handled by a view dedicated to that job.
But I would like to add some minimal security to this process: when I transfer the data to the other machine, I want to attach a randomly generated token to it. When I get the processed data back, I expect to also get back the same token; otherwise the request is ignored.
My problem is the following: How do I store the generated token on the Django server?
I could use a global variable, but from browsing around the web I got the impression that global variables should not be used, for safety reasons (not that I really understand why).
I could store the token on disk/database, but it seems to be an unjustified waste of performance (even if in practice it would probably not change much).
Is there a third solution, or a canonical way to do such a thing in Django?
You can store your token in the Django cache; in most cases it will be faster than database or disk storage.
Another approach is to use Redis directly.
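As a minimal sketch using Django's cache framework (the key prefix, timeout, and the send_to_other_server helper are illustrative assumptions):

import secrets

from django.core.cache import cache

def start_processing(data):
    token = secrets.token_hex(16)
    cache.set("processing_token:%s" % token, True, timeout=3600)   # expire after an hour
    send_to_other_server(data, token)            # hypothetical call that POSTs to the second machine

def receive_processed_data(request):
    token = request.POST.get("token", "")
    if not cache.get("processing_token:%s" % token):
        ...                                      # unknown or expired token: ignore the request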
You can also calculate your token:
store some random secret token in the settings of both servers
calculate the token based on the current timestamp rounded to 10 seconds, for example:
import hashlib, time

rounded_timestamp = int(time.time()) // 10 * 10        # current timestamp rounded down to 10 s
token = hashlib.sha1(secret_token.encode())            # secret_token is the shared value from settings
token.update(str(rounded_timestamp).encode())
token = token.hexdigest()
if the token generated on the remote server when POSTing the request matches the token generated on the local server when receiving the response, the request is valid and can be processed.
The simple, obvious solution would be to store the token in your database. Other possible solutions are Redis or something similar. Finally, you can have a look at distributed async task queues like Celery...
Since I can't explain clearly what I don't understand, I'll use an example.
Let's say I have a client application and a server application. The server waits, and when the client sends some keyword, the server knows what should be queried. Let's say the client requests a product object, so the server queries the database and gets back the row the client needs as a result set. So every time I need some object, do I have to send it to the client in the form of a string and then instantiate it?
Am I missing something? Isn't it expensive to instantiate objects on every query?
TIA!
Your question is very vague and doesn't really ask anything specific, but I'll try to give you a generic answer about how a client and server interact.
When a user requests an item in the client, the client should call an API on the server, something like http://example.com/search?param=test. The client can use this API in either an AJAX call or a direct call.
The server should parse the parameter, connect to the database, retrieve the requested item and return it to the client. The most common data formats for this exchange are JSON and plain text.
The client then parses the response, generates an object from it if required, and finally shows the user the requested data.
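For illustration, a hedged sketch of the server side as a Django view; the Product model and the param query parameter are assumptions, not something from your code:

from django.http import JsonResponse

from .models import Product                      # hypothetical model

def search(request):
    term = request.GET.get("param", "")
    products = Product.objects.filter(name__icontains=term)
    # serialize the rows into plain dictionaries so the client can rebuild objects from the JSON
    data = [{"id": p.id, "name": p.name, "price": str(p.price)} for p in products]
    return JsonResponse({"results": data})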
If this is not what you need, please update your question to describe the specific issue you have, and maybe provide some code showing where you are stuck, and I'll update my answer accordingly.
MySQL Server uses a custom protocol over TCP. If you don't want to use any library, you will have to parse the TCP messages yourself. MySQL Connector/Python does exactly that - you can look at its source code if you wish.
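For reference, querying through the connector rather than parsing the protocol yourself might look like this (connection parameters are placeholders):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="shop")
cur = conn.cursor(dictionary=True)
cur.execute("SELECT id, name, price FROM products WHERE name LIKE %s", ("%test%",))
rows = cur.fetchall()        # each row comes back as a dict, ready to serialize as JSON for the client
conn.close()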
My webapp has two parts:
a GAE server which handles web requests and sends them to an EC2 REST server
an EC2 REST server which does all the calculations given information from GAE and sends back results
It works fine when the calculations are simple. Otherwise, I get a timeout error on the GAE side.
I realize that there are some approaches to this timeout issue. But after some research, I found (please correct me if I am wrong):
taskqueue would not fit my needs, since some of the calculations could take more than half an hour.
A 'GAE backend instance' would work if I reserved another instance all the time. But since I have already reserved an EC2 instance, I would like to find a "cheap" solution (not paying for a GAE backend instance and EC2 at the same time).
'GAE asynchronous requests' are also not an option, since the request still waits for the response from EC2, even though users can send other requests while they are waiting.
Below is a simplified version of my code. It:
asks users to upload a csv
parses the csv and sends its information to EC2
generates an output page from the EC2 response
OutputPage.py
import cgi

from google.appengine.ext import webapp
from przm import przm_batchmodel

class OutputPage(webapp.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        thefile = form['upfile']
        # this is where the uploaded file is processed and sent to EC2 for computing
        html = przm_batchmodel.loop_html(thefile)
        przm_batchoutput_backend.przmBatchOutputPageBackend(thefile)
        self.response.out.write(html)

app = webapp.WSGIApplication([('/.*', OutputPage)], debug=True)
przm_batchmodel.py (this is the code that sends the info to EC2)
import csv

from google.appengine.api import urlfetch

def loop_html(thefile):
    # parse the uploaded csv and send its info to the REST server; the returned value is an html page
    # (REST_server and http_headers are defined elsewhere in the module)
    data = csv.reader(thefile.file.read().splitlines())
    response = urlfetch.fetch(url=REST_server, payload=data, method=urlfetch.POST,
                              headers=http_headers, deadline=60)
    return response
At this moment, my questions are:
Is there a way on the GAE side that allows me to just send the request to EC2 without waiting for its response? If that is possible, then on the EC2 side I can email users to notify them when the results are ready.
If question 1 is not possible, is there a way to create a monitor on EC2 that will invoke the calculation once the information is received from the GAE side?
I appreciate any suggestions.
Here are some points:
For question 1: you do not need to wait on the GAE side for EC2 to complete its work. You are already using URLFetch to send the data to EC2. As long as it is able to send that data over to the EC2 side within 60 seconds and its size is not more than 10 MB, you are fine.
You will need to make sure that you have a Receipt Handler on the EC2 side that is capable of collecting this data from above and sending back an Ack. An Ack will be sufficient for the GAE side to track the activity. You can then always write some code on the EC2 side to send back the response to the GAE side that the conversion is done or as you mentioned, you could send an email off if needed.
I suggest that you create your own little tracker on the GAE side. For example, when the file is uploaded, create a Task and send back the Ack to the client immediately. Then you can use a Cron Job or Task Queue on the App Engine side to simply send the work off to EC2. Do not wait for EC2 to complete its job. Then let EC2 report back to GAE that its work is done for a particular Task Id and send off an email (if required) to notify the users that the work is done. In fact, EC2 can even report back with a batch of Task Ids that it completed, instead of sending a notification for each Task Id.
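A rough sketch of that "ack now, work later" pattern on the App Engine side, using the first-generation Python runtime's taskqueue API; the /send_to_ec2 worker URL and the tracker helper are assumptions for illustration:

from google.appengine.api import taskqueue

class OutputPage(webapp.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        thefile = form['upfile']
        task_id = create_tracker_entity(thefile)      # hypothetical helper that stores the upload and a Task Id

        # enqueue the hand-off to EC2 and acknowledge immediately, instead of blocking on urlfetch
        taskqueue.add(url='/send_to_ec2', params={'task_id': task_id})
        self.response.out.write('Job %s queued; you will be emailed when it is done.' % task_id)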
I'm not seeing much documentation on this. I'm trying to get an image uploaded onto the server from a URL. Ideally I'd like to keep things simple, but I'm in two minds as to whether using an ImageField is the best way, or whether it is simpler to just store the file on the server and display it as a static file. I'm not uploading any files, so I need to fetch them in. Can anyone suggest any decent code examples before I try to re-invent the wheel?
Given a URL, say http://www.xyx.com/image.jpg, I'd like to download that image to the server and put it in a suitable location after renaming it. My question is general, as I'm looking for examples of what people have already done. So far I only see examples relating to uploading images, but that doesn't apply. This should be a simple case, and I'm looking for a canonical example that might help.
This is for uploading an image from the user: Django: Image Upload to the Server
So are there any examples out there that just deal with fetching an image and storing it on the server and/or in an ImageField?
Well, just fetching an image and storing it into a file is straightforward:
import urllib2

with open('/path/to/storage/' + make_a_unique_name(), 'wb') as f:
    f.write(urllib2.urlopen(your_url).read())
Then you need to configure your Web server to serve files from that directory.
But this comes with security risks.
A malicious user could come along and type a URL that points nowhere. Or that points to their own evil server, which accepts your connection but never responds. This would be a typical denial of service attack.
A naive fix could be:
urllib2.urlopen(your_url, timeout=5)
But then the adversary could build a server that accepts a connection and writes out a line every second indefinitely, never stopping. The timeout doesn’t cover that.
So a proper solution is to run a task queue, also with timeouts, and a carefully chosen number of workers, all strictly independent of your Web-facing processes.
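For instance, a sketch with Celery (the broker URL, time limits, and worker count are illustrative choices, not a prescription):

from urllib.request import urlopen

from celery import Celery

app = Celery('fetcher', broker='redis://localhost:6379/0')

@app.task(time_limit=15, soft_time_limit=10)     # hard-kill a fetch that drags on
def fetch_image(url, destination_path):
    data = urlopen(url, timeout=5).read()
    with open(destination_path, 'wb') as f:
        f.write(data)

# run the workers separately from the Web-facing processes, e.g.:
#   celery -A fetcher worker --concurrency=4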
Another kind of attack is to point your server at something private. Suppose, for the sake of example, that you have an internal admin site that is running on port 8000, and it is not accessible to the outside world, but it is accessible to your own processes. Then I could type http://localhost:8000/path/to/secret/stats.png and see all your valuable secret graphs, or even modify something. This is known as server-side request forgery or SSRF, and it’s not trivial to defend against. You can try parsing the URL and checking the hostname against a blacklist, or explicitly resolving the hostname and making sure it doesn’t point to any of your machines or networks (including 127.0.0.0/8).
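A sketch of the "resolve the hostname and check it" idea using the Python 3 standard library; note that it is not complete SSRF protection on its own (redirects and DNS rebinding are not handled):

import ipaddress
import socket
from urllib.parse import urlparse

def looks_safe(url):
    host = urlparse(url).hostname
    if not host:
        return False
    addr = ipaddress.ip_address(socket.gethostbyname(host))
    return not (addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved)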
Then of course, there is the problem of validating that the file you receive is actually an image, not an HTML file or a Windows executable. But this is common to the upload scenario as well.
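One common check, for example with Pillow (a third-party library), is to try opening and verifying the bytes before accepting them; this catches non-images but is a sanity check rather than a full defence:

import io

from PIL import Image

def is_probably_an_image(data):
    try:
        Image.open(io.BytesIO(data)).verify()    # raises if the bytes are not a recognisable image
        return True
    except Exception:
        return False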