Making SQL Alchemy Play Nice With Google App Engine - python

I'm currently working on a Google App Engine (Python) project which primarily uses Google Cloud SQL (with SQL Alchemy) for back-end data storage.
Most of the time everything works perfectly well. However, occasionally "something" goes haywire and we start getting bizarre exceptions. For example:
AttributeError: 'ColumnProperty' object has no attribute 'strategy'
AttributeError: 'RelationshipProperty' object has no attribute 'strategy'
We think this might be related to the spinning up of a new GAE instance, but we can't really be sure.
With all that being said, my question is this. What are some strategies that my team and I can use to track down this issue?
Keep in mind that the application is running on Google App Engine so that might limit our options a bit.
Update: Owen Nelson's comment below is right on. We've added threading.RLock as suggested by Google. However we are still seeing this issue, but much less often.
I want to be clear, to this point we've been unable to reproduce this issue in our local environment. We are pretty sure this has something to do with dynamic instances spinning up and that isn't something that we can really do in development.

From what I understand, your application only has problems in production.
Try to reproduce the bug in dev mode
The best possible solution would be to reproduce the bug in development mode. To do that, you could try running a batch of unit tests with LOTS of data. (See how to do local testing on App Engine.)
If that doesn't work...
Turn on appstats to get more information on the handler
You can turn on Appstats to find out which handler is causing the problem. Appstats mostly reports on datastore calls, which is not relevant here, but it also gives you general information about each request (such as response time).
Identify the handler and wrap it in a beautiful try/except
Once you identify the source of the problem, or where it is raised, you can surround it with a try/except. With that you can get more information from the current stack trace and hopefully solve your problem.
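The wrapping step above can be sketched as a small decorator. This is only an illustration under assumptions: the name log_exceptions is hypothetical, and it uses the standard logging module, whose output App Engine collects in the request logs. It records the full traceback and then re-raises, so the error is not swallowed.

```python
import functools
import logging


def log_exceptions(handler):
    """Wrap a handler so any exception is logged with its traceback, then re-raised."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception:
            # logging.exception records the message plus the current traceback.
            logging.exception("Unhandled error in %s", handler.__name__)
            raise
    return wrapper
```

Applying it with @log_exceptions to the suspect handler keeps the handler's behaviour unchanged while leaving a traceback in the logs each time the error occurs.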

Related

How do I run Python scripts automatically, while my Flask website is running on a VPS?

Okay, so basically I am creating a website. The data I need to display on this website is delivered twice daily, where I need to read the delivered data from a file and store this new data in the database (instead of the old data).
I have created the python functions to do this. However, I would like to know, what would be the best way to run this script, while my flask application is running? This may be a very simple answer, but I have seen some answers saying to incorporate the script into the website design (however these answers didn't explain how), and others saying to run it separately. The script needs to run automatically throughout the day with no monitoring or input from me.
TIA
Generally it's a really bad idea to make the web server (the Flask application, in your case) handle such tasks. There are many reasons, to name just a few:
Python's Achilles heel - GIL.
Sharing system resources of the application between users and other operations.
Crashes - unlikely, but they do happen. And if you are not careful, the web application goes down along with the script.
So with that in mind I'd advise you to ditch this idea and use cron. Basically, write a script that does whatever transformations or operations it needs to do, and create a cron job that runs it at the desired times.
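As a rough sketch of such a standalone script: the question does not say what the file format or database is, so this assumes a CSV file with name and value columns and a SQLite database with a readings table. All of those names are hypothetical. The old rows are replaced inside one transaction, so the website never sees a half-loaded table.

```python
import csv
import sqlite3


def reload_data(csv_path, db_path):
    """Replace the contents of the (hypothetical) readings table with rows from csv_path."""
    with open(csv_path, newline="") as f:
        rows = [(r["name"], r["value"]) for r in csv.DictReader(f)]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "CREATE TABLE IF NOT EXISTS readings (name TEXT, value TEXT)"
            )
            conn.execute("DELETE FROM readings")
            conn.executemany(
                "INSERT INTO readings (name, value) VALUES (?, ?)", rows
            )
    finally:
        conn.close()
    return len(rows)
```

A crontab entry along the lines of "0 6,18 * * * python3 /path/to/reload_script.py" (path hypothetical) would then run a small script calling reload_data twice a day, independently of the Flask process.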

Zope Legacy Code - Accessing DA Functions

We're working with an older zope version (2.10.6-final, python 2.4.5) and working with a database adapter called ZEIngresDA. We have an established connection for it, and the test function shows that it is totally functional and can connect and run queries.
My job is to change the way that the queries are actually executing, so that they're properly parameterizing variables to protect against sql injection. With that said, I'm running into a security issue that I'm hoping someone can help with.
connection = container.util.ZEIngresDAName()
#returning connection at this point reveals it to be of type ZEIngresDA.db.DA,
#which is the object we're looking for.
connection.query("SELECT * from data WHERE column='%s';", ('val1',))
#query is a function that is included in class DA, functions not in DA throw errors.
Here we run into the problem. Testing this script brings up a login prompt that, when logged into, immediately comes up again. I recognize that this is likely some type of security setting, but I've been unable to find anything online about this issue, and documentation for a Zope version this old is sparse anyway. If this sounds familiar to you or you have any ideas, please let me know.
I have some experience using Zope2 but it's hard to give a good answer with the limited information you've posted. I'm assuming here that you're using a Python script within the ZMI.
Here's a list of things I would check:
Are you logged into the root folder rather than a sub folder in the ZMI? This could cause a login prompt as you're requesting a resource that you do not have access to use
In the ZMI double check the "security" tab of the script you're trying to run to ensure that your user role has permission to run the script
Whilst you're there check the "proxy" tab to ensure that the script itself has permission to call the functions within it
Also worth checking that the products you're trying to use were installed by a user who is still listed in the root acl_users folder - from memory this can cause issues with the login prompt
Best of luck to you - happy (also sad) to hear that there's at least one other Zope user out there!

Windows Azure Web sites python

After a whole load of hard work I've eventually got a hello world flask app running on Windows Azure, the app is built locally and runs fine, deploying it to Azure is a nightmare though. So I've sort of got two questions here.
I can't seem to get a stack trace at all. I've tried setting things in web.config, but the documentation on how to use all this stuff is just appalling; all I can find is badly written blog posts dotted around Microsoft's millions of blogs, which doesn't even help me fix my problem.
The second question relates to the first one, due to some horrible debugging methods (taking my application apart and commenting things out) I feel like it could be pymongo causing this, I've built it without the C extensions and it's in my site-packages and it works on my local machine. However without a stack trace I've just no idea how to fix this without wanting to pull my hair out.
Can anyone shed some light on this? It's really disappointing because the rest of Azure isn't too bad, and there are far better website hosting alternatives out there, like Heroku, which are literally 10-command setups. I've been working on this all day so far.
Solved
For those who are interested, I ended up solving this problem by manually adding error handling to my Flask application, completely bypassing the IIS settings and Windows Azure configs - far too complicated, with no documentation at all.
import os
from werkzeug.debug import get_current_traceback  # may not exist in newer Werkzeug releases

# Assumes `app` is the Flask application object created elsewhere.
@app.errorhandler(500)
def internal_server_error(e):
    # Append the full traceback to logs/error.log next to this file.
    base = os.path.dirname(os.path.abspath(__file__))
    f = open('%s/logs/error.log' % (base), 'a')
    track = get_current_traceback(skip=1, show_hidden_frames=True, ignore_system_exceptions=False)
    track.log(f)
    f.close()
    return 'An error has occurred', 500
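For readers who would rather not depend on Werkzeug's debug helpers, a similar effect can be sketched with only the standard library. This is a variation on the approach above, not the original poster's code, and the function name append_traceback is hypothetical. It is meant to be called from inside an error handler or an except block.

```python
import traceback


def append_traceback(log_path):
    """Append the traceback of the exception currently being handled to log_path."""
    with open(log_path, "a") as f:
        f.write(traceback.format_exc())
        f.write("\n")
```

Calling append_traceback from the 500 handler, with a path the site is allowed to write to, leaves a readable stack trace in the file even when the hosting platform hides it.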

Starting, Stopping, and Continuing the Google App Engine BulkLoader

I have quite a bit of data that I will be uploading into Google App Engine. I want to use the bulkloader to help get it in there. However, I have so much data that I generally use up my CPU quota before it's done. Also, any other problem, such as a bad internet connection or a random computer issue, can stop the process.
Is there any way to continue a bulkload from where you left off? Or to only bulkload data that has not been written to the datastore?
I couldn't find anything in the docs, so I assume any answer will include digging into the code.
Well, it is in the docs:
"If the transfer is interrupted, you can resume the transfer from where it left off using the --db_filename=... argument. The value is the name of the progress file created by the tool, which is either a name you provided with the --db_filename argument when you started the transfer, or a default name that includes a timestamp. This assumes you have sqlite3 installed, and did not disable the progress file with --db_filename=skip."
http://code.google.com/appengine/docs/python/tools/uploadingdata.html
(I've used it some time ago, so I had a feeling it should be there)

In Python in GAE, what is the best way to limit the risk of executing untrusted code?

I would like to enable students to submit Python code solutions to a few simple Python problems. My application will be running in GAE. How can I limit the risk from malicious code that is submitted? I realize that this is a hard problem and I have read related Stack Overflow and other posts on the subject. I am curious whether the restrictions already in place in the GAE environment make it simpler to limit the damage that untrusted code could inflict. Is it possible to simply scan the submitted code for a few restricted keywords (exec, import, etc.) and then ensure the code only runs for less than a fixed amount of time, or is it still difficult to sandbox untrusted code even in the restricted GAE environment? For example:
# Import and execute untrusted code in GAE
untrustedCode = """#Untrusted code from students."""
class TestSpace(object):
    pass
testspace = TestSpace()
try:
    # Check the untrusted code somehow and throw an exception.
    pass
except:
    print "Code attempted to import or access network"
try:
    # exec code in a new namespace (Thanks Alex Martelli)
    # limit runtime somehow
    exec untrustedCode in vars(testspace)
except:
    print "Code took more than x seconds to run"
@mjv's smiley comment is actually spot-on: make sure the submitter IS identified and associated with the code in question (which presumably is going to be sent to a task queue), and log any diagnostics caused by an individual's submissions.
Beyond that, you can indeed prepare a test-space that's more restrictive (thanks for the acknowledgment;-) including a special 'builtin' that has all you want the students to be able to use and redefines __import__ &c. That, plus a token pass to forbid exec, eval, import, __subclasses__, __bases__, __mro__, ..., gets you closer. A totally secure sandbox in a GAE environment however is a real challenge, unless you can whitelist a tiny subset of the language that the students are allowed.
So I would suggest a layered approach: the sandbox GAE app in which the students upload and execute their code has essentially no persistent layer to worry about; rather, it "persists" by sending urlfetch requests to ANOTHER app, which never runs any untrusted code and is able to vet each request very critically. Default-denial with whitelisting is still the holy grail, but with such an extra layer for security you may be able to afford a default-acceptance with blacklisting...
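The token pass and restricted builtins described above could look roughly like the following. It is a sketch in Python 3 syntax, not a secure sandbox: the names FORBIDDEN_NAMES, SAFE_BUILTINS, find_forbidden_tokens and run_restricted are hypothetical, the lists are illustrative rather than complete, and no runtime limit is enforced here.

```python
import io
import tokenize

# Illustrative blacklist taken from the names mentioned above; NOT exhaustive.
FORBIDDEN_NAMES = {
    "exec", "eval", "import", "__import__",
    "__subclasses__", "__bases__", "__mro__",
}

# The only builtins student code gets to see (an assumed, tiny whitelist).
SAFE_BUILTINS = {"abs": abs, "len": len, "max": max, "min": min,
                 "range": range, "sum": sum}


def find_forbidden_tokens(source):
    """Return the sorted forbidden names that occur as NAME tokens in source."""
    found = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Comments and string literals are other token types, so they are ignored.
        if tok.type == tokenize.NAME and tok.string in FORBIDDEN_NAMES:
            found.add(tok.string)
    return sorted(found)


def run_restricted(source):
    """Reject source containing forbidden tokens, else exec it with SAFE_BUILTINS only."""
    bad = find_forbidden_tokens(source)
    if bad:
        raise ValueError("forbidden names: " + ", ".join(bad))
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(source, namespace)
    namespace.pop("__builtins__", None)
    return namespace
```

For example, run_restricted("total = sum(range(4))") returns a namespace containing total, while a submission containing an import statement is rejected before it runs. A determined attacker can still get around checks like these, which is why the second, trusted app remains the real line of defence.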
You really can't sandbox Python code inside App Engine with any degree of certainty. Alex's idea of logging who's running what is a good one, but if the user manages to break out of the sandbox, they can erase the event logs. The only place this information would be safe is in the per-request logging, since users can't erase that.
For a good example of what a rathole trying to sandbox Python turns into, see this post. For Guido's take on securing Python, see this post.
There are another couple of options: If you're free to choose the language, you could run Rhino (a Javascript interpreter) on the Java runtime; Rhino is nicely sandboxed. You may also be able to use Jython; I don't know if it's practical to sandbox it, but it seems likely.
Alex's suggestion of using a separate app is also a good one. This is pretty much the approach that shell.appspot.com takes: It can't prevent you from doing malicious things, but the app itself stores nothing of value, so there's no harm if you do.
Here's an idea. Instead of running the code server-side, run it client-side with Skulpt:
http://www.skulpt.org/
This is both safer and easier to implement.
