How to escape Python boto's SelectExpression for Amazon SimpleDB

Currently my code is
client = boto3.client('sdb')
query = 'SELECT * FROM `%s` WHERE "%s" = "%s"' % (domain, key, value)
response = client.select(SelectExpression = query)
The variables key and value are supplied by the user; what is the best way to escape them in the code above?
Edit: My concern is how to escape these fields to prevent injection, as we have long done for SQL, but now in SimpleDB.

Subselects and destructive operations can't be performed in SimpleDB.
Amazon provides quoting rules: http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/QuotingRulesSelect.html
You can apply this behavior in python using this function:
def quote(string):
    return string.replace("'", "''").replace('"', '""').replace('`', '``')
client = boto3.client('sdb')
query = 'SELECT * FROM `%s` WHERE "%s" = "%s"' % (quote(domain), quote(key), quote(value))
response = client.select(SelectExpression = query)
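As a quick check (using a hypothetical malicious input, not one from the original question), the doubled quotes can no longer terminate the quoted string:

malicious = 'x" OR Name LIKE "%'
query = 'SELECT * FROM `mydomain` WHERE "attr" = "%s"' % quote(malicious)
print(query)
# SELECT * FROM `mydomain` WHERE "attr" = "x"" OR Name LIKE ""%"

SimpleDB reads each doubled quote inside the delimiters as a literal character, so the injected OR clause stays part of the value instead of becoming part of the query.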

If the side effect of injection you are worried about is deletion or destruction, note that SimpleDB's Select only supports querying data. If you want to protect against exposing data you don't want exposed, check the AWS quoting-rules documentation linked above.
Note: since the guide covers this well, I thought the link would be enough.

Related

How to set up AzureSQL Database with AlwaysEncrypted and fill it with data?

At the moment I am working with the Azure cloud. I want to set up an Azure SQL database and use Always Encrypted to ensure that the data is 'always encrypted' ;-). Furthermore, I would like to set up Azure Functions that can connect to the database and write records into it.
I have already set up an Azure SQL database, but I do not know how to work with it. I made two attempts:
Set up the table directly in SSMS, fill it with data, then create the keys and encrypt the columns with the wizard. This works fine, and I can see the plaintext data only if I check the 'Always Encrypted' box when connecting to the database.
My second attempt was to include Always Encrypted directly in the queries. I tried the following:
CREATE COLUMN MASTER KEY CMK_test_1
WITH (
    KEY_STORE_PROVIDER_NAME = 'AZURE_KEY_VAULT',
    KEY_PATH = '<PATH_TO_AZURE_KEY_VAULT>'
)

CREATE COLUMN ENCRYPTION KEY CEK_test_1
WITH VALUES
(
    COLUMN_MASTER_KEY = CMK_test_1,
    ALGORITHM = 'RSA_OAEP',
    ENCRYPTED_VALUE = <VALUE>
)

CREATE TABLE dbo.AlwaysEncryptedTest
(
    ID int IDENTITY(1,1) PRIMARY KEY
    , FirstName varchar(25) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
        ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = CEK_test_1) NOT NULL
    , LastName varchar(25) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
        ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = CEK_test_1) NOT NULL
    , City varchar(25) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
        ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = CEK_test_1) NOT NULL
    , StreetName varchar(25) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
        ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256',
        COLUMN_ENCRYPTION_KEY = CEK_test_1) NOT NULL
)
I know that I have to use an application to put records in the database, but I could not find a tutorial or anything else that helps me do so. I found some C# explanation on the Microsoft website, but it did not help me get the job done. Ideally, I would write the connection in Python.
Any help is appreciated.
Best
P
If you want to connect to an Azure SQL server that has Always Encrypted with Azure Key Vault enabled from a Python application, you can use the ODBC driver.
To implement it, add ColumnEncryption=Enabled to the connection string to tell the ODBC driver that Always Encrypted is enabled. In addition, since the keys are stored in Azure Key Vault, you also need to add KeyStoreAuthentication, KeyStorePrincipalId, and KeyStoreSecret so the ODBC driver can connect to Azure Key Vault and fetch the column encryption key.
For more details, please refer to the Microsoft documentation on using Always Encrypted with the ODBC driver.
For example
Create a service principal to connect Azure key vault
az login
az ad sp create-for-rbac --skip-assignment --sdk-auth
az keyvault set-policy --name $vaultName --key-permissions get list sign unwrapKey verify wrapKey --resource-group $resourceGroupName --spn <clientId-of-your-service-principal>
Code
import pyodbc

server = '<>.database.windows.net'
database = ''
username = ''
password = ''
driver = '{ODBC Driver 17 for SQL Server}'
KeyStoreAuthentication = 'KeyVaultClientSecret'
KeyStorePrincipalId = '<clientId-of-your-service-principal>'
KeyStoreSecret = '<clientSecret-of-your-service-principal>'

conn_str = f'DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password};ColumnEncryption=Enabled;KeyStoreAuthentication={KeyStoreAuthentication};KeyStorePrincipalId={KeyStorePrincipalId};KeyStoreSecret={KeyStoreSecret}'

with pyodbc.connect(conn_str) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM [dbo].[Patients]")
        row = cursor.fetchone()
        while row:
            print(row)
            row = cursor.fetchone()
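Since the question also asks how to write records: with Always Encrypted, values destined for encrypted columns must be passed as query parameters so the driver can encrypt them client-side; literals inlined into the SQL string will be rejected. A minimal sketch against the dbo.AlwaysEncryptedTest table from the question, with made-up sample values:

insert_sql = ("INSERT INTO dbo.AlwaysEncryptedTest (FirstName, LastName, City, StreetName) "
              "VALUES (?, ?, ?, ?)")
with pyodbc.connect(conn_str) as conn:
    with conn.cursor() as cursor:
        # The ODBC driver encrypts each parameter transparently before it leaves the client.
        cursor.execute(insert_sql, ('Jane', 'Doe', 'Springfield', 'Main St'))
        conn.commit()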

How to get sql query from peewee?

Simple peewee example:
MySQL DB "Pet" with autoincrement "id" and char-field "name".
Doing
my_pet = Pet.select().where(name == 'Garfield')
With .sql() we get the sql interpretation.
How to get the raw sql query from:
my_pet = Pet.get(name='Garfield')
?
When you write:
my_pet = Pet(name='Garfield')
Nothing at all happens in the database.
You have simply created an object. There is no magic, as peewee is an ActiveRecord ORM, and only saves when you call a method like Model.save() or Model.create().
If you want the SQL for a query like Model.create(), then you should look into using Model.insert() instead:
insert_stmt = Pet.insert(name='Garfield')
sql = insert_stmt.sql()
new_obj_id = insert_stmt.execute()
The downside there is that you aren't returned a model instance, just the primary key.
If you are connecting to a Postgres database, as of peewee 3.13 you can print SQL queries by first getting the cursor, then calling mogrify() for your query. mogrify() is provided by the psycopg2 library and hence may not be available when connecting to other databases.
Given your example:
my_pet = Pet.select().where(Pet.name == 'Garfield').limit(1)
cur = database.cursor()
print(cur.mogrify(*my_pet.sql()))
Where database is the Peewee Database object representing the connection to Postgres.
You can use Python's "%" operator to build the string:
def peewee_sql_to_str(sql):
    return sql[0] % tuple(sql[1])
insert_stmt = Pet.insert(name='Garfield')
sql = insert_stmt.sql()
print(peewee_sql_to_str(sql))
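One caveat worth adding (my note, not from the answer above): this plain %-substitution inlines the parameters without quoting them, so the output is handy for debugging but is not necessarily executable SQL:

insert_stmt = Pet.insert(name='Garfield')
print(peewee_sql_to_str(insert_stmt.sql()))
# Prints something like: INSERT INTO `pet` (`name`) VALUES (Garfield)
# note the missing quotes around Garfield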

CherryPy Sessions and large objects?

I have a CherryPy webapp that I originally wrote using file-based sessions. From time to time I store potentially large objects in the session, such as the results of running a report: I offer the option to download report results in a variety of formats, and I don't want to re-run the query when the user selects a download, due to the potential of getting different data. While using file-based sessions, this worked fine.
Now I am looking at the potential of bringing a second server online, and as such I need to be able to share session data between the servers, for which it would appear that using the memcached session storage type is the most appropriate. I briefly looked at using a PostgreSQL storage type, but this option was VERY poorly documented, and from what I could find, may well be broken. So I implemented the memcached option.
Now, however, I am running into a problem where, when I try to save certain objects to the session, I get an "AssertionError: Session data for id xxx not set". I'm assuming that this is due to the object size exceeding some arbitrary limit set in the CherryPy session backend or memcached, but I don't really know since the exception doesn't tell me WHY it wasn't set. I have increased the object size limit in memcached to the maximum of 128MB to see if that helped, but it didn't - and that's probably not a safe option anyway.
So what's my solution here? Is there some way I can use the memcached session storage to store arbitrarily large objects? Do I need to "roll my own" DB based or the like solution for these objects? Is the problem potentially NOT size based? Or is there another option I am missing?
I use MySQL for handling my CherryPy sessions. As long as the object is serializable (can be pickled) you can store it as a BLOB (binary large object) in MySQL. Here's the code you would want to use for MySQL session storage...
https://bitbucket-assetroot.s3.amazonaws.com/Lawouach/cherrypy/20111008/936/mysqlsession.py?Signature=gDmkOlAduvIZS4WHM2OVgh1WVuU%3D&Expires=1424822438&AWSAccessKeyId=0EMWEFSGA12Z1HF1TZ82
"""
MySQLdb session module for CherryPy by Ken Kinder <http://kenkinder.com/>
Version 0.3, Released June 24, 2000.
Copyright (c) 2008-2009, Ken Kinder
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the Ken Kinder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
import MySQLdb
import cPickle as pickle
import cherrypy
import logging
import threading

__version__ = '0.2'

logger = logging.getLogger('Session')

class MySQLSession(cherrypy.lib.sessions.Session):
    ##
    ## These can be over-ridden by config file
    table_name = 'web_session'
    connect_arguments = {}

    SCHEMA = """create table if not exists %s (
        id varchar(40),
        data text,
        expiration_time timestamp
    ) ENGINE=InnoDB;"""

    _database = None

    def __init__(self, id=None, **kwargs):
        logger.debug('Initializing MySQLSession with %r' % kwargs)
        for k, v in kwargs.items():
            setattr(MySQLSession, k, v)
        self.db = self.get_db()
        self.cursor = self.db.cursor()
        super(MySQLSession, self).__init__(id, **kwargs)

    @classmethod
    def get_db(cls):
        ##
        ## Use thread-local connections
        local = threading.local()
        if hasattr(local, 'db'):
            return local.db
        else:
            logger.debug("Connecting to %r" % cls.connect_arguments)
            db = MySQLdb.connect(**cls.connect_arguments)
            cursor = db.cursor()
            cursor.execute(cls.SCHEMA % cls.table_name)
            db.commit()
            local.db = db
            return db

    def _load(self):
        logger.debug('_load %r' % self)
        # Select session data from table
        self.cursor.execute('select data, expiration_time from %s '
                            'where id = %%s' % MySQLSession.table_name, (self.id,))
        row = self.cursor.fetchone()
        if row:
            (pickled_data, expiration_time) = row
            data = pickle.loads(pickled_data)
            return data, expiration_time
        else:
            return None

    def _save(self, expiration_time):
        logger.debug('_save %r' % self)
        pickled_data = pickle.dumps(self._data)
        self.cursor.execute('select count(*) from %s where id = %%s and expiration_time > now()' % MySQLSession.table_name, (self.id,))
        (count,) = self.cursor.fetchone()
        if count:
            self.cursor.execute('update %s set data = %%s, '
                                'expiration_time = %%s where id = %%s' % MySQLSession.table_name,
                                (pickled_data, expiration_time, self.id))
        else:
            self.cursor.execute('insert into %s (data, expiration_time, id) values (%%s, %%s, %%s)' % MySQLSession.table_name,
                                (pickled_data, expiration_time, self.id))
        self.db.commit()

    def acquire_lock(self):
        logger.debug('acquire_lock %r' % self)
        self.locked = True
        self.cursor.execute('select id from %s where id = %%s for update' % MySQLSession.table_name,
                            (self.id,))
        self.db.commit()

    def release_lock(self):
        logger.debug('release_lock %r' % self)
        self.locked = False
        self.db.commit()

    def clean_up(self):
        logger.debug('clean_up %r' % self)
        self.cursor.execute('delete from %s where expiration_time < now()' % MySQLSession.table_name)
        self.db.commit()

    def _delete(self):
        logger.debug('_delete %r' % self)
        self.cursor.execute('delete from %s where id=%%s' % MySQLSession.table_name, (self.id,))
        self.db.commit()

    def _exists(self):
        # Select session data from table
        self.cursor.execute('select count(*) from %s '
                            'where id = %%s and expiration_time > now()' % MySQLSession.table_name, (self.id,))
        (count,) = self.cursor.fetchone()
        logger.debug('_exists %r (%r)' % (self, bool(count)))
        return bool(count)

    def __del__(self):
        logger.debug('__del__ %r' % self)
        self.db.commit()
        self.db.close()
        self.db = None

    def __repr__(self):
        return '<MySQLSession %r>' % (self.id,)

cherrypy.lib.sessions.MysqlSession = MySQLSession
then your webapp.py would look something like this...
from mysqlsession import MySQLSession
import cherrypy
import logging

logging.basicConfig(level=logging.DEBUG)

sessionInfo = {
    'tools.sessions.on': True,
    'tools.sessions.storage_type': "Mysql",
    'tools.sessions.connect_arguments': {'db': 'sessions'},
    'tools.sessions.table_name': 'session'
}
cherrypy.config.update(sessionInfo)

class HelloWorld:
    def index(self):
        v = cherrypy.session.get('v', 1)
        cherrypy.session['v'] = v + 1
        return "Hello world! %s" % v
    index.exposed = True

cherrypy.quickstart(HelloWorld())
If you need to put some object in there, do something like this...
import pickle
pickledThing = pickle.dumps(YourObject.GetItems(), protocol=0)
Hope this helps!
Sounds like you want to store a reference to the object stored in Memcache and then pull it back when you need it, rather than relying on the state to handle the loading / saving.
From what you have explained, I conclude that conceptually it isn't a good idea to mix user sessions and a cache. Sessions are mostly designed for holding the state of a user's identity; thus they have security measures, locking to avoid concurrent changes, and other aspects to consider. Session storage is also usually volatile. So if you mean to use sessions as a cache, you should understand how sessions really work and what the consequences are.
What I suggest is to establish normal caching of the domain model that produces the report data, and keep the session for identity (see the sketch after the CherryPy details below).
CherryPy details
The default CherryPy session implementation locks the session data. In the OLAP case, your user likely won't be able to perform concurrent requests (opening another tab, for instance) until the report is completed. There is, however, an option for manual lock management.
The PostgreSQL session storage is broken and may be removed in future releases.
The memcached session storage doesn't implement distributed locking, so make sure you use a consistent rule to balance your users across your servers.
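A minimal sketch of that separation (hypothetical names; it assumes a store reachable from both servers, here memcached via the python-memcached client): the session keeps only a small key, while the report payload lives outside it.

import pickle
import uuid

import cherrypy
import memcache  # python-memcached; any shared store would do

cache = memcache.Client(['127.0.0.1:11211'])

def store_report(results):
    # Keep the large payload out of the session; the session holds only the key.
    key = 'report-' + uuid.uuid4().hex
    cache.set(key, pickle.dumps(results, 2), time=3600)
    cherrypy.session['report_key'] = key

def load_report():
    key = cherrypy.session.get('report_key')
    blob = cache.get(key) if key else None
    return pickle.loads(blob) if blob else None

If the payloads are too big for memcached's per-item limit, the same pattern works with a database BLOB (as in the answer above) or shared file storage; only where the bytes live changes, while the session still stores just the key.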

Twisted - Using a Deferred for another sql query

I've got a Twisted based network application. Now I would like to implement a new database design, but I'm getting stuck on the Deferred object.
I write sessions into my database using these two functions:
def createSession(self, peerIP, peerPort, hostIP, hostPort):
    SessionUUID = uuid.uuid1().hex
    yield self.writeSession(SessionUUID, peerIP, hostIP)
    sid = self.db.runQuery("SELECT id FROM sessions WHERE sid = %s",
                           (SessionUUID,))
    yield sid

@defer.inlineCallbacks
def writeSession(self, SessionUUID, peerIP, hostIP):
    sensorname = self.getSensor() or hostIP
    r = yield self.db.runQuery("SELECT id FROM sensors WHERE ip = %s",
                               (sensorname,))
    if r:
        id = r[0][0]
    else:
        yield self.db.runQuery("INSERT INTO sensors (ip) VALUES (%s)",
                               (sensorname,))
        r = yield self.db.runQuery("""SELECT LAST_INSERT_ID()""")
        id = int(r[0][0])
    self.simpleQuery(
        """
        INSERT INTO sessions (sid, starttime, sensor, ip)
        VALUES (%s, FROM_UNIXTIME(%s), %s, %s)
        """,
        (SessionUUID, self.nowUnix(), id, peerIP))
In short:
createSession creates a UUID for the session and calls writeSession to write it into my DB. After it is written, I try to select the ID of the last insert by using the UUID in the WHERE clause, and return the result.
Now my problem. To update the session information I call this function:
def handleConnectionLost(self, sid, args):
    self.simpleQuery("UPDATE sessions SET endtime = now() WHERE sid = %s",
                     (sid,))
As you can see, I try to use the sid from createSession, which is a Deferred object and not an integer. If I got this right, adding a callback to handleConnectionLost would run the query at that time so that I can use the value there. But this is not the only function where I need the sid, so it would be overhead to add a callback every time I need it.
Is there a way I can pass the sid as an integer to my functions, so that I run the query only one time? What would that look like?
When I use a deferred query with a now() statement, will it use now() from when I added the query to my callbacks, or from when the query is fired?
You can get the ID immediately after inserting a new row for later use; a similar question was answered here: The equivalent of SQLServer function SCOPE_IDENTITY() in mySQL?
It will use now() when the query is fired.
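To make that concrete, here is a sketch under my own naming assumptions (using twisted.enterprise.adbapi's runInteraction rather than the asker's exact schema): the insert and the ID lookup run on one connection, and inlineCallbacks lets createSession's Deferred fire with a plain integer sid, so callers attach a single callback (or yield) instead of re-running the SELECT:

from twisted.internet import defer

@defer.inlineCallbacks
def createSession(self, peerIP, peerPort, hostIP, hostPort):
    SessionUUID = uuid.uuid1().hex

    def txn(cursor):
        # Simplified insert; same connection, so lastrowid / LAST_INSERT_ID() is reliable here.
        cursor.execute("INSERT INTO sessions (sid, starttime, ip) "
                       "VALUES (%s, now(), %s)", (SessionUUID, peerIP))
        return cursor.lastrowid

    sid = yield self.db.runInteraction(txn)
    defer.returnValue(sid)  # the Deferred fires with an integer sid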

Trying to use Tweepy/Twitters Streaming API and psycopg2 to populate a PostgreSQL database. Very close, one line off

I've been working on trying to populate a table in a PostreSQL database using Tweepy and Twitter's Streaming API. I'm extremely close, I believe I'm just one line away from getting it. I've looked at many examples including:
http://andrewbrobinson.com/2011/07/15/using-tweepy-to-access-the-twitter-stream/
http://blog.creapptives.com/post/14062057061/the-key-value-store-everyone-ignored-postgresql
Python tweepy writing to sqlite3 db
tweepy stream to sqlite database - invalid synatx
Using tweepy to access Twitter's Streaming API
etc, etc
I'm at the point where I can stream tweets quite easily using Tweepy, so I know my consumer key, consumer secret, access key, and access secret are correct. I also have Postgres set up and am successfully connecting to the database I created. I tested hard-coded values into the table in my database using psycopg2 from a .py file, and that is also working. I am getting tweets streamed in based on keywords I select, and am successfully connected to a table in a database. Now I just need the tweets to stream into the table in my Postgres database. Like I said, I am so close, and any help would be greatly appreciated.
This stripped down script inserts data into my desired table:
import psycopg2

try:
    conn = psycopg2.connect("dbname=teststreamtweets user=postgres password=x host=localhost")
    print "connected"
except:
    print "unable to connect"

namedict = (
    {"first_name": "Joshua", "last_name": "Drake"},
    {"first_name": "Steven", "last_name": "Foo"},
    {"first_name": "David", "last_name": "Bar"}
)

cur = conn.cursor()
cur.executemany("""INSERT INTO testdata(first_name, last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
conn.commit()
Below is the script I have been editing for a while now trying to get it to work:
import psycopg2
import time
import json
from getpass import getpass
import tweepy

consumer_key = 'x'
consumer_secret = 'x'
access_key = 'x'
access_secret = 'x'

connection = psycopg2.connect("dbname=teststreamtweets user=postgres password=x host=localhost")
cursor = connection.cursor()

# always use this step to begin clean
def reset_cursor():
    cursor = connection.cursor()

class StreamWatcherListener(tweepy.StreamListener):
    def on_data(self, data):
        try:
            print 'before cursor' + data
            connection = psycopg2.connect("dbname=teststreamtweets user=postgres password=x host=localhost")
            cur = connection.cursor()
            print 'status is: ' + str(connection.status)
            #cur.execute("INSERT INTO tweet_list VALUES (%s)" % (data.text))
            cur.executemany("""INSERT INTO tweets(tweet) VALUES (%(text)s)""", data)
            connection.commit()
            print '---------'
            print type(data)
            #print data
        except Exception as e:
            connection.rollback()
            reset_cursor()
            print "not saving"
            return
        if cursor.lastrowid == None:
            print "Unable to save"

    def on_error(self, status_code):
        print 'Error code = %s' % status_code
        return True

    def on_timeout(self):
        print 'timed out.....'

print 'welcome'
auth1 = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth1.set_access_token(access_key, access_secret)
api = tweepy.API(auth1)
l = StreamWatcherListener()
print 'about to stream'
stream = tweepy.Stream(auth=auth1, listener=l)
setTerms = ['microsoft']
#stream.sample()
stream.filter(track=setTerms)
Sorry if the code is a bit messy; I have been trying many options. Like I said, any suggestions, links to helpful examples, etc. would be greatly appreciated, as I've tried everything I can think of and am now resorting to a long walk. Thanks a ton.
Well, I'm not sure why you are using classes for this, and then why you don't have __init__ defined in your class. Seems complicated.
Here is a basic version of the functions I use to do this stuff. I've only ever used sqlite for it, but the syntax looks basically the same. Maybe you can get something from this.
def retrieve_tweets(numtweets=10, *args):
    """
    This function optionally takes one or more arguments as keywords to filter tweets.
    It iterates through tweets from the stream that meet the given criteria and sends them
    to the database population function on a per-instance basis, so as to avoid disaster
    if the stream is disconnected.
    Both SampleStream and FilterStream methods access Twitter's stream of status elements.
    """
    filters = []
    for key in args:
        filters.append(str(key))
    if len(filters) == 0:
        stream = tweetstream.SampleStream(username, password)
    else:
        stream = tweetstream.FilterStream(username, password, track=filters)
    try:
        count = 0
        while count < numtweets:
            for tweet in stream:
                # a check is needed on text as some "tweets" are actually just API operations
                # the language selection doesn't really work but it's better than nothing(?)
                if tweet.get('text') and tweet['user']['lang'] == 'en':
                    if tweet['retweet_count'] == 0:
                        # bundle up the features I want and send them to the db population function
                        bundle = (tweet['id'], tweet['user']['screen_name'],
                                  tweet['retweet_count'], tweet['text'])
                        db_initpop(bundle)
                        break
                    else:
                        # a RT has a different structure. This bundles the original tweet. Getting the
                        # retweets comes later, after the stream is de-accessed.
                        bundle = (tweet['retweeted_status']['id'],
                                  tweet['retweeted_status']['user']['screen_name'],
                                  tweet['retweet_count'], tweet['retweeted_status']['text'])
                        db_initpop(bundle)
                        break
            count += 1
    except tweetstream.ConnectionError, e:
        print 'Disconnected from Twitter at ' + time.strftime("%d %b %Y %H:%M:%S", time.localtime()) \
              + '. Reason: ', e.reason

def db_initpop(bundle):
    """
    This function places basic tweet features in the database. Note the placeholder values:
    these can act as a check to verify that no further expansion was available for that method.
    """
    # unpack the bundle
    tweet_id, user_sn, retweet_count, tweet_text = bundle
    curs.execute("""INSERT INTO tblTweets VALUES (null,?,?,?,?,?,?)""",
                 (tweet_id, user_sn, retweet_count, tweet_text, 'cleaned text', 'cleaned retweet text'))
    conn.commit()
    print 'Database populated with tweet ' + str(tweet_id) + ' at ' + time.strftime("%d %b %Y %H:%M:%S", time.localtime())
Good luck!
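For completeness, my own reading of the failing line in the question (not stated in the answer above): on_data receives one status as a raw JSON string, so executemany ends up iterating over its characters. Parsing the JSON first and inserting with a single parameterized execute should supply the missing line:

status = json.loads(data)  # data is one JSON-encoded status
if 'text' in status:       # skip stream events that aren't tweets
    cur.execute("INSERT INTO tweets (tweet) VALUES (%s)", (status['text'],))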
