Having trouble submitting a query that has > and < signs to Bigquery API - python

Our environment uses Python 2.7 along with BigQuery library 0.27.0.
The query that is to be submitted is part of a JSON string which is loaded by
json.loads(json_blob)
then the value for query is extracted from a key:
query_str = json_blob["sql_command"]
Printing the query_str gives the following value:
('query_str: ', '" select distinct id from my_table where step_count > 3 and lower(name) = \'test\') "')
When the script submits the query for execution as follows:
job = self.bq_client.run_async_query(job_id, query_str, udf_resources=udf_obj, query_parameters=query_params)
The BigQuery job comes back with an error, and when I look up the job information using the job_id on https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get I see that the query actually executed is as follows:
select distinct id from my_table where step_count \u003e 3 and lower(name) = 'test')
I have read up on encoding/decoding and tried both, but it doesn't make a difference.
Is there a way to convert the query_str retrieved from the json_blob (query_str = json_blob["sql_command"]) to a true string? We know that when we define such a query as a string (hard-coded in the script rather than retrieved from a key in a JSON blob) the query executes successfully, e.g.
query_str = """select distinct id from my_table where step_count > 3 and lower(name) = 'test')"""
Any suggestions are greatly appreciated.

I'm not able to reproduce this using Python 2.7:
$ python
>>> from google.cloud import bigquery
>>> import json
>>> client = bigquery.Client('<project name redacted>')
>>> json_blob = json.loads('{"sql_command":"select distinct id from my_table where step_count > 3 and lower(name) = \'test\'"}')
>>> query_str = json_blob["sql_command"]
>>> query_job = client.query(query_str)
>>> rows = query_job.result()
I get an error that my_table can't be resolved (as expected), but no syntax error. Syntax validation happens prior to table resolution. Checking the job information, I see:
$ bq --format=prettyjson show -j <job id>
{
  "configuration": {
    "query": {
      "priority": "INTERACTIVE",
      "query": "select distinct id from my_table where step_count > 3 and lower(name) = 'test'",
      "useLegacySql": false
    }
  },
so there was no problem with passing the query to BigQuery. Some suggestions:
Check the code points in the string; maybe it doesn't have the content that you think it does:
print [ord(c) for c in query_str]
The code point for the greater than sign is 62, for example, so you should see it in the output.
Pick a different serialization format rather than JSON and see if that affects the result. Maybe something else in your process is performing escaping without you realizing it, and you can identify the source of the problem by using e.g. protocol buffers instead and seeing if that makes a difference.
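The code point check above can be sketched as follows. This is a minimal illustration, not a diagnosis: it only shows that \u003e is the JSON escape for the greater-than sign (so it decodes back to ">"), and that a value may carry an extra pair of literal double quotes, as the printed query_str in the question appears to. The describe helper and the sample blob are hypothetical.

```python
import json

# The escaped form seen in the job metadata, wrapped as a JSON string literal.
escaped = '"select distinct id from my_table where step_count \\u003e 3"'
decoded = json.loads(escaped)

# \u003e is the JSON escape for '>' (code point 62), so decoding restores it.
code_points = [ord(c) for c in decoded]


def describe(query_str):
    """Return facts that help spot unexpected characters (hypothetical helper)."""
    return {
        "has_gt": ">" in query_str,
        "wrapped_in_quotes": query_str.strip().startswith('"'),
    }


# Hypothetical blob whose SQL text is itself wrapped in literal double quotes.
blob = json.loads('{"sql_command": "\\" select 1 where 2 > 1 \\""}')
info = describe(blob["sql_command"])
print(decoded)
print(info)
```

If wrapped_in_quotes comes back true for the real blob, the extra quotes (rather than the greater-than sign) would be worth ruling out first.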

Related

Problem querying AWS Athena from Lambda introducing a variable

I need help on a little problem that I have with my AWS Lambda function. This function queries my AWS Athena database.
The code looks like this :
import json
import boto3
import time
def lambda_handler(event, context):
    client = boto3.client('athena')
    QueryResponse = client.start_query_execution(
        QueryString = "MY QUERY;",
        QueryExecutionContext = {
            'Database' : 'myDatabase'
        },
        ResultConfiguration = {
            'OutputLocation' : 's3://mys3Bucket'
        }
    )
    # Observe results:
    queryId = QueryResponse['QueryExecutionId']
The code works great, but I am having some trouble with the "WHERE" part of my SQL query (which is a long one).
Here is that part of my query:
WHERE x.id_date > cast(date_format(date_trunc('day', current_timestamp -
interval '3' day), '%Y%m%d') as integer)
and x.id_date <= cast(date_format(current_timestamp, '%Y%m%d') as integer)
and c.label = 'NAME'
My query is written on a single line so that it fits in the Python code in place of "MY QUERY".
The problem is:
I need to replace the 'NAME' part with a variable (a string) that will be passed to my Lambda. I tried using %s for the substitution, but because the query also contains '%Y%m%d', Python expects values for those placeholders too, even though they are only there to format the date. When I hard-code a string in place of NAME the query works perfectly, so I know the query itself is not the problem. I also tried putting c.label = '%s' first, hoping the % operator would replace only the first %s and leave the rest alone, but that didn't work.
So my question is: how can I replace 'NAME' with a str variable? Can I do this while keeping my query on a single line (and if so, how)? Or, failing that, how can I split my query across several lines that I can work with?
Thanks for your help.
As said in the comments, the solution was to use an f-string:
MyString = 'my string to replace in query'
QueryString = f"SELECT * FROM {MyString};"
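To show why this avoids the clash with '%Y%m%d', here is a small sketch. It assumes a hypothetical table t and shortened WHERE clause; only the substitution mechanics matter. Brace-style formatting (str.format or an f-string) ignores percent signs, while the % operator needs literal percent signs doubled. Adjacent string literals also let the query span several source lines.

```python
def build_query(label):
    """Fill only the {label} slot; '%Y%m%d' is left untouched by str.format."""
    template = (
        "SELECT * FROM t x "
        "WHERE x.id_date <= cast(date_format(current_timestamp, '%Y%m%d') as integer) "
        "AND x.label = '{label}';"
    )
    return template.format(label=label)


def build_query_percent(label):
    """Same result with the % operator: literal percent signs are doubled."""
    template = (
        "SELECT * FROM t x "
        "WHERE x.id_date <= cast(date_format(current_timestamp, '%%Y%%m%%d') as integer) "
        "AND x.label = '%s';"
    )
    return template % label


query = build_query("NAME")
print(query)
```

Both variants paste the value straight into the SQL text, so they are only reasonable when the label comes from a trusted source.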

Compile query from raw string (without using .text(...)) using Sqlalchemy connection and Postgres

I am using Sqlalchemy 1.3 to connect to a PostgreSQL 9.6 database (through Psycopg).
I have a very, very raw Sql string formatted using Psycopg2 syntax which I can not modify because of some legacy issues:
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
Notice the %(user_id)s
I can happily execute that using a sqlalchemy connection just by doing:
connection = sqlalch_engine.connect()
rows = connection.execute(statement_str, user_id=self.user_id)
And it works fine. I get my user and all is nice and good.
Now, for debugging purposes I'd like to get the actual query with the %(user_id)s argument expanded to the actual value. For instance: If user_id = "foo", then get SELECT * FROM users WHERE user_id = 'foo'
I've seen tons of examples using sqlalchemy.text(...) to produce a statement and then get a compiled version. Thanks to other answers like this one or this one, I have been able to produce a decent str when I have an SqlAlchemy query.
However, in this particular case, since I'm using the more cursor-specific syntax %(user_id)s, I can't do that. If I try:
text(statement_str).bindparams(user_id="foo")
I get:
This text() construct doesn't define a bound parameter named 'user_id'
So I guess what I'm looking for would be something like
conn.compile(statement_str, user_id=self.user_id)
But I haven't been able to get that.
Not sure if this is what you want, but here goes.
Assuming statement_str is actually a string:
import sqlalchemy as sa
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
params = {'user_id': 'foo'}
query_text = sa.text(statement_str % params)
# str(query_text) should print "SELECT * FROM users WHERE user_id=foo" (note: the value is not quoted)
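A short sketch of what plain % interpolation produces, next to a debug-only rendering that quotes string values. The debug_render helper is hypothetical and meant only for log output under the assumption that values are strings or numbers; it is not a substitute for the driver's own parameter handling.

```python
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
params = {"user_id": "foo"}

# Plain interpolation: the value is pasted in without SQL quoting.
plain = statement_str % params


def debug_render(statement, values):
    """Debug-only rendering (hypothetical helper): quote strings as SQL literals."""
    def quote(value):
        if isinstance(value, str):
            # Double embedded single quotes, then wrap in single quotes.
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    return statement % {key: quote(val) for key, val in values.items()}


rendered = debug_render(statement_str, params)
print(plain)
print(rendered)
```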
Ok I think I got it.
The combination of SqlAlchemy's raw_connection + Psycopg's mogrify seems to be the answer.
import re
conn = sqlalch_engine.raw_connection()
try:
    cursor = conn.cursor()
    s_str = cursor.mogrify(statement_str, {'user_id': self.user_id})
    s_str = s_str.decode("utf-8")  # mogrify returns bytes
    # Some cleanup for niceness:
    s_str = s_str.replace('\n', ' ')
    s_str = re.sub(r'\s{2,}', ' ', s_str)
finally:
    conn.close()
I hope someone else finds this helpful

executing a raw sql query from sqlalchemy on postgresql

I have a raw sql query which is:
select distinct(user_id) from details_table where event_id in (29,10) and user_id in (7,11,24,45) and epoch_timestamp >= 1433116800 and epoch_timestamp <= 1506816000;
which in psql returns:
user_id
---------
7
24
(2 rows)
Now when I run this raw SQL query via SQLAlchemy, I get a sqlalchemy.engine.result.ResultProxy object in response and not the result shown above. The code I'm using right now is as follows:
from flask import current_app
from sqlalchemy import text
sql_query = text("select distinct(user_id) from details_table where event_id in (29,10) and user_id in (7,24) and epoch_timestamp >= 1433116800 and epoch_timestamp <= 1506816000;")
filtering_users = db.get_engine(current_app, bind='<my_binding>')\
.execute(sql_query)
print(type(filtering_users))
# <class 'sqlalchemy.engine.result.ResultProxy'>
print(filtering_users)
# <sqlalchemy.engine.result.ResultProxy object at 0x7fde74469550>
I used the reference from here but unlike the solution there I'm getting a ResultProxy object.
What am I doing wrong here? My end goal is to get the list of users returned from executing the raw sql-query, stored into a list.
As explained in the SQLAlchemy documentation, the .execute() method returns only a proxy, which you have to iterate over (or call a fetch method on) to see the actual result of the query. In your case, what you want is apparently the .fetchall() method.
If you try something like this:
from sqlalchemy import create_engine
engine = create_engine('/path/to/your/db...')
connection = engine.connect()
my_query = 'SELECT * FROM my_table'
results = connection.execute(my_query).fetchall()
the results variable will be a list of all the items that the query fetches.
Hope this helps!
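The same idea can be tried without a database server. The sketch below swaps in the standard-library sqlite3 module as a stand-in for the SQLAlchemy engine (its cursor plays the role of the ResultProxy); the table contents are made up for illustration. It shows that fetchall() yields a list of row tuples, which still needs flattening to get a plain list of user ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details_table (user_id INTEGER, event_id INTEGER)")
conn.executemany(
    "INSERT INTO details_table VALUES (?, ?)",
    [(7, 29), (24, 10), (7, 10), (11, 3)],
)

# execute() hands back a cursor, not the rows themselves.
cursor = conn.execute(
    "SELECT DISTINCT user_id FROM details_table "
    "WHERE event_id IN (29, 10) ORDER BY user_id"
)
rows = cursor.fetchall()             # list of 1-tuples
user_ids = [row[0] for row in rows]  # flatten to plain values
print(user_ids)
conn.close()
```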

AWS DynamoDB Python - boto3 Key() methods not recognized (Query)

I am using Lambda (Python) to query my DynamoDB database. I am using the boto3 library, and I was able to make an "equivalent" query:
This script works:
import boto3
from boto3.dynamodb.conditions import Key, Attr
import json
def create_list(event, context):
    resource = boto3.resource('dynamodb')
    table = resource.Table('Table_Name')
    response = table.query(
        TableName='Table_Name',
        IndexName='Custom-Index-Name',
        KeyConditionExpression=Key('Number_Attribute').eq(0)
    )
    return response
However, when I change the query expression to this:
KeyConditionExpression=Key('Number_Attribute').gt(0)
I get the error:
"errorType": "ClientError",
"errorMessage": "An error occurred (ValidationException) when calling the Query operation: Query key condition not supported"
According to this [1] resource, "gt" is a method of Key(). Does anyone know if this library has been updated, or what other methods are available other than "eq"?
[1] http://boto3.readthedocs.io/en/latest/reference/customizations/dynamodb.html#ref-dynamodb-conditions
---------EDIT----------
I also just tried the old method using:
response = client.query(
    TableName = 'Table_Name',
    IndexName='Custom_Index',
    KeyConditions = {
        'Custom_Number_Attribute':{
            'ComparisonOperator':'EQ',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
This worked, but when I try:
response = client.query(
    TableName = 'Table_Name',
    IndexName='Custom_Index',
    KeyConditions = {
        'Custom_Number_Attribute':{
            'ComparisonOperator':'GT',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
...it does not work.
Why would EQ be the only method working in these cases? I'm not sure what I'm missing in the documentation.
From what I think:
Your partition key is Number_Attribute, so you cannot use gt on it in a query (on a partition key you can only use eq).
You can use gt or between on your sort key in a query. The sort key is also called the range key, and because DynamoDB "smartly" stores items with the same partition key next to each other in sort-key order, gt and between can be evaluated efficiently in a query.
Now, if you want to apply a gt or between condition to your partition key, you will have to use scan, like this:
fe = Key('Number_Attribute').gt(0)
response = table.scan(
    FilterExpression=fe
)
Keep the following in mind concerning scan:
The scan method reads every item in the entire table, and returns all of the data in the table. You can provide an optional filter_expression, so that only the items matching your criteria are returned. However, note that the filter is only applied after the entire table has been scanned.
In other words, it's a somewhat costly operation compared to query. You can see an example in the documentation here.
Hope that helps!
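The difference between the two access patterns can be pictured with a toy in-memory model. This is only an illustration of the idea (equality picks a partition, the sort key allows a range lookup inside it); it is not how DynamoDB is implemented, and the table contents and function names are invented.

```python
from bisect import bisect_right

# Toy model: partition key -> items kept sorted by sort key.
table = {
    "user#1": [(1, "a"), (5, "b"), (9, "c")],
    "user#2": [(2, "d"), (7, "e")],
}


def query(partition_key, sort_key_gt):
    """Equality on the partition key, then a range lookup on the sort key."""
    items = table.get(partition_key, [])
    sort_keys = [sort_key for sort_key, _ in items]
    # Items are sorted, so everything after this index has a larger sort key.
    start = bisect_right(sort_keys, sort_key_gt)
    return items[start:]


def scan(sort_key_gt):
    """Visit every item in every partition, then filter."""
    return [
        item
        for items in table.values()
        for item in items
        if item[0] > sort_key_gt
    ]


print(query("user#1", 4))
print(sorted(scan(4)))
```

The query touches one partition only, while the scan has to look at every item before the filter is applied.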

How to execute raw SQL in Flask-SQLAlchemy app

How do you execute raw SQL in SQLAlchemy?
I have a python web app that runs on flask and interfaces to the database through SQLAlchemy.
I need a way to run the raw SQL. The query involves multiple table joins along with Inline views.
I've tried:
connection = db.session.connection()
connection.execute( <sql here> )
But I keep getting gateway errors.
Have you tried:
result = db.engine.execute("<sql here>")
or:
from sqlalchemy import text
sql = text('select name from penguins')
result = db.engine.execute(sql)
names = [row[0] for row in result]
print(names)
Note that db.engine.execute() is "connectionless" execution, which is deprecated in SQLAlchemy 1.4 and removed in 2.0.
SQL Alchemy session objects have their own execute method:
result = db.session.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})
All your application queries should be going through a session object, whether they're raw SQL or not. This ensures that the queries are properly managed by a transaction, which allows multiple queries in the same request to be committed or rolled back as a single unit. Going outside the transaction using the engine or the connection puts you at much greater risk of subtle, possibly hard to detect bugs that can leave you with corrupted data. Each request should be associated with only one transaction, and using db.session will ensure this is the case for your application.
Also take note that execute is designed for parameterized queries. Use parameters, like :val in the example, for any inputs to the query to protect yourself from SQL injection attacks. You can provide the value for these parameters by passing a dict as the second argument, where each key is the name of the parameter as it appears in the query. The exact syntax of the parameter itself may be different depending on your database, but all of the major relational databases support them in some form.
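As a runnable illustration of named parameters, the sketch below uses the standard-library sqlite3 module as a stand-in for the session (sqlite3 happens to accept the same :val placeholder style with a dict). The table and the find helper are made up. It shows that a value containing SQL fragments is compared as plain data rather than being spliced into the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_column TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)", [("5",), ("6",)])


def find(value):
    """Bind value to :val; it is never spliced into the SQL text."""
    cursor = conn.execute(
        "SELECT my_column FROM my_table WHERE my_column = :val",
        {"val": value},
    )
    return [row[0] for row in cursor.fetchall()]


matches = find("5")
# An injection attempt is just an unusual string that matches nothing.
injected = find("5' OR '1'='1")
print(matches, injected)
```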
Assuming it's a SELECT query, session.execute() will return an iterable of RowProxy objects.
You can access individual columns with a variety of techniques:
for r in result:
    print(r[0]) # Access by positional index
    print(r['my_column']) # Access by column name as a string
    r_dict = dict(r.items()) # convert to dict keyed by column names
Personally, I prefer to convert the results into namedtuples:
from collections import namedtuple
Record = namedtuple('Record', result.keys())
records = [Record(*r) for r in result.fetchall()]
for r in records:
    print(r.my_column)
    print(r)
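The namedtuple conversion can be tried on its own with the standard-library sqlite3 module standing in for the session. In this sketch cursor.description supplies the column names that result.keys() would provide in SQLAlchemy; the table and its single row are invented for the example.

```python
import sqlite3
from collections import namedtuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, my_column TEXT)")
conn.execute("INSERT INTO my_table VALUES (1, 'x')")

cursor = conn.execute("SELECT id, my_column FROM my_table")
# cursor.description plays the role of result.keys() here.
columns = [col[0] for col in cursor.description]
Record = namedtuple("Record", columns)
records = [Record(*row) for row in cursor.fetchall()]

print(records[0].my_column)
conn.close()
```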
If you're not using the Flask-SQLAlchemy extension, you can still easily use a session:
import sqlalchemy
from sqlalchemy.orm import sessionmaker, scoped_session
engine = sqlalchemy.create_engine('my connection string')
Session = scoped_session(sessionmaker(bind=engine))
s = Session()
result = s.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})
docs: SQL Expression Language Tutorial - Using Text
example:
from sqlalchemy.sql import text
connection = engine.connect()
# recommended
cmd = 'select * from Employees where EmployeeGroup = :group'
employeeGroup = 'Staff'
employees = connection.execute(text(cmd), group = employeeGroup)
# or - a wee bit more difficult to read
employeeGroup = 'Staff'
employees = connection.execute(
text('select * from Employees where EmployeeGroup = :group'),
group = employeeGroup)
# or - notice the requirement to quote 'Staff'
employees = connection.execute(
text("select * from Employees where EmployeeGroup = 'Staff'"))
for employee in employees: logger.debug(employee)
# output
(0, 'Tim', 'Gurra', 'Staff', '991-509-9284')
(1, 'Jim', 'Carey', 'Staff', '832-252-1910')
(2, 'Lee', 'Asher', 'Staff', '897-747-1564')
(3, 'Ben', 'Hayes', 'Staff', '584-255-2631')
You can get the results of SELECT SQL queries using from_statement() and text() as shown here. You don't have to deal with tuples this way. As an example for a class User having the table name users you can try,
from sqlalchemy.sql import text
user = session.query(User).from_statement(
text("""SELECT * FROM users where name=:name""")
).params(name="ed").all()
return user
For SQLAlchemy ≥ 1.4
Starting in SQLAlchemy 1.4, connectionless or implicit execution has been deprecated, i.e.
db.engine.execute(...) # DEPRECATED
as well as bare strings as queries.
The new API requires an explicit connection, e.g.
from sqlalchemy import text
with db.engine.connect() as connection:
    result = connection.execute(text("SELECT * FROM ..."))
    for row in result:
        # ...
Similarly, it’s encouraged to use an existing Session if one is available:
result = session.execute(sqlalchemy.text("SELECT * FROM ..."))
or using parameters:
session.execute(sqlalchemy.text("SELECT * FROM a_table WHERE a_column = :val"),
{'val': 5})
See "Connectionless Execution, Implicit Execution" in the documentation for more details.
result = db.engine.execute(text("<sql here>"))
executes the <sql here> but doesn't commit it unless you're on autocommit mode. So, inserts and updates wouldn't reflect in the database.
To commit after the changes, do
result = db.engine.execute(text("<sql here>").execution_options(autocommit=True))
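The effect of a missing commit can be observed with two connections to the same database. The sketch below uses the standard-library sqlite3 module and a temporary file purely as a stand-in; it does not use SQLAlchemy's autocommit option, and the table name is made up. A second connection does not see the inserted row until the writer commits.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

reader = sqlite3.connect(path)


def count():
    """Row count as seen by the second connection."""
    return reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


writer.execute("INSERT INTO t VALUES (1)")  # opens an implicit transaction
before_commit = count()
writer.commit()
after_commit = count()

print(before_commit, after_commit)
writer.close()
reader.close()
```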
This is a simplified answer of how to run SQL query from Flask Shell
First, point Flask at your module. If your module/app is manage.py in the main folder and you are on a UNIX operating system, run:
export FLASK_APP=manage
Run Flask shell
flask shell
Import what we need:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app)
from sqlalchemy import text
Run your query:
result = db.engine.execute(text("<sql here>").execution_options(autocommit=True))
This uses the application's current database connection.
Flask-SQLAlchemy v: 3.0.x / SQLAlchemy v: 1.4
users = db.session.execute(db.select(User).order_by(User.title.desc()).limit(150)).scalars()
So for the latest stable version of Flask-SQLAlchemy, the documentation suggests using the session.execute() method in conjunction with db.select(Object).
Have you tried using connection.execute(text( <sql here> ), <bind params here> ) and bind parameters as described in the docs? This can help solve many parameter formatting and performance problems. Maybe the gateway error is a timeout? Bind parameters tend to make complex queries execute substantially faster.
If you want to avoid tuples, another way is by calling the first, one or all methods:
query = db.engine.execute("SELECT * FROM blogs "
"WHERE id = 1 ")
assert query.first().name == "Welcome to my blog"
