I want to ask whether I can update a parameter in an SQL query using Python. I read a SQL query from a file and process its output with Python. However, the filter value is hard-coded in the SQL query, and I wonder if there is a way to set that parameter from Python instead of editing the SQL file each time.
The SQL query is like the following:
set nocount on;
declare @pdate datetime;
set @pdate = '2022-12-31';
select
cast(L.Date as datetime) as Date
, Amount
, AccountNumber
, Property
, County
, ZipCode
, Price
, Owner
from Account.Detail L
inner join Owner.Detail M
on L.Date = M.Date
and L.Number = M.Number
inner join Purchase.Detail P
on L.Date = P.Date
and L.Purchase.Number = P.Purchase.Number
where L.Date = @pdate
and Purchase.Number not in ('CL1', 'CL2')
and Amount > 0
And I want to run Python code like the following:
import pyodbc
import pandas as pd

server = 'my_server_name'
database = 'my_database_name'
connection = pyodbc.connect(Trusted_Connection="yes", DRIVER="{SQL Server}", SERVER=server, DATABASE=database)
cursor = connection.cursor()
with open('Pathway_for_SQL_Query.sql') as f:
    query = f.read()
data = pd.read_sql(query, connection)
connection.close()
Currently I need to declare @pdate in the SQL query every time; can I set @pdate from Python instead?
Instead of parsing and replacing an SQL script, you could use bind variables and have Python control the value (note the "?" in the query):
pdate = "some value"
# query could be read from file, given here for simplicity
query = """
select
cast(L.Date as datetime) as Date
, Amount
, AccountNumber
, Property
, County
, ZipCode
, Price
, Owner
from Account.Detail L
inner join Owner.Detail M
on L.Date = M.Date
and L.Number = M.Number
inner join Purchase.Detail P
on L.Date = P.Date
and L.Purchase.Number = P.Purchase.Number
where L.Date = ?
and Purchase.Number not in ('CL1', 'CL2')
and Amount > 0
"""
data = pd.read_sql(query, connection, params=(pdate,))
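The same bind-parameter pattern can be exercised end to end with the stdlib sqlite3 driver, since pyodbc follows the same DB-API "?" placeholder convention (the table and values below are made up purely for the demonstration):

```python
import sqlite3

# Minimal demonstration of "?" bind parameters; pyodbc accepts the same
# placeholder style, so the technique carries over to SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO detail VALUES (?, ?)",
    [("2022-12-31", 100.0), ("2023-01-31", 50.0)],
)

pdate = "2022-12-31"  # the value Python now controls instead of the SQL script
rows = conn.execute(
    "SELECT date, amount FROM detail WHERE date = ? AND amount > 0",
    (pdate,),
).fetchall()
print(rows)  # [('2022-12-31', 100.0)]
conn.close()
```

The driver sends the value separately from the SQL text, so no quoting or string formatting is needed on the Python side.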
It's not straightforward to find information on this, so I'm wondering if there are some docs I can look at. Basically, I want to pass multiple conditions to either .where() or .order_by() in a way that is safe from SQL injection.
Here's how I am currently doing this: there are two tables, Archive and Backup. I am filtering by archive.city, archive.zip, and backup.serial, and then ordering by those fields. The values come from the user via URL parameters, so I need to make sure they are sanitized and safe from SQL injection.
filters = []
sorts = []
if 'city' in query:
    city = query['city']
    filters.append(text(f'archive.city = {city}'))
    sorts.append(text(f'archive.city = {city}'))
if 'zip' in query:
    zip = query['zip']
    filters.append(text(f'archive.zip > {zip}'))
    sorts.append(text('archive.zip DESC'))
if 'serial' in query:
    serial = query['serial']
    filters.append(text(f'backup.serial IN {serial}'))
    sorts.append(text('backup.serial ASC'))

with Session(engine) as session:
    results = session.exec(select(Archive, Backup)
                           .join(Backup)
                           .where(and_(*filters))
                           .order_by(*sorts)).all()
As I understand it, text() with interpolated values is not safe from SQL injection, so how do I transform this so that it does what I want and is safe?
You can invoke .where() and .order_by() on a select() multiple times and SQLAlchemy will logically "and" them for you:
qry = select(Task)
qry = qry.where(Task.description == "foo")
qry = qry.where(Task.priority < 2)
qry = qry.order_by(Task.priority)
qry = qry.order_by(Task.description)
print(qry)
"""
SELECT task.id, task.description, task.priority
FROM task
WHERE task.description = :description_1 AND task.priority < :priority_1
ORDER BY task.priority, task.description
"""
How can I bind a single variable to more than one filter in raw SQL in Django?
What I want:
I defined the variable: date
I want to use the same value in both SQL WHERE clauses: CUR_DATE and ACHIEVED_DATE
def kpiMonitoring(self):
    with connections['db'].cursor() as cursor:
        import time
        date = time.strftime('%Y%m%d')
        cursor.execute("""
            SELECT * FROM TABLE1
            WHERE CUR_DATE = date
            AND ACHIEVED_DATE = date
        """, [date])
        row = dictfetchall(cursor)
        cursor.close()
        return row
I can do it this way, but this solution is not scalable:
def kpiMonitoring(self):
    with connections['db'].cursor() as cursor:
        import time
        date = time.strftime('%Y%m%d')
        date2 = date
        cursor.execute("""
            SELECT * FROM TABLE1
            WHERE CUR_DATE = %s
            AND ACHIEVED_DATE = %s
        """, [date, date2])
        row = dictfetchall(cursor)
        cursor.close()
        return row
Is there another way to do it?
You can perform such query with a named parameter:
cursor.execute(
'''SELECT * FROM TABLE1
WHERE CUR_DATE = %(date)s
AND ACHIEVED_DATE = %(date)s''',
{'date': date }
)
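The single-named-parameter idea can be seen end to end with the stdlib sqlite3 driver, which uses :name placeholders instead of %(name)s, but reuses a named value the same way:

```python
import sqlite3

# One named parameter referenced twice in the statement; the driver
# substitutes the same bound value in both places.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cur_date TEXT, achieved_date TEXT)")
conn.execute("INSERT INTO table1 VALUES ('20230101', '20230101')")

date = "20230101"
rows = conn.execute(
    "SELECT * FROM table1 WHERE cur_date = :date AND achieved_date = :date",
    {"date": date},
).fetchall()
print(rows)  # [('20230101', '20230101')]
conn.close()
```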
Another solution is to compare ACHIEVED_DATE against CUR_DATE directly in SQL, so the parameter is only used once:
cursor.execute(
'''SELECT * FROM TABLE1
WHERE CUR_DATE = %s
AND ACHIEVED_DATE = CUR_DATE''',
[date]
)
But regardless, using raw queries is usually not a good idea. As you found out yourself, it often does not scale well (expressing things like LEFT OUTER JOINs is usually shorter with the Django ORM), and the ORM is typically less error-prone and less sensitive to database migrations.
I have a permanent table in BigQuery that I want to append to with data coming from a CSV in Google Cloud Storage. I first read the CSV file into a BigQuery temp table:
table_id = "incremental_custs"
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
"gs://location/to/csv/customers_5083983446185_test.csv"
]
external_config.schema=schema
external_config.options.skip_leading_rows = 1
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})
sql_test = "SELECT * FROM `{table_id}`;".format(table_id=table_id)
query_job = bq_client.query(sql_test,job_config=job_config)
customer_updates = query_job.result()
print(customer_updates.total_rows)
Up until here everything works and I can retrieve the records from the temp table. The issue arises when I try to combine it with the permanent table:
sql = """
create table `{project_id}.{dataset}.{table_new}` as (
select customer_id, email, accepts_marketing, first_name, last_name,phone,updated_at,orders_count,state,
total_spent,last_order_name,tags,ll_email,points_approved,points_spent,guest,enrolled_at,ll_updated_at,referral_id,
referred_by,referral_url,loyalty_tier_membership,insights_segment,rewards_claimed
from (
select * from `{project_id}.{dataset}.{old_table}`
union all
select * from `{table_id}`
ORDER BY customer_id, orders_count DESC
))
order by orders_count desc
""".format(project_id=project_id, dataset=dataset_id, table_new=table_new, old_table=old_table, table_id=table_id)
query_job = bq_client.query(sql)
query_result = query_job.result()
I get the following error:
BadRequest: 400 Table name "incremental_custs" missing dataset while no default dataset is set in the request.
Am I missing something here? Thanks !
Arf, you forgot the external config! You don't pass it in your second script:
query_job = bq_client.query(sql)
Simply pass the job_config like in the first one:
query_job = bq_client.query(sql, job_config=job_config)
A fresh look is always easier!
Goal
I am aiming to insert database records into MySQL using Python. But with an extra detail, I'll explain as I go along..
This is my current script (Fully functional & working):
#Get data from SQL
sqlCursor = mjmConnection.cursor()
sqlCursor.execute("SELECT sol.id, p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name, sol.requiredQty \
FROM salesorderline sol JOIN \
salesorder so \
ON sol.salesorderid = so.id JOIN \
product p \
ON sol.productid = p.id JOIN \
customer c \
ON so.customerid = c.id \
WHERE so.orderdate > DATEADD(dd,-35,CAST(GETDATE() AS date));")
# Send data received from the SQL query above to the MySQL database
print("Sending MJM records to MySQL Database")
mjmCursorMysql = productionConnection.cursor()
for x in sqlCursor.fetchall():
a,b,c,d,e,f,g,h,i = x
mjmCursorMysql.execute("INSERT ignore INTO mjm_python (id, product_id, product_code, product_description, product_weight, \
salesorder_number, customer_code, customer_name, requiredQty) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s);", (a,b,c,d,e,f,g,h,i))
productionConnection.commit()
mjmCursorMysql.close()
sqlCursor.close()
What it does
The above script does the following:
Gets data from SQL Server
Inserts that data into MySQL
I have specifically used IGNORE in the MySQL query, to prevent duplicate id numbers.
Data will look like this:
Next..
Now I'd like to add a column named sales_id_increment. It should start at 1, increment for each row with the same salesorder_number, and reset back to 1 when the salesorder_number changes. I want it to look something like this:
Question
How do I achieve this? Where do I need to look, in my Python script or the MySQL query?
You can get this column when you select the rows from SQL Server with window functions ROW_NUMBER() or DENSE_RANK() (if there are duplicate ids):
SELECT sol.id, p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name, sol.requiredQty,
ROW_NUMBER() OVER (PARTITION BY so.number ORDER BY sol.id) sales_id_increment
FROM salesorderline sol
JOIN salesorder so ON sol.salesorderid = so.id
JOIN product p ON sol.productid = p.id
JOIN customer c ON so.customerid = c.id
WHERE so.orderdate > DATEADD(dd,-35,CAST(GETDATE() AS date));
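If you'd rather compute the counter on the Python side instead, the same PARTITION BY so.number ORDER BY sol.id logic can be sketched with itertools.groupby; the sample (id, salesorder_number) tuples below are made-up stand-ins for what sqlCursor.fetchall() would return:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (sol.id, salesorder_number) pairs standing in for fetched rows.
rows = [
    (1, "SO-100"), (2, "SO-100"), (3, "SO-101"), (4, "SO-101"), (5, "SO-101"),
]

rows.sort(key=itemgetter(1, 0))  # order by salesorder_number, then id
numbered = []
for _, group in groupby(rows, key=itemgetter(1)):
    # restart the counter at 1 for each distinct salesorder_number
    for n, row in enumerate(group, start=1):
        numbered.append(row + (n,))  # append sales_id_increment

print(numbered)
# [(1, 'SO-100', 1), (2, 'SO-100', 2), (3, 'SO-101', 1), (4, 'SO-101', 2), (5, 'SO-101', 3)]
```

Doing it in SQL as the answer shows is simpler here, since the extra column just becomes a tenth %s in the INSERT; the Python version is only worth it if the counter logic grows beyond what a window function expresses.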
In trying to replicate a MySQL query in SQL Alchemy, I've hit a snag in specifying which tables to select from.
The query that works is
SELECT c.*
FROM attacks AS a INNER JOIN hosts h ON a.host_id = h.id
INNER JOIN cities c ON h.city_id = c.id
GROUP BY c.id;
I try to accomplish this in SQLAlchemy using the following function
def all_cities():
    session = connection.globe.get_session()
    destination_city = aliased(City, name='destination_city')
    query = session.query(City). \
        select_from(Attack). \
        join((Host, Attack.host_id == Host.id)). \
        join((destination_city, Host.city_id == destination_city.id)). \
        group_by(destination_city.id)
    print query
    results = [result.serialize() for result in query]
    session.close()
    file(os.path.join(os.path.dirname(__file__), "servers.geojson"), 'a').write(geojson.feature_collection(results))
When printing the query, I end up with ALMOST the right query
SELECT
cities.id AS cities_id,
cities.country_id AS cities_country_id,
cities.province AS cities_province,
cities.latitude AS cities_latitude,
cities.longitude AS cities_longitude,
cities.name AS cities_name
FROM cities, attacks
INNER JOIN hosts ON attacks.host_id = hosts.id
INNER JOIN cities AS destination_city ON hosts.city_id = destination_city.id
GROUP BY destination_city.id
However, you will note that it is selecting from cities, attacks...
How can I get it to select only from the attacks table?
The line here:
query = session.query(City)
is querying the City table as well, which is why the cities table shows up on its own and you get
FROM cities, attacks
Query the destination_city alias instead of City, so the only cities reference left in the statement is the one joined in from attacks.
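A sketch of the fix, assuming minimal mappings equivalent to the ones implied by the question (the real models have more columns):

```python
from sqlalchemy import Column, Integer, ForeignKey, create_engine
from sqlalchemy.orm import declarative_base, aliased, Session

Base = declarative_base()

class Attack(Base):
    __tablename__ = "attacks"
    id = Column(Integer, primary_key=True)
    host_id = Column(Integer, ForeignKey("hosts.id"))

class Host(Base):
    __tablename__ = "hosts"
    id = Column(Integer, primary_key=True)
    city_id = Column(Integer, ForeignKey("cities.id"))

class City(Base):
    __tablename__ = "cities"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
session = Session(engine)

# Query the alias itself so SQLAlchemy has no separate City entity
# to add to the FROM list; everything hangs off the joins from attacks.
destination_city = aliased(City, name="destination_city")
query = (
    session.query(destination_city)
    .select_from(Attack)
    .join(Host, Attack.host_id == Host.id)
    .join(destination_city, Host.city_id == destination_city.id)
    .group_by(destination_city.id)
)
print(query)  # FROM attacks JOIN hosts ... JOIN cities AS destination_city ...
```

The rendered SQL now starts its FROM clause at attacks, with cities appearing only as the joined destination_city alias, matching the hand-written MySQL query.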