I want to look at data for the past 7 days, so I have generated these values:
from_date = str(date.today() - timedelta(7))
to_date = str(date.today())
This is so I can do a query using PyMySql such as:
data = """SELECT * FROM table WHERE date > 'from_date' AND date < 'to_date'"""
db_conn = pymysql.connect(host=xxx, user=xxx, password=xxx)
df = pd.read_sql(data, con=db_conn)
This doesn't work, and I've tried different quotation marks around from_date and to_date to try and do this. What is the best way to refer to this in PyMySql?
You need to pass variable values into the query through the query parameters:
data = """SELECT * FROM table WHERE date > %s AND date < %s"""
df = pd.read_sql(data, con=db_conn, params=[from_date, to_date])
Here %s in the query are positional placeholders.
If from_date and to_date are date objects rather than strings, you might also need to format them:
df = pd.read_sql(data, con=db_conn,
params=[from_date.strftime('%Y-%m-%d'),
to_date.strftime('%Y-%m-%d')])
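As a minimal sketch of the idea above, the query text and its parameters can be built together and handed to pd.read_sql as a pair. The helper name last_n_days_query, the table name my_table, and the fixed "today" value are assumptions made for illustration; they do not come from the original question.

```python
from datetime import date, timedelta


def last_n_days_query(n_days, today=None):
    """Return (sql, params) selecting rows dated within the last n_days."""
    today = today or date.today()
    start = today - timedelta(days=n_days)
    # %s is the positional placeholder style used by PyMySQL; the values
    # are sent separately and never pasted into the SQL text.
    sql = "SELECT * FROM my_table WHERE date > %s AND date < %s"
    params = [start.strftime("%Y-%m-%d"), today.strftime("%Y-%m-%d")]
    return sql, params


# A fixed date keeps the example reproducible; omit it to use date.today().
sql, params = last_n_days_query(7, today=date(2021, 5, 13))
print(sql)
print(params)
```

With an open connection, the pair would then be passed as pd.read_sql(sql, con=db_conn, params=params).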
Related
I'm trying to query a database using Python/Pandas. This will be a recurring request where I'd like to look back into a window of time that changes over time, so I'd like to use some smarts in how I do this.
In my SQLite query, if I say
WHERE table.date BETWEEN DATETIME('now', '-6 month') AND DATETIME('now')
I get the result I expect. But if I try to move those to variables, the resulting table comes up empty. I found out that the endDate variable does work but the startDate does not. Presumably I'm doing something wrong with the escapes around the apostrophes? Since the result is coming up empty it's like it's looking at DATETIME(\'now\') and not seeing the '-6 month' bit (comparing now vs. now which would be empty). Any ideas how I can pass this through to the query correctly using Python?
startDate = 'DATETIME(\'now\', \'-6 month\')'
endDate = 'DATETIME(\'now\')'
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN ? AND ?
'''
df = pd.read_sql_query(query, db, params=[startDate, endDate])
You can try with the string format as shown below,
startDate = "DATETIME('now', '-6 month')"
endDate = "DATETIME('now')"
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN {start_date} AND {end_data}
'''
df = pd.read_sql_query(query.format(start_date=startDate, end_data=endDate), db)
When you provide parameters to a query, they're treated as literals, not expressions that SQL should evaluate.
You can pass the function arguments rather than the function as a string.
startDate = 'now'
startOffset = '-6 month'
endDate = 'now'
endOffset = '+0 seconds'
query = '''
SELECT some stuff
FROM table
WHERE table.date BETWEEN DATETIME(?, ?) AND DATETIME(?, ?)
'''
df = pd.read_sql_query(query, db, params=[startDate, startOffset, endDate, endOffset])
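The difference between binding a whole expression and binding only its arguments can be checked directly with an in-memory SQLite database. This is only a sketch: it uses a fixed timestamp instead of 'now' so the result is reproducible, and the variable names literal and evaluated are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bound as a parameter, the text is just a string literal;
# SQLite returns it unchanged instead of evaluating it.
literal = conn.execute(
    "SELECT ?", ("DATETIME('now', '-6 month')",)
).fetchone()[0]

# Binding only the function arguments lets SQLite call DATETIME itself.
evaluated = conn.execute(
    "SELECT DATETIME(?, ?)", ("2021-07-15 00:00:00", "-6 month")
).fetchone()[0]

print(literal)
print(evaluated)
conn.close()
```

The first value comes back as the original text, which is why BETWEEN compared the date column against a meaningless string and the result was empty.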
I have a table in MySQL which stores dates. I also have a datetime variable called x. I want to get the closest date (from the table) to the x variable.
I've been trying to do something like:
get_closest_date = []
query = "SELECT date_ FROM set_payment7777"
mycursor.execute(query)
for row in mycursor:
get_closest_date.append(row)
x = datetime(int(to_year_drop.get()), int(to_month_drop.get()), int(to_day_drop.get()))
cloz_dict = {abs(x).timestamp() - date.timestamp() : date for date in get_closest_date}
res = cloz_dict[min(cloz_dict.keys())]
print(res)
but it doesn't seem to work.
Is there any possible solution?
I would use a LIMIT query here:
query = """SELECT date_
FROM set_payment7777
ORDER BY ABS(DATEDIFF(date_, %s))
LIMIT 1"""
x = datetime(int(to_year_drop.get()), int(to_month_drop.get()), int(to_day_drop.get()))
mycursor.execute(query, (x,))
date_closest = mycursor.fetchone()[0]
The trick here is to basically sort your table based on how close each date_ value is to the input parameter provided. Then, we just examine the single date closest.
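If the rows have already been fetched into Python, the same "smallest absolute difference" idea can be sketched with min and a key function. This is an alternative to doing the work in SQL, not the answer's method. It assumes each row is a one-element tuple holding a datetime of the same type as x; the helper name closest_date and the sample rows are made up for illustration.

```python
from datetime import datetime


def closest_date(rows, target):
    """Return the datetime in rows that is nearest to target."""
    # Each row fetched from a cursor is a tuple, so take its first column.
    dates = [row[0] for row in rows]
    # abs() is applied to the timedelta, i.e. to the difference itself.
    return min(dates, key=lambda d: abs(d - target))


rows = [
    (datetime(2021, 1, 1),),
    (datetime(2021, 3, 10),),
    (datetime(2021, 6, 30),),
]
x = datetime(2021, 3, 1)
print(closest_date(rows, x))
```

For a large table the SQL version is likely the better choice, since only one row has to leave the database.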
I have the following query in which I'm trying to pass start dates and end dates in a SQL query.
def get_data(start_date,end_date):
    ic = Connector()
    q = f"""
    select * from example_table a
    where a.date between {start_date} and {end_date}
    """
    result = ic.query(q)
    return result
df = pd.DataFrame(get_data('2021-01-01','2021-01-31'))
print(df)
which leads to the following error:
AnalysisException: Incompatible return types 'STRING' and 'BIGINT' of exprs 'a.date' and '2021 - 1 - 1'.\n (110) (SQLExecDirectW)")
I have also tried to parse the dates as follows:
import datetime
start_date = datetime.date(2021,1,1)
end_date = datetime.date(2021,5,13)
df = pd.DataFrame(get_data(start_date,end_date))
but I still get the same error.
Any help will be much appreciated.
It seems to me that, because of how you inject your values into the SQL query, they are not recognized as date values. The database will likely interpret 2021-01-01 as a mathematical expression (2021 minus 1 minus 1), with 2019 being the result.
You should try putting single quotes around your values.
q = f"""
select * from example_table a
where a.date between '{start_date}' and '{end_date}'
"""
Or, preferably, if your db library allows it, don't inject your values directly:
q = """
select * from example_table a
where a.date between %s and %s
"""
result = ic.query(q, (start_date, end_date))
EDIT: Some database libraries may use place-holder with different format than %s. You should probably consult documentation of db library you are using.
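The three variants can be compared side by side. The sketch below uses an in-memory SQLite database as a stand-in for the engine in the question, so the placeholder is ? rather than %s; the variable names are chosen for illustration only.

```python
import sqlite3

start_date = "2021-01-01"

unquoted = f"SELECT {start_date}"    # renders as: SELECT 2021-01-01
quoted = f"SELECT '{start_date}'"    # renders as: SELECT '2021-01-01'

conn = sqlite3.connect(":memory:")

# Without quotes the value is parsed as integer subtraction.
as_number = conn.execute(unquoted).fetchone()[0]

# With quotes it is a string literal.
as_text = conn.execute(quoted).fetchone()[0]

# With a bound parameter the driver handles quoting; SQLite uses ?.
as_param = conn.execute("SELECT ?", (start_date,)).fetchone()[0]

conn.close()
print(as_number, as_text, as_param)
```

The unquoted form silently turns the date into a number, which matches the STRING versus BIGINT mismatch reported in the error.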
I have tried this, but it didn't work as expected.
I'm connecting to a postgres sql database using python. I want to run a query for each day in a date range and append the results in a dataframe or export straight to a csv (whichever gets me the data in one place).
import pandas as pd
import pymysql
from datetime import date
dates = [
date(year=2020, month=10, day=12),
date(year=2020, month=10, day=13),
date(year=2020, month=10, day=14),
]
conn = pymysql.connect(...)
cursor = conn.cursor()
frame = []
for date in dates:
    query = """SELECT * FROM table WHERE date = {date}""".format(date=date)
    cursor.execute(query)
    data = cursor.fetchall()
    df = pd.DataFrame(list(data))
    frame.append(df)
conn.close()
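You should retrieve all the data first, then create your DataFrame:
values = []
for date in dates:
    # Quote the date so it is compared as a date string, not evaluated as arithmetic
    cursor.execute("""SELECT * FROM table WHERE date = '{date}'""".format(date=date))
    # extend (not append) keeps values a flat list of rows
    values.extend(cursor.fetchall())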
conn.close()
df = pd.DataFrame(values)
If your cursor is not in dictionary mode (results are just values), you may specify the columns yourself: df = pd.DataFrame(values, columns=[])
Try doing cursor.execute("SELECT * FROM table WHERE date = %s", (date,)). It's better to pass parameters that way to avoid SQL injection.
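Putting the two suggestions together (bind the date as a parameter and collect all rows into one flat list), a runnable sketch might look like the following. It uses an in-memory SQLite database so it can run without a server, which means the placeholder is ? where PyMySQL would use %s. The table name events and its sample rows are assumptions for illustration.

```python
import sqlite3
from datetime import date

dates = [date(2020, 10, 12), date(2020, 10, 13), date(2020, 10, 14)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2020-10-12", 1), ("2020-10-13", 2), ("2020-10-13", 3), ("2020-10-20", 4)],
)

rows = []
for day in dates:
    # The date is bound as a parameter rather than formatted into the SQL.
    cursor = conn.execute(
        "SELECT date, value FROM events WHERE date = ?", (day.isoformat(),)
    )
    # extend keeps rows a flat list of tuples, one per matching record.
    rows.extend(cursor.fetchall())

conn.close()
print(rows)
```

The flat rows list can then be passed to pd.DataFrame(rows, columns=["date", "value"]) in a single call.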
I have a somewhat complex SQL query that should update multiple columns on multiple rows in a table. I am trying to pass multiple parameters to the query and also loop through the data to be updated with psycopg2, but I can't figure out a way to do this.
Here is the sample data I want to loop through.
data = [(214, 'Feb', 545), (215, 'March', 466)]
So far here is the sql query I have
query = """
    UPDATE table_1
    SET
        date_from =
            (CASE version
                WHEN 1 THEN '1900-01-01' ELSE
                (SELECT date_to
                 FROM table_1
                 WHERE month = data.month
                 AND cust_key = data.cust_key
                 AND prod_key = data.prod_key
                 AND version = (
                     SELECT version-1
                     FROM table_1
                     WHERE month = data.month
                     AND cust_key = data.cust_key
                     AND prod_key = data.prod_key
                     ORDER BY version DESC LIMIT 1)
                )
            END),
        date_to = current_date
    FROM (VALUES %s) AS data(cust_key, month, prod_key)
    WHERE month = data.month
    AND cust_key = data.cust_key
    AND prod_key = data.prod_key
"""
Here is how I am passing my parameters
WHERE month = data.month
AND cust_key = data.cust_key
AND prod_key = data.prod_key
FROM (VALUES %s) AS data(cust_key, month, prod_key)
And this is how I am executing the query
cursor = db.cursor()
execute_values(cursor, query, (data,))
db.commit()
return True
When I execute the query, I get this error: psycopg2.errors.InvalidColumnReference: table "data" has 2 columns available but 3 columns specified. I have gone through multiple solutions on this site, but none seems to work for me.
Is there a way around this?
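Your data is already in the format expected by psycopg2.extras.execute_values: a sequence of tuples, one per row.
Wrapping it as (data,) turns the whole list into a single row with two elements, which is why the error reports 2 columns. So, do not wrap it in another tuple; simply do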
execute_values(cursor, query, data)
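The shape difference can be checked without a database. The sketch below only mimics how each element of the argument list is treated as one row; the helper row_widths is hypothetical and is not part of psycopg2.

```python
data = [(214, 'Feb', 545), (215, 'March', 466)]


def row_widths(argslist):
    """Return the number of values in each row of argslist."""
    return [len(row) for row in argslist]


# Wrapped: a single "row" whose two elements are the original tuples.
print(row_widths((data,)))

# Unwrapped: two rows with three values each, matching
# data(cust_key, month, prod_key) in the query.
print(row_widths(data))
```

The wrapped form yields one row of width 2, which lines up with the "2 columns available but 3 columns specified" message.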