Here's the code I'm working on:
poljeID = int(cursor.execute("SELECT poljeID FROM stanje"))
xkoord = cursor.execute("SELECT xkoord FROM polje WHERE poljeID = %s;", poljeID)
ykoord = cursor.execute("SELECT ykoord FROM polje WHERE poljeID = %s;", poljeID)
print xkoord, ykoord
It's a snippet; basically, it needs to fetch the ID of the field (poljeID) the agent is currently on (stanje) and use it to get the x and y coordinates of that field (xkoord, ykoord).
The initial values for the variables are:
poljeID = 1
xkoord = 0
ykoord = 0
The values that I get with that code are:
poljeID = 1
xkoord = 1
ykoord = 1
What am I doing wrong?
cursor.execute does not return the result of the query; it returns the number of rows affected. To get the result, you need to call cursor.fetchone() (or cursor.fetchall()) after each query.
(Note, really the second and third queries should be done at once: SELECT xkoord, ykoord FROM ...)
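A minimal corrected sketch, assuming a DB-API cursor such as MySQLdb's (note the parameter must also be passed as a 1-tuple):

cursor.execute("SELECT poljeID FROM stanje")
poljeID = cursor.fetchone()[0]  # fetchone() returns a row tuple; take its first column

# Fetch both coordinates in one query, passing the parameter as a tuple
cursor.execute("SELECT xkoord, ykoord FROM polje WHERE poljeID = %s;", (poljeID,))
xkoord, ykoord = cursor.fetchone()

print xkoord, ykoord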
Is there any way I can compare two different databases (PostgreSQL and SQL Server) and return the missing rows? I am missing one row in the PostgreSQL table that is not in the SQL Server one and have no clue how to return that answer.
I have two connections open, one for PostgreSQL (bpo_table_results) and one for SQL Server (rps_table_results).
postgresql table:
date count amount
1/1/21 500 1,234,654.12
sql server table:
date count amount
1/1/21 500 1,234,654.12
1/2/21 4541 3,457,787.24
expected results:
The row in the amount of 3,457,787.24 is missing from your postgresql table.
code:
def queryRPS(sql_server_conn, sql_server_cursor):
    rps_item_count_l = []
    rps_icl_amt_l = []
    rps_table_q_2 = f"""select * from rps..sendfile where processingdate = '{cd}' and datasetname like '%ICL%' """
    rps_table_results = sql_server_cursor.execute(rps_table_q_2).fetchall()
    for row in rps_table_results:
        rps_item_count = row[16]
        rps_item_count_l.append(rps_item_count)
        rps_icl_amt = row[18]
        rps_icl_amt_l.append(rps_icl_amt)
    return rps_item_count_l, rps_icl_amt_l  # needed so the unpacking below works

def queryBPO(postgres_conn, postgres_cursor, rps_item_count_l, rps_icl_amt_l):
    bpo_results_l = []
    rps_results_l = []
    for rps_count, rps_amount in zip(rps_item_count_l, rps_icl_amt_l):
        rps_amount_f = str(rps_amount).rstrip('0')
        rps_amount_f = ("{:,}".format(float(rps_amount_f)))
        bpo_icl_awk_q_2 = """select * from ppc_data.icl_awk where num_items = '%s' and
            file_total = '%s' """ % (str(rps_count), str(rps_amount_f))
        postgres_cursor.execute(bpo_icl_awk_q_2)
        bpo_table_results = postgres_cursor.fetchall()

rps_table_q_2 = f"""select * from rps..sendfile where processingdate = '{cd}' and datasetname like '%ICL%' """
rps_table_results = sql_server_cursor.execute(rps_table_q_2).fetchall()
rps_item_count_l, rps_icl_amt_l = queryRPS(sql_server_conn, sql_server_cursor)
queryBPO(postgres_conn, postgres_cursor, rps_item_count_l, rps_icl_amt_l)
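One possible approach (a sketch, not from the original thread): normalize the comparable columns from each result set and take a set difference. find_missing_rows and the (count, amount) column positions are illustrative assumptions:

from decimal import Decimal

def find_missing_rows(rps_rows, bpo_rows):
    # Normalize amounts so '3,457,787.24' and 3457787.24 compare equal
    def normalize(rows):
        return {(int(c), Decimal(str(a).replace(',', ''))) for c, a in rows}
    # Rows present in the SQL Server set but absent from the PostgreSQL set
    return normalize(rps_rows) - normalize(bpo_rows)

# Illustrative usage with the question's sample data:
rps = [(500, '1,234,654.12'), (4541, '3,457,787.24')]
bpo = [(500, '1,234,654.12')]
for count, amount in find_missing_rows(rps, bpo):
    print("The row in the amount of {:,} is missing from your postgresql table.".format(amount))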
I need to update multiple records (rows) in a table.
First, I get the information used to select the rows to be updated:
ran_st1 = 1
ran_st2 = 1
ran_bra = 'A'
pay_ide = 'FR'
bra_lib = 'Polyvitamines et oligo-éléments'
Then, I select rows to be updated:
rows = Randomisation.objects.filter(Q(ran_st1 = ran_st1) & Q(ran_st2 = ran_st2) & Q(ran_bra = ran_bra) & Q(pay_ide = pay_ide))
And then, I thought to make a loop like this, but I'm not sure:
for row in rows:
    r = get_object_or_404(Randomisation, ran_ide = row.ran_ide)
    r.ran_act = 1
    r.save()
You can update with .update(..) [Django-doc]:
Randomisation.objects.filter(
    ran_st1=ran_st1,
    ran_st2=ran_st2,
    ran_bra=ran_bra,
    pay_ide=pay_ide
).update(ran_act=1)
This will work with a query that looks like:
UPDATE randomisation
SET ran_act = 1
WHERE ran_st1 = 1
  AND ran_st2 = 1
  AND ran_bra = 'A'
  AND pay_ide = 'FR'
This is thus done in one query, rather than in several queries where you fetch each element, update it in Python, and then send a separate UPDATE to the database for that specific record.
I am trying to parameterize some parts of a SQL Query using the below dictionary:
query_params = dict(
    {'target': 'status',
     'date_from': '201712',
     'date_to': '201805',
     'drform_target': 'NPA'}
)
sql_data_sample = str("""select *
from table_name
where dt = %(date_to)s
and %(target)s in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = %(date_from)s
and %(target)s in ('ACT')
order by random() limit 50000);""")
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
However this returns a dataframe with no records at all. I am not sure what the error is since no error is being thrown.
df_data_sample.shape
Out[7]: (0, 1211)
The final PostgreSql query would be:
select *
from table_name
where dt = '201805'
and status in ('NPA')
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = '201712'
and status in ('ACT')
order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.
Below is a small sample of data for replication. The original data has more than a million records and 1211 columns.
service_change_3m service_change_6m dt grp_m2 status
0 -2 201805 $50-$75 NPA
0 0 201805 < $25 NPA
0 -1 201805 $175-$200 ACT
0 0 201712 $150-$175 ACT
0 0 201712 $125-$150 ACT
-1 1 201805 $50-$75 NPA
Can someone please help me with this?
UPDATE:
Based on the suggestion by @shmee, I am finally using:
target = 'status'
query_params = dict(
    {'date_from': '201712',
     'date_to': '201805',
     'drform_target': 'NPA'}
)
sql_data_sample = str("""select *
from table_name
where dt = %(date_to)s
and {0} in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = %(date_from)s
and {0} in ('ACT')
order by random() limit 50000);""").format(target)
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding (and %(target)s in ('ACT')) as mentioned in the comments.
This results in your query restricting the result set to records where 'status' in ('ACT') (i.e. Is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.
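A quick way to see this (a sketch, assuming cur is a psycopg2 cursor) is to render the statement with mogrify, which shows the literal quoting:

print(cur.mogrify("select * from table_name where %(target)s in ('ACT')",
                  {'target': 'status'}))
# b"select * from table_name where 'status' in ('ACT')"  -- always false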
This should work as expected:
from psycopg2 import sql
col_name = 'status'
table_name = 'public.churn_data'
query_params = {'date_from': '201712',
                'date_to': '201805',
                'drform_target': 'NPA'}
sql_data_sample = """select *
from {0}
where dt = %(date_to)s
and {1} in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from {0}
where dt = %(date_from)s
and {1} in ('ACT')
order by random() limit 50000);"""
sql_data_sample = sql.SQL(sql_data_sample).format(
    sql.Identifier(*table_name.split('.')),  # split so schema and table are quoted separately
    sql.Identifier(col_name))
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
I have a hybrid_property that returns a string based on some calculations made on a one-to-many relationship.
Here's the raw SQL for the hybrid_property expression:
SELECT
CASE
WHEN s.quantity_received = 0 THEN "unreceived"
WHEN s.dif = 0.0 THEN "received"
WHEN s.dif > 0.0 THEN "partially_received"
WHEN s.dif < 0.0 THEN "over_received"
END as status
FROM (
SELECT li.quantity_received, sum(li.quantity - li.received) as 'dif'
FROM line_items as li
WHERE li.o_id = xxx
) as s
The Models
class LineItem(BaseModel):
    __table__ = Table('line_items', autoload=True)
    order = relationship("Order", backref="line_items",
                         primaryjoin="Order.id == foreign(LineItem.o_id)")

class Order(BaseModel):
    __table__ = Table('orders', autoload=True)

    @hybrid_property
    def status(self):
        qty_received, qty_ordered = 0, 0
        for li in self.line_items:
            if li.status != "cancelled":
                qty_ordered += li.quantity
                qty_received += li.quantity_received
        if qty_received == 0:
            status = "unreceived"
        elif qty_received == qty_ordered:
            status = "received"
        elif qty_received < qty_ordered:
            status = "partially_received"
        elif qty_received > qty_ordered:
            status = "over_received"
        return status

    @status.expression
    def status(cls):
        line_items_calc = select([LineItem.quantity_received,
                                  func.sum(LineItem.quantity - LineItem.quantity_received).label('dif')]) \
            .where(and_(LineItem.o_id == Order.id,
                        or_(LineItem.fulfillment_status != "cancelled",
                            LineItem.fulfillment_status == None))) \
            .alias()
        qq = select([
            case([
                (line_items_calc.c.quantity_received == 0, "unreceived"),
                (line_items_calc.c.dif == 0, "received"),
                (line_items_calc.c.dif > 0, "partially_received"),
                (line_items_calc.c.dif < 0, "over_received")]
            )]) \
            .select_from(line_items_calc) \
            .as_scalar()
        return qq
I have 2 orders, o1 and o2 with line items:
LineItem(o_id=o1.id, quantity=1, quantity_received=1)
LineItem(o_id=o2.id, quantity=1, quantity_received=0)
LineItem(o_id=o2.id, quantity=2, quantity_received=1)
Order1 should have status "received" and Order2 should have "partially_received".
But when I query for "received" I get nothing and when querying for "partially_received" I get 2 results instead of one.
It looks like it is not filtering the LineItems by Order.id, and so it uses all line items to calculate the status (since the total quantity would be 4 and the total received would be 2, which gives "partially_received").
Order.query().filter(Order.status == 'received').all() # returns []
Order.query().filter(Order.status == 'partially_received').all() # returns [Order1, Order2]
If I add .correlate_except(LineItem) to the line_items_calc query, I get the following error:
OperationalError: (_mysql_exceptions.OperationalError) (1054,
"Unknown column 'orders.id' in 'where clause'") [SQL: u'SELECT
count(*) AS count_1 \nFROM (SELECT * \nFROM orders \nWHERE
orders.account_id = %s AND (SELECT CASE WHEN (a_3.quantity_received =
%s) THEN %s WHEN (a_3.dif = %s) THEN %s WHEN (a_3.dif > %s) THEN %s
WHEN (a_3.dif < %s) THEN %s END AS a_2 \nFROM (SELECT
line_items.quantity_received AS quantity_received,
sum(line_items.quantity - line_items.quantity_received) AS dif \nFROM
line_items \nWHERE line_items.o_id = orders.id AND
(line_items.fulfillment_status != %s OR line_items.fulfillment_status
IS NULL)) AS a_3) = %s) AS a_1'] [parameters: (1L, 0, 'unreceived', 0,
'received', 0, 'partially_received', 0, 'over_received', 'cancelled',
u'over_received')]
It would seem that you're trying to correlate the expression to the outermost query, but as it turns out the current nested subquery approach is not feasible in MySQL, because it does not allow correlated subqueries in the FROM clause at all. Some other databases merely forbid correlating with previous FROM-list items unless LATERAL is used.
On the other hand the nested subquery is redundant, since you can use aggregates in a CASE expression in the SELECT list, but in your current subquery you mix non-aggregate and aggregate expressions:
SELECT li.quantity_received, sum(li.quantity - li.received) as 'dif'
which is more than likely not what you wanted. Some other databases would not even allow such a query to execute, but MySQL silently picks a value for li.quantity_received from an unspecified row in the group, if ONLY_FULL_GROUP_BY is disabled. It is by default enabled in 5.7.5 and onwards, and you should consider enabling it. Looking at your hybrid property's other half it looks like you probably meant to take the sum of received quantity as well.
Below is a version of status expression that fulfills the 2 test cases you've presented in your question:
@status.expression
def status(cls):
    qty_received = func.coalesce(func.sum(LineItem.quantity_received), 0)
    qty_ordered = func.coalesce(func.sum(LineItem.quantity), 0)
    return select([case([
            (qty_received == 0, "unreceived"),
            (qty_received == qty_ordered, "received"),
            (qty_received < qty_ordered, "partially_received"),
            (qty_received > qty_ordered, "over_received")])]).\
        where(and_(func.coalesce(LineItem.fulfillment_status, "") != "cancelled",
                   LineItem.o_id == cls.id)).\
        correlate_except(LineItem).\
        as_scalar()
I believe it's a closer representation of the Python side approach than your original. Note the use of COALESCE for NULL handling.
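Hypothetical usage, assuming the models and session are configured as in the question; with the corrected expression the two filters now behave as your test cases expect:

Order.query().filter(Order.status == 'received').all()            # [Order1]
Order.query().filter(Order.status == 'partially_received').all()  # [Order2]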
I have to construct a dynamic update query for postgresql.
It's dynamic because, beforehand, I have to determine which columns to update.
Given a sample table:
create table foo (id int, a int, b int, c int)
Then I will programmatically construct the "set" clause:
_set = {}
_set['a'] = 10
_set['c'] = None
After that I have to build the update query. And here I'm stuck.
I have to construct this SQL UPDATE command:
update foo set a = 10, c = NULL where id = 1
How do I do this with a psycopg2 parametrized command (i.e. loop through the dict, if it is not empty, and build the SET clause)?
UPDATE
While I was sleeping, I found the solution myself. It is dynamic, exactly how I wanted it to be :-)
create table foo (id integer, a integer, b integer, c varchar)
updates = {}
updates['a'] = 10
updates['b'] = None
updates['c'] = 'blah blah blah'
sql = "upgrade foo set %s where id = %s" % (', '.join("%s = %%s" % u for u in updates.keys()), 10)
params = updates.values()
print cur.mogrify(sql, params)
cur.execute(sql, params)
And the result is exactly what I needed (especially the handling of nullable and quoted columns):
"upgrade foo set a = 10, c = 'blah blah blah', b = NULL where id = 10"
There is actually a slightly cleaner way to do it, using the alternative column-list syntax:
sql_template = "UPDATE foo SET ({}) = %s WHERE id = {}"
sql = sql_template.format(', '.join(updates.keys()), 10)
params = (tuple(updates.values()),)
print cur.mogrify(sql, params)
cur.execute(sql, params)
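For reference, with the updates dict from above, mogrify should render something like the following (column order follows the dict's iteration order; exact quoting depends on the adapter):

UPDATE foo SET (a, b, c) = (10, NULL, 'blah blah blah') WHERE id = 10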
Using psycopg2.sql – SQL string composition module
The module contains objects and functions useful to generate SQL dynamically, in a convenient and safe way.
from psycopg2 import connect, sql
conn = connect("dbname=test user=postgres")
upd = {'name': 'Peter', 'age': 35, 'city': 'London'}
ref_id = 12
sql_query = sql.SQL("UPDATE people SET {data} WHERE id = {id}").format(
    data=sql.SQL(', ').join(
        sql.Composed([sql.Identifier(k), sql.SQL(" = "), sql.Placeholder(k)]) for k in upd.keys()
    ),
    id=sql.Placeholder('id')
)
upd.update(id=ref_id)
with conn:
    with conn.cursor() as cur:
        cur.execute(sql_query, upd)
conn.close()
Running print(sql_query.as_string(conn)) before closing the connection will reveal this output:
UPDATE people SET "name" = %(name)s, "age" = %(age)s, "city" = %(city)s WHERE id = %(id)s
No need for dynamic SQL. Supposing a is not nullable and b is nullable.
If you want to update both a and b:
_set = dict(
    id = 1,
    a = 10,
    b = 20, b_update = 1
)
update = """
update foo
set
a = coalesce(%(a)s, a), -- a is not nullable
b = (array[b, %(b)s])[%(b_update)s + 1] -- b is nullable
where id = %(id)s
"""
print cur.mogrify(update, _set)
cur.execute(update, _set)
Output:
update foo
set
a = coalesce(10, a), -- a is not nullable
b = (array[b, 20])[1 + 1] -- b is nullable
where id = 1
If you want to update neither column:
_set = dict(
    id = 1,
    a = None,
    b = 20, b_update = 0
)
Output:
update foo
set
a = coalesce(NULL, a), -- a is not nullable
b = (array[b, 20])[0 + 1] -- b is nullable
where id = 1
An option without Python format, using psycopg2's AsIs function for column names (although that doesn't prevent SQL injection via the column names). The dict is named data:
from psycopg2.extensions import AsIs

update_statement = 'UPDATE foo SET (%s) = %s WHERE id_column = %s'
columns = data.keys()
values = [data[column] for column in columns]
query = cur.mogrify(update_statement, (AsIs(','.join(columns)), tuple(values), id_value))
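mogrify only renders the statement for inspection; to actually run it, pass the same arguments to execute (a sketch, assuming cur is a psycopg2 cursor and data/id_value are defined as above):

cur.execute(update_statement, (AsIs(','.join(columns)), tuple(values), id_value))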
Here's my solution, part of a generic DatabaseHandler class, which provides a lot of flexibility when using a pd.DataFrame as your source.
def update_data(
    self,
    table: str,
    df: pd.DataFrame,
    indexes: Optional[list] = None,
    column_map: Optional[dict] = None,
    commit: Optional[bool] = False,
) -> int:
    """Update data in the media database

    Args:
        table (str): the "tablename" or "namespace.tablename"
        df (pandas.DataFrame): dataframe containing the data to update
        indexes (list): the list of columns in the table that will be in the WHERE clause of the update statement.
            If not provided, will use df indexes.
        column_map (dict): dictionary mapping the columns in df to the columns in the table.
            Columns in the column_map that are also in indexes will not be updated.
            Key = df column.
            Value = table column.
        commit (bool): if True, the transaction will be committed (default=False)

    Notes:
        If using a column_map, only the columns in the column_map will be updated or used as indexes.
        Order does not matter. If not using a column_map, all columns in df must exist in table.

    Returns:
        int: rows updated
    """
    try:
        if not indexes:
            # Use the dataframe index instead
            indexes = []
            for c in df.index.names:
                if not c:
                    raise Exception(
                        "Dataframe contains indexes without names. Unable to determine update where clause."
                    )
                indexes.append(c)
        update_strings = []
        tdf = df.reset_index()
        if column_map:
            target_columns = [c for c in column_map.keys() if c not in indexes]
        else:
            column_map = {c: c for c in tdf.columns}
            target_columns = [c for c in df.columns if c not in indexes]
        for i, r in tdf.iterrows():
            upd_params = ", ".join(
                [f"{column_map[c]} = %s" for c in target_columns]
            )
            upd_list = [r[c] if pd.notna(r[c]) else None for c in target_columns]
            upd_str = self._cur.mogrify(upd_params, upd_list).decode("utf-8")
            idx_params = " AND ".join([f"{column_map[c]} = %s" for c in indexes])
            idx_list = [r[c] if pd.notna(r[c]) else None for c in indexes]
            idx_str = self._cur.mogrify(idx_params, idx_list).decode("utf-8")
            update_strings.append(f"UPDATE {table} SET {upd_str} WHERE {idx_str};")
        full_update_string = "\n".join(update_strings)
        print(full_update_string)  # Debugging
        self._cur.execute(full_update_string)
        rowcount = self._cur.rowcount
        if commit:
            self.commit()
        return rowcount
    except Exception as e:
        self.rollback()
        raise e
Example usages:
>>> df = pd.DataFrame([
...     {'a':1,'b':'asdf','c':datetime.datetime.now()},
...     {'a':2,'b':'jklm','c':datetime.datetime.now()}
... ])
>>> cls.update_data('my_table', df, indexes = ['a'])
UPDATE my_table SET b = 'asdf', c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1;
UPDATE my_table SET b = 'jklm', c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2;
>>> cls.update_data('my_table', df, indexes = ['a','b'])
UPDATE my_table SET c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1 AND b = 'asdf';
UPDATE my_table SET c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2 AND b = 'jklm';
>>> cls.update_data('my_table', df.set_index('a'), column_map={'a':'db_a','b':'db_b','c':'db_c'} )
UPDATE my_table SET db_b = 'asdf', db_c = '2023-01-17T22:13:37.095245'::timestamp WHERE db_a = 1;
UPDATE my_table SET db_b = 'jklm', db_c = '2023-01-17T22:13:37.095250'::timestamp WHERE db_a = 2;
Note however that this is not safe from SQL injection: the values are escaped by mogrify, but the table and column names are interpolated directly into the generated statement.
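If the table or column names can come from untrusted input, one mitigation (a sketch, not part of the original class, reusing table, column_map, target_columns, indexes, upd_list and idx_list from the method above) is to quote identifiers with psycopg2.sql and bind the values per row instead of interpolating mogrify output:

from psycopg2 import sql

# Build and run one UPDATE per row, inside the loop above; identifiers
# are quoted by sql.Identifier, values are bound by the driver.
# For "schema.table" names, split and pass the parts to Identifier.
stmt = sql.SQL("UPDATE {table} SET {sets} WHERE {wheres}").format(
    table=sql.Identifier(*table.split('.')),
    sets=sql.SQL(", ").join(
        sql.SQL("{} = {}").format(sql.Identifier(column_map[c]), sql.Placeholder())
        for c in target_columns
    ),
    wheres=sql.SQL(" AND ").join(
        sql.SQL("{} = {}").format(sql.Identifier(column_map[c]), sql.Placeholder())
        for c in indexes
    ),
)
self._cur.execute(stmt, upd_list + idx_list)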