I have created a Python class, and one of my methods is meant to take in either a single ID number or a list of ID numbers. The method then uses the ID numbers to query a table in BigQuery via a .sql script. Currently, the function works fine for a single ID number using the following:
def state_data(self, state, id_number):
    if state == 'NY':
        sql_script = self.sql_scripts['get_data_ny']
    else:
        sql_script = self.sql_scripts['get_data_rest']
    sql_script = sql_script.replace('##id_number##', id_number)
I'm having issues with passing in multiple ID numbers at once. There are 3 different ways that I've tried without success:
The first is the above method, passing the multiple ID numbers in as a tuple for use with WHERE ID_NUM IN('##id_number##'). This doesn't work: when the .sql script gets called, a syntax error is returned, because parentheses and quotes are automatically added. For example, the SQL statement attempts to run as WHERE ID_NUM IN('('123', '124')'). It would run fine without one of the two sets of parentheses and quotes, but no matter what I try to pass in, they always get added.
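As far as I can tell, the extra parentheses and quotes come from Python's string conversion of the tuple itself; a quick illustration of what I mean:

ids = ('123', '124')
print(str(ids))  # ('123', '124') -- the inner quotes and parentheses are part of the tuple's repr
# so substituting that into IN('##id_number##') produces IN('('123', '124')')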
The second technique I have tried is to create a table, populate it with the passed in ID numbers, and then join with the larger table in BQ. It goes as follows:
CREATE OR REPLACE TABLE ID_Numbers
(
ID_Number STRING
);
INSERT INTO ID_Numbers (ID_Number)
VALUES ('##id_number##');
-- rest of script is a simple left join of the above created table with the BQ table containing the data for each ID
This again works fine for single ID numbers, but passing in multiple VALUES (in this case ID numbers) would require a ('##id_number##') per unique ID. One thing that I have not yet attempted is to assign a variable to each unique ID and pass each one in as a new VALUE; I am not sure if this technique will work.
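Something like the following sketch might do it, building the whole VALUES list in Python before substituting (this assumes the placeholder in the .sql script is the literal text ('##id_number##')):

id_numbers = ['123', '124']
values_clause = ", ".join("('{}')".format(i) for i in id_numbers)
sql_script = sql_script.replace("('##id_number##')", values_clause)
# yields: INSERT INTO ID_Numbers (ID_Number) VALUES ('123'), ('124');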
The third technique I've tried is to include the full SQL query in the function, rather than calling a .sql script. The list of ID numbers gets passed in as a tuple, and the query goes as follows:
id_nums = tuple(id_number)
query = ("""SELECT * FROM `data_table`
WHERE ID_NUM IN{}""").format(id_nums)
This technique also does not work, as I get the following error:
AttributeError: 'QueryJob' object has no attribute 'format'.
I've attempted to look into this error but I cannot find anything that helps me out effectively.
Finally, I'll note that none of the posts asking the same or similar questions have solved my issues so far.
I am looking for any and all advice for a way that I can successfully pass a variable containing multiple ID numbers into my function that ultimately calls and runs a BQ query.
You should be able to use *args to get the id_numbers as a sequence, and f-strings with str.join() to build the SQL query:
class MyClass:
    def state_data(self, state, *id_numbers):
        print(f"{state=}")
        query = f"""
            SELECT * FROM `data_table`
            WHERE ID_NUM IN ({", ".join(str(id_number) for id_number in id_numbers)})
        """
        print(query)

my_class = MyClass()
my_class.state_data("some state", 123)
my_class.state_data("some more state", 123, 124)
On my machine, this prints:
➜ sql python main.py
state='some state'
SELECT * FROM `data_table`
WHERE ID_NUM IN (123)
state='some more state'
SELECT * FROM `data_table`
WHERE ID_NUM IN (123, 124)
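As an aside, if the ID numbers ever come from user input, building the query with string formatting is open to SQL injection. BigQuery supports query parameters, and an array parameter avoids hand-building the IN list entirely. A minimal sketch, assuming string IDs and the google-cloud-bigquery client:

from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT * FROM `data_table`
    WHERE ID_NUM IN UNNEST(@id_numbers)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter("id_numbers", "STRING", ["123", "124"]),
    ]
)
rows = client.query(query, job_config=job_config).result()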
Related
I wrote a Python script to update currency exchange rates using API calls. I successfully parsed the JSON results and extracted individual exchange rates as floats. I am, however, struggling with formatting/implementing a SQL table update loop.
Here is a code snippet that is tripping me up; assume that val holds an actual exchange rate supplied by the API fetching/parsing section of the code:
mycursor = mydb.cursor()
sql = "UPDATE currencies SET coefficient = %s"
val = 0.03137
mycursor.execute(sql, val)
mydb.commit()
Running this gives me the following error:
Could not process parameters: float(0.03137), it must be of type list, tuple or dict:
I do not even know what to search for in order to reach an explanation that I understand and that helps me implement what I want correctly.
From: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html
Note
In Python, a tuple containing a single value must include a comma. For
example, ('abc') is evaluated as a scalar while ('abc',) is evaluated
as a tuple.
Thus in my example the expression that works is:
val = (0.03137, )
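With that fix applied, the original snippet runs (note that, as written, the UPDATE has no WHERE clause, so it sets the coefficient on every row):

mycursor = mydb.cursor()
sql = "UPDATE currencies SET coefficient = %s"
val = (0.03137,)  # the trailing comma makes this a one-element tuple
mycursor.execute(sql, val)
mydb.commit()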
I have a table called products
which has following columns
id, product_id, data, activity_id
What I am essentially trying to do is copy a bulk of existing products, update their activity_id, and create new entries in the products table.
Example:
I already have 70 existing entries in products with activity_id 2
Now I want to create another 70 entries with same data except for updated activity_id
I could have thousands of existing entries that I'd like to make a copy of, updating the copied entries' activity_id to a new id.
products = self.session.query(model.Products).filter(filter1, filter2).all()
This returns all the existing products for a filter.
Then I iterate through the products, simply cloning the existing products and updating just the activity_id field:
for product in products:
    product.activity_id = new_id

self.uow.skus.bulk_save_objects(simulation_skus)
self.uow.flush()
self.uow.commit()
What is the best/fastest way to do these bulk entries so it doesn't take so much time? Performance is OK as of now, but is there a better solution?
You don't need to load these objects locally, all you really want to do is have the database create these rows.
You essentially want to run a query that creates the rows from the existing rows:
INSERT INTO products (product_id, data, activity_id)
SELECT product_id, data, 2 -- the new activity_id value
FROM products
WHERE activity_id = old_id
The above query runs entirely on the database server; this is far preferable to loading the query results into Python objects and then sending all that data back to the server to populate INSERT statements for each new row.
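If you are happy with textual SQL, a minimal sketch of running that query through the session (old_id and new_id are assumed to hold the old and replacement activity IDs):

from sqlalchemy import text

self.session.execute(
    text(
        "INSERT INTO products (product_id, data, activity_id) "
        "SELECT product_id, data, :new_id FROM products "
        "WHERE activity_id = :old_id"
    ),
    {"new_id": new_id, "old_id": old_id},
)
self.session.commit()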
Queries like that are something you could do with SQLAlchemy core, the half of the API that deals with generating SQL statements. However, you can use a query built from a declarative ORM model as a starting point. You'd need to:
1. Access the Table instance for the model, as that then lets you create an INSERT statement via the Table.insert() method. (You could also get the same object from the models.Product query; more on that later.)
2. Access the statement that would normally fetch the data for your Python instances for your filtered models.Product query; you can do so via the Query.statement property.
3. Update the statement to replace the included activity_id column with your new value, and remove the primary key (I'm assuming that you have an auto-incrementing primary key column).
4. Apply that updated statement to the Insert object for the table via Insert.from_select().
5. Execute the generated INSERT INTO ... FROM ... query.
Step 1 can be achieved by using the SQLAlchemy introspection API; the inspect() function, applied to a model class, gives you a Mapper instance, which in turn has a Mapper.local_table attribute.
Steps 2 and 3 require a little juggling with the Select.with_only_columns() method to produce a new SELECT statement where we swap out the column. You can't easily remove a column from a select statement, but you can loop over the existing columns in the query to 'copy' them across to the new SELECT, making the replacement along the way.
Step 4 is then straightforward: Insert.from_select() needs the columns that are inserted and the SELECT query, and we have both, as the SELECT object we built gives us its columns too.
Here is the code for generating your INSERT; the **replace keyword arguments are the columns you want to replace when inserting:
from sqlalchemy import inspect, literal
from sqlalchemy.sql import ClauseElement

def insert_from_query(model, query, **replace):
    # The SQLAlchemy core definition of the table
    table = inspect(model).local_table

    # and the underlying core select statement to source new rows from
    select = query.statement

    # validate assumptions: make sure the query produces rows from the above table
    assert table in select.froms, f"{query!r} must produce rows from {model!r}"
    assert all(c.name in select.columns for c in table.columns), f"{query!r} must include all {model!r} columns"

    # updated select, replacing the indicated columns
    as_clause = lambda v: literal(v) if not isinstance(v, ClauseElement) else v
    replacements = {name: as_clause(value).label(name) for name, value in replace.items()}
    from_select = select.with_only_columns([
        replacements.get(c.name, c)
        for c in table.columns
        if not c.primary_key
    ])

    return table.insert().from_select(from_select.columns, from_select)
I included a few assertions about the model and query relationship, and the code accepts arbitrary column clauses as replacements, not just literal values. You could use func.max(models.Product.activity_id) + 1 as a replacement value (wrapped as a subselect), for example.
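For example, a sketch of that func.max() variant, using the same 1.3-era API as the rest of this answer:

from sqlalchemy import func, select

next_id = select([func.max(models.Product.activity_id) + 1]).as_scalar()
insert_stmt = insert_from_query(models.Product, products, activity_id=next_id)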
The above function executes steps 1-4, producing the desired INSERT SQL statement when printed (I created a products model and query that I thought might be representative):
>>> print(insert_from_query(models.Product, products, activity_id=2))
INSERT INTO products (product_id, data, activity_id) SELECT products.product_id, products.data, :param_1 AS activity_id
FROM products
WHERE products.activity_id != :activity_id_1
All you have to do is execute it:
insert_stmt = insert_from_query(models.Product, products, activity_id=2)
self.session.execute(insert_stmt)
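Depending on how your session is configured, you may still need to commit afterwards:

self.session.commit()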
I need to conditionally update an Oracle table from my Python code. It's a simple piece of code, but I encountered cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number with the following attempts:
id_as_list = ['id-1', 'id-2'] # list of row IDs in the DB table
id_as_list_of_tuples = [('id-1'), ('id-2')] # the same as list of tuples
sql_update = "update my_table set processed = 1 where object_id = :1"
# then when I tried any of following commands, result was "illegal variable name/number"
cursor.executemany(sql_update, id_as_list) # -> ends with error
cursor.executemany(sql_update, id_as_list_of_tuples) # -> ends with error
for id in id_as_list:
    cursor.execute(sql_update, id)  # -> ends with error
The correct solution was to use a list of dictionaries and the key name in the SQL statement:
id_as_list_of_dicts = [{'id': 'id-1'}, {'id': 'id-2'}]
sql_update = "update my_table set processed = 1 where object_id = :id"
cursor.executemany(sql_update, id_as_list_of_dicts) # -> works
for id in id_as_list_of_dicts:
    cursor.execute(sql_update, id)  # -> also works
I've found some help pages and tutorials like this, and they all used the ":1, :2, ..." syntax (but on the other hand I haven't found any example of an UPDATE with cx_Oracle). Although my issue has been solved with the help of dictionaries, I wonder whether that is the common way to do updates, or whether I'm doing something wrong in the ":1, :2, ..." syntax.
Oracle 12c, Python 3.7, cx_Oracle 7.2.1
You can indeed bind with dictionaries but the overhead of creating the dictionaries can be undesirable. You need to make sure you create a list of sequences when using executemany(). So in your case, you want something like this instead:
id_as_list = [['id-1'], ['id-2']] # list of row IDs in the DB table
id_as_list_of_tuples = [('id-1',), ('id-2',)] # the same as list of tuples
In the first instance you had a list of strings. Strings are sequences in their own right, so in that case cx_Oracle was expecting 4 bind variables (the number of characters in each string).
In the second instance you had the same data as in the first instance: by simply including parentheses around the strings you were not creating tuples! You need the trailing comma shown in my example to create the tuples you thought you were creating.
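With either of those corrected lists, the positional :1 binding from your original statement works; don't forget to commit (connection here is assumed to be your cx_Oracle connection object):

sql_update = "update my_table set processed = 1 where object_id = :1"
cursor.executemany(sql_update, [('id-1',), ('id-2',)])
connection.commit()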
I have a DB with ID/Topic/Definition columns. When a select query is made, with possibly hundreds of parameters, I would like the fetchall call to also return the topic of any non-existent rows with a default text (e.g. "Not Found").
I realize this could be done in a loop, but that would query the DB every cycle and have a significant performance hit. With the parameters joined by "OR" in a single select statement the search is nearly instantaneous.
Is there a way to get a return of the query (topic) with default text for non-existent rows in SQLite?
Table Structure (named "dictionary")
ID|Topic|Definition
1|wd1|def1
2|wd3|def3
Sample Query
SELECT Topic, Definition FROM dictionary WHERE Topic = 'wd1' OR Topic = 'wd2' OR Topic = 'wd3';
Desired Return
[(wd1, def1), (wd2, "Not Found"), (wd3, def3)]
To get data like wd2 out of a query, such data must be in the database in the first place.
You could put it into a temporary table, or use a common table expression.
To include rows without a match, use an outer join:
WITH Topics(Topic) AS ( VALUES ('wd1'), ('wd2'), ('wd3') )
SELECT Topic,
       IFNULL(Definition, 'Not Found') AS Definition
FROM Topics
LEFT JOIN dictionary USING (Topic);
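From Python you can build the VALUES list with placeholders and bind the topics; a minimal sketch, assuming the standard sqlite3 module and a dictionary.db database file:

import sqlite3

topics = ["wd1", "wd2", "wd3"]
placeholders = ", ".join(["(?)"] * len(topics))
sql = (
    "WITH Topics(Topic) AS (VALUES " + placeholders + ") "
    "SELECT Topic, IFNULL(Definition, 'Not Found') AS Definition "
    "FROM Topics LEFT JOIN dictionary USING (Topic)"
)
conn = sqlite3.connect("dictionary.db")
print(conn.execute(sql, topics).fetchall())
# [('wd1', 'def1'), ('wd2', 'Not Found'), ('wd3', 'def3')]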
I'm trying to figure out if it's possible to replace record values in a Microsoft Access (either .accdb or .mdb) database using pyodbc. I've poured over the documentation and noted where it says that "Row Values Can Be Replaced" but I have not been able to make it work.
More specifically, I'm attempting to replace a row value with the value of a Python variable. I've tried:
setting the connection autocommit to True
making sure that it's not a data type issue
Here is a snippet of the code where I execute a SQL query and use fetchone() to grab just one record (I know the query only returns one record with this script). I then grab the existing value for a field (the field's position integer is stored in the z variable) and get the new value I want to write to the field from an existing Python dictionary created in the script.
pSQL = "SELECT * FROM %s WHERE %s = '%s'" % (reviewTBL, newID, basinID)
cursor.execute(pSQL)
record = cursor.fetchone()
if record:
oldVal = record[z]
val = codeCrosswalk[oldVal]
record[z] = val
I've tried everything I can think of but cannot get it to work. Am I just misunderstanding the help documentation?
The script runs successfully, but the newly assigned value never seems to commit. I even tried putting print str(record[z]) after the record[z] = val line to see if the field had the new value, and the new value would print as if it worked...but if I check the table after the script has finished, the old values are still in the field.
Much appreciate any insight into this...I was hoping this would work the way it does in VBA in MS Access, where you can use an ADO Recordset to loop through records in a table and assign values to a field from a variable.
thanks,
Tom
The "Row values can be replaced" from the pyodbc documentation refers to the fact that you can modify the values on the returned row objects, for example to perform some cleanup or conversion before you start using them. It does not mean that these changes will automatically be persisted in the database. You will have to use sql UPDATE statements for that.