Below is the code. While iterating over the dictionary, the code queries the database multiple times. Is it best practice to execute the query (ping the DB) this many times?
import cx_Oracle
connDev = 'username/password@hostname:port/service'
connDev = cx_Oracle.connect(connDev)
cursDev = connDev.cursor()

d = {'2006': '20170019201',
     '2006172': '2017000002',
     '200617123': '200003'
     }

for key, value in d.items():
    cursDev.execute('SELECT columnName from tableName where columnName={}'.format(key))
    if len(cursDev.fetchall()) != 0:
        cursDev.execute('UPDATE tableName SET columnName= {0} WHERE columnName= {1} '.format(value, key))
    else:
        continue

connDev.commit()
cursDev.close()
connDev.close()
You could run a single query and get everything:
cursDev.execute(
    'SELECT columnName FROM tableName WHERE columnName IN ({})'.format(
        ','.join(':p{}'.format(n) for n in range(len(d)))
    ),
    {'p{}'.format(n): k for n, k in enumerate(d)}
)
or run the updates directly - they do nothing if the row is not found:
for k, v in d.items():
    cursDev.execute(
        'UPDATE tableName SET columnName = :new_value WHERE columnName = :old_value',
        {'new_value': v, 'old_value': k}
    )
Note that both examples use parameterized queries - the data is passed separately from the query, and it is the job of the database to interpolate the parameters - thus freeing you from quoting headaches and preventing SQL injection automatically, besides performing better.
The code uses named-style parameter placeholders such as :old_value because that is what cx_Oracle uses - see the documentation on cx_Oracle.paramstyle.
For a 'batch' update like this, the executemany() call is going to be the most efficient way.
As @nosklo noted, the SELECT calls aren't necessary - they just take time. And with executemany() you don't need to do repeated execute() calls, which is another saving.
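Adapted to the dictionary in the question, a minimal executemany() sketch might look like this (tableName and columnName remain the question's placeholders):

# One round trip performs all updates; keys with no matching row simply update nothing.
data = [{'new_value': v, 'old_value': k} for k, v in d.items()]
cursDev.executemany(
    'UPDATE tableName SET columnName = :new_value WHERE columnName = :old_value',
    data)
connDev.commit()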
From samples/ArrayDMLRowCounts.py:

# delete the following parent IDs only
parentIdsToDelete = [20, 30, 50]
print("Deleting Parent IDs:", parentIdsToDelete)
print()

# enable array DML row counts for each iteration executed in executemany()
cursor.executemany("""
        delete from ChildTable
        where ParentId = :1""",
        [(i,) for i in parentIdsToDelete],
        arraydmlrowcounts=True)

# display the number of rows deleted for each parent ID
rowCounts = cursor.getarraydmlrowcounts()
for parentId, count in zip(parentIdsToDelete, rowCounts):
    print("Parent ID:", parentId, "deleted", count, "rows.")
See Efficient and Scalable Batch Statement Execution in Python cx_Oracle for more examples, including those with multiple binds like you will need.
[Never ever (except in specialized cases) build up a SQL statement by concatenating strings. It is a security hole and can also give poor performance. Always use bind variables.]
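For contrast, here is a minimal illustration of the difference, using the question's placeholder names:

# Unsafe - the value is concatenated into the SQL text:
cursDev.execute("SELECT columnName FROM tableName WHERE columnName = '" + key + "'")
# Safe - the value is passed as a bind variable:
cursDev.execute('SELECT columnName FROM tableName WHERE columnName = :val', {'val': key})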
Related
I have a Python script with a basic GUI that logs into a DB and executes a query.
The Python script also asks for one parameter, called "collection name", which is taken from the tkinter .get function and is inserted as a %s inside the query text. The result is that each time I can execute the query with a different collection name. This works and it is fine.
Now, I want to pass a larger string of collection names into my .get function so I can cursor.execute a query with multiple collection names to get more complex data. But I am having issues with inputting multiple collection names into my app.
Below is a piece of my Query1, which contains the %s placeholder that gets its value from the tkinter input.
From #Session1
Join vGSMRxLevRxQual On(#Session1.SessionId = vGSMRxLevRxQual.SessionId)
Where vGSMRxLevRxQual.RxLevSub<0 and vGSMRxLevRxQual.RxLevSub>-190
and #Session1.CollectionName in (%s)
Group by
#Session1.Operator
Order by #Session1.Operator ASC
IF OBJECT_ID('tempdb..#SelectedSession1') IS NOT NULL DROP TABLE #SelectedSession1
IF OBJECT_ID('tempdb..#Session1') IS NOT NULL DROP TABLE #Session1
Here is where I try to execute the query:

if Query == "GSMUERxLevelSub":
    result = cursor.execute(GSMUERxLevelSub, (CollectionName,))
    output = cursor.fetchmany()
    df = DataFrame(cursor.fetchall())
    filename = "2021_H1 WEEK CDF GRAPHS().xlsx"
    df1 = DataFrame.transpose(df, copy=False)

Lastly, here is where I get the value for the collection name:

CollectionName = f_CollectionName.get()
Your issue is due to a list/collection being an invalid parameter.
You'll need to transform collection_name:
collection_name: list[str] = ['collection1', 'collection2']
new_collection_name = ','.join(f'"{c}"' for c in collection_name)
cursor.execute(sql, (new_collection_name,))
Not sure whether this approach is susceptible to SQL injection, if that's a concern.
Edit:
Forgot that the DBAPI would put another set of quotes around the parameters. You can do something like:
CollectionName = ["foo", "bar"]
sql = f"""
From #Session1
Join vGSMRxLevRxQual On(#Session1.SessionId = vGSMRxLevRxQual.SessionId)
Where vGSMRxLevRxQual.RxLevSub<0 and vGSMRxLevRxQual.RxLevSub>-190
and #Session1.CollectionName in ({",".join(["%s"] * len(CollectionName))})
"""
sql += """
Group by
#Session1.Operator
Order by #Session1.Operator ASC
"""
cursor.execute(sql, tuple(CollectionName))
EDIT: updated to an f-string.
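For illustration, here is a self-contained sketch of the placeholder-expansion idea (the table and column names here are hypothetical; %s is the paramstyle the driver above expects):

collection_names = ["foo", "bar"]
placeholders = ",".join(["%s"] * len(collection_names))   # -> "%s,%s"
sql = f"SELECT Operator FROM Sessions WHERE CollectionName IN ({placeholders})"
cursor.execute(sql, tuple(collection_names))              # one value per %s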
While reading this question: SQL Multiple Updates vs single Update performance
I was wondering how I could dynamically implement an update for several columns at the same time using a connector like MariaDB's. Reading the official documentation, I did not find anything similar.
This question is similar, and it has helped me understand how to use parameterized queries with custom connectors, but it does not answer my question.
Let's suppose that, from one of the views of the project, we receive a dictionary.
This dictionary has the following structure (simplified example):
{'form-0-input_file_name': 'nofilename', 'form-0-id': 'K0944', 'form-0-gene': 'GJXX', 'form-0-mutation': 'NM_0040(p.Y136*)', 'form-0-trix': 'ZSSS4'}
Assuming that each key in the dictionary corresponds to a column in a database table, if I'm not mistaken we would have to iterate over the dictionary and build the query in each iteration.
Something like this (semi-pseudocode, probably not correct):
query = "UPDATE `db-dummy`.info "
for key in a_dict:
query += "SET key = a_dict[key]"
It is not clear to me how to construct said query within a loop.
What is the most pythonic way to achieve this?
Although this could work:

query = "UPDATE `db-dummy`.info "
for index, key in enumerate(a_dict):
    query = query + ("," if index != 0 else " SET") + " {0} = '{1}'".format(key, a_dict[key])
You should consider parameterized queries for safety and security. Moreover, a dynamic dictionary may raise other concerns; it may be best to verify or filter on a set of agreed keys before attempting such an operation.
query = "UPDATE `db-dummy`.info "
for index, key in enumerate(a_dict):
query = query + ("," if index != 0 else "") +" SET {0} = ? ".format(key)
# Then execute with your connection/cursor
cursor.execute(query, tuple(a_dict.values()) )
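For example, with a simplified two-key dictionary (hypothetical column names, borrowed from the question's sample values), the loop above builds:

a_dict = {'gene': 'GJXX', 'trix': 'ZSSS4'}
# Resulting statement:
#   UPDATE `db-dummy`.info SET gene = ?, trix = ?
# and cursor.execute receives ('GJXX', 'ZSSS4') as the parameter tuple.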
This is what I did (inspired by @ggordon's answer):
query = "UPDATE `db-dummy`.info "
for index, key in enumerate(a_dict):
if index == 0:
query = query + "SET {0} = ?".format(key)
else:
query = query + ", {0} = ?".format(key)
query += " WHERE record_id = " + record_id
And it works!
I'm very new to Python but want to perform some mathematical operations, using Python's libraries, on integer values from a MySQL table I have running.
I've successfully established a connection using mysql.connector; however, I'm at a loss.
I can select and print rows and columns, but I'm unsure of the syntax to define a query result as an "x" or "y" in order to perform mathematical operations with the variable.
Any help would be greatly appreciated.
EDIT
sql_select_Query = "select * from ATABLE"
cursor = mySQLconnection.cursor()
cursor.execute(sql_select_Query)
records = cursor.fetchall()
and
for row in records:
    print("Name = ", row[1])
    print("X_num = ", row[2])
    print("Y_num = ", row[3])
    print("Signal_Strength = ", row[4], "\n")
cursor.close()
gives me as an example
Name = X,
X_num = Y,
Y_num = Z,
SS = Q
What I would prefer is for my selection operation to bind X, Y, Z, Q to global names that I could then use for math operations in my application, for example with the NumPy libraries, so I could compute something like:
X*Y-Z+Q
I hope that is a bit clearer.
Out of the gate, I would recommend following the advice of this thread discussing the use of select *. Turning a field into an integer is possible in your SQL select statement by way of CAST or CONVERT, sort of like this (my daily language is SQL Server; check the MySQL documentation for the exact syntax):
sql_select_Query = "select Name, CAST(X as INT),CAST(Y as BIGINT) from ATABLE"
In my personal experience, SQL tends to age better than Python (tongue in cheek). As an aside, if your SQL instance is on a server, I code to the workhorse, as error catching is better.
But coming at it from the other direction: if you want these elements to be re-callable later, I suggest fetching your results into a dictionary.
Information about Python dictionaries can be found here. At least that way, you're working from a global but fairly structured set of captured data.
It is a bad idea to play with locals() and globals() if you don't exactly know what you're doing. Create a dictionary.
sql_select_Query = "select * from ATABLE"
cursor = mySQLconnection.cursor()
cursor.execute(sql_select_Query)
records = cursor.fetchall()
columns = [item[0] for item in cursor.description]  # Grab the table column names

for record in records:
    # Create a dictionary {column_name: value, ...} for each row
    variable_dict = dict(zip(columns, record))
    print("X variable is: ", variable_dict['X'])
    # <Calculation here>
You can also configure MySQL to return values as a dictionary, but this is probably an easier starting point.
This way, your "variable X" value is just variable_dict['X'], and there's no need to create any global values other than the dictionary.
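As a follow-up, the calculation from the question can then read straight out of the dictionary (a sketch assuming the column names printed earlier; adjust to your schema):

# Hypothetical column names taken from the question's print statements.
x = variable_dict['X_num']
y = variable_dict['Y_num']
q = variable_dict['Signal_Strength']
result = x * y + q   # any arithmetic or NumPy expression works from here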
I have a table with the following model:
CREATE TABLE IF NOT EXISTS {} (
    user_id bigint,
    pseudo text,
    importance float,
    is_friend_following bigint,
    is_friend boolean,
    is_following boolean,
    PRIMARY KEY ((user_id), is_friend_following)
);
I also have a table containing my seeds. Those (20) users are the starting point of my graph, so I select their IDs and search the table above to get their followers and friends, and from there I build my graph (networkX).
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        friend_ids = []
        follower_ids = []
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    friend_ids.append(row.friend_follower_id)
                if row.is_follower:
                    follower_ids.append(row.friend_follower_id)
        if friend_ids:
            for friend_id in friend_ids:
                obj.graph.add_edge(seed, friend_id)
        if follower_ids:
            for follower_id in follower_ids:
                obj.graph.add_edge(follower_id, seed)
    return obj
The problem is that building the graph takes too long, and I would like to optimize it.
I've got approximately 5 million rows in my table 'network_table'.
I'm wondering whether it would be faster to do a single query on the whole table instead of a query with a WHERE clause per seed. Will it fit in memory? Is that a good idea? Is there a better way?
I suspect the real issue may not be the queries but rather the processing time.
I'm wondering whether it would be faster to do a single query on the whole table instead of a query with a WHERE clause per seed. Will it fit in memory? Is that a good idea? Is there a better way?
There should not be any problem with doing a single query on the whole table if you enable paging (https://datastax.github.io/python-driver/query_paging.html - using fetch_size). Cassandra will return up to fetch_size rows and will fetch additional results as you read them from the result set.
Please note that if you have many rows in the table that are not seed-related, a full scan may be slower, as you will receive rows that do not involve a seed.
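A minimal sketch of such a paged full scan, reusing the names from the question's code (obj.session, obj.network_table, obj.seeds):

from cassandra.query import SimpleStatement

# One statement over the whole table; the driver fetches further pages transparently.
statement = SimpleStatement(
    "SELECT user_id, friend_follower_id, is_friend, is_follower FROM {0}".format(obj.network_table),
    fetch_size=1000)
for row in obj.session.execute(statement):
    if row.user_id in obj.seeds and row.friend_follower_id in obj.seeds:
        pass  # add edges exactly as in build_seed_graph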
Disclaimer: I am part of the team building ScyllaDB, a Cassandra-compatible database.
ScyllaDB recently published a blog post on how to efficiently do a full table scan in parallel, http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/, which applies to Cassandra as well - if a full scan is relevant and you can build the graph in parallel, this may help you.
It seems like you can get rid of the last two if statements, since you're looping again over data that you have already iterated once:
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    obj.graph.add_edge(seed, row.friend_follower_id)
                elif row.is_follower:
                    obj.graph.add_edge(row.friend_follower_id, seed)
    return obj
This also gets rid of many append operations on intermediate lists, which should speed this function up.
Can anyone suggest a way to write a query with a multi-column IN clause using SQLAlchemy?
Here is an example of the actual query:
SELECT url FROM pages WHERE (url_crc, url) IN ((2752937066, 'http://members.aye.net/~gharris/blog/'), (3799762538, 'http://www.coxandforkum.com/'));
I have a table that has a two-column primary key, and I'm hoping to avoid adding one more key just to be used as an index.
P.S. I'm using a MySQL DB.
Update: This query will be used for batch processing, so I would need to put a few hundred pairs into the IN clause. With the IN-clause approach, I hope to know the fixed limit of how many pairs I can put into one query (e.g. Oracle has a 1000-element limit by default).
Using an AND/OR combination might instead be limited by the length of the query in characters, which would be variable and less predictable.
Assuming that you have your model defined in Page, here's an example using tuple_:

from sqlalchemy import select, tuple_

keys = [
    (2752937066, 'http://members.aye.net/~gharris/blog/'),
    (3799762538, 'http://www.coxandforkum.com/')
]

select([
    Page.url
]).select_from(
    Page
).where(
    tuple_(Page.url_crc, Page.url).in_(keys)
)
Or, using the query API:
session.query(Page.url).filter(tuple_(Page.url_crc, Page.url).in_(keys))
I do not think this is currently possible in SQLAlchemy, and not all RDBMSs support it.
You can always transform this to an OR(AND...) condition though:
from sqlalchemy import and_, or_

filter_rows = [
    (2752937066, 'http://members.aye.net/~gharris/blog/'),
    (3799762538, 'http://www.coxandforkum.com/'),
]
qry = session.query(Page)
qry = qry.filter(or_(*(and_(Page.url_crc == crc, Page.url == url) for crc, url in filter_rows)))
print(qry)
should produce something like (for SQLite):
SELECT pages.id AS pages_id, pages.url_crc AS pages_url_crc, pages.url AS pages_url
FROM pages
WHERE pages.url_crc = ? AND pages.url = ? OR pages.url_crc = ? AND pages.url = ?
-- (2752937066L, 'http://members.aye.net/~gharris/blog/', 3799762538L, 'http://www.coxandforkum.com/')
Alternatively, you can combine the two columns into just one:

from sqlalchemy import func, String

filter_rows = [
    (2752937066, 'http://members.aye.net/~gharris/blog/'),
    (3799762538, 'http://www.coxandforkum.com/'),
]
qry = session.query(Page)
qry = qry.filter((func.cast(Page.url_crc, String) + '|' + Page.url).in_(["{}|{}".format(*_frow) for _frow in filter_rows]))
print(qry)
which produces the below (for SQLite), so you can use IN:
SELECT pages.id AS pages_id, pages.url_crc AS pages_url_crc, pages.url AS pages_url
FROM pages
WHERE (CAST(pages.url_crc AS VARCHAR) || ? || pages.url) IN (?, ?)
-- ('|', '2752937066|http://members.aye.net/~gharris/blog/', '3799762538|http://www.coxandforkum.com/')
I ended up using the text()-based solution: I generated "(a,b) in ((:a1, :b1), (:a2, :b2), ...)" with named bind variables and built a dictionary with the bind variables' values.
from sqlalchemy import text

params = {}
enum_pairs = []
for counter, r in enumerate(records):
    a_param = "a%s" % counter
    params[a_param] = r['a']
    b_param = "b%s" % counter
    params[b_param] = r['b']
    pair_text = "(:%s,:%s)" % (a_param, b_param)
    enum_pairs.append(pair_text)

multicol_in_enumeration = ','.join(enum_pairs)
multicol_in_clause = text(" (a,b) in (" + multicol_in_enumeration + ")")

q = session.query(Table.id, Table.a, Table.b).filter(multicol_in_clause).params(params)
Another option I thought about was using MySQL upserts, but that would make the whole thing even less portable to other DB engines than a multi-column IN clause.
Update: SQLAlchemy has the sqlalchemy.sql.expression.tuple_(*clauses, **kw) construct that can be used for the same purpose (I haven't tried it yet).