Why doesn't this work? WHERE IN - python

(Not a duplicate. I know that there's a way of doing this that works: Parameter substitution for a SQLite "IN" clause.)
I'd like to know what I'm missing in this code. I build a simple table, then successfully copy some of its records to a new table, qualifying the records with a WHERE clause that involves two IN lists. Having dropped that table, I attempt to copy the same records again, but this time I put the list into a variable which I pass into the SQL statement as a parameter. This time no records are copied.
How come?
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
oldTableRecords = [[15, 3], [2, 1], [44, 2], [6, 9]]
curs.execute('create table oldTable (ColA integer, ColB integer)')
curs.executemany('insert into oldTable (ColA, ColB) values (?,?)', oldTableRecords)

print('This goes ...')
curs.execute('''create table newTable as
    select * from oldTable
    where ColA in (15,3,44,9) or ColB in (15,3,44,9)''')
for row in curs.execute('select * from newTable'):
    print(row)
curs.execute('''drop table newTable''')

print('This does not ...')
TextTemp = ','.join("15 3 44 9".split())
print(TextTemp)
curs.execute('''create table newTable as
    select * from oldTable
    where ColA in (?) or ColB in (?)''', (TextTemp, TextTemp))
for row in curs.execute('select * from newTable'):
    print(row)
Output:
This goes ...
(15, 3)
(44, 2)
(6, 9)
This does not ...
15,3,44,9
TIA!

The whole point of a SQL parameter is to prevent SQL syntax in values from being executed. That includes commas between values; if this weren't the case you could never use values containing commas as query parameters, and it would probably be a security issue to boot.
You can't just use one ? to insert multiple values into a query; the whole TextTemp value is seen as one value, producing the following equivalent:
create table newTable as
select * from oldTable
where ColA in ('15,3,44,9') or ColB in ('15,3,44,9')
None of the values in ColA or ColB have a single row with the string value 15,3,44,9.
You need to use separate placeholders for each of the values in your parameter:
col_values = [int(v) for v in "15 3 44 9".split()]
placeholders = ', '.join(['?'] * len(col_values))
sql = '''create table newTable as
    select * from oldTable
    where ColA in ({0}) or ColB in ({0})'''.format(placeholders)
curs.execute(sql, col_values * 2)
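With the sample data above this produces the same three rows as the hard-coded version. A quick way to verify (a sketch continuing from the code above):

print(sql)   # ... where ColA in (?, ?, ?, ?) or ColB in (?, ?, ?, ?)
for row in curs.execute('select * from newTable'):
    print(row)
# (15, 3)
# (44, 2)
# (6, 9)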

Related

Is there a way to automate UNION ALL insertion?

I am using Oracle 19c and am trying to insert rows using the UNION ALL method. I tried to automate building the statement and I am getting ORA-00907 (missing right parenthesis).
Here is my code:
def insert(items):
    # items is a list of dicts ->
    # [{"test": "Test", "Test": "test", "r": "a"}, {"test": "Test", "Test": "test", "s": "a"}, ...]
    cursor = connection.cursor()
    insertions = []
    for item in items:
        insertions.extend([item["test"], item["Test"]])
    query = """INSERT INTO C##USER.RANDOM
               SELECT (:1, :2) FROM dual
            """ + "\n".join([f"UNION ALL SELECT (:{i}, :{i + 1}) FROM dual"
                             for i in range(3, len(insertions), 2)])
    cursor.execute(query, insertions)
The ORA-00907 is most likely caused by the parenthesized select list: Oracle expects SELECT :1, :2 FROM dual, not SELECT (:1, :2) FROM dual. But rather than generating a UNION ALL chain at all, I believe executemany is the better option for your use case.
Example from the documentation:
dataToInsert = [
    (10, 'Parent 10'),
    (20, 'Parent 20'),
    (30, 'Parent 30'),
    (40, 'Parent 40'),
    (50, 'Parent 50')
]
cursor.executemany("insert into ParentTable values (:1, :2)", dataToInsert)
If you want to efficiently insert a large amount of generated test data, you should not use

INSERT /*+APPEND*/ INTO ... VALUES (...)

as in the related question. Note that you are inserting row by row, so the APPEND hint is meaningless here and is ignored.
Nor should you build a huge UNION ALL select and bind thousands of bind variables; as others have pointed out, that incurs a large parse time.
You should approach this with one INSERT statement that processes all the rows to be inserted:
Example
insert /*+ APPEND */ into tab (col1,col2)
select rownum, 'Test'||rownum from dual
connect by level <= 10000;
Note this will populate your table with 10000 rows such as

COL1       COL2
---------- ----------
         1 Test1
         2 Test2
         3 Test3
         4 Test4
         5 Test5
       ...
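If you're driving this from Python, the single generator statement can be executed directly (a sketch assuming the python-oracledb driver; the connection details and the table tab(col1, col2) are placeholders):

import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/XEPDB1")
cur = conn.cursor()
# one round trip, one parse: the row generator runs entirely inside Oracle
cur.execute("""insert into tab (col1, col2)
               select rownum, 'Test' || rownum from dual
               connect by level <= :n""", n=10000)
conn.commit()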

Count the number of non-null values in each column of each table in MySQL

Is there a way to produce this output using SQL for all tables in a given database (using MySQL) without having to specify individual table names and columns?
Table    Column   Count
-------  -------  -----
Table1   Col1     0
Table1   Col2     100
Table1   Col3     0
Table1   Col4     67
Table1   Col5     0
Table2   Col1     30
Table2   Col2     0
Table2   Col3     2
...      ...      ...
The purpose is to identify columns for analysis based on how much data they contain (a significant number of columns are empty).
The 'workaround' solution using Python (one table at a time):
# Libraries
import pymysql
import pandas as pd
import pymysql.cursors

# Connect to MariaDB
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='my_password',
                             db='my_database',
                             charset='latin1',
                             cursorclass=pymysql.cursors.DictCursor)

# Get column metadata
sql = """SELECT *
         FROM `INFORMATION_SCHEMA`.`COLUMNS`
         WHERE `TABLE_SCHEMA`='my_database'
      """
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()

# Store in dataframe
df = pd.DataFrame(result)
df = df[['TABLE_NAME', 'COLUMN_NAME']]

# Build SQL string (one table at a time for now)
my_table = 'my_table'
df_my_table = df[df.TABLE_NAME == my_table].copy()
cols = list(df_my_table.COLUMN_NAME)
col_strings = [''.join(['COUNT(', x, ') AS ', x, ', ']) for x in cols]
col_strings[-1] = col_strings[-1].replace(',', '')
sql = ''.join(['SELECT '] + col_strings + ['FROM ', my_table])

# Execute
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
The result is a dictionary of column names and counts.
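If you want the Table / Column / Count layout from the question, that dictionary can be reshaped, e.g. with pandas (a sketch continuing from the code above):

counts = pd.DataFrame(result).T.reset_index()
counts.columns = ['Column', 'Count']
counts.insert(0, 'Table', my_table)
print(counts)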
Basically, no. See also this answer.
Also, note that the closest match in the answer above is actually the method you're already using, just implemented less efficiently in reflective SQL.
I'd do the same as you did - build a SQL like
SELECT
    COUNT(*) AS `count`,
    SUM(IF(columnName1 IS NULL,1,0)) AS columnName1,
    ...
    SUM(IF(columnNameN IS NULL,1,0)) AS columnNameN
FROM tableName;
using information_schema as a source for table and column names, then execute it for each table in MySQL, then disassemble the single row returned into N tuple entries (tableName, columnName, total, nulls).
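In Python that could look something like the following sketch, reusing the pymysql connection from the question (table and column names come from information_schema; each one-row result is flattened into (tableName, columnName, total, nulls) tuples):

# Fetch table/column pairs, then issue one counting query per table.
with connection.cursor() as cursor:
    cursor.execute("""SELECT TABLE_NAME, COLUMN_NAME
                      FROM INFORMATION_SCHEMA.COLUMNS
                      WHERE TABLE_SCHEMA = 'my_database'""")
    metadata = cursor.fetchall()

tables = {}
for row in metadata:
    tables.setdefault(row['TABLE_NAME'], []).append(row['COLUMN_NAME'])

results = []  # (tableName, columnName, total, nulls)
with connection.cursor() as cursor:
    for table, cols in tables.items():
        null_sums = ', '.join(
            'SUM(IF(`{0}` IS NULL,1,0)) AS `{0}`'.format(c) for c in cols)
        cursor.execute('SELECT COUNT(*) AS `count`, {} FROM `{}`'.format(
            null_sums, table))
        row = cursor.fetchone()
        total = row.pop('count')
        results.extend((table, col, total, nulls)
                       for col, nulls in row.items())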
It is possible, but it's not going to be quick.
As mentioned in a previous answer you can work your way through the columns table in the information_schema to build queries to get the counts. It's then just a question of how long you are prepared to wait for the answer because you end up counting every row, for every column, in every table. You can speed things up a bit if you exclude columns that are defined as NOT NULL in the cursor (i.e. IS_NULLABLE = 'YES').
The solution suggested by LSerni is going to be much faster, particularly if you have very wide tables and/or high row counts, but would require more work handling the results.
e.g.
DELIMITER //

DROP PROCEDURE IF EXISTS non_nulls //

CREATE PROCEDURE non_nulls (IN sname VARCHAR(64))
BEGIN
    -- Parameters:
    --   sname: schema name to check
    -- Usage: call non_nulls('sakila');
    DECLARE vTABLE_NAME varchar(64);
    DECLARE vCOLUMN_NAME varchar(64);
    DECLARE vIS_NULLABLE varchar(3);
    DECLARE vCOLUMN_KEY varchar(3);
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE cur1 CURSOR FOR
        SELECT `TABLE_NAME`, `COLUMN_NAME`, `IS_NULLABLE`, `COLUMN_KEY`
        FROM `information_schema`.`columns`
        WHERE `TABLE_SCHEMA` = sname
        ORDER BY `TABLE_NAME` ASC, `ORDINAL_POSITION` ASC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;

    DROP TEMPORARY TABLE IF EXISTS non_nulls;
    CREATE TEMPORARY TABLE non_nulls(
        table_name VARCHAR(64),
        column_name VARCHAR(64),
        column_key CHAR(3),
        is_nullable CHAR(3),
        `rows` BIGINT,      -- backquoted: ROWS is a reserved word in MySQL 8.0+
        populated BIGINT
    );

    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO vTABLE_NAME, vCOLUMN_NAME, vIS_NULLABLE, vCOLUMN_KEY;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- COUNT(*) gives total rows, COUNT(col) counts only non-null values
        SET @sql := CONCAT('INSERT INTO non_nulls ',
            '(table_name,column_name,column_key,is_nullable,`rows`,populated) ',
            'SELECT \'', vTABLE_NAME, '\',\'', vCOLUMN_NAME, '\',\'', vCOLUMN_KEY, '\',\'',
            vIS_NULLABLE, '\', COUNT(*), COUNT(`', vCOLUMN_NAME, '`) ',
            'FROM `', sname, '`.`', vTABLE_NAME, '`');
        PREPARE stmt1 FROM @sql;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
    END LOOP;
    CLOSE cur1;

    SELECT * FROM non_nulls;
END //

DELIMITER ;
call non_nulls('sakila');
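From Python, the procedure can be called over the question's pymysql connection and the summary read back (a sketch):

with connection.cursor() as cursor:
    cursor.callproc('non_nulls', ('my_database',))
    for row in cursor.fetchall():
        print(row)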

Run checks on Items from tables in Sqlite and python

I have the two tables below (Stock and Second_table, per the query that follows):

Stock
------------
Items | QTY
------+-----
sugar | 14
mango | 10
apple | 50
berry | 1

Second_table
------------
Items | QTY
------+-----
sugar | 10
mango | 5
apple | 48
berry | 1

I use the following query in Python to check the difference between the QTY values of the two tables:
cur = conn.cursor()
cur.execute("select s.Items, s.qty - t.qty as quantity from Stock s join Second_table t on s.Items = t.Items;")
remaining_quantity = cur.fetchall()
I'm a bit stuck on how to go about what I need to accomplish. I need to check the difference between the quantities of table one and table two; if the difference is under 5 for an Item, I want to store the value 1 for that Item in another table column, and otherwise 0. How can I go about this?
Edit:
I have attempted this by looping through the rows and, if the column value is less than 5, inserting into the new table, as below:

for row in remaining_quantity:
    print(row[1])
    if row[1] < 5:
        cur.execute('INSERT OR IGNORE INTO check_quantity_tb VALUES (select distinct s.Items, s.qty, s.qty - t.qty as quantity, 1 from Stock s join Second_table t on s.Items = t.Items)', row)
        print(row)
But I get an SQL syntax error, and I'm not sure where the error could be :/
The syntax error comes from mixing two INSERT forms: SQLite accepts INSERT ... VALUES (?, ...) or INSERT ... SELECT ..., but not a VALUES clause wrapped around a SELECT (and your statement has no ? placeholders for the row argument you pass, either). First, modify your initial query so you retrieve all the relevant info up front and don't have to issue subqueries later:
readcursor = conn.cursor()
readcursor.execute(
    "select s.Items, s.qty, s.qty - t.qty as remain "
    "from Stock s join Second_table t on s.Items = t.Items;"
)
Then use it to update your third table:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
    print(remain)
    if remain < 5:
        writecursor.execute(
            'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
            (items, qty, remain, 1)
        )
conn.commit()
Note the following points:
1/ We use two distinct cursors so we can iterate over the first one while writing with the second one. This avoids fetching all results into memory, which can be a real life saver on huge datasets.
2/ When iterating on the first cursor, we unpack each row into its individual components. This is called "tuple unpacking" (but it actually works for most sequence types):
>>> row = ("1", "2", "3")
>>> a, b, c = row
>>> a
'1'
>>> b
'2'
>>> c
'3'
3/ We let the DB-API module do the proper sanitisation and escaping of the values we want to insert. This avoids headaches with escaping/quoting and protects your code from SQL injection attacks (not that you necessarily face one here, but that's the correct way to write parameterized queries in Python).
NB: since you did not post your full table definitions nor clear specs (not even the full error message and traceback), I only translated your code snippet into something more sensible (avoiding the costly and useless subquery, which might or might not be the cause of your error). I can't guarantee it will work out of the box, but at least it should put you back on track.
NB2: you mentioned you had to set the last column to either 1 or 0 depending on the remain value. If that's the case, you want your loop to be:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
    print(remain)
    flag = 1 if remain < 5 else 0
    writecursor.execute(
        'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
        (items, qty, remain, flag)
    )
conn.commit()
If you instead only want to process rows where remain < 5, you can specify it directly in your first query with a where clause.
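For example (a sketch of that variant, with the threshold moved into SQL):

readcursor.execute(
    "select s.Items, s.qty, s.qty - t.qty as remain "
    "from Stock s join Second_table t on s.Items = t.Items "
    "where s.qty - t.qty < 5;"
)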

Python Sqlite3 insert operation with a list of column names

Normally, if I want to insert values into a table, I will do something like this (assuming that I know which columns the values I want to insert belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID, COLUMN1, COLUMN2) VALUES (?,?,?)",
             [myid, value1, value2])
But now I have a list of columns (the length of the list may vary) and a list of values, one for each column in the list.
For example, if I have a table with 10 columns (namely column1, column2, ..., column10), I have a list of the columns I want to update, say [column3, column4], and a list of values for those columns: [value for column3, value for column4].
How do I insert the values in the list into the individual columns they each belong to?
As far as I know, the parameter list in conn.execute works only for values, so we have to use string formatting, like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note this is a very bad practice in general, since strings passed to the database should always be checked for injection attacks. However, you could pass the list of column names through some sanitizing function before insertion.
EDITED:
For a variable number of columns you could try something like:

exec_text = ('INSERT INTO t (' + ','.join(col_names) + ') values(' +
             ','.join(['?'] * len(values)) + ')')
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)
Of course string formatting will work, you just need to be a bit cleverer about it:

col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
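Hypothetical usage, borrowing the in-memory table from the answer above:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')

col_list = ['a', 'c']
values = [0, 2]
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)  # -> INSERT INTO t (a,c) values(?,?)
print(conn.execute('SELECT * FROM t').fetchall())  # [(0, None, 2)]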
I was looking for a solution to create columns based on a list of unknown/variable length and found this question. However, I managed to find a nicer solution (for me anyway) that's also a bit more modern, so I thought I'd include it in case it helps someone:
import sqlite3

def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'

    conn = sqlite3.connect(file)
    c = conn.cursor()

    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))

    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))

    conn.close()

my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
for tupl in my_other_list:
    new_column = tupl[0]  # "ColA", "ColB", "ColC"
    col_type = tupl[1]    # "TEXT", "INTEGER", "BLOB"
    c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
        tn=table_name, cn=new_column, ct=col_type))
As a noob, I can't comment on the very succinct, updated solution @ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:

c.execute('DROP TABLE IF EXISTS {tn}'.format(
    tn=table_name))

prior to the 'CREATE TABLE ...' portion.
There appear to be multiple instances of

.format(
    tn=table_name ...)

in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I'm trying to figure out whether it's possible to create a single instance (similar to, or included in, the def section).

sqlite3 in Python extract entries that correspond to values in tuple

I am looking for an sqlite3 command that lets me select entries given by a tuple; let me explain with an example:
Here is my data:
my_data = [(1,8),(2,4),(3,5),(4,7),(5,13)]
and I am trying to extract the entries whose first values are either 1, 2 or 4; hence, my desired output is:
[(1, 8), (2, 4), (4, 7)]
I can achieve that with the code below; however, I think my code is not optimal:
import sqlite3

my_data = [(1, 8), (2, 4), (3, 5), (4, 7), (5, 13)]
key_indexes = (1, 2, 4)

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE my_table
             (val_1 INTEGER, val_2 INTEGER)''')
for entry in my_data:
    c.execute('''INSERT INTO my_table VALUES (?,?)''', entry)
conn.commit()

result = []
for ind in key_indexes:
    c.execute('''SELECT * FROM my_table WHERE val_1 = ?''', (ind,))
    temp_res = c.fetchall()
    result.extend(temp_res)
I am looking for code that can replace the final for loop with a single sqlite3 command. I want to stick (1, 2, 4) somewhere in this line:
c.execute('''SELECT * FROM my_table WHERE val_1 = ?''', (ind,))
instead of doing a for loop.
Thank you in advance!
To replace the last for loop you can build up the list of placeholders in a string and hit the database just once.
Please note that this is a two-step process, and not vulnerable (in and of itself) to SQL injection attacks.

my_query = '''SELECT val_1, val_2
              FROM my_table
              WHERE val_1 IN ({:s});'''.format(",".join("?" * len(key_indexes)))
# -> 'SELECT val_1, val_2 FROM my_table WHERE val_1 IN (?,?,?);'
result = c.execute(my_query, key_indexes).fetchall()
Additionally, you didn't directly ask about this, but the first for loop and its repeated calls to execute() could be reduced to a single call to executemany().
You should test which of the two options is faster, because the DB-API doesn't specify exactly how executemany() should be implemented; performance may differ across RDBMSs.

c.executemany('''INSERT INTO my_table VALUES (?,?);''', my_data)

You can read up on executemany() here: http://www.python.org/dev/peps/pep-0249/
For now, suffice it to say that it takes a sequence of parameter tuples as its second argument.
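Putting both suggestions together, a minimal end-to-end sketch:

import sqlite3

my_data = [(1, 8), (2, 4), (3, 5), (4, 7), (5, 13)]
key_indexes = (1, 2, 4)

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE my_table (val_1 INTEGER, val_2 INTEGER)')
c.executemany('INSERT INTO my_table VALUES (?,?)', my_data)

my_query = 'SELECT val_1, val_2 FROM my_table WHERE val_1 IN ({})'.format(
    ','.join('?' * len(key_indexes)))
print(c.execute(my_query, key_indexes).fetchall())
# [(1, 8), (2, 4), (4, 7)]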
