I am trying to access PostgreSQL using psycopg2:
sql = """
SELECT
%s
FROM
table;
"""
cur = con.cursor()
input = (['id', 'name'], )
cur.execute(sql, input)
data = pd.DataFrame.from_records(cur.fetchall())
However, the returned result is:
0
0 [id, name]
1 [id, name]
2 [id, name]
3 [id, name]
4 [id, name]
If I try to access a single column, the result looks like this:
0
0 id
1 id
2 id
3 id
4 id
It looks like something is wrong with the quoting around the column name (single quotes that should not be there):
In [49]: print cur.mogrify(sql, input)
SELECT
'id'
FROM
table;
but I am following the docs: http://initd.org/psycopg/docs/usage.html#
Can anyone tell me what is going on here? Thanks a lot!
Use the AsIs extension
import psycopg2
from psycopg2.extensions import AsIs
column_list = ['id','name']
columns = ', '.join(column_list)
cursor.execute("SELECT %s FROM table", (AsIs(columns),))
mogrify will then show that the column names are passed through unquoted, as-is. Note that AsIs performs no escaping, so only use it with trusted column names.
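For example, a quick check with mogrify might look like this (a sketch, assuming an open cursor cur):

# AsIs injects the string verbatim, so no quotes are added around the names
print(cur.mogrify("SELECT %s FROM table", (AsIs(columns),)))
# b'SELECT id, name FROM table'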
Nowadays, you can use sql.Identifier to do this in a clean and secure way:
from psycopg2 import sql

statement = """
SELECT
{id}, {name}
FROM
table;
"""
with con.cursor() as cur:
    cur.execute(sql.SQL(statement).format(
        id=sql.Identifier("id"),
        name=sql.Identifier("name")
    ))
    data = pd.DataFrame.from_records(cur.fetchall())
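If the column list is dynamic, you can compose the identifiers with sql.SQL(', ').join (a sketch; cols is a hypothetical list of column names, which sql.Identifier quotes safely):

cols = ['id', 'name']
query = sql.SQL("SELECT {fields} FROM {table}").format(
    # each name becomes a properly quoted identifier
    fields=sql.SQL(', ').join(sql.Identifier(c) for c in cols),
    table=sql.Identifier('table')
)
cur.execute(query)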
More information on query composition here: https://www.psycopg.org/docs/sql.html
The reason is that you passed the list ['id', 'name'] as an SQL query parameter, so psycopg2 adapted it as a value (a PostgreSQL array), not as column names. The resulting query was similar to
SELECT ARRAY['id','name'] FROM table;
It looks like your table has 5 rows, so that literal was returned once for each row.
Column names cannot be SQL query parameters, but they can be ordinary string parameters that you interpolate into the query text before executing it (only do this with trusted, whitelisted column names):
sql = """
SELECT
%s
FROM
table;
"""
input = 'id, name'
sql = sql % input
print(sql)
cur = con.cursor()
cur.execute(sql)
data = pd.DataFrame.from_records(cur.fetchall())
In this case the resulting query is
SELECT
id, name
FROM
table;
Related
I want to get a DB table into a pandas DataFrame in Python. I use the following code:
self.cursor = self.connection.cursor()
query = """
SELECT * FROM `an_visit` AS `visit`
JOIN `an_ip` AS `ip` ON (`visit`.`ip_id` = `ip`.`ip_id`)
JOIN `an_useragent` AS `useragent` ON (`visit`.`useragent_id` = `useragent`.`useragent_id`)
JOIN `an_pageview` AS `pageview` ON (`visit`.`visit_id` = `pageview`.`visit_id`)
WHERE `visit`.`visit_id` BETWEEN %s AND %s
"""
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall())
This code works, but I want to get the column names as well. I tried the question MySQL: Get column name or alias from query,
but this did not work:
fields = map(lambda x: x[0], self.cursor.description)
result = [dict(zip(fields, row)) for row in self.cursor.fetchall()]
How can I get the column names from the DB into the df? Thanks
The easy way to include column names in the recordset is to set dictionary=True, as follows:
self.cursor = self.connection.cursor(dictionary=True)
Then fetchone(), fetchmany() and fetchall() each return dictionaries keyed by column name.
Check out these links:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursordict.html
https://mariadb-corporation.github.io/mariadb-connector-python/connection.html
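For example, with MySQL Connector/Python (which the links above refer to), the original snippet might become this sketch; pandas infers the column names from the dict keys:

self.cursor = self.connection.cursor(dictionary=True)
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall())  # columns are named automatically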
What worked for me is:
field_names = [i[0] for i in self.cursor.description]
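You can then pass those names straight to pandas (a sketch, reusing the cursor from the question):

rows = self.cursor.fetchall()
field_names = [i[0] for i in self.cursor.description]  # description is set after execute()
df = pd.DataFrame(rows, columns=field_names)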
The best practice for listing all the columns of a table is to execute this query from the connection cursor:
SELECT TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='<schema>' AND TABLE_NAME = '<table_name>'
There is a column_names property on the MySQL Connector/Python cursor that you can use:
row = dict(zip(self.cursor.column_names, self.cursor.fetchone()))
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-column-names.html
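That property also lets you build the DataFrame directly (a sketch, reusing the cursor from the question):

rows = self.cursor.fetchall()
df = pd.DataFrame(rows, columns=self.cursor.column_names)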
I have an SQLite database named StudentDB which has 3 columns: Roll number, Name, and Marks. Now I want to fetch only the columns that the user selects in the IDE. The user can select one column, two, or all three. How can I alter the query accordingly using Python?
I tried:
import sqlite3
sel={"Roll Number":12}
query = 'select * from StudentDB Where({seq})'.format(seq=','.join(['?']*len(sel))),[i for k,i in sel.items()]
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query)
all_data = cur.fetchall()
all_data
I am getting:
operation parameter must be str
You should control the text of the query. The WHERE clause should always be in the form WHERE colname=value [AND colname2=...] or (better) WHERE colname=? [AND ...] if you want to build a parameterized query.
So you want:
query = 'select * from StudentDB Where ' + ' AND '.join(
    '"{}"=?'.format(col) for col in sel.keys())
...
cur.execute(query, tuple(sel.values()))
In your code, query is a tuple instead of a str, which is why you get that error.
I assume you want to execute a query like below -
select * from StudentDB Where "Roll number"=?
Then you can change the SQL query like this (assuming you want AND and not OR):
query = "select * from StudentDB Where {seq}".format(seq=" and ".join('"{}"=?'.format(k) for k in sel.keys()))
and execute the query like this:
cur.execute(query, tuple(sel.values()))
Please make sure that database is defined in your code and contains the database file name, and that StudentDB is indeed the table name, not the database name.
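Putting it together, a minimal end-to-end sketch (assuming database holds the path to your SQLite file, as in the question):

import sqlite3

sel = {"Roll Number": 12}  # columns/values chosen by the user
# build one "col"=? clause per selected column
query = 'select * from StudentDB where ' + ' and '.join(
    '"{}"=?'.format(k) for k in sel)
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query, tuple(sel.values()))
all_data = cur.fetchall()
con.close()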
I am trying to iterate through a dataframe, fetching values from an individual column to use as parameters in my SQL query.
for index, frame in df1.iterrows():
    sql = "select * from issuers where column_1 = %s;"
    cur.execute(sql, frame['column_1'])
    row = cur.fetchone()
    id = row[0]
    print id
But I am getting the following error
"TypeError: not all arguments converted during string formatting"
How can I solve this? In case I need to add multiple parameters, how can I do that?
Instead of this:
cur.execute(sql, frame['column_1'])
Try this:
cur.execute(sql, [frame['column_1']])
The second parameter of execute is a list containing all the values to be inserted into the SQL.
To pass multiple parameters, use something like the following:
sql = "select * from issuers where column_1 = %s and column_2 = %s;"
cur.execute(sql, ["val1", "val2"])
For more information, please refer to the documentation.
EDIT
Here an example for INSERT INTO in SQL.
sql = "INSERT INTO user (firstname, lastname) VALUES (%s, %s)"
cur.execute(sql, ["John", "Doe"])
How do I insert a Python dictionary into a PostgreSQL table using psycopg2? I keep getting the following error, so my query is not formatted correctly:
Error syntax error at or near "To" LINE 1: INSERT INTO bill_summary VALUES(To designate the facility of...
import psycopg2
import json
import psycopg2.extras
import sys

with open('data.json', 'r') as f:
    data = json.load(f)

con = None
try:
    con = psycopg2.connect(database='sanctionsdb', user='dbuser')
    cur = con.cursor(cursor_factory=psycopg2.extras.DictCursor)
    cur.execute("CREATE TABLE bill_summary(title VARCHAR PRIMARY KEY, summary_text VARCHAR, action_date VARCHAR, action_desc VARCHAR)")
    for d in data:
        action_date = d['action-date']
        title = d['title']
        summary_text = d['summary-text']
        action_desc = d['action-desc']
        q = "INSERT INTO bill_summary VALUES(" + str(title) + str(summary_text) + str(action_date) + str(action_desc) + ")"
        cur.execute(q)
    con.commit()
except psycopg2.DatabaseError, e:
    if con:
        con.rollback()
    print 'Error %s' % e
    sys.exit(1)
finally:
    if con:
        con.close()
You should use the dictionary as the second parameter to cursor.execute(). See the example code after this statement in the documentation:
Named arguments are supported too using %(name)s placeholders in the query and specifying the values into a mapping.
So your code may be as simple as this:
with open('data.json', 'r') as f:
    data = json.load(f)
print(data)
""" above prints something like this:
{'title': 'the first action', 'summary-text': 'some summary', 'action-date': '2018-08-08', 'action-desc': 'action description'}
use the json keys as named parameters:
"""
cur = con.cursor()
q = "INSERT INTO bill_summary VALUES(%(title)s, %(summary-text)s, %(action-date)s, %(action-desc)s)"
cur.execute(q, data)
con.commit()
Note also this warning (from the same page of the documentation):
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
q = "INSERT INTO bill_summary VALUES(" +str(title)+str(summary_text)+str(action_date)+str(action_desc)+")"
You're writing your query the wrong way, by concatenating the values; they should instead be comma-separated (and, for strings, quoted) elements, like this:
q = "INSERT INTO bill_summary VALUES('{0}', '{1}', '{2}', '{3}')".format(str(title), str(summary_text), str(action_date), str(action_desc))
Since you're not specifying the column names, I assume they are in the same order as the values in your insert query. There are basically two ways of writing an insert query in PostgreSQL. One is by specifying the column names and their corresponding values, like this:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
The other way is to omit the column names entirely, which works if you are adding values for all the columns of the table. However, make sure the order of the values matches the order of the columns in the table. This is the form you have used in your query:
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
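For completeness, here is a sketch of the same insert with parameter binding, which is the safer option per the psycopg2 warning quoted in the other answer (column names taken from the question's CREATE TABLE):

# psycopg2 quotes and escapes the values itself
q = ("INSERT INTO bill_summary (title, summary_text, action_date, action_desc) "
     "VALUES (%s, %s, %s, %s)")
cur.execute(q, (title, summary_text, action_date, action_desc))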
Is there a way to produce this output using SQL for all tables in a given database (using MySQL) without having to specify individual table names and columns?
Table Column Count
---- ---- ----
Table1 Col1 0
Table1 Col2 100
Table1 Col3 0
Table1 Col4 67
Table1 Col5 0
Table2 Col1 30
Table2 Col2 0
Table2 Col3 2
... ... ...
The purpose is to identify columns for analysis based on how much data they contain (a significant number of columns are empty).
The 'workaround' solution using python (one table at a time):
# Libraries
import pymysql
import pandas as pd
import pymysql.cursors

# Connect to mariaDB
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='my_password',
                             db='my_database',
                             charset='latin1',
                             cursorclass=pymysql.cursors.DictCursor)

# Get column metadata
sql = """SELECT *
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='my_database'
"""
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()

# Store in dataframe
df = pd.DataFrame(result)
df = df[['TABLE_NAME', 'COLUMN_NAME']]

# Build SQL string (one table at a time for now)
my_table = 'my_table'
df_my_table = df[df.TABLE_NAME==my_table].copy()
cols = list(df_my_table.COLUMN_NAME)
col_strings = [''.join(['COUNT(', x, ') AS ', x, ', ']) for x in cols]
col_strings[-1] = col_strings[-1].replace(',', '')
sql = ''.join(['SELECT '] + col_strings + ['FROM ', my_table])

# Execute
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
The result is a dictionary of column names and counts.
Basically, no. See also this answer.
Note also that the closest match in the answer linked above is essentially the method you're already using, just implemented less efficiently in reflective SQL.
I'd do the same as you did: build SQL like
SELECT
COUNT(*) AS `count`,
SUM(IF(columnName1 IS NULL,1,0)) AS columnName1,
...
SUM(IF(columnNameN IS NULL,1,0)) AS columnNameN
FROM tableName;
using information_schema as the source for table and column names, then execute it for each table in MySQL, and finally disassemble the single row returned into N tuples (tableName, columnName, total, nulls).
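The generation step might look roughly like this in Python (a sketch reusing the pymysql connection and DictCursor from the question; names are illustrative):

# Build one counting query per table from information_schema
sql = """SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = %s"""
with connection.cursor() as cursor:
    cursor.execute(sql, ('my_database',))
    columns = cursor.fetchall()

parts = {}
for row in columns:
    parts.setdefault(row['TABLE_NAME'], []).append(
        "SUM(IF(`{0}` IS NULL,1,0)) AS `{0}`".format(row['COLUMN_NAME']))
queries = {t: "SELECT COUNT(*) AS `count`, " + ", ".join(p) + " FROM `{}`".format(t)
           for t, p in parts.items()}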
It is possible, but it's not going to be quick.
As mentioned in a previous answer, you can work your way through the columns table in the information_schema to build the counting queries. It's then just a question of how long you are prepared to wait for the answer, because you end up counting every row, for every column, in every table. You can speed things up a bit by excluding columns that are defined as NOT NULL in the cursor (i.e. keeping only IS_NULLABLE = 'YES').
The solution suggested by LSerni is going to be much faster, particularly if you have very wide tables and/or high row counts, but would require more work handling the results.
e.g.
DELIMITER //

DROP PROCEDURE IF EXISTS non_nulls //

CREATE PROCEDURE non_nulls (IN sname VARCHAR(64))
BEGIN
    -- Parameters:
    -- Schema name to check
    -- call non_nulls('sakila');
    DECLARE vTABLE_NAME varchar(64);
    DECLARE vCOLUMN_NAME varchar(64);
    DECLARE vIS_NULLABLE varchar(3);
    DECLARE vCOLUMN_KEY varchar(3);
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE cur1 CURSOR FOR
        SELECT `TABLE_NAME`, `COLUMN_NAME`, `IS_NULLABLE`, `COLUMN_KEY`
        FROM `information_schema`.`columns`
        WHERE `TABLE_SCHEMA` = sname
        ORDER BY `TABLE_NAME` ASC, `ORDINAL_POSITION` ASC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;

    DROP TEMPORARY TABLE IF EXISTS non_nulls;
    CREATE TEMPORARY TABLE non_nulls(
        table_name VARCHAR(64),
        column_name VARCHAR(64),
        column_key CHAR(3),
        is_nullable CHAR(3),
        `rows` BIGINT,
        populated BIGINT
    );

    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO vTABLE_NAME, vCOLUMN_NAME, vIS_NULLABLE, vCOLUMN_KEY;
        IF done THEN
            LEAVE read_loop;
        END IF;
        SET @sql := CONCAT('INSERT INTO non_nulls ',
            '(table_name,column_name,column_key,is_nullable,`rows`,populated) ',
            'SELECT \'', vTABLE_NAME, '\',\'', vCOLUMN_NAME, '\',\'', vCOLUMN_KEY, '\',\'',
            vIS_NULLABLE, '\', COUNT(*), COUNT(`', vCOLUMN_NAME, '`) ',
            'FROM `', sname, '`.`', vTABLE_NAME, '`');
        PREPARE stmt1 FROM @sql;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
    END LOOP;
    CLOSE cur1;

    SELECT * FROM non_nulls;
END //

DELIMITER ;
call non_nulls('sakila');