I want get a db into pandas df in Python. I use a following code:
self.cursor = self.connection.cursor()
query = """
SELECT * FROM `an_visit` AS `visit`
JOIN `an_ip` AS `ip` ON (`visit`.`ip_id` = `ip`.`ip_id`)
JOIN `an_useragent` AS `useragent` ON (`visit`.`useragent_id` = `useragent`.`useragent_id`)
JOIN `an_pageview` AS `pageview` ON (`visit`.`visit_id` = `pageview`.`visit_id`)
WHERE `visit`.`visit_id` BETWEEN %s AND %s
"""
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall())
This code works, but I want to get column names as well. I tried this question MySQL: Get column name or alias from query
but this did not work:
fields = map(lambda x: x[0], self.cursor.description)
result = [dict(zip(fields, row)) for row in self.cursor.fetchall()]
How can I get column names from db into df? Thanks
The easy way to include column names within recordset is to set dictionary=True as following:
self.cursor = self.connection.cursor(dictionary=True)
Then, all of fetch(), fetchall() and fetchone() are return dictionary with column name and data
check out links:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursordict.html
https://mariadb-corporation.github.io/mariadb-connector-python/connection.html
What work to me is:
field_names = [i[0] for i in self.cursor.description ]
the best practice to list all the columns in the database is to execute this query form the connection cursor
SELECT TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='<schema>' AND TABLE_NAME = '<table_name>'
There is a column_names properties in MySql cursor that you can use:
row = dict(zip(self.cursor.column_names, self.cursor.fetchone()))
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-column-names.html
Related
I'd like to be able to get the column names of my query back from the database so I can then use it to dynamically set the column length and name of my gui table.
So far after establishing a connection I've tried:
mycursor = mydb.cursor()
mycursor.callproc('db1.load_table', args)
for r in mycursor.stored_results():
result = r.fetchall()
column_names = mycursor.column_names
I've also tried using mycursor.description
It didn't work.
What's the right approach here?
Something like this you can :
column_names = [desc[0] for desc in mycursor.description]
I'm trying to do this query in sqlalchemy
SELECT id, name FROM user WHERE id IN (123, 456)
I would like to bind the list [123, 456] at execution time.
How about
session.query(MyUserClass).filter(MyUserClass.id.in_((123,456))).all()
edit: Without the ORM, it would be
session.execute(
select(
[MyUserTable.c.id, MyUserTable.c.name],
MyUserTable.c.id.in_((123, 456))
)
).fetchall()
select() takes two parameters, the first one is a list of fields to retrieve, the second one is the where condition. You can access all fields on a table object via the c (or columns) property.
Assuming you use the declarative style (i.e. ORM classes), it is pretty easy:
query = db_session.query(User.id, User.name).filter(User.id.in_([123,456]))
results = query.all()
db_session is your database session here, while User is the ORM class with __tablename__ equal to "users".
An alternative way is using raw SQL mode with SQLAlchemy, I use SQLAlchemy 0.9.8, python 2.7, MySQL 5.X, and MySQL-Python as connector, in this case, a tuple is needed. My code listed below:
id_list = [1, 2, 3, 4, 5] # in most case we have an integer list or set
s = text('SELECT id, content FROM myTable WHERE id IN :id_list')
conn = engine.connect() # get a mysql connection
rs = conn.execute(s, id_list=tuple(id_list)).fetchall()
Hope everything works for you.
Just wanted to share my solution using sqlalchemy and pandas in python 3. Perhaps, one would find it useful.
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine("postgresql://postgres:my_password#my_host:my_port/my_db")
values = [val1,val2,val3]
query = sa.text("""
SELECT *
FROM my_table
WHERE col1 IN :values;
""")
query = query.bindparams(values=tuple(values))
df = pd.read_sql(query, engine)
With the expression API, which based on the comments is what this question is asking for, you can use the in_ method of the relevant column.
To query
SELECT id, name FROM user WHERE id in (123,456)
use
myList = [123, 456]
select = sqlalchemy.sql.select([user_table.c.id, user_table.c.name], user_table.c.id.in_(myList))
result = conn.execute(select)
for row in result:
process(row)
This assumes that user_table and conn have been defined appropriately.
Or maybe use .in_(list), similar to what #Carl has already suggested
as
stmt = select(
id,
name
).where(
id.in_(idlist),
)
Complete code assuming you have the data model in User class:
def fetch_name_ids(engine, idlist):
# create an empty dataframe
df = pd.DataFrame()
try:
# create session with engine
session = Session(engine, future=True)
stmt = select(
User.id,
User.name
).where(
User.id.in_(idlist),
)
data = session.execute(stmt)
df = pd.DataFrame(data.all())
if len(df) > 0:
df.columns = data.keys()
else:
columns = data.keys()
df = pd.DataFrame(columns=columns)
except SQLAlchemyError as e:
error = str(e.__dict__['orig'])
session.rollback()
raise error
else:
session.commit()
finally:
engine.dispose()
session.close()
return df
This question posted a solution to the select query, unfortunately, it is not working for the update query. Using this solution, it would help even in the select conditions also.
Update Query Solution:
id_list = [1, 2, 3, 4, 5] # in most cases we have an integer list or set
query = 'update myTable set content = 1 WHERE id IN {id_list}'.format(id_list=tuple(id_list))
conn.execute(query)
Note: Use a tuple instead of a list.
Just an addition to the answers above.
If you want to execute a SQL with an "IN" statement you could do this:
ids_list = [1,2,3]
query = "SELECT id, name FROM user WHERE id IN %s"
args = [(ids_list,)] # Don't forget the "comma", to force the tuple
conn.execute(query, args)
Two points:
There is no need for Parenthesis for the IN statement(like "... IN(%s) "), just put "...IN %s"
Force the list of your ids to be one element of a tuple. Don't forget the " , " : (ids_list,)
EDIT
Watch out that if the length of list is one or zero this will raise an error!
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database ( PRODUCT ) , I have a table ( XML_FEED ). The table XML_FEED is huge ( Millions of record )
I have a pandas.DataFrame() ( PROCESSED_DF ). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)
Question:-
Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql() but that is not what I need. I do not prefer to read XML_FEED table in pandas because it very huge.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
#compiles(Insert)
def replace_string(insert, compiler, **kw):
s = compiler.visit_insert(insert, **kw)
s = s.replace("INSERT INTO", "REPLACE INTO")
return s
data = [dict(zip(keys, row)) for row in data_iter]
conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql(db, if_exists='append', method=mysql_replace_into)
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
from sqlalchemy.dialects.mysql import insert
data = [dict(zip(keys, row)) for row in data_iter]
stmt = insert(table.table).values(data)
update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
stmt.inserted.values())))
conn.execute(update_stmt)
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
Till this version (0.17.1) I am unable find any direct way to do this in pandas. I reported a feature request for the same.
I did this in my project with executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append')
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password#localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)# you can use flavor='mysql' if you do not want to create sqlalchemy engine but it is depreciated
Please note:-
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
df.reset_index(inplace=True)
sql = ''' SELECT column_name from information_schema.columns
WHERE table_schema = '{schema}' AND table_name = '{table}' AND
COLUMN_KEY = 'PRI';
'''.format(schema=schema, table=table)
id_cols = [x[0] for x in engine.execute(sql).fetchall()]
id_vals = [df[col_name].tolist() for col_name in id_cols]
sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
for row in zip(*id_vals):
sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
sql += ' OR ({}) '.format(sql_row)
engine.execute(sql)
df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
If you use to_sql you should be able to define it so that you replace values if they exist, so for a table named 'mydb' and a dataframe named 'df', you'd use:
df.to_sql(mydb,if_exists='replace')
That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.
I have the data in pandas dataframe which I am storing in SQLITE database using Python. When I am trying to query the tables inside it, I am able to get the results but without the column names. Can someone please guide me.
sql_query = """Select date(report_date), insertion_order_id, sum(impressions), sum(clicks), (sum(clicks)+0.0)/sum(impressions)*100 as CTR
from RawDailySummaries
Group By report_date, insertion_order_id
Having report_date like '2014-08-12%' """
cursor.execute(sql_query)
query1 = cursor.fetchall()
for i in query1:
print i
Below is the output that I get
(u'2014-08-12', 10187, 2024, 8, 0.3952569169960474)
(u'2014-08-12', 12419, 15054, 176, 1.1691244851866613)
What do I need to do to display the results in a tabular form with column names
In DB-API 2.0 compliant clients, cursor.description is a sequence of 7-item sequences of the form (<name>, <type_code>, <display_size>, <internal_size>, <precision>, <scale>, <null_ok>), one for each column, as described here. Note description will be None if the result of the execute statement is empty.
If you want to create a list of the column names, you can use list comprehension like this: column_names = [i[0] for i in cursor.description] then do with them whatever you'd like.
Alternatively, you can set the row_factory parameter of the connection object to something that provides column names with the results. An example of a dictionary-based row factory for SQLite is found here, and you can see a discussion of the sqlite3.Row type below that.
Step 1: Select your engine like pyodbc, SQLAlchemy etc.
Step 2: Establish connection
cursor = connection.cursor()
Step 3: Execute SQL statement
cursor.execute("Select * from db.table where condition=1")
Step 4: Extract Header from connection variable description
headers = [i[0] for i in cursor.description]
print(headers)
Try Pandas .read_sql(), I can't check it right now but it should be something like:
pd.read_sql( Q , connection)
Here is a sample code using cx_Oracle, that should do what is expected:
import cx_Oracle
def test_oracle():
connection = cx_Oracle.connect('user', 'password', 'tns')
try:
cursor = connection.cursor()
cursor.execute('SELECT day_no,area_code ,start_date from dic.b_td_m_area where rownum<10')
#only print head
title = [i[0] for i in cursor.description]
print(title)
# column info
for x in cursor.description:
print(x)
finally:
cursor.close()
if __name__ == "__main__":
test_oracle();
I am trying to access PostgreSQL using psycopg2:
sql = """
SELECT
%s
FROM
table;
"""
cur = con.cursor()
input = (['id', 'name'], )
cur.execute(sql, input)
data = pd.DataFrame.from_records(cur.fetchall())
However, the returned result is:
0
0 [id, name]
1 [id, name]
2 [id, name]
3 [id, name]
4 [id, name]
If I try to access single column, it looks like:
0
0 id
1 id
2 id
3 id
4 id
It looks like something is wrong with the quoting around column name (single quote which should not be there):
In [49]: print cur.mogrify(sql, input)
SELECT
'id'
FROM
table;
but I am following doc: http://initd.org/psycopg/docs/usage.html#
Anyone can tell me what is going on here? Thanks a lot!!!
Use the AsIs extension
import psycopg2
from psycopg2.extensions import AsIs
column_list = ['id','name']
columns = ', '.join(column_list)
cursor.execute("SELECT %s FROM table", (AsIs(columns),))
And mogrify will show that it is not quoting the column names and passing them in as is.
Nowadays, you can use sql.Identifier to do this in a clean and secure way :
from psycopg2 import sql
statement = """
SELECT
{id}, {name}
FROM
table;
"""
with con.cursor() as cur:
cur.execute(sql.SQL(statement).format(
id=sql.SQL.Identifier("id"),
name=sql.SQL.Identifier("name")
))
data = pd.DataFrame.from_records(cur.fetchall())
More information on query composition here : https://www.psycopg.org/docs/sql.html
The reason was that you were passing the string representation of the array ['id', 'name'] as SQL query parameter but not as the column names. So the resulting query was similar to
SELECT 'id, name' FROM table
Looks your table had 5 rows so the returned result was just this literal for each row.
Column names cannot be the SQL query parameters but can be just the usual string parameters which you can prepare before executing the query-
sql = """
SELECT
%s
FROM
table;
"""
input = 'id, name'
sql = sql % input
print(sql)
cur = con.cursor()
cur.execute(sql)
data = pd.DataFrame.from_records(cur.fetchall())
In this case the resulting query is
SELECT
id, name
FROM
table;