How to save sql table as pandas dataframe? - python

I have been trying to extract a SQL table using cx_Oracle and save it as a pandas DataFrame using the following script:
import cx_Oracle
import pandas as pd
id = 1234
connection = cx_Oracle.connect(user="user", password='pwd',dsn="dsn")
# Obtain a cursor
cursor = connection.cursor()
# Execute the query
query = """select * from table where id= {id}"""
my_sql =cursor.execute(query.format(id=id))
df_sql = pd.read_sql(my_sql, connection)
I am able to connect to the database, but I am unable to save the result as a pandas DataFrame. How do I do that? I get the following error:
File "file/to/path.py", line 38, in file
df_sql = pd.read_sql(my_sql, connection)
File "C:\file/to/path\venv\lib\site-packages\pandas\io\sql.py", line 495, in read_sql
return pandas_sql.read_query(
File "File/to/path\venv\lib\site-packages\pandas\io\sql.py", line 1771, in read_query
cursor = self.execute(*args)
File "File/to/path\venv\lib\site-packages\pandas\io\sql.py", line 1737, in execute
raise ex from exc
pandas.io.sql.DatabaseError: Execution failed on sql '<cx_Oracle.Cursor on <cx_Oracle.Connection to dsn>>': expecting string or bytes object

The first argument to pd.read_sql should be the query string (if I'm not mistaken), but you are passing a cursor object. Try replacing my_sql in pd.read_sql with the query, i.e.
pd.read_sql(query.format(id=id), connection)
or use the cursor object, i.e.
df = pd.DataFrame(my_sql.fetchall())
Note that fetchall() only returns the data, not the header; the column names can be obtained from cursor.description.
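As a rough illustration of the cursor.description route, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for cx_Oracle (an assumption, so that it runs without an Oracle server); the table items and the helper fetch_with_header are hypothetical names. It also passes the id as a bind parameter instead of formatting it into the string; sqlite3 uses ? for that, while cx_Oracle uses named binds such as :id.

```python
import sqlite3


def fetch_with_header(cursor):
    """Return (column_names, rows) for a cursor that has executed a SELECT."""
    # Each entry of cursor.description is a sequence whose first item is the column name.
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


# In-memory database standing in for the real Oracle connection.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1234, "widget"), (5678, "gadget")],
)

cursor = connection.cursor()
# Bind the value instead of interpolating it into the SQL text.
cursor.execute("SELECT * FROM items WHERE id = ?", (1234,))
columns, rows = fetch_with_header(cursor)
```

With pandas available, pd.DataFrame(rows, columns=columns) should then give a frame with proper column headers.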

Related

Python PYODBC - Previous SQL was not a query

I have the following python code, it reads through a text file line by line and takes characters x to y of each line as the variable "Contract".
import os
import pyodbc
cnxn = pyodbc.connect(r'DRIVER={SQL Server};CENSORED;Trusted_Connection=yes;')
cursor = cnxn.cursor()
claimsfile = open('claims.txt','r')
for line in claimsfile:
    #ldata = claimsfile.readline()
    contract = line[18:26]
    print(contract)
    cursor.execute("USE calms SELECT XREF_PLAN_CODE FROM calms_schema.APP_QUOTE WHERE APPLICATION_ID = "+str(contract))
    print(cursor.fetchall())
When including the line cursor.fetchall(), the following error is returned:
Programming Error: Previous SQL was not a query.
The query runs in SSMS, and if I replace str(contract) with the actual value of the variable, results are returned as expected.
Based on the data, the query will return one value as a result formatted as NVARCHAR(4).
Most other examples have variables declared prior to the loop, and the proposed solution is to SET NOCOUNT ON. That does not apply to my problem, so I am slightly lost.
P.S. I have also put the query in its own standalone file, without the loop that iterates through the file, in case the loop was causing the problem, without success.
In your SQL query, you are actually sending two commands, USE and SELECT, and the cursor is not set up for multiple statements. Plus, with database connections, you should select the database in the connection string (i.e., the DATABASE argument), so T-SQL's USE is not needed.
Consider the following adjustment with parameterization where APPLICATION_ID is assumed to be integer type. Add credentials as needed:
constr = 'DRIVER={SQL Server};SERVER=CENSORED;Trusted_Connection=yes;' \
'DATABASE=calms;UID=username;PWD=password'
cnxn = pyodbc.connect(constr)
cur = cnxn.cursor()
with open('claims.txt','r') as f:
    for line in f:
        contract = line[18:26]
        print(contract)
        # EXECUTE QUERY
        cur.execute("SELECT XREF_PLAN_CODE FROM APP_QUOTE WHERE APPLICATION_ID = ?",
                    [int(contract)])
        # FETCH ROWS ITERATIVELY
        for row in cur.fetchall():
            print(row)
cur.close()
cnxn.close()
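The same pattern (one statement per execute, value passed as a ? parameter) can be tried without a SQL Server instance. The sketch below is only an illustration: it uses sqlite3 as a stand-in for pyodbc, which accepts the same ? placeholder style, and the helper lookup_plan_codes and the sample data are hypothetical.

```python
import sqlite3


def lookup_plan_codes(conn, lines):
    """Map each contract id found in columns 18-25 of a line to its plan codes."""
    results = {}
    cur = conn.cursor()
    for line in lines:
        contract = line[18:26].strip()
        if not contract.isdigit():
            continue  # skip lines that carry no numeric contract id
        # One SELECT per execute call, with the value bound as a parameter.
        cur.execute(
            "SELECT XREF_PLAN_CODE FROM APP_QUOTE WHERE APPLICATION_ID = ?",
            (int(contract),),
        )
        results[contract] = [row[0] for row in cur.fetchall()]
    cur.close()
    return results


# In-memory table standing in for the real APP_QUOTE table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE APP_QUOTE (APPLICATION_ID INTEGER, XREF_PLAN_CODE TEXT)")
conn.execute("INSERT INTO APP_QUOTE VALUES (?, ?)", (12345678, "AB12"))

sample_lines = ["x" * 18 + "12345678" + " rest of line\n", "short line\n"]
found = lookup_plan_codes(conn, sample_lines)
```

Skipping non-numeric slices is a design choice of this sketch; the original answer converts with int(contract) directly, which raises ValueError on such lines.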

Error loading log file data into mysql using csv format and python

I am trying to take data from a log file in CSV format, open the log file, and insert it row by row into MySQL. I am getting an error like this:
ERROR
Traceback (most recent call last):
File "/Users/alex/PycharmProjects/PA_REPORTING/padb_populate.py", line 26, in
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)', row)
File "/Users/alex/anaconda/lib/python2.7/site-packages/MySQLdb/cursors.py", line 187, in execute
query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting.
import csv
import MySQLdb
mydb = MySQLdb.connect(host='192.168.56.103',
user='user',
passwd='pass',
db='palogdb')
cursor = mydb.cursor()
csv_data = csv.reader(file('/tmp/PALOG_DEMODATA-100.csv'))
for row in csv_data:
    cursor.execute('INSERT INTO palogdb(RECEIVE_TIME,SERIAL,TYPE,SUBTYPE,COL1,TIME_GENERATED,SRC,DST,NATSRC,NATDST,RULE,\
SRCUSR,DSTUSR,APP,VSYS1,FROM,TO,INBOUND_IF,OUTBOUND_IF,LOGSET,COL2,SESSIONID,COL3,REPEATCNT,SOURCEPORT,NATSPORT,NATDPORT, \
FLAGS,PROTO,ACTION,BYTES,BYTES_SENT,BYTES_RECEIVED,PACKETS,START,ELAPSED,CATEGORY,COL4,SEQNO,ACTIONFLAGS,SRCLOC,DSTLOC,NONE, \
PKTS_SENT,PKTS_RECEIVED,SESSION_END_REASON) \
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)', row)
#close the connection to the database.
mydb.commit()
cursor.close()
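Is it possible that row does not contain exactly as many values as there are %s placeholders? The traceback shows MySQLdb building the statement with query % tuple(...), and Python's % formatting raises "not all arguments converted" when it receives more values than placeholders, so compare len(row) with the number of %s in the statement. Note that execute takes the parameters as one sequence, so row should be passed as is; unpacking with *row only makes sense with str.format, as in the debugging snippet below.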
To debug, you could try to build the string passed to execute by some other method, e.g.
sql_string = 'INSERT ... VALUES ({}, {}, {})'.format(*row)
and print it. If you get such an error, you can check, whether the generated string looks reasonable...
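One way to keep placeholders and values in step is to generate the placeholder list from the column list and check each row before executing. This is a minimal sketch in plain Python with no database involved; build_insert and check_row are hypothetical helper names, and the three-column list is shortened for illustration.

```python
def build_insert(table, columns):
    """Build an INSERT statement with exactly one %s placeholder per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    column_list = ", ".join(columns)
    return "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, placeholders)


def check_row(sql, row):
    """Raise a readable error if the row length does not match the placeholders."""
    expected = sql.count("%s")
    if len(row) != expected:
        raise ValueError(
            "row has {} values but the statement has {} placeholders".format(
                len(row), expected
            )
        )


columns = ["RECEIVE_TIME", "SERIAL", "TYPE"]
sql = build_insert("palogdb", columns)

# Reproduce the error from the question: more values than placeholders.
try:
    "%s,%s" % ("a", "b", "c")
except TypeError as exc:
    formatting_error = str(exc)
```

Calling check_row(sql, row) just before cursor.execute(sql, row) would turn the opaque TypeError into a message that names both counts.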

Using IBM_DB with Pandas

I am trying to use the data analysis tool pandas in Python. I am trying to read data from an IBM DB using the ibm_db package. According to the documentation on the pandas website, we need to provide at least two arguments: the SQL to be executed and the connection object of the database. But when I do that, I get an error saying that the connection object does not have a cursor() method. I figured maybe this is not how this particular DB package works. I tried to find a few workarounds but was not successful.
Code:
print "hello PyDev"
con = db.connect("DATABASE=db;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=admin;PWD=admin;", "", "")
sql = "select * from Maximo.PLUSPCUSTOMER"
stmt = db.exec_immediate(con,sql)
pd.read_sql(sql, con)
print "done here"
Error:
hello PyDev
Traceback (most recent call last):
File "C:\Users\ray\workspace\Firstproject\pack\test.py", line 15, in <module>
pd.read_sql(sql, con)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 478, in read_sql
chunksize=chunksize)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1504, in read_query
cursor = self.execute(*args)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1467, in execute
cur = self.con.cursor()
AttributeError: 'ibm_db.IBM_DBConnection' object has no attribute 'cursor'
I am able to get the data if I fetch it from the database directly, but I need to read it into a dataframe and write it back to the database after processing.
Code for fetching from DB
stmt = db.exec_immediate(con,sql)
tpl=db.fetch_tuple(stmt)
while tpl:
    print(tpl)
    tpl=db.fetch_tuple(stmt)
On studying the package further, I found that I need to wrap the IBM_DB connection object in an ibm_db_dbi connection object, which is part of the https://pypi.org/project/ibm-db/ package.
So
conn = ibm_db_dbi.Connection(con)
df = pd.read_sql(sql, conn)
The above code works and pandas fetches data into dataframe successfully.
You can also check out https://pypi.python.org/pypi/ibmdbpy
It provides Pandas style API without pulling out all data into Python memory.
Documentation is here: http://pythonhosted.org/ibmdbpy/index.html
Here is a quick demo how to use it in Bluemix Notebooks:
https://www.youtube.com/watch?v=tk9T1yPkn4c
You can just use ibm_db_dbi.connect like this (tested)
import ibm_db_dbi
import pandas as pd
config = {
    'database': xxx, 'hostname': xxx, 'port': xxx,
    'protocol': xxx, 'uid': xxx, 'password': xxx
}
conn = ibm_db_dbi.connect(
    'database={database};'
    'hostname={hostname};'
    'port={port};'
    'protocol={protocol};'
    'uid={uid};'
    'pwd={password}'.format(**config), '', '')
sql = 'select xxxx from xxxx'
df = pd.read_sql(sql, conn)
from ibm_db import connect
import pandas as pd
import ibm_db_dbi
cnxn = connect('DATABASE=YourDatabaseName;'
               'HOSTNAME=YourHost;' # localhost would work
               'PORT=50000;'
               'PROTOCOL=TCPIP;'
               'UID=UserName;'
               'PWD=Password;', '', '')
sql = "SELECT * FROM Maximo.PLUSPCUSTOMER"
conn=ibm_db_dbi.Connection(cnxn)
df = pd.read_sql(sql, conn)
df.head()

NoneType object is not iterable error in pandas

I am trying to pull some data from a stored proc on a sql server using python.
Here is my code:
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes', driver = '{SQL Server Native client 11.0}',server = '*****', database = '**')
pd.read_sql("EXEC ******** '20140528'",conn)
I get the error: TypeError: 'NoneType' object is not iterable
I suspect this is because I have a cell in the sql table with value NULL but not sure if that's the true reason why I am getting the error. I have run many sql statements using the same code without any errors.
Here's the traceback:
In[39]: pd.read_sql("EXEC [dbo].[] '20140528'",conn)
Traceback (most recent call last):
File "C:*", line 3032, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-39-68fb1c956dd7>", line 1, in <module>
pd.read_sql("EXEC [dbo].[] '20140528'",conn)
File "C:*", line 467, in read_sql
chunksize=chunksize
File "c:***", line 1404, in read_query
columns = [col_desc[0] for col_desc in cursor.description]
TypeError: 'NoneType' object is not iterable
Your sproc needs
SET NOCOUNT ON;
Without this, SQL Server first returns the row count for the call, which comes back without column names, causing the NoneType error.
pd.read_sql() expects to have output to return, and tries to iterate through the output; that's where the TypeError is coming from. Instead, execute with a cursor object:
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes', driver = '{SQL Server Native client 11.0}',server = '*****', database = '**')
cur = conn.cursor()
cur.execute("EXEC ******** '20140528'")
You won't receive any output, but since none is expected, your code should run without error.
It's way too late for this post, but I encountered the same error. After searching a lot on Stack Overflow, I found that it is not an issue with pandas/Python but with the query/stored proc.
My workaround was to debug the Python script and step into the built-in pandas code
until I reached site-packages/pandas/io/sql.py, where you will see this code:
cursor=self.execute(*args)
Execute this line in the debugger and inspect the cursor object; you will find what is being returned by the stored proc. In my case, there was an irrelevant message that I was emitting from the stored proc.
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes;DRIVER={SQL Server Native Client 11.0};SERVER=*****;DATABASE=**')
sqlSend = conn.cursor()
sqlSend.execute("EXEC ******** '20140528'")
conn.commit()
SQL command text that contains multiple SQL statements is called an anonymous code block. An anonymous code block can return multiple results, where each result can be
a row count,
a result set containing zero or more rows of data, or
an error.
The following example fails ...
sql = """\
SELECT 1 AS foo INTO #tmp;
SELECT * FROM #tmp;
"""
df = pd.read_sql_query(sql, cnxn)
# TypeError: 'NoneType' object is not iterable
... because the first SELECT ... INTO returns a row count before the second SELECT returns its result set.
The fix is to start the anonymous code block with SET NOCOUNT ON; which suppresses the row count and only returns the result set:
sql = """\
SET NOCOUNT ON;
SELECT 1 AS foo INTO #tmp;
SELECT * FROM #tmp;
"""
df = pd.read_sql_query(sql, cnxn)
# no error
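The mechanism behind the TypeError can be reproduced without SQL Server. The sketch below is only an analogy: it uses sqlite3 as a stand-in for pyodbc (an assumption), and the table tmp is hypothetical. It shows that after a statement that produces no result set, cursor.description is None, so a list comprehension over it, like the one in the pandas traceback above, fails in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A statement that returns no result set leaves description as None.
cur.execute("CREATE TABLE tmp (foo INTEGER)")
description_after_ddl = cur.description

# Iterating over None is what raises the TypeError seen in the traceback.
try:
    columns = [col[0] for col in cur.description]
except TypeError as exc:
    error_message = str(exc)

# Once the cursor holds a real result set, the same expression works.
cur.execute("INSERT INTO tmp VALUES (1)")
cur.execute("SELECT * FROM tmp")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()
```

In the SQL Server case the first "result" is the row count rather than a DDL statement, but the effect on cursor.description is the same, which is why suppressing the row count with SET NOCOUNT ON helps.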

Python using mysql connector list databases LIKE and then use those databases in order and run query

I'm trying to write a script using Python and the mysql-connector library. The script should connect to the MySQL server, run "SHOW DATABASES LIKE 'pdns_%'", and then, using the results returned by that query, use each database in turn and run another query against it.
Here is the code
import datetime
import mysql.connector
from mysql.connector import errorcode
cnx = mysql.connector.connect (user='user', password='thepassword',
host='mysql.server.com',buffered=True)
cursor = cnx.cursor()
query = ("show databases like 'pdns_%'")
cursor.execute(query)
databases = query
for (databases) in cursor:
cursor.execute("USE %s",(databases[0],))
hitcounts = ("SELECT Monthname(hitdatetime) AS 'Month', Count(hitdatetime) AS 'Hits' WHERE hitdatetime >= Date_add(Last_day(Date_sub(Curdate(), interval 4 month)), interval 1 day) AND hitdatetime < Date_add(Last_day(Date_sub(Curdate(), interval 1 month)), interval 1 day) GROUP BY Monthname(hitdatetime) ORDER BY Month(hitdatetime)")
cursor.execute(hitcounts)
print(hitcounts)
cursor.close()
cnx.close()
When running the script it stops with the following error'd output
Traceback (most recent call last):
File "./mysql-test.py", line 18, in <module>
cursor.execute("USE %s",(databases[0],))
File "/usr/lib/python2.6/site-packages/mysql/connector/cursor.py", line 491, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "/usr/lib/python2.6/site-packages/mysql/connector/connection.py", line 635, in cmd_query
statement))
File "/usr/lib/python2.6/site-packages/mysql/connector/connection.py", line 553, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''pdns_382'' at line 1
Based on the error, I'm guessing there is an issue with how it handles the database name from the first query. Any pointers in the correct direction would be very helpful, as I'm very much a beginner. Thank you very much.
Alas, the two-args form of execute does not support "meta" parameters, such as names of databases, tables, or fields (roughly, think of identifiers you wouldn't quote if writing the query out manually). So, the failing statement:
cursor.execute("USE %s",(databases[0],))
needs to be re-coded as:
cursor.execute("USE %s" % (databases[0],))
i.e., the single-arg form of execute, with string interpolation. Fortunately, this particular case does not expose you to SQL injection risks, since you're only interpolating DB names coming right from the DB engine.
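The error message shows the name arriving as a quoted string literal ('pdns_382'), which is why USE rejects it. If the identifier ever came from a less trusted source than the server itself, it could be checked before interpolation. A minimal sketch, assuming names restricted to letters, digits, underscore and dollar sign; quote_identifier is a hypothetical helper, not part of mysql-connector.

```python
import re

# Conservative whitelist: letters, digits, underscore and dollar sign only.
_VALID_IDENTIFIER = re.compile(r"[A-Za-z0-9_$]+")


def quote_identifier(name):
    """Return the name wrapped in backticks, or raise if it looks unsafe."""
    if not _VALID_IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: {!r}".format(name))
    return "`{}`".format(name)


use_statement = "USE {}".format(quote_identifier("pdns_382"))
```

The resulting string can then be passed to cursor.execute(use_statement) as a single argument, as in the answer above.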
