I want to read a SQLite database file (database.sqlite) using the polars package. I tried the following, without success:
import sqlite3
import polars as pl
conn = sqlite3.connect('database.sqlite')
df = pl.read_sql("SELECT * from table_name", conn)
print(df)
I get the following error:
AttributeError: 'sqlite3.Connection' object has no attribute 'split'
Any suggestions?
From the docs, you can see that pl.read_sql accepts a connection string as a parameter, but you are passing a sqlite3.Connection object; that is why you get that message.
You should first build the connection string, which is the URL for your database:
db_path = 'database.sqlite'
connection_string = 'sqlite://' + db_path
After that, you can run the line that gave you problems, updated to use the connection string:
df = pl.read_sql("SELECT * from table_name", connection_string)
Related
I am running a SQL query from my Python code and attempting to create a DataFrame from it. When I execute the code, pandas produces the following error message:
pandas.io.sql.DatabaseError: Execution failed on sql '*my connection info*' : expecting string or bytes object
The relevant code is:
import cx_Oracle
import cx_Oracle as cx
import pandas as pd
dsn_tns = cx.makedsn('x.x.x.x', 'y', service_name='xxx')
conn = cx.connect(user='x', password='y', dsn=dsn_tns)
sql_query1 = conn.cursor()
sql_query1.execute("""select * from *table_name* partition(p20210712) t""")
df = pd.read_sql(sql_query1,conn)
I was thinking of converting all values in the query result to strings with the df.astype(str) function, but I cannot find the proper way to accomplish this within the pd.read_sql statement. Would data type conversion correct this issue?
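For what it's worth, this error usually has nothing to do with data types: the first argument to pd.read_sql must be the SQL text (a string), and here sql_query1 is a cursor object. A minimal sketch of the working pattern, using the stdlib sqlite3 as a stand-in for cx_Oracle since the table and credentials above are elided:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")

# Pass the SQL text plus the connection -- not a cursor that already
# executed the query
df = pd.read_sql("SELECT * FROM t", conn)
print(df["x"].iloc[0])  # 42
```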
I have just started learning SQL and I'm having some difficulty importing my .sql file in Python.
The .sql file is on my desktop, as is my .py file.
That's what I tried so far:
import codecs
from codecs import open
import pandas as pd
sqlfile = "countries.sql"
sql = open(sqlfile, mode='r', encoding='utf-8-sig').read()
pd.read_sql_query("SELECT name FROM countries")
But I got the following message error:
TypeError: read_sql_query() missing 1 required positional argument: 'con'
I think I have to create some kind of connection, but I can't find a way to do that. Converting my data to an ordinary pandas DataFrame would help me a lot.
Thank you
This code snippet, taken from https://www.dataquest.io/blog/python-pandas-databases/, should help.
import pandas as pd
import sqlite3
conn = sqlite3.connect("flights.db")
df = pd.read_sql_query("select * from airlines limit 5;", conn)
Do not read the database as an ordinary file. It has a specific binary format, and a dedicated client should be used.
With the client you can create a connection that can handle SQL queries and can be passed to read_sql_query.
Refer to the documentation often: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html
You need a database connection. I don't know what SQL flavor you are using, but suppose you want to run your query on SQL Server:
import pyodbc
con = pyodbc.connect(driver='{SQL Server}', server='yourserverurl', database='yourdb', trusted_connection='yes')
then pass the connection instance to pandas
pd.read_sql_query("SELECT name FROM countries", con)
More about pyodbc here.
And if you want to query an SQLite database
import sqlite3
con = sqlite3.connect('pathto/example.db')
More about sqlite here
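Putting the two pieces together for SQLite, here is a complete runnable sketch; an in-memory database with invented contents stands in for 'pathto/example.db', and the countries table only exists once your .sql file has been executed against the database:

```python
import sqlite3
import pandas as pd

# Use 'pathto/example.db' for a file on disk; ':memory:' keeps this self-contained
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT)")
con.executemany("INSERT INTO countries VALUES (?)", [("France",), ("Japan",)])

# The connection is the 'con' argument the error message was asking for
df = pd.read_sql_query("SELECT name FROM countries", con)
print(len(df))  # 2
```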
I have been using the following lines of code for the longest time, without any hitch, but today it seems to have produced the following error and I cannot figure out why. The strange thing is that I have other scripts that use the same code and they all seem to work...
import pandas as pd
import psycopg2
link_conn_string = "host='<host>' dbname='<db>' user='<user>' password='<pass>'"
conn = psycopg2.connect(link_conn_string)
df = pd.read_sql("SELECT * FROM link._link_bank_report_lms_loan_application", link_conn_string)
Error Message:
"Could not parse rfc1738 URL from string '%s'" % name)
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'host='<host>' dbname='<db>' user='<user>' password='<pass>''
Change link_conn_string to a URL of this form:
postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]
Eg:
>>> import psycopg2
>>> cs = 'postgresql://vao@localhost:5432/t'
>>> c = psycopg2.connect(cs)
>>> import pandas as pd
>>> df = pd.read_sql("SELECT now()", c)
>>> print(df)
now
0 2017-02-27 21:58:27.520372+00:00
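To avoid hand-assembling that URL, a tiny hypothetical helper that formats the RFC 1738 form shown above (pure string formatting, no database needed; all values are placeholders):

```python
# Hypothetical helper: build an RFC 1738 PostgreSQL URL from its parts
def pg_url(user, password, host, port, dbname):
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"

url = pg_url("postgres", "secret", "localhost", 5432, "mydb")
print(url)  # postgresql://postgres:secret@localhost:5432/mydb
```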
Your connection string is wrong.
You should remove the line where you assign a string to the link_conn_string variable and replace the next line with something like this (remember to replace localhost, mydb, postgres, and secret with the host where PostgreSQL is running, plus the database name, user, and password required to make the connection):
conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres", password="secret")
Also, you can check whether the database is up with the psql command from the terminal (again, remember to change the user and database):
psql -U postgres database
I am trying to use the data analysis tool pandas in Python. I am trying to read data from an IBM database using the ibm_db package. According to the documentation on the pandas website, we need to provide at least two arguments: the SQL that will be executed and the connection object for the database. But when I do that, it gives me an error that the connection object does not have a cursor() method. I figured maybe this is not how this particular DB package works. I tried to find a few workarounds but was not successful.
Code:
print "hello PyDev"
con = db.connect("DATABASE=db;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=admin;PWD=admin;", "", "")
sql = "select * from Maximo.PLUSPCUSTOMER"
stmt = db.exec_immediate(con,sql)
pd.read_sql(sql, con)
print "done here"
Error:
hello PyDev
Traceback (most recent call last):
File "C:\Users\ray\workspace\Firstproject\pack\test.py", line 15, in <module>
pd.read_sql(sql, con)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 478, in read_sql
chunksize=chunksize)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1504, in read_query
cursor = self.execute(*args)
File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1467, in execute
cur = self.con.cursor()
AttributeError: 'ibm_db.IBM_DBConnection' object has no attribute 'cursor'
I am able to fetch data directly from the database, but I need to read it into a DataFrame and write it back to the database after processing.
Code for fetching from DB
stmt = db.exec_immediate(con,sql)
tpl=db.fetch_tuple(stmt)
while tpl:
    print(tpl)
    tpl = db.fetch_tuple(stmt)
After studying the package further, I found that I need to wrap the IBM_DB connection object in an ibm_db_dbi connection object, which is part of the https://pypi.org/project/ibm-db/ package.
So
conn = ibm_db_dbi.Connection(con)
df = pd.read_sql(sql, conn)
The above code works and pandas fetches data into dataframe successfully.
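The underlying requirement is simply that pandas calls .cursor() on whatever it is given, so any DB-API 2.0 connection works. The stdlib sqlite3 illustrates the same pattern (table name borrowed from the question; the contents are invented):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PLUSPCUSTOMER (id INTEGER, name TEXT)")
conn.execute("INSERT INTO PLUSPCUSTOMER VALUES (1, 'acme')")

# pd.read_sql only needs an object exposing .cursor(); raw ibm_db handles
# do not have one, which is why wrapping them in ibm_db_dbi.Connection
# fixes the AttributeError
df = pd.read_sql("SELECT * FROM PLUSPCUSTOMER", conn)
print(df.shape)  # (1, 2)
```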
You can also check out https://pypi.python.org/pypi/ibmdbpy
It provides Pandas style API without pulling out all data into Python memory.
Documentation is here: http://pythonhosted.org/ibmdbpy/index.html
Here is a quick demo how to use it in Bluemix Notebooks:
https://www.youtube.com/watch?v=tk9T1yPkn4c
You can also use ibm_db_dbi.connect like this (tested):
import ibm_db_dbi
import pandas as pd
config = {
    'database': 'xxx', 'hostname': 'xxx', 'port': 'xxx',
    'protocol': 'xxx', 'uid': 'xxx', 'password': 'xxx'
}
conn = ibm_db_dbi.connect(
    'database={database};'
    'hostname={hostname};'
    'port={port};'
    'protocol={protocol};'
    'uid={uid};'
    'pwd={password}'.format(**config), '', '')
sql = 'select xxxx from xxxx'
df = pd.read_sql(sql, conn)
from ibm_db import connect
import pandas as pd
import ibm_db_dbi
cnxn = connect('DATABASE=YourDatabaseName;'
               'HOSTNAME=YourHost;'  # localhost would work
               'PORT=50000;'
               'PROTOCOL=TCPIP;'
               'UID=UserName;'
               'PWD=Password;', '', '')
sql = "SELECT * FROM Maximo.PLUSPCUSTOMER"
conn=ibm_db_dbi.Connection(cnxn)
df = pd.read_sql(sql, conn)
df.head()
I am trying to pull some data from a stored proc on a SQL Server database using Python.
Here is my code:
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server Native Client 11.0}', server='*****', database='**')
pd.read_sql("EXEC ******** '20140528'",conn)
I get the error: TypeError: 'NoneType' object is not iterable
I suspect this is because I have a cell in the SQL table with the value NULL, but I am not sure if that is the true reason I am getting the error. I have run many SQL statements using the same code without any errors.
Here's the traceback:
In[39]: pd.read_sql("EXEC [dbo].[] '20140528'",conn)
Traceback (most recent call last):
File "C:*", line 3032, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-39-68fb1c956dd7>", line 1, in <module>
pd.read_sql("EXEC [dbo].[] '20140528'",conn)
File "C:*", line 467, in read_sql
chunksize=chunksize
File "c:***", line 1404, in read_query
columns = [col_desc[0] for col_desc in cursor.description]
TypeError: 'NoneType' object is not iterable
Your sproc needs:
SET NOCOUNT ON;
Without this, SQL Server will return the row count for the call, which comes back without a column name, causing the NoneType error.
pd.read_sql() expects output to return and tries to iterate through that output; that is where the TypeError comes from. Instead, execute the call with a cursor object:
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes', driver = '{SQL Server Native client 11.0}',server = '*****', database = '**')
cur = conn.cursor()
cur.execute("EXEC ******** '20140528'")
You won't receive any output, but since none is expected, your code should run without error.
It's way too late for this post, but I encountered the same error. After searching a lot on Stack Overflow, I found that it's not an issue with pandas/Python but with the query/stored proc.
My workaround was debugging the Python script and stepping into the built-in pandas code:
go to site-packages/pandas/io/sql.py, where you will see this line:
cursor = self.execute(*args)
Execute this line with the debugger and inspect the cursor object; you will find what is being returned by the stored proc. In my case there was an irrelevant message I was triggering from the stored proc.
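The same inspection works without a debugger: execute the statement on a cursor yourself and look at cursor.description, which is exactly what pandas iterates over. A sketch with sqlite3 as a stand-in for the SQL Server connection:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# A statement that returns rows leaves a non-None description
cur.execute("SELECT 1 AS foo")
print(cur.description[0][0])  # foo

# A statement that returns no result set leaves description as None,
# which is what makes pandas fail with "'NoneType' object is not iterable"
cur.execute("CREATE TABLE t (x INTEGER)")
print(cur.description)  # None
```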
import datetime as dt
import pyodbc
import pandas as pd
conn = pyodbc.connect('Trusted_Connection=yes; driver=SQL Server Native Client 11.0; server=*****; database=**')
sqlSend = conn.cursor()
sqlSend.execute("EXEC ******** '20140528'")
conn.commit()
SQL command text that contains multiple SQL statements is called an anonymous code block. An anonymous code block can return multiple results, where each result can be
a row count,
a result set containing zero or more rows of data, or
an error.
The following example fails ...
sql = """\
SELECT 1 AS foo INTO #tmp;
SELECT * FROM #tmp;
"""
df = pd.read_sql_query(sql, cnxn)
# TypeError: 'NoneType' object is not iterable
... because the first SELECT ... INTO returns a row count before the second SELECT returns its result set.
The fix is to start the anonymous code block with SET NOCOUNT ON; which suppresses the row count and only returns the result set:
sql = """\
SET NOCOUNT ON;
SELECT 1 AS foo INTO #tmp;
SELECT * FROM #tmp;
"""
df = pd.read_sql_query(sql, cnxn)
# no error