I have just started learning SQL and I'm having some difficulties to import my sql file in python.
The .sql file is in my desktop, as well is my .py file.
That's what I tried so far:
import codecs
from codecs import open
import pandas as pd
sqlfile = "countries.sql"
sql = open(sqlfile, mode='r', encoding='utf-8-sig').read()
pd.read_sql_query("SELECT name FROM countries")
But I got the following message error:
TypeError: read_sql_query() missing 1 required positional argument: 'con'
I think I have to create some kind of connection, but I can't find a way to do that. Converting my data to an ordinary pandas DataFrame would help me a lot.
Thank you
This is the code snippet taken from https://www.dataquest.io/blog/python-pandas-databases/ should help.
import pandas as pd
import sqlite3
conn = sqlite3.connect("flights.db")
df = pd.read_sql_query("select * from airlines limit 5;", conn)
Do not read database as an ordinary file. It has specific binary format and special client should be used.
With it you can create connection which will be able to handle SQL queries. And can be passed to read_sql_query.
Refer to documentation often https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html
You need a database connection. I don't know what SQL flavor are you using, but suppose you want to run your query in SQL server
import pyodbc
con = pyodbc.connect(driver='{SQL Server}', server='yourserverurl', database='yourdb', trusted_connection=yes)
then pass the connection instance to pandas
pd.read_sql_query("SELECT name FROM countries", con)
more about pyodbc here
And if you want to query an SQLite database
import sqlite3
con = sqlite3.connect('pathto/example.db')
More about sqlite here
Related
I am using python to connect to DB2 Database
I have installed ibm_db and ibm_dbi packages and imported in to the code
import ibm_db
import ibm_db_dbi
1)created a connection string as conn_str
conn_str='database=pydev;hostname=host.test.com;port=portno;protocol=tcpip;uid=db2inst1;pwd=secret'
ibm_db_conn = ibm_db.connect(conn_str,'','')
conn = ibm_db_dbi.Connection(ibm_db_conn)
2)Now i need to read a DB2 table which is in under schemas called as "BRUD" into python pandas
could any one please help me in getting the connection for this
I'm not sure about sql syntax, but resolution looks like:
df = pd.read_sql('SELECT * FROM BRUD.table_name', conn)
I'm trying to upload a dataframe in SQL server using pandas (to_sql) function, I get the below error
[SQL Server Native Client 11.0]Invalid character value for cast
specification (0) (SQLExecDirectW)')
I checked for variables' names and types and they are exactly the same in the SQL database and pandas dataframe.
How can I fix this?
Thanks
df.to_sql(raw_table, connDB, if_exists='append', index=False )
plz try this , this code use to juypter note book and SQL workbench
import mysql.connector
from mysql.connector import Error
from sqlalchemy import create_engine
import pandas as pd
mydata = pd.read_csv("E:\\Hourly_Format\\upload.csv")
engine = create_engine("mysql://root:admin#localhost/pythondb", pool_size=10, max_overflow=20)
mydata.to_sql(name='emp',con=engine,if_exists='append', index=False)
jupyter :-
workbench :-
I am using a Databricks notebook and trying to export my dataframe as CSV to my local machine after querying it. However, it does not save my CSV to my local machine. Why?
Connect to Database
#SQL Connector
import pandas as pd
import psycopg2
import numpy as np
from pyspark.sql import *
#Connection
cnx = psycopg2.connect(dbname= 'test', host='test', port= '1234', user= 'test', password= 'test')
cursor = cnx.cursor()
SQL Query
query = """
SELECT * from products;
"""
# Execute the query
try:
cursor.execute(query)
except OperationalError as msg:
print ("Command skipped: ")
#Fetch all rows from the result
rows = cursor.fetchall()
# Convert into a Pandas Dataframe
df = pd.DataFrame( [[ij for ij in i] for i in rows] )
Exporting Data as CSV to Local Machine
df.to_csv('test.csv')
It does NOT give any error but when I go to my Mac machine's search icon to find "test.csv", it is not existent. I presume that the operation did not work, thus the file was never saved from the Databricks cloud server to my local machine...Does anybody know how to fix it?
Select from SQL Server:
import pypyodbc
cnxn = pypyodbc.connect("Driver={SQL Server Native Client 11.0};"
"Server=Server_Name;"
"Database=TestDB;"
"Trusted_Connection=yes;")
#cursor = cnxn.cursor()
#cursor.execute("select * from Actions")
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM Actions')
for row in cursor:
print('row = %r' % (row,))
From SQL Server to Excel:
import pyodbc
import pandas as pd
# cnxn = pyodbc.connect("Driver={SQL Server};SERVER=xxx;Database=xxx;UID=xxx;PWD=xxx")
cnxn = pyodbc.connect("Driver={SQL Server};SERVER=EXCEL-PC\SQLEXPRESS;Database=NORTHWND;")
data = pd.read_sql('SELECT * FROM Orders',cnxn)
data.to_excel('C:\\your_path_here\\foo.xlsx')
Since you are using Databricks, you are most probably working on a remote machine. Like it was already mentioned, saving the way you do wont work (file will be save to the machine your notebooks master node is on). Try running:
import os
os.listdir(os.getcwd())
This will list all the files that are in directory from where notebook is running (at least it is how jupyter notebooks work). You should see saved file here.
However, I would think that Databricks provides a utility functions to their clients for easy data download from the cloud. Also, try using spark to connect to db - might be a little more convenient.
I think these two links should be useful for you:
Similar question on databricks forums
Databricks documentation
Because you're running this in a Databricks notebook, when you're using Pandas to save your file to test.csv, this is being saved to the Databricks driver node's file directory. A way to test this out is the following code snippet:
# Within Databricks, there are sample files ready to use within
# the /databricks-datasets folder
df = spark.read.csv("/databricks-datasets/samples/population-vs-price/data_geo.csv", inferSchema=True, header=True)
# Converting the Spark DataFrame to a Pandas DataFrame
import pandas as pd
pdDF = df.toPandas()
# Save the Pandas DataFrame to disk
pdDF.to_csv('test.csv')
The location of your test.csv is within the /databricks/driver/ folder of your Databricks' cluster driver node. To validate this:
# Run the following shell command to see the results
%sh cat test.csv
# The output directory is shown here
%sh pwd
# Output
# /databricks/driver
To save the file to your local machine (i.e. your Mac), you can view the Spark DataFrame using the display command within your Databricks notebook. From here, you can click on the "Download to CSV" button which is highlighted in red in the below image.
I have a need to access data that resides in a remote db2 database via a sql statement and convert it to a Pandas DataFrame. All from my Mac. I looked at using Pandas' read_sql with the ibm_db_sa adapter, but it looks like the prerequisite client side software is not supported on the Mac
I came up with an jdbc option, which I'm posting, but I'm curious to know if anyone else has any ideas
Here's an option using jdbc, the pip installable JayDeBeApi and the appropriate db jar file
Note: this could be used for other jdbc/jaydebeapi compliant databases like Oracle, MS Sql Server, etc
import jaydebeapi
import pandas as pd
def read_jdbc(sql, jclassname, driver_args, jars=None, libs=None):
'''
Reads jdbc compliant data sources and returns a Pandas DataFrame
uses jaydebeapi.connect and doc strings :-)
https://pypi.python.org/pypi/JayDeBeApi/
:param sql: select statement
:param jclassname: Full qualified Java class name of the JDBC driver.
e.g. org.postgresql.Driver or com.ibm.db2.jcc.DB2Driver
:param driver_args: Argument or sequence of arguments to be passed to the
Java DriverManager.getConnection method. Usually the
database URL. See
http://docs.oracle.com/javase/6/docs/api/java/sql/DriverManager.html
for more details
:param jars: Jar filename or sequence of filenames for the JDBC driver
:param libs: Dll/so filenames or sequence of dlls/sos used as
shared library by the JDBC driver
:return: Pandas DataFrame
'''
try:
conn = jaydebeapi.connect(jclassname, driver_args, jars, libs)
except jaydebeapi.DatabaseError as de:
raise
try:
curs = conn.cursor()
curs.execute(sql)
columns = [desc[0] for desc in curs.description] #getting column headers
#convert the list of tuples from fetchall() to a df
return pd.DataFrame(curs.fetchall(), columns=columns)
except jaydebeapi.DatabaseError as de:
raise
finally:
curs.close()
conn.close()
Some examples
#DB2
conn = 'jdbc:db2://<host>:5032/<db>:currentSchema=<schema>;'
class_name = 'com.ibm.db2.jcc.DB2Driver'
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'])
#PostgreSQL
conn = 'jdbc:postgresql://<host>:5432/<db>?currentSchema=<schema>'
class_name = 'org.postgresql.Driver'
jar = '/path/to/jar/postgresql-9.4.1212.jar'
sql = 'SELECT name FROM table_name LIMIT 5'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'], jars=jar)
I got a simpler answer from https://stackoverflow.com/a/33805547/914967 where it uses pip module ibm_db only:
import ibm_db
import ibm_db_dbi
import pandas as pd
conn_handle = ibm_db.connect('DATABASE={};HOSTNAME={};PORT={};PROTOCOL=TCPIP;UID={};PWD={};'.format(db_name, hostname, port_number, user, password), '', '')
conn = ibm_db_dbi.Connection(conn_handle)
df = pd.read_sql(sql, conn)
Bob, you should check out ibmdbpy (https://pypi.python.org/pypi/ibmdbpy). It is a pandas data frame style API to DB2 and dashDB tables. It supports both underlying DB2 client drivers, ODBC and JDBC.
So as prerequisites you need to set up the DB2 client driver package for Mac that you can find here: http://www-01.ibm.com/support/docview.wss?uid=swg21385217
After #IanBjorhovde commented on my question I investigated another solution that allows me to use sqlalchemy and pandas' read_sql()
Here are the steps I took. Note: I got this working on OSX Yosemite (10.10.4) for python 3.4 and 3.5
1) Download IBM DB2 Express-C (no-cost community edition of DB2)
https://www-01.ibm.com/marketing/iwm/iwm/web/pick.do?source=swg-db2expressc&S_TACT=000000VR&lang=en_US&S_OFF_CD=10000761
2) After navigating to the unzipped dir
sudo ./db2_install
I accepted the default location of /opt/IBM/db2/V10.1
3) Install ibm_db and ibm_db_sa
pip install ibm_db
I built ibm_db_sa from source because the pip installed failed
python setup.py install
That should do it. You might get an error like 'Reason: image not found' when you try to connect to your db so read this for the fix. Note: might require a reboot
Example usage:
import ibm_db_sa
import pandas as pd
from sqlalchemy import select, create_engine
eng = create_engine('ibm_db_sa://<user_name>:<pwd>#<host>:5032/<db name>')
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = pd.read_sql(sql, eng)
I used to read the data from CSV file, while I just imported all CSV data in SQL database, but I have difficulty in extracting data using Python from SQL.
My original code of read CSV is like this:
import pandas as pd
stock_data = pd.read_csv(filepath_or_buffer='stock_data_w.csv', parse_dates=[u'date'], encoding='gbk')
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
Now I want to read data from SQL, here is my code, but it doesn't work and I am not sure how to sort it out:
import pandas as pd
import MySQLdb
db = MySQLdb.connect(host='localhost', user='root', passwd='232323', db='test', port=3306)
cur = db.cursor()
cur.execute("SELECT * FROM stock_data_w")
stock_data = pd.DataFrame(data=cur.fetchall(), columns=[i[0] for i in cur.description])
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
the error is: "raise PandasError('DataFrame constructor not properly called!') pandas.core.common.PandasError: DataFrame constructor not properly called!"
Use below way to convert your cursor object to crate data frame.
stock_data = pd.DataFrame(data=cursor.fetchall(), index=None,
columns=cursor.keys())
print stock_data
In mysqldb, columns=[i[0] for i in cursor.description]
or
Make your connection with alchemy and use,
stock_data = pd.read_sql("SELECT * from stock_data_w",
con= cnx,parse_dates=['date'])
I'm not sure whether mysql.connector is supported in pandas read_sql(). You can give a try and let us know :)