I'm trying to connect MySQL with python in order to automate some reports. By now, I'm just testing the connection. Seems it's working but here comes the problem: the output from my Python code is different from the one that I get in MySQL.
Here I attach the query used and the output that I can find in MySQL:
The testing query for the Python connection:
SELECT accountID
FROM Account
WHERE accountID in ('340','339','343');
The output from MySQL (using Dbiever). For this test, the column chosen contains integers:
accountID
1 339
2 340
3 343
Here I attach the actual output from my Python code:
today:
20200811
Will return true if the connection works:
True
Empty DataFrame
Columns: [accountID]
Index: []
In order to help you understand the problem, please find attached my python code:
import pandas as pd
import json
import pymysql
import paramiko
from datetime import date, time
tiempo_inicial = time()
today = date.today()
today= today.strftime("%Y%m%d")
print('today:')
print(today)
#from paramiko import SSHClient
from sshtunnel import SSHTunnelForwarder
**(part that contains all the connection information, due to data protection this part can't be shared)**
print('will return true if connection works:')
print(conn.open)
query = '''SELECT accountId
FROM Account
WHERE accountID in ('340','339','343');'''
data = pd.read_sql_query(query, conn)
print(data)
conn.close()
Under my point of view doesn't have a sense this output as the connection is working and the query it's being tested previously in MySQL with a positive output. I tried with other columns that contain names or dates and the result doesn't change.
Any idea why I'm getting this "Empty DataFrame" output?
Thanks
Related
I have just started learning SQL and I'm having some difficulties to import my sql file in python.
The .sql file is in my desktop, as well is my .py file.
That's what I tried so far:
import codecs
from codecs import open
import pandas as pd
sqlfile = "countries.sql"
sql = open(sqlfile, mode='r', encoding='utf-8-sig').read()
pd.read_sql_query("SELECT name FROM countries")
But I got the following message error:
TypeError: read_sql_query() missing 1 required positional argument: 'con'
I think I have to create some kind of connection, but I can't find a way to do that. Converting my data to an ordinary pandas DataFrame would help me a lot.
Thank you
This is the code snippet taken from https://www.dataquest.io/blog/python-pandas-databases/ should help.
import pandas as pd
import sqlite3
conn = sqlite3.connect("flights.db")
df = pd.read_sql_query("select * from airlines limit 5;", conn)
Do not read database as an ordinary file. It has specific binary format and special client should be used.
With it you can create connection which will be able to handle SQL queries. And can be passed to read_sql_query.
Refer to documentation often https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html
You need a database connection. I don't know what SQL flavor are you using, but suppose you want to run your query in SQL server
import pyodbc
con = pyodbc.connect(driver='{SQL Server}', server='yourserverurl', database='yourdb', trusted_connection=yes)
then pass the connection instance to pandas
pd.read_sql_query("SELECT name FROM countries", con)
more about pyodbc here
And if you want to query an SQLite database
import sqlite3
con = sqlite3.connect('pathto/example.db')
More about sqlite here
I have this code:
import teradata
import dask.dataframe as dd
login = login
pwd = password
udaExec = teradata.UdaExec (appName="CAF", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", DSN="Teradata",
USEREGIONALSETTINGS='N', username=login,
password=pwd, authentication="LDAP");
And the connection is working.
I want to get a dask dataframe. I have tried this:
sqlStmt = "SOME SQL STATEMENT"
df = dd.read_sql_table(sqlStmt, session, index_col='id')
And I'm getting this error message:
AttributeError: 'UdaExecConnection' object has no attribute '_instantiate_plugins'
Does anyone have a suggestion?
Thanks in advance.
read_sql_table expects a SQLalchemy connection string, not a "session" as you are passing. I have not heard of teradata being used via sqlalchemy, but apparently there is at least one connector you could install, and possibly other solutions using the generic ODBC driver.
However, you may wish to use a more direct approach using delayed, something like
from dask import delayed
# make a set of statements for each partition
statements = [sqlStmt + " where id > {} and id <= {}".format(bounds)
for bounds in boundslist] # I don't know syntax for tera
def get_part(statement):
# however you make a concrete dataframe from a SQL statement
udaExec = ..
session = ..
df = ..
return dataframe
# ideally you should provide the meta and divisions info here
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements],
meta= , divisions=)
We will be interested to hear of your success.
I have a dictionary with 3 keys which correspond to field names in a SQL Server table. The values of these keys come from an excel file and I store this dictionary in a dataframe which I now need to insert into a SQL table. This can all be seen in the code below:
import pandas as pd
import pymssql
df=[]
fp = "file path"
data = pd.read_excel(fp,sheetname ="CRM View" )
row_date = data.loc[3, ]
row_sita = "ABZPD"
row_event = data.iloc[12, :]
df = pd.DataFrame({'date': row_date,
'sita': row_sita,
'event': row_event
}, index=None)
df = df[4:]
df = df.fillna("")
print(df)
My question is how do I insert this dictionary into a SQL table now?
Also, as a side note, this code is part of a loop which needs to go through several excel files one by one, insert the data into dictionary then into SQL then delete the data in the dictionary and start again with the next excel file.
You could try something like this:
import MySQLdb
# connect
conn = MySQLdb.connect("127.0.0.1","username","passwore","table")
x = conn.cursor()
# write
x.execute('INSERT into table (row_date, sita, event) values ("%d", "%d", "%d")' % (row_date, sita, event))
# close
conn.commit()
conn.close()
You might have to change it a little based on your SQL restrictions, but should give you a good start anyway.
For the pandas dataframe, you can use the pandas built-in method to_sql to store in db. Following is the way to use it.
import sqlalchemy as sa
params = urllib.quote_plus("DRIVER={};SERVER={};DATABASE={};Trusted_Connection=True;".format("{SQL Server}",
"<db_server_url>",
"<db_name>"))
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = sa.create_engine(conn_str)
df.to_sql(<table_name>, engine,schema=<schema_name>, if_exists="append", index=False)
For this method you you will need to install sqlalchemy package.
pip install sqlalchemy
You will also need to setup the MSSql DSN on the machine.
I'm connecting to an influx db using python. Using built in dataframe tools I am successfully accessing data and am able to do everything I'd like, accept I can't access the timestamp values. For example:
import sys
from influxdb import DataFrameClient
reload(sys)
sys.setdefaultencoding('utf-8')
user = 'reader'
password = 'oddstringtoconfusebadguys'
dbname = 'autoweights'
host = '55.777.244.112'
protocol = 'line'
port = 8086
client = DataFrameClient(host,port=8086,username=user,password=password,
database=dbname,verify_ssl=False,ssl=True)
results = client.query("select * from measurementname")
df = results['measurementname']
for index, row in df.iterrows():
print row
results look like this
Name: 2017-11-14 22:11:23.534395882+00:00, dtype: object
host C4:27:EB:D7:D9:70
value 327
I can easily access row['host'] and row['value']. The date/time stamp is obviously important but try as I might I can't find an approach to get the values.
You can get the timestamp using the index not the row parameter
for index, row in df.iterrows():
print(index)
print(row)
You can also use Pinform library, an ORM for InfluxDB to easily get timestamp, fields and tags.
I have been using the following lines of code for the longest time, without any hitch, but today it seems to have produced the following error and I cannot figure out why. The strange thing is that I have other scripts that use the same code and they all seem to work...
import pandas as pd
import psycopg2
link_conn_string = "host='<host>' dbname='<db>' user='<user>' password='<pass>'"
conn = psycopg2.connect(link_conn_string)
df = pd.read_sql("SELECT * FROM link._link_bank_report_lms_loan_application", link_conn_string)
Error Message:
"Could not parse rfc1738 URL from string '%s'" % name)
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'host='<host>' dbname='<db>' user='<user>' password='<pass>''
change link_conn_string to be like here:
postgresql://[user[:password]#][netloc][:port][/dbname][?param1=value1&...]
Eg:
>>> import psycopg2
>>> cs = 'postgresql://vao#localhost:5432/t'
>>> c = psycopg2.connect(cs)
>>> import pandas as pd
>>> df = pd.read_sql("SELECT now()",c)
>>> print df;
now
0 2017-02-27 21:58:27.520372+00:00
Your connection string is wrong.
You should remove the line where you assign a string to "link_conn_string" variable and replace the next line with something like this (remember to replace localhost, postgres and secret with the name of the machine where postgresql is running, the user and password required to make that connection):
conn = psycopg2.connect(dbname="localhost", user="postgres", password="secret")
Also, you can check if the database is working with "psql" command, from the terminal, type (again, remember to change user and database):
psql -U postgresql database