I'm trying to use DataFrame.to_sql to write records stored in a DataFrame to a SQL database using SQLAlchemy.
I have MySQL installed.
I have a DataFrame called df.
What I've tried:
First I created a schema in MySQL Workbench called "task_db", then:
import pandas as pd
import pymysql
import sqlalchemy as db
engine = db.create_engine("mysql+pymysql://myusername:mypassword@3306/task_db")
df.to_sql("result", engine, schema=None, if_exists="fail", index=True, index_label=None, chunksize=None, dtype=None, method=None)
The errors mention "Can't connect to MySQL server on '3306'" several times, but 3306 is what MySQL Workbench shows for my localhost connection.
You didn't specify the host anywhere, so 3306 is being read as the host name. Put the host before the port in the connection string: mysql+pymysql://myusername:mypassword@localhost:3306/task_db
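To make the user:password@host:port/database layout explicit, here is a minimal sketch that assembles the URL from its parts. The helper name mysql_url is hypothetical, and the localhost and 3306 defaults are assumptions taken from the question. It uses only the standard library and percent-encodes the credentials so that a character such as "@" in a password is not mistaken for the host separator.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host="localhost", port=3306, database=""):
    """Build a SQLAlchemy-style MySQL URL with an explicit host and port.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    # Percent-encode the credentials so characters like "@" or ":" are safe.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = mysql_url("myusername", "mypassword", database="task_db")
print(url)
```

The resulting string could then be passed to create_engine in place of the hand-written one.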
I already have a working ODBC connection using the Cloudera ODBC Driver for Apache Hive, where my DSN is set up and all I need is to call pyodbc.connect(f"DSN={mydsn}", autocommit=True).
Since I plan to use pandas on the query result, I've read that SQLAlchemy is the preferred choice, and I'd like to avoid the warnings raised by other connection types. My Hive DSN uses ZooKeeper, and its "Hosts" field has the form host1:2181,host2:2181,host3:2181. I'm trying to connect to these 3 hosts and tried adapting the connection URL in a way analogous to the one provided here, but I got: invalid literal for int() with base 10: '2181,host2:2181,host3:2181'
import pandas as pd
from sqlalchemy import create_engine
query = """SELECT TOP 10 * from eb.mobile_sa"""
conn_url = f'hive://{UID}@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
engine = create_engine(conn_url)
with engine.connect() as conn:
    df = pd.read_sql(query, conn)
I found the kazoo module, which is said to be a ZooKeeper client implementation in Python, but when I tried the very first lines from its Basic Usage section with just 1 host:
from kazoo.client import KazooClient
zk = KazooClient(hosts="host1:2181", read_only=True)
zk.start()
I got many lines of "Connection dropped: socket connection error".
How can I correctly connect to multiple hosts in Hive?
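The int() error above suggests the URL parser expects a single host:port pair and tries to read everything after the first colon as the port. As a minimal sketch (not a way to do ZooKeeper service discovery), the host list can be split into separate pairs first. The helper name split_hosts is hypothetical. Note also that 2181 is normally the ZooKeeper client port, so whether any of these pairs is a usable HiveServer2 endpoint is an open assumption.

```python
def split_hosts(hosts):
    """Split 'host1:2181,host2:2181' into [(host, port), ...].

    Hypothetical helper: it only parses the string and does not
    contact ZooKeeper or Hive.
    """
    pairs = []
    for item in hosts.split(","):
        # rpartition keeps everything before the last colon as the host.
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


print(split_hosts("host1:2181,host2:2181,host3:2181"))
```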
Scenario:
I am trying to write SQL output directly to a table using DataFrame.to_sql. For that I am using sqlalchemy.create_engine(), and it throws an error when I call create_engine():
import urllib.parse
import sqlalchemy
from sqlalchemy.pool import NullPool
sqlchemyparams = urllib.parse.quote_plus(ConnectionString)
sqlchemy_conn_str = 'mssql+pypyodbc:///?odbc_connect={}'.format(sqlchemyparams)
engine_azure = sqlalchemy.create_engine(sqlchemy_conn_str, echo=True, fast_executemany=True, poolclass=NullPool)
df_top_features.to_sql('Topdata', engine_azure, schema='dbo', index=False, if_exists='replace')
It works fine if I use pyodbc:
sqlchemy_conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(sqlchemyparams)
So is there any way I can use pypyodbc in sqlchemy_conn_str?
SQLAlchemy does not have a pypyodbc driver defined for the mssql dialect, so
mssql+pypyodbc:// …
simply will not work. There may be some way to "fool" your code into using pypyodbc when you specify mssql+pyodbc://, similar to doing
import pypyodbc as pyodbc
in plain Python, but it is not recommended.
In cases where pyodbc cannot be used, the recommended alternative would be mssql+pymssql://.
Here's what I do:
import urllib.parse
import sqlalchemy as sa
Then create variables to hold the server, database, username and password, and pass them to:
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                 "SERVER=" + server + ";"
                                 "DATABASE=" + database + ";"
                                 "UID=" + username + ";"
                                 "PWD=" + password + ";")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
Then upload the data to SQL using:
dfc.to_sql('jobber', con=engine, index=False, if_exists='append')
This is based on https://www.dataquest.io/blog/sql-insert-tutorial/.
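To see what the odbc_connect value actually contains, here is a small sketch that wraps the same steps in a function and decodes the result again. The function name odbc_connect_url and the sample values (myserver, mydb, me, secret) are hypothetical. Only the standard library is used, so nothing here connects to a database.

```python
from urllib.parse import quote_plus, unquote_plus


def odbc_connect_url(driver, server, database, username, password):
    """Return an mssql+pyodbc URL carrying a URL-encoded ODBC string.

    Hypothetical helper mirroring the steps described above.
    """
    raw = (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        f"UID={username};"
        f"PWD={password};"
    )
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(raw)


url = odbc_connect_url("SQL Server", "myserver", "mydb", "me", "secret")
print(url)
# Decoding the query value gives back the plain ODBC connection string.
print(unquote_plus(url.split("=", 1)[1]))
```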
I'm trying to connect to a remote Informix DB using Python 3 and SQLAlchemy as follows, but it fails to connect:
sqlalchemy.create_engine("informix://usr1:pwd1@XXX:23300/DB_NAME;SERVER=dsinfmx").connect()
I get the error below while connecting:
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:informix
Can someone please help with this? The DB server is accessible from DBeaver.
I assume you are using the Informix Python drivers. If not, please install the Informix Python driver, IfxPy. Installation details are at https://github.com/OpenInformix/IfxPy/blob/master/README.md
Try the code below.
from sqlalchemy import create_engine
from sqlalchemy.dialects import registry
# Register the Informix dialects from IfxAlchemy so that "informix://" URLs resolve
registry.register("informix", "IfxAlchemy.IfxPy", "IfxDialect_IfxPy")
registry.register("informix.IfxPy", "IfxAlchemy.IfxPy", "IfxDialect_IfxPy")
registry.register("informix.pyodbc", "IfxAlchemy.pyodbc", "IfxDialect_pyodbc")
ConStr = 'informix://<username>:<password>@<machine name>:<port number>/<database name>;SERVER=<server name>'
engine = create_engine(ConStr)
# Open and close a connection to confirm that the dialect loads
connection = engine.connect()
connection.close()
print("Done2")
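Since the NoSuchModuleError points at a dialect that cannot be located, it may help to check whether the package named in the registry calls is importable before creating the engine. This is a minimal sketch using only the standard library. The function name dialect_package_available is hypothetical, and the package name IfxAlchemy is taken from the registry.register calls above.

```python
import importlib.util


def dialect_package_available(package="IfxAlchemy"):
    """Return True if the top-level package can be found on the import path.

    Hypothetical helper: it only looks the package up and does not import it.
    """
    return importlib.util.find_spec(package) is not None


if not dialect_package_available():
    print("IfxAlchemy is not importable; 'informix://' URLs will not resolve.")
```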
I have been given several tables in SQL Server and am trying to figure out the best way to join them.
What I've done is:
1) open a connection in R to the database
2) pull all the column names from the INFORMATION_SCHEMA.COLUMNS table
3) build loops in R to try every combination of columns and see what the row count is of the inner join of the 2 columns
I'm wondering if there's a better way to do this or if there's a package or utility that helps with this type of problem.
You could do your joins in Python using pandas. pandas has a powerful I/O layer, so you could import from SQL Server into a pandas DataFrame, perform your joins in Python, and write the result back to SQL Server.
Below is a script I use to import from SQL Server and export to a MySQL table. I use the Python package SQLAlchemy for the database connections. You could follow this example and read up on joins in pandas.
import pyodbc
import pandas as pd
from sqlalchemy import create_engine
# MySQL info
username = 'user'
password = 'pw'
sqlDB = 'mydb'
# Create MSSQL PSS Connector
server = 'server'
database = 'mydb'
connMSSQL = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    f'SERVER={server};PORT=1433;DATABASE={database};Trusted_Connection=yes;')
# Read table into pandas DataFrame
tsql = '''
    SELECT [Index],
           Tag
    FROM [dbo].[Tags]
'''
df = pd.read_sql(tsql, connMSSQL, index_col='Index')
# Write df to MySQL db
engine = create_engine(
    f'mysql+mysqldb://{username}:{password}@localhost/{sqlDB}', pool_recycle=3600)
with engine.connect() as connMySQL:
    df.to_sql('pss_alarms', connMySQL, if_exists='replace')
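For the original problem of guessing which columns join to each other, one alternative to running every inner join is to compare the distinct values of each column pair in memory. The sketch below is only an illustration under assumptions: tables are plain dicts of column name to list of values, the function name join_candidates and the sample data are hypothetical, and it counts shared distinct values, which is not the same as the row count of an inner join.

```python
from itertools import product


def join_candidates(left, right, min_overlap=1):
    """Rank column pairs by how many distinct values they share.

    Hypothetical helper: `left` and `right` map column names to value lists.
    """
    results = []
    for (lcol, lvals), (rcol, rvals) in product(left.items(), right.items()):
        shared = set(lvals) & set(rvals)
        if len(shared) >= min_overlap:
            results.append((lcol, rcol, len(shared)))
    # Highest overlap first: these are the most plausible join keys.
    return sorted(results, key=lambda r: r[2], reverse=True)


# Made-up sample data for illustration only.
tags = {"Index": [1, 2, 3], "Tag": ["a", "b", "c"]}
alarms = {"TagIndex": [2, 3, 4], "Label": ["x", "y", "z"]}
print(join_candidates(tags, alarms))
```

With DataFrames loaded via pd.read_sql, the same idea applies by passing each column's values instead of the lists used here.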
Let's say I have the following connection information for a MSSQL server:
'Driver={SQL Server};'
'Server=VCAB18RPACRGZ12\GNRSRZ11,1414;'
'Database=sampleDB;'
'uid=sampleID;'
'pwd=samplePW'
I want to write a pandas DataFrame to the MSSQL server as a table. I have the following code:
from sqlalchemy import create_engine
connection = create_engine('mssql+pyodbc://sampleID:samplePW@myhost:VCAB18RPACRGZ12\GNRSRZ11,1414/sampleDB?driver=SQL+Server+Native+Client+10.0')
The connection code above raises an error. I'm not sure exactly where my connection information is supposed to go in the create_engine call.
This is my error:
ValueError: invalid literal for int() with base 10: 'VCAB18RPACRGZ12\GNRSRZ11,1414'
Your server address is not correct.
If 1414 is the port number, you should use ":" instead of ",".
SQLAlchemy uses pyodbc as the default DBAPI for SQL Server. pymssql is also available.
Below are sample connection strings:
# pyodbc, using a DSN
engine = create_engine('mssql+pyodbc://scott:tiger@mydsn')
# pymssql
engine = create_engine('mssql+pymssql://scott:tiger@hostname:port/dbname')
# pyodbc, DSN-less connection
from sqlalchemy import create_engine
# assumes the driver name is "SQL Server Native Client 10.0"
# engine = create_engine('mssql+pyodbc://username:password@hostname:port/databasename?driver=SQL+Server+Native+Client+10.0')
engine = create_engine(r'mssql+pyodbc://sampleID:samplePW@VCAB18RPACRGZ12\GNRSRZ11:1414/sampleDB?driver=SQL+Server+Native+Client+10.0')
print(engine)
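The colon-versus-comma point can be illustrated without a database. The sketch below uses the standard library's URL parser as a stand-in. SQLAlchemy has its own parser, so the exact message differs, but the idea is the same: whatever follows the colon after the host must be an integer port. The host names and the function name describe_port are hypothetical.

```python
from urllib.parse import urlsplit


def describe_port(url):
    """Return the parsed port, or the parser's error message as a string."""
    try:
        return urlsplit(url).port
    except ValueError as exc:
        return f"ValueError: {exc}"


# Hypothetical URLs: one with a proper ":port", one with extra text after ":".
colon_form = "mssql+pyodbc://user:pw@dbhost:1414/db"
comma_form = "mssql+pyodbc://user:pw@myhost:dbhost,1414/db"
print(describe_port(colon_form))
print(describe_port(comma_form))
```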