I got stuck in pandas while extracting data from Aginity Workbench - Python

I guess there is a connectivity issue, because the same code was working fine on my old laptop but is not working on the new one.
Below are my lines of code.
import numpy as np
import pandas as pd
import pandasql
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://prd*****:Au*****$#10.31.13.6:****/redshiftdb')
query1 = 'Select * from pres_sandbox.abias_atb limit 10'
e = pd.read_sql_query(query1,engine)
e.to_csv('C:\\Users\\jawed.sheikh\\Desktop\\ATTRIBUTES UPLOAD\\Attribute_22072022qq.csv', index = False) #Date change
print("File has been created in 'ATTRIBUTES UPLOAD' folder")
I also tried the lines below, but was not able to extract the data.
import mysql.connector as sql
db_connection = sql.connect(host='10.31.13.6', port='****', database='redshiftdb',
                            user='prd******', password='A*****$', connect_timeout=1000)
db_cursor = db_connection.cursor()
db_cursor.execute('Select * from pres_sandbox.abias_atb limit 10')
table_rows = db_cursor.fetchall()
df = pd.DataFrame(table_rows)
print(df)
From the first query I am getting the error below:
OperationalError: (psycopg2.OperationalError) could not translate host name "********#10.31.13.6" to address: Unknown server error (password removed)
(Background on this error at: https://sqlalche.me/e/14/e3q8)
From the second query I am getting the error below:
InterfaceError: 2013: Lost connection to MySQL server during query
Note: all the passwords and hosts are correct.
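For the first error, note that the host portion of the SQLAlchemy URL contains a "#" ("********#10.31.13.6"), which is why psycopg2 cannot resolve it: the URL format is user:password@host:port/dbname, and special characters inside the password (such as $ or #) must be URL-encoded or they get mistaken for URL syntax. The second attempt likely fails because mysql.connector speaks the MySQL protocol, while a Redshift endpoint expects the PostgreSQL protocol. A minimal sketch of how the connection string would typically be built (credentials, host and port are the masked placeholders from the question, not real values):
import urllib.parse
import sqlalchemy
import pandas as pd

# URL-encode the password so characters like $ or # survive inside the URL
password = urllib.parse.quote_plus('Au*****$')
# note the @ separating the credentials from the host
engine = sqlalchemy.create_engine(
    f'postgresql://prd*****:{password}@10.31.13.6:****/redshiftdb')

query1 = 'Select * from pres_sandbox.abias_atb limit 10'
e = pd.read_sql_query(query1, engine)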

Related

pyodbc in Databricks: A retriable error occurred while attempting to download a result file from the cloud store but the retry limit had been exceeded

I have a SQL endpoint in Azure Databricks that I need to query. I have installed the Simba Spark ODBC connector and configured it correctly, because when I call the endpoint with the Python Databricks library it returns the full dataframe (about 900K rows).
from databricks import sql
import pandas as pd

def databricks_to_dataframe():
    with sql.connect(server_hostname="<server host name>",
                     http_path="<http path>",
                     access_token="<access token to databricks>") as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT * FROM my_table")
            result = cursor.fetchall()
            df = pd.DataFrame(result)
    return df
When I try to do the same with pyodbc, I get the following error:
Error: ('HY000', "[HY000] [Simba][Hardy] (35) Error from server: error code: '0' error message: '[Simba][Hardy] (134) File 31dc6dfe-3a38-4e4a-8f03-66f6efdb4581: A retriable error occurred while attempting to download a result file from the cloud store but the retry limit had been exceeded. Error Detail: File 31dc6dfe-3a38-4e4a-8f03-66f6efdb4581: The result file URL had expired on 1658755584065 (Unix timestamp)'. (35) (SQLFetch)")
Here is the code for reference:
import pyodbc
conn = pyodbc.connect("DSN=My_DSN", autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table")
data = cursor.fetchall()
When I limit the query to, say, 20k rows, it works fine.
And I have the same issue with R (RODBC) but this time no error message at all, just an empty dataframe! Below is the code in R:
library(RODBC)
conn <- odbcConnect("My_DSN")
Data <- sqlQuery(conn, "SELECT * FROM my_table")
Here too, when I limit the query to a few thousand rows, it works fine. Any ideas?
Thanks!
I found the way in Python; leaving this here in case someone needs it.
I just needed to add EnableQueryResultDownload=0 to the connection string, like so:
import pyodbc
import pandas as pd
conn = pyodbc.connect("DSN=My_DSN;EnableQueryResultDownload=0;",autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table")
df = pd.DataFrame.from_records(cursor.fetchall(), columns=[col[0] for col in cursor.description])
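As a usage note: with that same pyodbc connection, pandas can usually read the query straight into a dataframe in one call (newer pandas versions warn about raw DBAPI connections but still work):
import pandas as pd
df = pd.read_sql("SELECT * FROM my_table", conn)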
in R:
library(RODBC)
conn <- odbcConnect("My_DSN;EnableQueryResultDownload=0;")
Data <- sqlQuery(conn, "SELECT * FROM my_table")

"Connection is busy" error message using petl with pyodbc connection

With python, I am trying to read a table from SQL Server, populate one field and load the resulting table back to SQL Server as a new table. I'm using "petl".
I can read the table with no problems, I populate the field successfully, but then I get an error when I try to load the resulting table back to SQL Server.
This is the python code I'm using:
import petl as etl
import pyodbc
def populate_field(z, row):
    ...
con = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}', server=r'my_server', database='my_db')
query = r'SELECT TOP 10 * FROM my_table'
tab = etl.fromdb(con, query)
tab = etl.convert(tab, 'my_field', populate_field, pass_row=True)
etl.todb(tab, con, 'my_new_table')
And this is the error message I get on "etl.todb(tab, con, 'my_new_table')"
Error: ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')
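The error suggests the single pyodbc connection is still streaming rows for the SELECT (petl reads from the database lazily) when etl.todb tries to run its inserts on the same handle. A minimal sketch of one common workaround, assuming the same server and database as above and the populate_field function from the question, is to use a separate connection for the write:
import petl as etl
import pyodbc

# one connection to read from, another to write with
read_con = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}',
                          server=r'my_server', database='my_db')
write_con = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}',
                           server=r'my_server', database='my_db')

tab = etl.fromdb(read_con, r'SELECT TOP 10 * FROM my_table')
tab = etl.convert(tab, 'my_field', populate_field, pass_row=True)
# etl.todb loads into an existing table; create my_new_table beforehand if it does not exist
etl.todb(tab, write_con, 'my_new_table')  # write on its own connection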

SQL Alchemy, pymssql, Pandas 0.24.2 to_sql Trying to create table when table already exists

I am trying to use Pandas and SQLAlchemy. This is basically what I am trying to do: if I drop the table, the code creates it, but I want it to append rather than having to do table renaming. I have tried updating and changing versions of all the libraries and I am at a loss. If I start with no table it creates it; then I run the code again and it crashes. The error message just says the table already exists, which I know; that is why I am telling it to append. Also, before the load I am reading data using pymssql and it reads fine into a dataframe.
Python Command
def writeDFtoSSDatabase(tgtDefiniton, df):
    try:
        if int(tgtDefiniton.loadBatchSize) > 0:
            batchSize = int(tgtDefiniton.loadBatchSize)
        else:
            batchSize = 1000
        #Domain error using SQL Alchemy
        logging.debug("Writting Dataframe to SQL Server database")
        #hardcoded type beccause that is only type for now
        with createDBConnection(tgtDefiniton.tgtDatabaseServer
                                , tgtDefiniton.tgtDatabaseDatabase
                                , tgtDefiniton.tgtDatabaseUser
                                , tgtDefiniton.tgtDatabasePassword
                                , tgtDefiniton.tgtDataType).connect().execution_options(
                schema_translate_map={None: tgtDefiniton.tgtDatabaseSchema}) as conn:
            logging.debug("Writting DF to Database table {0}".format(tgtDefiniton.tgtDatabaseTable))
            logging.debug("ifTableExists: {0}.".format(tgtDefiniton.ifTableExists))
            if tgtDefiniton.ifTableExists == "append":
                logging.debug('Appending Data')
                df.to_sql(tgtDefiniton.tgtDatabaseTable, con=conn, if_exists='append', chunksize=batchSize, index=False)
            elif tgtDefiniton.ifTableExists == "replace":
                logging.debug('Replacing Table and Data')
                df.to_sql(tgtDefiniton.tgtDatabaseTable, con=conn, if_exists='replace', chunksize=batchSize, index=False)
            else:
                df.to_sql(tgtDefiniton.tgtDatabaseTable, con=conn, if_exists='fail', index=False)
        logging.debug("Data wrote to database")
    except Exception as e:
        logging.error(e)
        raise
Error
(Background on this error at: http://sqlalche.me/e/e3q8)
2021-08-30 13:31:42 ERROR (pymssql.OperationalError) (2714, b"There is already an object
named 'test' in the database.DB-Lib error message 20018, severity 16:\nGeneral SQL Server
error: Check messages from the SQL Server\n")
EDIT:
Log Entry
2021-08-30 13:31:36 DEBUG Writting Dataframe to SQL Server database
2021-08-30 13:31:36 DEBUG create_engine(mssql+pymssql://REST OF CONNECTION INFO
2021-08-30 13:31:36 DEBUG DB Engine Created
2021-08-30 13:31:36 DEBUG Writting DF to Database table test
2021-08-30 13:31:36 DEBUG ifTableExists: append.
2021-08-30 13:31:36 DEBUG Appending Data
2021-08-30 13:31:42 ERROR (pymssql.OperationalError) (2714, b"There is already an object named 'test' in the database.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")
[SQL:
I had the same problem and I found two ways to solve it, although I lack the insight as to why this solves it:
Either pass the database name in the URL when creating a connection,
or pass the database name as schema in pd.to_sql.
Doing both does not hurt.
```
#create connection to MySQL DB via sqlalchemy & pymysql
user = credentials['user']
password = credentials['password']
port = credentials['port']
host = credentials['hostname']
dialect = 'mysql'
driver = 'pymysql'
db_name = 'test_db'

# setup SQLAlchemy
from sqlalchemy import create_engine
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/'
engine = create_engine(cnx)

# create database
with engine.begin() as con:
    con.execute(f"CREATE DATABASE {db_name}")

############################################################
# either pass the db_name vvvv - HERE - vvvv after creating a database
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/{db_name}'
############################################################
engine = create_engine(cnx)

table = 'test_table'
col = 'test_col'
with engine.begin() as con:
    # this would work here instead of creating a new engine with a new link
    # con.execute(f"USE {db_name}")
    con.execute(f"CREATE TABLE {table} ({col} CHAR(1));")

# insert into database
import pandas as pd
df = pd.DataFrame({col: ['a', 'b', 'c']})
with engine.begin() as con:
    # this has no effect here
    # con.execute(f"USE {db_name}")
    df.to_sql(
        name=table,
        if_exists='append',
        con=con,
        ############################################################
        # or pass it as a schema vvvv - HERE - vvvv
        # schema=db_name,
        ############################################################
        index=False
    )
```
Tested with python version 3.8.13 and sqlalchemy 1.4.32.
Same problem might have appeared here and here.
If I understood you correctly, you are trying to upload a pandas dataframe into a SQL table that already exists. Then you just need to create a connection with SQLAlchemy and write your data to the table:
import pyodbc
import sqlalchemy
import urllib
from sqlalchemy.pool import NullPool
serverName = 'Server_Name'
dataBase = 'Database_Name'
conn_str = urllib.parse.quote_plus(
    r'DRIVER={SQL Server};SERVER=' + serverName + r';DATABASE=' + dataBase + r';TRUSTED_CONNECTION=yes')
conn = 'mssql+pyodbc:///?odbc_connect={}'.format(conn_str) #IF you are using MS Sql Server Studio
engine = sqlalchemy.create_engine(conn, poolclass=NullPool)
connection = engine.connect()
# sql_table is the DataFrame to be written
sql_table.to_sql('Your_Table_Name', engine, schema='Your_Schema_Name', if_exists='append', index=False,
                 chunksize=200)
connection.close()

Getting error on python while transferring data from SQL server to snowflake

I am getting the error below:
query = command % processed_params
TypeError: not all arguments converted during string formatting
I am trying to pull data from SQL Server and then insert it into Snowflake.
My code is below:
import pyodbc
import sqlalchemy
import snowflake.connector
driver = 'SQL Server'
server = 'tanmay'
db1 = 'testing'
tcon = 'no'
uname = 'sa'
pword = '123'
cnxn = pyodbc.connect(driver='{SQL Server}',
                      host=server, database=db1, trusted_connection=tcon,
                      user=uname, password=pword)
cursor = cnxn.cursor()
cursor.execute("select * from Admin_tbldbbackupdetails")
rows = cursor.fetchall()
#for row in rows:
# #data = [(row[0], row[1],row[2], row[3],row[4], row[5],row[6], row[7])]
print (rows[0])
cnxn.commit()
cnxn.close()
connection = snowflake.connector.connect(user='****',password='****',account='*****')
cursor2 = connection.cursor()
cursor2.execute("USE WAREHOUSE FOOD_WH")
cursor2.execute("USE DATABASE Test")
sql1="INSERT INTO CN_RND.Admin_tbldbbackupdetails_ip"
"(id,dbname, dbpath, backupdate, backuptime, backupStatus, FaildMsg, Backupsource)"
"values (?,?,?,?,?,?,?,?)"
cursor2.execute(sql1,*rows[0])
It's obviously a string formatting error: a parameter was not supplied for a %s placeholder.
If you cannot fix it, step back and try another approach.
Use another script to achieve the same thing and get back to your bug tomorrow :-)
My script is doing pretty much the same:
1. Connect to SQL Server
-> fetchmany
-> multipart upload to s3
-> COPY INTO Snowflake table
Details are here: Snowpipe-for-SQLServer
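As a sketch of what likely trips the formatting error in the question's code: the three string literals sit on separate statements, so sql1 holds only the first fragment, and the Snowflake Python connector defaults to %s (pyformat) placeholders rather than ?. A hedged rewrite of that insert, reusing the question's table and column names:
sql1 = ("INSERT INTO CN_RND.Admin_tbldbbackupdetails_ip "
        "(id, dbname, dbpath, backupdate, backuptime, backupStatus, FaildMsg, Backupsource) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
# insert every row fetched from SQL Server (pyodbc rows converted to plain tuples)
cursor2.executemany(sql1, [tuple(r) for r in rows])
# or, for a single row:
# cursor2.execute(sql1, tuple(rows[0]))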

Exporting pandas dataframe to MySQL using SQLAlchemy

I intend to export a pandas dataframe to MySQL using SQLAlchemy. Despite referring to all previous posts, I am unable to solve the issue:
import pandas as pd
import pymysql
from sqlalchemy import create_engine
df=pd.read_excel(r"C:\Users\mazin\1-601.xlsx")
cnx = create_engine('mysql+pymysql://[root]:[aUtO1115]#[localhost]:[3306]/[patenting in psis]', echo=False)
df.to_sql(name='inventor_dataset', con=cnx, if_exists = 'replace', index=False)
Following is the error:
OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost]:[3306' ([Errno 11001] getaddrinfo failed)")
After a lot of tinkering with the code and exploring different packages in Python, I was able to make the code work.
Code:
import mysql.connector
import sqlalchemy

database_username = 'root'
database_password = 'mysql'
database_ip = '127.0.0.1'
database_name = 'patenting_in_psi'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(database_username, database_password,
                                                    database_ip, database_name),
    pool_recycle=1, pool_timeout=57600).connect()

df22.to_sql(con=database_connection, name='university_dataset_ca', if_exists='append', chunksize=100)
database_connection.close()
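A follow-up note on the connection string: when the username or password contains characters such as @, #, or $, building the URL by hand breaks easily. SQLAlchemy 1.4+ provides sqlalchemy.engine.URL.create, which quotes those characters for you; a small sketch using the placeholder credentials from the answer above:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

url = URL.create(
    drivername='mysql+mysqlconnector',
    username='root',
    password='mysql',          # special characters are quoted automatically
    host='127.0.0.1',
    database='patenting_in_psi',
)
engine = create_engine(url, pool_recycle=1, pool_timeout=57600)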
