I am trying to build a connection to a SQL Server database through Azure Databricks. While there are a number of existing questions on similar issues, I have yet to find a solution for mine.
It seems I am able to connect, at least partially, because the Spark DataFrame object reads the exact schema from my database table; yet when I attempt to look at the data (display(df) or df.show()), it throws a connection error.
Here is how I'm connecting:
jdbcUrl = "jdbc:sqlserver://{}:{};database={}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
pushdown_query = "(select * from persons where personid = 3040) Person"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)
I can see the df object, and it correctly identifies all 47 fields in the table, but display(df) throws the following error:
SQLServerException: The TCP/IP connection to the host hornets-sql.westus.cloudapp.azure.com, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
By default, SQL Server has a firewall enabled and doesn't allow access from arbitrary IPs. In your case the driver node is probably allowed through the firewall - that's why you get the table schema - but the actual reading happens on the worker nodes, which may not be allowed through the firewall, so the read fails.
You may try to solve it by:
adding the public IPs of all worker nodes to the firewall of the SQL Server (a sketch for finding them is below),
or configuring a private link/endpoint for Azure SQL into the VNet of your Databricks workspace,
or using service endpoints for Azure services.
The last two options are well covered in the Databricks blog post.
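For the first option, here is a minimal sketch for discovering the workers' public IPs from a notebook, assuming the workers have outbound internet access to a public "what is my IP" service; the partition count of 32 is an arbitrary over-provisioning so every worker runs at least one task:
import urllib.request

def public_ip(_):
    # Each task asks an external service which public IP it egresses from
    ip = urllib.request.urlopen("https://checkip.amazonaws.com", timeout=10).read().decode().strip()
    yield ip

# Spread tasks across the cluster and collect the distinct IPs the executors report
worker_ips = set(sc.parallelize(range(32), 32).mapPartitions(public_ip).collect())
print(worker_ips)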
Related
Recently, one of our servers was migrated from a pylon server to a 3-node cluster. The connection string below is what I used previously via Python and pyodbc, and I never had any issues.
server = 'test_server'
database = 'test_db'
cnxn = 'DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';Trusted_Connection=yes'
With the new server I started receiving timeout errors, so I thought I had to add MultiSubnetFailover to the connection string, such as the following:
server = 'test_server'
database = 'test_db'
cnxn = 'DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';Trusted_Connection=yes;MultiSubnetFailover=True'
However, I am still receiving a timeout error, as well as an additional error seen below:
[Microsoft][ODBC SQL Server Driver]Login timeout expired (0) (SQLDriverConnect); [HYT00] [Microsoft][ODBC SQL Server Driver]Invalid connection string attribute (0)
Does pyodbc support MultiSubnetFailover? I couldn't find documentation one way or another.
If so, how do I implement it? On the other hand, if it does not, how would I go about connecting?
Lastly, should I use the IP address instead?
The ancient SQL Server ODBC driver that ships with Windows doesn't support MultiSubnetFailover. I suggest you move to a modern driver (a sketch is below) or have your DBA set RegisterAllProvidersIP to zero to support down-level clients.
In the interim, you could specify the current listener IP address or the host name of the current primary node. However, that will fail if the primary fails over to a secondary node on a different subnet.
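A minimal sketch of the first suggestion, assuming "ODBC Driver 18 for SQL Server" is installed on the client; the driver name and the TrustServerCertificate setting are assumptions to adjust for your environment:
import pyodbc

# Modern Microsoft ODBC drivers understand the MultiSubnetFailover keyword
cnxn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=test_server;"
    "DATABASE=test_db;"
    "Trusted_Connection=yes;"
    "MultiSubnetFailover=Yes;"
    "TrustServerCertificate=yes;"  # Driver 18 encrypts by default; assumption for a lab setup
)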
I'm trying to connect to an Amazon RDS PostgreSQL database with this Python code:
import psycopg2

engine = psycopg2.connect(
    database="vietop2database",
    user="postgres",
    password="07041999",
    host="vietop2.cf4afg8yq42c.us-east-1.rds.amazonaws.com",
    port='5433'
)
cursor = engine.cursor()
print('opened database successfully')
I encountered an error:
could not connect to server: Connection timed out
Is the server running on host "vietop2.cf4afg8yq42c.us-east-1.rds.amazonaws.com" (54.161.159.194) and accepting
TCP/IP connections on port 5433?
I consulted this troubleshooting guide on Amazon and already made sure the DB instance's public accessibility is set to Yes to allow external connections. I also changed the port to 5433 and set the VPC security group to the default. Yet I still fail to connect to the database. What might be the reasons? Please help me. Thank you very much.
I found the answer. I needed to add a new inbound rule allowing all traffic of IPv4 type; a sketch of the same change made via the API is below.
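For reference, a hedged sketch of that inbound rule applied with boto3 instead of the console; the security group ID is a placeholder, and opening 0.0.0.0/0 is for testing only:
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder: the security group attached to the RDS instance
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5433,  # the non-default port used in the question
        "ToPort": 5433,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "testing only - restrict afterwards"}],
    }],
)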
I have a Postgres database running on a DigitalOcean server. The database is protected by a firewall and an SSL root certificate. I added the outbound addresses provided by the Azure Function App to the database firewall, and I am passing the certificate through the connection string.
import os
import psycopg2

pg_conn = psycopg2.connect(
    host=os.environ.get("PG_HOST"),
    database=os.environ.get("PG_DB"),
    user=os.environ.get("PG_USER"),
    password=os.environ.get("PG_PASSWORD"),
    port=os.environ.get("PG_PORT"),
    sslmode='require',
    sslrootcert=r'my-proyect/certificate.crt'
)
But when I upload my function to the cloud, the connection times out:
Connection timed out Is the server running on that host and accepting TCP/IP connections?
As per my knowledge, a connection time-out error is typically due to connectivity or networking issues - for example, a firewall not allowing access to the port number the application uses.
A tool for troubleshooting this sort of issue is portqry:
portqry -n [hostname] -e [port number]
You can also add your application to the Trusted Sources of the PostgreSQL database.
Here is the document with complete information about the connection time-out error.
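If portqry isn't available on your machine, an equivalent TCP reachability check can be done from Python; the host and port below are placeholders for your database endpoint:
import socket

def can_reach(host, port, timeout=5):
    # Returns True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach("your-db-host.example.com", 5432))  # placeholder endpoint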
I wrote a simple Lambda function in Python to fetch some data from AWS RDS. PostgreSQL is the database engine.
conn = psycopg2.connect(host=hostname, user=username, password=password, dbname=db_name, connect_timeout=50)
But it didn't work; it always returns an error like this:
Response:
{
    "errorMessage": "2018-06-06T11:28:53.775Z Task timed out after 3.00 seconds"
}
How can I resolve this?
It is most probably timing out because the network connection cannot be established. Note that the error shown is the Lambda function timeout, which defaults to 3 seconds and fires long before your connect_timeout=50 would.
If you wish to connect to the database via a public IP address, then your Lambda function should not be connected to the VPC. Instead, the connection will go from Lambda, via the internet, into the VPC and to the Amazon RDS instance.
If you wish to connect to the database via a private IP address, then your Lambda function should be configured to use the same VPC as the Amazon RDS instance.
In both cases, the connection should be established using the DNS Name of the RDS instance, but it will resolve differently inside and outside of the VPC.
Finally, the Security Group associated with the Amazon RDS instance needs to allow the incoming connection. This, too, will vary depending upon whether the request is coming from public or private space. You can test by opening the security group to 0.0.0.0/0 and, if it works, then try to restrict it to the minimum possible range.
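A minimal handler sketch under those assumptions - the host and credentials are placeholders - that keeps the database connect_timeout below the Lambda function timeout, so a blocked connection surfaces as a clear psycopg2 error rather than the generic task timeout:
import psycopg2

def handler(event, context):
    try:
        conn = psycopg2.connect(
            host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
            user="username",          # placeholder
            password="password",      # placeholder
            dbname="db_name",         # placeholder
            connect_timeout=5,  # keep below the function timeout so failures are explicit
        )
    except psycopg2.OperationalError as e:
        return {"error": str(e)}  # likely a security-group or VPC routing problem
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        return {"result": cur.fetchone()[0]}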
My problem: my team is writing a program in Python and, locally, I have a database (let's call it test) on SQL Server. So I have the following code:
connection = pypyodbc.connect("DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes;DATABASE=test")
My issue now is that my teammates need to use my database. All I've found online is how to allow access to my DB through Management Studio (i.e., enable TCP/IP and open port 1434).
I would like to have something like this
connection = pypyodbc.connect("DRIVER={SQL Server};SERVER=<my ip address>;Trusted_Connection=yes;DATABASE=test")
Is this possible? (where <my ip address> is an actual IP address)
I was thinking of installing a server that listens for incoming connections, but I've never really done this, so I was wondering if there is another way to go about this.
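For what it's worth, a hedged sketch of what that might look like, assuming SQL Server is already configured to listen on TCP/IP; the IP, port, and SQL login are placeholders (a SQL login is shown because Windows trusted connections generally require both machines to be on the same domain):
import pypyodbc

connection = pypyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=192.168.1.50,1433;"  # placeholder IP and the TCP port SQL Server listens on
    "DATABASE=test;"
    "UID=team_user;PWD=team_password;"  # hypothetical SQL login created for teammates
)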