Importing data to SQL Server database from JSON - Python

I am attempting to import data into a SQL Server 2012 database from a large JSON file using Python 2.7. Everything in the code below executes and returns no error, but when I go into SQL Server Management Studio and query the table, it returns zero rows. Why is this?
import json, pyodbc
#import data
path = 'phys2211-001_clickstream_export'
records = [json.loads(line) for line in open(path)]
#connect to database, create db cursor
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=fall_2013_blended;Trusted_Connection=Yes')
cursor = cnxn.cursor()
#insert data into db
for record in records:
    cursor.execute("insert into clickstream_json(json_event) values (?)", json.dumps(record))

I didn't have autocommit set up, so the script ran through everything but committed nothing to the database. It was similar to this question:
Issue creating table in MS SQL Database with Python script
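For completeness, a minimal sketch of the fix (same table and connection string as in the question); either commit once after the loop, or open the connection with autocommit=True:
import json, pyodbc

cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=fall_2013_blended;Trusted_Connection=Yes')
cursor = cnxn.cursor()
# records as parsed from the JSON file in the question
for record in records:
    cursor.execute("insert into clickstream_json(json_event) values (?)", json.dumps(record))
cnxn.commit()  # without this, the open transaction is rolled back when the connection closes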

Related

UPDATE statement under pyodbc issue

I am currently developing a program in Python that interacts with multiple databases, using pyodbc to connect and execute queries. One of the databases is an Azure database. I noticed that sometimes the sent data is not updated in the database even though the program runs successfully and no error is thrown. Are there any practices I should follow to make sure this doesn't happen, or is this related to my code or a DB connection issue? I am a beginner. Would appreciate everyone's help, thank you!
Also, should the .commit() line be run after every SQL statement?
The program should update a row of data in the database based on a condition. This particular query sometimes doesn't take effect, but no error is thrown. I also executed multiple queries after that, and no issue was found for them; they executed successfully.
The query is a simple one:
UPDATE DraftVReg SET VRStatus = 'Potential Duplicate Found' WHERE RowID = ?
I tried to reproduce your scenario on my end and was able to update the SQL row in the Azure SQL DB with the pyodbc module.
Yes, it is necessary to call conn.commit() to commit your changes after you perform operations such as UPDATE or INSERT inside Azure SQL DB programmatically.
1) Fetch data with a SELECT statement.
I was able to fetch the table's data successfully with a SELECT * FROM Tablename query inside the pyodbc code before trying the UPDATE statement.
import pyodbc

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=tcp:sqlservernamesql.database.windows.net,1433;'
                      'DATABASE=databasename;UID=siliconuser;PWD=Password;')
#conn.commit()
cursor = conn.cursor()
cursor.execute('SELECT * FROM StudentReviews')
#conn.commit()
for i in cursor:
    print(i)
cursor.close()
conn.close()
2) Updating rows requires conn.commit()
Code:
import pyodbc

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=tcp:siliconserversql.database.windows.net,1433;'
                      'DATABASE=silicondb;UID=userid;PWD=Password;')
cursor = conn.cursor()
#cursor.execute('SELECT * FROM StudentReviews')
cursor.execute("UPDATE StudentReviews SET ReviewTime = ('7') WHERE ReviewText = ('SQL DB')")
conn.commit()
cursor.close()
conn.close()
Result: the UPDATE statement executed successfully and the table row was updated in Azure SQL.
3) With autocommit=True
Thank you @Gord Thompson for the comment and suggestion!
Code:
import pyodbc

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=tcp:siliconserversql.database.windows.net,1433;'
                      'DATABASE=silicondb;UID=username;PWD=Password;',
                      autocommit=True)
#conn.commit()
cursor = conn.cursor()
cursor.execute("UPDATE StudentReviews SET ReviewTime = ('8') WHERE ReviewText = ('SQL DB')")
cursor.close()
conn.close()
Result: with autocommit=True, you do not need to call conn.commit() every time you update the SQL DB.
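Applied to the query from the question, a minimal sketch (row_id here is a placeholder for whatever value you bind to the parameter):
cursor.execute("UPDATE DraftVReg SET VRStatus = 'Potential Duplicate Found' WHERE RowID = ?", row_id)
conn.commit()  # required unless the connection was opened with autocommit=True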

Data type problems while ingesting from Salesforce to Azure SQL using pyodbc in Python

I'm ingesting data from Salesforce to Azure SQL Database using Python with pyodbc.
I make a first connection with Salesforce, as shown below:
cnxn = pyodbc.connect('DRIVER={Devart ODBC Driver for Salesforce};User ID=xxx;Password=xxx;Security Token=xxx')
Then I import Salesforce data, as shown below:
cursor = cnxn.cursor()
cursor.execute("select * from X where Y > 'VALUE'")
row = cursor.fetchall()
After that I make a second connection to the destination, which is Azure SQL Database, as shown below:
cnxn = pyodbc.connect('DRIVER={Devart ODBC Driver for SQL Azure};Server=xxx;Database=xxx;Port=1433;User ID=xxx;Password=xxx')
Until now, everything is working fine. But when I try to insert the output I got from Salesforce (in the variable row), I run into data type problems, including:
Tab characters ("\t")
Newline characters (in Azure SQL Database, CHAR(13) + CHAR(10))
Characters that contain quotes (e.g. "big data's technology")
Here is how I launch the insertion query:
cursor.executemany('INSERT INTO dbo.Account (Column_a,Column_b,Column_c) VALUES (?,?,?)', row)
cursor.commit()
Here is the first error I get:
pyodbc.Error: ('HY000', '[HY000] [Devart][ODBC][Microsoft SQL Azure]Statement(s) could not be prepared.\r\nMust declare the scalar variable "@_39".\r\nLine 1: Specified scale 14 is invalid. (0) (SQLExecDirectW)')
This issue was apparently caused by a defect in the Devart ODBC Driver for SQL Azure. Using Microsoft's ODBC Driver 17 for SQL Server solved the problem.
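For reference, a minimal sketch of the destination connection rewritten for Microsoft's driver (server, database, and credentials are placeholders, as in the question):
import pyodbc

# Same insert as before, but through ODBC Driver 17 for SQL Server
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=xxx.database.windows.net,1433;'
                      'DATABASE=xxx;UID=xxx;PWD=xxx')
cursor = cnxn.cursor()
cursor.executemany('INSERT INTO dbo.Account (Column_a,Column_b,Column_c) VALUES (?,?,?)', row)
cnxn.commit()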

Connect to SQL Server and run query as "passthrough" from Python

I currently have code that executes queries on data stored on a SQL Server database, such as the following:
import pyodbc
conn = pyodbc.connect(
    r'DRIVER={SQL Server};'
    r'SERVER=SQL2SRVR;'
    r'DATABASE=DBO732;'
    r'Trusted_Connection=yes;'
)
sqlstr = '''
SELECT Company, Street_Address, City, State
FROM F556
WHERE [assume complicated criteria statement here]
'''
crsr = conn.cursor()
for row in crsr.execute(sqlstr):
    print(row.Company, row.Street_Address, row.City, row.State)
I can't find documentation on whether pyodbc runs my queries on the SQL Server itself (as passthrough queries), or whether (if pyodbc can't do that) there is another way (maybe sqlalchemy or similar?) of doing so. Any insight?
Or is there a way to execute passthrough queries directly from Pandas?
If you are working with pandas and SQL Server then you should already have created a SQLAlchemy Engine object (usually named engine). To execute a raw DML statement you can use a construct like this:
from sqlalchemy import text

with engine.begin() as conn:
    conn.execute(text("UPDATE table_name SET column_name ..."))
    print("table updated")
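As for running a passthrough query directly from pandas: pandas sends the SQL string to the server unchanged, so the WHERE clause is evaluated by SQL Server and only the result set comes back. A minimal sketch, reusing the server and database names from the question (the ODBC Driver 17 name is an assumption; use whichever driver you have installed):
import urllib
import pandas as pd
import sqlalchemy

# Build the engine from a raw ODBC connection string
params = urllib.parse.quote_plus(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=SQL2SRVR;DATABASE=DBO732;Trusted_Connection=yes;'
)
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)

# The full query, including the complicated WHERE clause, runs on the server
df = pd.read_sql(sqlstr, engine)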

Python to SQL Server Insert

I'm trying to follow the method for inserting a Pandas data frame into SQL Server that is mentioned here, as it appears to be the fastest way to import lots of rows.
However, I am struggling to figure out the connection parameters.
I am not using a DSN; I have a server name, a database name, and I am using a trusted connection (i.e. Windows login).
import sqlalchemy
import urllib
server = 'MYServer'
db = 'MyDB'
cxn_str = "DRIVER={SQL Server Native Client 11.0};SERVER=" + server +",1433;DATABASE="+db+";Trusted_Connection='Yes'"
#cxn_str = "Trusted_Connection='Yes',Driver='{ODBC Driver 13 for SQL Server}',Server="+server+",Database="+db
params = urllib.parse.quote_plus(cxn_str)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
conn = engine.connect().connection
cursor = conn.cursor()
I'm just not sure what the correct way to specify my connection string is. Any suggestions?
I have been working with pandas and SQL Server for a while, and the fastest way I found to insert a lot of data into a table was this:
You can create a temporary CSV using:
df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')
Then use pyodbc and the BULK INSERT Transact-SQL statement:
import pyodbc
conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()
cur.execute("""BULK INSERT table_name
FROM 'C:\\Users\\folders path\\new_file_name.csv'
WITH
(
CODEPAGE = 'ACP',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)""")
conn.commit()
cur.close()
conn.close()
Then you can delete the file:
import os
os.remove('new_file_name.csv')
It took only a second to load a lot of data into SQL Server at once. I hope this gives you an idea.
Note: don't forget to have a field for the index in the target table. It was my mistake when I started using this.
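Alternatively, a small sketch if you'd rather not store the index at all: write the CSV without the index column, so the table doesn't need a field for it:
df.to_csv('new_file_name.csv', sep=',', encoding='utf-8', index=False)  # skip the DataFrame index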
Connection string parameter values should not be enclosed in quotes, so you should use Trusted_Connection=Yes instead of Trusted_Connection='Yes'.
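That is, the connection string from the question would become (same server and db variables as in the question's code):
cxn_str = "DRIVER={SQL Server Native Client 11.0};SERVER=" + server + ",1433;DATABASE=" + db + ";Trusted_Connection=Yes"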

Create a schema in SQL Server using pyodbc

I am using pyodbc to read from a SQL Server database and create analogous copies of the same structure in a different database somewhere else.
Essentially:
for db in source_dbs:
    Execute('create database [%s]' % db)  # THIS WORKS.
    for schema in db:
        # Both of the following result in an error starting with:
        # [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]
        Execute('create schema [%s].[%s]' % (db, schema))
        # Incorrect syntax near '.'
        Execute('use [%s]; create schema [%s]' % (db, schema))
        # 'CREATE SCHEMA' must be the first statement in a query batch.
In this example, you can assume that Execute creates a cursor using pyodbc and executes the argument SQL string.
I'm able to create the empty databases, but I can't figure out how to create the schemas within them.
Is there a solution, or is this a limitation of using pyodbc with MS SQL Server?
EDIT: FWIW - I also tried to pass the database name to Execute, so I could try to set the database name in the connection string. This doesn't work either - it seems to ignore the database name completely.
Python database connections usually default to having transactions enabled (autocommit == False) and SQL Server tends to dislike certain DDL commands being executed in a transaction.
I just tried the following and it worked for me:
import pyodbc
connStr = (
    r"Driver={SQL Server Native Client 10.0};"
    r"Server=(local)\SQLEXPRESS;"
    r"Trusted_Connection=yes;"
)
cnxn = pyodbc.connect(connStr, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("CREATE DATABASE pyodbctest")
crsr.execute("USE pyodbctest")
crsr.execute("CREATE SCHEMA myschema")
crsr.close()
cnxn.close()
