I am VERY new to Azure and Azure functions, so be gentle. :-)
I am trying to write an Azure timer function (using Python) that will take the results returned from an API call and insert the results into a table in Azure SQL.
I am virtually clueless. If someone would be willing to handhold me through the process, it would be MOST appreciated.
I have the API call already written, so that part is done. What I totally don't get is how to get the results from what is returned into Azure SQL.
The result set I am returning is in the form of a Pandas dataframe.
Again, any and all assistance would be AMAZING!
Thanks!!!!
Here is an example that writes a pandas dataframe to an SQL table:
import pyodbc
import pandas as pd
# insert data from csv file into dataframe.
# working directory for csv file: type "pwd" in Azure Data Studio or Linux
# working directory in Windows c:\users\username
df = pd.read_csv("c:\\users\\username\\department.csv")
# Some other example server values are
# server = 'localhost\sqlexpress' # for a named instance
# server = 'myserver,port' # to specify an alternate port
server = 'yourservername'
database = 'AdventureWorks'
username = 'username'
password = 'yourpassword'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
# Insert Dataframe into SQL Server:
for index, row in df.iterrows():
cursor.execute("INSERT INTO HumanResources.DepartmentTest (DepartmentID,Name,GroupName) values(?,?,?)", row.DepartmentID, row.Name, row.GroupName)
cnxn.commit()
cursor.close()
To make it work for your case you need to:
replace the read from the csv file with your API call that returns the dataframe
change the INSERT statement to match the structure of your SQL table (a sketch of how this could look inside a timer-triggered function follows the link below).
For more details see: https://learn.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-sql-server?view=sql-server-ver15
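To give a concrete picture, here is a rough sketch of how that insert could sit inside a timer-triggered Azure Function. Everything here is a placeholder to adapt: get_api_data() stands in for your existing API call that returns the dataframe, and the connection values, table and column names need to match your own Azure SQL setup, so treat it as an outline rather than a drop-in solution:
import azure.functions as func
import pyodbc

def main(mytimer: func.TimerRequest) -> None:
    # get_api_data() is a placeholder for your existing API call
    # that returns a pandas dataframe
    df = get_api_data()

    # connection values are placeholders; in a real function app they would
    # normally be read from application settings (e.g. os.environ)
    cnxn = pyodbc.connect(
        'DRIVER={ODBC Driver 17 for SQL Server};'
        'SERVER=yourserver.database.windows.net;'
        'DATABASE=yourdatabase;UID=username;PWD=yourpassword')
    cursor = cnxn.cursor()

    # adjust the table and column names to match your Azure SQL table
    for index, row in df.iterrows():
        cursor.execute(
            "INSERT INTO dbo.ApiResults (ColA, ColB, ColC) VALUES (?, ?, ?)",
            row.ColA, row.ColB, row.ColC)

    cnxn.commit()
    cursor.close()
    cnxn.close()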
Related
I am trying to run this code once a day to log the dataframes to make historical dataset.
I have connected to MySQL with pymysql, saving my pandas dataframe into MySQL by converting it to a SQL table with the .to_sql method.
However, if I run this code a 2nd time, the table name already exists and it won't run the 2nd time.
Therefore I need to change the name(data_day001, data_day002, data_day003...) of the table each time I run this code.
# Credentials to database connection
hostname="hostname"
dbname="sql_database"
uname="admin"
pwd="password"
from sqlalchemy import create_engine
# Create SQLAlchemy engine to connect to MySQL Database
engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}"
                       .format(host=hostname, db=dbname, user=uname, pw=pwd))
# Convert dataframe to sql table
channel_data.to_sql('data_day001', engine, index=False)
Please advise me how I could solve this problem.
Thank you so much in advance.
Use the inspect function:
from sqlalchemy import create_engine, inspect
def get_table_name(engine):
    names = inspect(engine).get_table_names()
    return f"data_day{len(names):03}"
engine = create_engine(...)
channel_data.to_sql(get_table_name(engine), engine, index=False)
After some days:
>>> inspect(engine).get_table_names()
['data_day000', 'data_day001', 'data_day002', 'data_day003', 'data_day004']
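One caveat (my addition, not part of the original answer): if the same database ever contains tables that do not follow the data_dayNNN pattern, counting all table names would throw the numbering off. A small variation that counts only the matching names:
from sqlalchemy import inspect

def get_table_name(engine, prefix="data_day"):
    # count only the tables that already follow the data_dayNNN pattern,
    # so unrelated tables in the same schema do not shift the numbering
    names = [n for n in inspect(engine).get_table_names() if n.startswith(prefix)]
    return f"{prefix}{len(names):03}"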
I am extracting millions of rows from SQL Server and inserting them into an Oracle db using Python. It takes about 1 second to insert each record into the Oracle table, so the load takes hours. What is the fastest approach to load?
My code below:
def insert_data(conn, cursor, query, data, batch_size=10000):
    recs = []
    count = 1
    for rec in data:
        recs.append(rec)
        if count % batch_size == 0:
            cursor.executemany(query, recs, batcherrors=True)
            conn.commit()
            recs = []
        count = count + 1
    cursor.executemany(query, recs, batcherrors=True)
    conn.commit()
Perhaps you cannot buy a 3rd-party ETL tool, but you can certainly write a procedure in PL/SQL in the Oracle database.
First, install the Oracle Transparent Gateway for ODBC. No license cost involved.
Second, in the Oracle db, create a db link to reference the MSSQL database via the gateway.
Third, write a PL/SQL procedure to pull the data from the MSSQL database via the db link.
I was once presented a problem similar to yours. A developer was using SSIS to copy around a million rows from MSSQL to Oracle, taking over 4 hours. I ran a trace on his process and saw that it was copying row-by-row, slow-by-slow. It took me less than 30 minutes to write a PL/SQL proc to copy the data, and it completed in less than 4 minutes.
I give a high-level view of the entire setup and process, here:
EDIT:
Thought you might like to see exactly how simple the actual procedure is:
create or replace procedure my_load_proc as
begin
  insert into my_oracle_table (col_a,
                               col_b,
                               col_c)
  select sql_col_a,
         sql_col_b,
         sql_col_c
    from mssql_tbl@mssql_link;
end;
My actual procedure has more to it, dealing with run-time logging, emailing notification of completion, etc. But the above is the 'guts' of it, pulling the data from mssql into oracle.
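Once the procedure exists in the Oracle database, it can be kicked off from the existing Python script with a single call; a minimal sketch using cx_Oracle (the connection details are placeholders):
import cx_Oracle

# connection details are placeholders
conn = cx_Oracle.connect("user/password@host:1521/service_name")
cur = conn.cursor()

# run the server-side procedure; all the data movement happens inside Oracle
# over the db link, without pulling rows through Python at all
cur.callproc("my_load_proc")
conn.commit()
cur.close()
conn.close()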
Then you might want to use pandas, PySpark, or another big-data framework available in Python.
There are a lot of examples out there; here is how to load the data, based on the Microsoft Docs:
import pyodbc
import pandas as pd
import cx_Oracle
server = 'servername'
database = 'AdventureWorks'
username = 'yourusername'
password = 'yourpassword'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
query = "SELECT [CountryRegionCode], [Name] FROM Person.CountryRegion;"
df = pd.read_sql(query, cnxn)
# you do data manipulation that is needed here
# then insert data into oracle
from sqlalchemy import create_engine
conn = create_engine('oracle+cx_oracle://xxxxxx')
df.to_sql(table_name, conn, index=False, if_exists="replace")  # table_name: your target table
Something like that (it might not work 100%, but it should give you an idea of how you can do it).
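If the dataframe is very large, it can also help to let to_sql write it in chunks instead of one giant statement; a small sketch (the table name and connection string are placeholders):
from sqlalchemy import create_engine

# placeholders for your Oracle connection and target table
conn = create_engine('oracle+cx_oracle://user:password@host:1521/?service_name=yourservice')
df.to_sql('target_table', conn, index=False, if_exists="replace", chunksize=10000)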
I am trying to open a .sqlite3 file in Python but no information is returned. I also tried R and still get an empty result for the tables. I would like to know what tables are in this file.
I used the following code for python:
import sqlite3
from sqlite3 import Error
def create_connection(db_file):
    """ create a database connection to the SQLite database
        specified by the db_file
    :param db_file: database file
    :return: Connection object or None
    """
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
    return None
database = "D:\\...\assignee.sqlite3"
conn = create_connection(database)
cur = conn.cursor()
rows = cur.fetchall()
but rows are empty!
This is where I got the assignee.sqlite3 from:
https://github.com/funginstitute/downloads
I also tried RStudio, below is the code and results:
> con <- dbConnect(drv=RSQLite::SQLite(), dbname="D:/.../assignee")
> tables <- dbListTables(con)
But tables comes back empty as well.
First, make sure you provided the correct path in your connection string to the SQLite db,
e.g. conn = sqlite3.connect(r"C:\users\guest\desktop\example.db")
Also make sure you are using the same SQLite library in the unit tests and the production code.
Check the types of SQLite connection strings and determine which one your db belongs to:
Basic:
Data Source=c:\mydb.db;Version=3;
(Version 2 is not supported by this class library.)
In-memory database (an SQLite database is normally stored on disk, but it can also be stored in memory; read more about SQLite in-memory databases):
Data Source=:memory:;Version=3;New=True;
Using UTF16:
Data Source=c:\mydb.db;Version=3;UseUTF16Encoding=True;
With password:
Data Source=c:\mydb.db;Version=3;Password=myPassword;
So make sure you wrote the proper connection string for your SQLite db.
If you still cannot see it, check whether the disk containing /tmp is full; otherwise it might be an encrypted database, or locked and in use by some other application. You can confirm that with one of the many GUI tools for SQLite databases (Windows, Mac and Linux versions are available); download one, navigate directly to where your db exists, and it will give you an indication of the problem.
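As a quick check from Python itself, you can also list the tables directly by querying the sqlite_master catalog; a minimal sketch (the file path is a placeholder):
import sqlite3

# path is a placeholder; point it at your actual .sqlite3 file
conn = sqlite3.connect(r"D:\data\assignee.sqlite3")
cur = conn.cursor()

# sqlite_master has one row per table, index, view and trigger in the file
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table';")
print(cur.fetchall())

conn.close()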
good luck
I am a new Python coder and also a new data scientist so please forgive any foolish sounding things here. I'll keep the details out unless anyone's curious but basically I need to connect to Microsoft SQL Server and upload a Pandas DF that is relatively large (~500k rows) and I need to do this almost every day as the project currently stands.
It doesn't have to be a Pandas DF - I've read about using odo for csv files but I haven't been able to get anything to work. The issue I'm having is that I can't bulk insert the DF because the file isn't on the same machine as the SQL Server instance. I'm consistently getting errors like the following:
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC SQL
Server Driver][SQL Server]Incorrect syntax near the keyword 'IF'.
(156) (SQLExecDirectW)")
As I've attempted different SQL statements, 'IF' in that error gets replaced by whatever the first column name in the CREATE statement happens to be. I'm using SQLAlchemy to create the engine and connect to the database. This may go without saying, but the pd.to_sql() method is just way too slow for the amount of data I'm moving, so that's why I need something faster.
I'm using Python 3.6 by the way. I've put down here most of the things that I've tried that haven't been successful.
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 1)), columns=['test_col'])
address = 'mssql+pyodbc://uid:pw@server/path/database?driver=SQL Server'
engine = create_engine(address)
connection = engine.raw_connection()
cursor = connection.cursor()
# Attempt 1 <- This failed to even create a table at the cursor_execute statement so my issues could be way in the beginning here but I know that I have a connection to the SQL Server because I can use pd.to_sql() to create tables successfully (just incredibly slowly for my tables of interest)
create_statement = """
DROP TABLE test_table
CREATE TABLE test_table (test_col)
"""
cursor.execute(create_statement)
test_insert = '''
INSERT INTO test_table
(test_col)
values ('abs');
'''
cursor.execute(test_insert)
# Attempt 2 <- From an iabdb WordPress blog post I came across
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
records = [str(tuple(x)) for x in take_rates.values]
insert_ = """
INSERT INTO test_table
("A")
VALUES
"""
for batch in chunker(records, 2):  # This would be set to 1000 in practice I hope
    print(batch)
    rows = str(batch).strip('[]')
    print(rows)
    insert_rows = insert_ + rows
    print(insert_rows)
    cursor.execute(insert_rows)
    # conn.commit()  # don't know when I would need to commit
conn.close()
# Attempt 3 # From a related Stack Exchange Post
# create the table but first drop if it already exists
command = """DROP TABLE IF EXISTS test_table
CREATE TABLE test_table  -- these columns are from my real dataset
"Serial Number" serial primary key,
"Dealer Code" text,
"FSHIP_DT" timestamp without time zone,
;"""
cursor.execute(command)
connection.commit()
# stream the data using 'to_csv' and StringIO(); then use sql's 'copy_from' function
output = io.StringIO()
# ignore the index
take_rates.to_csv(output, sep='~', header=False, index=False)
# jump to start of stream
output.seek(0)
contents = output.getvalue()
cur = connection.cursor()
# null values become ''
cur.copy_from(output, 'Config_Take_Rates_TEST', null="")
connection.commit()
cur.close()
It seems to me that MS SQL Server is just not a nice Database to play around with...
I want to apologize for the rough formatting - I've been at this script for weeks now but just finally decided to try to organize something for StackOverflow. Thank you very much for any help anyone can offer!
If you only need to replace the existing table, truncate it and use bcp utility to upload the table. It's much faster.
from subprocess import call
command = "TRUNCATE TABLE test_table"
cursor.execute(command)  # run the truncate through the existing pyodbc connection
connection.commit()
take_rates.to_csv('take_rates.csv', sep='\t', index=False)
call('bcp {t} in {f} -S {s} -U {u} -P {p} -d {db} -c -t "{sep}" -r "{nl}"'.format(
    t='test_table', f='take_rates.csv', s=server, u=user, p=password,
    db=database, sep='\t', nl='\n'), shell=True)
You will need to install bcp utility (yum install mssql-tools on CentOS/RedHat).
'DROP TABLE IF EXISTS test_table' just looks like invalid T-SQL syntax (it is only supported from SQL Server 2016 onward).
you can do something like this:
if (object_id('test_table') is not null)
    DROP TABLE test_table
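From the Python side, that guarded drop can be sent through the same pyodbc cursor used in the question before recreating the table; a small sketch (the column definition is a placeholder):
# drop the table only if it already exists; object_id() also works on
# SQL Server versions older than 2016, where DROP TABLE IF EXISTS is unavailable
cursor.execute("""
    if (object_id('test_table') is not null)
        DROP TABLE test_table
""")

# the column definition is a placeholder; match it to your real dataset
cursor.execute("CREATE TABLE test_table (test_col varchar(50))")
connection.commit()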
I have a situation where I want to write a dataframe, e.g. store_dataframe, to a remote MySQL database.
store_dataframe contains around 15 columns and 200,000 rows. I am using pandas to create this dataframe.
from sqlalchemy import create_engine
My DB connection looks like
def db_connection():
    dbServer = 'xxx.xxx.xxx.xxx'
    dbPass = 'xxx'
    dbSchema = 'accounts'
    dbUser = 'xxx'
    db = dbapi.connect(host=dbServer, db=dbSchema, user=dbUser, passwd=dbPass)
    return db
I am writing this data to remote mysql database using following lines of code.
store_dataframe = getthedataframe()
engine = create_engine('mysql+mysqlconnector://xxx:xxx#xxx.xxx.xxx.xxx:3306/merchant', echo=False)
res.to_sql(name='store_dataframe', con=engine, if_exists = 'replace', index=False,chunksize=5000)
db.commit()
Issue:
I am not able to write the dataframe to the db unless I restart the database.
It's weird to have to restart the database every time I want to write this. What mistake am I making?
P.S.: it does not throw any error; the program just hangs / doesn't respond.
My application resides on one system, which connects to MySQL on another system.