Issue with inserting data into SQL Server using a Python Pandas DataFrame - python

I am trying to pull data from a REST API and insert it into SQL Server. If I have the script pull PhotoBinary and FileType together it works, but as soon as I add the ID, which is an integer, I get the error below. It also works if I have it pull only the ID from the API.
I am trying to pull three pieces of information:
The EmployeeID, which is an int.
The binary string representation of the image.
The file type of the original file, e.g. .jpg.
The target table is set up as:
Create table Employee_Photo
(
    EmployeeID   int,
    PhotoBinary  varchar(max),
    FileType     varchar(10)
)
The error I get is:
Traceback (most recent call last):
  File "apiphotopullwithid.py", line 64, in <module>
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType'])
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 5 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')
import json
import pandas as pd
import sqlalchemy
import pyodbc
import requests

url = "https://someurl.com/api/PersonPhoto"

headers = {
    'Accept': "application/json",
    'Authorization': "apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    'Content-Type': "application/json",
    'cache-control': "no-cache"
}

response = requests.request("GET", url, headers=headers)
data = json.loads(response.text)

ID, Photo, PhotoType = [], [], []
for device in data['PersonPhoto']:
    ID.append(device[u'ID'])
    Photo.append(device[u'Photo'])
    PhotoType.append(device[u'PhotoType'])

df = pd.DataFrame([ID, Photo, PhotoType]).T
df.columns = ['EMPID', 'Photo', 'PhotoType']
df = df.astype({'EMPID': 'Int64'})

connStr = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
    # "UID=ConnectME;"
    # "PWD={Password1}"
)
cursor = connStr.cursor()

for index, row in df.iterrows():
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'], row['Photo'], row['PhotoType'])
    connStr.commit()

cursor.close()
connStr.close()

In most Python database APIs adhering to the PEP 249 spec, including pyodbc, the parameters argument of cursor.execute() is a sequence (i.e., a tuple or list). Therefore, bind all values into a single iterable rather than passing them as three separate arguments:
sql = "INSERT INTO dbo.Employee_Photo ([EmployeeID],[PhotoBinary],[FileType]) VALUES (?,?,?)"
# TUPLE
cursor.execute(sql, (row['EMPID'], row['Photo'], row['PhotoType']))
# LIST
cursor.execute(sql, [row['EMPID'], row['Photo'], row['PhotoType']])
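For example, the original loop with the values bound as one tuple. The int() call is a defensive extra step, not something pyodbc requires: a pandas nullable Int64 column yields NumPy/pandas scalar types that some ODBC drivers do not bind cleanly, so unwrapping to a plain Python int avoids that class of problem.
sql = "INSERT INTO dbo.Employee_Photo ([EmployeeID],[PhotoBinary],[FileType]) VALUES (?,?,?)"

for index, row in df.iterrows():
    # unwrap the NumPy/pandas integer into a plain Python int before binding
    cursor.execute(sql, (int(row['EMPID']), row['Photo'], row['PhotoType']))

connStr.commit()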
By the way, you can avoid the explicit iterrows loop and use an implicit loop with executemany via the DataFrame's values:
# EXECUTE PARAMETERIZED QUERY
sql_cols = ['EMPID', 'Photo', 'PhotoType']
cursor.executemany(sql, df[sql_cols].values.tolist())
connStr.commit()
Actually, you do not even need Pandas as a middle layer (reserve that library for actual data analysis) and can work with the returned JSON directly:
# NESTED LIST OF TUPLES
vals = [(int(device[u'ID']), device[u'Photo'], device[u'PhotoType'])
        for device in data['PersonPhoto']]

cursor.executemany(sql, vals)
connStr.commit()

You're using the old SQL Server driver built into Windows. Try the newer Microsoft ODBC Driver for SQL Server, which is available for multiple platforms.
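For example, with the newer driver installed, only the driver name in the connection string needs to change (this sketch assumes ODBC Driver 17 for SQL Server; use whichever version you actually install):
connStr = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
)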
Don't read too much into the error message; something is malformed in the network protocol layer.
Can you dump the types and values of the parameters causing the issue? My guess is that the driver is setting the parameter types incorrectly.
E.g.:
for index, row in df.iterrows():
    empid = row['EMPID']
    photo = row['Photo']
    photoType = row['PhotoType']
    print("empid is ", type(empid), " photo is ", type(photo), " photoType is ", type(photoType))
    print("empid: ", empid, " photo: ", photo, " photoType: ", photoType)
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", empid, photo, photoType)
    connStr.commit()

cursor.close()
connStr.close()

Related

Want to create a Python API and integrate it with Swagger/Postman

Requirement: I want to create a Python API that inserts data into a BigQuery table. The API will be hosted in Swagger/Postman, from where the user can provide input data so that it gets reflected in the BigQuery table.
Can anyone help me find a suitable solution, with code?
import sqlite3 as sql
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('path/to/file.json')
project_id = 'project_id'
client = bigquery.Client(credentials=credentials, project=project_id)

def add_data(group_name, user_name):
    try:
        # Connecting to database
        con = sql.connect('shot_database.db')
        # Getting cursor
        c = con.cursor()
        # Adding data
        job_config.use_legacy_sql = True
        query_job = client.query("""
            INSERT INTO `table_name` (group, user)
            VALUES (%s, %s)""", job_config=job_config)
        results = query_job.result()  # Wait for the job to complete.
        # Applying changes
        con.commit()
    except:
        print("An error has occured")
The code you provided is a mix of SQLite and BigQuery, but it looks like you're trying to use BigQuery to insert data into a table. To insert data into a BigQuery table using Python, you can use the client's streaming-insert method, called insert_rows_json() in current versions of google-cloud-bigquery (older releases exposed it as insert_data()). Here's an example of how you can use this method to insert data into a table called "mytable" in a dataset called "mydataset":
# Define the data you want to insert
data = [
    {
        "group": group_name,
        "user": user_name
    }
]

# Insert the data
table_id = "mydataset.mytable"
errors = client.insert_rows_json(table_id, data)

if errors == []:
    print("Data inserted successfully")
else:
    print("Errors occurred while inserting data:")
    print(json.dumps(errors, indent=2))
Then you can create an API using Flask or Django and call the add_data method you have defined to insert data into the BigQuery table.
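As a rough sketch (the route name and payload keys below are placeholders, not anything prescribed by Flask or BigQuery), a minimal Flask wrapper around your add_data function could look like this:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/add-data', methods=['POST'])
def add_data_endpoint():
    # expects a JSON body such as {"group_name": "...", "user_name": "..."}
    payload = request.get_json()
    add_data(payload['group_name'], payload['user_name'])
    return jsonify({"status": "ok"})

if __name__ == '__main__':
    app.run(debug=True)
Postman (or a Swagger UI fed by an OpenAPI spec) can then send POST requests to this endpoint.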

fastapi snowflake connection only pulling 1 record

I am trying to read data from a Snowflake database using FastAPI. I was able to create the connection, and it pulls data from Snowflake.
The issue I am facing right now is that I only get 1 record (instead of 10 records).
I suspect I am not using the correct keyword while returning the data. I'd appreciate any help.
Here is my code:
from fastapi import FastAPI
import snowflake.connector as sf
import configparser

username = 'username_value'
password = 'password_value'
account = 'account_value'
warehouse = 'test_wh'
database = 'test_db'

ctx = sf.connect(user=username, password=password, account=account, warehouse=warehouse, database=database)

app = FastAPI()

@app.get('/test API')
async def fetchdata():
    cursor = ctx.cursor()
    cursor.execute("USE WAREHOUSE test_WH ")
    cursor.execute("USE DATABASE test_db")
    cursor.execute("USE SCHEMA test_schema")
    sql = cursor.execute("SELECT DISTINCT ID,NAME,AGE,CITY FROM TEST_TABLE WHERE AGE > 60")
    for data in sql:
        return data
You use return inside your for-loop, so the function returns as soon as the first row is encountered.
If you want to return all rows as a list, you can probably do (I'm not familiar with the snowflake connector):
return list(sql)
instead of the for-loop, or use sql.fetchall().
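Putting that into the endpoint, a sketch of the corrected function (using fetchall(), which the Snowflake cursor provides; each row comes back as a tuple that FastAPI serializes to JSON):
@app.get('/test API')
async def fetchdata():
    cursor = ctx.cursor()
    cursor.execute("USE WAREHOUSE test_WH")
    cursor.execute("USE DATABASE test_db")
    cursor.execute("USE SCHEMA test_schema")
    cursor.execute("SELECT DISTINCT ID,NAME,AGE,CITY FROM TEST_TABLE WHERE AGE > 60")
    # return every matching row instead of just the first one
    return cursor.fetchall()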

Save resulting dict from api into db - psycopg2

I want to save an API response into a table of my database; I'm using Postgres along with psycopg2.
This is my code:
import json
import requests
import psycopg2

def my_func():
    response = requests.get("https://path/to/api/")
    data = response.json()
    while data['next'] is not None:
        response = requests.get(data['next'])
        data = response.json()
        for item in data['results']:
            try:
                connection = psycopg2.connect(user="user",
                                              password="user",
                                              host="127.0.0.1",
                                              port="5432",
                                              database="mydb")
                cursor = connection.cursor()
                postgres_insert_query = """ INSERT INTO table_items (NAME VALUES (%s)"""
                record_to_insert = print(item['name'])
                cursor.execute(postgres_insert_query, record_to_insert)
                connection.commit()
                count = cursor.rowcount
                print(count, "success")
            except (Exception, psycopg2.Error) as error:
                if connection:
                    print("error", error)
            finally:
                if connection:
                    cursor.close()
                    connection.close()

my_func()
I just wanted to sort of "print" all the resulting data from my request into the db; is there a way to accomplish this?
I'm a bit confused, as you can see; what could be some "print" equivalent to achieve this?
I just want to save the name field from the API response into the database table, or actually INSERT it. I guess psycopg2 has some sort of function for this circumstance?
Any example you could provide?
EDIT
Sorry, I forgot: if I run this code it throws this:
PostgreSQL connection is closed
A particular name
Failed to insert record into table_items table syntax error at or near "VALUES"
LINE 1: INSERT INTO table_items (NAME VALUES (%s)
There are a few issues here. I'm not sure what the API is or what it is returning, but I will make some assumptions and suggestions based on those.
There is a syntax error in your query; it is missing a ). It should be:
postgres_insert_query = 'INSERT INTO table_items (NAME) VALUES (%s)'
(I'm also assuming that NAME is a real column in your database.)
Even with this correction, you will have a problem since:
record_to_insert = print(item['name']) will set record_to_insert to None. The return value of the print function is always None. The line should instead be:
record_to_insert = item['name']
(assuming the key name in the dict item is actually the field you're looking for)
I believe calls to execute must pass the replacement values as a tuple, so the line cursor.execute(postgres_insert_query, record_to_insert) should be:
cursor.execute(postgres_insert_query, (record_to_insert,))
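Putting those three fixes together, the body of the loop would look roughly like this:
postgres_insert_query = "INSERT INTO table_items (NAME) VALUES (%s)"
record_to_insert = item['name']
# psycopg2 expects the parameters as a sequence, hence the one-element tuple
cursor.execute(postgres_insert_query, (record_to_insert,))
connection.commit()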

Memory error on linux server while fetching data from bigquery using python?

I am trying to fetch data from BigQuery using Python. The code runs fine on my laptop but throws a memory error on a Linux server. Can this be optimized so that it can run on the server as well?
Error: the table has 5 million rows; the Linux machine has 8 GB of RAM; the error is "out of memory" and the process is killed.
Below is the code:
import os
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/Desktop/big_query_test/soy-serenity-89ed73.json"
client = bigquery.Client()

# Perform a query.
QUERY = "SELECT * FROM `soy-serenity-89ed73.events10`"
query_job = client.query(QUERY)
df = query_job.to_dataframe()
I can suggest two approaches:
option 1
SELECT the data in chunks to reduce the size of the data you receive on each iteration from BigQuery.
For example, if your table is partitioned you can do this:
WHERE _PARTITIONTIME = currentLoopDate
where currentLoopDate is a date variable in your Python code (a similar option would be to use ROW_NUMBER), as in the sketch below.
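A rough sketch of that loop, assuming the table is ingestion-time partitioned; the date range is a placeholder you would replace with the span you actually need:
from datetime import date, timedelta
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT *
    FROM `soy-serenity-89ed73.events10`
    WHERE DATE(_PARTITIONTIME) = @part_date
"""

current = date(2019, 1, 1)   # start of the date range (placeholder)
end = date(2019, 1, 31)      # end of the date range (placeholder)
while current <= end:
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("part_date", "DATE", current)
        ]
    )
    chunk_df = client.query(query, job_config=job_config).to_dataframe()
    # process or persist chunk_df here, then let it go out of scope
    # before fetching the next day's partition
    current += timedelta(days=1)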
option 2
Using the BigQuery client library you can use the Jobs.insert API and set configuration.query.priority to batch.
# from google.cloud import bigquery
# client = bigquery.Client()

query = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US')  # API request - starts the query

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row['name']
    print(row)
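Note that the snippet above does not itself set the batch priority; with the Python client that is done through a job config, roughly like this:
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
query_job = client.query(query, location='US', job_config=job_config)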
See the BigQuery documentation for some more details.
After you get the jobId, write a loop using Jobs.getQueryResults to get chunks of data by setting the maxResults parameter of the API.
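With the Python client library, an equivalent sketch (the library wraps Jobs.getQueryResults for you) is to request paged results, where page_size roughly corresponds to the API's maxResults; process_row is a placeholder for your own handling:
query_job = client.query(QUERY)            # QUERY as in the question
rows = query_job.result(page_size=50000)   # rows are fetched page by page
for page in rows.pages:
    # each page is a limited batch of rows; process it and move on
    for row in page:
        process_row(row)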

Error on updating nvarchar(max) column from Python

I'm updating an nvarchar(max) column with the output of Google reverse geocoding (which is in JSON format):
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;UID=test;PWD=abc#123;autocommit=True')
cursor = cnxn.cursor()
wp = urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?latlng=18.5504,73.9412&sensor=false")
pw = wp.read()
#print(pw)
cursor.execute("UPDATE GEOCODE_tbl SET JSON_str = ? WHERE GEOCODE_ID = ?", pw,749904)
print('Done')
cnxn.commit()
But it gives this error:
('22018', '[22018] [Microsoft][ODBC SQL Server Driver][SQL Server]Operand type clash: image is incompatible with nvarchar(max) (206) (SQLExecDirectW)')
What kind of error is that?
The JSON_str column holds this kind of JSON output, and I'm running the update only for those rows whose JSON_str column is NULL.
Does anyone have any idea about it?
The value pw is not of type str. Try converting your query to this:
cursor.execute("UPDATE GEOCODE_tbl SET JSON_str = ? WHERE GEOCODE_ID = ?", (str(pw), 749904))
Good luck!
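If str(pw) does not give the result you expect (in Python 3, str() on a bytes object keeps the b'...' prefix in the text), decoding the bytes explicitly is a safer variant, assuming the API returns UTF-8:
pw = wp.read().decode('utf-8')
cursor.execute("UPDATE GEOCODE_tbl SET JSON_str = ? WHERE GEOCODE_ID = ?", (pw, 749904))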
