I am trying to push a data frame back to SQL Server, but I am having a hard time doing so.
With the following code, I receive this error:
pyodbc.DataError: ('22008', '[22008] [Microsoft][ODBC SQL Server Driver]Exceeding the capacity of the field datetime (0) (SQLExecDirectW)')
Here's my code so far:
import pandas as pd
import pyodbc
import numpy as np
df = pd.read_excel(r'path.xlsx')
new_names = {"Calendar Date": "CALENDAR_DATE",
"Origin ID": "ORIGIN_ID",
"Dest ID": "DEST_ID",
"Destination Name": "DESTINATION_NAME",
"Destination City": "DESTINATION_CITY",
"Destination State": "DESTINATION_STATE",
"Carrier Name": "CARRIER_NAME",
"Stop Number": "STOP_NUMBER",
"Planned Arrival Time Start": "PLANNED_ARRIVAL_TIME_START",
"Planned Arrival Time End": "PLANNED_ARRIVAL_TIME_END",
"Delivery App't Time Start": "DELIVERY_APPT_TIME_START",
"Delivery App't Time End": "DELIVERY_APPT_TIME_END",
"Actual Delivery Departure Time": "ACTUAL_DELIVERY_DEPARTURE_TIME",
"Reason Code and Description": "REASON_CODE_AND_DESCRIPTION",
"Days Late Vs Plan": "DAYS_LATE_VS_PLAN",
"Hrs Late Vs Plan": "HRS_LATE_VS_PLAN",
"Days Late Vs Appt": "DAYS_LATE_VS_APPT",
"Hrs Late Vs Appt": "HRS_LATE_VS_APPT"}
df.rename(columns=new_names, inplace=True)
conn = pyodbc.connect('Driver={SQL Server};'
'Server=xxx;'
'Database=Business_Planning;'
'UID="xxx";'
'PWD="xxx";'
'Trusted_Connection=yes;')
cursor = conn.cursor()
SQL_Query = pd.read_sql_query('SELECT * FROM Business_Planning.dbo.OTD_1_DELIVERY_TRACKING_F_IMPORT', conn)
df2 = pd.DataFrame(SQL_Query, columns=["CALENDAR_DATE", "ORIGIN_ID", "DEST_ID", "DESTINATION_NAME", "DESTINATION_CITY",
"DESTINATION_STATE", "SCAC", "CARRIER_NAME", "SID", "STOP_NUMBER",
"PLANNED_ARRIVAL_TIME_START", "PLANNED_ARRIVAL_TIME_END",
"DELIVERY_APPT_TIME_START", "DELIVERY_APPT_TIME_END",
"ACTUAL_DELIVERY_DEPARTURE_TIME", "REASON_CODE_AND_DESCRIPTION",
"DAYS_LATE_VS_PLAN", "HRS_LATE_VS_PLAN", "DAYS_LATE_VS_APPT",
"HRS_LATE_VS_APPT"])
df3 = pd.concat([df2, df]).drop_duplicates(["SID", "STOP_NUMBER", "PLANNED_ARRIVAL_TIME_START"],
keep='last').sort_values(
["SID", "STOP_NUMBER", "PLANNED_ARRIVAL_TIME_START"])
df3['SID'].replace('', np.nan, inplace=True)
df3.dropna(subset=['SID'], inplace=True)
conn.execute('TRUNCATE TABLE Business_Planning.dbo.OTD_1_DELIVERY_TRACKING_F_IMPORT')
for index, row in df3.iterrows():
    conn.execute(
        "INSERT INTO OTD_1_DELIVERY_TRACKING_F_IMPORT([CALENDAR_DATE], [ORIGIN_ID], [DEST_ID], [DESTINATION_NAME], "
        "[DESTINATION_CITY], [DESTINATION_STATE], [SCAC], [CARRIER_NAME], [SID], [STOP_NUMBER], "
        "[PLANNED_ARRIVAL_TIME_START], [PLANNED_ARRIVAL_TIME_END], [DELIVERY_APPT_TIME_START], "
        "[DELIVERY_APPT_TIME_END], [ACTUAL_DELIVERY_DEPARTURE_TIME], [REASON_CODE_AND_DESCRIPTION], "
        "[DAYS_LATE_VS_PLAN], [HRS_LATE_VS_PLAN], [DAYS_LATE_VS_APPT], [HRS_LATE_VS_APPT]) "
        "values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
        row['CALENDAR_DATE'],
        row['ORIGIN_ID'],
        row['DEST_ID'],
        row['DESTINATION_NAME'],
        row['DESTINATION_CITY'],
        row['DESTINATION_STATE'],
        row['SCAC'],
        row['CARRIER_NAME'],
        row['SID'],
        row['STOP_NUMBER'],
        row['PLANNED_ARRIVAL_TIME_START'],
        row['PLANNED_ARRIVAL_TIME_END'],
        row['DELIVERY_APPT_TIME_START'],
        row['DELIVERY_APPT_TIME_END'],
        row['ACTUAL_DELIVERY_DEPARTURE_TIME'],
        row['REASON_CODE_AND_DESCRIPTION'],
        row['DAYS_LATE_VS_PLAN'],
        row['HRS_LATE_VS_PLAN'],
        row['DAYS_LATE_VS_APPT'],
        row['HRS_LATE_VS_APPT'])
    conn.commit()
conn.commit()
conn.close()
The error is coming from this part:
for index, row in df3.iterrows():
    conn.execute(
        "INSERT INTO OTD_1_DELIVERY_TRACKING_F_IMPORT([CALENDAR_DATE], [ORIGIN_ID], [DEST_ID], [DESTINATION_NAME], "
        "[DESTINATION_CITY], [DESTINATION_STATE], [SCAC], [CARRIER_NAME], [SID], [STOP_NUMBER], "
        "[PLANNED_ARRIVAL_TIME_START], [PLANNED_ARRIVAL_TIME_END], [DELIVERY_APPT_TIME_START], "
        "[DELIVERY_APPT_TIME_END], [ACTUAL_DELIVERY_DEPARTURE_TIME], [REASON_CODE_AND_DESCRIPTION], "
        "[DAYS_LATE_VS_PLAN], [HRS_LATE_VS_PLAN], [DAYS_LATE_VS_APPT], [HRS_LATE_VS_APPT]) "
        "values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
        row['CALENDAR_DATE'],
        row['ORIGIN_ID'],
        row['DEST_ID'],
        row['DESTINATION_NAME'],
        row['DESTINATION_CITY'],
        row['DESTINATION_STATE'],
        row['SCAC'],
        row['CARRIER_NAME'],
        row['SID'],
        row['STOP_NUMBER'],
        row['PLANNED_ARRIVAL_TIME_START'],
        row['PLANNED_ARRIVAL_TIME_END'],
        row['DELIVERY_APPT_TIME_START'],
        row['DELIVERY_APPT_TIME_END'],
        row['ACTUAL_DELIVERY_DEPARTURE_TIME'],
        row['REASON_CODE_AND_DESCRIPTION'],
        row['DAYS_LATE_VS_PLAN'],
        row['HRS_LATE_VS_PLAN'],
        row['DAYS_LATE_VS_APPT'],
        row['HRS_LATE_VS_APPT'])
    conn.commit()
The fields listed represent every column in df3.
I can't seem to get it right; does anybody have a clue?
I strongly advise you to use the pandas.DataFrame.to_sql() method. Besides that, use the sqlalchemy library to connect to your database. Use this example:
import pandas as pd
import pyodbc
import sqlalchemy

engine = sqlalchemy.create_engine(
    'mssql+pyodbc://{0}:{1}@{2}:1433/{3}?driver=ODBC+Driver+{4}+for+SQL+Server'.format(
        username, password, server, bdName, driverVersion))
df.to_sql("TableName", con=engine, if_exists="append")
I am trying to convert the JSON below into a pandas dataframe. The JSON is being captured using the Flask request method.
{
"json_str": [{
"TileName ": "Master",
"ImageLink ": "Link1",
"Report Details": [{
"ReportName": "Primary",
"ReportUrl": "link1",
"ADGroup": ["operations", "Sales"],
"IsActive": 1
}, {
"ReportName": "Secondry",
"ReportUrl": "link2",
"ADGroup": ["operations", "Sales"],
"IsActive": 1
}],
"OpsFlag": 1
}]
}
Now below are the steps in my code, where I am:
1) Using the request.json.get() method to get the JSON
2) Converting it with json.dumps() and parsing it back with json.loads()
3) Normalizing it using pd.json_normalize, and finally
4) Using pyodbc to run a stored procedure to insert the data into the database
Here are the code snippets:
###Step 1 and 2###
json_strg = request.json.get("json_str",None) <----in flask app.py
json_strf = json.dumps(json_strg)
js_obj = json.loads(json_strf)
###Step 3###
df = pd.json_normalize(js_obj, record_path='Report Details',
                       meta=['TileName', 'ImageLink', 'OpsFlag'],
                       errors='ignore').explode('ADGroup').apply(pd.Series)
Cols = ['TileName','ImageLink','ReportName','ReportUrl','ADGroup','OpsFlag','IsActive']
df= df[Cols]
###Step 4###
conn = pyodbc.connect(conn_string)
cur=conn.cursor()
for i, v in df.iterrows():
    sql = """SET NOCOUNT ON;
             EXEC [dbo].[mystored_proc] ?, ?, ?, ?, ?, ?, ?"""
    value = tuple(v)
    cur.execute(sql, value)
conn.commit()
The above code is giving the below error:
"('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The incoming
tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
Parameter 4 (\"\"): The supplied value is not a valid instance of data type float.
Check the source data for invalid values. An example of an invalid value is data of
numeric type with scale greater than precision. (8023) (SQLExecDirectW)')"
The strange thing is that when I run the above code with just the value of the json_str part (i.e. the string starting from '{"TileName...'), no error occurs and the data is actually inserted into the DB.
This means there is no issue in Steps 3 and 4; the issue is in Steps 1 and 2.
Any clue?
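One way to narrow this down (just a debugging sketch, not something from the original post) is to print the exact tuple being sent for the failing row and convert NumPy scalars and NaN into plain Python values before calling the procedure, since pyodbc sometimes rejects NumPy float types with exactly this kind of TDS error:

import math
import numpy as np

def to_db_value(v):
    """Convert NumPy scalars and NaN into plain Python values pyodbc can bind."""
    if isinstance(v, np.generic):
        v = v.item()
    if isinstance(v, float) and math.isnan(v):
        return None
    return v

for i, v in df.iterrows():
    value = tuple(to_db_value(x) for x in v)
    print(i, value)   # inspect what is actually sent for the failing row
    cur.execute("SET NOCOUNT ON; EXEC [dbo].[mystored_proc] ?, ?, ?, ?, ?, ?, ?", value)
conn.commit()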
json file
"mappingdef": [
{
"src": "A",
"dest": "id"
},
{
"src": "B",
"dest": "expense_type"
},
{
"src": "C",
"dest": "balance"
},
{
"src": "D",
"dest": "debit"
},
{
"src": "E",
"dest": "credit"
},
{
"src": "F",
"dest": "total_balance"
}
]
my python script:
#changing excel column names
df.columns = ["A", "B", "C", "D", "E", "F"]
#fetching data from dataframe
for row in range(df.shape[0]):
    col_A = str(df.at[row, "A"]),
    col_B = str(df.at[row, "B"]),
    col_C = float(df.at[row, "C"]),
    col_D = float(df.at[row, "D"]),
    col_E = float(df.at[row, "E"]),
    col_F = float(df.at[row, "F"])
#query to insert data in database
query2 = """
INSERT INTO ocean_street_apartments(
id,
expense_type,
balance,
debit,
credit,
total_balance)
values (%s, %s, %s, %s, %s, %s)
"""
I have this table definition info in JSON, which gives src as the Excel column and dest as the database table column name. I want to read an Excel file through pandas and map each Excel column (src) to a database table column (dest). I am working in Python.
Assuming it's a JSON file, it's an API GET response.
Things I am assuming you know how to do:
1) Fetch the GET response; what is returned is an array of object descriptions for every file.
2) Create a script to download this and move it into a DataFrame.
Now you have a list of direct links to your CSV files! We can read these URLs directly using pandas.read_csv(url).
If the data is problematic, transform it.
It's time to load the DataFrame directly into a SQL DB using pandas.DataFrame.to_sql.
The code below describes how to connect to a SQLite DB.
import os
import sqlite3

import pandas as pd
from tqdm import tqdm

def upload_to_sql(filenames, db_name, debug=False):
    """Given a list of paths, upload to a database."""
    conn = sqlite3.connect(f"{db_name}.db")
    if debug:
        print("Uploading into database")
    for i, file_path in tqdm(list(enumerate(filenames))):
        dat = pd.read_csv(file_path)
        # rename labels
        filename = os.path.basename(file_path).split('.')[0]
        dat = factor_dataframe(dat, filename)  # helper assumed to be defined elsewhere
        # write records to sql database
        if i == 0:  # if first entry, and table name already exists, replace
            dat.to_sql(db_name, con=conn, index=False, if_exists='replace')
        else:       # otherwise append to current table given db_name
            dat.to_sql(db_name, con=conn, index=False, if_exists='append')
# upload into sql database
upload_to_sql(download_urls, 'example', debug=True)
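To tie this back to the question's mappingdef (src -> dest), the rename can be applied before calling to_sql. A minimal sketch, assuming the mapping JSON is saved as a file named mappingdef.json (a hypothetical name) with the structure shown in the question:

import json
import sqlite3

import pandas as pd

# Hypothetical file name; the question only shows the "mappingdef" list itself.
with open("mappingdef.json") as f:
    mapping = {m["src"]: m["dest"] for m in json.load(f)["mappingdef"]}

df = pd.read_excel("Ocean Street Apartments Trial Balance 03-22.xlsx")
df.columns = ["A", "B", "C", "D", "E", "F"]   # Excel column labels from the question
df = df.rename(columns=mapping)               # A -> id, B -> expense_type, ...

conn = sqlite3.connect("example.db")
df.to_sql("ocean_street_apartments", con=conn, index=False, if_exists="append")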
import pandas as pd
import psycopg2
import ijson
conn = psycopg2.connect(
host="localhost",
database="sea",
user="postgres",
password="hemant888")
cursor = conn.cursor()
chunk_size = 10
skiprows = 5
file_name = "Ocean Street Apartments Trial Balance 03-22.xlsx"
cursor.execute("""
SELECT COUNT(*)
FROM information_schema.tables
WHERE table_name = 'ocean_street_apartments'
""")
if cursor.fetchone()[0] != 1:
    columns = list()
    datatype = list()
    row = list()
    with open("tabledef.json", "r") as f:
        for record in ijson.items(f, "item"):
            for i in record["def"]["tabledef"]["columns"]:
                col = i["name"]
                columns.append(col)
                dt = i["datatype"]
                datatype.append(dt)
    for i in range(len(columns)):
        row.append("{col} {dt}".format(col=columns[i], dt=datatype[i]))
    query1 = "create table ocean_street_apartments(" + \
             ",".join(map(str, row)) + ")"
    cursor.execute(query1)
    conn.commit()
while True:
    df_chunk = pd.read_excel(file_name, skiprows=skiprows, nrows=chunk_size)
    skiprows += chunk_size
    # When there is no data, we know we can break out of the loop.
    if not df_chunk.shape[0]:
        break
    else:
        columns = list()
        columns_table = list()
        with open("tabledef.json", "r") as f:
            for record in ijson.items(f, "item"):
                for i in record["def"]["tabledef"]["mappingdef"]:
                    col = i["src"]
                    columns.append(col)
                    col_table = i["dest"]
                    columns_table.append(col_table)
        query2 = "INSERT INTO ocean_street_apartments(" + ",".join(
            map(str, columns_table)) + ")values (%s, %s, %s, %s, %s, %s)"
        df_chunk.columns = columns
        for row in range(df_chunk.shape[0]):
            values = tuple(str(df_chunk.at[row, col]) for col in df_chunk.columns)
            cursor.execute(query2, values)
        conn.commit()
conn.close()
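As a design note, the per-row cursor.execute inside each chunk can optionally be batched with cursor.executemany, which psycopg2 supports; a small sketch under the same assumptions as the loop above:

rows = [tuple(str(df_chunk.at[r, c]) for c in df_chunk.columns)
        for r in range(df_chunk.shape[0])]
cursor.executemany(query2, rows)
conn.commit()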
I'm a beginner in Python, trying to connect to and access a database from Python with a JSON file loaded in my program, so I can read it and eventually analyze it for certain things. But I can't connect to it, and I have tried different approaches and still get the same error.
import mysql.connector
import json
# create the key
from mysql.connector import cursor
mydb = mysql.connector.connect(host='localhost', port='3306', user='root', password='nihad147', database='tweets')
mycursor = mydb.cursor()
sql_tweet = """INSERT INTO tweet ( tweet_id,
id_user,
text,
tweet_location,
created_at,
name_screen,
categorie_id,
)
VALUES (%s,%s,%s,%s,%s,%s,%s)"""
sql_user = """INSERT INTO tweetuser (
id_user,
name_screen,
location_user,
count_followers,
friends_count,
statuse_count)
VALUES (%s,%s,%s,%s,%s,%s)"""
sql_location = """"insert into tweet_location (
location_id,
latitude,
longitude
tweet_id
VALUES(%s,%s,%s,%s)"""
myJsonFile = open('tweets.json', encoding="utf-8")
mycursor.execute("DELETE FROM tweet")
mycursor.execute("DELETE FROM tweetuser")
mycursor.execute("DELETE FROM tweet_location")
c = 0
for line in myJsonFile:
    c = c + 1
    print("tweet number ", c, " is uploading to the server")
    data = json.loads(line)
    # insert into tweet
    val_tweet = (
        data['tweet_id'], data['user_id_str'], data['raw_text'], data['location']['address']['city'], data['date'], data['user_screen_name'])
    mycursor.execute(sql_tweet, sql_location, val_tweet)
    mydb.commit()
    # testing if the user already exists
    user = "SELECT * FROM tweetuser WHERE id_user = '" + str(data['user_id_str']) + "'"
    mycursor.execute(user)
    myresult = mycursor.fetchall()
    row_count = mycursor.rowcount
    if row_count == 0:
        val_user = (data['user_id_str'], data['user_screen_name'], data['location']['address']['city'], data['user_followers_count'],
                    data['user_friends_count'], data['user_statuses_count'])
        mycursor.execute(sql_user, val_user)
        mydb.commit()
print('done')
Here's an example of the JSON file data:
{
"tweet_id":"1261276320878788609",
"date":"Fri May 15 12:44:42 +0000 2020",
"raw_text":"برنامج وطني لدعم المبدعين في مواجهة #كورون",
"geo_source":"user_location",
"location":{
"address":{
"country":"Tunisia",
"country_code":"tn",
"state_district":"غزالة",
"county":"العرب",
"state":"Bizerte"
},
"response":"{'place_id': 235309103, 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright', 'osm_type': 'relation', 'osm_id': 7124228, 'boundingbox': ['37.105957', '37.2033466', '9.4739053', '9.6124953'], 'lat': '37.1551868', 'lon': '9.54834183807249', 'display_name': 'العرب, غزالة, Bizerte, Tunisia', 'class': 'boundary', 'type': 'administrative', 'importance': 0.45, 'icon': '/data/nominatimimages/mapicons/poi_boundary_administrative.p.20.png','address':{'county': 'العرب', 'state_district': 'غزالة', 'state': 'Bizerte', 'country': 'Tunisia', 'country_code': 'tn'}}",
"geohash":"snwg37buskzd",
"query_term":"arab",
"lon":9.54834183807249,
"lat":37.1551868
},
"user_friends_count":61,
"user_description":"I love UAE and his great leadership",
"user_created_at":"Wed Oct 09 11:41:41 +0000 2013",
"user_screen_name":"SikandarMirani",
"user_id_str":"706377881",
"user_verified":false,
"user_statuses_count":50804,
"user_followers_count":946,
"user_location":"Dubai United Arab Emirates"
}
Thanks to you guys, I was able to solve the previous error: I hadn't checked the data type of the user id; it has to be BIGINT, not INT, since it holds large values.
I had no problem connecting my JSON file to my database, but the data got inserted only into the tweetuser table, not into the tweet table.
The tweet table is empty.
I would appreciate any kind of help. Thank you.
The error
mysql.connector.errors.DataError: 1264 (22003): Out of range value for column 'id_user' at row 1
suggests that the value you are trying to use as the id_user is numerically too large.
Since you haven't posted the table definitions, my guess is you are using MEDIUMINT or SMALLINT or TINYINT for id_user and the actual user ID that you are trying to write into the database is too large for that data type.
In your example, user_id_str is 706377881; however, the maximum values for MEDIUMINT are 8388607 (signed) and 16777215 (unsigned), respectively.
Check the data types in the table definitions.
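If that is the case, one possible fix (a sketch, assuming the column is named id_user in both tables, as in the INSERT statements above) is to widen the column to BIGINT:

mycursor.execute("ALTER TABLE tweetuser MODIFY id_user BIGINT")
mycursor.execute("ALTER TABLE tweet MODIFY id_user BIGINT")
mydb.commit()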
You are connecting to your DB; that is not the problem.
The problem is that the user id that you are trying to insert has a length that surpasses the maximum allowed by MySQL for the datatype of that field. See here and here for more info related to your error.
Without knowing the structure of the JSON, how can I return a JSON object from the database query? All of the information is there; I just can't figure out how to build the object.
import MySQLdb
import json
db = MySQLdb.connect( host, user, password, db)
cursor = db.cursor()
cursor.execute( query )
rows = cursor.fetchall()
field_names = [i[0] for i in cursor.description]
json_string = json.dumps( dict(rows) )
print field_names[0]
print field_names[1]
print json_string
db.close()
count
severity
{"321": "7.2", "1": "5.0", "5": "4.3", "7": "6.8", "1447": "9.3", "176": "10.0"}
The JSON object should look like:
{"data":[{"count":"321","severity":"7.2"},{"count":"1","severity":"5.0"},{"count":"5","severity":"4.3"},{"count":"7","severity":"6.8"},{"count":"1447","severity":"9.3"},{"count":"176","severity":"10.0"}]}
The problem you are encountering happens because you only turn the fetched rows into a dict, without their descriptions.
dict in Python expects either another dict, or an iterable returning two-item tuples, where for each tuple the first item will be the key and the second the value.
Since you only fetch two columns, you get the first one (count) as the key and the second (severity) as the value for each fetched row.
What you want to do is also combine the descriptions, like so:
json_string = json.dumps([
{description: value for description, value in zip(field_names, row)}
for row in rows])
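If the goal is the exact shape shown in the question, the list can then be wrapped under a "data" key (a small usage sketch using the same field_names and rows):

json_string = json.dumps({
    "data": [dict(zip(field_names, row)) for row in rows]
})
print json_string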
1- You can use the pymysql DictCursor:
import pymysql
connection = pymysql.connect(db="test")
cursor = connection.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT ...")
row = cursor.fetchone()
print row["key"]
2- MySQLdb also includes DictCursor that you can use. You need to pass cursorclass=MySQLdb.cursors.DictCursor when making the connection.
import MySQLdb
import MySQLdb.cursors
connection = MySQLdb.connect(db="test",cursorclass=MySQLdb.cursors.DictCursor)
cursor = connection.cursor()
cursor.execute("SELECT ...")
row = cursor.fetchone()
print row["key"]
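With a DictCursor the rows already come back as dicts, so building the JSON object from the question is just (a sketch, in the same Python 2 style as the snippets above; values returned as Decimal may need str() first):

import json

rows = cursor.fetchall()   # e.g. [{"count": 321, "severity": "7.2"}, ...]
print json.dumps({"data": list(rows)})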
I got this to work using the collections library, although the code is confusing:
import MySQLdb
import json
import collections
db = MySQLdb.connect(host, user, passwd, db)
cursor = db.cursor()
cursor.execute( query )
rows = cursor.fetchall()
field_names = [i[0] for i in cursor.description]
objects_list = []
for row in rows:
    d = collections.OrderedDict()
    d[field_names[0]] = row[0]
    d[field_names[1]] = row[1]
    objects_list.append(d)
json_string = json.dumps( objects_list )
print json_string
db.close()
[{"count": 176, "severity": "10.0"}, {"count": 1447, "severity": "9.3"}, {"count": 321, "severity": "7.2"}, {"count": 7, "severity": "6.8"}, {"count": 1, "severity": "5.8"}, {"count": 1, "severity": "5.0"}, {"count": 5, "severity": "4.3"}]
I am using pyodbc to return a number of rows which are dumped into JSON and sent to a server. I would like to iterate over my SQL table and return all records. I am using cursor.fetchall() now, and the program returns one record, as shown below. When I use fetchone, an error is returned: AttributeError: 'unicode' object has no attribute 'SRNUMBER', and fetchmany returns one record as well. How do I successfully return all records? I am using Python 2.6.7.
Code:
import pyodbc
import json
import collections
import requests
connstr = 'DRIVER={SQL Server};SERVER=server;DATABASE=ServiceRequest; UID=SA;PWD=pwd'
conn = pyodbc.connect(connstr)
cursor = conn.cursor()
cursor.execute("""
SELECT SRNUMBER, FirstName, LastName, ParentNumber
FROM MYLA311 """)
rows = cursor.fetchone()
objects_list = []
for row in rows:
    d = collections.OrderedDict()
    d['SRNUMBER'] = row.SRNUMBER
    d['FirstName'] = row.FirstName
    d['LastName'] = row.LastName
    d['ParentNumber'] = row.ParentNumber
objects_list.append(d)
output = {"MetaData": {},
"SRData": d}
print output
j = json.dumps(output)
print json.dumps(output, sort_keys=True, indent=4)
Output for fetchall and fetchmany:
{
"MetaData": {},
"SRData": {
"FirstName": "MyLAG",
"LastName": "ThreeEleven",
"ParentNumber": "021720151654176723",
"SRNUMBER": "1-3580171"
}
}
Use code from my answer here to build a list of dictionaries for the value of output['SRData'], then JSON encode the output dict as normal.
import pyodbc
import json
connstr = 'DRIVER={SQL Server};SERVER=server;DATABASE=ServiceRequest; UID=SA;PWD=pwd'
conn = pyodbc.connect(connstr)
cursor = conn.cursor()
cursor.execute("""SELECT SRNUMBER, FirstName, LastName, ParentNumber FROM MYLA311""")
# build list of column names to use as dictionary keys from sql results
columns = [column[0] for column in cursor.description]
results = []
for row in cursor.fetchall():
    results.append(dict(zip(columns, row)))
output = {"MetaData": {}, "SRData": results}
print(json.dumps(output, sort_keys=True, indent=4))
For starters, the line
objects_list.append(d)
needs to be inside the for loop, not outside.
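Based on the code in the question, the corrected loop would look roughly like this (a sketch; it also switches to fetchall() so every record is returned):

rows = cursor.fetchall()
objects_list = []
for row in rows:
    d = collections.OrderedDict()
    d['SRNUMBER'] = row.SRNUMBER
    d['FirstName'] = row.FirstName
    d['LastName'] = row.LastName
    d['ParentNumber'] = row.ParentNumber
    objects_list.append(d)   # append inside the loop

output = {"MetaData": {}, "SRData": objects_list}
print json.dumps(output, sort_keys=True, indent=4)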