I got the error below when using SQLAlchemy to insert data into a Snowflake warehouse. Any ideas?
Error:
Failed to rewrite multi-row insert [SQL: 'INSERT INTO widgets (id, name, type)
SELECT %(id)s AS anon_1, %(name)s AS anon_2, widgets.id \nFROM widgets \nWHERE widgets.type = %(card_id)s']
[parameters: ({'id': 2, 'name': 'Lychee', 'card_id': 1}, {'id': 3, 'name': 'testing', 'card_id': 2})]
Code:
from sqlalchemy import *
from snowflake.sqlalchemy import URL
# Helper function for local debugging
def createSQLAlchemyEngine():
    url = URL(
        account='',
        user='',
        password='',
        database='db',
        schema='',
        warehouse='',
        role='',
        proxy_host='',
        proxy_port=8099
    )
    engine = create_engine(url)
    return engine
conn = createSQLAlchemyEngine()
# Construct database
metadata = MetaData()
widgetTypes = Table('widgetTypes', metadata,
                    Column('id', INTEGER(), nullable=True),
                    Column('type', VARCHAR(), nullable=True))
widgets = Table('widgets', metadata,
                Column('id', INTEGER(), nullable=True),
                Column('name', VARCHAR(), nullable=True),
                Column('type', INTEGER(), nullable=True))
engine = conn
metadata.create_all(engine)
# Connect and populate db for testing
conn = engine.connect()
sel = select([bindparam('id'), bindparam('name'), widgets.c.id]).where(widgets.c.type == bindparam('card_id'))
ins = widgets.insert().from_select(['id', 'name', 'type'], sel)
conn.execute(ins, [
    # {'name': 'Melon', 'type_name': 'Squidgy'},
    {'id': 2, 'name': 'Lychee', 'card_id': 1},
    {'id': 3, 'name': 'testing', 'card_id': 2}
])
conn.close()
Basically, what I'm trying to do is the SQL below, but Snowflake doesn't support this syntax.
insert into tableXX
values('Z',
(select max(b) +1 from tableXX a where a.CD = 'I'), 'val1', 'val2')
So I have to do it with something like the following instead, but I get the above error.
insert into tableXX
select 'val1', 'val2', ifnull(max(c), 0) + 1 from tableXX where a.CD = 'I',
select 'val1', 'val2', ifnull(max(c), 0) + 1 from tableXX where a.CD = 'I',
Target:
The logic behind the code is that I want to set the new record's sequence_id based on the max of the existing sequence_ids of records that have the same 'CD' as the new record.
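For what it's worth, one way to avoid the "Failed to rewrite multi-row insert" error is to stop passing executemany-style parameters to an INSERT ... FROM SELECT and issue one statement per row instead. The following is only a rough sketch (not verified against Snowflake) that reuses the widgets table defined above; the coalesce(max(id), 0) + 1 expression and the example rows are assumptions standing in for the real sequence logic:
from sqlalchemy import select, func, literal

rows = [
    {'id': 2, 'name': 'Lychee', 'card_id': 1},
    {'id': 3, 'name': 'testing', 'card_id': 2},
]

with engine.begin() as conn:
    for row in rows:
        # next sequence value: max(id) + 1 among rows with the same type (1 if none exist yet)
        next_seq = func.coalesce(func.max(widgets.c.id), 0) + 1
        sel = (select([literal(row['id']), literal(row['name']), next_seq])
               .where(widgets.c.type == row['card_id']))
        conn.execute(widgets.insert().from_select(['id', 'name', 'type'], sel))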
Related
I have a problem inserting data into a nested column.
I use map_imperatively.
Columns of other types are filled correctly; only the nested column remains empty.
My code:
import attr
from sqlalchemy import (
    create_engine, Column, MetaData, insert
)
from sqlalchemy.orm import registry
from clickhouse_sqlalchemy import (
    Table, make_session, types, engines,
)
uri = 'clickhouse+native://localhost/default'
engine = create_engine(uri)
session = make_session(engine)
metadata = MetaData(bind=engine)
mapper = registry()
@attr.dataclass
class NestedAttr:
    key1: int
    key2: int
    key3: int

@attr.dataclass
class NestedInObject:
    id: int
    name: str
    nested_attr: NestedAttr
nested_test = Table(
    'nested_test', metadata,
    Column(name='id', type_=types.Int8, primary_key=True),
    Column(name='name', type_=types.String),
    Column(
        name='nested_attr',
        type_=types.Nested(
            Column(name='key1', type_=types.Int8),
            Column(name='key2', type_=types.Int8),
            Column(name='key3', type_=types.Int8),
        )
    ),
    engines.Memory()
)
mapper.map_imperatively(
    NestedInObject,
    nested_test
)
nested_test.create()
values = [
    {
        'id': 1,
        'name': 'name',
        'nested_attr.key1': [1, 2],
        'nested_attr.key2': [1, 2],
        'nested_attr.key3': [1, 2],
    }
]
session.execute(insert(NestedInObject), values)
I don't get an error, but the nested columns are empty.
I tried different data and checked the data types in the database, but I don't understand why the nested columns are left empty.
This works, but I would have to do this for 30 of the 100 fields in the response. Is there a better way?
for record in data:
    record["firstName"] = record["firstName"].replace("'", "''")
    record["lastName"] = record["lastName"].replace("'", "''")
    cursor.execute("Insert Into emp_temp (employeeId, firstName, lastName) values ('" + record["employeeId"] + "','" + record["firstName"] + "','" + record["lastName"] + "')")
cursor.commit()
cursor.close()
conn.close()
Assuming that json.loads() is giving you a simple list of dict objects, then that is precisely the format that can be directly consumed by SQLAlchemy:
# https://stackoverflow.com/q/67129218/2144390
import json
import sqlalchemy as sa
response_text = '''\
[{"employeeId": 1, "firstName": "Gord", "lastName": "Thompson"},
{"employeeId": 2, "firstName": "Bob", "lastName": "Loblaw"}]'''
data = json.loads(response_text)
print(type(data)) # <class 'list'>
print(type(data[0])) # <class 'dict'>
engine = sa.create_engine("mssql+pyodbc://@mssqlLocal64")
emp_temp = sa.Table("emp_temp", sa.MetaData(), autoload_with=engine)
with engine.begin() as conn:
    conn.execute(emp_temp.insert(), data)

# check results
with engine.begin() as conn:
    results = conn.execute(sa.text("SELECT * FROM emp_temp")).fetchall()
    print(results)
    # [(1, 'Gord', 'Thompson'), (2, 'Bob', 'Loblaw')]
Having a dataframe like the following:
     word classification  counter
0   house           noun        2
1     the        article        2
2   white      adjective        1
3  yellow      adjective        1
I would like to store in Postgresql table with the following definition:
CREATE TABLE public.word_classification (
id SERIAL,
word character varying(100),
classification character varying(10),
counter integer,
start_date date,
end_date date
);
ALTER TABLE public.word_classification OWNER TO postgres;
The current basic configuration I have is as follows:
from sqlalchemy import create_engine
import pandas as pd
# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = 'BVict31C' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
POSTGRES_DBNAME = 'local-sandbox-dev' ## CHANGE THIS TO YOUR DATABASE NAME
# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME, password=POSTGRES_PASSWORD, ipaddress=POSTGRES_ADDRESS, port=POSTGRES_PORT, dbname=POSTGRES_DBNAME))
# Create the connection
cnx = create_engine(postgres_str)
data = [['the', 'article', 0], ['house', 'noun', 1], ['yellow', 'adjective', 2],
        ['the', 'article', 4], ['house', 'noun', 5], ['white', 'adjective', 6]]
df = pd.DataFrame(data, columns=['word','classification','position'])
df_db = pd.DataFrame(columns=['word','classification','counter','start_date','end_date'])
count_series=df.groupby(['word','classification']).size()
new_df = count_series.to_frame(name = 'counter').reset_index()
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000)
I would like to insert into the table as I am able to do with SQL syntax:
insert into word_classification(word, classification, counter)values('hello','world',1);
Currently, I am getting an error when inserting into the table because I am passing the index:
(psycopg2.errors.UndefinedColumn) column "index" of relation "word_classification" does not exist
LINE 1: INSERT INTO word_classification (index, word, classification...
^
[SQL: INSERT INTO word_classification (index, word, classification, counter) VALUES (%(index)s, %(word)s, %(classification)s, %(counter)s)]
[parameters: ({'index': 0, 'word': 'house', 'classification': 'noun', 'counter': 2}, {'index': 1, 'word': 'the', 'classification': 'article', 'counter': 2}, {'index': 2, 'word': 'white', 'classification': 'adjective', 'counter': 1}, {'index': 3, 'word': 'yellow', 'classification': 'adjective', 'counter': 1})]
I have been searching for ways to avoid passing the index, with no luck.
Thanks for your help.
Turn off the index when writing to the database, as follows:
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000, index=False)
I want to collect the data from a MySQL table and convert it to Avro format using Python.
Consider this table in mysql
dept_no, dept_name
'd001', 'Marketing'
'd002', 'Finance'
'd003', 'Human Resources'
'd004', 'Production'
'd005', 'Development'
'd006', 'Quality Management'
'd007', 'Sales'
'd008', 'Research'
'd009', 'Customer Service'
mycursor.execute('select * from employees')
results = mycursor.fetchall()
When I fetch the results using the above query,
I get them back as a list of tuples.
Whereas to convert to Avro format, the schema has to be defined in the following format.
By hard-coding we can achieve the format below.
The following code generates the Avro file.
schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'dept_no', 'type': 'string'},
        {'name': 'dept_name', 'type': 'string'},
    ],
}
And the records as
records = [
    {u'dept_no': u'd001', u'dept_name': 'Marketing'},
    {u'dept_no': u'd002', u'dept_name': 'Finance'},
    {u'dept_no': u'd003', u'dept_name': 'Human Resources'},
    {u'dept_no': u'd004', u'dept_name': 'Production'},
]
The question here is: how do I map the schema and the data into the above format dynamically using Python?
import mysql.connector

mydb = mysql.connector.connect(
    host="********************",
    user="***********",
    passwd="**********",
    database='***********'
)

def byte_to_string(x):
    temp_table_list = []
    for row in x:
        table = row[0].decode()
        temp_table_list.append(table)
    return temp_table_list
mycursor = mydb.cursor()
#Query to list all the tables
mycursor.execute("show tables")
r = mycursor.fetchall()
r = byte_to_string(r)
print(r)
x = len(r)
#Fetch all the records from table EMPLOYEES using Select *
mycursor.execute('select * from employees')
results = mycursor.fetchall()
print(type(results))
print(results)
#Displays Data of table employee record by record
for i in results:
    print(i)
    print(type(i))
#Fetching data from 2nd table departments
mycursor.execute('select * from departments')
data=[i[0] for i in mycursor.fetchall()]
mycursor.execute('select * from departments')
data1=[i[1] for i in mycursor.fetchall()]
print(data)
print(data1)
#zipbObj = zip(data,data1)
#dictOfWords = dict(zipbObj)
#print(dictOfWords)
mycursor.execute('SELECT `COLUMN_NAME`\
FROM `INFORMATION_SCHEMA`.`COLUMNS`\
WHERE `TABLE_SCHEMA`="triggerdb1"\
AND `TABLE_NAME`="departments"')
#Fetching the column names of the table as keys
keys=[i[0] for i in mycursor.fetchall()]
print(keys)
'''
zipbObj = zip(column_schema,data1)
dictOfWords = dict(zipbObj)
print(dictOfWords)
'''
#Finally we get the Header as key and records as values in a dict format
abc = {}
abc[keys[0]] = data1
abc[keys[1]] = data
print(abc)
The result is in the form of
#print(data)
>>>['d001', 'd002', 'd003', 'd004', 'd005', 'd006', 'd007', 'd008', 'd009']
#print(data1)
>>>['Marketing', 'Finance', 'Human Resources', 'Production', 'Development', 'Quality Management', 'Sales', 'Research', 'Customer Service']
#print(keys)
>>>['dept_no', 'dept_name']
#print(abc)
>>>{'dept_no': ['Marketing', 'Finance', 'Human Resources', 'Production', 'Development', 'Quality Management', 'Sales', 'Research', 'Customer Service'], 'dept_name': ['d001', 'd002', 'd003', 'd004', 'd005', 'd006', 'd007', 'd008', 'd009']}
The question here is: how do I dynamically map the schema and the data from the resulting tuples/dictionaries and convert them to Avro using Python?
Thanks in advance!
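As a rough sketch of how the mapping could be done dynamically, cursor.description can supply the column names so that neither the schema fields nor the records need to be hard-coded. This assumes the fastavro package and the mydb connection from above; the all-string field types and the output file name are assumptions:
from fastavro import writer, parse_schema

cur = mydb.cursor()
cur.execute('select * from departments')
# cursor.description has one entry per column; the first element is the column name
columns = [col[0] for col in cur.description]
# build one dict per row so every record matches the schema's field names
# (values may need .decode() if the connector returns bytes, as in byte_to_string above)
records = [dict(zip(columns, row)) for row in cur.fetchall()]

schema = parse_schema({
    'doc': 'Departments table dump.',
    'name': 'Department',
    'namespace': 'test',
    'type': 'record',
    # assumption: every column is treated as a string; map real MySQL types as needed
    'fields': [{'name': c, 'type': 'string'} for c in columns],
})

with open('departments.avro', 'wb') as out:
    writer(out, schema, records)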
I have a set of data that a user needs to query using their own query string. The current solution creates a temporary in-memory sqlite database that the query is run against.
The dataset is a list of "flat" dictionaries, i.e. there is no nested data. The query string does not need to be SQL, but it should be simple to define using an existing query framework.
It needs to support ordering (ascending, descending, custom) and filtering.
The purpose of this question is to get a range of different solutions that might work for this use case.
import sqlite3
items = [
    {'id': 1},
    {'id': 2, 'description': 'This is a description'},
    {'id': 3, 'comment': 'This is a comment'},
    {'id': 4, 'height': 1.78}
]
# Assemble temporary sqlite database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
knownTypes = { "id": "real", "height": "real", "comment": "text" }
allKeys = list(set().union(*(d.keys() for d in items)))
allTypes = list(knownTypes.get(k, "text") for k in allKeys)
createTable_query = "CREATE TABLE data ({});".format(", ".join(["{} {}".format(x[0], x[1]) for x in zip(allKeys, allTypes)]))
cur.execute(createTable_query)
conn.commit()
qs = ["?" for i in range(len(allKeys))]
insertRow_query = "INSERT INTO data VALUES ({});".format(", ".join(qs))
for p in items:
    vals = list([p.get(k, None) for k in allKeys])
    cur.execute(insertRow_query, vals)
conn.commit()
# modify user query here
theUserQuery = "SELECT * FROM data"
# Get data from query
data = [row for row in cur.execute(theUserQuery)]
YAQL is what I'm looking for.
It doesn't do SQL, but it does execute a query string, which is a simple way to do complex user-defined sorting and filtering.
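A minimal usage sketch, assuming the items list from the question; the query string and the factory call follow the YAQL documentation but have not been run here:
import yaql

engine = yaql.factory.YaqlFactory().create()

# keep only the rows with id >= 3, then order them by id descending
expression = engine('$.where($.id >= 3).orderByDescending($.id)')
result = expression.evaluate(data=items)
print(list(result))
# expected: [{'id': 4, 'height': 1.78}, {'id': 3, 'comment': 'This is a comment'}]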
There's a library called litebox that does what you want. It is backed by SQLite.
from litebox import LiteBox
items = [
    {'id': 1},
    {'id': 2, 'description': 'This is a description'},
    {'id': 3, 'comment': 'This is a comment'},
    {'id': 4, 'height': 1.78}
]
types = {"id": int, "height": float, "comment": str}
lb = LiteBox(items, types)
lb.find("height > 1.5")
Result: [{'id': 4, 'height': 1.78}]