Parse the DEFAULT parameter of a DDL statement using ddlparse - Python

I am trying to parse a DDL statement using ddlparse. I am able to parse every field except the DEFAULT parameter. I followed the link below.
https://github.com/shinichi-takii/ddlparse
Below is the DDL I am trying to parse.
sample_ddl = """
CREATE TABLE My_Schema.Sample_Table (
    Id integer PRIMARY KEY,
    Name varchar(100) NOT NULL DEFAULT 'BASANT',
    Total bigint NOT NULL DEFAULT 1,
    Avg decimal(5,1) NOT NULL,
    Created_At date, -- Oracle 'DATE' -> BigQuery 'DATETIME'
    UNIQUE (NAME)
);
"""
I can extract all the information except the DEFAULT parameter with the code below:
for col in table.columns.values():
    col_info = []
    col_info.append("name = {}".format(col.name))
    col_info.append("data_type = {}".format(col.data_type))
    col_info.append("length = {}".format(col.length))
    col_info.append("precision(=length) = {}".format(col.precision))
    col_info.append("scale = {}".format(col.scale))
    col_info.append("constraint = {}".format(col.constraint))
    col_info.append("not_null = {}".format(col.not_null))
    col_info.append("PK = {}".format(col.primary_key))
    col_info.append("unique = {}".format(col.unique))
    col_info.append("bq_legacy_data_type = {}".format(col.bigquery_legacy_data_type))
    col_info.append("bq_standard_data_type = {}".format(col.bigquery_standard_data_type))
    col_info.append("comment = '{}'".format(col.comment))
    col_info.append("description(=comment) = '{}'".format(col.description))
    col_info.append("BQ {}".format(col.to_bigquery_field()))
    print(" : ".join(col_info))
Can anyone help me get the value of the DEFAULT parameter?

Support for getting the DEFAULT attribute was added in ddlparse v1.7.0.
import json

for col in table.columns.values():
    col_info = {}
    col_info["default"] = col.default
    print(json.dumps(col_info, indent=2, ensure_ascii=False))
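For completeness, here is a minimal end-to-end sketch, assuming ddlparse >= 1.7.0 and the sample_ddl string from the question; the DdlParse import and parse call follow the project README:
import json
from ddlparse import DdlParse

# Parse the DDL, then read the DEFAULT value from each column object.
table = DdlParse().parse(sample_ddl)

for col in table.columns.values():
    col_info = {
        "name": col.name,
        "data_type": col.data_type,
        "not_null": col.not_null,
        "default": col.default,  # available since ddlparse v1.7.0
    }
    print(json.dumps(col_info, indent=2, ensure_ascii=False))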

A DEFAULT constraint on your PRIMARY KEY does not make much sense. In SQL Server, for example, you can create the following default constraint, but it will not work:
DROP TABLE IF EXISTS dbo.Sample_Table;
CREATE TABLE dbo.Sample_Table (
    Id integer PRIMARY KEY DEFAULT 'BASANT',
    Name varchar(100) NOT NULL --COMMENT 'User name'
);
INSERT INTO dbo.Sample_Table (Name)
VALUES ('x')
INSERT INTO dbo.Sample_Table (Name)
VALUES ('x')
Msg 245, Level 16, State 1, Line 9 Conversion failed when converting
the varchar value 'BASANT' to data type int.
Why? Because you can't set a string default value on an integer column.
And even if you set it to a number:
DROP TABLE IF EXISTS dbo.Sample_Table;
CREATE TABLE dbo.Sample_Table (
    Id integer PRIMARY KEY DEFAULT 1,
    Name varchar(100) NOT NULL --COMMENT 'User name'
);
INSERT INTO dbo.Sample_Table (Name)
VALUES ('x');
INSERT INTO dbo.Sample_Table (Name)
VALUES ('x');
It will work only the first time; the second time you will get:
Msg 2627, Level 14, State 1, Line 23 Violation of PRIMARY KEY
constraint 'PK__Sample_T__3214EC0759729E5E'. Cannot insert duplicate
key in object 'dbo.Sample_Table'. The duplicate key value is (1).
because the primary key value must be unique. You may be looking for something like this:
ID INTEGER PRIMARY KEY IDENTITY(1,1)

Related

SQL insert query with select query using Python and Streamlit

I have an SQL insert query that takes values from user input and also inserts the ID from another table as a foreign key. For this I wrote the query below, but it does not seem to work.
Status_type table
CREATE TABLE status_type (
    ID int(5) NOT NULL,
    status varchar(50) NOT NULL
);
info table
CREATE TABLE info (
    ID int(11) NOT NULL,
    name varchar(50) NULL,
    nickname varchar(50) NULL,
    mother_name varchar(50) NULL,
    birthdate date NULL,
    status_type int, -- <== this must be the foreign key to the status_type table
    create_date date
);
The user has a dropdown list that retrieves the values from the status_type table, so that he can select the value he wants to insert into the new record in the info table, whereas the info table takes an int type, because I want to store the ID of the status_type and not the value.
code:
query = '''
INSERT INTO info (ID,name,nickname,mother_name,birthdate,t1.status_type,created_date)
VALUES(?,?,?,?,?,?,?)
select t2.ID
from info as t1
INNER JOIN status_type as t2
ON t2.ID = t1.status_type
'''
args = (ID,name,nickname,mother_name,db,status_type,current_date)
cursor = con.cursor()
cursor.execute(query,args)
con.commit()
st.success('Record added Successfully')
The status_type field takes an INT type (the ID of the value from another table).
So when the user inserts, it inserts the value.
What I need is to convert this value into its corresponding ID and store the ID.
Based on the answer of @Mostafa NZ, I modified my query and it now looks like below:
query = '''
INSERT INTO info (ID,name,nickname,mother_name,birthdate,status_type,created_date)
VALUES(?,?,?,?,?,(select status_type.ID
from status_type
where status = ?),?)
'''
args = (ID,name,nickname,mother_name,db,status_type,current_date)
cursor = con.cursor()
cursor.execute(query,args)
con.commit()
st.success('Record added Successfully')
When creating a record, you can do it in one of these ways:
Receive as input from the user
Specify a default value for the field
INSERT INTO (...) VALUES (? ,? ,1 ,? ,?)
Use a select in the INSERT
INSERT INTO (...) VALUES (? ,? ,(SELECT TOP 1 ID FROM status_type ORDER BY ID) ,? ,?)
When INSERTing data, you can only list the names of the destination table's fields. t1.status_type is wrong in the following line:
INSERT INTO info (ID,name,nickname,mother_name,birthdate,t1.status_type,created_date)
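For illustration, the same insert can also be written as an INSERT ... SELECT, which avoids referencing t1 entirely. This is a sketch, assuming the same connection and variable names as in the question:
query = '''
    INSERT INTO info (ID, name, nickname, mother_name, birthdate, status_type, created_date)
    SELECT ?, ?, ?, ?, ?, t2.ID, ?
    FROM status_type AS t2
    WHERE t2.status = ?
'''
# Note the argument order: created_date comes before the status text,
# because the status placeholder is the last one in the statement.
args = (ID, name, nickname, mother_name, db, current_date, status_type)
cursor = con.cursor()
cursor.execute(query, args)
con.commit()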

pymysql and BEFORE INSERT trigger problem

I have a DB table which looks like
CREATE TABLE `localquotes` (
`id` bigint NOT NULL AUTO_INCREMENT,
`createTime` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`tag` varchar(8) NOT NULL,
`monthNum` int NOT NULL,
`flag` float NOT NULL DEFAULT '0',
`optionType` varchar(1) NOT NULL,
`symbol` varchar(30) NOT NULL,
`bid` float DEFAULT NULL,
`ask` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
for which I have created a trigger:
CREATE DEFINER=`user`@`localhost` TRIGGER `localquotes_BEFORE_INSERT` BEFORE INSERT ON `localquotes` FOR EACH ROW BEGIN
    SET new.tag=left(symbol,3);
    SET new.monthNum=right(left(symbol,5),1);
    SET new.optionType=left(right(symbol,11),1);
    SET new.flag=right(left(symbol,11),4);
END
which causes pymysql.err.OperationalError: (1054, "Unknown column 'symbol' in 'field list'") on a simple INSERT like
insertQuery = "INSERT INTO localquotes (tag,monthNum,flag,optionType,symbol,bid) VALUES (%s,%s,%s,%s,%s,%s)"
insertValues = ('UNKNOWN', d.strftime("%m"), 0, 'X', symbol, bid)
cursor.execute(insertQuery, insertValues)
db.commit()
When I remove that trigger, the insert works fine.
Any clue why the code complains about the symbol column, which exists, when there is a trigger set that uses that column's value from the insert request?
You must reference all the columns of the row that spawned the trigger with the NEW.* prefix.
SET new.tag=left(new.symbol,3);
And so on.
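A minimal sketch of the corrected trigger, recreated from Python; the connection parameters are placeholders, and pymysql can send the whole BEGIN ... END block as a single statement, so no DELIMITER handling is needed:
import pymysql

# Placeholder credentials -- adjust to your environment.
db = pymysql.connect(host="localhost", user="user", password="...", database="quotes")

with db.cursor() as cursor:
    cursor.execute("DROP TRIGGER IF EXISTS localquotes_BEFORE_INSERT")
    cursor.execute("""
        CREATE TRIGGER localquotes_BEFORE_INSERT BEFORE INSERT ON localquotes
        FOR EACH ROW
        BEGIN
            SET new.tag = left(new.symbol, 3);
            SET new.monthNum = right(left(new.symbol, 5), 1);
            SET new.optionType = left(right(new.symbol, 11), 1);
            SET new.flag = right(left(new.symbol, 11), 4);
        END
    """)
db.commit()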

Syntax error when creating table using psycopg2

def create_table(self, name: str, coulmn: str):
    """This method creates a table in the session.

    Args:
        name : Name of the table to be created.
        coulmn : Column in the table to be created.
            Format is "(name data_type condition, name2 data_type2 condition2...)".
    """
    self.cur.execute(
        query=SQL("CREATE TABLE {name} %s;").format(name=Identifier(name)),
        vars=[coulmn]
    )
This is the method's source code.
self.postgres.create_table(name="test", coulmn="(id serial PRIMARY KEY, name text)")
This is the test code.
psycopg2.errors.SyntaxError: syntax error at or near "'(id serial PRIMARY KEY, name text)'"
LINE 1: CREATE TABLE "test" '(id serial PRIMARY KEY, name text)';
Why am I getting a syntax error?
A first run at this:
import psycopg2
from psycopg2 import sql
name = 'test'
columns = [('id', ' serial PRIMARY KEY,'), ('name', ' text')]
composed_cols = []
for col in columns:
    composed_cols.append(sql.Composed([sql.Identifier(col[0]), sql.SQL(col[1])]))
[Composed([Identifier('id'), SQL(' serial PRIMARY KEY,')]),
Composed([Identifier('name'), SQL(' text')])]
qry = sql.SQL("CREATE TABLE {name} ({} {})").format(sql.SQL('').join(composed_cols[0]), sql.SQL('').join(composed_cols[1]), name=sql.Identifier(name))
print(qry.as_string(con))
CREATE TABLE "test" ("id" serial PRIMARY KEY, "name" text)
cur.execute(qry)
con.commit()
\d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+----------------------------------
id | integer | | not null | nextval('test_id_seq'::regclass)
name | text | | |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
Basically, break the column definition into two components: the name/identifier and the type/constraint portion. Then create a list that has these elements composed into the correct sql objects. Build the query string by joining the elements of the list into the {} placeholders for the column name and type/constraint portions respectively, and use the named placeholder {name} for the table name. The portion that needs attention is sql.SQL, as that is a literal string; if it is coming from an outside source it would need to be validated.
UPDATE
Realized this could be simplified as:
col_str = '(id serial PRIMARY KEY, name text)'
qry = sql.SQL("CREATE TABLE {name} {cols} ").format(cols=sql.SQL(col_str), name=sql.Identifier(name))
print(qry.as_string(con))
CREATE TABLE "test" (id serial PRIMARY KEY, name text)
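Folded back into the method from the question, a sketch might look like this (note that the column string is spliced in as literal SQL, not as a bound parameter, so it must come from a trusted source):
from psycopg2.sql import SQL, Identifier

def create_table(self, name: str, coulmn: str):
    self.cur.execute(
        SQL("CREATE TABLE {name} {cols};").format(
            name=Identifier(name),
            cols=SQL(coulmn),  # literal SQL, e.g. "(id serial PRIMARY KEY, name text)"
        )
    )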

MySQL datetime column WHERE col IS NULL fails

I cannot get my very basic SQL query to work, as it returns 0 rows despite the fact that there are clearly NULLs.
query
SELECT
*
FROM
leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE
c.close_date IS NULL
DDL
CREATE TABLE closes
(
id INT AUTO_INCREMENT
PRIMARY KEY,
lead_id INT NOT NULL,
close_date DATETIME NULL,
close_type VARCHAR(255) NULL,
primary_agent VARCHAR(255) NULL,
price FLOAT NULL,
gross_commission FLOAT NULL,
company_dollar FLOAT NULL,
address VARCHAR(255) NULL,
city VARCHAR(255) NULL,
state VARCHAR(10) NULL,
zip VARCHAR(10) NULL,
CONSTRAINT closes_ibfk_1
FOREIGN KEY (lead_id) REFERENCES leads (id)
)
ENGINE = InnoDB;
CREATE INDEX lead_id
ON closes (lead_id);
I should mention that I am inserting the data with a Python web scraper and SQLAlchemy. If the data is not scraped, it will be None on insert.
Here is a screenshot of datagrip showing a null value in the row
EDIT
Alright so I went ahead and ran the following on some of the entries in the table where the value was already <null>
UPDATE closes
SET close_date = NULL
WHERE
lead_id = <INTEGERVAL>
;
What is interesting now is that when running the original query I do actually return the 2 records that I ran the update query for (the expected outcome). This would lead me to believe that the issue is with how my SQLAlchemy model is mapping the values on insert.
models.py
class Close(db.Model, ItemMixin):
    __tablename__ = 'closes'

    id = db.Column(db.Integer, primary_key=True)
    lead_id = db.Column(db.Integer, db.ForeignKey('leads.id'), nullable=False)
    close_date = db.Column(db.DateTime)
    close_type = db.Column(db.String(255))
    primary_agent = db.Column(db.String(255))
    price = db.Column(db.Float)
    gross_commission = db.Column(db.Float)
    company_dollar = db.Column(db.Float)
    address = db.Column(db.String(255))
    city = db.Column(db.String(255))
    state = db.Column(db.String(10))
    zip = db.Column(db.String(10))

    def __init__(self, item):
        self.build_from_item(item)

    def build_from_item(self, item):
        for k, v in item.items():
            setattr(self, k, v)
But I am fairly confident the value is a Python None in the event no value is scraped from the website. My understanding is that SQLAlchemy would map a None to NULL on insert, and given that nullable=True is the default setting, which can be seen in the generated DDL, I am still at a loss as to why it appears to be NULL when in reality it is not behaving that way.
EDIT 2
Only place where I think the issue would be happening is where my spider actually scrapes the data and assigns it to the Item which is shown below
closes.py
# item['close_date'] = None at this point
try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except ParserError as e:
    # Maybe item['close_date'] = None here?
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
In the Python code I've written, this would appear to be the place where the issue would arise. But if arrow.get throws an exception, the value of item['close_date'] should still be None. Whether or not that is the case, it does not explain why the record value appears to be NULL even though it does not behave like it is.
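One way to rule this path out is to reset the field explicitly when parsing fails, so a failed parse can never leave a non-None value behind. A sketch, assuming arrow's ParserError import path:
import arrow
from arrow.parser import ParserError  # assumed import path

try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except (ParserError, TypeError) as e:
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
    item['close_date'] = None  # force NULL on insert instead of whatever was scraped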
I'm guessing that you're having an issue with the join, not the NULL value. The query below returns 1 result for me. More info about your data, the software used for querying (I tested with SQL Yog), and applicable versions might help.
EDIT
It could be that you're having issues with MySQL's 'zero date'.
https://dev.mysql.com/doc/refman/5.7/en/date-and-time-types.html
MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy
date.” This is in some cases more convenient than using NULL values,
and uses less data and index space. To disallow '0000-00-00', enable
the NO_ZERO_DATE mode.
I've updated the SQL data below to include a zero date in the INSERT and SELECT's WHERE.
DROP TABLE IF EXISTS closes;
DROP TABLE IF EXISTS leads;
CREATE TABLE leads (
id INT(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
) ENGINE=INNODB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
INSERT INTO leads(id) VALUES (1),(2),(3);
CREATE TABLE closes (
id INT(11) NOT NULL AUTO_INCREMENT,
lead_id INT(11) NOT NULL,
close_date DATETIME DEFAULT NULL,
close_type VARCHAR(255) DEFAULT NULL,
primary_agent VARCHAR(255) DEFAULT NULL,
price FLOAT DEFAULT NULL,
gross_commission FLOAT DEFAULT NULL,
company_dollar FLOAT DEFAULT NULL,
address VARCHAR(255) DEFAULT NULL,
city VARCHAR(255) DEFAULT NULL,
state VARCHAR(10) DEFAULT NULL,
zip VARCHAR(10) DEFAULT NULL,
PRIMARY KEY (id),
KEY lead_id (lead_id),
CONSTRAINT closes_ibfk_1 FOREIGN KEY (lead_id) REFERENCES leads (id)
) ENGINE=INNODB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
INSERT INTO closes(id,lead_id,close_date,close_type,primary_agent,price,gross_commission,company_dollar,address,city,state,zip)
VALUES
(1,3,'0000-00-00',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
(2,1,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
(3,2,'2018-01-09 17:01:44',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
SELECT
*
FROM
leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE
c.close_date IS NULL OR c.close_date = '0000-00-00';

MariaDB duplicates being inserted

I have the following Python code to check whether a MariaDB record already exists, and then insert it. However, duplicates are being inserted. Is there something wrong with the code, or is there a better way to do it? I'm new to using Python with MariaDB.
import mysql.connector as mariadb
from hashlib import sha1
mariadb_connection = mariadb.connect(user='root', password='', database='tweets_db')
# The values below are retrieved from Twitter API using Tweepy
# For simplicity, I've provided some sample values
id = '1a23bas'
tweet = 'Clear skies'
longitude = -84.361549
latitude = 34.022003
created_at = '2017-09-27'
collected_at = '2017-09-27'
collection_type = 'stream'
lang = 'us-en'
place_name = 'Roswell'
country_code = 'USA'
cronjob_tag = 'None'
user_id = '23abask'
user_name = 'tsoukalos'
user_geoenabled = 0
user_lang = 'us-en'
user_location = 'Roswell'
user_timezone = 'American/Eastern'
user_verified = 1
tweet_hash = sha1(tweet).hexdigest()
cursor = mariadb_connection.cursor(buffered=True)
cursor.execute("SELECT Count(id) FROM tweets WHERE tweet_hash = %s", (tweet_hash,))
if cursor.fetchone()[0] == 0:
    cursor.execute("INSERT INTO tweets(id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified) VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)", (id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified))
    mariadb_connection.commit()
    cursor.close()
else:
    cursor.close()
    return
Below is the code for the table.
CREATE TABLE tweets (
id VARCHAR(255) NOT NULL,
tweet VARCHAR(255) NOT NULL,
tweet_hash VARCHAR(255) DEFAULT NULL,
longitude FLOAT DEFAULT NULL,
latitude FLOAT DEFAULT NULL,
created_at DATETIME DEFAULT NULL,
collected_at DATETIME DEFAULT NULL,
collection_type enum('stream','search') DEFAULT NULL,
lang VARCHAR(10) DEFAULT NULL,
place_name VARCHAR(255) DEFAULT NULL,
country_code VARCHAR(5) DEFAULT NULL,
cronjob_tag VARCHAR(255) DEFAULT NULL,
user_id VARCHAR(255) DEFAULT NULL,
user_name VARCHAR(20) DEFAULT NULL,
user_geoenabled TINYINT(1) DEFAULT NULL,
user_lang VARCHAR(10) DEFAULT NULL,
user_location VARCHAR(255) DEFAULT NULL,
user_timezone VARCHAR(100) DEFAULT NULL,
user_verified TINYINT(1) DEFAULT NULL
);
Add a unique constraint to the tweet_hash field:
ALTER TABLE tweets MODIFY tweet_hash VARCHAR(255) UNIQUE;
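With the unique index in place, the racy SELECT-then-INSERT check can be dropped and the database left to enforce uniqueness. A sketch, assuming the same connection and variables as in the question (only the NOT NULL columns are shown; the remaining columns would be added in the same way):
cursor = mariadb_connection.cursor()
try:
    cursor.execute(
        "INSERT INTO tweets(id, tweet, tweet_hash) VALUES (%s, %s, %s)",
        (id, tweet, tweet_hash),
    )
    mariadb_connection.commit()
except mariadb.IntegrityError:
    mariadb_connection.rollback()  # duplicate tweet_hash: the row already exists
finally:
    cursor.close()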
Every table should have a PRIMARY KEY. Is id supposed to be that? (The CREATE TABLE is not saying so.) A PK is, by definition, UNIQUE, so that would cause an error on inserting a duplicate.
Meanwhile:
Why have a tweet_hash? Simply index tweet.
Don't say 255 when there are specific limits smaller than that.
user_id and user_name should be in another "lookup" table, not both in this table.
Does user_verified belong with the user? Or with each tweet?
If you are expecting millions of tweets, this table needs to be made smaller and indexed -- else you will run into performance problems.
