How to ignore start on input in hive insert query

My data is tab-separated, in this format:
State:ca city:california population:1M
I want to create a database in which the inserts ignore the "State:", "city:" and "population:" prefixes. The state should go into a state table together with the population, and the city into a city table together with the population.
So there will be 2 tables: one with state and population, the other with city and population. This is what I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS CSP.original
(
st STRING COMMENT 'State',
ct STRING COMMENT 'City',
po STRING COMMENT 'Population'
)
COMMENT 'Original Table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
This didn't work: it added the comments, but it didn't ignore the prefixes.
I also want to create 2 tables, one for state and one for city. Can anyone please help me?

You would have to create an external table first.
Step1:
CREATE EXTERNAL TABLE all_info (state STRING, population INT) PARTITIONED BY (date STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Step2:
CREATE TABLE IF NOT EXISTS state (state string, population INT) PARTITIONED BY (date string);
CREATE TABLE IF NOT EXISTS city (city string, population INT) PARTITIONED BY (date string);
Step3:
INSERT OVERWRITE TABLE state
PARTITION (date = '20170706')
SELECT state, population
FROM all_info
WHERE date = '20170706' AND
instr(state, 'state:') = 1;
INSERT OVERWRITE TABLE city
PARTITION (date = '20170706')
SELECT state, population
FROM all_info
WHERE date = '20170706' AND
instr(state, 'city:') = 1;
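Since the question is tagged python, the prefixes could also be stripped before the data reaches Hive. The following is only a minimal sketch under that assumption: the helper names parse_record and split_rows are hypothetical, and the sample line is the one from the question.

```python
def parse_record(line):
    """Split a tab-separated line of key:value fields into a dict of bare values."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # partition() splits on the first colon only, so values may contain colons
        key, _, value = field.partition(":")
        record[key.strip().lower()] = value.strip()
    return record


def split_rows(line):
    """Return one (state, population) row and one (city, population) row."""
    record = parse_record(line)
    state_row = (record["state"], record["population"])
    city_row = (record["city"], record["population"])
    return state_row, city_row


state_row, city_row = split_rows("State:ca\tcity:california\tpopulation:1M")
print(state_row)  # ('ca', '1M')
print(city_row)   # ('california', '1M')
```

The two tuples can then be written out as tab-separated files, one per target table.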

Related

Adding data to an already existing partition in Postgres using Python

I need to partition a table on 2 columns and insert records into an already existing partition of a Postgres table using Python (Psycopg2).
I am very new to Python and Postgres, so I am struggling a bit with this requirement. I searched the internet and found that Postgres does not support partitioning by list on multiple columns.
I have 2 tables, "cust_details_curr" and "cust_details_hist". Both tables have the same structure. However, the "_hist" table needs to be partitioned on 2 columns, 'area_code' and 'eff_date'.
CREATE TABLE cust_details_curr
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
);
CREATE TABLE cust_details_hist
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
); -- Needs to be partitioned on area_code and eff_date
The "area_code" is passed as an argument to the process.
The column "eff_date" is supposed to contain the current process run date.
There are 5 "area_code" values to be passed as arguments to the program (A501, A502, A503, A504, X101), all of which will run sequentially on the same day (i.e. eff_date will be the same for all the runs).
The requirement is that whenever the "curr" table is loaded for a specific "area_code", the program must first copy the data already existing in the "curr" table for that area_code into the partition of the "_hist" table for that "eff_date" and "area_code". Next, the data for that area_code in the "curr" table must be deleted, and new data for that area_code is loaded with the current process date in the eff_date column.
However, the process runs for 1 area_code at a time, so it will run for multiple area_codes on the same day (which means they will all have the same eff_date).
So my questions are:
How do I partition the _hist table by 2 columns, area_code and eff_date?
Once a partition for an eff_date (say 2022-08-01) has been created and loaded in the _hist table for one of the area_codes (say A501), the next job in the sequence needs to load the data for another area_code (say A502) into the same eff_date partition, since eff_date is the same for both process instances as they run on the same day. How can I insert data into the existing partition?
I devised the following (crude) way to handle the requirement when there was only a single partition column, "eff_date". I would execute the SQL queries below in order, which roughly implements the initial requirements for a single eff_date and area_code value.
However, I am struggling to figure out how to do the same with area_code as a second partition column in the _hist table, and how to insert data into an already existing date partition (eff_date) that was loaded by a previous area_code instance.
CREATE TABLE cust_details_curr
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
);
CREATE TABLE cust_details_hist
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
) PARTITION BY LIST (eff_date); -- Partitioned by List
from datetime import datetime

# area_code and cur (a psycopg2 cursor) are set up in the trimmed part of the program
table_name = "cust_details_curr"
table_name_hist = "cust_details_hist"
e = datetime.now()
eff_date = e.strftime("%Y-%m-%d")
dttime = e.strftime("%Y%m%d_%H%M%S")
# Name of the new table that gets attached to the _hist table as a partition
table_name_curr_part = table_name_hist + '_' + str(dttime)
query_count = f"SELECT count(*) as cnt from {table_name} where area_code = '{area_code}'; "
query_date = f"SELECT distinct eff_date as eff_dt from {table_name} where area_code = '{area_code}';"
cur.execute(query_date)
eff_date = cur.fetchone()[0]
query_crt = f"CREATE TABLE {table_name_curr_part} (LIKE {table_name_hist} INCLUDING DEFAULTS);"
query_ins_part = f"INSERT INTO {table_name_curr_part} SELECT * FROM {table_name} where area_code = '{area_code}' AND eff_date = '{eff_date}';"
query_add_part = f"ALTER TABLE {table_name_hist} ATTACH PARTITION {table_name_curr_part} FOR VALUES IN (DATE '{eff_date}') ;"
query_del = f"DELETE FROM {table_name} WHERE area_code = '{area_code}';"
query_ins_curr = f"INSERT INTO {table_name} (cust_id, area_code, cust_name, cust_age, eff_date) VALUES %s"
cur.execute(....)
# Program trimmed in the interest of space
Can anyone please help me implement a workaround for the above requirements with multiple partition columns? How can I load data into an already existing partition?
Happy to provide additional information. Any help is appreciated.
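One possible direction, sketched here under assumptions rather than as a tested solution: PostgreSQL's declarative partitioning lets a partition itself be partitioned, so the _hist table could be declared with PARTITION BY LIST (area_code) and each area partition sub-partitioned by range on eff_date. Rows inserted through the parent table are routed to the matching partition, so a later run never has to target a partition by name. The function name hist_partition_ddl and the partition naming scheme are hypothetical, and the generated statements have not been run against the asker's database.

```python
from datetime import date, timedelta


def hist_partition_ddl(area_code, eff_date, parent="cust_details_hist"):
    """Build DDL that creates, if missing, the area_code partition and its one-day eff_date sub-partition.

    Assumes the parent table was created with PARTITION BY LIST (area_code).
    """
    if not area_code.isalnum():
        # area_code becomes part of an identifier, so reject anything unexpected
        raise ValueError(f"unexpected area_code: {area_code!r}")
    area_part = f"{parent}_{area_code.lower()}"
    day_part = f"{area_part}_{eff_date:%Y%m%d}"
    next_day = eff_date + timedelta(days=1)
    return [
        f"CREATE TABLE IF NOT EXISTS {area_part} PARTITION OF {parent} "
        f"FOR VALUES IN ('{area_code}') PARTITION BY RANGE (eff_date);",
        f"CREATE TABLE IF NOT EXISTS {day_part} PARTITION OF {area_part} "
        f"FOR VALUES FROM ('{eff_date.isoformat()}') TO ('{next_day.isoformat()}');",
    ]


for statement in hist_partition_ddl("A501", date(2022, 8, 1)):
    print(statement)
```

Each statement could be passed to cur.execute() before the copy step. Because of IF NOT EXISTS, running the same area_code and date again leaves the existing partitions in place, and the copy itself can be a plain INSERT INTO cust_details_hist SELECT ... FROM cust_details_curr.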

Insert unique auto-increment ID on python to sql?

I'd like to insert an Order_ID to make each row unique using python and pyodbc to SQL Server.
Currently, my code is:
name = input("Your name")
def connectiontoSQL(order_id,name):
    query = f'''\
    insert into order (Order_ID, Name)
    values('{order_id}','{name}')'''
    return (execute_query_commit(conn,query))
My table in the SQL database is empty, and I'd like the Order_ID to increase by 1 every time I execute.
How should I code order_id in Python so that it automatically creates the first Order_ID as OD001 and, when I execute again, OD002?

You can create an INT IDENTITY column as your primary key and add a computed column that holds the order number you display in your application.
create table Orders
(
[OrderId] [int] IDENTITY(0,1) NOT NULL,
[OrderNumber] as 'OD'+ right( '00000' + cast(OrderId as varchar(6)) , 6) ,
[OrderDate] date,
PRIMARY KEY CLUSTERED
(
[OrderId] ASC
)
)
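If the display value is built in Python instead of in a computed column, the zero padding can be done with a format specification. This is a small sketch only: format_order_number is a hypothetical helper, and it assumes the integer key has already been generated by the database.

```python
def format_order_number(order_id, prefix="OD", width=3):
    """Left-pad the integer key with zeros and put the prefix in front of it."""
    return f"{prefix}{order_id:0{width}d}"


print(format_order_number(1))  # OD001
print(format_order_number(2))  # OD002
```

Keys with more digits than the width are not truncated, so 1234 would simply become OD1234.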

how to query two tables , some items in table 1 do not an association with table2 ?

My database schema is below. Most ids in the email table are associated with an id in the attachments table, but some are not. For example, id 1 in the email table has attachments in the attachments table (say 3 entries with id 1), while id 2 in the email table has no matching id 2 in attachments. I tried the following query, but it only showed the emails that have attachments. In essence, I want the result to show everything from date x to date y, whether it has an attachment or not.
SELECT CONCAT(email.from_fld , date_fld) AS name, email.body_fld, attachments.attach_fld
FROM email
INNER JOIN attachments
ON email.id = attachments.id
WHERE date_fld >= "2012-01-01 00:00:00" AND date_fld <= "2013-01-01 23:59:59" ORDER BY date_fld ASC;
This is my database Schema
email table
id INT
from_fld VARCHAR
to_fld VARCHAR
subj_fld MEDIUMTEXT
date_fld DATETIME
mailbox VARCHAR
mailto VARCHAR
body_fld LONGTEXT
numAttach INT
attachNames MEDIUMTEXT
attachText MEDIUMTEXT
headings MEDIUMTEXT
attachments table
id INT
type_fld VARCHAR
filename_fld VARCHAR
encode_fld INT
attach_fld LONGBLOB
origemail table
id INT
orig_fld LONGBLOB
tags table
id INT
cat_fld VARCHAR
key_fld VARCHAR
priority_fld INT
notes_fld MEDIUMTEXT

You want a left join if you want everything in the first table in the from clause:
SELECT CONCAT(e.from_fld , e.date_fld) AS name, e.body_fld, a.attach_fld
FROM email e LEFT JOIN
attachments a
ON e.id = a.id
WHERE e.date_fld >= '2012-01-01' AND
e.date_fld < '2013-01-02'
ORDER BY e.date_fld ASC;
Notice the other changes:
I introduced table aliases. These make the query easier to write and read.
I qualified all column names, so it is clear where the columns come from.
I use the ANSI standard delimiter for string and date constants (a single quote rather than a double quote).
I simplified the date comparisons.
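The difference between the two joins can be checked with a small in-memory example. This sketch uses Python's sqlite3 module as a stand-in for the asker's MySQL database, with a cut-down version of the schema and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE email (id INTEGER, from_fld TEXT, date_fld TEXT);
    CREATE TABLE attachments (id INTEGER, filename_fld TEXT);
    INSERT INTO email VALUES (1, 'a@example.com', '2012-03-01 10:00:00'),
                             (2, 'b@example.com', '2012-04-01 11:00:00');
    INSERT INTO attachments VALUES (1, 'one.pdf'), (1, 'two.pdf');
""")

query = """
    SELECT e.id, a.filename_fld
    FROM email e {join} attachments a ON e.id = a.id
    WHERE e.date_fld >= '2012-01-01' AND e.date_fld < '2013-01-02'
    ORDER BY e.date_fld, a.filename_fld
"""
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # [(1, 'one.pdf'), (1, 'two.pdf')]
print(left_rows)   # [(1, 'one.pdf'), (1, 'two.pdf'), (2, None)]
```

Email 2 has no attachment, so it only appears in the LEFT JOIN result, with None (NULL) in the attachment column.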

Try using 'full join' instead of inner join

How to create variable columns in MYSQL table using python

How to create variable columns in a table according to the user input?
In other words, I have a table that contains the ID of students, but I need to create variable columns for weeks according to user's choice. For example if the number of weeks chosen by the user is 2 then we create a table like this
cur.execute("""CREATE TABLE Attendance
(
Week1 int,
Week2 int,
ID int primary key
)""")

You can just build the column defs as a string in Python:
num_weeks = 4
week_column_defs = ', '.join("Week{} int".format(week_num) for week_num in range(1, num_weeks+1))
command = """CREATE TABLE Attendance
(
{weeks} ,
ID int primary key
)""".format(weeks=week_column_defs)
cur.execute(command)

sqlite3.OperationalError: no such column: USA

I am moving data from one database to another with the following statement:
cursor.execute("\
INSERT INTO table (ID, Country)\
SELECT ID, Country\
FROM database.t\
WHERE Country = `USA`\
GROUP BY Country\
;")
But I get the error
sqlite3.OperationalError: no such column: USA
Can't figure out why

Use single quotes, not backticks, when referring to a string literal in your SQLite query:
INSERT INTO table (ID, Country)
SELECT ID, Country
FROM database.t
WHERE Country = 'USA'
GROUP BY Country
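When the value comes from a Python variable, the quoting question can be avoided altogether by binding it as a parameter. A minimal sketch with an in-memory database follows; the table t and its two sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Country TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "USA"), (2, "Canada")])

country = "USA"
# The ? placeholder is filled in by sqlite3, so the value is never parsed as a column name
rows = conn.execute(
    "SELECT ID, Country FROM t WHERE Country = ?", (country,)
).fetchall()
print(rows)  # [(1, 'USA')]
```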
