I have a database stored in a GridDB container. The main table in the database contains a column with outdated data. I would like to replace the whole column with a new column from another table (with the same total number of rows). Is there any way I could do that with Python?
For example, the whole process looks like this:
-- old table
 column_0 | column_1 | old_column
----------+----------+------------
     1344 | Max      | 5263.42525
     1345 | John     | 1465.41234
     1346 | Alex     | 8773.12344
     1347 | Matthew  | 5489.23522
     1348 | Mark     | 9874.31423
-- replacement
 col_0 | updated
-------+------------
  4242 | 3553.42824
  4243 | 8942.98731
  4244 | 1424.36742
  4245 | 7642.75352
  4246 | 2844.92468
-- output
 column_0 | column_1 | old_column
----------+----------+------------
     1344 | Max      | 3553.42824
     1345 | John     | 8942.98731
     1346 | Alex     | 1424.36742
     1347 | Matthew  | 7642.75352
     1348 | Mark     | 2844.92468
I have tried replacing the values one by one, but I want something faster and more automated.
What you are trying to do here is called a join in SQL. Assuming that rows between the two tables are matched by their position, you can number the rows of each table in a subquery with a window function and then pick the wanted columns from both sets.
SELECT lhs.column_0, lhs.column_1, rhs.updated AS old_column
FROM (SELECT ROW_NUMBER() OVER (ORDER BY column_0) AS r, * FROM old_table) lhs
JOIN (SELECT ROW_NUMBER() OVER (ORDER BY col_0) AS r, * FROM replacement) rhs
  ON lhs.r = rhs.r
Each subquery adds a row counter with the window function; rows are then matched on that counter, producing a row containing columns from both subqueries. The top level picks only the wanted columns, with an AS clause for the desired naming.
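Since the question asks for Python: once both containers are read into pandas DataFrames (for example via the GridDB Python client or from CSV; the exact fetch call depends on your setup), the same rank-based matching becomes a positional assignment. A minimal sketch, assuming each frame's rows correspond one-to-one after sorting by their key columns:

```python
import pandas as pd

# Toy frames mirroring the question's tables.
old = pd.DataFrame({
    "column_0": [1344, 1345, 1346, 1347, 1348],
    "column_1": ["Max", "John", "Alex", "Matthew", "Mark"],
    "old_column": [5263.42525, 1465.41234, 8773.12344, 5489.23522, 9874.31423],
})
replacement = pd.DataFrame({
    "col_0": [4242, 4243, 4244, 4245, 4246],
    "updated": [3553.42824, 8942.98731, 1424.36742, 7642.75352, 2844.92468],
})

# Sort both frames so rows line up by rank, then overwrite the column by position.
old = old.sort_values("column_0").reset_index(drop=True)
replacement = replacement.sort_values("col_0").reset_index(drop=True)
old["old_column"] = replacement["updated"]
```

Writing the result back to the container is then a single bulk put of the updated frame rather than a row-by-row loop.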
I have two tables, existing_students and old_students, with data in them. Now I want to introduce a new auto-increment column, say alumni_number, and assign a number to all students (old and existing). First I start with the old_students table, say it has 100 rows:
ALTER TABLE OLD_STUDENTS ADD ALUMNI_NUMBER INT UNSIGNED NOT NULL AUTO_INCREMENT, ADD INDEX (ALUMNI_NUMBER);
This will assign 1 to 100 to the rows. Now I want the count in existing_students to start from 101. Is it possible to allocate numbers 101, 102, ... automatically to the rows in the existing_students table?
Any pointers will be helpful.
Demo:
mysql> select * from mytable;
+----------+
| name |
+----------+
| Harry |
| Ron |
| Hermione |
+----------+
mysql> alter table mytable
add column id int unsigned not null auto_increment,
add key (id),
auto_increment=101;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from mytable;
+----------+-----+
| name | id |
+----------+-----+
| Harry | 101 |
| Ron | 102 |
| Hermione | 103 |
+----------+-----+
I have 2 tables that look like this:
Table 1
| ID | Tel. | Name |
|:--:|:----:|:-------:|
| 1 | 1234 | Denis |
| 2 | 4567 | Michael |
| 3 | 3425 | Peter |
| 4 | 3242 | Mary |
Table 2
| ID | Contact Date |
|:--:|:------------:|
| 1 | 2014-05-01 |
| 2 | 2003-01-05 |
| 3 | 2020-01-10 |
| 4 | NULL |
Now I want to compare the first table with the second table on the ID column, to check whether contact 1 is already in the list of people who were contacted. After that, I want to write the contact date into the first table, so the last contact date is visible in the main table.
How would I do this?
Thanks for any answers!
Here's a solution using MySQL that might interest you; once it works in SQL, implementing it from Python will be easy.
First, create table T1:
create table T1(
id integer primary key,
tel integer,
name varchar(100)
);
Second, create table T2:
create table T2(
id integer primary key,
contactDate date
);
Then insert rows into T1 and T2:
-- Table 'T1' Insertion
insert into T1(id, tel, name) VALUES
(1, 1234, "Denis"),
(2, 4567, "Michael"),
(3, 3425,"Peter"),
(4, 3242, "Mary");
-- Table 'T2' Insertion
insert into T2(id, contactDate) VALUES
(1, 20140105),
(2, 20030105),
(3, 20201001),
(4, Null);
Then create table T3 from a SELECT over both tables, using an INNER JOIN to combine the results:
CREATE TABLE T3 AS
SELECT T1.id, T1.name, T1.tel, T2.contactDate
FROM T1
INNER JOIN T2 ON T1.id=T2.id;
Then SELECT to check the results:
select * from T3;
OUTPUT
| id | name | tel | contactDate |
|:--:|:-------:|------|-------------|
| 1 | Denis | 1234 | 2014-01-05 |
| 2 | Michael | 4567 | 2003-01-05 |
| 3 | Peter | 3425 | 2020-10-01 |
| 4 | Mary | 3242 | NULL |
I hope this helps. I spent about three hours trying to merge T3's contactDate directly back into T1, but it was process-heavy. I'll attach links that could help you further.
Reference
INNER JOIN
SQL ALTER TABLE Statement
SQL Server INSERT Multiple Rows
INSERT INTO SELECT statement overview and examples
SQL INSERT INTO SELECT Statement
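Since the asker also mentioned Python: pandas' merge performs the same join directly, with no intermediate T3 table needed. A sketch with the question's data (a left join keeps every row of the first table, so Mary survives with a missing date):

```python
import pandas as pd

t1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Tel": [1234, 4567, 3425, 3242],
    "Name": ["Denis", "Michael", "Peter", "Mary"],
})
t2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "ContactDate": ["2014-05-01", "2003-01-05", "2020-01-10", None],
})

# LEFT JOIN on ID: every t1 row survives; a missing date comes through as None/NaN.
t3 = t1.merge(t2, on="ID", how="left")
```

`how="inner"` would reproduce the INNER JOIN above; with these IDs the two give the same rows.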
I have a table (from a log file) with emails and three other columns that contain states of that user's interaction with a system. An email (user) may have 100 or 1000 entries, and each entry contains a combination of those three values, which may repeat on and on for the same email and for others.
It looks something like this:
+---------+---------+---------+-----+
| email | val1 | val2 | val3 |
+---------+---------+---------+-----+
|jal#h | cast | core | cam |
|hal#b |little ja| qar | ja sa |
|bam#t | cast | core | cam |
|jal#h |little ja| qar | jaja |
+---------+---------+---------+-----+
The emails repeat, all the values repeat, and there are 40+ possible values for each column, all strings. I want one row per distinct email, with every possible value as a column name and, under it, a count of how many times that value occurred for that particular email, like so:
+-------+------+------+-----+-----------+-----+-------+--------+
| email | cast | core | cam | little ja | qar | ja sa | blabla |
+-------+------+------+-----+-----------+-----+-------+--------+
| jal#h |   55 |    2 |  44 |       244 |   1 |   200 |     12 |
| hal#b |  900 |  513 | 101 |       146 |   2 |   733 |    833 |
| bam#t | 1231 |   33 | 433 |       411 | 933 |   833 |     53 |
+-------+------+------+-----+-----------+-----+-------+--------+
I have tried MySQL, and I managed to count the total occurrences of one particular value for each email, but not the counts of all possible values in all columns:
SELECT
distinct email,
count(val1) as "cast"
FROM table1
where val1 = 'cast'
group by email
This query clearly doesn't do it, as it only covers the single value 'cast' from the first column val1. What I'm looking for is for all distinct values in the first, second, and third columns to become column headers, with each row holding the totals for those values for a certain email (user).
There is a pivot-table feature, but I couldn't get it to work.
I'm dealing with this data as a table in MySQL, but it is also available as a CSV file, so if it isn't possible with a query, Python would be a possible solution (SQL preferred).
Update
In Python, is it possible to output the data as:
+-------+------------------+------------+--------------------+
|       |       val1       |    val2    |        val3        |
+-------+------+-----------+------+-----+-----+-------+------+
| email | cast | little ja | core | qar | cam | ja sa | jaja |
+-------+------+-----------+------+-----+-----+-------+------+
| jal#h |   55 |         2 |   44 | 244 |   1 |   200 |   12 |
| hal#b |  900 |       513 |  101 | 146 |   2 |   733 |  833 |
| bam#t | 1231 |        33 |  433 | 411 | 933 |   833 |   53 |
+-------+------+-----------+------+-----+-----+-------+------+
I'm not very familiar with python.
If you use pandas, you can do a value_counts after grouping your data frame by email and then unstack/pivot it to wide format:
(df.set_index("email").stack().groupby(level=0).value_counts()
.unstack(level=1).reset_index().fillna(0))
To get the updated result, you can group by both the email and val* columns after the stack:
(df.set_index("email").stack().groupby(level=[0, 1]).value_counts()
.unstack(level=[1, 2]).fillna(0).sort_index(axis=1))
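A runnable sketch of the first variant on the question's sample rows (toy data, so every count is small):

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["jal#h", "hal#b", "bam#t", "jal#h"],
    "val1": ["cast", "little ja", "cast", "little ja"],
    "val2": ["core", "qar", "core", "qar"],
    "val3": ["cam", "ja sa", "cam", "jaja"],
})

# Stack the three value columns into one long series keyed by email,
# count occurrences per (email, value) pair, then pivot wide.
wide = (df.set_index("email").stack().groupby(level=0).value_counts()
        .unstack(level=1).fillna(0).astype(int))
```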
I'd reconstruct the DataFrame, then groupby and unstack with value_counts:
v = df.values
s = pd.Series(v[:, 1:].ravel(), v[:, 0].repeat(3))
s.groupby(level=0).value_counts().unstack(fill_value=0)
       cam  cast  core  ja sa  jaja  little ja  qar
bam#t    1     1     1      0     0          0    0
hal#b    0     0     0      1     0          1    1
jal#h    1     1     1      0     1          1    1
If you know the list you can calculate it using group by:
SELECT email,
sum(val1 = 'cast') as `cast`,
sum(val1 = 'core') as `core`,
sum(val1 = 'cam') as `cam`,
. . .
FROM table1
GROUP BY email;
The . . . is for you to fill in the remaining values.
You can use this query to dynamically generate a PREPARED statement from the values in columns val1, val2, and val3 of your table:
SELECT
CONCAT( "SELECT email,\n",
GROUP_CONCAT(
CONCAT (" SUM(IF('",val1,"' IN(val1,val2,val3),1,0)) AS '",val1,"'")
SEPARATOR ',\n'),
"\nFROM table1\nGROUP BY EMAIL\nORDER BY email") INTO #myquery
FROM (
SELECT val1 FROM table1
UNION SELECT val2 FROM table1
UNION SELECT val3 FROM table1
) AS vals
ORDER BY val1;
-- ONLY TO VERIFY QUERY
SELECT #myquery;
PREPARE stmt FROM #myquery;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
sample table
mysql> SELECT * FROM table1;
+----+-------+-----------+------+-------+
| id | email | val1 | val2 | val3 |
+----+-------+-----------+------+-------+
| 1 | jal#h | cast | core | cam |
| 2 | hal#b | little ja | qar | ja sa |
| 3 | bam#t | cast | core | cam |
| 4 | jal#h | little ja | qar | cast |
+----+-------+-----------+------+-------+
4 rows in set (0,00 sec)
generate query
mysql> SELECT
-> CONCAT( "SELECT email,\n",
-> GROUP_CONCAT(
-> CONCAT (" SUM(IF('",val1,"' IN(val1,val2,val3),1,0)) AS '",val1,"'")
-> SEPARATOR ',\n'),
-> "\nFROM table1\nGROUP BY EMAIL\nORDER BY email") INTO #myquery
-> FROM (
-> SELECT val1 FROM table1
-> UNION SELECT val2 FROM table1
-> UNION SELECT val3 FROM table1
-> ) AS vals
-> ORDER BY val1;
Query OK, 1 row affected (0,00 sec)
verify query
mysql> -- ONLY TO VERIFY QUERY
mysql> SELECT #myquery;
SELECT email,
SUM(IF('cast' IN(val1,val2,val3),1,0)) AS 'cast',
SUM(IF('little ja' IN(val1,val2,val3),1,0)) AS 'little ja',
SUM(IF('core' IN(val1,val2,val3),1,0)) AS 'core',
SUM(IF('qar' IN(val1,val2,val3),1,0)) AS 'qar',
SUM(IF('cam' IN(val1,val2,val3),1,0)) AS 'cam',
SUM(IF('ja sa' IN(val1,val2,val3),1,0)) AS 'ja sa'
FROM table1
GROUP BY EMAIL
ORDER BY email
1 row in set (0,00 sec)
execute query
mysql> PREPARE stmt FROM #myquery;
Query OK, 0 rows affected (0,00 sec)
Statement prepared
mysql> EXECUTE stmt;
+-------+------+-----------+------+------+------+-------+
| email | cast | little ja | core | qar | cam | ja sa |
+-------+------+-----------+------+------+------+-------+
| bam#t | 1 | 0 | 1 | 0 | 1 | 0 |
| hal#b | 0 | 1 | 0 | 1 | 0 | 1 |
| jal#h | 2 | 1 | 1 | 1 | 1 | 0 |
+-------+------+-----------+------+------+------+-------+
3 rows in set (0,00 sec)
mysql> DEALLOCATE PREPARE stmt;
Query OK, 0 rows affected (0,00 sec)
So I'm working on this database structure and trying to figure out if this is the best method. I'm pulling records from a 3rd-party site and storing them in a temporary table (tableA). I then check for duplicates in tableB and insert the non-duplicates into tableB from tableA. Is there any way to get the id assigned in tableB each time a record is inserted? Right now I'm looking up the latest records inserted into tableB by date and then retrieving their IDs. Is there a more efficient way?
Is there a reason you're not using INSERT IGNORE? It seems to me that you could do away with the whole temporary-table process...
+----+------+
| id | name |
+----+------+
| 1 | adam |
| 2 | bob |
| 3 | carl |
+----+------+
If id has a unique constraint, then this:
INSERT IGNORE INTO tableName (id, name) VALUES (3, "carl"), (4, "dave");
...will result in:
+----+------+
| id | name |
+----+------+
| 1 | adam |
| 2 | bob |
| 3 | carl |
| 4 | dave |
+----+------+
...whereas if you'd just done an INSERT (without the IGNORE part), it would give you a unique key constraint error.
In terms of getting the ID back, just use:
SELECT LAST_INSERT_ID()
...after every INSERT IGNORE call you make.
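From Python, most DB-API drivers (mysql.connector, PyMySQL) expose that same value as cursor.lastrowid after each insert, so you don't even need a separate SELECT LAST_INSERT_ID() round trip. A sketch using the standard library's sqlite3 so it runs anywhere; note that sqlite spells the statement INSERT OR IGNORE rather than MySQL's INSERT IGNORE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableB (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur = conn.cursor()

cur.execute("INSERT OR IGNORE INTO tableB (name) VALUES (?)", ("carl",))
carl_id = cur.lastrowid   # auto-assigned id of the freshly inserted row

cur.execute("INSERT OR IGNORE INTO tableB (name) VALUES (?)", ("dave",))
dave_id = cur.lastrowid
```

When a duplicate is silently ignored, cur.rowcount is 0 for that statement, so check it before trusting lastrowid.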
It sounds like you want something called an after-insert trigger on table B. This is a piece of code that runs after inserting one or more rows into a table. Documentation is here.
The code is something like:
CREATE TRIGGER mytrigger AFTER INSERT ON B
FOR EACH ROW BEGIN
-- Do what you want with each row here
END;