Use a CSV to create a SQL table - python

I have a csv file with the following format:
+--------------+--------+--------+--------+
| Description  | brand1 | brand2 | brand3 |
+--------------+--------+--------+--------+
| afkjdjfkafj  | 1      | 0      | 0      |
| fhajdfhjafh  | 1      | 0      | 0      |
| afdkjfkajljf | 0      | 1      | 0      |
+--------------+--------+--------+--------+
I want to write a python script that reads the CSV and creates a table in SQL. I want the table to have the description and the derived brand. If there is a 1 in a brand column of the CSV, then the description is associated with that brand. I then want to create a SQL table with the description and the associated brand name.
The table will be:
+-------------+---------------+
| Description | derived brand |
+-------------+---------------+
| afkjdjfkafj | brand1        |
+-------------+---------------+
So far I have written the code that reads the CSV and turns the descriptions into a list:
df = pd.read_csv(SOURCE_FILE, delimiter=",")
descriptions = df['Description'].tolist()
Please provide some guidance on how to read the file and achieve this because I am so lost. Thanks!

I just answered a similar question on dba.stackexchange.com, but here are the basics.
Create your table...
create table myStagingTable (Description varchar(64), Brand1 bit, Brand2 bit, Brand3 bit)
Then bulk insert into it, skipping the first row if it contains column headers.
bulk insert myStagingTable
from 'C:\somefile.csv'
with ( firstrow = 2,
       fieldterminator = ',',
       rowterminator = '\n')
Now your data will be in a table just like it is in your CSV file. To insert it into your final table, you can use IIF and COALESCE:
insert into finalTable
select distinct
    [Description],
    DerivedBrand = coalesce(iif(Brand1 = 1, 'Brand1', null),
                            iif(Brand2 = 1, 'Brand2', null),
                            iif(Brand3 = 1, 'Brand3', null))
from myStagingTable
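Since you're already in pandas, here's a minimal sketch of doing the same derivation client-side and writing the result straight to SQL; the SQLite connection string and target table name are assumptions for illustration:
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv(SOURCE_FILE, delimiter=",")

# idxmax(axis=1) returns the name of the column holding each row's maximum,
# i.e. the brand column containing the 1 (assumes exactly one 1 per row)
brand_cols = ["brand1", "brand2", "brand3"]
df["derived_brand"] = df[brand_cols].idxmax(axis=1)

# write Description plus the derived brand to SQL
engine = create_engine("sqlite:///brands.db")  # hypothetical connection string
df[["Description", "derived_brand"]].to_sql("brand_table", engine,
                                            if_exists="replace", index=False)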

Related

Python or Excel: How can you compare 2 columns and then write the value of a 3rd column in a new column?

I have 2 tables that look like this:
Table 1
| ID | Tel. | Name    |
|:--:|:----:|:-------:|
| 1  | 1234 | Denis   |
| 2  | 4567 | Michael |
| 3  | 3425 | Peter   |
| 4  | 3242 | Mary    |
Table 2
| ID | Contact Date |
|:--:|:------------:|
| 1  | 2014-05-01   |
| 2  | 2003-01-05   |
| 3  | 2020-01-10   |
| 4  | NULL         |
Now I want to compare the first table with the second table on the ID column, to check whether each contact is already in the list of people who were contacted. After that, I want to write the Contact Date into the first table, so the last contact date is visible in the main table.
How would I do this?
Thanks for any answers!!
Here's a solution using MySQL that might interest you; implementing it in Python afterwards will be easy.
First, let's create table T1:
create table T1(
    id integer primary key,
    tel integer,
    name varchar(100)
);
Second, table T2:
create table T2(
    id integer primary key,
    contactDate date
);
Insertion of T1 and T2 (using date literals that match the question's data):
-- Table 'T1' insertion
insert into T1(id, tel, name) values
    (1, 1234, 'Denis'),
    (2, 4567, 'Michael'),
    (3, 3425, 'Peter'),
    (4, 3242, 'Mary');
-- Table 'T2' insertion
insert into T2(id, contactDate) values
    (1, '2014-05-01'),
    (2, '2003-01-05'),
    (3, '2020-01-10'),
    (4, NULL);
Then create T3 with a SELECT over both tables, using an INNER JOIN to combine the results:
CREATE TABLE T3 AS
SELECT T1.id, T1.name, T1.tel, T2.contactDate
FROM T1
INNER JOIN T2 ON T1.id=T2.id;
Then, SELECT to check the results
select * from T3;
OUTPUT:
| id | name    | tel  | contactDate |
|:--:|:-------:|:----:|:-----------:|
| 1  | Denis   | 1234 | 2014-05-01  |
| 2  | Michael | 4567 | 2003-01-05  |
| 3  | Peter   | 3425 | 2020-01-10  |
| 4  | Mary    | 3242 | NULL        |
I hope this helps. I spent about three hours trying to merge T3's contactDate back into T1 directly, but it was too process-heavy. I'll attach links that could help you further.
Reference
INNER JOIN
SQL ALTER TABLE Statement
SQL Server INSERT Multiple Rows
INSERT INTO SELECT statement overview and examples
SQL INSERT INTO SELECT Statement
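And since the question says Python or Excel, here's the equivalent pandas sketch of the same join; the data is typed in literally just for illustration:
import pandas as pd

t1 = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "Tel.": [1234, 4567, 3425, 3242],
                   "Name": ["Denis", "Michael", "Peter", "Mary"]})
t2 = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "Contact Date": ["2014-05-01", "2003-01-05", "2020-01-10", None]})

# a left merge keeps every row of T1 and pulls in the matching Contact Date;
# with every ID matched, it behaves like the INNER JOIN above
t3 = t1.merge(t2, on="ID", how="left")
print(t3)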

How to link data records in pandas?

I have a csv file which I can read into a pandas data frame. The data is like:
+--------+---------+------+----------------+
| Name   | Address | ID   | Linked_To      |
+--------+---------+------+----------------+
| Name 1 | ABC     | 1233 | 1234;1235      |
| Name 2 | DEF     | 1234 | 1233;1236;1237 |
| Name 3 | GHI     | 1235 | 1234;1233;2589 |
+--------+---------+------+----------------+
How do I analyse the linkage between the ID and Linked_To columns? For example, should I turn the Linked_To values into a list and do a VLOOKUP-style analysis against the ID column? I know there must be an obvious way to do this, but I am stumped.
Ideally the end result should be a list or dictionary which has the entire attributes of the row, including all of the other records it's linked to.
Or is this a problem where I should be transforming the data into an SQL database?
For both the unique and non-unique cases, a dictionary mapping each ID to the IDs in its Linked_To field can be obtained via:
def linked_ids(df):
    # map each ID to the unique list of IDs it links to
    links = {}
    # iterate through the rows
    for row in df.index:
        # split the semicolon-delimited Linked_To field
        linked_to = df.loc[row, 'Linked_To'].split(";")
        key = df.loc[row, 'ID']
        if key not in links:
            links[key] = []
        for linked_id in linked_to:
            if linked_id not in links[key]:
                links[key].append(linked_id)
    return links
If you are working with a pandas DataFrame, try this:
df.set_index('ID').Linked_To.str.split(';').to_dict()
Out[142]:
{1233: ['1234', '1235'],
1234: ['1233', '1236', '1237'],
1235: ['1234', '1233', '2589']}
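If you also want the full attributes of each linked record, as the question asks, one possible sketch is to explode the links into one row per pair and merge back on ID; the "_linked" suffix is just an illustrative choice:
# one row per (ID, linked ID) pair
links = df.assign(Linked_To=df["Linked_To"].str.split(";")).explode("Linked_To")
links["Linked_To"] = links["Linked_To"].astype(df["ID"].dtype)

# join back on ID to pull in each linked record's own attributes;
# linked IDs that never appear in the ID column drop out of the inner merge
full = links.merge(df, left_on="Linked_To", right_on="ID",
                   suffixes=("", "_linked"))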

How can I filter exported tickets from database using Django?

I am working on a Django-based web project where we handle ticket requests. I am working on an implementation where I need to export all closed tickets every day.
My ticket table looks like:
-------------------------------------------------
| ID | ticket_number | ticket_data | is_closed |
-------------------------------------------------
| 1  | 123123        | data 1      | 1         |
| 2  | 123124        | data 2      | 1         |
| 3  | 123125        | data 3      | 1         |
| 4  | 123126        | data 4      | 1         |
-------------------------------------------------
And my ticket_exported table looks like:
----------------------------------
| ID | ticket_id | ticket_number |
----------------------------------
| 10 | 1         | 123123        |
| 11 | 2         | 123124        |
----------------------------------
So my question is: when I process the export, is there a way to make a single query that gets all tickets which are closed but whose ticket_id and ticket_number are not in the ticket_exported table? When the function runs it should return the tickets with IDs '3' and '4', because they are not in the ticket_exported table.
I don't want to go through all the tickets and check one by one whether their id exists in the exported tickets table if I can do it in one query, whether that's raw SQL or Django's ORM.
Thanks everyone.
You can do this without an is_exported field:
exported_tickets = TicketsExported.objects.all()
unexported_tickets = Tickets.objects.filter(is_closed=True).exclude(id__in=[et.ticket_id for et in exported_tickets])
but an is_exported field can be useful elsewhere
Per my comment: you could probably save yourself a bunch of trouble and just add another BooleanField, 'is_exported', instead of having a separate model, assuming there aren't fields specific to TicketExported.
#doniyor's answer gets you the queryset you're looking for, though. In response to your raw SQL question: you want unexported_tickets.query.
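And if you do want the raw SQL form, here's a minimal sketch run through Django's cursor; the table and column names are assumptions based on the question:
from django.db import connection

# LEFT JOIN the export table and keep only closed tickets with no match
sql = """
    SELECT t.*
    FROM ticket AS t
    LEFT JOIN ticket_exported AS te ON te.ticket_id = t.id
    WHERE t.is_closed = 1
      AND te.id IS NULL
"""
with connection.cursor() as cursor:
    cursor.execute(sql)
    unexported = cursor.fetchall()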

Select whole row and insert it into table_2

Is it possible to select a whole row from table_1 (without the autoincrement ID) and insert it into another table, table_2, which has the same relational schema as table_1 (the same columns)?
I could do that using, for example, Python, but the table has too many rows for that to be practical.
So this is the example:
table_1:
id | name | age | sex  | degree
1  | Pate | 98  | it   | doc
2  | Ken  | 112 | male | -
table_2:
id | name | age | sex | degree
SQLite3:
INSERT INTO table_2 (SELECT * FROM table_1 WHERE id=2);
RESULT:
table_2:
id | name | age | sex  | degree
1  | Ken  | 112 | male | -
EDIT:
If this is not possible, it could be done including the id, so table_2 would look like:
id | name | age | sex  | degree
2  | Ken  | 112 | male | -
The INSERT statement indeed has a form that inserts each row returned by a SELECT. However, that SELECT is not a subquery, so you have to omit the parentheses around it, and when you're not inserting all columns, you have to specify which columns to use:
INSERT INTO table_2 (name, age, sex, degree)
SELECT name, age, sex, degree
FROM table_1
WHERE id = 2;
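If you'd still rather drive it from Python, here's a minimal sqlite3 sketch of the same statement; the database file name is an assumption:
import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
conn.execute("""
    INSERT INTO table_2 (name, age, sex, degree)
    SELECT name, age, sex, degree
    FROM table_1
    WHERE id = ?
""", (2,))
conn.commit()
conn.close()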

Using if not exists to insert data from tableA to tableB and return insert ID from tableB

So I'm working on this database structure and trying to figure out if this is the best method. I'm pulling records from a 3rd-party site and storing them in a temporary table (tableA). I then check for duplicates in tableB and insert the non-duplicates into tableB from tableA. Is there any way to get the id assigned by tableB each time a record is inserted? Right now I'm looking up the latest records inserted into tableB by date and then retrieving their IDs. Is there a more efficient way?
Is there a reason you're not using INSERT IGNORE? It seems to me that you could do away with the whole temporary-table process...
+----+------+
| id | name |
|----|------|
| 1  | adam |
| 2  | bob  |
| 3  | carl |
+----+------+
If id has a unique constraint, then this:
INSERT IGNORE INTO tableName (id, name) VALUES (3, "carl"), (4, "dave");
...will result in:
+----+------+
| id | name |
|----|------|
| 1  | adam |
| 2  | bob  |
| 3  | carl |
| 4  | dave |
+----+------+
...whereas if you'd just done an INSERT (without the IGNORE part), it would give you a unique key constraint error.
In terms of getting the ID back, just use:
SELECT LAST_INSERT_ID()
...after every INSERT IGNORE call you make.
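From Python, here's a minimal sketch of that flow using mysql-connector-python; the connection details are placeholders, and it assumes tableB has an auto-increment id and a unique constraint on the column used for de-duplication. Checking rowcount first guards against reading a stale id when the row was ignored:
import mysql.connector

conn = mysql.connector.connect(user="user", password="pass", database="mydb")
cur = conn.cursor()

# IGNORE silently skips rows that would violate the unique constraint
cur.execute("INSERT IGNORE INTO tableB (name) VALUES (%s)", ("dave",))
conn.commit()

if cur.rowcount == 1:        # 1 means the row was actually inserted
    new_id = cur.lastrowid   # the auto-increment id assigned by tableB

cur.close()
conn.close()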
It sounds like you want an after-insert trigger on table B: a piece of code that runs after one or more rows are inserted into a table; see the MySQL trigger documentation for details.
The code is something like:
CREATE TRIGGER mytrigger AFTER INSERT ON B
FOR EACH ROW
BEGIN
    -- Do what you want with each row here
END;
