MySQL query combining several tables - python

Background
In order to obtain data for my thesis I have to work with a large, fairly
complicated MySQL database, containing several tables and hundreds of GBs of
data. Unfortunately, I am new to SQL, and can't really figure out how to
extract the data that I need.
Database
The database consists of several tables that I want to combine. Here are the
relevant parts of it:
> show tables;
+---------------------------+
| Tables_in_database |
+---------------------------+
| Build |
| Build_has_ModuleRevisions |
| Configuration |
| ModuleRevisions |
| Modules |
| Product |
| TestCase |
| TestCaseResult |
+---------------------------+
The tables are linked together in the following manner:
Product ---(1:n)--> Configuration ---(1:n)--> Build
Build ---(1:n)--> Build_has_ModuleRevisions ---(n:1)--> ModuleRevisions ---(n:1)--> Modules
Build ---(1:n)--> TestCaseResult ---(n:1)--> TestCase
The contents of the tables are
> describe Product;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | UNI | NULL | |
+---------+--------------+------+-----+---------+----------------+
> describe Configuration;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Product_id | int(11) | YES | MUL | NULL | |
| name | varchar(255) | NO | UNI | NULL | |
+------------+--------------+------+-----+---------+----------------+
> describe Build;
+------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Configuration_id | int(11) | NO | MUL | NULL | |
| build_number | int(11) | NO | MUL | NULL | |
| build_id | varchar(32) | NO | MUL | NULL | |
| test_status | varchar(255) | NO | | | |
| start_time | datetime | YES | MUL | NULL | |
| end_time | datetime | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+----------------+
> describe Build_has_ModuleRevisions;
+-------------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Build_id | int(11) | NO | MUL | NULL | |
| ModuleRevision_id | int(11) | NO | MUL | NULL | |
+-------------------+----------+------+-----+---------+----------------+
> describe ModuleRevisions;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Module_id | int(11) | NO | MUL | NULL | |
| tag | varchar(255) | NO | MUL | | |
| revision | varchar(255) | NO | MUL | | |
+-----------+--------------+------+-----+---------+----------------+
> describe Modules;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | UNI | NULL | |
+---------+--------------+------+-----+---------+----------------+
> describe TestCase;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| TestSuite_id | int(11) | NO | MUL | NULL | |
| classname | varchar(255) | NO | MUL | NULL | |
| name | varchar(255) | NO | MUL | NULL | |
| testtype | varchar(255) | NO | MUL | NULL | |
+--------------+--------------+------+-----+---------+----------------+
> describe TestCaseResult;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Build_id | int(11) | NO | MUL | NULL | |
| TestCase_id | int(11) | NO | MUL | NULL | |
| status | varchar(255) | NO | MUL | NULL | |
| start_time | datetime | YES | MUL | NULL | |
| end_time | datetime | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
As you can see the tables are linked with *_id fields. E.g. TestCaseResult
is linked to a Build by the Build_id field, and to a TestCase by the
TestCase_id field.
Problem Description
Now to my problem. Given a specific Configuration.name and Product.name as
input, I need to find all modules+revisions and failed testcases, for every
Build, sorted by Build.start_time.
What I Have Tried
The following query gives me all the Builds given a Configuration.name of
config1 and a Product.name of product1
SELECT
*
FROM
`database`.`Build` AS b
JOIN
Configuration AS c ON c.id = b.Configuration_id
JOIN
Product as p ON p.id = c.Product_id
WHERE
c.name = 'config1'
AND p.name = 'product1'
ORDER BY b.start_time;
This doesn't even solve half of my problem, though. Now, for every build I
need to:
- Find all Modules linked to the Build
  - Extract the Modules.name field
  - Extract the ModuleRevisions.revision field
- Find all TestCases linked to the Build
  - Where TestCaseResult.status = 'failure'
  - Extract the TestCase.name field linked to the TestCaseResult
- Associate the Build with the extracted module name+revisions and testcase
  names
- Present the data ordered by Build.start_time so that I can perform
  analyses on it.
In other words, of all the data available, I am only interested in linking the
fields Modules.name, ModuleRevisions.revision, TestCaseResult.status, and
TestCase.name to a particular Build, ordering the result by Build.start_time,
and then passing it to a Python program I have written.
The end result should be something similar to
Build Build.start_time Modules+Revisions Failed tests
1 20140301 [(mod1, rev1), (mod2... etc] [test1, test2, ...]
2 20140401 [(mod1, rev2), (mod2... etc] [test1, test2, ...]
3 20140402 [(mod3, rev1), (mod2... etc] [test1, test2, ...]
4 20140403 [(mod1, rev3), (mod2... etc] [test1, test2, ...]
5 20140505 [(mod5, rev2), (mod2... etc] [test1, test2, ...]
My question
Is there a good (and preferably efficient) SQL query that can extract and
present the data that I need?
If not, I am totally okay with extracting one or several supersets/subsets of
the data in order to parse it with Python if necessary. But how do I extract
the desired data?

It looks to me like you'd need more than one query for this. The problem is that the relationships Build <-> ModuleRevisions and Build <-> TestCaseResult are basically independent: as far as the schema is concerned, ModuleRevisions and TestCaseResult have nothing to do with each other, so you have to query for one and then the other.
You can't get them both in one query because each row in your results basically represents one record of the "deepest" related table (in this case, either ModuleRevisions or TestCaseResult), including any related information from its parent tables. Joining both into a single query would pair every module revision of a build with every failed test of that build. Therefore, I think you'd need something like the following:
SELECT
M.name, MR.revision, B.id
FROM
ModuleRevisions MR
INNER JOIN
Modules M ON MR.Module_id = M.id
INNER JOIN
Build_has_ModuleRevisions BHMR ON MR.id = BHMR.ModuleRevision_id
INNER JOIN
Build B ON BHMR.Build_id = B.id
INNER JOIN
Configuration C ON B.Configuration_id = C.id
INNER JOIN
Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1'
ORDER BY B.start_time;
SELECT
TCR.status, TC.name, B.id
FROM
TestCaseResult TCR
INNER JOIN
TestCase TC ON TCR.TestCase_id = TC.id
INNER JOIN
Build B ON TCR.Build_id = B.id
INNER JOIN
Configuration C ON B.Configuration_id = C.id
INNER JOIN
Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1' and TCR.status = 'failure'
ORDER BY B.start_time;
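Since the two result sets are independent, they can be stitched together per build on the Python side. The following is only a minimal sketch and not part of the original answer: the function name `combine_by_build` is hypothetical, and `build_rows` is assumed to come from a query like the one in the question, selecting `b.id` and `b.start_time` ordered by `start_time`. The other two inputs mirror the column order of the two queries above.

```python
from collections import OrderedDict


def combine_by_build(build_rows, module_rows, failure_rows):
    """Group module revisions and failed test names under each build.

    build_rows:   iterable of (build_id, start_time), already ordered by start_time
    module_rows:  iterable of (module_name, revision, build_id)
    failure_rows: iterable of (status, test_name, build_id)
    """
    builds = OrderedDict()
    for build_id, start_time in build_rows:
        builds[build_id] = {"start_time": start_time,
                            "modules": [],
                            "failed_tests": []}
    for module_name, revision, build_id in module_rows:
        if build_id in builds:
            builds[build_id]["modules"].append((module_name, revision))
    for _status, test_name, build_id in failure_rows:
        if build_id in builds:
            builds[build_id]["failed_tests"].append(test_name)
    return builds


# Made-up rows, shaped like the output of the queries.
example = combine_by_build(
    [(1, "2014-03-01"), (2, "2014-04-01")],
    [("mod1", "rev1", 1), ("mod2", "rev1", 1), ("mod1", "rev2", 2)],
    [("failure", "test1", 1), ("failure", "test2", 2), ("failure", "test3", 2)],
)
```

Iterating over `example.items()` then yields one entry per build in `start_time` order, which is close to the layout sketched in the question.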

Related

SQLAlchemy: has-many relation that returns a particular column

I have the following tables:
Campaigns
+----------------------------+-------------------------------------------------------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+-------------------------------------------------------------------+------+-----+-------------------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| campaign_type_id | int(11)
+----------------------------+-------------------------------------------------------------------+------+-----+-------------------+----------------+
CampaignsSiteList
+--------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| campaign_id | int(11) | NO | PRI | NULL | |
| site_list_id | int(11) | NO | PRI | NULL | |
+--------------+------------+------+-----+---------+-------+
I'm using SQLAlchemy and I want to create a relationship so that objects of class Campaign have an attribute that returns the list of site_list_id values associated with them. I don't want the relation to return the list of CampaignSiteList objects, but a list that contains the column site_list_id of CampaignsSiteList.
You could just use a property on the class and pull them out yourself, something like:
class Campaigns():
    # column definitions here
    sites = relationship("CampaignSiteList", lazy="joined")

    @property
    def site_ids(self):
        # CampaignsSiteList has no id column; return its site_list_id values
        return [d.site_list_id for d in self.sites]
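To see the property pattern in isolation, here is a small sketch that runs without a database. It is an illustration only: the two classes are plain-Python stand-ins (not real SQLAlchemy models), and `sites` is an ordinary list playing the role of the relationship attribute.

```python
class CampaignSiteList(object):
    """Stand-in for one mapped row of the association table (hypothetical)."""

    def __init__(self, campaign_id, site_list_id):
        self.campaign_id = campaign_id
        self.site_list_id = site_list_id


class Campaign(object):
    """Stand-in for the mapped Campaign class (hypothetical)."""

    def __init__(self, id, sites):
        self.id = id
        # In a real model this would be the relationship() attribute.
        self.sites = sites

    @property
    def site_ids(self):
        # Project the one column of interest out of the related rows.
        return [site.site_list_id for site in self.sites]


campaign = Campaign(1, [CampaignSiteList(1, 10), CampaignSiteList(1, 20)])
ids = campaign.site_ids
```

SQLAlchemy also ships an `association_proxy` extension aimed at this kind of column projection; whether it fits depends on how the models are mapped.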

MySQL Encoding 4 byte in 3 byte utf-8 - Incorrect string value

According to the MySQL documentation, the utf8 character set supports only up to 3-byte UTF-8 encoding.
My question is, how can I replace characters that require 4 byte utf-8 encoding in my database? And how do I decode those characters in order to display exactly what the user wrote?
Part of the integration test:
description = u'baaam á ✓ ✌ ❤'
print description
test_convention = Blog.objects.create(title="test title",
description=description,
login=self.user,
tag=self.tag)
Error:
Creating test database for alias 'default'...
baaam á ✓ ✌ ❤
E..
======================================================================
ERROR: test_post_blog (blogs.tests.PostTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/admin/Developer/project/pro/blogs/tests.py", line 64, in test_post_blog
tag=self.tag)
File "build/bdist.macosx-10.9-intel/egg/MySQLdb/cursors.py", line 201, in execute
self.errorhandler(self, exc, value)
File "build/bdist.macosx-10.9-intel/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
DatabaseError: (1366, "Incorrect string value: '\\xE2\\x9C\\x93 \\xE2\\x9C...' for column 'description' at row 1")
----------------------------------------------------------------------
Ran 3 tests in 1.383s
FAILED (errors=1)
Destroying test database for alias 'default'...
Table's configuration:
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| Name | Engine | Version | Collation | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | utf8_general_ci | Compact | 25 | 1966 | 49152 | 0 | 32768 | 0 | 35 | 2014-02-09 00:57:59 | NULL | NULL | NULL | | |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
Update: I have already changed the table and column configurations from utf8 to utf8mb4 and am still getting the same error. Any ideas?
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | Compact | 5 | 3276 | 16384 | 0 | 32768 | 0 | 36 | 2014-02-17 22:24:18 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
and:
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| title | varchar(500) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |
| description | longtext | utf8mb4_general_ci | YES | | NULL | | select,insert,update,references | |
| creation_date | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| login_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
| tag_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
It is supported, but not as utf8. Add the following to the [mysqld] section of my.cnf:
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
When creating a database, use:
CREATE DATABASE xxxxx DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
At the end of a CREATE TABLE command, add:
ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8mb4;
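For the "how can I replace those characters" part of the question, a fallback when a column cannot be converted to utf8mb4 is to substitute every character that needs 4 bytes before saving. This is a sketch under that assumption, not something from the answer above; the function name `replace_four_byte_chars` and the choice of U+FFFD as the replacement are hypothetical.

```python
import re

# Code points above U+FFFF (outside the Basic Multilingual Plane)
# are exactly the ones that need 4 bytes in UTF-8.
_FOUR_BYTE = re.compile(u"[\U00010000-\U0010FFFF]")


def replace_four_byte_chars(text, replacement=u"\ufffd"):
    """Return text with every 4-byte UTF-8 character replaced."""
    return _FOUR_BYTE.sub(replacement, text)


cleaned = replace_four_byte_chars(u"baaam \u2713 \U0001F600")
```

Note that the replacement is lossy, so the original text cannot be restored afterwards. Also, the characters in the test string from the question (✓ ✌ ❤) each take 3 bytes in UTF-8, as the `\xE2\x9C\x93` in the error message shows, so this function leaves them untouched.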

Importing csv file in mysql

I have a database with tables: person, player, coach, and team. All the tables have an auto-increment id field as the primary key. Person has id, firstname, lastname. Player and coach both have the id field, as well as person_id and team_id as foreign keys to tie them to a team.id or person.id field in the other tables.
I have one master CSV file, and from it I want to import all the values into different MySQL tables, with ids.
I also want to check whether each value already exists in the database. If it does, that value should not be imported.
I have tried CSV parsing and indexing functions, but I have not been able to get it working. Can anyone help me with this?
My SQL tables are below:
mysql> describe person;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| firstname | varchar(30) | NO | | NULL | |
| lastname | varchar(30) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
mysql> describe player;
+-----------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| person_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | MUL | NULL | |
+-----------+---------+------+-----+---------+----------------+
mysql> describe team;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| teamname | varchar(25) | NO | | NULL | |
| location | varchar(40) | NO | | NULL | |
| city | varchar(25) | NO | | NULL | |
| state | varchar(2) | NO | | NULL | |
| venue | varchar(35) | NO | | NULL | |
| league_id | int(11) | NO | MUL | NULL | |
+-----------+-------------+------+-----+---------+----------------+
My Csv file is
First Name Last Name teamname Location city state venue
abc cdf india csv bng kar abc
After importing
id First Name Last Name teamname Location city state venue comment
1 1 1 1 1 1 1 abc abc
This is the code I have tried so far:
import csv
import MySQLdb

# initialize with empty lists
name, cities, countries, states = [], [], [], []
with open('ind.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip header
    for row in reader:
        name.append(row[0])
        cities.append(row[2])
        states.append(row[3])
        countries.append(row[4])

# de-duplicate each column; the position in the list doubles as the id
cl = list(set(countries))
sl = list(set(states))
citl = list(set(cities))
inf1 = list(set(name))

with open('countries.csv', 'w') as cfile:
    writer = csv.writer(cfile, delimiter=',')
    writer.writerow(['country_id', 'name'])
    for i, x in enumerate(cl):
        writer.writerow([i, x])

with open('state.csv', 'w') as cfile:
    writer = csv.writer(cfile, delimiter=',')
    writer.writerow(['state_id', 'country_id', 'state'])
    for i, x in enumerate(sl):
        # column order matches the header: id, country id, state name
        writer.writerow([i, cl.index(countries[states.index(x)]), x])

with open('cities.csv', 'w') as cfile:
    writer = csv.writer(cfile, delimiter=',')
    writer.writerow(['city_id', 'city', 'st_id', 'country_id'])
    for i, x in enumerate(citl):
        writer.writerow([i, x,
                         sl.index(states[cities.index(x)]),
                         cl.index(countries[cities.index(x)])])

with open('inf123.csv', 'w') as cfile:
    writer = csv.writer(cfile, delimiter=',')
    writer.writerow(['Name_id', 'Name', 'city_id', 'st_id', 'country_id'])
    for i, x in enumerate(inf1):
        writer.writerow([i, x,
                         citl.index(cities[name.index(x)]),
                         sl.index(states[name.index(x)]),
                         cl.index(countries[name.index(x)])])

mydb = MySQLdb.connect(host="localhost",  # the host
                       user="root",       # username
                       passwd="root",     # password
                       db="abcm")         # name of the database

cursor = mydb.cursor()
csv_data = csv.reader(open('countries.csv'))
next(csv_data)  # skip the header row
for row in csv_data:
    # placeholders are left unquoted; the driver quotes the values itself
    cursor.execute('INSERT INTO country (id, name) VALUES (%s, %s)', row)
# commit the inserts and close the cursor
mydb.commit()
cursor.close()
print "Done"

cursor = mydb.cursor()
csv_data = csv.reader(open('state.csv'))
next(csv_data)  # skip the header row
for row in csv_data:
    cursor.execute('INSERT INTO state (id, country, name) '
                   'VALUES (%s, %s, %s)', row)
# commit the inserts and close the cursor
mydb.commit()
cursor.close()
print "Done"
"I have one master csv file, from that I want import all the values in
MySql different tables with ids."
This is not possible in a single step, because the import routine doesn't know where you want to put the data.
If your master CSV file contained a column holding the table name, you could then:
- import your CSV file into a temporary table
- use different SQL statements to move the data into the appropriate tables
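The staging-table idea can be sketched as follows. This is an illustration under several assumptions, not a drop-in solution: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs anywhere, the schema is simplified (`league_id` is omitted), a person is assumed to be identified by first and last name and a team by `teamname`, and the function names are hypothetical. The SQL may need adjusting for MySQL.

```python
import csv
import io
import sqlite3

# Sample master file; header and row follow the example in the question.
CSV_TEXT = """First Name,Last Name,teamname,Location,city,state,venue
abc,cdf,india,csv,bng,kar,abc
"""


def make_schema(conn):
    # Simplified stand-ins for the person/team/player tables.
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             firstname TEXT NOT NULL, lastname TEXT NOT NULL);
        CREATE TABLE team (id INTEGER PRIMARY KEY AUTOINCREMENT,
                           teamname TEXT NOT NULL, location TEXT NOT NULL,
                           city TEXT NOT NULL, state TEXT NOT NULL,
                           venue TEXT NOT NULL);
        CREATE TABLE player (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             person_id INTEGER NOT NULL,
                             team_id INTEGER NOT NULL);
    """)


def import_master_csv(conn, csv_file):
    cur = conn.cursor()
    # Step 1: load the raw rows into a temporary staging table.
    cur.execute("""CREATE TEMP TABLE staging
                   (firstname, lastname, teamname, location, city, state, venue)""")
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?, ?, ?)", reader)
    # Step 2: move the data, skipping values that already exist.
    cur.execute("""
        INSERT INTO person (firstname, lastname)
        SELECT DISTINCT s.firstname, s.lastname FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM person p
                          WHERE p.firstname = s.firstname
                            AND p.lastname = s.lastname)""")
    cur.execute("""
        INSERT INTO team (teamname, location, city, state, venue)
        SELECT DISTINCT s.teamname, s.location, s.city, s.state, s.venue
        FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM team t WHERE t.teamname = s.teamname)""")
    cur.execute("""
        INSERT INTO player (person_id, team_id)
        SELECT DISTINCT p.id, t.id FROM staging s
        JOIN person p ON p.firstname = s.firstname AND p.lastname = s.lastname
        JOIN team t ON t.teamname = s.teamname
        WHERE NOT EXISTS (SELECT 1 FROM player pl
                          WHERE pl.person_id = p.id AND pl.team_id = t.id)""")
    cur.execute("DROP TABLE staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
make_schema(conn)
import_master_csv(conn, io.StringIO(CSV_TEXT))
import_master_csv(conn, io.StringIO(CSV_TEXT))  # second run adds nothing new
```

Running the import twice on the same file leaves a single row in each table, which is the "do not import values that already exist" behaviour asked for.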

Create a graph using mysql

So, there is this interesting problem in front of me. I have two tables: one stores information about users and the web profiles they host, and the other stores the profiles mentioned on those websites. For example, if www.xyz.com is mentioned on www.abc.com, then abc.com will be part of the
source table:
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| website | varchar(255) | YES | | NULL | |
| user_id | varchar(25) | YES | MUL | NULL | |
| web_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+----------------+
and the mention table will hold the mentioned entries (like xyz.com above):
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| web_link | varchar(255) | YES | | NULL | |
| user_id | varchar(25) | YES | MUL | NULL | |
| web_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+----------------+
user_id is a foreign key in both tables. Now I want to generate a node-based graph: I select source.web_name and mention.web_name and assign them ids such that every name gets a unique id, e.g. 0 -> 1, bearing in mind that there can also be 1 -> 0.
I want to know the best way to achieve this: should I change the schema, or can it be done with selects from Python? I am not able to figure out how to give unique ids to both source.web_name and mention.web_name when they reside in different tables.
If I have understood this correctly, you have web pages that can contain urls to other webpages and you want to model the references.
You can create a table of all web pages and a table of references:
web_pages table:
id
website
etc.
and references or "mentions" table:
source_id (refers to id in web pages table)
target_id (also refers to id in web pages table)
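If the schema stays as it is, the same idea can be applied on the Python side: keep one shared name-to-id mapping for both tables (the web pages), and record each reference as a (source_id, target_id) pair. This is a minimal sketch; the function name `build_graph` is hypothetical, and it assumes a mention belongs to a source when both rows share the same `user_id`, which the question implies but does not state outright.

```python
def build_graph(source_rows, mention_rows):
    """Assign one unique id per web_name and collect directed edges.

    source_rows / mention_rows: iterables of (user_id, web_name).
    Returns (node_ids, edges) where edges is a sorted list of
    (source_id, target_id) pairs.
    """
    node_ids = {}

    def node_id(web_name):
        # The same name gets the same id no matter which table it came from.
        if web_name not in node_ids:
            node_ids[web_name] = len(node_ids)
        return node_ids[web_name]

    sources_by_user = {}
    for user_id, web_name in source_rows:
        sources_by_user.setdefault(user_id, []).append(node_id(web_name))

    edges = set()
    for user_id, web_name in mention_rows:
        target = node_id(web_name)
        for source in sources_by_user.get(user_id, []):
            edges.add((source, target))
    return node_ids, sorted(edges)


# Made-up rows: abc.com mentions xyz.com, and xyz.com mentions abc.com.
node_ids, edges = build_graph(
    [("u1", "abc.com"), ("u2", "xyz.com")],
    [("u1", "xyz.com"), ("u2", "abc.com")],
)
```

Here `node_ids` plays the role of the web_pages table and `edges` the role of the references table, so both 0 -> 1 and 1 -> 0 can coexist.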

Python MySQLdb: combine 2 tables,select data as dict. I am confused about the dict's key

I have two tables, user and post,
and their structures are:
post:
+---------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | char(30) | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
+---------+----------+------+-----+---------+----------------+
user:
+---------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | auto_increment |
| name | char(30) | YES | | NULL | |
| email | char(30) | YES | | NULL | |
+---------+----------+------+-----+---------+----------------+
I get this (the keys of the data dict):
['post.user_id', 'user_id', 'name', 'email', 'post.name', 'id']
my python code is:
import MySQLdb
import MySQLdb.cursors
con = MySQLdb.connect(user = "root", passwd = "123456", db = "mydb", cursorclass=MySQLdb.cursors.DictCursor)
cur = con.cursor()
cur.execute("select * from user, post where user.user_id = post.user_id")
print cur.fetchone().keys()
But why are the keys of the data dict like that? Thanks. My English is not so good, excuse me.
When you select *, you ask for all columns in both user and post. Since user and post have columns with overlapping names, the tablename is added before a few of them, to create unique keys.
I'm not sure what you were expecting, but you can explicitly control the keys you get by giving the columns aliases:
"select user.user_id as user_id, post.name as post_name, user.name as user_name ..."

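The effect of aliasing can be tried out without a MySQL server. The sketch below uses the standard-library `sqlite3` module with `sqlite3.Row` as a stand-in for MySQLdb's DictCursor, so how unaliased duplicate column names are reported will differ from MySQLdb; the point is only that explicit aliases give predictable keys. The sample rows and alias names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
    INSERT INTO user VALUES (1, 'alice', 'alice@example.com');
    INSERT INTO post VALUES (10, 'first post', 1);
""")

# Every selected column gets an explicit, unique alias.
row = conn.execute("""
    SELECT user.user_id AS user_id,
           user.name    AS user_name,
           user.email   AS email,
           post.id      AS post_id,
           post.name    AS post_name
    FROM user JOIN post ON user.user_id = post.user_id
""").fetchone()
record = dict(row)
```

With the aliases in place, the keys of `record` are exactly the names chosen in the query, regardless of how the two tables name their columns.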