SQLAlchemy - pretty print SQL query results - python

In Ruby console, it is possible to display SQL query results in a very human-friendly way (ActiveRecord + Hirb):
>> Tag.all :limit=>3, :order=>"id DESC"
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| id  | created_at              | description | name              | namespace | predicate | value    |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| 907 | 2009-03-06 21:10:41 UTC |             | gem:tags=yaml     | gem       | tags      | yaml     |
| 906 | 2009-03-06 08:47:04 UTC |             | gem:tags=nomonkey | gem       | tags      | nomonkey |
| 905 | 2009-03-04 00:30:10 UTC |             | article:tags=ruby | article   | tags      | ruby     |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
3 rows in set
Is there a module that will allow me to display SQLAlchemy result sets in a similar way in IPython?
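There are packages that do this (prettytable, tabulate), but as a minimal stdlib sketch of the idea: given the header names and row tuples you get back from a SQLAlchemy result, rendering a Hirb-style grid is just column-width arithmetic. The function name format_table and the sample rows here are illustrative, not from any package.

```python
# Minimal stdlib sketch: render rows (e.g. from session.execute(...).fetchall())
# as an ASCII grid similar to Hirb's output.

def format_table(headers, rows):
    """Return an ASCII table for a list of header names and row tuples."""
    # Stringify everything, then compute the widest cell per column.
    cells = [list(headers)] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    sep = "+-" + "-+-".join("-" * w for w in widths) + "-+"

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    out = [sep, line(headers), sep]
    out += [line(r) for r in cells[1:]]
    out.append(sep)
    return "\n".join(out)

print(format_table(["id", "name"],
                   [(907, "gem:tags=yaml"), (906, "gem:tags=nomonkey")]))
```

Packages like prettytable add alignment options, max-width truncation, and HTML output on top of the same idea.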

Related

How to enable connection pooling in Flask-SQLAlchemy?

I'm using Flask-SQLAlchemy with a MySQL database, and recently I started getting this error:
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1040, 'Too many connections')
After some digging, it seems that I'm not using the connection pooling.
Based on my research, SQLALCHEMY_POOL_SIZE is deprecated as of Flask-SQLAlchemy 2.4 and shouldn't be used anymore.
SQLAlchemy==1.3.7
Flask-SQLAlchemy==2.4.1
So what is the correct way of doing it?
Config:
SQLALCHEMY_DATABASE_URI = 'mysql://root:PASSWORD@localhost/main_db'
SQLALCHEMY_BINDS = {
    'radius': 'mysql://root:PASSWORD@localhost/radius_db',
    'cache': 'mysql://root:PASSWORD@localhost/cache_db',
}
Code:
def make_app():
    app = Flask(__name__, template_folder="../templates")
    app.config.from_object(config)
    db.init_app(app)
    return app

app = my_fabric.make_app()
According to SQLAlchemy, I'm supposed to do this via create_engine:
engine = create_engine("mysql+pymysql://user:pw@host/db", pool_size=20, max_overflow=0)
But Flask-SQLAlchemy is supposed to abstract this away, so I don't know how this should be configured.
UPDATE:
I'm running uWSGI with two processes.
I have now increased max_connections to 500. It's hard to say whether I have high traffic, but my database statistics after 16 hours show this:
mysql> show status like '%onn%';
+-------------------------------------------------------+---------------------+
| Variable_name                                         | Value               |
+-------------------------------------------------------+---------------------+
| Aborted_connects                                      | 5                   |
| Connection_errors_accept                              | 0                   |
| Connection_errors_internal                            | 0                   |
| Connection_errors_max_connections                     | 0                   |
| Connection_errors_peer_address                        | 0                   |
| Connection_errors_select                              | 0                   |
| Connection_errors_tcpwrap                             | 0                   |
| Connections                                           | 3897                |
| Locked_connects                                       | 0                   |
| Max_used_connections                                  | 167                 |
| Max_used_connections_time                             | 2019-11-29 00:11:51 |
| Mysqlx_connection_accept_errors                       | 0                   |
| Mysqlx_connection_errors                              | 0                   |
| Mysqlx_connections_accepted                           | 0                   |
| Mysqlx_connections_closed                             | 0                   |
| Mysqlx_connections_rejected                           | 0                   |
| Performance_schema_session_connect_attrs_longest_seen | 117                 |
| Performance_schema_session_connect_attrs_lost         | 0                   |
| Ssl_client_connects                                   | 0                   |
| Ssl_connect_renegotiates                              | 0                   |
| Ssl_finished_connects                                 | 0                   |
| Threads_connected                                     | 97                  |
+-------------------------------------------------------+---------------------+
AND
mysql> SHOW STATUS WHERE variable_name LIKE "Threads_%" OR variable_name = "Connections";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Connections       | 3896  |
| Threads_cached    | 8     |
| Threads_connected | 97    |
| Threads_created   | 365   |
| Threads_running   | 2     |
+-------------------+-------+
AND
mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 500   |
+-----------------+-------+
1 row in set (0.01 sec)
You can use sessions instead of connections:
my_session = Session(engine)
results = my_session.execute(query)
my_session.close()
and when creating the engine you can set pool_recycle=60 (or a little higher): https://docs.sqlalchemy.org/en/13/core/pooling.html#pool-setting-recycle
Not saying this will solve your issues entirely, but I've rarely encountered problems using this setup.
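For the Flask-SQLAlchemy side specifically: in Flask-SQLAlchemy 2.4 the deprecated per-option keys were replaced by a single SQLALCHEMY_ENGINE_OPTIONS config dict, which the extension passes through to sqlalchemy.create_engine() for every engine it creates. A sketch of such a config; the specific values below are illustrative, not recommendations:

```python
# Flask config: this dict is forwarded to create_engine(), so any
# pooling keyword create_engine() accepts can go here.
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 20,        # connections kept open in the pool
    "max_overflow": 5,      # extra connections allowed under load
    "pool_recycle": 3600,   # recycle connections older than an hour
    "pool_pre_ping": True,  # test a connection before handing it out
}
```

With app.config.from_object(config), this key would simply live alongside SQLALCHEMY_DATABASE_URI and SQLALCHEMY_BINDS.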

How to join two tables in PySpark with two conditions in an optimal way

I have the following two tables in PySpark:
Table A - dfA
| ip_4          | ip           |
|---------------|--------------|
| 10.10.10.25   | 168430105    |
| 10.11.25.60   | 168499516    |
And table B - dfB
| net_cidr      | net_ip_first_4 | net_ip_last_4  | net_ip_first | net_ip_last |
|---------------|----------------|----------------|--------------|-------------|
| 10.10.10.0/24 | 10.10.10.0     | 10.10.10.255   | 168430080    | 168430335   |
| 10.10.11.0/24 | 10.10.11.0     | 10.10.11.255   | 168430336    | 168430591   |
| 10.11.0.0/16  | 10.11.0.0      | 10.11.255.255  | 168493056    | 168558591   |
I have joined both tables in PySpark using the following command:
dfJoined = dfB.alias('b').join(F.broadcast(dfA).alias('a'),
                               (F.col('a.ip') >= F.col('b.net_ip_first')) &
                               (F.col('a.ip') <= F.col('b.net_ip_last')),
                               how='right').select('a.*', 'b.*')
So I obtain:
| ip            | net_cidr      | net_ip_first_4 | net_ip_last_4| ...
|---------------|---------------|----------------|--------------| ...
| 10.10.10.25   | 10.10.10.0/24 | 10.10.10.0     | 10.10.10.255 | ...
| 10.11.25.60   | 10.10.11.0/24 | 10.10.11.0     | 10.10.11.255 | ...
The size of the tables makes this option suboptimal because of the two range conditions; I had thought of sorting table B so that the join only needs one condition.
Is there any way to limit the join and take only the first record that matches the join condition? Or some way to make the join in an optimal way?
Table A (number of records) << Table B (number of records)
Thank you!
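One observation, sketched in plain Python rather than Spark: if the CIDR ranges in table B are non-overlapping (an assumption; nested CIDRs such as a /16 containing a /24 would violate it), then the per-IP range test the join performs reduces to a single binary search over ranges sorted by net_ip_first. This is the lookup an optimized range join would approximate; the data below mirrors the sample tables.

```python
import bisect

# Sorted, non-overlapping (net_ip_first, net_ip_last, net_cidr) ranges from table B.
ranges = [
    (168430080, 168430335, "10.10.10.0/24"),
    (168430336, 168430591, "10.10.11.0/24"),
    (168493056, 168558591, "10.11.0.0/16"),
]
firsts = [r[0] for r in ranges]

def lookup(ip):
    """Return the CIDR whose range contains ip, or None."""
    # Rightmost range whose start is <= ip, then check its end.
    i = bisect.bisect_right(firsts, ip) - 1
    if i >= 0 and ranges[i][0] <= ip <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup(168430105))  # 10.10.10.25 -> 10.10.10.0/24
```

Since table A is much smaller than B, another angle is to keep the broadcast join but deduplicate afterwards so each IP keeps only one matching row.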

MySQL query combining several tables

Background
In order to obtain data for my thesis I have to work with a large, fairly
complicated MySQL database, containing several tables and hundreds of GBs of
data. Unfortunately, I am new to SQL, and can't really figure out how to
extract the data that I need.
Database
The database consists of several tables that I want to combine. Here are the
relevant parts of it:
> show tables;
+---------------------------+
| Tables_in_database        |
+---------------------------+
| Build                     |
| Build_has_ModuleRevisions |
| Configuration             |
| ModuleRevisions           |
| Modules                   |
| Product                   |
| TestCase                  |
| TestCaseResult            |
+---------------------------+
The tables are linked together in the following manner
Product ---(1:n)--> Configurations ---(1:n)--> Build
Build ---(1:n)--> Build_has_ModuleRevisions ---(n:1)--> ModuleRevision ---(n:1)--> Modules
Build ---(1:n)--> TestCaseResult ---(n:1)--> TestCase
The contents of the tables are
> describe Product;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255) | NO   | UNI | NULL    |                |
+---------+--------------+------+-----+---------+----------------+
> describe Configuration;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| Product_id | int(11)      | YES  | MUL | NULL    |                |
| name       | varchar(255) | NO   | UNI | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
> describe Build;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| Configuration_id | int(11)      | NO   | MUL | NULL    |                |
| build_number     | int(11)      | NO   | MUL | NULL    |                |
| build_id         | varchar(32)  | NO   | MUL | NULL    |                |
| test_status      | varchar(255) | NO   |     |         |                |
| start_time       | datetime     | YES  | MUL | NULL    |                |
| end_time         | datetime     | YES  | MUL | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
> describe Build_has_ModuleRevisions;
+-------------------+----------+------+-----+---------+----------------+
| Field             | Type     | Null | Key | Default | Extra          |
+-------------------+----------+------+-----+---------+----------------+
| id                | int(11)  | NO   | PRI | NULL    | auto_increment |
| Build_id          | int(11)  | NO   | MUL | NULL    |                |
| ModuleRevision_id | int(11)  | NO   | MUL | NULL    |                |
+-------------------+----------+------+-----+---------+----------------+
> describe ModuleRevisions;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| Module_id | int(11)      | NO   | MUL | NULL    |                |
| tag       | varchar(255) | NO   | MUL |         |                |
| revision  | varchar(255) | NO   | MUL |         |                |
+-----------+--------------+------+-----+---------+----------------+
> describe Modules;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255) | NO   | UNI | NULL    |                |
+---------+--------------+------+-----+---------+----------------+
> describe TestCase;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| TestSuite_id | int(11)      | NO   | MUL | NULL    |                |
| classname    | varchar(255) | NO   | MUL | NULL    |                |
| name         | varchar(255) | NO   | MUL | NULL    |                |
| testtype     | varchar(255) | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+
> describe TestCaseResult;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| Build_id    | int(11)      | NO   | MUL | NULL    |                |
| TestCase_id | int(11)      | NO   | MUL | NULL    |                |
| status      | varchar(255) | NO   | MUL | NULL    |                |
| start_time  | datetime     | YES  | MUL | NULL    |                |
| end_time    | datetime     | YES  | MUL | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
As you can see the tables are linked with *_id fields. E.g. TestCaseResult
is linked to a Build by the Build_id field, and to a TestCase by the
TestCase_id field.
Problem Description
Now to my problem. Given a specific Configuration.name and Product.name as
input, I need to find all modules+revisions and failed testcases, for every
Build, sorted by Build.start_time.
What I Have Tried
The following query gives me all the Builds given a Configuration.name of
config1 and a Product.name of product1
SELECT
    *
FROM
    `database`.`Build` AS b
JOIN
    Configuration AS c ON c.id = b.Configuration_id
JOIN
    Product AS p ON p.id = c.Product_id
WHERE
    c.name = 'config1'
    AND p.name = 'product1'
ORDER BY b.start_time;
This doesn't even solve half of my problem, though. Now, for every build I need to:
1. Find all Modules linked to the Build
2. Extract the Modules.name field
3. Extract the ModuleRevision.revision field
4. Find all TestCases linked to the Build where TestCaseResult.status = 'failure'
5. Extract the TestCase.name field linked to the TestCaseResult
6. Associate the Build with the extracted module names+revisions and testcase names
7. Present the data ordered by Build.start_time so that I can perform analyses on it.
In other words, of all the data available, I am only interested in linking the fields Modules.name, ModuleRevision.revision, TestCaseResult.status, and TestCase.name to a particular Build, ordering this by Build.start_time, and then outputting this to a Python program I have written.
The end result should be something similar to
Build  Build.start_time  Modules+Revisions             Failed tests
1      20140301          [(mod1, rev1), (mod2... etc]  [test1, test2, ...]
2      20140401          [(mod1, rev2), (mod2... etc]  [test1, test2, ...]
3      20140402          [(mod3, rev1), (mod2... etc]  [test1, test2, ...]
4      20140403          [(mod1, rev3), (mod2... etc]  [test1, test2, ...]
5      20140505          [(mod5, rev2), (mod2... etc]  [test1, test2, ...]
My question
Is there a good (and preferably efficient) SQL query that can extract and present the data that I need?
If not, I am totally okay with extracting one or several supersets/subsets of
the data in order to parse it with Python if necessary. But how do I extract
the desired data?
It looks to me like you'd need more than one query for this. The problem is that the relationships Build <-> ModuleRevision and Build <-> TestCaseResult are basically independent: ModuleRevisions and TestCaseResults don't really have anything to do with each other as far as the schema is concerned. You have to query for one and then the other. You can't get them both in one query, because each row in your results basically represents one record of the "deepest" related table (in this case, either ModuleRevision or TestCaseResult), including any related information from its parent tables. Therefore, I think you'd need something like the following:
SELECT
    M.name, MR.revision, B.id
FROM
    ModuleRevisions MR
    INNER JOIN Modules M ON MR.Module_id = M.id
    INNER JOIN Build_has_ModuleRevisions BHMR ON MR.id = BHMR.ModuleRevision_id
    INNER JOIN Build B ON BHMR.Build_id = B.id
    INNER JOIN Configuration C ON B.Configuration_id = C.id
    INNER JOIN Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1'
ORDER BY B.start_time;

SELECT
    TCR.status, TC.name, B.id
FROM
    TestCaseResult TCR
    INNER JOIN TestCase TC ON TCR.TestCase_id = TC.id
    INNER JOIN Build B ON TCR.Build_id = B.id
    INNER JOIN Configuration C ON B.Configuration_id = C.id
    INNER JOIN Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1' AND TCR.status = 'failure'
ORDER BY B.start_time;
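Since the asker already plans to post-process in Python, the two result sets can then be merged per build with plain dictionaries. A sketch with hypothetical row tuples shaped like the two SELECT column lists above:

```python
from collections import defaultdict

# Rows as the two queries would return them:
# (Modules.name, ModuleRevisions.revision, Build.id) and
# (TestCaseResult.status, TestCase.name, Build.id).
module_rows = [("mod1", "rev1", 1), ("mod2", "rev1", 1), ("mod1", "rev2", 2)]
test_rows = [("failure", "test1", 1), ("failure", "test2", 2)]

# Group both result sets by build id.
builds = defaultdict(lambda: {"modules": [], "failed_tests": []})
for name, revision, build_id in module_rows:
    builds[build_id]["modules"].append((name, revision))
for status, test_name, build_id in test_rows:
    builds[build_id]["failed_tests"].append(test_name)

print(dict(builds))
```

Because both queries are ordered by Build.start_time, iterating the merged dict in query order preserves the chronology the asker wants.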

MySQL Encoding 4 byte in 3 byte utf-8 - Incorrect string value

According to the MySQL documentation, the utf8 character set supports only up to 3-byte UTF-8 encoded characters.
My question is, how can I replace characters that require 4 byte utf-8 encoding in my database? And how do I decode those characters in order to display exactly what the user wrote?
Part of the integration test:
description = u'baaam á ✓ ✌ ❤'
print description
test_convention = Blog.objects.create(title="test title",
                                      description=description,
                                      login=self.user,
                                      tag=self.tag)
Error:
Creating test database for alias 'default'...
baaam á ✓ ✌ ❤
E..
======================================================================
ERROR: test_post_blog (blogs.tests.PostTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/admin/Developer/project/pro/blogs/tests.py", line 64, in test_post_blog
    tag=self.tag)
  File "build/bdist.macosx-10.9-intel/egg/MySQLdb/cursors.py", line 201, in execute
    self.errorhandler(self, exc, value)
  File "build/bdist.macosx-10.9-intel/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
DatabaseError: (1366, "Incorrect string value: '\\xE2\\x9C\\x93 \\xE2\\x9C...' for column 'description' at row 1")
----------------------------------------------------------------------
Ran 3 tests in 1.383s
FAILED (errors=1)
Destroying test database for alias 'default'...
Table's configuration:
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| Name | Engine | Version | Collation | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | utf8_general_ci | Compact | 25 | 1966 | 49152 | 0 | 32768 | 0 | 35 | 2014-02-09 00:57:59 | NULL | NULL | NULL | | |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
Update: I already changed the table and column configuration from utf8 to utf8mb4 and I'm still getting the same error. Any ideas?
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | Compact | 5 | 3276 | 16384 | 0 | 32768 | 0 | 36 | 2014-02-17 22:24:18 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
and:
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| title | varchar(500) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |
| description | longtext | utf8mb4_general_ci | YES | | NULL | | select,insert,update,references | |
| creation_date | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| login_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
| tag_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
It is supported, but not as utf8. Add the following to the [mysqld] section of my.cnf:
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
When creating a database, use:
CREATE DATABASE xxxxx DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
At the end of a CREATE TABLE command, add:
ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8mb4;
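One more note on the update above: converting the table and column is not always enough, because the client connection can still negotiate the 3-byte utf8 charset. Since the traceback comes from a Django test run, the connection charset would be set in settings.py roughly as below (database name and credentials are placeholders):

```python
# Django settings sketch: OPTIONS is passed to the MySQLdb connection,
# so the session uses utf8mb4 end to end.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "main_db",
        "USER": "root",
        "PASSWORD": "PASSWORD",
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```

Outside Django, the equivalent would be passing charset='utf8mb4' to MySQLdb.connect().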

RDF/SKOS concept hierarchy as Python dictionary

In Python, how do I turn RDF/SKOS taxonomy data into a dictionary that represents the concept hierarchy only?
The dictionary must have this format:
{ 'term1': [ 'term2', 'term3'], 'term3': [{'term4' : ['term5', 'term6']}, 'term6']}
I tried using RDFLib with JSON plugins, but did not get the result I want.
I'm not much of a Python user, and I haven't worked with RDFLib, but I just pulled the SKOS vocabulary from the SKOS vocabularies page. I wasn't sure what concepts (RDFS or OWL classes) were in the vocabulary, nor what their hierarchy was, so I ran a SPARQL query using Jena's ARQ to select classes and their subclasses. I didn't get any results. (There were classes defined, of course, but none had subclasses.) Then I decided to use both the SKOS and SKOS-XL vocabularies, and to ask for properties and subproperties as well as classes and subclasses. This is the SPARQL query I used:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?property ?subproperty ?class ?subclass WHERE {
{ ?subclass rdfs:subClassOf ?class }
UNION
{ ?subproperty rdfs:subPropertyOf ?property }
}
ORDER BY ?class ?property
The results I got were
-------------------------------------------------------------------------------------------------------------------
| property                | subproperty             | class           | subclass                                  |
===================================================================================================================
| rdfs:label              | skos:altLabel           |                 |                                           |
| rdfs:label              | skos:hiddenLabel        |                 |                                           |
| rdfs:label              | skos:prefLabel          |                 |                                           |
| skos:broader            | skos:broadMatch         |                 |                                           |
| skos:broaderTransitive  | skos:broader            |                 |                                           |
| skos:closeMatch         | skos:exactMatch         |                 |                                           |
| skos:inScheme           | skos:topConceptOf       |                 |                                           |
| skos:mappingRelation    | skos:broadMatch         |                 |                                           |
| skos:mappingRelation    | skos:closeMatch         |                 |                                           |
| skos:mappingRelation    | skos:narrowMatch        |                 |                                           |
| skos:mappingRelation    | skos:relatedMatch       |                 |                                           |
| skos:narrower           | skos:narrowMatch        |                 |                                           |
| skos:narrowerTransitive | skos:narrower           |                 |                                           |
| skos:note               | skos:changeNote         |                 |                                           |
| skos:note               | skos:definition         |                 |                                           |
| skos:note               | skos:editorialNote      |                 |                                           |
| skos:note               | skos:example            |                 |                                           |
| skos:note               | skos:historyNote        |                 |                                           |
| skos:note               | skos:scopeNote          |                 |                                           |
| skos:related            | skos:relatedMatch       |                 |                                           |
| skos:semanticRelation   | skos:broaderTransitive  |                 |                                           |
| skos:semanticRelation   | skos:mappingRelation    |                 |                                           |
| skos:semanticRelation   | skos:narrowerTransitive |                 |                                           |
| skos:semanticRelation   | skos:related            |                 |                                           |
|                         |                         | _:b0            | <http://www.w3.org/2008/05/skos-xl#Label> |
|                         |                         | skos:Collection | skos:OrderedCollection                    |
-------------------------------------------------------------------------------------------------------------------
It looks like there's not much concept hierarchy in SKOS at all. Could that explain why you didn't get the results you wanted before?
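On the original question: whichever library extracts the broader/narrower relations from your own taxonomy data, turning the resulting (parent, child) pairs into the requested nested-dictionary shape is plain Python. A sketch (the sample pairs are made up, and it assumes the pairs form a cycle-free hierarchy):

```python
# Turn (broader, narrower) concept pairs into a nested dict hierarchy.
pairs = [("term1", "term2"), ("term1", "term3"),
         ("term3", "term4"), ("term4", "term5")]

# Map each parent to its list of children.
children = {}
for parent, child in pairs:
    children.setdefault(parent, []).append(child)

def nest(term):
    """Expand a term into {term: [subtrees]}, or a bare string if it is a leaf."""
    kids = children.get(term)
    if not kids:
        return term
    return {term: [nest(k) for k in kids]}

# Roots are parents that never appear as a child.
roots = set(children) - {c for kids in children.values() for c in kids}
print([nest(r) for r in roots])
```

With RDFLib, the pairs would come from iterating the graph's skos:broader (or skos:narrower) triples before feeding them into the same nesting step.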
