Psycopg2 Insert into with conditions - python

I have this table in Postgres
ID | IP          | Remote-as | IRR-Record
1  | 192.168.1.1 | 100       |
2  | 192.168.2.1 | 200       |
3  | 192.168.3.1 | 300       |
4  | 192.168.4.1 | 400       |
For each IP address I want to fill in the IRR-Record column.
The IRR record value is stored in a variable.
c = conn.cursor()
query = 'SELECT * FROM "peers"'
c.execute(query)
for row in c.fetchall():
    # INSERT creates a new row each time instead of filling in the existing one
    c.execute('INSERT INTO "peers" ("IRR-Record") VALUES (%s)', (variable,))
conn.commit()
This code doesn't work: the IRR-Record values end up in new rows at the end of my table instead of next to each IP.
ID | IP          | Remote-as | IRR-Record
1  | 192.168.1.1 | 100       |
2  | 192.168.2.1 | 200       |
3  | 192.168.3.1 | 300       |
4  | 192.168.4.1 | 400       |
   |             |           | Variable
   |             |           | Variable
   |             |           | Variable
Any ideas?

You need to use an UPDATE query instead of INSERT. Because the column name contains a hyphen, it has to be double-quoted:
UPDATE peers
SET "IRR-Record" = <<MYVALUE>>
WHERE ID = <<MYID>>

I think it's something like this:
for row in c.fetchall():
    # row[0] is assumed to be the ID column
    c.execute('UPDATE peers SET "IRR-Record" = %s WHERE ID = %s', (record_var, row[0]))
conn.commit()
Edit: use an UPDATE statement, as per @Devasta's answer.
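To check the UPDATE-versus-INSERT behaviour without a Postgres server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Assumptions: the column names match the table in the question, lookup_irr_record is a hypothetical placeholder for wherever the IRR record really comes from, and SQLite uses ? placeholders where psycopg2 uses %s.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres "peers" table (assumption:
# same column names). SQLite uses "?" placeholders; psycopg2 uses "%s".
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE peers ("ID" INTEGER PRIMARY KEY, "IP" TEXT, '
    '"Remote-as" INTEGER, "IRR-Record" TEXT)'
)
conn.executemany(
    'INSERT INTO peers ("ID", "IP", "Remote-as") VALUES (?, ?, ?)',
    [
        (1, "192.168.1.1", 100),
        (2, "192.168.2.1", 200),
        (3, "192.168.3.1", 300),
        (4, "192.168.4.1", 400),
    ],
)


def lookup_irr_record(ip):
    """Hypothetical placeholder for however the IRR record is really obtained."""
    return f"IRR-for-{ip}"


cur = conn.cursor()
# fetchall() reads every row first, so reusing the same cursor is safe.
rows = cur.execute('SELECT "ID", "IP" FROM peers').fetchall()
for peer_id, ip in rows:
    # UPDATE fills in the existing row; INSERT would append a new one.
    cur.execute(
        'UPDATE peers SET "IRR-Record" = ? WHERE "ID" = ?',
        (lookup_irr_record(ip), peer_id),
    )
conn.commit()
```

With psycopg2 the loop body stays the same apart from the %s placeholders. Fetching the rows before the loop avoids re-executing on the cursor you are iterating over, which would discard the remaining SELECT results.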

Related

Unstack (pivot?) dataframe in Pandas

I have a dataframe somewhat like this:
  | ID | Relationship | First Name | Last Name | DOB | Address | Phone
0 | 2 | Self | Vegeta | Saiyan | 01/01/1949 | Saiyan Planet | 123-456-7891
1 | 2 | Spouse | Bulma | Saiyan | 04/20/1969 | Saiyan Planet | 123-456-7891
2 | 3 | Self | Krilin | Human | 08/21/1992 | Planet Earth | 789-456-4321
3 | 4 | Self | Goku | Kakarot | 05/04/1975 | Planet Earth | 321-654-9870
4 | 4 | Child | Gohan | Kakarot | 04/02/2001 | Planet Earth | 321-654-9870
5 | 5 | Self | Freezer | Fridge | 09/15/1955 | Deep Space | 456-788-9568
I'm looking to have the rows with same ID appended to the right of the first row with that ID.
Example:
  | ID | Relationship | First Name | Last Name | DOB | Address | Phone | Spouse_First Name | Spouse_Last Name | Spouse_DOB | Child_First Name | Child_Last Name | Child_DOB
0 | 2 | Self | Vegeta | Saiyan | 01/01/1949 | Saiyan Planet | 123-456-7891 | Bulma | Saiyan | 04/20/1969 | | |
1 | 3 | Self | Krilin | Human | 08/21/1992 | Planet Earth | 789-456-4321 | | | | | |
2 | 4 | Self | Goku | Kakarot | 05/04/1975 | Planet Earth | 321-654-9870 | | | | Gohan | Kakarot | 04/02/2001
3 | 5 | Self | Freezer | Fridge | 09/15/1955 | Deep Space | 456-788-9568 | | | | | |
My real dataframe has more columns, but they all hold the same information when two rows share the same ID, so there is no need to duplicate them. I only need to append the columns I choose, in this case First Name, Last Name and DOB, with a prefix on the new column labels that depends on the value in the Relationship column. (I can rename them later if that isn't possible in a straightforward way; I just wanted to illustrate my point.)
I have tried several approaches, and unstack or pivot seems like the way to go, but I haven't managed to make it work.
Any help would be greatly appreciated.
This solution assumes that the DataFrame is indexed by the ID column.
import pandas as pd

not_self = (
    df.query("Relationship != 'Self'")
    .pivot(columns='Relationship')
    .swaplevel(axis=1)
    .reindex(
        pd.MultiIndex.from_product(
            (
                # sorted() gives a deterministic column order
                sorted(set(df['Relationship'].unique()) - {'Self'}),
                df.columns.to_series().drop('Relationship')
            )
        ),
        axis=1
    )
)
not_self.columns = [' '.join((a, b)) for a, b in not_self.columns]
result = df.query("Relationship == 'Self'").join(not_self)
Please let me know if this is not what was wanted.
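Since the question mentions unstack, here is an alternative sketch that uses set_index plus unstack instead of pivot. It rebuilds the sample data from the question, keeps the 'Self' rows as the base, and appends only First Name, Last Name and DOB for the other relationships, using the Spouse_/Child_ labels from the desired output. This is a different technique from the answer above, not a rewrite of it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        (2, "Self", "Vegeta", "Saiyan", "01/01/1949", "Saiyan Planet", "123-456-7891"),
        (2, "Spouse", "Bulma", "Saiyan", "04/20/1969", "Saiyan Planet", "123-456-7891"),
        (3, "Self", "Krilin", "Human", "08/21/1992", "Planet Earth", "789-456-4321"),
        (4, "Self", "Goku", "Kakarot", "05/04/1975", "Planet Earth", "321-654-9870"),
        (4, "Child", "Gohan", "Kakarot", "04/02/2001", "Planet Earth", "321-654-9870"),
        (5, "Self", "Freezer", "Fridge", "09/15/1955", "Deep Space", "456-788-9568"),
    ],
    columns=["ID", "Relationship", "First Name", "Last Name", "DOB", "Address", "Phone"],
)

extra_cols = ["First Name", "Last Name", "DOB"]  # columns to append per relationship

# One row per ID: the "Self" record keeps all of its original columns.
self_rows = df[df["Relationship"] == "Self"].set_index("ID")

# Move Relationship from the row index into the columns: (column, relationship) pairs.
others = (
    df[df["Relationship"] != "Self"]
    .set_index(["ID", "Relationship"])[extra_cols]
    .unstack("Relationship")
)
others.columns = [f"{rel}_{col}" for col, rel in others.columns]

# Left join keeps IDs that have no spouse/child (their new cells are NaN).
result = self_rows.join(others)
```

unstack raises a ValueError if the same ID has two rows with the same Relationship (for example two children); in that case the rows would need an extra counter level, for instance from groupby(...).cumcount(), before unstacking.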

How to enable connection pooling in Flask-SQLAlchemy?

I'm using Flask-SQLAlchemy with a MySQL database, and recently I started getting this error:
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1040, 'Too many connections')
After some digging, it seems that I'm not using the connection pooling.
Based on my research, SQLALCHEMY_POOL_SIZE is deprecated as of v2.4 and shouldn't be used anymore.
SQLAlchemy==1.3.7
Flask-SQLAlchemy==2.4.1
So what is the correct way of doing it?
Config:
SQLALCHEMY_DATABASE_URI = 'mysql://root:PASSWORD@localhost/main_db'
SQLALCHEMY_BINDS = {
    'radius': 'mysql://root:PASSWORD@localhost/radius_db',
    'cache': 'mysql://root:PASSWORD@localhost/cache_db',
}
Code:
def make_app():
    app = Flask(__name__, template_folder="../templates")
    app.config.from_object(config)
    db.init_app(app)
    return app

app = my_fabric.make_app()
According to the SQLAlchemy docs, I'm supposed to do this via create_engine:
engine = create_engine("mysql+pymysql://user:pw@host/db", pool_size=20, max_overflow=0)
But Flask-SQLAlchemy is supposed to abstract this away, so I don't know how it should be configured.
UPDATE:
I'm running uWSGI with two processes.
I have now increased max_connections to 500. It's hard to say whether I have high traffic, but after 16 hours my database statistics show this:
mysql> show status like '%onn%';
+-------------------------------------------------------+---------------------+
| Variable_name | Value |
+-------------------------------------------------------+---------------------+
| Aborted_connects | 5 |
| Connection_errors_accept | 0 |
| Connection_errors_internal | 0 |
| Connection_errors_max_connections | 0 |
| Connection_errors_peer_address | 0 |
| Connection_errors_select | 0 |
| Connection_errors_tcpwrap | 0 |
| Connections | 3897 |
| Locked_connects | 0 |
| Max_used_connections | 167 |
| Max_used_connections_time | 2019-11-29 00:11:51 |
| Mysqlx_connection_accept_errors | 0 |
| Mysqlx_connection_errors | 0 |
| Mysqlx_connections_accepted | 0 |
| Mysqlx_connections_closed | 0 |
| Mysqlx_connections_rejected | 0 |
| Performance_schema_session_connect_attrs_longest_seen | 117 |
| Performance_schema_session_connect_attrs_lost | 0 |
| Ssl_client_connects | 0 |
| Ssl_connect_renegotiates | 0 |
| Ssl_finished_connects | 0 |
| Threads_connected | 97 |
+-------------------------------------------------------+---------------------+
AND
mysql> SHOW STATUS WHERE variable_name LIKE "Threads_%" OR variable_name = "Connections";
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Connections | 3896 |
| Threads_cached | 8 |
| Threads_connected | 97 |
| Threads_created | 365 |
| Threads_running | 2 |
+-------------------+-------+
AND
mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
1 row in set (0.01 sec)
You can use sessions instead of raw connections, and close each session when you're done with it:
my_session = Session(engine)
results = my_session.execute(query)
my_session.close()
When creating the engine, you can also set pool_recycle=60 (or a little higher): https://docs.sqlalchemy.org/en/13/core/pooling.html#pool-setting-recycle
I'm not saying this will solve your issues entirely, but I've rarely encountered problems with this setup.
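For the original question: as far as I know, Flask-SQLAlchemy 2.4 replaced SQLALCHEMY_POOL_SIZE and the related pool settings with a single SQLALCHEMY_ENGINE_OPTIONS dictionary whose entries are passed on to SQLAlchemy's create_engine(). The values below are illustrative assumptions, not recommendations. The helper max_pool_connections is hypothetical (not part of Flask-SQLAlchemy) and only estimates how many connections the pools can hold, on the assumption that each worker process has its own pool per engine and that SQLALCHEMY_BINDS adds one engine per bind.

```python
# config.py (sketch; the numbers are illustrative assumptions).
# Flask-SQLAlchemy 2.4+ passes these keyword arguments to create_engine().
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 10,      # connections kept open per engine, per process
    "max_overflow": 5,    # extra connections allowed temporarily under load
    "pool_recycle": 280,  # seconds before a pooled connection is replaced
}


# Separate, hypothetical helper for sizing MySQL's max_connections.
def max_pool_connections(processes, engines, pool_size, max_overflow):
    """Upper bound on connections the application's pools can open.

    Assumes every worker process holds a separate pool for every engine
    (the default SQLALCHEMY_DATABASE_URI plus one per SQLALCHEMY_BINDS entry).
    """
    return processes * engines * (pool_size + max_overflow)
```

With SQLAlchemy's default QueuePool settings (pool_size=5, max_overflow=10), two uWSGI processes and three engines (main_db, radius and cache) come to max_pool_connections(2, 3, 5, 10) = 90 connections. That estimate ignores any other clients of the MySQL server.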

How to join two tables in PySpark with two conditions in an optimal way

I have the following two tables in PySpark:
Table A - dfA
| ip_4 | ip |
|---------------|--------------|
| 10.10.10.25 | 168430105 |
| 10.11.25.60 | 168499516 |
And table B - dfB
| net_cidr | net_ip_first_4 | net_ip_last_4 | net_ip_first | net_ip_last |
|---------------|----------------|----------------|--------------|-------------|
| 10.10.10.0/24 | 10.10.10.0 | 10.10.10.255 | 168430080 | 168430335 |
| 10.10.11.0/24 | 10.10.11.0 | 10.10.11.255 | 168430336 | 168430591 |
| 10.11.0.0/16 | 10.11.0.0 | 10.11.255.255 | 168493056 | 168558591 |
I have joined both tables in PySpark using the following command:
from pyspark.sql import functions as F

dfJoined = dfB.alias('b').join(
    F.broadcast(dfA).alias('a'),
    (F.col('a.ip') >= F.col('b.net_ip_first')) &
    (F.col('a.ip') <= F.col('b.net_ip_last')),
    how='right',
).select('a.*', 'b.*')
So I obtain:
| ip | net_cidr | net_ip_first_4 | net_ip_last_4| ...
|---------------|---------------|----------------|--------------| ...
| 10.10.10.25 | 10.10.10.0/24 | 10.10.10.0 | 10.10.10.255 | ...
| 10.11.25.60 | 10.11.0.0/16 | 10.11.0.0 | 10.11.255.255 | ...
Given the size of the tables, this join is not optimal because of the two range conditions. I had thought of sorting table B so that the join only needs one condition.
Is there any way to limit the join and take only the first record that matches the join condition? Or some other way to perform the join more efficiently?
Table A (number of records) << Table B (number of records)
Thank you!
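The "sort once, then search" idea can be sketched in plain Python with the standard bisect module. Because table A is the small one, this sketch sorts A's integer IPs and, for each row of B, uses two binary searches to find the A addresses inside [net_ip_first, net_ip_last], so each B row costs O(log |A|) comparisons instead of a scan of A. This is not a PySpark API; running the same lookup inside Spark (for example in a mapPartitions over dfB, with the sorted list shipped as a broadcast variable) is an assumption not shown here.

```python
import bisect
import ipaddress


def ip_to_int(ip_4):
    """Convert a dotted IPv4 string to the integer form used in the ip column."""
    return int(ipaddress.IPv4Address(ip_4))


# Table A (small): sort its integer IPs once.
table_a = ["10.10.10.25", "10.11.25.60"]
a_sorted = sorted((ip_to_int(ip), ip) for ip in table_a)
a_keys = [key for key, _ip in a_sorted]


def ips_in_range(net_ip_first, net_ip_last):
    """Return the table-A addresses with net_ip_first <= ip <= net_ip_last."""
    lo = bisect.bisect_left(a_keys, net_ip_first)
    hi = bisect.bisect_right(a_keys, net_ip_last)
    return [ip for _key, ip in a_sorted[lo:hi]]


# Table B rows from the question: (net_ip_first, net_ip_last, net_cidr).
table_b = [
    (168430080, 168430335, "10.10.10.0/24"),
    (168430336, 168430591, "10.10.11.0/24"),
    (168493056, 168558591, "10.11.0.0/16"),
]

# (ip, net_cidr) pairs; an IP matching nested CIDRs would appear once per match.
matches = [
    (ip, cidr)
    for first, last, cidr in table_b
    for ip in ips_in_range(first, last)
]
```

If only one network per IP is wanted (for example the most specific CIDR), that choice can be made afterwards on the matches.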

SQLAlchemy - pretty print SQL query results

In the Ruby console, it is possible to display SQL query results in a very human-friendly way (ActiveRecord + Hirb):
>> Tag.all :limit=>3, :order=>"id DESC"
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| id | created_at | description | name | namespace | predicate | value |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| 907 | 2009-03-06 21:10:41 UTC | | gem:tags=yaml | gem | tags | yaml |
| 906 | 2009-03-06 08:47:04 UTC | | gem:tags=nomonkey | gem | tags | nomonkey |
| 905 | 2009-03-04 00:30:10 UTC | | article:tags=ruby | article | tags | ruby |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
3 rows in set
Is there a module that will let me display SQLAlchemy result sets in a similar way in IPython?
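Without relying on a third-party formatter, a minimal sketch of Hirb-style output needs only the column names and the rows. format_table is a hypothetical helper, not part of SQLAlchemy or IPython; None values are shown as empty cells.

```python
def format_table(columns, rows):
    """Render rows as an ASCII box in the style of Hirb / the MySQL client."""
    headers = [str(c) for c in columns]
    cells = [["" if v is None else str(v) for v in row] for row in rows]

    # Each column is as wide as its longest header or value.
    widths = [len(h) for h in headers]
    for row in cells:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))

    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    out = [border, line(headers), border]
    out.extend(line(row) for row in cells)
    out.append(border)
    n = len(cells)
    out.append(f"{n} row{'' if n == 1 else 's'} in set")
    return "\n".join(out)
```

With SQLAlchemy you could pass result.keys() and result.fetchall() from a session.execute(...) result; treat that wiring as an assumption and check it against the result API of your SQLAlchemy version.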

MySQL query combining several tables

Background
In order to obtain data for my thesis I have to work with a large, fairly
complicated MySQL database, containing several tables and hundreds of GBs of
data. Unfortunately, I am new to SQL, and can't really figure out how to
extract the data that I need.
Database
The database consists of several tables that I want to combine. Here are the
relevant parts of it:
> show tables;
+---------------------------+
| Tables_in_database |
+---------------------------+
| Build |
| Build_has_ModuleRevisions |
| Configuration |
| ModuleRevisions |
| Modules |
| Product |
| TestCase |
| TestCaseResult |
+---------------------------+
The tables are linked together in the following manner
Product ---(1:n)--> Configurations ---(1:n)--> Build
Build ---(1:n)--> Build_has_ModuleRevisions ---(n:1)--> ModuleRevision ---(n:1)--> Modules
Build ---(1:n)--> TestCaseResult ---(n:1)--> TestCase
The contents of the tables are
> describe Product;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | UNI | NULL | |
+---------+--------------+------+-----+---------+----------------+
> describe Configuration;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Product_id | int(11) | YES | MUL | NULL | |
| name | varchar(255) | NO | UNI | NULL | |
+------------+--------------+------+-----+---------+----------------+
> describe Build;
+------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Configuration_id | int(11) | NO | MUL | NULL | |
| build_number | int(11) | NO | MUL | NULL | |
| build_id | varchar(32) | NO | MUL | NULL | |
| test_status | varchar(255) | NO | | | |
| start_time | datetime | YES | MUL | NULL | |
| end_time | datetime | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+----------------+
> describe Build_has_ModuleRevisions;
+-------------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Build_id | int(11) | NO | MUL | NULL | |
| ModuleRevision_id | int(11) | NO | MUL | NULL | |
+-------------------+----------+------+-----+---------+----------------+
> describe ModuleRevisions;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Module_id | int(11) | NO | MUL | NULL | |
| tag | varchar(255) | NO | MUL | | |
| revision | varchar(255) | NO | MUL | | |
+-----------+--------------+------+-----+---------+----------------+
> describe Modules;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | UNI | NULL | |
+---------+--------------+------+-----+---------+----------------+
> describe TestCase;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| TestSuite_id | int(11) | NO | MUL | NULL | |
| classname | varchar(255) | NO | MUL | NULL | |
| name | varchar(255) | NO | MUL | NULL | |
| testtype | varchar(255) | NO | MUL | NULL | |
+--------------+--------------+------+-----+---------+----------------+
> describe TestCaseResult;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Build_id | int(11) | NO | MUL | NULL | |
| TestCase_id | int(11) | NO | MUL | NULL | |
| status | varchar(255) | NO | MUL | NULL | |
| start_time | datetime | YES | MUL | NULL | |
| end_time | datetime | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
As you can see, the tables are linked with *_id fields. E.g. TestCaseResult
is linked to a Build by the Build_id field, and to a TestCase by the
TestCase_id field.
Problem Description
Now to my problem. Given a specific Configuration.name and Product.name as
input, I need to find all modules+revisions and failed testcases, for every
Build, sorted by Build.start_time.
What I Have Tried
The following query gives me all the Builds given a Configuration.name of
config1 and a Product.name of product1
SELECT
*
FROM
`database`.`Build` AS b
JOIN
Configuration AS c ON c.id = b.Configuration_id
JOIN
Product as p ON p.id = c.Product_id
WHERE
c.name = 'config1'
AND p.name = 'product1'
ORDER BY b.start_time;
This doesn't even solve half of my problem, though. Now, for every build I
need to:
- Find all Modules linked to the Build
  - Extract the Modules.name field
  - Extract the ModuleRevision.revision field
- Find all TestCases linked to the Build
  - Where TestCaseResult.status = 'failure'
  - Extract the TestCase.name field linked to the TestCaseResult
- Associate the Build with the extracted module name+revisions and testcase
  names
- Present the data ordered by Build.start_time so that I can perform
  analyses on it.
In other words, of all the data available, I am only interested in linking the
fields Modules.name, ModuleRevision.revision, TestCaseResult.status, and
TestCase.name to a particular Build, ordering this by Build.start_time,
and then passing the output to a Python program I have written.
The end result should be something similar to
Build Build.start_time Modules+Revisions Failed tests
1 20140301 [(mod1, rev1), (mod2... etc] [test1, test2, ...]
2 20140401 [(mod1, rev2), (mod2... etc] [test1, test2, ...]
3 20140402 [(mod3, rev1), (mod2... etc] [test1, test2, ...]
4 20140403 [(mod1, rev3), (mod2... etc] [test1, test2, ...]
5 20140505 [(mod5, rev2), (mod2... etc] [test1, test2, ...]
My question
Is there a good (and preferably efficient) SQL query that can extract and
present the data that I need?
If not, I am totally okay with extracting one or several supersets/subsets of
the data in order to parse it with Python if necessary. But how do I extract
the desired data?
It looks to me like you'd need more than one query for this. The problem is that the relationships Build <-> ModuleRevision and Build <- TestCaseResult are essentially independent: as far as the schema is concerned, ModuleRevisions and TestCaseResults have nothing to do with each other. You have to query for one and then the other. You can't get both in one query, because each row in the result represents one record of the "deepest" related table (here either ModuleRevision or TestCaseResult) together with the related information from its parent tables. Therefore, I think you'd need something like the following:
SELECT
M.name, MR.revision, B.id
FROM
ModuleRevisions MR
INNER JOIN
Modules M ON MR.Module_id = M.id
INNER JOIN
Build_has_ModuleRevisions BHMR ON MR.id = BHMR.ModuleRevision_id
INNER JOIN
Build B ON BHMR.Build_id = B.id
INNER JOIN
Configuration C ON B.Configuration_id = C.id
INNER JOIN
Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1'
ORDER BY B.start_time;
SELECT
TCR.status, TC.name, B.id
FROM
TestCaseResult TCR
INNER JOIN
TestCase TC ON TCR.TestCase_id = TC.id
INNER JOIN
Build B ON TCR.Build_id = B.id
INNER JOIN
Configuration C ON B.Configuration_id = C.id
INNER JOIN
Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1' and TCR.status = 'failure'
ORDER BY B.start_time;
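Since both result sets end up in your Python program anyway, the per-build structure from the question can be assembled there. This sketch assumes B.start_time has been added to the SELECT list of both queries, so the rows are (name, revision, build_id, start_time) and (status, test_name, build_id, start_time); group_by_build is a hypothetical helper. Builds that appear in only one result set get an empty list for the other.

```python
def group_by_build(module_rows, failure_rows):
    """Combine the two query results into one record per build.

    Returns a list of (build_id, start_time, [(module, revision), ...],
    [failed_test_name, ...]) tuples ordered by start_time, with builds that
    have no start_time (NULL) placed last.
    """
    builds = {}

    def entry(build_id, start_time):
        return builds.setdefault(
            build_id, {"start_time": start_time, "modules": [], "failed_tests": []}
        )

    for name, revision, build_id, start_time in module_rows:
        entry(build_id, start_time)["modules"].append((name, revision))
    for _status, test_name, build_id, start_time in failure_rows:
        entry(build_id, start_time)["failed_tests"].append(test_name)

    # Re-sort because builds seen only in the second query were added last.
    ordered = sorted(
        builds.items(),
        key=lambda item: (item[1]["start_time"] is None, item[1]["start_time"]),
    )
    return [
        (build_id, e["start_time"], e["modules"], e["failed_tests"])
        for build_id, e in ordered
    ]
```

Python's sorted() is stable, so builds sharing a start_time keep the order in which they first appeared in the query results.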
