I have a database of names, and I need to create a new list holding values such as ID, name, and gender and insert it into the current database. I also have to create a list of the names which are not in the database yet, so for now I simply checked only 3 names and am trying to work with them.
I am not sure what sort of list I am supposed to create, or how to loop through it to insert all the new values in the proper way.
That's what I have so far:
mylist = [["Betty Beth", "1", "Female"], ["John Cena", "2", "Male"]]

@get("/list_actors")
def list_actors():
    with connection.cursor() as cursor:
        sql = "INSERT INTO imdb VALUES (mylist)"
        cursor.execute(sql)
    connection.commit()
    return "done"
I am very new to this material, so I would appreciate any help. Thanks in advance!
vals = [["TEST1", 1], ["TEST2", 2]]
with connection.cursor() as cursor:
    cursor.executemany("insert into test(prop, val) values (%s, %s)", vals)
connection.commit()
mysql> select * from test;
+----+-------+------+---------------------+
| id | prop | val | ts |
+----+-------+------+---------------------+
| 1 | TEST1 | 1 | 2017-05-19 09:46:16 |
| 2 | TEST2 | 2 | 2017-05-19 09:46:16 |
+----+-------+------+---------------------+
Adapted from https://groups.google.com/forum/#!searchin/pymysql-users/insert%7Csort:relevance/pymysql-users/4_D8bYusodc/EHFxjRh89XEJ
I've written some code to parse a website and insert the results into a MySQL db.
The problem is that I am getting a lot of duplicate rows per FKToTech_id, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
|  2 | website.com/path |           1 |
|  3 | website.com/path |           1 |
+----+------------------+-------------+
What I'm looking for instead is to have one row in the table, based on whether that ref has already been entered for the FKToTech_id, rather than multiple copies of the same row, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
+----+------------------+-------------+
How can I modify my code below so that Python simply skips the insert when a row with the same ref and FKToTech_id already exists?
for i in elms:
    allcves = {cursor.execute("INSERT INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s)", (i.attrs["href"], row[1])) for row in cves}
    mydb.commit()
Thanks
Make ref a unique column, then use INSERT IGNORE to skip the insert if it would cause a duplicate key error.
ALTER TABLE TechBooks ADD UNIQUE INDEX (ref);
for i in elms:
    cursor.executemany("INSERT IGNORE INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s)", [(i.attrs["href"], row[1]) for row in cves])
    mydb.commit()
I'm not sure what your intent was in assigning the results of cursor.execute() to allcves; cursor.execute() doesn't return a useful value unless you use multi=True. I've replaced the set comprehension with cursor.executemany(), which inserts many rows at once.
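The unique-index-plus-ignore pattern above is MySQL-specific, but the same idea can be sketched end to end with the stdlib sqlite3 module, where INSERT OR IGNORE is SQLite's analogue of MySQL's INSERT IGNORE (table and column names copied from the question; the data is made up):

```python
import sqlite3

# In-memory demo of the unique-index + "insert ignore" dedup pattern.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE TechBooks (id INTEGER PRIMARY KEY, ref TEXT, FKToTech_id INTEGER)")
cur.execute("CREATE UNIQUE INDEX idx_ref ON TechBooks (ref)")

rows = [("website.com/path", 1), ("website.com/path", 1), ("website.com/path", 1)]
# INSERT OR IGNORE skips any row that would violate the unique index:
cur.executemany("INSERT OR IGNORE INTO TechBooks (ref, FKToTech_id) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT COUNT(*) FROM TechBooks")
print(cur.fetchone()[0])  # 1 -- duplicates were silently dropped
```

Note sqlite3 uses ? placeholders where PyMySQL uses %s; otherwise the pattern transfers directly.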
I have a table which looks like
mysql> select * from statements;
+----+----------------+--------+---------+
| id | account_number | amount | type |
+----+----------------+--------+---------+
| 1 | 1 | 1000 | Deposit |
| 2 | 1 | 500 | Fees |
+----+----------------+--------+---------+
2 rows in set (0.00 sec)
I have a PyMySQL connector through which I want to execute a query select * from statements where type in ('Deposit', 'Fees')
My question is different from the possible duplicates, as it asks specifically about IN-type queries, where the list size is dynamic and the query is slightly harder to write than the usual hardcoded select * from statements where type in (%s, %s) kind.
I am wondering how to exactly write the query in a way that it is parameterized and relatively safe from SQL injection. My current code snippet is as follows:
import pymysql

connection = pymysql.connect('''SQL DB credentials''')
cur = connection.cursor()
l = ['Deposit', 'Fees']
st = 'select * from statements where type in (' + ','.join(['%s'] * len(l)) + ')'
cur.execute(st, l)
cur.fetchall()
Result:
((1, 1, 1000, 'Deposit'), (2, 1, 500, 'Fees'))
My question is, is this SQL statement parameterized well as safe from basic SQL injection?
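For what it's worth, the placeholder construction can be factored into a small reusable helper (build_in_clause is a hypothetical name, not part of PyMySQL). The values still travel through the driver's parameter binding, so they remain protected; the column name, however, is interpolated directly and must come from trusted code, never from user input:

```python
def build_in_clause(column, values):
    # Hypothetical helper: one %s placeholder per value.
    # `column` is interpolated directly, so it must never come from user input.
    placeholders = ",".join(["%s"] * len(values))
    return f"{column} in ({placeholders})"

types = ["Deposit", "Fees"]
sql = "select * from statements where " + build_in_clause("type", types)
print(sql)  # select * from statements where type in (%s,%s)
# With a live connection the values are bound separately:
# cur.execute(sql, types)
```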
Situation
I am using Python 3.7.2 with its built-in sqlite3 module. (sqlite3.version == 2.6.0)
I have a sqlite database that looks like:
| user_id | action | timestamp |
| ------- | ------ | ---------- |
| Alice | 0 | 1551683796 |
| Alice | 23 | 1551683797 |
| James | 1 | 1551683798 |
| ....... | ...... | .......... |
where user_id is TEXT, action is an arbitrary INTEGER, and timestamp is an INTEGER representing UNIX time.
The database has 200M rows, and there are 70K distinct user_ids.
Goal
I need to make a Python dictionary that looks like:
{
"Alice":[(0, 1551683796), (23, 1551683797)],
"James":[(1, 1551683798)],
...
}
that has user_ids as keys and respective event logs as values, which are lists of tuples (action, timestamp). Hopefully each list will be sorted by timestamp in increasing order, but even if it isn't, I think it can be easily achieved by sorting each list after a dictionary is made.
Effort
I have the following code to query the database. It first queries for the list of users (with user_list_cursor), then queries for all rows belonging to each user.
import sqlite3

connection = sqlite3.connect("database.db")
user_list_cursor = connection.cursor()
user_list_cursor.execute("SELECT DISTINCT user_id FROM EVENT_LOG")
user_id = user_list_cursor.fetchone()

classified_log = {}
log_cursor = connection.cursor()
while user_id:
    user_id = user_id[0]  # cursor.fetchone() returns a tuple
    query = (
        "SELECT action, timestamp"
        " FROM EVENT_LOG"
        " WHERE user_id = ?"
        " ORDER BY timestamp ASC"
    )
    parameters = (user_id,)
    log_cursor.execute(query, parameters)  # Here is the bottleneck
    classified_log[user_id] = list()
    for row in log_cursor.fetchall():
        classified_log[user_id].append(row)
    user_id = user_list_cursor.fetchone()
Problem
The query execution for each user is too slow. That single line of code (commented as the bottleneck) takes around 10 seconds for each user_id. I think I am taking the wrong approach with the queries. What is the right way to achieve the goal?
I tried searching with keywords "classify db by a column", "classify sql by a column", "sql log to dictionary python", but nothing seems to match my situation. I don't think this is a rare need, so maybe I'm missing the right keyword to search with.
Reproducibility
If anyone is willing to reproduce the situation with a 200M row sqlite database, the following code will create a 5GB database file.
But I hope there is somebody who is familiar with such a situation and knows how to write the right query.
import sqlite3
import random

connection = sqlite3.connect("tmp.db")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE IF NOT EXISTS EVENT_LOG (user_id TEXT, action INTEGER, timestamp INTEGER)"
)
query = "INSERT INTO EVENT_LOG VALUES (?, ?, ?)"
parameters = []
for timestamp in range(200_000_000):
    user_id = f"user{random.randint(0, 70000)}"
    action = random.randint(0, 1_000_000)
    parameters.append((user_id, action, timestamp))
cursor.executemany(query, parameters)
connection.commit()
cursor.close()
connection.close()
Big thanks to @Strawberry and @Solarflare for their help given in comments.
The following solution achieved more than 70X performance increase, so I'm leaving what I did as an answer for completeness' sake.
I used indices and queried for the whole table, as they suggested.
import sqlite3
from operator import itemgetter

connection = sqlite3.connect("database.db")

# Creating an index, thanks to @Solarflare
cursor = connection.cursor()
cursor.execute("CREATE INDEX IF NOT EXISTS idx_user_id ON EVENT_LOG (user_id)")
connection.commit()

# Reading the whole table, then making lists by user_id. Thanks to @Strawberry
cursor.execute("SELECT user_id, action, timestamp FROM EVENT_LOG ORDER BY user_id ASC")
previous_user_id = None
log_per_user = list()
classified_log = dict()
for row in cursor:
    user_id, action, timestamp = row
    if user_id != previous_user_id:
        if previous_user_id:
            log_per_user.sort(key=itemgetter(1))
            classified_log[previous_user_id] = log_per_user[:]
        log_per_user = list()
    log_per_user.append((action, timestamp))
    previous_user_id = user_id
if previous_user_id:
    # Flush the last user's log (the loop only stores a log when the next user appears)
    log_per_user.sort(key=itemgetter(1))
    classified_log[previous_user_id] = log_per_user
So the key points are:
Indexing by user_id to make ORDER BY user_id ASC execute in acceptable time.
Reading the whole table and then classifying by user_id, instead of making individual queries for each user_id.
Iterating over cursor to read row by row, instead of cursor.fetchall().
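The "previous user" bookkeeping above can also be handled by itertools.groupby, which groups consecutive rows that share a key. Here is a self-contained in-memory sketch of that alternative (table and column names match the question; the three rows of data are made up):

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# In-memory stand-in for the 200M-row database.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE EVENT_LOG (user_id TEXT, action INTEGER, timestamp INTEGER)")
cursor.executemany(
    "INSERT INTO EVENT_LOG VALUES (?, ?, ?)",
    [("Alice", 0, 1551683796), ("James", 1, 1551683798), ("Alice", 23, 1551683797)],
)

# ORDER BY user_id makes each user's rows contiguous, which is what groupby needs;
# ORDER BY timestamp gives the sorted per-user lists directly.
cursor.execute("SELECT user_id, action, timestamp FROM EVENT_LOG ORDER BY user_id, timestamp")
classified_log = {
    user_id: [(action, timestamp) for _, action, timestamp in rows]
    for user_id, rows in groupby(cursor, key=itemgetter(0))
}
print(classified_log)
# {'Alice': [(0, 1551683796), (23, 1551683797)], 'James': [(1, 1551683798)]}
```

This still iterates the cursor row by row, so it keeps the memory behaviour of the accepted approach while shedding the manual state handling.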
I'm using Python and SQLite to manipulate a database.
I have a SQLite table Movies in database Data that looks like this:
| ID | Country
+----+----------------------
|  1 | USA, Germany, Mexico
|  2 | Brazil, Peru
|  3 | Peru
I have a table Countries in the same database that looks like this
| ID | Country
+----+---------
|  1 | USA
|  1 | Germany
|  1 | Mexico
|  2 | Brazil
|  2 | Peru
|  3 | Peru
I want to insert from database Data all movies from Peru into a new database PeruData that looks like this
| ID | Country
+----+---------
|  2 | Peru
|  3 | Peru
I'm new to SQL and having trouble programming the right query.
Here's my attempt:
con = sqlite3.connect("PeruData.db")
cur = con.cursor()
cur.execute("CREATE TABLE Movies (ID, Country);")
cur.execute("ATTACH DATABASE 'Data.db' AS other;")
cur.execute("\
INSERT INTO Movies \
(ID, Country) \
SELECT ID, Country
FROM other.Movies CROSS JOIN other.Countries\
WHERE other.Movies.ID = other.Countries.ID AND other.Countries.Country = 'Peru'")
con.commit()
con.close()
Clearly, I'm doing something wrong because I get the error
sqlite3.OperationalError: no such table: other.Countries
Here's a workaround which successfully got the result you wanted.
Instead of having to write con = sqlite3.connect("data.db") and then con.commit() and con.close(), you can shorten your code like this:
with sqlite3.connect("Data.db") as connection:
    c = connection.cursor()
This way the commit is handled for you each time you work with the database (though note the connection itself still needs to be closed eventually). Just a nifty shortcut I learned. Now onto your code...
Personally, I'm unfamiliar with the SQL statement ATTACH DATABASE. I would instead bring in your new database at the end of your program, so you can avoid conflicts you may not know how to handle (such as the OperationalError you got). So first get the desired result, then insert it into your new table. Your third execution statement can be rewritten like so:
c.execute("""SELECT DISTINCT Movies.ID, Countries.Country
             FROM Movies
             CROSS JOIN Countries
             WHERE Movies.ID = Countries.ID AND Countries.Country = 'Peru'
          """)
This does the job, but you need to use fetchall() to return your result set as a list of tuples, which can then be inserted into your new table. So you'd type this:
rows = c.fetchall()
Now you can open a new connection by creating the "PeruData.db" database, creating the table, and inserting the values.
with sqlite3.connect("PeruData.db") as connection:
    c = connection.cursor()
    c.execute("CREATE TABLE Movies (ID INT, Country TEXT)")
    c.executemany("INSERT INTO Movies VALUES(?, ?)", rows)
That's it.
Hope I was able to answer your question!
The current error is probably caused by a typo or other minor problem. I could create the databases described here and successfully do the insert after fixing minor errors: a missing backslash at an end of line and adding qualifiers for the selected columns.
But I also advise you to use aliases for the tables in multi-table selects. The code that works in my test is:
cur.execute("\
INSERT INTO Movies \
(ID, Country) \
SELECT m.ID, c.Country \
FROM other.Movies m CROSS JOIN other.Countries c \
WHERE m.ID = c.ID AND c.Country = 'Peru'")
Ok, first of all: I am quite new to PostgreSQL and programming in general.
So I have two tables. One table (cars) is:
id | brand | model | price
----+---------+-------+-------
1 | Opel | Astra | 12000
2 | Citroen | C1 | 12000
3 | Citroen | C2 | 15000
4 | Citroen | C3 | 18000
5 | Audi | A3 | 20000
And the other is:
id | brand | max_price
----+---------+-----------
4 | Opel |
5 | Citroen |
6 | Audi |
What I would like to do is make a selection on cars so that I have the max price grouped by brand, and then insert each price for the corresponding brand into max_price.
I tried to use python and this is what I have done:
cur = conn.cursor()
cur.execute("""DROP TABLE IF EXISTS temp""")
cur.execute("""CREATE TABLE temp (brand text, max_price integer)""")
conn.commit()

cur.execute("""SELECT cars.brand, MAX(cars.price) FROM cars GROUP BY brand;""")
results = cur.fetchall()
for results in results:
    cur.execute("""INSERT INTO temp (brand, max_price) VALUES %s""" % str(results))
conn.commit()

cur.execute("""UPDATE max_price SET max_price.max_price=temp.max_price WHERE max_price.brand = temp.brand;""")
conn.commit()
It gets stuck at the update part, signalling an error at max_price.brand = temp.brand.
Can anybody help me?
EDIT: thanks to domino's suggestion I changed the last line to cur.execute ("""UPDATE max_price SET max_price.max_price=temp.max_price_int from temp WHERE max_price.brand = temp.brand;"""). Now I have the problem that temp.max_price is recognised not as an integer but as a tuple. So, to solve the problem, I tried to add the following code before that last line:
for results in results:
    results = results[0]
    results = int(results)
    cur.execute("""INSERT INTO temp (max_price_int) VALUES %s""" % str(results))
conn.commit()
It gives me an error
cur.execute ("""INSERT INTO temp (max_price_int) VALUES %s""" % str(results))
psycopg2.ProgrammingError: syntax error at or near "12000"
LINE 1: INSERT INTO temp (max_price_int) VALUES 12000
12000 is exactly the first value I want it to insert!
When using cur.execute, you should never use the % operator. It opens up your queries to SQL injection attacks.
Instead, use the built-in query parameterization like so:
cur.execute ("""INSERT INTO temp (max_price_int) VALUES (%s)""",(results,))
See documentation here: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
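As a side note, once the query is parameterized, the insert loop in the question can be collapsed into a single executemany() call. Here is a minimal sketch using the stdlib sqlite3 module as a stand-in for psycopg2 (the API has the same shape, but sqlite3 uses ? placeholders where psycopg2 uses %s; the result rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE temp (brand text, max_price integer)")

# fetchall() on the MAX(price) query returns a list of (brand, max_price) tuples;
# executemany() binds each tuple as one parameterized insert -- no string formatting.
results = [("Opel", 12000), ("Citroen", 18000), ("Audi", 20000)]
cur.executemany("INSERT INTO temp (brand, max_price) VALUES (?, ?)", results)
conn.commit()

cur.execute("SELECT COUNT(*) FROM temp")
print(cur.fetchone()[0])  # 3
```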
A different approach would be to use SQL to do your update in a single query using a WITH clause. The single query would look like this:
with max (brand, max_price) as (
select brand, max(price) from cars
group by brand
)
update max_price
set max_price = max.max_price
from max
where max_price.brand = max.brand
;
Read more about Common Table Expressions (CTEs) here: https://www.postgresql.org/docs/9.5/static/queries-with.html
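To see the single-query idea end to end without a PostgreSQL server, here is an in-memory sqlite3 sketch of the same tables. Since older SQLite versions lack UPDATE ... FROM, a correlated subquery stands in for the CTE used above, but the effect is identical: one statement, no Python loop, no temp table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cars (id INTEGER, brand TEXT, model TEXT, price INTEGER)")
cur.execute("CREATE TABLE max_price (id INTEGER, brand TEXT, max_price INTEGER)")
cur.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", [
    (1, "Opel", "Astra", 12000),
    (2, "Citroen", "C1", 12000),
    (3, "Citroen", "C2", 15000),
    (4, "Citroen", "C3", 18000),
    (5, "Audi", "A3", 20000),
])
cur.executemany("INSERT INTO max_price (id, brand) VALUES (?, ?)",
                [(4, "Opel"), (5, "Citroen"), (6, "Audi")])

# One statement does both the grouping and the update; the correlated
# subquery replaces the CTE + UPDATE ... FROM of the PostgreSQL answer.
cur.execute("""
    UPDATE max_price
    SET max_price = (SELECT MAX(price) FROM cars WHERE cars.brand = max_price.brand)
""")
conn.commit()

cur.execute("SELECT brand, max_price FROM max_price ORDER BY id")
print(cur.fetchall())
# [('Opel', 12000), ('Citroen', 18000), ('Audi', 20000)]
```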