Python MySQLdb select from and insert into another database

I have a table that looks like this:
part   min   max   unitPrice
A      1     9     10
A      10    99    5
B      1     9     11
B      10    99    6
...
I also have a production table, and I need to insert the data above into it.
When I do the select from the first table and fetch the records, I have a hard time inserting them into the other one.
Say
cursor_table1.execute('select part, min, max, unitPrice, now() from table1')
for row in cursor_table1.fetchall():
    part, min, max, unitPrice, now = row
    print part, min, max, unitPrice, now
The result turns out to be
'416570S39677N1043', 1L, 24L, 48.5, datetime.datetime(2018, 10, 8, 16, 33, 42)
I know Python smartly figured out the type of every column, but I actually just want the raw content, so that I can do something like this:
cursor_table1.execute('select part, min, max, unitPrice, now() from table1')
for row in cursor_table1.fetchall():
    cursor_table2.execute('insert into table2 values ' + str(tuple(row)))
The question is: how can I simply select from one table and insert the results into another?
Let me know if I did not describe my question in a clear way and I can add extra info if you want.

It might be a bit late to answer this question, but I also had the same problem and landed on this page. Now, I happen to have found a different answer and figured it might be helpful to share it with others who have the same problem.
I have two MySQL servers, one on a Raspberry Pi and another on a VPS, and I had to sync data between the two by reading from the RPi and inserting into the VPS. I first did it the usual way, writing a loop that fetched the records one by one and inserted them, and it was really slow: it took about 2 minutes for 2000 rows.
I then solved the problem with the executemany function. For the data, I grabbed all the tuples returned by the select using fetchall:
rows = x.fetchall()
y.executemany("insert into table2 (f1, f2, f3) values (%s,%s,%s);", rows)
And it was super fast 😀: it took about 2 seconds for 5000 records.

If you wanted all of the data to pass through Python, you could do the following:
import datetime

cursor_table1.execute('SELECT part, min, max, unitPrice, NOW() FROM table1')
for row in cursor_table1.fetchall():
    part, min, max, unitPrice, now = row
    # let the driver quote and escape the values rather than formatting the string by hand
    cursor_table2.execute(
        "INSERT INTO table2 VALUES (%s, %s, %s, %s, %s)",
        (part, min, max, unitPrice, now.strftime('%Y-%m-%d %H:%M:%S')),
    )

If you don't need to make any calculation with the data selected from table1 and you are only inserting it into the other table, then you can rely on MySQL and run an insert ... select statement. The query code would look like this:
cursor_table1.execute('insert into table2 (part, min, max, unitPrice, date) select part, min, max, unitPrice, now() from table1')
EDIT:
After learning that the tables are on different servers, I would suggest using the executemany method to insert the data, as it runs faster.
First build a list of tuples containing all the data to be inserted, then run a single executemany query, as sketched below.
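A minimal sketch of that approach, assuming separate MySQLdb connections to the two servers (the host names, credentials, and the table2 column list are placeholders that mirror the insert ... select example above):

import MySQLdb

# placeholder connection details for the source and destination servers
source_conn = MySQLdb.connect(host='source-host', user='user', passwd='secret', db='db1')
target_conn = MySQLdb.connect(host='target-host', user='user', passwd='secret', db='db2')
source_cur = source_conn.cursor()
target_cur = target_conn.cursor()

# one round trip to collect all the rows ...
source_cur.execute('SELECT part, min, max, unitPrice, NOW() FROM table1')
rows = source_cur.fetchall()

# ... and one executemany call so the driver batches the inserts
target_cur.executemany(
    'INSERT INTO table2 (part, min, max, unitPrice, date) VALUES (%s, %s, %s, %s, %s)',
    rows,
)
target_conn.commit()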

I expect that several answers here will give you trouble if you have more data than you do memory.
Maybe this doesn't count as solving the problem in python, but I do this:
from sh import bash

# ... omitted argparse and table listing ...
for table_name in tables_to_sync:
    dump = 'mysqldump -h{host} -u{user} -p{password} {db} {table} '.format(
        host=args.remote_host,
        user=args.remote_user,
        password=args.remote_password,
        db=args.remote_database,
        table=table_name,
    )
    flags = '--no-create-info --lock-tables=false --set-gtid-purged=OFF '
    condition = '--where=\'id > {begin} and id <= {end}\' > {table}.sql '.format(
        begin=begin,
        end=end,
        table=table_name
    )
    bash(['-c', dump + flags + condition])

    load = 'mysql -u{user} -p{password} {db} < {table}.sql'.format(
        user=args.local_user,
        password=args.local_password,
        db=args.local_database,
        table=table_name
    )
    bash(['-c', load])
If you're worried about performance, you might consider cutting the middleman out entirely and using the federated storage engine--but that too would be a non-python approach.

Related

Fastest way to insert multiple values into a table from pyodbc

I'm using pyodbc with an application that requires me to insert >1000 rows, which I currently do individually with pyodbc. This tends to take >30 minutes to finish. I was wondering if there are any faster methods that could do this in under a minute. I know you can use multiple values in an insert command, but according to this (Multiple INSERT statements vs. single INSERT with multiple VALUES) that could possibly be even slower.
The code currently looks like this.
def Insert_X(X_info):
    columns = ', '.join(X_info.keys())
    placeholders = ', '.join('?' * len(X_info.keys()))
    columns = columns.replace("'", "")
    values = [x for x in X_info.values()]
    query_string = f"INSERT INTO X ({columns}) VALUES ({placeholders});"
    with conn.cursor() as cursor:
        cursor.execute(query_string, values)
With Insert_X being called >1000 times.
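One common speed-up here is to collect all the rows first and hand them to a single executemany call; with some ODBC drivers (for example Microsoft's SQL Server drivers) enabling fast_executemany helps further. A rough sketch under those assumptions, reusing the shape of the function above (insert_many_x and x_rows are made-up names):

# Sketch only: batches many X_info-style dicts into one executemany call.
# Assumes every dict has the same keys and that conn is an open pyodbc connection.
def insert_many_x(conn, x_rows):
    columns = ', '.join(x_rows[0].keys())
    placeholders = ', '.join('?' * len(x_rows[0]))
    query = f"INSERT INTO X ({columns}) VALUES ({placeholders});"
    params = [tuple(row.values()) for row in x_rows]
    with conn.cursor() as cursor:
        cursor.fast_executemany = True  # driver-dependent; drop this line if your driver does not support it
        cursor.executemany(query, params)
    conn.commit()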

MySQL breaking column values by comma

I am querying a MySQL database (fintech_16) through Python (pymysql) to get the UNIQUE values of a column (Trend). When I ran the following query:
cursor.execute ("SELECT DISTINCT `Trend` FROM `fintech_16` ")
cursor.fetchall()
I got the following result:
((u'Investments',),
(u'Expansion',),
(u'New Products',),
(u'Collaboration',),
(u'New Products,Investments',),
(u'New Products,Expansion',),
(u'Expansion,Investments',),
(u'New Products,Collaboration',),
(u'Regulations',),
(u'Investments,New Products',),
(u'Investments,Expansion',),
(u'Collaboration,Investments',),
(u'Expansion,New Products',),
(u'Collaboration,New Products',))
Now, since some of the ids had more than one trend, the DB is counting each combination as a separate trend.
How should I tweak my query to get only the 5 trends (Investments, Expansion, New Products, Collaboration, Regulations) along with their counts?
Though there are only 5, I could use LIKE '%Investments%' to get each count manually, but I want the code/query to do it.
TIA
One approach is to use the SET data type to define the exact allowed values; you have 5: ['New Products', 'Investments', 'Expansion', ...].
Then you can use the FIND_IN_SET function to count the values you need:
SELECT COUNT(*) FROM `fintech_16` WHERE FIND_IN_SET('New Products', `Trend`) > 0;
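If changing the column type is not an option, the same FIND_IN_SET counting can also be driven from Python with pymysql. A rough sketch, with placeholder connection details and the five trend names hard-coded:

import pymysql

# placeholder connection details
conn = pymysql.connect(host='localhost', user='root', password='', database='mydb')
trends = ['Investments', 'Expansion', 'New Products', 'Collaboration', 'Regulations']

with conn.cursor() as cursor:
    for trend in trends:
        # FIND_IN_SET matches whole comma-separated elements, so a row is counted
        # once for every trend it contains
        cursor.execute(
            "SELECT COUNT(*) FROM `fintech_16` WHERE FIND_IN_SET(%s, `Trend`) > 0",
            (trend,),
        )
        count, = cursor.fetchone()
        print(trend, count)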

sqlite - return all columns for max of one column without repeats

I'm using Python to query a SQL database. I'm fairly new to databases. I've tried looking up this question, but I can't find a similar enough question to get the right answer.
I have a table with multiple columns/rows. I want to find the MAX of a single column, I want ALL columns returned (the entire ROW), and I want only one instance of the MAX. Right now I'm getting ten ROWS returned, because the MAX is repeated ten times. I only want one ROW returned.
The query strings I've tried so far:
sql = 'select max(f) from cbar'
# this returns one ROW, but only a single COLUMN (a single value)
sql = 'select * from cbar where f = (select max(f) from cbar)'
# this returns all COLUMNS, but it also returns multiple ROWS
I've tried a bunch more, but they returned nothing. They weren't right somehow. That's the problem, I'm too new to find the middle ground between my two working query statements.
In SQLite 3.7.11 or later, you can just retrieve all columns together with the maximum value:
SELECT *, max(f) FROM cbar;
But the SQLite library bundled with your Python might be too old. In the general case, you can sort the table by that column and just read the first row:
SELECT * FROM cbar ORDER BY f DESC LIMIT 1;
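For completeness, a minimal sketch of running that second query from Python's built-in sqlite3 module (the database file name is a placeholder):

import sqlite3

conn = sqlite3.connect('cbar.db')  # placeholder file name
cur = conn.cursor()
cur.execute('SELECT * FROM cbar ORDER BY f DESC LIMIT 1')
row = cur.fetchone()  # one tuple with every column of the max-f row, or None if the table is empty
print(row)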

Add MySQL query results to R dataframe

I want to convert a MySQL query from a Python script into an analogous query in R. The Python script uses a loop to look up specific values by genomic coordinates:
SQL = """SELECT value FROM %s FORCE INDEX (chrs) FORCE INDEX (sites)
WHERE `chrom` = %d AND `site` = %d""" % (Table, Chr, Start)
cur.execute(SQL)
In R, the chromosomes and sites are in a dataframe, and for every row in the dataframe I would like to extract a single value and add it to a new column in the dataframe.
So my current dataframe has a similar structure to the following:
df <- data.frame("Chr"=c(1,1,3,5,5), "Site"=c(100, 200, 400, 100, 300))
The amended dataframe should have an additional column with values from the database (at the corresponding genomic coordinates). The structure should be similar to:
df <- data.frame("Chr"=c(1,1,3,5,5), "Site"=c(100, 200, 400, 100, 300), "Value"=c(1.5, 0, 5, 60, 100))
So far I connected to the database using:
con <- dbConnect(MySQL(),
                 user="root", password="",
                 dbname="MyDataBase")
Rather than loop over each row in my dataframe, I would like to use something that would add the corresponding value to a new column in the existing dataframe.
Update with working solution based on answer below:
library(RMySQL)
library(plyr)  # provides ldply

con <- dbConnect(MySQL(),
                 user="root", password="",
                 dbname="MyDataBase")

GetValue <- function(DataFrame, Table){
  queries <- sprintf("SELECT value as value
    FROM %s FORCE INDEX (chrs) FORCE INDEX (sites)
    WHERE chrom = %d AND site = %d UNION ALL SELECT 'NA' LIMIT 1", Table, DataFrame$Chr, DataFrame$start)
  res <- ldply(queries, function(query) { dbGetQuery(con, query) })
  DataFrame[, Table] <- res$value
  return(DataFrame)
}

df <- GetValue(df, "TableName")
Maybe you could do something like this. First build up your queries, then execute them, storing the results in a column of your dataframe. Not sure if the do.call("rbind", ...) part is necessary, but that basically takes a bunch of dataframe rows and squishes them together by row into a single dataframe.
queries=sprintf("SELECT value as value FROM %s FORCE INDEX (chrs) FORCE INDEX (sites) WHERE chrom = %d AND site = %d UNION ALL SELECT 0 LIMIT 1", "TableName", df$Chrom, df$Pos)
df$Value = do.call("rbind",sapply(queries, function(query) dbSendQuery(mydb, query)))$value
I played with your SQL a little, my concern with the original is with cases where it might return more than 1 row.
I like the data.table package for this kind of task, as its syntax is inspired by SQL.
require(data.table)
So, an example table to match the values against:
table <- data.table(chrom=rep(1:5, each=5),
                    site=rep(100*1:5, times=5),
                    Value=runif(5*5))
Now the SQL query can be translated into something like
# select from table, where chrom=Chr and site=Site, value
Chr <- 2
Site <- 200
table[chrom==Chr & site==Site, Value] # returns data.table
table[chrom==Chr & site==Site, ]$Value # returns numeric
Key (index) the table for quick lookup (assuming chrom and site combinations are unique):
setkey(table, chrom, site)
table[J(Chr, Site), ]$Value # very fast lookup due to indexed table
Your dataframe as a data.table, with the two columns 'Chr' and 'Site', both integer:
df <- data.frame("Chr"=c(1,1,3,5,5), "Site"=c(100, 200, 400, 100, 300))
dt <- as.data.table(df) # adds data.table class to data.frame
setkey(dt, Chr, Site) # index for 'by' and for 'J' join
Match the values and append in new column (by reference, so no copying of table)
# loop over keys Chr and Site and find the match in the table
# select the Value column and create a new column that contains this
dt[, Value:=table[chrom==Chr & site==Site]$Value, by=list(Chr, Site)]
# faster:
dt[, Value:=table[J(Chr, Site)]$Value, by=list(Chr, Site)]
# fastest: in one table merge operation assuming the keys are in the same order
table[J(dt)]
kind greetings
Why don't you use the RMySQL or sqldf package?
With RMySQL, you get MySQL access in R.
With sqldf, you can issue SQL queries on R data structures.
Using either of those, you do not need to reword your SQL query to get the same results.
Let me also mention the data.table package, which lets you do very efficient selects and joins on your data frames after converting them to data tables using as.data.table(your.data.frame). Another good thing about it is that a data.table object is a data.frame at the same time, so all your functions that work on the data frames work on these converted objects, too.
You could easily use the dplyr package. There is even a nice vignette about that: http://cran.rstudio.com/web/packages/dplyr/vignettes/databases.html.
One thing you need to know is:
You can connect to MySQL and MariaDB (a recent fork of MySQL) through
src_mysql(), mediated by the RMySQL package. Like PostgreSQL, you'll
need to provide a dbname, username, password, host, and port.

python db insert

I am facing a performance problem in my code. I make a db connection, run a select query, and then insert into a table. Around 500 rows are populated by one select query. Before inserting, I run the select query around 8-9 times first and then insert them all using cursor.executemany. But it is taking 2 minutes to insert, which is not good. Any ideas?
def insert1(id, state, cursor):
    cursor.execute("select * from qwert where asd_id =%s", [id])
    if somecondition:
        adding.append(rd[i])
    cursor.executemany(indata, adding)
where rd[i] is an array used to build the records and indata is the insert statement
# prog starts here
cursor.execute("select * from assd")
for rows in cursor.fetchall():
    if rows[1] == 'aq':
        insert1(rows[1], rows[2], cursor)
    if rows[1] == 'qw':
        insert2(rows[1], rows[2], cursor)
I don't really understand why you're doing this.
It seems that you want to insert a subset of rows from "assd" into one table, and another subset into another table?
Why not just do it with two SQL statements, structured like this:
insert into tab1 select * from assd where asd_id = 42 and cond1 = 'set';
insert into tab2 select * from assd where asd_id = 42 and cond2 = 'set';
That'd dramatically reduce your number of roundtrips to the database and your client-server traffic. It'd also be an order of magnitude faster.
Of course, I'd also strongly recommend that you specify your column names in both the insert and select parts of the code.
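As a rough sketch of what that looks like from the Python side (the table, column, and condition names below are placeholders mirroring the pseudocode above), the whole job becomes two execute calls:

# Sketch only: two server-side insert ... select statements instead of many round trips.
# 'cond1'/'cond2' and the column lists stand in for the real filters and columns.
cursor.execute(
    "INSERT INTO tab1 (col_a, col_b) "
    "SELECT col_a, col_b FROM assd WHERE asd_id = %s AND cond1 = 'set'",
    (42,),
)
cursor.execute(
    "INSERT INTO tab2 (col_a, col_b) "
    "SELECT col_a, col_b FROM assd WHERE asd_id = %s AND cond2 = 'set'",
    (42,),
)
connection.commit()  # assuming 'connection' is the non-autocommit connection behind 'cursor'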
