Get multiple lists/data from a MySQL stored procedure - Python

I'm working in Python with MySQL and want to get multiple lists of data back from a stored procedure.
I am using PyMySQL to connect to my database, and I'm trying to do something like this, but it's not working:
CREATE DEFINER=`root`@`%` PROCEDURE `spGetData`(IN ssscreenId INT(11))
BEGIN
SELECT clientId, clientName FROM apl_cb.Client WHERE isActive = 1 AND isDeleted = 0;
SELECT bankId, bankName, bankAddress FROM apl_cb.Bank WHERE isDeleted = 0 AND isActive = 1;
SELECT eventId, eventName, eventGroup FROM apl_cb.Event WHERE isActive = 1 AND isDeleted = 0 AND menuEvent = 0 AND screenId = ssscreenId;
END
Any kind of help will be appreciated.
Thanks.

Have you tried using the nextset() method on your Python cursor to jump to the next result set? (python.org/dev/peps/pep-0249/#nextset)
If you have tried it, and it doesn't work, then you're out of luck; your Python program won't be able to retrieve multiple resultsets from a single stored procedure.
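For reference, a minimal sketch of the nextset() approach with PyMySQL (the connection details and the screen id value are assumptions, not from the question):

import pymysql

conn = pymysql.connect(host="localhost", user="root",
                       password="secret", db="apl_cb")  # details assumed
try:
    with conn.cursor() as cursor:
        cursor.callproc("spGetData", (5,))  # 5 is a hypothetical screen id
        clients = cursor.fetchall()         # first SELECT: clients
        cursor.nextset()                    # advance to the second result set
        banks = cursor.fetchall()           # second SELECT: banks
        cursor.nextset()                    # advance to the third result set
        events = cursor.fetchall()          # third SELECT: events
finally:
    conn.close()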
Keep in mind that MySQL stored procedures offer encapsulation advantages but not efficiency advantages. The time taken to issue three queries inside a stored procedure is the same as (or maybe a tiny bit higher than) the time of issuing them one after another from your Python program. Here's an explanation of part of that. http://www.joinfu.com/2010/05/mysql-stored-procedures-aint-all-that/

Related

Pymssql (or similar) function to return all available stored procedures

Our database engineer builds stored procedures for us to call from Python/R etc. These now number in the multiple tens, spread over two or three databases.
I've got records of what 70% of them do, but I wondered if there was a quick/dirty way to query the database(s) and pull back a list of the available stored procedures (given the logins supplied in the connection string).
I typically use
sql = "
EXEC stored_proc_name
#param1 = 'xyz',
#param2 = 'abc'
"
cnxn = pymssql.connect(
    host=r'ip.ad.dr.es.',
    port='1433',
    user=r'db_user',
    password=r'pdws',
    database='db_name'
)
and then the wonderful pd.read_sql_query(sql, cnxn). This works fine - as long as I know the stored proc name and the required params.
I've got most of them hard-coded into a module, but I thought I'd see if anyone knows of some built-in functionality to interrogate the database for this info - at least the available stored procedures, but also the required params, if at all possible.
OK, this quickly became irrelevant to the initial issue, but just this week it became relevant again.
What I was looking for was:
sql_SPs = """Select [NAME] from sysobjects where type = 'P' and category = 0"""
and when called as:
df_1 = pd.read_sql_query(sql_SPs, cnxn)
this returns a DataFrame (fine in this instance) with the stored procedures listed.
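For the parameters, INFORMATION_SCHEMA.PARAMETERS should expose what each procedure expects (a sketch; verify the columns against your SQL Server version):

import pandas as pd

sql_params = """
SELECT SPECIFIC_NAME, PARAMETER_NAME, DATA_TYPE, ORDINAL_POSITION
FROM INFORMATION_SCHEMA.PARAMETERS
ORDER BY SPECIFIC_NAME, ORDINAL_POSITION
"""
df_params = pd.read_sql_query(sql_params, cnxn)  # same cnxn as above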

Optimize count query in Django

I have the following queries:
files = DataAvailability.objects.all()
type1 = files.filter(type1=True)
type2 = files.filter(type2=True)
# People information
people = Person.objects.all()
users = people.filter(is_subject=True)
# count information (this is taking a long time to query them all)
type1_users = type1.filter(person__in=users).count()
type2_users = type2.filter(person__in=users).count()
total_users = files.filter(person__in=users).count()
# another way
total_users2 = files.filter(person__in=users)
type1_users2 = total_users2.filter(type1=True).count()
type2_users2 = total_users2.filter(type2=True).count()
total_count = total_users2.count()
I thought about creating a query with .values() and putting the results into a set(), then running some functions over the set (like a diff).
Is this the only way to improve the query time?
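Something like this sketch of that idea (field names taken from the queries above; the counting happens in Python after a single fetch):

rows = files.filter(person__in=users).values_list('type1', 'type2')
total_count = len(rows)                          # evaluates the queryset once
type1_count = sum(1 for t1, _t2 in rows if t1)   # reuses the cached results
type2_count = sum(1 for _t1, t2 in rows if t2)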
You can always fall back to raw SQL: https://docs.djangoproject.com/en/2.0/topics/db/sql/#performing-raw-queries
Example:
# Don't do this, it's insecure
YourModel.objects.raw(f"select id from {YourModel._meta.db_table}")
# Do it like this to avoid SQL injection issues
YourModel.objects.raw("select id from app_model_name")
The name of the table can be obtained as YourModel._meta.db_table, and you can also get the SQL of a queryset like this:
type1_users = type1.filter(person__in=users)
print(str(type1_users.query))
That way you can join this query into another one.
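For instance, a minimal sketch that folds the three counts into one raw query - the table and column names here are assumptions (Django's defaults would look like appname_dataavailability), and SUM over boolean flags is MySQL-style:

from django.db import connection

with connection.cursor() as cur:
    cur.execute("""
        SELECT COUNT(*)     AS total,
               SUM(d.type1) AS type1_users,
               SUM(d.type2) AS type2_users
        FROM app_dataavailability d
        JOIN app_person p ON p.id = d.person_id
        WHERE p.is_subject = 1
    """)  # table/column names assumed; adjust to your schema
    total_count, type1_count, type2_count = cur.fetchone()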
I don't have to run those queries very often (once a day at most), so I run them in a cron job that exports the data to a file (you could also create a table in your database for auditing purposes, for example). I then read the file and use the data from there. It's working well/fast.
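Roughly the shape of that job, as a sketch reusing the querysets from the question (the output path is hypothetical):

import json

stats = {
    'total_users': files.filter(person__in=users).count(),
    'type1_users': type1.filter(person__in=users).count(),
    'type2_users': type2.filter(person__in=users).count(),
}
with open('/tmp/daily_counts.json', 'w') as fh:  # hypothetical path
    json.dump(stats, fh)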

Python sqlite3 never returns an inner join with 28 million+ rows

An SQLite database with two tables, each over 28 million rows. Here's the schema:
CREATE TABLE MASTER (ID INTEGER PRIMARY KEY AUTOINCREMENT,PATH TEXT,FILE TEXT,FULLPATH TEXT,MODIFIED_TIME FLOAT);
CREATE TABLE INCREMENTAL (INC_ID INTEGER PRIMARY KEY AUTOINCREMENT,INC_PATH TEXT,INC_FILE TEXT,INC_FULLPATH TEXT,INC_MODIFIED_TIME FLOAT);
Here's an example row from MASTER:
ID          PATH             FILE     FULLPATH                 MODIFIED_TIME
----------  ---------------  -------  -----------------------  -------------
1           e:\ae/BONDS/0/0  100.bin  e:\ae/BONDS/0/0/100.bin  1213903192.5
The tables have mostly identical data, with some differences between MODIFIED_TIME in MASTER and INC_MODIFIED_TIME in INCREMENTAL.
If I execute the following query in sqlite, I get the results I expect:
select ID from MASTER inner join INCREMENTAL on FULLPATH = INC_FULLPATH and MODIFIED_TIME != INC_MODIFIED_TIME;
That query will pause for a minute or so, return a number of rows, pause again, return some more, etc., and finish without issue. Takes about 2 minutes to fully return everything.
However, if I execute the same query in Python:
changed_files = conn.execute("select ID from MASTER inner join INCREMENTAL on FULLPATH = INC_FULLPATH and MODIFIED_TIME != INC_MODIFIED_TIME;")
It will never return - I can leave it running for 24 hours and still have nothing. The python32.exe process doesn't start consuming a large amount of CPU or memory - it stays pretty static. And the process itself doesn't actually seem to go unresponsive - however, I can't Ctrl-C to break, and have to kill the process to actually stop the script.
I do not have these issues with a small test database - everything runs fine in Python.
I realize this is a large amount of data, but if sqlite is handling the actual queries, python shouldn't be choking on it, should it? I can do other large queries from python against this database. For instance, this works:
new_files = conn.execute("SELECT DISTINCT INC_FULLPATH, INC_PATH, INC_FILE from INCREMENTAL where INC_FULLPATH not in (SELECT DISTINCT FULLPATH from MASTER);")
Any ideas? Are the pauses in between sqlite returning data causing a problem for Python? Or is something never occurring at the end to signal the end of the query results (and if so, why does it work with small databases)?
Thanks. This is my first stackoverflow post and I hope I followed the appropriate etiquette.
Python tends to ship with older versions of the SQLite library, especially Python 2.x, where it is no longer updated.
However, your actual problem is that the query is slow.
Use the usual mechanisms to optimize it, such as creating a two-column index on INC_FULLPATH and INC_MODIFIED_TIME.
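From Python, creating that index might look like this (a sketch; the database file name is hypothetical):

import sqlite3

conn = sqlite3.connect("files.db")  # hypothetical database path
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_inc_fullpath_mtime "
    "ON INCREMENTAL (INC_FULLPATH, INC_MODIFIED_TIME)"
)
conn.commit()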

Slow MySQL queries in Python but fast elsewhere

I'm having a heckuva time dealing with slow MySQL queries in Python. In one area of my application, "load data infile" goes quick. In another area, the select queries are VERY slow.
Executing the same query in phpMyAdmin AND Navicat (as a second test) yields a response ~5x faster than in Python.
A few notes...
I switched to MySQLdb as the connector and am also using SSCursor. No performance increase.
The database is optimized, indexed, etc. I'm porting this application to Python from PHP/CodeIgniter, where it ran fine (I foolishly thought getting out of PHP would help speed it up).
PHP/CodeIgniter executes the select queries swiftly. For example, one key aspect of the application takes ~2 seconds in PHP/CodeIgniter, but takes 10 seconds in Python BEFORE any analysis of the data is done.
My link to the database is fairly standard...
dbconn = MySQLdb.connect(host="127.0.0.1", user="*", passwd="*", db="*", cursorclass=MySQLdb.cursors.SSCursor)
Any insights/help/advice would be greatly appreciated!
UPDATE
In terms of fetching/handling the results, I've tried it a few ways. The initial query is fairly standard...
# Run Query
cursor.execute(query)
I removed all of the code within this loop just to make sure it wasn't the bottleneck, and it's not - I put dummy code in its place. The entire process did not speed up at all.
db_results = "test"
# Loop Results
for row in cursor:
a = 0 (this was the dummy code I put in to test)
return db_results
The query result itself is only 501 rows (with a large number of columns)... it took 0.029 seconds outside of Python, but takes significantly longer within Python.
The project is related to horse racing. The query is done within this function. The query itself is long; however, it runs well outside of Python. I commented out the code within the loop on purpose for testing... also the print(query) in hopes of figuring this out.
# Get PPs
def get_pps(race_ids):
    # Comma Race List
    race_list = ','.join(map(str, race_ids))
    # PPs Query
    query = ("SELECT raceindex.race_id, entries.entry_id, entries.prognum, runlines.line_id, runlines.track_code, runlines.race_date, runlines.race_number, runlines.horse_name, runlines.line_date, runlines.line_track, runlines.line_race, runlines.surface, runlines.distance, runlines.starters, runlines.race_grade, runlines.post_position, runlines.c1pos, runlines.c1posn, runlines.c1len, runlines.c2pos, runlines.c2posn, runlines.c2len, runlines.c3pos, runlines.c3posn, runlines.c3len, runlines.c4pos, runlines.c4posn, runlines.c4len, runlines.c5pos, runlines.c5posn, runlines.c5len, runlines.finpos, runlines.finposn, runlines.finlen, runlines.dq, runlines.dh, runlines.dqplace, runlines.beyer, runlines.weight, runlines.comment, runlines.long_comment, runlines.odds, runlines.odds_position, runlines.entries, runlines.track_variant, runlines.speed_rating, runlines.sealed_track, runlines.frac1, runlines.frac2, runlines.frac3, runlines.frac4, runlines.frac5, runlines.frac6, runlines.final_time, charts.raceshape "
             "FROM hrdb_raceindex raceindex "
             "INNER JOIN hrdb_runlines runlines ON runlines.race_date = raceindex.race_date AND runlines.track_code = raceindex.track_code AND runlines.race_number = raceindex.race_number "
             "INNER JOIN hrdb_entries entries ON entries.race_date=runlines.race_date AND entries.track_code=runlines.track_code AND entries.race_number=runlines.race_number AND entries.horse_name=runlines.horse_name "
             "LEFT JOIN hrdb_charts charts ON runlines.line_date = charts.race_date AND runlines.line_track = charts.track_code AND runlines.line_race = charts.race_number "
             "WHERE raceindex.race_id IN (" + race_list + ") "
             "ORDER BY runlines.line_date DESC;")
    print(query)
    # Run Query
    cursor.execute(query)
    # Query Fields
    fields = [i[0] for i in cursor.description]
    # PPs List
    pps = []
    # Loop Results
    for row in cursor:
        a = 0
        #this_pp = {}
        #for i, value in enumerate(row):
        #    this_pp[fields[i]] = value
        #pps.append(this_pp)
    return pps
One final note... I haven't settled on the ideal way to handle the result. I believe one of the cursor classes lets the result come back as a set of dictionaries. I haven't even made it to that point yet, as the query and return itself is so slow.
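For reference, that would be the DictCursor family (a sketch; MySQLdb also ships a server-side SSDictCursor, which pairs with the streaming SSCursor approach above - the connection placeholders are copied from the question):

import MySQLdb
import MySQLdb.cursors

# Each row comes back as a dict keyed by column name, which removes the
# manual fields/enumerate bookkeeping in get_pps().
dbconn = MySQLdb.connect(host="127.0.0.1", user="*", passwd="*", db="*",
                         cursorclass=MySQLdb.cursors.SSDictCursor)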
Though you have only 501 rows, it looks like you have over 50 columns. How much total data is being passed from MySQL to Python?
501 rows x 55 columns = 27,555 cells returned.
If each cell averaged "only" 1K, that would be close to 27MB of data returned.
To get a sense of how much data MySQL is pushing, you can run this right after your query:
SHOW SESSION STATUS LIKE "bytes_sent"
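From Python that check might look like this (a sketch; run it on the same connection immediately after the big query):

cursor.execute('SHOW SESSION STATUS LIKE "Bytes_sent"')
print(cursor.fetchall())  # e.g. (('Bytes_sent', '27555000'),) - value is hypothetical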
Is your server well-resourced? Is memory allocation well configured?
My guess is that when you are using phpMyAdmin you are getting paginated results. This masks the issue of MySQL returning more data than your server can handle (I don't use Navicat, so I'm not sure how it returns results).
Perhaps the Python process is memory-constrained, and when faced with this large result set it has to page out to disk to handle it.
If you reduce the number of columns called and/or constrain the query with, say, LIMIT 10, do you get improved speed?
Can you see if the server running Python is paging to disk when this query is called? Can you see what memory is allocated to Python, how much is used during the process, and how that allocation and usage compare to the same values in the PHP version?
Can you allocate more memory to your constrained resource?
Can you reduce the number of columns or rows that are called through pagination or asynchronous loading?
I know this is late; however, I have run into similar issues with MySQL and Python. My solution is to make the queries in another language... I use R, which is blindingly fast at this, do what I can in R, and then send the data to Python if need be for more general programming, although R has many general-purpose libraries as well. Just wanted to post something that may help someone with a similar problem, even though I know it side-steps the heart of the issue.

Sort table by a column and set another column to a sequential value to persist the ordering

So I'm not sure whether to pose this as a Django or an SQL question; however, I have the following model:
class Picture(models.Model):
    weight = models.IntegerField(default=0)
    taken_date = models.DateTimeField(blank=True, null=True)
    album = models.ForeignKey(Album, db_column="album_id", related_name='pictures')
I may have a subset of Picture records numbering in the thousands, and I'll need to sort them by taken_date and persist the order by setting the weight value.
For instance in Django:
pictures = Picture.objects.filter(album_id=5).order_by('taken_date')
for weight, picture in enumerate(list(pictures)):
    picture.weight = weight
    picture.save()
Now for the thousands of records I'm expecting to have, this could take way too long. Is there a more efficient way of performing this task? I'm assuming I might need to resort to SQL, as I've recently come to learn Django's not necessarily "there yet" in terms of database bulk operations.
OK, I put together the following in MySQL, which works fine; however, I'm going to guess there's no way to simulate this using the Django ORM?
UPDATE picture p
JOIN (SELECT @inc := @inc + 1 AS new_weight, id
      FROM (SELECT @inc := 0) temp, picture
      WHERE album_id = 5
      ORDER BY taken_date) pw
ON p.id = pw.id
SET p.weight = pw.new_weight;
I'll leave the question open for a while just in case there's some awesome solution or app that solves this; however, the above query for ~6000 records takes 0.11s.
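If you want to fire that statement from Django rather than the MySQL console, a sketch (the bare table name picture matches the query above; adjust it if your app prefixes table names):

from django.db import connection

with connection.cursor() as cur:
    cur.execute("""
        UPDATE picture p
        JOIN (SELECT @inc := @inc + 1 AS new_weight, id
              FROM (SELECT @inc := 0) temp, picture
              WHERE album_id = %s
              ORDER BY taken_date) pw
        ON p.id = pw.id
        SET p.weight = pw.new_weight
    """, [5])  # 5 is the album id from the question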
NOTE that the above query will generate warnings in MySQL if you have the following setting:
binlog_format=statement
To fix this, change the binlog_format setting to either mixed or row. mixed is probably better, as it means you'll still use statement-based logging for everything except cases where row is required to avoid a warning like the one above.
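As an aside: newer Django releases (2.2+) added bulk_update, which avoids the per-row save() from the question without dropping to raw SQL; a hedged sketch:

# Batches the writes into a handful of UPDATE statements instead of one per row.
pictures = list(Picture.objects.filter(album_id=5).order_by('taken_date'))
for weight, picture in enumerate(pictures):
    picture.weight = weight
Picture.objects.bulk_update(pictures, ['weight'], batch_size=500)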
