I just got dumped into SQL with BigQuery, so I don't know a lot of the terminology yet. I'm currently trying to write a method where you pass in a string (the name of the dataset you want to pull from), but I can't seem to get a string into the query variable without it returning errors.
I looked up how to put variables into SQL queries, but most of those solutions didn't fit my case. I ended up adding $s in the query and an s before the """ (this produced a syntax error):
import pandas as pd
import bq_helper
from bq_helper import BigQueryHelper
# Some code about using BQ_helper to get the data, if you need it lmk
# test = `data.patentsview.application`
query1 = s"""
SELECT * FROM $s
LIMIT
20;
"""
response1 = patentsview.query_to_pandas_safe(query1)
response1.head(20)
The code above returns this error:
File "<ipython-input-63-6b07957ebb81>", line 8
"""
^
SyntaxError: invalid syntax
EDIT:
The original code, which works but would have to be manually brute-forced, is this:
query1 = """
SELECT * FROM `patents-public-data.patentsview.application`
LIMIT
20;
"""
response1 = patentsview.query_to_pandas_safe(query1)
response1.head(20)
If I understand you correctly, this may be what you're looking for:
#making up some variables:
vars = ['`patents-public-data.patentsview.application`', '`patents-private-data.patentsview.application`']
for var in vars:
    query = f"""SELECT * FROM {var}
LIMIT
20;
"""
    print(query)
Output:
SELECT * FROM `patents-public-data.patentsview.application`
LIMIT
20;
SELECT * FROM `patents-private-data.patentsview.application`
LIMIT
20;
I believe this should help: https://cloud.google.com/bigquery/docs/parameterized-queries#bigquery_query_params_named-python:
To specify a named parameter, use the @ character followed by an identifier, such as @param_name.
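For completeness, here is a minimal sketch of a named parameter with the google-cloud-bigquery client (the WHERE column and its value are made-up examples, not from your dataset). Note that BigQuery query parameters can only stand in for values, not for identifiers such as table names, so for swapping the table itself the f-string approach above is the usual workaround:
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are already configured in the environment

sql = """
SELECT *
FROM `patents-public-data.patentsview.application`
WHERE country = @country
LIMIT 20;
"""
# bind the named parameter @country to a concrete value
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("country", "STRING", "US"),  # hypothetical column/value
    ]
)
df = client.query(sql, job_config=job_config).to_dataframe()
print(df.head(20))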
SQL operation:
SELECT *,
CASE
WHEN quota_unit = 'MB' THEN quota*1024
ELSE quota
END AS data_usage
FROM mapping
ORDER BY data_usage;
I tried it like this:
query = mapping.select("CASE WHEN mapping.quota_unit = 'MB' THEN 1024 * mapping.quota ELSE mapping.quota END AS data_usage")
query.orderBy(data_usage)
res = list(query)
but the select method takes that value in the parentheses as a WHERE clause, and the resulting query becomes
SELECT * FROM mapping WHERE 'CASE WHEN mapping.quota_unit = MB THEN 1024 * mapping.quota ELSE mapping.quota END AS data_usage'
which is not what I expected. Can you please help me here?
Not every possible SQL query can be mapped back to SQLObject expressions. Sometimes you need to drop down to SQLBuilder or even raw queries:
results = connection.queryAll(string)
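A minimal sketch of what that could look like for your CASE expression (the connection URI is a placeholder; only the mapping table and its quota/quota_unit columns are taken from your question):
from sqlobject import connectionForURI, sqlhub

# placeholder connection URI; point it at your actual database
sqlhub.processConnection = connectionForURI('mysql://user:password@localhost/mydb')
connection = sqlhub.processConnection

# raw SQL bypasses SQLObject's expression layer entirely
string = """
    SELECT *,
           CASE WHEN quota_unit = 'MB' THEN quota * 1024 ELSE quota END AS data_usage
    FROM mapping
    ORDER BY data_usage
"""
results = connection.queryAll(string)  # list of plain tuples, one per row
for row in results:
    print(row)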
Hello, I am trying to translate the following relatively simple query to SQLAlchemy, but I get
('Unexpected error:', <class 'sqlalchemy.exc.InvalidRequestError'>)
SELECT model, COUNT(model) AS count FROM log.logs
WHERE SOURCE = "WEB" AND year(timestamp) = 2015 AND month(timestamp) = 1
and account = "Test" and brand = "Nokia" GROUP BY model ORDER BY count DESC limit 10
This is what I wrote, but it is not working. What is wrong?
devices = db.session.query(Logs.model).filter_by(source=source).filter_by(account=acc).filter_by(brand=brand).\
filter_by(year=year).filter_by(month=month).group_by(Logs.model).order_by(Logs.model.count().desc()).all()
It's a bit hard to tell from your code sample, but the following is hopefully the correct SQLAlchemy code. Try:
from sqlalchemy.sql import func
devices = (db.session
.query(Logs.model, func.count(Logs.model).label('count'))
.filter_by(source=source)
.filter_by(account=acc)
.filter_by(brand=brand)
.filter_by(year=year)
.filter_by(month=month)
.group_by(Logs.model)
.order_by(func.count(Logs.model).desc()).all())
Note that I've enclosed the query in a (...) to avoid having to use \ at the end of each line.
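One caveat: the original SQL filters on year(timestamp) and month(timestamp), so if Logs has no literal year and month columns, the filter_by(year=...) and filter_by(month=...) calls won't match. In that case a sketch using extract() might look like this (assuming the column is named Logs.timestamp):
from sqlalchemy import extract
from sqlalchemy.sql import func

devices = (db.session
    .query(Logs.model, func.count(Logs.model).label('count'))
    .filter_by(source=source, account=acc, brand=brand)
    .filter(extract('year', Logs.timestamp) == 2015)    # assumes a Logs.timestamp column
    .filter(extract('month', Logs.timestamp) == 1)
    .group_by(Logs.model)
    .order_by(func.count(Logs.model).desc())
    .limit(10)
    .all())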
I've traced a slowness in my application to MySQLdb's execute() function. I crafted a simple SQL query that demonstrates the problem:
SELECT * FROM `cid444_agg_big` c WHERE 1
>>> import MySQLdb as mdb
>>> import time;
>>>
>>> dbconn = mdb.connect('localhost','*****','*****','*****');
>>> cursorconn = dbconn.cursor()
>>>
>>> sql="SELECT * FROM `cid444_agg_big` c WHERE 1";
>>>
>>> startstart=time.time();
>>> cursorconn.execute(sql);
21600L #returned 21600 records
>>> print time.time()-startstart, "for execute()"
2.86254501343 for execute() #why does this take so long?
>>>
>>> startstart=time.time();
>>> rows = cursorconn.fetchall()
>>> print time.time()-startstart, "for fetchall()"
0.0021288394928 for fetchall() #this is very fast, no problem with fetchall()
Running this query in the mysql shell yields 0.27 seconds, about 10 times faster!
My only thought is the size of the data being returned. This returns 21600 "wide" rows. So that's a lot of data being sent to python. The database is localhost, so there's no network latency.
Why does this take so long?
UPDATE: MORE INFORMATION
I wrote a similar script in PHP:
$c = mysql_connect ( 'localhost', '*****', '****', true );
mysql_select_db ( 'cachedata', $c );
$time_start = microtime_float();
$sql="SELECT * FROM `cid444_agg_big` c WHERE 1";
$q=mysql_query($sql);$c=0;
while($r=mysql_fetch_array($q))
    $c++; //do something?
echo "Did ".$c." loops in ".(microtime_float() - $time_start)." seconds\n";

function microtime_float(){ //function taken from php.net
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
This prints:
Did 21600 loops in 0.56120800971985 seconds
This loops over all the rows instead of retrieving them all at once. PHP appears to be about 5 times faster than the Python version ...
The default MySQLdb cursor fetches the complete result set to the client on execute, and fetchall() will just copy the data from memory to memory.
If you want to store the result set on the server and fetch it on demand, you should use SSCursor instead.
Cursor:
This is the standard Cursor class that returns rows as tuples and stores the result set in the client.
SSCursor:
This is a Cursor class that returns rows as tuples and stores the result set in the server.
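A minimal sketch of switching to a server-side cursor, reusing the connection details from the question (the ***** placeholders stand for real credentials):
import MySQLdb as mdb
import MySQLdb.cursors

# cursorclass makes every cursor from this connection server-side
dbconn = mdb.connect('localhost', '*****', '*****', '*****',
                     cursorclass=MySQLdb.cursors.SSCursor)
cursorconn = dbconn.cursor()

cursorconn.execute("SELECT * FROM `cid444_agg_big` c WHERE 1")  # should return quickly now
for row in cursorconn:   # rows are streamed from the server as you iterate
    pass                 # do something with each row

cursorconn.close()
dbconn.close()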
Very old discussion, but I'll try to add my two cents.
My script had to select many rows by timestamp. In the standard setup (an index on id only, plus name and timestamp columns) it was very, very slow (I didn't time it exactly, but many minutes). I added an index on timestamp too, and the query took under 10 seconds. Much better.
ALTER TABLE BTC ADD INDEX(timestamp)
I hope this can help.
I am struggling with an SQL command issued from my Python script. Here is what I have tried so far; the first example works fine, but the rest do not.
#working SQL = "SELECT ST_Distance(ST_Transform(ST_GeomFromText(%s, 4326),27700),ST_Transform(ST_GeomFromText(%s, 4326),27700));"
#newPointSQL = "SELECT ST_ClosestPoint(ST_GeomFromText(%s),ST_GeomFromText(%s));"
#newPointSQL = "SELECT ST_As_Text(ST_ClosestPoint(ST_GeomFromText(%s), ST_GeomFromText(%s)));"
#newPointSQL = "SELECT ST_AsText(ST_ClosestPoint(ST_GeomFromEWKT(%s), ST_GeomFromText(%s)));"
#newPointSQL = "SELECT ST_AsText(ST_Line_Interpolate_Point(ST_GeomFromText(%s),ST_Line_Locate_Point(ST_GeomFromText(%s),ST_GeomFromText(%s))));"
newPointData = (correctionPathLine, pointToCorrect), i.e. ( MULTILINESTRING((-3.16427109855617 55.9273798550064,-3.16462372283029 55.9273883602162)), POINT(-3.164667 55.92739) )
My data is picked up OK, because the first SQL statement executes successfully. The problem appears when I use the ST_ClosestPoint function.
Can anyone spot a misuse anywhere? Am I using ST_ClosestPoint in the wrong way?
In the last example I did modify my data to run it (in case someone notices), but it still would not execute.
I don't know what kind of geometries you are dealing with, but I had the same trouble before with MultiLineStrings. I realized that when a MultiLineString can't be merged, the function ST_Line_Locate_Point doesn't work (you can tell whether a MultiLineString can be merged using the ST_LineMerge function). I've made a pl/pgSQL function based on an old mailing-list post, with some performance tweaks added. It only works with MultiLineStrings and LineStrings (but can easily be modified to work with Polygons). First it checks whether the merged geometry contains only a single LineString; if it does, you can use the old ST_Line_Interpolate_Point and ST_Line_Locate_Point combination; if not, you have to do the same for each LineString in the MultiLineString. I've also added an ST_LineMerge call for pre-1.5 compatibility:
CREATE OR REPLACE FUNCTION ST_MultiLine_Nearest_Point(amultiline geometry, apoint geometry)
  RETURNS geometry AS
$BODY$
DECLARE
    mindistance float8;
    adistance float8;
    nearestlinestring geometry;
    nearestpoint geometry;
    simplifiedline geometry;
    line geometry;
BEGIN
    simplifiedline := ST_LineMerge(amultiline);
    IF ST_NumGeometries(simplifiedline) <= 1 THEN
        nearestpoint := ST_Line_Interpolate_Point(simplifiedline, ST_Line_Locate_Point(simplifiedline, apoint));
        RETURN nearestpoint;
    END IF;
    -- Change your mindistance according to your projection; it should be stupidly big
    mindistance := 100000;
    FOR line IN SELECT (ST_Dump(simplifiedline)).geom AS geom LOOP
        adistance := ST_Distance(apoint, line);
        IF adistance < mindistance THEN
            mindistance := adistance;
            nearestlinestring := line;
        END IF;
    END LOOP;
    RETURN ST_Line_Interpolate_Point(nearestlinestring, ST_Line_Locate_Point(nearestlinestring, apoint));
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE STRICT;
UPDATE:
As noted by @Nicklas Avén, ST_ClosestPoint() should work; ST_ClosestPoint was added in 1.5.
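For reference, calling ST_ClosestPoint from Python with placeholders might look roughly like this (a sketch assuming a psycopg2 connection; the WKT values are the ones from the question):
import psycopg2

# placeholder connection string; adjust to your database
conn = psycopg2.connect("dbname=gisdb user=postgres")
cur = conn.cursor()

newPointSQL = "SELECT ST_AsText(ST_ClosestPoint(ST_GeomFromText(%s), ST_GeomFromText(%s)));"
correctionPathLine = "MULTILINESTRING((-3.16427109855617 55.9273798550064,-3.16462372283029 55.9273883602162))"
pointToCorrect = "POINT(-3.164667 55.92739)"

cur.execute(newPointSQL, (correctionPathLine, pointToCorrect))
print(cur.fetchone()[0])  # the point on the line closest to pointToCorrect, as WKT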
I am trying to understand this behavior; it's definitely not what I expect. I have two programs, one reader and one writer. The reader opens an RDFlib graph store, then performs a query every 2 seconds:
import rdflib
import random
from rdflib import store
import time
default_graph_uri = "urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"
s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')
config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)
while True:
    graph = rdflib.ConjunctiveGraph(s, identifier=rdflib.URIRef(default_graph_uri))
    rows = graph.query("SELECT ?id ?value { ?id <http://localhost#ha> ?value . }")
    for r in rows:
        print r[0], r[1]
    time.sleep(2)
    print " - - - - - - - - "
The second program is a writer that adds stuff to the triplestore:
import rdflib
import random
from rdflib import store
default_graph_uri = "urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"
s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')
config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)
graph = rdflib.ConjunctiveGraph(s, identifier = rdflib.URIRef(default_graph_uri))
graph.add((
    rdflib.URIRef("http://localhost/" + str(random.randint(0, 100))),
    rdflib.URIRef("http://localhost#ha"),
    rdflib.Literal(str(random.randint(0, 100)))
))
graph.commit()
I would expect the number of results on the reader to increase as I submit data with the writer, but this does not happen. The reader keeps returning the same results it returned when it started. If, however, I stop the reader and restart it, the new results appear.
Does anybody know what I am doing wrong?
One easy fix is to put graph.commit() just after the line graph = rdflib.ConjunctiveGraph(...) in the reader.
I'm not sure what the cause is, or why committing before the read fixes it. My guess is:
When opening the MySQLdb connection, a transaction is started automatically.
That transaction doesn't see updates from other, later transactions.
graph.commit() bubbles down to some connection.commit() somewhere that discards this transaction and starts a new one.