(BigQuery PY Client Library v0.28) - Fetch result from table 'query' job - python

I'm learning BigQuery API using Python Client Libraries v0.28
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#run-a-simple-query
Wrote this simple code to fetch data from the table
1) Create client object
client_ = bigquery.Client.from_service_account_json('/Users/xyz/key.json')
2) Begin new Async query job
QUERY = 'SELECT visitid FROM `1234567.ga_sessions_20180101`'
query_job = client_.query(QUERY
, job_id=str(uuid.uuid4()))
3) poll until the query is DONE
while (query_job.state == 'RUNNING'):
time.sleep(5)
query_job.reload()
4) Fetch the results in iteration
query_job.reload()
iter = query_job.result()
At this stage I'd like to fetch how many rows are in the table. As per the doc GitHub code iter is of type bigquery.table.RowIterator with a property [tier.total_rows][1]
5) However, at this stage when I print:
print(iter.total_rows)
It keeps returning None
I'm pretty sure this table is NOT empty an dry query is correctly formatted!
Any help to any pointers what am I missing here will be really helpful... Thanks a lot!
Cheers!

You need to also check query_job.error_result to make sure query succeeded.
You can also see your job in the UI, which can be useful for debugging, using project id and job id:
https://bigquery.cloud.google.com/results/projectid:jobid
Also, query_job.result() already waits for the job completion so you don't need to poll.

The current behavior of how RowIterator returns None is indeed perplexing. Luckily, according to this issue, tswast's comment from 10 days ago indicates that the developers are working on a better solution.
Current awkward behavior of .total_rows
Currently, .total_rows is initialized only once iteration begins. (In what follows, for clarity I renamed your iter variable to row_iter.)
row_iter = query_job.result()
itr = iter(row_iter)
first_row = next(itr)
print(row_iter.total_rows) # Now you get a number instead of None.
This is ugly because to continue the iteration, we must either handle the first row differently or call row_iter = query_job.result() again.
Temporary workaround
A currently-working alternative is to use the value of query_job._query_results.total_rows. Unfortunately this is cheating because _query_results is private, so there is no reason to expect that this will work in the future.
Future behavior
If tswast's proposal is implemented, then row_iter.total_rows will be initialized at the beginning, just as you expect.
Suggestion
In my code, I'm going to use something like
try:
num_rows = row_iter.total_rows or query_job._query_results.total_rows
except NameError:
num_rows = None
to be compatible with future behavior while falling-back to the temporary workaround if necessary.

Related

Struggling with how to iterate data

I am learning Python3 and I have a fairly simple task to complete but I am struggling how to glue it all together. I need to query an API and return the full list of applications which I can do and I store this and need to use it again to gather more data for each application from a different API call.
applistfull = requests.get(url,authmethod)
if applistfull.ok:
data = applistfull.json()
for app in data["_embedded"]["applications"]:
print(app["profile"]["name"],app["guid"])
summaryguid = app["guid"]
else:
print(applistfull.status_code)
I next have I think 'summaryguid' and I need to again query a different API and return a value that could exist many times for each application; in this case the compiler used to build the code.
I can statically call a GUID in the URL and return the correct information but I haven't yet figured out how to get it to do the below for all of the above and build a master list:
summary = requests.get(f"url{summaryguid}moreurl",authmethod)
if summary.ok:
fulldata = summary.json()
for appsummary in fulldata["static-analysis"]["modules"]["module"]:
print(appsummary["compiler"])
I would prefer to not yet have someone just type out the right answer but just drop a few hints and let me continue to work through it logically so I learn how to deal with what I assume is a common issue in the future. My thought right now is I need to move my second if up as part of my initial block and continue the logic in that space but I am stuck with that.
You are on the right track! Here is the hint: the second API request can be nested inside the loop that iterates through the list of applications in the first API call. By doing so, you can get the information you require by making the second API call for each application.
import requests
applistfull = requests.get("url", authmethod)
if applistfull.ok:
data = applistfull.json()
for app in data["_embedded"]["applications"]:
print(app["profile"]["name"],app["guid"])
summaryguid = app["guid"]
summary = requests.get(f"url/{summaryguid}/moreurl", authmethod)
fulldata = summary.json()
for appsummary in fulldata["static-analysis"]["modules"]["module"]:
print(app["profile"]["name"],appsummary["compiler"])
else:
print(applistfull.status_code)

How to use Azure DevOps / VSTS to fetch query results in python

Below is my current code. It connects successfully to the organization. How can I fetch the results of a query in Azure like they have here? I know this was solved but there isn't an explanation and there's quite a big gap on what they're doing.
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
from azure.devops.v5_1.work_item_tracking.models import Wiql
personal_access_token = 'xxx'
organization_url = 'zzz'
# Create a connection to the org
credentials = BasicAuthentication('', personal_access_token)
connection = Connection(base_url=organization_url, creds=credentials)
wit_client = connection.clients.get_work_item_tracking_client()
results = wit_client.query_by_id("my query ID here")
P.S. Please don't link me to the github or documentation. I've looked at both extensively for days and it hasn't helped.
Edit: I've added the results line that successfully gets the query. However, it returns a WorkItemQueryResult class which is not exactly what is needed. I need a way to view the column and results of the query for that column.
So I've figured this out in probably the most inefficient way possible, but hope it helps someone else and they find a way to improve it.
The issue with the WorkItemQueryResult class stored in variable "result" is that it doesn't allow the contents of the work item to be shown.
So the goal is to be able to use the get_work_item method that requires the id field, which you can get (in a rather roundabout way) through item.target.id from results' work_item_relations. The code below is added on.
for item in results.work_item_relations:
id = item.target.id
work_item = wit_client.get_work_item(id)
fields = work_item.fields
This gets the id from every work item in your result class and then grants access to the fields of that work item, which you can access by fields.get("System.Title"), etc.

SQLite3 incorrect number of bindings on second query

Using Python and SQlite3 where c is a cursor this code...
print("vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv")
print("SQL and parameters:",sql,parm)
c.execute(sql,parm)
# Get the row
print("Executed OK")
response = c.fetchone()
# If not successful return null
if not response:
return None
#
print("and produced ", response)
print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
give this output:
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
SQL and parameters: select * from Links where LinkNum = ? (301,)
Executed OK
and produced (301, 'Index', 'The Independent', 'https://www.independent.co.uk/', 6, 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
SQL and parameters: select * from Links where LinkNum = ? (301,)
Encountered exception of type ProgrammingError with arguments ('Incorrect number of bindings supplied. The current statement uses 1, and there are 6 supplied.',)
The application will close
Two identical statements. 1 works and the next throws the exception. As can be seen the row I'm trying to retrieve has 6 columns but that's the only hint/clue I can see. Can anyone help with tracking down the problem? Thanks.
Whatever was causing SQLite to have a fit I remedied the problem by retrieving the second row/object out side the Links object and passing it as an argument to the method I was calling, rather than have that method attempt to retrieve the object.
The problem would still be there but must be deep within the internals of Python instantiation and/or SQLite. Whatever, this problem is solved with some less fancy code.

SQL alchemy not updating as expected

This SQL Alchemy 0.9.7 code executes without error -- but does not update the underlying database as expected.
Here is the python:
print t #prints TITLE ABSTRACTOR 1
print newtitle #prints TITLE ABSTRACTOR I
print session.query(Basic).filter(Basic.title==t).count() #prints 1
ret = update(Basic).where(Basic.title==t).values(title=newtitle)
session.commit()
Here is what the database looks like after the update:
select count(*) from basics where title='TITLE ABSTRACTOR 1';
count
-------
1
(1 row)
select count(*) from basics where title='TITLE ABSTRACTOR I';
count
-------
0
(1 row)
Have I hit a SQL alchemy bug or am I missing something?
You're just constructing an update statement:
ret = update(Basic).where(Basic.title==t).values(title=newtitle)
That doesn't do anything unless you execute the statement:
stmt = update(Basic).where(Basic.title==t).values(title=newtitle)
ret = conn.execute(stmt)
But I think you were trying to use the ORM interface, not the core interface. In which case, although I don't remember the details, I'm pretty sure you do that by modifying a query object, not by calling anything named update. Hopefully if this is what you're looking for, hopefully someone who's fresher on this will provide a better answer, but something like this:
ret = session.query(Basic).filter(Basic.title==t)
ret.title = newtitle
If this doesn't make sense to you, see Executing in the tutorial. But I'm guessing you know this and it was just one of those stupid bugs we all make and all have a hard enough time seeing in other people's code, and it's 100x worse in our own. :)

SQLAlchemy - select for update example

I'm looking for a complete example of using select for update in SQLAlchemy, but haven't found one googling. I need to lock a single row and update a column, the following code doesn't work (blocks forever):
s = table.select(table.c.user=="test",for_update=True)
# Do update or not depending on the row
u = table.update().where(table.c.user=="test")
u.execute(email="foo")
Do I need a commit? How do I do that? As far as I know you need to:
begin transaction
select ... for update
update
commit
If you are using the ORM, try the with_for_update function:
foo = session.query(Foo).filter(Foo.id==1234).with_for_update().one()
# this row is now locked
foo.name = 'bar'
session.add(foo)
session.commit()
# this row is now unlocked
Late answer, but maybe someone will find it useful.
First, you don't need to commit (at least not in-between queries, which I'm assuming you are asking about). Your second query hangs indefinitely, because you are effectively creating two concurrent connections to the database. First one is obtaining lock on selected records, then second one tries to modify locked records. So it can't work properly. (By the way in the example given you are not calling first query at all, so I'm assuming in your real tests you did something like s.execute() somewhere). So to the point—working implementation should look more like:
s = conn.execute(table.select(table.c.user=="test", for_update=True))
u = conn.execute(table.update().where(table.c.user=="test"), {"email": "foo"})
conn.commit()
Of course in such simple case there's no reason to do any locking but I guess it is example only and you were planning to add some additional logic between those two calls.
Yes, you do need to commit, which you can execute on the Engine or create a Transaction explicitely. Also the modifiers are specified in the values(...) method, and not execute:
>>> conn.execute(users.update().
... where(table.c.user=="test").
... values(email="foo")
... )
>>> my_engine.commit()

Categories

Resources