Struggling with how to iterate data - python

I am learning Python 3 and I have a fairly simple task to complete, but I am struggling with how to glue it all together. I need to query an API and return the full list of applications, which I can do; I store this result and need to use it again to gather more data for each application from a different API call.
applistfull = requests.get(url, auth=authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"], app["guid"])
        summaryguid = app["guid"]
else:
    print(applistfull.status_code)
I now have 'summaryguid' (I think), and I need to query a different API with it and return a value that could exist many times for each application; in this case, the compiler used to build the code.
I can hard-code a GUID into the URL and return the correct information, but I haven't yet figured out how to run the code below for every application above and build a master list:
summary = requests.get(f"url{summaryguid}moreurl", auth=authmethod)
if summary.ok:
    fulldata = summary.json()
    for appsummary in fulldata["static-analysis"]["modules"]["module"]:
        print(appsummary["compiler"])
I would prefer that someone not just type out the right answer yet, but instead drop a few hints and let me continue to work through it logically, so that I learn how to deal with what I assume is a common issue. My thought right now is that I need to move my second if block up into my initial block and continue the logic in that space, but I am stuck on that.

You are on the right track! Here is the hint: the second API request can be nested inside the loop that iterates through the list of applications from the first API call. That way, the second call is made once for each application and you can gather the information you need as you go.

import requests

applistfull = requests.get("url", auth=authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"], app["guid"])
        summaryguid = app["guid"]
        # make the second call here, once per application
        summary = requests.get(f"url/{summaryguid}/moreurl", auth=authmethod)
        fulldata = summary.json()
        for appsummary in fulldata["static-analysis"]["modules"]["module"]:
            print(app["profile"]["name"], appsummary["compiler"])
else:
    print(applistfull.status_code)
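Once that prints what you expect, building the master list you mentioned is just a matter of appending inside the inner loop instead of printing. A minimal sketch, keeping the same placeholder url and authmethod (master_list is just an illustrative name):

import requests

master_list = []
applistfull = requests.get("url", auth=authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        summaryguid = app["guid"]
        summary = requests.get(f"url/{summaryguid}/moreurl", auth=authmethod)
        if summary.ok:
            fulldata = summary.json()
            for appsummary in fulldata["static-analysis"]["modules"]["module"]:
                # collect (application name, guid, compiler) instead of printing
                master_list.append((app["profile"]["name"], summaryguid, appsummary["compiler"]))
print(master_list)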

Related

Converting Intersystems cache objectscript into a python function

I am accessing an InterSystems Caché 2017.1.xx instance through a Python process to get various attributes about the database in order to monitor it.
One of the items I want to monitor is license usage. I wrote an ObjectScript script in a Terminal window to access license usage by user:
s Rset=##class(%ResultSet).%New("%SYSTEM.License.UserListAll")
s r=Rset.Execute()
s ncol=Rset.GetColumnCount()
While (Rset.Next()) {f i=1:1:ncol w !,Rset.GetData(i)}
But I have been unable to determine how to convert this script into a Python equivalent. I am using the intersys.pythonbind3 module for connecting to and accessing the Caché instance. I have been able to create Python functions that access most everything else in the instance, but this one piece of data I cannot figure out how to translate to Python (3.7).
The following should work (based on the documentation):
query = intersys.pythonbind.query(database)
query.prepare_class("%SYSTEM.License", "UserListAll")
query.execute()
# Fetch each row in the result set, and print the
# name and value of each column in a row:
while 1:
    cols = query.fetch([None])
    if len(cols) == 0:
        break
    print(str(cols[0]))
Also, note that InterSystems IRIS, the successor to Caché, now has Python as an embedded language. See more in the docs.
It turns out the query "UserListAll" is not defined correctly in the class library (it is not projected as an SqlProc), so resolving this would require an ObjectScript class containing the query and the use of %ResultSet or similar from Python to get the results. So I am marking this as resolved.
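For anyone hitting the same issue, the Python side of that workaround would simply mirror the documented pattern from the answer above. A minimal sketch, where User.LicenseUtil and its class query UserList are hypothetical names for the wrapper you would write in ObjectScript:

# database is an existing intersys.pythonbind3 connection, as elsewhere in my code
query = intersys.pythonbind.query(database)
# hypothetical wrapper class/query names; the real ones depend on your ObjectScript class
query.prepare_class("User.LicenseUtil", "UserList")
query.execute()
while 1:
    cols = query.fetch([None])
    if len(cols) == 0:
        break
    print(str(cols[0]))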
Not sure which Python interface you're using for Caché/IRIS, but this open-source third-party one is worth investigating for the kind of things you're trying to do:
https://github.com/chrisemunt/mg_python

Recursive API Calls using AWS Lambda Functions Python

I've never written a recursive Python script before. I'm used to splitting a monolithic function up into smaller AWS Lambda functions. However, this particular script I am working on is challenging to break up into smaller functions.
Here is the code I am currently using, for context. I use one API request to return a list of objects within a table.
url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
pega_EEvisa_raw = url_pega_EEvisa.json()
pega_EEvisa = pega_EEvisa_raw['pxResults']
This returns every object (primary key) within a particular table as a list. For example,
['XX-XXSALES-WORK%20PO-1', 'XX-XXSALES-WORK%20PO-10', 'XX-XXSALES-WORK%20PO-100', 'XX-XXSALES-WORK%20PO-101', 'XX-XXSALES-WORK%20PO-102', 'XX-XXSALES-WORK%20PO-103', 'XX-XXSALES-WORK%20PO-104', 'XX-XXSALES-WORK%20PO-105', 'XX-XXSALES-WORK%20PO-106', 'XX-XXSALES-WORK%20PO-107']
I then use this list to populate more get requests using a for loop which then grabs me all the data per object.
for t in caseid:
    url = requests.get(('https://cloud.xxxx.com:443/prweb/api/v1/cases/{}'.format(t)), auth=(username, password)).json()
    data.append(url)
This particular Lambda function takes about 15 minutes, which is the limit for one AWS Lambda invocation. Ideally, I'd like to split the list up into smaller parts and run the same process on each. I am struggling with marking the point where it last ran before failure and passing that information on to the next function.
Any help is appreciated!
I'm not sure I entirely understand what you want to do with the data once you've fetched all the information about the cases, but in terms of breaking up the work one Lambda is doing into many Lambdas, you should be able to chunk the list of cases and pass the chunks to new invocations of the same Lambda. Python pseudocode below; hopefully it helps illustrate the idea. I stole the chunks method from this answer, which helps break the list into batches.
import json

import boto3
import requests

client = boto3.client('lambda')

# helper from the linked answer: yield successive n-sized chunks from a list
def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

def handler(event, context):
    # username/password assumed to be configured elsewhere (e.g. environment variables)
    url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
    pega_EEvisa_raw = url_pega_EEvisa.json()
    pega_EEvisa = pega_EEvisa_raw['pxResults']
    for chunk in chunks(pega_EEvisa, 10):
        client.invoke(
            FunctionName='lambdaToHandleBatchOfTenCases',
            Payload=json.dumps(chunk)
        )
Hopefully that helps? Let me know if this was not on target 😅
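For the receiving side, here is a minimal sketch of what a hypothetical lambdaToHandleBatchOfTenCases handler could look like, assuming its payload is just the JSON list of case IDs sent above and reusing the per-case GET from the question:

import requests

def handler(event, context):
    # event is the chunk of case IDs passed via Payload=json.dumps(chunk)
    data = []
    for t in event:
        # username/password assumed to be configured for this function as well
        case = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/cases/{}'.format(t),
                            auth=(username, password)).json()
        data.append(case)
    # store or forward this batch's results here (S3, DynamoDB, etc.)
    return {'processed': len(data)}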

Is it possible to inject python code in Kwargs and how could I prevent this user input

I'm in the middle of writing my Bachelor's thesis, and for it I am creating a database system with Postgres and Flask.
To ensure the safety of my data, I was working on a file to prevent SQL injection, since a user should be able to submit a string via an HTTP request. Most of the functions I use to analyze the HTTP request take kwargs built from a JSON dict in the request, so I was wondering if it is possible to inject Python code into those kwargs.
And if so, whether there are ways to prevent that.
To make it easier to understand what I mean, here are some example requests and code:
import json

from flask import Flask

app = Flask(__name__)

def calc_sum(a, b):
    c = a + b
    return c

@app.route('/<string:target>/<string:value>')
def handle_request(target, value):
    if target == 'calc_sum':
        cmd = json.loads(value)
        return str(calc_sum(**cmd))
Example requests:
Normal : localhost:5000/calc_sum/{"a":1, "b":2}
Injected : localhost:5000/calc_sum/{"a":1, "b:2 ): print("ham") def new_sum(a=1, b=2):return a+b":2 }
Since I'm not near my work, where all my code is, I'm unable to test this out, and to be honest I'm not sure that my code example would even work. But I hope it conveys what I mean.
I hope you can help me, or at least nudge me in the right direction. I've searched for this, but all I can find are tutorials on "how to use kwargs".
Best regards.
Yes you can, but not in the URL like that. Try using query-string arguments like these: localhost:5000/calc_sum?func=a+b&a=1&b=2
To get these arguments you need to do this in Flask:
from flask import Flask, request

app = Flask(__name__)

@app.route('/<string:target>')
def handle_request(target):
    if target == 'calc_sum':
        func = request.args.get('func')
        a = request.args.get('a')
        b = request.args.get('b')
        # note: exec() always returns None; eval() would be needed to get the expression's value back
        result = exec(func)
exec is used to execute Python code contained in strings.
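As a quick sanity check of the original JSON-in-the-URL idea, here is a minimal sketch (plain Python 3, no Flask needed) of what json.loads actually does with the two example payloads from the question:

import json

normal = '{"a":1, "b":2}'
injected = '{"a":1, "b:2 ): print("ham") def new_sum(a=1, b=2):return a+b":2 }'

print(json.loads(normal))  # prints {'a': 1, 'b': 2}: plain ints, nothing is executed

try:
    json.loads(injected)
except json.JSONDecodeError as err:
    # the stray quotes make this invalid JSON, so it never reaches calc_sum(**cmd)
    print("rejected:", err)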

(BigQuery PY Client Library v0.28) - Fetch result from table 'query' job

I'm learning BigQuery API using Python Client Libraries v0.28
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#run-a-simple-query
I wrote this simple code to fetch data from the table:
1) Create client object
client_ = bigquery.Client.from_service_account_json('/Users/xyz/key.json')
2) Begin new Async query job
QUERY = 'SELECT visitid FROM `1234567.ga_sessions_20180101`'
query_job = client_.query(QUERY, job_id=str(uuid.uuid4()))
3) poll until the query is DONE
while query_job.state == 'RUNNING':
    time.sleep(5)
    query_job.reload()
4) Fetch the results in iteration
query_job.reload()
iter = query_job.result()
At this stage I'd like to fetch how many rows are in the table. As per the docs (GitHub code), iter is of type bigquery.table.RowIterator, which has a property iter.total_rows.
5) However, at this stage when I print:
print(iter.total_rows)
It keeps returning None
I'm pretty sure this table is NOT empty and my query is correctly formatted!
Any help to any pointers what am I missing here will be really helpful... Thanks a lot!
Cheers!
You need to also check query_job.error_result to make sure query succeeded.
You can also see your job in the UI, which can be useful for debugging, using project id and job id:
https://bigquery.cloud.google.com/results/projectid:jobid
Also, query_job.result() already waits for the job completion so you don't need to poll.
The current behavior, where RowIterator returns None, is indeed perplexing. Luckily, according to this issue, tswast's comment from 10 days ago indicates that the developers are working on a better solution.
Current awkward behavior of .total_rows
Currently, .total_rows is initialized only once iteration begins. (In what follows, for clarity I renamed your iter variable to row_iter.)
row_iter = query_job.result()
itr = iter(row_iter)
first_row = next(itr)
print(row_iter.total_rows) # Now you get a number instead of None.
This is ugly because to continue the iteration, we must either handle the first row differently or call row_iter = query_job.result() again.
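For example, "handling the first row differently" can be as simple as stitching it back onto the iterator; a minimal sketch using the names above:

import itertools

row_iter = query_job.result()
itr = iter(row_iter)
first_row = next(itr)          # priming the iterator populates total_rows
num_rows = row_iter.total_rows
for row in itertools.chain([first_row], itr):
    pass  # process every row here, the primed first_row included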
Temporary workaround
A currently-working alternative is to use the value of query_job._query_results.total_rows. Unfortunately this is cheating because _query_results is private, so there is no reason to expect that this will work in the future.
Future behavior
If tswast's proposal is implemented, then row_iter.total_rows will be initialized at the beginning, just as you expect.
Suggestion
In my code, I'm going to use something like
try:
    # _query_results is private, so fall back to None if it ever disappears
    num_rows = row_iter.total_rows or query_job._query_results.total_rows
except AttributeError:
    num_rows = None
to be compatible with future behavior while falling-back to the temporary workaround if necessary.

Elastic Search [PUT] error

I'm having some trouble getting Elasticsearch integrated with an existing application, but it should be a fairly straightforward issue. I'm able to create and destroy indices, but for some reason I'm having trouble getting data into Elasticsearch and querying for it.
I'm using the pyes library and honestly finding the documentation to be less than helpful on this front. This is my current code:
def initialize_transcripts(database, mapping):
    database.indices.create_index("transcript-index")

def index_course(database, sjson_directory, course_name, mapping):
    database.put_mapping(course_name, {'properties': mapping}, "transcript-index")
    all_transcripts = grab_transcripts(sjson_directory)
    video_counter = 0
    for transcript_tuple in all_transcripts:
        data_map = {"searchable_text": transcript_tuple[0], "uuid": transcript_tuple[1]}
        database.index(data_map, "transcript-index", course_name, video_counter)
        video_counter += 1
    database.indices.refresh("transcript-index")

def search_course(database, query, course_name):
    search_query = TermQuery("searchable_text", query)
    return database.search(query=search_query)
I'm first creating the database and initializing the index, then trying to add data and search it with the other two methods. I'm currently getting the following error:
raise ElasticSearchException(response.body, response.status, response.body)
pyes.exceptions.ElasticSearchException: No handler found for uri [/transcript-index/test-course] and method [PUT]
I'm not quite sure how to approach it, and the only reference I could find to this error suggested creating your index beforehand, which I believe I am already doing. Has anyone run into this error before? Alternatively, do you know of any good places to look that I might not be aware of?
Any help is appreciated.
For some reason, passing the ID to index(), despite the fact that it is shown in the getting-started documentation (http://pyes.readthedocs.org/en/latest/manual/usage.html), doesn't work and in fact causes this error.
Once I removed the video_counter argument from the index() call, this worked perfectly.
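In other words, the adjusted index_course ends up looking roughly like this (a sketch based on that fix; grab_transcripts and the pyes database object are as in the original code):

def index_course(database, sjson_directory, course_name, mapping):
    database.put_mapping(course_name, {'properties': mapping}, "transcript-index")
    all_transcripts = grab_transcripts(sjson_directory)
    for transcript_tuple in all_transcripts:
        data_map = {"searchable_text": transcript_tuple[0], "uuid": transcript_tuple[1]}
        # no explicit document ID this time: let Elasticsearch generate one
        database.index(data_map, "transcript-index", course_name)
    database.indices.refresh("transcript-index")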
