Why does a Kinesis stream invoke my Lambda function more than once? - python

I am consuming Amazon Connect CTRs through Amazon Kinesis and inserting the data into Postgres. I am seeing very unexpected behavior from Kinesis and my Lambda function. Whenever a CTR record comes through Kinesis, my Lambda gets invoked, and after inserting that record into Postgres it gets invoked again, even though only one record was received. Here is my code; if anything is wrong with it, please correct me:
import base64
import json

import psycopg2
from psycopg2.extensions import AsIs
from psycopg2.extras import RealDictCursor

def lambda_handler(event, context):
    print(event['Records'])
    print(event)
    for record in event['Records']:
        conn = psycopg2.connect(
            host=hostt,
            user=username,
            password=passwordd,
            database=databasee
        )
        cur = conn.cursor(cursor_factory=RealDictCursor)
        payload = base64.b64decode(record['kinesis']['data'])
        de_serialize_payload = json.loads(payload)
        print(len(de_serialize_payload))
        print(de_serialize_payload)
        try:
            for dsp in de_serialize_payload:
                if de_serialize_payload['Agent'] != None and de_serialize_payload['CustomerEndpoint'] != None and de_serialize_payload['Recording'] != None and de_serialize_payload['TransferredToEndpoint'] != None:
                    required_data = {
                        'arn' : de_serialize_payload['Agent']['ARN'],
                        'aftercontactworkduration' : de_serialize_payload['Agent']['AfterContactWorkDuration'],
                        'aftercontactworkendtimestamp' : de_serialize_payload['Agent']['AfterContactWorkEndTimestamp'],
                        'aftercontactworkstarttimestamp' : de_serialize_payload['Agent']['AfterContactWorkStartTimestamp'],
                        'agentconnectionattempts' : de_serialize_payload['AgentConnectionAttempts'],
                        'agentinteractionduration' : de_serialize_payload['Agent']['AgentInteractionDuration'],
                        'answeringmachinedetectionstatus' : de_serialize_payload['AnsweringMachineDetectionStatus'],
                        'channel' : de_serialize_payload['Channel'],
                        'connectedtoagenttimestamp' : de_serialize_payload['Agent']['ConnectedToAgentTimestamp'],
                        'connectedtosystemtimestamp' : de_serialize_payload['ConnectedToSystemTimestamp'],
                        'customerendpointaddress' : de_serialize_payload['CustomerEndpoint']['Address'],
                        'customerendpointtype' : de_serialize_payload['CustomerEndpoint']['Type'],
                        'customerholdduration' : de_serialize_payload['Agent']['CustomerHoldDuration'],
                        'dequeuetimestamp' : de_serialize_payload['Queue']['DequeueTimestamp'],
                        'disconnectreason' : de_serialize_payload['DisconnectReason'],
                        'disconnecttimestamp' : de_serialize_payload['DisconnectTimestamp'],
                        'queueduration' : de_serialize_payload['Queue']['Duration'],
                        'enqueuetimestamp' : de_serialize_payload['Queue']['EnqueueTimestamp'],
                        'hierarchygroups' : de_serialize_payload['Agent']['HierarchyGroups'],
                        'initialcontactid' : de_serialize_payload['InitialContactId'],
                        'initiationmethod' : de_serialize_payload['InitiationMethod'],
                        'initiationtimestamp' : de_serialize_payload['InitiationTimestamp'],
                        'instancearn' : de_serialize_payload['InstanceARN'],
                        'lastupdatetimestamp' : de_serialize_payload['LastUpdateTimestamp'],
                        'longestholdduration' : de_serialize_payload['Agent']['LongestHoldDuration'],
                        'nextcontactid' : de_serialize_payload['NextContactId'],
                        'numberofholds' : de_serialize_payload['Agent']['NumberOfHolds'],
                        'previouscontactid' : de_serialize_payload['PreviousContactId'],
                        'queuearn' : de_serialize_payload['Queue']['ARN'],
                        'queuename' : de_serialize_payload['Queue']['Name'],
                        'recordingdeletionreason' : de_serialize_payload['Recording']['DeletionReason'],
                        'recordinglocation' : de_serialize_payload['Recording']['Location'],
                        'recordingstatus' : de_serialize_payload['Recording']['Status'],
                        'recordingtype' : de_serialize_payload['Recording']['Type'],
                        'routingprofilearn' : de_serialize_payload['Agent']['RoutingProfile']['ARN'],
                        'routingprofilename' : de_serialize_payload['Agent']['RoutingProfile']['Name'],
                        'scheduledtimestamp' : de_serialize_payload['ScheduledTimestamp'],
                        'systemendpointaddress' : de_serialize_payload['SystemEndpoint']['Address'],
                        'systemendpointtype' : de_serialize_payload['SystemEndpoint']['Type'],
                        'transfercompletedtimestamp' : de_serialize_payload['TransferCompletedTimestamp'],
                        'transferredtoendpoint' : de_serialize_payload['TransferredToEndpoint']['Address'],
                        'username' : de_serialize_payload['Agent']['Username'],
                        'voiceidresult' : de_serialize_payload['VoiceIdResult'],
                        'id' : de_serialize_payload['ContactId']
                    }
                    columns = required_data.keys()
                    print(columns)
                    values = [required_data[column] for column in columns]
                    print(values)
                    insert_statement = "insert into public.ctr (%s) values %s;"
                    cur.execute(insert_statement, (AsIs(','.join(columns)), tuple(values)))
                    print(cur.mogrify(insert_statement, (AsIs(','.join(columns)), tuple(values))))
                    conn.commit()
                    count = cur.rowcount
                    print(count, "Record inserted successfully into mobile table")
                    print("Agent, customer endpoint, transfer endpoint and recording data is available")
        except Exception as error:
            # assumption: the posted snippet is truncated here; an except clause
            # is needed to close the try-block
            conn.rollback()
            print('insert failed:', error)
After one successful iteration it starts iterating again. I have spent more than two days on this and haven't figured out the problem.
I would really appreciate it if someone could guide me and help sort this out.

The issue was in my code: I was not ending my function successfully. It is Kinesis behavior that if your function does not end successfully (200 OK), Kinesis re-invokes it several times. So it is necessary to end your function properly.
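A minimal sketch of ending the handler cleanly (hedged: the exact return value is an assumption; for a Kinesis event source mapping, what matters is that the handler returns without raising, otherwise the batch is retried):

import base64
import json

def lambda_handler(event, context):
    for record in event['Records']:
        payload = json.loads(base64.b64decode(record['kinesis']['data']))
        # ... insert payload into Postgres here ...
    # Returning normally (no uncaught exception) marks the whole batch as
    # processed, so Kinesis does not re-deliver the same records.
    return {'statusCode': 200, 'body': 'batch processed'}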

Related

(Pymongo) Commit and rollback issue

(PyMongo) Several operations are wrapped in a commit-and-rollback transaction, and one of them still proceeds when the whole transaction should stop.
I read the manual, and all the commit-and-rollback examples cover only two operations. Is that the limit? A transaction should usually contain three or more operations that either all take effect or, on error, none at all: https://pymongo.readthedocs.io/en/stable/api/pymongo/client_session.html
I tried to put three operations inside the transaction, but
mycol_two.insert_one() did not stop proceeding like the other operations when the error occurred.
Brief description:
I have three collections in the same DB:
collection "10_20_cash_all"
collection "10_20_cash_log"
collection "10_20_cash_info"
The commit and rollback are on lines 39 to 44.
On line 42, print( 3/0 ) intentionally raises an error; I expected all the operations to stop.
import pymongo
import datetime
import json
from bson.objectid import ObjectId
from bson import json_util
import re

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["(practice_10_14)-0004444"]
mycol_one = mydb["10_20_cash_all"]
mycol_two = mydb["10_20_cash_log"]
mycol_3rd = mydb["10_20_cash_info"]

# already store 100$ in bank
# doc_two = {"ID" : 100998 , "Cash_log$" : 5 }  # withdraw 70$ from bank
doc_two = input("Enter ID and log amount$: ")
doc_3rd = input("Enter extra info: ")
doc_two_dic = json.loads(doc_two)
doc_3rd_dic = json.loads(doc_3rd)
# doc_3rd = {"note" : "today is good" }
ID_input = doc_two_dic['ID']
print("ur id is :" + str(ID_input))
doc_one = {"ID" : ID_input}

with myclient.start_session() as s:
    cash_all_result = mycol_one.find_one(doc_one, session=s)

    def cb(s):
        try:
            while True:
                cash_all_result = mycol_one.find_one(doc_one, session=s)
                mycol_two.insert_one(doc_two_dic, session=s)
                print( 3/0 )
                mycol_3rd.insert_one(doc_3rd_dic, session=s)
                print( "now total is :" + str(cash_all_result['Cash_$']) )
                Cash_total_int = int(cash_all_result['Cash_$'])
                log_int = int(doc_two_dic['Cash_log$'])
                if Cash_total_int < log_int:
                    print("error: withdraw is over ur balance")
                    break
                new_Cash_total = Cash_total_int - log_int
                print("now total is :" + str(new_Cash_total))
                newvalues_json = { "$set" : {"Cash_$" : new_Cash_total } }
                mycol_one.update_one(doc_one, newvalues_json, session=s)
                fail_condition_json = {"ok" : 1 , "fail reason" : "no error "}
                print(fail_condition_json)
                return fail_condition_json
        except Exception as e:
            fail_condition_json = {"ok" : 0 , "fail reason" : "error raise on start_session()"}
            print(fail_condition_json)
            return fail_condition_json

    s.with_transaction(cb)
command prompt:
Enter ID and log amount$: {"ID" : 100998 , "Cash_log$" : 5 }
Enter extra info: {"note" : "today is good" }
ur id is :100998
{'ok': 0, 'fail reason': 'error raise on start_session()'}
the "10_20_cash_log" still store new value which shoud empty/not run like '"10_20_cash_info"' is empty
{
    "_id" : ObjectId("635262e502725626c39cbe9e"),
    "ID" : 100998,
    "Cash_log$" : 5
}
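A hedged observation, not from the original thread: with_transaction only aborts the transaction when the callback raises, and the try/except above swallows the ZeroDivisionError, so the transaction commits with the first insert already applied. A minimal sketch of letting the error propagate so everything rolls back:

def cb(s):
    mycol_two.insert_one(doc_two_dic, session=s)
    print(3 / 0)  # intentional error; it now actually reaches with_transaction
    mycol_3rd.insert_one(doc_3rd_dic, session=s)

with myclient.start_session() as s:
    try:
        s.with_transaction(cb)  # aborts the transaction, then re-raises
    except ZeroDivisionError:
        print({"ok": 0, "fail reason": "error raised inside the transaction"})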

Combine two lists into a Dict, Tuple

I am creating a REST API using the Python + Flask-RESTful combo and find it amazing.
Currently, I allow the user to run a SQL query from the browser, and it works fine, but the problem is that the header information is not included in the response.
Here is the code that I am using:
class RunCustomeSQL(Resource):
    def get(self, enter_db_name, query):
        if not os.path.isfile(enter_db_name + '.db'):
            raise BadRequest("Database Doesn't Exist. Please select a valid database")
        conn = sqlite3.connect(enter_db_name + '.db')
        search_out = []
        cursor = conn.execute(query)
        row = None
        for row in cursor:
            search_out.append(row)
        if not row:  # this means an empty response
            raise BadRequest("No Results Found")
        conn.commit()
        conn.close()
        return search_out
While this code works, it doesn't include the header values in the JSON response. The current response is:
[
    [
        "dusiri_bibi",
        "11",
        "----------",
        " srt/None ",
        "14.30 MB",
        "2017-12-13 23:43:54",
        "C:/Test_Software/vc_redist.x64.exe"
    ]
]
Expected output:
[
    {
        "Machine Name" : "dusiri_bibi",
        "LABEL" : "11",
        "PERMISSIONS" : "----------",
        "USER" : " srt/None ",
        "SIZE" : "14.30 MB",
        "CREATED" : "2017-12-13 23:43:54",
        "FILENAME" : "C:/Test_Software/vc_redist.x64.exe"
    }
]
All the above text, such as "Machine Name", "LABEL", etc., are my table headers; I am not sure how to print them along with my output.
What if the user runs select user, size from table_name only?
Or what if the user runs select * from table_name?
In both scenarios the output should display the table headers.
Thanks
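A general note beyond the thread: sqlite3 exposes the selected column names on cursor.description, whose entries are 7-tuples with the column name first, and that works for both select * and explicit column lists. A minimal sketch (the database name is a placeholder; query is the user's query as in the question):

import sqlite3

conn = sqlite3.connect('example.db')  # placeholder database
cursor = conn.execute(query)
headers = [col[0] for col in cursor.description]  # column names of this SELECT
search_out = [dict(zip(headers, row)) for row in cursor]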
UPDATE #1 (25 April): I managed to answer my first question and can display a proper JSON response when the user runs a SELECT * statement, but I am still facing an issue with the second piece.
Here is the answer to the first part, if anyone is looking for it, using a regex:
row = None
if re.search(r'(?<=SELECT)(.*)(?=FROM)', query, re.IGNORECASE).group(1) == ' * ':
    for row in cursor:
        search_out.append({'NAME' : row[0], 'LABEL_NUMBER' : row[1], 'PERM' : row[2], 'USER' : row[3], 'SIZE' : row[4], 'DATE' : row[5], 'FILENAME' : row[6]})
    if not row:  # this means an empty response
        raise BadRequest("No Results Found")
Part II, the unanswered query:
For the second piece, I now have two lists:
list_1 : [[u'LABEL_NUMBER', u'PERM', u'FILENAME']]
list_2 : [(u'11', u'----------', u'C:/Test_Software/26.avi'), (u'11', u'----------', u'C:/Test_Software/6.avi'), (u'11', u'-rwx------', u'C:/Test_Software/Debug/Current_Frame1.avi'), (u'10', u'-rwxrwx---', u'C:/Windows/WinSxS/boxed-split.avi')]
As you can see, I have two lists, and I want to combine them into a list of dicts so the response looks like this:
[
    {
        LABEL_NUMBER : '11',
        PERM : '-----------',
        FILENAME : 'C:/Test_Software/26.avi'
    },
    ...
    {
        LABEL_NUMBER : '10',
        PERM : '-rwxrwx---',
        FILENAME : 'C:/Windows/WinSxS/boxed-split.avi'
    }
]
I am using the following code to do this:
chunks = [list_2[idx:idx+3] for idx in range(0, len(list_2), 3)]
output = []
for each in chunks:
    output.append(dict(zip(list_1, each)))
print(output)
But this fails with "TypeError: unhashable type: 'list'". I understand that lists are mutable, which is why I am getting this error, but then how can I get the desired dict response? What am I doing wrong here?
You can use a list comprehension combined with zip for this. The original code fails because list_1 is a nested list: zip(list_1, each) pairs each row with the inner header list itself, and a list cannot be a dict key; zipping list_1[0] uses the individual header strings instead:
list_1 = [[u'LABEL_NUMBER', u'PERM', u'FILENAME']]
list_2 = [(u'11', u'----------', u'C:/Test_Software/26.avi'), (u'11', u'----------', u'C:/Test_Software/6.avi'), (u'11', u'-rwx------', u'C:/Test_Software/Debug/Current_Frame1.avi'), (u'10', u'-rwxrwx---', u'C:/Windows/WinSxS/boxed-split.avi')]
d = [dict(zip(list_1[0], i)) for i in list_2]
Result:
[{'FILENAME': 'C:/Test_Software/26.avi',
  'LABEL_NUMBER': '11',
  'PERM': '----------'},
 {'FILENAME': 'C:/Test_Software/6.avi',
  'LABEL_NUMBER': '11',
  'PERM': '----------'},
 {'FILENAME': 'C:/Test_Software/Debug/Current_Frame1.avi',
  'LABEL_NUMBER': '11',
  'PERM': '-rwx------'},
 {'FILENAME': 'C:/Windows/WinSxS/boxed-split.avi',
  'LABEL_NUMBER': '10',
  'PERM': '-rwxrwx---'}]
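A side note, hedged: the same zip idea also covers the case where the row data arrives as one flat list instead of tuples, which is what the chunking attempt above was aiming at; chunk it first, then zip each chunk with the header names (the flat list here is a made-up example):

flat = [u'11', u'----------', u'C:/Test_Software/26.avi',
        u'11', u'----------', u'C:/Test_Software/6.avi']
chunks = [flat[i:i + 3] for i in range(0, len(flat), 3)]
d = [dict(zip(list_1[0], chunk)) for chunk in chunks]  # list_1[0], not list_1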

PyMongo query not returning results although the same query returns results in the MongoDB shell

import pymongo
uri = 'mongodb://127.0.0.1:27017'
client = pymongo.MongoClient(uri)
db = client.TeamCity
students = db.students.find({})
for student in students:
    print(student)
Python Result:
Blank
MongoDB shell results:
db.students.find({})
{ "_id" : ObjectId("5788483d0e5b9ea516d4b66c"), "name" : "Jose", "mark" : 99 }
{ "_id" : ObjectId("57884cb3f7edc1fd01c3511e"), "name" : "Jordan", "mark" : 100 }
import pymongo
uri = 'mongodb://127.0.0.1:27017'
client = pymongo.MongoClient(uri)
db = client.TeamCity
students = db.students.find({})
print (students.count())
Python Result:
0
MongoDB shell results:
db.students.find({}).count()
2
What am I missing?
For
import pymongo
uri = 'mongodb://127.0.0.1:27017'
client = pymongo.MongoClient(uri)
db = client.TeamCity
students = db.students.find({})
print (students)
Python Result :
So I think it is able to connect to the DB successfully, but it is not returning results.
Try your PyMongo code like so, i.e. changing TeamCity to Teamcity. MongoDB database names are case-sensitive, so client.TeamCity and client.Teamcity refer to two different databases.
Print all students:
import pymongo
uri = 'mongodb://127.0.0.1:27017'
client = pymongo.MongoClient(uri)
db = client.Teamcity
students = db.students.find({})
for student in students:
    print(student)
Count all students:
import pymongo
uri = 'mongodb://127.0.0.1:27017'
client = pymongo.MongoClient(uri)
db = client.Teamcity
students = db.students.find({})
print (students.count())
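A version note, hedged: Cursor.count() was deprecated and later removed in PyMongo 4.x, so on a modern driver the count is obtained from the collection instead:

students_count = db.students.count_documents({})
print(students_count)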
I know this question was answered long ago, but I ran into the same kind of problem today and it happened to have a different reason, so I'm adding an answer here.
Code working on shell:
> db.customers.find({"cust_id": 2345}, {"pending_questions": 1, _id: 0})
{ "pending_questions" : [ 1, 5, 47, 89 ] }
Code not working in PyMongo (cust_id set through a web form):
db.customers.find({"cust_id": cust_id}, {"pending_questions": 1, "_id": 0})
It turned out that the numbers in the shell were being interpreted as ints, whereas the numbers used in the Python code were being interpreted by PyMongo as floats, and hence returned no matches. This proves the point:
cust_id = int(request.args.get('cust_id'))
db.customers.find({"cust_id": cust_id}, {"pending_questions": 1, "_id": 0})
which produces the result:
[1.0, 5.0, 47.0, 89.0]
The simple solution was to typecast everything to int in the Python code. In conclusion, the data type inferred by the shell may differ from the data type inferred by PyMongo, and this may be one reason a find query that returns results in the shell returns nothing when run through PyMongo.

pymongo $set on array of subdocuments

I have a pymongo collection in the form of:
{
    "_id" : "R_123456789",
    "supplier_ids" : [
        {
            "id" : "S_987654321",
            "file_version" : ISODate("2016-03-15T00:00:00Z"),
            "latest" : false
        },
        {
            "id" : "S_101010101",
            "file_version" : ISODate("2016-03-29T00:00:00Z"),
            "latest" : true
        }
    ]
}
When I get new supplier data and the supplier ID has changed, I want to capture that by setting latest on the previous latest entry to False and then $push-ing the new record.
$set is not working the way I am trying to use it (the commented code after the else):
import pymongo
from dateutil.parser import parse

new_id = 'S_323232323'
new_date = parse('20160331')

with pymongo.MongoClient() as client:
    db = client.transactions
    collection_ids = db.ids
    try:
        collection_ids.insert_one({"_id": "R_123456789",
                                   "supplier_ids": ({"id": "S_987654321",
                                                     "file_version": parse('20160315'),
                                                     "latest": False},
                                                    {"id": "S_101010101",
                                                     "file_version": parse('20160329'),
                                                     "latest": True})})
    except pymongo.errors.DuplicateKeyError:
        print('record already exists')

    record = collection_ids.find_one({'_id': 'R_123456789'})
    for supplier_id in record['supplier_ids']:
        print(supplier_id)
        if supplier_id['latest']:
            print(supplier_id['id'], 'is the latest')
            if supplier_id['id'] == new_id:
                print(new_id, ' is already the latest version')
            else:
                # print('setting', supplier_id['id'], 'latest flag to False')
                # <<< THIS FAILS >>>
                # collection_ids.update_one({'_id': record['_id']},
                #                           {'$set': {'supplier_ids.latest': False}})
                print('appending', new_id)
                data_to_append = {"id": new_id,
                                  "file_version": new_date,
                                  "latest": True}
                collection_ids.update_one({'_id': record['_id']},
                                          {'$push': {'supplier_ids': data_to_append}})
Any and all help is much appreciated.
This whole process seems unnaturally verbose; should I be using a more streamlined approach?
Thanks!
You can try the positional operator:
collection_ids.update_one(
    {'_id': record['_id'], "supplier_ids.latest": True},
    {'$set': {'supplier_ids.$.latest': False}}
)
This query sets supplier_ids.$.latest to False on the array element whose latest is True, provided the document matches the other conditions.
The catch is that you have to include the array field as part of the query condition too.
For more information see Update
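A minimal sketch of the full flow with this approach (hedged: it reuses record, new_id and new_date from the question, and the two updates are separate operations, not atomic together):

# 1) Flip the current latest entry to False; the positional $ refers to
#    the first array element matched by the query condition.
collection_ids.update_one(
    {'_id': record['_id'], 'supplier_ids.latest': True},
    {'$set': {'supplier_ids.$.latest': False}}
)
# 2) Push the new supplier record as the new latest.
collection_ids.update_one(
    {'_id': record['_id']},
    {'$push': {'supplier_ids': {'id': new_id,
                                'file_version': new_date,
                                'latest': True}}}
)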

Python/Bottle: sending JSON object via post

I am running into an issue that I can't seem to get past; any insight would be great.
The script is supposed to get memory-allocation information from a database and return that information as a formatted JSON object. The script works fine when I give it a static JSON object full of stack_ids (the information I would be passing), but it won't work when I try to pass the information via POST.
Although the current state of my code uses request.json to access the passed data, I have also tried request.POST.get.
My HTML includes this POST request, using D3's xhr post:
var stacks = [230323, 201100, 201108, 229390, 201106, 201114];
var stack_ids = {'stack_ids': stacks};
var my_request = d3.xhr('/pie_graph');
my_request.header("Content-Type", "application/json");
my_request.post(stack_ids, function(stuff){
    stuff = JSON.parse(stuff);
    var data1 = stuff['allocations'];
    var data2 = stuff['allocated bytes'];
    var data3 = stuff['frees'];
    var data4 = stuff['freed bytes'];
    ...
    ...
}, "json");
while my server script has this route:
@views.webapp.route('/pie_graph', method='POST')
def server_pie_graph_json():
    db = views.db
    config = views.config
    ret = {
        'allocations' : [],
        'allocated bytes' : [],
        'frees' : [],
        'freed bytes' : [],
        'leaks' : [],
        'leaked bytes' : []
    }
    stack_ids = request.json['stack_ids']
    # for each unique stack trace
    for pos, stack_id in stack_ids:
        stack = db.stacks[stack_id]
        nallocs = format(stack.nallocs(db, config))
        nalloc_bytes = format(stack.nalloc_bytes(db, config))
        nfrees = format(stack.nfrees(db, config))
        nfree_bytes = format(stack.nfree_bytes(db, config))
        nleaks = format(stack.nallocs(db, config) - stack.nfrees(db, config))
        nleaked_bytes = format(stack.nalloc_bytes(db, config) - stack.nfree_bytes(db, config))
        # create a dictionary representing the stack
        ret['allocations'].append({'label' : stack_id, 'value' : nallocs})
        ret['allocated bytes'].append({'label' : stack_id, 'value' : nalloc_bytes})
        ret['frees'].append({'label' : stack_id, 'value' : nfrees})
        ret['freed bytes'].append({'label' : stack_id, 'value' : nfree_bytes})
        ret['leaks'].append({'label' : stack_id, 'value' : nleaks})
        ret['leaked bytes'].append({'label' : stack_id, 'value' : nfree_bytes})
    # return dictionary of allocation information
    return ret
Most of that can be ignored; the script works when I give it a static JSON object full of data.
The request currently returns a 500 Internal Server Error: JSONDecodeError('Expecting value: line 1 column 2 (char 1)',).
Can anyone explain what I am doing wrong?
Also, if you need me to explain anything further or include any other information, I am happy to do that. My brain is slightly fried after working on this for so long, so I may have missed something.
Here is what I do with POST and it works:
from bottle import *

@post('/')
def do_something():
    comment = request.forms.get('comment')
    sourcecode = request.forms.get('sourceCode')
Source:
function saveTheSourceCodeToServer(comment) {
    var path = saveLocation();
    var params = { 'sourceCode' : getTheSourceCode(), 'comment' : comment };
    post_to_url(path, params, 'post');
}
Source with credits to JavaScript post request like a form submit
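A hedged diagnosis beyond the posted answer: the JSONDecodeError at column 2 is consistent with the body arriving as the literal string "[object Object]", which is what a JavaScript object becomes when posted without JSON.stringify(stack_ids); Bottle's request.json then fails to parse it. On the server side, a defensive sketch that turns a bad body into a 400 instead of a 500 (read_stack_ids is a hypothetical helper):

import json
from bottle import request, HTTPError

def read_stack_ids():
    # Parse the raw body ourselves so an unparseable payload yields a clear
    # 400 error instead of an unhandled JSONDecodeError (500).
    try:
        data = json.loads(request.body.read())
    except ValueError:
        raise HTTPError(400, 'body was not valid JSON; did the client JSON.stringify it?')
    if 'stack_ids' not in data:
        raise HTTPError(400, 'expected a JSON body like {"stack_ids": [...]}')
    return data['stack_ids']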
