How to create a custom metric in Datadog using AWS Lambda (Python)

I am using AWS Lambda to calculate the number of days until RDS maintenance. I get an integer and I would like to send it to Datadog so that it creates a metric. I am new to Datadog and not sure how to do it. I have already created a Lambda layer for Datadog. Here is my code; since my Lambda does a lot of other things, I will only include the problematic block:
import boto3
import json
import collections
import datetime
from dateutil import parser
import time
from datetime import timedelta
from datetime import timezone
from datadog_lambda.metric import lambda_metric
from datadog_lambda.wrapper import datadog_lambda_wrapper
import urllib.request, urllib.error, urllib.parse
import os
import sys
from botocore.exceptions import ClientError
from botocore.config import Config
from botocore.session import Session
import tracemalloc
# <lambda logic, only including the else block at which point I want to send the data>
print("WARNING! ForcedApplyDate is: ", d_fapply)
rdsForcedDate = parser.parse(d_fapply)
print(rdsForcedDate)
current_dateTime = datetime.datetime.now(timezone.utc)
difference = rdsForcedDate - current_dateTime
#print (difference)
if difference < timedelta(days=7):
    rounded = difference.days
    print(rounded)
    lambda_metric(
        "rds_maintenance.days",          # Metric name
        rounded,                         # Metric value
        tags=['key:value', 'key:value']  # Associated tags
    )
Here I would like to send the number of days, which could be 5, 10, 15, or any other number. I have also added the Lambda extension layer; the function runs perfectly, but I don't see the metric in Datadog.
I have also tried using this:
from datadog import statsd

if difference < timedelta(days=7):
    try:
        rounded = difference.days
        statsd.gauge('rds_maintenance_alert', round(rounded, 2))
        print("data sent to datadog")
    except Exception as e:
        print(f"Error sending metric to Datadog: {e}")
Again, I don't get an error for this block either, but I can't see the metric. The Datadog API key and site are set in the Lambda environment variables.
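For what it's worth, the day arithmetic itself can be checked locally with the standard library alone. This is only a sketch with made-up dates standing in for the real ForcedApplyDate, and the metric call replaced by a print:

```python
from datetime import datetime, timedelta, timezone

def days_until(forced_apply, now):
    """Whole days remaining until the forced-apply date (floors for negatives)."""
    return (forced_apply - now).days

# hypothetical stand-ins for the real values
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
forced_apply = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)

if forced_apply - now < timedelta(days=7):
    print(days_until(forced_apply, now))  # the value that would be sent as the metric
```

Note that timedelta.days floors toward negative infinity, so once the forced-apply date has passed the value jumps straight to -1 rather than 0.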

Related

Azure Form Recognizer AttributeError when making the request

I'm trying to follow this document for the form recognizer API,
specifically the example for Recognize receipts:
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/client-library?pivots=programming-language-python&tabs=windows
I'm trying the following code:
import sys
import logging
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential
import os
import azure.ai.formrecognizer
endpoint = r"https://form-recognizer-XXXXX-test.cognitiveservices.azure.com/"
form_recognizer_client = FormRecognizerClient(endpoint=endpoint, credential="XXXXXXXXX")
receiptUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
poller = form_recognizer_client.begin_recognize_receipts_from_url(receiptUrl)
receipts = poller.result()
And getting this error:
request.http_request.headers[self._name] = self._credential.key
AttributeError: 'str' object has no attribute 'key'
The difference I see is that in the example the endpoint and key are called as attributes of a class:
form_recognizer_client = FormRecognizerClient(endpoint=self.endpoint, credential=AzureKeyCredential(self.key))
But I do not see where the "self." comes from, or how the value ends up not being a string.
I agree that it is a bit unclear in the quickstart where that key is coming from. In the example, the API key is getting set as a class variable (where the self is coming from), but you do not need to do this to get your code working.
For successful authentication, the string API key "XXXXXXXXX" must be wrapped in the credential class AzureKeyCredential. I've updated your code below to do this, please let me know if it works for you:
import sys
import logging
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential
import os
import azure.ai.formrecognizer
endpoint = r"https://form-recognizer-XXXXX-test.cognitiveservices.azure.com/"
form_recognizer_client = FormRecognizerClient(endpoint=endpoint,
                                              credential=AzureKeyCredential("XXXXXXXXX"))
receiptUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
poller = form_recognizer_client.begin_recognize_receipts_from_url(receiptUrl)
receipts = poller.result()
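The error itself just reflects that a plain string has no .key attribute, which is what the SDK's authentication policy tries to read. A minimal illustration, with a hypothetical KeyCredential class standing in for azure.core.credentials.AzureKeyCredential:

```python
class KeyCredential:
    """Hypothetical stand-in for azure.core.credentials.AzureKeyCredential."""
    def __init__(self, key):
        self.key = key

def auth_header(credential):
    # mimics what the SDK's auth policy does internally with the credential
    return credential.key

print(auth_header(KeyCredential("XXXXXXXXX")))  # works: the wrapper has .key
try:
    auth_header("XXXXXXXXX")                    # a bare string does not
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'key'
```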

Does APScheduler Need a Function to Run?

I'm new to coding and this my first project. So far I've pieced together what I have through Googling, Tutorials and Stack.
I'm trying to add data from a pandas df of scraped RSS feeds to a remote sql database, then host the script on heroku or AWS and have the script running every hour.
Someone on here recommended that I use APScheduler, as in this post.
I'm struggling though as there aren't any 'dummies' tutorials around APScheduler. This is what I've created so far.
I guess my question is: does my script need to be in a function for APScheduler to trigger it, or can it work another way?
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('interval', minutes=1)
sched.configure()
sched.start()

import pandas as pd
from pandas.io import sql
import feedparser
import time

rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
          'https://www.yahoo.com/news/rss/',
          'http://www.huffingtonpost.co.uk/feeds/index.xml',
          'http://feeds.feedburner.com/TechCrunch/',
          'https://www.uktech.news/feed'
          ]

time = time.strftime('%a %H:%M:%S')
summary = 'text'
posts = []

for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((time, post.title, post.link, summary))

df = pd.DataFrame(posts, columns=['article_time', 'article_title', 'article_url', 'article_summary'])  # pass data to init
df.set_index(['article_time'], inplace=True)

import pymysql
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://<username>:<host>:3306/<database_name>?charset=utf8', encoding='utf-8')
engine.execute("INSERT INTO rsstracker VALUES('%s', '%s', '%s','%s')" % (time, post.title, post.link, summary))
df.to_sql(con=engine, name='rsstracker', if_exists='append')  #, flavor='mysql'
Yes. What you want to be executed must be a function (or another callable, like a method). The decorator syntax (@sched.…) needs a function definition (def …) to which the decorator is applied. The code in your example doesn't compile.
Then it's a blocking scheduler, meaning that if you call sched.start() this method doesn't return (unless you stop the scheduler in some scheduled code) and nothing after the call is executed.
Imports should go to the top; then it's easier to see what the module depends on. And don't import things you don't actually use.
I'm not sure why you import and use pandas for data that doesn't really need DataFrame objects. You also pull in SQLAlchemy without actually using anything the library offers, and you format values as strings into an SQL query, which is dangerous!
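The decorator requirement can be seen with plain Python, no APScheduler needed. Below is a sketch where scheduled_job is a hypothetical stand-in that merely records the function it decorates, which is essentially how a scheduler learns what to run:

```python
registered = []

def scheduled_job(trigger, **kwargs):
    """Hypothetical stand-in for sched.scheduled_job: records the function."""
    def decorator(func):
        registered.append((trigger, kwargs, func))
        return func
    return decorator

@scheduled_job('interval', minutes=1)  # an @-line must be followed by a def
def process_feeds():
    return 'ran'

print(registered[0][2]())  # the scheduler can now call the registered function
```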
Just using SQLAlchemy for the database it may look like this:
#!/usr/bin/env python
# coding: utf-8
from __future__ import absolute_import, division, print_function
from time import strftime

import feedparser
from apscheduler.schedulers.blocking import BlockingScheduler
from sqlalchemy import create_engine, MetaData

sched = BlockingScheduler()

RSS_URLS = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
    'https://www.uktech.news/feed',
]

@sched.scheduled_job('interval', minutes=1)
def process_feeds():
    time = strftime('%a %H:%M:%S')
    summary = 'text'
    engine = create_engine(
        'mysql+pymysql://<username>:<host>:3306/<database_name>?charset=utf8'
    )
    metadata = MetaData(engine, reflect=True)
    rsstracker = metadata.tables['rsstracker']
    for url in RSS_URLS:
        feed = feedparser.parse(url)
        for post in feed.entries:
            (
                rsstracker.insert()
                .values(
                    time=time,
                    title=post.title,
                    url=post.link,
                    summary=summary,
                )
                .execute()
            )

def main():
    sched.configure()
    sched.start()

if __name__ == '__main__':
    main()
The time column seems a bit odd, I would have expected a TIMESTAMP or DATETIME here and not a string that throws away much of the information, just leaving the abbreviated week day and the time.
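The information loss is easy to demonstrate with the standard library (made-up timestamp):

```python
from datetime import datetime

ts = datetime(2018, 3, 5, 14, 30, 15)   # a full timestamp
as_stored = ts.strftime('%a %H:%M:%S')  # only weekday + time survive
print(as_stored)
# There is no way back: which Monday was it? A DATETIME column keeps ts intact.
```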

Querying from Soda database using Socrata client.get in Python

I am trying to query a database. I've tried to look up the right way to format an SoQL string, but I am failing. I try the following:
from __future__ import division, print_function
from sodapy import Socrata
import pandas as pd
import numpy as np
client = Socrata("data.cityofchicago.org", None)
df = client.get("kkgn-a2j4", query="WHERE traffic > -1")
and receive an error: Could not parse SoQL query "WHERE traffic > -1" at line 1 character 1. If I do the following, however, it works:
from __future__ import division, print_function
from sodapy import Socrata
import pandas as pd
import numpy as np
client = Socrata("data.cityofchicago.org", None)
df = client.get("kkgn-a2j4", where="traffic > -1")
But I want to know how to get the query argument to work so I can use more complex queries. Specifically, I want to try to query when traffic > -1 and BETWEEN '2013-01-19T23:50:32.000' AND '2014-12-14T23:50:32.000'.
You can use the sodapy where parameter ($where in SoQl) to combine multiple filters, just use AND to combine them:
traffic > -1 AND last_update BETWEEN '2013-01-19T23:50:32.000' AND '2014-12-14T23:50:32.000'
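In sodapy that could look like the sketch below. Note the column name last_update is an assumption taken from the filter above; verify it against the dataset's actual column names. The network call is left commented out:

```python
start = '2013-01-19T23:50:32.000'
end = '2014-12-14T23:50:32.000'
# 'last_update' is an assumed column name - check it against the dataset
where = f"traffic > -1 AND last_update BETWEEN '{start}' AND '{end}'"
print(where)

# from sodapy import Socrata
# client = Socrata("data.cityofchicago.org", None)
# df = client.get("kkgn-a2j4", where=where)
```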

Python retrieving data from web HTTP 400: Bad Request Error (Too many Requests?)

I am using a python module (googlefinance) to retrieve stock information. In my code, I create a symbols list which then gets sent into a loop to collect the information for each symbol.
The symbols list contains about 3000 symbols, which is why I think I am getting this error. When I shorten the range of the loop (24 requests), it works fine. I have also tried using a time delay between requests, but no luck. How can I retrieve the information for all specified symbols without getting the HTTP 400 error?
from googlefinance import getQuotes
import pandas as pd
import pymysql
import time
import threading
import urllib.request
def createSymbolList(csvFile):
    df = pd.read_csv(csvFile)
    saved_column = df['Symbol']
    return saved_column

def getSymbolInfo(symbolList):
    newList = []
    for i in range(int(24)):
        newList.append(getQuotes(symbolList[i]))
    return newList

nyseList = createSymbolList("http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download")
try:
    l = getSymbolInfo(nyseList)
    print(l)
    print(len(l))
except urllib.error.HTTPError as err:
    print(err)
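A common mitigation for rate-limit style errors is to pace the requests and retry with exponential backoff. A standard-library sketch, where fetch is a hypothetical stand-in for getQuotes:

```python
import time

def fetch_all(fetch, symbols, delay=0.5, max_retries=3):
    """Call fetch(symbol) for each symbol, retrying with exponential backoff."""
    results = []
    for symbol in symbols:
        for attempt in range(max_retries):
            try:
                results.append(fetch(symbol))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise                        # give up after the last retry
                time.sleep(delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return results

# usage (hypothetical): quotes = fetch_all(getQuotes, nyseList)
```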

Python Getting date online?

How can I get the current date, month & year online using Python? By this I mean, rather than getting it from the computer's date-visit a website and get it, so it doesn't rely on the computer.
So thinking about the "would be so trivial" part I went ahead and just made a google app engine web app -- when you visit it, it returns a simple response claiming to be HTML but actually just a string such as 2009-05-26 02:01:12 UTC\n. Any feature requests?-)
Usage example with Python's urllib module:
Python 2.7
>>> from urllib2 import urlopen
>>> res = urlopen('http://just-the-time.appspot.com/')
>>> time_str = res.read().strip()
>>> time_str
'2017-07-28 04:55:48'
Python 3.x+
>>> from urllib.request import urlopen
>>> res = urlopen('http://just-the-time.appspot.com/')
>>> result = res.read().strip()
>>> result
b'2017-07-28 04:53:46'
>>> result_str = result.decode('utf-8')
>>> result_str
'2017-07-28 04:53:46'
If you can't use NTP, but rather want to stick with HTTP, you could fetch http://developer.yahooapis.com/TimeService/V1/getTime with urllib.urlopen and parse the results:
<?xml version="1.0" encoding="UTF-8"?>
<Error xmlns="urn:yahoo:api">
The following errors were detected:
<Message>Appid missing or other error </Message>
</Error>
<!-- p6.ydn.sp1.yahoo.com uncompressed/chunked Mon May 25 18:42:11 PDT 2009 -->
Note that the datetime (in PDT) is in the final comment (the error message is due to lack of APP ID). There probably are more suitable web services to get the current date and time in HTTP (without requiring registration &c), since e.g. making such a service freely available on Google App Engine would be so trivial, but I don't know of one offhand.
For this, an NTP server can be used.
import ntplib
import datetime, time

print('Make sure you have an internet connection.')
try:
    client = ntplib.NTPClient()
    response = client.request('pool.ntp.org')
    Internet_date_and_time = datetime.datetime.fromtimestamp(response.tx_time)
    print('\n')
    print('Internet date and time as reported by NTP server: ', Internet_date_and_time)
except OSError:
    print('\n')
    print('Internet date and time could not be reported by server.')
    print('There is no internet connection.')
In order to utilise an online time string, e.g. derived from an online service (http://just-the-time.appspot.com/), it can be read and converted into a datetime.datetime format using urllib2 and datetime.datetime:
import urllib2
from datetime import datetime

def getOnlineUTCTime():
    webpage = urllib2.urlopen("http://just-the-time.appspot.com/")
    internettime = webpage.read()
    OnlineUTCTime = datetime.strptime(internettime.strip(), '%Y-%m-%d %H:%M:%S')
    return OnlineUTCTime
or, very compactly (less readable):
OnlineUTCTime=datetime.strptime(urllib2.urlopen("http://just-the-time.appspot.com/").read().strip(),
'%Y-%m-%d %H:%M:%S')
little exercise:
Comparing your own UTC time with the online time:
print(datetime.utcnow() - getOnlineUTCTime())
# 0:00:00.118403
# if the difference is negative the result will be something like: -1 day, 23:59:59.033398
(bear in mind that processing time is included also)
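Since urllib2 is Python 2 only, a Python 3 equivalent might look like this; the network call is left commented out, as the parsing step is the portable part:

```python
from datetime import datetime
from urllib.request import urlopen

def parse_time_string(text):
    """Parse the service's 'YYYY-MM-DD HH:MM:SS' reply into a datetime."""
    return datetime.strptime(text.strip(), '%Y-%m-%d %H:%M:%S')

# with urlopen("http://just-the-time.appspot.com/") as resp:
#     now_utc = parse_time_string(resp.read().decode('ascii'))
print(parse_time_string('2017-07-28 04:55:48\n'))
```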
Go to timezonedb.com and create an account. You will receive an API key in your email; use that key in the following code:
from urllib import request
from datetime import datetime
import json

def GetTime(zone):
    ApiKey = "YOUR API KEY"
    webpage = request.urlopen("http://api.timezonedb.com/v2/get-time-zone?key=" + ApiKey + "&format=json&by=zone&zone=" + zone)
    internettime = json.loads(webpage.read().decode("UTF-8"))
    OnlineTime = datetime.strptime(internettime["formatted"].strip(), '%Y-%m-%d %H:%M:%S')
    return OnlineTime

print(GetTime("Asia/Kolkata"))  # you can pass any zone region name, e.g. America/Chicago
This works really well for me, no account required:
import logging
import requests
from datetime import datetime

logger = logging.getLogger(__name__)

def get_internet_datetime(time_zone: str = "etc/utc") -> datetime:
    """
    Get the current internet time from:
    'https://www.timeapi.io/api/Time/current/zone?timeZone=etc/utc'
    """
    timeapi_url = "https://www.timeapi.io/api/Time/current/zone"
    headers = {
        "Accept": "application/json",
    }
    params = {"timeZone": time_zone}
    dt = None
    try:
        request = requests.get(timeapi_url, headers=headers, params=params)
        r_dict = request.json()
        dt = datetime(
            year=r_dict["year"],
            month=r_dict["month"],
            day=r_dict["day"],
            hour=r_dict["hour"],
            minute=r_dict["minute"],
            second=r_dict["seconds"],
            microsecond=r_dict["milliSeconds"] * 1000,
        )
    except Exception:
        logger.exception("ERROR getting datetime from internet...")
        return None
    return dt
Here is a Python module for hitting NIST online: http://freshmeat.net/projects/mxdatetime
Perhaps you mean the NTP protocol? This project may help: http://pypi.python.org/pypi/ntplib/0.1.3
Here is the code I made for myself. On Linux I had a problem where the date and time changed each time I switched on my PC, and instead of setting it again and again by hand, I made this script. It is used by the date command, through an alias, to set the date and time automatically from the internet.
import requests

resp = requests.get('https://www.timeapi.io/api/Time/current/zone?timeZone=etc/utc')
data = resp.json()

# Read the JSON fields directly instead of slicing the raw text at
# hard-coded positions, which breaks as soon as the response length changes.
print(data['date'] + "\n")
print(data['time'] + "\n")
