How to set Data Granularity with cm_api api.query_timeseries - python

I am trying to get the files_total and dfs_capacity_used metrics for the last week with the code shared at https://cloudera.github.io/cm_api/docs/python-client/:
import time
import datetime

from_time = datetime.datetime.fromtimestamp(time.time() - 1800)
to_time = datetime.datetime.fromtimestamp(time.time())
query = "select files_total, dfs_capacity_used " \
        "where serviceName = HDFS-1 " \
        " and category = SERVICE"
result = api.query_timeseries(query, from_time, to_time)
ts_list = result[0]
for ts in ts_list.timeSeries:
    print "--- %s: %s ---" % (ts.metadata.entityName, ts.metadata.metricName)
    for point in ts.data:
        print "%s:\t%s" % (point.timestamp.isoformat(), point.value)
I am getting the output, but the data granularity is daily. Is there a way to get the output every 6 hours, like the granularity option in the Cloudera Manager UI shown in the screenshot below?

query_timeseries does not provide a Data Granularity option. It auto-determines the granularity, picking a rollup that can cover the time period we set.
With the get function below we can retrieve data at a chosen Data Granularity:
api = ApiResource('CM_HOST', username='admin', password='admin')
api.get(relpath='timeseries', params={
    'query': 'select files_total, dfs_capacity_used '
             'where serviceName = HDFS-1 and category = SERVICE',
    'desiredRollup': 'RAW',
    'mustUseDesiredRollup': 'True',
    'from': '2020-08-10',
    'to': '2020-08-17'})
If we would like to force the granularity to six-hourly, we can set 'desiredRollup' to 'SIX_HOURLY' and 'mustUseDesiredRollup' to 'True'.
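For example, a minimal sketch of the six-hourly request (the host, credentials, service name, and date range are placeholders copied from the call above):

from cm_api.api_client import ApiResource

api = ApiResource('CM_HOST', username='admin', password='admin')
# Ask for six-hourly rollups and insist on that rollup instead of letting
# the server pick a granularity based on the time range.
result = api.get(relpath='timeseries', params={
    'query': 'select files_total, dfs_capacity_used '
             'where serviceName = HDFS-1 and category = SERVICE',
    'desiredRollup': 'SIX_HOURLY',
    'mustUseDesiredRollup': 'True',
    'from': '2020-08-10',
    'to': '2020-08-17'})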

Related

How do I specify a start date at a particular time of day with yfinance

I am trying to get stock data starting from a specific time of day, passing in the time I want the data to start from, but I keep getting an error saying: F: Data doesn't exist for startDate = 1666286770, endDate = 1666287074
This is my code below:
def watchlist():
    timezone = pytz.timezone('US/Eastern')
    print(type(timezone))
    aware = dt.datetime.now(timezone).time()
    print(aware)
    global pastTime
    pastTime = dt.datetime.now(timezone) - dt.timedelta(minutes=5)  # time of 5 minutes ago
    print(pastTime)
    for x in ticks:
        toStr = str(x)
        syb = yf.Ticker(toStr)
        data = pd.DataFrame(syb.history(interval="1m", period='1d'))
        data2 = pd.DataFrame(syb.history(interval="1m", period='1d', start=pastTime))
        if data['Open'].sum() < data2['Open'].sum():
            print(data['Open'].sum())
            print(data2['Open'].sum())
            print('Watch stock')
        else:
            print(toStr, 'Proceed to sell with robinhood')

watchlist()
Screen shot of issue
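A minimal sketch of requesting the last few minutes of 1-minute bars with an explicit timezone-aware start and end, rather than mixing period= with start= ('AAPL' is only a placeholder symbol, and data will only exist for times when the market was trading):

import datetime as dt
import pytz
import yfinance as yf

tz = pytz.timezone('US/Eastern')
end = dt.datetime.now(tz)
start = end - dt.timedelta(minutes=5)

# history() accepts datetime objects for start/end; it already returns a DataFrame.
data = yf.Ticker('AAPL').history(interval='1m', start=start, end=end)
print(data)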

Call a range of dates from an API using Python

Currently writing a program using an API from MarketStack.com. This is for school, so I am still learning.
I am writing a stock prediction program using Python on PyCharm and I have written the connection between the program and the API without issues. So, I can certainly get the High, Name, Symbols, etc. What I am trying to do now is call a range of dates. The API says I can call up to 30 years of historical data, so I want to call all 30 years for a date that is entered by the user. Then the program will average the high on that date in order to give a trend prediction.
So, the problem I am having is calling more than one date. As I said, I want to call all 30 dates, and then I will do the math, etc.
Can someone help me call a range of dates? I tried installing Pandas, but PyCharm wasn't accepting it for some reason. Any help is greatly appreciated.
import tkinter as tk
import requests

# callouts for window size
HEIGHT = 650
WIDTH = 600

# function for response
def format_response(selected_stock):
    try:
        name1 = selected_stock['data']['name']
        symbol1 = selected_stock['data']['symbol']
        high1 = selected_stock['data']['eod'][1]['high']
        final_str = 'Name: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name1, symbol1, high1)
    except:
        final_str = 'There was a problem retrieving that information'
    return final_str

# function linking to API
def stock_data(entry):
    params = {'access_key': 'xxx'}
    response = requests.get('http://api.marketstack.com/v1/tickers/' + entry + '/' + 'eod', params=params)
    selected_stock = response.json()
    label2['text'] = format_response(selected_stock)

# function for response
def format_response2(stock_hist_data):
    try:
        name = stock_hist_data['data']['name']
        symbol = stock_hist_data['data']['symbol']
        high = stock_hist_data['data']['eod'][1]['high']
        name3 = stock_hist_data['data']['name']
        symbol3 = stock_hist_data['data']['symbol']
        high3 = stock_hist_data['data']['eod'][1]['high']
        final_str2 = 'Name: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name, symbol, high)
        final_str3 = '\nName: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name3, symbol3, high3)
    except:
        final_str2 = 'There was a problem retrieving that information'
        final_str3 = 'There was a problem retrieving that information'
    return final_str2 + final_str3

# function for response in lower window
def stock_hist_data(entry2):
    params2 = {'access_key': 'xxx'}
    response2 = requests.get('http://api.marketstack.com/v1/tickers/' + entry2 + '/' + 'eod', params=params2)
    hist_data = response2.json()
    label4['text'] = format_response2(hist_data)
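A minimal sketch of one way to pull a range of dates and average the highs; it assumes the MarketStack end-of-day endpoint accepts date_from/date_to and a limit parameter (check the API docs for your plan), and 'AAPL' plus the 'xxx' key are placeholders:

import requests

params = {
    'access_key': 'xxx',
    'date_from': '2021-10-21',   # assumed parameter names; results may be paginated
    'date_to': '2022-10-21',
    'limit': 1000,
}
response = requests.get('http://api.marketstack.com/v1/tickers/AAPL/eod', params=params)
eod = response.json()['data']['eod']
highs = [day['high'] for day in eod]
print('Average high over range:', sum(highs) / len(highs))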

Retrieving Dates Greater than the Current Time from a Django Database

I am struggling to understand how date queries work in Django. I am storing train times in a database and want to get times that are greater than the current time.
The query looks like this, but returns zero results:
latestdepartures = LatestDepartures.objects.filter(station=startstation,earliest__gte=timezone.now().astimezone(pytz.utc))
My database has the entry below for example.
When I run the query, I get the results below (the first line is print(timezone.now().astimezone(pytz.utc))):
2020-08-01 15:49:06.610055+00:00
<QuerySet []>
The code which adds the data to the database looks like:
def convert_date_time(o):
    if isinstance(o, datetime):
        return o.__str__()

def updateservices(stationname, destination):
    now = datetime.now()
    # dd/mm/YY H:M:S
    datenow = now.strftime("%d/%m/%Y")
    board = DARWIN_SESH.get_station_board(stationname)
    stationdict = dict()
    stationdict['from'] = stationname
    stationdict['name'] = board.location_name
    stationdict['servicelist'] = []
    services = board.train_services
    for s in services:
        traindict = dict()
        service_details = DARWIN_SESH.get_service_details(s.service_id)
        traindict['departuretime'] = datetime.strptime(datenow + " " + service_details.std, '%m/%d/%Y %H:%M').astimezone(pytz.utc)
        traindict['callingpoints'] = []
        callingpoints = service_details.subsequent_calling_points
        for c in callingpoints:
            if c.crs == destination:
                callingpointdict = dict()
                callingpointdict['code'] = c.crs
                callingpointdict['name'] = c.location_name
                callingpointdict['arrivaltime'] = datetime.strptime(datenow + " " + c.st, '%m/%d/%Y %H:%M').astimezone(pytz.utc)
                traindict['callingpoints'].append(callingpointdict)
        if len(traindict['callingpoints']) > 0:
            stationdict['servicelist'].append(traindict)
    # For getting the minimum departure
    departures = [s['departuretime'] for s in stationdict['servicelist']]
    # Store the train departure object in the database
    stationdata = json.dumps(stationdict, default=convert_date_time)
    LatestDepartures.objects.create(
        station=stationname,
        earliest=min(departures),
        services=stationdata
    )
    return stationdata
service_details.std is a time in 24-hour string format, for example "17:00".
Can anyone help? I am not sure whether I am meant to change the date format somewhere, or whether it is to do with the way the datetime object is created by adding the time.
UPDATE:
Now storing the date in a different format as '%d/%m/%Y %H:%M':
Now I get dates that are greater than the current time, but once the current time has exceeded the earliest time in the database, the query still returns results. Example output is:
2020-08-01 17:31:21.909052+00:00 - print(timezone.now().astimezone(pytz.utc))
2020-08-01 18:03:00+00:00 - Time in database
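One thing worth checking in the code above is that datenow is built with "%d/%m/%Y" but parsed back with '%m/%d/%Y'. A minimal sketch of building the departure datetime with one consistent format and an explicit timezone before converting to UTC (the Europe/London zone is an assumption about the timetable's local time):

from datetime import datetime
import pytz

def build_departure(date_str, time_str):
    # date_str like "01/08/2020" (dd/mm/YYYY), time_str like "17:00" (HH:MM)
    naive = datetime.strptime(date_str + " " + time_str, "%d/%m/%Y %H:%M")
    # Attach the local timezone first, then convert to UTC for storage/comparison.
    local = pytz.timezone("Europe/London").localize(naive)
    return local.astimezone(pytz.utc)

print(build_departure("01/08/2020", "17:00"))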

Google Calendar API event always subtracting one day

I am creating a dict object and sending it up to Google Calendar, following Google's own example in their API documentation. I read an MSSQL database and produce a CSV file of the results, then use the CSV information to write the events.
Snippets from my code.
def count_leaveduration(sdate, fdate):
    try:
        date_format = "%Y.%m.%d"
        cmp_sdate = datetime.strptime(sdate, date_format)
        cmp_fdate = datetime.strptime(fdate, date_format)
        delta = cmp_fdate - cmp_sdate
        return delta.days, cmp_sdate, cmp_fdate
    except Exception as e:
        input_logging('error', 'Cannot Count Leave Duration - Exception: %s' % e)

duration, sdate, fdate = count_leaveduration(line['FIRSTDAYOFABSENCE'], line['LASTDAYOFABSENCE'])
event['summary'] = '%s - Leave' % line['NAME1']
event['location'] = 'Out Of Office'
# date type here instead because all day event for duration.
event['start'] = {'date': '%s' % sdate.strftime('%Y-%m-%d')}
event['end'] = {'date': '%s' % fdate.strftime('%Y-%m-%d')}
event['attendees'] = [{'email': line['ELECTRONICMAILADDRESS']}]
appbuildobj.events().insert(calendarId=robj, body=event).execute()
The entries work fine in general, but if the duration is longer than one day, the calendar entry seems to be a day shorter than expected, even though the finish date is what is passed into the event dict. Of course, the workaround is to do something like this:-
fdate = fdate + timedelta(days=1)
However, I'd like to know the reason this happens.
It's hidden but I've found it:-
https://developers.google.com/google-apps/calendar/concepts
Such an event starts on startDate and ends the day before endDay. For example, a one-day event should have its start date set to day and its end date set to day + 1.
So I have fixed this with:-
fdate = fdate + timedelta(days=1)
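In context, a minimal sketch of the fix (field names mirror the snippet above):

from datetime import timedelta

# All-day 'end' dates are exclusive, so push the finish date forward one day
# so the event covers fdate itself.
event['start'] = {'date': sdate.strftime('%Y-%m-%d')}
event['end'] = {'date': (fdate + timedelta(days=1)).strftime('%Y-%m-%d')}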

Import RRD data into Python Data Structures

Does anyone have a good method for importing RRD data into Python? The only libraries I have found so far are either wrappers for the command line or only support importing data into RRD and graphing it.
I am aware of the export and dump options of rrd, but I am wondering if someone has already done the heavy lifting here.
Here's an excerpt from a script that I wrote to get cacti rrd data out. It's not likely to be exactly what you want, but it might give you a good start. The intent of my script is to turn Cacti into a data warehouse, so I tend to pull out a lot of averages, max, or min data. I also have some flags for tossing upper or lower range spikes, multiplying in case I want to turn "bytes/sec" into something more usable, like "mb/hour"...
If you want exact one-to-one data copy, you might need to tweak this a bit.
value_dict = {}
for file in files:
    if file[0] == '':
        continue
    file = file[0]
    value_dict[file] = {}
    starttime = 0
    endtime = 0
    cmd = '%s fetch %s %s -s %s -e %s 2>&1' % (options.rrdtool, file, options.cf, options.start, options.end)
    if options.verbose: print cmd
    output = os.popen(cmd).readlines()
    dsources = output[0].split()
    if dsources[0].startswith('ERROR'):
        if options.verbose:
            print output[0]
        continue
    if not options.source:
        source = 0
    else:
        try:
            source = dsources.index(options.source)
        except:
            print "Invalid data source, options are: %s" % (dsources)
            sys.exit(0)
    data = output[3:]
    for val in data:
        val = val.split()
        time = int(val[0][:-1])
        val = float(val[source+1])
        # make sure it's not invalid numerical data, and also an actual number
        ok = 1
        if options.lowerrange:
            if val < options.lowerrange: ok = 0
        if options.upperrange:
            if val > options.upperrange: ok = 0
        if ((options.toss and val != options.toss and val == val) or val == val) and ok:
            if starttime == 0:
                # this should be accurate for up to six months in the past
                if options.start < -87000:
                    starttime = time - 1800
                else:
                    starttime = time - 300
            else:
                starttime = endtime
            endtime = time
            filehash[file] = 1
            val = val * options.multiply
            values.append(val)
            value_dict[file][time] = val
            seconds = seconds + (endtime - starttime)
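For a more self-contained starting point, here is a minimal sketch that shells out to rrdtool fetch and parses the output into a dict of timestamp -> values; the rrdtool binary path, consolidation function, time range, and example RRD path are all placeholders:

import subprocess

def fetch_rrd(path, cf='AVERAGE', start='-1d', end='now', rrdtool='rrdtool'):
    """Return ({timestamp: [values...]}, [ds_names]) parsed from `rrdtool fetch`."""
    out = subprocess.check_output(
        [rrdtool, 'fetch', path, cf, '-s', str(start), '-e', str(end)],
        text=True)
    lines = out.splitlines()
    ds_names = lines[0].split()          # header row lists the data source names
    series = {}
    for line in lines[1:]:
        if ':' not in line:              # skip the blank separator line
            continue
        ts, rest = line.split(':', 1)
        # 'nan' entries parse to float('nan'); callers can filter them out.
        series[int(ts)] = [float(v) for v in rest.split()]
    return series, ds_names

# Example usage (placeholder path):
# data, names = fetch_rrd('/var/lib/cacti/rra/example.rrd')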
