Date-time always giving 1200HRS - python

I'm working on a chatbot and using Dialogflow's sys.date-time entity. When I present dates to the bot, like "Today" or "Feb 14", I always get
"parameters": {
"date-time": "2021-02-14T12:00:00Z"
}
whereas I want
"parameters": {
"date-time": "2021-02-14T00:00:00Z"
}
Right now I'm using my app to replace the datetime with hours=0; however, I also want the bot to give
"parameters": {
"date-time": "2021-02-14T08:00:00Z"
}
when I say "feb 14 8AM" (hour is explictly mentioned), but the app will replace the hours. So gonna have to fix it from Dialogflow side. Any solutions please?

For the Dialogflow system entity #sys.date-time this is the default behavior (the value is returned in ISO-8601 format). As a fix, instead of using #sys.date-time, follow these steps:
Use two separate parameters: #sys.date for the date and #sys.time for the time.
Make #sys.date required and #sys.time optional.
Set an appropriate default value of 00:00:00 for #sys.time, so that when no time is given (e.g. "Feb 14") the default value is used.
In your app you can then combine these two values, as in the sketch below.
(Sample intent screenshot)
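A minimal sketch of combining the two parameters in a Python app/webhook (the parameter names 'date' and 'time' and the value formats are assumptions based on the setup above):

from datetime import datetime, time

def build_datetime(params):
    """Combine the #sys.date and #sys.time parameters into one datetime.

    Assumes 'date' arrives as an ISO-8601 string such as
    '2021-02-14T12:00:00Z' and 'time' as 'HH:MM:SS' (empty if not given).
    """
    date_part = datetime.fromisoformat(params["date"].replace("Z", "+00:00")).date()
    time_part = time.fromisoformat(params.get("time") or "00:00:00")
    return datetime.combine(date_part, time_part)

# "Feb 14" with no time spoken -> 2021-02-14 00:00:00
print(build_datetime({"date": "2021-02-14T12:00:00Z", "time": ""}))
# "Feb 14 8AM" -> 2021-02-14 08:00:00
print(build_datetime({"date": "2021-02-14T12:00:00Z", "time": "08:00:00"}))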
Hope it helps !

I was able to work around this by analysing the value received for Dialogflow's #sys.date-time built-in entity across multiple phrases.
When just a day/date is mentioned (say Feb 14 2020), this is the data I receive:
"parameters": {
"date-time": "2020-02-14T12:00:00Z"
}
When I mention time along with the date explicitly (say Feb 14 3pm), this is the data received:
"parameters": {
"date-time": {
"date_time": "2021-02-14T15:00:00Z"
}
}
The difference is that in the second case date-time contains a nested dictionary.
By distinguishing between the two cases in my app, I slice the ISO-8601 string and replace 12:00:00 with 00:00:00 when a plain string is received, but leave the value untouched when a dictionary is received.
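For reference, a minimal sketch of that check (assuming the two payload shapes shown above):

from datetime import datetime

def normalise_date_time(parameters):
    """Zero out the time only when Dialogflow defaulted it to 12:00.

    Assumes 'date-time' is a plain ISO-8601 string when only a date was
    spoken, and a dict with a 'date_time' key when a time was spoken,
    matching the payloads above.
    """
    value = parameters["date-time"]
    if isinstance(value, dict):
        # A time was explicitly mentioned; keep it as-is.
        return datetime.fromisoformat(value["date_time"].replace("Z", "+00:00"))
    # Only a date was given; replace the default 12:00 with midnight.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt.replace(hour=0, minute=0, second=0)

print(normalise_date_time({"date-time": "2020-02-14T12:00:00Z"}))
print(normalise_date_time({"date-time": {"date_time": "2021-02-14T15:00:00Z"}}))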

Related

Set context from custom XBRL file

I'm able to read a custom XBRL file. The problem is that the parsed object has the amounts from the initial period (last December) and not from the latest accounting period.
from xbrl import XBRLParser, GAAP, GAAPSerializer
# xbrl comes from python-xbrl package
xbrl_parser = XBRLParser()
with open('filename.xbrl') as file:
    xbrl = xbrl_parser.parse(file)
custom_obj = xbrl_parser.parseCustom(xbrl)
print(custom_obj.cashandcashequivalents)
This prints the cash for 2021/12, not 2022/06 as expected.
Current output: 100545101000
Expected: 81518021000
I think those numbers are the ones you can see on lines 9970 and 9972 of the XBRL file.
These are the lines:
9970: <ifrs-full:CashAndCashEquivalents decimals="-3" contextRef="CierreTrimestreActual" unitRef="CLP">81518021000</ifrs-full:CashAndCashEquivalents>
9972: <ifrs-full:CashAndCashEquivalents decimals="-3" contextRef="SaldoActualInicio" unitRef="CLP">100545101000</ifrs-full:CashAndCashEquivalents>
How can I set the context/contextRef so the custom_obj has the numbers of the latest periods?
XBRL file: https://www.cmfchile.cl/institucional/inc/inf_financiera/ifrs/safec_ifrs_verarchivo.php?auth=&send=&rut=70016160&mm=06&aa=2022&archivo=70016160_202206_C.zip&desc_archivo=Estados%20financieros%20(XBRL)&tipo_archivo=XBRL
I've never used python-xbrl, but from a quick look at the source code it looks very basic and makes lots of unwarranted assumptions about the structure of the document. It doesn't appear to have any support for XBRL Dimensions, which the report you're using makes use of.
The module isn't built on a proper model of the XBRL data which would give you easy access to each fact's properties such as the period, and allow you to easily filter down to just the facts that you want.
I don't think the module will allow you to do what you want. Looking at this code, it just iterates over all the facts and sticks them onto properties on an object, so whichever fact it hits last in the document is the one you get; and given that order isn't significant in XBRL files, it's pot luck which one that is.
I'd strongly recommend switching to a better XBRL library. Arelle is probably the most widely used, although you could also use my own pxp.
As an example, either tool can be used to convert the XBRL to JSON format, and will give you facts like this:
"f126928": {
"value": "81518021000",
"decimals": -3,
"dimensions": {
"concept": "ifrs-full:CashAndCashEquivalents",
"entity": "scheme:70016160-9",
"period": "2022-07-01T00:00:00",
"unit": "iso4217:CLP"
}
},
"f126930": {
"value": "100545101000",
"decimals": -3,
"dimensions": {
"concept": "ifrs-full:CashAndCashEquivalents",
"entity": "scheme:70016160-9",
"period": "2022-01-01T00:00:00",
"unit": "iso4217:CLP"
}
},
With this, you can then sort the facts by period, and then select the most recent one. Of course, you can do the same directly via the Python interfaces in these tools, rather than going via JSON.
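For example, a minimal sketch of picking the most recent CashAndCashEquivalents fact from that JSON (the file name is an assumption, and it assumes the facts sit under a top-level "facts" object as in the xBRL-JSON format):

import json

# JSON produced by Arelle or pxp from the XBRL report (file name assumed).
with open("report.json") as f:
    report = json.load(f)

cash_facts = [
    fact for fact in report["facts"].values()
    if fact["dimensions"].get("concept") == "ifrs-full:CashAndCashEquivalents"
]

# ISO-8601 period strings sort chronologically as plain strings.
latest = max(cash_facts, key=lambda fact: fact["dimensions"]["period"])
print(latest["value"])  # expected here: 81518021000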

Is it possible to return Date math with elastic?

I find myself using relative times to define ranges such as now-3d to now. Elastic converts that to usable timestamps and returns values in that range, but it doesn't return the dates themselves. Is it possible to get Elastic to give you the resolved date range?
What I had in mind was essentially:
GET url/time_math
{
    "range": {
        "from": "now-1w/w",
        "to": "now/w"
    }
}

output:

{
    "range": {
        "from": "2022-03-13T00:00:00",
        "to": "2022-03-19T23:59:59"
    }
}
I am using Python and the Python Elasticsearch library to run these queries. If I can do it in the scripting language, that would work too.
Elastic resolves these relative dates to timestamps really well, and they can be difficult to compute on the API/front-end side when setting them for custom graphs.
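As an illustration of the desired output, a minimal sketch of computing equivalent bounds client-side in Python (this assumes weeks starting on Monday, which may not match Elasticsearch's own /w rounding, and is a workaround rather than anything Elasticsearch returns):

from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Round "now" down to the start of the current week (Monday 00:00 assumed).
start_of_week = (now - timedelta(days=now.weekday())).replace(
    hour=0, minute=0, second=0, microsecond=0
)

# now-1w/w .. now/w : the whole previous week.
range_from = start_of_week - timedelta(weeks=1)
range_to = start_of_week - timedelta(seconds=1)

print({"from": range_from.isoformat(), "to": range_to.isoformat()})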

Getting indexes with timestamp but adding custom hours/minutes in Elasticsearch Python

I've been trying to receive all indexes from 7 days ago to now using this type of query:
query = {
    'query': {
        'bool': {
            'filter': [
                {'range': {'#timestamp': {'gte': 'now-7d/d', 'lte': 'now/d'}}},
            ],
        },
    },
}
The problem is I need to get them from, let's say, 12 am (midnight) to 11:59 pm. Note: the datetime 'now' can't be hardcoded; it needs to reflect the exact day the script is run. Is it possible to do this without using datetime, relying only on the built-in "date math" in the Elasticsearch API for Python?
EDIT: To clarify, I need the exact hour to be set so I can query exact intervals. Example: getting data with a timestamp between 11:30 am and 12:00 pm, and so on (in 30-minute intervals).
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding and https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math go into this.
You can't round to the half hour though, sorry.
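Since the built-in date math can't round to the half hour, a minimal sketch of a workaround: compute the 30-minute boundaries in Python and pass absolute timestamps in the range filter (field name taken from the question):

from datetime import datetime, timedelta, timezone

def last_half_hour_window(now=None):
    """Return the start/end of the most recently completed 30-minute window."""
    now = now or datetime.now(timezone.utc)
    end = now.replace(minute=(now.minute // 30) * 30, second=0, microsecond=0)
    return end - timedelta(minutes=30), end

start, end = last_half_hour_window()
query = {
    'query': {
        'bool': {
            'filter': [
                # field name as in the question
                {'range': {'#timestamp': {'gte': start.isoformat(), 'lt': end.isoformat()}}},
            ],
        },
    },
}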

Binance API Python - How to use a specific output

When I let my bot place an order, it gives me something like the following output:
[{
    "symbol": "BNBBTC",
    "orderId": 3301945,
    "clientOrderId": "6gCrw2kRUAF9CvJDGP16IP",
    "transactTime": 1507725176595,
    "price": "0.00000000",
    "origQty": "10.00000000",
    "executedQty": "10.00000000",
    "status": "FILLED",
    "timeInForce": "GTC",
    "type": "LIMIT",
    "side": "SELL"
}]
I want my bot to fetch the orderId automatically so that it can keep working with it by itself, without me manually typing in the ID.
For example, if I want to cancel that order:
result = client.cancel_order(
    symbol='BNBBTC',
    orderId='orderId')
I'd need to ask for the Id first, replace that 'orderId' and run again to be able to cancel the order. There has to be a way to automate this, right?
I suggest looking at some basic tutorials on dictionaries; getting the value of a key is one of the first things you learn.
In your case the structure is very plain, so to get the value of orderId you can just use your_dictionary.get("orderId"). (Note that the response shown above is actually a list containing one dictionary, so index into it first.)
Note that I use .get instead of dict[key]: this way, if there is no orderId in your dictionary, you just get None, whereas dict[key] would raise a KeyError for a missing key.
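A minimal sketch under the assumption that the placement response is the list-of-one-dict shown above and that client is the same client used in the question:

def cancel_placed_order(client, order_response):
    """Cancel an order using the orderId taken from the placement response.

    'order_response' is assumed to be the structure shown above: a list
    containing a single dict (some client methods return just a dict).
    """
    order_info = order_response[0] if isinstance(order_response, list) else order_response
    order_id = order_info.get("orderId")
    return client.cancel_order(
        symbol=order_info["symbol"],
        orderId=order_id)

# Usage: result = cancel_placed_order(client, order)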

How to ensure all data is captured in ES API?

I am trying to create an API in Python to pull the data from ES and feed it into a data warehouse. The data is live and new documents arrive every second, so I am building a near-real-time pipeline.
The current URL format is {{url}}/{{index}}/_search and the test payload I am sending is:
{
"from" : 0,
"size" : 5
}
On the next refresh it will pull using payload:
{
"from" : 6,
"size" : 5
}
And so on until it reaches the total amount of records. The PROD environment has about 250M rows and I'll set the size to 10 K per extract.
I am worried though as I don't know if the records are being reordered within ES. Currently, there is a plugin which uses a timestamp generated by the user but that is flawed as sometimes documents are being skipped due to a delay in the jsons being made available for extract in ES and possibly the way the time is being generated.
Does anyone know what is the default sorting when pulling the data using /_search?
I suppose what you're looking for is a streaming / changes API, which is nicely described by #Val here, and there is also an open feature request.
In the meantime, you cannot really count on the size and from parameters -- you could make redundant queries and handle the duplicates before they reach your data warehouse (a rough sketch of that follows).
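A rough sketch of that de-duplication step, keyed on the document _id (the assumption being that _id is stable across the redundant queries):

seen_ids = set()

def deduplicate(hits):
    """Yield only hits whose _id has not been forwarded to the warehouse yet."""
    for hit in hits:
        if hit["_id"] not in seen_ids:
            seen_ids.add(hit["_id"])
            yield hit

# Example with two overlapping pages from redundant queries:
page_1 = [{"_id": "a"}, {"_id": "b"}]
page_2 = [{"_id": "b"}, {"_id": "c"}]
print([h["_id"] for h in deduplicate(page_1 + page_2)])  # ['a', 'b', 'c']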
Another option would be to skip ES in this regard and stream directly to the warehouse: take an ES snapshot up to a given time once (so you keep the historical data), feed it to the warehouse, and then stream directly from wherever you're getting your data into the warehouse.
Addendum
AFAIK the default sorting is by the insertion date, but there's no internal _insertTime or similar. You can use cursors -- it's called scrolling and here's a py implementation. But this goes from the 'latest' doc to the 'first', not vice versa, so it'll give you all the existing docs but I'm not so sure about the ones newly added while you were scrolling. You'd then want to run the scroll again, which is suboptimal.
You could also pre-sort your index which should work quite nicely for your use case when combined w/ scrolling.
Thanks for the responses. After discussing with my colleagues, we decided to use the _ingest API instead and create a pipeline in ES that inserts the server-side ingestion date on each doc.
Steps:
Create the timestamp pipeline
PUT _ingest/pipeline/timestamp_pipeline
{
    "description": "Inserts timestamp field for all documents",
    "processors": [
        {
            "set": {
                "field": "insert_date",
                "value": "{{_ingest.timestamp}}"
            }
        }
    ]
}
Update indexes to add the new default field
PUT /*/_settings
{
    "index": {
        "default_pipeline": "timestamp_pipeline"
    }
}
In Python I then use the _scroll API like so:
es = Elasticsearch(cfg.esUrl, port=cfg.esPort, timeout=200)

doc = {
    "query": {
        "range": {
            "insert_date": {
                "gte": lastRowDateOffset
            }
        }
    }
}

res = es.search(
    index=Index,
    sort="insert_date:asc",
    scroll="2m",
    size=NumberOfResultsPerPage,
    body=doc
)
Where lastRowDateOffset is the date of the last run
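The search above only returns the first page; a minimal sketch of walking the remaining pages with the scroll API (handle() is a hypothetical function that feeds the warehouse):

scroll_id = res["_scroll_id"]
hits = res["hits"]["hits"]

while hits:
    for hit in hits:
        handle(hit["_source"])  # hypothetical: push the doc to the warehouse

    # Fetch the next page; keep the context alive for another 2 minutes.
    page = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = page["_scroll_id"]
    hits = page["hits"]["hits"]

# Release the scroll context when done.
es.clear_scroll(scroll_id=scroll_id)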
