Getting indexes with a timestamp but adding custom hours/minutes in Elasticsearch Python - python

I've been trying to receive all indexes from 7 days ago to now using this type of query:
query = {
    'query': {
        'bool': {
            'filter': [
                {'range': {'@timestamp': {'gte': 'now-7d/d', 'lte': 'now/d'}}},
            ]
        }
    }
}
The problem is that I need to get them from, let's say, 12 am (midnight) to 11:59 pm. Note: the datetime 'now' can't be hardcoded; it needs to be the exact day on which the script is run. Is it possible to do this without using datetime, relying only on the built-in "date math" in the Elasticsearch API for Python?
EDIT: To clarify, I need the exact hour to be set so I can build exact intervals. Example: getting data with a timestamp between 11:30 am and 12:00 pm, and so on (in 30-minute intervals).

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding and https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math go into this.
You can't round to the half hour, though, sorry.
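Per the linked docs, rounding with /d snaps date math to midnight, which gives the midnight boundary without any Python datetime work. A minimal sketch of a whole-days query (the field name "@timestamp", the index, and the host are assumptions):

```python
# "/d" rounds date math down to midnight, so this covers full days only:
# from midnight seven days ago up to (but not including) midnight tomorrow.
query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-7d/d", "lt": "now+1d/d"}}}
            ]
        }
    }
}

# Running it with the official client (commented out; needs `pip install elasticsearch`):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# es.search(index="my-index", body=query)
```

The smallest rounding units are /h and /m, which is why a 30-minute boundary can't be expressed this way.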

Related

Date-time always giving 1200HRS

I'm working on a chatbot using Dialogflow's @sys.date-time entity. When I present dates to the bot, like "Today" or "Feb 14", I always get
"parameters": {
"date-time": "2021-02-14T12:00:00Z"
}
whereas I want
"parameters": {
"date-time": "2021-02-14T00:00:00Z"
}
Right now I'm using my app to replace the datetime with hours=0; however, I also want the bot to give
"parameters": {
"date-time": "2021-02-14T08:00:00Z"
}
when I say "feb 14 8AM" (the hour is explicitly mentioned), but the app will replace the hours. So I'm going to have to fix it from the Dialogflow side. Any solutions, please?
For the Dialogflow system entity @sys.date-time this is the default behavior per the ISO-8601 format. What you can do as a fix is, instead of using @sys.date-time, follow these steps:
Use two separate parameters for date and time: @sys.date and @sys.time.
Make @sys.date required and @sys.time optional.
Set an appropriate default value for @sys.time, such as 00:00:00, so that when no time is given (as in "Feb 14") it will fall back to the default value.
In your app you can then use these two values.
Sample intent
Hope it helps!
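The steps above can be sketched as a fulfillment-side helper. The parameter keys "date" and "time" and the payload shapes are assumptions, not Dialogflow-guaranteed names:

```python
def build_datetime(parameters):
    # @sys.date arrives as a full ISO string; keep only the date part
    date_part = parameters["date"][:10]                # "2021-02-14T12:00:00Z" -> "2021-02-14"
    # fall back to the configured default when @sys.time was omitted
    time_part = parameters.get("time") or "00:00:00"
    if "T" in time_part:                               # @sys.time may also arrive as full ISO
        time_part = time_part[11:19]                   # -> "08:00:00"
    return "{}T{}Z".format(date_part, time_part)
```

With only a date given this yields midnight; with "feb 14 8AM" it keeps the spoken hour.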
I was able to solve this semantic bug by properly analysing the value received for the @sys.date-time built-in entity of Dialogflow across multiple phrases.
When just a day/date is mentioned (say Feb 14 2020), this is the data I receive:
"parameters": {
"date-time": "2020-02-14T12:00:00Z"
}
When I mention time along with the date explicitly (say Feb 14 3pm), this is the data received:
"parameters": {
"date-time": {
"date_time": "2021-02-14T15:00:00Z"
}
}
The difference here is that in the second case, a nested dictionary is presented in date-time.
Distinguishing between the two, I was able to slice the ISO-8601 string in my app and replace 12:00:00 with 00:00:00; when I received a dictionary instead, I did not slice/replace it.
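A sketch of that branching, using the payload shapes shown above (the function name and the bare `parameters` dict are stand-ins for however your webhook receives them):

```python
def normalize_datetime(parameters):
    value = parameters["date-time"]
    if isinstance(value, dict):
        # time was mentioned explicitly -> nested dict; keep the value as-is
        return value["date_time"]
    # bare date -> Dialogflow filled in the default 12:00:00; reset to midnight
    return value[:11] + "00:00:00" + value[19:]
```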

How to query with time filters in GoogleScraper?

Even though Google's official API offers neither time information in the query results nor time filtering for keywords, there is a time filtering option in the advanced search:
Google results for stackoverflow in the last one hour
The GoogleScraper library offers many flexible options, but no time-related ones. How can time features be added using the library?
After a bit of inspection, I found that Google sends the time filtering information as a qdr value in the tbs key (which possibly means "time based search", although that's not officially stated):
https://www.google.com/search?tbs=qdr:h1&q=stackoverflow
This gets the results for the past hour. The letters m and y can be used for months and years respectively.
Also, to add sorting by date feature, add the sbd (should mean sort by date) value as well:
https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
I was able to insert these keywords into the base Google URL of GoogleScraper. Insert the lines below at the end of the get_base_search_url_by_search_engine() method (just before the return) in scraping.py:
if "google" in str(specific_base_url):
    specific_base_url = "https://www.google.com/search?tbs=qdr:{},sbd:1".format(config.get("time_filter", ""))
Now use the time_filter option in your config:
from GoogleScraper import scrape_with_config
config = {
    'use_own_ip': True,
    'keyword_file': "keywords.txt",
    'search_engines': ['google'],
    'num_pages_for_keyword': 2,
    'scrape_method': 'http',
    "time_filter": "d15"  # up to 15 days ago
}
search = scrape_with_config(config)
Results will now only include the given time range. Additionally, text snippets in the results will contain raw date information:
one_sample_result = search.serps[0].links[0]
print(one_sample_result.snippet)
4 mins ago It must be pretty easy - let propertytotalPriceOfOrder =
order.items.map(item => +item.unit * +item.quantity * +item.price);.
where order is your entire json object.
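The same tbs/qdr trick works outside GoogleScraper too. A small sketch building that URL by hand with the stdlib (the function name is made up; parameter values follow the examples above):

```python
from urllib.parse import urlencode

def google_time_url(query, time_filter="d15", sort_by_date=True):
    # tbs=qdr:<filter> limits results by age; sbd:1 sorts them by date
    tbs = "qdr:{}".format(time_filter)
    if sort_by_date:
        tbs += ",sbd:1"
    return "https://www.google.com/search?" + urlencode({"tbs": tbs, "q": query})
```

urlencode percent-encodes the ':' and ',' characters, which Google accepts just as it does the literal forms shown above.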

Cleansing Data with Updates - Mongodb + Python

I have imported the data into MongoDB but am not able to cleanse it in Python. Please see the question and the script below. I need the answers for Scripts 1 & 2.
import it into MongoDB, cleanse the data in Python, and update MongoDB with the cleaned data. Specifically, you'll be taking a people dataset where some of the birthday fields look like this:
{
...
"birthday": ISODate("2011-03-17T11:21:36Z"),
...
}
And other birthday fields look like this:
{
...
"birthday": "Thursday, March 17, 2011 at 7:21:36 AM",
...
}
MongoDB natively supports a Date datatype through BSON. This datatype is used in the first example, but a plain string is used in the second example. In this assessment, you'll complete the attached notebook to script a fix that makes all of the document's birthday field a Date.
Download the notebook and dataset to your notebook directory. Once you have the notebook up and running, and after you've updated your connection URI in the third cell, continue through the cells until you reach the fifth cell, where you'll import the dataset. This can take up to 10 minutes depending on the speed of your Internet connection and computing power of your computer.
After verifying that all of the documents have successfully been inserted into your cluster, you'll write a query in the 7th cell to find all of the documents that use a string for the birthday field.
To verify your understanding of the first part of this assessment, how many documents had a string value for the birthday field (the output of cell 8)?
Script 1
# Replace YYYY with a query on the people-raw collection that will return a cursor with only
# documents where the birthday field is a string
people_with_string_birthdays = YYYY
# This is the answer to verify you completed the lab:
people_with_string_birthdays.count()
Script 2
batch_size = 1000  # assumed to be initialized earlier in the notebook; shown here so the snippet runs
count = 0          # likewise needs initializing before the loop
updates = []
# Again, we're updating several thousand documents, so this will take a little while
for person in people_with_string_birthdays:
    # Pymongo converts datetime objects into BSON Dates. The dateparser.parse function
    # returns a datetime object, so we can simply do the following to update the field
    # properly. Replace ZZZZ with the correct update operator
    updates.append(UpdateOne(
        {"_id": person["_id"]},
        {ZZZZ: {"birthday": dateparser.parse(person["birthday"])}}
    ))
    count += 1
    if count == batch_size:
        people_raw.bulk_write(updates)
        updates = []
        count = 0
if updates:
    people_raw.bulk_write(updates)
    count = 0
# If everything went well this should be zero
people_with_string_birthdays.count()
import json
with open("./people-raw.json") as dataset:
    array = {}
    for i in dataset:
        a = json.loads(i)
        if type(a["birthday"]) not in array:
            array[type(a["birthday"])] = 1
        else:
            array[type(a["birthday"])] += 1
    print(array)
Give the path to your people-raw.json file in the open() call if the JSON file is not in the same directory.
Ans : 10382
Script 1: YYYY = people_raw.find({"birthday": {"$type": "string"}})
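For Script 2, the update operator that overwrites a field's value is $set, i.e. ZZZZ = "$set". A self-contained sketch of building one such update (strptime with the sample's format stands in for dateparser.parse so nothing third-party is needed):

```python
from datetime import datetime

def birthday_update_doc(person):
    # Parse the string form shown in the question, e.g.
    # "Thursday, March 17, 2011 at 7:21:36 AM"
    parsed = datetime.strptime(person["birthday"], "%A, %B %d, %Y at %I:%M:%S %p")
    # The same (filter, update) pair that UpdateOne receives in Script 2,
    # with $set filled in for ZZZZ
    return {"_id": person["_id"]}, {"$set": {"birthday": parsed}}
```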

Timezone issue with pyExchange

I am working with pyExchange on a Windows 7 machine. I have a simple Python v2.7 script that retrieves Outlook calendar events from the Exchange server. The script is provided below:
Code:
from pyexchange import Exchange2010Service, ExchangeNTLMAuthConnection
from datetime import datetime
import time
from pytz import timezone

def getEvents():
    URL = u'https://xxxxx.de/EWS/Exchange.asmx'
    USERNAME = u'MS.LOCAL\\xxxxx'
    PASSWORD = u"xxxxxx"
    connection = ExchangeNTLMAuthConnection(url=URL,
                                            username=USERNAME,
                                            password=PASSWORD)
    service = Exchange2010Service(connection)
    timestamp = datetime.now()
    print timestamp.strftime('%Y, %m, %d, %H, %M, %S')
    print time.timezone
    eventsList = service.calendar().list_events(
        start=timezone("Europe/Amsterdam").localize(datetime(2015, 1, 19, 0, 0, 0)),
        end=timezone("Europe/Amsterdam").localize(datetime(2015, 1, 19, 23, 59, 59)),
        details=True
    )
    for event in eventsList.events:
        print "{start} {stop} - {subject} - {room}".format(
            start=event.start,
            stop=event.end,
            subject=event.subject,
            room=event.location
        )

getEvents()
Problem:
The timestamps of the events don't match the timestamps of the events in Outlook. I created the events manually using Outlook as well as with a pyExchange script.
For example, if I create an event from 11:00 AM - 11:30 AM in Outlook, the above script returns the timestamp of that event as 10:00 AM - 10:30 AM. The time is one hour behind.
If I check time.timezone it returns W. Europe Standard Time. I have specified my timezone in the script too, i.e. Europe/Amsterdam, but the problem persists. I also checked the timezone settings in Outlook, shown below:
I logged into the Exchange server and it is in the same timezone as my client machine.
Any suggestion as to why the time is not correct for the events? Is this a bug in pyExchange? I would really appreciate it if someone could test this and report back here, just to be sure that it's not only me facing this issue.
I looked, and it's probably not a bug in pyexchange but in how you're handling timezones. No shame: they're sadly extremely confusing in Python.
First, the package is returning event dates in UTC and not your local time. You're seeing an hour off the expected time because your timezone is +1 UTC. Here's an event I pulled from my calendar using your script (this is start/end/name/room):
2015-01-19 20:00:00+00:00 2015-01-19 21:00:00+00:00 - Lunch - Cafe
Note the +00:00 - that means it's in UTC. Noon here in California is 20:00 UTC.
Always, always, use UTC when handling datetimes. Here's some doc from the pytz folk on why localtimes are dangerous.
PyExchange tries to have your back and will convert localtime to UTC, but it always returns UTC. That's on purpose; see the previous link for why.
Now, to answer your question on getting this to work. First, convert your local time to UTC using these handy tips:
Use datetime.now(pytz.utc) to get the current datetime
Don't use datetime(…, tzinfo=timezone) to create a timezone aware datetime object, it's broken. Instead, create the datetime object and call timezone.localize on it.
For you, that means you have to do ugly stuff like:
start = timezone("Europe/Amsterdam").localize(datetime(2015, 1, 19, 0, 0, 0))
start = start.astimezone(pytz.utc)
Then, when you want to display UTC dates as your own time, do:
event.start.astimezone(timezone("Europe/Amsterdam"))
When I do that, I see this output from your script:
2015-01-19 21:00:00+01:00 2015-01-19 22:00:00+01:00 - Lunch - Cafe
which I would expect. Noon my time is 9pm your time.
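The whole round trip can also be sketched with the stdlib alone. Here a fixed +01:00 offset stands in for Europe/Amsterdam in winter; fixed offsets are safe to pass as tzinfo=, unlike pytz zones, which need localize:

```python
from datetime import datetime, timedelta, timezone

cet = timezone(timedelta(hours=1))                         # Amsterdam, winter time
local_start = datetime(2015, 1, 19, 12, 0, 0, tzinfo=cet)  # noon local time
utc_start = local_start.astimezone(timezone.utc)           # the form pyExchange works in
back = utc_start.astimezone(cet)                           # local time again, for display
print(utc_start.isoformat())  # 2015-01-19T11:00:00+00:00
```

For real queries spanning DST transitions, stick with pytz's localize as shown above.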
Here's a gist of your script that I changed. Take a look and see if it fixes your problem. If not, I'm happy to look again!

Python datetime parsing for SAR logs

I know there are a lot of datetime parsing questions here and, of course, many docs out on the web, but even with all that I'm still scratching my head about the best way to do this after hours of reading and trial and error (and, boy, have there been errors).
So, I need to parse SAR memory logs under linux over a range of days, returning data in JSON format for presentation in graphical format in a browser.
I'm just about there: it looks great in Chrome, but I need to improve the output date format to maximise cross-browser compatibility.
So, to do this I need to work with two things
A datetime object that is set to the date of the sar log I'm reading
A string from the log entry in the form '10:01:30 PM'
I want to be able to combine these into a string formatted as 'YYYY-MM-DDT22:01:30'
The bodge I'm using at the moment is
jDict['timestamp'] = d.strftime("%Y-%m-%d") + " " + timeStr
where d is my datetime object and timeStr is the time string from the log entry. Chrome is letting me get into bad habits and is happy to parse that format, but FF is stricter.
EDIT
@dawg below has asked for a sample of the input and output.
Example sar log format:
05:15:01 PM 2797588 13671876 83.01 228048 8276332 8249908 39.92
05:25:01 PM 2791396 13678068 83.05 228048 8276572 8455572 40.92
05:35:01 PM 2786104 13683360 83.08 228048 8282040 8249852 39.92
Current Output format:
[
{"timestamp": "2014-02-03 01:35:01 PM", "memtot": 16469464, "memused": 15747980},
{"timestamp": "2014-02-03 01:45:01 PM", "memtot": 16469464, "memused": 15791088},
{"timestamp": "2014-02-03 01:55:01 PM", "memtot": 16469464, "memused": 15690408}
]
Obviously I've not matched times and dates here - just some random lines
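One stdlib way to build the wanted 'YYYY-MM-DDT22:01:30' string: parse the 12-hour time column with strptime, then datetime.combine it with the log's date (the sample date below is made up to match the example output):

```python
from datetime import datetime

d = datetime(2014, 2, 3)     # date of the sar log being read (example value)
time_str = "10:01:30 PM"     # time column from a log entry
t = datetime.strptime(time_str, "%I:%M:%S %p").time()
stamp = datetime.combine(d.date(), t).isoformat()
print(stamp)  # 2014-02-03T22:01:30
```

isoformat() emits the strict 'T'-separated form, which Firefox's Date parser accepts.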
Or just give up and use moment.js in the browser. That's what I went for in the end. The downside is another library to load, but it does just make the problem go away.
