Python datetime parsing for SAR logs

I know that there are a lot of datetime parsing questions here and, of course, plenty of docs on the web, but even with all that I'm still scratching my head over the best way to do this after hours of reading, trial and error (and, boy, have there been errors).
So, I need to parse SAR memory logs under Linux over a range of days, returning data in JSON format for graphical presentation in a browser.
I'm just about there - it looks great in Chrome - but I need to improve the output date format to maximise cross-browser compatibility.
So, to do this I need to work with two things:
A datetime object that is set to the date of the sar log I'm reading
A string from the log entry in the form '10:01:30 PM'
I want to be able to combine these into a string formatted as 'YYYY-MM-DDTHH:MM:SS', e.g. '2014-02-03T22:01:30'
The bodge I'm using at the moment is
jDict['timestamp'] = d.strftime("%Y-%m-%d") + " " + timeStr
where d is my datetime object and timeStr is the time string from the log entry. Chrome is letting me get into bad habits and happily parses that format, but Firefox is stricter.
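The bodge can be replaced by parsing timeStr with strptime and attaching it to the log's date with datetime.combine - a minimal sketch, assuming the '%I:%M:%S %p' layout shown above (the values for d and timeStr here are just example data):

```python
from datetime import datetime

d = datetime(2014, 2, 3)   # date of the sar log (example value)
timeStr = '10:01:30 PM'    # time string from the log entry

# Parse the 12-hour time, attach it to the log's date, emit ISO 8601.
t = datetime.strptime(timeStr, '%I:%M:%S %p').time()
timestamp = datetime.combine(d.date(), t).strftime('%Y-%m-%dT%H:%M:%S')
print(timestamp)  # 2014-02-03T22:01:30
```

The 'T' separator and 24-hour clock make the string valid ISO 8601, which Firefox's Date parser accepts.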
EDIT
@dawg below has asked for a sample of input and output
Example sar log format:
05:15:01 PM 2797588 13671876 83.01 228048 8276332 8249908 39.92
05:25:01 PM 2791396 13678068 83.05 228048 8276572 8455572 40.92
05:35:01 PM 2786104 13683360 83.08 228048 8282040 8249852 39.92
Current Output format:
[
{"timestamp": "2014-02-03 01:35:01 PM", "memtot": 16469464, "memused": 15747980},
{"timestamp": "2014-02-03 01:45:01 PM", "memtot": 16469464, "memused": 15791088},
{"timestamp": "2014-02-03 01:55:01 PM", "memtot": 16469464, "memused": 15690408}
]
Obviously I've not matched times and dates here - these are just some random lines

Or just give up and use moment.js in the browser - that's what I went for in the end. The downside is another library to load, but it does make the problem go away.

Related

Date Parsing problem while Integrating to Oracle PMS

I'm receiving dates in a PMS message, something like this: |GA090616|GD090617|
which means Guest Arrival is at 09-06-16 and Guest Departure is at 09-06-17.
I want to parse these as dates using Python.
I've also visited Stack Overflow [1, 2 ...] for this, but the solution I found was
from datetime import datetime
self.DATE_FORMAT='%d/%m/%y'
arrival_date=datetime.strptime('90616', self.DATE_FORMAT).date()
print(arrival_date)
and it's not possible to parse it like this because the format is ambiguous.
I'm not sure whether 09 is the month or the day, but from what I've seen in documents and PDFs, it appears to be the month.
Is there a better solution for this kind of date parsing, or any suggestions?
09-06-16,
09-06-17
Note:
Please just take the date from the string 090617 and parse it as a date. That would be helpful.
You can do this with regex matching. You can either split the string with msg.split("|") first or not, depending on your use case.
import re
from datetime import datetime
msg = "GA090616|GD090617|"
DATE_FORMAT='%d%m%y'
ar = re.search(r"GA(\d{6})", msg)
dp = re.search(r"GD(\d{6})", msg)
guest_arrival = datetime.strptime(ar.group(1), DATE_FORMAT).date()
guest_departure = datetime.strptime(dp.group(1), DATE_FORMAT).date()
Note that re.search is used rather than re.match, since the GD field is not at the start of the string and re.match only matches at the beginning. Although not fully tested, this should serve as a boilerplate for retrieving the date from the message. The format string here has no / separators, since the raw message contains none; swap it for '%m%d%y' if the first field turns out to be the month.

Getting indexes with timestamp but adding custom hours/minutes/ in elasticsearch Python

I've been trying to receive all indexes from 7 days ago to now using this type of query:
query = {'query': {
    'bool': {
        'filter': [
            {'range': {'@timestamp': {'gte': 'now-7d/d', 'lte': 'now/d'}}},
        ]
    }
}}
Note that the date math expressions must be quoted strings, or the dict is not even valid Python.
The problem is I need to get them from, let's say, 12 am (midnight) to 11:59 pm. Note: the datetime 'now' can't be hardcoded; it needs to reflect the exact day the script is run. Is it possible to do this without using datetime, relying only on the built-in "date math" in the Elasticsearch API for Python?
EDIT: To clarify, I need the exact hour to be set to provide exact intervals. Example: getting data with timestamps between 11:30 am and 12:00 pm, and so on (with 30-minute intervals).
The Elasticsearch docs on range-query date math rounding (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding) and on date math in general (https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math) go into this.
You can't round to the half hour, though, sorry.
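For the midnight-to-11:59-pm window specifically, date math offsets (as opposed to rounding) can be appended after the rounding operator. An untested sketch of the query dict, where '/d' rounds down to midnight and '+23h+59m+59s' offsets from there, so no Python datetime is needed:

```python
# Sketch only (not run against a live cluster): 'now' is resolved by
# Elasticsearch at query time, so it always reflects the day the
# script is run.
query = {
    'query': {
        'bool': {
            'filter': [
                {'range': {'@timestamp': {
                    'gte': 'now-7d/d',           # midnight, seven days ago
                    'lte': 'now/d+23h+59m+59s',  # 11:59:59 pm today
                }}},
            ]
        }
    }
}
```

The 30-minute interval case is the one that date math can offset to (e.g. 'now/h+30m') but not round to.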

Django rest framework timefield input format

After hours of searching, I found many related posts, but none of them helped.
What I want to do is input eg: 10:30 AM into the TimeField.
In the Django REST framework browsable API, it uses the 10:30 AM format ('%I:%M %p').
But when I test with Postman, the output is in 24-hour format ('%H:%M:%S'). I also tried using 10:30 PM as input, but the output I get is 10:30:00 instead of 22:30:00.
Many of the answers I found suggest to change the TimeField format in settings.py by using this line:
TIME_INPUT_FORMATS = ('%I:%M %p',)
but it doesn't work for me.
Sorry for my inexperience with Django REST framework, as I am still learning.
Here are screenshots of the result, from the browsable API and from Postman.
If you check the documentation on the TimeField you will see:
Signature: TimeField(format=api_settings.TIME_FORMAT, input_formats=None)
Where
format - A string representing the output format. If not specified, this defaults to the same value as the TIME_FORMAT settings key, which will be 'iso-8601' unless set. Setting to a format string indicates that to_representation return values should be coerced to string output. Format strings are described below. Setting this value to None indicates that Python time objects should be returned by to_representation.
input_formats - A list of strings representing the input formats which may be used to parse the date. If not specified, the TIME_INPUT_FORMATS setting will be used, which defaults to ['iso-8601'].
So you either can specify the format and input_formats on the serializer, or set the settings.TIME_FORMAT and settings.TIME_INPUT_FORMATS.
Let's set the first case:
class MySerializer(serializers.Serializer):
...
birthTime = serializers.TimeField(format='%I:%M %p', input_formats=['%I:%M %p'])
Note that input_formats must be a list, not a bare string.
Some suggestions:
Make your variable names snake case: birth_time.
You may need to play a bit around with the input format because you may expect many different inputs:
input_formats=['%I:%M %p','%H:%M',...]
Convert the result in the serializer's validate method and return it:
import time
t = time.strptime(timevalue_24hour, "%H:%M")
timevalue_12hour = time.strftime("%I:%M %p", t)
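For the second case (global settings), note that DRF's settings keys live under the REST_FRAMEWORK dict in settings.py rather than as top-level Django settings, which may be why the bare TIME_INPUT_FORMATS line didn't take effect. A sketch:

```python
# settings.py - a sketch; these keys configure every DRF TimeField globally
REST_FRAMEWORK = {
    'TIME_FORMAT': '%I:%M %p',                    # output format
    'TIME_INPUT_FORMATS': ['%I:%M %p', '%H:%M'],  # accepted input formats
}
```

The per-field format/input_formats arguments shown above override these globals for a single serializer field.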

Script for a changing URL

I am having a bit of trouble in coding a process or a script that would do the following:
I need to get data from the URL of:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
But the file URLs change (the days and model runs change), so the script has to assume this base structure, with these variables:
Y - Year
M - Month
D - Day
C - Model Forecast/Initialization Hour
F - Model Frame Hour
Like so:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
The script would run and then substitute the current date (as YYYYMMDD, along with the CC model run hour) into those variables.
So while the mission is to get
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
the variables correspond to the current date in the format:
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
Can you please advise how to find the latest date available at URLs in this format? Whether it's a script or something with wget, I'm all ears. Thank you in advance.
In Python, the requests library can be used to get at the URLs.
You can generate the URL by combining the base URL string with timestamps built from the datetime class, its timedelta arithmetic, and its strftime method to produce the date in the required format.
i.e. start by getting the current time with datetime.datetime.now() and then, in a loop, subtract an hour (or whichever time gradient you think they're using) via timedelta, checking each URL with the requests library. The first one that exists is the latest, and you can then do whatever further processing you need with it.
If you need to scrape the contents of the page, scrapy works well for that.
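A sketch of that loop, stepping back in 6-hour increments (GFS runs are published at 00/06/12/18 UTC). The exists callable is a stand-in for a real availability check such as requests.head(url).status_code == 200, injected here so the probing logic is testable offline:

```python
from datetime import datetime, timedelta

# Assumed URL layout from the question; the run hour is zero-padded.
BASE = 'http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd{t:%Y%m%d}/gfs_hd_{t:%H}z'

def latest_run_url(exists, now):
    """Walk back in 6-hour steps from `now` until exists(url) is true."""
    # Snap down to the most recent 00/06/12/18 boundary.
    t = now.replace(hour=(now.hour // 6) * 6, minute=0, second=0, microsecond=0)
    for _ in range(8):  # look back at most two days
        url = BASE.format(t=t)
        if exists(url):
            return url
        t -= timedelta(hours=6)
    return None
```

In production this would be called as latest_run_url(lambda u: requests.head(u).status_code == 200, datetime.utcnow()); the 6-hour gradient is an assumption about the publication schedule.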
I'd try scraping the index one level up at http://nomads.ncep.noaa.gov/dods/gfs_hd ; the last link-of-particular-form there should take you to the daily downloads pages, where you could do something similar.
Here's an outline of scraping the daily downloads page:
# Python 2 / BeautifulSoup 3 style, matching the era of the question
import BeautifulSoup
import urllib

grdd = urllib.urlopen('http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140522')
soup = BeautifulSoup.BeautifulSoup(grdd)
datalinks = 'http://nomads.ncep.noaa.gov:80/dods/gfs_hd/gfs_hd'
for link in soup.findAll('a'):
    href = link.get('href')
    if href and href.startswith(datalinks):  # guard against anchors with no href
        print('Suitable link: ' + href[len(datalinks):])
        # Figure out if you already have it, choose if you want info, das, dds, etc.
and scraping the page with the last thirty would, of course, be very similar.
The easiest solution would be just to mirror the parent directory:
wget -np -m -r http://nomads.ncep.noaa.gov:9090/dods/gfs_hd
However, if you just want the latest date, you can use Mojo::UserAgent as demonstrated on Mojocast Episode 5
use strict;
use warnings;
use Mojo::UserAgent;
my $url = 'http://nomads.ncep.noaa.gov:9090/dods/gfs_hd';
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;
my @links = $dom->find('a')->attr('href')->each;
my @gfs_hd = reverse sort grep {m{gfs_hd/}} @links;
print $gfs_hd[0], "\n";
On May 23rd, 2014, this outputs:
http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/gfs_hd20140523

How to turn a comma-delimited list of date attributes into a MySQL date value?

Not sure if I've worded my title correctly but here goes. I have a file of jobs which are all in a similar format to this:
423720,hparviz,RUN,512,22,Mar,10:38,11,April,14:06
Basically, from this I need to convert the start date and end date to a format that allows me to import them into MySQL (22-Mar 10:38 and 11-Apr 14:06, or however MySQL requires dates to be formatted). This data is extracted using a command in Linux, and I'm manipulating the results to allow importing into a MySQL database. Would it be easier to manipulate in Linux (during the command), in Python (in the state I've shown), or in MySQL (after importing)?
If you need any more details let me know, thanks.
Assuming every line looks like what you posted:
f = open(filename, "r")
for line in f.readlines():
    splitLine = line.split(",")
    print "Day of month: " + splitLine[4]  # e.g. '22'
    print "Month: " + splitLine[5]         # e.g. 'Mar'
You can split the line with line.split(',').
You can then convert the date using time.strptime to parse it and time.strftime to reformat it.
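A sketch of that approach for a full MySQL DATETIME value, assuming a year (the log omits it; 2014 is used here as an example) and handling both abbreviated ('Mar') and full ('April') month names:

```python
from datetime import datetime

line = '423720,hparviz,RUN,512,22,Mar,10:38,11,April,14:06'
fields = line.split(',')

def to_mysql(day, month, hhmm, year=2014):
    # The log mixes 'Mar' and 'April', so try the abbreviated-month
    # format (%b) first, then the full-name format (%B).
    for fmt in ('%Y %d %b %H:%M', '%Y %d %B %H:%M'):
        try:
            parsed = datetime.strptime('%s %s %s %s' % (year, day, month, hhmm), fmt)
            return parsed.strftime('%Y-%m-%d %H:%M:%S')  # MySQL DATETIME literal
        except ValueError:
            continue
    raise ValueError('unparseable date: %s %s %s' % (day, month, hhmm))

start = to_mysql(fields[4], fields[5], fields[6])  # 2014-03-22 10:38:00
end = to_mysql(fields[7], fields[8], fields[9])    # 2014-04-11 14:06:00
```

Strings in this '%Y-%m-%d %H:%M:%S' form can be inserted directly into a MySQL DATETIME column.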
process to create data | awk -F, -v OFS=, '
    function format_date(day, mon, time) {
        return sprintf("%s-%s %s", day, substr(mon,1,3), time)
    }
    {print $1,$2,$3,$4,format_date($5,$6,$7),format_date($8,$9,$10)}
'
Outputs
423720,hparviz,RUN,512,22-Mar 10:38,11-Apr 14:06
Alter the format string as required.
