I have a large text file on the web that I am using requests to obtain and parse data from. The text file begins each line with a format like [Mon Oct 10 08:58:26 2022]. How can I get the latest 7 days or convert only the datetime to an object or string for storing and parsing later? I simply want to extract the timestamps from the log and print them
You can use TimedRotatingFileHandler for daily or 7-day logs.
Read more about the timed rotating file handler and about extracting timestamps from files.
Can you tell me if this snippet solves your problem?
from datetime import datetime
log_line = "[Sun Oct 09 06:14:26 2022] Wiladoc is browsing your wares."
_datetime = log_line[1:25]
_datetime_strp = datetime.strptime(_datetime, '%a %b %d %H:%M:%S %Y')
print(_datetime)
print(_datetime_strp)
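To keep only the last 7 days, the same strptime call can be applied to every line and the results compared against a cutoff. The following is a minimal sketch, not a definitive solution: the names STAMP_RE, iter_timestamps and last_days are made up for illustration, the second sample line is hypothetical, and it assumes English day and month abbreviations (the default C locale).

```python
import re
from datetime import datetime, timedelta

# Matches a leading "[Mon Oct 10 08:58:26 2022]" stamp and captures the text inside.
STAMP_RE = re.compile(
    r"^\[([A-Za-z]{3} [A-Za-z]{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]"
)
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def iter_timestamps(lines):
    """Yield a datetime for every line that starts with a bracketed stamp."""
    for line in lines:
        match = STAMP_RE.match(line)
        if match:
            yield datetime.strptime(match.group(1), STAMP_FORMAT)


def last_days(lines, days=7, now=None):
    """Return the stamps that fall within `days` days before `now`."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return [stamp for stamp in iter_timestamps(lines) if cutoff <= stamp <= now]


sample = [
    "[Sun Oct 09 06:14:26 2022] Wiladoc is browsing your wares.",
    "[Sat Oct 01 12:00:00 2022] An older entry (hypothetical).",
    "a line without a timestamp",
]
# `now` is pinned here so the example is reproducible; omit it for real use.
recent = last_days(sample, days=7, now=datetime(2022, 10, 10, 8, 58, 26))
for stamp in recent:
    print(stamp.isoformat())
```

Since the file comes from requests, the lines could be fed in with something like response.iter_lines(decode_unicode=True) on a streamed response, so the whole file never has to sit in memory at once.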
Given git log output like such:
commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)
Author: Slim Shady
Date: Sun Sep 18 19:53:42 2022 -0700
ci: remove debugging line github action script
commit body
commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)
Author: Slim Shady
Date: Sun Sep 18 19:41:20 2022 -0700
feat: read and write IDs
commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874
Author: Slim Shady
Date: Sun Sep 18 17:41:03 2022 -0700
feat: new hook to allow custom tags
I'd like that to turn into a list in python, with each element containing a single commit (including hash, author, body, etc.).
I've tried using re.split(r"commit \w{40}", git_log), but it doesn't keep the hash in the output.
You could also use a positive lookahead to split your data.
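import re
with open('git_log.txt', 'r') as f:
    data = f.read()
# The lookahead matches without consuming text, so each hash stays in its chunk.
# Splitting on a zero-width match needs Python 3.7 or later.
res = list(filter(None, re.split(r"(?=commit \w{40})", data)))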
Output:
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
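A self-contained variant of the lookahead split, for trying it without a file on disk. This is a sketch under a few assumptions: the log text is inlined from the question (with git's usual four-space indent on the message lines), and the pattern is tightened to line starts and hexadecimal hashes, which is a small change from the answer above rather than part of it.

```python
import re

git_log = (
    "commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\n"
    "Author: Slim Shady\n"
    "Date: Sun Sep 18 19:53:42 2022 -0700\n"
    "\n"
    "    ci: remove debugging line github action script\n"
    "\n"
    "    commit body\n"
    "\n"
    "commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\n"
    "Author: Slim Shady\n"
    "Date: Sun Sep 18 19:41:20 2022 -0700\n"
    "\n"
    "    feat: read and write IDs\n"
    "\n"
    "commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\n"
    "Author: Slim Shady\n"
    "Date: Sun Sep 18 17:41:03 2022 -0700\n"
    "\n"
    "    feat: new hook to allow custom tags"
)

# (?m) lets ^ match at every line start, so only real header lines split the
# text; an indented "commit ..." inside a message body is left alone.
COMMIT_START = re.compile(r"(?m)^(?=commit [0-9a-f]{40}\b)")

# The split yields an empty first chunk (match at position 0), hence the filter.
commits = [chunk for chunk in COMMIT_START.split(git_log) if chunk]

for commit in commits:
    # The hash is the second whitespace-separated token of each chunk.
    print(commit.split()[1])
```

Because the split consumes nothing, joining the chunks back together reproduces the original text exactly.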
You need to put the split pattern in a capture group to allow it to be part of the output:
# filter(None, ...) to remove empty strings
>>> res = filter(None, re.split(r'(commit \w{40})', inp))
# Join items in group of two to handle the split between a commit line and rest of its body
>>> output = ["".join(item) for item in zip(*[res] * 2)]
>>> output
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
But if you do have control over the git log output, you could format it differently and parse it without regex:
git log --pretty=format:'"%H"%x09"%an"%x09"%ad"%x09"%B"' > output.csv
Then:
>>> import csv
>>> with open("output.csv") as f:
...     items = list(csv.reader(f, delimiter='\t'))
...
>>> items[0]
["19e0f017ac832238f5a800dd3ea7a5966b3c1343", "Slim Shady", "Sun Sep 18 19:53:42 2022 -0700", "ci: remove debugging line github action script"]
Another option is to use a library such as GitPython (https://gitpython.readthedocs.io/en/stable/), which gives you access to commits as Python objects.
Warning: beginner here:
So I am reading in a text file whose contents are JSON. Since JSON is just like a dictionary, I want to address parts of it the way I would a dictionary, but I don't know how to do this. This is the little bit that I have:
code:
with open("trump.txt","r") as lines:
    for line in lines:
        print(line)
what this prints:
{"created_at":"Wed Sep 27 01:19:39 +0000 2017","id":912849180741087232,"id_str":"912849180741087232","text":"RT #TheRickWilson: I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump ca\u2026","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":66914769,"id_str":"66914769","name":"Kathy","screen_name":"mydoggigi","location":"Earth","url":null,"description":"Love politics, Grandchildren & PSU #StillWithHer #NotMyPresident Blocked by Susan Sarandon, Glenn Greenwald, Joel Osteen and Joe Scarborough!!\ud83d\ude0e #TheResistance","translator_type":"none","protected":false,"verified":false,"followers_count":5878,"friends_count":5973,"listed_count":143,"favourites_count":110285,"statuses_count":138191,"created_at":"Wed Aug 19 04:55:41 +0000 2009","utc_offset":-14400,"time_zone":"Eastern Time (US & 
Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/66914769/1504225271","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Sep 27 01:08:45 +0000 2017","id":912846439964987392,"id_str":"912846439964987392","text":"I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump can't deliver. Sad!","source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":19084896,"id_str":"19084896","name":"Rick Wilson","screen_name":"TheRickWilson","location":"Florida and points beyond","url":"http://facebook.com/therickwilson","description":"GOP Media Guy, Dad, Husband, Pilot, Hunter, Writer. I make ads and do politics. Daily Beast columnist. 
Everything Trump Touches Dies.","translator_type":"none","protected":false,"verified":true,"followers_count":238578,"friends_count":3518,"listed_count":4235,"favourites_count":48094,"statuses_count":250609,"created_at":"Fri Jan 16 20:50:17 +0000 2009","utc_offset":-14400,"time_zone":"America/New_York","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"1A1B1F","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_tile":true,"profile_link_color":"445555","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"252429","profile_text_color":"666666","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/19084896/1504722796","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":5,"reply_count":50,"retweet_count":100,"favorite_count":456,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"TheRickWilson","name":"Rick Wilson","id":19084896,"id_str":"19084896","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1506475179263"}
So how can I do something as simple as the following in my code?
dict["created_at"]="Wed Sep 27 01:19:39 +0000 2017"
Try this:
import json
with open('file.json') as file:
    data = json.load(file)  # parses the whole file as a single JSON document
# data is now a dict, so data["created_at"] works
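If the file holds one JSON object per line, as the loop in the question suggests, json.load on the whole file would fail once there is more than one tweet; json.loads on each line is the usual alternative. A minimal sketch, assuming one complete object per line: read_tweets and TWITTER_FORMAT are illustrative names, and the sample line is a shortened stand-in for the tweet shown in the question.

```python
import json
from datetime import datetime

# Layout of Twitter's "created_at" field, e.g. "Wed Sep 27 01:19:39 +0000 2017".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def read_tweets(lines):
    """Yield one dict per non-empty line, assuming each line is a full JSON object."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Shortened stand-in for one line of the file; a real file object works the same way.
sample_lines = [
    '{"created_at":"Wed Sep 27 01:19:39 +0000 2017",'
    '"id":912849180741087232,"user":{"screen_name":"mydoggigi"}}',
    "",
]

tweets = list(read_tweets(sample_lines))
first = tweets[0]

print(first["created_at"])           # plain dictionary access
print(first["user"]["screen_name"])  # nested objects are nested dicts

# Optional: turn the string into a timezone-aware datetime.
created = datetime.strptime(first["created_at"], TWITTER_FORMAT)
print(created.isoformat())
```

With the real file, the call would be read_tweets(open("trump.txt")), ideally inside a with block.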
How do I determine the most recently modified file from an FTP directory listing? Locally I used the max function on the Unix timestamps, but the FTP listing is harder to parse. The fields on each line are separated only by spaces.
from ftplib import FTP
ftp = FTP('ftp.cwi.nl')
ftp.login()
data = []
ftp.dir(data.append)
ftp.quit()
for line in data:
    print(line)
output:
drwxrwsr-x 5 ftp-usr pdmaint 1536 Mar 20 09:48 .
dr-xr-srwt 105 ftp-usr pdmaint 1536 Mar 21 14:32 ..
-rw-r--r-- 1 ftp-usr pdmaint 5305 Mar 20 09:48 INDEX
Just to make some corrections:
date_str = ' '.join(line.split()[5:8])
time.strptime(date_str, '%b %d %H:%M') # import time
And to find the most recent file:
file_dict = {}
for line in data:
    col_list = line.split()
    date_str = ' '.join(col_list[5:8])
    # datePattern is a compiled regex for the wanted file names, defined elsewhere
    if datePattern.search(col_list[8]):
        file_dict[time.strptime(date_str, '%b %d %H:%M')] = col_list[8]
# struct_time values compare like tuples, so max() over the keys gives the latest date
s = file_dict[max(file_dict)]
print(s)
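The same idea as a self-contained function, without the undefined datePattern. This is only a sketch under stated assumptions: newest_entry is an illustrative name, the listing omits the year so the caller has to supply one, the NEWER line and the year 2012 are hypothetical, and entries that show a year instead of a time (as many servers do for older files) are simply skipped.

```python
from datetime import datetime


def newest_entry(listing_lines, year):
    """Return (name, timestamp) of the most recent entry in a Unix-style listing."""
    newest = None
    for line in listing_lines:
        # Split into at most 9 fields so file names containing spaces survive.
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[8] in (".", ".."):
            continue
        try:
            stamp = datetime.strptime(
                "{} {}".format(year, " ".join(cols[5:8])), "%Y %b %d %H:%M"
            )
        except ValueError:
            # e.g. "Mar 20 2011": a year where the time would be; not handled here.
            continue
        if newest is None or stamp > newest[1]:
            newest = (cols[8], stamp)
    return newest


sample = [
    "drwxrwsr-x 5 ftp-usr pdmaint 1536 Mar 20 09:48 .",
    "dr-xr-srwt 105 ftp-usr pdmaint 1536 Mar 21 14:32 ..",
    "-rw-r--r-- 1 ftp-usr pdmaint 5305 Mar 20 09:48 INDEX",
    "-rw-r--r-- 1 ftp-usr pdmaint 120 Mar 22 07:05 NEWER",  # hypothetical entry
]
print(newest_entry(sample, year=2012))
```

Passing the year explicitly also avoids strptime's default of 1900, which would reject a Feb 29 entry.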
If the FTP server supports the MLSD command (and quite possibly it does), you can use the FTPDirectory class from that answer in a related question.
Create an ftplib.FTP instance (e.g. aftp) and an FTPDirectory instance (e.g. aftpdir), connect to the server, .cwd to the directory you want, and read the files using aftpdir.getdata(aftp). After that, you get the name of the most recent file with:
import operator
max(aftpdir, key=operator.attrgetter('mtime')).name
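On Python 3.3 and later, ftplib.FTP also has an mlsd() method that yields (name, facts) pairs, so a helper class is not strictly needed. The sketch below only covers the selection step and runs on hypothetical sample entries rather than a live server; newest_mlsd_entry is an illustrative name, and it assumes the server reports the "type" and "modify" facts.

```python
from datetime import datetime


def newest_mlsd_entry(entries):
    """Return (name, modified) for the newest regular file among MLSD entries."""
    best = None
    for name, facts in entries:
        if facts.get("type") != "file" or "modify" not in facts:
            continue
        # "modify" is YYYYMMDDHHMMSS, optionally followed by fractional seconds.
        modified = datetime.strptime(facts["modify"][:14], "%Y%m%d%H%M%S")
        if best is None or modified > best[1]:
            best = (name, modified)
    return best


# Hypothetical entries shaped like the pairs that ftp.mlsd() yields.
sample_entries = [
    (".", {"type": "cdir", "modify": "20120320094800"}),
    ("INDEX", {"type": "file", "modify": "20120320094800", "size": "5305"}),
    ("NEWER", {"type": "file", "modify": "20120322070500.123", "size": "120"}),
]
print(newest_mlsd_entry(sample_entries))
```

Against a real connection the call would be newest_mlsd_entry(ftp.mlsd()), provided the server supports the MLSD command.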
To parse the date, you can use (from Python 2.5 onwards):
datetime.datetime.strptime('Mar 21 14:32', '%b %d %H:%M')
You can split each line and get the date:
date_str = ' '.join(line.split(' ')[5:8])
Then parse the date (check out egenix mxDateTime package, specifically the DateTimeFromString function) to get comparable objects.