What is the best way to get some data from the text below - Python

Without the xml or lxml modules, with the re module or maybe another way, what is the best way to get the data between the quotes: https://10.107.11.77:52311/api/computer/1624350712
My option with re:
found = re.findall(r'<Computer Resource=\s*"([^"]*)"', r.text)
Text:
<Computer Resource="https://10.107.11.77:52311/api/computer/1624350712">
<LastReportTime>Thu, 13 May 2021 22:59:43 +0000</LastReportTime>
<ID>1624350712</ID>
</Computer>
<Computer Resource="https://10.107.11.77:52311/api/computer/1626165598">
<LastReportTime>Wed, 02 Jun 2021 07:12:40 +0000</LastReportTime>
<ID>1626165598</ID>


Related

Parse timestamp from a txt file?

I have a large text file on the web that I am using requests to obtain and parse data from. The text file begins each line with a format like [Mon Oct 10 08:58:26 2022]. How can I get the latest 7 days or convert only the datetime to an object or string for storing and parsing later? I simply want to extract the timestamps from the log and print them
You can use TimedRotatingFileHandler for daily or 7-days logs.
read more about timed rotating file handler here
and
read more about extracting timestamps from files
Can you tell me if this snippet solves your problem?
from datetime import datetime
log_line = "[Sun Oct 09 06:14:26 2022] Wiladoc is browsing your wares."
_datetime = log_line[1:25]
_datetime_strp = datetime.strptime(_datetime, '%a %b %d %H:%M:%S %Y')
print(_datetime)
print(_datetime_strp)

Custom regex pattern for matching email addresses

I have content that I am reading in that I need to collect the emails from within. However, I just want to pull the email that comes after From:
Here is an example:
Recip: fhavor#gmail.com
Subject: Report results (Gd)
Headers: Received: from daem.com (unknown [127.1.1.1])
Date: Sat, 13 Feb 2021 13:11:42 +0000 (GMT)
From: Tavon Lo <lt35#gmail.com>
As you can see there are multiple emails but I want to only collect the email that comes after the From: part of the content. Which would be "lt35#gmail.com". So far I have a good regex that collects ALL the emails within the content.
EMAIL = r"((?:^|\b)(?:[^\s]+?\#(?:.+?)\[\.\][a-zA-Z]+)(?:$|\b))"
I am new to regex patterns so any ideas or suggestions as to how to improve the above pattern to only collect the emails that come after from: would highly be appreciated!
You can use
(?m)^From:[^<>\n\r]*<([^<>#]+#[^<>]+)>
See the regex demo.
Details:
(?m) - re.M inline modifier option
^ - start of a line
From: - a literal string
[^<>\n\r]* - zero or more chars other than <, >, CR and LF
< - a < char
([^<>#]+#[^<>]+) - Group 1: one or more chars other than <, > and #, then a # char and then one or more chars other than < and >
> - a > char.
See a Python demo:
import re
rx = re.compile(r'^From:[^<>\n\r]*<([^<>#]+#[^<>]+)>', re.M) # Define the regex
with open(your_file_path, 'r') as f: # Open file for reading
    print(rx.findall(f.read())) # Get all the emails after From:

Read specific part of json string with python

I am currently working on a programme within the django environment which operates off a json api provided by a third party. There is an object within that API which I want however the string of information it provides is too much for me.
The data I want is the created_at tag from the twitter api using tweepy. This created_at contains data in the following format:
"created_at": "Mon Aug 27 17:21:03 +0000 2012"
This is all fine however this will return the date AND time whereas I simply want the time part of the above example i.e. 17:21:03.
Is there any way I can just take this part of the created_at response string and store it in a separate variable?
You can use the dateutil module
from dateutil import parser
created_at = "Mon Aug 27 17:21:03 +0000 2012"
created_at = parser.parse(created_at)
print(created_at.time())
Output:
17:21:03
Try below code.
my_datetime = response_from_twitter['created_at']
my_time = my_datetime.split(' ')[3]
# my_time will now contain time part.
You could just split the string into a list and take the 4th element:
time = source['created_at'].split(' ')[3]
What about a regular expression with re.search():
>>> import re
>>> d = {"created_at": "Mon Aug 27 17:21:03 +0000 2012"}
>>> re.search(r'\d{2}:\d{2}:\d{2}', d['created_at']).group(0)
'17:21:03'

calling values from a json file like a dictionary

Warning: beginner here:
So I am reading in a text file that is in the form of a json file. Since the json file is just like a dictionary I want to address parts of the json like I would a dictionary but I don't know how to do this. This is the little bit of what I have:
code:
with open("trump.txt","r") as lines:
    for line in lines:
        print(line)
what this prints:
{"created_at":"Wed Sep 27 01:19:39 +0000 2017","id":912849180741087232,"id_str":"912849180741087232","text":"RT #TheRickWilson: I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump ca\u2026","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":66914769,"id_str":"66914769","name":"Kathy","screen_name":"mydoggigi","location":"Earth","url":null,"description":"Love politics, Grandchildren & PSU #StillWithHer #NotMyPresident Blocked by Susan Sarandon, Glenn Greenwald, Joel Osteen and Joe Scarborough!!\ud83d\ude0e #TheResistance","translator_type":"none","protected":false,"verified":false,"followers_count":5878,"friends_count":5973,"listed_count":143,"favourites_count":110285,"statuses_count":138191,"created_at":"Wed Aug 19 04:55:41 +0000 2009","utc_offset":-14400,"time_zone":"Eastern Time (US & 
Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/66914769/1504225271","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Sep 27 01:08:45 +0000 2017","id":912846439964987392,"id_str":"912846439964987392","text":"I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump can't deliver. Sad!","source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":19084896,"id_str":"19084896","name":"Rick Wilson","screen_name":"TheRickWilson","location":"Florida and points beyond","url":"http://facebook.com/therickwilson","description":"GOP Media Guy, Dad, Husband, Pilot, Hunter, Writer. I make ads and do politics. Daily Beast columnist. 
Everything Trump Touches Dies.","translator_type":"none","protected":false,"verified":true,"followers_count":238578,"friends_count":3518,"listed_count":4235,"favourites_count":48094,"statuses_count":250609,"created_at":"Fri Jan 16 20:50:17 +0000 2009","utc_offset":-14400,"time_zone":"America/New_York","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"1A1B1F","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_tile":true,"profile_link_color":"445555","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"252429","profile_text_color":"666666","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/19084896/1504722796","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":5,"reply_count":50,"retweet_count":100,"favorite_count":456,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"TheRickWilson","name":"Rick Wilson","id":19084896,"id_str":"19084896","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1506475179263"}
So how can I do something as simple as something below in my code?
dict["created_at"]="Wed Sep 27 01:19:39 +0000 2017"
Try this:
import json
with open('file.json') as file:
    data = json.load(file)
    #code

Stix, Taxii, Python3, Cabby API - getting data into a format i can use

I'm using the cabby API: https://github.com/EclecticIQ/cabby
in hopes of pulling STIX info through the TAXII client.
I've got my Python code pulling the data from www.hailataxii.com
The data is in a container, and I can flip through it. It looks like XML, but no XML parser will read or manipulate the data. I'd love to put each record into a dictionary, then put the data into some kind of database, but until I find a way to access the data from the download, I'm at a loss. There is very little data or examples.
Any suggestions would be greatly appreciated.
Here is my basic code for testing:
import pprint
from cabby import create_client

HailATaxiiFeedList = [
    'guest.Abuse_ch',
    'guest.CyberCrime_Tracker',
    'guest.EmergingThreats_rules',
    'guest.Lehigh_edu',
    'guest.MalwareDomainList_Hostlist',
    'guest.blutmagie_de_torExits',
    'guest.dataForLast_7daysOnly',
    'guest.dshield_BlockList',
    'guest.phishtank_com'
]

client = create_client(
    'hailataxii.com',
    use_https=False,
    discovery_path='/taxii-discovery-service')

print(": Discover_Collections:")
services = client.discover_services()
for service in services:
    print('Service type= {s.type} , address= {s.address}'.format(s=service))

print(": Get_Collections:")
collections = client.get_collections(
    uri='http://hailataxii.com/taxii-data')

for collection_name in HailATaxiiFeedList:
    print("Polling :", collection_name, ".. could take a while, please be patient..")
    file = open(("./iocs/" + collection_name + ".xml"), "w")
    content_blocks = client.poll(collection_name=collection_name)
    count = 1
    for block in content_blocks:
        taxii_message = block.content.decode('utf-8')
        file.write(taxii_message)
        count += 1
        if count > 20:  # just getting the 20 top objects because the lists are huge
            break
    file.close()
The output looks like xml, but no xml parser will touch it.
<stix:STIX_Package xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:simpleMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Simple-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:TOUMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Terms_Of_Use-1" xmlns:opensource="http://hailataxii.com" xmlns:edge="http://soltra.com/" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:ttp="http://stix.mitre.org/TTP-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="edge:Package-a2c8f8f2-5a4d-4f0e-92be-d3fa482247d0" version="1.1.1" timestamp="2017-10-09T20:39:36.179672+00:00">
<stix:STIX_Header>
<stix:Handling>
<marking:Marking>
<marking:Controlled_Structure>../../../../descendant-or-self::node()</marking:Controlled_Structure>
<marking:Marking_Structure xsi:type="tlpMarking:TLPMarkingStructureType" color="WHITE"/>
<marking:Marking_Structure xsi:type="TOUMarking:TermsOfUseMarkingStructureType">
<TOUMarking:Terms_Of_Use>zeustracker.abuse.ch | Abuse source[https://sslbl.abuse.ch/blacklist/] - As for all abuse.ch projects, the use of the SSL Blacklist is free for both commercial and non-commercial usage without any limitation. However, if you are a commercial vendor of security software/services and you want to integrate data from the SSL Blacklist into your products / services, you will have to ask for permission first by contacting me using the contact form [http://www.abuse.ch/?page_id=4727].'
</TOUMarking:Terms_Of_Use>
</marking:Marking_Structure>
<marking:Marking_Structure xsi:type="simpleMarking:SimpleMarkingStructureType">
<simpleMarking:Statement>Unclassified (Public)</simpleMarking:Statement>
</marking:Marking_Structure>
</marking:Marking>
</stix:Handling>
</stix:STIX_Header>
<stix:Indicators>
<stix:Indicator id="opensource:indicator-00398022-0d9c-474b-b543-31b85a4f22ab" timestamp="2014-10-31T16:44:24.766014+00:00" xsi:type="indicator:IndicatorType" version="2.1.1">
<indicator:Title>ZeuS Tracker (offline)| s-k.kiev.ua/html/30/config.bin (2014-10-13) | This domain has been identified as malicious by zeustracker.abuse.ch</indicator:Title>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">Domain Watchlist</indicator:Type>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">URL Watchlist</indicator:Type>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">File Hash Watchlist</indicator:Type>
<indicator:Description>This domain s-k.kiev.ua has been identified as malicious by zeustracker.abuse.ch. For more detailed infomation about this indicator go to [CAUTION!!Read-URL-Before-Click] [https://zeustracker.abuse.ch/monitor.php?host=s-k.kiev.ua].</indicator:Description>
<indicator:Observable idref="opensource:Observable-94ead651-1df5-4cfe-b4bb-e34ce5e60224">
</indicator:Observable>
<indicator:Indicated_TTP>
<stixCommon:TTP idref="opensource:ttp-6055672f-ecfd-40ae-aa84-0b336a5accb6" xsi:type="ttp:TTPType"/>
</indicator:Indicated_TTP>
<indicator:Producer>
<stixCommon:Identity id="opensource:Identity-3066ae12-3db6-44dd-9636-6b083b6479dc">
<stixCommon:Name>zeustracker.abuse.ch</stixCommon:Name>
</stixCommon:Identity>
<stixCommon:Time>
<cyboxCommon:Produced_Time>2014-10-13T00:00:00+00:00</cyboxCommon:Produced_Time>
<cyboxCommon:Received_Time>2014-10-20T19:29:30+00:00</cyboxCommon:Received_Time>
</stixCommon:Time>
</indicator:Producer>
</stix:Indicator>
</stix:Indicators>
Any suggestions would be greatly appreciated.
The XML you're looking at is STIX. Check out: https://www.eclecticiq.com/stix-taxii. Then follow the link to the STIX website and find (right bottom) "tooling" section. You should find libraries and parsing tools to make it useful.
Alternatively, there are commercial platforms available to do data processing magic. Google "Threat Intelligence Platform".
Cheers,
Joep
Founder
EclecticIQ
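
Added example, not part of the question or the answer above and not cabby's or the STIX project's code: the loop in the question writes every polled content block into one file, so that file holds several root elements and is not one well-formed XML document. Assuming each block is one complete STIX_Package like the output shown, this sketch parses a single block with the standard library and returns one dict per indicator; `parse_block`, `SAMPLE_BLOCK` and the sample values are constructed for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the xmlns declarations in the package shown above.
NS = {
    "stix": "http://stix.mitre.org/stix-1",
    "indicator": "http://stix.mitre.org/Indicator-2",
    "stixCommon": "http://stix.mitre.org/common-1",
    "cyboxCommon": "http://cybox.mitre.org/common-2",
}

# Constructed sample: one content block, trimmed to the fields read below.
SAMPLE_BLOCK = b"""<stix:STIX_Package xmlns:stix="http://stix.mitre.org/stix-1"
  xmlns:indicator="http://stix.mitre.org/Indicator-2"
  xmlns:stixCommon="http://stix.mitre.org/common-1"
  xmlns:cyboxCommon="http://cybox.mitre.org/common-2">
 <stix:Indicators>
  <stix:Indicator id="example:indicator-0001" timestamp="2014-10-31T16:44:24+00:00">
   <indicator:Title>Constructed sample title</indicator:Title>
   <indicator:Type>Domain Watchlist</indicator:Type>
   <indicator:Type>URL Watchlist</indicator:Type>
   <indicator:Producer>
    <stixCommon:Identity><stixCommon:Name>example-producer</stixCommon:Name></stixCommon:Identity>
    <stixCommon:Time><cyboxCommon:Produced_Time>2014-10-13T00:00:00+00:00</cyboxCommon:Produced_Time></stixCommon:Time>
   </indicator:Producer>
  </stix:Indicator>
 </stix:Indicators>
</stix:STIX_Package>"""

def parse_block(content):
    """Parse ONE content block (bytes) into a list of indicator dicts."""
    text = content.decode("utf-8")
    # Feed content is untrusted; STIX needs no DTD, so refuse any declaration.
    if "<!DOCTYPE" in text or "<!ENTITY" in text:
        raise ValueError("DTD or entity declaration in feed content")
    root = ET.fromstring(text)  # ParseError if not a single well-formed document
    prod = "indicator:Producer/stixCommon:"
    return [{
        "id": ind.get("id"),
        "timestamp": ind.get("timestamp"),
        "title": ind.findtext("indicator:Title", "", NS).strip(),
        "types": [t.text for t in ind.findall("indicator:Type", NS)],
        "producer": ind.findtext(prod + "Identity/stixCommon:Name", "", NS),
        "produced": ind.findtext(prod + "Time/cyboxCommon:Produced_Time", "", NS),
    } for ind in root.iterfind(".//stix:Indicator", NS)]

if __name__ == "__main__":
    for record in parse_block(SAMPLE_BLOCK):
        print(record)
```

In the polling loop this would be called as `parse_block(block.content)` for each block, instead of concatenating the decoded blocks into one `.xml` file.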
