Python - How to find profile from file [duplicate]

This question already has answers here:
Understanding slicing
(38 answers)
Closed 8 months ago.
I am new to Python.
I want to find profiles in a log file that meet the following criteria:
the user logged in, changed the password, and logged off within the same second
those actions (log in, change password, log off) happened one after another, with no other entries in between.
The .txt file looks like this:
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|iukj| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|klij| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|yytr| - |user logged in| -
asdf is a typical profile name from the log file.
Here is what I have done so far:
import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)

for line, count in counts.most_common():
    print(line, count)

time.sleep(10)
I know the logic is to compare the hours, minutes, and seconds;
if they are duplicates, then I print the profiles.
But I am confused about how to get the time from the file.
Any help is very much appreciated.
EDIT:
The output would be:
asdf
klij
plnb
zzad

I think this is more complicated than you might have imagined. Your sample data is very straightforward, but the description (the requirements) implies that the log might have interspersed lines that you need to account for. So I think it's a case of working through the log file sequentially, recording certain actions (log on, log off) and keeping a note of what was observed on the previous line. This seems to work with your data (note that the walrus operator := requires Python 3.8+):
from datetime import datetime as DT, timedelta as TD

FMT = '%a, %d %b %Y %H:%M:%S %z'
td = TD(seconds=1)
prev = None

with open('logfile.txt') as logfile:
    for line in logfile:
        if len(tokens := line.split('|')) > 4:
            dt, _, profile, _, action, *_ = tokens
            if prev is None or prev[1] != profile:
                prev = (dt, profile) if action == 'user logged in' else None
            else:
                if action == 'user logged off':
                    if DT.strptime(dt, FMT) - DT.strptime(prev[0], FMT) <= td:
                        print(profile)
                    prev = None
Output:
asdf
plnb
qweq
zzad

To parse a time, I would use a regex to match a time expression on each line. Something like this would work.
EDIT: I omitted the lines which don't match the expected formatting.
import re
time = re.search(r'(\d+):(\d+):(\d+)', line).group()
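For example, applied to one of the sample lines from the question:
import re

line = 'Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -'
match = re.search(r'(\d+):(\d+):(\d+)', line)
if match:                     # guard: not every line need contain a time
    print(match.group())      # 13:15:39
    print(match.groups())     # ('13', '15', '39')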
As far as the profile name is concerned, I would use a split on the most common lines, as @Matthias suggested, and your code would look something like this:
import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)

for line, count in counts.most_common():
    # The line splits where the '|' symbol is and creates a list.
    # We choose the third element of the list - the profile.
    list_of_segments = line.split('|')
    if len(list_of_segments) == 6:
        print(list_of_segments[2])

time.sleep(10)
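Putting the two ideas together, here is a minimal sketch (my own, not part of the answers above) that applies the stated criteria literally: it reports a profile only when a consecutive log in / change password / log off triple shares one timestamp, so a profile like klij, whose three actions span several seconds, would not be reported:
with open('logfiles.txt') as infile:
    # keep only well-formed log lines, split into fields on '|'
    lines = [l.strip().split('|') for l in infile if l.count('|') >= 5]

# slide a window of three consecutive entries over the log
for a, b, c in zip(lines, lines[1:], lines[2:]):
    same_second = a[0] == b[0] == c[0]       # identical timestamp fields
    same_profile = a[2] == b[2] == c[2]
    actions = (a[4], b[4], c[4])
    if same_second and same_profile and actions == (
            'user logged in', 'user changed password', 'user logged off'):
        print(a[2])
On the sample data this prints asdf, plnb, and zzad.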

Related

How to convert test log file to json in a prescribed way

I have the log file below. I am trying to take the first server's details (192.168.1.1) and check when it connected and disconnected, then go to the second server's details (192.168.1.2) and check when it connected and disconnected. In the same way, I need to determine the connection and disconnection times of all servers.
str_ = '''Jan 23 2016 11:30:08AM - ssh 22 192.168.1.1 connected
Jan 23 2016 12:04:56AM - ssh 22 192.168.1.2 connected
Jan 23 2016 2:18:32PM - ssh 22 192.168.1.2 disconnected
Jan 23 2016 5:16:09PM - un x Dos attack from 201.10.0.4
Jan 23 2016 10:43:44PM - ssh 22 192.168.1.1 disconnected
Feb 1 2016 1:40:28AM - ssh 22 192.168.1.1 connected
Feb 1 2016 2:21:52AM - un x Dos attack from 201.168.123.1
Mar 29 2016 2:13:07PM - ssh 22 192.168.1.1 disconnected'''
How can I convert my log file to JSON?
My expected output:
{1:{192.168.1.1:[(connected,Jan 23 2016 11:30:08AM),(disconnected,Jan 23 2016 10:43:44PM)]},
2:{192.168.1.2:[(connected,Jan 23 2016 12:04:56AM),(disconnected,Jan 23 2016 2:18:32PM)]},
3:{192.168.1.1:[(connected,Feb 1 2016 1:40:28AM),(disconnected,Mar 29 2016 2:13:07PM )]},
4:{Dos:[201.10.0.4,201.168.123.1]}}
My pseudocode:
import json
import re

i = 1
result = {}
with open('test.log') as f:
    lines = f.readlines()
    for line in lines:
        r = line.split(' ')
        #result[i] = {}
        i += 1
print(result)
with open('data.json', 'w') as fp:
    json.dump(result, fp)
Why do you need a dict keyed by entry numbers {1: xxx, 2: yyy, 3: zzz}? I'd advise using just a list instead - [xxx, yyy, zzz]. You can get an entry by index and so on. Technically, JSON can't use numbers as keys.
There is no logic to group connected and disconnected events in your pseudocode.
Some lines from the log don't have connect/disconnect info, so you need some logic for that too.
lines = f.readlines(); for line in lines: may eat lots of memory for large log files; just use for line in f:
So, I think you need something like:
import json

result = []
opened = {}

with open('test.log') as f:
    for line in f:
        date, rest = line.split(' - ', 1)
        rest, last = rest.strip().rsplit(' ', 1)
        ip = rest.rsplit(' ', 1)[1]
        if last == 'connected':
            entry = {ip: [(last, date)]}
            opened[ip] = entry
            result.append(entry)
        elif last == 'disconnected':
            opened[ip][ip].append((last, date))
            del opened[ip]

print(result)
with open('data.json', 'w') as fp:
    json.dump(result, fp)
It works for your sample, but needs more error checking for other logs
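For reference, traced by hand against the sample log, result should come out roughly as below (note that the Dos lines from the desired output are not collected yet, and the tuples become plain arrays once json.dump writes them):
[{'192.168.1.1': [('connected', 'Jan 23 2016 11:30:08AM'),
                  ('disconnected', 'Jan 23 2016 10:43:44PM')]},
 {'192.168.1.2': [('connected', 'Jan 23 2016 12:04:56AM'),
                  ('disconnected', 'Jan 23 2016 2:18:32PM')]},
 {'192.168.1.1': [('connected', 'Feb 1 2016 1:40:28AM'),
                  ('disconnected', 'Mar 29 2016 2:13:07PM')]}]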

Python Timeout doesn't seem to work

I have the following code:
def ip_addresses():
    # Get external ipv4
    try:
        response = urllib2.urlopen('http://icanhazip.com', timeout=2)
        out = response.read()
        public_ipv4 = re.sub('\n', '', out)
    except:
        public_ipv4 = "failed to retrieve public_ipv4"
In normal circumstances, when a response from http://icanhazip.com is received, the output is something like this:
xxx@xxx:/var/log$ date && tail -1 xxx.log
Tue Jul 25 **07:43**:18 UTC 2017 {"public_ipv4": "208.185.193.131"}, "date": "2017-07-25 **07:43**:01.558242"
So the current date and the date of the log generation are the same.
However, when there is an exception, this happens:
xxx@xxx:/var/log$ date && tail -1 xxx.log
Tue Jul 25 **07:30**:25 UTC 2017 {"public_ipv4": "failed to retrieve public_ipv4"},"date": "2017-07-25 **07:23**:01.525444"
Why is the "timeout" not working?
Try to get the verbose exception details in this manner, then investigate what the error is all about and where the difference in time comes from.
Use this format:
import sys

try:
    1 / 0
except:
    print sys.exc_info()
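Building on that, here is a sketch (mine, not from the question) of the same urllib2 call with the bare except narrowed so the real error becomes visible. Note that timeout=2 only bounds the socket operations; a stalled DNS lookup, for example, is not covered by it, which could explain a gap like the one in your log:
import time
import traceback
import urllib2

try:
    response = urllib2.urlopen('http://icanhazip.com', timeout=2)
    public_ipv4 = response.read().strip()
except Exception as e:
    # Show exactly which exception fired (socket.timeout, URLError, ...)
    # and when it fired, instead of silently swallowing it.
    print 'lookup failed at %s: %r' % (time.ctime(), e)
    traceback.print_exc()
    public_ipv4 = "failed to retrieve public_ipv4"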

Execute command in time increments of 1 second on same line in terminal

This script I have is very simple. Until the script is manually stopped, it prints the current time in the terminal every second. The only problem I've had thus far is that the carriage return '\r', which is meant to go back and reuse the same line, does not work as intended. Instead of the time being overwritten each time, I get output like this:
Good morning!
It's 03:10:13 PM on Wednesday, Jan 25, 2017
It's 03:10:14 PM on Wednesday, Jan 25, 2017
It's 03:10:15 PM on Wednesday, Jan 25, 2017
It's 03:10:16 PM on Wednesday, Jan 25, 2017
It's 03:10:17 PM on Wednesday, Jan 25, 2017
It's 03:10:18 PM on Wednesday, Jan 25, 2017
It's 03:10:19 PM on Wednesday, Jan 25, 2017
It's 03:10:20 PM on Wednesday, Jan 25, 2017
It's 03:10:21 PM on Wednesday, Jan 25, 2017
Am I not allowed to do this in the terminal? Is there a problem with the 1-second pause I'm putting in between?
Here is my code:
import time
import sys

print("Good morning!")
while True:
    time_str = "It's %I:%M:%S %p on %A, %b %d, %Y\r"
    print time.strftime(time_str)
    sys.stdout.flush()
    time.sleep(1)
Some extra information: I'm using a bash shell on an Ubuntu system
You have to use end="\r" in print() to replace the default end="\n":
import time
import sys

print("Good morning!")
while True:
    time_str = "It's %I:%M:%S %p on %A, %b %d, %Y"
    print(time.strftime(time_str), end="\r")
    sys.stdout.flush()
    time.sleep(1)
I use Linux Mint (based on Ubuntu) and \r works for me in the terminal (but maybe Ubuntu uses a different terminal).
If you get an error with end="\r", it means you are using Python 2, not Python 3 - and then you need a comma at the end of the print statement to skip the default \n:
import time
import sys

print "Good morning!"
while True:
    time_str = "It's %I:%M:%S %p on %A, %b %d, %Y\r"
    print time.strftime(time_str),  # <-- comma to skip "\n"
    sys.stdout.flush()
    time.sleep(1)
You must place the carriage return at the beginning of the text and replace the default end with an empty string:
import time
import sys

print("Good morning!")
while True:
    time_str = "\rIt's %I:%M:%S %p on %A, %b %d, %Y"
    print(time.strftime(time_str), end="")
    sys.stdout.flush()
    time.sleep(1)

Python analyse logfile with regex

I have to analyse an email-sending logfile (to get the SMTP reply for a message-id), which looks like this:
Nov 12 17:26:57 zeus postfix/smtpd[23992]: E859950021DB1: client=pegasus.os[172.20.19.62]
Nov 12 17:26:57 zeus postfix/cleanup[23995]: E859950021DB1: message-id=a92de331-9242-4d2a-8f0e-9418eb7c0123
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: from=<system@directoperation.de>, size=114324, nrcpt=1 (queue active)
Nov 12 17:26:58 zeus postfix/smtp[24007]: certificate verification failed for mx.elutopia.it[62.149.128.160]:25: untrusted issuer /C=US/O=RTFM, Inc./OU=Widgets Division/CN=Test CA20010517
Nov 12 17:26:58 zeus postfix/smtp[24007]: E859950021DB1: to=<mike@elutopia.it>, relay=mx.elutopia.it[62.149.128.160]:25, delay=0.89, delays=0.09/0/0.3/0.5, dsn=2.0.0, status=sent (250 2.0.0 d3Sx1m03q0ps1bK013Sxg4 mail accepted for delivery)
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: removed
Nov 12 17:27:00 zeus postfix/smtpd[23980]: connect from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: setting up TLS connection from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: Anonymous TLS connection established from pegasus.os[172.20.19.62]: TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)
Nov 12 17:27:00 zeus postfix/smtpd[23992]: disconnect from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: 2C04150101DB2: client=pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/cleanup[23994]: 2C04150101DB2: message-id=21e2f9d3-154a-3683-85d3-a7c52d429386
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: from=<system@directoperation.de>, size=53237, nrcpt=1 (queue active)
Nov 12 17:27:00 zeus postfix/smtp[24006]: ABE7C50001D62: to=<info@elvictoria.it>, relay=relay3.telnew.it[195.36.1.102]:25, delay=4.9, delays=0.1/0/4/0.76, dsn=2.0.0, status=sent (250 2.0.0 r9EFQt0J009467 Message accepted for delivery)
Nov 12 17:27:00 zeus postfix/qmgr[22359]: ABE7C50001D62: removed
Nov 12 17:27:00 zeus postfix/smtp[23998]: 2C04150101DB2: to=<peter@elgravo.ch>, relay=liberomx2.elgravo.ch[212.52.84.93]:25, delay=0.72, delays=0.07/0/0.3/0.35, dsn=2.0.0, status=sent (250 ok: Message 2040264602 accepted)
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: removed
At the moment, I get a message-id (uuid) from a database (for example a92de331-9242-4d2a-8f0e-9418eb7c0123) and then run my code through the logfile:
log_id = re.search(r']: (.+?): message-id=' + message_id, text).group(1)
sent_status = re.search(r']: ' + log_id + r'.*dsn=(.....)', text).group(1)
With the message-id I find the log_id, and with the log_id I can find the SMTP reply answer.
This works fine, but a better way would be for the software to go through the log file, get each message-id and reply code, and then update the DB. But I'm not sure how I should do this. The script has to run every ~2 minutes and check a log file that keeps updating. So how can I ensure that it remembers where it was and doesn't pick up a message-id twice?
Thanks in advance
Use a dictionary to store message IDs, and use a separate file to store the byte offset where you last left off in the log file:
msgIDs = {}

# Get where you left off in the logfile during the last read:
try:
    with open('logfile_placemarker.txt', 'r') as f:
        lastRead = int(f.read())
except IOError:
    print("Can't find/read place marker file! Starting at 0")
    lastRead = 0

with open('logfile.log', 'r') as f:
    f.seek(lastRead)
    for line in f:
        # ...
        # Pick out msgIDs and response codes
        # ...
        if msgID in msgIDs:
            print("uh oh, found the same msg id twice!!")
        msgIDs[msgID] = responseCode
    lastRead = f.tell()

# Do whatever you need to do with the msgIDs you found:
updateDB(msgIDs)

# Store lastRead (where you left off in the logfile) in a file so it persists for the next run:
with open('logfile_placemarker.txt', 'w') as f:
    f.write(str(lastRead))
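For the "Pick out msgIDs and response codes" placeholder, a possible sketch (mine; the two patterns are adapted from the regexes in the question and assume postfix queue IDs are uppercase hex):
import re

# "]: E859950021DB1: message-id=a92de331-..."       -> queue id, message-id
msg_re = re.compile(r'\]: ([0-9A-F]+): message-id=(\S+)')
# "]: E859950021DB1: to=<...>, ... dsn=2.0.0, ..."  -> queue id, dsn code
dsn_re = re.compile(r'\]: ([0-9A-F]+): .*dsn=([0-9.]+)')

queue_to_msg = {}  # postfix queue id -> message-id

def handle_line(line, msgIDs):
    m = msg_re.search(line)
    if m:
        queue_to_msg[m.group(1)] = m.group(2)
        return
    m = dsn_re.search(line)
    if m and m.group(1) in queue_to_msg:
        msgIDs[queue_to_msg[m.group(1)]] = m.group(2)
Calling handle_line(line, msgIDs) for each line read in the loop above fills msgIDs with message-id -> dsn pairs, ready for updateDB(msgIDs).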

Pylons & Beaker: JSON Encoded Sessions

I need to read Pylons session data (just read, not write to it) in node.js.
Once I decode the base64, I'm left with a string containing a serialized Python object, which is a pain to parse in node.js.
How can I get Beaker to serialize to JSON instead? That would be far easier for node.js to handle.
I had to look inside Beaker to find that what you call a "serialized Python object" is a Python pickle.
I don't think it would take more than a few lines to change it to use JSON to store the dict.
Here is a patch against https://bitbucket.org/bbangert/beaker/src/257f147861c8:
diff -r 257f147861c8 beaker/session.py
--- a/beaker/session.py Mon Apr 18 11:38:53 2011 -0400
+++ b/beaker/session.py Sat Apr 30 14:19:12 2011 -0400
@@ -489,10 +489,10 @@
nonce = b64encode(os.urandom(40))[:8]
encrypt_key = crypto.generateCryptoKeys(self.encrypt_key,
self.validate_key + nonce, 1)
- data = util.pickle.dumps(self.copy(), 2)
+ data = util.json.dumps(self.copy())
return nonce + b64encode(crypto.aesEncrypt(data, encrypt_key))
else:
- data = util.pickle.dumps(self.copy(), 2)
+ data = util.json.dumps(self.copy())
return b64encode(data)
def _decrypt_data(self):
@@ -504,10 +504,10 @@
self.validate_key + nonce, 1)
payload = b64decode(self.cookie[self.key].value[8:])
data = crypto.aesDecrypt(payload, encrypt_key)
- return util.pickle.loads(data)
+ return util.json.loads(data)
else:
data = b64decode(self.cookie[self.key].value)
- return util.pickle.loads(data)
+ return util.json.loads(data)
def save(self, accessed_only=False):
"""Saves the data for this session to persistent storage"""
diff -r 257f147861c8 beaker/util.py
--- a/beaker/util.py Mon Apr 18 11:38:53 2011 -0400
+++ b/beaker/util.py Sat Apr 30 14:19:12 2011 -0400
@@ -24,6 +24,11 @@
import pickle
else:
import cPickle as pickle
+
+try:
+ import json
+except ImportError:
+ import simplejson as json
from beaker.converters import asbool
from beaker import exceptions
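With this patch applied (and no encrypt_key configured), the cookie payload is plain base64-encoded JSON, which any language can read. A quick sanity check in Python (a sketch; cookie_value stands in for the raw Beaker cookie payload):
import json
from base64 import b64decode

def read_session(cookie_value):
    # The unencrypted branch above stores b64encode(json.dumps(session)),
    # so decoding is just the two inverse steps:
    return json.loads(b64decode(cookie_value))
The same two steps translate directly to node.js with Buffer.from(value, 'base64') and JSON.parse.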
