How to read and match across multiple lines with Python

I'm trying to extract pertinent information from a large textfile (1000+ lines), most of which isn't important:
ID: 67108866 Virtual-system: root, VPN Name: VPN-NAME-XYZ
Local Gateway: 1.1.1.1, Remote Gateway: 2.2.2.2
Traffic Selector Name: TS-1
Local Identity: ipv4(10.10.10.0-10.10.10.255)
Remote Identity: ipv4(10.20.10.0-10.20.10.255)
Version: IKEv2
DF-bit: clear, Copy-Outer-DSCP Disabled, Bind-interface: st0.287
Port: 500, Nego#: 0, Fail#: 0, Def-Del#: 0 Flag: 0x2c608b29
Multi-sa, Configured SAs# 1, Negotiated SAs#: 1
Tunnel events:
From this I need to extract only certain bits, and example output would be something like:
VPN Name: VPN-NAME-XYZ, Local Gateway: 1.1.1.1, Remote Gateway: 2.2.2.2
I've tried a couple of different ways to get this, but my code keeps stopping at the first match. I need the code to match one line, then move on to the following line and match that:
with open('/path/to/vpn.txt', 'r') as file:
    for vpn in file:
        vpn = vpn.strip().lower()
        name = "xyz"
        if name in vpn:
            print(vpn)
            if "1.1.1.1" in vpn:
                print(vpn)
I'm able to print both if I move the second if to the same indentation level as the first:
with open('/path/to/vpn.txt', 'r') as file:
    for vpn in file:
        vpn = vpn.strip().lower()
        name = "xyz"
        if name in vpn:
            print(vpn)
        if "1.1.1.1" in vpn:
            print(vpn)
Is it possible to match clauses on both lines?
I've tried a few different ways with my indents and matches but can't get it. Another problem with print(vpn) is that it prints the entire line.

Use a regex to match the regions you need and collect all matches from the entire text; you don't need to go line by line. An example is below.
import re
found_text = []
with open('/path/to/vpn.txt', 'r') as file:
    file_text = file.read()
# Each match runs from one of the labels to the end of its line
for found in re.finditer(r"((VPN Name|Local Gateway|Remote Gateway):.*)", file_text):
    # split by comma, if you want it split further
    found_text.extend(found.group(0).split(","))
print(found_text)
This will yield an output like
['VPN Name: VPN-NAME-XYZ', 'Local Gateway: 1.1.1.1', ' Remote Gateway: 2.2.2.2']
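If the goal is the single summary line shown in the question, one possible approach is to let one pattern span the line break. This is only a minimal sketch, assuming the "Local Gateway" line always directly follows the "VPN Name" line; the names RECORD, summarize and sample are made up for illustration.

```python
import re

# Sample record copied from the question; a real file would be read with open().
sample = """\
ID: 67108866 Virtual-system: root, VPN Name: VPN-NAME-XYZ
Local Gateway: 1.1.1.1, Remote Gateway: 2.2.2.2
Traffic Selector Name: TS-1
"""

# \s+ after the VPN name matches the newline, so the pattern covers two lines.
RECORD = re.compile(
    r"VPN Name: (?P<name>\S+)\s+"
    r"Local Gateway: (?P<local>[\d.]+), Remote Gateway: (?P<remote>[\d.]+)"
)

def summarize(text):
    """Return one summary line per VPN record found in text."""
    template = "VPN Name: {name}, Local Gateway: {local}, Remote Gateway: {remote}"
    return [template.format(**m.groupdict()) for m in RECORD.finditer(text)]

print(summarize(sample))
```

Named groups keep the output format separate from the matching, so the summary line can be rearranged without touching the pattern.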

Related

Run python file as admin

I'm trying to create a program that asks the user for a link to any website and blocks it (in hosts):
import requests
print('Time to block some websites')
ask = input('> Give me the link... ') ; link = {} # Ask user
try:
    r = requests.get(ask) # Try the url
    print('Protocol : ' , r.url [:r.url.find(":")]) # Check protocol of the url
    url = r.url
    print('your url : ' , url)
except:
    raise ValueError('Give the url with https/http', ask)
url = url.split('/') ; urlist = list(url) # Split url
link.update({'linko': urlist[2]}) ; link.update({'host': '127.0.0.1 '}) # Add host and url to link
x = link['linko'] ; y = link['host'] # Transform value of dict in string
z = str(y+x) # Assign host and link
f = open('hosts', 'a') # Open document in append mode
f.write(z) ; f.write('\n') # Write z and newline for future use
f.close() # Always close after use
f = open('hosts' , 'r') # Open document in read mode
print(f.read()) # Read document
f.close() # Always close after use
Traceback :
File "C:\Windows\System32\drivers\etc\block.py", line 17, in <module>
f = open('hosts', 'a') # Open document in append mode
PermissionError: [Errno 13] Permission denied: 'hosts'
When I tried to execute the program with runas administrator :
RUNAS ERROR: Unable to run - C:\Windows\System32\drivers\etc\block.py
193: C:\Windows\System32\drivers\etc\block.py is not a valid Win32 application.
How do I get the program to have permissions to add sites to hosts?
Thanks for your answers. I found a manual solution:
Run cmd.exe as admin, then move to C:\Windows\System32\drivers\etc and execute block.py.
Command prompt :
C:\Windows\system32>cd drivers
C:\Windows\System32\drivers>cd etc
C:\Windows\System32\drivers\etc>python block.py
Time to block some websites
> Give me the link... https://stackoverflow.com/questions/64506935/run-python-file-as-admin
Protocol : https
https://stackoverflow.com/questions/64506935/run-python-file-as-admin
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
127.0.0.1 stackoverflow.com
C:\Windows\System32\drivers\etc>
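The manual solution works because the elevated prompt gives the Python process write access to hosts. A possible refinement, sketched below under assumptions, is to use an absolute path (so the script does not depend on the current directory) and to report a missing elevation instead of crashing. The helper names host_entry and block are hypothetical, and this sketch swaps requests for the standard-library urllib.parse, so it only parses the URL and does not check that the site is reachable.

```python
from urllib.parse import urlparse

# Default Windows location of the hosts file, as shown in the traceback above.
HOSTS_PATH = r"C:\Windows\System32\drivers\etc\hosts"

def host_entry(url):
    """Build a hosts line that points the URL's hostname at 127.0.0.1."""
    hostname = urlparse(url).hostname
    if hostname is None:
        # urlparse finds no hostname when the scheme (http/https) is missing
        raise ValueError("Give the url with https/http: {!r}".format(url))
    return "127.0.0.1 " + hostname

def block(url, hosts_path=HOSTS_PATH):
    """Append the entry; return False when the process lacks write permission."""
    try:
        with open(hosts_path, "a") as f:
            f.write(host_entry(url) + "\n")
    except PermissionError:
        print("Permission denied: start the script from an elevated (admin) prompt.")
        return False
    return True
```

Calling block('https://stackoverflow.com/questions/64506935/run-python-file-as-admin') from an elevated prompt would append the same line as the session above.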

regex to grep string from config file in python

I have a config file which contains network configuration, something like the one given below.
LISTEN=192.168.180.1 #the network which listen the traffic
NETMASK=255.255.0.0
DOMAIN =test.com
I need to grep the values from the config. The following is my current code:
import re
with open('config.txt') as f:
    data = f.read()
listen = re.findall('LISTEN=(.*)', data)
print(listen)
The variable listen contains
192.168.180.1 #the network which listen the traffic
but I don't need the comment, and sometimes there is no comment at all, as with NETMASK.
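If you really want to do this using regular expressions, I would suggest changing it to LISTEN=([^#\n]+)
which matches anything up to the pound sign opening the comment or up to the end of the line. (Inside a character class, $ is a literal dollar sign rather than an end-of-line anchor, so the newline has to be excluded explicitly.) Call .strip() on the result to drop the space before the comment.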
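A character class that excludes both the comment marker and the newline can be tried out as follows. This is a minimal sketch rather than a full config parser; the helper name get_value is made up, and it assumes values never contain a literal # themselves.

```python
import re

data = """LISTEN=192.168.180.1 #the network which listen the traffic
NETMASK=255.255.0.0
DOMAIN =test.com
"""

def get_value(key, text):
    """Return the value for key with any trailing #-comment removed, or None."""
    # [ \t]* tolerates spaces around '=', and [^#\n]* stops at the first '#'
    # or at the end of the line, whichever comes first.
    pattern = r"^{}[ \t]*=[ \t]*([^#\n]*)".format(re.escape(key))
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1).strip() if m else None

print(get_value("LISTEN", data))
```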
I came up with a solution that uses one common regex and then strips the "#" part.
import re
data = '''
LISTEN=192.168.180.1 #the network which listen the traffic
NETMASK=255.255.0.0
DOMAIN =test.com
'''
# Common regex to get all values
match = re.findall(r'.*=(.*)#*', data)
print("Total match found")
print(match)
# Remove # part if any
for index, val in enumerate(match):
    if "#" in val:
        val = (val.split("#")[0]).strip()
        match[index] = val
print("Match after removing #")
print(match)
Output :
Total match found
['192.168.180.1 #the network which listen the traffic', '255.255.0.0', 'test.com']
Match after removing #
['192.168.180.1', '255.255.0.0', 'test.com']
data = """LISTEN=192.168.180.1 #the network which listen the traffic"""
import re
print(re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', data).group())
>>>192.168.180.1
print(re.search(r'[0-9]+(?:\.[0-9]+){3}', data).group())
>>>192.168.180.1
In my experience regex is slow at runtime and not very readable. I would do:
with open('config.txt') as f:
    for line in f:
        if not line.startswith("LISTEN="):
            continue
        rest = line.split("=", 1)[1]
        # strip() drops the space before the comment and the trailing newline
        nocomment = rest.split("#", 1)[0].strip()
        print(nocomment)
I think the better approach is to read the whole file as the format it is given in. I wrote a couple of tutorials, e.g. for YAML, CSV, JSON.
It looks as if this is an INI file.
Example Code
Example INI file
INI files need a header. I assume it is network:
[network]
LISTEN=192.168.180.1 #the network which listen the traffic
NETMASK=255.255.0.0
DOMAIN =test.com
Python 2
#!/usr/bin/env python
import ConfigParser
import io
# Load the configuration file
with open("config.ini") as f:
    sample_config = f.read()
config = ConfigParser.RawConfigParser(allow_no_value=True)
config.readfp(io.BytesIO(sample_config))
# List all contents
print("List all contents")
for section in config.sections():
    print("Section: %s" % section)
    for options in config.options(section):
        print("x %s:::%s:::%s" % (options,
                                  config.get(section, options),
                                  str(type(options))))
# Print some contents
print("\nPrint some contents")
print(config.get('network', 'LISTEN')) # Just get the value
Python 3
Look at configparser:
#!/usr/bin/env python
import configparser
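# Load the configuration file
config = configparser.RawConfigParser(allow_no_value=True)
with open("config.ini") as f:
    config.read_file(f) # readfp() was deprecated and is gone in Python 3.12
# Print some contents
print(config.get('network', 'LISTEN'))
gives:
192.168.180.1 #the network which listen the traffic
Hence you need to parse that value as well: by default configparser does not treat a # in the middle of a line as a comment. Passing inline_comment_prefixes=('#',) to the parser makes it strip such comments.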
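In Python 3 the comment can also be removed by the parser itself through the inline_comment_prefixes argument of configparser. A short sketch, assuming the [network] header from above and reading the sample from a string instead of config.ini so that it is self-contained:

```python
import configparser

sample = """\
[network]
LISTEN=192.168.180.1 #the network which listen the traffic
NETMASK=255.255.0.0
DOMAIN =test.com
"""

# inline_comment_prefixes defaults to None, which is why the comment
# stays in the value unless it is set explicitly.
config = configparser.RawConfigParser(inline_comment_prefixes=("#",))
config.read_string(sample)

listen = config.get("network", "LISTEN")
print(listen)
```

An inline comment is only recognised when the prefix is preceded by whitespace, so a value that contains a # without a space before it is left untouched.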

How to remove multiple lines in a text file using regex in python?

I want to remove multiple lines in a file using regex.
I have a file with something like this :
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
host another_host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
Basically, when I choose a host name, host_name for example, it should detect the line that contains it and remove that line and all the lines after it, up to and including the first } :
#before
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
host another_host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
#after
host another_host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
I guess we would use something like m = re.search(r"^host.*}", line), but that works line by line, not across multiple lines.
import os
import re
def remove(filename, hostname):
    with open(os.path.abspath("app/static/DATA/{}".format(filename)), "r") as f:
        for line in f:
            m = re.search(r"^hostname.*}", line, re.MULTILINE)
            if m:
                pass # we delete the block, I don't know how to do it though
Should I start like this?
I have 3 ideas for you.
1. Try MULTILINE mode, which I think will do what you are asking. You can read more about it here: https://docs.python.org/3/library/re.html#re.MULTILINE. Note that MULTILINE only makes ^ and $ match at every line boundary; to let . match newlines as well you also need re.DOTALL.
2. When that just doesn't do the trick, I cheat. I run a pre-regex to swap every \n for something strange like "this_is_my_special_holder". Now everything is on one line. I do the work I want, like you have written. Then I run a post-regex that swaps every "this_is_my_special_holder" back to \n. If you ever get stuck in a language that doesn't support multiline, this should always get it done :)
3. You may just be able to run the regex on the whole text; my example here does just that.
Here is how I would do this whole thing:
import re
def main(regex_part):
    the_regex = re.compile("".join(["host ", regex_part, " {[^}]+}"]))
    with open('test.txt', 'r') as myfile:
        data = myfile.read()
    data = re.sub(the_regex, "", data)
    print(data)
    with open('test2.txt', 'w') as newfile:
        newfile.write(data)
main("host_name")
I open the file with 'with', so you don't have to close the file handle later. 'r' is to read the file and 'w' is to write the file. The regex simply replaces:
host host_name {, everything up to the next }, and that closing } itself
with nothing.
http://regex101.com is a handy site to actually play with regexes. Good luck!
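A slightly more defensive variant of the same idea is sketched here. It is not the answerer's code: the function name remove_host is made up, it works on a string rather than on test.txt, and it assumes host blocks never contain nested braces.

```python
import re

sample = """\
host host_name {
    # comment (optional)
    hardware ethernet 01:22:85:EA:A8:5D;
    fixed-address 192.168.107.210;
}
host another_host_name {
    # comment (optional)
    hardware ethernet 01:22:85:EA:A8:5D;
    fixed-address 192.168.107.210;
}
"""

def remove_host(text, hostname):
    """Return text without the 'host <hostname> { ... }' block."""
    # re.escape guards against regex metacharacters in the host name.
    # [^}]* crosses line breaks because a negated class also matches '\n',
    # and the optional \n removes the line break left behind by the block.
    pattern = re.compile(
        r"^host\s+" + re.escape(hostname) + r"\s*\{[^}]*\}\n?", re.MULTILINE
    )
    return pattern.sub("", text)

print(remove_host(sample, "host_name"))
```

Anchoring with ^ under re.MULTILINE keeps the pattern from matching in the middle of a line, so another_host_name is left alone when host_name is removed.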

Python: Passing a list of IP addresses as a list of strings

My code is designed to geo-locate IP addresses from a text file. I'm having trouble on the last section. When I run the code, I get a complaint from the map_ip.update line: socket.error: illegal IP address string passed to inet_pton
When I troubleshoot with a print statement, I get the following format:
['$ ip address']
['$ ip address']
['$ ip address']
How do I get country_name_by_addr() to read each IP address in the proper format? It appears my IP addresses are being formatted as a list of strings in individual lists.
# script that geo-locates IP addresses from a consolidated dictionary
import pygeoip
import itertools
import re
# initialize dictionary for IP addresses
count = {}
"""
This loop reads text file line-by-line and
returns one-to-one key:value pairs of IP addresses.
"""
with open('$short_logins.txt path') as f:
    for cnt, line in enumerate(f):
        ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', line)
        count.update({cnt: ip})
        cnt += 1
"""
This line consolidates unique IP addresses. Keys represent how
many times each unique IP address occurs in the text file.
"""
con_count = [(k, len(list(v))) for k, v in itertools.groupby(sorted(count.values()))]
"""
Country lookup:
This section passes each unique IP address from con_count
through country name database. These IP address are not required
to come from con_count.
"""
map_ip = {}
gi = pygeoip.GeoIP('$GeoIP.dat path')
for i in count.itervalues():
    map_ip.update({i: gi.country_name_by_addr(i)})
print map_ip
So I solved this dilemma yesterday by doing away with the regular expression:
ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', line)
I found a much simpler solution by splitting each line on whitespace and checking whether the IP address had already been counted. The IP addresses are all in the third column, hence the [2]:
ip = line.split()[2]
if ip in count:
    count[ip] += 1
else:
    count.update({ip: 1})
I removed the con_count line as well. The root cause was that re.findall returns a list of matches, whereas the pygeoip lookup functions expect a single address string.
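If the regular expression is kept, the same counting can be done with collections.Counter, which takes the individual strings out of the list that re.findall returns. This is a sketch under assumptions: the sample lines are invented stand-ins for the login file, and count_ips is a hypothetical helper name.

```python
import re
from collections import Counter

IP_RE = re.compile(r"[0-9]+(?:\.[0-9]+){3}")

def count_ips(lines):
    """Count how often each IP address string occurs across the lines."""
    counts = Counter()
    for line in lines:
        # findall returns a list; update() adds each string in it separately,
        # so the keys are plain strings rather than one-element lists.
        counts.update(IP_RE.findall(line))
    return counts

# Hypothetical login records with the address in the third column.
sample_lines = [
    "user1 pts/0 10.0.0.1 Mon Jan 1",
    "user2 pts/1 10.0.0.2 Mon Jan 1",
    "user1 pts/2 10.0.0.1 Tue Jan 2",
]
counts = count_ips(sample_lines)
print(counts)
```

Each key of counts is then a single string, which is the form that gi.country_name_by_addr() in the question expects.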

How to extract a word from text in Python

I have this string "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate." in a log file. What I need to do is look for this message and extract the IP address (1.2.3.4) from the log file.
import os
import shutil
import optparse
import sys
def main():
    file = open("messages", "r")
    log_data = file.read()
    file.close()
    search_str = "is currently trusted in the white list, but it is now using a new trusted certificate."
    index = log_data.find(search_str)
    print(index)
    return
if __name__ == '__main__':
    main()
How do I extract the IP address? Your response is appreciated.
Really simple answer:
msg = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
parts = msg.split(' ', 2)
print parts[1]
results in:
1.2.3.4
You could also do REs if you wanted, but for something this simple...
There will be dozens of possible approaches; the pros and cons depend on the details of your log file. One example, using the re module:
import re
x = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
pattern = r"IP ([0-9.]+) is currently trusted in the white list"
m = re.match(pattern, x)
for ip in m.groups():
    print(ip)
If you want to print out every instance of that string in your log file, you'd do something like this (re.findall scans the whole text, whereas re.match only looks at its start):
import re
pattern = r"(IP [0-9.]+ is currently trusted in the white list, but it is now using a new trusted certificate\.)"
for match in re.findall(pattern, log_data):
    print(match)
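A character class like [0-9.]+ also accepts strings that are not addresses, such as 1.2.3 or 999.1.1.1. If that matters for your log, the candidates can be checked with the standard-library ipaddress module. A minimal sketch; trusted_ips is a hypothetical helper and the second log line is invented to show a rejected value:

```python
import ipaddress
import re

CANDIDATE = re.compile(r"IP ([0-9.]+) is currently trusted in the white list")

def trusted_ips(log_data):
    """Return the syntactically valid IPv4 addresses named in matching messages."""
    found = []
    for candidate in CANDIDATE.findall(log_data):
        try:
            found.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            # Matched by the character class but not a real IPv4 address
            pass
    return found

sample_log = (
    "IP 1.2.3.4 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
    "IP 999.1.1.1 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
)
print(trusted_ips(sample_log))
```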
Use regular expressions.
Code like this:
import re
compiled = re.compile(r"""
.*? # Leading junk
(?P<ipaddress>\d+\.\d+\.\d+\.\d+) # IP address
.*? # Trailing junk
""", re.VERBOSE)
str = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
m = compiled.match(str)
print m.group("ipaddress")
And you get this:
>>> import re
>>>
>>> compiled = re.compile(r"""
... .*? # Leading junk
... (?P<ipaddress>\d+\.\d+\.\d+\.\d+) # IP address
... .*? # Trailing junk
... """, re.VERBOSE)
>>> str = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
>>> m = compiled.match(str)
>>> print m.group("ipaddress")
1.2.3.4
Also, I learned that there is a dictionary of matches, groupdict():
>>> str = "Peer 10.11.6.224 is currently trusted in the white list, but it is now using a new trusted certificate. Consider removing its likely outdated white list entry."
>>> m = compiled.match(str)
>>> print m.groupdict()
{'ipaddress': '10.11.6.224'}
Later: fixed that. The initial '.*' was eating your first character match. Changed it to be non-greedy. For consistency (but not necessity), I changed the trailing match, too.
Regular expressions are the way to go. But if you feel uncomfortable writing them, you can try a small parser that I wrote (https://github.com/hgrecco/stringparser). It translates a string format into a regular expression. In your case, you would do the following:
from stringparser import Parser
parser = Parser("IP {} is currently trusted in the white list, but it is now using a new trusted certificate.")
ip = parser(text)
If you have a file with multiple lines, you can replace the last line with:
with open("log.txt", "r") as fp:
    ips = [parser(line) for line in fp]
Good luck.
