Split string to dictionary based on multiple delimiters - python

Problem:
I get a cluster definition as a string. The definition contains the IP, Port and Trunk details of two interconnected applications for one or more data centers. The structure of the string is the following:
Application1 IPs and port details: 'N' number of IPs (minimum 2) and exactly 1 port. The IPs are separated by commas, and the port is separated from them by a colon.
Application1 config is closed with a semicolon, which is followed by Application2 IPs and trunks.
In Application2, an IP and a trunk form a single unit. These units are separated by commas, while the IP and trunk within a unit are separated by '|'.
This sample input contains the cluster definitions for 2 data centers, one per line:
cluster_params = "1.1.1.1,2.2.2.2:5002;10.10.0.1|17,10.10.0.2|18,10.10.0.3|19,10.10.0.4|20\n3.3.3.3,4.4.4.4:5003;10.10.1.1|21,10.10.1.2|22,10.10.1.3|23,10.10.1.4|24"
What I need:
I would like to split the cluster_params into a nested dictionary, something like the below sample:
clusters = { 'Cluster1_App1_port_salt': ['App1_IP1', 'App1_IP2', 'App1_IPn'],
'App2_IPs_and_ports': ['App2_Ip1', 'App2_Port1','App2_Ip2', 'App2_Port2', 'App2_Ipn', 'App2_Portn'],
'Cluster2_App1_port_salt': ['App1_IP1', 'App1_IP2', 'App1_IPn'],
'App2_IPs_and_ports': ['App2_Ip1', 'App2_Port1','App2_Ip2', 'App2_Port2', 'App2_Ipn', 'App2_Portn'],}
I can split and get the required variables from the string with the below code, but can't figure out how to put it into a nested dictionary. (App1 is ICM and App2 is MPP in the below code)
import string
import random
cluster_params = "1.1.1.1,2.2.2.2:5002;10.10.0.1|17,10.10.0.2|18,10.10.0.3|19,10.10.0.4|20\n3.3.3.3,4.4.4.4:5003;10.10.1.1|21,10.10.1.2|22,10.10.1.3|23,10.10.1.4|24"
clusters = cluster_params.split('\n')
for i in clusters:
    separate_cluster = i.split(';')
    icm_config = separate_cluster[0]
    mpp_config = separate_cluster[1]
    icm_separator = icm_config.split(':')
    icm_ips = icm_separator[0]
    icm_port = icm_separator[1]
    salt = random.choice(string.ascii_lowercase)+random.choice(string.ascii_lowercase)+random.choice(string.ascii_lowercase)+random.choice(string.ascii_lowercase)
    PIM_Name = 'PIM_Connector_' + icm_port + '_' + salt
    mpp_separator = mpp_config.split(',')
    mpp_ip = []
    mpp_trunk = []
    for unit in mpp_separator:
        mpp_ip.append(unit.split('|')[0])
        mpp_trunk.append(unit.split('|')[1])
    print('ICM_Config: ' + icm_config)
    print('Pim_Name: ' + PIM_Name)
    print('ICM_IPs: ' + icm_ips)
    print('ICM_Port: ' + icm_port)
    print('MPP_Config: ' + mpp_config)
    print(mpp_ip)
    print(mpp_trunk)
Thanks!

Check this out:
import random
import string
# cluster_params is the string from the question
def salt():
    return "".join(random.choice(string.ascii_lowercase) for _ in range(4))
clusters_dict = {}
for i, cluster in enumerate(cluster_params.split("\n")):
    salt_ = salt()
    icm, mpp = cluster.split(";")
    icm_ips, icm_port = icm.split(":")
    icm_ips = icm_ips.split(",")
    mpp_ips, mpp_trunks = [], []
    for entry in mpp.split(","):
        mpp_ip, mpp_trunk = entry.split("|")
        mpp_ips.append(mpp_ip)
        mpp_trunks.append(mpp_trunk)
    args_dict = {}
    args_dict["icm_" + icm_port + "_" + salt_] = icm_ips
    # Pair each MPP IP with its trunk
    args_dict["mpp_ips_and_ports"] = [ip + "_" + trunk for ip, trunk in zip(mpp_ips, mpp_trunks)]
    clusters_dict["cluster" + str(i + 1)] = args_dict
print(clusters_dict)
Example output (the four-letter salts are random, so they differ on every run):
{'cluster1': {'icm_5002_phua': ['1.1.1.1', '2.2.2.2'], 'mpp_ips_and_ports': ['10.10.0.1_17', '10.10.0.2_18', '10.10.0.3_19', '10.10.0.4_20']}, 'cluster2': {'icm_5003_ppkg': ['3.3.3.3', '4.4.4.4'], 'mpp_ips_and_ports': ['10.10.1.1_21', '10.10.1.2_22', '10.10.1.3_23', '10.10.1.4_24']}}
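A Python dict cannot hold the same key twice, so in the flat layout from the question the second 'App2_IPs_and_ports' entry would silently replace the first; nesting one dict per cluster, as the answer does, avoids that. Below is a minimal sketch along the same lines, assuming the question's input format and interleaving IPs and trunks into one flat list the way the question's sample shows. The key names (Cluster1, App1_<port>_<salt>, App2_IPs_and_trunks) and the make_salt parameter are illustrative choices of mine, not anything from the question; passing a fixed salt function makes the output predictable.

```python
import random
import string


def random_salt(length=4):
    """Return a random lowercase salt, like the question's four random.choice calls."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def parse_clusters(cluster_params, make_salt=random_salt):
    """Parse one cluster definition per line into {'Cluster1': {...}, 'Cluster2': {...}}."""
    clusters = {}
    for n, line in enumerate(cluster_params.strip().splitlines(), start=1):
        app1, app2 = line.split(";")        # App1 part ; App2 part
        app1_ips, port = app1.split(":")    # 'ip1,ip2,...' : port
        app2_items = []
        for unit in app2.split(","):        # each unit looks like 'ip|trunk'
            ip, trunk = unit.split("|")
            app2_items.extend([ip, trunk])  # ip1, trunk1, ip2, trunk2, ...
        clusters[f"Cluster{n}"] = {
            f"App1_{port}_{make_salt()}": app1_ips.split(","),
            "App2_IPs_and_trunks": app2_items,
        }
    return clusters
```

If you would rather keep each IP next to its trunk, appending (ip, trunk) tuples instead of extending the list gives a list of pairs.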

Related

Read unstructured data in pandas

I have the following unstructured data in a text file, which is message log data from Discord.
[06-Nov-19 03:36 PM] Dyno#0000
{Embed}
Server
**Message deleted in #reddit-feed**
Author: ? | Message ID: 171111183099756545
[12-Nov-19 01:35 PM] Dyno#0000
{Embed}
Member Left
#Unknown User
ID: 171111183099756545
[16-Nov-19 11:25 PM] Dyno#0000
{Embed}
Member Joined
#User
ID: 171111183099756545
Essentially, my goal is to parse the data, extract all the join and leave messages, and then plot the growth of members in the server. Some of the messages are irrelevant, and each message block has a varying number of rows. The result I'm after looks like this:
Date        Member-change
4/24/2020    2
4/25/2020   -1
4/26/2020    3
I've tried parsing the data in a loop, but because the data is unstructured and the blocks vary in length, I'm confused about how to set it up. Is there a way to ignore all blocks that contain neither "Member Joined" nor "Member Left"?
It is structured text, just not in the way you are expecting.
A file can be structured if its text is written in a consistent format, even though we normally think of structured text as field-based.
The records are separated by a date-based header, followed by the {Embed} keyword, followed by the event you are interested in (such as Member Joined).
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import re
# message_log holds the whole log file as one string, e.g.
# message_log = open("discord_log.txt", encoding="utf-8").read()  (hypothetical filename)
# Get rid of the newlines for convenience
message = message_log.replace("\n", " ")
# Use a regular expression to split the log file into records
rx = r"(\[\d{2}-\w{3}-\d{2})"
replaced = re.split(rx, message)
# re.split will leave a blank entry as the first entry
replaced.pop(0)
# Because the pattern has a capture group, each date ends up in its own list entry,
# separate from the record it refers to, so merge them back together pairwise
merge_list = [date + body for date, body in zip(replaced[0::2], replaced[1::2])]
# Now a nice clean record list exists, so it is possible to get the user count
n = 0
for z in merge_list:
    # Split the record into date and content
    log_date = re.split(r"(\d{2}-\w{3}-\d{2})", z)
    # Work out whether the count should be incremented or decremented
    if "{Embed} Member Joined" in z:
        n = n + 1
    elif "{Embed} Member Left" in z:
        n = n - 1
    else:
        continue
    # log_date[1] is needed to get the date from the record
    print(log_date[1] + " " + str(n))

Python - How to handle space as a value of a variable without quotes?

I have a string "bitrate:8000"
I need to convert it to "-bps 8000". Note that the parameter name is changed and so is the delimiter from ':' to space.
Also, the delimiters are not always fixed; sometimes I need to change ':' to '-' with the same program.
The change rules are supplied as a config file which I am reading through the ConfigParser module. Something like:
[params]
modify_param_name = bitrate/bps
modify_delimiter = :/' '
value = 8000
In my program:
orig_param = modify_param_name.split('/')[0]
new_param = modify_param_name.split('/')[1]
orig_delimiter = modify_delimiter.split('/')[0]
new_delimiter = modify_delimiter.split('/')[1]
new_param_string = new_param + new_delimiter + value
However, this results in the string as below:
-bps' '8000
The question is how can I handle spaces without the ' ' quotes?
The reason why you're getting the ' ' string is probably related to the way you parse your modify_delimiter value.
You're reading that as a string, so that modify_delimiter == ":/' '".
When you're doing:
new_delimiter = modify_delimiter.split('/')[1]
Essentially, modify_delimiter.split('/') gives you a list: [':', "' '"].
So when you do new_param_string = new_param + new_delimiter + value, you are concatenating together 'bps' + "' '" + '8000'.
If your modify_delimiter contained the string ':/ ', this would work just fine:
>>> new_param_string = new_param + new_delimiter + value
>>> new_param_string
'bps 8000'
It has been pointed out that you're using ConfigParser. Unfortunately, ConfigParser (in both Python 2 and 3) strips leading and trailing whitespace from values, and I don't see an option to preserve it.
What I can suggest in that case is that you wrap your string in quotes entirely in your config file:
[params]
modify_param_name = bitrate/bps
modify_delimiter = ":/ "
And in your code, when you initialize modify_delimiter, strip the double quotes yourself:
modify_delimiter = config.get('params', 'modify_delimiter').strip('"')
That way the trailing space will get preserved and you should get your desired output.
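Here is a minimal end-to-end sketch of that approach, using Python 3's configparser and read_string so the example is self-contained (in your program you would call config.read(...) on the file instead). build_param_string is a hypothetical helper name.

```python
import configparser

CONFIG_TEXT = """
[params]
modify_param_name = bitrate/bps
modify_delimiter = ":/ "
value = 8000
"""


def build_param_string(config_text):
    config = configparser.ConfigParser()
    config.read_string(config_text)
    params = config["params"]
    # The surrounding quotes protect the trailing space from ConfigParser's stripping
    modify_delimiter = params["modify_delimiter"].strip('"')
    _, new_param = params["modify_param_name"].split("/")
    _, new_delimiter = modify_delimiter.split("/")
    return new_param + new_delimiter + params["value"]


print(repr(build_param_string(CONFIG_TEXT)))  # 'bps 8000'
```

Changing the config line to modify_delimiter = ":/-" yields 'bps-8000' with the same code, which covers the ':' to '-' case from the question.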

Python - PING a list of IP Address from database

I have a list of IP addresses for 200 locations, and each location has 4 IP addresses that I need to ping. I want to write a command where I type the name or code of a location and it immediately pings the 4 IP addresses at that location. So far I have learned how to build a list from IP addresses I enter one at a time with input(), like this:
import os
import socket
ip = []
y = ['IP 1 : ', 'IP 2 : ', 'IP 3 : ', 'IP 4 : ']
while True:
    for x in y:
        server_ip = input(x)
        ip.append(server_ip)
    break
for x in ip:
    print("\n")
    rep = os.system('ping ' + x + " -c 3")
Please give me some advice on the command I want to build, so that I no longer need to enter the IP addresses one by one. What still confuses me is how to turn the items stored in the database into the variable x used in this command:
rep = os.system('ping ' + x + " -c 3")
EDIT: It now iterates over a CSV file rather than a hard-coded Python dictionary.
I believe you will be better off using Python dictionaries rather than Python lists (csv.DictReader gives you each row as a dictionary). Assuming you are using Python 3.x, you can run something like this:
import os
import csv
# Save the IPs you want to ping inside YOURFILE.csv
# Then iterate over the CSV rows using a for loop
# Ensure your IP addresses are under a column titled ip_address
with open('YOURFILE.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        rep = os.system("ping " + row['ip_address'] + " -c 3")
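The question also asks how to ping only the four addresses of one location by typing its name or code. A minimal sketch, assuming a hypothetical CSV layout with location and ip_address columns (one row per IP): group the rows into a dict keyed by location, then ping the list for the requested key. It uses subprocess.run with an argument list instead of os.system, so no shell string concatenation (and no missing spaces) is involved. The -c count flag is for Linux/macOS ping; Windows ping uses -n instead. load_locations and ping_location are hypothetical names.

```python
import csv
import subprocess
from collections import defaultdict


def load_locations(csv_path):
    """Group IP addresses by location, assuming 'location' and 'ip_address' columns."""
    locations = defaultdict(list)
    with open(csv_path, newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            locations[row["location"].strip()].append(row["ip_address"].strip())
    return dict(locations)


def ping_location(locations, code, count=3):
    """Ping every IP stored for one location and return {ip: reachable}."""
    results = {}
    for ip in locations.get(code, []):
        # An argument list avoids the shell; exit status 0 means ping got replies
        completed = subprocess.run(["ping", "-c", str(count), ip])
        results[ip] = completed.returncode == 0
    return results
```

Usage might look like locations = load_locations('YOURFILE.csv') followed by ping_location(locations, input('Location code: ')). An unknown code simply returns an empty dict.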

How to store scapy packet data?

I have a DNS packet sniffer built with Scapy that I'd like to store packet data from.
I understand that packet data is stored as a dictionary, which should make it ideal to store in another dictionary or array. Using pkt[0].summary() I can see that the data is correct and that I am getting packets, but I cannot figure out how to store it correctly.
As I am new to Python / Scapy, my question is how to store / append this packet data to a dictionary or array as the packets come through.
This is what the code looks like:
#!/usr/bin/env python
from scapy.all import *
from datetime import datetime
import time
import datetime
import sys
# Select interface and ports of interest
interface = 'ens33'
bpf = 'udp and port 53'
# SELECT/FILTER MSGS
def select_DNS(pkt):
    pkt_time = pkt.sprintf('%sent.time%')
    # SELECT/FILTER DNS MSGS
    try:
        dict = []
        # queries
        if DNSQR in pkt and pkt.dport == 53:
            domain = pkt.getlayer(DNS).qd.qname.decode()  # .decode() gets rid of the b''
            print('Q - Time: ' + pkt_time + ' , source IP: ' + pkt[IP].src + ' , domain: ' + domain)
        # responses
        elif DNSRR in pkt and pkt.sport == 53:
            domain = pkt.getlayer(DNS).qd.qname.decode()
            print('R - Time: ' + pkt_time + ' , source IP: ' + pkt[IP].src + ' , domain: ' + domain)
    except:
        pass
# START SNIFFER
sniff(iface=interface, filter=bpf, store=0, prn=select_DNS)
I'm fairly sure the packet structure is not a dictionary, even though it provides some dictionary like features (overriding the slicing notation).
If you want to store the packets in a list (array), just append them as you go.
cache = []
def select_DNS(pkt):
    cache.append(pkt)
If you want to store packets on disk, I would suggest writing them out with the wrpcap function, which saves them in "pcap" format:
wrpcap("temp.cap", cache)

Extracting Data from Multiple TXT Files and Creating a Summary CSV File in Python

I have a folder with about 50 .txt files containing data in the following format.
=== Predictions on test data ===
inst# actual predicted error distribution (OFTd1_OF_Latency)
1 1:S 2:R + 0.125,*0.875 (73.84)
I need to write a program that combines the following: my index number (i), the letter of the true class (R or S), the letter of the predicted class, and each of the distribution predictions (the decimals less than 1.0).
I would like it to look like the following when finished, but preferably as a .csv file.
ID True Pred S R
1 S R 0.125 0.875
2 R R 0.105 0.895
3 S S 0.945 0.055
. . . . .
. . . . .
. . . . .
n S S 0.900 0.100
I'm a beginner and a bit fuzzy on how to get all of that parsed and then concatenated and appended. Here's what I was thinking, but feel free to suggest another direction if that would be easier.
for i in range(1, n):
    s = str(i)
    # The files are all named the same but with different numbers associated
    readin = open('mydata/output/output' + s + 'out', 'r')
    output = open("mydata/summary.csv", "a")
    storage = []
    for line in readin:
        # data extraction/concatenation here
        if line.startswith('1'):
            id = i
            true = ...  # split at the ':' and take the letter after it
            pred = ...  # split at the second ':' and take the letter after it
            # some have error '+'s and some don't, so I'm not exactly sure how to get the distributions
            ds = ...  # split at the ',' and take the string of 5 digits before it
            if pred == 'R':
                dr = ...  # skip the character after the comma but take the characters after that
            else:
                dr = ...  # take the five characters after the comma
            lineholder = id + ' , ' + true + ' , ' + pred + ' , ' + ds + ' , ' + dr
        else:
            continue
        output.write(lineholder)
I think using the indexes would be another option, but it might complicate things if the spacing is off in any of the files and I haven't checked this for sure.
Thank you for your help!
Well, first of all, if you want to write CSV, you should use the csv module that comes with Python. More about this module here: https://docs.python.org/2.7/library/csv.html I won't demonstrate how to use it, because it's pretty simple.
As for reading the input data, here's my suggestion for breaking down each line of the data. I assume that the values in each line are separated by spaces and that no value contains a space:
def process_line(id_, line):
    pieces = line.split()  # Now we have a list of values
    true = pieces[1].split(':')[1]  # split at the ':' and take the letter after it
    pred = pieces[2].split(':')[1]  # split at the second ':' and take the letter after it
    if len(pieces) == 6:  # There was an error, the + is there
        p4 = pieces[4]
    else:  # There was no '+', only spaces
        p4 = pieces[3]
    ds, dr = p4.split(',')  # S probability before the comma, R probability after it
    if pred == 'R':
        dr = dr[1:]  # skip the '*' that marks the predicted class
    else:
        ds = ds[1:]  # here the '*' sits in front of the S probability instead
    return str(id_) + ' , ' + true + ' , ' + pred + ' , ' + ds + ' , ' + dr
What I mainly used here was the split method of strings: https://docs.python.org/2/library/stdtypes.html#str.split and the slicing syntax str[1:] to skip the first character of a string (strings are sequences, so they support slicing).
Keep in mind that my function won't handle any errors or lines formatted differently from the one you posted as an example. Tab-separated values need no change: line.split() with no arguments splits on any run of whitespace, including tabs.
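Since the answer leaves the CSV part out, here is a minimal sketch that ties line parsing to the csv module. It assumes each file holds prediction lines in the format shown in the question, that the distribution is the only token containing a comma, and that '*' marks the predicted class's probability, as in the sample line. parse_prediction and write_summary are hypothetical names, and the file's position in the input list is used as the ID, like i in the question.

```python
import csv


def parse_prediction(line):
    """Return (true, pred, S, R) from a line like '1 1:S 2:R + 0.125,*0.875 (73.84)'."""
    pieces = line.split()
    true = pieces[1].split(":")[1]
    pred = pieces[2].split(":")[1]
    # The distribution is the token with a comma; the '+' error column may be absent
    dist = next(p for p in pieces[3:] if "," in p)
    # '*' marks the predicted class's probability, so strip it from both values
    s_prob, r_prob = (v.lstrip("*") for v in dist.split(","))
    return true, pred, s_prob, r_prob


def write_summary(input_paths, summary_path):
    """Write a header plus one CSV row per data line, numbering files from 1."""
    with open(summary_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ID", "True", "Pred", "S", "R"])
        for i, path in enumerate(input_paths, start=1):
            with open(path) as f:
                for line in f:
                    stripped = line.strip()
                    # Data lines start with the instance number, e.g. '1 1:S ...'
                    if stripped[:1].isdigit():
                        writer.writerow([i, *parse_prediction(stripped)])
```

With the question's naming scheme the call could be write_summary(['mydata/output/output%iout' % i for i in range(1, n)], 'mydata/summary.csv'); csv.writer takes care of the commas, so no manual ' , ' joining is needed.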
I think you can separate out the floats and then combine them with the strings with the help of the re module, as follows:
import re
with open('sample.txt', 'r') as file:
    lines = file.readlines()
# Decimal numbers on each line, e.g. '0.125', '0.875', '73.84'
strings = [re.findall(r'\d+\.+\d+', line) for line in lines]
print(strings)
# 'index:class' pairs on each line, e.g. '1:S', '2:R'
num = [re.findall(r'\w+:+\w+', line) for line in lines]
print(num)
s = num + strings
print(s)  # [['1:S', '2:R'], ['0.125', '0.875', '73.84']] for a one-line sample.txt
This works line by line, so it handles files with multiple lines as well. For example, with these contents of sample.txt:
1 1:S 2:R + 0.125,*0.875 (73.84)
2 1:S 2:R + 0.15,*0.85 (69.4)
running the program gives:
[['1:S', '2:R'], ['1:S', '2:R'], ['0.125', '0.875', '73.84'], ['0.15', '0.85', '69.4']]
From there you can simply concatenate the pieces for each line.
This uses regular expressions and the CSV module.
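import re
import csv
# Python's re has no POSIX classes like [[:blank:]], so \s* is used for leading whitespace;
# \*? drops the '*' that marks the predicted class's probability
matcher = re.compile(r'\s*1.*:(.).*:(.).* \*?([^ ,]*),\*?(\S*)')
filenametemplate = 'mydata/output/output%iout'
# n is the same upper bound as in the question's range(1, n)
with open('mydata/summary.csv', 'w', newline='') as f:
    output = csv.writer(f)
    for i in range(1, n):
        with open(filenametemplate % i) as infile:
            for line in infile:
                m = matcher.match(line)
                if m:
                    output.writerow([i] + list(m.groups()))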
