Parser for command line output - python

I am getting command-line output in the format below:
server
3 threads started
1.1.1.1 ONLINE at SUN
    version: 1.2.3.4
    en: net
1.1.1.2 ONLINE at SUN
    version: 1.2.3.5
    en: net
1.1.1.3 OFFLINE at SUN
    version: 1.2.3.6
    en: net
File: xys
high=600
low=70
name=lmn
I want the parsed output to look like this:
l1 = [
    {
        "1.1.1.1": {
            "status": "ONLINE",
            "version": "1.2.3.4",
            "en": "net"
        },
        "1.1.1.2": {
            "status": "ONLINE",
            "version": "1.2.3.5",
            "en": "net"
        },
        "1.1.1.3": {
            "status": "OFFLINE",
            "version": "1.2.3.6",
            "en": "net"
        }
    }
]
l2 = {
    "File": "xys",
    "high": 600,
    "low": 70,
    "name": "lmn"
}
I am getting all of this in a single string.
I split the string by \n to create a list, then used the "File" keyword to split it into two lists, and parsed each list separately.
index = [i for i in range(len(output)) if "File" in output[i]]
if index:
    list1 = output[:index[0]]
    list2 = output[index[0]:]
Is there a more efficient way to parse this output?

What you did would work all right.
How much you should worry about this depends on whether this is just quick setup for a few automated tests or code for a service in an enterprise environment that has to stay running. The one thing I would worry about is what happens if File: ... is no longer the line that follows the IP addresses. If you want to make sure this does not throw off your code, you could go through the string line by line, parsing as you go.
You would need your parser to check for all of the following cases:
The word server
The comments following the word server about how many threads were started
Any other comments after the word server
The IP address (regex is your friend)
The indented area that follows once an IP address has been found
key value pairs separated with a colon
key value pairs separated with an equals sign
But in all reality, I think what you did looks great. It's not that hard to change your code from searching for "File" to something else if that need ever arises. You will want to spend a little bit of time verifying that "File" really does always follow the IP addresses. If reliability is super important, then you will have some additional work to do to protect yourself from problems later on, in case the order things come in ever changes on you.
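For illustration, here is a minimal sketch of such a line-by-line parser, assuming the exact format shown in the question (the regex and the names are illustrative, not a definitive implementation). It builds l1 in the same shape as the answer below, one single-key dict per IP address:

import re

IP_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)')

def parse_output(text):
    l1, l2 = [], {}
    current = None   # dict for the most recent IP address seen
    in_meta = False  # True once the "File:" line has been reached
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("File"):
            in_meta = True
            l2["File"] = line.split(":", 1)[1].strip()
        elif in_meta and "=" in line:
            key, value = line.split("=", 1)
            l2[key] = int(value) if value.isdigit() else value
        else:
            match = IP_RE.match(line)
            if match:
                current = {"status": match.group(2)}
                l1.append({match.group(1): current})
            elif ":" in line and current is not None:
                key, value = line.split(":", 1)
                current[key.strip()] = value.strip()
            # anything else ("server", "3 threads started") is ignored
    return l1, l2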

The solution below does not need the reported number of server threads; it simply strips the header lines that precede, and the metadata lines that follow, the per-thread information:
with open("data.txt", "r") as inFile:
lines = [line for line in inFile]
lines = [line for line in lines[2:] if line != '\n']
threads = lines[:-4]
meta = lines[-4:]
l1 = []
l2 = {}
for i in range(0,len(threads),3):
status = threads[i]
version = threads[i+1]
en = threads[i+2]
status = status.split()
name = status[0]
status = status[1]
version = version.split()
version = version[1].strip()
en = en.split()
en = en[1].strip()
l1.append({name : {'status' : status, "version" : version, "en" : en}})
fileInfo = meta[0].strip().split(": ")
l2.update({fileInfo[0] : fileInfo[1]})
for elem in meta[1:]:
item = elem.strip().split("=")
l2.update({item[0] : item[1]})
The result will be:
For l1:
[{'1.1.1.1': {'status': 'ONLINE', 'version': '1.2.3.4', 'en': 'net'}}, {'1.1.1.2': {'status': 'ONLINE', 'version': '1.2.3.5', 'en': 'net'}}, {'1.1.1.3': {'status': 'OFFLINE', 'version': '1.2.3.6', 'en': 'net'}}]
For l2:
{'File': 'xys', 'high': '600', 'low': '70', 'name': 'lmn'}
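Note that this produces l1 as a list of single-key dicts, while the question shows a single dict inside the list. If you need that exact shape, the per-IP dicts can be merged afterwards:

merged = {}
for entry in l1:
    merged.update(entry)
l1 = [merged]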

Related

Extract and parse JSON Data with Python3

I want to ask how to get the list of IP addresses from my getproject():
addresses={
    'int-ext': [
        {
            'version': 4,
            'addr': '192.168.1.201',
            'OS-EXT-IPS:type': 'fixed',
            'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:e1:95:33'
        }
    ]
},
accessIPv4=,
accessIPv6=
and my code:
data = getproject()
for pj in project:
    for dt in data:
        if pj.project == dt["name"]:
            pj.addr = dt["addresses"]["int-ext"][0]["addr"]
I want to get the IP address, but when I try to print it the output is still "None". The code errors when I run it in Python 3.

CSV Writer writing duplicate rows

I am new to programming and currently trying to parse NMAP XML to CSV.
When I open the CSV file I see that there are more rows than there should be.
Could anyone tell me what I did wrong in the code below?
from libnmap.parser import NmapParser
import os
import csv

report = 'nmap_results.xml'
with open('nmap_results.csv', 'w') as output_file:
    writer = csv.writer(output_file)
    for filename in report:
        nmap_report = NmapParser.parse_fromfile(report)
        for host in nmap_report.hosts:
            row = []
            for hostname in host.hostnames:
                row.append('{}'.format(hostname))
            row.append('{}'.format(host.address))
            for serv in host.services:
                row.append(serv.port)
            writer.writerow(row)
Here is the CSV output:
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443
cnn.com,151.101.1.67,53
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443
cnn.com,151.101.1.67,53
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443
As mentioned above, it looks like you have too many loops. report is your XML file name, which is a string. Your first loop says
for filename in report:
but iterating over a string in Python yields one character at a time, so this loop body runs once for every character of 'nmap_results.xml', and each pass runs NmapParser.parse_fromfile(report) on the same file and writes all the rows again.
I'm assuming that nmap_report is a nested structure, like JSON or another XML document:
{hosts: [ {hostnames: [ftw, fml],
           address: www.aol.com,
           services: [ {port: 10},
                       {port: 20} ] },
          {hostnames: [wtf, wow],
           address: www.yahoo.com,
           services: [ {port: 31},
                       {port: 41} ] },
          {hostnames: [lol, log],
           address: www.msn.com,
           services: [ {port: 52},
                       {port: 62} ] },
          {hostnames: [omg, okc],
           address: www.url.com,
           services: [ {port: 404},
                       {port: 88} ] } ] }
Above is my imagined structure of nmap_report:
- nmap_report.hosts is a list with 4 items, so for host in nmap_report.hosts is valid
- each item (call it a host) is a dictionary with keys hostnames, address, services
I'm pretending you can access a Python dictionary with "dot" notation. In any case, the structure works. So, by your loop, we have:
for host in nmap_report.hosts:
    row = []
    for hostname in host.hostnames:
        row.append('{}'.format(hostname))
    row.append('{}'.format(host.address))
    for serv in host.services:
        row.append(serv.port)
    writer.writerow(row)
row 1 [] --> [ftw, fml] --> [ftw, fml, www.aol.com] --> [ftw, fml, www.aol.com, 10, 20]
row 2 [] --> [wtf, wow] same pattern
row 3 [] --> [lol, log] same pattern
row 4 [] --> [omg, okc] same pattern
You should also check whether writer.writerow(row) adds a newline of its own.
Based on the example that I deduced from your structure, the reason you have duplicates is that for filename in report: loops over the characters of the string report = 'nmap_results.xml', parsing and writing the same report once per character.
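A minimal fix, then, is to drop the character loop and parse the file exactly once. This is a sketch against the libnmap calls already used in the question; newline='' is the usual way to stop the csv module from inserting blank rows on Windows:

from libnmap.parser import NmapParser
import csv

report = 'nmap_results.xml'

with open('nmap_results.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    nmap_report = NmapParser.parse_fromfile(report)  # parse once, not once per character
    for host in nmap_report.hosts:
        row = list(host.hostnames) + [host.address]
        row += [serv.port for serv in host.services]
        writer.writerow(row)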

How to parse tab-delimited text file with 4th column as json and remove certain keys?

I have a text file that is 26 GB. The line format is as follows:
/type/edition /books/OL10000135M 4 2010-04-24T17:54:01.503315 {"publishers": ["Bernan Press"], "physical_format": "Hardcover", "subtitle": "9th November - 3rd December, 1992", "key": "/books/OL10000135M", "title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93", "identifiers": {"goodreads": ["6850240"]}, "isbn_13": ["9780107805401"], "languages": [{"key": "/languages/eng"}], "number_of_pages": 64, "isbn_10": ["0107805405"], "publish_date": "December 1993", "last_modified": {"type": "/type/datetime", "value": "2010-04-24T17:54:01.503315"}, "authors": [{"key": "/authors/OL2645777A"}], "latest_revision": 4, "works": [{"key": "/works/OL7925046W"}], "type": {"key": "/type/edition"}, "subjects": ["Government - Comparative", "Politics / Current Events"], "revision": 4}
I'm trying to get only the last column, which is JSON, and from that JSON I only want to save "title", "isbn_13", and "isbn_10".
I was able to save only the last column with this code:
import sys
import csv

csv.field_size_limit(sys.maxsize)

# File names: to read in from and read out to
input_file = '../inputFile/ol_dump_editions_2019-10-31.txt'
output_file = '../outputFile/output.txt'

## ==================== ##
##  Using module 'csv'  ##
## ==================== ##
with open(input_file) as to_read:
    with open(output_file, "w") as tmp_file:
        reader = csv.reader(to_read, delimiter="\t")
        writer = csv.writer(tmp_file)
        desired_column = [4]  # text column
        for row in reader:  # read one row at a time
            myColumn = list(row[i] for i in desired_column)  # build the output row (process)
            writer.writerow(myColumn)  # write it
but this doesn't produce proper JSON; instead everything comes out with doubled double quotes next to it. Also, how would I extract certain values from the JSON as a new JSON object?
EDIT:
"{""publishers"": [""Bernan Press""], ""physical_format"": ""Hardcover"", ""subtitle"": ""9th November - 3rd December, 1992"", ""key"": ""/books/OL10000135M"", ""title"": ""Parliamentary Debates, House of Lords, Bound Volumes, 1992-93"", ""identifiers"": {""goodreads"": [""6850240""]}, ""isbn_13"": [""9780107805401""], ""languages"": [{""key"": ""/languages/eng""}], ""number_of_pages"": 64, ""isbn_10"": [""0107805405""], ""publish_date"": ""December 1993"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2010-04-24T17:54:01.503315""}, ""authors"": [{""key"": ""/authors/OL2645777A""}], ""latest_revision"": 4, ""works"": [{""key"": ""/works/OL7925046W""}], ""type"": {""key"": ""/type/edition""}, ""subjects"": [""Government - Comparative"", ""Politics / Current Events""], ""revision"": 4}"
EDIT 2:
So I'm trying to read this file, which is tab-separated with the following columns:
type - type of record (/type/edition, /type/work etc.)
key - unique key of the record. (/books/OL1M etc.)
revision - revision number of the record
last_modified - last modified timestamp
JSON - the complete record in JSON format
I'm trying to read the JSON column, and from that JSON I only want to get "title", "isbn_13", and "isbn_10" as a JSON object and save it to the file as a row,
so every row should look like the original but with only those keys and values.
Here's a straightforward way of doing it. You would need to repeat this to extract the desired data from each line of the file as it's being read, line by line (the default way text-file reading is handled in Python).
import json
line = '/type/edition /books/OL10000135M 4 2010-04-24T17:54:01.503315 {"publishers": ["Bernan Press"], "physical_format": "Hardcover", "subtitle": "9th November - 3rd December, 1992", "key": "/books/OL10000135M", "title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93", "identifiers": {"goodreads": ["6850240"]}, "isbn_13": ["9780107805401"], "languages": [{"key": "/languages/eng"}], "number_of_pages": 64, "isbn_10": ["0107805405"], "publish_date": "December 1993", "last_modified": {"type": "/type/datetime", "value": "2010-04-24T17:54:01.503315"}, "authors": [{"key": "/authors/OL2645777A"}], "latest_revision": 4, "works": [{"key": "/works/OL7925046W"}], "type": {"key": "/type/edition"}, "subjects": ["Government - Comparative", "Politics / Current Events"], "revision": 4}'
csv_cols = line.split('\t')
json_data = json.loads(csv_cols[4])
#print(json.dumps(json_data, indent=4))
desired = {key: json_data[key] for key in ("title", "isbn_13", "isbn_10")}
result = json.dumps(desired, indent=4)
print(result)
Output from sample line:
{
"title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93",
"isbn_13": [
"9780107805401"
],
"isbn_10": [
"0107805405"
]
}
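To apply this to the whole 26 GB file, the same steps go inside a loop that reads one line at a time and writes one reduced JSON object per output line. A sketch, reusing the paths from your code; .get() tolerates records that are missing a key:

import json

with open('../inputFile/ol_dump_editions_2019-10-31.txt') as infile, \
        open('../outputFile/output.txt', 'w') as outfile:
    for line in infile:  # text files are read lazily, one line at a time
        json_data = json.loads(line.split('\t')[4])
        desired = {key: json_data.get(key)
                   for key in ("title", "isbn_13", "isbn_10")}
        outfile.write(json.dumps(desired) + '\n')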
So given that your current code returns the following:
result = '{""publishers"": [""Bernan Press""], ""physical_format"": ""Hardcover"", ""subtitle"": ""9th November - 3rd December, 1992"", ""key"": ""/books/OL10000135M"", ""title"": ""Parliamentary Debates, House of Lords, Bound Volumes, 1992-93"", ""identifiers"": {""goodreads"": [""6850240""]}, ""isbn_13"": [""9780107805401""], ""languages"": [{""key"": ""/languages/eng""}], ""number_of_pages"": 64, ""isbn_10"": [""0107805405""], ""publish_date"": ""December 1993"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2010-04-24T17:54:01.503315""}, ""authors"": [{""key"": ""/authors/OL2645777A""}], ""latest_revision"": 4, ""works"": [{""key"": ""/works/OL7925046W""}], ""type"": {""key"": ""/type/edition""}, ""subjects"": [""Government - Comparative"", ""Politics / Current Events""], ""revision"": 4}'
Looks like what you need to do is: first, replace those doubled double quotes with regular double quotes, otherwise things are not parsable:
res = result.replace('""','"')
Now res is convertible to a JSON object:
import json
my_json = json.loads(res)
my_json now looks like this:
{'authors': [{'key': '/authors/OL2645777A'}],
 'identifiers': {'goodreads': ['6850240']},
 'isbn_10': ['0107805405'],
 'isbn_13': ['9780107805401'],
 'key': '/books/OL10000135M',
 'languages': [{'key': '/languages/eng'}],
 'last_modified': {'type': '/type/datetime',
                   'value': '2010-04-24T17:54:01.503315'},
 'latest_revision': 4,
 'number_of_pages': 64,
 'physical_format': 'Hardcover',
 'publish_date': 'December 1993',
 'publishers': ['Bernan Press'],
 'revision': 4,
 'subjects': ['Government - Comparative', 'Politics / Current Events'],
 'subtitle': '9th November - 3rd December, 1992',
 'title': 'Parliamentary Debates, House of Lords, Bound Volumes, 1992-93',
 'type': {'key': '/type/edition'},
 'works': [{'key': '/works/OL7925046W'}]}
You can conveniently get any field you want from this object:
my_json['title']
# 'Parliamentary Debates, House of Lords, Bound Volumes, 1992-93'
my_json['isbn_10'][0]
# '0107805405'
Especially because your example is so large, I'd recommend using a specialized library such as pandas, which has a read_csv method, or even dask, which supports out-of-memory operations.
Both of these systems will automatically parse out the quotations for you, and dask will do so in "pieces" direct from disk so you never have to try to load 26GB into RAM.
In both libraries, you can then access the columns you want like this:
data = read_csv(PATH)
data["ColumnName"]
You can then parse these rows either using json.loads() (import json) or you can use the pandas/dask json implementations. If you can give some more details of what you're expecting, I can help you draft a more specific code example.
Good luck!
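For instance, a minimal pandas version might look like the following. The column names are assumptions (the file has no header row), chunksize keeps memory bounded, and quoting=csv.QUOTE_NONE stops the csv layer from reinterpreting quotes inside the JSON column:

import csv
import json
import pandas as pd

cols = ["type", "key", "revision", "last_modified", "json"]  # assumed names
for chunk in pd.read_csv('../inputFile/ol_dump_editions_2019-10-31.txt',
                         sep='\t', names=cols, quoting=csv.QUOTE_NONE,
                         chunksize=100_000):
    records = chunk["json"].map(json.loads)
    titles = records.map(lambda r: r.get("title"))
    print(titles.head())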
I saved your data to a file to see if I could read just the rows; let me know if this works:
lines = zzread.split('\n')  # zzread holds the file contents as one string
temp = []
for to_read in lines:
    if len(to_read) == 0:
        break
    new_to_read = '{' + to_read.split('{', 1)[1]
    temp.append(json.loads(new_to_read))
for row in temp:
    print(row['isbn_13'])
If that works, this should create a JSON for you:
lines = zzread.split('\n')
temp = []
for to_read in lines:
    if len(to_read) == 0:
        break
    new_to_read = '{' + to_read.split('{', 1)[1]
    temp.append(json.loads(new_to_read))
new_json = []
for row in temp:
    new_json.append({'title': row['title'], 'isbn_13': row['isbn_13'], 'isbn_10': row['isbn_10']})

Dictionary from a String with particular structure

I am using Python 3 to read this file and convert it to a dictionary.
I have the following string from a file and would like to know how to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
In the text I have "Name=Something" lines, and I would like to convert them as follows:
{'Date': '10/26/2003',
 'Time': '09:01:01 AM',
 'User': 'teodor',
 'UserText': 'Max Cor',
 'UserTextUnicode': '392039n9dj90j32', .......}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into a map (in Java):
try (InputStream input = new FileInputStream("your_file_path")) {
    Properties prop = new Properties();
    prop.load(input);
    // prop.getProperty("User") == "teodor"
} catch (IOException ex) {
    ex.printStackTrace();
}
EDIT:
For a Python solution, refer to the answers below.
You can use configparser to read .ini or .properties files (the format you have).
import configparser

config = configparser.ConfigParser()
config.read('your_file_path')
# config['User'] == {'Date': '10/26/2003', 'Time': '09:01:01 AM'...}
# config['User']['User'] == 'teodor'
# config['System'] == {'Type': 'Absolute', ...}
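Since the question asks for one flat dictionary with the [section] headers dropped, the sections can then be merged. A small sketch; note that configparser lowercases keys unless optionxform is overridden, and later sections would overwrite duplicate keys:

import configparser

config = configparser.ConfigParser()
config.optionxform = str  # preserve key case ('Date' rather than 'date')
config.read('your_file_path')

flat = {}
for section in config.sections():
    flat.update(config[section])
print(flat)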
This can easily be done in Python, assuming your file is named test.txt.
It will also work for lines with nothing after the = as well as for lines containing multiple =.
d = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove any space or newline characters
        parts = line.split('=')  # Split around the `=`
        if len(parts) > 1:
            d[parts[0]] = '='.join(parts[1:])  # Rejoin the value in case it contained '='
print(d)
Output:
{
    "Date": "10/26/2003",
    "Time": "09:01:01 AM",
    "User": "teodor",
    "UserText": "Max Cor",
    "UserTextUnicode": "392039n9dj90j32",
    "Type": "Absolute",
    "Dnumber": "QS236",
    "Software": "1.1.1.2",
    "BuildNr": "0923875",
    "Source": "LAM",
    "Column": "OWKD",
    "StageX": "12345",
    "Spotter": "2",
    "ApertureX": "0.0098743",
    "ApertureY": "0.2431899",
    "ShiftXYZ": "-4.234809e-002",
    "Text": "Here is the Text files",
    "DataBaseNumber": "The database number is 918723"
}
I would suggest doing some cleaning to get rid of the [...] lines.
After that you can split the remaining lines on the "=" separator and then convert them to a dictionary, as sketched below.
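A compact sketch of that approach, assuming the file contents are in a string named raw:

raw = open('test.txt').read()  # 'test.txt' as in the answer above
d = dict(line.split('=', 1)
         for line in raw.splitlines()
         if '=' in line and not line.startswith('['))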

writing json-ish list to csv, line by line, in python for bitcoin addresses

I'm querying the onename API in an effort to get the bitcoin addresses of all the users.
At the moment I'm getting all the user information as a JSON-esque list and piping the output to a file; it looks like this:
[{'0': {'owner_address': '1Q2Tv6f9vXbdoxRmGwNrHbjrrK4Hv6jCsz', 'zone_file': '{"avatar": {"url": "https://s3.amazonaws.com/kd4/111"}, "bitcoin": {"address": "1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh"}, "cover": {"url": "https://s3.amazonaws.com/dx3/111"}, "facebook": {"proof": {"url": "https://facebook.com/jasondrake1978/posts/10152769170542776"}, "username": "jasondrake1978"}, "graph": {"url": "https://s3.amazonaws.com/grph/111"}, "location": {"formatted": "Mechanicsville, Va"}, "name": {"formatted": "Jason Drake"}, "twitter": {"username": "000001"}, "v": "0.2", "website": "http://1642.com"}', 'verifications': [{'proof_url': 'https://facebook.com/jasondrake1978/posts/10152769170542776', 'service': 'facebook', 'valid': False, 'identifier': 'jasondrake1978'}], 'profile': {'website': 'http://1642.com', 'cover': {'url': 'https://s3.amazonaws.com/dx3/111'}, 'facebook': {'proof': {'url': 'https://facebook.com/jasondrake1978/posts/10152769170542776'}, 'username': 'jasondrake1978'}, 'twitter': {'username': '000001'}, 'bitcoin': {'address': '1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh'}, 'name': {'formatted': 'Jason Drake'}, 'graph': {'url': 'https://s3.amazonaws.com/grph/111'}, 'location': {'formatted': 'Mechanicsville, Va'}, 'avatar': {'url': 'https://s3.amazonaws.com/kd4/111'}, 'v': '0.2'}}}]
What I'm really interested in is the field {"address": "1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh"}; the rest of the stuff I don't need, I just want the addresses of every user.
Is there a way that I can write only the addresses to a file using Python?
I'm trying to write it as something like:
1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh,
1GA9RVZHuEE8zm4ooMTiqLicfnvymhzRVm,
1BJdMS9E5TUXxJcAvBriwvDoXmVeJfKiFV,
1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh,
...
and so on.
I've tried a number of different ways using dump, dumps, etc. but I haven't yet been able to pin it down.
My code looks like this:
import os
import json
import requests
#import py2neo
import csv

# set up authentication parameters
#py2neo.authenticate("46.101.180.63:7474", "neo4j", "uni-bonn")

# Connect to graph and add constraints.
neo4jUrl = os.environ.get('NEO4J_URL', "http://46.101.180.63:7474/db/data/")
#graph = py2neo.Graph(neo4jUrl)

# Add uniqueness constraints.
#graph.run("CREATE CONSTRAINT ON (q:Person) ASSERT q.id IS UNIQUE;")

# Build URL.
apiUrl = "https://api.onename.com/v1/users"
# apiUrl = "https://raw.githubusercontent.com/s-matthew-english/26.04/master/test.json"

# Send GET request.
Allusersjson = requests.get(apiUrl, headers={"accept": "application/json"}).json()
#print(json)
UsersDetails = []
for username in Allusersjson['usernames']:
    usernamex = username[:-3]
    apiUrl2 = "https://api.onename.com/v1/users/" + usernamex + "?app-id=demo-app-id&app-secret=demo-app-secret"
    userinfo = requests.get(apiUrl2, headers={"accept": "application/json"}).json()
    # try:
    #     if 'bitcoin' not in userinfo[usernamex]['profile']:
    #         continue
    #     else:
    #         UsersDetails.append(userinfo)
    # except:
    #     continue
    try:
        address = userinfo[usernamex]["profile"]["bitcoin"]["address"]
        UsersDetails.append(address)
    except KeyError:
        pass  # no address

out = "\n".join(UsersDetails)
print(out)
open("out.csv", "w").write(out)
# f = csv.writer(open("test.csv", "wb+"))

# Build query.
query = """
RETURN {json}
"""
# Send Cypher query.
# py2neo.CypherQuery(graph, query).run(json=json)
# graph.run(query).run(json=json)
#graph.run(query,json=json)
Anyway, in such a situation, what's the best way to write out those addresses as CSV? :/
UPDATE
I ran it, and at first it worked, but then I got the following error:
Instead of adding all the information to the UsersDetails list with
UsersDetails.append(userinfo)
you can add just the relevant part (the address):
try:
    address = userinfo[usernamex]["profile"]["bitcoin"]["address"]
    UsersDetails.append(address)
except KeyError:
    pass  # no address
except TypeError:
    pass  # ill-formed data
To print the values to the screen:
out = "\n".join(UsersDetails)
print(out)
(replace "\n" with "," for comma separated output, instead of one per line)
To save to a file:
open("out.csv", "w").write(out)
You need to reformat the list, either through map() or a list comprehension, to get it down to just the information you want. For example, if the top-level key used in the response from the api.onename.com API is always '0', you can do something like this:
UsersAddresses = [user['0']['profile']['bitcoin']['address'] for user in UsersDetails]
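Combining the two answers, a short sketch that writes one address per row with the csv module, assuming UsersDetails holds the full per-user dicts as in your original code:

import csv

UsersAddresses = [user['0']['profile']['bitcoin']['address']
                  for user in UsersDetails]
with open('addresses.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for address in UsersAddresses:
        writer.writerow([address])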
