CSV Writer writing duplicate rows - python

I am new to programming and currently trying to parse NMAP XML to CSV.
When i open the CSV file i see that there is more rows than it should of been.
Anyone would be able to tell me what I did wrong with the code below?
from libnmap.parser import NmapParser
import os
import csv
report = 'nmap_results.xml'
with open('nmap_results.csv', 'w') as output_file:
writer = csv.writer(output_file)
for filename in report:
nmap_report = NmapParser.parse_fromfile(report)
for host in nmap_report.hosts:
row = []
for hostname in host.hostnames:
row.append('{}'.format(hostname))
row.append('{}'.format(host.address))
for serv in host.services:
row.append(serv.port)
writer.writerow(row)
Here is the out of CSV:
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443
cnn.com,151.101.1.67,53
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443
cnn.com,151.101.1.67,53
yahoo.com,media-router-fp1.prod1.media.vip.gq1.yahoo.com,98.137.246.7,53,80,443
red.com,server-13-249-188-122.bos50.r.cloudfront.net,13.249.188.122,53,80,443
google.com,172.18.128.1,53,80,443

As mentioned above. It looks lik you have too many loops. report is your xml. How many times does each filename appear in nmap_results.xml? Becuase your first loop says
for filename in report
Well if a certain filename like file_A.blah appears more then once in nmap_results.xml, then you will run NmapParser.parse_fromfile(report) on that report name multiple times.
I'm assuming that nmap_report isa nested structure, like a JSON, or another XML
{hosts: [ {hostnames: [ftw, fml],
address : www.aol.com,
services : [ {port: 10 },
{port: 20 } ] },
{hostnames: [wtf, wow],
address : www.yahoo.com,
services : [ {port: 31 },
{port: 41 } ] },
{hostnames: [lol, log]
address : www.msn.com,
services : [ {port: 52 },
{port: 62 } ] },
{hostnames: [omg, okc],
address : www.url.com,
services : [ {port: 404 }
{port: 88 } ] },
] }
Above, is my imagined structure of nmap_report
- nmap_report.hosts is a List with 4 items. Thus for x in blah.hosts is kind of valid
- each item (call it a host) is Dictionary with keys hostnames, address, services
I'm pretending you can call a Python Dictionary with "dot" notation. In any case. The structure works. So, by your loop, we have:
for host in nmap_report.hosts:
row = []
for hostname in host.hostnames:
row.append('{}'.format(hostname))
row.append('{}'.format(host.address))
for serv in host.services:
row.append(serv.port)
writer.writerow(row)``` <--- by the way, this is a typo right? remove these '''
row 1 [] --> [ftw, fml] --> [ftw, fml, www.aol.com] --> [ftw, fml, www.aol.com, 10, 20]
row 2 [] --> [wtf, wow] same pattern
row 3 [] --> [lol, log] same pattern
row 4 [] --> [omg, okc] same pattern
You should check if writer.writerow(row) adds a new line also, again check that those three back ticks (`) are removed from this line.
Based on the example that I deduced from your structure, the reason you have duplicates is because for filename in report: is using the same filename multiple times.
Your filenames from report = 'nmap_results.xml' are not unique.

Related

Parser for command line output

I am getting command line output in below format
server
3 threads started
1.1.1.1 ONLINE at SUN
version: 1.2.3.4
en: net
1.1.1.2 ONLINE at SUN
version: 1.2.3.5
en: net
1.1.1.3 OFFLINE at SUN
version: 1.2.3.6
en: net
File: xys
high=600
low=70
name=lmn
I want parsed output like
l1 = [
{
"1.1.1.1": {
"status": "ONLINE",
"version": "1.2.3.4",
"en": "net"
},
"1.1.1.2": {
"status": "ONLINE",
"version": "1.2.3.5",
"en": "net"
},
"1.1.1.3": {
"status": "OFFLINE",
"version": "1.2.3.6",
"en": "net"
}
}
]
l2 = {
"File": "xys",
"high": 600,
"low": 70,
"name": "lmn"
}
I am getting all this in a string.
I Have split string by \n and created a list and then From "File" keyword created 2 lists of the main list. Then parsed both lists separately.
index = [i for i in range(len(output)) if "File" in output[i] ]
if index:
list1 = output[:index[0]]
list2 = output[index[0]:]
Is there any other more efficient way to parse this output.
What you did would work alright.
How much you should worry about this would depend on if this is just some quick setup being done for a few automated tests or if this code is for a service in an enterprise environment that has to stay running, but the one thing I would be worried about is what happens if File: ... is no longer the line that follows the IP addresses. If you want to make sure this does not throw of your code, you could go through the string line by line parsing it.
You would need your parser to check for all of the following cases:
The word server
The comments following the word server about how many threads where started
Any other comments after the word server
The IP address (regex is your friend)
The indented area that follows having found an IP addresses
key value pairs separated with a colon
key value pairs separated with an equals sign
But in all reality, I think what you did looks great. It's not that hard to change your code from searching for "File" to something else if that need ever arises. You will want to spend a little bit of time verifying that it appears that "File" does always proceed the IP addresses. If reliability is super important, then you will have some additional work to do in protecting yourself from running into problems later on if the order things come in is changes on you.
The solution provided below does not need to use the number of server threads running, as it can keep track of the thread number by removing all metadata preceding and following the threads' information:
with open("data.txt", "r") as inFile:
lines = [line for line in inFile]
lines = [line for line in lines[2:] if line != '\n']
threads = lines[:-4]
meta = lines[-4:]
l1 = []
l2 = {}
for i in range(0,len(threads),3):
status = threads[i]
version = threads[i+1]
en = threads[i+2]
status = status.split()
name = status[0]
status = status[1]
version = version.split()
version = version[1].strip()
en = en.split()
en = en[1].strip()
l1.append({name : {'status' : status, "version" : version, "en" : en}})
fileInfo = meta[0].strip().split(": ")
l2.update({fileInfo[0] : fileInfo[1]})
for elem in meta[1:]:
item = elem.strip().split("=")
l2.update({item[0] : item[1]})
The result will be:
For l1:
[{'1.1.1.1': {'status': 'ONLINE', 'version': '1.2.3.4', 'en': 'net'}}, {'1.1.1.2': {'status': 'ONLINE', 'version': '1.2.3.5', 'en': 'net'}}, {'1.1.1.3': {'status': 'OFFLINE', 'version': '1.2.3.6', 'en': 'net'}}]
For l2:
{'File': 'xys', 'high': '600', 'low': '70', 'name': 'lmn'}

Dictionary from a String with particular structure

I am using python 3 to read this file and convert it to a dictionary.
I have this string from a file and I would like to know how could be possible to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
On the text I have "Name=Something" and then I would like to convert it as follows:
{'Date':'10/26/2003',
'Time':'09:01:01 AM'
'User':'teodor'
'UserText':'Max Cor'
'UserTextUnicode':'392039n9dj90j32'.......}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into map:
try (InputStream input = new FileInputStream("your_file_path")) {
Properties prop = new Properties();
prop.load(input);
// prop.getProperty("User") == "teodor"
} catch (IOException ex) {
ex.printStackTrace();
}
EDIT:
For Python solution, refer to the answerred question.
You can use configparser to read .ini, or .properties files (format you have).
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# config['User'] == {'Date': '10/26/2003', 'Time': '09:01:01 AM'...}
# config['User']['User'] == 'teodor'
# config['System'] == {'Type': 'Abosulte', ...}
Can easily be done in python. Assuming your file is named test.txt.
This will also work for lines with nothing after the = as well as lines with multiple =.
d = {}
with open('test.txt', 'r') as f:
for line in f:
line = line.strip() # Remove any space or newline characters
parts = line.split('=') # Split around the `=`
if len(parts) > 1:
d[parts[0]] = ''.join(parts[1:])
print(d)
Output:
{
"Date": "10/26/2003",
"Time": "09:01:01 AM",
"User": "teodor",
"UserText": "Max Cor",
"UserTextUnicode": "392039n9dj90j32",
"Type": "Absolute",
"Dnumber": "QS236",
"Software": "1.1.1.2",
"BuildNr": "0923875",
"Source": "LAM",
"Column": "OWKD",
"StageX": "12345",
"Spotter": "2",
"ApertureX": "0.0098743",
"ApertureY": "0.2431899",
"ShiftXYZ": "-4.234809e-002",
"Text": "Here is the Text files",
"DataBaseNumber": "The database number is 918723"
}
I would suggest to do some cleaning to get rid of the [] lines.
After that you can split those lines by the "=" separator and then convert it to a dictionary.

How to access certain values of a json site via python

This is the code i have so far:
import json
import requests
import time
endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-
air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)
for id in reqJson['product']:
name = (id['title'])
print (name)
Feel free to visit the link, I'm trying to grab all the "id" value and print them out. They will be used later to send to my discord.
I tried with my above code but i have no idea how to actually get those values. I don't know which variable to use in the for in reqjson statement
If anyone could help me out and guide me to get all of the ids to print that would be awesome.
for product in reqJson['product']['title']:
ProductTitle = product['title']
print (title)
I see from the link you provided that the only ids that are in a list are actually part of the variants list under product. All the other ids are not part of a list and have therefore no need to iterate over. Here's an excerpt of the data for clarity:
{
"product":{
"id":232418213909,
"title":"Nike Air Max 1 \/ Cool Grey",
...
"variants":[
{
"id":3136193822741,
"product_id":232418213909,
"title":"8",
...
},
{
"id":3136193855509,
"product_id":232418213909,
"title":"8.5",
...
},
{
"id":3136193789973,
"product_id":232418213909,
"title":"9",
...
},
...
],
"image":{
"id":3773678190677,
"product_id":232418213909,
"position":1,
...
}
}
}
So what you need to do should be to iterate over the list of variants under product instead:
import json
import requests
endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)
for product in reqJson['product']['variants']:
print(product['id'], product['title'])
This outputs:
3136193822741 8
3136193855509 8.5
3136193789973 9
3136193757205 9.5
3136193724437 10
3136193691669 10.5
3136193658901 11
3136193626133 12
3136193593365 13
And if you simply want the product id and product name, they would be reqJson['product']['id'] and reqJson['product']['title'], respectively.

Dynamically create list/dict during for loop for conversion to JSON

I am trying to build a list/dict that will be converted to JSON later on. I am trying to write the code that builds and populates the multiple levels of the JSON format I ultimately need. I am having an issue wrapping my head around this. Thank you for the help.
What I ultimately need -> Populate this list/dict:
dataset_permission_json = []
with this format:
{
"projects":[
{
"project":"test-project-1",
"datasets":[
{
"dataset":"testing1",
"permissions":[
{
"role":"READER",
"google_group":"testing1#test.com"
}
]
},
{
"dataset":"testing2",
"permissions":[
{
"role":"OWNER",
"google_group":"testing2#test.com"
}
]
},
{
"dataset":"testing3",
"permissions":[
{
"role":"READER",
"google_group":"testing3#test.com"
}
]
},
{
"dataset":"testing4",
"permissions":[
{
"role":"WRITER",
"google_group":"testing4#test.com"
}
]
}
]
}
]
}
I have multiple for loops that successfully print out the information I am pulling from an external API but I to be able to enter that data into the list/dict. The dynamic values I am trying to input are:
'project' i.e. test-project-1
'dataset' i.e. testing1
'role' i.e. READER
'google_group' i.e. testing1#test.com
I have tried things like:
dataset_permission_json.update({'project': project})
but cannot figure out how not to overwrite the data during the multiple for loops.
for project in projects:
print(project) ## Need to add this variable to 'projects'
for bq_group in bq_groups:
delegated_credentials = credentials.create_delegated(bq_group)
http_auth = delegated_credentials.authorize(Http())
list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
datasets = list_datasets_in_project.get('datasets',[])
print(dataset['datasetReference']['datasetId']) ##Add the dataset to 'datasets' under the project
for dataset in datasets:
get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
dataset_permissions = get_dataset_permissions_result.get('access',[])
### ADD THE NEXT LEVEL 'permissions' level here?
for dataset_permission in dataset_permissions:
if 'groupByEmail' in dataset_permission:
if bq_group in dataset_permission['groupByEmail']:
print(dataset['datasetReference']['datasetId'] && dataset_permission['groupByEmail']) ##Add to each dataset
I appreciate the help.
EDIT: Updated Progress
Ok I have created the nested structure that I was looking for using StackOverflow
Things are great except for the last part. I am trying to append the role & group to each 'permission' nest, but after everything runs the data is only appended to the last 'permission' nest in the JSON structure. It seems like it is overwriting itself during the for loop. Thoughts?
Updated for loop:
for project in projects:
for bq_group in bq_groups:
delegated_credentials = credentials.create_delegated(bq_group)
http_auth = delegated_credentials.authorize(Http())
list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
datasets = list_datasets_in_project.get('datasets',[])
for dataset in datasets:
get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
dataset_permissions = get_dataset_permissions_result.get('access',[])
for dataset_permission in dataset_permissions:
if 'groupByEmail' in dataset_permission:
if bq_group in dataset_permission['groupByEmail']:
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['permissions'] = permission
UPDATE: Solved.
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions'] = permission

python json to csv converting script?

Let me start by stating that I am new to python. I wrote a script that will convert a .json file to csv format. I managed to write a script to do the job, however I don't think that my script will work if the format of the json file was to change. My script assumes that the json file will be in the same format at all times.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this json file but It wont work if there is only one item or one fraudreason. Here is the code to my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv
json_data = open(file)
data = json.load(json_data)
json_data.close()
csv_file = '/tmp/' + str(options.orderId) + '.csv'
orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']
o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
o.writerow([fraud['fraud_reason']],)
I also have not been able to figure out how not to use the labels I hope someone can help me with this
thanks in advance.
You may want to use csv.DictWriter:
# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
with open(filename) as fi:
data = json.load(fi)
csv_file = '/tmp/' + str(options.orderId) + '.csv'
order = data['Order']
items = data['Items']
frauds = data['FraudReasons']
# Here's one way to keep this maintainable if the JSON
# format changes, and you don't care too much about the
# order of the fields...
orders_fields = sorted(orders.keys())
item_fields = sorted(items[0].keys()) if items else ()
fraud_fields = sorted(fraud[0].keys()) if fraud else ()
csv_options = dict(lineterminator=',')
with open(csv_file, 'w') as fo:
o = csv.DictWriter(fo, order_fields, **csv_options)
o.writeheader()
o.writerow(orders)
fo.write('\n') # Optional, if you want to keep them separated.
o = csv.DictWriter(fo, item_fields, **csv_options)
o.writeheader()
o.writerows(items)
fo.write('\n') # Optional, if you want to keep them separated.
o = csv.DictWriter(fo, fraud_fields, **csv_options)
o.writeheader()
o.writerows(frauds)
# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
main(...) # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
'order_id',
'order_date',
'order_date_shipped',
# ... etc.
)
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of plain dict (requires python 2.7):
import json
from collections import OrderedDict as ordereddict
data = json.loads(open('mydata.json', object_pairs_hook=ordereddict)
orders = data['Order']
print orders.keys() # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json, not simplejson, and requires python 2.7 for the ordered_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict from PyPi, and you can extend the JSONDecoder class and pass it with the cls argument (see here), instead of object_pairs_hook, but that's uglier and more work...

Categories

Resources