Add "entry" to JSON File with Python - python

I need to modify a JSON file with Python. As I'm working with Python (and JSON) for the first time, I read some articles about it but didn't understand them completely.
I managed to load a JSON file into Python as some kind of array (or list?).
The JSON looks like this:
{
"sources":[{
"id":100012630,
"name":"Activity Login Page",
"category":"NAM/Activity",
"automaticDateParsing":true,
"multilineProcessingEnabled":false,
"useAutolineMatching":false,
"forceTimeZone":true,
"timeZone":"Europe/Brussels",
"filters":[],
"cutoffTimestamp":1414364400000,
"encoding":"UTF-8",
"pathExpression":"C:\\NamLogs\\nam-login-page.log*",
"blacklist":[],
"sourceType":"LocalFile",
"alive":true
},{
"id":100001824,
"name":"localWinEvent",
"category":"NAM/OS/EventLog",
"automaticDateParsing":true,
"multilineProcessingEnabled":false,
"useAutolineMatching":false,
"forceTimeZone":false,
"filters":[],
"cutoffTimestamp":1409090400000,
"encoding":"UTF-8",
"logNames":["Security","Application","System","Others"],
"sourceType":"LocalWindowsEventLog",
"alive":true
},{
"id":100001830,
"name":"localWinPerf",
"category":"NAM/OS/Perf",
"automaticDateParsing":false,
"multilineProcessingEnabled":false,
"useAutolineMatching":false,
"forceTimeZone":false,
"filters":[],
"cutoffTimestamp":0,
"encoding":"UTF-8",
"interval":60000,
"wmiQueries":[{
"name":"NAMID Service",
"query":"SELECT * FROM Win32_PerfRawData_PerfProc_Process WHERE Name = 'tomcat7'"
},{
"name":"CPU",
"query":"select * from Win32_PerfFormattedData_PerfOS_Processor"
},{
"name":"Logical Disk",
"query":"select * from Win32_PerfFormattedData_PerfDisk_LogicalDisk"
},{
"name":"Physical Disk",
"query":"select * from Win32_PerfFormattedData_PerfDisk_PhysicalDisk"
},{
"name":"Memory",
"query":"select * from Win32_PerfFormattedData_PerfOS_Memory"
},{
"name":"Network",
"query":"select * from Win32_PerfFormattedData_Tcpip_NetworkInterface"
}],
"sourceType":"LocalWindowsPerfMon",
"alive":true
},
Now, as I have hundreds of files like this one, I wrote a loop over the whole directory:
for filename in os.listdir('./json/'):
    with open('./json/'+filename) as data_file:
        sources = json.load(data_file)
Now I need something like another loop, "for each source in sources", which adds a row (or an entry, or whatever a "line" in JSON is called) to every source (something like collectorName=fileName) and then overwrites the old file with the new one.
The JSON would then look like this:
{
"sources":[{
"id":100012630,
"name":"Activity Login Page",
"category":"NAM/Activity",
"automaticDateParsing":true,
"multilineProcessingEnabled":false,
"useAutolineMatching":false,
"forceTimeZone":true,
"timeZone":"Europe/Brussels",
"filters":[],
"cutoffTimestamp":1414364400000,
"encoding":"UTF-8",
"pathExpression":"C:\\NamLogs\\nam-login-page.log*",
"blacklist":[],
"sourceType":"LocalFile",
"alive":true,
"collectorName":"Collector2910"
},{
"id":100001824,
"name":"localWinEvent",
"category":"NAM/OS/EventLog",
"automaticDateParsing":true,
"multilineProcessingEnabled":false,
"useAutolineMatching":false,
"forceTimeZone":false,
"filters":[],
"cutoffTimestamp":1409090400000,
"encoding":"UTF-8",
"logNames":["Security","Application","System","Others"],
"sourceType":"LocalWindowsEventLog",
"alive":true,
"collectorName":"Collector2910"
},{.....
I hope I could explain my issue and I'd be happy if someone could help me out (even with a totally different solution).
Thanks in advance
Michael

Here's one way to do it:
import json
import os

for filename in os.listdir('./json/'):
    with open('./json/'+filename) as data_file:
        sources = json.load(data_file)
    # Number the sources within each file: Collector0, Collector1, ...
    for i, source in enumerate(sources['sources']):
        source['collectorName'] = 'Collector' + str(i)
    with open('./json/'+filename, 'w') as data_file:
        json.dump(sources, data_file)

for filename in os.listdir('./json/'):
    with open('./json/'+filename) as data_file:
        datadict = json.load(data_file)
    # At this point you have a plain python dict.
    # This dict has a 'sources' key, pointing to
    # a list of dicts. What you want is to add
    # a 'collectorName': filename key:value pair
    # to each of these dicts
    for record in datadict["sources"]:
        record["collectorName"] = filename
    # now you just have to serialize your datadict back
    # to json and write it back to the file - which is
    # in fact a single operation
    with open('./json/'+filename, "w") as data_file:
        json.dump(datadict, data_file)
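The expected output in the question shows "Collector2910" rather than a full file name, so the collector name may be the file name without its extension. A minimal sketch under that assumption, using pathlib (the function name add_collector_name and the "*.json" pattern are illustrative choices, not part of the original answers):

```python
import json
from pathlib import Path


def add_collector_name(directory):
    """Add a 'collectorName' key to every source in every *.json file."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: the collector is named after the file, minus ".json".
        for source in data.get("sources", []):
            source["collectorName"] = path.stem
        # Overwrite the original file with the updated content.
        with path.open("w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
```

Calling add_collector_name('./json') would then rewrite each file in place, so it is probably worth trying it on a copy of the directory first.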

Related

Open JSON files and edit structure

I have produced a couple of json files after scraping a few elements. The structure for each file is as follows:
us.json
{'Pres': 'Biden', 'Vice': 'Harris', 'Secretary': 'Blinken'}
uk.json
{'1st Min': 'Johnson', 'Queen':'Elizabeth', 'Prince': 'Charles'}
I'd like to know how I could edit the structure of each dictionary inside the json file to get output like the following:
[
{"title": "Pres",
"name": "Biden"}
,
{"title": "Vice",
"name": "Harris"}
,
{"title": "Secretary",
"name": "Blinken"}
]
As far as I can tell (I'm a beginner and have only been studying for a few weeks), I first need to run a loop to open each file, then generate a list of dictionaries, and finally modify each dictionary to change its structure. This is what I have so far; it is NOT WORKING, as it always overwrites the same keys.
import os
import json
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        text = f.read()
        country_json = json.loads(text)
        list_of_dicts.append(country_json)
for country in list_of_dicts:
    newdict = country
    lastdict = {}
    for key in newdict:
        lastdict = {'Title': key}
    for value in newdict.values():
        lastdict['Name'] = value
    print(lastdict)
Extra bonus if you could also show me how to generate an ID number for each entry. Thank you very much.
This looks like a task for a list comprehension. I would do it the following way:
import json
us = '{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}'
data = json.loads(us)
us2 = [{"title":k,"name":v} for k,v in data.items()]
us2json = json.dumps(us2)
print(us2json)
output
[{"title": "Pres", "name": "Biden"}, {"title": "Vice", "name": "Harris"}, {"title": "Secretary", "name": "Blinken"}]
data is a dict; .items() provides its key-value pairs, which I unpack into k and v (see tuple unpacking).
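For the "extra bonus" (an ID number per entry), one possible approach is to number the items with enumerate inside the same comprehension. This is only a sketch: the helper name to_records and the choice to start counting at 1 are assumptions, not something the question specifies.

```python
import json


def to_records(data, start=1):
    """Turn {"title": "name", ...} into a list of dicts with a running id."""
    return [
        {"id": i, "title": title, "name": name}
        for i, (title, name) in enumerate(data.items(), start=start)
    ]


us = json.loads('{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}')
records = to_records(us)
print(json.dumps(records, indent=2))
```

Because dicts keep insertion order in Python 3.7+, the ids follow the order of the keys in the file.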
You can do this easily by writing a simple function like the one below:
import uuid
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
This splits the items into separate objects and adds an identifier to each one using uuid.
The full code can be modified like this:
import uuid
import os
import json
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        country_json = json.load(f)
    list_of_dicts.append(format_dict(country_json))
# list_of_dicts contains all file contents

Python use .json as config file with class instances

I'm creating an application that stores its config in a dictionary. I know I can write this to a JSON file and read it every time the app starts. But the problem is that this dictionary also contains objects, like so (LED is an imported module with the classes APALedstrip and Arduino):
rooms['livingroom'] = {
    "room":data.room(name = 'livingroom',dataKeys = dataKeys),
    "lights":{
        "LedStrip":LED.APALedstrip(name = 'livingroom',
                                   room = 'livingroom')
    }
}
rooms['bed'] = {
    "room":data.room(name = 'bed', dataKeys = dataKeys),
    "lights":{
        "LedStrip":LED.Arduino(name ='bed',
                               serialPort = 'ttyUSB0',
                               room = 'livingroom',
                               master = {'room':'livingroom', 'light':'LedStrip'},
                               roomSensors = 'livingroom')
    }
}
I'm curious: is it also possible to store this in a JSON file like so? And are the objects still created when it's imported into a dictionary?
You need to serialize your objects. One way is to use "pickle".
Pickle converts an object to bytes, so the next step is to convert those bytes to a string using base64.
I chose base64 because it encodes arbitrary bytes as plain ASCII text, which JSON can store.
To save and retrieve the rooms automatically, use save_rooms() and retrieve_rooms():
import codecs
import json
import pickle
def save_rooms(rooms):
    for room in rooms:
        # find all LedStrip objects
        if 'lights' in rooms[room] and 'LedStrip' in rooms[room]['lights']:
            lights = rooms[room]['lights']['LedStrip']
            # encode object to bytes with pickle and then to string with base64
            rooms[room]['lights']['LedStrip'] = codecs.encode(pickle.dumps(lights),
                                                              "base64").decode()
    with open("rooms.json", "w") as f:
        json.dump(rooms, f)
def retrieve_rooms():
    with open("rooms.json") as f:
        rooms = json.load(f)
    for room in rooms:
        # find all LedStrip objects
        if 'lights' in rooms[room] and 'LedStrip' in rooms[room]['lights']:
            lights = rooms[room]['lights']['LedStrip']
            # decode from string to bytes with base64 and then from bytes to object with pickle
            rooms[room]['lights']['LedStrip'] = pickle.loads(codecs.decode(lights.encode(), "base64"))
    return rooms
rooms = {}
rooms['livingroom'] = {
    "room": data.room(name='livingroom', dataKeys=dataKeys),
    "lights": {
        "LedStrip": LED.APALedstrip(name='livingroom',
                                    room='livingroom')
    }
}
rooms['bed'] = {
    "room": data.room(name='bed', dataKeys=dataKeys),
    "lights": {
        "LedStrip": LED.Arduino(name='bed',
                                serialPort='ttyUSB0',
                                room='livingroom',
                                master={'room': 'livingroom', 'light': 'LedStrip'},
                                roomSensors='livingroom')
    }
}
save_rooms(rooms)
loaded_rooms = retrieve_rooms()
In addition, I implemented the logic so that you can save any variation of rooms, as long as you keep the structure the same.
For example:
rooms['kitchen'] = {
    "room": data.room(name='kitchen', dataKeys=dataKeys),
    "lights": {
        "LedStrip": LED.APALedstrip(name='kitchen',
                                    room='kitchen')
    }
}
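If the config file should stay human-readable, a different option is to store only the class name and constructor arguments in JSON and rebuild the objects on load. The sketch below is an assumption-heavy illustration: the real LED classes are not shown in the question, so APALedstrip and Arduino here are simplified stand-in dataclasses, and REGISTRY, to_config and from_config are hypothetical helper names.

```python
import json
from dataclasses import dataclass, asdict


# Stand-ins for the real LED classes, which are not shown in the question.
@dataclass
class APALedstrip:
    name: str
    room: str


@dataclass
class Arduino:
    name: str
    serialPort: str
    room: str


# Maps the type name stored in JSON back to the class that builds it.
REGISTRY = {cls.__name__: cls for cls in (APALedstrip, Arduino)}


def to_config(obj):
    """Describe an object as plain JSON-friendly data."""
    return {"type": type(obj).__name__, "args": asdict(obj)}


def from_config(entry):
    """Rebuild an object from the description produced by to_config()."""
    return REGISTRY[entry["type"]](**entry["args"])


lights = {
    "livingroom": APALedstrip(name="livingroom", room="livingroom"),
    "bed": Arduino(name="bed", serialPort="ttyUSB0", room="livingroom"),
}

# Serialize to JSON text, then rebuild the objects from that text.
text = json.dumps({room: to_config(obj) for room, obj in lights.items()}, indent=2)
restored = {room: from_config(entry) for room, entry in json.loads(text).items()}
```

With this layout the JSON can be edited by hand, and loading it only calls constructors listed in the registry. For classes that are not dataclasses, to_config would need another way to collect the constructor arguments.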

Dictionary from a String with particular structure

I am using Python 3 to read this file and convert it to a dictionary.
I have this string from a file and would like to know how to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
The text has lines of the form "Name=Something", which I would like to convert as follows:
{'Date':'10/26/2003',
'Time':'09:01:01 AM',
'User':'teodor',
'UserText':'Max Cor',
'UserTextUnicode':'392039n9dj90j32'.......}
The words between [ ] can be removed, like [User], [System], [Build], [Text], etc.
In some fields there is only the first part of the string (the name, with no value):
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into a map:
try (InputStream input = new FileInputStream("your_file_path")) {
    Properties prop = new Properties();
    prop.load(input);
    // prop.getProperty("User") returns "teodor"
} catch (IOException ex) {
    ex.printStackTrace();
}
EDIT:
For a Python solution, refer to the answered question.
You can use configparser to read .ini files, which is the format you have (section names in square brackets followed by key=value lines).
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# dict(config['User']) == {'date': '10/26/2003', 'time': '09:01:01 AM', ...}
# (option names are lower-cased by default; lookups are case-insensitive)
# config['User']['User'] == 'teodor'
# config['System']['Type'] == 'Absolute'
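Since the question asks for one flat dictionary with the original key spelling, the configparser approach could be extended roughly as follows. This is a sketch under a few assumptions: the helper name flatten_ini is made up, the sample text is a shortened copy of the question's data, and keys repeated in different sections would overwrite each other in the flat result.

```python
import configparser

SAMPLE = """\
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
[Colors]
Red=
Blue=
"""


def flatten_ini(text):
    """Merge all sections of INI-style text into one dict, keeping key case."""
    # interpolation=None leaves '%' characters in values untouched;
    # delimiters=("=",) avoids treating ':' as a key/value separator.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep 'Date' instead of lower-casing to 'date'
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        flat.update(parser.items(section))
    return flat


print(flatten_ini(SAMPLE))
```

Lines such as "Red=" end up with an empty string as their value. To read from a file instead, parser.read('your_file_path') can replace the read_string call.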
This can easily be done in Python. Assuming your file is named test.txt:
This also works for lines with nothing after the = as well as lines with multiple =.
d = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove surrounding whitespace and the newline
        key, sep, value = line.partition('=')  # Split at the first `=` only
        if sep:  # Skip lines without `=`, such as [User]
            d[key] = value
print(d)
Output:
{
"Date": "10/26/2003",
"Time": "09:01:01 AM",
"User": "teodor",
"UserText": "Max Cor",
"UserTextUnicode": "392039n9dj90j32",
"Type": "Absolute",
"Dnumber": "QS236",
"Software": "1.1.1.2",
"BuildNr": "0923875",
"Source": "LAM",
"Column": "OWKD",
"StageX": "12345",
"Spotter": "2",
"ApertureX": "0.0098743",
"ApertureY": "0.2431899",
"ShiftXYZ": "-4.234809e-002",
"Text": "Here is the Text files",
"DataBaseNumber": "The database number is 918723"
}
I would suggest doing some cleaning first to get rid of the [] lines.
After that you can split the remaining lines on the "=" separator and build a dictionary from the parts.

how to extract data from json. other answers here did not work for me

I want to extract Mac_Address from this JSON in Python. Can anyone help?
{"list":
[{
"Group_Devices_id":3,
"User_Id":19,
"Mac_Address":" fe80::f17a:4a64:7192:ed68%2 ",
"Master_Device":"T"
}],
"success":true}
- import json
- data = json.loads('''{"list": [{ "Group_Devices_id":3, "User_Id":19,
"Mac_Address":" fe80::f17a:4a64:7192:ed68%2 ", "Master_Device":"T"
}], "success":true}''')
- data['list'][0]['Mac_Address']
Description: import the json library, load your data, then access any element.
import json
data = json.loads('''{"list": [{ "Group_Devices_id":3, "User_Id":19,
"Mac_Address":" fe80::f17a:4a64:7192:ed68%2 ", "Master_Device":"T"
}], "success":true}''')
for item in data["list"]:
    print(item["Mac_Address"])
1) Load the data.
2) If you need to get Mac_Address from a list with many entries,
use a "for ... in ..." loop to process them.
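Note that the Mac_Address value in the question has a space on each side. A small sketch that collects every address and strips that whitespace could look like this (the function name mac_addresses is an illustrative choice, and skipping entries without the key is an assumption about the desired behaviour):

```python
import json

raw = """
{"list": [{"Group_Devices_id": 3, "User_Id": 19,
           "Mac_Address": " fe80::f17a:4a64:7192:ed68%2 ",
           "Master_Device": "T"}],
 "success": true}
"""


def mac_addresses(payload):
    """Return every Mac_Address in the payload, without padding spaces."""
    data = json.loads(payload)
    return [
        device["Mac_Address"].strip()
        for device in data.get("list", [])
        if "Mac_Address" in device
    ]


print(mac_addresses(raw))
```

If the JSON arrives from an HTTP response or a file, only the way raw is obtained changes; the lookup stays the same.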

python json to csv converting script?

Let me start by stating that I am new to Python. I wrote a script that converts a .json file to CSV format. The script does the job, but I don't think it will keep working if the format of the json file were to change: it assumes that the json file will always be in the same format.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
}
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this json file, but it won't work if there is only one item or one fraud reason. Here is the code of my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv
json_data = open(file)
data = json.load(json_data)
json_data.close()
csv_file = '/tmp/' + str(options.orderId) + '.csv'
orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']
o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
    o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
    o.writerow([fraud['fraud_reason']])
I also have not been able to figure out how to avoid hard-coding the labels. I hope someone can help me with this.
Thanks in advance.
You may want to use csv.DictWriter:
import csv
import json

# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
    with open(filename) as fi:
        data = json.load(fi)
    csv_file = '/tmp/' + str(options.orderId) + '.csv'
    orders = data['Order']
    items = data['Items']
    frauds = data['FraudReasons']
    # Here's one way to keep this maintainable if the JSON
    # format changes, and you don't care too much about the
    # order of the fields...
    orders_fields = sorted(orders.keys())
    item_fields = sorted(items[0].keys()) if items else ()
    fraud_fields = sorted(frauds[0].keys()) if frauds else ()
    csv_options = dict(lineterminator=',')
    with open(csv_file, 'w') as fo:
        o = csv.DictWriter(fo, orders_fields, **csv_options)
        o.writeheader()
        o.writerow(orders)
        fo.write('\n') # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, item_fields, **csv_options)
        o.writeheader()
        o.writerows(items)
        fo.write('\n') # Optional, if you want to keep them separated.
        o = csv.DictWriter(fo, fraud_fields, **csv_options)
        o.writeheader()
        o.writerows(frauds)
# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
    main(...) # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
'order_id',
'order_date',
'order_date_shipped',
# ... etc.
)
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of a plain dict (requires Python 2.7):
import json
from collections import OrderedDict as ordereddict
with open('mydata.json') as f:
    data = json.load(f, object_pairs_hook=ordereddict)
orders = data['Order']
print(orders.keys())  # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json, not simplejson, and requires Python 2.7 for the object_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict from PyPI, and you can extend the JSONDecoder class and pass it with the cls argument (see here), instead of object_pairs_hook, but that's uglier and more work...
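The question also worries about files with only one item or one fraud reason. If the producer emits a lone object instead of a list in that case (an assumption; the question does not show such a file), the value can be normalised before writing. The sketch below targets Python 3 rather than the 2.4 mentioned above, and as_list and write_section are hypothetical helper names.

```python
import csv
import io


def as_list(value):
    """Wrap a lone dict in a list so callers can always iterate."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def write_section(out, rows):
    """Write one CSV section whose header comes from the first row's keys."""
    rows = as_list(rows)
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)


# Shortened sample data; "Items" is a single object here, not a list.
data = {
    "Order": {"order_id": "8251662", "order_status": "fraudreview"},
    "Items": {"item_id": "25379", "item_quantity": "2"},
    "FraudReasons": [{"order_id": "11957",
                      "fraud_reason": "order exceeds max item count"}],
}

buffer = io.StringIO()
for key in ("Order", "Items", "FraudReasons"):
    write_section(buffer, data.get(key))
print(buffer.getvalue())
```

Taking the header from the first row keeps the script independent of the exact field list, at the cost of ignoring fields that only appear in later rows.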
