How to add nested fields to a YAML file - python

I need to modify a YAML file and add several fields. I am using the ruamel.yaml package.
First I load the YAML file:
data = yaml.load(file_name)
I can easily add new simple fields, like-
data['prop1'] = "value1"
The problem I face is that I need to add a nested dictionary that contains a list:
prop2:
  prop3:
    - prop4:
        prop5: "Some title"
        prop6: "Some more data"
I tried to define-
record_to_add = dict(prop2 = dict(prop3 = ['prop4']))
This works, but when I try to add prop5 beneath it, it fails-
record_to_add = dict(prop2 = dict(prop3 = ['prop4'= dict(prop5 = "Value")]))
I get
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
What am I doing wrong?

The problem has little to do with ruamel.yaml. This:
['prop4'= dict(prop5 = "Value")]
is invalid Python, as a list ([ ]) expects comma-separated values. You would need to use something like:
record_to_add = dict(prop2=dict(prop3=[dict(prop4=dict(prop5="Some title", prop6="Some more data"))]))
As your program is incomplete I am not sure if you are using the old API or not. Make sure to use
import ruamel.yaml
yaml = ruamel.yaml.YAML()
and not
import ruamel.yaml as yaml
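For reference, here is a minimal end-to-end sketch with the new API (the file name input.yaml is an assumption for illustration):
import ruamel.yaml

yaml = ruamel.yaml.YAML()
with open('input.yaml') as f:
    data = yaml.load(f)

# prop3 maps to a list containing one mapping (prop4),
# which in turn holds prop5 and prop6
data['prop2'] = dict(prop3=[dict(prop4=dict(prop5='Some title', prop6='Some more data'))])

with open('input.yaml', 'w') as f:
    yaml.dump(data, f)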

It's because of having ['prop4'= <>]. Instead, record_to_add = dict(prop2 = dict(prop3 = [dict(prop4 = dict(prop5 = "Value"))])) should work.
Another alternative would be:
import yaml
data = {
    "prop2": {
        "prop3": [
            {"prop4": {
                "prop5": "some title",
                "prop6": "some more data"
            }}
        ]
    }
}
with open(filename, 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

Related

A regex for CSV parsing? Python3 Re module

Is there a regex (Python re compatible) that I can use for parsing csv?
EDIT: I didn't realize there was a csv module in Python's standard library
Here's the regex: (?<!,\"\w)\s*,(?!\w\s*\",). It's Python-compatible and JavaScript-compatible. Here's the full parsing script (as a Python function):
def parseCSV(csvDoc, output_type="dict"):
    """
    To parse csv files.
    Arguments:
    csvDoc - The csv document to parse.
    output_type - the output type this function will return
    """
    from re import compile as c
    from json import dumps
    from numpy import array
    # This is where all the parsing happens
    csvparser = c('(?<!,\"\\w)\\s*,(?!\\w\\s*\",)')
    lines = str(csvDoc).split('\n')
    # Keep only the lines that are not empty
    necessary_lines = [line for line in lines if line != ""]
    All = array([csvparser.split(line) for line in necessary_lines])
    if output_type.lower() in ("dict", "json"):  # If you want JSON or dict
        # All the python dict keys required (the top row of the file)
        top_line = list(All[0])
        main_table = {}  # The parsed data will be here
        main_table[top_line[0]] = {
            name[0]: {
                thing: name[
                    # The 'actual value' counterpart
                    top_line.index(thing)
                ] for thing in top_line[1:]  # The requirements
            } for name in All[1:]
        }
        return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
    elif output_type.lower() in ("list", "numpy", "array", "matrix",
                                 "np.array", "np.ndarray",
                                 "numpy.array", "numpy.ndarray"):
        return All
    else:
        # Fall back to the dict/JSON output
        top_line = list(All[0])
        main_table = {}  # The parsed data will be here
        main_table[top_line[0]] = {
            name[0]: {
                thing: name[
                    top_line.index(thing)
                ] for thing in top_line[1:]
            } for name in All[1:]
        }
        return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
Dependencies: NumPy
All you need to do is pass in the raw text of the csv file, and the function will return JSON (or a 2-dimensional array if you wish) in this format:
{"top-left-corner name":{
"foo":{"Item 1 left to foo":"Item 2 of the top row",
"Item 2 left to foo":"Item 3 of the top row",
...}
"bar":{...}
}
}
And here's an example of it:
CSV.csv
foo,bar,zbar
foo_row,foo1,,
barie,"2,000",,
and it outputs:
{
 "foo": {
  "foo_row": {
   "bar": "foo1",
   "zbar": ""
  },
  "barie": {
   "bar": "\"2,000\"",
   "zbar": ""
  }
 }
}
It should work if your csv file is formatted correctly (the ones I tested were made by Apple's Numbers).
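As the EDIT above notes, the standard library already ships a csv module; for comparison, a minimal sketch that also handles quoted fields such as "2,000" without any regex:
import csv
import io

raw = 'foo,bar,zbar\nfoo_row,foo1,,\nbarie,"2,000",,\n'
for row in csv.reader(io.StringIO(raw)):
    print(row)
# ['foo', 'bar', 'zbar']
# ['foo_row', 'foo1', '', '']
# ['barie', '2,000', '', '']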

Dictionary from a String with particular structure

I am using Python 3 to read this file and convert it to a dictionary.
I have this string from a file and I would like to know how I could create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
In the text I have "Name=Something" pairs, and I would like to convert them as follows:
{'Date': '10/26/2003',
 'Time': '09:01:01 AM',
 'User': 'teodor',
 'UserText': 'Max Cor',
 'UserTextUnicode': '392039n9dj90j32', ...}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example (Java) to read the values into a map:
try (InputStream input = new FileInputStream("your_file_path")) {
    Properties prop = new Properties();
    prop.load(input);
    // prop.getProperty("User") == "teodor"
} catch (IOException ex) {
    ex.printStackTrace();
}
EDIT: For a Python solution, you can use configparser to read .ini or .properties files (the format you have):
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# config['User'] == {'Date': '10/26/2003', 'Time': '09:01:01 AM'...}
# config['User']['User'] == 'teodor'
# config['System'] == {'Type': 'Absolute', ...}
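If you want a single flat dictionary with the [section] names dropped, as in the desired output above, a minimal sketch (assuming the file path is your_file_path):
import configparser

config = configparser.ConfigParser()
config.optionxform = str  # configparser lower-cases keys by default; this keeps their case
config.read('your_file_path')

d = {}
for section in config.sections():
    d.update(config[section])
print(d)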
This can easily be done in Python. Assuming your file is named test.txt, the following will also work for lines with nothing after the = as well as lines with multiple = signs.
d = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove any space or newline characters
        parts = line.split('=')  # Split around the `=`
        if len(parts) > 1:
            d[parts[0]] = '='.join(parts[1:])  # Rejoin any extra `=` into the value
print(d)
Output:
{
    "Date": "10/26/2003",
    "Time": "09:01:01 AM",
    "User": "teodor",
    "UserText": "Max Cor",
    "UserTextUnicode": "392039n9dj90j32",
    "Type": "Absolute",
    "Dnumber": "QS236",
    "Software": "1.1.1.2",
    "BuildNr": "0923875",
    "Source": "LAM",
    "Column": "OWKD",
    "StageX": "12345",
    "Spotter": "2",
    "ApertureX": "0.0098743",
    "ApertureY": "0.2431899",
    "ShiftXYZ": "-4.234809e-002",
    "Text": "Here is the Text files",
    "DataBaseNumber": "The database number is 918723"
}
I would suggest doing some cleaning to get rid of the [] lines.
After that you can split those lines on the "=" separator and then convert the pairs to a dictionary, as sketched below.
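A minimal sketch of that approach (assuming the file is named test.txt): skip the [section] lines, split each remaining line on the first '=' only, and build the dictionary in one pass.
with open('test.txt') as f:
    lines = [line.strip() for line in f]
pairs = [line.split('=', 1) for line in lines
         if '=' in line and not line.startswith('[')]
d = dict(pairs)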

Editing a JSON file with new information in Python

I'm trying to update a meta file with new information generated by a Python script. I don't want to just append the information as a new JSON object, but rather update the information that was read in.
As input I have something like this:
{
    "foo1": [
        {
            "bar1": 0,
            "bar2": 1337
        },
        ...
}
So far my code reads the information and stores it in a dictionary. After that, the information in the file is deleted and replaced with the updated dictionary. The code is shown below:
...
outputData = {"foo2": [{"bar3": True, "bar4": 123}]}
with open(metaFile, 'r+') as f:
    metaData = json.load(f)
    f.seek(0)
    f.truncate()
    metaData.update(outputData)
    f.write(json.dumps(metaData, indent=2))
    f.close()
...
As a result this comes out as expected:
{
  "foo1": [
    {
      "bar1": 0,
      "bar2": 1337
    }
  ],
  "foo2": [
    {
      "bar3": true,
      "bar4": 123
    }
  ]
}
Now to my exact question: is it possible to edit the file in such a way that the content of the file doesn't get deleted first and then written again? Because if something happens to the metaData after initialization, the information is simply gone.
Changing the 'r+' argument to 'w+' (the + is optional) will create a new file instead of reading from it first, so the whole data is gone at that point. With 'a' the outputData cannot be updated and then added, because it would re-write the already given information. Without updating the metaData it would just create a new object, and that's not what I had in mind.
In your case, if you are sure the file after your changes will always be equal or bigger in size than what's currently in the file, you can call f.write(data) directly.
This way, you don't have to truncate (and lose) the file contents before writing.
Also, when you open a file using the with syntax, it will be automatically closed once the with block ends.
In the end your code would look something like this:
outputData = {"foo2": [{"bar3": True, "bar4": 123}]}
with open(metaFile, 'r+') as f:
metaData = json.load(f)
f.seek(0)
metaData.update(outputData)
f.write(json.dumps(metaData, indent=2))
# rest of your code with the normal identation level
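If you cannot guarantee the new content will be at least as large as the old, one variant (a sketch, not from the original answer) is to truncate only after writing, so the old bytes are discarded once the new JSON is already in place:
outputData = {"foo2": [{"bar3": True, "bar4": 123}]}
with open(metaFile, 'r+') as f:
    metaData = json.load(f)
    metaData.update(outputData)
    f.seek(0)
    f.write(json.dumps(metaData, indent=2))
    f.truncate()  # drop any leftover bytes from the old, longer content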
You can code as follows, or alternatively use MongoDB.
outputData = {"foo2": [{"bar3": True, "bar4": 123}]}
with open(metaFile, 'r+') as fp:
    origin = fp.read()
    target = json.dumps(dict(json.loads(origin), **outputData), indent=2)
    # find the first position where the old and new content differ
    index = [i for i, (a, b) in enumerate(zip(origin, target)) if a != b][0]
    fp.seek(index)
    fp.truncate()
    fp.write(target[index:])
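Another common pattern, not from the answers above but worth noting: write to a temporary file and atomically swap it in, so the original file stays intact until the new one is complete (a sketch, assuming metaFile holds the path):
import json
import os
import tempfile

outputData = {"foo2": [{"bar3": True, "bar4": 123}]}

with open(metaFile) as f:
    metaData = json.load(f)
metaData.update(outputData)

# write the new content next to the original, then atomically replace it
fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(metaFile)))
with os.fdopen(fd, 'w') as tmp:
    json.dump(metaData, tmp, indent=2)
os.replace(tmp_path, metaFile)  # atomic on POSIX and Windows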

python - extract a specific key / value from json file by a variable

I'm a bit new to Python and JSON.
I have this JSON file:
{ "hosts": {
"example1.lab.com" : ["mysql", "apache"],
"example2.lab.com" : ["sqlite", "nmap"],
"example3.lab.com" : ["vim", "bind9"]
}
}
What I want to do is use the hostname variable and extract the values for each hostname.
It's a bit hard to explain, but I'm using SaltStack, which already iterates over hosts, and I want it to be able to extract each host's values from the JSON file using the hostname variable.
Hope I'm understood.
Thanks,
O.
You could do something along these lines:
import json

j = '''{ "hosts": {
    "example1.lab.com" : ["mysql", "apache"],
    "example2.lab.com" : ["sqlite", "nmap"],
    "example3.lab.com" : ["vim", "bind9"]
    }
}'''
specific_key = 'example2'
found = False
for key, di in json.loads(j).iteritems():  # items on Py 3k
    for k, v in di.items():
        if k.startswith(specific_key):
            found = True
            print k, v
            break
    if found:
        break
Or, you could do:
def pairs(args):
    for arg in args:
        if arg[0].startswith(specific_key):
            k, v = arg
            print k, v

json.loads(j, object_pairs_hook=pairs)
In either case, this prints:
example2.lab.com [u'sqlite', u'nmap']
If you have the JSON in a string, then just use Python's json.loads() function to parse the JSON and load its contents into your namespace by binding it to a local name.
Example:
#!/bin/env python
import json

some_json = '''{ "hosts": {
    "example1.lab.com" : ["mysql", "apache"],
    "example2.lab.com" : ["sqlite", "nmap"],
    "example3.lab.com" : ["vim", "bind9"]
    }
}'''

some_stuff = json.loads(some_json)
print some_stuff['hosts'].keys()
---> [u'example1.lab.com', u'example3.lab.com', u'example2.lab.com']
As shown, you then access the contents of some_stuff just as you would any other Python dictionary ... all the top-level variable declarations/assignments which were serialized (encoded) in the JSON will be keys in that dictionary.
If the JSON contents are in a file, you can open it like any other file in Python and pass the file object to the json.load() function:
#!/bin/python
import json

with open("some_file.json") as f:
    some_stuff = json.load(f)
print ' '.join(some_stuff.keys())
If the above JSON file is stored as 'samplefile.json', you can write the following in Python:
import json

f = open('samplefile.json')
data = json.load(f)

value1 = data['hosts']['example1.lab.com']
value2 = data['hosts']['example2.lab.com']
value3 = data['hosts']['example3.lab.com']
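Since the question asks for a lookup driven by a hostname variable, a minimal sketch (the hostname value below is hypothetical; in SaltStack it would be supplied per host by your iteration):
import json

with open('samplefile.json') as f:
    data = json.load(f)

hostname = 'example2.lab.com'  # hypothetical; comes from your per-host loop
packages = data['hosts'].get(hostname, [])
print(packages)  # ['sqlite', 'nmap']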

python json to csv converting script?

Let me start by stating that I am new to Python. I wrote a script that will convert a .json file to csv format. I managed to write a script that does the job; however, I don't think my script will work if the format of the json file were to change. My script assumes that the json file will be in the same format at all times.
<json file example>
{
"Order":
{
"order_id":"8251662",
"order_date":"2012-08-20 13:17:37",
"order_date_shipped":"0000-00-00 00:00:00",
"order_status":"fraudreview",
"order_ship_firstname":"pam",
"order_ship_lastname":"Gregorio",
"order_ship_address1":"1533 E. Dexter St",
"order_ship_address2":"",
"order_ship_city":"Covina",
"order_ship_state":"CA",
"order_ship_zip":"91746",
"order_ship_country":"US United States",
"order_ship_phone":"6268936923",
"order_ship_email":"pgregorio#brighton.com",
"order_bill_firstname":"pam",
"order_bill_lastname":"Gregorio",
"order_bill_address1":"1533 E. Dexter St",
"order_bill_address2":"",
"order_bill_city":"Covina",
"order_bill_state":"CA",
"order_bill_zip":"91746",
"order_bill_country":"US United States",
"order_bill_phone":"6268936923",
"order_bill_email":"pgregorio#brighton.com",
"order_gift_message":"",
"order_giftwrap":"0",
"order_gift_charge":"0",
"order_shipping":"Standard (Within 5-10 Business Days)",
"order_tax_charge":"62.83",
"order_tax_shipping":"0",
"order_tax_rate":"0.0875",
"order_shipping_charge":"7.5",
"order_total":"788.33",
"order_item_count":"12",
"order_tracking":"",
"order_carrier":"1"
},
"Items":
[
{
"item_id":"25379",
"item_date_shipped":"",
"item_code":"17345-J3553-J35532",
"item_quantity":"2","item_taxable":"YES",
"item_unit_price":"32","item_shipping":"0.67",
"item_addcharge_price":"0",
"item_description":" ABC Slide Bracelet: : Size: OS: Silver Sku: J35532",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17345",
"item_product_kit_id":"0",
"item_product_sku":"J35532",
"item_product_barcode":"881934310775",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25382",
"item_date_shipped":"",
"item_code":"17608-J3809-J3809C",
"item_quantity":"1",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.23",
"item_addcharge_price":"0",
"item_description":" \"ABC Starter Bracelet 7 1\/4\"\"\": : Size: OS: Silver Sku: J3809C",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17608",
"item_product_kit_id":"0",
"item_product_sku":"J3809C",
"item_product_barcode":"881934594175",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25385",
"item_date_shipped":"",
"item_code":"17687-J9200-J92000",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"12",
"item_shipping":"0.25",
"item_addcharge_price":"0",
"item_description":" ABC Cathedral Bead: : Size: OS: Silver Sku: J92000",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17687",
"item_product_kit_id":"0",
"item_product_sku":"J92000",
"item_product_barcode":"881934602832",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
},
{
"item_id":"25388",
"item_date_shipped":"",
"item_code":"17766-J9240-J92402",
"item_quantity":"2",
"item_taxable":"YES",
"item_unit_price":"22",
"item_shipping":"0.46",
"item_addcharge_price":"0",
"item_description":" ABC Ice Diva Bead: : Size: OS: Silver Sku: J92402",
"item_quantity_returned":"0",
"item_quantity_shipped":"0",
"item_quantity_canceled":"0",
"item_status":"pending",
"item_product_id":"17766",
"item_product_kit_id":"0",
"item_product_sku":"J92402",
"item_product_barcode":"881934655838",
"item_tracking":"",
"item_carrier":"0",
"item_source_orderid":""
}
],
"FraudReasons":
[
{
"order_id":"11957",
"fraud_reason":"order total exceeds max amount"
},
{
"order_id":"11957",
"fraud_reason":"order exceeds max item count"
}
]
}
My script currently works fine with this JSON file, but it won't work if there is only one item or one fraud reason. Here is the code of my script.
<script code>
#!/usr/bin/python
import simplejson as json
import optparse
import pycurl
import sys
import csv

json_data = open(file)
data = json.load(json_data)
json_data.close()

csv_file = '/tmp/' + str(options.orderId) + '.csv'

orders = data['Order']
items = data['Items']
frauds = data['FraudReasons']

o = csv.writer(open(csv_file, 'w'), lineterminator=',')
o.writerow([orders['order_id'],orders['order_date'],orders['order_date_shipped'],orders['order_status'],orders['order_ship_firstname'],orders['order_ship_lastname'],orders['order_ship_address1'],orders['order_ship_address2'],orders['order_ship_city'],orders['order_ship_state'],orders['order_ship_zip'],orders['order_ship_country'],orders['order_ship_phone'],orders['order_ship_email'],orders['order_bill_firstname'],orders['order_bill_lastname'],orders['order_bill_address1'],orders['order_bill_address2'],orders['order_bill_city'],orders['order_bill_state'],orders['order_bill_zip'],orders['order_bill_country'],orders['order_bill_phone'],orders['order_bill_email'],orders['order_gift_message'],orders['order_giftwrap'],orders['order_gift_charge'],orders['order_shipping'],orders['order_tax_charge'],orders['order_tax_shipping'],orders['order_tax_rate'],orders['order_shipping_charge'],orders['order_total'],orders['order_item_count'],orders['order_tracking'],orders['order_carrier']])
for item in items:
    o.writerow([item['item_id'],item['item_date_shipped'],item['item_code'],item['item_quantity'],item['item_taxable'],item['item_unit_price'],item['item_shipping'],item['item_addcharge_price'],item['item_description'],item['item_quantity_returned'],item['item_quantity_shipped'],item['item_quantity_canceled'],item['item_status'],item['item_product_id'],item['item_product_kit_id'],item['item_product_sku'],item['item_product_barcode'],item['item_tracking'],item['item_carrier'],item['item_source_orderid']])
for fraud in frauds:
    o.writerow([fraud['fraud_reason']])
I also have not been able to figure out how to avoid hard-coding the field labels. I hope someone can help me with this.
Thanks in advance.
You may want to use csv.DictWriter:
# It's considered best to stash the main logic of your script
# in a main() function like this.
def main(filename, options):
    with open(filename) as fi:
        data = json.load(fi)

    csv_file = '/tmp/' + str(options.orderId) + '.csv'

    orders = data['Order']
    items = data['Items']
    frauds = data['FraudReasons']

    # Here's one way to keep this maintainable if the JSON
    # format changes, and you don't care too much about the
    # order of the fields...
    orders_fields = sorted(orders.keys())
    item_fields = sorted(items[0].keys()) if items else ()
    fraud_fields = sorted(frauds[0].keys()) if frauds else ()

    csv_options = dict(lineterminator=',')

    with open(csv_file, 'w') as fo:
        o = csv.DictWriter(fo, orders_fields, **csv_options)
        o.writeheader()
        o.writerow(orders)
        fo.write('\n')  # Optional, if you want to keep them separated.

        o = csv.DictWriter(fo, item_fields, **csv_options)
        o.writeheader()
        o.writerows(items)
        fo.write('\n')  # Optional, if you want to keep them separated.

        o = csv.DictWriter(fo, fraud_fields, **csv_options)
        o.writeheader()
        o.writerows(frauds)

# If this script is run from the command line, just run
# main(). Here's the place to use `optparse`.
if __name__ == '__main__':
    main(...)  # You'll need to fill in the main() arguments...
If you need to specify the order of fields, assign them to a tuple like this:
orders_fields = (
    'order_id',
    'order_date',
    'order_date_shipped',
    # ... etc.
)
You should ask the json-generated object (data) for the names of the fields. To retain the input order, tell json to use collections.OrderedDict instead of plain dict (requires python 2.7):
import json
from collections import OrderedDict as ordereddict

data = json.load(open('mydata.json'), object_pairs_hook=ordereddict)
orders = data['Order']
print orders.keys()  # Will print the keys in the order they were read
You can then use orders.keys() instead of your hard-coded list, either with writerow or (simpler) with csv.DictWriter.
Note that this uses the default json module, not simplejson, and requires Python 2.7 for the object_pairs_hook argument and the OrderedDict type.
Edit: Yeah, I see from the comments that you're stuck with 2.4. You can download an ordereddict backport from PyPI, and you can extend the JSONDecoder class and pass it with the cls argument (see here) instead of object_pairs_hook, but that's uglier and more work...
