How to iterate through block devices using Python

I have around 10 EBS volumes attached to a single instance. Below is an example of lsblk output for some of them. We can't simply mount xvdf or xvdp to some location; the actual mount targets are the partitions such as xvdf1, xvdf2, and xvdp1. I want a script that would allow me to iterate through all the entries under xvdf, xvdp, etc. using Python. I'm a newbie to Python.
[root@ip-172-31-1-65 ec2-user]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdf 202:80 0 35G 0 disk
├─xvdf1 202:81 0 350M 0 part
└─xvdf2 202:82 0 34.7G 0 part
xvdp 202:0 0 8G 0 disk
└─xvdp1 202:1 0 8G 0 part

If you have a relatively new lsblk, you can easily import its JSON output into a Python dictionary, which opens up all possibilities for iteration.
# lsblk --version
lsblk from util-linux 2.28.2
For example, you could run the following command to gather all block devices and their children with their name and mount point. Use --help to get a list of all supported columns.
# lsblk --json -o NAME,MOUNTPOINT
{
   "blockdevices": [
      {"name": "vda", "mountpoint": null,
         "children": [
            {"name": "vda1", "mountpoint": null,
               "children": [
                  {"name": "pv-root", "mountpoint": "/"},
                  {"name": "pv-var", "mountpoint": "/var"},
                  {"name": "pv-swap", "mountpoint": "[SWAP]"}
               ]
            }
         ]
      }
   ]
}
So you just have to pipe that output into a file and use python's json parser. Or run the command straight within your script as the example below shows:
#!/usr/bin/python3.7

import json
import subprocess

process = subprocess.run("/usr/bin/lsblk --json -o NAME,MOUNTPOINT".split(),
                         capture_output=True, text=True)

# blockdevices is a dictionary with all the info from lsblk.
# Manipulate it as you wish.
blockdevices = json.loads(process.stdout)

print(json.dumps(blockdevices, indent=4))
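For instance (a minimal sketch, not part of the original answer), you could walk the parsed tree and collect each partition name with its mount point; the key names match the -o NAME,MOUNTPOINT columns requested above:

# Sketch: recursively walk the "blockdevices" tree parsed above and yield
# (name, mountpoint) for every child device (partitions, LVM volumes, ...).
def iter_children(devices):
    for dev in devices:
        for child in dev.get("children", []):
            yield child["name"], child.get("mountpoint")
            # descend in case the child has children of its own (e.g. LVM)
            yield from iter_children([child])

for name, mountpoint in iter_children(blockdevices["blockdevices"]):
    print(name, mountpoint)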

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

def parse(file_name):
    result = []
    with open(file_name) as input_file:
        for line in input_file:
            temp_arr = line.split(' ')
            for item in temp_arr:
                if '└─' in item or '├─' in item:
                    result.append(item.replace('└─', '').replace('├─', ''))
    return result

def main(argv):
    if len(argv) != 1:
        print 'Usage: ./parse.py input_file'
        return
    result = parse(argv[0])
    print result

if __name__ == "__main__":
    main(sys.argv[1:])
The above is what you need; you can modify it to parse the output of lsblk better.
Usage:
1. Save the output of lsblk to a file, e.g. run: lsblk > output.txt
2. Run: python parse.py output.txt

I remixed minhhn2910's answer for my own purposes to work with encrypted partitions and labels, and to build the output in a tree-like dict object. I'll probably keep a more updated version on GitHub as I hit edge cases, but here is the basic code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import re
import pprint

def parse_blk(blk_filename):
    with open(blk_filename) as blk_file:
        disks = []
        for line in blk_file:
            if line.startswith('NAME'):  # skip first line
                continue
            blk_list = re.split(r'\s+', line)
            node_type = blk_list[5]
            node_size = blk_list[3]
            if node_type in set(['disk', 'loop']):
                # new disk
                disk = {'name': blk_list[0], 'type': node_type, 'size': node_size}
                if node_type == 'disk':
                    disk['partitions'] = []
                disks.append(disk)
                # get size info if relevant
                continue
            if node_type in set(['part', 'dm']):
                # new partition (or whatever dm is)
                node_name = blk_list[0].split('\x80')[1]
                partition = {'name': node_name, 'type': node_type, 'size': node_size}
                disk['partitions'].append(partition)
                continue
            if len(blk_list) > 8:  # if node_type == 'crypt':
                # crypt belonging to a partition
                node_name = blk_list[1].split('\x80')[1]
                partition['crypt'] = node_name
    return disks

def main(argv):
    if len(argv) != 1:
        print 'Usage: ./parse.py blk_filename'
        return
    result = parse_blk(argv[0])
    pprint.PrettyPrinter(indent=4).pprint(result)

if __name__ == "__main__":
    main(sys.argv[1:])
It works for your output as well:
$ python check_partitions.py blkout2.txt
[ { 'name': 'xvdf',
'partitions': [ { 'name': 'xvdf1', 'size': '350M', 'type': 'part'},
{ 'name': 'xvdf2', 'size': '34.7G', 'type': 'part'}],
'size': '35G',
'type': 'disk'},
{ 'name': 'xvdp',
'partitions': [{ 'name': 'xvdp1', 'size': '8G', 'type': 'part'}],
'size': '8G',
'type': 'disk'}]
This is how it works on a slightly more complicated scenario with docker loopback devices and encrypted partitions.
$ python check_partitions.py blkout.txt
[ { 'name': 'sda',
'partitions': [ { 'crypt': 'cloudfleet-swap',
'name': 'sda1',
'size': '2G',
'type': 'part'},
{ 'crypt': 'cloudfleet-storage',
'name': 'sda2',
'size': '27.7G',
'type': 'part'}],
'size': '29.7G',
'type': 'disk'},
{ 'name': 'loop0', 'size': '100G', 'type': 'loop'},
{ 'name': 'loop1', 'size': '2G', 'type': 'loop'},
{ 'name': 'mmcblk0',
'partitions': [{ 'name': 'mmcblk0p1',
'size': '7.4G',
'type': 'part'}],
'size': '7.4G',
'type': 'disk'}]


How do I pretty-print a JSON file in Python?
Use the indent= parameter of json.dump() or json.dumps() to specify how many spaces to indent by:
>>> import json
>>> your_json = '["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4))
[
    "foo",
    {
        "bar": [
            "baz",
            null,
            1.0,
            2
        ]
    }
]
To parse a file, use json.load():
with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)
You can do this on the command line:
python3 -m json.tool some.json
(as already mentioned in the comments to the question; thanks to @Kai Petzke for the python3 suggestion).
Actually python is not my favourite tool as far as JSON processing on the command line is concerned. For simple pretty printing it's ok, but if you want to manipulate the JSON it can become overcomplicated. You'd soon need to write a separate script file, and you could end up with maps whose keys are u"some-key" (Python unicode), which makes selecting fields more difficult and doesn't really go in the direction of pretty-printing.
You can also use jq:
jq . some.json
and you get colors as a bonus (and way easier extendability).
Addendum: There is some confusion in the comments about using jq to process large JSON files on the one hand, and having a very large jq program on the other. For pretty-printing a file consisting of a single large JSON entity, the practical limitation is RAM. For pretty-printing a 2GB file consisting of a single array of real-world data, the "maximum resident set size" required for pretty-printing was 5GB (whether using jq 1.5 or 1.6). Note also that jq can be used from within python after pip install jq.
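If you do want to drive jq from Python without relying on the library bindings, a simple option (a sketch, assuming the jq binary is installed and on the PATH) is to shell out to the jq CLI:

import json
import subprocess

doc = {"foo": "bar", "nums": [1, 2, 3]}
# Pipe the serialized JSON through `jq .` and capture the pretty-printed result.
result = subprocess.run(["jq", "."], input=json.dumps(doc),
                        capture_output=True, text=True, check=True)
print(result.stdout)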
After reading the data with the json standard library module, use the pprint standard library module to display the parsed data. Example:
import json
import pprint

json_data = None
with open('file_name.txt', 'r') as f:
    data = f.read()
    json_data = json.loads(data)

pprint.pprint(json_data)
The output will look like:
{'address': {'city': 'New York',
'postalCode': '10021-3100',
'state': 'NY',
'streetAddress': '21 2nd Street'},
'age': 27,
'children': [],
'firstName': 'John',
'isAlive': True,
'lastName': 'Smith'}
Note that this output is not valid JSON; while it shows the content of the Python data structure with nice formatting, it uses Python syntax to do so. In particular, strings are (usually) enclosed in single quotes, whereas JSON requires double quotes. To rewrite the data to a JSON file, use pprint.pformat:
pretty_print_json = pprint.pformat(json_data).replace("'", '"')

with open('file_name.json', 'w') as f:
    f.write(pretty_print_json)
Pygmentize is a powerful tool for coloring the output of terminal commands.
Here is an example of using it to add syntax highlighting to the json.tool output:
echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json
In a previous Stack Overflow answer, I show in detail how to install and use pygmentize.
Use this function and don't sweat having to remember if your JSON is a str or dict again - just look at the pretty print:
import json

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

pp_json(your_json_string_or_dict)
Use pprint: https://docs.python.org/3.6/library/pprint.html
import pprint
pprint.pprint(json)
print() compared to pprint.pprint()
print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}
pprint.pprint(json)
{'bozo': 0,
'encoding': 'utf-8',
'entries': [],
'feed': {'link': 'https://www.w3schools.com',
'links': [{'href': 'https://www.w3schools.com',
'rel': 'alternate',
'type': 'text/html'}],
'subtitle': 'Free web building tutorials',
'subtitle_detail': {'base': '',
'language': None,
'type': 'text/html',
'value': 'Free web building tutorials'},
'title': 'W3Schools Home Page',
'title_detail': {'base': '',
'language': None,
'type': 'text/plain',
'value': 'W3Schools Home Page'}},
'namespaces': {},
'version': 'rss20'}
To pretty print from the command line and have control over the indentation etc., you can set up an alias similar to this:
alias jsonpp="python -c 'import sys, json; print json.dumps(json.load(sys.stdin), sort_keys=True, indent=2)'"
And then use the alias in one of these ways:
cat myfile.json | jsonpp
jsonpp < myfile.json
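Note that the alias above uses the Python 2 print statement. If your default python is Python 3 (where print is a function), the same idea can be written as:

alias jsonpp='python3 -c "import sys, json; print(json.dumps(json.load(sys.stdin), sort_keys=True, indent=2))"'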
You could try pprintjson.
Installation
$ pip3 install pprintjson
Usage
Pretty print JSON from a file using the pprintjson CLI.
$ pprintjson "./path/to/file.json"
Pretty print JSON from stdin using the pprintjson CLI.
$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson
Pretty print JSON from a string using the pprintjson CLI.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'
Pretty print JSON from a string with an indent of 1.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1
Pretty print JSON from a string and save output to a file output.json.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json
import json

def saveJson(data, fileToSave):
    with open(fileToSave, 'w+') as fp:
        json.dump(data, fp, ensure_ascii=True, indent=4, sort_keys=True)
It works for displaying the data or saving it to a file.
Here's a simple example of pretty printing JSON to the console in a nice way in Python, without requiring the JSON to be on your computer as a local file:
import pprint
import json
from urllib.request import urlopen # (Only used to get this example)
# Getting a JSON example for this example
r = urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json")
text = r.read()
# To print it
pprint.pprint(json.loads(text))
TL;DR: many ways, also consider print(yaml.dump(j, sort_keys=False))
For most uses, indent should do it:
print(json.dumps(parsed, indent=2))
A JSON structure is basically a tree structure.
While trying to find something fancier, I came across this nice blog post depicting other forms of nice trees that might be interesting: https://blog.ouseful.info/2021/07/13/exploring-the-hierarchical-structure-of-dataframes-and-csv-data/.
It has some interactive trees and even comes with some code, including a collapsing tree from SO.
Other samples include using plotly. Here is the code example from plotly:
import plotly.express as px

fig = px.treemap(
    names = ["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parents = ["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"]
)
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
And using treelib. On that note, this GitHub repo also provides nice visualizations. Here is one example using treelib:
#%pip install treelib
from treelib import Tree

country_tree = Tree()
# Create a root node
country_tree.create_node("Country", "countries")

# Group by country
for country, regions in wards_df.head(5).groupby(["CTRY17NM", "CTRY17CD"]):
    # Generate a node for each country
    country_tree.create_node(country[0], country[1], parent="countries")
    # Group by region
    for region, las in regions.groupby(["GOR10NM", "GOR10CD"]):
        # Generate a node for each region
        country_tree.create_node(region[0], region[1], parent=country[1])
        # Group by local authority
        for la, wards in las.groupby(['LAD17NM', 'LAD17CD']):
            # Create a node for each local authority
            country_tree.create_node(la[0], la[1], parent=region[1])
            for ward, _ in wards.groupby(['WD17NM', 'WD17CD']):
                # Create a leaf node for each ward
                country_tree.create_node(ward[0], ward[1], parent=la[1])

# Output the hierarchical data
country_tree.show()
Based on this, I have created a function to convert JSON to a tree:
from treelib import Node, Tree, node

def create_node(tree, s, counter_byref, verbose, parent_id=None):
    node_id = counter_byref[0]
    if verbose:
        print(f"tree.create_node({s}, {node_id}, parent={parent_id})")
    tree.create_node(s, node_id, parent=parent_id)
    counter_byref[0] += 1
    return node_id

def to_compact_string(o):
    if type(o) == dict:
        if len(o) > 1:
            raise Exception()
        k, v = next(iter(o.items()))
        return f'{k}:{to_compact_string(v)}'
    elif type(o) == list:
        if len(o) > 1:
            raise Exception()
        return f'[{to_compact_string(next(iter(o)))}]'
    else:
        return str(o)

def to_compact(tree, o, counter_byref, verbose, parent_id):
    try:
        s = to_compact_string(o)
        if verbose:
            print(f"# to_compact({o}) ==> [{s}]")
        create_node(tree, s, counter_byref, verbose, parent_id=parent_id)
        return True
    except:
        return False

def json_2_tree(o, parent_id=None, tree=None, counter_byref=[0], verbose=False, compact_single_dict=False, listsNodeSymbol='+'):
    if tree is None:
        tree = Tree()
        parent_id = create_node(tree, '+', counter_byref, verbose)
    if compact_single_dict and to_compact(tree, o, counter_byref, verbose, parent_id):
        # no need to do more, inserted as a single node
        pass
    elif type(o) == dict:
        for k, v in o.items():
            if compact_single_dict and to_compact(tree, {k: v}, counter_byref, verbose, parent_id):
                # no need to do more, inserted as a single node
                continue
            key_nd_id = create_node(tree, str(k), counter_byref, verbose, parent_id=parent_id)
            if verbose:
                print(f"# json_2_tree({v})")
            json_2_tree(v, parent_id=key_nd_id, tree=tree, counter_byref=counter_byref, verbose=verbose, listsNodeSymbol=listsNodeSymbol, compact_single_dict=compact_single_dict)
    elif type(o) == list:
        if listsNodeSymbol is not None:
            parent_id = create_node(tree, listsNodeSymbol, counter_byref, verbose, parent_id=parent_id)
        for i in o:
            if compact_single_dict and to_compact(tree, i, counter_byref, verbose, parent_id):
                # no need to do more, inserted as a single node
                continue
            if verbose:
                print(f"# json_2_tree({i})")
            json_2_tree(i, parent_id=parent_id, tree=tree, counter_byref=counter_byref, verbose=verbose, listsNodeSymbol=listsNodeSymbol, compact_single_dict=compact_single_dict)
    else:  # leaf node
        create_node(tree, str(o), counter_byref, verbose, parent_id=parent_id)
    return tree
Then for example:
import json
j = json.loads('{"2": 3, "4": [5, 6], "7": {"8": 9}}')
json_2_tree(j ,verbose=False,listsNodeSymbol='+' ).show()
gives:
+
├── 2
│   └── 3
├── 4
│   └── +
│       ├── 5
│       └── 6
└── 7
    └── 8
        └── 9
While
json_2_tree(j ,listsNodeSymbol=None, verbose=False ).show()
+
├── 2
│   └── 3
├── 4
│   ├── 5
│   └── 6
└── 7
    └── 8
        └── 9
And
json_2_tree(j ,compact_single_dict=True,listsNodeSymbol=None).show()
+
├── 2:3
├── 4
│   ├── 5
│   └── 6
└── 7:8:9
As you can see, there are different trees one can make depending on how explicit vs. compact you want to be.
One of my favorites, and one of the most compact, might be using yaml:
import yaml
j = json.loads('{"2": "3", "4": ["5", "6"], "7": {"8": "9"}}')
print(yaml.dump(j, sort_keys=False))
Gives the compact and unambiguous:
'2': '3'
'4':
- '5'
- '6'
'7':
  '8': '9'
I think it's better to parse the JSON first, to avoid errors:
import json

def format_response(response):
    try:
        parsed = json.loads(response.text)
    except json.JSONDecodeError:
        return response.text
    return json.dumps(parsed, ensure_ascii=True, indent=4)
I had a similar requirement to dump the contents of a JSON file for logging, something quick and easy:
print(json.dumps(json.load(open(os.path.join('<myPath>', '<myjson>'), "r")), indent=4))
If you use it often, put it in a function:
def pp_json_file(path, file):
    print(json.dumps(json.load(open(os.path.join(path, file), "r")), indent=4))
A very simple way is using rich. With this method you can also highlight the JSON.
This method reads data from a JSON file called config.json:
import json
from rich import print_json

setup_type = open('config.json')
data = json.load(setup_type)
print_json(data=data)
It's far from perfect, but it does the job.
data = data.replace(',"', ',\n"')
You can improve it, add indenting and so on, but if you just want to be able to read a cleaner JSON, this is the way to go.
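For context, a minimal sketch of how that one-liner might be applied (assuming data already holds a compact JSON string):

import json

data = json.dumps({"a": 1, "b": [2, 3], "c": {"d": 4}})
# Crude formatting: start a new line before every key that follows a comma.
data = data.replace(',"', ',\n"')
print(data)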

IBM Watson CPLEX Shows no Variables, no Solution when solving LP file

I'm migrating an application that formerly ran on IBM's DoCloud to their new API based on Watson. Since our application doesn't have data formatted as CSV, nor a separation between the model and data layers, it seemed simpler to upload an LP file along with a model file that reads the LP file and solves it. I can upload it and it claims to solve correctly, but it returns an empty solve status. I've also output various model info (e.g. the number of variables) and everything is zeroed out. I've confirmed the LP isn't blank - it has a trivial MILP.
Here is my model code (most of which is taken directly from the example at https://dataplatform.cloud.ibm.com/exchange/public/entry/view/50fa9246181026cd7ae2a5bc7e4ac7bd):
import os
import sys
from os.path import splitext

import pandas
from docplex.mp.model_reader import ModelReader
from docplex.util.environment import get_environment
from six import iteritems


def loadModelFiles():
    """Load the input CSVs and extract the model and param data from it
    """
    env = get_environment()
    inputModel = params = None
    modelReader = ModelReader()
    for inputName in [f for f in os.listdir('.') if splitext(f)[1] != '.py']:
        inputBaseName, ext = splitext(inputName)
        print(f'Info: loading {inputName}')
        try:
            if inputBaseName == 'model':
                inputModel = modelReader.read_model(inputName, model_name=inputBaseName)
            elif inputBaseName == 'params':
                params = modelReader.read_prm(inputName)
        except Exception as e:
            with env.get_input_stream(inputName) as inStream:
                inData = inStream.read()
                raise Exception(f'Error: {e} found while processing {inputName} with contents {inData}')
    if inputModel is None or params is None:
        print('Warning: error loading model or params, see earlier messages for details')
    return inputModel, params


def writeOutputs(outputs):
    """Write all dataframes in ``outputs`` as .csv.

    Args:
        outputs: The map of outputs 'outputname' -> 'output df'
    """
    for (name, df) in iteritems(outputs):
        csv_file = '%s.csv' % name
        print(csv_file)
        with get_environment().get_output_stream(csv_file) as fp:
            if sys.version_info[0] < 3:
                fp.write(df.to_csv(index=False, encoding='utf8'))
            else:
                fp.write(df.to_csv(index=False).encode(encoding='utf8'))
    if len(outputs) == 0:
        print("Warning: no outputs written")


# load and solve model
model, modelParams = loadModelFiles()
ok = model.solve(cplex_parameters=modelParams)

solution_df = pandas.DataFrame(columns=['name', 'value'])
for index, dvar in enumerate(model.solution.iter_variables()):
    solution_df.loc[index, 'name'] = dvar.to_string()
    solution_df.loc[index, 'value'] = dvar.solution_value

outputs = {}
outputs['solution'] = solution_df

# Generate output files
writeOutputs(outputs)

try:
    with get_environment().get_output_stream('test.txt') as fp:
        fp.write(f'{model.get_statistics()}'.encode('utf-8'))
except Exception as e:
    with get_environment().get_output_stream('excInfo') as fp:
        fp.write(f'Got exception {e}')
and a stub of the code that runs it (again, pulling heavily from the example):
prmFile = NamedTemporaryFile()
prmFile.write(self.ctx.cplex_parameters.export_prm_to_string().encode())
modelFile = NamedTemporaryFile()
modelFile.write(self.solver.export_as_lp_string(hide_user_names=True).encode())

modelMetadata = {
    self.client.repository.ModelMetaNames.NAME: self.name,
    self.client.repository.ModelMetaNames.TYPE: 'do-docplex_12.9',
    self.client.repository.ModelMetaNames.RUNTIME_UID: 'do_12.9'
}

baseDir = os.path.dirname(os.path.realpath(__file__))

def reset(tarinfo):
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = 'root'
    return tarinfo

with NamedTemporaryFile() as tmp:
    tar = tarfile.open(tmp.name, 'w:gz')
    tar.add(f'{baseDir}/ibm_model.py', arcname='main.py', filter=reset)
    tar.add(prmFile.name, arcname='params.prm', filter=reset)
    tar.add(modelFile.name, arcname='model.lp', filter=reset)
    tar.close()

    modelDetails = self.client.repository.store_model(
        model=tmp.name,
        meta_props=modelMetadata
    )
    modelUid = self.client.repository.get_model_uid(modelDetails)

metaProps = {
    self.client.deployments.ConfigurationMetaNames.NAME: self.name,
    self.client.deployments.ConfigurationMetaNames.BATCH: {},
    self.client.deployments.ConfigurationMetaNames.COMPUTE: {'name': 'S', 'nodes': 1}
}
deployDetails = self.client.deployments.create(modelUid, meta_props=metaProps)
deployUid = self.client.deployments.get_uid(deployDetails)

solvePayload = {
    # we upload input data as part of model since only CSV data is supported in this interface
    self.client.deployments.DecisionOptimizationMetaNames.INPUT_DATA: [],
    self.client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [
        {
            "id": ".*"
        }
    ]
}
jobDetails = self.client.deployments.create_job(deployUid, solvePayload)
jobUid = self.client.deployments.get_job_uid(jobDetails)

while jobDetails['entity']['decision_optimization']['status']['state'] not in ['completed', 'failed', 'canceled']:
    logger.debug(jobDetails['entity']['decision_optimization']['status']['state'] + '...')
    time.sleep(5)
    jobDetails = self.client.deployments.get_job_details(jobUid)
logger.debug(jobDetails['entity']['decision_optimization']['status']['state'])

# cleanup
self.client.repository.delete(modelUid)
prmFile.close()
modelFile.close()
Any ideas of what could be causing this, or what a good test avenue is? It seems there's no way to view the output of the model for debugging; am I missing something in Watson Studio?
I tried something very similar with your code, and the solution is included in the payload when the job is completed.
See this shared notebook: https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/cfbe34a0-52a8-436c-99bf-8df6979c11da/view?access_token=220636400ecdf537fb5ea1b47d41cb10f1b252199d1814d8f96a0280ec4a4e1e
In the last cells, after the job is completed, I print the status:
print(jobDetails['entity']['decision_optimization'])
and get
{'output_data_references': [], 'input_data': [], 'solve_state': {'details': {'PROGRESS_GAP': '0.0', 'MODEL_DETAIL_NONZEROS': '3', 'MODEL_DETAIL_TYPE': 'MILP', 'MODEL_DETAIL_CONTINUOUS_VARS': '0', 'MODEL_DETAIL_CONSTRAINTS': '2', 'PROGRESS_CURRENT_OBJECTIVE': '100.0', 'MODEL_DETAIL_INTEGER_VARS': '2', 'MODEL_DETAIL_KPIS': '[]', 'MODEL_DETAIL_BOOLEAN_VARS': '0', 'PROGRESS_BEST_OBJECTIVE': '100.0'}, 'solve_status': 'optimal_solution'}, 'output_data': [{'id': 'test.txt', 'fields': ['___TEXT___'], 'values': [['IC0gbnVtYmVyIG9mIHZhcmlhYmxlczogMgogICAtIGJpbmFyeT0wLCBpbnRlZ2VyPTIsIGNvbnRpbnVvdXM9MAogLSBudW1iZXIgb2YgY29uc3RyYWludHM6IDIKICAgLSBsaW5lYXI9Mg==']]}, {'id': 'solution.json', 'fields': ['___TEXT___'], 'values': [['eyJDUExFWFNvbHV0aW9uIjogeyJ2ZXJzaW9uIjogIjEuMCIsICJoZWFkZXIiOiB7InByb2JsZW1OYW1lIjogIm1vZGVsIiwgIm9iamVjdGl2ZVZhbHVlIjogIjEwMC4wIiwgInNvbHZlZF9ieSI6ICJjcGxleF9sb2NhbCJ9LCAidmFyaWFibGVzIjogW3siaW5kZXgiOiAiMCIsICJuYW1lIjogIngiLCAidmFsdWUiOiAiNS4wIn0sIHsiaW5kZXgiOiAiMSIsICJuYW1lIjogInkiLCAidmFsdWUiOiAiOTUuMCJ9XSwgImxpbmVhckNvbnN0cmFpbnRzIjogW3sibmFtZSI6ICJjMSIsICJpbmRleCI6IDB9LCB7Im5hbWUiOiAiYzIiLCAiaW5kZXgiOiAxfV19fQ==']]}, {'id': 'solution.csv', 'fields': ['name', 'value'], 'values': [['x', 5], ['y', 95]]}], 'status': {'state': 'completed', 'running_at': '2020-03-09T06:45:29.759Z', 'completed_at': '2020-03-09T06:45:30.470Z'}}
which contains, in the output:
'output_data': [{
'id': 'test.txt',
'fields': ['___TEXT___'],
'values': [['IC0gbnVtYmVyIG9mIHZhcmlhYmxlczogMgogICAtIGJpbmFyeT0wLCBpbnRlZ2VyPTIsIGNvbnRpbnVvdXM9MAogLSBudW1iZXIgb2YgY29uc3RyYWludHM6IDIKICAgLSBsaW5lYXI9Mg==']]
}, {
'id': 'solution.json',
'fields': ['___TEXT___'],
'values': [['eyJDUExFWFNvbHV0aW9uIjogeyJ2ZXJzaW9uIjogIjEuMCIsICJoZWFkZXIiOiB7InByb2JsZW1OYW1lIjogIm1vZGVsIiwgIm9iamVjdGl2ZVZhbHVlIjogIjEwMC4wIiwgInNvbHZlZF9ieSI6ICJjcGxleF9sb2NhbCJ9LCAidmFyaWFibGVzIjogW3siaW5kZXgiOiAiMCIsICJuYW1lIjogIngiLCAidmFsdWUiOiAiNS4wIn0sIHsiaW5kZXgiOiAiMSIsICJuYW1lIjogInkiLCAidmFsdWUiOiAiOTUuMCJ9XSwgImxpbmVhckNvbnN0cmFpbnRzIjogW3sibmFtZSI6ICJjMSIsICJpbmRleCI6IDB9LCB7Im5hbWUiOiAiYzIiLCAiaW5kZXgiOiAxfV19fQ==']]
}, {
'id': 'solution.csv',
'fields': ['name', 'value'],
'values': [['x', 5], ['y', 95]]
}
],
Hope this helps.
Alain
Thanks to Alain for verifying the overall approach, but the main issue was simply a bug in my code:
After calling modelFile.write(...) it's necessary to call modelFile.seek(0) to reset the file pointer - otherwise an empty file is written to the tar archive.
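In code, the fix amounts to something like this (a sketch using the variable names from the stub above):

prmFile.write(self.ctx.cplex_parameters.export_prm_to_string().encode())
prmFile.seek(0)    # rewind, otherwise an empty params.prm ends up in the tar archive

modelFile.write(self.solver.export_as_lp_string(hide_user_names=True).encode())
modelFile.seek(0)  # same for the LP file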

How to combine loops and use them in subprocess?

my_dict has 1000 values; here is a sample:
{0: {'Id': 'd1', 'email': '122as@gmail.com', 'name': 'elpato'},
 1: {'Id': 'd2', 'email': 'sss@gmail.com', 'name': 'petoka'},
 2: {'Id': 'd3', 'email': 'abcd@gmail.com', 'name': 'hukke'},
 3: {'Id': 'd4', 'email': 'bbsss@gmail.com', 'name': 'aetptoka'}}
This code uses each name in my_dict to create JSON data and a JSON file per entry; the random data is generated with the faker library.
Running 1.py creates 4 JSON files,
i.e. elpato.json, petoka.json, hukke.json, aetptoka.json.
Here is 1.py:
import json
import subprocess

from faker import Faker

for ids in [g['name'] for g in my_dict.values()]:
    fake = Faker('en_US')
    ind = ids
    sms = {
        "user_id": ind,
        "name": fake.name(),
        "email": fake.email(),
        "gender": "MALE",
        "mother_name": fake.name(),
        "father_name": fake.name()
    }
    f_name = '{}.json'.format(ind)
    print(f_name)
    with open(f_name, 'w') as fp:
        json.dump(sms, fp, indent=4)
For grabbing the emails:
for name in [v['email'] for v in my_dict.values()]:
    print(name)
I need to use the name and email loops in subprocess.
Output I need: the 4 JSON files created above (f_name) should be loaded:
subprocess.call(["....", "f_name(json file)", "email"])
I need to loop the subprocess call so that it runs in a loop using both f_name and email. It should loop 4 times, as 4 JSON files are created and there are 4 emails in the dict.
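A minimal sketch of what that combined loop could look like (the command name is a hypothetical placeholder; each generated file is paired with the matching email from my_dict):

import subprocess

for entry in my_dict.values():
    f_name = '{}.json'.format(entry['name'])  # file written by 1.py
    email = entry['email']
    # "some_command" is a placeholder for the real program being called
    subprocess.call(["some_command", f_name, email])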

String indices must be integers - Django

I have a pretty big dictionary which looks like this:
{
    'startIndex': 1,
    'username': 'myemail@gmail.com',
    'items': [{
        'id': '67022006',
        'name': 'Adopt-a-Hydrant',
        'kind': 'analytics#accountSummary',
        'webProperties': [{
            'id': 'UA-67522226-1',
            'name': 'Adopt-a-Hydrant',
            'websiteUrl': 'https://www.udemy.com/',
            'internalWebPropertyId': '104343473',
            'profiles': [{
                'id': '108333146',
                'name': 'Adopt a Hydrant (Udemy)',
                'type': 'WEB',
                'kind': 'analytics#profileSummary'
            }, {
                'id': '132099908',
                'name': 'Unfiltered view',
                'type': 'WEB',
                'kind': 'analytics#profileSummary'
            }],
            'level': 'STANDARD',
            'kind': 'analytics#webPropertySummary'
        }]
    }, {
        'id': '44222959',
        'name': 'A223n',
        'kind': 'analytics#accountSummary',
And so on....
When I copy this dictionary into my Jupyter notebook and run the exact same function I use in my Django code, it runs as expected; everything is literally the same. In my Django code I even print the dictionary out, then copy it into the notebook, run it, and get what I'm expecting.
Just for more info, this is the function:
google_profile = gp.google_profile  # Get google_profile from DB
print(google_profile)

all_properties = []
for properties in google_profile['items']:
    all_properties.append(properties)

site_selection = []
for single_property in all_properties:
    single_propery_name = single_property['name']
    for single_view in single_property['webProperties'][0]['profiles']:
        single_view_id = single_view['id']
        single_view_name = (single_view['name'])
        selections = single_propery_name + ' (View: ' + single_view_name + ' ID: ' + single_view_id + ')'
        site_selection.append(selections)
print(site_selection)
So my guess is that my notebook has some sort of JSON parser installed, or something like that? Is that possible? Why can't I access dictionaries in Django the same way I can in my IPython notebook?
EDITS
More info:
The error is at the line: for properties in google_profile['items']:
Django debug is: TypeError at /gconnect/ string indices must be integers
Local Vars are:
all_properties =[]
current_user = 'myemail@gmail.com'
google_profile = `the above dictionary`
So just to make it clear for whoever finds this question:
If you save a dictionary in a database, Django will save it as a string, so you won't be able to access it as a dictionary afterwards.
To solve this you can re-convert it to a dictionary:
The answer from this post worked perfectly for me, in other words:
import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}
There are many ways to convert a string to a dictionary; this is only one of them. If you stumbled into this problem, you can quickly check whether you have a string instead of a dictionary with:
print(type(var))
In my case I had:
<class 'str'>
before converting it with the above method and then I got
<class 'dict'>
and everything worked as expected.
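As a side note (not part of the original answer), the quote-replacement trick breaks if a value contains an apostrophe; ast.literal_eval is an alternative for turning a Python-repr string back into a dict:

import ast

s = "{'muffin' : 'lolz', 'foo' : \"it's here\"}"
d = ast.literal_eval(s)  # handles single quotes and embedded apostrophes
print(type(d), d['foo'])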

Parsing Linux iSCSI multipath.conf into Python nested dictionaries

I'm writing a script that involves adding/removing multipath "objects" from the standard multipath.conf configuration file, example below:
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
    user_friendly_names yes
}
##
devices {
    device {
        vendor "SolidFir"
        product "SSD SAN"
        path_grouping_policy multibus
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector "service-time 0"
        path_checker tur
        hardware_handler "0"
        failback immediate
        rr_weight uniform
        rr_min_io 1000
        rr_min_io_rq 1
        features "0"
        no_path_retry 24
        prio const
    }
}
multipaths {
    multipath {
        wwid 36f47acc1000000006167347a00000041
        alias dwqa-ora-fs
    }
    multipath {
        wwid 36f47acc1000000006167347a00000043
        alias dwqa-ora-grid
    }
    multipath {
        wwid 36f47acc1000000006167347a00000044
        alias dwqa-ora-dwqa1
    }
    multipath {
        wwid 36f47acc1000000006167347a000000ae
        alias dwqa-ora-dwh2d10-1
    }
    multipath {
        wwid 36f47acc1000000006167347a000000f9
        alias dwqa-ora-testdg-1
    }
}
So what I'm trying to do is read this file in and store it in a nested Python dictionary (or a list of nested dictionaries). We can ignore the comment lines (starting with #) for now. I have not come up with a clear/concise solution for this.
Here is my partial solution (it doesn't give me the expected output yet, but it's close):
def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line

def __parse_conf__(self):
    conf = []
    with open(self.conf_file_path) as f:
        for line in nonblank_lines(f):
            if line.strip().endswith("{"):  # opening bracket, start of new list of dictionaries
                current_dictionary_key = line.split()[0]
                current_dictionary = { current_dictionary_key : None }
                conf.append(current_dictionary)
            elif line.strip().endswith("}"):  # closing bracket, end of new dictionary
                pass
                # do nothing...
            elif not line.strip().startswith("#"):
                if current_dictionary.values() == [None]:
                    # New dictionary... we should be appending to this one
                    current_dictionary[current_dictionary_key] = [{}]
                    current_dictionary = current_dictionary[current_dictionary_key][0]
                key = line.strip().split()[0]
                val = " ".join(line.strip().split()[1:])
                current_dictionary[key] = val
And this is the resulting dictionary (the list 'conf'):
[{'defaults': [{'user_friendly_names': 'yes'}]},
{'devices': None},
{'device': [{'failback': 'immediate',
'features': '"0"',
'getuid_callout': '"/lib/udev/scsi_id --whitelisted --device=/dev/%n"',
'hardware_handler': '"0"',
'no_path_retry': '24',
'path_checker': 'tur',
'path_grouping_policy': 'multibus',
'path_selector': '"service-time 0"',
'prio': 'const',
'product': '"SSD SAN"',
'rr_min_io': '1000',
'rr_min_io_rq': '1',
'rr_weight': 'uniform',
'vendor': '"SolidFir"'}]},
{'multipaths': None},
{'multipath': [{'alias': 'dwqa-ora-fs',
'wwid': '36f47acc1000000006167347a00000041'}]},
{'multipath': [{'alias': 'dwqa-ora-grid',
'wwid': '36f47acc1000000006167347a00000043'}]},
{'multipath': [{'alias': 'dwqa-ora-dwqa1',
'wwid': '36f47acc1000000006167347a00000044'}]},
{'multipath': [{'alias': 'dwqa-ora-dwh2d10-1',
'wwid': '36f47acc1000000006167347a000000ae'}]},
{'multipath': [{'alias': 'dwqa-ora-testdg-1',
'wwid': '36f47acc1000000006167347a000000f9'}]},
{'multipath': [{'alias': 'dwqa-ora-testdp10-1',
'wwid': '"SSolidFirSSD SAN 6167347a00000123f47acc0100000000"'}]}]
Obviously the "None"s should be replaced with the nested dictionary below them, but I can't get this part to work.
Any suggestions? Or better ways to parse this file and store it in a Python data structure?
Try something like this:
def parse_conf(conf_lines):
    config = []
    # iterate on config lines
    for line in conf_lines:
        # remove left and right spaces
        line = line.rstrip().strip()
        if line.startswith('#'):
            # skip comment lines
            continue
        elif line.endswith('{'):
            # new dict (notice the recursion here)
            config.append({line.split()[0]: parse_conf(conf_lines)})
        else:
            # inside a dict
            if line.endswith('}'):
                # end of current dict
                break
            else:
                # parameter line
                line = line.split()
                if len(line) > 1:
                    config.append({line[0]: " ".join(line[1:])})
    return config
The function will get into the nested levels of the configuration file (thanks to recursion and the fact that the conf_lines object is an iterator) and build a list of dictionaries that contain other dictionaries. Unfortunately, you have to put every nested dictionary inside a list again, because the example file shows how multipath can repeat, but in a Python dictionary a key must be unique. So you make a list.
You can test it with your example configuration file, like this:
with open('multipath.conf', 'r') as conf_file:
    config = parse_conf(conf_file)

# show multipath config lines as an example
for item in config:
    if 'multipaths' in item:
        for multipath in item['multipaths']:
            print multipath
            # or do something more useful
And the output would be:
{'multipath': [{'wwid': '36f47acc1000000006167347a00000041'}, {'alias': 'dwqa-ora-fs'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000043'}, {'alias': 'dwqa-ora-grid'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000044'}, {'alias': 'dwqa-ora-dwqa1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000ae'}, {'alias': 'dwqa-ora-dwh2d10-1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000f9'}, {'alias': 'dwqa-ora-testdg-1'}]}
If you don't use recursion, you will need some way of keeping track of your level. But even then it is difficult to keep references to parents or siblings in order to add data (I failed). Here's another take based on Daniele Barresi's mention of recursion on the iterable input:
Data:
inp = """
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
devices {
device {
vendor "SolidFir"
product "SSD SAN"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_selector "service-time 0"
path_checker tur
hardware_handler "0"
failback immediate
rr_weight uniform
rr_min_io 1000
rr_min_io_rq 1
features "0"
no_path_retry 24
prio const
}
}
multipaths {
multipath {
wwid 36f47acc1000000006167347a00000041
alias dwqa-ora-fs
}
multipath {
wwid 36f47acc1000000006167347a00000043
alias dwqa-ora-grid
}
multipath {
wwid 36f47acc1000000006167347a00000044
alias dwqa-ora-dwqa1
}
multipath {
wwid 36f47acc1000000006167347a000000ae
alias dwqa-ora-dwh2d10-1
}
multipath {
wwid 36f47acc1000000006167347a000000f9
alias dwqa-ora-testdg-1
}
}
"""
Code:
import re
import pprint

level = 0

def recurse(data):
    """ """
    global level
    out = []
    level += 1
    for line in data:
        l = line.strip()
        if l and not l.startswith('#'):
            match = re.search(r"\s*(\w+)\s*(?:{|(?:\"?\s*([^\"]+)\"?)?)", l)
            if not match:
                if l == '}':
                    level -= 1
                    return out  # recursion, up one level
            else:
                key, value = match.groups()
                if not value:
                    print(" " * level, level, key)
                    value = recurse(data)  # recursion, down one level
                else:
                    print(" " * level, level, key, value)
                out.append([key, value])
    return out  # once

result = recurse(iter(inp.split('\n')))

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(result)
Resulting list with nested ["key", value] pairs:
[ ['defaults', [['user_friendly_names', 'yes']]],
[ 'devices',
[ [ 'device',
[ ['vendor', 'SolidFir'],
['product', 'SSD SAN'],
['path_grouping_policy', 'multibus'],
[ 'getuid_callout',
'/lib/udev/scsi_id --whitelisted --device=/dev/%n'],
['path_selector', 'service-time 0'],
['path_checker', 'tur'],
['hardware_handler', '0'],
['failback', 'immediate'],
['rr_weight', 'uniform'],
['rr_min_io', '1000'],
['rr_min_io_rq', '1'],
['features', '0'],
['no_path_retry', '24'],
['prio', 'const']]]]],
[ 'multipaths',
[ [ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000041'],
['alias', 'dwqa-ora-fs']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000043'],
['alias', 'dwqa-ora-grid']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000044'],
['alias', 'dwqa-ora-dwqa1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000ae'],
['alias', 'dwqa-ora-dwh2d10-1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000f9'],
['alias', 'dwqa-ora-testdg-1']]]]]]
Multipath conf is a bit of a pig to parse. This is what I use (originally based on the answer from daniele-barresi); the output is easier to work with than the other examples.
def get_multipath_conf():
    def parse_conf(conf_lines, parent=None):
        config = {}
        for line in conf_lines:
            line = line.split('#', 1)[0].strip()
            if line.endswith('{'):
                key = line.split('{', 1)[0].strip()
                value = parse_conf(conf_lines, parent=key)
                if key + 's' == parent:
                    if type(config) is dict:
                        config = []
                    config.append(value)
                else:
                    config[key] = value
            else:
                # inside a dict
                if line.endswith('}'):
                    # end of current dict
                    break
                else:
                    # parameter line
                    line = line.split(' ', 1)
                    if len(line) > 1:
                        key = line[0]
                        value = line[1].strip().strip("'").strip('"')
                        config[key] = value
        return config

    return parse_conf(open('/etc/multipath.conf', 'r'))
This is the output:
{'blacklist': {'devnode': '^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda|sdb)[0-9]*$'},
'defaults': {'find_multipaths': 'yes',
'max_polling_interval': '4',
'polling_interval': '2',
'reservation_key': '0x1'},
'devices': [{'detect_checker': 'no',
'hardware_handler': '1 alua',
'no_path_retry': '5',
'path_checker': 'tur',
'prio': 'alua',
'product': 'iSCSI Volume',
'user_friendly_names': 'yes',
'vendor': 'StorMagic'}]}
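For example, with a conf that also has a multipaths section like the one in the question, the parsed structure could be turned into a wwid-to-alias map (a sketch based on the function above):

config = get_multipath_conf()
# 'multipaths' parses to a list of {'wwid': ..., 'alias': ...} dicts
alias_by_wwid = {m['wwid']: m.get('alias') for m in config.get('multipaths', [])}
print(alias_by_wwid)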
