Create dictionary from CSV without repeating top level - python

I am trying to make an API using a script. It runs but I need to make it iterate the CSV file without appending port_config to each line:
Reading this CSV File:
device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
The code I have so far:
for device_id, port, description in devices:
    print(device_id, port, description)
    payload = "{\n \"port_config\": { \"%s\": { \"description\": \"%s\" }\n}\n}" % (port, description)
    print(payload)
Result of above:
{
"port_config": { "interfacex/x": { "description": "test1" }
}
}
{
"port_config": { "interfacex/x": { "description": "test2" }
}
}
{
"port_config": { "interfacex/x": { "description": "test3" }
}
}
{
"port_config": { "interfacex/x": { "description": "test4" }
}
}
Desired results:
{
"port_config": {
"eth1/1": {"description": "test0,"},
"eth1/2": {"description": "test1,"},
"eth1/3": {"description": "test2,"},
"eth1/4": {"description": "test3,"}
}
}

Your question still has one problem. The CSV example doesn't exactly match the desired output. Let's say you have the following CSV string.
devices = """device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
"""
And you want the following output:
d = {
    "port_config": {
        "eth1/1": {"description": "test1"},
        "eth1/2": {"description": "test2"},
        "eth1/3": {"description": "test3"},
        "eth1/4": {"description": "test4"},
    }
}
You can easily achieve this in Python by doing some dictionary gymnastics like the following:
import csv
from pprint import pprint
# Input csv.
devices = """device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
"""
# Read the data using Python's built-in csv module.
lines = devices.splitlines()
reader = csv.reader(lines)
devices = list(reader)
# Let's initialize the target payload's data structure.
payload = {
    "port_config": {},
}
for idx, (device_id, port, description) in enumerate(devices):
    if idx == 0:
        continue  # This line skips the header.
    payload["port_config"][port] = {"description": description}
pprint(payload)
This should give you the following output:
{'port_config': {'eth1/1': {'description': 'test1'},
'eth1/2': {'description': 'test2'},
'eth1/3': {'description': 'test3'},
'eth1/4': {'description': 'test4'}}}
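As a variation on the same idea, here is a minimal sketch that uses csv.DictReader (which consumes the header row itself) and json.dumps to produce the JSON text for the request body. The names csv_text and body are only illustrative, and the CSV is read from a string here rather than from your file.

```python
import csv
import io
import json

# Same sample data as in the question, held in a string for illustration.
csv_text = """device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
"""

# DictReader uses the first row as field names, so no manual header skip is needed.
reader = csv.DictReader(io.StringIO(csv_text))

# Build the nested dictionary once; every row adds one entry under "port_config".
payload = {
    "port_config": {
        row["port"]: {"description": row["description"]} for row in reader
    }
}

# Serialize after the loop so "port_config" appears a single time.
body = json.dumps(payload, indent=2)
print(body)
```

Building the dictionary first and serializing it at the end also avoids hand-escaping quotes in a format string.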

You can also do this with pandas:
import pandas as pd
df = pd.read_csv('inp_file.csv')
result = {
    'port_config': {
        item['port']: {'description': item['description']}
        for item in df[['port', 'description']].to_dict(orient='records')
    }
}

Related

Iterating JSON key value pairs over another JSON file and create a new json as per the output

I am trying to create a JSON file from two JSON files. Here I am reading the key value pairs from input.json and searching the matches in the secondary.json file and finally dumping the output to a new json file.
In the output of test.py I am expecting
{'tire1': {'source': ['test1', 'test2', 'test3']},
'tire6': {'source': ['test10', 'test21', 'test33']}}
instead of
{'tire1': {'source': ['test10', 'test21', 'test33']},
'tire6': {'source': ['test10', 'test21', 'test33']}}
But I do not know what's wrong.
test.py
import json
import re
def findkeysvalues(inputDict, key):
    if isinstance(inputDict, list):
        for i in inputDict:
            for x in findkeysvalues(i, key):
                yield x
    if isinstance(inputDict, dict):
        if key in inputDict:
            yield inputDict[key]
        for j in inputDict.values():
            for x in findkeysvalues(j, key):
                yield x
def process_JSON_value(jsonFileInput, parentInputKey, key):
    with open(jsonFileInput) as jsonFile:
        data = json.load(jsonFile)
    Dict = { }
    for i in data:
        if i == parentInputKey:
            Dict[i] = data[i]
    return list(findkeysvalues(Dict, key))
def createRulesJSON():
    with open("input.json") as jsonFile:
        data = json.load(jsonFile)
    Dict = { }
    rules_items_source = list(findkeysvalues(data, "source"))
    for p in data:
        Dict[p] = { }
        for i in rules_items_source:
            x = re.findall("\w+", i[0])
            sourceItems = process_JSON_value("secondary.json", x[0], "compname")
            Dict[p]['source'] = sourceItems
    print(Dict)
createRulesJSON()
input.json
{
"tire1": {
"source": [ "{{ 'TEX' | YYYYYYY | join }}" ],
"dest": [ "{{ Microservice.host }}" ],
"port": "555"
},
"tire6": {
"source": [ "{{ 'REP' | LLLLLL | join }}" ],
"dest": [ "{{ Microservice.host2 }}" ],
"port": "555"
}
}
secondary.json
{
"client": {
"name": "anyname"
},
"PEP": {
"tire2": {
"tire3": {
"compname": "test1"
},
"tire4": {
"compname": "test2"
},
"tire5": {
"compname": "test3"
}
}
},
"REP": {
"tire2": {
"cmpname": "vendor1",
"tire3": {
"compname": "test10"
},
"tire4": {
"compname": "test21"
},
"tire5": {
"compname": "test33"
}
}
},
"Microservice": {
"host": "ttttttttttttttttttttt",
"host2": "GGGGGGGGGGGGGGGGGGGGGGGG"
}
}
Two issues:
First, your nested loops in createRulesJSON create a Cartesian product on data. The outer loop iterates over all keys in data, and the nested loop extracts the three-letter code from every entry in data. So each key is combined with the codes extracted from the other key's data as well, and the last code processed overwrites the earlier ones. Nothing keeps a key and its own code associated, yet that is what you need.
To fix that, change this:
rules_items_source = list(findkeysvalues(data, "source"))
for p in data:
To:
for p in data:
    rules_items_source = list(findkeysvalues(data[p], "source"))
Second, from the expected output it seems that you want to map the code "TEX" (in the first file) to the code "PEP" (in the second file). Nothing maps these two codes to each other.
To fix that, I will just assume that you'll correct the code in one of your files so that it matches the other.
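To illustrate both points together, here is a minimal sketch that keeps each key paired with its own code. It is not your exact script: it works on in-memory dictionaries instead of input.json and secondary.json, the names find_values and build_rules are made up for this example, and it assumes the first file has been corrected to use 'PEP' instead of 'TEX'.

```python
import re

def find_values(node, key):
    """Yield every value stored under `key` anywhere in a nested dict/list."""
    if isinstance(node, list):
        for item in node:
            yield from find_values(item, key)
    elif isinstance(node, dict):
        if key in node:
            yield node[key]
        for child in node.values():
            yield from find_values(child, key)

def build_rules(rules, lookup):
    """Map each rule name to the compname values found under its own code."""
    result = {}
    for name, rule in rules.items():
        # Look only at this rule's own "source" template, not at all rules.
        template = rule["source"][0]
        code = re.findall(r"\w+", template)[0]
        matches = list(find_values(lookup.get(code, {}), "compname"))
        result[name] = {"source": matches}
    return result

# Assumption: 'TEX' has been changed to 'PEP' so the codes match.
rules = {
    "tire1": {"source": ["{{ 'PEP' | YYYYYYY | join }}"]},
    "tire6": {"source": ["{{ 'REP' | LLLLLL | join }}"]},
}
lookup = {
    "PEP": {"tire2": {"tire3": {"compname": "test1"},
                      "tire4": {"compname": "test2"},
                      "tire5": {"compname": "test3"}}},
    "REP": {"tire2": {"cmpname": "vendor1",
                      "tire3": {"compname": "test10"},
                      "tire4": {"compname": "test21"},
                      "tire5": {"compname": "test33"}}},
}
rules_output = build_rules(rules, lookup)
print(rules_output)
```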

how to extract specific data from json and put in to csv using python

I have a JSON which is in nested form. I would like to extract specific data from json and put into csv using pandas python.
data = {
"class":"hudson.model.Hudson",
"jobs":[
{
"_class":"hudson.model.FreeStyleProject",
"name":"git_checkout",
"url":"http://localhost:8080/job/git_checkout/",
"builds":[
{
"_class":"hudson.model.FreeStyleBuild",
"duration":1201,
"number":6,
"result":"FAILURE",
"url":"http://localhost:8080/job/git_checkout/6/"
}
]
},
{
"_class":"hudson.model.FreeStyleProject",
"name":"output",
"url":"http://localhost:8080/job/output/",
"builds":[
]
},
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowJob",
"name":"pipeline_test",
"url":"http://localhost:8080/job/pipeline_test/",
"builds":[
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration":9274,
"number":85,
"result":"SUCCESS",
"url":"http://localhost:8080/job/pipeline_test/85/"
},
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration":4251,
"number":84,
"result":"SUCCESS",
"url":"http://localhost:8080/job/pipeline_test/84/"
}
]
}
]
}
From the above JSON I want to fetch each job's name value and its builds' result values. I am new to Python, so any help will be appreciated.
So far I have tried
main_data = data['jobs']
json_normalize(main_data, ['builds'],
    record_prefix='jobs_', errors='ignore')
which returns only the values under the builds key and not the name of the job.
Can anyone help?
Expected Output:
If you only need the result of the first build of each job as a CSV column, you can achieve this using pandas.
data = {
"class": "hudson.model.Hudson",
"jobs": [
{
"_class": "hudson.model.FreeStyleProject",
"name": "git_checkout",
"url": "http://localhost:8080/job/git_checkout/",
"builds": [
{
"_class": "hudson.model.FreeStyleBuild",
"duration": 1201,
"number": 6,
"result": "FAILURE",
"url": "http://localhost:8080/job/git_checkout/6/"
}
]
},
{
"_class": "hudson.model.FreeStyleProject",
"name": "output",
"url": "http://localhost:8080/job/output/",
"builds": []
},
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowJob",
"name": "pipeline_test",
"url": "http://localhost:8080/job/pipeline_test/",
"builds": [
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration": 9274,
"number": 85,
"result": "SUCCESS",
"url": "http://localhost:8080/job/pipeline_test/85/"
},
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration": 4251,
"number": 84,
"result": "SUCCESS",
"url": "http://localhost:8080/job/pipeline_test/84/"
}
]
}
]
}
main_data = data.get('jobs')
res = {'name':[], 'result':[]}
for name_dict in main_data:
    res['name'].append(name_dict.get('name','NA'))
    resultval = name_dict['builds'][0].get('result') if len(name_dict['builds'])>0 else 'NA'
    res['result'].append(resultval)
print(res)
import pandas as pd
df = pd.DataFrame(res)
df.to_csv("/home/file_timer/jobs.csv", index=False)
Check the csv file output
name,result
git_checkout,FAILURE
output,NA
pipeline_test,SUCCESS
If you want to skip jobs whose result would be 'NA', then
main_data = data.get('jobs')
res = {'name':[], 'result':[]}
for name_dict in main_data:
    if len(name_dict['builds'])==0:
        continue
    res['name'].append(name_dict.get('name', 'NA'))
    resultval = name_dict['builds'][0].get('result')
    res['result'].append(resultval)
print(res)
import pandas as pd
df = pd.DataFrame(res)
df.to_csv("/home/akash.pagar/shell_learning/file_timer/jobs.csv", index=False)
The output will look like
name,result
git_checkout,FAILURE
pipeline_test,SUCCESS
To simply list every build along with its build number,
for job in data.get('jobs'):
    for build in job.get('builds'):
        print(job.get('name'), build.get('number'), build.get('result'))
gives the result
git_checkout 6 FAILURE
pipeline_test 85 SUCCESS
pipeline_test 84 SUCCESS
If you want the result of the latest build, and you are sure the build numbers are always in descending order,
for job in data.get('jobs'):
    if job.get('builds'):
        print(job.get('name'), job.get('builds')[0].get('result'))
and if you are not sure about the order,
for job in data.get('jobs'):
    if job.get('builds'):
        print(job.get('name'), sorted(job.get('builds'), key=lambda k: k.get('number'))[-1].get('result'))
then the result will be:
git_checkout FAILURE
pipeline_test SUCCESS
Assuming last build is the last element of its list and you don't care about jobs with no builds, this does:
import pandas as pd
#data = ... #same format as in the question
z = [(job["name"], job["builds"][-1]["result"]) for job in data["jobs"] if len(job["builds"])]
df = pd.DataFrame(data=z, columns=["name", "result"])
#df.to_csv #TODO
Also we don't necessarily need pandas to create the csv file.
You could do:
import csv
#z = ... #see previous code block
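# newline='' is what the csv module documentation recommends for files passed to csv.writer
with open("f.csv", 'w', newline='') as fp:
    csv.writer(fp).writerows([("name", "result")] + z)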
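If you want one CSV row per build rather than one per job, the two ideas above (looping over every build, and writing with the csv module) can be combined. This is only a sketch: the data is a trimmed copy of the question's JSON, and an io.StringIO buffer stands in for a real file so the result can be inspected directly.

```python
import csv
import io

# Trimmed copy of the question's data, keeping only the fields used here.
data = {"jobs": [
    {"name": "git_checkout", "builds": [{"number": 6, "result": "FAILURE"}]},
    {"name": "output", "builds": []},
    {"name": "pipeline_test", "builds": [{"number": 85, "result": "SUCCESS"},
                                         {"number": 84, "result": "SUCCESS"}]},
]}

# One row per (job, build) pair; jobs without builds produce no rows.
rows = [
    {"name": job["name"], "number": build["number"], "result": build["result"]}
    for job in data["jobs"]
    for build in job.get("builds", [])
]

# Stand-in for open("jobs.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "number", "result"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```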

Pull key from json file when values is known (groovy or python)

Is there any way to pull the key from JSON if the only thing I know is the value? (In groovy or python)
An example:
I know the "_number" value and I need a key.
So let's say, known _number is 2 and as an output, I should get dsf34f43f34f34f
{
"id": "8e37ecadf4908f79d58080e6ddbc",
"project": "some_project",
"branch": "master",
"current_revision": "3rtgfgdfg2fdsf",
"revisions": {
"43g5g534534rf34f43f": {
"_number": 3,
"created": "2019-04-16 09:03:07.459000000",
"uploader": {
"_account_id": 4
},
"description": "Rebase"
},
"dsf34f43f34f34f": {
"_number": 2,
"created": "2019-04-02 10:54:14.682000000",
"uploader": {
"_account_id": 2
},
"description": "Rebase"
}
}
}
With Groovy:
def json = new groovy.json.JsonSlurper().parse("x.json" as File)
println(json.revisions.findResult{ it.value._number==2 ? it.key : null })
// => dsf34f43f34f34f
Python 3: (assuming that data is saved in data.json):
import json
with open('data.json') as f:
    json_data = json.load(f)
for rev, revdata in json_data['revisions'].items():
    if revdata['_number'] == 2:
        print(rev)
Prints all revs where _number equals 2.
Using a set comprehension (with json_data loaded as above):
print({k for k, v in json_data['revisions'].items() if v.get('_number') == 2})
OUTPUT:
{'dsf34f43f34f34f'}
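If you expect exactly one match and want the key itself rather than a set, a small helper built on next() may be convenient. This is a sketch only: find_revision is a name made up for this example, and the dictionary below is a trimmed copy of the JSON from the question.

```python
def find_revision(change, number):
    """Return the first revision key whose "_number" equals `number`, or None."""
    return next(
        (rev for rev, info in change["revisions"].items()
         if info.get("_number") == number),
        None,  # default when no revision matches
    )

# Trimmed copy of the question's data.
change = {
    "revisions": {
        "43g5g534534rf34f43f": {"_number": 3},
        "dsf34f43f34f34f": {"_number": 2},
    }
}

print(find_revision(change, 2))
print(find_revision(change, 7))
```

The generator stops at the first match, and the None default avoids a StopIteration when nothing matches.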

Is it possible to order JSON object of Key Value Pairs when keys change and values are more key value pairs.

In python, I'm having trouble figuring out how to output the JSON object (expressed below) as a string wherein the contents of Baseball are ordered based on "key1" (descending). When I receive the JSON (from the datasources) it's got the players out of order. Ultimately, my code needs to order the players, and then pass it along to the next function ordered. Please assume that I cannot modify the format of the JSON to be/have arrays as the consuming function can't handle that (as it's currently written).
Example JSON:
{
"DataSource1":{
"Baseball":{
"Sean":{
"key1":"10",
},
"Gene":{
"key1":"100",
},
"Alan":{
"key1":"1",
}
}
},
"DataSource2":{
"Baseball":{
"Bob_Smith":{
"key1":"1"
},
"Adam_Filmore":{
"key1":"100"
},
"Joe_Allen":{
"key1":"10"
}
}
}
"DataSource3":{
"Baseball":{
"Jake":{
"key1":"10"
},
"Huck":{
"key1":"1"
},
"Eric":{
"key1":"100"
}
}
}
}
Example of how I would like JSON to output:
{
"DataSource1":{
"Baseball":{
"Alan":{
"key1":"1",
},
"Sean":{
"key1":"10",
},
"Gene":{
"key1":"100",
}
}
},
"DataSource2":{
"Baseball":{
"Bob_Smith":{
"key1":"1"
},
"Joe_Allen":{
"key1":"10"
},
"Adam_Filmore":{
"key1":"100"
}
}
}
"DataSource3":{
"Baseball":{
"Huck":{
"key1":"1"
},
"Jake":{
"key1":"10"
},
"Eric":{
"key1":"100"
}
}
}
}
Use sorted() to establish the sort order you want, then store the results in a collections.OrderedDict. The key1 values are strings, so convert them with int() to compare them numerically rather than lexicographically. Pass reverse=True to sorted() if you want descending order.
Try this:
import json
from collections import OrderedDict
with open('data.json') as f:
    data = json.load(f)
for data_source in data:
    data[data_source]["Baseball"] = OrderedDict(
        sorted(data[data_source]["Baseball"].items(),
               key=lambda x: int(x[1]["key1"])))
with open('new_data.json', 'w') as f:
    json.dump(data, f, indent=4)
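Here is a self-contained sketch of the same approach on in-memory data, so it can be run without any files. The function name sort_baseball and the extra player "Rick" with key1 "9" are made up for this example; "Rick" is there only to show why the values should be compared as numbers (as strings, "9" would sort after "100"). Since Python 3.7 a plain dict preserves insertion order, so the sketch uses dict instead of OrderedDict.

```python
import json

def sort_baseball(data):
    """Return a copy of `data` with each source's Baseball players ordered by numeric key1."""
    result = {}
    for source, sports in data.items():
        players = sports["Baseball"]
        # int() makes "9" sort before "10" and "100".
        ordered = dict(sorted(players.items(), key=lambda kv: int(kv[1]["key1"])))
        result[source] = {**sports, "Baseball": ordered}
    return result

# Sample based on the question's DataSource1; "Rick" is a hypothetical addition.
data = {
    "DataSource1": {
        "Baseball": {
            "Sean": {"key1": "10"},
            "Gene": {"key1": "100"},
            "Alan": {"key1": "1"},
            "Rick": {"key1": "9"},
        }
    }
}

sorted_data = sort_baseball(data)
output_text = json.dumps(sorted_data, indent=4)  # key order is preserved in the output string
print(output_text)
```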

string indices must be integers error with json

I am trying to parse the following JSON using Python:
{
"document_tone":{
"tone_categories":[
{
"tones":[
{
"score":0.044115,
"tone_id":"anger",
"tone_name":"Anger"
},
{
"score":0.005631,
"tone_id":"disgust",
"tone_name":"Disgust"
},
{
"score":0.013157,
"tone_id":"fear",
"tone_name":"Fear"
},
{
"score":1.0,
"tone_id":"joy",
"tone_name":"Joy"
},
{
"score":0.058781,
"tone_id":"sadness",
"tone_name":"Sadness"
}
],
"category_id":"emotion_tone",
"category_name":"Emotion Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"analytical",
"tone_name":"Analytical"
},
{
"score":0.0,
"tone_id":"confident",
"tone_name":"Confident"
},
{
"score":0.0,
"tone_id":"tentative",
"tone_name":"Tentative"
}
],
"category_id":"language_tone",
"category_name":"Language Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"openness_big5",
"tone_name":"Openness"
},
{
"score":0.571,
"tone_id":"conscientiousness_big5",
"tone_name":"Conscientiousness"
},
{
"score":0.936,
"tone_id":"extraversion_big5",
"tone_name":"Extraversion"
},
{
"score":0.978,
"tone_id":"agreeableness_big5",
"tone_name":"Agreeableness"
},
{
"score":0.975,
"tone_id":"emotional_range_big5",
"tone_name":"Emotional Range"
}
],
"category_id":"social_tone",
"category_name":"Social Tone"
}
]
}
}
and here is the code that I am trying to use to get the tone name and score from the JSON:
import json
from watson_developer_cloud import ToneAnalyzerV3Beta
import urllib.request
import codecs
reader = codecs.getreader("utf-8")
tone_analyzer = ToneAnalyzerV3Beta(
    url='https://gateway.watsonplatform.net/tone-analyzer/api',
    username='<username>',
    password='<password>',
    version='2016-02-11')
data=json.dumps(tone_analyzer.tone(text='I am very happy'), indent=2)
print (data)
for cat in data['document_tone']['tone_categories']:
    print('Category:', cat['category_name'])
    for tone in cat['tones']:
        print('-', tone['tone_name'])
but I keep running into the error "string indices must be integers". I asked the same question in one of my earlier posts, but in this post I am providing some more details.
I would really appreciate any input on this.
Thank You
tone_analyzer.tone(text='I am very happy')
returns a dictionary. json.dumps() turns that dictionary into a string, which is why indexing it with a key fails. There is no need to use json to modify the data in any way, just do
data = tone_analyzer.tone(text='I am very happy')
Note that you have already received this exact answer on your previous question.
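To make the difference concrete, here is a small sketch that does not call the Watson service at all. The response dictionary is a hypothetical, trimmed stand-in for what tone_analyzer.tone(...) returns, based on the JSON shown in the question.

```python
import json

# Hypothetical stand-in for the dictionary returned by tone_analyzer.tone(...).
response = {
    "document_tone": {
        "tone_categories": [
            {
                "category_name": "Emotion Tone",
                "tones": [{"tone_name": "Joy", "score": 1.0}],
            }
        ]
    }
}

# json.dumps returns a str, so indexing it with a key raises TypeError.
as_text = json.dumps(response, indent=2)
try:
    as_text["document_tone"]
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Indexing the dictionary itself works as intended.
pairs = [
    (tone["tone_name"], tone["score"])
    for cat in response["document_tone"]["tone_categories"]
    for tone in cat["tones"]
]
print(pairs)
```

json.dumps is still useful for pretty-printing the response; just keep the dictionary in its own variable for the lookups.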
