I extract data using an API and retrieve a list of servers and backups. Some servers have more than one backup. This is how I get a list of all servers with backup IDs.
import requests
bkplist = requests.get('https://heee.com/1.2/storage/backup')
backup_list = bkplist.json()
backupl = backup_list['storages']['storage']
The JSON looks like this:
{
    "storages": {
        "storage": [
            {
                "access": "",
                "created": "",
                "license": ,
                "origin": "01165",
                "size": ,
                "state": "",
                "title": "",
                "type": "backup",
                "uuid": "01019",
                "zone": ""
            },
First, I create a dictionary to store this data:
backup = {}
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg] = backup_uuid
But then I found out that there is more than one backup for some servers. Since a dictionary can have only one value assigned to each key, I wanted to use some hybrid of a list and a dictionary, but I just can't figure out how to do this in my example.
The servers are nested in storages -> storage, and I need to assign a couple of uuid values (the backup IDs) to one origin (the server ID).
I know about the collections module, and with a simple example it is quite understandable, but I have a problem applying it to my case of extracting data through the API.
How do I extract origin and assign to that key the uuid values stored in the JSON?
What's more, it is a massive amount of data, so I cannot add every value manually.
You can do something like this:
from collections import defaultdict
backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg].append(backup_uuid)
Note that you can simplify your loop like this:
from collections import defaultdict
backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    backup[u['origin']].append(u['uuid'])
But this may be considered less readable.
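With the sample entry from the question, backup would then map each origin to a list of backup uuids, for example {'01165': ['01019']}. If you later want an ordinary dict rather than a defaultdict, you can simply convert it:
plain_backup = dict(backup)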
You could store a list of uuids under each origin key.
I suggest the following two ways:
Creating an empty list the first time an origin is accessed, and then appending to it:
backup = {}
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    if not backup.get(srvuuidorg):
        backup[srvuuidorg] = []
    backup[srvuuidorg].append(backup_uuid)
Using the defaultdict collection, which basically does the same thing for you under the hood:
from collections import defaultdict
backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg].append(backup_uuid)
It seems to me that the last way is more elegant.
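If you would rather stay with a plain dict but avoid the explicit if check, dict.setdefault is a middle ground that creates the empty list on first access for you; a minimal sketch:
backup = {}
for u in backup_list['storages']['storage']:
    backup.setdefault(u['origin'], []).append(u['uuid'])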
If you need to store a unique list of uuids, you should use the same approach with a set instead of a list.
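A minimal sketch of that set-based variant:
from collections import defaultdict
backup = defaultdict(set)
for u in backup_list['storages']['storage']:
    backup[u['origin']].add(u['uuid'])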
JSON allows an array to be stored under a key:
var = {
    "array": [
        {"id": 1, "value": "one"},
        {"id": 2, "value": "two"},
        {"id": 3, "value": "three"}
    ]
}
print(var)
{'array': [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'}, {'id': 3, 'value': 'three'}]}
var["array"].append({"id": 4, "value": "new"})
print(var)
{'array': [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'}, {'id': 3, 'value': 'three'}, {'id': 4, 'value': 'new'}]}
You can use a list for multiple values.
dict = {"Greetings": ["hello", "hi"]}
I have to check the JSON data against the comma-separated e_codes in the table.
How do I filter only the data whose e_codes are available for the users in the database?
In the database:
id  email      age  e_codes
1.  abc#gmail  19   123456,234567,345678
2.  xyz#gmail  31   234567,345678,456789
This is my JSON data:
[
    {
        "ct": 1,
        "e_code": 123456
    },
    {
        "ct": 2,
        "e_code": 234567
    },
    {
        "ct": 3,
        "e_code": 345678
    },
    {
        "ct": 4,
        "e_code": 456789
    },
    {
        "ct": 5,
        "e_code": 456710
    }
]
If efficiency is not an issue, you could loop through the table, split the values into a list using case['e_codes'].split(','), and then, for each code, loop through the JSON to see whether it is present.
This might be a little inefficient if your table, the JSON, or the lists of codes are long.
It might be better to first create a lookup dictionary in which the codes are the keys:
lookup = {}
for e in my_json:
    lookup[e['e_code']] = 1
You can then check how many of the codes in your table are actually in the JSON:
## Let's assume that the "e_codes" cell of the
## current line is data['e_codes'][i], where i is the line number
for i in lines:
    match = [0, 0]
    for code in data['e_codes'][i].split(','):
        try:
            match[0] += lookup[int(code)]  # the JSON codes are integers, so convert
            match[1] += 1
        except KeyError:
            match[1] += 1
    if match[1] > 0:
        share_present = match[0] / match[1]
For each case, you get a share_present value, which is 1.0 if all codes appear in the JSON, 0.0 if none of them do, and some value in between indicating the share of codes that were present. Depending on your threshold for keeping a case, you can set a filter to True or False based on this value.
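Putting the pieces together, a complete sketch of the filter could look like the following; lines, data, and my_json come from the snippets above, while the threshold value and the keep list are made-up names for illustration:
threshold = 0.5  # keep a case if at least half of its codes appear in the JSON
lookup = {e['e_code'] for e in my_json}  # a set of integer codes works just as well as the dict
keep = []
for i in lines:
    codes = data['e_codes'][i].split(',')
    present = sum(1 for code in codes if int(code) in lookup)
    share_present = present / len(codes)
    keep.append(share_present >= threshold)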
I have a pandas dataframe that I would like to convert to JSON for my source system to consume, and that system requires a very specific JSON format.
I can't seem to get to the exact format shown in the expected output section using simple dictionary loops.
Is there any way I can convert a CSV/pd.DataFrame to nested JSON?
Is there any Python package specifically built for this?
Input Dataframe:
# Create input dataframe
import pandas as pd
data = {
    'col6':['A','A','A','B','B','B'],
    'col7':[1, 1, 2, 1, 2, 2],
    'col8':['A','A','A','B','B','B'],
    'col10':['A','A','A','B','B','B'],
    'col14':[1,1,1,1,1,2],
    'col15':[1,2,1,1,1,1],
    'col16':[9,10,26,9,12,4],
    'col18':[1,1,2,1,2,3],
    'col1':['xxxx','xxxx','xxxx','xxxx','xxxx','xxxx'],
    'col2':[2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13],
    'col3':['xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012'],
    'col4':['yyyy','yyyy','yyyy','yyyy','yyyy','yyyy'],
    'col5':[0,0,0,0,0,0],
    'col9':['A','A','A','B','B','B'],
    'col11':[0,0,0,0,0,0],
    'col12':[0,0,0,0,0,0],
    'col13':[0,0,0,0,0,0],
    'col17':[51,63,47,59,53,56]
}
df = pd.DataFrame(data)
Expected Output:
{
    "header1": {
        "col1": "xxxx",
        "col2": "20201107023012",
        "col3": "xxxx20201107023012",
        "col4": "yyyy",
        "col5": "0"
    },
    "header2":
    {
        "header3":
        [
            {
                col6: A,
                col7: 1,
                header4:
                [
                    {
                        col8: "A",
                        col9: 1,
                        col10: "A",
                        col11: 0,
                        col12: 0,
                        col13: 0,
                        "header5":
                        [
                            {
                                col14: "1",
                                col15: 1,
                                col16: 1,
                                col17: 51,
                                col18: 1
                            },
                            {
                                col14: "1",
                                col15: 1,
                                col16: 2,
                                col17: 63,
                                col18: 2
                            }
                        ]
                    },
                    {
                        col8: "A",
                        col9: 1,
                        col10: "A",
                        col11: 0,
                        col12: 0,
                        col13: 0,
                        "header5":
                        [
                            {
                                col14: "1",
                                col15: 1,
                                col16: 1,
                                col17: 51,
                                col18: 1
                            },
                            {
                                col14: "1",
                                col15: 1,
                                col16: 2,
                                col17: 63,
                                col18: 2
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
Maybe this will get you started. I'm not aware of a current Python module that will do exactly what you want, but this is the basis of how I'd start it, making assumptions based on what you've provided.
As each successive nesting level is based on some criteria, you'll need to loop through filtered dataframes. Depending on the size of your dataframes, using groupby may be a better option than what I have here, but the theory is the same (see the sketch after the code below). Also, you'll have to create your key/value pairs correctly; this just creates the data to support what you are building.
# assume header1 is constant, so take the first row and use .T to transpose it into a dictionary
header1 = dict(df.iloc[0].T[['col1','col2','col3','col4','col5']])
print('header1', header1)
# for header3 it looks like you need the unique combinations, so create a dataframe
# and then iterate through it to get all the header3 dictionaries
header3_dicts = []
dfh3 = df[['col6', 'col7']].drop_duplicates().reset_index(drop=True)
for i in range(dfh3.shape[0]):
    header3_dicts.append(dict(dfh3.iloc[i].T[['col6','col7']]))
print('header3', header3_dicts)
# iterate over header3 to get header4
for i in range(dfh3.shape[0]):
    #print(dfh3.iat[i,0], dfh3.iat[i,1])
    dfh4 = df.loc[(df['col6']==dfh3.iat[i,0]) & (df['col7']==dfh3.iat[i,1])]
    header4_dicts = []
    for j in range(dfh4.shape[0]):
        header4_dicts.append(dict(dfh4.iloc[j].T[['col8','col9','col10','col11','col12','col13']]))
    print('header4', header4_dicts)
# the next level (header5) repeats the same pattern as above
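If you do go the groupby route mentioned above, a rough sketch of the full nesting could look like this. The header names and the columns used at each grouping level are assumptions read off the expected output, so treat it as a starting point rather than a drop-in solution:
import json
result = {
    'header1': df.iloc[0][['col1', 'col2', 'col3', 'col4', 'col5']].to_dict(),
    'header2': {'header3': []},
}
# group once per nesting level, from the outermost keys inwards
for (c6, c7), g67 in df.groupby(['col6', 'col7']):
    h3 = {'col6': c6, 'col7': c7, 'header4': []}
    for keys, g813 in g67.groupby(['col8', 'col9', 'col10', 'col11', 'col12', 'col13']):
        h4 = dict(zip(['col8', 'col9', 'col10', 'col11', 'col12', 'col13'], keys))
        # the innermost level becomes a list of row dictionaries
        h4['header5'] = g813[['col14', 'col15', 'col16', 'col17', 'col18']].to_dict('records')
        h3['header4'].append(h4)
    result['header2']['header3'].append(h3)
# default=str copes with numpy scalar types that json.dumps cannot serialize directly
print(json.dumps(result, indent=2, default=str))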
I have a list of dictionaries
d = [{"Date": "2020/10/03 3:30", "Name": "John"}, {"Date": "2020/10/03 5:15", "Name": "Harry"}, {"Date": "2020/10/05 6:30", "Name": "Rob"}]
and I want to print only the names that share the same date.
Output:
John
Harry
I am not sure how I can implement this. Any tips?
Your problem can easily be solved by traversing the list of entries and collecting the names per date in a new dictionary. That is, you use the dates as keys of the dictionary and append the names to a corresponding list for each date. Here is a code snippet that does that fairly easily:
d = [{"Date": "2020/10/03 3:30", "Name": "John"}, {"Date": "2020/10/03 5:15","Name": "Harry"}, {"Date": "2020/10/05 6:30", "Name": "Rob"}]
dates = {}
for entry in d:
date = entry["Date"].split()[0]
if date in dates:
dates[date].append(entry["Name"])
else:
dates[date] = []
dates[date].append(entry["Name"])
print(dates["2020/10/03"])
print(dates["2020/10/05"])
Yes, I know my code snippet doesn't directly produce your specified output. I kept it open-ended so you can tailor it to your specific needs.
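For instance, to get exactly the names you asked for, you could print only the entries for dates that collected more than one name, reusing the dates dictionary built above:
for names in dates.values():
    if len(names) > 1:
        print(*names, sep="\n")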
Here's an approach using collections.Counter and a couple of list comprehensions:
from collections import Counter
d = [{"Date": "2020/10/03 3:30", "Name": "John"}, {"Date": "2020/10/03 5:15", "Name": "Harry"}, {"Date": "2020/10/05 6:30", "Name": "Rob"}]
dates = Counter([obj['Date'].split()[0] for obj in d])
multiples = [val for val in dates.keys() if dates[val] > 1]
for obj in d:
    if obj['Date'].split()[0] in multiples:
        print(obj['Name'])
This prints the following output:
John
Harry
You can extract dates and then sort and group by them.
from datetime import datetime
from itertools import groupby
This is a helper function for extracting the date from a dictionary:
def dategetter(one_dict):
return datetime.strptime(one_dict['Date'],
"%Y/%m/%d %H:%M").date()
This is a dictionary comprehension that extracts, sorts, groups, and organizes the results into a dictionary. You can print the dictionary data in any way you want.
{date: [name['Name'] for name in names] for date,names
in groupby(sorted(d, key=dategetter), key=dategetter)}
#{datetime.date(2020, 10, 3): ['John', 'Harry'],
# datetime.date(2020, 10, 5): ['Rob']}
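To narrow that down to just the names that share a date, you could keep the result of the comprehension and filter on the size of each group:
by_date = {date: [name['Name'] for name in names] for date, names
           in groupby(sorted(d, key=dategetter), key=dategetter)}
for names in by_date.values():
    if len(names) > 1:
        print(*names, sep="\n")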
Assuming the "Date" format is consistent, you can use simple dict & list comprehensions:
from collections import defaultdict
res = defaultdict(list)
for i in d:
    dt = i["Date"].split()[0]
    res[dt].append(i["Name"])
for date, names in res.items():
    if len(names) > 1:
        print(*names, sep="\n")
I have a very long JSON file that I need to make sense of in order to query the data I am interested in. To do this, I would like to extract all of the keys so that I know what is available to query. Is there a quick way of doing this, or should I just write a parser that traverses the JSON file and extracts anything between either { and : or , and :?
Given the example:
[{"Name": "key1", "Value": "value1"}, {"Name": "key2", "Value": "value2"}]
I am looking for the values:
"Name"
"Value"
That will depend on whether there's any nesting, but the basic pattern is something like this:
import json
with open("foo.json", "r") as fh:
    data = json.load(fh)
all_keys = set()
for datum in data:
    keys = set(datum.keys())
    all_keys.update(keys)
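If the file does turn out to be nested, one option is a small recursive walk that gathers keys at every level; this is just a sketch and assumes nothing about your structure beyond dicts and lists:
def collect_keys(obj, found=None):
    # recursively gather every dictionary key in a nested structure of dicts and lists
    if found is None:
        found = set()
    if isinstance(obj, dict):
        found.update(obj.keys())
        for value in obj.values():
            collect_keys(value, found)
    elif isinstance(obj, list):
        for item in obj:
            collect_keys(item, found)
    return found

all_keys = collect_keys(data)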
This:
dict = [{"Name": "key1", "Value": "value1"}, {"Name": "key2", "Value": "value2"}]
for val in dict:
print(val.keys())
gives you:
dict_keys(['Name', 'Value'])
dict_keys(['Name', 'Value'])
I have JSON in the following format. My requirement is that when the "id" field is the same across records, the rest of the fields should be combined into lists while keeping the keys the same. I tried looping over it and referring to other sample code, but I couldn't get the required result. When I tried adding values to a new dictionary based on the 'id' field, the result was either just the last value or something like this:
[
    {
        "time": " all dates ",
        "author_id": "alll ",
        "id_number": "all id_number",
        "id": "all idd"
    }
]
Received JSON:
data = [
    {
        "time": "2015/03/27",
        "author_id": "abc_123",
        "id": "4585",
        "id_number": 123
    },
    {
        "time": "2015/03/30",
        "author_id": "abc_123",
        "id": "7776",
        "id_number": 122
    },
    {
        "time": "2015/03/22",
        "author_id": "abc_123",
        "id": "8449",
        "id_number": 111
    },
    {
        "time": "2012/03/30",
        "author_id": "def_456",
        "id": "4585",
        "id_number": 90
    }
]
Required Output:
new_data = [
    {
        "time": [
            "2015/03/27",
            "2012/03/30"
        ],
        "author_id": [
            "abc_123",
            "def_456"
        ],
        "id": "4585",
        "id_number": [
            123,
            90
        ]
    },
    {
        "time": "2015/03/30",
        "author_id": "abc_123",
        "id": "7776",
        "id_number": 122
    },
    {
        "time": "2015/03/27 05:22:42",
        "author_id": "abc_123",
        "id": "8449",
        "id_number": 111
    }
]
A first step could be to create a more regular structure by mapping ids to dictionaries in which every key maps to a list of the corresponding values, merging the original dictionaries that share the same id value.
Then, in a second step, build the result list from the values of that id-to-merged-record mapping, using the length of each value list to decide whether to take the single element out of the lists while copying or to copy the dictionary over with its lists. And that's it.
#!/usr/bin/env python
from collections import defaultdict
from functools import partial
from pprint import pprint


def main():
    records = [
        {
            'time': '2015/03/27',
            'author_id': 'abc_123',
            'id': '4585',
            'id_number': 123
        },
        {
            'time': '2015/03/30',
            'author_id': 'abc_123',
            'id': '7776',
            'id_number': 122
        },
        {
            'time': '2015/03/22',
            'author_id': 'abc_123',
            'id': '8449',
            'id_number': 111
        },
        {
            'time': '2012/03/30',
            'author_id': 'def_456',
            'id': '4585',
            'id_number': 90
        }
    ]
    # map each id to a merged record in which every key maps to a list of values
    id2record = defaultdict(partial(defaultdict, list))
    for record in records:
        merged_record = id2record[record['id']]
        for key, value in record.items():
            merged_record[key].append(value)
    result = list()
    for record in id2record.values():
        if len(record['id']) == 1:
            # only one original record for this id: unwrap the single values
            result.append({key: values[0] for key, values in record.items()})
        else:
            # several records were merged: keep the lists, but 'id' stays a single value
            record['id'] = record['id'][0]
            result.append(dict(record))
    pprint(result)


if __name__ == '__main__':
    main()
If you can change the requirements for the output, I would suggest getting rid of the irregularity in the values: code that processes the result has to deal with both cases, single values and lists of values, which just makes it a little more complicated than it has to be.
Update: Fixed a problem in the code. The id value should always be a single value and never a list.
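For what it's worth, if you are free to make every field except id a list, as suggested above, the second loop collapses to something like this (a sketch based on the id2record mapping built in the code):
result = []
for record in id2record.values():
    record['id'] = record['id'][0]  # 'id' is always a single value
    result.append(dict(record))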