How to turn a dataframe of categorical data into a dictionary

How to turn a dataframe of categorical data into a dictionary - python

I have a dataframe that I need to transform into JSON. I think it would be easier to first turn it into a dictionary, but I can't figure out how. I need to transform it into JSON so that I can visualize it with js.d3
Here is what the data looks like currently:
NAME, CATEGORY, TAG
Ex1, Education, Books
Ex2, Transportation, Bus
Ex3, Education, Schools
Ex4, Education, Books
Ex5, Markets, Stores
Here is what I want the data to look like:
Data = {
Education {
Books {
key: Ex1,
key: Ex2
}
Schools {
key: Ex3
}
}
Transportation {
Bus {
key: Ex2
}
}
Markets {
Stores {
key: Ex5
}
}
(I think my JSON isn't perfect here, but I just wanted to convey the general idea).

This code is thanks to Brent Washburne's very helpful answer above. I just needed to remove the tags column because for now it was too messy (many of the rows had more than one tag separated by commas). I also added a column (of integers) which I wanted connected to the names. Here it is:
import json, string
import pprint
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], [])
to_append = {}
to_append[fields[0]] = fields[3]
categories.append(to_append)
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')

You can't use 'key' as a key more than once, so the innermost group is a list:
import json, string
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], {})
tags = categories.get(fields[2], [])
tags.append(fields[0])
categories[fields[2]] = tags
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')
Result:
{"Markets": {"Stores": ["Ex5"]}, "Education": {"Schools": ["Ex3"], "Books": ["Ex1", "Ex4"]}, "Transportation": {"Bus": ["Ex2"]}}

Related

Build URL's by iterating through multiple dictionaries in Python

I want to build URL's responding to a specific format, the URL's are made of a concatenation of French regions and departments (subdivisions of regions) codes, such as this one : https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/027/021/
I have set two dictionaries, one for the regions and one for the departments codes associated.
regions = {
'centre_loire': '024',
'bourgogne': '027',
'normandie': '028',
'grand_est': '044',
'pays_loire': '052',
'bretagne': '053',
'aquitaine': '075',
}
departements = {
'centre_loire': ['018','028','036','037','041','045'],
'bourgogne': ['021','025','039','058','070','071','089','090'],
'normandie': ['014','027','050','061','076'],
'grand_est': ['008','010','051','052','054','055','057','067','068','088'],
'pays_loire': ['044','049','053','072','085'],
'bretagne': ['022','029','035','056'],
'aquitaine': ['016','017','019','023','024','033','040','047','064','079','086','087'],
}
The idea is to iterate through those two dictionaries to build URL's but the way I have arranged the for loop associates all the departments with all the regions, even those that have nothing to do together.
regs = []
urls_final = []
for i in regions.values():
regs = (url_base+tour+'/'+str(i)+'/')
for key, values in departements.items():
for value in values:
deps_result = (regs+str(value)+'/'+str(value)+'/')
urls_final.append(deps_result)
print(urls_final)
For example, for the "bourgogne" region, the URL's created should contain only the 8 departments codes corresponding to the "bourgogne" region.

Use the key from the regions dict to get the list values of the departements dict.
regions = {
'centre_loire': '024',
'bourgogne': '027',
'normandie': '028',
'grand_est': '044',
'pays_loire': '052',
'bretagne': '053',
'aquitaine': '075',
}
departements = {
'centre_loire': ['018','028','036','037','041','045'],
'bourgogne': ['021','025','039','058','070','071','089','090'],
'normandie': ['014','027','050','061','076'],
'grand_est': ['008','010','051','052','054','055','057','067','068','088'],
'pays_loire': ['044','049','053','072','085'],
'bretagne': ['022','029','035','056'],
'aquitaine': ['016','017','019','023','024','033','040','047','064','079','086','087'],
}
url_base = "https://whatever.com"
urls = []
for reg, reg_code in regions.items():
for dep_code in departements[reg]:
urls.append(f"{url_base}/{reg_code}/{dep_code}")
from pprint import pprint
pprint(urls)
output
['https://whatever.com/024/018',
'https://whatever.com/024/028',
'https://whatever.com/024/036',
'https://whatever.com/024/037',
'https://whatever.com/024/041',
'https://whatever.com/024/045',
'https://whatever.com/027/021',
'https://whatever.com/027/025',
'https://whatever.com/027/039',
'https://whatever.com/027/058',
'https://whatever.com/027/070',
'https://whatever.com/027/071',
'https://whatever.com/027/089',
'https://whatever.com/027/090',
'https://whatever.com/028/014',
'https://whatever.com/028/027',
'https://whatever.com/028/050',
'https://whatever.com/028/061',
'https://whatever.com/028/076',
'https://whatever.com/044/008',
'https://whatever.com/044/010',
'https://whatever.com/044/051',
'https://whatever.com/044/052',
'https://whatever.com/044/054',
'https://whatever.com/044/055',
'https://whatever.com/044/057',
'https://whatever.com/044/067',
'https://whatever.com/044/068',
'https://whatever.com/044/088',
'https://whatever.com/052/044',
'https://whatever.com/052/049',
'https://whatever.com/052/053',
'https://whatever.com/052/072',
'https://whatever.com/052/085',
'https://whatever.com/053/022',
'https://whatever.com/053/029',
'https://whatever.com/053/035',
'https://whatever.com/053/056',
'https://whatever.com/075/016',
'https://whatever.com/075/017',
'https://whatever.com/075/019',
'https://whatever.com/075/023',
'https://whatever.com/075/024',
'https://whatever.com/075/033',
'https://whatever.com/075/040',
'https://whatever.com/075/047',
'https://whatever.com/075/064',
'https://whatever.com/075/079',
'https://whatever.com/075/086',
'https://whatever.com/075/087']

base = "www.domain.com/{}/{}"
urls = [base.format(dept, subdept) for region, dept in regions.items() for subdept in departments[region]]
print(urls)
Output:
['www.domain.com/024/018', 'www.domain.com/024/028', 'www.domain.com/024/036', 'www.domain.com/024/037', 'www.domain.com/024/041', 'www.domain.com/024/045', 'www.domain.com/027/021', 'www.domain.com/027/025', 'www.domain.com/027/039', 'www.domain.com/027/058', 'www.domain.com/027/070', 'www.domain.com/027/071', 'www.domain.com/027/089', 'www.domain.com/027/090', 'www.domain.com/028/014', 'www.domain.com/028/027', 'www.domain.com/028/050', 'www.domain.com/028/061', 'www.domain.com/028/076', 'www.domain.com/044/008', 'www.domain.com/044/010', 'www.domain.com/044/051', 'www.domain.com/044/052', 'www.domain.com/044/054', 'www.domain.com/044/055', 'www.domain.com/044/057', 'www.domain.com/044/067', 'www.domain.com/044/068', 'www.domain.com/044/088', 'www.domain.com/052/044', 'www.domain.com/052/049', 'www.domain.com/052/053', 'www.domain.com/052/072', 'www.domain.com/052/085', 'www.domain.com/053/022', 'www.domain.com/053/029', 'www.domain.com/053/035', 'www.domain.com/053/056', 'www.domain.com/075/016', 'www.domain.com/075/017', 'www.domain.com/075/019', 'www.domain.com/075/023', 'www.domain.com/075/024', 'www.domain.com/075/033', 'www.domain.com/075/040', 'www.domain.com/075/047', 'www.domain.com/075/064', 'www.domain.com/075/079', 'www.domain.com/075/086', 'www.domain.com/075/087']
>>>
Note: I've renamed your departements dictionary to departments.

constructing a message format from the fetchall result in python

*New to Programming
Question: I need to use the below "Data" (two rows as arrays) queried from sql and use it to create the message structure below.
data from sql using fetchall()
Data = [[100,1,4,5],[101,1,4,6]]
##expected message structure
message = {
"name":"Tom",
"Job":"IT",
"info": [
{
"id_1":"100",
"id_2":"1",
"id_3":"4",
"id_4":"5"
},
{
"id_1":"101",
"id_2":"1",
"id_3":"4",
"id_4":"6"
},
]
}
I tried to create below method to iterate over the rows and then input the values, this is was just a starting, but this was also not working
def create_message(data)
for row in data:
{
"id_1":str(data[0][0],
"id_2":str(data[0][1],
"id_3":str(data[0][2],
"id_4":str(data[0][3],
}
Latest Code
def create_info(data):
info = []
for row in data:
temp_dict = {"id_1_tom":"","id_2_hell":"","id_3_trip":"","id_4_clap":""}
for i in range(0,1):
temp_dict["id_1_tom"] = str(row[i])
temp_dict["id_2_hell"] = str(row[i+1])
temp_dict["id_3_trip"] = str(row[i+2])
temp_dict["id_4_clap"] = str(row[i+3])
info.append(temp_dict)
return info

Edit: Updated answer based on updates to the question and comment by original poster.
This function might work for the example you've given to get the desired output, based on the attempt you've provided:
def create_info(data):
info = []
for row in data:
temp_dict = {}
temp_dict['id_1_tom'] = str(row[0])
temp_dict['id_2_hell'] = str(row[1])
temp_dict['id_3_trip'] = str(row[2])
temp_dict['id_4_clap'] = str(row[3])
info.append(temp_dict)
return info
For the input:
[[100, 1, 4, 5],[101,1,4,6]]
This function will return a list of dictionaries:
[{"id_1_tom":"100","id_2_hell":"1","id_3_trip":"4","id_4_clap":"5"},
{"id_1_tom":"101","id_2_hell":"1","id_3_trip":"4","id_4_clap":"6"}]
This can serve as the value for the key info in your dictionary message. Note that you would still have to construct the message dictionary.

customized OrderedDict format and transfer into dictionary

I have an issue about how to customize OrderedDict format and convert them into a json or dictionary format(but be able to reset the key names and the structure). I have the data below:
result= OrderedDict([('index', 'cfs_fsd_00001'),
('host', 'GIISSP707'),
('source', 'D:\\usrLLSS_SS'),
('_time', '2018-11-02 14:43:30.000 EDT'),
('count', '153')])
...However, I want to change the format like this:
{
"servarname": {
"index": "cfs_fsd_00001",
"host": "GIISSP707"
},
"times": '2018-11-02 14:43:30.000 EDT',
"metricTags": {
"source": 'D:\\ddevel.log'"
},
"metricName": "serverice count",
"metricValue": 153,
"metricType": "count"
}
I will be really appreciate your help. Basically the output I got is pretty flat. But I want to customize the structure. The original structure is
OrderedDict([('index', 'cfs_fsd_00001'),('host', 'GIISSP707').....]).
The output I want to achieve is {"servarname"{"index":"cfs_fsd_00001","host":"GIISSP707"},......

You can simply reference the result dict with the respective keys that you want your target data structure to have:
{
"servarname": {
"index": result['index'],
"host": result['host']
},
"times": result['_time'],
"metricTags": {
"source": result['source']
},
"metricName": "serverice count",
"metricValue": result['count'],
"metricType": "count"
}

No sure how flexible you need for your method. I assume you have a few common keys in your OrderedDict and you want to find the metric there, then reformat them into a new dict. Here is a short function which is implemented in python 3 and I hope it could help.
from collections import OrderedDict
import json
def reformat_ordered_dict(dict_result):
"""Reconstruct the OrderedDict result into specific format
This method assumes that your input OrderedDict has the following common keys: 'index',
'host', 'source', '_time', and a potential metric whcih is subject to change (of course
you can support more metrics with minor tweak of the code). The function also re-map the
keys (for example, mapping '_time' to 'times', pack 'index' and 'source' into 'servarname'
).
:param dict_result: the OrderedDict
:return: the reformated OrderedDict
"""
common_keys = ('index', 'host', 'source', '_time')
assert all(common_key in dict_result for common_key in common_keys), (
'You have to provide all the commen keys!')
# write common keys
reformated = OrderedDict()
reformated["servarname"] = OrderedDict([
("index", dict_result['index']),
("host", dict_result['host'])
])
reformated["times"] = dict_result['_time']
reformated["metricTags"] = {"source": dict_result['source']}
# write metric
metric = None
for key in dict_result.keys():
if key not in common_keys:
metric = key
break
assert metric is not None, 'Cannot find metric in the OrderedDict!'
# don't know where you get this value. But you can customize it if needed
# for exampe if the metric name is needed here
reformated['metricName'] = "serverice count"
reformated['metricValue'] = dict_result[metric]
reformated['metricType'] = metric
return reformated
if __name__ == '__main__':
result= OrderedDict([('index', 'cfs_fsd_00001'),
('host', 'GIISSP707'),
('source', 'D:\\usrLLSS_SS'),
('_time', '2018-11-02 14:43:30.000 EDT'),
('count', '153')])
reformated = reformat_ordered_dict(result)
print(json.dumps(reformated))

How to create a function that subtracts the current value from previous value every time it's called

I am querying a dataset that works like this: each time I query it produces a different value of Bytes_Written and Bytes_Read. What I cannot seem to accomplish is to subtract the current value from the previous value and keep doing this every second.
Here is what the data looks like:
{
"Number of Devices": 2,
"Block Devices": {
"bdev0": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d1c8e7c6-8c77-444c-9a93-8b56fa1e37f2-lun-010.0.0.142",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "97069",
"Bytes_Written": "34410496",
"Bytes_Read": "363172864"
},
"bdev1": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b27110f9-41ba-4bc6-b97c-b5dde23af1f9-lun-010.0.0.146",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "93",
"Bytes_Written": "0",
"Bytes_Read": "380928"
}
}
}
The code that queries the data:
def counterVolume_one():
#Get data
url = 'http://url:8080/vrio/blk'
r = requests.get(url)
data = r.json()
wanted = {'Bytes_Written', 'Bytes_Written', 'IO_Operation'}
for d in data['Block Devices'].itervalues():
values = {k: v for k, v in d.iteritems() if k in wanted}
print json.dumps(values)
counterVolume_one()
The way I want to get the output is:
{"IO_Operations": "97069", "Bytes_Read": "363172864", "Bytes_Written": "34410496"}
{"IO_Operations": "93", "Bytes_Read": "380928", "Bytes_Written": "0"}
Here is what I want to accomplish:
first time query = first set of values
after 1 sec
second time query = first set of values-second set of values
after 1 sec
third time query = second set of values-third set of values
Expected output would be a json object as below
{'bytes-read': newvalue, 'bytes-written': newvalue, 'io_operations': newvalue}

The simplest fix may be to modify the counterVolume_one() function so that it accepts parameters that define the current state, and that you'd update as you collect data. For example, the following code collects and sums the the fields your interested in from the JSON documents:
FIELDS = ('Bytes_Written', 'Bytes_Read', 'IO_Operation')
def counterVolume_one(state):
url = 'http://url:8080/vrio/blk'
r = requests.get(url)
data = r.json()
for field in FIELDS:
state[field] += data[field]
return state
state = {"Bytes_Written": 0, "Bytes_Read": 0, "IO_Operation": 0}
while True:
counterVolume_one(state)
time.sleep(1)
for field in FIELDS:
print("{field:s}: {count:d}".format(field=field,
count=state[field]))
A more correct fix might be to use a class to hold the state, and that has a method that updates the state. But, I think following the idea above will get you the solution quickest.

Python - Parsing JSON Data Set

I am trying to parse a JSON data set that looks something like this:
{"data":[
{
"Rest":0,
"Status":"The campaign is moved to the archive",
"IsActive":"No",
"StatusArchive":"Yes",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":1111111,
"StatusShow":"No",
"StartDate":"2013-01-20",
"Sum":0,
"StatusModerate":"Yes",
"Clicks":0,
"Shows":0,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_01"
},
{
"Rest":82.6200000000008,
"Status":"Impressions will begin tomorrow at 10:00",
"IsActive":"Yes",
"StatusArchive":"No",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":2222222,
"StatusShow":"Yes",
"StartDate":"2013-01-28",
"Sum":15998,"StatusModerate":"Yes",
"Clicks":7571,
"Shows":5535646,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_02"
}
]
}
Lets assume that there can be many of these data sets.
I would like to iterate through each one of them and grab the "Name" and the "Campaign ID" parameter.
So far my code looks something like this:
decoded_response = response.read().decode("UTF-8")
data = json.loads(decoded.response)
for item in data[0]:
for x in data[0][item] ...
-> need a get name procedure
-> need a get campaign_id procedure
Probably quite straight forward! I am not good with lists/dictionaries :(

Access dictionaries with d[dict_key] or d.get(dict_key, default) (to provide default value):
jsonResponse=json.loads(decoded_response)
jsonData = jsonResponse["data"]
for item in jsonData:
name = item.get("Name")
campaignID = item.get("CampaignID")
I suggest you read something about dictionaries.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to turn a dataframe of categorical data into a dictionary - python

Related

Build URL's by iterating through multiple dictionaries in Python

constructing a message format from the fetchall result in python

customized OrderedDict format and transfer into dictionary

How to create a function that subtracts the current value from previous value every time it's called

Python - Parsing JSON Data Set

Categories

Resources