How to extract two values from dict in python? - python

I'm using python3 and and i have data set. That contains the following data. I'm trying to get the desire value from this data list. I have tried many ways but unable to figure out how to do that.
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
# Desired output
# [
# {"user_id":1,"slots_ids":[551,552,553]}
# {"user_id":2,"slots_ids":[550]}
# {"user_id":3,"slots_ids":[524,525]}
# ]
I have tried in the following way and obviously this is not correct. I couldn't figure out the solution of this problem :
final_list = []
for item in slots_data:
obj = obj.dict()
obj = {
"user_id":item["user_id"],
"slot_ids":item["id"]
}
final_list.append(obj)
print(set(final_list))

The other answer added here has a nice solution, but here's one without using pandas:
users = {}
for item in slots_data:
# Check if we've seen this user before,
if item['user_id'] not in users:
# if not, create a new entry for them
users[item['user_id']] = {'user_id': item['user_id'], 'slot_ids': []}
# Add their slot ID to their dictionary
users[item['user_id']]['slot_ids'].append(item['id'])
# We only need the values (dicts)
output_list = list(users.values())

Lots of good answers here.
If I was doing this, I would base my answer on setdefault and/or collections.defaultdict that can be used in a similar way. I think the defaultdict version is very readable but if you are not already importing collections you can do without it.
Given your data:
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
#....
]
You can reshape it into your desired output via:
## -------------------
## get the value for the key user_id if it exists
## if it does not, set the value for that key to a default
## use the value to append the current id to the sub-list
## -------------------
reshaped = {}
for slot in slots_data:
user_id = slot["user_id"]
id = slot["id"]
reshaped.setdefault(user_id, []).append(id)
## -------------------
## -------------------
## take a second pass to finish the shaping in a sorted manner
## -------------------
reshaped = [
{
"user_id": user_id,
"slots_ids": sorted(reshaped[user_id])
}
for user_id
in sorted(reshaped)
]
## -------------------
print(reshaped)
That will give you:
[
{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [524, 525]}
]

I would say try using pandas to group the user id's together and convert it back to a dictionary
pd.DataFrame(slots_data).groupby('user_id')['id'].agg(list).reset_index().to_dict('records')
[{'user_id': 1, 'id': [551, 552, 553]},
{'user_id': 2, 'id': [550]},
{'user_id': 3, 'id': [525, 524]}]

thriough just simple loop way
>>> result = {}
>>> for i in slots_data:
... if i['user_id'] not in result:
... result[i['user_id']] = []
... result[i['user_id']].append(i['id'])
...
>>> output = []
>>> for i in result:
... dict_obj = dict(user_id=i, slots_id=result[i])
... output.append(dict_obj)
...
>>> output
[{'user_id': 1, 'slots_id': [551, 552, 553]}, {'user_id': 3, 'slots_id': [525, 524]}, {'user_id': 2, 'slots_id': [550]}]

You can use the following to get it done. Purely Python. Without any dependencies.
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
user_wise_slots = {}
for slot_detail in slots_data:
if not slot_detail["user_id"] in user_wise_slots:
user_wise_slots[slot_detail["user_id"]] = {
"user_id": slot_detail["user_id"],
"slot_ids": []
}
user_wise_slots[slot_detail["user_id"]]["slot_ids"].append(slot_detail["id"])
print(user_wise_slots.values())

This can be made in a using listcomprehension:
final_list = [{"user_id": user_id, "id":sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])} for user_id in sorted(set([slot["user_id"] for slot in slots_data]))]
A more verbose and better formatted version of the same code:
all_user_ids = [slot["user_id"] for slot in slots_data]
unique_user_ids = sorted(set(all_user_ids))
final_list = [
{
"user_id": user_id,
"id": sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])
}
for user_id in unique_user_ids]
Explanation:
get all the user ids with list comprehension
get the unique user ids by creating a set
create the final list of dictionaries using list comprehension.
each field id is of itself a list with list comprehension. We get the id of the slot, and only add it to the list, if the user ids match

Using pandas you can easily achieve the result.
First install pandas if you don't have as follow
pip install pandas
import pandas as pd
df = pd.DataFrame(slots_data) #create dataframe
df1 = df.groupby("user_id")['id'].apply(list).reset_index(name="slots_ids") #groupby on user_id and combine elements of id in list and give the column name is slots_ids
final_slots_data = df1.to_dict('records') # convert dataframe into a list of dictionary
final_slots_data
Output:
[{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [525, 524]}]

Related

Create a list of nested dictionaries from a single csv file in python

I have a csv file with the following structure:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 1,summer tournament,Catarina Corbell
...
Team 10, spring tournament,Jessi Ravelo
I want to create a nested dictionary (team, tournament) with a list of player dictionary. The desired outcome would be something like:
{'data':
{Team 1:
{'spring tournament':
{'players': [
{name: Rebecca Cardone},
{name: Salina Youngblood},
{name: Catarina Corbell}]
},
{'summer tournament':
{'players': [
{name: Cara Mejias},
{name: Catarina Corbell}]
}
}
},
...
{Team 10:
{'spring tournament':
{'players': [
{name: Jessi Ravelo}]
}
}
}
}
I've been struggling to format it like this. I have been able to successfully nest the first level (team # --> tournament) but I cannot get the second level to nest. Currently, my code looks like this:
d = {}
header = True
with open("input.csv") as f:
for line in f.readlines():
if header:
header = False
continue
team, tournament, player = line.strip().split(",")
d_team = d.get(team,{})
d_tournament = d_team.get(tournament, {})
d_player = d_tournament.get('player',['name'])
d_player.append(player)
d_tournament['player'] = d_tournament
d_team[tournament] = d_tournament
d[team] = d_team
print(d)
What would be the next step in fixing my code so I can create the nested dictionary?
Some problems with your implementation:
You do d_player = d_tournament.get('player',['name']). But you actually want to get the key named players, and this should be a list of dictionaries. Each of these dictionaries must have the form {"name": "Player's Name"}. So you want
l_player = d_tournament.get('players',[]) (default to an empty list), and then do l_player.append({"name": player}) (I renamed it to l_player because it's a list, not a dict).
You do d_tournament['player'] = d_tournament. I suspect you meant d_tournament['player'] = d_player
Strip the whitespace off the elements in the rows. Do team, tournament, player = (word.strip() for word in line.split(","))
Your code works fine after you make these changes
I strongly suggest you use the csv.reader class to read your CSV file instead of manually splitting the line by commas.
Also, since python's containers (lists and dictionaries) hold references to their contents, you can just add the container once and then modify it using mydict["key"] = value or mylist.append(), and these changes will be reflected in parent containers too. Because of this behavior, you don't need to repeatedly assign these things in the loop like you do with d_team[tournament] = d_tournament
allteams = dict()
hasHeader = True
with open("input.csv") as f:
csvreader = csv.reader(f)
if hasHeader: next(csvreader) # Consume one line if a header exists
# Iterate over the rows, and unpack each row into three variables
for team_name, tournament_name, player_name in csvreader:
# If the team hasn't been processed yet, create a new dict for it
if team_name not in allteams:
allteams[team_name] = dict()
# Get the dict object that holds this team's information
team = allteams[team_name]
# If the tournament hasn't been processed already for this team, create a new dict for it in the team's dict
if tournament_name not in team:
team[tournament_name] = {"players": []}
# Get the tournament dict object
tournament = team[tournament_name]
# Add this player's information to the tournament dict's "player" list
tournament["players"].append({"name": player_name})
# Add all teams' data to the "data" key in our result dict
result = {"data": allteams}
print(result)
Which gives us what we want (prettified output):
{
'data': {
'Team 1': {
'spring tournament': {
'players': [
{ 'name': 'Rebbecca Cardone' },
{ 'name': 'Salina Youngblood' },
{ 'name': 'Catarina Corbell' }
]
},
'summer tournament': {
'players': [
{ 'name': 'Cara Mejias' },
{ 'name': 'Catarina Corbell' }
]
}
},
'Team 10': {
' spring tournament': {
'players': [
{ 'name': 'Jessi Ravelo' }
]
}
}
}
}
The example dictionary you describe is not possible (if you want multiple dictionaries under the key "Team 1", put them in a list), but this snippet:
if __name__ == '__main__':
your_dict = {}
with open("yourfile.csv") as file:
all_lines = file.readlines()
data_lines = all_lines[1:] # Skipping "team,tournament,player" line
for line in data_lines:
line = line.strip() # Remove \n
team, tournament_type, player_name = line.split(",")
team_dict = your_dict.get(team, {}) # e.g. "Team 1"
tournaments_of_team_dict = team_dict.get(tournament_type, {'players': []}) # e.g. "spring_tournament"
tournaments_of_team_dict["players"].append({'name': player_name})
team_dict[tournament_type] = tournaments_of_team_dict
your_dict[team] = team_dict
your_dict = {'data': your_dict}
For this example yourfile.csv:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 2,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 2,summer tournament,Catarina Corbell
Gives the following:
{
"data": {
"Team 1": {
"spring tournament": {
"players": [
{
"name": "Rebbecca Cardone"
},
{
"name": "Salina Youngblood"
}
]
},
"summer tournament": {
"players": [
{
"name": "Cara Mejias"
}
]
}
},
"Team 2": {
"spring tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
},
"summer tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
}
}
}
}
Process finished with exit code 0
Maybe I overlook somethign but couldn't you use:
df.groupby(['team','tournament'])['player'].apply(list).reset_index().to_json(orient='records')
You might approach it this way:
from collections import defaultdict
import csv
from pprint import pprint
d = defaultdict(dict)
with open('f00.txt', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
d[ row['team'] ].setdefault(row['tournament'], []
).append(row['player'])
pprint(dict(d))
Prints:
{'Team 1': {'spring tournament': ['Rebbecca Cardone',
'Salina Youngblood',
'Catarina Corbell'],
'summer tournament': ['Cara Mejias', 'Catarina Corbell']},
'Team 10': {' spring tournament': ['Jessi Ravelo']}}

Join array of objects to string value in Python

I want to join array of objects to a string in python. Is there any way for me to do that?
url = "https://google.com",
search = "thai food",
results = [
{
"restaurant": "Siam Palace",
"rating": "4.5"
},
{
"restaurant": "Bangkok Palace",
"rating": "3.5"
}
]
I want to be able to join these all to form one value.
If I could make it look like:
data = { url = "https://google.com",
{
search = "thai food",
results = [
{
"restaurant": "Siam Palace",
"rating": "4.5"
},
{
"restaurant": "Bangkok Palace",
"rating": "3.5"
}
]
}
}
I am receiving these results from mongodb and want to join these 3 together.
Use the JSON module
data = {} # create empty dict
# set the fields
data['url'] = 'https://google.com'
data['search'] = 'thai food'
# set the results
data['results'] = results
# export as string
import json
print(json.dumps(data, indent=4)

how to extract specific data from json and put in to csv using python

I have a JSON which is in nested form. I would like to extract specific data from json and put into csv using pandas python.
data = {
"class":"hudson.model.Hudson",
"jobs":[
{
"_class":"hudson.model.FreeStyleProject",
"name":"git_checkout",
"url":"http://localhost:8080/job/git_checkout/",
"builds":[
{
"_class":"hudson.model.FreeStyleBuild",
"duration":1201,
"number":6,
"result":"FAILURE",
"url":"http://localhost:8080/job/git_checkout/6/"
}
]
},
{
"_class":"hudson.model.FreeStyleProject",
"name":"output",
"url":"http://localhost:8080/job/output/",
"builds":[
]
},
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowJob",
"name":"pipeline_test",
"url":"http://localhost:8080/job/pipeline_test/",
"builds":[
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration":9274,
"number":85,
"result":"SUCCESS",
"url":"http://localhost:8080/job/pipeline_test/85/"
},
{
"_class":"org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration":4251,
"number":84,
"result":"SUCCESS",
"url":"http://localhost:8080/job/pipeline_test/84/"
}
]
}
]
}
From the above JSON i want to fetch jobs name value and builds result value . I am new to python any help will be appreciated .
Till now i have tried
main_data = data['jobs]
json_normalize(main_data,['builds'],
record_prefix='jobs_', errors='ignore')
which gives information only build key values and not the name of job .
Can anyone help ?
Expected Output:
Considering only first build result value you can need to be in csv column you can achieve this using pandas.
data = {
"class": "hudson.model.Hudson",
"jobs": [
{
"_class": "hudson.model.FreeStyleProject",
"name": "git_checkout",
"url": "http://localhost:8080/job/git_checkout/",
"builds": [
{
"_class": "hudson.model.FreeStyleBuild",
"duration": 1201,
"number": 6,
"result": "FAILURE",
"url": "http://localhost:8080/job/git_checkout/6/"
}
]
},
{
"_class": "hudson.model.FreeStyleProject",
"name": "output",
"url": "http://localhost:8080/job/output/",
"builds": []
},
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowJob",
"name": "pipeline_test",
"url": "http://localhost:8080/job/pipeline_test/",
"builds": [
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration": 9274,
"number": 85,
"result": "SUCCESS",
"url": "http://localhost:8080/job/pipeline_test/85/"
},
{
"_class": "org.jenkinsci.plugins.workflow.job.WorkflowRun",
"duration": 4251,
"number": 84,
"result": "SUCCESS",
"url": "http://localhost:8080/job/pipeline_test/84/"
}
]
}
]
}
main_data = data.get('jobs')
res = {'name':[], 'result':[]}
for name_dict in main_data:
res['name'].append(name_dict.get('name','NA'))
resultval = name_dict['builds'][0].get('result') if len(name_dict['builds'])>0 else 'NA'
res['result'].append(resultval)
print(res)
import pandas as pd
df = pd.DataFrame(res)
df.to_csv("/home/file_timer/jobs.csv", index=False)
Check the csv file output
name,result
git_checkout,FAILURE
output,NA
pipeline_test,SUCCESS
If 'NA' result want to skip then
main_data = data.get('jobs')
res = {'name':[], 'result':[]}
for name_dict in main_data:
if len(name_dict['builds'])==0:
continue
res['name'].append(name_dict.get('name', 'NA'))
resultval = name_dict['builds'][0].get('result')
res['result'].append(resultval)
print(res)
import pandas as pd
df = pd.DataFrame(res)
df.to_csv("/home/akash.pagar/shell_learning/file_timer/jobs.csv", index=False)
Output will bw like
name,result
git_checkout,FAILURE
pipeline_test,SUCCESS
Simply with build number,
for job in data.get('jobs'):
for build in job.get('builds'):
print(job.get('name'), build.get('number'), build.get('result'))
gives the result
git_checkout 6 FAILURE
pipeline_test 85 SUCCESS
pipeline_test 84 SUCCESS
If you want to get the result of latest build, and pretty sure about the build number always in decending order,
for job in data.get('jobs'):
if job.get('builds'):
print(job.get('name'), job.get('builds')[0].get('result'))
and if you are not sure the order,
for job in data.get('jobs'):
if job.get('builds'):
print(job.get('name'), sorted(job.get('builds'), key=lambda k: k.get('number'))[-1].get('result'))
then the result will be:
git_checkout FAILURE
pipeline_test SUCCESS
Assuming last build is the last element of its list and you don't care about jobs with no builds, this does:
import pandas as pd
#data = ... #same format as in the question
z = [(job["name"], job["builds"][-1]["result"]) for job in data["jobs"] if len(job["builds"])]
df = pd.DataFrame(data=z, columns=["name", "result"])
#df.to_csv #TODO
Also we don't necessarily need pandas to create the csv file.
You could do:
import csv
#z = ... #see previous code block
with open("f.csv", 'w') as fp:
csv.writer(fp).writerows([("name", "result")] + z)

filter json file with python

How to filter a json file to show only the information I need?
To start off I want to say I'm fairly new to python and working with JSON so sorry if this question was asked before and I overlooked it.
I have a JSON file that looks like this:
[
{
"Store": 417,
"Item": 10,
"Name": "Burger",
"Modifiable": true,
"Price": 8.90,
"LastModified": "09/02/2019 21:30:00"
},
{
"Store": 417,
"Item": 15,
"Name": "Fries",
"Modifiable": false,
"Price": 2.60,
"LastModified": "10/02/2019 23:00:00"
}
]
I need to filter this file to only show Item and Price, like
[
{
"Item": 10,
"Price": 8.90
},
{
"Item": 15,
"Price": 2.60
}
]
I have a code that looks like this:
# Transform json input to python objects
with open("StorePriceList.json") as input_file:
input_dict = json.load(input_file)
# Filter python objects with list comprehensions
output_dict = [x for x in input_dict if ] #missing logical test here.
# Transform python object back into json
output_json = json.dumps(output_dict)
# Show json
print(output_json)
What logical test I should be doing here to do that?
Let's say we can use dict comprehension, then it will be
output_dict = [{k:v for k,v in x.items() if k in ["Item", "Price"]} for x in input_dict]
You can also do it like this :)
>>> [{key: d[key] for key in ['Item', 'Price']} for d in input_dict] # you should rename it to `input_list` rather than `input_dict` :)
[{'Item': 10, 'Price': 8.9}, {'Item': 15, 'Price': 2.6}]
import pprint
with open('data.json', 'r') as f:
qe = json.load(f)
list = []
for item in qe['<your data>']:
query = (f'{item["Item"]} {item["Price"]}')
print("query")

Defaultdict with Dict

I have a dictionary like this
a = [{'CohortList': [{'DriverValue': 0.08559936}, {'DriverValue': 0.08184596527051588}],
'_id': {'DriverName': 'Yield', 'MonthsOnBooks': 50, 'SegmentName': 'LTV110-Super Prime'}},
{'CohortList': [{'DriverValue': 2406.04329}, {'DriverValue': 2336.0058100690103}, ],
'_id': {'DriverName': 'ADB', 'MonthsOnBooks': 15, 'SegmentName': 'LTV110-Super Prime'}},
{'CohortList': [{'DriverValue': 2406.04329}, {'DriverValue': 2336.0058100690103}, ],
'_id': {'DriverName': 'ADB', 'MonthsOnBooks': 16, 'SegmentName': 'LTV110-Prime'}}]
I want to construct a list of dictionary with values as lists from the above dict set like this
{
"LTV110-Prime": [
{
"ADB": [
{
"16": 1500
}
]
},
{
"Yield": []
}
],
"LTV110-Super Prime": [
{
"ADB": [
{
"15": 1500
}
]
},
{
"Yield": [
{
"50": 0.09
}
]
}
]
}
Essentially, I want to group ADB and Yield for each segments with their values.
This is what I have done so far to achieve this target. The values for ADB are mean of DriverValue from CohortList list. I have used statistics.mean to find out the mean of the mapped values.
sg_wrap = defaultdict(dict)
for p in pp_data:
mapped = map(lambda d: d.get('DriverValue', 0), p['CohortList'])
dic = {p['_id']['MonthsOnBooks']: statistics.mean(mapped)}
print(p)
print(sg_wrap)
I am not able to append the Drivers to the inner dict. Please help.
Since you are wrapping everything into lists, you do not need a defaultdict(dcit) but a defaultdict(list).
The following seems to work:
result = defaultdict(list)
for entry in a:
id_ = entry["_id"]
name, months, segment = id_["DriverName"], id_["MonthsOnBooks"], id_["SegmentName"]
values = [x["DriverValue"] for x in entry["CohortList"]]
d = {name: [{months: statistics.mean(values)}]}
result[segment].append(d)
Result is
{'LTV110-Prime': [{'ADB': [{16: 2371.0245500345054}]}],
'LTV110-Super Prime': [{'Yield': [{50: 0.08372266263525793}]},
{'ADB': [{15: 2371.0245500345054}]}]}

Categories

Resources