Issue with updating a dictionary in for loop (not unique) - python

I have an issue with populating a dictionary from the database. It overwrites or merges values that from my logic should be unique to the given country.
country_list=[]
for countries in item_obj.values('ship_country'):
if countries['ship_country'] not in country_list:
country_list.append(countries['ship_country'])
country_dict = {}
for country in country_list:
country_dict.update({country: []})
cat_dict = { i : {} for i in ProductCategories.objects.filter(user=request.user) }
for country in country_list:
country_dict[country].append(cat_dict)
# outputs: country_dict = {'Netherlands': [{'Games': {}, 'Bikes': {}, 'Paint': {}}], 'Belgium': [{'Games': {}, 'Bikes': {}, 'Paint': {}}]}
for country, value in country_dict.items():
for key in value:
for category in key:
for channel in channel_list:
if channel in channels:
aggr = item_obj.filter(
country=country,
category=category,
channel=channel).aggregate(Sum('price'))
if aggr['price__sum'] != None:
key[category] = {channel: round(aggr['price__sum'], 2)}
which gives me if i print out line for line, exactly what i want see:
Netherlands
Games
{'price__sum': None}
{}
Bikes
{'Channel #3': Decimal('30.10')}
Paint
{'Channel #1': Decimal('56.70')}
Belgium
Games
{'price__sum': None}
{}
Bikes
{'Channel #1': Decimal('39.70')}
Paint
{'Channel #2': Decimal('13.90')}
But when I print the final dictionary it overwrites the unique Netherlands values...
{
"Netherlands":[
{
"Games":{
},
"Bikes":{
"channel #1": "Decimal(""39.70"")"
},
"Sprays":{
"channel #2": "Decimal(""13.90"")"
}
}
],
"Belgium":[
{
"Games":{
},
"Bikes":{
"channel #1": "Decimal(""39.70"")"
},
"Paint":{
"channel #2": "Decimal(""13.90"")"
}
}
]
}
if I use update() instead of =
if aggr['price__sum'] != None:
key[category].update({channel: round(aggr['price__sum'], 2)})
it merges the two together like this:
{
"Netherlands":[
{
"Games":{
},
"Bikes":{
"Channel #3":"Decimal(""30.10"")",
"channel #1":"Decimal(""39.70"")"
},
"Paint":{
"Channel #1":"Decimal(""56.70"")",
"channel #2":"Decimal(""13.90"")"
}
}
],
"Belgium":[
{
"Games":{
},
"Bikes":{
"Channel #3":"Decimal(""30.10"")",
"channel #1":"Decimal(""39.70"")"
},
"Paint":{
"Channel #1":"Decimal(""56.70"")",
"channel #2":"Decimal(""13.90"")"
}
}
]
}
It seems that update or = ignores the fact that we are in a different country and I tried a lot and am stuck, I seem to miss something. Can anyone help me? :-)

Related

How to flatten dict in a DataFrame & concatenate all resultant rows

I am using Github's GraphQL API to fetch some issue details.
I used Python Requests to fetch the data locally.
This is how the output.json looks like
{
"data": {
"viewer": {
"login": "some_user"
},
"repository": {
"issues": {
"edges": [
{
"node": {
"id": "I_kwDOHQ63-s5auKbD",
"title": "test issue 1",
"number": 146,
"createdAt": "2023-01-06T06:39:54Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:42:00Z",
"comments": {
"edges": [
{
"node": {
"id": "IC_kwDOHQ63-s5R2XCV",
"body": "comment 01"
}
},
{
"node": {
"id": "IC_kwDOHQ63-s5R2XC9",
"body": "comment 02"
}
}
]
},
"labels": {
"edges": []
}
},
"cursor": "Y3Vyc29yOnYyOpHOWrimww=="
},
{
"node": {
"id": "I_kwDOHQ63-s5auKm8",
"title": "test issue 2",
"number": 147,
"createdAt": "2023-01-06T06:40:34Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:40:34Z",
"comments": {
"edges": []
},
"labels": {
"edges": [
{
"node": {
"name": "food"
}
},
{
"node": {
"name": "healthy"
}
}
]
}
},
"cursor": "Y3Vyc29yOnYyOpHOWripvA=="
}
]
}
}
}
}
The json was put inside a list using
result = response.json()["data"]["repository"]["issues"]["edges"]
And then this list was put inside a DataFrame
import pandas as pd
df = pd.DataFrame (result, columns = ['node', 'cursor'])
df
These are the contents of the data frame
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
{'edges': [{'node': {'id': 'IC_kwDOHQ63-s5R2XCV","body": "comment 01"}},{'node': {'id': 'IC_kwDOHQ63-s5R2XC9","body": "comment 02"}}]}
{'edges': []}
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
{'edges': []}
{'edges': [{'node': {'name': 'food"}},{'node': {'name': 'healthy"}}]}
I would like to split/explode the comments and labels columns.
The values in these columns are nested dictionaries
I would like there to be as many rows for a single issue, as there are comments & labels.
I would like to flatten out the data frame.
So this involves split/explode and concat.
There are several stackoverflow answers that delve on this topic. And I have tried the code from several of them.
I can not paste the links to those questions, because stackoverflow marks my question as spam due to many links.
But these are the steps I have tried
df3 = df2['comments'].apply(pd.Series)
Drill down further
df4 = df3['edges'].apply(pd.Series)
df4
Drill down further
df5 = df4['node'].apply(pd.Series)
df5
The last statement above gives me the KeyError: 'node'
I understand, this is because node is not a key in the DataFrame.
But how else can i split this dictionary and concatenate all columns back to my issues row?
This is how I would like the output to look like
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
comment 01
Null
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
comment 02
Null
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
Null
food
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
Null
healthy
If dct is your dictionary from the question you can try:
df = pd.DataFrame(d['node'] for d in dct['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df.to_markdown(index=False))
Prints:
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
OPEN
2023-01-06T06:42:00Z
comment 01
nan
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
OPEN
2023-01-06T06:42:00Z
comment 02
nan
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
OPEN
2023-01-06T06:40:34Z
nan
food
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
OPEN
2023-01-06T06:40:34Z
nan
healthy
#andrej-kesely has answered my question.
I have selected his response as the answer for this question.
I am now posting a consolidated script that includes my poor code and andrej's great code.
In this script i want to fetch details from Github's GraphQL API Server.
And put it inside pandas.
Primary source for this script is this gist.
And a major chunk of remaining code is an answer by #andrej-kesely.
Now onto the consolidated script.
First import the necessary packages and set headers
import requests
import json
import pandas as pd
headers = {"Authorization": "token <your_github_personal_access_token>"}
Now define the query that will fetch data from github.
In my particular case, I am fetching issue details form a particular repo
it can be something else for you.
query = """
{
viewer {
login
}
repository(name: "your_github_repo", owner: "your_github_user_name") {
issues(states: OPEN, last: 2) {
edges {
node {
id
title
number
createdAt
closedAt
state
updatedAt
comments(first: 10) {
edges {
node {
id
body
}
}
}
labels(orderBy: {field: NAME, direction: ASC}, first: 10) {
edges {
node {
name
}
}
}
comments(first: 10) {
edges {
node {
id
body
}
}
}
}
cursor
}
}
}
}
"""
Execute the query and save the response
def run_query(query):
request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
if request.status_code == 200:
return request.json()
else:
raise Exception("Query failed to run by returning code of {}. {}".format(request.status_code, query))
result = run_query(query)
And now is the trickiest part.
In my query response, there are several nested dictionaries.
I would like to split them - more details in my question above.
This magic code from #andrej-kesely does that for you.
df = pd.DataFrame(d['node'] for d in result['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df)

Printing pair of a dict

Im new in python but always trying to learn.
Today I got this error while trying select a key from dictionary:
print(data['town'])
KeyError: 'town'
My code:
import requests
defworld = "Pacera"
defcity = 'Svargrond'
requisicao = requests.get(f"https://api.tibiadata.com/v2/houses/{defworld}/{defcity}.json")
data = requisicao.json()
print(data['town'])
The json/dict looks this:
{
"houses": {
"town": "Venore",
"world": "Antica",
"type": "houses",
"houses": [
{
"houseid": 35006,
"name": "Dagger Alley 1",
"size": 57,
"rent": 2665,
"status": "rented"
}, {
"houseid": 35009,
"name": "Dream Street 1 (Shop)",
"size": 94,
"rent": 4330,
"status": "rented"
},
...
]
},
"information": {
"api_version": 2,
"execution_time": 0.0011,
"last_updated": "2017-12-15 08:00:00",
"timestamp": "2017-12-15 08:00:02"
}
}
The question is, how to print the pairs?
Thanks
You have to access the town object by accessing the houses field first, since there is nesting.
You want print(data['houses']['town']).
To avoid your first error, do
print(data["houses"]["town"])
(since it's {"houses": {"town": ...}}, not {"town": ...}).
To e.g. print all of the names of the houses, do
for house in data["houses"]["houses"]:
print(house["name"])
As answered, you must do data['houses']['town']. A better approach so that you don't raise an error, you can do:
houses = data.get('houses', None)
if houses is not None:
print(houses.get('town', None))
.get is a method in a dict that takes two parameters, the first one is the key, and the second parameter is ghe default value to return if the key isn't found.
So if you do in your example data.get('town', None), this will return None because town isn't found as a key in data.

Create a list of nested dictionaries from a single csv file in python

I have a csv file with the following structure:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 1,summer tournament,Catarina Corbell
...
Team 10, spring tournament,Jessi Ravelo
I want to create a nested dictionary (team, tournament) with a list of player dictionary. The desired outcome would be something like:
{'data':
{Team 1:
{'spring tournament':
{'players': [
{name: Rebecca Cardone},
{name: Salina Youngblood},
{name: Catarina Corbell}]
},
{'summer tournament':
{'players': [
{name: Cara Mejias},
{name: Catarina Corbell}]
}
}
},
...
{Team 10:
{'spring tournament':
{'players': [
{name: Jessi Ravelo}]
}
}
}
}
I've been struggling to format it like this. I have been able to successfully nest the first level (team # --> tournament) but I cannot get the second level to nest. Currently, my code looks like this:
d = {}
header = True
with open("input.csv") as f:
for line in f.readlines():
if header:
header = False
continue
team, tournament, player = line.strip().split(",")
d_team = d.get(team,{})
d_tournament = d_team.get(tournament, {})
d_player = d_tournament.get('player',['name'])
d_player.append(player)
d_tournament['player'] = d_tournament
d_team[tournament] = d_tournament
d[team] = d_team
print(d)
What would be the next step in fixing my code so I can create the nested dictionary?
Some problems with your implementation:
You do d_player = d_tournament.get('player',['name']). But you actually want to get the key named players, and this should be a list of dictionaries. Each of these dictionaries must have the form {"name": "Player's Name"}. So you want
l_player = d_tournament.get('players',[]) (default to an empty list), and then do l_player.append({"name": player}) (I renamed it to l_player because it's a list, not a dict).
You do d_tournament['player'] = d_tournament. I suspect you meant d_tournament['player'] = d_player
Strip the whitespace off the elements in the rows. Do team, tournament, player = (word.strip() for word in line.split(","))
Your code works fine after you make these changes
I strongly suggest you use the csv.reader class to read your CSV file instead of manually splitting the line by commas.
Also, since python's containers (lists and dictionaries) hold references to their contents, you can just add the container once and then modify it using mydict["key"] = value or mylist.append(), and these changes will be reflected in parent containers too. Because of this behavior, you don't need to repeatedly assign these things in the loop like you do with d_team[tournament] = d_tournament
allteams = dict()
hasHeader = True
with open("input.csv") as f:
csvreader = csv.reader(f)
if hasHeader: next(csvreader) # Consume one line if a header exists
# Iterate over the rows, and unpack each row into three variables
for team_name, tournament_name, player_name in csvreader:
# If the team hasn't been processed yet, create a new dict for it
if team_name not in allteams:
allteams[team_name] = dict()
# Get the dict object that holds this team's information
team = allteams[team_name]
# If the tournament hasn't been processed already for this team, create a new dict for it in the team's dict
if tournament_name not in team:
team[tournament_name] = {"players": []}
# Get the tournament dict object
tournament = team[tournament_name]
# Add this player's information to the tournament dict's "player" list
tournament["players"].append({"name": player_name})
# Add all teams' data to the "data" key in our result dict
result = {"data": allteams}
print(result)
Which gives us what we want (prettified output):
{
'data': {
'Team 1': {
'spring tournament': {
'players': [
{ 'name': 'Rebbecca Cardone' },
{ 'name': 'Salina Youngblood' },
{ 'name': 'Catarina Corbell' }
]
},
'summer tournament': {
'players': [
{ 'name': 'Cara Mejias' },
{ 'name': 'Catarina Corbell' }
]
}
},
'Team 10': {
' spring tournament': {
'players': [
{ 'name': 'Jessi Ravelo' }
]
}
}
}
}
The example dictionary you describe is not possible (if you want multiple dictionaries under the key "Team 1", put them in a list), but this snippet:
if __name__ == '__main__':
your_dict = {}
with open("yourfile.csv") as file:
all_lines = file.readlines()
data_lines = all_lines[1:] # Skipping "team,tournament,player" line
for line in data_lines:
line = line.strip() # Remove \n
team, tournament_type, player_name = line.split(",")
team_dict = your_dict.get(team, {}) # e.g. "Team 1"
tournaments_of_team_dict = team_dict.get(tournament_type, {'players': []}) # e.g. "spring_tournament"
tournaments_of_team_dict["players"].append({'name': player_name})
team_dict[tournament_type] = tournaments_of_team_dict
your_dict[team] = team_dict
your_dict = {'data': your_dict}
For this example yourfile.csv:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 2,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 2,summer tournament,Catarina Corbell
Gives the following:
{
"data": {
"Team 1": {
"spring tournament": {
"players": [
{
"name": "Rebbecca Cardone"
},
{
"name": "Salina Youngblood"
}
]
},
"summer tournament": {
"players": [
{
"name": "Cara Mejias"
}
]
}
},
"Team 2": {
"spring tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
},
"summer tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
}
}
}
}
Process finished with exit code 0
Maybe I overlook somethign but couldn't you use:
df.groupby(['team','tournament'])['player'].apply(list).reset_index().to_json(orient='records')
You might approach it this way:
from collections import defaultdict
import csv
from pprint import pprint
d = defaultdict(dict)
with open('f00.txt', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
d[ row['team'] ].setdefault(row['tournament'], []
).append(row['player'])
pprint(dict(d))
Prints:
{'Team 1': {'spring tournament': ['Rebbecca Cardone',
'Salina Youngblood',
'Catarina Corbell'],
'summer tournament': ['Cara Mejias', 'Catarina Corbell']},
'Team 10': {' spring tournament': ['Jessi Ravelo']}}

How to loop and fill nested dict/json in python using Requests' specific response/output

I'm crawling a site and getting data from some ajax dropdowns, and the data is related.
So basically, for simplicity, let's say I crawl the 1st dropdown and it gives me, name and value, and I use the values to run a loop and for the next dropdown and get its name, value, etc. Let's say the data is for countries, then regions, then districts, etc.
So I can get the name, values; now I want to join each country to be filled with its related regions, and regions be filled with its related districts.
Sample Code:
import requests
from bs4 import BeautifulSoup
URL = "https://somesite.com/"
COUNTRIES = {
"NAME": 1,
"ANOTHER": 2
}
REGIONS = {}
DISTRICTS = {}
def fetch(s, url, value, store):
data = {
'id': str(value)
}
res = s.post(url, data=data)
soup = BeautifulSoup(res.content, 'html5lib')
options = soup.find_all('option')[1:]
for option in options:
name = option.text
value = option.get('value')
#value = option.attrs['value']
store[name] = value
for name, val in COUNTRIES.items():
fetch(requests, URL+"getregions", val, REGION)
for name, val in REGIONS.items():
fetch(requests, URL+"getdistricts", val, DISTRICTS)
I want to combine all this in the end to have one nested json/dict of the form:
DATA = {
"COUNTRY1": {
"REGION1": {
"DISTRICT1": { "WARDS": ..... },
"DISTRICT2": { "WARDS": ..... },
},
"REGION2": {
"DISTRICT1": { "WARDS": ..... },
"DISTRICT2": { "WARDS": ..... },
},
},
"COUNTRY2": {
"REGION1": {
"DISTRICT1": { "WARDS": ..... },
"DISTRICT2": { "WARDS": ..... },
},
"REGION2": {
"DISTRICT1": { "WARDS": ..... },
"DISTRICT2": { "WARDS": ..... },
},
},
}
If possible also in this form:
[{
country: "NAME",
region: "RNAME",
district: "DNAME",
ward: "WNAME"
},
{
country: "NAME",
region: "RNAME",
district: "DNAME",
ward: "WNAME"
},
For both SQL and NoSQL.
I've thought of closures and such but I just can't seem to find the logic to implement it.
Anyone who can help will be really thankful, preferred answer in Python, please.
I'm new to asking questions here and it took me a while to compose this question, I apologize if it's not concise and please ask if you haven't understood so I can explain more.

How do i check for duplicate entries before i add an entry the dictionary

Given i have the following dictionary which stores key(entry_id),value(entry_body,entry_title) pairs.
"entries": {
"1": {
"body": "ooo",
"title": "jack"
},
"2": {
"body": "ooo",
"title": "john"
}
}
How do i check whether the title of an entry that i want to add to the dictionary already exists.
For example: This is the new entry that i want to add.
{
"body": "nnnn",
"title": "jack"
}
Have you thought about changing your data structure? Without context, the IDs of the entries seem a little useless. Your question suggests you only want to store unique titles, so why not make them your keys?
Example:
"entries": {
"jack": "ooo",
"john": "ooo"
}
That way you can do an efficient if newname in entries membership test.
EDIT:
Based on your comment you can still preserve the IDs by extending the data structure:
"entries": {
"jack": {
"body": "ooo",
"id": 1
},
"john": {
"body": "ooo",
"id": 2
}
}
I agree with #Christian König's answer, your data structure seems like it could be made clearer and more efficient. Still, if you need a solution to this setup in particular, here's one that will work - and it automatically adds new integer keys to the entries dict.
I've added an extra case to show both a rejection and an accepted update.
def existing_entry(e, d):
return [True for entry in d["entries"].values() if entry["title"] == e["title"]]
def update_entries(e, entries):
if not any(existing_entry(e, entries)):
current_keys = [int(x) for x in list(entries["entries"].keys())]
next_key = str(max(current_keys) + 1)
entries["entries"][next_key] = e
print("Updated:", entries)
else:
print("Existing entry found.")
update_entries(new_entry_1, data)
update_entries(new_entry_2, data)
Output:
Existing entry found.
Updated:
{'entries':
{'1': {'body': 'ooo', 'title': 'jack'},
'2': {'body': 'ooo', 'title': 'john'},
'3': {'body': 'qqqq', 'title': 'jill'}
}
}
Data:
data = {"entries": {"1": {"body": "ooo", "title": "jack"},"2": {"body": "ooo","title": "john"}}}
new_entry_1 = {"body": "nnnn", "title": "jack"}
new_entry_2 = {"body": "qqqq", "title": "jill"}
This should work?
entry_dict = {
"1": {"body": "ooo", "title": "jack"},
"2": {"body": "ooo", "title": "john"}
}
def does_title_exist(title):
for entry_id, sub_dict in entry_dict.items():
if sub_dict["title"] == title:
print("Title %s already in dictionary at entry %s" %(title, entry_id))
return True
return False
print("Does the title exist? %s" % does_title_exist("jack"))
As Christian Suggests above this seems like an inefficient data structure for the job. It seems like if you just need index ID's a list may be better.
I think to achieve this, one will have to traverse the dictionary.
'john' in [the_dict[en]['title'] for en in the_dict]

Categories

Resources