Faster way to convert Pandas dataframe into nested json - python

I have data that looks like this:
player, goals, matches
ronaldo, 10, 5
messi, 7, 9
I want to convert this dataframe into a nested json, such as this one:
{
"content":[
{
"player": "ronaldo",
"events": {
"goals": 10,
"matches": 5
}
},
{
"player": "messi",
"events": {
"goals": 7,
"matches": 9
}
}
]
}
This is my code, using list comprehension:
df = pd.DataFrame([['ronaldo', 10, 5], ['messi', 7, 9]], columns=['player', 'goals', 'matches'])
d = [{'events': df.loc[ix, ['goals', 'matches']].to_dict(), 'player': df.loc[ix, 'player']} for ix in range(df.shape[0])]
j = {}
j['content'] = d
This works, but the performance is really slow when I have a lot of data. Is there a faster way to do it?

Use pandas.to_json. Fast and easy https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html
df.T.to_json()

try :
df.to_json(orient = "records")
The problem is it doesn't stack goals and matches on the event column , I'm not sure though if you can do it without looping

Related

Group list of dicts by two params and count grouped values

I have list of dicts with id numbers, I need to group it by main_id and second_id and count values in each group. What is the best Python way to reach this?
I'm tried with Pandas, but don't get dict with groups and counts
df = pd.DataFrame(data_list)
df2 = df.groupby('main_id').apply(lambda x: x.set_index('main_id')['second_id']).to_dict()
print(df2)
List looks like:
[
{
"main_id":34,
"second_id":"2149"
},
{
"main_id":82,
"second_id":"174"
},
{
"main_id":24,
"second_id":"4QCp"
},
{
"main_id":34,
"second_id":"2149"
},
{
"main_id":29,
"second_id":"126905"
},
{
"main_id":34,
"second_id":"2764"
},
{
"main_id":43,
"second_id":"16110"
}
]
I need result like:
[
{
"main_id":43,
"second_id":"16110",
"count": 1
},
{
"main_id":34,
"second_id":"2149",
"count": 2
}
]
You could use collections (from the standard library) instead of pandas. I assigned the list of dicts to xs:
import collections
# create a list of tuples; each is (main_id, secondary_id)
ids = [ (x['main_id'], x['second_id']) for x in xs ]
# count occurrences of each tuple
result = collections.Counter(ids)
Finally, result is a dict, which can be readily converted to the final form (not shown).
Counter({(34, '2149'): 2,
(82, '174'): 1,
(24, '4QCp'): 1,
(29, '126905'): 1,
(34, '2764'): 1,
(43, '16110'): 1})

Fastest way to get values from dictionary to another dictionary in python without using if?

I need to find a way to get values from one dictionary to another, bases on key name match without using two loops \ if statement.
Main goal is to make it run more efficiently since it's a part of a larger code and run on multiple threads.
If you can keep the dictionary structure it would help
The second dict is initialized with values 0 in advanced
dict_1 = {
"school": {
"main": ["first_key"]
},
"specific": {
"students": [
{
"name": "sam",
"age": 13
},
{
"name": "dan",
"age": 9
},
{
"name": "david",
"age": 20
},
{
"name": "mike",
"age": 5
}
],
"total_students": 4
}
}
dict_2 = {'sam': 0, 'mike': 0, 'david': 0, 'dan': 0, 'total_students': 0}
for i in dict_1["specific"]['students']:
for x in dict_2:
if x == i["name"]:
dict_2[x] = i["age"]
dict_2["total_students"] = dict_1["specific"]["total_students"]
print(dict_2)
Is there more elegant way of doing it?
You don't need two loops at all! You don't even need to initialize dict_2 in advance. Simply loop over dict_1["specific"]["students"] and assign the ages to dict_2 without an if.
for student in dict_1["specific"]["students"]:
student_name = student["name"]
student_age = student["age"]
dict_2[student_name] = student_age
You could also write this as a comprehension:
dict_2 = {student["name"]: student["age"] for student in dict_1["specific"]["students"]}
Both these give the following dict_2:
{'sam': 13, 'dan': 9, 'david': 20, 'mike': 5}
Then you can set dict_2["total_students"] like you already do, but outside any loops.
dict_2["total_students"] = dict_1["specific"]["total_students"]
If you only want to assign ages for students already in dict_2, you need the if. However, you can still do this with a single loop instead of two. :
for student in dict_1["specific"]["students"]:
student_name = student["name"]
if student_name not in dict_2:
continue # skip the rest of this loop
student_age = student["age"]
dict_2[student_name] = student_age
Both these approaches use a single loop, so they're going to be faster than your two-loop approach.
for i in dict_1['specific']['students']:
dict_2[i['name']]=i['age']
dict_2['total_students']=dict_1['specific']['total_students']
Runs in O(N) time and O(N) space.
Yours currently runs in O(N^2) time and O(N) space

How to yield all possibilities using efficient nesting in Python?

Forex Triangular Arb problem:
I'm currently trying to solve an efficient way on how to yield all of the elements of a Dictionary.items(). Let's suppose the array has N length and I need to acquire a all possible combinations where [[A,B],[A,C],[C,B]...]
Currently, it is not efficient due to nesting
def Arb(tickers: dict) -> list:
for first_pair in tickers.items():
pair1: list = first_pair[0].split("/")
for second_pair in tickers.items():
pair2: list = second_pair[0].split("/")
if pair2[0] == pair1[0] and pair2[1] != pair1[1]:
for third_pair in tickers.items():
pair3: list = third_pair[0].split("/")
if pair3[0] == pair2[1] and pair3[1] == pair1[1]:
id1 = first_pair[1]["id"]
id2 = second_pair[1]["id"]
id3 = third_pair[1]["id"]
yield [pair1, id1, pair2, id2, pair3, id3]
What would be the efficient/pythonic way to return a List with all possible items?
This is an example
tickers = {"EZ/TC": {
"id": 1
},
"LM/TH": {
"id": 2
},
"CD/EH": {
"id": 3
},
"EH/TC": {
"id":4
},
"LM/TC": {
"id": 5
},
"CD/TC":{
"id": 6
},
"BT/TH": {
"id": 7,
},
"BT/TX": {
"id": 8,
},
"TX/TH":{
"id": 9
}
}
print(list(Arb(tickers)))
[(['CD', 'TC'], 6, ['CD', 'EH'], 3, ['EH', 'TC'], 4), (['BT', 'TH'], 7, ['BT', 'TX'], 8, ['TX', 'TH'], 9)]
The Output is a Single List comprised of "lists" of all possibilities.
You don't to iterate on items() as you don't use the values, just the keys. Then you want use itertools.permutations to get all the combinations in every order of each pair, then keep the ones that matches letters
def verify(v1, v2, v3):
return v1[0] == v2[0] and v1[1] == v3[1] and v2[1] == v3[0]
def arb(tickers) -> List:
c = permutations([x.split("/") for x in tickers], r=3)
return list(filter(lambda x: verify(*x), c))
Itertools.permutations and itertools.combinations could be helpful for this type of problem: https://docs.python.org/3/library/itertools.html
Here's a link with an example using itertools.permutations:
https://www.geeksforgeeks.org/python-itertools-permutations/

Finding key in a list of dictionaries and sorting the list [duplicate]

This question already has answers here:
How do I sort a list of dictionaries by a value of the dictionary?
(20 answers)
Closed 3 years ago.
Having a JSON-like structure:
{
"shopping_list": [
{
"level": 0,
"position": 0,
"item_name": "Gold Badge"
},
{
"level": 1,
"position": 10,
"item_name": "Silver Badge"
},
{
"level": 2,
"position": 20,
"item_name": "Bronze Badge"
}
]
}
I'm trying to sort the list by key.
However, when trying to get them by:
k = [c.keys()[0] for c in complete_list["shopping_list"]]
I get TypeError: 'dict_keys' object does not support indexing.
How to get keys?
How to sort the list by specified key?
Here it is (admitting the key you mention is 'level'):
k = sorted([c for c in complete_list["shopping_list"]], key=lambda x: x['level'])
print(k)
# >> [{'level': 0, 'position': 0, 'item_name': 'Gold Badge'}, {'level': 1, 'position': 10, 'item_name': 'Silver Badge'}, ...
Try this to get the keys in a list of lists :
d = {
"shopping_list": [
...
}
k = [list(c.keys()) for c in d["shopping_list"]]
print(k)
Output :
[['item_name', 'level', 'position'], ['item_name', 'level', 'position'], ['item_name', 'level', 'position']]
However even if you wanted to sort this list of lists based on some specific key value say value for "level" key or "position" key, the list of lists would remain same.

python nested dict in array group sum

I have this data structure in Python:
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
How can I add the elements of each dictionary? The output should be :
{
"clicks":16, # 10 + 6
"views":20
}
I am looking for a Pythonic solution for this. Any solutions using Counter are welcome but I am not able to implement it.
I have tried this but I get an error:
counters = []
for i in result:
for k,v in i.items():
counters.append(Counter(v))
sum(counters)
Your code was quite close to a workable solution, and we can make it work with a few important changes. The most important change is that we need to iterate over the "data" item in result.
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
counts = Counter()
for d in result['data']:
for k, v in d.items():
counts.update(v)
print(counts)
output
Counter({'views': 20, 'clicks': 16})
We can simplify that a little because we don't need the keys.
counts = Counter()
for d in result['data']:
for v in d.values():
counts.update(v)
The code you posted makes a list of Counters and then tries to sum them. I guess that's also a valid strategy, but unfortunately the sum built-in doesn't know how to add Counters together. But we can do it using functools.reduce.
from functools import reduce
counters = []
for d in result['data']:
for v in d.values():
counters.append(Counter(v))
print(reduce(Counter.__add__, counters))
However, I suspect that the first version will be faster, especially if there are lots of dicts to add together. Also, this version consumes more RAM, since it keeps a list of all the Counters.
Actually we can use sum to add the Counters together, we just have to give it an empty Counter as the start value.
print(sum(counters, Counter()))
We can combine this into a one-liner, eliminating the list by using a generator expression instead:
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
totals = sum((Counter(v) for i in result['data'] for v in i.values()), Counter())
print(totals)
output
Counter({'views': 20, 'clicks': 16})
This is not the best solution as I am sure that there are libraries that can get you there in a less verbose way but it is one you can easily read.
res = {}
for x in my_dict['data']:
for y in x:
for t in x[y]:
res.setdefault(t, 0)
res[t] += x[y][t]
print(res) # {'views': 20, 'clicks': 16}

Categories

Resources