Parse a deeply nested file into a pandas DataFrame - python

I have this file; it is a sample result from an Elasticsearch query.
[{'key': 'hkdshkdsd',
'doc_count': 1851,
'aggs_fs': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 697,
'buckets': [{'key': 'jdsjodsjod',
'doc_count': 113,
'agg_date': {'buckets': [{'key_as_string': '2020-09-07T14:00:00.000Z',
'key': 1599487200000,
'doc_count': 20,
'agg_ave': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T15:00:00.000Z',
'key': 1599490800000,
'doc_count': 19,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T16:00:00.000Z',
'key': 1599494400000,
'doc_count': 27,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T17:00:00.000Z',
'key': 1599498000000,
'doc_count': 20,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T18:00:00.000Z',
'key': 1599501600000,
'doc_count': 23,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T19:00:00.000Z',
'key': 1599505200000,
'doc_count': 4,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T20:00:00.000Z',
'key': 1599508800000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T21:00:00.000Z',
'key': 1599512400000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T22:00:00.000Z',
'key': 1599516000000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T23:00:00.000Z',
'key': 1599519600000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T00:00:00.000Z',
'key': 1599523200000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T01:00:00.000Z',
'key': 1599526800000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T02:00:00.000Z',
'key': 1599530400000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}}]}}]}}]
I need to convert this file into a pandas DataFrame. I tried json_normalize, but it seems to normalize only the first key, and when I try to keep normalizing past that first key it returns an error.
Can somebody help me?
Thanks

I used this code:
dfs = []
for i in your_list:
    df = pd.DataFrame.from_dict(i, orient='index')
    # or maybe pd.DataFrame.from_dict(i, orient='index').T, depending on the orientation you want
    dfs.append(df)  # note: append df, not dfs
full_df = pd.concat(dfs)
I just found this easier than json_normalize.
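That said, json_normalize can reach the innermost agg_date buckets if you pass the whole path as record_path and carry the outer keys along via meta. A sketch, assuming the list shown in the question is bound to data:
import pandas as pd

# record_path walks down into the innermost date buckets; meta pulls the
# outer keys along as columns. meta_prefix avoids a column-name clash
# between the outer 'key' fields and the bucket-level 'key'.
df = pd.json_normalize(
    data,
    record_path=['aggs_fs', 'buckets', 'agg_date', 'buckets'],
    meta=['key', ['aggs_fs', 'buckets', 'key']],
    meta_prefix='parent.',
)
Rows missing agg_ave (the zero-doc_count buckets) should simply come out as NaN in the agg_ave.value column.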

Related

having trouble passing multiple dictionaries for the argument record_path in json_normalize

I'm having trouble completely unnesting this JSON from an API.
[{'id': 1,
'name': 'Buzz',
'tagline': 'A Real Bitter Experience.',
'first_brewed': '09/2007',
'description': 'A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.',
'image_url': 'https://images.punkapi.com/v2/keg.png',
'abv': 4.5,
'ibu': 60,
'target_fg': 1010,
'target_og': 1044,
'ebc': 20,
'srm': 10,
'ph': 4.4,
'attenuation_level': 75,
'volume': {'value': 20, 'unit': 'litres'},
'boil_volume': {'value': 25, 'unit': 'litres'},
'method': {'mash_temp': [{'temp': {'value': 64, 'unit': 'celsius'},
'duration': 75}],
'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}},
'twist': None},
'ingredients': {'malt': [{'name': 'Maris Otter Extra Pale',
'amount': {'value': 3.3, 'unit': 'kilograms'}},
{'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}},
{'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}],
'hops': [{'name': 'Fuggles',
'amount': {'value': 25, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'First Gold',
'amount': {'value': 25, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Fuggles',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'middle',
'attribute': 'flavour'},
{'name': 'First Gold',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'middle',
'attribute': 'flavour'},
{'name': 'Cascade',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'}],
'yeast': 'Wyeast 1056 - American Ale™'},
'food_pairing': ['Spicy chicken tikka masala',
'Grilled chicken quesadilla',
'Caramel toffee cake'],
'brewers_tips': 'The earthy and floral aromas from the hops can be overpowering. Drop a little Cascade in at the end of the boil to lift the profile with a bit of citrus.',
'contributed_by': 'Sam Mason <samjbmason>'},
{'id': 2,
'name': 'Trashy Blonde',
'tagline': "You Know You Shouldn't",
'first_brewed': '04/2008',
'description': 'A titillating, neurotic, peroxide punk of a Pale Ale. Combining attitude, style, substance, and a little bit of low self esteem for good measure; what would your mother say? The seductive lure of the sassy passion fruit hop proves too much to resist. All that is even before we get onto the fact that there are no additives, preservatives, pasteurization or strings attached. All wrapped up with the customary BrewDog bite and imaginative twist.',
'image_url': 'https://images.punkapi.com/v2/2.png',
'abv': 4.1,
'ibu': 41.5,
'target_fg': 1010,
'target_og': 1041.7,
'ebc': 15,
'srm': 15,
'ph': 4.4,
'attenuation_level': 76,
'volume': {'value': 20, 'unit': 'litres'},
'boil_volume': {'value': 25, 'unit': 'litres'},
'method': {'mash_temp': [{'temp': {'value': 69, 'unit': 'celsius'},
'duration': None}],
'fermentation': {'temp': {'value': 18, 'unit': 'celsius'}},
'twist': None},
'ingredients': {'malt': [{'name': 'Maris Otter Extra Pale',
'amount': {'value': 3.25, 'unit': 'kilograms'}},
{'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}},
{'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}],
'hops': [{'name': 'Amarillo',
'amount': {'value': 13.8, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Simcoe',
'amount': {'value': 13.8, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Amarillo',
'amount': {'value': 26.3, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'},
{'name': 'Motueka',
'amount': {'value': 18.8, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'}],
'yeast': 'Wyeast 1056 - American Ale™'},
'food_pairing': ['Fresh crab with lemon',
'Garlic butter dipping sauce',
'Goats cheese salad',
'Creamy lemon bar doused in powdered sugar'],
'brewers_tips': 'Be careful not to collect too much wort from the mash. Once the sugars are all washed out there are some very unpleasant grainy tasting compounds that can be extracted into the wort.',
'contributed_by': 'Sam Mason <samjbmason>'}]
I was able to unnest it one level using json_normalize:
import requests
import pandas as pd

url = "https://api.punkapi.com/v2/beers"
data = requests.get(url).json()
pd.json_normalize(data)
This is the output after using json_normalize (shown as an image in the original post).
Now, to unnest the column 'method.mash_temp', I included record_path:
pd.json_normalize(
    data,
    record_path=['method', 'mash_temp'],
    meta=['id', 'name']
)
But I am having trouble adding the other columns ('ingredients.malt', 'ingredients.hops'), which also contain lists of dictionaries, to the record_path argument.
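record_path accepts only a single path per call, so one workaround (a sketch, not the only approach) is to normalize each nested list separately and merge the flat tables back together on the shared meta key:
mash = pd.json_normalize(data, record_path=['method', 'mash_temp'], meta=['id', 'name'])
malt = pd.json_normalize(data, record_path=['ingredients', 'malt'], meta=['id'])
hops = pd.json_normalize(data, record_path=['ingredients', 'hops'], meta=['id'])

# join on the beer id; note this produces one row per
# (mash_temp, malt, hop) combination for each beer
combined = (mash.merge(malt, on='id', suffixes=('', '_malt'))
                .merge(hops, on='id', suffixes=('', '_hops')))
The cartesian blow-up per id is the usual trade-off of flattening several parallel lists into one table; keeping three separate frames may be cleaner depending on what you do next.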

How to remove the list of dictionary elements with the same specific key-value

I want my Python code to iterate over the list elements and remove those whose 'time' values are the same, keeping the first occurrence.
Here is my list:
a = [
{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28'},
{'time': '20.00', 'id': 'bus_1' , 'CO2': '0.00'},
{'time': '21.00', 'id': 'f2.0' , 'CO2': '0.00'},
{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28', 'waiting': '0.00'},
{'time': '20.00', 'id': 'bus_1', 'CO2': '0.00', 'waiting': '0.00'}
]
The expected output:
b = [
{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28'},
{'time': '20.00', 'id': 'bus_1' , 'CO2': '0.00'},
{'time': '21.00', 'id': 'f2.0' , 'CO2': '0.00'},
]
One option is to keep track of the times already seen:
a = [
{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28'},
{'time': '20.00', 'id': 'bus_1' , 'CO2': '0.00'},
{'time': '21.00', 'id': 'f2.0' , 'CO2': '0.00'},
{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28', 'waiting': '0.00'},
{'time': '20.00', 'id': 'bus_1', 'CO2': '0.00', 'waiting': '0.00'}
]
output, time_seen = [], set()
for dct in a:
    time = dct['time']
    if time not in time_seen:
        output.append(dct)
        time_seen.add(time)
print(output)
# [{'time': '18.00', 'id': 'bus_1', 'CO2': '8165.28'},
# {'time': '20.00', 'id': 'bus_1', 'CO2': '0.00'},
# {'time': '21.00', 'id': 'f2.0', 'CO2': '0.00'}]
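An equivalent sketch that keeps only the first occurrence per time, using a dict and setdefault() (which ignores keys that already exist):
seen = {}
for dct in a:
    seen.setdefault(dct['time'], dct)  # first occurrence per time wins
b = list(seen.values())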

How to get an ec2 instance cost in python?

I launch an instance, stress its CPU, and delete the instance.
Using the AWS CDK this takes a couple of minutes, and I'm looping over 100
instance types (for benchmarking purposes).
How can I get the cost of that loop's instances programmatically (AWS CLI or boto3)?
I have the instance-id.
import boto3
import pprint

client = boto3.client('ce')
response = client.get_cost_and_usage_with_resources(
    Granularity='DAILY',
    Metrics=["BlendedCost", "UnblendedCost", "UsageQuantity"],
    TimePeriod={
        'Start': '2021-12-20',
        'End': '2021-12-28'
    },
    Filter={
        "Dimensions": {
            "Key": "SERVICE",
            "Values": ["Amazon Elastic Compute Cloud - Compute"]
        }
    },
    GroupBy=[{
        "Type": "DIMENSION",
        "Key": "RESOURCE_ID"
    }])
pprint.pprint(response)
Returns (shortened excerpt):
{'DimensionValueAttributes': [],
'GroupDefinitions': [{'Key': 'RESOURCE_ID', 'Type': 'DIMENSION'}],
'ResponseMetadata': {'HTTPHeaders': {'cache-control': 'no-cache',
'connection': 'keep-alive',
'content-length': '8461',
'content-type': 'application/x-amz-json-1.1',
'date': 'Wed, 29 Dec 2021 09:08:16 GMT',
'x-amzn-requestid': '2de9c92e-6d1c-4b1c-9087-bee17a41cb4f'},
'HTTPStatusCode': 200,
'RequestId': '2de9c92e-6d1c-4b1c-9087-bee17a41cb4f',
'RetryAttempts': 0},
'ResultsByTime': [{'Estimated': True,
'Groups': [],
'TimePeriod': {'End': '2021-12-21T00:00:00Z',
'Start': '2021-12-20T00:00:00Z'},
'Total': {'BlendedCost': {'Amount': '0', 'Unit': 'USD'},
'UnblendedCost': {'Amount': '0', 'Unit': 'USD'},
'UsageQuantity': {'Amount': '0', 'Unit': 'N/A'}}},
{'Estimated': True,
'Groups': [],
'TimePeriod': {'End': '2021-12-22T00:00:00Z',
'Start': '2021-12-21T00:00:00Z'},
'Total': {'BlendedCost': {'Amount': '0', 'Unit': 'USD'},
'UnblendedCost': {'Amount': '0', 'Unit': 'USD'},
'UsageQuantity': {'Amount': '0', 'Unit': 'N/A'}}},
{'Estimated': True,
'Groups': [],
'TimePeriod': {'End': '2021-12-23T00:00:00Z',
'Start': '2021-12-22T00:00:00Z'},
'Total': {'BlendedCost': {'Amount': '0', 'Unit': 'USD'},
'UnblendedCost': {'Amount': '0', 'Unit': 'USD'},
'UsageQuantity': {'Amount': '0', 'Unit': 'N/A'}}},
{'Estimated': True,
'Groups': [{'Keys': ['i-03ffa7c932a515d76'],
'Metrics': {'BlendedCost': {'Amount': '0.0027617772',
'Unit': 'USD'},
... (response shortened here) ...
'TimePeriod': {'End': '2021-12-27T00:00:00Z',
'Start': '2021-12-26T00:00:00Z'},
'Total': {}},
{'Estimated': True,
'Groups': [{'Keys': ['i-0665a330b242714f2'],
'Metrics': {'BlendedCost': {'Amount': '0.216643501',
'Unit': 'USD'},
'UnblendedCost': {'Amount': '0.216643501',
'Unit': 'USD'},
'UsageQuantity': {'Amount': '0.554054168',
'Unit': 'N/A'}}},
{'Keys': ['i-080780d0d7e3394dd'],
'Metrics': {'BlendedCost': {'Amount': '2.7341269802',
'Unit': 'USD'},
'UnblendedCost': {'Amount': '2.7341269802',
'Unit': 'USD'},
'UsageQuantity': {'Amount': '1.0241218603',
'Unit': 'N/A'}}},
{'Keys': ['i-0b95613810475903b'],
'Metrics': {'BlendedCost': {'Amount': '0.432736006',
'Unit': 'USD'},
'UnblendedCost': {'Amount': '0.432736006',
'Unit': 'USD'},
'UsageQuantity': {'Amount': '0.5530218935',
'Unit': 'N/A'}}},
{'Keys': ['i-0eab899e392cf4f35'],
'Metrics': {'BlendedCost': {'Amount': '0.5645311508',
'Unit': 'USD'},
'UnblendedCost': {'Amount': '0.5645311508',
'Unit': 'USD'},
'UsageQuantity': {'Amount': '1.1896368629',
'Unit': 'N/A'}}}],
'TimePeriod': {'End': '2021-12-28T00:00:00Z',
'Start': '2021-12-27T00:00:00Z'},
'Total': {}}]}
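Since the response groups costs per RESOURCE_ID, one way to total a single instance (a sketch against the response above; the instance ID is just an example taken from the excerpt) is to sum its UnblendedCost across the daily results:
instance_id = 'i-0665a330b242714f2'  # example ID from the excerpt above
total = sum(
    float(group['Metrics']['UnblendedCost']['Amount'])
    for day in response['ResultsByTime']
    for group in day.get('Groups', [])   # empty on days with no usage
    if instance_id in group['Keys']
)
print(f'{instance_id}: {total:.6f} USD')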

Python group by multiple keys in a dict [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a list of dicts that I want to group by multiple keys.
I used Python's built-in sorted() on the dict items:
data = [
[],
[{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}, {'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}],
[],
[{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}, {'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}, {'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}],
[{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}],
[{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}, {'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}, {'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}, {'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}, {'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}]
]
output_dict = {}
for i in data:
    if not i:
        pass
    for j in i:
        for key, val in sorted(j.items()):
            output_dict.setdefault(val, []).append(key)
print(output_dict)
{'DB': ['bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot'], 9: ['month', 'month', 'month'], 8: ['value'], 2020: ['year', 'year', 'year', 'year', 'year'], 10: ['month', 'month'], 79: ['value'], 126: ['value'], 2021: ['year', 'year', 'year', 'year', 'year', 'year', 'year', 'year'], 'GEMBOT': ['bot', 'bot', 'bot', 'bot'], 11: ['month', 'month'], 222: ['value'], 4: ['month', 'month', 'month'], 623: ['value'], 628: ['value'], 0: ['value'], 703: ['value'], 3: ['month'], 1081: ['value'], 1335: ['value'], 1920: ['value'], 1: ['month'], 2132: ['value'], 2: ['month'], 2383: ['value']}
But I want the output like this:
[{"bot": "DB", "date": "Sept 20", "value": 134},
 {"bot": "DB", "date": "Oct 20", "value": 79},
 ... and so on]
Is there an efficient way to flatten this list?
Thanks in advance
Two things will make this easier to answer. The first is a list comprehension that will promote sub-items:
data_reshaped = [cell for row in data for cell in row]
this will take your original data and flatten it a bit to:
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021},
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021},
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
Now we can iterate over that using a compound key and setdefault() to aggregate the results. (If you would rather use collections.defaultdict(), swap it in for setdefault().)
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"]  # save the value so we can reset the cell next
    cell["value"] = 0      # setting this to 0 cleans up the next line
    results.setdefault(key, cell)["value"] += value
This should allow you to:
for result in results.values():
    print(result)
Giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
Full solution:
data = [
[],
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}
],
[],
[
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
],
[
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
],
[
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
]
data_reshaped = [cell for row in data for cell in row]
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"]
    cell["value"] = 0
    results.setdefault(key, cell)["value"] += value

for result in results.values():
    print(result)
Again Giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
I will leave it to you to figure out casting the two date fields into some other presentation, as that seems outside the scope of the question at hand.
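For completeness, one possible sketch of that date formatting (using calendar.month_abbr for the abbreviation; adjust to taste), applied to the results dict built above:
import calendar

final = [
    {'bot': r['bot'],
     'date': f"{calendar.month_abbr[r['month']]} {r['year'] % 100:02d}",  # e.g. 'Sep 20'
     'value': r['value']}
    for r in results.values()
]
# e.g. [{'bot': 'DB', 'date': 'Sep 20', 'value': 8}, ...]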
Maybe try:
from pprint import pprint
import datetime

output_dict = []
for i in data:
    if i:
        for j in i:
            temp = {}  # initialize here, before the first key is assigned
            for key, val in sorted(j.items()):
                if key == "bot":
                    temp["bot"] = val
                elif key == "value":
                    temp["value"] = val
                elif key == "month":
                    month = datetime.datetime.strptime(str(val), "%m")
                    temp["date"] = month.strftime("%b")
                elif key == "year":
                    temp["date"] = str(temp["date"]) + " " + str(val)
            output_dict.append(temp)
pprint(output_dict)
The final results are shown as follows:
[{'bot': 'DB', 'date': 'Sep 2020', 'value': 8},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 79},
{'bot': 'DB', 'date': 'Aug 2021', 'value': 126},
{'bot': 'GEMBOT', 'date': 'Nov 2020', 'value': 222},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 623},
{'bot': 'GEMBOT', 'date': 'Sep 2021', 'value': 628},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 0},
{'bot': 'DB', 'date': 'Nov 2020', 'value': 703},
{'bot': 'DB', 'date': 'Mar 2021', 'value': 1081},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 1335},
{'bot': 'DB', 'date': 'Apr 2021', 'value': 1920},
{'bot': 'DB', 'date': 'Jan 2021', 'value': 2132},
{'bot': 'DB', 'date': 'Feb 2021', 'value': 2383}]
Maybe try:
output = []
for i in data:
    if not i:
        continue  # skip the empty sub-lists
    for j in i:
        output.append(j)
Then, if you want to sort it, you can use sorted_output = sorted(output, key=lambda k: k['bot']) to sort by bot, for example. If you want to sort by date, you could compute a month-based sort key and sort on that.

Pagination of insights results in Facebook

I'm making requests to the FB API, retrieving data from a FB page.
I have a dict of params like this, to get data from 08-01 to 08-20:
data_test = {
    'page_likes': {'page_id': '111',
                   'metric': 'page_fans',
                   'since': datetime(2019, 8, 1, 0, 0, 0),
                   'until': datetime(2019, 8, 20, 0, 0, 0),
                   'date_preset': 'yesterday',
                   'period': 'day'},
    'page_impressions': {'page_id': '111',
                         'metric': 'page_impressions',
                         'since': datetime(2019, 8, 1, 0, 0, 0),
                         'until': datetime(2019, 8, 20, 0, 0, 0),
                         'date_preset': 'yesterday',
                         'period': 'day'}
}
Then I run a loop through it, collecting the raw responses into a list, checking for additional pages, and adding those to the list as well:
responses_time_balanced = []  # list for responses

# loop through data_test
for k, v in data_test.items():
    sample_request = graph.get_connections(id=v['page_id'],
                                           connection_name='insights',
                                           metric=v['metric'],
                                           since=v['since'],
                                           until=v['until'],
                                           # date_preset=v['date_preset'],
                                           period=v['period'])
    # add responses to responses_time_balanced
    responses_time_balanced.append(sample_request)

# check for additional pages and add data from them to the list
for request in responses_time_balanced:
    if 'next' in request['paging'].keys():
        request = requests.get(request['paging']['next']).json()
        responses_time_balanced.append(request)
But for some reason, which I can't figure out, the data from the second page is duplicated.
So the final list looks like this:
Data:
page_fans from 2019-08-02 to 2019-08-19
page_fans from 2019-08-20 to 2019-09-06
page_impressions from 2019-08-02 to 2019-08-19
page_impressions from 2019-08-20 to 2019-09-06
and again
page_fans from 2019-08-20 to 2019-09-06
page_impressions from 2019-08-20 to 2019-09-06
There is some stupid mistake in the code, but I can't find it.
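For comparison, a pagination sketch that avoids appending to the list while iterating over it: follow each response's 'next' link in an inner while loop instead (assuming graph.get_connections returns the parsed JSON dict, as in the response dump below):
import requests

responses_time_balanced = []
for k, v in data_test.items():
    page = graph.get_connections(id=v['page_id'],
                                 connection_name='insights',
                                 metric=v['metric'],
                                 since=v['since'],
                                 until=v['until'],
                                 period=v['period'])
    while True:
        responses_time_balanced.append(page)
        next_url = page.get('paging', {}).get('next')
        if next_url is None:
            break
        page = requests.get(next_url).json()  # fetch each next page exactly once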
The whole response (the access token shown is no longer valid):
[{'data': [{'name': 'page_fans',
'period': 'day',
'values': [{'value': 113264, 'end_time': '2019-08-02T07:00:00+0000'},
{'value': 113246, 'end_time': '2019-08-03T07:00:00+0000'},
{'value': 113231, 'end_time': '2019-08-04T07:00:00+0000'},
{'value': 113219, 'end_time': '2019-08-05T07:00:00+0000'},
{'value': 113195, 'end_time': '2019-08-06T07:00:00+0000'},
{'value': 113177, 'end_time': '2019-08-07T07:00:00+0000'},
{'value': 113166, 'end_time': '2019-08-08T07:00:00+0000'},
{'value': 113147, 'end_time': '2019-08-09T07:00:00+0000'},
{'value': 113138, 'end_time': '2019-08-10T07:00:00+0000'},
{'value': 113132, 'end_time': '2019-08-11T07:00:00+0000'},
{'value': 113124, 'end_time': '2019-08-12T07:00:00+0000'},
{'value': 113118, 'end_time': '2019-08-13T07:00:00+0000'},
{'value': 113109, 'end_time': '2019-08-14T07:00:00+0000'},
{'value': 113097, 'end_time': '2019-08-15T07:00:00+0000'},
{'value': 113091, 'end_time': '2019-08-16T07:00:00+0000'},
{'value': 113082, 'end_time': '2019-08-17T07:00:00+0000'},
{'value': 113071, 'end_time': '2019-08-18T07:00:00+0000'},
{'value': 113066, 'end_time': '2019-08-19T07:00:00+0000'}],
'title': 'Lifetime Total Likes',
'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)',
'id': '571653679838052/insights/page_fans/day'}],
'paging': {'previous': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1563087600&until=1564642800&metric=page_fans&period=day',
'next': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1566198000&until=1567753200&metric=page_fans&period=day'}},
{'data': [{'name': 'page_fans',
'period': 'day',
'values': [{'value': 113209, 'end_time': '2019-08-20T07:00:00+0000'},
{'value': 113299, 'end_time': '2019-08-21T07:00:00+0000'},
{'value': 113352, 'end_time': '2019-08-22T07:00:00+0000'},
{'value': 113409, 'end_time': '2019-08-23T07:00:00+0000'},
{'value': 113469, 'end_time': '2019-08-24T07:00:00+0000'},
{'value': 113517, 'end_time': '2019-08-25T07:00:00+0000'},
{'value': 113578, 'end_time': '2019-08-26T07:00:00+0000'},
{'value': 113622, 'end_time': '2019-08-27T07:00:00+0000'},
{'value': 113652, 'end_time': '2019-08-28T07:00:00+0000'},
{'value': 113700, 'end_time': '2019-08-29T07:00:00+0000'},
{'value': 113723, 'end_time': '2019-08-30T07:00:00+0000'},
{'value': 113756, 'end_time': '2019-08-31T07:00:00+0000'},
{'value': 113792, 'end_time': '2019-09-01T07:00:00+0000'},
{'value': 113839, 'end_time': '2019-09-02T07:00:00+0000'},
{'value': 113873, 'end_time': '2019-09-03T07:00:00+0000'},
{'value': 113911, 'end_time': '2019-09-04T07:00:00+0000'},
{'value': 113913, 'end_time': '2019-09-05T07:00:00+0000'},
{'value': 113913, 'end_time': '2019-09-06T07:00:00+0000'}],
'title': 'Lifetime Total Likes',
'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)',
'id': '571653679838052/insights/page_fans/day'}],
'paging': {'previous': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1564642800&until=1566198000&metric=page_fans&period=day'}},
{'data': [{'name': 'page_impressions',
'period': 'day',
'values': [{'value': 1467, 'end_time': '2019-08-02T07:00:00+0000'},
{'value': 421, 'end_time': '2019-08-03T07:00:00+0000'},
{'value': 271, 'end_time': '2019-08-04T07:00:00+0000'},
{'value': 260, 'end_time': '2019-08-05T07:00:00+0000'},
{'value': 1584, 'end_time': '2019-08-06T07:00:00+0000'},
{'value': 484, 'end_time': '2019-08-07T07:00:00+0000'},
{'value': 269, 'end_time': '2019-08-08T07:00:00+0000'},
{'value': 1290, 'end_time': '2019-08-09T07:00:00+0000'},
{'value': 487, 'end_time': '2019-08-10T07:00:00+0000'},
{'value': 205, 'end_time': '2019-08-11T07:00:00+0000'},
{'value': 267, 'end_time': '2019-08-12T07:00:00+0000'},
{'value': 267, 'end_time': '2019-08-13T07:00:00+0000'},
{'value': 233, 'end_time': '2019-08-14T07:00:00+0000'},
{'value': 388, 'end_time': '2019-08-15T07:00:00+0000'},
{'value': 1383, 'end_time': '2019-08-16T07:00:00+0000'},
{'value': 583, 'end_time': '2019-08-17T07:00:00+0000'},
{'value': 12554, 'end_time': '2019-08-18T07:00:00+0000'},
{'value': 258, 'end_time': '2019-08-19T07:00:00+0000'}],
'title': 'Daily Total Impressions',
'description': "Daily: The number of times any content from your Page or about your Page entered a person's screen. This includes posts, stories, check-ins, ads, social information from people who interact with your Page and more. (Total Count)",
'id': '571653679838052/insights/page_impressions/day'}],
'paging': {'previous': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1563087600&until=1564642800&metric=page_impressions&period=day',
'next': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1566198000&until=1567753200&metric=page_impressions&period=day'}},
{'data': [{'name': 'page_fans',
'period': 'day',
'values': [{'value': 113209, 'end_time': '2019-08-20T07:00:00+0000'},
{'value': 113299, 'end_time': '2019-08-21T07:00:00+0000'},
{'value': 113352, 'end_time': '2019-08-22T07:00:00+0000'},
{'value': 113409, 'end_time': '2019-08-23T07:00:00+0000'},
{'value': 113469, 'end_time': '2019-08-24T07:00:00+0000'},
{'value': 113517, 'end_time': '2019-08-25T07:00:00+0000'},
{'value': 113578, 'end_time': '2019-08-26T07:00:00+0000'},
{'value': 113622, 'end_time': '2019-08-27T07:00:00+0000'},
{'value': 113652, 'end_time': '2019-08-28T07:00:00+0000'},
{'value': 113700, 'end_time': '2019-08-29T07:00:00+0000'},
{'value': 113723, 'end_time': '2019-08-30T07:00:00+0000'},
{'value': 113756, 'end_time': '2019-08-31T07:00:00+0000'},
{'value': 113792, 'end_time': '2019-09-01T07:00:00+0000'},
{'value': 113839, 'end_time': '2019-09-02T07:00:00+0000'},
{'value': 113873, 'end_time': '2019-09-03T07:00:00+0000'},
{'value': 113911, 'end_time': '2019-09-04T07:00:00+0000'},
{'value': 113913, 'end_time': '2019-09-05T07:00:00+0000'},
{'value': 113913, 'end_time': '2019-09-06T07:00:00+0000'}],
'title': 'Lifetime Total Likes',
'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)',
'id': '571653679838052/insights/page_fans/day'}],
'paging': {'previous': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1564642800&until=1566198000&metric=page_fans&period=day'}},
{'data': [{'name': 'page_impressions',
'period': 'day',
'values': [{'value': 53579, 'end_time': '2019-08-20T07:00:00+0000'},
{'value': 36032, 'end_time': '2019-08-21T07:00:00+0000'},
{'value': 33509, 'end_time': '2019-08-22T07:00:00+0000'},
{'value': 41100, 'end_time': '2019-08-23T07:00:00+0000'},
{'value': 39801, 'end_time': '2019-08-24T07:00:00+0000'},
{'value': 38691, 'end_time': '2019-08-25T07:00:00+0000'},
{'value': 37131, 'end_time': '2019-08-26T07:00:00+0000'},
{'value': 28139, 'end_time': '2019-08-27T07:00:00+0000'},
{'value': 17064, 'end_time': '2019-08-28T07:00:00+0000'},
{'value': 16537, 'end_time': '2019-08-29T07:00:00+0000'},
{'value': 20000, 'end_time': '2019-08-30T07:00:00+0000'},
{'value': 18023, 'end_time': '2019-08-31T07:00:00+0000'},
{'value': 15622, 'end_time': '2019-09-01T07:00:00+0000'},
{'value': 27015, 'end_time': '2019-09-02T07:00:00+0000'},
{'value': 25329, 'end_time': '2019-09-03T07:00:00+0000'},
{'value': 18049, 'end_time': '2019-09-04T07:00:00+0000'},
{'value': 1124, 'end_time': '2019-09-05T07:00:00+0000'},
{'value': 0, 'end_time': '2019-09-06T07:00:00+0000'}],
'title': 'Daily Total Impressions',
'description': "Daily: The number of times any content from your Page or about your Page entered a person's screen. This includes posts, stories, check-ins, ads, social information from people who interact with your Page and more. (Total Count)",
'id': '571653679838052/insights/page_impressions/day'}],
'paging': {'previous': 'https://graph.facebook.com/v4.0/571653679838052/insights?access_token=EAAHzAoOZCOJQBALIh8wCto7ZCsRMTvs4ybK8I0pH1xQ4YKgfbdsOeLwxgFcJZAYDqGxgofdZBlzmqZBH1tHY5QgAH7KHUAhAjVg5lw9psgOysZAzGEVZBZBE1cVPdqZCJneS1ousyt0xZCx1kPexvSvVQ9tK783bpp9PtJZACvjgqUCRL5LC0IuhmB1&since=1564642800&until=1566198000&metric=page_impressions&period=day'}}]
Update: I checked with the Graph API Explorer, and everything is OK there.
