I have this function, group_by_transaction, and I want it to return a new list of dictionaries, but when I run it with my example data I get:
[{'user_id': 'user3',
'transaction_category_id': '698723',
'transaction_amount_sum': 500},
{'user_id': 'user4',
'transaction_category_id': '698723',
'transaction_amount_sum': 500},
{'user_id': 'user5',
'transaction_category_id': '698723',
'transaction_amount_sum': 300}]
But I wish it was:
[{'number_of_users': 3,
'transaction_category_id': '698723',
'transaction_amount_sum': 1300}]
from itertools import groupby
from operator import itemgetter
data = [{'transaction_id': '00004ed8-2c57-4374-9a0c-3ff1d8a94a9e',
'date': '2013-12-30',
'user_id': 'user3',
'is_blocked': 'false',
'transaction_amount': 200,
'transaction_category_id': '698723',
'is_active': '0'},
{'transaction_id': '00004ed8-2c57-4374-9a0c-3ff1d8a94a7e',
'date': '2013-12-21',
'user_id': 'user3',
'is_blocked': 'false',
'transaction_amount': 300,
'transaction_category_id': '698723',
'is_active': '0'},
{'transaction_id': '00004ed8-2c57-4374-9a0c-3ff1d8a94a9e',
'date': '2013-12-30',
'user_id': 'user4',
'is_blocked': 'false',
'transaction_amount': 200,
'transaction_category_id': '698723',
'is_active': '0'},
{'transaction_id': '00004ed8-2c57-4374-9a0c-3ff1d8a94a7e',
'date': '2013-12-21',
'user_id': 'user4',
'is_blocked': 'false',
'transaction_amount': 300,
'transaction_category_id': '698723',
'is_active': '0'},
{'transaction_id': '00004ed8-2c57-4374-9a0c-3ff1d8a94a7e',
'date': '2013-12-21',
'user_id': 'user5',
'is_blocked': 'false',
'transaction_amount': 300,
'transaction_category_id': '698723',
'is_active': '0'}]
def group_by_transaction(data):
    grouper = ['user_id', 'transaction_category_id']
    key = itemgetter(*grouper)
    data.sort(key=key)
    return [{**dict(zip(grouper, k)), 'transaction_amount_sum': sum(map(itemgetter('transaction_amount'), g))}
            for k, g in groupby(data, key=key)]
group_by_transaction(data)
Can anybody help me, please?
I tried adding a new column to the calculation inside the loop, but I couldn't get it to work.
Collect data with groupby
I'm not brave enough to mutate the caller's data inside a function, so I prefer sorted(data) over data.sort().
You have to somehow count the number of unique users in order to get the key-value pair {'number_of_users': 3}.
You're grouping by the ['user_id', 'transaction_category_id'] pair, but to get what you want, the only key to group by is 'transaction_category_id'.
With that said, here's code that I'm sure is close enough to yours to produce the desired grouping.
def group_by_transaction(data):
    category = itemgetter('transaction_category_id')
    user_amount = itemgetter('user_id', 'transaction_amount')
    return [
        {
            'transaction_category_id': cat_id
            , 'number_of_users': len({*users})
            , 'transaction_amount_sum': sum(amounts)
        }
        for cat_id, group in groupby(sorted(data, key=category), category)
        for users, amounts in [zip(*(user_amount(record) for record in group))]
    ]
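Called on the sample data from the question, this produces:
print(group_by_transaction(data))
# [{'transaction_category_id': '698723', 'number_of_users': 3, 'transaction_amount_sum': 1300}]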
Update
About for users, amounts in [zip(*(user_amount(record) for record in group))]:
user_amount(record) extracts a (user_id, transaction_amount) pair from each record
zip(*(...)) transposes the collected pairs
zip returns an iterator, which in this case yields two rows: the first holds the user_id values and the second the transaction_amount values. To get them both at once, we wrap the zip object as the only item of a list; that's the meaning of [zip(...)]
when the loop target is several names, as in for users, amounts in [...], the zipped values are unpacked into them. In our case, those are the two rows mentioned above.
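A quick standalone illustration of that transpose-and-unpack step, with made-up pairs:
pairs = [('user3', 200), ('user3', 300), ('user4', 200)]
users, amounts = zip(*pairs)
# users   -> ('user3', 'user3', 'user4')
# amounts -> (200, 300, 200)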
datainput = {'thissong-fav-user:type1-chan-44-John': [{'Song': 'Rock',
'Type': 'Hard',
'Price': '10'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Song': 'Rock',
'Type': 'Soft',
'Price': '5'}]}
Output required:
{'thissong-fav-user:type1-chan-44-John': [{'Key': 'Song', 'Value': 'Rock'},
                                          {'Key': 'Type', 'Value': 'Hard'},
                                          {'Key': 'Price', 'Value': '10'}],
 'thissong-fav-user:type1-chan-45-kelly-md': [{'Key': 'Song', 'Value': 'Rock'},
                                              {'Key': 'Type', 'Value': 'Soft'},
                                              {'Key': 'Price', 'Value': '5'}]}
I started with the code below, which gives me an inner nested pattern; I'm not sure how to get the desired output.
temps = [{'Key': key, 'Value': value} for (key, value) in datainput.items()]
Here is how:
datainput = {'thissong-fav-user:type1-chan-44-John': [{'Song': 'Rock',
'Type': 'Hard',
'Price': '10'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Song': 'Rock',
'Type': 'Soft',
'Price': '5'}]}
temps = {k:[{'Key':a, 'Value':b}
for a,b in v[0].items()]
for k,v in datainput.items()}
print(temps)
Output:
{'thissong-fav-user:type1-chan-44-John': [{'Key': 'Song', 'Value': 'Rock'},
{'Key': 'Type', 'Value': 'Hard'},
{'Key': 'Price', 'Value': '10'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Key': 'Song', 'Value': 'Rock'},
{'Key': 'Type', 'Value': 'Soft'},
{'Key': 'Price', 'Value': '5'}]}
The way you've taken the input is fine, but to get the desired output you have to iterate over the outer dict and rebuild each inner dict as a list of key-value pairs.
datainput = {'thissong-fav-user:type1-chan-44-John': [{'Song': 'Rock',
'Type': 'Hard',
'Price': '10'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Song': 'Rock',
'Type': 'Soft',
'Price': '5'}]}
datainput = {k:[{'Key':a, 'Value':b} for a,b in v[0].items()] for k,v in datainput.items()}
print(datainput)
This should give you the desired output.
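One caveat for both snippets: v[0] only looks at the first inner dict, which matches the sample data where each list holds exactly one dict. If a list could hold several, a variant that flattens all of them might look like:
datainput = {k: [{'Key': a, 'Value': b} for d in v for a, b in d.items()]
             for k, v in datainput.items()}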
I have loaded two JSON files in Python 3.8, and I need to merge the two based on a condition.
Obj1 = [{'account': '223', 'colr': '#555555', 'hash': True},
{'account': '134', 'colr': '#666666', 'hash': True},
{'account': '252', 'colr': '#777777', 'hash': True}]
Obj2 = [{'sn': 38796, 'code': 'df', 'id': 199, 'desc': 'jex - #777777- gg2349.252'},
{'sn': 21949, 'code': 'se', 'id': 193, 'desc': 'jex - #555555 - gf23569'},
{'sn': 21340, 'code': 'se', 'id': 3, 'desc': 'jex - #666666 - gf635387'}]
# What I am trying to get
Obj3 = [{'sn': 38796, 'code': 'df', 'id': 199, 'desc': 'jex - #777777- gg2349.252', 'account': '252', 'colr': '#777777', 'hash': True},
{'sn': 21949, 'code': 'se', 'id': 193, 'desc': 'jex - #555555 - gf23569', 'account': '223', 'colr': '#555555', 'hash': True},
{'sn': 21340, 'code': 'se', 'id': 3, 'desc': 'jex - #666666 - gf635387', 'account': '134', 'colr': '#666666', 'hash': True}]
I have tried, from what I can gather, everything on SO (append, extend, etc.), but I fall short on the condition.
I need to append elements of Obj1 to Obj2 in the correct place, based on the condition that if the colr of an Obj1 element is mentioned in the desc of an Obj2 element, that whole Obj1 element should be appended into the correlated Obj2 element. Or create a new Obj3 that I can print the updated values from.
What I have tried and looked at thus far: Append JSON Object, Append json objects to nested list, Appending json object to existing json object, and a few others that also didn't help.
Hope this makes sense, and thank you.
Something simple like this would work; note that it updates the dicts in Obj1 in place:
for i in range(len(Obj1)):
    for j in range(len(Obj2)):
        if Obj1[i]['colr'] in Obj2[j]['desc']:
            Obj1[i].update(Obj2[j])
print(Obj1)
One approach is to first create a dictionary mapping each color to the JSON element. You can do this as
colr2elem = {elem['colr']: elem for elem in json_obj1}
Then you can find which element to merge by applying a regular expression to each description, and update the json_obj2 dictionaries (merging the two dictionaries):
import re
for elem2 in json_obj2:
    elem1 = colr2elem.get(re.search(r'#\d+', elem2['desc']).group(0))
    elem2.update(elem1 if elem1 is not None else {})
Obj1 = [{'account': '223', 'colr': '#555555', 'hash': True},
{'account': '134', 'colr': '#666666', 'hash': True},
{'account': '252', 'colr': '#777777', 'hash': True}]
Obj2 = [{'sn': 38796, 'code': 'df', 'id': 199, 'desc': 'jex - #777777- gg2349.252'},
{'sn': 21949, 'code': 'se', 'id': 193, 'desc': 'jex - #555555 - gf23569'},
{'sn': 21340, 'code': 'se', 'id': 3, 'desc': 'jex - #666666 - gf635387'}]
Obj3 = []
for i in Obj1:
    for j in Obj2:
        if i["colr"] == j["desc"][6:13]:
            a = {**j, **i}
            Obj3.append(a)
print(Obj3)
You can use element1['colr'] in element2['desc'] to check if elements from the first and second arrays match. Now, you can iterate over the second array and for each of its elements find the corresponding element from the first array by checking this condition:
json_obj3 = []
for element2 in json_obj2:
    for element1 in json_obj1:
        if element1['colr'] in element2['desc']:
            element3 = dict(**element1, **element2)
            json_obj3.append(element3)
            break  # stop the inner loop, because the matching element was found
BTW, this can be written as a single expression using a nested list comprehension (note there is no break here, so it assumes each colr matches exactly one desc):
json_obj3 = [
dict(**element1, **element2)
for element1 in json_obj1
for element2 in json_obj2
if element1['colr'] in element2['desc']
]
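For larger inputs, the double loop does len(json_obj1) * len(json_obj2) condition checks. A sketch that reuses the colr2elem lookup idea from the earlier answer (assuming each desc contains at most one color code):
import re

colr2elem = {e1['colr']: e1 for e1 in json_obj1}
json_obj3 = []
for e2 in json_obj2:
    match = re.search(r'#\d+', e2['desc'])                 # find the color code in desc
    e1 = colr2elem.get(match.group(0)) if match else None  # look up the matching Obj1 element
    json_obj3.append({**e2, **(e1 or {})})                 # merge; keep e2 as-is when nothing matches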
I'm trying to connect to my office's Smartsheet API via Python to create some performance-tracking dashboards that use data outside of Smartsheet. All I want to do is create a simple DataFrame where fields reflect columnId and cell values reflect the displayValue key in the Smartsheet dictionary. I'm doing this with a standard requests.get call rather than following Smartsheet's own API documentation, because I've found the latter less easy to work with.
The table (sample) is set up as:
Number Letter Name
1 A Joe
2 B Jim
3 C Jon
The JSON syntax from the sheet GET request is:
{'id': 339338304219012,
'name': 'Sample Smartsheet',
'version': 1,
'totalRowCount': 3,
'accessLevel': 'OWNER',
'effectiveAttachmentOptions': ['GOOGLE_DRIVE',
'EVERNOTE',
'DROPBOX',
'ONEDRIVE',
'LINK',
'FILE',
'BOX_COM',
'EGNYTE'],
'ganttEnabled': False,
'dependenciesEnabled': False,
'resourceManagementEnabled': False,
'cellImageUploadEnabled': True,
'userSettings': {'criticalPathEnabled': False, 'displaySummaryTasks': True},
'userPermissions': {'summaryPermissions': 'ADMIN'},
'hasSummaryFields': False,
'permalink': 'https://app.smartsheet.com/sheets/5vxMCJQhMV7VFFPMVfJgg2hX79rj3fXgVGG8fp61',
'createdAt': '2020-02-13T16:32:02Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'isMultiPicklistEnabled': True,
'columns': [{'id': 6273865019090820,
'version': 0,
'index': 0,
'title': 'Number',
'type': 'TEXT_NUMBER',
'primary': True,
'validation': False,
'width': 150},
{'id': 4022065205405572,
'version': 0,
'index': 1,
'title': 'Letter',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150},
{'id': 8525664832776068,
'version': 0,
'index': 2,
'title': 'Name',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150}],
'rows': [{'id': 8660990817003396,
'rowNumber': 1,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 1.0, 'displayValue': '1'},
{'columnId': 4022065205405572, 'value': 'A', 'displayValue': 'A'},
{'columnId': 8525664832776068, 'value': 'Joe', 'displayValue': 'Joe'}]},
{'id': 498216492394372,
'rowNumber': 2,
'siblingId': 8660990817003396,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 2.0, 'displayValue': '2'},
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'},
{'columnId': 8525664832776068, 'value': 'Jim', 'displayValue': 'Jim'}]},
{'id': 5001816119764868,
'rowNumber': 3,
'siblingId': 498216492394372,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 3.0, 'displayValue': '3'},
{'columnId': 4022065205405572, 'value': 'C', 'displayValue': 'C'},
{'columnId': 8525664832776068, 'value': 'Jon', 'displayValue': 'Jon'}]}]}
Here are the two ways I've approached the problem:
INPUT:
from pandas.io.json import json_normalize
samplej = sample.json()
s_rows = json_normalize(data=samplej['rows'], record_path='cells', meta=['id', 'rowNumber'])
s_rows
OUTPUT:
A DataFrame with columnId, value, displayValue, id, and rowNumber as their own fields.
If I could figure out how to transpose this data in the right way, I could probably make it work, but that seems incredibly complicated.
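For what it's worth, it seems like a single pivot call does that transpose, though the columns come out labeled by columnId rather than by title:
s_wide = s_rows.pivot(index='rowNumber', columns='columnId', values='displayValue')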
INPUT:
import pandas as pd

samplej = sample.json()
cellist = []
def get_cells():
    srows = samplej['rows']
    for s_cells in srows:
        scells = s_cells['cells']
        cellist.append(scells)
get_cells()
celldf = pd.DataFrame(cellist)
celldf
OUTPUT:
This returns a DataFrame with the correct number of columns and rows, but each cell is populated with a dictionary that looks like
In [14]:
celldf.loc[1,1]
Out [14]:
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'}
If there were a way to remove everything except the value corresponding to the displayValue key in every cell, that would probably solve my problem. Again, though, it seems weirdly complicated.
I'm fairly new to Python and working with APIs, so there may be a simple way to address the problem that I'm overlooking. Or, if you have a suggestion for approaching the possible solutions I outlined above, I'm all ears. Thanks for your help!
You must make use of the columns field:
colnames = {x['id']: x['title'] for x in samplej['columns']}
columns = [x['title'] for x in samplej['columns']]
cellist = [{colnames[scells['columnId']]: scells['displayValue']
for scells in s_cells['cells']} for s_cells in samplej['rows']]
celldf = pd.DataFrame(cellist, columns=columns)
This gives as expected:
Number Letter Name
0 1 A Joe
1 2 B Jim
2 3 C Jon
If some cells contain only a columnId but no displayValue field, scells['displayValue'] in the code above should be replaced with scells.get('displayValue', defaultValue), where defaultValue could be None, np.nan, or any other relevant default.
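For example, with None as the default:
cellist = [{colnames[scells['columnId']]: scells.get('displayValue')
            for scells in s_cells['cells']} for s_cells in samplej['rows']]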
I have a YAML file that parses into an object, e.g.:
{'name': [{'proj_directory': '/directory/'},
{'categories': [{'quick': [{'directory': 'quick'},
{'description': None},
{'table_name': 'quick'}]},
{'intermediate': [{'directory': 'intermediate'},
{'description': None},
{'table_name': 'intermediate'}]},
{'research': [{'directory': 'research'},
{'description': None},
{'table_name': 'research'}]}]},
{'nomenclature': [{'extension': 'nc'},
{'handler': 'script'},
{'filename': [{'id': [{'type': 'VARCHAR'}]},
{'date': [{'type': 'DATE'}]},
{'v': [{'type': 'INT'}]}]},
{'data': [{'time': [{'variable_name': 'time'},
{'units': 'minutes since 1-1-1980 00:00 UTC'},
{'latitude': [{'variable_n...
I'm having trouble accessing the data in Python, and I regularly see the error TypeError: list indices must be integers, not str.
I want to be able to access all elements corresponding to 'name'. To retrieve each data field, I imagine it would look something like:
import yaml
settings_stream = open('file.yaml', 'r')
settingsMap = yaml.safe_load(settings_stream)
yaml_stream = True
print('loaded settings for: ', end='')
for project in settingsMap:
    print(project + ', ' + settingsMap[project]['project_directory'])
and I would expect each element would be accessible via something like ['name']['categories']['quick']['directory']
and something a little deeper would just be:
['name']['nomenclature']['data']['latitude']['variable_name']
or am I completely wrong here?
The brackets, [], indicate that you have lists of dicts, not just a dict.
For example, settingsMap['name'] is a list of dicts.
Therefore, you need to select the correct dict in the list using an integer index, before you can select the key in the dict.
So, given your current data structure, you'd need to use:
settingsMap['name'][1]['categories'][0]['quick'][0]['directory']
Or, revise the underlying YAML data structure.
For example, if the data structure looked like this:
settingsMap = {
    'name': {'proj_directory': '/directory/',
             'categories': {'quick': {'directory': 'quick',
                                      'description': None,
                                      'table_name': 'quick'},
                            'intermediate': {'directory': 'intermediate',
                                             'description': None,
                                             'table_name': 'intermediate'},
                            'research': {'directory': 'research',
                                         'description': None,
                                         'table_name': 'research'}},
             'nomenclature': {'extension': 'nc',
                              'handler': 'script',
                              'filename': {'id': {'type': 'VARCHAR'},
                                           'date': {'type': 'DATE'},
                                           'v': {'type': 'INT'}},
                              'data': {'time': {'variable_name': 'time',
                                                'units': 'minutes since 1-1-1980 00:00 UTC'}}}}}
then you could access the same value as above with
settingsMap['name']['categories']['quick']['directory']
# quick
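As an aside, the shape difference traces back to the YAML source itself: a block of lines starting with - parses as a sequence, so each - key: value line becomes its own one-key dict. A minimal demonstration with inline YAML strings:
import yaml

# "- key: value" lines parse to a list of one-key dicts (the original shape)
print(yaml.safe_load("name:\n  - proj_directory: /directory/"))
# {'name': [{'proj_directory': '/directory/'}]}

# a plain nested mapping parses to nested dicts (the revised shape)
print(yaml.safe_load("name:\n  proj_directory: /directory/"))
# {'name': {'proj_directory': '/directory/'}}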