Dictionary data is not properly appended to another dictionary - python

Dictionary data is not properly appended to another dictionary. Here acc_grp is grouped pandas data.
acc_grp
amount_currency balance credit debit lid
ldate
2018-04-01 0.0 -27359.250 30219.25 1115.0000 643259
2018-04-02 0.0 -208574.742 5000.00 1194.0005 872275
Here template_dict is my dictionary. When I print result, both lines of my acc_grp are correctly available.
Flow (From Terminal)
result (1st iteration)
{'date': '2018-04-01', 'credit': 30219.25, 'balance': -29104.25, 'debit': 1115.0}
template_dict
{'code': u'300103', 'lines': [{'date': '2018-04-01', 'credit': 30219.25, 'balance': -29104.25, 'debit': 1115.0}], 'name': u'CASH COLLECTION'}
In the first case, result is correctly appended to template_dict.
result (2nd iteration)
{'date': '2018-04-02', 'credit': 5000.0, 'balance': -3805.9994999999999, 'debit': 1194.0005}
template_dict
{'code': u'300103', 'lines': [{'date': '2018-04-02', 'credit': 5000.0, 'balance': -3805.9994999999999, 'debit': 1194.0005}, {'date': '2018-04-02', 'credit': 5000.0, 'balance': -3805.9994999999999, 'debit': 1194.0005}], 'name': u'CASH COLLECTION'}
Here, the value of template_dict's lines is supposed to be result1, result2, but the data comes out as result2, result2.
code
result = {}
template_dict = dict()
template_dict['lines'] = []
template_dict['code'] = line['code']
template_dict['name'] = line['name']
for index,row in acc_grp.iterrows():
    balance=0
    row.balance=row.debit.item()-row.credit.item()
    result['date']=row.name
    result['debit']=row.debit.item()
    result['credit']=row.credit.item()
    result['balance']=row.balance
    print result
    template_dict['lines'].append(result)
    print template_dict

You need to create a new dictionary for each line. Otherwise, you're always changing the very same dictionary:
...
for index,row in acc_grp.iterrows():
    result = {} # Create a brand new dictionary
    balance=0
    row.balance=row.debit.item()-row.credit.item()
    result['date']=row.name
    ...
    template_dict['lines'].append(result)
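To see why the original loop produced two identical entries, here is a minimal sketch. The `rows` list is made-up data standing in for the grouped DataFrame, not the asker's actual frame. Appending a dict to a list stores a reference to it, not a copy:

```python
# Illustrative rows standing in for acc_grp (assumed data): date, debit, credit.
rows = [("2018-04-01", 1115.0, 30219.25), ("2018-04-02", 1194.0005, 5000.0)]

# Buggy pattern: one dict is created once and mutated on every pass.
shared = {}
buggy_lines = []
for date, debit, credit in rows:
    shared["date"] = date
    shared["balance"] = debit - credit
    buggy_lines.append(shared)  # appends a reference to the same object

# Fixed pattern: a fresh dict is created inside the loop.
fixed_lines = []
for date, debit, credit in rows:
    fresh = {"date": date, "balance": debit - credit}
    fixed_lines.append(fresh)

print(buggy_lines[0] is buggy_lines[1])        # True: both entries are one object
print([line["date"] for line in buggy_lines])  # ['2018-04-02', '2018-04-02']
print([line["date"] for line in fixed_lines])  # ['2018-04-01', '2018-04-02']
```

If the dict really has to be created outside the loop, appending `dict(shared)` (a shallow copy) would also avoid the aliasing.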

Related

How do I extract query-like results from a nested dictionary in Python?

I have a relatively simple nested dictionary as below:
emp_details = {
'Employee': {
'jim': {'ID':'001', 'Sales':'75000', 'Title': 'Lead'},
'eva': {'ID':'002', 'Sales': '50000', 'Title': 'Associate'},
'tony': {'ID':'003', 'Sales': '150000', 'Title': 'Manager'}
}
}
I can get the sales info of 'eva' easily by:
print(emp_details['Employee']['eva']['Sales'])
but I'm having difficulty writing a statement to extract information on all employees whose sales are over 50000.
A comprehension can filter with an if and no else, so a single statement is possible, but a for loop is the most explicit way to write it:
result = {} # matching employees as a dict
result_list = [] # matching employees as a list of (key, value) tuples
for key, value in emp_details['Employee'].items(): # iterate over all items in the dictionary
    if int(value['Sales']) > 50000: # your condition
        result[key] = value # add to dict
        result_list.append((key, value)) # add to list
print(result)
print(result_list)
# should say:
'''
{'jim': {'ID':'001', 'Sales':'75000', 'Title': 'Lead'}, 'tony': {'ID':'003', 'Sales': '150000', 'Title': 'Manager'}}
[('jim', {'ID':'001', 'Sales':'75000', 'Title': 'Lead'}), ('tony', {'ID':'003', 'Sales': '150000', 'Title': 'Manager'})]
'''
Your Sales values are strings, so they must be converted to int before comparing.
Here are two ways to get the information of employees whose sales are over 50000:
Method1 :
If you just want to get the information : -
emp_details = {'Employee': {'jim': {'ID':'001', 'Sales':'75000', 'Title': 'Lead'},
                            'eva': {'ID':'002', 'Sales': '50000', 'Title': 'Associate'},
                            'tony': {'ID':'003', 'Sales': '150000', 'Title': 'Manager'}
                            }}
for emp in emp_details['Employee']:
    if int(emp_details['Employee'][emp]['Sales']) > 50000:
        print(emp_details['Employee'][emp])
It prints:
{'ID': '001', 'Sales': '75000', 'Title': 'Lead'}
{'ID': '003', 'Sales': '150000', 'Title': 'Manager'}
Method2 : You can use Dict and List comprehension to get complete information : -
emp_details={'Employee':{'jim':{'ID':'001', 'Sales':'75000', 'Title': 'Lead'}, \
'eva':{'ID':'002', 'Sales': '50000', 'Title': 'Associate'}, \
'tony':{'ID':'003', 'Sales': '150000', 'Title': 'Manager'}
}}
emp_details_dictComp = {k:v for k,v in list(emp_details['Employee'].items()) if int(v['Sales']) > 50000}
print(emp_details_dictComp)
emp_details_listComp = [(k,v) for k,v in list(emp_details['Employee'].items()) if int(v['Sales']) > 50000]
print(emp_details_listComp)
Result : -
{'jim': {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}, 'tony': {'ID': '003', 'Sales': '150000', 'Title': 'Manager'}}
[('jim', {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}), ('tony', {'ID': '003', 'Sales': '150000', 'Title': 'Manager'})]
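If the same query is needed with different thresholds, the dict comprehension can be wrapped in a small helper. This is only a sketch: the function name `employees_over` is hypothetical and not part of either answer above.

```python
def employees_over(employees, threshold):
    """Return the employees whose 'Sales' value is strictly above threshold."""
    return {
        name: info
        for name, info in employees.items()
        if int(info["Sales"]) > threshold  # 'Sales' is stored as a string
    }

emp_details = {
    "Employee": {
        "jim": {"ID": "001", "Sales": "75000", "Title": "Lead"},
        "eva": {"ID": "002", "Sales": "50000", "Title": "Associate"},
        "tony": {"ID": "003", "Sales": "150000", "Title": "Manager"},
    }
}

top = employees_over(emp_details["Employee"], 50000)
print(sorted(top))  # ['jim', 'tony']
```

Note that eva is excluded because the comparison is strict; use `>=` if employees with exactly 50000 should be included.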

Reading a JSON with multiple nested lists and transforming into a DataFrame

I have JSON inside a list, and this JSON has lists inside of lists. Something like this:
my_data = [{'page': 1,
'page_size': 100,
'total_pages': 11,
'total_results': 1057,
'items': [{'jw_entity_id': 'ts88361',
'id': 88361,
'title': 'Love, Death & Robots',
'object_type': 'show',
'scoring': [{'provider_type': 'imdb:votes', 'value': 131937},
{'provider_type': 'tmdb:score', 'value': 8.2},
{'provider_type': 'imdb:score', 'value': 8.4}]},
{'jw_entity_id': 'tm374139',
'id': 374139,
'title': 'Sonic - O Filme',
'object_type': 'movie',
'scoring': [{'provider_type': 'tmdb:id', 'value': 454626},
{'provider_type': 'imdb:score', 'value': 6.5},
{'provider_type': 'tmdb:score', 'value': 7.4}]}]}]
I managed to transform it into a DataFrame, but one of the columns, scoring, still holds nested provider_type values. How can I "unpack" that list and integrate it into the DataFrame?
import pandas as pd
from pandas import json_normalize
df = pd.concat([json_normalize(entry, 'items')
                for entry in my_data])
This is what I get now:
{'jw_entity_id': {0: 'ts88361', 1: 'tm374139'},
'id': {0: 88361, 1: 374139},
'title': {0: 'Love, Death & Robots', 1: 'Sonic - O Filme'},
'object_type': {0: 'show', 1: 'movie'},
'scoring': {0: [{'provider_type': 'imdb:votes', 'value': 131937},
{'provider_type': 'tmdb:score', 'value': 8.2},
{'provider_type': 'imdb:score', 'value': 8.4}],
1: [{'provider_type': 'tmdb:id', 'value': 454626},
{'provider_type': 'imdb:score', 'value': 6.5},
{'provider_type': 'tmdb:score', 'value': 7.4}]}}
I need the scoring column "unpacked", with the imdb:score as a column.
The structure of the dictionaries in your scoring column is a bit convoluted, with repeating keys.
You can concatenate DataFrames created from these lists:
df = pd.concat([pd.json_normalize(entry, 'items')
                for entry in my_data])
df_scor = pd.concat([
    pd.DataFrame({x['provider_type']: [x['value']]
                  for x in l})
    for l in df['scoring'].to_list()
]).reset_index(drop=True)
df = df.drop('scoring', axis=1).join(df_scor['imdb:score']) # here we keep only imdb:score
print(df)
Output:
  jw_entity_id      id                 title object_type  imdb:score
0      ts88361   88361  Love, Death & Robots        show         8.4
1     tm374139  374139       Sonic - O Filme       movie         6.5
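An alternative sketch, not taken from the answer above, is to flatten each item in plain Python before building the DataFrame. The helper name `flatten_scoring` is hypothetical, and the `items` list below is a trimmed copy of the sample data.

```python
def flatten_scoring(item):
    """Copy item, replacing its 'scoring' list with one key per provider_type."""
    flat = {key: value for key, value in item.items() if key != "scoring"}
    for score in item.get("scoring", []):
        flat[score["provider_type"]] = score["value"]
    return flat

# Trimmed copy of the question's sample items.
items = [
    {"id": 88361, "title": "Love, Death & Robots",
     "scoring": [{"provider_type": "imdb:votes", "value": 131937},
                 {"provider_type": "imdb:score", "value": 8.4}]},
    {"id": 374139, "title": "Sonic - O Filme",
     "scoring": [{"provider_type": "tmdb:id", "value": 454626},
                 {"provider_type": "imdb:score", "value": 6.5}]},
]

records = [flatten_scoring(item) for item in items]
print(records[0]["imdb:score"])  # 8.4
print(records[1]["imdb:score"])  # 6.5
```

The resulting `records` is a list of flat dicts, which `pd.DataFrame(records)` accepts directly; a provider that is missing from an item would show up as NaN in that frame.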

Extracting value for one dictionary key in Pandas based on another in the same dictionary

This is from an R guy.
I have this mess in a Pandas column: data['crew'].
array(["[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', 'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, {'credit_id': '56407fa89251417055000b58', 'department': 'Sound', 'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}, {'credit_id': '5789212392514135d60025fd', 'department': 'Production', 'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production', 'name': 'Jeffrey Stott', 'profile_path': None}, {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 23783, 'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}
It goes on for quite some time. Each new dict starts with a credit_id field. One cell can hold several dicts in an array.
Assume I want the names of all Casting directors, as shown in the first entry. I need to check the job entry in every dict and, if it's Casting, grab what's in the name field and store it in my data frame in data['crew'].
I tried several strategies, then backed off and went for something simple.
Running the following shut me down, so I can't even access a simple field. How can I get this done in Pandas?
for row in data.head().iterrows():
    if row['crew'].job == 'Casting':
        print(row['crew'])
EDIT: Error Message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-138-aa6183fdf7ac> in <module>()
1 for row in data.head().iterrows():
----> 2 if row['crew'].job == 'Casting':
3 print(row['crew'])
TypeError: tuple indices must be integers or slices, not str
EDIT: Code used to get the array of dict (strings?) in the first place.
import ast
def convert_JSON(data_as_string):
    try:
        dict_representation = ast.literal_eval(data_as_string)
        return dict_representation
    except ValueError:
        return []
data["crew"] = data["crew"].map(lambda x: sorted([d['name'] if d['job'] == 'Casting' else '' for d in convert_JSON(x)])).map(lambda x: ','.join(map(str, x)))
To create a DataFrame from your sample data, write:
df = pd.DataFrame(data=[
{ 'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production',
'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor',
'profile_path': None},
{ 'credit_id': '56407fa89251417055000b58', 'department': 'Sound',
'gender': 0, 'id': 6745, 'job': 'Music Editor',
'name': 'Richard Henderson', 'profile_path': None},
{ 'credit_id': '5789212392514135d60025fd', 'department': 'Production',
'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production',
'name': 'Jeffrey Stott', 'profile_path': None},
{ 'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up',
'gender': 0, 'id': 23783, 'job': 'Makeup Artist',
'name': 'Heather Plott', 'profile_path': None}])
Then you can get your data with a single instruction:
df[df.job == 'Casting'].name
The result is:
0 Terri Taylor
Name: name, dtype: object
The result above is a pandas Series containing the names found.
Here, 0 is the index value of the matching record and
Terri Taylor is the name of the only Casting entry in your data.
Edit
If you want just a list (not Series), write:
df[df.job == 'Casting'].name.tolist()
The result is ['Terri Taylor'] - just a list.
I think both solutions should be faster than an "ordinary" loop
based on iterrows().
If you want to compare execution times, you can also try yet another solution:
df.query("job == 'Casting'").name.tolist()
==========
And as far as your code is concerned:
iterrows() yields, for each row, a pair containing:
the index label of the current row,
a Series holding the content of this row.
So your loop should look something like:
for row in df.iterrows():
    if row[1].job == 'Casting':
        print(row[1]['name'])
You cannot write row[1].name because, on a Series, the name attribute
holds the row's index label rather than the value of the name column.
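The answer above works on a frame with one dict per row, while in the question each cell is a string holding a list of dicts. A minimal standard-library sketch for that case follows. The helper name `casting_names` is hypothetical, and `sample_cell` is a shortened stand-in for one cell of data['crew'].

```python
import ast

def casting_names(cell):
    """Return the names whose job is 'Casting' from one stringified cell."""
    try:
        crew = ast.literal_eval(cell)  # safely parse the Python-literal string
    except (ValueError, SyntaxError):
        return []  # unparsable cell: treat it as having no crew
    return [member["name"] for member in crew if member.get("job") == "Casting"]

# Shortened stand-in for one cell of the crew column (assumed data).
sample_cell = (
    "[{'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, "
    "{'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}]"
)

print(casting_names(sample_cell))  # ['Terri Taylor']
print(casting_names("not a list"))  # []
```

Assuming the column holds such strings, something like `data['crew'].apply(casting_names)` would give one list of names per row.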

Sublist values to two different variables?

I'm trying to assign two values of a list to two different variables. Here is the JSON list. It's raising a KeyError. Please let me know where I am wrong.
[{'min': 1.158, 'max': 1.150, 'id': 269097, 'to': 1532003820, 'from': 1532003760, 'check': 1.15852, 'no_check': 1.15822, 'volume': 0},{'min': 1.1, 'max': 1.17, 'id': 269098, 'to': 1532003880, 'from': 1532003820, 'check': 1.158615, 'nocheck': 1.158515, 'volume': 0}]
Here is my Python 3 code:
pt = [{'min': 1.158, 'max': 1.150, 'id': 269097, 'to': 1532003820, 'from': 1532003760, 'check': 1.15852, 'no_check': 1.15822, 'volume': 0},{'min': 1.1, 'max': 1.17, 'id': 269098, 'to': 1532003880, 'from': 1532003820, 'check': 1.158615, 'nocheck': 1.158515, 'volume': 0}]
y = [item[0] for item in pt]
z = [item[0] for item in pt]
print(y)
print(z)
Error:
  File "test_YourM.py", line 19, in <module>
    y = [item[0][0] for item in pt]
  File "test_YourM.py", line 19, in <listcomp>
    y = [item[0][0] for item in pt]
KeyError: 0
Expected output:
print(y) # {'min': 1.158, 'max': 1.150, 'id': 269097, 'to': 1532003820, 'from': 1532003760, 'check': 1.15852, 'no_check': 1.15822, 'volume': 0}
print(z) # {'min': 1.1, 'max': 1.17, 'id': 269098, 'to': 1532003880, 'from': 1532003820, 'check': 1.158615, 'nocheck': 1.158515, 'volume': 0}
[item for item in pt[0]]
[item for item in pt[1]]
In these comprehensions, item exists only inside the comprehension's scope, and iterating over a dict yields only its keys. To get each dict's keys and values, you may want to do something like this:
{key: value for key, value in pt[0].items()}
{key: value for key, value in pt[1].items()}
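Since the expected output is simply the first and second dict of the list, plain sequence unpacking may be all that is needed. This is a sketch with a trimmed version of pt and it assumes the list always has exactly two elements:

```python
# Trimmed version of the question's list (fewer keys, same shape).
pt = [
    {"min": 1.158, "max": 1.150, "id": 269097},
    {"min": 1.1, "max": 1.17, "id": 269098},
]

# Sequence unpacking: raises ValueError unless pt has exactly two items.
y, z = pt

print(y["id"])  # 269097
print(z["id"])  # 269098
```

Here y and z refer to the same dict objects that are stored in pt; use `dict(pt[0])` instead if an independent copy is wanted.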

Appending to List within a Dictionary

Objective: Append items from VALUE['itemArray'] to products['Items'] - see function fba_order
Problem: The current code only appends the last item of VALUE['itemArray'] to products['Items']
Current Output:
{'Items': [{'SellerFulfillmentOrderItemId': 266804219, 'SellerSKU': 'IX-GZ31-31K6', 'Quantity': 1}, {'SellerFulfillmentOrderItemId': 266804219, 'SellerSKU': 'IX-GZ31-31K6', 'Quantity': 1}]}
Correct Output would be:
{'Items': [{'SellerFulfillmentOrderItemId': 266804218, 'SellerSKU': 'KM-090914-840-BEARLAPTOP', 'Quantity': 1}, {'SellerFulfillmentOrderItemId': 266804219, 'SellerSKU': 'IX-GZ31-31K6', 'Quantity': 1}]}
Code:
import sys
VALUE = {'amountPaid': '43.38',
'amountSaved': 0.0,
'buyerCheckoutMessage': '',
'buyerUserID': 13182254,
'buyerUserName': 'W5Tiny',
'checkoutStatus': {'status': 'Complete'},
'createdTime': '2015-06-30T22:41:01Z',
'creatingUserRole': 'Buyer',
'itemArray': [{'item': {'itemID': 266804218,
'price': '21.1',
'quantity': 1,
'sellerInventoryID': 'KM-090914-840-BEARLAPTOP',
'sk': 'KM-090914-840-BEARLAPTOP',
'title': u"VTech Bear's Baby Laptop, Blue [Toy]"}},
{'item': {'itemID': 266804219,
'price': '22.28',
'quantity': 1,
'sellerInventoryID': 'IX-GZ31-31K6',
'sk': 'IX-GZ31-31K6',
'title': 'Toy State Caterpillar Push Powered Rev It Up Dump Truck [Toy]'}}],
'orderID': 34013525,
'orderStatus': 'Completed',
'paidTime': '2015-06-30T22:50:38Z',
'shippingAddress': {'addressID': 15798541,
'cityName': 'Nashville',
'country': 'US',
'countryName': None,
'name': 'UNKNOWN',
'postalCode': '37221',
'stateOrProvince': 'TN',
'street1': '123 BOOGIE DRIVE',
'street2': None},
'shippingDetails': {'amount': '0.0',
'insuranceFee': 0,
'servicesArray': [],
'shippingService': 'Standard shipping'},
'subtotal': 43.38,
'taxAmount': 0.0,
'total': '43.38',
'transactionArray': {'transaction': {'buyer': {'email': 'fakewilson259612#hotmail.com'},
'finalValueFee': '0.0',
'providerID': '11V84334FD304010L',
'providerName': 'Paypal'}}}
def fba_order():
    address = {}
    products = {'Items': []}
    item = {}
    Items = []
    address['City'] = VALUE['shippingAddress']['cityName']
    address['CountryCode'] = VALUE['shippingAddress']['country']
    address['Line1'] = VALUE['shippingAddress']['street1']
    address['Line2'] = VALUE['shippingAddress']['street2']
    address['Name'] = VALUE['shippingAddress']['name']
    address['PostalCode'] = VALUE['shippingAddress']['postalCode']
    address['StateOrProvinceCode'] = VALUE['shippingAddress']['stateOrProvince']
    for items in VALUE['itemArray']:
        item['Quantity'] = items['item']['quantity']
        item['SellerFulfillmentOrderItemId'] = items['item']['itemID']
        item['SellerSKU'] = items['item']['sk']
        products['Items'].append(item)
        continue
    print address, '\n', products
if __name__ == '__main__':
    sys.exit(fba_order())
You are reusing the dictionary referenced by item over and over again; appending this one dictionary won't create a new copy. Rather, you are adding multiple references to that one dictionary. As you continue to alter the dictionary all those references will show those changes.
Better produce an entirely new dictionary for each loop iteration:
for items in VALUE['itemArray']:
    item = {
        'Quantity': items['item']['quantity'],
        'SellerFulfillmentOrderItemId': items['item']['itemID'],
        'SellerSKU': items['item']['sk'],
    }
    products['Items'].append(item)
The issue is that you are creating item outside the for loop, then only changing its values inside the loop and appending it to the list.
Dictionaries are stored by reference, so even after you append item to the products['Items'] list, changing the item dictionary also changes the entry that was appended to the list.
You want to initialize item to a new dictionary inside the for loop.
Example -
def fba_order():
    address = {}
    products = {'Items': []}
    Items = []
    address['City'] = VALUE['shippingAddress']['cityName']
    address['CountryCode'] = VALUE['shippingAddress']['country']
    address['Line1'] = VALUE['shippingAddress']['street1']
    address['Line2'] = VALUE['shippingAddress']['street2']
    address['Name'] = VALUE['shippingAddress']['name']
    address['PostalCode'] = VALUE['shippingAddress']['postalCode']
    address['StateOrProvinceCode'] = VALUE['shippingAddress']['stateOrProvince']
    for items in VALUE['itemArray']:
        item = {}
        item['Quantity'] = items['item']['quantity']
        item['SellerFulfillmentOrderItemId'] = items['item']['itemID']
        item['SellerSKU'] = items['item']['sk']
        products['Items'].append(item)
        continue
    print address, '\n', products
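The same fix can also be written as a list comprehension, which builds a new dict for every entry by construction. This is a sketch in Python 3 syntax using a trimmed `item_array` in place of VALUE['itemArray']; the variable names are illustrative.

```python
# Trimmed stand-in for VALUE['itemArray'] (only the keys the loop reads).
item_array = [
    {"item": {"itemID": 266804218, "quantity": 1, "sk": "KM-090914-840-BEARLAPTOP"}},
    {"item": {"itemID": 266804219, "quantity": 1, "sk": "IX-GZ31-31K6"}},
]

# Each pass of the comprehension evaluates a new dict literal,
# so no two list entries share the same object.
products = {
    "Items": [
        {
            "Quantity": entry["item"]["quantity"],
            "SellerFulfillmentOrderItemId": entry["item"]["itemID"],
            "SellerSKU": entry["item"]["sk"],
        }
        for entry in item_array
    ]
}

print([p["SellerSKU"] for p in products["Items"]])
# ['KM-090914-840-BEARLAPTOP', 'IX-GZ31-31K6']
```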
