Data Transformation | Python - python

I need to perform a Python data transformation from the left format to the right format for an Excel file.
This is a very common business problem in the finance world: matching debits against credits so that they come out even. I guess we might need a for loop, but I have googled without success.
Any suggestions will be highly appreciated. The original data set is in JSON format below. Thanks.
Transformation requirement
{
"from": [
{
"scenario": "case 1",
"amount": "55.65",
"debit/credit": "debit",
"uid": "S001"
},
{
"scenario": "case 1",
"amount": "43.98",
"debit/credit": "debit",
"uid": "S002"
},
{
"scenario": "case 1",
"amount": "21.52",
"debit/credit": "credit",
"uid": "S003"
},
{
"scenario": "case 1",
"amount": "4.5",
"debit/credit": "credit",
"uid": "S004"
},
{
"scenario": "case 1",
"amount": "23.78",
"debit/credit": "credit",
"uid": "S005"
},
{
"scenario": "case 1",
"amount": "0.99",
"debit/credit": "credit",
"uid": "S006"
},
{
"scenario": "case 1",
"amount": "48.84",
"debit/credit": "credit",
"uid": "S007"
},
{
"scenario": "case 2",
"amount": "88.38",
"debit/credit": "debit",
"uid": "S008"
},
{
"scenario": "case 2",
"amount": "9.95",
"debit/credit": "debit",
"uid": "S009"
},
{
"scenario": "case 2",
"amount": "4.23",
"debit/credit": "credit",
"uid": "S010"
},
{
"scenario": "case 2",
"amount": "94.1",
"debit/credit": "credit",
"uid": "S011"
}
]
}

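The records sit under the top-level "from" key, so calling pandas.read_json
on the whole file would not give you one column per field. Load the file
with json first, then build the data frame from that list. The following
will do what you want.
import json
import pandas as pd
with open("./debit_credit.json") as f:
    data = pd.DataFrame(json.load(f)["from"])
# amounts are stored as strings, so convert them before summing
data['amount'] = pd.to_numeric(data['amount'])
# boolean mask: whether debit or credit
debits = data['debit/credit'] == 'debit'
credits = data['debit/credit'] == 'credit'
# desired output dataframes
debits_df = data.loc[debits]
credits_df = data.loc[credits]
print(debits_df)
print(credits_df)
# whether debits and credits match (rounded to cents to avoid float noise)
is_match = round(debits_df.amount.sum(), 2) == round(credits_df.amount.sum(), 2)
print(f'credit and debit match: {is_match}')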
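Since every record also carries a scenario, the same check can be made per scenario instead of for the whole file. Below is a minimal sketch, assuming the records from the question are already in a Python list and that a half-cent tolerance is acceptable. The names `totals` and `balanced` are illustrative and not part of the original answer.

```python
import pandas as pd

# Records copied from the "from" list in the question.
records = [
    {"scenario": "case 1", "amount": "55.65", "debit/credit": "debit", "uid": "S001"},
    {"scenario": "case 1", "amount": "43.98", "debit/credit": "debit", "uid": "S002"},
    {"scenario": "case 1", "amount": "21.52", "debit/credit": "credit", "uid": "S003"},
    {"scenario": "case 1", "amount": "4.5", "debit/credit": "credit", "uid": "S004"},
    {"scenario": "case 1", "amount": "23.78", "debit/credit": "credit", "uid": "S005"},
    {"scenario": "case 1", "amount": "0.99", "debit/credit": "credit", "uid": "S006"},
    {"scenario": "case 1", "amount": "48.84", "debit/credit": "credit", "uid": "S007"},
    {"scenario": "case 2", "amount": "88.38", "debit/credit": "debit", "uid": "S008"},
    {"scenario": "case 2", "amount": "9.95", "debit/credit": "debit", "uid": "S009"},
    {"scenario": "case 2", "amount": "4.23", "debit/credit": "credit", "uid": "S010"},
    {"scenario": "case 2", "amount": "94.1", "debit/credit": "credit", "uid": "S011"},
]

df = pd.DataFrame(records)
df["amount"] = pd.to_numeric(df["amount"])  # amounts arrive as strings

# One row per scenario, one column each for the credit and debit totals.
totals = df.pivot_table(index="scenario", columns="debit/credit",
                        values="amount", aggfunc="sum")

# Compare with a small tolerance instead of == to sidestep float rounding.
totals["balanced"] = (totals["debit"] - totals["credit"]).abs() < 0.005
print(totals)
```

Both scenarios in the sample data come out balanced. A scenario with a missing credit or debit side would show NaN in that column and would be reported as not balanced.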

Related

Nested json files - Python

Good afternoon all,
I've been reading through the various posts about reading .json files with pandas, but so far I've not been successful in extracting what I need.
I need to read a specific 'score' in the json file, and I'll then iterate through all the json files I have, as the label is the same in each.
In the below, how would I read the 'score'? I've tried using the normalize function, but regardless of the argument I put in I cannot get any closer.
Part of the json file:
"template_id": "template_fe61177cb0eb4642901b1eae9488fbb4",
"audit_id": "audit_1a0e9ef4a7914286808accb3dcb0700b",
"archived": false,
"created_at": "2022-10-07T08:00:14.021Z",
"modified_at": "2022-10-07T08:05:56.594Z",
"audit_data": {
"score": 10,
"total_score": 11,
"score_percentage": 90.909,
"name": "7 Oct 2022 / Test",
"duration": 240,
"authorship": {
"device_id": "user_65c3799b0f1a48549cacbceca244e1db",
"owner": "test",
"owner_id": "user_65c3799b0f1a48549cacbceca244e1db",
"author": "test",
"author_id": "user_65c3799b0f1a48549cacbceca244e1db"
},
"date_completed": "2022-10-07T08:05:55.860Z",
"date_modified": "2022-10-07T08:05:56.594Z",
"date_started": "2022-10-07T08:00:13.000Z",
"site": {
"name": "Blue Warehouse"
}
},
"template_data": {
"authorship": {
"device_id": "user_4bb896b5308341f7a7543a32f6c1f3ec",
"owner": "test",
"owner_id": "user_4bb896b5308341f7a7543a32f6c1f3ec",
"author": "test",
"author_id": "user_4bb896b5308341f7a7543a32f6c1f3ec"
},
"metadata": {
"description": "",
"name": "RCS",
"image": {
"date_created": "2022-04-12T13:27:18.852Z",
"file_ext": "png",
"label": "Go \u0026 See icon.PNG",
"media_id": "cf944a4b-7589-47e6-b42a-8d17f06b7031",
"href": "https://1"
}
},
"response_sets": {
"5b69aee5-0532-46a4-b2f5-d020d4d5381d": {
"id": "5b69aee5-0532-46a4-b2f5-d020d4d5381d",
"type": "question",
"responses": [
{
"id": "ef4abf51-3361-46f5-ba04-70c23c85ca20",
"label": "Good",
"colour": "19,133,95",
***"score": 1,***
"enable_score": true
},
Thanks for your help.
Rob.
This can be done without pandas. Note that "response_sets" is nested under "template_data" in the sample:
import json
with open("my_file.json", 'r') as f:
    my_dict = json.load(f)
score = my_dict["template_data"]["response_sets"]["5b69aee5-0532-46a4-b2f5-d020d4d5381d"]["responses"][0]["score"]
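Because the question mentions iterating over many files, it may help to avoid hard-coding the response-set id. The following is a rough sketch, assuming every file has the same layout as the excerpt in the question. The helper names `response_scores` and `scores_for_folder` are made up for illustration, and the sample dict is a trimmed copy of the excerpt.

```python
import json
from pathlib import Path


def response_scores(doc):
    """Return {label: score} for every response in every response set."""
    scores = {}
    for response_set in doc["template_data"]["response_sets"].values():
        for response in response_set.get("responses", []):
            scores[response["label"]] = response.get("score")
    return scores


def scores_for_folder(folder):
    """Map each *.json file name in `folder` to its response scores."""
    return {
        path.name: response_scores(json.loads(path.read_text(encoding="utf-8")))
        for path in sorted(Path(folder).glob("*.json"))
    }


# Trimmed copy of the structure shown in the question.
sample = {
    "audit_data": {"score": 10, "total_score": 11},
    "template_data": {
        "response_sets": {
            "5b69aee5-0532-46a4-b2f5-d020d4d5381d": {
                "id": "5b69aee5-0532-46a4-b2f5-d020d4d5381d",
                "type": "question",
                "responses": [
                    {"label": "Good", "colour": "19,133,95", "score": 1, "enable_score": True}
                ],
            }
        }
    },
}

print(response_scores(sample))
```

If the overall audit score is what you are after instead, it sits at `doc["audit_data"]["score"]` in the same structure.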

How to handle JSON list value in dataframe

I receive this json from an API call:
data = {'List': [{'id': 12403,
'name': 'myname',
'code': 'mycode',
'description': '',
'createdBy': '',
'createdDate': '24-Jun-2008 15:03:59 CDT',
'lastModifiedBy': '',
'lastModifiedDate': '24-Jun-2008 15:03:59 CDT'}]}
I want to handle this data and move it into a dataframe. When I attempt this with json_normalize it's basically putting my list value into a single cell in my dataframe.
My attempt:
import pandas as pd
df = pd.json_normalize(data)
Current output:
List
0 [{'id': 12403, 'name': 'myname', 'code': 'mycode...
Desired output:
Question
What's the best way to work with a list value from JSON to pandas dataframe?
Update
{
"Count": 38,
"Items": [
{
"Actions": [
"edit_",
"remove_",
"attachments_",
"cancel",
"continue",
"auditTrail",
"offline_",
"changeUser",
"linkRecord",
"resendNotification"
],
"Columns": [
{
"Label": "Workflow Name",
"Name": "__WorkflowName__",
"Value": "VOAPTSQA00000735"
},
{
"Label": "Workflow Description",
"Name": "__WorkflowDescription__",
"Value": "Vendor Outsourcing Contract Request (APTSQA | SAP Integration)"
},
{
"Label": "Current Assignee",
"Name": "__CurrentAssignee__",
"Value": "Vendor Outsourcing Integration User"
},
{
"Label": "Last Updated",
"Name": "__DateLastUpdated__",
"Value": "9/7/2022 12:22:14 PM"
},
{
"Label": "Created",
"Name": "__DateCreated__",
"Value": "9/7/2022 12:20:55 PM"
},
{
"Label": "Date Signed",
"Name": "__DateSigned__",
"Value": ""
},
{
"Label": "Completed",
"Name": "__DateCompleted__",
"Value": ""
},
{
"Label": "Status",
"Name": "__Status__",
"Value": "In RFP"
},
{
"Label": "Document ID",
"Name": "__DocumentIdentifier__",
"Value": ""
},
{
"Label": "End Date",
"Name": "__EndDate__",
"Value": "12/31/2033 12:00:00 AM"
},
{
"Label": "Stage Progress",
"Name": "__FormProgress__",
"Value": "0"
},
{
"Label": "Next Signer",
"Name": "__NextSigner__",
"Value": ""
}
],
"ResultSetId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TaskId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TokenId": "cdd53c33-803d-4a63-9abd-47b733b55e89"
}
Adding context for my comment about nested list of key pair values. Here when I normalize the json, I get the list of Columns all as one value in a cell.
The values of interest are under the List key, so slice it:
df = pd.json_normalize(data['List'])
output:
      id    name    code  description  createdBy               createdDate  lastModifiedBy          lastModifiedDate
0  12403  myname  mycode                          24-Jun-2008 15:03:59 CDT                  24-Jun-2008 15:03:59 CDT
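For the JSON in the update, where each item holds a nested "Columns" list of Label/Name/Value entries, `pd.json_normalize` can expand that list through its `record_path` and `meta` parameters. This is a sketch under the assumption that the payload looks like the excerpt above; the data below is a trimmed copy, and pivoting on "Label" is just one possible choice of layout.

```python
import pandas as pd

# Trimmed copy of the payload from the update (two of the Columns entries).
data = {
    "Count": 1,
    "Items": [
        {
            "Actions": ["edit_", "remove_"],
            "Columns": [
                {"Label": "Workflow Name", "Name": "__WorkflowName__", "Value": "VOAPTSQA00000735"},
                {"Label": "Status", "Name": "__Status__", "Value": "In RFP"},
            ],
            "TaskId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
        }
    ],
}

# One row per entry of "Columns", with the parent TaskId repeated on each row.
long_df = pd.json_normalize(data["Items"], record_path="Columns", meta=["TaskId"])

# Optional: turn each Label into its own column, one row per TaskId.
wide_df = long_df.pivot(index="TaskId", columns="Label", values="Value").reset_index()
print(wide_df)
```

The pivot assumes each Label appears at most once per TaskId; duplicates would make `pivot` raise an error.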

how to convert json response to excel using python

This is the response I am getting:
{
"value": [
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
},
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
}
]
}
code:
import pandas as pd
frame=pd.DataFrame()
for i in range (len(json_output['value'])):
    df1= pd.DataFrame(data={'kind':json_output['value'][i]['kind'],
                            'id': json_output['value'][i]['id'],
                            'tags': json_output['value'][i]['tags'],
                            'name':json_output['value'][i]['name'],
                            'type':json_output['value'][i]['type'],
                            'billingAccountid':json_output['value'][i]['properties']['billingAccountId']},index=[i])
    print(df1)
    frame=frame.append(df1)
frame.to_csv('datt.csv')
Can you please help me convert this data into CSV?
I am looking for id, name, type, kind, tags, billingAccountId, resourceName, etc. as columns.
I tried to convert it into a DataFrame directly, but it didn't work.
Finally I tried the Python code above, but it gives null for tags.
Note: I want to keep tags in dict format (for now).
I tried your code, storing the JSON response in json_output first.
tags is a dictionary, and you pass it in without selecting any of its keys, so the column comes out empty (None/NaN).
If you would rather not split tags into separate columns, use:
'tags':json_output['value'][i]['tags']['env']+json_output['value'][i]['tags']['dev']
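An alternative that keeps tags as a dict is to build one plain row dict per item and hand the whole list to `pd.DataFrame` in one go. This is only a sketch, assuming `json_output` has the shape shown in the question (the sample here is heavily trimmed). It also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Heavily trimmed stand-in for the API response in the question.
json_output = {
    "value": [
        {
            "id": "usageDetails_Id1",
            "name": "usageDetails_Id1",
            "type": "Microsoft.Consumption/usageDetails",
            "kind": "legacy",
            "tags": {"env": "newcrp", "dev": "tools"},
            "properties": {"billingAccountId": "xxxxxxxx", "resourceName": "Resource Name 1"},
        },
        {
            "id": "usageDetails_Id2",
            "name": "usageDetails_Id2",
            "type": "Microsoft.Consumption/usageDetails",
            "kind": "legacy",
            "tags": {"env": "newcrp", "dev": "tools"},
            "properties": {"billingAccountId": "xxxxxxxx", "resourceName": "Resource Name 2"},
        },
    ]
}

rows = []
for item in json_output["value"]:
    # Top-level fields stay as they are, so "tags" remains a dict in its cell.
    row = {key: value for key, value in item.items() if key != "properties"}
    # Every key under "properties" becomes its own column.
    row.update(item.get("properties", {}))
    rows.append(row)

frame = pd.DataFrame(rows)
csv_text = frame.to_csv(index=False)  # pass a file path instead to write a file
print(csv_text)
```

For an actual Excel file, `frame.to_excel("datt.xlsx", index=False)` should work as well, provided an Excel writer engine such as openpyxl is installed.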

Best way to build denormalized dataframe with pandas from Spotify API

I just downloaded some JSON from Spotify and took a look at pd.json_normalize().
But if I normalize the data I still have dictionaries within my dataframe. Setting the level doesn't help either.
DATA I want to have in my dataframe:
{
"collaborative": false,
"description": "",
"external_urls": {
"spotify": "https://open.spotify.com/playlist/5"
},
"followers": {
"href": null,
"total": 0
},
"href": "https://api.spotify.com/v1/playlists/5?additional_types=track",
"id": "5",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/a",
"width": 640
}
],
"name": "Another",
"owner": {
"display_name": "user",
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"primary_color": null,
"public": true,
"snapshot_id": "M2QxNTcyYTkMDc2",
"tracks": {
"href": "https://api.spotify.com/v1/playlists/100&additional_types=track",
"items": [
{
"added_at": "2020-12-13T18:34:09Z",
"added_by": {
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"is_local": false,
"primary_color": null,
"track": {
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dfeR4Had"
},
"href": "https://api.spotify.com/v1/artists/1dfDbWqFHLkxsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dfeRqFHLkxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/6wPXmlLzZ5cCa"
},
"href": "https://api.spotify.com/v1/albums/6wPXUJ9LzZ5cCa",
"id": "6wPXUmYJ9zZ5cCa",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/ab676620a47",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/ab67616d0620a47",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/ab603e6620a47",
"width": 64
}
],
"name": "The (Deluxe ",
"release_date": "1920-07-17",
"release_date_precision": "day",
"total_tracks": 15,
"type": "album",
"uri": "spotify:album:6m5cCa"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dg1d"
},
"href": "https://api.spotify.com/v1/artists/1dsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"disc_number": 1,
"duration_ms": 21453,
"episode": false,
"explicit": false,
"external_ids": {
"isrc": "GBU6015"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/5716J"
},
"href": "https://api.spotify.com/v1/tracks/5716J",
"id": "5716J",
"is_local": false,
"name": "Another",
"popularity": 73,
"preview_url": null,
"track": true,
"track_number": 3,
"type": "track",
"uri": "spotify:track:516J"
},
"video_thumbnail": {
"url": null
}
}
],
"limit": 100,
"next": null,
"offset": 0,
"previous": null,
"total": 1
},
"type": "playlist",
"uri": "spotify:playlist:fek"
}
So what are best practices to read nested data like this into one dataframe in pandas?
I'm glad for any advice.
EDIT:
So basically I want all keys as columns in my dataframe. But normalizing stops at "tracks.items", and if I normalize that again I have the same recursive problem.
It depends on the information you are looking for. Take a look at pandas.read_json() to see if that can work. You can also select individual values like this:
import pandas as pd
json_output = {"collaborative": 'false',"description": "", "external_urls": {"spotify": "https://open.spotify.com/playlist/5"}}
df = pd.DataFrame(index=[0]) # one-row frame to fill in
df['collaborative'] = json_output['collaborative'] #set value of your df to value of returned json values
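One way around the "tracks.items" problem is to normalize that list directly, so each playlist entry becomes a row, and then expand the remaining list columns with `explode`. This is a minimal sketch on a trimmed copy of the playlist JSON. Which playlist-level fields to copy onto each row, and the `playlist.` and `artist.name` column names, are assumptions made for illustration.

```python
import pandas as pd

# Trimmed copy of the playlist JSON from the question.
playlist = {
    "id": "5",
    "name": "Another",
    "owner": {"id": "user"},
    "tracks": {
        "items": [
            {
                "added_at": "2020-12-13T18:34:09Z",
                "track": {
                    "id": "5716J",
                    "name": "Another",
                    "album": {"name": "The (Deluxe ", "total_tracks": 15},
                    "artists": [{"id": "1dfeR4HaWDbWqFHLkxsg1d", "name": "Q"}],
                },
            }
        ]
    },
}

# One row per playlist entry; nested dicts become dotted column names.
tracks = pd.json_normalize(playlist["tracks"]["items"])

# Copy the playlist-level fields you need onto every row.
tracks["playlist.id"] = playlist["id"]
tracks["playlist.name"] = playlist["name"]

# Lists are left untouched by json_normalize, so expand them separately:
# explode gives one row per (track, artist) pair.
artists = tracks.explode("track.artists")
artists["artist.name"] = artists["track.artists"].apply(lambda artist: artist["name"])

print(artists[["playlist.name", "track.name", "track.album.name", "artist.name"]])
```

A single frame with literally every key as a column is rarely practical for data like this, since lists such as images or available_markets multiply the rows. Keeping a tracks table and an artists table side by side is a common compromise.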

JSON GET Request with Python

I'm working on an application in Python. I have to do a GET request to get specific information. My code is something like this:
...
conn = httplib.HTTPConnection(self.url)
header = {"Authorization":"Bearer "+self.token}
conn.request("GET","/data",headers=header)
...
and the JSON that I obtain is similar to this (you can see that there are 2 large, distinct parts; this is just an example, in my application there are many more parts).
[
{
"createdAt": "2015-11-26T10:06:05.756Z",
"date": "2015-10-31T23:00:00.000Z",
"files": [],
"id": 1,
"metadata": {},
"notes": "note impianto 1",
"parentSubject": {
"code": "soggetto1",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
},
{
"createdAt": "2015-11-26T10:06:36.684Z",
"date": "2015-11-01T23:00:00.000Z",
"files": [],
"id": 2,
"metadata": {},
"notes": "note impianto 2",
"parentSubject": {
"code": "soggetto1",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
}
]
If for example I make this request:
...
conn.request("GET","/data?id=1",headers=header)
...
I obviously get only the first part. The problem is that I don't want all the data that has id=1, but all the data that has code=soggetto1. How can I do that?
If the API can't do this, then you have to get all the data and filter it on your own.
In the example I changed one code to "soggetto2"
data = '''[
{
"createdAt": "2015-11-26T10:06:05.756Z",
"date": "2015-10-31T23:00:00.000Z",
"files": [],
"id": 1,
"metadata": {},
"notes": "note impianto 1",
"parentSubject": {
"code": "soggetto2",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
},
{
"createdAt": "2015-11-26T10:06:36.684Z",
"date": "2015-11-01T23:00:00.000Z",
"files": [],
"id": 2,
"metadata": {},
"notes": "note impianto 2",
"parentSubject": {
"code": "soggetto1",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
}
]'''
#------------------------------------------------------
import json
j = json.loads(data)
results = []
for x in j:
    if x["parentSubject"]["code"] == "soggetto1":
        results.append(x)
print(results)
You can do this with a list comprehension
results = [ x for x in j if x["parentSubject"]["code"] == "soggetto1" ]
or filter() (wrapped in list() because filter() returns an iterator in Python 3)
results = list(filter(lambda x:x["parentSubject"]["code"] == "soggetto1", j))
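Putting the request and the filtering together in Python 3, where `httplib` is named `http.client`, might look like the sketch below. The host, token and `/data` path are placeholders taken from the question's snippet, and the helper names `fetch_records` and `filter_by_code` are made up for illustration. Only the filtering part is exercised here, on two trimmed records.

```python
import http.client
import json


def fetch_records(host, token, path="/data"):
    """GET `path` from `host` and decode the JSON body (host and token are placeholders)."""
    conn = http.client.HTTPConnection(host)
    try:
        conn.request("GET", path, headers={"Authorization": "Bearer " + token})
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


def filter_by_code(records, code):
    """Keep the records whose parentSubject.code equals `code`."""
    return [
        record
        for record in records
        if (record.get("parentSubject") or {}).get("code") == code
    ]


# Offline check with two trimmed records instead of a live request.
sample = [
    {"id": 1, "parentSubject": {"code": "soggetto2"}},
    {"id": 2, "parentSubject": {"code": "soggetto1"}},
]
print(filter_by_code(sample, "soggetto1"))
# prints [{'id': 2, 'parentSubject': {'code': 'soggetto1'}}]
```

Using `.get()` means a record without a parentSubject is skipped rather than raising a KeyError.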
