Data Transformation | Python

Data Transformation | Python - python

Need to perform a python data transformation from the left format to the right format for a excel file.
This is a very common business problem in the finance world, matching debit vs credits to get even. I guess we might need a for loop, but after googling without success.
any suggestions will be highly appreciated. the original data set is in json format below. thanks
Transformation requirement
{
"from": [
{
"scenario": "case 1",
"amount": "55.65",
"debit/credit": "debit",
"uid": "S001"
},
{
"scenario": "case 1",
"amount": "43.98",
"debit/credit": "debit",
"uid": "S002"
},
{
"scenario": "case 1",
"amount": "21.52",
"debit/credit": "credit",
"uid": "S003"
},
{
"scenario": "case 1",
"amount": "4.5",
"debit/credit": "credit",
"uid": "S004"
},
{
"scenario": "case 1",
"amount": "23.78",
"debit/credit": "credit",
"uid": "S005"
},
{
"scenario": "case 1",
"amount": "0.99",
"debit/credit": "credit",
"uid": "S006"
},
{
"scenario": "case 1",
"amount": "48.84",
"debit/credit": "credit",
"uid": "S007"
},
{
"scenario": "case 2",
"amount": "88.38",
"debit/credit": "debit",
"uid": "S008"
},
{
"scenario": "case 2",
"amount": "9.95",
"debit/credit": "debit",
"uid": "S009"
},
{
"scenario": "case 2",
"amount": "4.23",
"debit/credit": "credit",
"uid": "S010"
},
{
"scenario": "case 2",
"amount": "94.1",
"debit/credit": "credit",
"uid": "S011"
}
]
}

You can read in your data as a json file. Then use pandas.read_json
method to convert to a pandas data frame. The following will do what
you want.
import pandas as pd
data = pd.read_json("./debit_credit.json")
# boolean mask: whether debit or credit
debits = data['debit/credit'] == 'debit'
credits = data['debit/credit'] == 'credit'
# desired output dataframes
debits_df = data.loc[debits]
credits_df = data.loc[credits]
print(debits_df)
print(credits_df)
# whether debits and credits match
is_match = debits_df.amount.sum() == credits_df.amount.sum()
print(f'credit and debit match: {is_match}')

Related

Nested json files - Python

Good afternoon all,
I've been reading through the various posts regarding reading .json files using pandas but so far I've not been sucessful extract.
I need to read a specific 'score' in the json file of which I'll then iterate through all the json files I have as the label would be the same.
In the below how would I read the 'score'? I've tried using the normalise function but regardless of the agruement I put in I cannot get any closer.
Part of the json file:
"template_id": "template_fe61177cb0eb4642901b1eae9488fbb4",
"audit_id": "audit_1a0e9ef4a7914286808accb3dcb0700b",
"archived": false,
"created_at": "2022-10-07T08:00:14.021Z",
"modified_at": "2022-10-07T08:05:56.594Z",
"audit_data": {
"score": 10,
"total_score": 11,
"score_percentage": 90.909,
"name": "7 Oct 2022 / Test",
"duration": 240,
"authorship": {
"device_id": "user_65c3799b0f1a48549cacbceca244e1db",
"owner": "test",
"owner_id": "user_65c3799b0f1a48549cacbceca244e1db",
"author": "test",
"author_id": "user_65c3799b0f1a48549cacbceca244e1db"
},
"date_completed": "2022-10-07T08:05:55.860Z",
"date_modified": "2022-10-07T08:05:56.594Z",
"date_started": "2022-10-07T08:00:13.000Z",
"site": {
"name": "Blue Warehouse"
}
},
"template_data": {
"authorship": {
"device_id": "user_4bb896b5308341f7a7543a32f6c1f3ec",
"owner": "test",
"owner_id": "user_4bb896b5308341f7a7543a32f6c1f3ec",
"author": "test",
"author_id": "user_4bb896b5308341f7a7543a32f6c1f3ec"
},
"metadata": {
"description": "",
"name": "RCS",
"image": {
"date_created": "2022-04-12T13:27:18.852Z",
"file_ext": "png",
"label": "Go \u0026 See icon.PNG",
"media_id": "cf944a4b-7589-47e6-b42a-8d17f06b7031",
"href": "https://1"
}
},
"response_sets": {
"5b69aee5-0532-46a4-b2f5-d020d4d5381d": {
"id": "5b69aee5-0532-46a4-b2f5-d020d4d5381d",
"type": "question",
"responses": [
{
"id": "ef4abf51-3361-46f5-ba04-70c23c85ca20",
"label": "Good",
"colour": "19,133,95",
***"score": 1,***
"enable_score": true
},
Thanks for your help.
Rob.

This is done without pandas
import json
with open("my_file.json", 'r') as f:
my_dict = json.load(f)
score = my_dict["response_sets"]["5b69aee5-0532-46a4-b2f5-d020d4d5381d"]["responses"][0]["score"]

How to handle JSON list value in dataframe

I receive this json from an API call:
data = {'List': [{'id': 12403,
'name': 'myname',
'code': 'mycode',
'description': '',
'createdBy': '',
'createdDate': '24-Jun-2008 15:03:59 CDT',
'lastModifiedBy': '',
'lastModifiedDate': '24-Jun-2008 15:03:59 CDT'}]}
I want to handle this data and move it into a dataframe. When I attempt this with json_normalize it's basically putting my list value into a single cell in my dataframe.
My attempt:
import pandas as pd
df = pd.json_normalize(data)
Current output:
List
0 [{'id': 12403, 'name': 'myname', 'code': 'mycode...
Desired output:
Question
What's the best way to work with a list value from JSON to pandas dataframe?
Update
{
"Count": 38,
"Items": [
{
"Actions": [
"edit_",
"remove_",
"attachments_",
"cancel",
"continue",
"auditTrail",
"offline_",
"changeUser",
"linkRecord",
"resendNotification"
],
"Columns": [
{
"Label": "Workflow Name",
"Name": "__WorkflowName__",
"Value": "VOAPTSQA00000735"
},
{
"Label": "Workflow Description",
"Name": "__WorkflowDescription__",
"Value": "Vendor Outsourcing Contract Request (APTSQA | SAP Integration)"
},
{
"Label": "Current Assignee",
"Name": "__CurrentAssignee__",
"Value": "Vendor Outsourcing Integration User"
},
{
"Label": "Last Updated",
"Name": "__DateLastUpdated__",
"Value": "9/7/2022 12:22:14 PM"
},
{
"Label": "Created",
"Name": "__DateCreated__",
"Value": "9/7/2022 12:20:55 PM"
},
{
"Label": "Date Signed",
"Name": "__DateSigned__",
"Value": ""
},
{
"Label": "Completed",
"Name": "__DateCompleted__",
"Value": ""
},
{
"Label": "Status",
"Name": "__Status__",
"Value": "In RFP"
},
{
"Label": "Document ID",
"Name": "__DocumentIdentifier__",
"Value": ""
},
{
"Label": "End Date",
"Name": "__EndDate__",
"Value": "12/31/2033 12:00:00 AM"
},
{
"Label": "Stage Progress",
"Name": "__FormProgress__",
"Value": "0"
},
{
"Label": "Next Signer",
"Name": "__NextSigner__",
"Value": ""
}
],
"ResultSetId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TaskId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TokenId": "cdd53c33-803d-4a63-9abd-47b733b55e89"
}
Adding context for my comment about nested list of key pair values. Here when I normalize the json, I get the list of Columns all as one value in a cell.

The values of interest are under the List key, so slice it:
df = pd.json_normalize(data['List'])
output:
id name code description createdBy createdDate lastModifiedBy lastModifiedDate
0 12403 myname mycode 24-Jun-2008 15:03:59 CDT 24-Jun-2008 15:03:59 CDT

how to convert json response to excel using python

this reponse I am getting:
{
"value": [
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
},
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
}
]
}
code:
import pandas as pd
frame=pd.DataFrame()
for i in range (len(json_output['value'])):
df1= pd.DataFrame(data={'kind':json_output['value'][i]['kind'],
'id': json_output['value'][i]['id'],
'tags': json_output['value'][i]['tags'],
'name':json_output['value'][i]['name'],
'type':json_output['value'][i]['type'],
'billingAccountid':json_output['value'][i]['properties']['billingAccountId']},index=[i])
print(df1)
frame=frame.append(df1)
frame.to_csv('datt.csv')
Can you please help me to convert this data in to csv.
I am looking for
id,name,type,kind,tags,billingAccountId,resourceName etc into all column
I tried to convert into DataFrame it didn't work.
At last I am trying above python but its giving tags into null.
Note : I want to keep tags in dict format (for now)

I tried your code and stored json file into an output first:
-TAGS is a dictionary you access it without any keys so it will be NONE
If not comfortable by splitting TAGS use:
'tags':json_output['value'][i]['tags']['env']+json_output['value'][i]['tags']['dev']

Best way to build denormilazed dataframe with pandas from spotify API

I just downloaded some json from spotify and took a look into the pd.normalize_json().
But if I normalise the data i still have dictionaries within my dataframe. Also setting the level doesnt help.
DATA I want to have in my dataframe:
{
"collaborative": false,
"description": "",
"external_urls": {
"spotify": "https://open.spotify.com/playlist/5"
},
"followers": {
"href": null,
"total": 0
},
"href": "https://api.spotify.com/v1/playlists/5?additional_types=track",
"id": "5",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/a",
"width": 640
}
],
"name": "Another",
"owner": {
"display_name": "user",
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"primary_color": null,
"public": true,
"snapshot_id": "M2QxNTcyYTkMDc2",
"tracks": {
"href": "https://api.spotify.com/v1/playlists/100&additional_types=track",
"items": [
{
"added_at": "2020-12-13T18:34:09Z",
"added_by": {
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"is_local": false,
"primary_color": null,
"track": {
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dfeR4Had"
},
"href": "https://api.spotify.com/v1/artists/1dfDbWqFHLkxsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dfeRqFHLkxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/6wPXmlLzZ5cCa"
},
"href": "https://api.spotify.com/v1/albums/6wPXUJ9LzZ5cCa",
"id": "6wPXUmYJ9zZ5cCa",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/ab676620a47",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/ab67616d0620a47",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/ab603e6620a47",
"width": 64
}
],
"name": "The (Deluxe ",
"release_date": "1920-07-17",
"release_date_precision": "day",
"total_tracks": 15,
"type": "album",
"uri": "spotify:album:6m5cCa"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dg1d"
},
"href": "https://api.spotify.com/v1/artists/1dsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"disc_number": 1,
"duration_ms": 21453,
"episode": false,
"explicit": false,
"external_ids": {
"isrc": "GBU6015"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/5716J"
},
"href": "https://api.spotify.com/v1/tracks/5716J",
"id": "5716J",
"is_local": false,
"name": "Another",
"popularity": 73,
"preview_url": null,
"track": true,
"track_number": 3,
"type": "track",
"uri": "spotify:track:516J"
},
"video_thumbnail": {
"url": null
}
}
],
"limit": 100,
"next": null,
"offset": 0,
"previous": null,
"total": 1
},
"type": "playlist",
"uri": "spotify:playlist:fek"
}
So what are best practices to read nested data like this into one dataframe in pandas?
I'm glad for any advice.
EDIT:
so basically I want all keys as columns in my dataframe. But with normalise it stops at "tracks.items" and if I normalise this again i have the recursive problem again.

It depends on the information you are looking for. Take a look at pandas.read_json() to see if that can work. Also you can select data as such
json_output = {"collaborative": 'false',"description": "", "external_urls": {"spotify": "https://open.spotify.com/playlist/5"}}
df['collaborative'] = json_output['collaborative'] #set value of your df to value of returned json values

JSON GET Request with Python

I'm working in an application in Python. I have to do a GET request to get specific information. My code is somethings like that:
...
conn = httplib.HTTPConnection(self.url)
header = {"Authorization":"Bearer "+self.token}
conn.request("GET","/data",headers=header)
...
and the JSON that I obtain is similar to this (you can observe that there are 2 big different part... this is just an example, in my application the parts are a lots).
[
{
"createdAt": "2015-11-26T10:06:05.756Z",
"date": "2015-10-31T23:00:00.000Z",
"files": [],
"id": 1,
"metadata": {},
"notes": "note impianto 1",
"parentSubject": {
"code": "soggetto1",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
},
{
"createdAt": "2015-11-26T10:06:36.684Z",
"date": "2015-11-01T23:00:00.000Z",
"files": [],
"id": 2,
"metadata": {},
"notes": "note impianto 2",
"parentSubject": {
"code": "soggetto1",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
}
]
If for example I make this request:
...
conn.request("GET","/data?id=1",headers=header)
...
I obviously get only the first part. The problem is that I don't want to get all data that have id=1 but all data that have code=soggetto1. How can I do?

If API can't do this then you have to get all data and find on your own.
In example I changed one code to "soggetto2"
data = '''[
{
"createdAt": "2015-11-26T10:06:05.756Z",
"date": "2015-10-31T23:00:00.000Z",
"files": [],
"id": 1,
"metadata": {},
"notes": "note impianto 1",
"parentSubject": {
"code": "soggetto2",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
},
{
"createdAt": "2015-11-26T10:06:36.684Z",
"date": "2015-11-01T23:00:00.000Z",
"files": [],
"id": 2,
"metadata": {},
"notes": "note impianto 2",
"parentSubject": {
"code": "soggetto2",
"createdAt": "2015-11-26T10:05:38.765Z",
"id": 1,
"metadata": {},
"notes": "note soggetto 1",
"personalInfo": 1,
"sex": "M",
"tags": null,
"type": 1,
"updatedAt": "2015-11-26T10:05:38.765Z"
}
}
]'''
#------------------------------------------------------
import json
j = json.loads(data)
results = []
for x in j:
if x["parentSubject"]["code"] == "soggetto1":
results.append(x)
print results
You can do this with list comprehension
results = [ x for x in j if x["parentSubject"]["code"] == "soggetto1" ]
or filter()
results = filter(lambda x:x["parentSubject"]["code"] == "soggetto1", j)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Data Transformation | Python - python

Related

Nested json files - Python

How to handle JSON list value in dataframe

how to convert json response to excel using python

Best way to build denormilazed dataframe with pandas from spotify API

JSON GET Request with Python

Categories

Resources