How to create the given JSON format from a pandas dataframe? - python

The data looks like this :
The expected JSON format is like this:
{
    "DataExtractName": "SalesDataExtract",
    "BusinessName": {
        "InvoiceDate": {
            "SourceSystem": {
                "MYSQL": "Invc_Dt",
                "CSV": "Invc_Date"
            },
            "DataType": {
                "MYSQL": "varchar",
                "CSV": "string"
            }
        },
        "Description": {
            "SourceSystem": {
                "MYSQL": "Prod_Desc",
                "CSV": "Prod_Descr"
            },
            "DataType": {
                "MYSQL": "varchar",
                "CSV": "string"
            }
        }
    }
},
{
    "DataExtractName": "DateDataExtract",
    "BusinessName": {
        "InvoiceDate": {
            "SourceSystem": {
                "MYSQL": "Date"
            },
            "DataType": {
                "MYSQL": "varchar"
            }
        }
    }
}
How do I achieve this using Python dataframes? Or do I need to write some script to shape the data like this?
Note: I've tried using
df.to_json
df.to_dict

With so many nested structures, you should use marshmallow. It is built with your use case in mind. Please check out the excellent documentation: https://marshmallow.readthedocs.io/en/stable/ . All you need is the basic usage.
It is a lot of code, but better to be explicit than clever. I am sure a shorter solution exists, but it is probably unmaintainable. Also, I had to build your dataframe myself. Please provide it in a reusable data format next time.
import pandas as pd
import marshmallow as ma

# build test data
df = pd.DataFrame.from_records([
    ['InvoiceDate', 'MYSQL', 'Invc_Dt', 'varchar', 'SalesDataExtract'],
    ['InvoiceDate', 'CSV', 'Invc_Date', 'string', 'SalesDataExtract'],
    ['Description', 'MYSQL', 'Prod_Descr', 'varchar', 'SalesDataExtract'],
    ['Description', 'CSV', 'Prod_Descr', 'string', 'SalesDataExtract'],
    ['InvoiceDate', 'MYSQL', 'Date', 'varchar', 'DateDataExtract'],
])
df.columns = ['BusinessName', 'SourceSystem', 'FunctionalName', 'DataType', 'DataExtractName']

# define marshmallow schemas
class SourceSystemTypeSchema(ma.Schema):
    MYSQL = ma.fields.String()
    CSV = ma.fields.String()

class DataTypeSchema(ma.Schema):
    MYSQL = ma.fields.String()
    CSV = ma.fields.String()

class InvoiceDateSchema(ma.Schema):
    SourceSystem = ma.fields.Nested(SourceSystemTypeSchema())
    DataType = ma.fields.Nested(DataTypeSchema())

class DescriptionSchema(ma.Schema):
    SourceSystem = ma.fields.Nested(SourceSystemTypeSchema())
    DataType = ma.fields.Nested(DataTypeSchema())

class BusinessNameSchema(ma.Schema):
    InvoiceDate = ma.fields.Nested(InvoiceDateSchema())
    Description = ma.fields.Nested(DescriptionSchema())

class DataSchema(ma.Schema):
    DataExtractName = ma.fields.String()
    BusinessName = ma.fields.Nested(BusinessNameSchema())

# building json
result = []
mask_business_name_invoicedate = df.BusinessName == 'InvoiceDate'
mask_business_name_description = df.BusinessName == 'Description'
for data_extract_name in set(df['DataExtractName'].to_list()):
    mask_data_extract_name = df.DataExtractName == data_extract_name
    # you need these two helper dfs to get the dictionaries
    df_invoice_date = df[mask_data_extract_name & mask_business_name_invoicedate].set_index('SourceSystem').to_dict(orient='dict')
    df_description = df[mask_data_extract_name & mask_business_name_description].set_index('SourceSystem').to_dict(orient='dict')
    # all dictionaries are defined, so you can use your schemas
    source_system_type = SourceSystemTypeSchema().dump(df_invoice_date['FunctionalName'])
    data_type = DataTypeSchema().dump(df_invoice_date['DataType'])
    source_system = SourceSystemTypeSchema().dump(df_description['FunctionalName'])
    invoice_date = InvoiceDateSchema().dump({'SourceSystem': source_system_type, 'DataType': data_type})
    description = DescriptionSchema().dump({'SourceSystem': source_system, 'DataType': data_type})
    business_name = BusinessNameSchema().dump({'InvoiceDate': invoice_date, 'Description': description})
    data = DataSchema().dump({'DataExtractName': data_extract_name, 'BusinessName': business_name})
    # end result
    result.append(data)
Now,
ma.pprint(result)
returns
[{'BusinessName': {'Description': {'DataType': {'CSV': 'string',
                                                'MYSQL': 'varchar'},
                                   'SourceSystem': {'CSV': 'Prod_Descr',
                                                    'MYSQL': 'Prod_Descr'}},
                   'InvoiceDate': {'DataType': {'CSV': 'string',
                                                'MYSQL': 'varchar'},
                                   'SourceSystem': {'CSV': 'Invc_Date',
                                                    'MYSQL': 'Invc_Dt'}}},
  'DataExtractName': 'SalesDataExtract'},
 {'BusinessName': {'Description': {'DataType': {'MYSQL': 'varchar'},
                                   'SourceSystem': {}},
                   'InvoiceDate': {'DataType': {'MYSQL': 'varchar'},
                                   'SourceSystem': {'MYSQL': 'Date'}}},
  'DataExtractName': 'DateDataExtract'}]
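As an aside, the "shorter solution" hinted at above can be sketched without marshmallow at all, using only pandas groupby on the same test dataframe. This is an illustrative alternative, not part of the marshmallow-based answer:
import json

result = []
for extract_name, extract_df in df.groupby('DataExtractName', sort=False):
    business = {}
    for business_name, rows in extract_df.groupby('BusinessName', sort=False):
        # map each source system to its functional name and data type
        business[business_name] = {
            'SourceSystem': dict(zip(rows.SourceSystem, rows.FunctionalName)),
            'DataType': dict(zip(rows.SourceSystem, rows.DataType)),
        }
    result.append({'DataExtractName': extract_name, 'BusinessName': business})

print(json.dumps(result, indent=2))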


Failing to parse complex JSON using Python and Marshmallow

I am using Marshmallow to create a mapper for a JSON file. Following are the details:
My JSON File:
{
    "version": "3.0",
    "name": "A1",
    "request": {
        "startdate": "26022022",
        "enddate": "26022022",
        "records": 1000
    },
    "ranking": {
        "90": {
            "name": "N1",
            "class1": "C1"
        },
        "98": {
            "name": "N2",
            "class1": "C2"
        },
        "86": {
            "name": "N3",
            "class1": "C3"
        }
    }
}
My mapper class:
class RequestMapper(Schema):
    startdate = fields.String()
    enddate = fields.String()
    records = fields.Int()

class Ranking(Schema):
    name = fields.String()
    class1 = fields.String()

class RankingMapper(Schema):
    rank = fields.Nested(Ranking(), dataKey = fields.Int)

class SampleSchema(Schema):
    name = fields.Str()
    request = fields.Nested(RequestMapper())
    ranking = fields.Nested(RankingMapper())
Code to call Mapper:
print("\n\nOutput using mapper")
pprint(mapper.SampleSchema().dump(data), indent=3)
print("\n\n")
Following is the output:
Output using mapper
{ 'name': 'A1',
  'ranking': {},
  'request': {'enddate': '26022022', 'records': 1000, 'startdate': '26022022'}}
I am not getting any data for ranking because the data keys [90, 98, 86, ...] are dynamic, and I am not sure how to create a mapper for dynamic keys.
Any inputs will be helpful.
Thank you
When nesting schemas, pass the class NAME, not a class instance:
class RankingMapper(Schema):
    rank = fields.Nested(Ranking, dataKey = fields.Int)

class SampleSchema(Schema):
    name = fields.Str()
    request = fields.Nested(RequestMapper)
    ranking = fields.Nested(RankingMapper)
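Note that passing the class name only addresses the nesting syntax; the dynamic keys ("90", "98", "86") still need a mapping field. A minimal sketch, assuming marshmallow 3.x, uses fields.Dict so the rank keys stay dynamic while each value is dumped through Ranking:
from marshmallow import Schema, fields

class RequestMapper(Schema):
    startdate = fields.String()
    enddate = fields.String()
    records = fields.Int()

class Ranking(Schema):
    name = fields.String()
    class1 = fields.String()

class SampleSchema(Schema):
    name = fields.Str()
    request = fields.Nested(RequestMapper)
    # Keys such as "90", "98", "86" remain dynamic; each value is validated/dumped via Ranking.
    ranking = fields.Dict(keys=fields.String(), values=fields.Nested(Ranking))

With this schema, SampleSchema().dump(data) keeps the full "ranking" mapping instead of dropping it.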

How to get a specific JSON value - python

I am migrating my code from Java to Python, but I am still having some difficulties understanding how to fetch a specific path in JSON using Python.
This is my Java code, which returns a list of accountIds.
public static List<String> v02_JSON_counterparties(String date) {
    baseURI = "https://cdwp/cdw";
    String counterparties =
        given()
            .auth().basic(getJiraUser(), getJiraPass())
            .param("limit", "1000000")
            .param("count", "false")
        .when()
            .get("/counterparties/" + date).body().asString();
    List<String> accountId = extract_accountId(counterparties);
    return accountId;
}

public static List<String> extract_accountId(String res) {
    List<String> ids = JsonPath.read(res, "$..identifier[?(@.accountIdType == 'ACCOUNTID')].accountId");
    return ids;
}
And this is the JSON structure from which I am getting the accountId.
{
    'organisationId': {
        '#value': 'MHI'
    },
    'accountName': 'LAZARD AM DEUT AC LF1632',
    'identifiers': {
        'accountId': 'LAZDLF1632',
        'customerId': 'LAZAMDEUSG',
        'blockAccountCode': 'LAZDEUBDBL',
        'bic': 'LAMDDEF1XXX',
        'identifier': [{
            'accountId': 'MHI',
            'accountIdType': 'REVNCNTR'
        }, {
            'accountId': 'LAZDLF1632',
            'accountIdType': 'ACCOUNTID'
        }, {
            'accountId': 'LAZAMDEUSG',
            'accountIdType': 'MHICUSTID'
        }, {
            'accountId': 'LAZDEUBDBL',
            'accountIdType': 'BLOCKACCOUNT'
        }, {
            'accountId': 'LAMDDEF1XXX',
            'accountIdType': 'ACCOUNTBIC'
        }, {
            'accountId': 'LAZDLF1632',
            'accountIdType': 'GLOBEOP'
        }]
    },
    'isBlocAccount': 'N',
    'accountStatus': 'COMPLETE',
    'products': {
        'productType': [{
            'productLineName': 'CASH',
            'productTypeId': 'PRODMHI1',
            'productTypeName': 'Bond, Equity,Convertible Bond',
            'cleared': 'N',
            'bilateral': 'N',
            'limitInstructions': {
                'limitInstruction': [{
                    'limitAmount': '0',
                    'limitCurrency': 'GBP',
                    'limitType': 'PEAEXPLI',
                    'limitTypeName': 'Cash-Peak Exposure Limit'
                }]
            }
        }]
    },
    'etc': {
        'addressGeneral': 'LZFLUS33XXX',
        'addressAccount': 'LF1632',
        'tradingLevel': 'B'
    },
    'clientBroker': 'C',
    'costCentre': 'Credit Sales',
    'clientLevel': 'SUBAC',
    'accountCreationDate': '2016-10-19T00:00:00.000Z',
    'accountOpeningDate': '2016-10-19T00:00:00.000Z'
}
This is my code in Python
import json, requests, urllib.parse, re
from pandas.io.parsers import read_csv
import pandas as pd
from termcolor import colored
import numpy as np
from glob import glob
import os
# Set Up
dateinplay = "2021-09-27"

# Get accountId
cdwCounterparties = (
    f"http://cdwu/cdw/counterparties/?limit=1000000?yyyy-mm-dd={dateinplay}"
)
r = json.loads(requests.get(cdwCounterparties).text)
account_ids = [i['accountId'] for i in data['identifiers']['identifier'] if i['accountIdType'] == "ACCOUNTID"]
I am getting this error when I try to fetch the accountId:
Traceback (most recent call last):
File "h:\DESKTOP\test_check\checkCounterpartie.py", line 54, in <module>
account_ids = [i['accountId'] for i in data['identifiers']['identifier']if i['accountIdType']=="ACCOUNTID"]
TypeError: list indices must be integers or slices, not str
If I'm interpreting your question correctly, you want all ids where accountIdType is "ACCOUNTID".
This gives you that:
account_ids = [i['accountId'] for i in data['identifiers']['identifier'] if i['accountIdType'] == "ACCOUNTID"]
accs = {
    "identifiers": {
        ...
account_id_list = []
for acc in accs.get("identifiers", {}).get("identifier", []):
    account_id_list.append(acc.get("accountId", ""))
creates a list called account_id_list which is
['MHI', 'DKEPBNPGIV', 'DKEPLLP SG', 'DAVKEMEQBL', '401821', 'DKEPGB21XXX', 'DKEPBNPGIV', 'DKPARTNR']
assuming you store the dictionary (json structure) in variable x, getting all accountIDs is something like:
account_ids = [i['accountId'] for i in x['identifiers']['identifier']]
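The TypeError in the question ("list indices must be integers or slices, not str") suggests the parsed response is a list of counterparty objects rather than a single one. If that is the case, a hedged sketch that walks the list first (field names taken from the structure shown above, URL from the question) would be:
import requests

# Assumes cdwCounterparties is the URL built in the question and that the
# endpoint returns a JSON array of counterparty objects like the one above.
data = requests.get(cdwCounterparties).json()
account_ids = [
    ident['accountId']
    for counterparty in data
    for ident in counterparty.get('identifiers', {}).get('identifier', [])
    if ident.get('accountIdType') == 'ACCOUNTID'
]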
I'd like to thank you all for your answers. It helped me a lot to find a resolution to my problem.
Below is how I solved it.
import requests
from jsonpath_ng import parse  # parse()/find() from the jsonpath-ng package, assumed from context

listAccountId = []
cdwCounterparties = (
    f"http://cdwu/cdw/counterparties/?limit=100000?yyyy-mm-dd={dateinplay}"
)
r = requests.get(cdwCounterparties).json()

jsonpath_expression = parse("$..accounts.account[*].identifiers.identifier[*]")
for match in jsonpath_expression.find(r):
    # print(f'match id: {match.value}')
    thisdict = match.value
    if thisdict["accountIdType"] == "ACCOUNTID":
        # print(thisdict["accountId"])
        listAccountId.append(thisdict["accountId"])

Create dictionary from CSV without repeating top level

I am trying to build an API payload with a script. It runs, but I need it to iterate over the CSV file without repeating port_config for each line:
Reading this CSV File:
device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
The code I have so far:
for device_id, port, description in devices:
    print(device_id, port, description)
    payload = "{\n \"port_config\": { \"%s\": { \"description\": \"%s\" }\n}\n}" % (port, description)
    print(payload)
Result of above:
{
"port_config": { "interfacex/x": { "description": "test1" }
}
}
{
"port_config": { "interfacex/x": { "description": "test2" }
}
}
{
"port_config": { "interfacex/x": { "description": "test3" }
}
}
{
"port_config": { "interfacex/x": { "description": "test4" }
}
}
Desired results:
{
    "port_config": {
        "eth1/1": {"description": "test0,"},
        "eth1/2": {"description": "test1,"},
        "eth1/3": {"description": "test2,"},
        "eth1/4": {"description": "test3,"}
    }
}
Your question still has one problem. The CSV example doesn't exactly match the desired output. Let's say you have the following CSV string.
devices = """device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
"""
And you want the following output:
d = {
    "port_config": {
        "eth1/1": {"description": "test1,"},
        "eth1/2": {"description": "test2,"},
        "eth1/3": {"description": "test3,"},
        "eth1/4": {"description": "test4,"},
    }
}
You can easily achieve this in Python by doing some dictionary gymnastics like the following:
import csv
from pprint import pprint

# Input csv.
devices = """device_id,port,description
4444,eth1/1,test1
1111,eth1/2,test2
2222,eth1/3,test3
1234,eth1/4,test4
"""

# Read the data using Python's built-in csv module.
lines = devices.splitlines()
reader = csv.reader(lines)
devices = list(reader)

# Let's initialize the target payload data structure.
payload = {
    "port_config": {},
}

for idx, (device_id, port, description) in enumerate(devices):
    if idx == 0:
        continue  # This line skips the header.
    payload["port_config"][port] = {"description": description}

pprint(payload)
This should give you the following output:
{'port_config': {'eth1/1': {'description': 'test1'},
                 'eth1/2': {'description': 'test2'},
                 'eth1/3': {'description': 'test3'},
                 'eth1/4': {'description': 'test4'}}}
You can also do this via pandas:
import pandas as pd

df = pd.read_csv('inp_file.csv')
result = {'port_config': {item['port']: {"description": item['description']} for item in df[['port', 'description']].to_dict(orient='records')}}
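For completeness, a small sketch using csv.DictReader builds the same payload without any manual header handling (the file name inp_file.csv is taken from the pandas example above):
import csv

with open('inp_file.csv', newline='') as f:
    reader = csv.DictReader(f)  # consumes the header row automatically
    payload = {
        "port_config": {
            row["port"]: {"description": row["description"]} for row in reader
        }
    }
print(payload)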

boto3, python: appending value to DynamoDB String Set

I have an object in DynamoDB:
{ 'UserID': 'Hank', 'ConnectionList': {'con1', 'con2'} }
By using boto3 in lambda functions, I would like to add 'con3' to the String Set.
So far, I have been trying with the following code without success:
ddbClient = boto3.resource('dynamodb')
table = ddbClient.Table("UserInfo")

table.update_item(
    Key={
        "UserId": 'Hank'
    },
    UpdateExpression="SET ConnectionList = list_append(ConnectionList, :i)",
    ExpressionAttributeValues={
        ":i": { "S": "Something" }
    },
    ReturnValues="ALL_NEW"
)
However, no matter how I try to put the information inside the String Set, it always raises an error.
Since you're using the resource API, you have to use a native Python set in your statement:
table.update_item(
    Key={
        "UserId": 'Hank'
    },
    UpdateExpression="ADD ConnectionList :i",
    ExpressionAttributeValues={
        ":i": {"Something"},  # needs to be a set type
    },
    ReturnValues="ALL_NEW"
)
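For comparison, here is a sketch of the same update with the low-level client, where DynamoDB types are spelled out explicitly and a string set is written as "SS" (table and key names taken from the question):
import boto3

client = boto3.client("dynamodb")
client.update_item(
    TableName="UserInfo",
    Key={"UserId": {"S": "Hank"}},
    UpdateExpression="ADD ConnectionList :i",
    ExpressionAttributeValues={
        ":i": {"SS": ["Something"]},  # "SS" marks a string set in the typed API
    },
    ReturnValues="ALL_NEW",
)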

Is it possible to use shorthand option_settings with boto3?

When writing .ebextensions .config files, Amazon allows both long-form and short-form entries; for example, these two configurations are identical:
Long form:
"option_settings": [
{
'Namespace': 'aws:rds:dbinstance',
'OptionName': 'DBEngine',
'Value': 'postgres'
},
{
'Namespace': 'aws:rds:dbinstance',
'OptionName': 'DBInstanceClass',
'Value': 'db.t2.micro'
}
]
Shortform:
"option_settings": {
"aws:rds:dbinstance": {
"DBEngine": "postgres",
"DBInstanceClass": "db.t2.micro"
}
}
However, all of the configurations I've seen only specify using a long form with boto3:
response = eb_client.create_environment(
    ... trimmed ...
    OptionSettings=[
        {
            'Namespace': 'aws:rds:dbinstance',
            'OptionName': 'DBEngineVersion',
            'Value': '5.6'
        },
        ... trimmed ...
)
Is it possible to use a dictionary with shortform entries with boto3?
Bonus: If not, why not?
Trial and error suggests no, you cannot use the short-form config type.
However, if you are of that sort of persuasion, you can do this:
def short_to_long(_in):
    out = []
    for namespace, key_vals in _in.items():
        for optname, value in key_vals.items():
            out.append(
                {
                    'Namespace': namespace,
                    'OptionName': optname,
                    'Value': value
                }
            )
    return out
Then elsewhere:
response = eb_client.create_environment(
    OptionSettings=short_to_long({
        "aws:rds:dbinstance": {
            "DBDeletionPolicy": "Delete",  # or snapshot
            "DBEngine": "postgres",
            "DBInstanceClass": "db.t2.micro"
        },
    })
)
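For illustration, running the helper on the short-form block from the question yields the long-form list boto3 expects (key order follows the input dict):
from pprint import pprint

pprint(short_to_long({
    "aws:rds:dbinstance": {
        "DBEngine": "postgres",
        "DBInstanceClass": "db.t2.micro",
    }
}))
# Prints a list with one long-form dict per option, e.g.
# {'Namespace': 'aws:rds:dbinstance', 'OptionName': 'DBEngine', 'Value': 'postgres'}
# followed by the corresponding DBInstanceClass entry.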
