Flattening dict/list object - Python vs Sql? - python

I have a bit of dict/list values- typically I would store as json and flatten via SQL - but wondering if anyone has any perspective on using python? Just in general the most efficient way to get this data in one row, where the values for dimensions & metricHeaderEntries are columns and the respective row values fall in line under them
BTW this is an example Google Analytics API Export , so may help others. The goal would be to get all the values for Dimension & Metric on one row.
{
"reports": [
{
"columnHeader": {
"dimensions": [
"DimA",
"DimAB",
"DimAC",
"DimAD",
"DimAE"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "METRICAB",
"type": "INTEGER"
},
{
"name": "METRICABAC",
"type": "INTEGER"
},
{
"name": "METRICABACD",
"type": "INTEGER"
},
{
"name": "METRICABADCE",
"type": "VARCHAR"
},
{
"name": "METRICABADEEW",
"type": "FLOAT"
},
{
"name": "METRICABADEEFG",
"type": "FLOAT"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"(test)",
"2019022",
"(not set)",
"d",
"/search"
],
"metrics": [
{
"values": [
"0",
"1",
"1",
"100000.0",
"61.0",
"16.0"
]
}
]
},
{
"dimensions": [
"(test)",
"20190222",
"(not set)",
"asdf",
"/adf"
],
"metrics": [
{
"values": [
"30",
"133",
"31",
"330.0",
"31.0",
"32.0"
]
}
]
},
"nextPageToken": "331000"
}
]
}
Desired Output:

Related

How to update a value inside an object which is inside array of another object in mongodb using python code

My collection looks like this
{"ingr": [
{
"ingrName": [
{
"_id": "57aa56e2a06b57b",
"name": "abc",
"type": "ingr"
}
],
"_id": {
"$oid": "62232cd70ce38c50"
},
"quantity": "1.0",
},
{
"ingr": [
{
"_id": "607e7fcca57aa",
"name": "xyz",
"type": "ingr"
}
],
"_id": {
"$oid": "62232cd70ce38c"
},
"quantity": "1.0"
}
}}
I just want to change the id and type based on the object id. what i tried is
db1.update_one({
'ingr.$._id': ObjectId("62232cd70ce38c50")
},
{
'$set': {
"ingr.ingrName.$.type":"alternate",
"ingr.ingrName.$._id":"abc123"
}
})
But the values are not changing.Help me to find the mistake I making. Thanks
expected output
{"ingr": [
{
"ingrName": [
{
"_id": "abc123",
"name": "abc",
"type": "alternate"
}
],
"_id": {
"$oid": "62232cd70ce38c50"
},
"quantity": "1.0",
}
i need to change the id and type

Modify the value of a field of a specific nested object (its index) depending on a condition

I would like to modify the value of a field on a specific index of a nested type depending on another value of the same nested object or a field outside of the nested object.
As example, I have the current mapping of my index feed:
{
"feed": {
"mappings": {
"properties": {
"attacks_ids": {
"type": "keyword"
},
"created_by": {
"type": "keyword"
},
"date": {
"type": "date"
},
"groups_related": {
"type": "keyword"
},
"indicators": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text"
},
"role": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"malware_families": {
"type": "keyword"
},
"published": {
"type": "boolean"
},
"references": {
"type": "keyword"
},
"tags": {
"type": "keyword"
},
"targeted_countries": {
"type": "keyword"
},
"title": {
"type": "text"
},
"tlp": {
"type": "keyword"
}
}
}
}
}
Take the following document as example:
{
"took": 194,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "feed",
"_type": "_doc",
"_id": "W3CS7IABovFpcGfZjfyu",
"_score": 1,
"_source": {
"title": "Test",
"date": "2022-05-22T16:21:09.159711",
"created_by": "finch",
"tlp": "white",
"published": true,
"references": [
"test",
"test"
],
"tags": [
"tag1",
"tag2"
],
"targeted_countries": [
"Italy",
"Germany"
],
"malware_families": [
"family1",
"family2"
],
"groups_related": [
"group1",
"griup2"
],
"attacks_ids": [
""
],
"indicators": [
{
"value": "testest",
"description": "This is a test",
"type": "sha256",
"role": "file",
"date": "2022-05-22T16:21:09.159560"
},
{
"value": "testest2",
"description": "This is a test 2",
"type": "ipv4",
"role": "c2",
"date": "2022-05-22T16:21:09.159699"
}
]
}
}
]
}
}
I would like to make this update: indicators[0].value = 'changed'
if _id == 'W3CS7IABovFpcGfZjfyu'
or if title == 'some_title'
or if indicators[0].role == 'c2'
I already tried with a script, but it seems I can't manage to get it work, I hope the explanation is clear, ask any question if not, thank you.
Edit 1:
I managed to make it work, however it needs the _id, still looking for a way to do that without it.
My partial solution:
update = Pulse.get(id="XHCz7IABovFpcGfZWfz9") #Pulse is my document
update.update(script="for (indicator in ctx._source.indicators) {if (indicator.value=='changed2') {indicator.value='changed3'}}")
# Modify depending on the value of a field inside the same nested object

Creating a dynamic JSON using a python list

This is the json structure that I am having inside of a python file. Here the stationList_of_state is a python list which has some 5-10 values which will change dynamically based on the code.
message = {
"type": "template",
"payload": {
"template_type": "generic",
"elements":
[
{
"buttons": [
{
"title": stationList_of_state[1],
"payload": stationList_of_state[1],
"type": "postback"
}
]
}
]
}
}
I have tried something like this which showed errors:
message = {
"type": "template",
"payload": {
"template_type": "generic",
"elements":
[
for i in range(len(stationList_of_state)):
{
"buttons": [
{
"title": stationList_of_state[i],
"payload": stationList_of_state[i],
"type": "postback"
}
]
}
]
}
}
Can someone suggest an alternate approach to what I have did?
You're almost there:
message = {
"type": "template",
"payload": {
"template_type": "generic",
"elements": [
{
"buttons": [
{
"title": stationList_of_state[i],
"payload": stationList_of_state[i],
"type": "postback",
}
]
}
for i in range(len(stationList_of_state))
],
},
}
or, simplifying the for clause to omit the unnecessary i variable,
message = {
"type": "template",
"payload": {
"template_type": "generic",
"elements": [
{
"buttons": [
{
"title": station,
"payload": station,
"type": "postback",
}
]
}
for station in stationList_of_state
],
},
}

Best way to parse a JSON to store in SQL database (SQL stored procedure/Python)

I have a table of overly complex JSON files I'm trying to convert to tabular format to store in a SQL database. I'm pulling the JSONs from the quickbooks online API, and the format is messy to say the least.. (We're talking 7x nested JSONs for some bits of it..
The format resembles the code snippet down below. Currently I am using a bunch of OpenJSON's + Cross applys to dig down to the innermost ColData then work my way up but it looks like some of the ColData's get skipped over doing that.
Are there any better ways, using either Python (since I pull the JSON initially in Python before sending the JSON to a SQL database to parse) or SQL to convert it to tabular format besides manually trying to use OpenJSON with Cross applys?
The goal is to get all of the ColData's into a SQL table...
Thanks!
{
"Header": {
"ReportName": "BalanceSheet",
"Option": [
{
"Name": "AccountingStandard",
"Value": "GAAP"
},
{
"Name": "NoReportData",
"Value": "false"
}
],
"DateMacro": "this calendar year-to-date",
"ReportBasis": "Accrual",
"StartPeriod": "2016-01-01",
"Currency": "USD",
"EndPeriod": "2016-10-31",
"Time": "2016-10-31T09:42:21-07:00",
"SummarizeColumnsBy": "Total"
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "ASSETS"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "Current Assets"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "Bank Accounts"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "35",
"value": "Checking"
},
{
"value": "1350.55"
}
],
"type": "Data"
},
{
"ColData": [
{
"id": "36",
"value": "Savings"
},
{
"value": "800.00"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "BankAccounts",
"Summary": {
"ColData": [
{
"value": "Total Bank Accounts"
},
{
"value": "2150.55"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Accounts Receivable"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "84",
"value": "Accounts Receivable (A/R)"
},
{
"value": "6383.12"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "AR",
"Summary": {
"ColData": [
{
"value": "Total Accounts Receivable"
},
{
"value": "6383.12"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Other current assets"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "81",
"value": "Inventory Asset"
},
{
"value": "596.25"
}
],
"type": "Data"
},
{
"ColData": [
{
"id": "4",
"value": "Undeposited Funds"
},
{
"value": "2117.52"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "OtherCurrentAssets",
"Summary": {
"ColData": [
{
"value": "Total Other current assets"
},
{
"value": "2713.77"
}
]
}
}
]
},
"type": "Section",
"group": "CurrentAssets",
"Summary": {
"ColData": [
{
"value": "Total Current Assets"
},
{
"value": "11247.44"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Fixed Assets"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"id": "37",
"value": "Truck"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "38",
"value": "Original Cost"
},
{
"value": "13495.00"
}
],
"type": "Data"
}
]
},
"type": "Section",
"Summary": {
"ColData": [
{
"value": "Total Truck"
},
{
"value": "13495.00"
}
]
}
}
]
},
"type": "Section",
"group": "FixedAssets",
"Summary": {
"ColData": [
{
"value": "Total Fixed Assets"
},
{
"value": "13495.00"
}
]
}
}
]
},
"type": "Section",
"group": "TotalAssets",
"Summary": {
"ColData": [
{
"value": "TOTAL ASSETS"
},
{
"value": "24742.44"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "LIABILITIES AND EQUITY"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "Liabilities"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "Current Liabilities"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"Header": {
"ColData": [
{
"value": "Accounts Payable"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "33",
"value": "Accounts Payable (A/P)"
},
{
"value": "1984.17"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "AP",
"Summary": {
"ColData": [
{
"value": "Total Accounts Payable"
},
{
"value": "1984.17"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Credit Cards"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "41",
"value": "Mastercard"
},
{
"value": "157.72"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "CreditCards",
"Summary": {
"ColData": [
{
"value": "Total Credit Cards"
},
{
"value": "157.72"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Other Current Liabilities"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "89",
"value": "Arizona Dept. of Revenue Payable"
},
{
"value": "4.55"
}
],
"type": "Data"
},
{
"ColData": [
{
"id": "90",
"value": "Board of Equalization Payable"
},
{
"value": "401.98"
}
],
"type": "Data"
},
{
"ColData": [
{
"id": "43",
"value": "Loan Payable"
},
{
"value": "4000.00"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "OtherCurrentLiabilities",
"Summary": {
"ColData": [
{
"value": "Total Other Current Liabilities"
},
{
"value": "4406.53"
}
]
}
}
]
},
"type": "Section",
"group": "CurrentLiabilities",
"Summary": {
"ColData": [
{
"value": "Total Current Liabilities"
},
{
"value": "6548.42"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Long-Term Liabilities"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "44",
"value": "Notes Payable"
},
{
"value": "25000.00"
}
],
"type": "Data"
}
]
},
"type": "Section",
"group": "LongTermLiabilities",
"Summary": {
"ColData": [
{
"value": "Total Long-Term Liabilities"
},
{
"value": "25000.00"
}
]
}
}
]
},
"type": "Section",
"group": "Liabilities",
"Summary": {
"ColData": [
{
"value": "Total Liabilities"
},
{
"value": "31548.42"
}
]
}
},
{
"Header": {
"ColData": [
{
"value": "Equity"
},
{
"value": ""
}
]
},
"Rows": {
"Row": [
{
"ColData": [
{
"id": "34",
"value": "Opening Balance Equity"
},
{
"value": "-9337.50"
}
],
"type": "Data"
},
{
"ColData": [
{
"id": "2",
"value": "Retained Earnings"
},
{
"value": "91.25"
}
],
"type": "Data"
},
{
"ColData": [
{
"value": "Net Income"
},
{
"value": "2440.27"
}
],
"type": "Data",
"group": "NetIncome"
}
]
},
"type": "Section",
"group": "Equity",
"Summary": {
"ColData": [
{
"value": "Total Equity"
},
{
"value": "-6805.98"
}
]
}
}
]
},
"type": "Section",
"group": "TotalLiabilitiesAndEquity",
"Summary": {
"ColData": [
{
"value": "TOTAL LIABILITIES AND EQUITY"
},
{
"value": "24742.44"
}
]
}
}
]
},
"Columns": {
"Column": [
{
"ColType": "Account",
"ColTitle": "",
"MetaData": [
{
"Name": "ColKey",
"Value": "account"
}
]
},
{
"ColType": "Money",
"ColTitle": "Total",
"MetaData": [
{
"Name": "ColKey",
"Value": "total"
}
]
}
]
}
}
Here is what I tried to get the ColData (unsuccessfully I might add), I think it might be a little too contrived to do in SQL but I'm not sure if I should continue trying this way or if there's a better way in Python:
declare #json nvarchar(max)
SELECT #json = json FROM QboApiRawJSONData WHERE ID = 2
--Outer layer of JSON breaks into 3 parts - header, columns, rows
SELECT * FROM OPENJSON(#json)
WITH
(
Rows nvarchar(max) AS JSON
) as MainLayer
CROSS APPLY OPENJSON (MainLayer.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as SecondaryLayer
CROSS APPLY OPENJSON (SecondaryLayer.Row)
WITH
(
Rows nvarchar(max) AS JSON
) As ThirdLayer
CROSS APPLY OPENJSON (ThirdLayer.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as FourthLayer
CROSS APPLY OPENJSON (FourthLayer.Row)
WITH
(
Rows nvarchar(max) AS JSON
) as FifthLayer
CROSS APPLY OPENJSON (FifthLayer.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as SixthLayer
CROSS APPLY OPENJSON (SixthLayer.Row)
WITH
(
Rows nvarchar(max) AS JSON
) as SeventhLayer
CROSS APPLY OPENJSON (SeventhLayer.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as EighthLayer
CROSS APPLY OPENJSON (EighthLayer.Row)
WITH
(
Rows nvarchar(max) AS JSON
) as LayerNine
---Things get funky here
CROSS APPLY OPENJSON (LayerNine.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as LayerTen
CROSS APPLY OPENJSON (LayerTen.Row)
WITH
(
Rows nvarchar(max) AS JSON
) as LayerEleven
CROSS APPLY OPENJSON (LayerEleven.Rows)
WITH
(
Row nvarchar(max) AS JSON
) as LayerTwelve
--21 items in last col
There is JSON support for sql-server:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15
There is a JSON storage method here: https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15. Although it will be fairly complex here I recommend storing the JSON data as a Logs table and then following the tutorial above to see if that solves your issue.
Use Quickbooks API to get the JSON, then refer to this guide:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15
and this guide:
https://learn.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15
You can also consider setting something like AWS Lambda or Google Cloud Functions if you need something more automated.

Transforming nested JSON with pyjq

I'm trying to transform the JSON from this:
{
"meta": {
"ver": "3.0"
},
"cols": [
{
"name": "val"
}
],
"rows": [
"cols": [
{
"name": "ts"
},
{
"name": "v0"
},
{
"name": "v1"
},
{
"name": "v2"
},
{
"name": "v3"
},
{
"ts": {
"_kind": "dateTime",
"val": "2021-07-07T00:10:00-07:00",
"tz": "Los_Angeles"
},
"v3": {
"_kind": "number",
"val": 6167699.5,
"unit": "kWh"
}
},
{
"ts": {
"_kind": "dateTime",
"val": "2021-07-07T00:15:00-07:00",
"tz": "Los_Angeles"
},
"v0": {
"_kind": "number",
"val": 808926.0625,
"unit": "m\\u00b3"
},
"v1": {
"_kind": "number",
"val": 112999.3046875,
"unit": "m\\u00b3"
},
"v2": {
"_kind": "number",
"val": 8823498,
"unit": "kWh"
}
}
]
}
to a more simplified form using the pyjq module:
{
"data": {
"v0": [
[
"first timestamp",
val
],
[
"second timestamp",
val
]
],
"v1": [
[
"first timestamp",
val
],
[
"second timestamp",
val
]
]
}
}
I got started with the pyjq module, however I'm unsure about how to proceed with place two values (one str, one float) within the [] separated by a comma. Here's my code (returns error as expected).
import json
import pyjq
with open('file.json') as f:
data = json.load(f)
transformed = pyjq.all('{data: { meter_id_1: [[[.rows[].val.rows[].ts.val + "," + .rows[].val.rows[].v0.val]]}}', data)
Thanks in advance.

Categories

Resources