I am trying to retrieve the values of specific columns from a Python list object. This is the response format from the Log Analytics API, documented here: https://dev.loganalytics.io/documentation/Using-the-API/ResponseFormat
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
    "tables": [
        {
            "name": "PrimaryResult",
            "columns": [
                {
                    "name": "Category",
                    "type": "string"
                },
                {
                    "name": "count_",
                    "type": "long"
                }
            ],
            "rows": [
                [
                    "Administrative",
                    20839
                ],
                [
                    "Recommendation",
                    122
                ],
                [
                    "Alert",
                    64
                ],
                [
                    "ServiceHealth",
                    11
                ]
            ]
        }
    ]
}
There are hundreds of columns and I want to read values from specific columns and rows. To do that, I initially tried to find the index of a column, e.g. "Category", and then retrieve all the values from the rows. Here is what I have done so far.
result = requests.get(url, params=params, headers=headers, verify=False)
index_category = (result.json()['tables'][0]['columns']).index('Category')
result contains data in the format posted above. I get the error below. What am I missing?
ValueError: 'Category' is not in list
I want to be able to retrieve the Category values from the rows array in a loop. I have also written the loop below, and I am able to get what I want, but I want to confirm whether there is a better way to do this. Also, I retrieve the column index before reading the row values because I suspect that blindly reading the row values with hard-coded index values is error-prone, particularly when the order of the columns changes.
data = result.json()
for column in range(0, len(data['tables'][0]['columns'])):
    if data['tables'][0]['columns'][column]['name'] == 'Category':
        index_category = column
for row in range(0, len(data['tables'][0]['rows'])):
    print(data['tables'][0]['rows'][row][index_category])
json_data = result.json()

category_index = None
for index, column in enumerate(json_data['tables'][0]['columns']):
    if column['name'] == 'Category':
        category_index = index
        break

category_list = []
for row in json_data['tables'][0]['rows']:
    category_list.append(row[category_index])
Haven't tested it btw.
You could also refactor the first loop, where we find the index of the category column, with the filter function, as sketched below.
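A rough, untested sketch of that refactor, using the same json_data as above:

columns = json_data['tables'][0]['columns']
# filter() keeps only the (index, column) pairs whose name matches;
# next() pulls the first match (or None) so we stop scanning early
match = next(filter(lambda pair: pair[1]['name'] == 'Category',
                    enumerate(columns)), None)
category_index = match[0] if match else None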
I'm trying to create a pandas DataFrame out of a JSON dictionary. The nesting is tripping me up.
The column headers are in a different section of the JSON file from the values.
The JSON looks similar to the below. There is one section of column headers and multiple sections of data.
I need each column filled with the data that relates to it, so value_one in each case will fill the column under header_one, and so on.
I have come close, but can't seem to get it to spit out the DataFrame as described.
{
    "my_data": {
        "column_headers": [
            "header_one",
            "header_two",
            "header_three"
        ],
        "values": [
            {
                "data": [
                    "value_one",
                    "value_two",
                    "value_three"
                ]
            },
            {
                "data": [
                    "value_one",
                    "value_two",
                    "value_three"
                ]
            }
        ]
    }
}
Assuming your dictionary is my_dict, try:
>>> pd.DataFrame(data=[d["data"] for d in my_dict["my_data"]["values"]],
...              columns=my_dict["my_data"]["column_headers"])
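For the sample above, that should print something like:

  header_one header_two header_three
0  value_one  value_two  value_three
1  value_one  value_two  value_three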
I'm parsing some XML data, doing some logic on it, and trying to display the results in an HTML table. The dictionary, after filling, looks like this:
{
    "general_info": {
        "name": "xxx",
        "description": "xxx",
        "language": "xxx",
        "prefix": "xxx",
        "version": "xxx"
    },
    "element_count": {
        "folders": 23,
        "conditions": 72,
        "listeners": 1,
        "outputs": 47
    },
    "external_resource_count": {
        "total": 9,
        "extensions": {
            "jar": 8,
            "json": 1
        },
        "paths": {
            "/lib": 9
        }
    },
    "complexity": {
        "over_1_transition": {
            "number": 4,
            "percentage": 30.769
        },
        "over_1_trigger": {
            "number": 2,
            "percentage": 15.385
        },
        "over_1_output": {
            "number": 4,
            "percentage": 30.769
        }
    }
}
Then I'm using pandas to convert the dictionary into a table, like so:
data_frame = pandas.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame()
The result is a table that is mostly correct.
While the first and second levels seem to render correctly, those categories with a sub-sub category get written as a string in the cell, rather than as a further column. I've also tried using stack(level=1) but it raises an error "IndexError: Too many levels: Index has only 1 level, not 2". I've also tried making it into a series with no luck. It seems like it only renders "complete" columns. Is there a way of filling up the empty spaces in the dictionary before processing?
How can I get, for example, external_resource_count -> extensions to have two daughter rows jar and json, with an additional column for the values?
Extra credit if anyone can tell me how to get rid of the first row with the index numbers. Thanks!
The way you load the DataFrame is correct, but you should rename the 0 column to some meaningful name.
# this function extracts the requested keys from your nested dicts
def explode_and_filter(df, filterdict):
    return [df[col].apply(lambda x: x.get(k) if type(x) == dict else x).rename(f'{k}')
            for col, nested in filterdict.items()
            for k in nested]

data_frame = pd.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame(name='somecol')
# let's separate the rows where a dict is present & explode only those rows
mask = data_frame.somecol.apply(lambda x: type(x) == dict)
expp = explode_and_filter(data_frame[mask],
                          {'somecol': ['jar', 'json', '/lib', 'number', 'percentage']})
# here we concat the exploded series to a frame
exploded_df = pd.concat(expp, axis=1).stack().to_frame(name='somecol2') \
    .reset_index(level=2).rename(columns={'level_2': 'somecol'})
# and now we concat the rows with dict elements with the rows with non-dict elements
out = pd.concat([data_frame[~mask], exploded_df])
The output dataframe then contains the exploded rows together with the untouched scalar rows.
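As an aside, an untested alternative sketch: pandas' json_normalize can flatten the nested dict in one call, at the cost of dotted column names instead of a MultiIndex:

import pandas as pd

# one row, with columns like 'external_resource_count.extensions.jar'
flat = pd.json_normalize(extracted_metrics, sep='.')
# transpose so each flattened key becomes a row with a single value column
table = flat.T.rename(columns={0: 'value'})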
I just have to check the JSON data against the comma-separated e_codes in the table: how do I filter only those entries whose e_code appears in a user's e_codes?
In the database:
id  email      age  e_codes
1   abc#gmail  19   123456,234567,345678
2   xyz#gmail  31   234567,345678,456789
This is my JSON data
[
    {
        "ct": 1,
        "e_code": 123456
    },
    {
        "ct": 2,
        "e_code": 234567
    },
    {
        "ct": 3,
        "e_code": 345678
    },
    {
        "ct": 4,
        "e_code": 456789
    },
    {
        "ct": 5,
        "e_code": 456710
    }
]
If efficiency is not an issue, you could loop through the table, split the values into a list using case['e_codes'].split(','), and then, for each code, loop through the JSON to see whether it is present.
This might be a little inefficient if your table, the JSON, or the number of values is large.
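A minimal sketch of that naive approach, assuming the table rows are available as a list of dicts called table and the parsed JSON as my_json (both names hypothetical):

for case in table:
    for code in case['e_codes'].split(','):
        # e_code in the JSON is an int, while split() yields strings
        present = any(entry['e_code'] == int(code) for entry in my_json)
        print(case['id'], code, present)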
It might be better to first create a lookup dictionary in which the codes are the keys:
lookup = {}
for e in my_json:
    lookup[e['e_code']] = 1
You can then check how many of the codes in your table are actually in the JSON:
## Let's assume that the "e_codes" cell of the
## current line is data['e_codes'][i], where i is the line number
for i in lines:
    match = [0, 0]
    for code in data['e_codes'][i].split(','):
        try:
            # the JSON codes are ints, so convert the split strings
            match[0] += lookup[int(code)]
            match[1] += 1
        except KeyError:
            match[1] += 1
    if match[1] > 0:
        share_present = match[0] / match[1]
For each case, you get a share_present, which is 1.0 if all codes appear in the JSON, 0.0 if none of them do, and a value in between to indicate the share of codes that were present. Depending on your threshold for keeping a case, you can set a filter to True or False based on this value.
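For reference, a set-based sketch of the same idea (names hypothetical):

# build the lookup as a set instead of a dict
json_codes = {entry['e_code'] for entry in my_json}

def share_present(e_codes_cell):
    codes = [int(c) for c in e_codes_cell.split(',')]
    # True counts as 1, so this is the fraction of codes found
    return sum(c in json_codes for c in codes) / len(codes)

# keep a case only if at least half of its codes appear in the JSON
keep = share_present('123456,234567,345678') >= 0.5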
I just want to apply formatting from a JSON entry. The first thing I did was set up my desired format on my spreadsheet in the second row of all columns. I then retrieved those rows with a .get request (from A2 to AO3).
request = google_api.service.spreadsheets().get(
    spreadsheetId=ss_id,
    ranges="Tab1!A2:AO3",
    includeGridData=True).execute()
The next thing I did was collect each of the formats for each column and record them in a dictionary.
my_dictionary_of_formats = {}
row_values = request['sheets'][0]['data'][0]['rowData'][0]['values']
for column in range(0, len(row_values)):
    my_dictionary_of_formats[column] = row_values[column]['effectiveFormat']
Now I have a dictionary of all my effective formats for all my columns. I'm having trouble applying those formats to all rows in each column. I tried a batchUpdate request:
cell_data = {
    "effectiveFormat": my_dictionary_of_formats[0]}
row_data = {
    "values": [
        cell_data
    ]
}
update_cell = {
    "rows": [
        row_data
    ],
    "fields": "*",
    "range": {
        "sheetId": input_master.tab_id,
        "startRowIndex": 2,
        "startColumnIndex": 0,
        "endColumnIndex": 1
    }
}
request_body = {
    "requests": [
        {"updateCells": update_cell}],
    "includeSpreadsheetInResponse": True,
    "responseIncludeGridData": True}
service.spreadsheets().batchUpdate(spreadsheetId=my_id, body=request_body).execute()
This wiped out everything, and I'm not sure why. I don't think I understand the fields="*" attribute.
TL;DR
I want to apply a format to all rows in a single column. Much like if I used the "Paint Format" tool on the second row, first column and dragged it all the way down to the last row.
-----Update
Hi, thanks to the comments, this was my solution:
### collect all formats from second row
import json

row_2 = google_api.service.spreadsheets().get(
    spreadsheetId=spreadsheet_id,
    ranges="tab1!A2:AO2",
    includeGridData=True).execute()
my_dictionary = {}
row_values = row_2['sheets'][0]['data'][0]['rowData'][0]['values']
for column in range(0, len(row_values)):
    # keep only the format, since that is what gets re-applied below
    my_dictionary[column] = row_values[column]['effectiveFormat']
# json.dump (not dumps) writes straight to a file object
with open('config/format.json', 'w') as f:
    json.dump(my_dictionary, f)
### Part 2, apply formats
requests = []
my_dict = json.load(open('config/format.json'))
for column in my_dict:
    requests.append(
        {
            "repeatCell": {
                "range": {
                    "sheetId": tab_id,
                    "startRowIndex": 1,
                    # JSON object keys are strings, so convert back to int
                    "startColumnIndex": int(column),
                    "endColumnIndex": int(column) + 1
                },
                "cell": {
                    "userEnteredFormat": my_dict[column]
                },
                "fields": "userEnteredFormat({})".format(",".join(my_dict[column].keys()))
            }
        })
body = {"requests": requests}
google_api.service.spreadsheets().batchUpdate(spreadsheetId=s.spreadsheet_id, body=body).execute()
When you include fields as a part of the request, you indicate to the API endpoint that it should overwrite the specified fields in the targeted range with the information found in your uploaded resource. fields="*" correspondingly is interpreted as "This request specifies the entire data and metadata of the given range. Remove any previous data and metadata from the range and use what is supplied instead."
Thus, anything not specified in your updateCells requests will be removed from the range supplied in the request (e.g. values, formulas, data validation, etc.).
You can learn more in the guide to batchUpdate.
For an updateCells request, the fields parameter is described as follows:
The fields of CellData that should be updated. At least one field must be specified. The root is the CellData; 'row.values.' should not be specified. A single "*" can be used as short-hand for listing every field.
If you then view the resource description of CellData, you observe the following fields:
"userEnteredValue"
"effectiveValue"
"formattedValue"
"userEnteredFormat"
"effectiveFormat"
"hyperlink"
"note"
"textFormatRuns"
"dataValidation"
"pivotTable"
Thus, the proper fields specification for your request is likely to be fields="effectiveFormat", since this is the only field you supply in your row_data property.
Consider also using the repeatCell request if you are just specifying a single format.
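For instance, a minimal untested sketch of such a repeatCell request, reusing tab_id, my_dictionary_of_formats, service, and my_id from the question:

repeat_request = {
    "repeatCell": {
        "range": {
            "sheetId": tab_id,
            "startRowIndex": 1,   # skip the header row
            "startColumnIndex": 0,
            "endColumnIndex": 1
            # omitting endRowIndex leaves the range unbounded downwards
        },
        "cell": {"userEnteredFormat": my_dictionary_of_formats[0]},
        "fields": "userEnteredFormat"
    }
}
service.spreadsheets().batchUpdate(
    spreadsheetId=my_id, body={"requests": [repeat_request]}).execute()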
I have a very large JSON file with multiple individual JSON objects in the format shown below. I am trying to convert it to a CSV so that each row is a combination of the outer id/name/alphabet of a JSON object and one set of conversion id/name/alphabet. This is repeated for all the sets of id/name/alphabet within an individual JSON object. So from the object below, two rows should be created: the first row is the (outer) id/name/alphabet plus the first id/name/alphabet of conversion; the second row is again the (outer) id/name/alphabet, now with the second id/name/alphabet of conversion.
An important note is that certain objects in the file can have upwards of 50-60 conversion id/name/alphabet pairs.
What I have tried so far was to flatten the JSON objects first, which resulted in keys like conversion_id_0 and conversion_id_1, etc., so I can map the outer keys as they are always constant, but I am unsure how to map each corresponding numbered set to a separate row.
Any help or insight would be greatly appreciated!
[
    {
        "alphabet": "ABCDEFGHIJKL",
        "conversion": [
            {
                "alphabet": "BCDEFGHIJKL",
                "id": 18589260,
                "name": [
                    "yy"
                ]
            },
            {
                "alphabet": "EFGHIJEFGHIJ",
                "id": 18056632,
                "name": [
                    "zx",
                    "cd"
                ]
            }
        ],
        "id": 23929934,
        "name": [
            "x",
            "y"
        ]
    }
]
Your question is unclear about the exact mapping from the input JSON data to the rows of the CSV file, so I had to guess at what should happen when there's more than one "name" associated with an inner or outer object.
Regardless, hopefully the following will give you a general idea of how to solve such problems.
import csv

objects = [
    {
        "alphabet": "ABCDEFGHIJKL",
        "id": 23929934,
        "name": [
            "x",
            "y"
        ],
        "conversion": [
            {
                "alphabet": "BCDEFGHIJKL",
                "id": 18589260,
                "name": [
                    "yy"
                ]
            },
            {
                "alphabet": "EFGHIJEFGHIJ",
                "id": 18056632,
                "name": [
                    "zx",
                    "cd"
                ]
            }
        ],
    }
]

# newline='' prevents blank lines in the output on Windows (Python 3)
with open('converted_json.csv', 'w', newline='') as outfile:
    def group(item):
        return [item["id"], item["alphabet"], ' '.join(item["name"])]

    writer = csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC)
    for obj in objects:
        outer = group(obj)
        for conversion in obj["conversion"]:
            inner = group(conversion)
            writer.writerow(outer + inner)
Contents of the CSV file generated:
23929934,"ABCDEFGHIJKL","x y",18589260,"BCDEFGHIJKL","yy"
23929934,"ABCDEFGHIJKL","x y",18056632,"EFGHIJEFGHIJ","zx cd"