I need to 'cross join' (for want of a better term!) two lists.
Between them they represent a tabular dataset:
One holds the column header names, the other a nested array with the row values.
I've managed the easy bit:
col_names = [i['name'] for i in c]
which strips the column names out into a list, without the 'typeName' values.
But working out how to extract the row field values and map them to the column names is giving me a headache!
Any pointers appreciated ;)
Thanks
Columns (as provided):
[
    {
        "name": "col1",
        "typeName": "varchar"
    },
    {
        "name": "col2",
        "typeName": "int4"
    }
]
Records (as provided):
[
    [
        {
            "stringValue": "apples"
        },
        {
            "longValue": 1
        }
    ],
    [
        {
            "stringValue": "bananas"
        },
        {
            "longValue": 2
        }
    ]
]
Required Result:
[
    {
        'col1': 'apples',
        'col2': 1
    },
    {
        'col1': 'bananas',
        'col2': 2
    }
]
You have to be able to assume there is a 1-to-1 correspondence between the names in the schema and the dicts in the records. Once you assume that, it's pretty easy:
names = [i['name'] for i in schema]
data = []
for row in records:
    d = {}
    for a, b in zip(names, row):
        d[a] = list(b.values())[0]
    data.append(d)
print(data)
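For reference, the same mapping can also be written as a single comprehension. A minimal sketch, using the column and record data from the question as sample input (variable names schema and records are illustrative):

# Minimal sketch: sample data in the shapes shown in the question.
schema = [{"name": "col1", "typeName": "varchar"}, {"name": "col2", "typeName": "int4"}]
records = [[{"stringValue": "apples"}, {"longValue": 1}],
           [{"stringValue": "bananas"}, {"longValue": 2}]]

names = [c['name'] for c in schema]
data = [{name: next(iter(cell.values())) for name, cell in zip(names, row)}
        for row in records]
print(data)  # [{'col1': 'apples', 'col2': 1}, {'col1': 'bananas', 'col2': 2}]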
I'm parsing some XML data, doing some logic on it, and trying to display the results in an HTML table. The dictionary, after filling, looks like this:
{
    "general_info": {
        "name": "xxx",
        "description": "xxx",
        "language": "xxx",
        "prefix": "xxx",
        "version": "xxx"
    },
    "element_count": {
        "folders": 23,
        "conditions": 72,
        "listeners": 1,
        "outputs": 47
    },
    "external_resource_count": {
        "total": 9,
        "extensions": {
            "jar": 8,
            "json": 1
        },
        "paths": {
            "/lib": 9
        }
    },
    "complexity": {
        "over_1_transition": {
            "number": 4,
            "percentage": 30.769
        },
        "over_1_trigger": {
            "number": 2,
            "percentage": 15.385
        },
        "over_1_output": {
            "number": 4,
            "percentage": 30.769
        }
    }
}
Then I'm using pandas to convert the dictionary into a table, like so:
data_frame = pandas.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame()
The result is a table that is mostly correct:
While the first and second levels seem to render correctly, those categories with a sub-sub category get written as a string in the cell, rather than as a further column. I've also tried using stack(level=1) but it raises an error "IndexError: Too many levels: Index has only 1 level, not 2". I've also tried making it into a series with no luck. It seems like it only renders "complete" columns. Is there a way of filling up the empty spaces in the dictionary before processing?
How can I get, for example, external_resource_count -> extensions to have two daughter rows jar and json, with an additional column for the values, so that the final table looks like this:
Extra credit if anyone can tell me how to get rid of the first row with the index numbers. Thanks!
The way you load the dataframe is correct, but you should rename the 0 to some column name.
import pandas as pd

# this function extracts all the keys from your nested dicts
def explode_and_filter(df, filterdict):
    return [df[col].apply(lambda x: x.get(k) if type(x) == dict else x).rename(f'{k}')
            for col, nested in filterdict.items()
            for k in nested]

data_frame = pd.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame(name='somecol')

# let's separate the rows where a dict is present & explode only those rows
mask = data_frame.somecol.apply(lambda x: type(x) == dict)
expp = explode_and_filter(data_frame[mask],
                          {'somecol': ['jar', 'json', '/lib', 'number', 'percentage']})

# here we concat the exploded series to a frame
exploded_df = (pd.concat(expp, axis=1)
               .stack()
               .to_frame(name='somecol2')
               .reset_index(level=2)
               .rename(columns={'level_2': 'somecol'}))

# and now we concat the rows with dict elements with the rows with non-dict elements
out = pd.concat([data_frame[~mask], exploded_df])
The output dataframe looks like this
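As for the extra-credit question about the index numbers: one hedged pointer is that pandas' to_html accepts index=False, which omits the index column when rendering; whether that matches the table you want depends on how you display it. A minimal illustration (the frame here is made up; out from above would be rendered the same way):

import pandas as pd

# Illustrative frame only; `out` from the snippet above would be rendered the same way.
df = pd.DataFrame({"metric": ["folders", "conditions"], "somecol": [23, 72]})
print(df.to_html(index=False))  # index=False drops the index column from the HTML table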
I am trying to retrieve the values of specific columns from the Python list object. This is the response format from the Log Analytics API, documented here: https://dev.loganalytics.io/documentation/Using-the-API/ResponseFormat
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
    "tables": [
        {
            "name": "PrimaryResult",
            "columns": [
                {
                    "name": "Category",
                    "type": "string"
                },
                {
                    "name": "count_",
                    "type": "long"
                }
            ],
            "rows": [
                [
                    "Administrative",
                    20839
                ],
                [
                    "Recommendation",
                    122
                ],
                [
                    "Alert",
                    64
                ],
                [
                    "ServiceHealth",
                    11
                ]
            ]
        }
    ]
}
There are hundreds of columns and I want to read specific columns and row values. To do that, I initially tried to find the index of a column, e.g. "Category", and retrieve all the values from the rows. Here is what I have done so far.
result=requests.get(url, params=params, headers=headers, verify=False)
index_category = (result.json()['tables'][0]['columns']).index('Category')
result contains data in the format posted above. I get the error below. What am I missing?
ValueError: 'Category' is not in list
I want to be able to retrieve the Category values from the rows array in a loop. I have also written the loop below and I am able to get what I want, but I want to confirm whether there is a better way to do this. Also, I am retrieving the column index first before reading the row values, because I suspect blindly reading the row values with hard-coded index values is prone to error, particularly when the sequence of columns changes.
for column in range(0, columns):
    if result.json()['tables'][0]['columns'][column]['name'] == 'Category':
        index_category = column

for row in range(0, rows):
    print(result.json()['tables'][0]['rows'][row][index_category])
json_data = result.json()

for index, column in enumerate(json_data['tables'][0]['columns']):
    if column['name'] == 'Category':
        category_index = index
        break

category_list = []
for row in json_data['tables'][0]['rows']:
    category_list.append(row[category_index])
Haven't tested it btw.
You could also refactor the first loop where we find the index for the category with the filter function.
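An untested sketch of that filter-based refactor, assuming result is the response object from the question:

# Untested sketch, assuming `result` is the requests response from the question.
json_data = result.json()
columns = json_data['tables'][0]['columns']

# filter() keeps the (index, column) pairs whose name matches; next() takes the first
# (this raises StopIteration if no 'Category' column exists).
category_index = next(filter(lambda pair: pair[1]['name'] == 'Category', enumerate(columns)))[0]

category_list = [row[category_index] for row in json_data['tables'][0]['rows']]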
I have 3 dictionaries (2 of them are setdefault dicts with multiple values):
Score_dict:
{'Id_1': [('100001124156327', 0.0),
          ('100003643614411', 0.0)],
 'Id_2': [('100000435456546', 5.7),
          ('100000234354556', 3.5)]}
post_dict:
{'Id_1': [(+,100004536)],
 'Id_2': [(-,100035430)]}
comment_dict:
{'Id_1': [(+,1023434234)],
 'Id_2': [(-,10343534534),
          (*,1097963644)]}
My current approach is to write them into 3 different csv files and then merge them; I want to merge them according to the common first column of IDs (ID_row).
But I am unable to figure out how to merge 3 csv files into a single csv file. Also, is there any way I can write all 3 dictionaries into a single csv without writing them individually?
Output required-
Ids    Score_Ids               Post_Ids       Comment_Ids
Id_1   100001124156327, 0.0    +,100004536    +,1023434234
       100003643614411, 0.0
Id_2   100000435456546, 5.7    -,100035430    -,10343534534
       100000234354556, 3.5                   *,1097963644
What is the correct and best way to approach this?
You can merge them all first, then write them to a csv file:
import pprint
scores = {
    'Id_1': [
        ('100001124156327', 0.0),
        ('100003643614411', 0.0)],
    'Id_2': [
        ('100000435456546', 5.7),
        ('100000234354556', 3.5)
    ]
}

post_dict = {
    'Id_1': [
        ('+', 100004536)
    ],
    'Id_2': [
        ('-', 100035430)
    ]
}

comment_dict = {
    'Id_1': [
        ('+', 1023434234)
    ],
    'Id_2': [
        ('-', 10343534534),
        ('*', 1097963644)
    ]
}
merged = {
    key: {
        "Score_Ids": value,
        "Post_Ids": post_dict[key],
        "Comment_Ids": comment_dict[key]
    }
    for key, value in scores.items()
}
pp = pprint.PrettyPrinter(depth=6)
pp.pprint(merged)
For reference: https://repl.it/repls/SqueakySlateblueDictionaries
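The snippet above stops at the merged dict; a minimal untested sketch of the "write to a csv file" step could look like the following (the tab separator, file name, and one-cell-per-dict layout are assumptions, since the required output packs several tuples into a single cell):

import csv

# Untested sketch, assuming `merged` is the dict built above.
with open('merged.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['Ids', 'Score_Ids', 'Post_Ids', 'Comment_Ids'])
    for key, vals in merged.items():
        writer.writerow([
            key,
            '; '.join(f'{a},{b}' for a, b in vals['Score_Ids']),
            '; '.join(f'{a},{b}' for a, b in vals['Post_Ids']),
            '; '.join(f'{a},{b}' for a, b in vals['Comment_Ids']),
        ])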
I suggest you transform your three dicts into one list of dicts before writing them to a csv file.
Example
rows = [
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    ...
]
And then use the csv.DictWriter class to write all the rows.
Since you have commas in your values (are you sure that's a good idea? Maybe splitting them into two different columns would be a better approach), be careful to use tabs or something else as the separator.
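A minimal sketch of that DictWriter step, using a tab separator as suggested (the file name is an assumption, and the field values are just taken from the question's dicts for illustration):

import csv

# Minimal sketch: illustrative rows in the shape suggested above.
rows = [
    {"Score_Id": "100001124156327,0.0", "Post_Id": "+,100004536", "Comment_Id": "+,1023434234"},
    {"Score_Id": "100000435456546,5.7", "Post_Id": "-,100035430", "Comment_Id": "-,10343534534"},
]

with open('merged.tsv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=["Score_Id", "Post_Id", "Comment_Id"], delimiter='\t')
    writer.writeheader()
    writer.writerows(rows)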
I suggest writing all three to the same file
You could get common keys by doing something like:
common_keys = set(score_dict) | set(post_dict) | set(comment_dict)
for key_ in common_keys:
    val_score = score_dict.get(key_, some_default_value)
    post_score = post_dict.get(key_, some_default_value)
    comment_score = comment_dict.get(key_, some_default_value)
    # print key and vals to csv as before
I need to generate JSON from my dataframe, but I have tried many formats of df and still am not able to get the required JSON format.
My required JSON format is:
[
    {
        "Keyword": "Red",
        "values": [
            {
                "value": 5,
                "TC": "Color"
            }
        ]
    },
    {
        "Keyword": "Orange",
        "values": [
            {
                "value": 5,
                "TC": "Color"
            }
        ]
    },
    {
        "Keyword": "Violet",
        "values": [
            {
                "value": 5,
                "TC": "Color"
            }
        ]
    }
]
I want a df that generates this JSON. Please help.
But currently I'm getting this from df.to_json:
{"Names":{"0":"Ram","1":"pechi","2":"Sunil","3":" Ravi","4":"sri"},"Values":{"0":"[{'value':2,'TC': 'TC Count'}]","1":"[{'value':2,'TC': 'TC Count'}]","2":"[{'value':1,'TC': 'TC Count'}]","3":"[{'value':1,'TC': 'TC Count'}]","4":"[{'value':1,'TC': 'TC Count'}]"}}
I think you need:
set_index for columns not in nested dictionaries
create dicts by apply with to_dict
reset_index for column from index
create json by to_json
print (df)
  Keyword     TC  value
0     Red  Color      5
1  Orange  Color      5
2  Violet  Color      5
j = (df.set_index('Keyword')
       .apply(lambda x: [x.to_dict()], axis=1)
       .reset_index(name='values')
       .to_json(orient='records'))
print (j)
[{"Keyword":"Red","values":[{"TC":"Color","value":5}]},
{"Keyword":"Orange","values":[{"TC":"Color","value":5}]},
{"Keyword":"Violet","values":[{"TC":"Color","value":5}]}]
For write to file:
(df.set_index('Keyword')
   .apply(lambda x: [x.to_dict()], axis=1)
   .reset_index(name='values')
   .to_json('myfile.json', orient='records'))
I have a very large JSON file with multiple individual JSON objects in the format shown below. I am trying to convert it to a CSV so that each row is a combination of the outer id/name/alphabet in a JSON object and 1 set of conversion: id/name/alphabet. This is repeated for all the sets of id/name/alphabet within an individual JSON object. So from the object below, 2 rows should be created where the first row is (outer) id/name/alphabet and 1st id/name/alphabet of conversion. The second row is again (outer) id/name/alphabet and now the 2nd id/name/alphabet of conversion.
Important note is that certain Objects in the file can have upwards of 50/60 conversion id/name/alphabet pairs.
What I have tried so far is flattening the JSON objects first, which resulted in keys like conversion_id_0 and conversion_id_1 etc., so I can map the outer fields since they are always constant, but I am unsure how to map each corresponding numbered set to a separate row.
Any help or insight would be greatly appreciated!
[
    {
        "alphabet": "ABCDEFGHIJKL",
        "conversion": [
            {
                "alphabet": "BCDEFGHIJKL",
                "id": 18589260,
                "name": [
                    "yy"
                ]
            },
            {
                "alphabet": "EFGHIJEFGHIJ",
                "id": 18056632,
                "name": [
                    "zx",
                    "cd"
                ]
            }
        ],
        "id": 23929934,
        "name": [
            "x",
            "y"
        ]
    }
]
Your question is unclear about exactly how the input JSON data should map to rows of the CSV file, so I had to guess at what should happen when there's more than one "name" associated with an inner or outer object.
Regardless, hopefully the following will give you a general idea of how to solve such problems.
import csv
objects = [
    {
        "alphabet": "ABCDEFGHIJKL",
        "id": 23929934,
        "name": [
            "x",
            "y"
        ],
        "conversion": [
            {
                "alphabet": "BCDEFGHIJKL",
                "id": 18589260,
                "name": [
                    "yy"
                ]
            },
            {
                "alphabet": "EFGHIJEFGHIJ",
                "id": 18056632,
                "name": [
                    "zx",
                    "cd"
                ]
            }
        ],
    }
]
with open('converted_json.csv', 'w', newline='') as outfile:
    def group(item):
        return [item["id"], item["alphabet"], ' '.join(item["name"])]

    writer = csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC)

    for obj in objects:
        outer = group(obj)
        for conversion in obj["conversion"]:
            inner = group(conversion)
            writer.writerow(outer + inner)
Contents of the CSV file generated:
23929934,"ABCDEFGHIJKL","x y",18589260,"BCDEFGHIJKL","yy"
23929934,"ABCDEFGHIJKL","x y",18056632,"EFGHIJEFGHIJ","zx cd"