Reading nested dictionary-like txt file into a Pandas dataframe

Reading nested dictionary-like txt file into a Pandas dataframe - python

Sort of a new python guy here and haven't had much success with the following.
I have a txt file with data formatted as follows:
{
"$type" : "TableInstance",
"$version" : 1,
"Instance" : "InstanceName",
"ColumnAliases" : [ "", "", ],
"ColumnNames" : [ "keyName", "dateName"],
"ColumnData" : [ {
"type" : "ColumnData1",
"Strings" : [key1, key2],]
}, {
"type" : "ColumnData2",
"Strings" : [date1, date2]}]
}
That I would like to read into a dataframe such that it is formatted as:
[ keyName dateName
key1 date1
key2 date1 ]
Is there a simple way to do this?

does this work for you?
dict = {
"$type" : "TableInstance",
"$version" : 1,
"Instance" : "InstanceName",
"ColumnAliases" : [ "", "", ],
"ColumnNames" : [ "keyName", "dateName"],
"ColumnData" : [ {
"type" : "ColumnData1",
"Strings" : ['key1', 'key2']
}, {
"type" : "ColumnData2",
"Strings" : ['date1', 'date2']}]
}
df = pd.DataFrame({dict['ColumnNames'][0]:dict['ColumnData'][0]['Strings'], dict['ColumnNames'][1]:dict['ColumnData'][1]['Strings']})

It looks it that you stored the serialized python object in the file. Hence, you can deserialize the Python object by the help of pickle, then you can parse the object based on your requirements.
import pickle
import pandas as pd
filePath = 'test.txt'
obj = pd.read_pickle(filePath)
#obj = pickle.load(open(filePath, "rb"))
df = pd.DataFrame({obj['ColumnNames'][0]:obj['ColumnData'][0]['Strings'], obj['ColumnNames'][1]:obj['ColumnData'][1]['Strings']})

Related

how to read nested lists information from a Json file using Python

Here is a part of my Jason file, and I want to read "information" under "runs" -> "results" -> "properties"
I am trying the following:
with open(inputFile, "r") as readFile:
data = json.load(readFile)
print(type(data))
print("Run data type is: ",type(data['runs']))
#print("properties data type is: ", type(data['runs']['properties']))
# error: print("results data type is: ", type(data['runs']['properties']))TypeError: list indices must be integers or slices, not str
for info in data['runs']:
res = info.get('results',{})
#res = info.get('results', {}).get('properties', None)
#Error: AttributeError: 'list' object has no attribute 'get'
#inf = info.get('properties')
print(res)
All the parts that I have commented is not working. and I added also the error message
how can i read "information" in a loop?
{
"$schema" : "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json",
"version" : "2.1.0",
"runs" : [ {
"tool" : { ...},
"artifacts" : [ ...],
"results" : [ {
"ruleId" : "DECL_MISMATCH",
"ruleIndex" : 0,
"message" : {
"text" : "XXXXX"
},
"level" : "error",
"baselineState" : "unchanged",
"rank" : 100,
"kind" : "fail",
"properties" : {
"tags" : [ "databaseId", "metaFamily", "family", "group", "information", "severity", "status", "comment", "justified", "assignedTo", "ticketKey", "color" ],
"databaseId" : 54496,
"metaFamily" : "Defect",
"family" : "Defect",
"group" : "Programming",
"information" : "Impact: High",
"severity" : "Unset",
"status" : "Unreviewed",
"comment" : "",
"justified" : false,
"color" : "RED"
},
"locations" : [ {
"physicalLocation" : {
"artifactLocation" : {
"index" : 0
}
},
"logicalLocations" : [ {
"fullyQualifiedName" : "File Scope",
"kind" : "function"
} ]
} ]
} ]
} ]
}

While you're trying to access the key properties which is inside a list, you have to set the index number. In this json you've posted the index number can be 0. So the code probably should be like this:
with open(inputFile, "r") as readFile:
data = json.load(readFile)
print(type(data))
print("Run data type is: ",type(data['runs']))
#print("properties data type is: ", type(data['runs']['properties']))
# error: print("results data type is: ", type(data['runs']['properties']))TypeError: list indices must be integers or slices, not str
for info in data['runs']:
# res = info.get('results',{})
res = info.get('results', {})[0].get('properties', None)
#inf = info.get('properties')
print(res)

for run in data['runs']:
for result in run['results']:
properties = result['properties']
print("information = {}".format(properties['information']))

Creating a JSON text file from a nested dictionary

I am creating a JSON file of a nested dictionary. My code is currently as follows:
myfamily = {
"child1" : {
"name" : "Emil"
},
"child2" : {
"name" : "Tobias"
},
"child3" : {
"name" : "Linus"
}
}
names = []
for i in myfamily.values():
print(type(i))
print(i)
s = json.dumps(i)
names.append(s)
df_family = pd.DataFrame()
df_family['Child'] = myfamily.keys()
df_family['Name'] = values
text = df_family.to_json(orient='records')
print(text)
This leads to the following output:
[{"Child":"child1","Name":"{\"2022\": 50, \"2023\": 50, \"2024\": 0}"},{"Child":"child2","Name":"{\"2022\": 50, \"2023\": 50, \"2024\": 50}"},{"Child":"child3","Name":"{\"2022\": 0, \"2023\": 100, \"2024\": 0}"}]
So my question is, why are these slashes added and is this the correct way to create a JSON text format of a nested dictionary?

import json
myfamily = {
"child1" : {
"name" : "Emil"
},
"child2" : {
"name" : "Tobias"
},
"child3" : {
"name" : "Linus"
}
}
def nested_json(dict_t,fist_key="Child"):
list_t= []
for key,val in myfamily.items():
nested_key = next(iter( val.keys()))
list_t+= [{
fist_key:key,
nested_key:val[nested_key]
}]
return json.dumps(list_t)
nested_json(myfamily)

Python: Get all values of a specific key from json file

Im getting the json data from a file:
"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
here's my initial code:
def get_names():
students = open('students.json')
data = json.load(students)
I want to get the values of all names
[ben,sam]

you need to extract the names from the students list.
data = {"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
names = [each_student['name'] for each_student in data['students']]
print(names) #['ben', 'sam']

Try using a list comprehension:
>>> [dct['name'] for dct in data['students']]
['ben', 'sam']
>>>

import json
with open('./students.json', 'r') as students_file:
students_content = json.load(students_file)
print([student['name'] for student in students_content['students']]) # ['ben', 'sam']

JSON's load function from the docs:
Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object...
The JSON file in students.json will look like:
{
"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
The JSON load function can then be used to deserialize this JSON object in the file to a Python dictionary:
import json
# use with context manager to ensure the file closes properly
with open('students.json', 'rb')as students_fp:
data = json.load(students_fp)
print(type(data)) # dict i.e. a Python dictionary
# list comprehension to take the name of each student
names = [student['name'] for student in data['students']]
Where names now contains the desired:
["ben", "sam"]

Correctly referencing a JSON in Python. Strings versus Integers and Nested items

Sample JSON file below
{
"destination_addresses" : [ "New York, NY, USA" ],
"origin_addresses" : [ "Washington, DC, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "225 mi",
"value" : 361715
},
"duration" : {
"text" : "3 hours 49 mins",
"value" : 13725
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}
I'm looking to reference the text value for distance and duration. I've done research but i'm still not sure what i'm doing wrong...
I have a work around using several lines of code, but i'm looking for a clean one line solution..
thanks for your help!

If you're using the regular JSON module:
import json
And you're opening your JSON like this:
json_data = open("my_json.json").read()
data = json.loads(json_data)
# Equivalent to:
data = json.load(open("my_json.json"))
# Notice json.load vs. json.loads
Then this should do what you want:
distance_text, duration_text = [data['rows'][0]['elements'][0][key]['text'] for key in ['distance', 'duration']]
Hope this is what you wanted!

Recursively parse multiple json objects to practice calculations - Python

I have to set up an analytical system of a Json data base(Firebase Realtime Database).
Here's what it looks like:
{
"247" : {
"activity_duration" : 15,
"battery_used" : 0,
"date" : "2017-09-05",
"day" : 247,
"heat_waves" : 3,
"outside_temperature" : 16.64,
"shirt_temperature" : [ 24.883928571428573, 23.660714285714285 ]
},
"262" : {
"activity_duration" : 240,
"battery_used" : 2,
"date" : "2017-09-20",
"day" : 262,
"heat_waves" : 5,
"outside_temperature" : 21.19,
"shirt_temperature" : [ 24.233616504854368, 22.954490291262136 ]
},
"268" : {
"activity_duration" : 260,
"battery_used" : 5,
"date" : "2017-09-26",
"day" : 268,
"heat_waves" : 4,
"outside_temperature" : 16.07,
"shirt_temperature" : [ 18.68695652173913, 17.576630434782608 ]
}
}
To do this, I want to practice calculations in python on my json file like the average of heat_waves.
The problem is that I can not access the nodes without writing them in raw.
data["247"]["heat_waves"] but I want something like data[0]["heat_waves"].When I try:
import json;
data = [ ];
filename = "Database.json";
with open(filename,'r') as json_data:
data = json.load(json_data);
print(json.dumps(data[0]["heat_waves"], indent=4, sort_keys=True));
I have this error message:
print(json.dumps(data[0]["heat_waves"], indent=4, sort_keys=True));
KeyError: 0
So, my final question is:
How can I access these nodes without writing them in raw?

If you dont want to address elements by key but with index then you can do
>>> import json;
>>>
>>> data = [ ];
>>> filename = "Database.json";
>>> with open(filename,'r') as json_data:
... data = json.load(json_data)
>>> data.values()[0]['heat_waves']
5
>>>

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Reading nested dictionary-like txt file into a Pandas dataframe - python

Related

how to read nested lists information from a Json file using Python

Creating a JSON text file from a nested dictionary

Python: Get all values of a specific key from json file

Correctly referencing a JSON in Python. Strings versus Integers and Nested items

Recursively parse multiple json objects to practice calculations - Python

Categories

Resources