Get Average from two JSON Arrays - python

I have two JSON feeds, Events and Users, which I combined into one file. I need the average age (from users.json) of all distinct users who visited the home page (from events.json).
Sample JSON Feed Events:
"events": [
    {
        "name": "Added item to cart",
        "timestamp": 1422119921,
        "user_id": "5a5598f2-f7db-420e-9b8e-52a9ad694bc1"
    },
    {
        "name": "Visited product page",
        "timestamp": 1409554014,
        "user_id": "4683c9b6-3c8b-4215-a401-a9bbfde833ee"
    }
Sample Users Feed:
        "age": 27,
        "gender": "F",
        "device": "android"
    },
    "712ae3b5-fbf0-4d83-9324-adc06af77d3a": {
        "age": 34,
        "gender": "F",
        "device": "android"
    },
I'm new to python and I believe I am on the right track with the below but not sure where to go next. Any additional help would be appreciated.
import json
# Opening JSON file
with open('combined.json') as json_file:
    data = json.load(json_file)
# for reading nested data, [0] represents
# the index value of the list
print(data['events'][0])
print(data['users'][0])
# for printing the key-value pairs of a
# nested dictionary, a for loop can be used
print("\nPrinting nested dictionary as a key-value pair\n")
for i in data['events']:
    print("Name:", i['name'])
for i in data['users']:
    print("Age:", i['age'])

You're on the right track: add up all the ages and divide by the number of users. Here is one way to do that:
import json
# Opening JSON file
with open('combined.json') as json_file:
    data = json.load(json_file)
total_age = 0
counter = 0
# data['users'] is already a dict keyed by user id, so iterate over it directly
for k, v in data['users'].items():
    # Add all the ages
    total_age += v['age']
    counter += 1
print(total_age / counter)
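The loop above averages every user in the feed. The question asks only for distinct users who visited the home page, so a filter on the events is probably needed first. Below is a minimal sketch of that idea. It assumes the home-page event is named "Visited home page" (the sample feed only shows other event names) and that "users" is a dict keyed by user_id, as the sample suggests. The inline data and the helper name average_age_of_visitors are made up for illustration.

```python
# Hypothetical stand-in for the contents of combined.json. The event name
# "Visited home page" is an assumption, not taken from the sample feed.
data = {
    "events": [
        {"name": "Visited home page", "timestamp": 1, "user_id": "u1"},
        {"name": "Visited home page", "timestamp": 2, "user_id": "u1"},
        {"name": "Visited home page", "timestamp": 3, "user_id": "u2"},
        {"name": "Added item to cart", "timestamp": 4, "user_id": "u3"},
    ],
    "users": {
        "u1": {"age": 20, "gender": "F", "device": "android"},
        "u2": {"age": 30, "gender": "M", "device": "ios"},
        "u3": {"age": 50, "gender": "F", "device": "android"},
    },
}


def average_age_of_visitors(data, event_name="Visited home page"):
    # A set keeps each user_id once, however many times the user visited.
    visitor_ids = {e["user_id"] for e in data["events"] if e["name"] == event_name}
    # Skip ids that have no matching entry in the users feed.
    ages = [data["users"][uid]["age"] for uid in visitor_ids if uid in data["users"]]
    # Return None rather than dividing by zero when nobody matched.
    return sum(ages) / len(ages) if ages else None


average_age = average_age_of_visitors(data)
print(average_age)
```

Using a set for the visitor ids is what makes the users distinct: u1 appears twice in the events but contributes one age to the average.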

Related

Getting value from a JSON file based on condition

In python I'm trying to get the value(s) of the key "relativePaths" from a JSON element if that element contains the value "concept" for the key "tags". The JSON file has the following format.
]
},
{
"fileName": "#Weizman.2011",
"relativePath": "Text/#Weizman.2011.md",
"tags": [
"text",
"concept"
],
"frontmatter": {
"authors": "Weizman",
"year": 2011,
"position": {
"start": {
"line": 0,
"col": 0,
"offset": 0
},
"end": {
"line": 4,
"col": 3,
"offset": 120
}
}
},
"aliases": [
"The least of all possible evils - humanitarian violence from Arendt to Gaza"
],
I have tried the following code:
import json
with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)
    for s in range(len(data)):
        if 'tags' in s in range(len(data)):
            if data[s]["tags"] == "concept":
                files = data[s]["relativePaths"]
                print(files)
Which results in the error message:
TypeError: argument of type 'int' is not iterable
I then tried:
with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)
    for s in str(data):
        if 'tags' in s in str(data):
            print(s["relativePaths"])
That code seems to work. But I don't get any output from the print command. What am I doing wrong?
Assuming your JSON is a list of items like the one in your question, you can get those values like this:
with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)
for item in data:  # Assumes the first level of the json is a list
    if ('tags' in item) and ('concept' in item['tags']):  # Assumes that not all items have a 'tags' entry
        print(item['relativePaths'])  # Will trigger an error if relativePaths is not in the dictionary
Figured it out:
import json
f = open("/Users/metadata.json")
# returns JSON object as
# a dictionary
data = json.load(f)
# Iterating through the json
# list
for i in data:
    if "tags" in i:
        if "concept" in i["tags"]:
            print(i["relativePaths"])
# Closing file
f.close()
I think this will do what you want. It is more "pythonic" because it doesn't use numerical indices to access elements of the list, which makes it easier to write and read.
import json
with open("metadata.json") as jsonFile:
    data = json.load(jsonFile)
for elem in data:
    if 'tags' in elem and 'concept' in elem['tags']:
        files = elem["relativePath"]
        print(files)
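It may help to see why the two attempts in the question behaved as they did. The sketch below reproduces both on a small made-up list and then runs a working filter. The key name "relativePath" follows the sample JSON; if the real file uses "relativePaths", that name would need to change. All data and variable names here are illustrative.

```python
# Made-up data shaped like the sample in the question.
data = [
    {"fileName": "a", "relativePath": "Text/a.md", "tags": ["text", "concept"]},
    {"fileName": "b", "relativePath": "Text/b.md", "tags": ["text"]},
    {"fileName": "c", "relativePath": "Text/c.md"},
]

# First attempt: s is an int index, so "'tags' in s" asks whether an int
# contains a string, which raises TypeError.
s = 0
try:
    "tags" in s
    first_error = None
except TypeError as exc:
    first_error = exc

# Second attempt: iterating over str(data) yields single characters, so
# "'tags' in s" is never true and print() is never reached.
chars_matching = [ch for ch in str(data) if "tags" in ch]

# Working version: iterate over the dicts and test membership in the tag list.
# .get() with a default covers items that have no "tags" key at all.
concept_paths = [
    item["relativePath"] for item in data if "concept" in item.get("tags", [])
]

print(type(first_error).__name__)
print(chars_matching)
print(concept_paths)
```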

Organizing JSON files while appending through Python (discord.py) input

I'm currently making a bot with discord.py, and one of the commands takes input that is stored in a JSON file. One of the keys is an ID for each object, formatted as an alphanumeric code. A sample version of the JSON looks like this:
{
    "data": [
        {
            "id": "tbg210915-1",
            "date": "September 15, 2021",
            "time": "21:09",
            "url": "https://example.com"
        },
        {
            "id": "tbg210915-2",
            "date": "September 15, 2021",
            "time": "21:09",
            "url": "https://example2.com"
        },
        {
            "id": "tbg210917-1",
            "date": "September 17, 2021",
            "time": "01:33",
            "url": "https://example3.com"
        }
    ]
}
What I want to happen is that if I input a new object dict whose id is tbg210916-1, it would be appended between tbg210915-2 and tbg210917-1.
Here is how the command works right now so you know how these are currently being appended:
# function to generate the aforementioned id
def generateImgID(file, type, img_date):
    # turning input data type into abbreviated codes
    convert_list = {...} # a list of types one can input and what their corresponding code would be
    if type in convert_list.keys():
        m = convert_list[type]
    # converting Month dd, yyyy format to yymmdd
    d = datetime.datetime.strptime(img_date, '%B %d, %Y').strftime('%y%m%d')
    # that number at the end thingy signifying what nth pic that is in a given day
    new = open(file)
    data = json.load(new)
    result = Counter(data.values())[img_date]
    if result == 0:
        num = 1
    else:
        num = result + 1
    n = str(num)
    return f'{m}{d}-{n}'
# command to add new pic to database of json files
@client.command()
async def addpic(ctx):
    def check(message):
        return message.author == ctx.author and message.channel == ctx.channel
    ... # here is basically a multi-step input process then for the bot to ask the user what to input
    # adding all input to json
    with open(f'image json/{type}.json', 'r') as fp:
        data = json.load(fp)
    data[f'{type}'].append({
        "id": img_id,
        "date": img_date,
        "time": img_time,
        "url": img_url
    })
    with open(f'image json/{type}.json', 'w') as fp:
        json.dump(data, fp, indent = 2)
    await ctx.send(embed = successembed)
You can try sorting the list every time you append a new element, i.e.:
data.append(newelement)
data.sort(key=lambda x: x['id'])
This sorts the list data in place.
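One caveat with sorting on the raw id string: the trailing counter is compared as text, so "tbg210915-10" would land before "tbg210915-2". The sketch below shows one possible sort key that compares the counter as a number. It assumes every id has the form shown in the sample (a code plus yymmdd, a dash, then a counter); id_sort_key is a hypothetical helper, and the list lives under the "data" key as in the sample JSON.

```python
def id_sort_key(entry):
    # Split "tbg210915-2" into the part before the last dash and the counter.
    head, _, counter = entry["id"].rpartition("-")
    # Compare the head as text and the counter as an integer.
    return (head, int(counter))


# Trimmed-down version of the sample JSON (only the "id" field is kept).
data = {
    "data": [
        {"id": "tbg210915-1"},
        {"id": "tbg210915-2"},
        {"id": "tbg210917-1"},
    ]
}

# Append new entries, then re-sort the list in place.
data["data"].append({"id": "tbg210916-1"})
data["data"].append({"id": "tbg210915-10"})
data["data"].sort(key=id_sort_key)

sorted_ids = [entry["id"] for entry in data["data"]]
print(sorted_ids)
```

Because the date part is fixed-width yymmdd, comparing the head as plain text keeps entries in date order within the same code.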

parse the file and generate a list of dictionaries where each user information is represented by a single dictionary

I have a .txt file (shown as an image in the original post) where every column can be classified as UserName:ReferenceToThePassword:UserID:GroupID:UserIDInfo:HomeDir:LoginShell.
I want to write a script that parses the file and generates a list of dictionaries, where each user's information is represented by a single dictionary.
Here is an example of how the final output should look:
[
    {
        "user_name": "user1",
        "user_id": "1001",
        "home_dir": "/home/user1",
        "login_shell": "/bin/bash"
    },
    {
        "user_name": "user2",
        "user_id": "1002",
        "home_dir": "/home/user2",
        "login_shell": "/bin/bash"
    }
]
A very basic way of doing this might look like:
objects = []
for line in open("myData.txt"):
    line = line.strip()
    items = line.split(":")
    someDict = {}
    someDict["first"] = items[0]
    someDict["second"] = items[1]
    # ... etc
    objects.append(someDict)
print(objects)
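To connect the loop above to the output format in the question, here is a sketch that maps the seven colon-separated columns onto the four requested keys. The sample lines are invented, since the original file is only available as an image, and parse_passwd_lines is a hypothetical helper name.

```python
import json


def parse_passwd_lines(lines):
    users = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Expect exactly the seven columns described in the question.
        if len(fields) != 7:
            continue
        user_name, _password_ref, user_id, _group_id, _info, home_dir, login_shell = fields
        users.append({
            "user_name": user_name,
            "user_id": user_id,
            "home_dir": home_dir,
            "login_shell": login_shell,
        })
    return users


# Invented sample lines in the column order given in the question.
sample_lines = [
    "user1:x:1001:1001:User One:/home/user1:/bin/bash\n",
    "user2:x:1002:1002::/home/user2:/bin/bash\n",
]
users = parse_passwd_lines(sample_lines)
print(json.dumps(users, indent=4))
```

To read from a real file, the same function could be called as parse_passwd_lines(open_file), because iterating over a file object yields its lines.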

To retrieve specific data from multiple similar portions in a .json file

Part of the JSON file's content is shown below:
{"ID": "PK45", "People": "Kate", "Date": "2020-01-05"}, {"ID": "OI85", "People": "John", "Date": "2020-01-18" }, {"ID": "CN658", "People": "Pevo", "Date": "2020-02-01" }
It has multiple portions containing "ID", "People" and "Date".
What I want to do is retrieve John's ID (in this case "OI85").
If the key is unique, I can use:
data_content = json.loads(data)
ID = data_content['ID']
But there are multiple similar portions. So I can only locate the "John" first:
with open("C:\\the_file.json") as data_file:
    data = data_file.read()
where = data.find('John')
where_1 = data[where - 20 : where]
ID = where_1[where_1.find('ID')+3:where_1.find('ID')+7]
print (ID)
It looks clumsy.
What would be the smart JSON way to retrieve specific data from multiple similar portions in a .json file?
Thank you.
Iterate on the list of dicts until you find the right one:
import json
data = '[{"ID": "PK45", "People": "Kate", "Date": "2020-01-05"}, {"ID": "OI85", "People": "John", "Date": "2020-01-18" }, {"ID": "CN658", "People": "Pevo", "Date": "2020-02-01" }]'
data_content = json.loads(data)
def find_id_by_name(name, data):
    for d in data:
        if d['People'] == name:
            return d["ID"]
    else:
        # for/else: runs only if the loop finished without returning
        raise ValueError('Name not found')
print(find_id_by_name('John', data_content))
# OI85
print(find_id_by_name('Jane', data_content))
# ... ValueError: Name not found
If you have to do many such searches, it may be worth creating another dict from your data to associate IDs to names:
ids_by_name = {d['People']:d['ID'] for d in data_content}
print(ids_by_name['John'])
# OI85
You probably should use the json module, which makes the task trivial:
import json
with open('data.json') as f:
    records = json.load(f)
for record in records:
    if record['People'] == 'John':
        print(record['ID'])
        break
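The same first-match search can also be written with next() and a generator expression, which returns a default instead of raising when the name is missing. This is an alternative sketch, not something either answer proposes; find_id is a hypothetical helper and the data is the list from the question.

```python
import json

raw = (
    '[{"ID": "PK45", "People": "Kate", "Date": "2020-01-05"}, '
    '{"ID": "OI85", "People": "John", "Date": "2020-01-18"}, '
    '{"ID": "CN658", "People": "Pevo", "Date": "2020-02-01"}]'
)
records = json.loads(raw)


def find_id(records, name, default=None):
    # next() stops at the first matching record and falls back to the
    # default when the generator is exhausted without a match.
    return next((r["ID"] for r in records if r["People"] == name), default)


john_id = find_id(records, "John")
jane_id = find_id(records, "Jane")
print(john_id, jane_id)
```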

Python 3 Get JSON value

I am calling a REST API from a Python script to extract Name and Start Time from a response.
I can get the information, but I can't combine the data so that each name and start time end up on the same line in a CSV. When I export them to CSV, every value goes on a new line.
There is probably a much better way to extract data from a JSON list.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print (drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n',skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
    "Results": [
        {
            "XValues": [
                {
                    "ReportScopeStartTime": "2018-06-18T23:00:00Z"
                },
                {
                    "ReportScopeEndTime": "2018-06-25T22:59:59Z"
                },
                {
                    "ID": "1400"
                },
                {
                    "Name": " John Doe"
                },
                {
                    "StartTime": "2018-06-19T07:16:10Z"
                },
            ],
        },
        {
            "XValues": [
                {
                    "ReportScopeStartTime": "2018-06-18T23:00:00Z"
                },
                {
                    "ReportScopeEndTime": "2018-06-25T22:59:59Z"
                },
                {
                    "ID": "1401"
                },
                {
                    "Name": " Jane Smith"
                },
                {
                    "StartTime": "2018-06-19T07:16:10Z"
                },
            ],
        },
My output in CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print (drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence. You are calling it with a single string as a parameter so it will split the string into individual characters. Probably you want to keep the name and start time together so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print (drivers)
Now, instead of being a list of strings, drivers is a list of tuples, one (name, start time) pair per result. If an input item has a name but no start time, that field is left empty. Your code to write the CSV file should now do the expected thing.
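The difference between passing a string and passing a tuple to writer.writerow() can be checked in isolation. The sketch below writes to an in-memory buffer instead of a file, which is a choice made here for illustration, and uses default writer settings rather than the ones in the question.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# A bare string is treated as a sequence of characters, one field each.
writer.writerow("Jane")

# A tuple gives one field per element, so both values share a row.
writer.writerow(("Jane Smith", "2018-06-19T07:16:10Z"))

csv_text = buffer.getvalue()
print(csv_text)
```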
If you want to get all or most of the values try gathering them together into a single dictionary, then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print (drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
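One way to list the wanted fields explicitly is csv.DictWriter, which takes the column names up front. This swaps in a different writer from the csv.writer used in the question, so treat it as an alternative sketch. The response data is a shortened, made-up version of the sample, and the second result deliberately lacks a StartTime to show how a missing field is handled.

```python
import csv
import io

# Shortened, made-up response in the same shape as the sample.
driver_details = {
    "Query": {
        "Results": [
            {"XValues": [{"ID": "1400"}, {"Name": " John Doe"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
            {"XValues": [{"ID": "1401"}, {"Name": " Jane Smith"}]},
        ]
    }
}

wanted = ["Name", "StartTime"]  # the only columns that reach the CSV
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=wanted,
    restval="",             # value used when a wanted field is missing
    extrasaction="ignore",  # silently drop fields not listed in `wanted`
    lineterminator="\n",
)

for item in driver_details["Query"]["Results"]:
    fields = {}
    for data_item in item["XValues"]:
        fields.update(data_item)  # merge the one-key dicts into one record
    # The sample names carry a leading space, so strip it before writing.
    fields["Name"] = fields.get("Name", "").strip()
    writer.writerow(fields)

rows_text = out.getvalue()
print(rows_text)
```

With extrasaction="ignore", a new field appearing in the response does not change the CSV layout, which matches the advice to name the fields explicitly.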
