How can I even use the 'else' syntax in Python? - python

I am reading data from a JSON file to check the existence of some values.
In the JSON structure below, I try to find the adomain value in the data under bid and check whether a cat value exists, which is not always present.
How do I fix the syntax below?
import pandas as pd
import json

path = 'C:/MyWorks/Python/Anal/data_sample.json'
records = [json.loads(line) for line in open(path, encoding='utf-8')]

adomain = [
    rec['win_res']['seatbid'][0]['bid'][0]['adomain']
    for rec in records
    if 'adomain' in rec
]
Here is a data sample:
[
  {
    "win_res": {
      "id": "12345",
      "seatbid": [
        {
          "bid": [
            {
              "id": "12345",
              "impid": "1",
              "price": 0.1,
              "adm": "",
              "adomain": [
                "adomain.com"
              ],
              "iurl": "url.com",
              "cid": "11",
              "crid": "11",
              "cat": [
                "IAB12345"
              ],
              "w": 1,
              "h": 1
            }
          ],
          "seat": "1"
        }
      ]
    }
  }
]
As a result, the adomain value always exists, but the cat value may sometimes be missing.
So when cat is present I want to extract adomain and cat together this way, but when the cat value is missing, how can I handle it?

Your question is not clear but I think this is what you are looking for:
import json

path = 'C:/MyWorks/Python/Anal/data_sample.json'
with open(path, encoding='utf-8') as f:
    records = json.load(f)

adomain = [
    rec['win_res']['seatbid'][0]['bid'][0]['adomain']
    for rec in records
    if rec['win_res']['seatbid'][0]['bid'][0].get('adomain') and
       rec['win_res']['seatbid'][0]['bid'][0].get('cat')
]
The code above will add the value of ['win_res']['seatbid'][0]['bid'][0]['adomain'] to the list adomain only if there is a corresponding ['win_res']['seatbid'][0]['bid'][0]['cat'] value.
The code will be a lot clearer if we just walk through a bids list. Something like this:
import json

path = 'C:/MyWorks/Python/Anal/data_sample.json'
with open(path, encoding='utf-8') as f:
    records = json.load(f)

bids = [rec['win_res']['seatbid'][0]['bid'][0] for rec in records]
adomain = [
    bid['adomain']
    for bid in bids
    if bid.get('adomain') and bid.get('cat')
]
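If some records might lack the seatbid or bid lists entirely, the chained [0] indexing will raise an IndexError or KeyError. A small helper using .get with default values avoids that (a sketch, assuming the same structure as the sample above; the inline records stand in for the file):

```python
import json

def first_bid(rec):
    """Return the first bid dict of a record, or an empty dict if missing."""
    seatbid = rec.get('win_res', {}).get('seatbid') or [{}]
    bids = seatbid[0].get('bid') or [{}]
    return bids[0]

# inline stand-in for the records loaded from the file
records = json.loads('''
[
  {"win_res": {"seatbid": [{"bid": [{"adomain": ["adomain.com"], "cat": ["IAB12345"]}]}]}},
  {"win_res": {"seatbid": [{"bid": [{"adomain": ["no-cat.com"]}]}]}}
]
''')

# pair each adomain with its cat value (None when cat is absent)
adomain = [
    (bid['adomain'], bid.get('cat'))
    for bid in map(first_bid, records)
    if bid.get('adomain')
]
print(adomain)
```

This keeps every record that has an adomain and simply reports None for the missing cat, rather than silently dropping the record.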

Related

Getting value from a JSON file based on condition

In Python I'm trying to get the value(s) of the key "relativePaths" from a JSON element if that element contains the value "concept" for the key "tags". The JSON file has the following format.
]
},
{
"fileName": "#Weizman.2011",
"relativePath": "Text/#Weizman.2011.md",
"tags": [
"text",
"concept"
],
"frontmatter": {
"authors": "Weizman",
"year": 2011,
"position": {
"start": {
"line": 0,
"col": 0,
"offset": 0
},
"end": {
"line": 4,
"col": 3,
"offset": 120
}
}
},
"aliases": [
"The least of all possible evils - humanitarian violence from Arendt to Gaza"
],
I have tried the following codes:
import json

with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)

for s in range(len(data)):
    if 'tags' in s in range(len(data)):
        if data[s]["tags"] == "concept":
            files = data[s]["relativePaths"]
            print(files)
Which results in the error message:
TypeError: argument of type 'int' is not iterable
I then tried:
with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)

for s in str(data):
    if 'tags' in s in str(data):
        print(s["relativePaths"])
That code seems to work. But I don't get any output from the print command. What am I doing wrong?
Assuming your json is a list of the type you put on your question, you can get those values like this:
with open("/Users/metadata.json") as jsonFile:
    data = json.load(jsonFile)

for item in data:  # assumes the first level of the JSON is a list
    if ('tags' in item) and ('concept' in item['tags']):  # assumes that not all items have a 'tags' entry
        print(item['relativePaths'])  # will raise a KeyError if 'relativePaths' is missing
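If some items might also lack the relativePaths key, a .get call with a default sidesteps that KeyError (a sketch using the key names from the question; the inline data stands in for metadata.json):

```python
# inline stand-in for the list loaded from metadata.json
data = [
    {"tags": ["text", "concept"], "relativePaths": "Text/#Weizman.2011.md"},
    {"tags": ["concept"]},  # no relativePaths entry
]

# collect paths for items tagged 'concept', with a placeholder when missing
found = [
    item.get('relativePaths', '(no path)')
    for item in data
    if 'concept' in item.get('tags', [])
]
print(found)
```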
Figured it out:
import json

f = open("/Users/metadata.json")

# returns the JSON contents as a list of dictionaries
data = json.load(f)

# iterating through the list
for i in data:
    if "tags" in i:
        if "concept" in i["tags"]:
            print(i["relativePaths"])

# closing file
f.close()
I think this will do what you want. It is more "pythonic" because it doesn't use numerical indices to access elements of the list, making it easier to write and read.
import json

with open("metadata.json") as jsonFile:
    data = json.load(jsonFile)

for elem in data:
    if 'tags' in elem and 'concept' in elem['tags']:
        files = elem["relativePath"]
        print(files)

Remove specific data from json file in linux

I have a JSON file with a set of entries with repeating fields. I need to remove the entry that matches a specific value of one field.
JSON file:
[
  {
    "targets": [
      "172.17.1.199"
    ],
    "labels": {
      "__meta_netbox_pop": "st-1742",
      "__snmp_module__": "arista_sw"
    }
  },
  {
    "targets": [
      "172.17.1.51"
    ],
    "labels": {
      "__meta_netbox_pop": "st-1754",
      "__snmp_module__": "arista_sw"
    }
  }
]
The JSON file goes on, but this is an example of the whole file.
I need to remove a targets entry, along with its labels, given the target's IP.
Input:
172.17.1.51
expected output:
[
  {
    "targets": [
      "172.17.1.199"
    ],
    "labels": {
      "__meta_netbox_pop": "st-1742",
      "__snmp_module__": "arista_sw"
    }
  }
]
Using jq:
$ jq --arg ip 172.17.1.51 'map(select(.targets | contains([$ip]) | not ))' input.json
[
  {
    "targets": [
      "172.17.1.199"
    ],
    "labels": {
      "__meta_netbox_pop": "st-1742",
      "__snmp_module__": "arista_sw"
    }
  }
]
If I understand the question correctly, this should do the trick.
The target parameter is the IP address you want to remove, and file_path is the path of the JSON file.
Edit: also, don't forget to import json, or else it won't work.
import json

def remove_obj(target, file_path):
    with open(file_path, "r") as data_file:
        data = json.load(data_file)
    # build a new list instead of calling data.remove() while iterating,
    # which would skip elements
    data = [obj for obj in data if target not in obj["targets"]]
    with open(file_path, "w") as data_file:
        json.dump(data, data_file, indent=4)
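A quick way to try the function (a sketch that writes the sample data from the question to a temporary file; the helper is repeated here so the snippet is self-contained):

```python
import json
import tempfile

def remove_obj(target, file_path):
    with open(file_path, "r") as data_file:
        data = json.load(data_file)
    # keep every entry whose targets list does not contain the given IP
    data = [obj for obj in data if target not in obj["targets"]]
    with open(file_path, "w") as data_file:
        json.dump(data, data_file, indent=4)

sample = [
    {"targets": ["172.17.1.199"], "labels": {"__meta_netbox_pop": "st-1742"}},
    {"targets": ["172.17.1.51"], "labels": {"__meta_netbox_pop": "st-1754"}},
]

# write the sample to a temporary file, then remove the 172.17.1.51 entry
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
    path = tmp.name

remove_obj("172.17.1.51", path)

with open(path) as fh:
    remaining = json.load(fh)
print(remaining)
```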

How can I parse the first information into CSV?

I'm trying to parse more than 100 JSON files, but I do not need all the info.
I only need to parse the first set of the 'coordinates'. The CSV already has the URL and URL type printed, but I cannot print the first set of coordinates.
This is a section of the JSON file:
{
  "type": "featureCollection",
  "features": [
    {
      "type": "feature",
      "geometry": {
        "type": "multilinestring",
        "coordinates": [
          [
            [
              148.9395348,
              -21.3292286
            ],
            [
              148.93963,
              -21.33001
            ],
            [
              148.93969,
              -21.3303
            ]
          ]
        ]
      },
      "properties": {
        "url": "www.thiswebpageisfake.com",
        "url_type": "fake"
      },
      "information": {
        "timestamp": "10/10/19"
      }
    }
  ]
}
I'm using Python 2.7. I have tried creating an array for the coordinates, but I get a type error.
import os
import csv
import json
import sys
reload(sys)

file_path = 'C:\\Users\\user\\Desktop\\Python\\json'
dirs = os.listdir(file_path)
file_out = 'C:\\Users\\user\\output.csv'

f = csv.writer(open(file_out, "wb+"))
f.writerow(['url', 'url_type', 'lat', 'long'])

for file in dirs:
    json_dict = json.loads(open(os.path.join(file_path, file)).read())
    print file

for key in json_dict['features']:
    for key1 in key:
        description = key['properties']['description']
        if description is None:
            description = 'null'
        array = ()
        array = (key['geometry']['type']['coordinates'])
        f.writerow([file,
                    key['properties']['url'],
                    key['properties']['url_type'],
                    array[1]])
print 'completed'
Firstly, it looks like your second loop is supposed to be nested inside the first; otherwise you do nothing with any of the JSON files except the last and only end up processing one file.
Secondly, your array should be defined as array = key['geometry']['coordinates'], since 'coordinates' is not contained in 'type'.
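Putting both fixes together, the extraction could look like this (a Python 3 sketch with an inline sample in place of the original files; the description lookup is left out because the sample's properties have no such key):

```python
import json

# inline stand-in for one of the JSON files from the question
doc = json.loads('''
{
  "type": "featureCollection",
  "features": [
    {
      "geometry": {
        "type": "multilinestring",
        "coordinates": [[[148.9395348, -21.3292286], [148.93963, -21.33001]]]
      },
      "properties": {"url": "www.thiswebpageisfake.com", "url_type": "fake"}
    }
  ]
}
''')

rows = []
for feature in doc['features']:
    # 'coordinates' sits directly under 'geometry', not under 'geometry.type'
    first_point = feature['geometry']['coordinates'][0][0]
    rows.append([
        feature['properties']['url'],
        feature['properties']['url_type'],
        first_point[1],   # latitude
        first_point[0],   # longitude
    ])
print(rows)
```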

convert csv file to multiple nested json format

I have written code to convert a CSV file to nested JSON format. I have multiple columns to nest, so I assign each column separately. The problem is that I'm getting 2 fields for the same column in the JSON output.
import csv
import json
from collections import OrderedDict

csv_file = 'data.csv'
json_file = csv_file + '.json'

def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        for row in reader:
            row['TYPE'] = 'REVIEW',  # adding new key, value
            row['RAWID'] = 1,
            row['CUSTOMER'] = {
                "ID": row['CUSTOMER_ID'],
                "NAME": row['CUSTOMER_NAME']
            }
            row['CATEGORY'] = {
                "ID": row['CATEGORY_ID'],
                "NAME": row['CATEGORY']
            }
            # deleting since the fields would otherwise occur twice
            del (row["CUSTOMER_NAME"], row["CATEGORY_ID"],
                 row["CATEGORY"], row["CUSTOMER_ID"])
            csv_rows.append(row)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
The output is as below:
[
  {
    "CATEGORY": {
      "ID": "1",
      "NAME": "Consumers"
    },
    "CATEGORY_ID": "1",
    "CUSTOMER_ID": "41",
    "CUSTOMER": {
      "ID": "41",
      "NAME": "SA Port"
    },
    "CUSTOMER_NAME": "SA Port",
    "RAWID": [
      1
    ]
  }
]
I'm getting 2 entries for the fields I have assigned using row[''].
Is there any other way to get rid of this? I want only one entry for a particular field in each record.
Also how can I convert the keys to lower case after reading from csv.DictReader(). In my csv file all the columns are in upper case and hence I'm using the same to assign. But I want to convert all of them to lower case.
In order to convert the keys to lower case, it would be simpler to generate a new dict per row. Incidentally, that is also enough to get rid of the duplicate fields:
from collections import OrderedDict

for row in reader:
    orow = OrderedDict()
    orow['type'] = 'REVIEW'  # adding new key, value
    orow['rawid'] = 1
    orow['customer'] = {
        "id": row['CUSTOMER_ID'],
        "name": row['CUSTOMER_NAME']
    }
    orow['category'] = {
        "id": row['CATEGORY_ID'],
        "name": row['CATEGORY']
    }
    csv_rows.append(orow)
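Alternatively, if you want to keep the original row dicts and just lower-case every key generically, a dict comprehension over row.items() does it (a sketch; the sample row mimics what csv.DictReader would yield for the columns in the question):

```python
from collections import OrderedDict

# stand-in for one row as csv.DictReader would yield it
row = OrderedDict([
    ('CUSTOMER_ID', '41'),
    ('CUSTOMER_NAME', 'SA Port'),
    ('CATEGORY_ID', '1'),
    ('CATEGORY', 'Consumers'),
])

# lower-case every key without listing the columns one by one
lowered = {k.lower(): v for k, v in row.items()}
print(lowered)
```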

Extract URLs from a JSON file without the data name using Python

I have a JSON file that contains the metadata of 900 articles, and I want to extract the URLs from it. My file starts like this:
[
  {
    "title": "The histologic phenotypes of …",
    "authors": [
      {
        "name": "JE Armes"
      }
    ],
    "publisher": "Wiley Online Library",
    "article_url": "https://onlinelibrary.wiley.com/doi/abs/10.1002/(SICI)1097-0142(19981201)83:11%3C2335::AID-CNCR13%3E3.0.CO;2-N",
    "cites": 261,
    "use": true
  },
  {
    "title": "Comparative epidemiology of pemphigus in ...",
    "authors": [
      {
        "name": "S Bastuji-Garin"
      },
      {
        "name": "R Souissi"
      }
    ],
    "year": 1995,
    "publisher": "search.ebscohost.com",
    "article_url": "http://search.ebscohost.com/login.aspx?direct=true&profile=ehost&scope=site&authtype=crawler&jrnl=0022202X&AN=12612836&h=B9CC58JNdE8SYy4M4RyVS%2FrPdlkoZF%2FM5hifWcv%2FwFvGxUCbEaBxwQghRKlK2vLtwY2WrNNl%2B3z%2BiQawA%2BocoA%3D%3D&crl=c",
    "use": true
  },
  .........
I want to inspect the file with objectpath to create a JSON tree for extracting the URLs. This is the code I want to execute:
1. import json
2. import objectpath
3. with open("Data_sample.json") as datafile:
       data = json.load(datafile)
4. jsonnn_tree = objectpath.Tree(data['name of data'])
5. result_tuple = tuple(jsonnn_tree.execute('$..article_url'))
But in step 4, when creating the tree, I have to insert the name of the data, which I don't think my file has. How can I replace this line?
You can get all the article urls using a list comprehension.
import json

with open("Data_sample.json") as fh:
    articles = json.load(fh)

article_urls = [article['article_url'] for article in articles]
You can instantiate the tree like this (assuming objectpath is imported as op):
tobj = op.Tree(your_data)
results = tobj.execute("$.article_url")
And in the end:
results = list(results)
will yield:
["url1", "url2", ...]
Did you try removing the reference and just using:
jsonnn_tree = objectpath.Tree(data)
