How i can parse the first information in to CSV? - python

I'm trying to parse more than 100 json files, but i do not need all the info.
i only need to parse the first set of the 'coordinates', the CSV already have printed URL and URL type, but i cannot print the first set of coordinates.
this is a section of the Json file
{
"type":"featureCollection",
"features" : [
{
"type": "feature",
"geometry": {
"type": "multilinestring",
"coordinates":[
[
[
148.9395348,
-21.3292286
],
[
148.93963,
-21.33001
],
[
148.93969,
-21.3303
]
]
]
},
"properties":{
"url" :"www.thiswebpageisfake.com",
"url_type":"fake"
},
"information":{
"timestamp":"10/10/19"
}
}]
}
i'm using python 2.7, i have tried creating an array for coordinates but i have a type error
import os
import csv
import json
import sys
reload(sys)
file_path = 'C:\\Users\\user\\Desktop\\Python\\json'
dirs = os.listdir(file_path)
file_out = 'C:\\Users\\user\\output.csv'
f = csv.writer(open(file_out, "wb+"))
f.writerow(
['url','url_type','lat','long'])
for file in dirs:
json_dict = json.loads(open(os.path.join(file_path, file)).read())
print file
for key in json_dict['features']:
for key1 in key:
description = key['properties']['description']
if description is None:
description = 'null'
array = ()
array = (key['geometry']['type']['coordinates'])
f.writerow([file,
key['properties']['url'],
key['properties']['url_type'],
array[1]
])
print 'completed'

Firstly, it looks like your second loop is supposed to be nested in the first, otherwise you'll do nothing with all json files except the last and only end up processing one file.
Secondly, your array should be defined as array = (key['geometry']['coordinates']), as 'coordinates' is not contained in 'type'.

Related

Getting value from a JSON file based on condition

In python I'm trying to get the value(s) of the key "relativePaths" from a JSON element if that element contains the value "concept" for the key "tags". The JSON file has the following format.
]
},
{
"fileName": "#Weizman.2011",
"relativePath": "Text/#Weizman.2011.md",
"tags": [
"text",
"concept"
],
"frontmatter": {
"authors": "Weizman",
"year": 2011,
"position": {
"start": {
"line": 0,
"col": 0,
"offset": 0
},
"end": {
"line": 4,
"col": 3,
"offset": 120
}
}
},
"aliases": [
"The least of all possible evils - humanitarian violence from Arendt to Gaza"
],
I have tried the following codes:
import json
with open("/Users/metadata.json") as jsonFile:
data = json.load(jsonFile)
for s in range(len(data)):
if 'tags' in s in range(len(data)):
if data[s]["tags"] == "concept":
files = data[s]["relativePaths"]
print(files)
Which results in the error message:
TypeError: argument of type 'int' is not iterable
I then tried:
with open("/Users/metadata.json") as jsonFile:
data = json.load(jsonFile)
for s in str(data):
if 'tags' in s in str(data):
print(s["relativePaths"])
That code seems to work. But I don't get any output from the print command. What am I doing wrong?
Assuming your json is a list of the type you put on your question, you can get those values like this:
with open("/Users/metadata.json") as jsonFile:
data = json.load(jsonFile)
for item in data: # Assumes the first level of the json is a list
if ('tags' in item) and ('concept' in item['tags']): # Assumes that not all items have a 'tags' entry
print(item['relativePaths']) # Will trigger an error if relativePaths is not in the dictionary
Figured it
import json
f = open("/Users/metadata.json")
# returns JSON object as
# a dictionary
data = json.load(f)
# Iterating through the json
# list
for i in data:
if "tags" in i:
if "concept" in i["tags"]:
print(i["relativePaths"])
# Closing file
f.close()
I think this will do what you want. It is more "pythonic" because it doesn't use numerical indices to access elements of the list — making it easier to write and read).
import json
with open("metadata.json") as jsonFile:
data = json.load(jsonFile)
for elem in data:
if 'tags' in elem and 'concept' in elem['tags']:
files = elem["relativePath"]
print(files)

Generate JSON file from file names and their content

I'm new to python, and couldn't find a close enough answer to make me figure it out. I'm trying generate a single json file that contains current directory file names that end with a .txt extension as nodes, and the contents of those files as a list inside the file name's node.
for example:
node1.txt contains
foo
bar
and node2.txt contains
test1
test2
the output should look like:
{
"node1": [
"foo",
"bar"
],
"node2": [
"test1",
"test2"
]
}
Use pathlib and json modules and a simple loop...
import pathlib
import json
data = {}
for node in pathlib.Path('.').glob('node*.txt'):
with open(node, 'r') as fp:
data[node.stem] = [line.strip() for line in fp.readlines()]
print(json.dumps(data, indent=4))
Output:
{
"node1": [
"foo",
"bar"
],
"node2": [
"test1",
"test2"
]
}

How can I even use the'else' syntax in Python?

I am reading data from a JSON file to check the existence of some values.
In the JSON structure below, I try to find adomain from the data in bid and check if there is a cat value, which is not always present.
How do I fix it in the syntax below?
import pandas as pd
import json
path = 'C:/MyWorks/Python/Anal/data_sample.json'
records = [json.loads(line) for line in open(path, encoding='utf-8')]
adomain = [
rec['win_res']['seatbid'][0]['bid'][0]['adomain']
for rec in records
if 'adomain' in rec
]
Here is a data sample:
[
{ "win_res": {
"id": "12345",
"seatbid": [
{
"bid": [
{
"id": "12345",
"impid": "1",
"price": 0.1,
"adm": "",
"adomain": [
"adomain.com"
],
"iurl": "url.com",
"cid": "11",
"crid": "11",
"cat": [
"IAB12345"
],
"w": 1,
"h": 1
}
],
"seat": "1"
}
]
}}
]
As a result, the adomain value exists unconditionally, but the cat value may not be present sometimes.
So, if cat exists in adomain, I want to express adomain and cat in this way, but if there is no adomain, the cat value, how can I do it?
Your question is not clear but I think this is what you are looking for:
import json
path = 'C:/MyWorks/Python/Anal/data_sample.json'
with open(path, encoding='utf-8') as f:
records = json.load(f)
adomain = [
_['win_res']['seatbid'][0]['bid'][0]['adomain']
for _ in records
if _['win_res']['seatbid'][0]['bid'][0].get('adomain', None) and
_['win_res']['seatbid'][0]['bid'][0].get('cat', None)
]
The code above will add the value of ['win_res']['seatbid'][0]['bid'][0]['adomain'] to the list adomain only if there is a ['win_res']['seatbid'][0]['bid'][0]['cat'] corresponding value.
The code will be a lot clearer if we just walk through a bids list. Something like this:
import json
path = 'C:/MyWorks/Python/Anal/data_sample.json'
with open(path, encoding='utf-8') as f:
records = json.load(f)
bids = [_['win_res']['seatbid'][0]['bid'][0] for _ in records]
adomain = [
_['adomain']
for _ in bids
if _.get('adomain', None) and _.get('cat', None)
]

How can i read through each line of a file and append it to a json file in python?

I've been trying to figure out a way to store proxy data in a json form, i know the easier way is to just take each proxy from the text box and save it to the file and then to access it i would just load the information from the file but i want to have groups that work with different types of IP's. Say for example one group uses the proxy IP from a certain provider and another group would use an IP from a different one, i would need to store the IP's in their respected groups which is why i think i need to create a json file to store each of the proxies in their own json array. What i'm having trouble with is adding the IP's to the json array as i am trying to loop over a transfer file with the IP's in them and then add it to the json array. As of now i tried this,
def save_proxy():
proxy = pooled_data_default.get('1.0', 'end-2c')
transfer_file = open('proxies.txt', 'w')
transfer_file.write(proxy)
transfer_file.close()
transfer_file1 = open('proxies.txt', 'r')
try:
with open('proxy_groups.txt', 'r+') as file:
proxy_group = json.load(file)
except:
proxy_group = []
total = []
for line in transfer_file1:
line = transfer_file1.readline().strip()
total.append(line)
proxy_group.append({
'group_name': pool_info.get(),
'proxy': [{
'proxy': total,
}]
}),
with open('proxy_groups.txt', 'w') as outfile:
json.dump(proxy_group, outfile, indent=4)
This doesn't work but it was my attempt at taking each line from the file and adding it to the json array dynamically. Any help is appreciated.
EDIT: this is what is being outputted:
[
{
"group_name": "Defualt",
"proxy": [
{
"proxy": [
"asdf",
""
]
}
]
}
]
This was the input
wdsa
asdf
sfs
It seems that it is only selecting the middle one of the 3. I thought that printing the list of them would work but it is still printing the middle and then a blank space at the end.
An example of my data is the input to the text box may be
wkenwwins:1000:username:password
uwhsuh:1000:username:password
2ewswsd:1000:username:password
gfrfccv:1000:username:password
the selected group which i may want to save this to could be called 'Default'. I select default and then clicking save should add these inputs to the seperate txt sheet called 'proxies.txt', which it does. From the text sheet i then want to loop through each line and append each line to the json data. Which it doesnt do, here it was i expect it to look like in json data
[
{
"group_name": "Defualt",
"proxy": [
{
"proxy": [
'ewswsd:1000:username:password',
'wkenwwins:1000:username:password',
'uwhsuh:1000:username:password'
]
}
]
}
]
So then say if i made 2 groups the json data txt file should look like this:
[
{
"group_name": "Defualt",
"proxy": [
{
"proxy": [
'ewswsd:1000:username:password',
'wkenwwins:1000:username:password',
'uwhsuh:1000:username:password'
]
}
]
}
]
[
{
"group_name": "Test",
"proxy": [
{
"proxy": [
'ewswsd:1000:username:password',
'wkenwwins:1000:username:password',
'uwhsuh:1000:username:password'
]
}
]
}
]
This is so i can access each group by only calling the group name.
You can simplify the save_proxy() as below:
def save_proxy():
proxy = pooled_data_default.get('1.0', 'end-1c')
# save the proxies to file
with open('proxies.txt', 'w') as transfer_file:
transfer_file.write(proxy)
# load the proxy_groups if exists
try:
with open('proxy_groups.txt', 'r') as file:
proxy_group = json.load(file)
except:
proxy_group = []
proxy_group.append({
'group_name': pool_info.get(),
'proxy': proxy.splitlines()
})
with open('proxy_groups.txt', 'w') as outfile:
json.dump(proxy_group, outfile, indent=4)
The output file proxy_groups.txt would look like below:
[
{
"group_name": "default",
"proxy": [
"wkenwwins:1000:username:password",
"uwhsuh:1000:username:password"
]
}
]

In Python, how to parse the complex json file

I want to get "path" from the below json file; I used json.load to get read json file and then parse one by one using for key, value in data.items() and it leads to lot of for loop (Say 6 loops) to get to the value of "path"; Is there any simple method to retrieve the value of path?
The complete json file can be found here and below is the snippet of it.
{
"products": {
"com.ubuntu.juju:12.04:amd64": {
"version": "2.0.1",
"arch": "amd64",
"versions": {
"20161129": {
"items": {
"2.0.1-precise-amd64": {
"release": "precise",
"version": "2.0.1",
"arch": "amd64",
"size": 23525972,
"path": "released/juju-2.0.1-precise-amd64.tgz",
"ftype": "tar.gz",
"sha256": "f548ac7b2a81d15f066674365657d3681e3d46bf797263c02e883335d24b5cda"
}
}
}
}
},
"com.ubuntu.juju:14.04:amd64": {
"version": "2.0.1",
"arch": "amd64",
"versions": {
"20161129": {
"items": {
"2.0.1-trusty-amd64": {
"release": "trusty",
"version": "2.0.1",
"arch": "amd64",
"size": 23526508,
"path": "released/juju-2.0.1-trusty-amd64.tgz",
"ftype": "tar.gz",
"sha256": "7b86875234477e7a59813bc2076a7c1b5f1d693b8e1f2691cca6643a2b0dc0a2"
}
}
}
}
},
You can use recursive generator:
def get_paths(data):
if 'path' in data:
yield data['path']
for k in data.keys():
if isinstance(data[k], dict):
for i in get_paths(data[k]):
yield i
for path in get_paths(json_data): # loaded json data
print(path)
Is path key always at the same depth in the loaded json (which is a dict so) ? If so, what about doing
products = loaded_json['products']
for product in products.items():
print product[1].items()[2][1].items()[0][1].items()[0][1].items()[0][1]['path']
If not, the answer of Yevhen Kuzmovych is clearly better, cleaner and more general than mine.
If you only care about the path, I think using any JSON parser is an overkill, you can just use built in re regex and use the following pattern (\"path\":\s*\")(.*\s*)(?=\",). I didn't test the whole file but should be able to figure out the best pattern fairly easily.
If you only need the file names present in path field, you can easily get them by simply parsing the file:
import re
files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')
with open('file.json') as fd:
for line in fd:
if "path" in line:
m = pathre.match(line)
if m is not None:
files.append(m.group(1))
If you need to process simultaneously the path and sha256 fields:
files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')
share = re.compile(r'\s*"sha256"\s*:\s*"(.*?)"')
path = None
with open('file.json') as fd:
for line in fd:
if "path" in line:
m = pathre.match(line)
path = m.group(1)
elif "sha256" in line:
m = share.match(line)
if path is not None:
files.append((path, m.group(1)))
path = None
You can use a query language like JSONPath. Here you find the Python implementation: https://pypi.python.org/pypi/jsonpath-rw
Assuming you have your JSON content already loaded, you can do something like the following:
from jsonpath_rw import jsonpath, parse
# Load your JSON content first from a file or from a string
# json_data = ...
jsonpath_expr = parse('products..path')
for match in jsonpath_expr.find(json_data):
print(match.value)
For a further discussion you can read this: Is there a query language for JSON?

Categories

Resources