JSON: Stripping file of unnecessary information in python

I'm working on a simple web app that pulls query results from a news article API. I'd like to reduce client-side processing by stripping unnecessary information from the JSON response in my Flask server, and then store the trimmed JSON in a database (for now it is just saved locally, as in the code below).
Currently my Python code looks like this:
def get_query(query):
    response = urllib2.urlopen(link + '?q=' + query + '&fl=' + fields + '&api-key=' + key)
    result = response.read()
    # store json locally
    with open('static/json/' + query + '.json', 'w') as stored_json:
        json.dump(result, stored_json)
    with open('static/json/' + query + '.json', 'r') as stored_json:
        return json.load(stored_json)
My issues are:
a) I am unsure how to properly edit the JSON. Currently, in my JavaScript, I use the data from my AJAX call like this:
    data.response.docs[i].headline.main;
whereas I would rather store and return just the docs object as JSON. I know the variable result in my Python code is a string, so I cannot write and return result.response.docs. I tried returning response.response.docs, but I realize this is incorrect.
b) My last four lines seem redundant. How can I place my return within the first open block? I tried both 'w+' and 'r+' with no luck.

I'm not sure I fully understand your question, but it sounds like what you want to do is:
1) receive the response
2) parse the json into a Python object
3) filter the data
4) store the filtered data locally (in a database, file, etc)
5) return the filtered data to the client
I suppose your json.dump / json.load combination was intended to get the JSON string into a format you can manipulate easily (i.e. a Python object). If so, json.loads (emphasis on the s) does what you need. Try something like this:
import json
import urllib2
def get_query(query):
    response = urllib2.urlopen(...)
    # parse the JSON response into a regular Python object
    result = json.loads(response.read())
    # filter_the_data is some function that manipulates the data
    filtered = filter_the_data(result)
    with open('outfile.json', 'w') as outfile:
        # here dump (no s) is used to serialize the data
        # back to json and store it on the filesystem as outfile.json
        json.dump(filtered, outfile)
    ...
At this point you have saved the data locally, and you still hold a reference to the filtered data. You can re-serialize it and send it to the client easily using Flask's jsonify function.
Hope it helps.
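As a minimal sketch of step 3, assuming the response has the response.docs[i].headline.main shape the question describes, filter_the_data could keep only the headline of each document. The sample payload and its extra and meta keys are made up for illustration, and the code is written for Python 3.

```python
import json

# Hypothetical sample payload, shaped like data.response.docs[i].headline.main
raw = (
    '{"response": {"docs": ['
    '{"headline": {"main": "First"}, "extra": 1}, '
    '{"headline": {"main": "Second"}, "extra": 2}], '
    '"meta": {"hits": 2}}}'
)

def filter_the_data(result):
    """Keep only the main headline of each document."""
    return [{"headline": doc["headline"]["main"]}
            for doc in result["response"]["docs"]]

result = json.loads(raw)            # JSON string -> Python dict
filtered = filter_the_data(result)  # drop everything except the headlines
serialized = json.dumps(filtered)   # Python list -> JSON string
```

Because filtered is still in memory after it has been written out, there is no need to reopen the file just to return the data, which addresses point (b) of the question.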

Related

Issue with handling json api response in Python

I am using the Censys API in Python to programmatically look through hosts and grab information about them. The Censys website says it returns JSON-formatted data, and it looks like JSON-formatted data, but I can't figure out how to turn the API response into a JSON object. However, if I write the response to a JSON file and load it, it works fine. Any ideas?
Update: I figured out that the issue is with the nested JSON that the API returns. I'm looking for libraries to flatten it.
Main.py
c = censys.ipv4.CensysIPv4(api_id=UID, api_secret=SECRET)
for result in c.search("autonomous_system.asn:15169 AND tags.raw:iot", max_records=1):
    hostIPS.append(result["ip"])
for host in hostIPS:
    for details in c.view(host):
        # test = json.dumps(details)
        # test = json.load(test)
        # data = json.load(details)
        data = json.loads(details)
        print(data)

You don't need to convert it to an object; it has already been JSON-decoded by the library. See the implementation here: https://github.com/censys/censys-python/blob/master/censys/base.py
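Regarding the update about flattening: a library is not strictly required. The following is a minimal sketch of a recursive flattener, assuming the value to flatten is a plain nested dict. The flatten helper, the dot separator, and the sample record are illustrative and not part of the Censys API.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in nested.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalars are kept as they are
            flat[full_key] = value
    return flat

# Hypothetical record, loosely shaped like a host lookup result
details = {
    "ip": "192.0.2.1",
    "autonomous_system": {"asn": 15169, "name": "EXAMPLE"},
    "tags": ["iot"],
}
flat = flatten(details)
```

Lists are left untouched here; whether to expand them into indexed keys depends on how the flattened data will be used.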

Scraping only select fields from a JSON file

I'm trying to output only the following JSON data fields, but for some reason the entire response gets written to the .html file. What am I doing wrong? It should only output the fields referenced, e.g. title, audio source URL, medium-sized image, etc.
r = urllib.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
data = json.loads(r.read().decode('utf-8'))
for post in data['posts']:
    # data.append([post['title'], post['audioSource'], post['image']['medium'], post['excerpt']['long']])
    ([post['title'], post['audioSource'], post['image']['medium'], post['excerpt']['long']])
with io.open('criminal-json.html', 'w', encoding='utf-8') as r:
    r.write(json.dumps(data, ensure_ascii=False))

You need to distinguish between your input data and your output data. Your for loop builds a list of the selected fields but never stores it, and the write at the end serializes the same variable, data, that holds the full input. Add the selected fields from the input to a separate list that holds the output, and write that list instead.
Don't reuse the same variable names. Here is what you want:
import urllib
import json
import io
url = urllib.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
data = json.loads(url.read().decode('utf-8'))
posts = []
for post in data['posts']:
    posts.append([post['title'], post['audioSource'], post['image']['medium'], post['excerpt']['long']])
with io.open('criminal-json.html', 'w', encoding='utf-8') as r:
    r.write(json.dumps(posts, ensure_ascii=False))

You are loading the whole JSON into the variable data and dumping it without changing it, which is why the entire response is written. What you need to do is put whatever you want to keep into a new variable and then dump that.
Look at this line:
    ([post['title'], post['audioSource'], post['image']['medium'], post['excerpt']['long']])
It does nothing, so data remains unchanged. Do what Mark Tolonen suggested and it'll be fine.
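A variation on the same idea, sketched under the assumption that named fields are more convenient in the output than positional lists: collect each post into a small dict. The sample data stands in for the decoded API response and is made up, as are the output key names; the code is written for Python 3.

```python
import json

# Hypothetical stand-in for the decoded API response
data = {
    "posts": [
        {
            "id": 7,
            "title": "Episode 1",
            "audioSource": "https://example.com/1.mp3",
            "image": {"medium": "https://example.com/1-m.jpg",
                      "large": "https://example.com/1-l.jpg"},
            "excerpt": {"long": "Long text", "short": "Short"},
        }
    ]
}

# Build a new list holding only the selected fields
selected = [
    {
        "title": post["title"],
        "audioSource": post["audioSource"],
        "image": post["image"]["medium"],
        "excerpt": post["excerpt"]["long"],
    }
    for post in data["posts"]
]

# Serialize the new list, not the original input
output = json.dumps(selected, ensure_ascii=False)
```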

Using Python to request an authenticated GitHub URL that returns JSON, then parsing the JSON into CSV format

I am new to Python. We request a GitHub URL and the response is JSON. We have to parse the JSON, filter out specific labels, and store them in CSV format. We have an authentication token that we use to request the URL. Could you please provide code for the above scenario?

Your question is very general, and since you didn't include any code, it seems like you are just looking for a ready-made answer.
I can't give you that, but the code below should get you started on converting JSON into a Python object, iterating over its labels, and writing them to a CSV file.
import json
# your_json_object is the JSON text of the response
x = json.loads(your_json_object)
# open the file once; 'w' truncates it each time it is opened
with open('your_file.csv', 'w') as file:
    for label in x:
        file.write("{}, ".format(label))
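For proper quoting and one row per label, the standard csv module may be a better fit than manual string formatting. This is only a sketch: it assumes the response body is a JSON array of objects with name and color keys, which is a made-up sample rather than a confirmed GitHub response shape, and it writes to an in-memory buffer instead of a real file. It is written for Python 3.

```python
import csv
import io
import json

# Hypothetical response body; the real fields depend on the endpoint you call
body = ('[{"name": "bug", "color": "d73a4a"}, '
        '{"name": "help wanted", "color": "008672"}]')
labels = json.loads(body)

# In-memory buffer standing in for open('your_file.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "color"])  # header row
for label in labels:
    writer.writerow([label["name"], label["color"]])

csv_text = buffer.getvalue()
```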

How do I load JSON into Couchbase Headless Server in Python?

I am trying to create a Python script that can take a JSON object and insert it into a headless Couchbase server. I have been able to successfully connect to the server and insert some data. I'd like to be able to specify the path of a JSON object and upsert that.
So far I have this:
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
import json
cb = Bucket('couchbase://XXX.XXX.XXX?password=XXXX')
print cb.server_nodes
#tempJson = json.loads(open("myData.json","r"))
try:
    result = cb.upsert('healthRec', {'record': 'bob'})
    # result = cb.upsert('healthRec', {'record': tempJson})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
print(cb.get('healthRec').value)
I know that the first commented out line that loads the json is incorrect because it is expecting a string not an actual json... Can anyone help?
Thanks!

Figured it out:
with open('myData.json', 'r') as f:
    data = json.load(f)
try:
    result = cb.upsert('healthRec', {'record': data})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
I am looking into using cbdocloader, but this was my first step toward getting this to work. Thanks!

I know that you've found a solution that works for you in this instance but I thought I'd correct the issue that you experienced in your initial code snippet.
json.loads() takes a string as input and decodes it into a dictionary (or whatever custom object you produce via object_hook). You were passing it a file handle instead, which is why you saw the error.
There is a separate function, json.load(), which takes a file object and works as expected, as you used in your eventual answer.
You could have used it as follows (if you wanted something slightly less verbose than the with statement):
tempJson = json.load(open("myData.json","r"))
As Kirk mentioned, though, if you have a large number of JSON documents to insert, it might be worth taking a look at cbdocloader, as it will handle all of this boilerplate code for you (with appropriate error handling and other functionality).
The cbdocloader readme covers its uses and how to format your data correctly so that it can load your documents into Couchbase Server.
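The difference between json.load and json.loads can be seen in a short, self-contained sketch. It is written for Python 3 and uses a temporary file rather than the asker's myData.json path; the file contents are made up.

```python
import json
import os
import tempfile

# Write a small JSON file to a temporary location for the demonstration
path = os.path.join(tempfile.mkdtemp(), "myData.json")
with open(path, "w") as f:
    f.write('{"name": "bob", "age": 42}')

# json.load reads from a file object
with open(path) as f:
    from_file = json.load(f)

# json.loads parses a string
with open(path) as f:
    from_string = json.loads(f.read())

# Passing the file object itself to json.loads raises TypeError
try:
    with open(path) as f:
        json.loads(f)
    raised = False
except TypeError:
    raised = True
```

Using a with block also closes the file promptly, which the one-line json.load(open(...)) form leaves to the garbage collector.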

Importing JSON in Python and Removing Header

I'm trying to write a simple JSON to CSV converter in Python for Kiva. The JSON file I am working with looks like this:
{"header":{"total":412045,"page":1,"date":"2012-04-11T06:16:43Z","page_size":500},"loans":[{"id":84,"name":"Justine","description":{"languages":["en"], REST OF DATA
The problem is, when I use json.load, I only get the strings "header" and "loans" in data, but not the actual information such as id, name, description, etc. How can I skip over everything until the [? I have a lot of files to process, so I can't manually delete the beginning in each one. My current code is:
import csv
import json
fp = csv.writer(open("test.csv","wb+"))
f = open("loans/1.json")
data = json.load(f)
f.close()
for item in data:
    fp.writerow([item["name"]] + [item["posted_date"]] + OTHER STUFF)

Instead of
for item in data:
use
for item in data['loans']:
The header is stored in data['header'] and data itself is a dictionary, so you'll have to key into it in order to access the data.

data is a dictionary, so for item in data iterates the keys.
You probably want for loan in data['loans']:
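Both points can be checked with a small sketch. It assumes a shortened, made-up version of the file from the question, parses it from a string instead of loans/1.json, and writes to an in-memory buffer instead of test.csv. It is written for Python 3.

```python
import csv
import io
import json

# Hypothetical, shortened version of the file from the question
text = ('{"header": {"total": 2, "page": 1}, '
        '"loans": [{"id": 84, "name": "Justine"}, {"id": 85, "name": "Ann"}]}')
data = json.loads(text)

# Iterating over a dict yields its keys, not the nested records
top_level_keys = sorted(data)

# In-memory buffer standing in for the CSV file
buffer = io.StringIO()
writer = csv.writer(buffer)
for loan in data["loans"]:  # key into the dict to reach the loan records
    writer.writerow([loan["id"], loan["name"]])

rows = buffer.getvalue().splitlines()
```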
