working with JSON with a bash script - python

I’m writing a bash script that does a few things. Right now it copies a few files into correct directories and runs a few commands. I need this bash script to edit a JSON file. Essentially this script would append a snippet of JSON to an existing JSON object that exists in a file.JSON. I cannot just append the data because the JSON snippet must be a part of an existing JSON object (should be added to the tracks array). So is this possible to do with a bash script? Should I just write another python or R script to handle this JSON Logic or is there a more elegant solution. Thanks for any help.
file.JSON looks like this...
{
"formatVersion" : 1,
"tracks" : [
{
"key" : "Reference sequence",
"chunkSize" : 20000,
"urlTemplate" : "seq/{refseq_dirpath}/{refseq}-",
"storeClass" : "JBrowse/Store/Sequence/StaticChunked",
"type" : "SequenceTrack",
"seqType" : "dna",
"category" : "Reference sequence",
"label" : "DNA"
},
{
"type" : "FeatureTrack",
"label" : "gff_track1",
"trackType" : null,
"key" : "gff_track1",
"compress" : 0,
"style" : {
"className" : "feature"
},
"storeClass" : "JBrowse/Store/SeqFeature/NCList",
"urlTemplate" : "tracks/gff_track1/{refseq}/trackData.json"
},
{
"storeClass" : "JBrowse/Store/SeqFeature/NCList",
"style" : {
"className" : "feature"
},
"urlTemplate" : "tracks/ITAG2.4_gene_models.gff3/{refseq}/trackData.json",
"key" : "ITAG2.4_gene_models.gff3",
"compress" : 0,
"trackType" : null,
"label" : "ITAG242.4_gene_models.gff3",
"type" : "FeatureTrack"
},
{
"urlTemplate" : "g-231FRL.bam",
"storeClass" : "JBrowse/Store/SeqFeature/BAM",
"label" : "g-1FRL.bam",
"type" : "JBrowse/View/Track/Alignments2",
"key" : "g-1FRL.bam"
}
]
}
the JSON snippet looks like this ...
{
"urlTemplate": "AX2_filtered.vcf.gz",
"label": "AX2_filtered.vcf.gz",
"storeClass": "JBrowse/Store/SeqFeature/VCFTabix",
"type": "CanvasVariants"
}

Do yourself a favor and install jq, then it's as simple as:
jq -n 'input | .tracks += [inputs]' file.json snippet.json > out.json
Trying to modify a structured data (like JSON) is a fool's errand without a proper parser and jq really makes it easy.
However, if you prefer doing it through Python (although it would be an overkill for this kind of a task) it's pretty much as straight forward as with jq:
import json
with open("file.json", "r") as f, open("snippet.json", "r") as s, open("out.json", "w") as u:
data = json.load(f) # parse `file.json`
data["tracks"].append(json.load(s)) # parse `snippet.json` and append it to `.tracks[]`
json.dump(data, u, indent=4) # encode the data back to JSON and write it to `out.json`

Related

Exception when trying to parse large JSON file using ijson

I am trying to parse a large JSON file (16GB) using ijson but I always get the following error :
Exception has occurred: IncompleteJSONError
lexical error: invalid char in json text.
venue" : { "type" : NumberInt(0) }, "yea
(right here) ------^
File "C:\pyth\dblp_parser.py", line 14, in <module>
for record in ijson.items(f, 'item', use_float=True):
My code is as follows:
with open("dblpv13.json", "rb") as f:
for record in ijson.items(f, 'records.item', use_float=True):
paper_id = record["_id"] #_id is only for test
paper_id_tab.append(paper_id)
A part of my json file is as follows:
{
"_id" : "53e99784b7602d9701f3f636",
"title" : "Flatlined",
"authors" : [
{
"_id" : "53f58b15dabfaece00f8046d",
"name" : "Peter J. Denning",
"org" : "ACM Education Board",
"gid" : "5b86c72de1cd8e14a3c2b772",
"oid" : "544bd99545ce266baef0668a",
"orgid" : "5f71b2811c455f439fe3c58a"
}
],
"venue" : {
"_id" : "555036f57cea80f954169e28",
"raw" : "Commun. ACM",
"raw_zh" : null,
"publisher" : null,
"type" : NumberInt(0)
},
"year" : NumberInt(2002),
"keywords" : [
"linear scale",
"false dichotomy"
],
"n_citation" : NumberInt(7),
"page_start" : "15",
"page_end" : "19",
"lang" : "en",
"volume" : "45",
"issue" : "6",
"issn" : "",
"isbn" : "",
"doi" : "10.1145/508448.508463",
"pdf" : "",
"url" : [
"http://doi.acm.org/10.1145/508448.508463"
],
"abstract" : "Our propensity to create linear scales between opposing alternatives creates false dichotomies that hamper our thinking and limit our action."
},
I tried to fill in records item by item but always the same error. I'm completely blocked.
Please, can any body help me?
The same problem happened to me with the said dataset. ijson can't handle it. I overcame the problem by creating another dataset and then parsing the new dataset with ijson. The approach is quite simple: read the orignal dataset with simple read; remove "NumberInt(" and ")", write the result to a new json file. the code is given below.
f=open('dblpv13_clean.json')
with open('dblpv13.json','r',errors='ignore') as myFile:
for line in myFile:
line=line.replace("NumberInt(","").replace(")","")
f.write(line)
f.close()
Now you can parse the new dataset with ijson as follows.
with open('dblpv13_clean.json', "r",errors='ignore') as f:
for i, element in enumerate(ijson.items(f, "item")):
do something....

Correctly referencing a JSON in Python. Strings versus Integers and Nested items

Sample JSON file below
{
"destination_addresses" : [ "New York, NY, USA" ],
"origin_addresses" : [ "Washington, DC, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "225 mi",
"value" : 361715
},
"duration" : {
"text" : "3 hours 49 mins",
"value" : 13725
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}
I'm looking to reference the text value for distance and duration. I've done research but i'm still not sure what i'm doing wrong...
I have a work around using several lines of code, but i'm looking for a clean one line solution..
thanks for your help!
If you're using the regular JSON module:
import json
And you're opening your JSON like this:
json_data = open("my_json.json").read()
data = json.loads(json_data)
# Equivalent to:
data = json.load(open("my_json.json"))
# Notice json.load vs. json.loads
Then this should do what you want:
distance_text, duration_text = [data['rows'][0]['elements'][0][key]['text'] for key in ['distance', 'duration']]
Hope this is what you wanted!

Jsonify data not returning to ajax call

I have an app where I am using flask, python, ajax, json, javascript, and leaflet. This app reads a csv file, puts it into json format, then returns it to an ajax call. My issue is that the geojson is not being returned. In the console, I am getting a 5000 NetworkError in the console log. The end result is to use the return geojson in a leaflet map layer. If I remove the jsonify, the return works fine, but it is a string of course, and this wont work for the layer.
As you can see, I have a simple alert("success") in the ajax success part. This is not being executed. Nor is the alert(data).
I do have jsonify in the from Flask import statement.
Thank you for the help
Ajax call
$.ajax({
type : "POST",
url : '/process',
data: {
chks: chks
}
})
.success(function(data){
alert("success"); // I am doing this just to get see if I get back here. I do not
alert(data);
python/flask
#app.route('/process', methods=['POST'])
def process():
data = request.form['chks']
rawData = csv.reader(open('static/csvfile.csv', 'r'), dialect='excel')
count = sum(1 for row in open('static/csvfile.csv))
template =\
''' \
{"type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [%s, %s]},
"properties" : {"name" : "%s" }
}%s
'''
output = \
''' \
{"type" : "Feature Collection",
"features" : [
'''
iter = 0
separator = ","
lastrow = ""
for row in rawData:
iter += 1 // this is used to skip the first line of the csv file
if iter >=2:
id = row[0]
lat = row[1]
long = row[2]
if iter != count:
output += template % (row[2], row[1], row[0], separator)
else:
output += template % (row[2], row[1], row[0], lastrow)
output += \
''' \
]}
'''
return jsonify(output)
More Info - taking David Knipe's info into hand, If I remove the jsonify from my return statement, it returns what I expect, and I can output the return in an alert. It looks like this
{ "type" : "Feature Collection",
"features" : [
{"type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ -86.28, 32.36]},
"properties" : {"name" : "Montgomery"}
},
{ "type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ -105.42, 40.30]},
"properties" : {"name" : "Boulder"}
},
]}
If I take that data and hard code it into the ajax success, then pass it to the leaflet layer code like this - it will work, and my points will be displayed on my map
...
.success(function(data){
var pointsHC= { "type" : "Feature Collection",
"features" : [
{"type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ -86.28, 32.36]},
"properties" : {"name" : "Montgomery"}
},
{ "type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ -105.42, 40.30]},
"properties" : {"name" : "Boulder"}
},
]};
// leaflet part
var layer = L.geoJson(pointsHC, {
pointToLayer: function(feature, latlng){
return L.circleMarker( ...
If I do not hard code and pass the data via a variable, it does not work, and I get and invalid geoJson object. I have tried it with both the final semi-colon removed and not removed, and no love either way
...
.success(function(data){
// leaflet part
var layer = L.geoJson(data, {
pointToLayer: function(feature, latlng){
return L.circleMarker( ...
So it works if you don't try to parse the JSON, but if you do then it fails. Your JSON is invalid:
As loganbertram pointed out, you're missing a " on "Feature Collection".
You're missing a " on "properties".
output = template % ... should be output += template % ... - you're appending to output, not replacing it.
the features array will have a trailing comma (unless it is empty).
Although actually in your code features will always be empty anyway: you set iter = 0, never change its value, and then don't do the output = ... bit because iter < 2.
Are you sure you actually want to use jsonify? As I understand it, that turns any object into a JSON string. But output is already a JSON string - or should be, if you fix the various bugs loganbertram and I have spotted. In that case the client-side code will not fail trying to parse JSON. But if you jsonify something that's already JSON, you'll get something like this:
"{\"type\" : \"Feature\",
\"geometry\" : {
...
which the javascript will then convert back to the original JSON string, instead of a JSON object.
Actually, it would be better to rewrite the whole thing so it constructs an object instead of a string, and then calls jsonify on that object. But I don't know enough Python to give more details easily.

python elasticsearch bulk index datatype

I am using the following code to create an index and load data in elastic search
from elasticsearch import helpers, Elasticsearch
import csv
es = Elasticsearch()
es = Elasticsearch('localhost:9200')
index_name='wordcloud_data'
with open('./csv-data/' + index_name +'.csv') as f:
reader = csv.DictReader(f)
helpers.bulk(es, reader, index=index_name, doc_type='my-type')
print ("done")
My CSV data is as follows
date,word_data,word_count
2017-06-17,luxury vehicle,11
2017-06-17,signifies acceptance,17
2017-06-17,agency imposed,16
2017-06-17,customer appreciation,11
The data loads fine but then the datatype is not accurate
How do I force it to say that the word_count is integer and not text
See how it figures out the date type ?
Is there a way it can figure out the int datatype automatically ? or by passing some parameter ?
Also what do I do to increase the ignore_above or remove it for some of the fields if I wanted to. basically no limit to the number of characters ?
{
"wordcloud_data" : {
"mappings" : {
"my-type" : {
"properties" : {
"date" : {
"type" : "date"
},
"word_count" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"word_data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
You need to create a mapping that would describe field types.
With the elasticsearch-py client this can be done using es.indices.put_mapping or index.create methods, by passing it JSON document that describes mappings, like shown in this SO answer. It would be something like this:
es.indices.put_mapping(
index="wordcloud_data",
doc_type="my-type",
body={
"properties": {
"date": {"type":"date"},
"word_data": {"type": "text"},
"word_count": {"type": "integer"}
}
}
)
However, I'd suggest to take a look at the elasticsearch-dsl package that provides much nicer declarative API to describe things. It would be something along those lines (untested):
from elasticsearch_dsl import DocType, Date, Integer, Text
from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk
connections.create_connection(hosts=["localhost"])
class WordCloud(DocType):
word_data = Text()
word_count = Integer()
date = Date()
class Index:
name = "wordcloud_data"
doc_type = "my_type" # If you need it to be called so
WordCloud.init()
with open("./csv-data/%s.csv" % index_name) as f:
reader = csv.DictReader(f)
bulk(
connections.get_connection(),
(WordCloud(**row).to_dict(True) for row in reader)
)
Please note, I haven't tried the code I've posted - just written it. Don't have an ES server at hand to test. There could be some small mistakes or typos there (please point out if there are), but the general idea should be correct.
Thanks. #drdaeman's Solution worked for me. Although, I thought it's worth mentioning that in elasticsearch-dsl 6+
class Meta:
index = "wordcloud_data"
doc_type = "my-type"
This snippet will raise cannot write to wildcard index exception. Change the following to,
class Index:
name = 'wordcloud_data'
doc_type = 'my_type'

Python json - editing a .json file

I am trying to add something to a .json file.
This is what saves
"106569102398611456" : {
"currentlocation" : "Pallet Town",
"name" : "Anthony",
"party" : [
{
"hp" : "5",
"level" : "1",
"pokemonname" : "bulbasaur"
}
],
"pokedollars" : 0
}
}
What I'm trying to do is make a command to add something else to the "party". Here is an example of what I want.
"106569102398611456" : {
"currentlocation" : "Pallet Town",
"name" : "Anthony",
"party" : [
{
"hp" : "5",
"level" : "1",
"pokemonname" : "bulbasaur"
},
{
"hp" : "3",
"level" : "1",
"pokemonname" : "squirtle"
}
],
"pokedollars" : 0
}
}
edit:
This is what I've attempted but I have no idea
def addPokemon(pokemon):
pokemonName = convert(pokemon)
for pokemon in players['party']:
pokemon.append(pokemonName)
convert(pokemon) basically grabs the pokemon i type in and change gives it a level and health to be added to the .json file
To update a JSON file, write out the object to a temporary file and then replace the target file with the temporary file. Example:
import json
import os
import shutil
import tempfile
def rewriteJsonFile(sourceObj, targetFilePath, **kwargs):
temp = tempfile.mkstemp()
tempHandle = os.fdopen(temp[0], 'w')
tempFilePath = temp[1]
json.dump(sourceObj, tempHandle, **kwargs)
tempHandle.close()
shutil.move(tempFilePath, targetFilePath)
This assumes that updates are happening serially. If updates are potentially happening in parallel, you'd need some kind of locking to ensure only one update is happening at a time. Although at that point, you're better off using a database like sqlite and returning queries in JSON format.

Categories

Resources