Altering JSON array using python


This is how I read from a .json file in the Ubuntu terminal:
python -c "import json; print(json.load(open('json_file.json'))['foo']['bar'])"
What I'd like to do is alter the JSON file by adding new objects and arrays. How can I do this in Python?
json_file.json:
{
"data1" :
[
{
"unit" : "Unit_1",
"value" : "20"
},
{
"unit" : "Unit_2",
"value" : "10"
}
]
}

First, create a new Python file:
import json
with open('json_file.json') as f:
    data = json.load(f)
The data is then just a bunch of nested dictionaries and lists.
You can modify it the same way you would modify any Python dictionary or list. This is one of Python's most basic features, so resources on it are easy to find. The official Python documentation has a complete reference, and if you are familiar with arrays/lists and associative arrays/hashes in any language, that should be enough to get you going. If not, look for a tutorial; if that still doesn't help and you can formulate a well-formed, specific question, you could ask it here.
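Once you are done, you can serialize everything back to JSON (to save the changes, write the result back to the file with json.dump() instead of printing it):
print(json.dumps(data))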
For more information on how to customize the output, and about the json module overall, see the documentation.
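As a minimal sketch of the full read, modify, write cycle for the json_file.json shown in the question: the function name add_unit and the extra top-level key "data2" are hypothetical, chosen only to illustrate appending to an existing array and adding a new one.

```python
import json


def add_unit(path, unit, value):
    """Append a {"unit": ..., "value": ...} object to "data1" and save the file."""
    # Read: the JSON file becomes ordinary Python dicts and lists.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    # Modify: append a new object to the existing "data1" array ...
    data["data1"].append({"unit": unit, "value": value})
    # ... and add a new top-level array if it is not there yet
    # ("data2" is a hypothetical key, for illustration only).
    data.setdefault("data2", [])

    # Write: overwrite the file with the modified structure.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data
```

Calling add_unit('json_file.json', 'Unit_3', '30') would leave three objects in "data1" plus an empty "data2" array. Writing to a temporary file and renaming it afterwards is a safer variant if the file must never be left half-written.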

Related

Adding new values with same keys to existing document in Firestore firebase without overwriting

I am trying to add data to a Firestore database without overwriting it. The data is in the format below, there are numerous other "Question" entries in the same format, and I want to add all of them to just one document.
{
"Question": String,
"Answer": String,
}
The same question has been asked here, but the answer covers Java, not Python. I have tried both updating and setting the document, but each call only overwrites it.
Note that all of my Questions are elements in a list in this format:
['{\n "Question": String,\n "Answer": String\n}', ...]
What I am currently doing in my code is going through the array and performing the code below:
doc_ref = db.collection(u"Questions").document(u"ques")
doc_ref.update(questionsAnswers)
but this only leaves me with the last question added to the database.

Use the update method to change the contents of an existing document as shown in the documentation.
city_ref = db.collection(u'your-collection').document(u'your-document')
city_ref.update({u'your-field': u'your-field-value'})
I also suggest reading the API documentation.
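The symptom in the question (only the last question survives) is consistent with every update() call writing the same "Question" and "Answer" fields, so each call replaces the previous values. One way around that, sketched below under that assumption, is to give each question its own field key; the key names q0, q1, ... and the helper names are hypothetical. doc_ref is the DocumentReference from the question. Another option is to keep all questions in a single array field and append with firestore.ArrayUnion from the client library.

```python
import json


def parse_questions(raw_items):
    """Turn JSON strings such as '{"Question": "...", "Answer": "..."}' into dicts."""
    return [json.loads(item) for item in raw_items]


def build_payload(questions, prefix="q"):
    """Map each question to its own field ("q0", "q1", ...).

    The field names are hypothetical. Because every question gets a distinct
    key, one update() call stores all of them instead of repeatedly
    overwriting the same "Question"/"Answer" fields.
    """
    return {f"{prefix}{i}": q for i, q in enumerate(questions)}


def upload(doc_ref, raw_items):
    """doc_ref is a Firestore DocumentReference, e.g.
    db.collection(u"Questions").document(u"ques") from the question."""
    payload = build_payload(parse_questions(raw_items))
    # update() writes only the listed fields; other fields in the document are kept.
    doc_ref.update(payload)
    return payload
```

Note that update() fails if the document does not exist yet; doc_ref.set(payload, merge=True) creates it if needed and otherwise merges the fields.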

Work with nested objects using couchdb-python

Disclaimer: Both Python and CouchDB are new for me. So far my "programming" has mostly consisted of Bash scripts.
I'm trying to create a small script that updates objects in a CouchDB database. The objects, however, aren't created by my script but by an app called Tap Forms that uses CouchDB for sync. Basically, I'm trying to automatically update the content of the app. That also means I can't really influence the structure or names of the objects in CouchDB.
The database is mostly filled with objects of this structure:
{
"_id": "rec-3b17...",
"_rev": "21-cdf6...",
"values": {
"fld-c3d4...": 4,
"fld-1def...": 1000000000000,
"fld-bb44...": 760000000000,
"fld-a44f...": "admin,name",
"fld-5fc0...": "SSD",
"fld-642c...": true
},
"deviceName": "MacBook Air",
"dateModified": "2019-02-08T14:47:06.051Z",
"dateCreated": "2019-02-08T11:33:00.018Z",
"type": "frm-7ff3...",
"dbID": "db-1435...",
"form": "frm-7ff3..."
}
I shortened the numbers a bit and removed some entries to increase readability.
Now the actual values I'm trying to update are within the "values" : {...} array (or object, or list, guess I don't have much experience with JSON either).
Since I know some of these values, I managed to create a view that finds the _id of an object on the server. I then use the python-couchdb module as described in the documentation:
for item in db.view('CustomViews/test2', key="GENERIC"):
    doc = db[item.id]
This gives me the object. However, I want to update one of the values within the values array, let's say fld-c3d4.... But how? Using doc['values'] = 'new_value' replaces the whole array. I tried other (seemingly logical) ways along the lines of doc['values['fld-c3d4']'] = 'new_value' but couldn't wrap my head around it. I couldn't find an example in any documentation.

Here's an example of how to update fld-c3d4.
Your document is a dictionary that contains a nested dictionary.
If you want to get the values, you will do something like this:
values = doc['values']
Now the variable values points to the values in your document.
From there, you can access a sub value:
values['fld-c3d4'] = 'new value'
If you want to directly update the value from the doc, you just have to chain those operations:
doc['values']['fld-c3d4'] = 'new value'
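The assignment above changes only the local copy of the document; it still has to be written back to CouchDB. A minimal sketch, assuming db is a couchdb-python Database and using its save() method; the function name is hypothetical, and field_id must be the full "fld-..." name, which is shortened in the question.

```python
def update_values_field(db, view_name, key, field_id, new_value):
    """Set doc["values"][field_id] = new_value for every document the view returns.

    db is a couchdb-python Database (e.g. couchdb.Server(url)[db_name]).
    """
    updated = 0
    for row in db.view(view_name, key=key):
        doc = db[row.id]                     # fetch the full document, including _rev
        doc["values"][field_id] = new_value  # change only the nested entry
        db.save(doc)                         # write it back; CouchDB rejects a stale _rev
        updated += 1
    return updated
```

For example, update_values_field(db, 'CustomViews/test2', 'GENERIC', 'fld-c3d4', 'new value') would update every document the view matches. If Tap Forms modifies a document between the fetch and the save, the save raises a conflict error, and the document has to be fetched again.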

Troubleshoot JSON Parsing/Adding Property

I have a json whose first few lines are:
{
"type": "Topology",
"objects": {
"counties": {
"type": "GeometryCollection",
"bbox": [-179.1473399999999, 17.67439566600018, 179.7784800000003, 71.38921046500008],
"geometries": [{
"type": "MultiPolygon",
"id": 53073,
"arcs": [
[
[0, 1, 2]
]
]
},
I built a python dictionary from that data as follows:
import json
with open('us.json') as f:
    data = json.load(f)
It's a very long JSON file (one entry for each county in the US). Yet when I run len(data), it returns 4. I was a bit confused by that, so I set out to probe further and explore the data:
data['id']
data['geometry']
both of which raise a KeyError. Yet I know that this JSON file defines those properties. In fact, that's all the JSON is: the id for each county ('id') and a series of polygon coordinates for each county ('geometry'). Entering data does indeed return the whole JSON, and I can see the properties that way, but that doesn't help much.
My ultimate aim is to add a property to the json file, somewhat similar to this:
Add element to a json in python
The difference is I'm adding a property that is from a tsv. If you'd like all the details you may find my json and tsv here:
https://gist.github.com/diggetybo/ca9d3c2fed76ddc7185cf966a65b8718
For clarity, let me summarize what I'm asking:
My question is: why can't I access the properties in the above way? Can someone show how to access the properties I'm interested in ('id', 'geometries'), or, better yet, demonstrate how to add a property?
Thank you

json.load
Deserialize fp (a .read()-supporting file-like object containing a
JSON document) to a Python object using this conversion table.
[] are for lists and {} are for dictionaries. So here is an example that prints each id:
with open("us.json") as f:
    c = json.load(f)
for i in c["objects"]["counties"]["geometries"]:
    print(i["id"])
And the structure of your data is like this:
{
"type":"xx",
"objects":"xx",
"arcs":"xx",
"transform":"xx"
}
So the length of data is 4. You can append data or add a new element just as you would with a list or dict. See the json module documentation for more details.
Hope this helps.
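To add a property taken from a TSV, one approach is to build a lookup table keyed by county id and attach the value to each geometry's "properties" member, which TopoJSON geometry objects may carry. This is a sketch only: the TSV lives in the linked gist, so the column names "id" and "rate" are hypothetical placeholders.

```python
import csv


def add_property_from_tsv(topo, tsv_lines, id_column="id", value_column="rate"):
    """Copy one TSV column onto each county geometry as a property.

    id_column and value_column are hypothetical header names; use the ones in
    your TSV. Geometries whose id is missing from the TSV are left unchanged.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    # County ids in the JSON are integers (e.g. 53073), so convert the TSV ids too.
    lookup = {int(row[id_column]): row[value_column] for row in reader}

    for geometry in topo["objects"]["counties"]["geometries"]:
        value = lookup.get(geometry.get("id"))
        if value is not None:
            # TopoJSON geometry objects may carry a "properties" member.
            geometry.setdefault("properties", {})[value_column] = value
    return topo
```

Usage would be along the lines of opening the TSV with open('us.tsv', newline='') (a hypothetical file name), passing the file object as tsv_lines, and then saving data with json.dump(data, out_file).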

Pythonic way to import multiple dictionaries from text file

So I have a text file,
question_one = {question:"what is 2+2", answer: "4", fake1: "5"}
question_two = {question:"what is the meaning of life?", answer:"pizza", fake:"42"}
How can I then import these dictionaries so that I could use them like this,
print(question_one["question"])
print(question_two["question"])
So the outcome would be:
what is 2+2
what is the meaning of life?
I would like this so that I can add questions to the text file from within the program and save them whenever I add more. If this is possible another way, please let me know!

The simplest way would be to store your questions in a JSON file, as Thom Wiggers suggests.
Here's an example:
[
{
"question": "what is 2+2",
"answer": "4",
"fake1": "5"
},
{
"question": "what is the meaning of life?",
"answer": "pizza",
"fake1": "42"
}
]
import json
with open('questions.json') as f:
    questions = json.load(f)
for question in questions:
    print(question['question'])
You can read more about the JSON module in the official documentation.
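Since the goal is also to add questions from within the program and save them, here is a minimal sketch that loads the list, appends a question, and writes the file back; the function names are hypothetical and the keys follow the example JSON above.

```python
import json
import os


def load_questions(path):
    """Return the list stored in path, or an empty list if the file doesn't exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def add_question(path, question, answer, fake):
    """Append one question dict to the JSON file and save it back."""
    questions = load_questions(path)
    questions.append({"question": question, "answer": answer, "fake1": fake})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(questions, f, indent=2)
    return questions
```

After add_question('questions.json', 'what is 2+2', '4', '5'), the loop above prints the new question along with any existing ones.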

If you only want to serialize data, you want to use pickle or json. exec will execute all Python code, and can be a serious security problem.
pickle is faster and is specifically tailored to Python, while json can be read and written by just about any programming language and is still fairly human-readable and human-editable.
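A small side-by-side round trip of the two serializers, using one of the question's dictionaries (with quoted keys):

```python
import json
import pickle

questions = [{"question": "what is 2+2", "answer": "4", "fake1": "5"}]

# json produces text that other languages (and people) can read and edit.
as_json = json.dumps(questions)

# pickle produces Python-specific bytes. Like exec(), loading a pickle can run
# arbitrary code, so only unpickle data you trust.
as_pickle = pickle.dumps(questions)

restored_from_json = json.loads(as_json)
restored_from_pickle = pickle.loads(as_pickle)
```

Both restore an equal list of dicts here; the difference lies in the format on disk and in who can safely read it.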
Now, to answer the question as you asked it (you probably don't want to do this):
You can use exec()
This function supports dynamic execution of Python code. object must
be either a string or a code object. If it is a string, the string is
parsed as a suite of Python statements which is then executed (unless
a syntax error occurs).
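For example (note that the keys in data.txt must be quoted strings, such as "question", for the file to be valid Python; unquoted names would raise a NameError):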
exec(open('data.txt', 'r').read())
Another way to do this would be to (ab)use import, assuming your file is named data.py:
import data
data.question_one['question']
This is obviously not what import was intended for... I've 'used' import like this in the past, and regretted it (there are a number of caveats, I'll leave it as an exercise to the reader to think about what they might be).
Warning: Both are eval-like and should be used with care: any Python code in data.txt will be executed, which is potentially dangerous. Be very sure you trust the source of whatever you pass to exec(), and don't use either approach if you only want to serialize data (instead of running Python code as such).

What is the best way to search millions of JSON files?

I've very recently picked up programming in Python and am working on creating a database.
I've already worked out extracting all these files from their source so they are all in a directory on my computer.
All of these files are structured the same way and what I want to do is search these multidimensional dictionaries and locate the value for a specific set of keys.
These json files are all structured similarly,
{
"userid": 34535367,
"result": {
"list": [
{
"name": 264,
"age": 64,
"id": 456345345
},
{
"name": 263,
"age": 42,
"id": 364563463456
}
]
}
}
In my case, I would like to search for the "name" key and return the relevant data (quality, id, and the original userid) for the thousands of names just like it from my millions of JSON files.
Basically I'm very new at this and the little programming knowledge I have is in Python. I'm happy to start learning whatever I need to, but I'm not sure which direction to go.

If your goal is to create a database, then you should look on how databases work and solve the same problem you are trying to solve right now :)
NoSQL databases (like MongoDB) also work with JSON documents and typically implement a whole set of tools to search and filter documents.
Now, to answer your question: there is no quick way to do this unless you do some preprocessing, meaning that you store additional information about the data (called metadata).
This is a huge subject and I don't have enough expertise to give you all the answers, but I can give you a simple tip: Use indexes.
An index is a sorted key/value map where, for every value, we store the documents that contain that value (or the file and position of the JSON document). For example, an index for the name property would look like this:
{
263: ('jsonfile10.json', '0'),
264: ('jsonfile10.json', '30'),  # the JSON document is in jsonfile10.json at line 30
}
By keeping an index on the most frequently queried values, you can turn a linear-time search into a logarithmic-time search, and adding a new document only requires a cheap update to the index. In your case, you seem to need an index only on the name field.
Creating/updating the index is done when you insert, update or remove a document. Using a balanced binary tree can accelerate the updates on the index.
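A minimal in-memory sketch of such a name index, assuming every file follows the sample structure in the question: a sorted list of tuples searched with the standard bisect module stands in for the balanced tree, and the function names are hypothetical. For millions of files you would persist the index (or use a database) rather than rebuild it on every run.

```python
import bisect
import json
from pathlib import Path


def build_name_index(directory):
    """Scan every *.json file once and return a sorted list of index entries.

    Each entry is (name, userid, item_id, age, file_path); the keys follow the
    sample document in the question.
    """
    index = []
    for path in Path(directory).glob("*.json"):
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
        for item in doc["result"]["list"]:
            index.append((item["name"], doc["userid"], item["id"], item["age"], str(path)))
    index.sort()  # sorting by name makes binary search possible
    return index


def lookup(index, name):
    """Return all entries for name; finding the first match is O(log n)."""
    i = bisect.bisect_left(index, (name,))  # (name,) sorts before any (name, ...) tuple
    hits = []
    while i < len(index) and index[i][0] == name:
        hits.append(index[i])
        i += 1
    return hits
```

Each hit carries the original userid, the item id, and the file it came from, so the matching JSON document can be reopened if more fields are needed.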

As a suggestion, why don't you just process all the incoming files and insert the data into a database? You will have a toolset to query that database. SQLite for example will do (as well as any other more sophisticated database):
http://www.sqlite.org/
http://docs.python.org/2/library/sqlite3.html
Another simple solution might be to build a file mapping name_id to /file/path. Then you can look up a name ID with a logarithmic-time binary search. But I'd still advise using a proper database, as maintaining the index yourself will be more cumbersome than doing some inserts/deletes.
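A minimal sketch of the SQLite route with Python's built-in sqlite3 module, assuming the sample file structure from the question; the table, column, index, and function names are hypothetical. The index on name lets SQLite answer lookups without scanning every row.

```python
import json
import sqlite3
from pathlib import Path


def load_into_sqlite(conn, directory):
    """Insert one row per entry in each file's result.list."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               name INTEGER, age INTEGER, item_id INTEGER, userid INTEGER, source TEXT)"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_name ON entries(name)")
    for path in Path(directory).glob("*.json"):
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
            [(it["name"], it["age"], it["id"], doc["userid"], str(path))
             for it in doc["result"]["list"]],
        )
    conn.commit()


def find_by_name(conn, name):
    """Return (userid, item_id, age, source) rows for one name."""
    return conn.execute(
        "SELECT userid, item_id, age, source FROM entries WHERE name = ?", (name,)
    ).fetchall()
```

With conn = sqlite3.connect('entries.db') (a hypothetical file name), the import runs once and later queries reuse the database instead of rereading millions of files.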
