Work with nested objects using couchdb-python


Disclaimer: Both Python and CouchDB are new for me. So far my "programming" has mostly consisted of Bash scripts.
I'm trying to create a small script that updates objects in a CouchDB database. The objects however aren't created by my script but by an App called Tap Forms that uses CouchDB for sync. Basically I'm trying to automatically update the content of the app. That also means I can't really influence the structure or names of the objects in CouchDB.
The Database is mostly filled with objects of this structure:
{
    "_id": "rec-3b17...",
    "_rev": "21-cdf6...",
    "values": {
        "fld-c3d4...": 4,
        "fld-1def...": 1000000000000,
        "fld-bb44...": 760000000000,
        "fld-a44f...": "admin,name",
        "fld-5fc0...": "SSD",
        "fld-642c...": true
    },
    "deviceName": "MacBook Air",
    "dateModified": "2019-02-08T14:47:06.051Z",
    "dateCreated": "2019-02-08T11:33:00.018Z",
    "type": "frm-7ff3...",
    "dbID": "db-1435...",
    "form": "frm-7ff3..."
}
I shortened the numbers a bit and removed some entries to increase readability.
Now the actual values I'm trying to update are within the "values": {...} object (or array, or list; I guess I don't have much experience with JSON either).
As I know some of these values, I managed to create a view that finds the _id of an object on the server. I then use the couchdb-python module as described in the documentation:
for item in db.view('CustomViews/test2', key="GENERIC"):
    doc = db[item.id]
This gives me the object. However, I want to update just one of the values within the values array, let's say fld-c3d4.... But how? Using doc['values'] = 'new_value' replaces the whole array. I tried other (seemingly logical) ways along the lines of doc['values['fld-c3d4']'] = 'new_value' but couldn't wrap my head around it. I couldn't find an example in any documentation.

Here's an example of how to update fld-c3d4.
Your document is a dictionary that contains a nested dictionary.
To get the values, you would do something like this:
values = doc['values']
Now the variable values points to the values in your document.
From there, you can access a sub value:
values['fld-c3d4'] = 'new value'
If you want to directly update the value from the doc, you just have to chain those operations:
doc['values']['fld-c3d4'] = 'new value'
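As a minimal sketch of the chained access, the snippet below uses a plain dict as a stand-in for the document returned by db[item.id], on the assumption that the document behaves like a regular dictionary. The new value 5 is just a placeholder. Note that changing the document in Python does not change anything on the server by itself; with couchdb-python the modified document would still have to be written back, for example with db.save(doc).

```python
# Plain dict standing in for the document returned by db[item.id]
doc = {
    "_id": "rec-3b17...",
    "values": {
        "fld-c3d4...": 4,
        "fld-5fc0...": "SSD",
    },
}

# Chained lookup: first the nested "values" dictionary, then the field in it
doc["values"]["fld-c3d4..."] = 5  # placeholder value

# Only the targeted field changed; the other entries are untouched
print(doc["values"])

# Against a real database the change would still need to be saved,
# e.g. with db.save(doc)
```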

Related

How to add document for search in marqo

I recently started using the marqo library and I am trying to add a document so that marqo can search it and return the relevant part, but I keep getting an error when I run the code.
I used the add_documents() method and passed the document as a string, but it returns an error. Here is what my code looks like:
import marqo
DOCUMENT = 'the document'
mq = marqo.Client(url='http://localhost:8882')
mq.index("my-first-index").add_documents(DOCUMENT)
When I run it, I get a MarqoWebError.

You are getting the error because the add_documents() method takes a list of Python dictionaries as its argument, not a string, so you need to pass the document as the value of a key of your choice. It is also advisable to add a title and an id for later reference. Here is what I mean:
mq.index("my-first-index").add_documents([
    {
        "Title": the_title_of_your_document,
        "Description": your_document,
        "_id": your_id,
    }
])
The id can be any string of your choice. You can add as many dictionaries as you want to the list; each dictionary represents one document.
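A small sketch of turning raw strings into that list-of-dictionaries shape. The helper name build_documents and the generated titles and ids are made up for illustration; only the "Title", "Description" and "_id" keys come from the answer above.

```python
def build_documents(texts):
    """Wrap each raw string in a dictionary, one dictionary per document."""
    return [
        {"Title": f"Document {i}", "Description": text, "_id": f"doc-{i}"}
        for i, text in enumerate(texts, start=1)
    ]

documents = build_documents(["the document", "another document"])

# With a running Marqo server, the list could then be passed on, e.g.:
# mq.index("my-first-index").add_documents(documents)
```

Depending on the Marqo version, add_documents may expect further arguments (for instance tensor_fields), so it is worth checking the API reference for the installed release.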

I think the documents need to be a list of dicts. See here https://marqo.pages.dev/API-Reference/documents/

Firebase Realtime Database - cannot read document as json/dictionary

In my realtime database I have a path /stats which contains a set of documents.
Using the Python SDK, I want to get the /stats document as a dict. My code looks like this:
path = "/stats"
ref = db.reference(path, firebase_app)
document = ref.get()
print(document)
And the output is
[None, {'name': 'Full Time Statistics', 'thumbnail': 'https://***', 'url': 'https://***'}]
which is a list, not a dictionary. How can I change this and read the path as a dictionary, something like this:
{"1": {'name': 'Full Time Statistics', 'thumbnail': 'https://***', 'url': 'https://***'}}
On the other hand I can get other documents with similar structures as a dictionary with no issue. Why is it like that and how to solve it ?

Two things are happening here:
Since you are retrieving /stats you are getting all nodes under it. Since this is a repeated list and Firebase Realtime Database keys are strings, you'd normally get a dictionary (with the keys in the dictionary being the keys in the JSON).
Since your keys are numeric values, Firebase "thinks" you are trying to store an array/list and it tries to coerce the data into an array for you. That's why you get a None entry in the list: that's Firebase filling in the zeroth element for you.
There's unfortunately no way to disable this array coercion. I typically get around it by prefixing the keys with a fixed string, so that Firebase bypasses its array logic. So:
stats: {
    stat1: { ... },
    stat2: { ... }
}
Also see:
Best Practices: Arrays in Firebase
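If renaming the keys is not an option, the coerced list can also be converted back on the client. This is only a sketch: the helper name list_to_dict is hypothetical, and it assumes that None entries are gaps filled in by the coercion rather than real data.

```python
def list_to_dict(snapshot):
    """Turn a coerced list back into a dict keyed by the original string index."""
    if isinstance(snapshot, list):
        # Skip the None placeholders inserted for missing indices
        return {str(i): item for i, item in enumerate(snapshot) if item is not None}
    return snapshot  # already a dict (or a scalar), so leave it alone

document = [None, {"name": "Full Time Statistics"}]
as_dict = list_to_dict(document)
```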

Adding new values with same keys to existing document in Firestore firebase without overwriting

I am trying to add data to the Firestore database without overwriting it. The data is in the format written below and has numerous other "Question" in the same format and I want to add this to just one document.
{
"Question": String,
"Answer": String,
}
The same question has been asked here, but it is answered for Java and not for Python. I have tried both updating and setting the document, but each attempt only overwrites it.
Note that all of my Questions are elements in a list in this format:
['{\n "Question": String,\n "Answer":String \n}, ...]
What I am currently doing in my code is going through the array and performing the code below:
doc_ref = db.collection(u"Questions").document(u"ques")
doc_ref.update(questionsAnswers)
but this only leaves me with the last question added to the database.

Use the update method to change the contents of an existing document as shown in the documentation.
city_ref = db.collection(u'your-collection').document(u'your-document')
city_ref.update({u'your-field': u'your-field-value'})
I suggest also using the API documentation.
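To see why only the last question survives, here is a sketch that uses a plain dict as a stand-in for the stored document, on the assumption that update merges top-level fields so writing the same keys again replaces their values. The field name "questions" is hypothetical.

```python
questions = [
    {"Question": "Q1", "Answer": "A1"},
    {"Question": "Q2", "Answer": "A2"},
]

# Stand-in for the stored document: every pair reuses the same two keys,
# so each update replaces the values written by the previous one.
stored = {}
for qa in questions:
    stored.update(qa)

# One way around it: keep all pairs under a single list field.
payload = {"questions": questions}
# With a real client this could be written in one go, e.g. doc_ref.set(payload)
```

For appending to an existing array field later on, the google-cloud-firestore client also offers firestore.ArrayUnion, which can be passed as a field value to update.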

Structuring Firebase Database

I'm following this tutorial to structure Firebase data. Near the end, it says the following:
With this kind of structure, you should keep in mind to update the data at 2 locations under the user and group too. Also, I would like to notify you that everywhere on the Internet, the object keys are written like "user1","group1","group2" etc. where as in practical scenarios it is better to use firebase generated keys which look like '-JglJnGDXcqLq6m844pZ'. We should use these as it will facilitate ordering and sorting.
So based on that, I'm assuming that the final result should be the following:
I'm using this python wrapper to post the data.
How can I achieve this?

When you write data to a Firebase array (for example in Javascript) using a line like this
var newPostKey = firebase.database().ref().child('users').push().key;
var updates = {item1: value1, item2: value2};
return firebase.database().ref().update(updates);
As described here, you will get a generated key for the "pushed" data. In the example above, newPostKey will contain this generated key.
UPDATE
To answer the updated question with the Python wrapper:
Look for the section "Saving Data" in the page you linked to.
The code would look something like this:
data = {"Title": "The Animal Book"}
book = db.child("AllBooks").push(data)
data = {"Title": "Animals"}
category = db.child("Categories").push(data)
data = {category['name']: True}
db.child("AllBooks").child(book['name']).child("categories").push(data)

What is the best way to search millions of JSON files?

I've very recently picked up programming in Python and am working on creating a database.
I've already worked out extracting all these files from their source so they are all in a directory on my computer.
All of these files are structured the same way and what I want to do is search these multidimensional dictionaries and locate the value for a specific set of keys.
These json files are all structured similarly,
{
"userid": 34535367,
"result": {
"list": [
{
"name": 264,
"age": 64,
"id": 456345345
},
{
"name": 263,
"age": 42,
"id": 364563463456
}
]
}
}
In my case, I would like to search for the "name" key and return the relevant data(quality, id and the original userid) for the thousands of names just like it from my millions of JSON files.
Basically I'm very new at this and the little programming knowledge I have is in Python. I'm happy to start learning whatever I need to, but I'm not sure which direction to go.

If your goal is to create a database, then you should look at how databases work, since they solve the same problem you are trying to solve right now :)
NoSQL databases (like MongoDB) also work with JSON documents and most likely implement a whole set of tools to search and filter documents.
Now, to answer your question: there is no quick way to do this unless you do some preprocessing, meaning that you store additional information about the data (called metadata).
This is a huge subject and I don't have enough expertise to give you all the answers, but I can give you a simple tip: use indexes.
An index is a sorted key/value map where, for every value, we store the documents that contain that value (or the file plus the position of the JSON document). For example, an index for the name property would look like this:
{
    263: ('jsonfile10.json', '0'),
    264: ('jsonfile10.json', '30'),
    # The JSON document can be found in the jsonfile10.json file on line 30
}
By keeping an index for the most queried values, you can turn a linear-time search into a logarithmic-time search, at the cost of some extra work each time a document is inserted. In your case, you seem to need an index only on the name field.
Creating/updating the index is done when you insert, update or remove a document. Using a balanced binary tree can accelerate the updates on the index.
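A rough sketch of such an index over the question's document layout. It uses an in-memory Python dict (a hash map, so lookups are constant time on average rather than logarithmic) instead of a sorted map or balanced tree, and it stores the matching entries directly instead of file positions. The function name build_name_index is made up for the example.

```python
import json
from collections import defaultdict

def build_name_index(raw_documents):
    """Map each "name" value to the (userid, entry) pairs that contain it."""
    index = defaultdict(list)
    for raw in raw_documents:
        doc = json.loads(raw)
        for entry in doc["result"]["list"]:
            index[entry["name"]].append((doc["userid"], entry))
    return index

sample = json.dumps({
    "userid": 34535367,
    "result": {"list": [
        {"name": 264, "age": 64, "id": 456345345},
        {"name": 263, "age": 42, "id": 364563463456},
    ]},
})

index = build_name_index([sample])
matches = index.get(264, [])  # every entry whose "name" is 264
```

For millions of files the index would have to be persisted rather than rebuilt on each run, which is where a real database becomes the more practical choice.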

As a suggestion, why don't you just process all the incoming files and insert the data into a database? You would then have a toolset for querying it. SQLite, for example, would do (as would any more sophisticated database):
http://www.sqlite.org/
http://docs.python.org/2/library/sqlite3.html
Another simple solution might be to build a file that maps name_id to /file/path. You can then find a name id with a binary search in logarithmic time. But I'd still advise using a proper database, as maintaining the index yourself will be more cumbersome than doing some inserts/deletes.
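A minimal sketch of the SQLite route using Python's built-in sqlite3 module and an in-memory database. The table name entries, the index name and the column layout are assumptions based on the sample document in the question.

```python
import json
import sqlite3

sample = json.dumps({
    "userid": 34535367,
    "result": {"list": [
        {"name": 264, "age": 64, "id": 456345345},
        {"name": 263, "age": 42, "id": 364563463456},
    ]},
})

conn = sqlite3.connect(":memory:")  # use a file path to keep the data
conn.execute(
    "CREATE TABLE entries (userid INTEGER, name INTEGER, age INTEGER, id INTEGER)"
)
# The index lets SQLite look up a name without scanning the whole table
conn.execute("CREATE INDEX idx_entries_name ON entries (name)")

doc = json.loads(sample)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [(doc["userid"], e["name"], e["age"], e["id"]) for e in doc["result"]["list"]],
)
conn.commit()

rows = conn.execute(
    "SELECT userid, age, id FROM entries WHERE name = ?", (264,)
).fetchall()
```

The same insert loop could be run once over every file in the directory, after which all lookups go through SQL instead of reopening the JSON files.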
