Django: How to populate database from JSON - python

I would like to populate my model's DB from this JSON:
{
"pk": 1,
"model": "companies.Company",
"fields": {
"name": "Google",
"site": "google.com"
}
}{
"pk": 2,
"model": "companies.Company",
"fields": {
"name": "Zoho",
"site": "zoho.com",
}
}{
"pk": 3,
"model": "companies.Company",
"fields": {
"name": "Digg",
"site": "digg.com",
}
}{
I've made my JSON the way the documentation describes, but I'm not sure what to do from here.
If anyone knows the next step, I would love some help! Happy to answer any questions about this!
EDIT:
I was told to run
./manage.py loaddata companies.json
When I ran that I got:
' django.core.serializers.base.DeserializationError: Problem
installing fixture 'PATH_TO_FILE/companies/fixtures/companies.json':
Extra data: line 21 column 2 - line 5586860 column 6 (char 909 -
249730297)'
"line 21 column 2 - line 5586860 column 6 (char 909 - 249730297)" being the last character in the file. I also tried removing one whole entry from the model (to rule out the last entry being malformed), but I again got the same error, still referring to the last character in the file.
EDIT 2
Lines 20 and 21 are simply where the first entry ends and the second one begins (line 20 is the last line of the first entry in the example above):
Line 20: " }"
Line 21: "}{"
P.S. The reason it's lines 20 and 21 is that each entry actually has more fields than just name and site, the ones shown in the question.

That's not valid JSON; you can't have a closing brace immediately followed by an opening brace. You need a comma between the objects, and for that to be valid the whole file must be enclosed in [...], making it a single JSON array.

With that file inside your "companies/fixtures" directory, you should just have to run
./manage.py loaddata your-fixture-filename.json
Here is the fixed JSON from your example:
[
{
"pk": 1,
"model": "companies.Company",
"fields": {
"name": "Google",
"site": "google.com"
}
},
{
"pk": 2,
"model": "companies.Company",
"fields": {
"name": "Zoho",
"site": "zoho.com"
}
},
{
"pk": 3,
"model": "companies.Company",
"fields": {
"name": "Digg",
"site": "digg.com"
}
}
]
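If a fixture still fails to load, it helps to confirm the file is valid JSON before blaming loaddata. Here is a minimal sanity check using a trimmed two-entry version of the fixture above; in practice you would json.load() the real file from your companies/fixtures/ directory:

```python
import json

# Trimmed two-entry version of the fixed fixture above; in practice
# you would read the real file with json.load(open(path)).
fixture = """[
  {"pk": 1, "model": "companies.Company",
   "fields": {"name": "Google", "site": "google.com"}},
  {"pk": 2, "model": "companies.Company",
   "fields": {"name": "Zoho", "site": "zoho.com"}}
]"""

# json.loads raises JSONDecodeError with a line/column position
# (like the one in the question) if the document is malformed.
entries = json.loads(fixture)
print(len(entries), "entries, first model:", entries[0]["model"])
```

If this parses cleanly but loaddata still fails, the problem is in the fixture's contents (model labels, pks) rather than its syntax.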

Related

What happens to a $match term in a pipeline?

I'm a newbie to MongoDB and Python scripts, and I'm confused about how a $match term is handled in a pipeline.
Let's say I manage a library, where books are tracked as JSON files in a MongoDB. There is one JSON for each copy of a book. The book.JSON files look like this:
{
"Title": "A Tale of Two Cities",
"subData":
{
"status": "Checked In"
...more data here...
}
}
Here, status will be one string from a finite set of strings, perhaps just { "Checked In", "Checked Out", "Missing", etc. }. But note also that there may not be a status field at all:
{
"Title": "Great Expectations",
"subData":
{
...more data here...
}
}
Okay: I am trying to write a MongoDB pipeline within a Python script that does the following:
For each book in the library:
Groups and counts the different instances of the status field
So my target output from my Python script would be something like this:
{ "A Tale of Two Cities" 'Checked In' 3 }
{ "A Tale of Two Cities" 'Checked Out' 4 }
{ "Great Expectations" 'Checked In' 5 }
{ "Great Expectations" '' 7 }
Here's my code:
mydatabase = client.JSON_DB
mycollection = mydatabase.JSON_all_2
listOfBooks = mycollection.distinct("bookname")
for book in listOfBooks:
    match_variable = {
        "$match": { 'Title': book }
    }
    group_variable = {
        "$group": {
            '_id': '$subdata.status',
            'categories': { '$addToSet': '$subdata.status' },
            'count': { '$sum': 1 }
        }
    }
    project_variable = {
        "$project": {
            '_id': 0,
            'categories': 1,
            'count': 1
        }
    }
    pipeline = [
        match_variable,
        group_variable,
        project_variable
    ]
    results = mycollection.aggregate(pipeline)
    for result in results:
        print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
As you can probably tell, I have very little idea what I'm doing. When I run the code, I get an error because I'm trying to reference my $match term:
Traceback (most recent call last):
File "testScript.py", line 34, in main
print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
KeyError: 'Title'
So is the $match term not included in the pipeline? Or am I failing to include it in the group_variable or project_variable?
And on a general note, the above seems like a lot of code to do something relatively easy. Does anyone see a better way? It's easy to find simple examples online, but this is one step of complexity beyond anything I can locate. Thank you.
Here's one aggregation pipeline to "$group" all the books by "Title" and "subData.status".
db.collection.aggregate([
{
"$group": {
"_id": {
"Title": "$Title",
"status": {"$ifNull": ["$subData.status", ""]}
},
"count": { "$count": {} }
}
},
{ // not really necessary, but puts output in predictable order
"$sort": {
"_id.Title": 1,
"_id.status": 1
}
},
{
"$replaceWith": {
"$mergeObjects": [
"$_id",
{"count": "$count"}
]
}
}
])
Example output for one of the "books":
{
"Title": "mumblecore",
"count": 3,
"status": ""
},
{
"Title": "mumblecore",
"count": 3,
"status": "Checked In"
},
{
"Title": "mumblecore",
"count": 8,
"status": "Checked Out"
},
{
"Title": "mumblecore",
"count": 6,
"status": "Missing"
}
Try it on mongoplayground.net.
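If you want to sanity-check what that pipeline computes without a running MongoDB instance, the same group-and-count can be reproduced in plain Python. The documents here are stand-ins shaped like the question's book JSON, and dict.get with a default mirrors the $ifNull fallback:

```python
from collections import Counter

# Stand-in documents shaped like the book JSON in the question.
docs = [
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked Out"}},
    {"Title": "Great Expectations", "subData": {}},  # no status field
]

# Group by (Title, status), counting occurrences; a missing status
# becomes "" just as $ifNull maps it in the pipeline.
counts = Counter(
    (doc["Title"], doc.get("subData", {}).get("status", ""))
    for doc in docs
)

for (title, status), count in sorted(counts.items()):
    print(title, repr(status), count)
```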

How to restructure a collection in MongoDB

I'm looking to restructure my MongoDB collection and haven't been able to do so. I'm quite new to it and looking for some help. I'm struggling to access and move the data within the "itemsList" field.
My collection documents are currently structured like this:
{
"_id": 1,
"pageName": "List of Fruit",
"itemsList":[
{
"myID": 101,
"itemName": "Apple"
},
{
"myID": 102,
"itemName": "Orange"
}
]
},
{
"_id": 2,
"pageName": "List of Computers",
"itemsList":[
{
"myID": 201,
"itemName": "MacBook"
},
{
"myID": 202,
"itemName": "Desktop"
}
]
}
The end result
But I would like the data to be restructured so that the value for "itemName" is its own document.
I would also like to change the name of "myID" to "itemID".
And save the new documents to another collection.
{
"_id": 1,
"itemName": "Apple",
"itemID": 101,
"pageName": "List of Fruit"
},
{
"_id": 2,
"itemName": "Orange",
"itemID": 102,
"pageName": "List of Fruit"
},
{
"_id": 3,
"itemName": "MacBook",
"itemID": 201,
"pageName": "List of Computers"
},
{
"_id": 4,
"itemName": "Desktop",
"itemID": 202,
"pageName": "List of Computers"
}
What I've tried
I have tried using MongoDB's aggregate functionality, but because there are multiple "itemName" fields in each document, it adds both of them to one array instead of putting one in each new document.
db.collection.aggregate([
    {$Project:{
        itemName: "$itemsList.itemName",
        itemID: "$itemsList.otherID",
        pageName: "$pageName"
    }},
    {$out: "myNewCollection"}
])
I've also tried using PyMongo 3.x to loop through the document's fields and save as a new document, but haven't been successful.
Ways to implement it
I'm open to using MongoDB's aggregate functionality, if it can move these items to their own documents, or a Python script (3.x) - or any other means you think can help.
Thanks in advance for your help!
You just need a $unwind to "break up" the array. Then you can do some data wrangling and output to your new collection.
Note that you didn't specify the exact requirement for _id, so you may need some extra handling. The demonstration below relies on native _id generation, which auto-assigns ObjectIds.
db.collection.aggregate([
{
"$unwind": "$itemsList"
},
{
"$project": {
"_id": 0,
"itemName": "$itemsList.itemName",
"itemID": "$itemsList.myID",
"pageName": "$pageName"
}
},
{
$out: "myNewCollection"
}
])
Here is the Mongo playground for your reference.
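Since the question also allows a Python 3 script: the unwind-and-rename step itself is a simple reshaping, sketched below on stand-in documents shaped like the question's collection. With PyMongo you would fetch the source documents with find() and write the result with insert_many(), but the transformation is the same:

```python
# Stand-in documents shaped like the source collection in the question.
pages = [
    {"_id": 1, "pageName": "List of Fruit", "itemsList": [
        {"myID": 101, "itemName": "Apple"},
        {"myID": 102, "itemName": "Orange"},
    ]},
    {"_id": 2, "pageName": "List of Computers", "itemsList": [
        {"myID": 201, "itemName": "MacBook"},
        {"myID": 202, "itemName": "Desktop"},
    ]},
]

# "Unwind" each itemsList entry into its own document, renaming
# myID to itemID and carrying pageName along; _id is omitted so
# MongoDB would assign fresh ObjectIds on insert.
new_docs = [
    {"itemName": item["itemName"], "itemID": item["myID"],
     "pageName": page["pageName"]}
    for page in pages
    for item in page["itemsList"]
]
# With PyMongo: db.myNewCollection.insert_many(new_docs)
print(len(new_docs))
```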

Parse JSON from URL and skip first line with Python

I have a URL which contains some JSON data. I would like to parse this data and convert to a dictionary using Python. The first line of the data on the webpage is not in JSON format, so I would like to skip the first line before parsing. The data on the webpage looks like the following:
expected 1 issue, got 1
{
"Issues": [
{
"issue": {
"assignedTo": {
"iD": "2",
"name": "industry"
},
"count": "1117",
"logger": "errors",
"metadata": {
"function": "_execute",
"type": "IntegrityError",
"value": "duplicate key value violates unique constraint \nDETAIL: Key (id, date, reference)=(17, 2020-08-03, ZER) already exists.\n"
},
"stats": {},
"status": "unresolved",
"type": "error"
},
"Events": [
{
"message": "Unable to record contract details",
"tags": {
"environment": "worker",
"handled": "yes",
"level": "error",
"logger": "errors",
"mechanism": "logging",
},
"Messages": null,
"Stacktraces": null,
"Exceptions": null,
"Requests": null,
"Templates": null,
"Users": null,
"Breadcrumbs": null,
"Context": null
},
],
"fetch_time": "2020-07-20"
}
]
}
And I have tried running this script:
with urllib.request.urlopen("[my_url_here]") as url:
    if(url.getcode()==200):
        for _ in range(1):
            next(url)
        data = url.read()
        json=json.loads(data)
    else:
        print("Error receiving data", url.getcode())
But am met with the error:
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
File
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I get the same error when I run it without using
for _ in range(2):
    next(url)
... except the last line reads 'Expecting value: line 2 column 1 (char 1)'.
Any advice? Thanks
You can remove the first line with the following code.
Code:
data = ''.join(data.split('\n')[1:])
print(data)
Output:
{ "Issues": [ { "issue": { "assignedTo": { "iD": "2", "name": "industry" }, "count": "1117", "logger": "errors", "metadata": { "function": "_execute", "type": "IntegrityError", "value": "duplicate key value violates unique constraint DETAIL: Key (id, date, reference)=(17, 2020-08-03, ZER) already exists." }, "stats": {}, "status": "unresolved", "type": "error" }, "Events": [ { "message": "Unable to record contract details", "tags": { "environment": "worker", "handled": "yes", "level": "error", "logger": "errors", "mechanism": "logging", }, "Messages": null, "Stacktraces": null, "Exceptions": null, "Requests": null, "Templates": null, "Users": null, "Breadcrumbs": null, "Context": null }, ], "fetch_time": "2020-07-20" } ]}
As you can see, we achieved removing the first line. But your parsed JSON response still has issues: it is not properly formatted. There are extra trailing commas (for example after "mechanism": "logging" and after the closing brace of the Events object) that tell the parser more instances follow, but your response has no more instances in that scope. So please check the code used to produce your data. If you have doubts, please write here. You can validate your JSON at https://jsonlint.com/
I hope this is helpful :)
You can try to load the JSON like this:
json.loads(data.split("\n",1)[1])
This splits the string at the first newline and uses the second part.
However, I discourage this, as you can't be sure your server will always reply like this; try to fix the endpoint, or find one that returns a valid JSON reply if you can.
You will still get json.decoder.JSONDecodeError: Invalid control character at: line 14 column 68 (char 336) because of the \n in the data.
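Both points can be seen on a small stand-in for the response body. json.loads(..., strict=False) is one way to tolerate a raw newline inside a string value, though fixing whatever produces the data is still the better option:

```python
import json

# Stand-in for the response: a junk first line, then JSON whose
# string value contains a raw (unescaped) newline character.
raw = 'expected 1 issue, got 1\n{"value": "line one\nline two"}'

body = raw.split("\n", 1)[1]  # drop everything before the first newline

# Default (strict) parsing rejects raw control characters in strings...
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc.msg)

# ...but strict=False accepts them.
parsed = json.loads(body, strict=False)
print(parsed["value"])
```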

Django loaddata ignore existing objects

I have a fixture with a list of entries, e.g.:
[
{
"fields": {
"currency": 1,
"price": "99.99",
"product_variant": 1
},
"model": "products.productprice",
"pk": 1
},
{
"fields": {
"currency": 2,
"price": "139.99",
"product_variant": 1
},
"model": "products.productprice",
"pk": 2
}
]
This is only initial data for each entry (The price might change). I would like to be able to add another entry to that fixture and load it with loaddata but without updating entries that already exist in the database.
Is there any way to do that? Something like --ignorenonexistent but for existing entries.
If you keep pk in the JSON like that, you will always overwrite the first two records in products.productprice.
I would use "pk": null instead.
That way, you always create a new record with every load.
So if you want to create a new price:
[
{
"fields": {
"currency": 1,
"price": "99.99",
"product_variant": 1
},
"model": "products.productprice",
"pk": 1
},
{
"fields": {
"currency": 2,
"price": "139.99",
"product_variant": 1
},
"model": "products.productprice",
"pk": 2
},
{
"fields": {
"currency": 4,
"price": "9.99",
"product_variant": 1
},
"model": "products.productprice",
"pk": null
}
]
The first two records will always be the same, but if you had already added a third one (pk: 3), the last section would create a new productprice with pk: 4.
BTW: if your currency field is another primary key (with autoincrement), you can put null there too and a new primary key will be generated.
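Since loaddata has no "skip existing" switch, another workaround (if you do want stable pks in the fixture) is to pre-filter the fixture before loading: drop every entry whose pk already exists, write the remainder to a temporary fixture, and loaddata that file. Below is a minimal, database-free sketch of the filtering step, where existing_pks stands in for the pks you would query via ProductPrice.objects.values_list("pk", flat=True):

```python
import json

# Fixture entries as in the question.
fixture = [
    {"fields": {"currency": 1, "price": "99.99", "product_variant": 1},
     "model": "products.productprice", "pk": 1},
    {"fields": {"currency": 2, "price": "139.99", "product_variant": 1},
     "model": "products.productprice", "pk": 2},
    {"fields": {"currency": 4, "price": "9.99", "product_variant": 1},
     "model": "products.productprice", "pk": 3},
]

# Stand-in for the pks already present in the database.
existing_pks = {1, 2}

# Keep only entries whose pk is not already in the database.
new_entries = [e for e in fixture if e["pk"] not in existing_pks]

# Write this to a temporary file, then run loaddata on that file.
print(json.dumps(new_entries, indent=2))
```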

invalid model identifier: 'sites.site'

When I try to run manage.py syncdb, I get this error:
django.core.serializers.base.DeserializationError: Problem installing fixture '/Users/cbplusd/repos/wibo1/fixtures/initial_data.json': Invalid model identifier: 'sites.site'
INSTALLED_APPS in my settings.py no longer contains django.contrib.sites, and my .json file looks like this:
[
{
"pk": 1,
"model": "sites.site",
"fields": {
"domain": "localhost:8000",
"name": "wibo"
}
},
{
"pk": 2,
"model": "sites.site",
"fields": {
"domain": "example.com",
"name": "wibo"
}
}
]
Does anyone know what I'm doing wrong?
