How to set and print the value of a JSON object in PostgreSQL? - python

What I have done so far is:
DO
$do$
DECLARE
    namz text;
    jsonObject json =
    '{
        "Name": "Kshitiz Kala",
        "Education": "B.Tech",
        "Skills": ["J2EE", "JDBC", "Html"]
    }';
BEGIN
    SELECT jsonObject->'Name' INTO namz;
    SELECT namz;
END
$do$
I am not having any success here.
The actual problem is that I am passing a JSON object to a stored procedure, which stores the data in three different tables: 1) a user table containing user_id, user_name, user_edu; 2) a skill table containing skill_id, skill_name; 3) a user_skill table containing id, user_id, usr_skill_id.
This is the JSON object I am passing from the Django application:
{"Name": "Kshitiz Kala", "Education": "B.Tech", "Skills": ["J2EE", "JDBC", "Html"]}

Your JSON code is fine, just check:
SELECT '{"Name": "Kshitiz Kala",
"Education": "B.Tech",
"Skills": ["J2EE", "JDBC", "Html"]
}'::json->'Name'
or in PL/pgSQL:
DO
$do$
DECLARE
    namz text;
    jsonObject json =
    '{
        "Name": "Kshitiz Kala",
        "Education": "B.Tech",
        "Skills": ["J2EE", "JDBC", "Html"]
    }';
BEGIN
    -- ->> would return the value as text without the surrounding JSON quotes
    SELECT jsonObject->'Name' INTO namz;
    RAISE NOTICE 'JSON value Name is %', namz;
END
$do$
The problem is somewhere else. For example, the last line in your block won't do anything (a bare SELECT namz in PL/pgSQL).
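For the bigger goal (passing the JSON from the Django side into a stored procedure that splits it across the user, skill and user_skill tables), the calling side could look roughly like the sketch below; insert_user_with_skills is a hypothetical PL/pgSQL function you would still need to write, and the connection string is a placeholder.
import json
import psycopg2

payload = {"Name": "Kshitiz Kala", "Education": "B.Tech", "Skills": ["J2EE", "JDBC", "Html"]}

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN
with conn, conn.cursor() as cur:
    # The function name and signature are assumptions for illustration only.
    cur.execute("SELECT insert_user_with_skills(%s::json)", [json.dumps(payload)])
conn.close()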

Related

Updating a JSONType column in a database table

I am trying to update a JSON column in the User table with a Python script.
I have a list of UIDs (stored in the uid_list variable), and I would like to update the rows matching those UIDs in the database.
json_data = Column(JSONType) is the column that needs to be updated (its name and surname keys).
The data stored in this column looks like: {"view_data": {"active": false, "text": "", "link": "http://google.com/"}, "name": "John", "surname": "Black", "email": "john@gmail.com"}
def update_json_column_in_table_in_db_by_list_of_uid():
    uid_list = ['25a00f0e-58a5-4356-8b91-b18ea2eed71d', '68ccc759-97ae-48a2-bc42-5c2f1fa7a0ba', '9e2ee469-f777-4622-bca1-68d924caed0f']
    name = 'empty'
    surname = 'empty2'
    User.query.filter(User.uid.in_(uid_list)).update({User.json_data: name + surname})
You need to do two things:
use .where() instead of .filter()
use func.jsonb_set or func.json_set
from sqlalchemy import func, update

stmt = update(User).values(json_data=func.json_set(User.json_data, '{name}', name)).where(User.uid.in_(uid_list))
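To actually apply it you still have to execute and commit the statement; a minimal sketch assuming a Flask-SQLAlchemy style session (adjust to however your session is created):
db.session.execute(stmt)
db.session.commit()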

parsing JSON with missing fields

I have a JSON array with very dynamic fields, and some of the objects in the array don't have all of the fields.
Example :
[
    {
        "Name": "AFG LIMITED",
        "Vendor ID": "008343",
        "EGID": "67888",
        "FID": "83748374"
    },
    {
        "Name": "ABC LIMITED",
        "Vendor ID": "008333",
        "EGID": "67888",
        "AID": "0000292",
        "FID": "98979"
    }
]
I need to extract particular keys with a header and a pipe delimiter, like: Name|Vendor ID|EGID|AID (AID is only present in the second object). If a key is not present, it should come out as a null value.
I tried to parse this with the code below, but it breaks as soon as AID is missing.
import json

with open("sample.json", "r") as rf:
    decoded_data = json.load(rf)

# Check if the json object was loaded correctly
try:
    for i in decoded_data:
        print i["Name"], "|", i["Vendor ID"], "|", i["EGID"], "|", i["AId"]
except KeyError:
    print(null)
Output from the above code:
AFG LIMITED|008343|67888|null
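For reference, one common way to make such a loop tolerant of missing keys (my own sketch, not from the original thread) is dict.get() with a default:
import json

FIELDS = ["Name", "Vendor ID", "EGID", "AID"]

with open("sample.json", "r") as rf:
    records = json.load(rf)

# Header row, then one pipe-delimited line per record,
# substituting "null" for any key that is absent.
print("|".join(FIELDS))
for record in records:
    print("|".join(str(record.get(field, "null")) for field in FIELDS))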

I am unable to find attribute values from JSON data in python

I want to find id and options in this JSON data.
Here's what I have done so far.
data = """
"list": null,
"promotionID": "",
"isFreeShippingApplicable": true,
"html": "\n\n\n<div class=\"b-product-tile-price\">\n \n \n\n\n\n<span class=\"b-product-tile-price-outer\">\n <span class=\"b-product-tile-price-item\">\n 1200 €\n\n\n </span>\n</span>\n\n</div>\n\n"
},
"longDescription": "<ul>\n\t<li>STYLE: BQ4420-100</li>\n\t<li>Laufsohle: Gummi</li>\n\t<li>Obermaterial: beschichtetes Leder, Textil</li>\n\t<li>Innenmaterial: Textil</li>\n</ul>\n",
"shortDescription": null,
"availability": {
"messages": [
"Sofort lieferbar"
],
"inStockDate": null,
"custom": {
"code": null,
"label": null,
"orderable": true,
"sizeSelectable": true,
"badge": false
"""
find_values = json.loads(data)
id = find_values["id"]
variables = find_product_data["variables"]
print(id, variables)
The output is an error; when I try to get the value of the first attribute it gets returned, but the others do not.
You can't access the id directly, because it is nested inside another dictionary. What you have to do is get that dict first and then access the id.
find_values = json.loads(data)
product = find_values["product"]
id_value = product["id"]
If you are working with an IDE, it could help to debug your code and see how the dict is actually nested.
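If you are not working in an IDE, pretty-printing the parsed structure is a quick way to see how it is nested (a small sketch, assuming the data actually parses as JSON):
import json

find_values = json.loads(data)
# An indented dump makes it easy to spot where "id" and "options" live
print(json.dumps(find_values, indent=2))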

MongoDB generating same ID between inserts

I am using pymongo and I am trying to insert dicts into a MongoDB database. My dictionaries look like this:
{
    "name": "abc",
    "Jobs": [
        {
            "position": "Systems Engineer (Data Analyst)",
            "time": [
                "October 2014",
                "May 2015"
            ],
            "is_current": 1,
            "location": "xyz",
            "organization": "xyz"
        },
        {
            "position": "Systems Engineer (MDM Support Lead)",
            "time": [
                "January 2014",
                "October 2014"
            ],
            "is_current": 1,
            "location": "xxx",
            "organization": "xxx"
        },
        {
            "position": "Asst. Systems Engineer (ETL Support Executive)",
            "time": [
                "May 2012",
                "December 2013"
            ],
            "is_current": 1,
            "location": "zzz",
            "organization": "xzx"
        }
    ],
    "location": "Buffalo, New York",
    "education": [
        {
            "school": "State University of New York at Buffalo - School of Management",
            "major": "Management Information Systems, General",
            "degree": "Master of Science (MS), "
        },
        {
            "school": "Rajiv Gandhi Prodyogiki Vishwavidyalaya",
            "major": "Electrical and Electronics Engineering",
            "degree": "Bachelor of Engineering (B.E.), "
        }
    ],
    "id": "abc123",
    "profile_link": "example.com",
    "html_source": "<html> some_source_code </html>"
}
I am getting this error:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
Linkedin_DB.employee_info.$id dup key: { :
ObjectId('56b64f6071c54604f02510a8') }
When I run my program, the 1st document gets inserted properly, but when I insert the second document I get this error. When I start my script again, the document that was not inserted because of this error gets inserted properly, and the error comes for the next document, and this continues.
Clearly MongoDB is using the same ObjectId across two inserts. I don't understand why MongoDB is failing to generate a unique ID for new documents.
My code to save the passed data:
class Mongosave:
    """
    Pass collection_name and dict data
    This module stores the passed dict in collection
    """
    def __init__(self):
        self.connection = pymongo.MongoClient()
        self.db = self.connection['Linkedin_DB']

    def _exists(self, id):
        # To check if the user already exists
        return True if list(self.collection.find({'id': id})) else False

    def save(self, collection_name, data):
        self.collection = self.db[collection_name]
        if not self._exists(data['id']):
            print(data['id'])
            self.collection.insert(data)
        else:
            self.collection.update({'id': data['id']}, {"$set": data})
I can't figure out why this is happening. Any help is appreciated.
The problem is that your save method is using a field called "id" to decide whether it should do an insert or an update. You want to use "_id" instead. You can read about the _id field and index here. PyMongo automatically adds an _id to your document if one is not already present. You can read more about that here.
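As an illustration of that suggestion (a sketch only, not the poster's original code), the save method could key on _id and let replace_one with upsert=True cover both the insert and the update path:
def save(self, collection_name, data):
    collection = self.db[collection_name]
    doc = dict(data)            # work on a copy so the caller's dict is never mutated
    doc["_id"] = doc.pop("id")  # use the application-level id as MongoDB's primary key
    # upsert=True inserts when the _id is new and replaces the document otherwise
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)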
You might have inserted two copies of the same document into your collection in one run.
I cannot quite understand what you mean by:
When I start my script again, the document that was not inserted because of this error gets inserted properly, and the error comes for the next document, and this continues.
What I do know is that if you do:
from pymongo import MongoClient

client = MongoClient()
db = client['someDB']
collection = db['someCollection']

someDocument = {'x': 1}
for i in range(10):
    collection.insert_one(someDocument)
You'll get a:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
This makes me think that although pymongo will generate a unique _id for you if you don't provide one, it is not guaranteed to be unique, especially if the document provided is not unique. Presumably pymongo uses some sort of hash of what you insert for its auto-generated _id, without changing the seed.
Try generating your own _id and see if it happens again.
Edit:
I just tried this and it works:
for i in range(10):
    collection.insert_one({'x': 1})
This makes me think the way pymongo generates _id is tied to the object you feed into it; this time I'm not referencing the same object anymore and the problem disappeared.
Are you giving your database two references to the same object?
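A quick way to see this yourself (my own check, not part of the original answer): insert_one() writes the generated _id back into the dict you pass in, so inserting a fresh copy each time avoids reusing it:
for i in range(10):
    # dict(someDocument) is a fresh copy, so pymongo stamps a new _id onto each one
    collection.insert_one(dict(someDocument))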

Count a particular value from list in Python mongodb

I am experimenting with Python and MongoDB. I am a newbie with Python. Here I get records from one collection and, based on a particular value from those records, I find the count of matching records in another collection. But my problem is that I cannot append this count to my list.
Here is the code:
@gen.coroutine
def post(self):
    Sid = self.body['Sid']
    alpha = []
    test = db.student.find({"Sid": Sid})
    count = yield test.count()
    print(count)
    for document in (yield test.to_list(length=1000)):
        cursor = db.attendance.find({"StudentId": document.get('_id')})
        check = yield cursor.count()
        print(check)
        alpha.append(document)
    self.write(bson.json_util.dumps({"data": alpha}))
The displayed output alpha is from the first collection (student); the count value is from the attendance collection.
When I try to extend the list with check, I end up with an error:
alpha.append(document.extend(check))
But I am getting the correct count value in the Python terminal; I am just unable to write it along with the output.
My output is like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}]}
My output should be like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "5"}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "3"}]}
Please guide me on how I can get my desired output.
Thank you.
A better approach to this is to use the MongoDB .aggregate() method from the python driver you are using rather than repeated .find() and .count() operations:
db.attendance.aggregate([
    { "$group": {
        "_id": "$StudentId",
        "name": { "$first": "$Student Name" },
        "count": { "$sum": 1 }
    }}
])
Then it is already done for you.
What your current code is doing is looking up the current student and returning a "count" of how many occurrences there are. And judging by your output, you are doing that for every student.
Rather than doing that, the data is "aggregated" to return both the values from the document and a "count" within the returned results, aggregated per student.
This means you don't need to run a query for each student just to get the count. Instead you call the database "once" and make it count all the students you need in one result.
If you need more than one student but not all students, then you filter that with query conditions:
db.attendance.aggregate([
    { "$match": { "StudentId": { "$in": list_of_student_ids } } },
    { "$group": {
        "_id": "$StudentId",
        "name": { "$first": "$Student Name" },
        "count": { "$sum": 1 }
    }}
])
And the selection along with the aggregation is done for you.
No need for looping code and lots of database requests. The .aggregate() method and pipeline will do it for you.
Read the core documentation on the Aggregation Pipeline.
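As a rough sketch of what that could look like inside the Motor/Tornado handler from the question (assuming a Motor version whose aggregate() returns a cursor, and that list_of_student_ids holds the _id values you care about):
pipeline = [
    {"$match": {"StudentId": {"$in": list_of_student_ids}}},
    {"$group": {
        "_id": "$StudentId",
        "name": {"$first": "$Student Name"},
        "count": {"$sum": 1}
    }}
]
cursor = db.attendance.aggregate(pipeline)
results = yield cursor.to_list(length=1000)
self.write(bson.json_util.dumps({"data": results}))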
Add a count entry to the document dictionary and then append the dictionary:
for document in (yield test.to_list(length=1000)):
    cursor = db.attendance.find({"StudentId": document.get('_id')})
    check = yield cursor.count()
    document['count'] = check
    alpha.append(document)
