Append item to Mongo Array - python

I have a mongodb document I am trying to update. This answer was helpful, but every time I insert into the database, the data is inserted as an array inside of the array whereas I just want to insert the object directly into the array.
Here is what I am doing.
# My function to update the array
def append_site(gml_id, new_site):
    col.update_one({'gml_id': gml_id}, {'$push': {'websites': new_site}}, upsert=True)
# My Dataframe
data = {'name': ['ABC'],
        'gml_id': ['f9395e09'],
        'url': ['ABC.com']}
df = pd.DataFrame(data)
# Grouping data for upsert
df = df.groupby(['gml_id']).apply(lambda x: x[['name','url']].to_dict('r')).reset_index().rename(columns={0:'websites'})
# Apply function to every row
df.apply(lambda row: append_site(row['gml_id'], row['websites']), axis = 1)
Here is the outcome:
{
    "gml_id": "f9395e09",
    "websites": [
        {
            "name": "XYZ.com",
            "url": "...xyz.com"
        },
        [
            {
                "name": "ABC.com",
                "url": "...abc.com"
            }
        ]
    ]
}
Here is the goal:
{
    "gml_id": "f9395e09",
    "websites": [
        {
            "name": "XYZ.com",
            "url": "...xyz.com"
        },
        {
            "name": "ABC.com",
            "url": "...abc.com"
        }
    ]
}

Your issue is that the websites array is being appended with a list object rather than a dict, i.e. new_site is a list.
As you haven't posted where you call append_site(), this is a little speculative, but you could try changing this line and seeing if it gives the effect you need.
col.update_one({'gml_id': gml_id}, {'$push': {'websites': new_site[0]}}, upsert = True)
Alternatively make sure you are passing a dict object to the function.
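For example, since the groupby/to_dict('r') step produces a one-element list per row, unwrapping it before the push keeps the array flat. A minimal sketch, assuming col is a pymongo collection (the database and collection names here are placeholders):
from pymongo import MongoClient

col = MongoClient()['mydb']['mycol']  # placeholder database/collection names

def append_site(gml_id, new_site):
    # Unwrap a one-element list so a plain dict is pushed onto the array
    if isinstance(new_site, list):
        new_site = new_site[0]
    col.update_one({'gml_id': gml_id}, {'$push': {'websites': new_site}}, upsert=True)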

Instead of doing an unnecessary groupby, I decided to leave the dataframe flat and then adjust the function like this:
def append_site(gml_id, name, url):
    col.update_one({'gml_id': gml_id}, {'$push': {'websites': {'name': name, 'url': url}}}, upsert=True)
I now call it like this: df.apply(lambda row: append_site(row['gml_id'], row['name'], row['url']), axis=1)
Works perfectly fine.


Get first item from nested list in glom

from glom import glom, T
target = {
    "items": [
        {
            "label": "valuation",
            "value": [
                "900 USD"
            ]
        },
    ]
}
spec = ('items', [T['value'][0]])
r = glom(target, spec)
print(r)
The above code returns a list, ['900 USD'], but I'd like to get just the content of that list, i.e. the first item in the 'value' list. In this case the result should just be 900 USD.
Part 2
from glom import glom, T, Check, SKIP
target = {
    "items": [
        {
            "label": "valuation",
            "value": [
                "900 USD"
            ]
        },
        {
            "label": "other_info",
            "value": [
                "700 USD"
            ]
        },
    ]
}
spec = {
    'answer': ('items', [Check('label', equal_to='valuation', default=SKIP)], [T['value'][0]])
}
r = glom(target, spec)
print(r)
The above code results in {'answer': ['900 USD']}, but I need it to just return 900 USD.
I tried adding [0] at the end of the brackets, but that didn't work.
Playing around with the T type also didn't give me what I'm looking for.
I solved it by iterating over my result list and picking out the first element.
The following spec worked
from glom import Iter

spec = {
    'answer': ('items', [Check('label', equal_to='valuation', default=SKIP)], ([T['value'][0]], Iter().first()))
}
Notice the Iter().first() function call was added.
spec = {
    'answer': ('items', Iter().filter(
        lambda x: x['label'] == 'valuation'
    ).map('value.0').first())
}
Note, this is a streaming solution, thus performing better for larger datasets.
A string spec -- here 'value.0', the argument to the map method -- understands indexing into deep lists.
Depending on the specific logic that is desired this might work as well:
spec = {
    'answer': ('items', Iter().first(
        lambda x: x['label'] == 'valuation'
    ), 'value.0')
}
Here, we have combined filter logic with the restriction of a single result.
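As a quick check, running the streaming spec against the Part 2 target should yield the scalar directly; a minimal sketch:
from glom import glom, Iter

target = {
    "items": [
        {"label": "valuation", "value": ["900 USD"]},
        {"label": "other_info", "value": ["700 USD"]},
    ]
}

spec = {
    'answer': ('items', Iter().filter(
        lambda x: x['label'] == 'valuation'
    ).map('value.0').first())
}

print(glom(target, spec))  # {'answer': '900 USD'}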

The best way to transform a response to a json format in the example

I would appreciate it if you could help me find the best way to transform a result into JSON as below.
We have a result like the one below, where we get information on the employees and the companies. In the result, somehow, we get an enum-like prefix T., but not for all the properties.
[
    {
        "T.id": "Employee_11",
        "T.category": "Employee",
        "node_id": ["11"]
    },
    {
        "T.id": "Company_12",
        "T.category": "Company",
        "node_id": ["12"],
        "employeecount": 800
    },
    {
        "T.id": "id~Employee_11_to_Company_12",
        "T.category": "WorksIn"
    },
    {
        "T.id": "Employee_13",
        "T.category": "Employee",
        "node_id": ["13"]
    },
    {
        "T.id": "Parent_Company_14",
        "T.category": "ParentCompany",
        "node_id": ["14"],
        "employeecount": 900,
        "childcompany": "Company_12"
    },
    {
        "T.id": "id~Employee_13_to_Parent_Company_14",
        "T.category": "Contractorin"
    }
]
We need to transform this result into a different structure, grouping by category: if the category is Employee, Company or ParentCompany, the entry should go under the node_properties object; otherwise it belongs in edge_properties. Also, apart from the common properties (property_id, property_category and node), extra properties should be added when the category is Company or ParentCompany. There is some further logic where we have to derive the from and to properties of each edge object from its id. The expected response is:
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In Java, we would have used an enhanced for loop, a switch, etc. How can we write the code in Python to get the structure above from the initial result structure? (I am new to Python.) Thank you in advance.
Regards
Here is a method that I quickly made; you can adjust it to your requirements. You can use a regex or your own function to get the IDs for the edge_properties and then assign them to an object the way I did for the nodes. I am not sure of your full requirements, but if the list you gave covers all the categories, this will be sufficient.
def transform(input_list):
    node_properties = []
    edge_properties = []
    for input_obj in input_list:
        new_obj = {}
        if input_obj['T.category'] in ('Employee', 'Company', 'ParentCompany'):
            new_obj['property_id'] = input_obj['T.id']
            new_obj['property_category'] = input_obj['T.category']
            new_obj['node'] = {'node_id': input_obj['node_id'][0]}
            if 'employeecount' in input_obj:
                new_obj['employeecount'] = input_obj['employeecount']
            if 'childcompany' in input_obj:
                new_obj['childcompany'] = input_obj['childcompany']
            node_properties.append(new_obj)
        else:  # You can use elif checks instead if there are other outliers
            # Use a regex or string splitting here to extract the from/to IDs and add them like above
            edge_properties.append(new_obj)
    return [node_properties, edge_properties]
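For the edge branch, one way to fill in from, to and property_id is to strip the id~ prefix and split once on _to_. This is only a sketch that assumes every edge T.id follows the id~<from>_to_<to> pattern seen in the sample data:
def build_edge(input_obj):
    # "id~Employee_11_to_Company_12" -> "Employee_11_to_Company_12"
    raw_id = input_obj['T.id'].split('~', 1)[-1]
    # Split once on "_to_" to recover the two endpoints
    from_node, to_node = raw_id.split('_to_', 1)
    return {'from': from_node, 'to': to_node, 'property_id': raw_id}
In the else branch above, edge_properties.append(build_edge(input_obj)) would then produce the edge_properties entries shown in the expected response.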

An efficient way to unpack nested JSON into a dataframe

I have a nested JSON, and I want to transform it into a pandas dataframe. I was able to normalize it with json_normalize.
However, there are still JSON layers within the dataframe, which I also want to unpack. How can I do it in the best way? I will likely have to deal with this a few more times within the project I am currently working on.
The JSON I have is the following:
{
    "data": {
        "allOpportunityApplication": {
            "data": [
                {
                    "id": "111111111",
                    "opportunity": {
                        "programme": {
                            "short_name": "XX"
                        }
                    },
                    "person": {
                        "home_lc": {
                            "name": "NAME"
                        }
                    },
                    "standards": [
                        {
                            "constant_name": "constant1",
                            "standard_option": {
                                "option": "true"
                            }
                        },
                        {
                            "constant_name": "constant2",
                            "standard_option": {
                                "option": "true"
                            }
                        }
                    ]
                }
            ]
        }
    }
}
Used json_normalize
standards_df = json_normalize(
    standard_json['allOpportunityApplication']['data'],
    record_path=['standards'],
    meta=['id', 'person', 'opportunity']
)
With that I get a dataframe with the columns: constant_name, standard_option, id, person, opportunity. The problem is that standard_option, person and opportunity still hold JSON, each with a single value inside.
The current output and expected output for each column are as follows.
Standard_option
Currently an item in the column "standard_option" looks like:
{'option': 'true'}
I want it to be just true
Person
Currently an item in the column "person" looks like:
{'home_lc': {'name': 'NAME'}}
I want it to look like: NAME
Opportunity
Currently an item in the column "opportunity" looks like:
{'programme': {'short_name': 'XX'}}
I want it to look like: XX
Might not be the best way, but I think it works.
standards_df['person'] = (standards_df.loc[:, 'person']
.apply(lambda x: x['home_lc']['name']))
standards_df['opportunity'] = (standards_df.loc[:, 'opportunity']
.apply(lambda x: x['programme']['short_name']))
  constant_name standard_option.option         id person opportunity
0     constant1                   true  111111111   NAME          XX
1     constant2                   true  111111111   NAME          XX
standard_option was already fine when I ran your code.
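Putting the normalization and the unpacking together, a minimal end-to-end sketch (assuming standard_json is the parsed dict sitting under the top-level "data" key, as in the question) might look like this:
import pandas as pd

standards_df = pd.json_normalize(
    standard_json['allOpportunityApplication']['data'],
    record_path=['standards'],
    meta=['id', 'person', 'opportunity']
)

# The standards records are flattened automatically (standard_option.option),
# so only the meta columns still hold nested dicts.
standards_df['person'] = standards_df['person'].apply(lambda x: x['home_lc']['name'])
standards_df['opportunity'] = standards_df['opportunity'].apply(lambda x: x['programme']['short_name'])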

Adding key to values in json using Python

This is the structure of my JSON:
"docs": [
{
"key": [
null,
null,
"some_name",
"12345567",
"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
I want to add the keys to the key list. This is what I want the output to look like:
"docs": [
{
"key": [
"key1":null,
"key2":null,
"key3":"some_name",
"key4":"12345567",
"key5":"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
What's a good way to do it? I tried this, but it doesn't work:
for item in data['docs']:
    item['test'] = data['docs'][3]['key'][0]
UPDATE 1
Based on the answer below, I have tweaked the code to this:
for number, item in enumerate(data['docs']):
    newdict["key1"] = item['key'][0]
    newdict["yek1"] = item['key'][1]
    newdict["key2"] = item['key'][2]
    newdict["yek2"] = item['key'][3]
    newdict["key3"] = item['key'][4]
    newdict["latitude"] = item['value']['lat']
    newdict["longitude"] = item['value']['long']
This creates the JSON I am looking for (and I can eliminate the list I had previously). How does one make this JSON persist outside the for loop? Outside the loop, only the last value from the dictionary is added otherwise.
In your first block, key is a list, but in your second block it's a dict. You need to completely replace the key item.
for doc in data['docs']:
    newdict = {}
    for number, item in enumerate(doc['key']):
        newdict['key%d' % (number + 1)] = item
    doc['key'] = newdict
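To address the follow-up in UPDATE 1, the transformed dictionaries only survive the loop if each one is collected into a container as it is built; a minimal sketch based on the code above:
transformed = []
for doc in data['docs']:
    newdict = {
        'key1': doc['key'][0],
        'yek1': doc['key'][1],
        'key2': doc['key'][2],
        'yek2': doc['key'][3],
        'key3': doc['key'][4],
        'latitude': doc['value']['lat'],
        'longitude': doc['value']['long'],
    }
    transformed.append(newdict)
# transformed now holds one dict per entry in data['docs']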

How to read this JSON into a dataframe with a specific format

This is my JSON string; I want to read it into a dataframe in the following tabular format.
I have no idea what I should do after pd.DataFrame(json.loads(data)).
JSON data, edited
{
    "data": [
        {
            "data": {
                "actual": "(0.2)",
                "upper_end_of_central_tendency": "-"
            },
            "title": "2009"
        },
        {
            "data": {
                "actual": "2.8",
                "upper_end_of_central_tendency": "-"
            },
            "title": "2010"
        },
        {
            "data": {
                "actual": "-",
                "upper_end_of_central_tendency": "2.3"
            },
            "title": "longer_run"
        }
    ],
    "schedule_id": "2014-03-19"
}
That's a somewhat overly nested JSON. But if that's what you have to work with, and assuming your parsed JSON is in jdata:
datapts = jdata['data']
rownames = ['actual', 'upper_end_of_central_tendency']
colnames = [ item['title'] for item in datapts ] + ['schedule_id' ]
sched_id = jdata['schedule_id']
rows = [ [item['data'][rn] for item in datapts ] + [sched_id] for rn in rownames]
df = pd.DataFrame(rows, index=rownames, columns=colnames)
df is now:
                                 2009 2010 longer_run schedule_id
actual                          (0.2)  2.8          -  2014-03-19
upper_end_of_central_tendency      -    -        2.3  2014-03-19
If you wanted to simplify that a bit, you could construct the core data without the asymmetric schedule_id field, then add that after the fact:
datapts = jdata['data']
rownames = ['actual', 'upper_end_of_central_tendency']
colnames = [ item['title'] for item in datapts ]
rows = [ [item['data'][rn] for item in datapts ] for rn in rownames]
d2 = pd.DataFrame(rows, index=rownames, columns=colnames)
d2['schedule_id'] = jdata['schedule_id']
That will make an identical DataFrame (i.e. df == d2). It helps when learning pandas to try a few different construction strategies, and get a feel for what is more straightforward. There are more powerful tools for unfolding nested structures into flatter tables, but they're not as easy to understand first time out of the gate.
(Update) If you wanted a better structure for your JSON to make it easier to put into this format, ask pandas what it likes. E.g. df.to_json() output, slightly prettified:
{
    "2009": {
        "actual": "(0.2)",
        "upper_end_of_central_tendency": "-"
    },
    "2010": {
        "actual": "2.8",
        "upper_end_of_central_tendency": "-"
    },
    "longer_run": {
        "actual": "-",
        "upper_end_of_central_tendency": "2.3"
    },
    "schedule_id": {
        "actual": "2014-03-19",
        "upper_end_of_central_tendency": "2014-03-19"
    }
}
That is a format from which pandas' read_json function will immediately construct the DataFrame you desire.
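For instance, assuming json_str holds the to_json() output shown above, feeding it back through read_json with the default orient reproduces the table; a minimal sketch:
import pandas as pd
from io import StringIO

# json_str is assumed to hold the prettified to_json() output above
df = pd.read_json(StringIO(json_str))
print(df)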
