export json data to csv from mongodb

I am having a problem with a missing field in my Python script when exporting data from MongoDB to CSV. The type field exists in the first record, but it does not appear in the rest of the records. How can I write the Python script so that it gives a null value for the type field when it does not exist?
A sample of the MongoDB collection:
"stages": [
    {
        "interview": false,
        "hmNotification": false,
        "hmStage": false,
        "type": "new",
        "isEditable": false,
        "order": 0,
        "name": {
            "en": "New"
        },
        "stageId": "51d1a2f4c0d9887b214f3694"
    },
    {
        "interview": false,
        "hmNotification": true,
        "isEditable": true,
        "order": 1,
        "hmStage": true,
        "name": {
            "en": "Pre-Screen"
        },
        "stageId": "51f0078d7297363f62059699"
    },
    {
        "interview": false,
        "hmNotification": false,
        "hmStage": false,
        "isEditable": true,
        "order": 2,
        "name": {
            "en": "Phone Screen"
        },
        "stageId": "51d1a326c0d9887721778eae"
    }
]
A sample of the Python script:
import csv
cursor = db.workflows.find({}, {'_id': 1, 'stages.interview': 1, 'stages.hmNotification': 1, 'stages.hmStage': 1, 'stages.type': 1, 'stages.isEditable': 1, 'stages.order': 1,
                               'stages.name': 1, 'stages.stageId': 1})
flattened_records = []
for stages_record in cursor:
    stages_record_id = stages_record['_id']
    for stage_record in stages_record['stages']:
        flattened_record = {
            '_id': stages_record_id,
            'stages.interview': stage_record['interview'],
            'stages.hmNotification': stage_record['hmNotification'],
            'stages.hmStage': stage_record['hmStage'],
            'stages.type': stage_record['type'],
            'stages.isEditable': stage_record['isEditable'],
            'stages.order': stage_record['order'],
            'stages.name': stage_record['name'],
            'stages.stageId': stage_record['stageId']}
        flattened_records.append(flattened_record)
When I run the Python script, it raises KeyError: 'type'. Please help me handle the missing field in the script.

When you're trying to fetch values that might not exist in a Python dictionary, you can use the .get() method of the dict class.
For instance, let's say you have a dictionary like this:
my_dict = {'a': 1,
'b': 2,
'c': 3}
You can use the get method to get one of the keys that exist:
>>> print(my_dict.get('a'))
1
But if you try to get a key that doesn't exist (such as does_not_exist), you will get None by default:
>>> print(my_dict.get("does_not_exist"))
None
As mentioned in the documentation, you can also provide a default value that will be returned when the key doesn't exist:
>>> print(my_dict.get("does_not_exist", "default_value"))
default_value
But this default value won't be used if the key does exist in the dictionary (if the key does exist, you'll get its value):
>>> print(my_dict.get("a", "default_value"))
1
Knowing that, when you build your flattened_record you can do:
'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type', ""),
'stages.isEditable': stage_record['isEditable'],
So if the stage_record dictionary doesn't contain a type key, stage_record.get('type', "") will return an empty string.
You can also try with just:
'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type'),
'stages.isEditable': stage_record['isEditable'],
and then stage_record.get('type') will return None when that stage_record doesn't contain a type key.
Or you could make the default "UNKNOWN":
'stages.type': stage_record.get('type', "UNKNOWN"),
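Putting the answer together with the csv module the question imports, a minimal end-to-end sketch could look like the following. It assumes the documents have already been fetched: a hard-coded list, workflows, stands in for db.workflows.find(...), and the helper names flatten and to_csv and the FIELDS list are illustrative only. Every stage field is read with .get(), so any missing key becomes None, which csv.DictWriter writes as an empty cell.

```python
import csv
import io

# Hypothetical documents standing in for db.workflows.find(...) results;
# the second stage has no 'type' key, mirroring the question.
workflows = [
    {
        "_id": "wf1",
        "stages": [
            {"interview": False, "hmNotification": False, "hmStage": False,
             "type": "new", "isEditable": False, "order": 0,
             "name": {"en": "New"}, "stageId": "51d1a2f4c0d9887b214f3694"},
            {"interview": False, "hmNotification": True, "hmStage": True,
             "isEditable": True, "order": 1,
             "name": {"en": "Pre-Screen"}, "stageId": "51f0078d7297363f62059699"},
        ],
    },
]

# Stage fields to export, in CSV column order.
FIELDS = ["interview", "hmNotification", "hmStage", "type",
          "isEditable", "order", "name", "stageId"]


def flatten(docs):
    """Yield one flat row per stage; any missing stage key becomes None."""
    for doc in docs:
        for stage in doc.get("stages", []):
            row = {"_id": doc["_id"]}
            for field in FIELDS:
                row["stages." + field] = stage.get(field)
            yield row


def to_csv(rows):
    """Write rows to a CSV string; None values come out as empty cells."""
    buf = io.StringIO()
    fieldnames = ["_id"] + ["stages." + f for f in FIELDS]
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = list(flatten(workflows))
print(to_csv(rows))
```

To write a file instead of a string, csv.DictWriter can be given open('stages.csv', 'w', newline='') in place of the StringIO buffer. Note that name is a nested dict, so it lands in the CSV as its Python representation (for example {'en': 'New'}); if only the English label is wanted, stage.get('name', {}).get('en') would be one option.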

Related

Referencing Values in a List (syntax issue?) [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?

For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In this specific case, we first want the value under the 'value' key in our parsed data. That is another dict, so we can access the value of its 'queryInfo' key in the same way, and then the 'creationTime' key from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
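To see how each bracket corresponds to one level of nesting, the same path can be written step by step. This sketch uses a trimmed copy of my_json; the notes[2] lookup is an added illustration showing that a JSON array (a Python list) is indexed with an integer position rather than a key.

```python
# Trimmed copy of the parsed response (only the keys used below).
my_json = {
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "note": [
                {"value": "[ALL:103232434]", "title": "filter:sites"},
                {"value": "[mode=LATEST, modifiedSince=null]", "title": "filter:timeRange"},
                {"value": "sdas01", "title": "server"},
            ],
        }
    }
}

# Each [...] peels off one level: string keys for {...}, integer indexes for [...].
value = my_json["value"]                     # dict
query_info = value["queryInfo"]              # dict
creation_time = query_info["creationTime"]   # int
notes = query_info["note"]                   # list of dicts
server = notes[2]["value"]                   # third note, then its 'value' key

print(creation_time)  # 1349724919000
print(server)         # sdas01
```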

I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name')  # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
else:
    name = None  # or handle the missing key some other way
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
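If this pattern comes up often, it can be wrapped in a small helper. get_path below is a hypothetical name, not a library function; it follows the keys one at a time in the same ask-forgiveness style, and it also catches IndexError, since a list index can be out of range just as a dict key can be missing.

```python
def get_path(data, *keys, default=None):
    """Index into nested dicts/lists one key at a time (EAFP style).

    Returns `default` if a key is missing, an index is out of range, or an
    intermediate value can't be indexed that way (e.g. it is None or a number).
    """
    try:
        for key in keys:
            data = data[key]
    except (KeyError, IndexError, TypeError):
        return default
    return data


response = {"value": {"queryInfo": {"creationTime": 1349724919000, "note": []}}}
print(get_path(response, "value", "queryInfo", "creationTime"))            # 1349724919000
print(get_path(response, "value", "queryInfo", "note", 0))                 # None (empty list)
print(get_path(response, "value", "missing", "creationTime", default=-1))  # -1
```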

Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# a Python dict with some sample data
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dict into a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a dict
resp = json.loads(json_str)
# print the resp
print(resp)
# extract a single value from the response
print(resp['test1'])

Try this.
Here, I fetch only the statecode of each entry in the statewise array returned by the COVID API.
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])

Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    }, {
        "name": "test1",
        "value": 3.1
    }]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'

Proper way to iterate Python dictionary and build new one

I'm trying to find a cogent way to check to see whether certain keys exist in a dictionary and use those to build a new one.
Here is my example json:
"dmarc": {
    "record": "v=DMARC1; p=none; rua=mailto:dmarc.spc#test.domain; adkim=s; aspf=s",
    "valid": true,
    "location": "test.domain",
    "warnings": [
        "DMARC record at root of test.domain has no effect"
    ],
    "tags": {
        "v": {
            "value": "DMARC1",
            "explicit": true
        },
        "p": {
            "value": "none",
            "explicit": true
        },
        "rua": {
            "value": [
                {
                    "scheme": "mailto",
                    "address": "ssc.dmarc.spc#canada.ca",
                    "size_limit": null
                }
            ],
            "explicit": true
        },
        "adkim": {
            "value": "s",
            "explicit": true
        },
        "aspf": {
            "value": "s",
            "explicit": true
        },
        "fo": {
            "value": [
                "0"
            ],
            "explicit": false
        },
        "pct": {
            "value": 100,
            "explicit": false
        },
        "rf": {
            "value": [
                "afrf"
            ],
            "explicit": false
        },
        "ri": {
            "value": 86400,
            "explicit": false
        },
        "sp": {
            "value": "none",
            "explicit": false
        }
    }
}
}
What I'm specifically looking to do is pull record, valid, location, tags-p, tags-sp, and tags-pct programmatically, instead of writing a bunch of try/excepts. For example, to get valid, I do:
try:
    res_dict['valid'] = jsonData['valid']
except KeyError:
    res_dict['valid'] = None
Now, this is easy enough to loop/repeat for top level key/values, but how would I accomplish this for the nested key/values?

No, you don't need a try/except block for this. The .get() method returns None instead of raising KeyError when the key is missing:
res_dict["valid"] = jsonData.get("valid")
The .get("key") method returns the value for the given key if it is present in the dictionary; if not, it returns None (when get() is called with only one argument). Avoid writing if jsonData.get("valid"): as a presence check, because it also skips keys whose value is falsy, such as false, 0 or "".
If you want it to return something else when the key is missing, pass a default as the second argument:
jsonData.get("valid", "invalid_something_else")

One way of handling this is by taking advantage of the fact that the result of dict.keys can be treated as a set. See the following code.
my_keys = {'record', 'valid', 'location'} # you can add more here
new_dict = {}
available_keys = my_keys & jsonData.keys()
for key in available_keys:
    new_dict[key] = jsonData[key]
Above, we define the keys we are interested in within the my_keys set. We then get the available keys by taking the intersection of the keys in the dictionary and the keys we are interested in. This, in effect, only gets the keys that we are interested in that are also defined in the dictionary. Finally, we just iterate through the available_keys and build the new dictionary.
However, this does not set keys to None if they do not exist in the input dictionary. For that, it may be best to use the get method as mentioned in other answers, like so:
my_keys = ['record', 'valid', 'location'] # you can add more here
new_dict = {}
for key in my_keys:
    new_dict[key] = jsonData.get(key)
The get method allows us to attempt to get the value for a key in the dictionary. If that key is not defined, it returns None. You can also change the returned default by adding an extra argument to the get method like so new_dict[key] = jsonData.get(key, "some other default value")

Simple: instead of dict['key'] use
dict.get('key', {}) for all nodes that are not leaves, and
dict.get('key', DEFAULT) for leaves, where DEFAULT is whatever you need.
If you omit DEFAULT and 'key' is absent, you get None. See the docs.
E.g.:
jsonData.get('record', "") # empty string if no 'record' key
jsonData.get('valid', False) # False if no 'valid' key
jsonData.get('location') # None if no 'location'
jsonData.get('tags', {}).get('p') # None if no 'tags' and/or no 'p'
jsonData.get('tags', {}).get('p', {}) # {} if no 'tags' and/or no 'p'
jsonData.get('tags', {}).get('p', {}).get('explicit', False) # and so on
The above presumes that you don't traverse lists (JSON arrays). If you do, you can still use
dict.get('key', [])
but if you have to dive deeper from there, you will probably have to loop over list items.
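Applied to the question, one way to avoid writing each chain by hand is to list the wanted paths once and walk them in a loop. This is a sketch: WANTED, pluck and the tags-p style output names are illustrative, and it checks isinstance(node, dict) instead of chaining .get('key', {}), so that an inner value that is null or a list also falls back to the default rather than raising AttributeError.

```python
# Trimmed copy of the "dmarc" object from the question.
json_data = {
    "record": "v=DMARC1; p=none; rua=mailto:dmarc.spc#test.domain; adkim=s; aspf=s",
    "valid": True,
    "location": "test.domain",
    "tags": {
        "p": {"value": "none", "explicit": True},
        "rua": {"value": [{"scheme": "mailto", "size_limit": None}], "explicit": True},
        "pct": {"value": 100, "explicit": False},
        "sp": {"value": "none", "explicit": False},
    },
}

# Output name -> sequence of keys to follow (illustrative naming scheme).
WANTED = {
    "record": ("record",),
    "valid": ("valid",),
    "location": ("location",),
    "tags-p": ("tags", "p", "value"),
    "tags-sp": ("tags", "sp", "value"),
    "tags-pct": ("tags", "pct", "value"),
}


def pluck(data, path, default=None):
    """Follow `path` through nested dicts; return `default` if any step is missing.

    The isinstance check also covers inner nodes that are None or lists,
    where a chained .get() would raise AttributeError.
    """
    node = data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


res_dict = {name: pluck(json_data, path) for name, path in WANTED.items()}
print(res_dict)
```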

I am unable to find attribute values from JSON data in python

I want to find id and options in this JSON data.
Here's what I did so far:
data = """
"list": null,
"promotionID": "",
"isFreeShippingApplicable": true,
"html": "\n\n\n<div class=\"b-product-tile-price\">\n \n \n\n\n\n<span class=\"b-product-tile-price-outer\">\n <span class=\"b-product-tile-price-item\">\n 1200 €\n\n\n </span>\n</span>\n\n</div>\n\n"
},
"longDescription": "<ul>\n\t<li>STYLE: BQ4420-100</li>\n\t<li>Laufsohle: Gummi</li>\n\t<li>Obermaterial: beschichtetes Leder, Textil</li>\n\t<li>Innenmaterial: Textil</li>\n</ul>\n",
"shortDescription": null,
"availability": {
"messages": [
"Sofort lieferbar"
],
"inStockDate": null,
"custom": {
"code": null,
"label": null,
"orderable": true,
"sizeSelectable": true,
"badge": false
"""
find_values = json.loads(data)
id = find_values["id"]
variables = find_product_data["variables"]
print(id, variables)
The output is an error. When I try to get the value of the first attribute, action, it is returned, but the others are not.

You can't access the id directly, because it is nested inside another dictionary. What you have to do is get that dict first and then access the id.
find_values = json.loads(data)
product = find_values["product"]
id_value = product["id"]
If you are working with an IDE it could help to debug your code and see how the dict is actually nested.
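When the nesting isn't known in advance, a recursive search can also show where a key lives. The structure below is hypothetical (the question's data is truncated), and find_key is an illustrative helper rather than a standard function; it yields every value stored under the given key at any depth, so it can return more than one match.

```python
import json

# Hypothetical response; the real structure in the question is truncated,
# so the "product", "id" and "options" layout here is an assumption.
data = """
{
    "action": "example",
    "product": {
        "id": "12345",
        "options": [{"id": "size", "values": ["42", "43"]}]
    }
}
"""


def find_key(obj, target):
    """Yield every value stored under `target` at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield value
            yield from find_key(value, target)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, target)


parsed = json.loads(data)
print(list(find_key(parsed, "id")))  # ['12345', 'size']
```

Once the location is known, direct indexing such as parsed["product"]["id"] is usually clearer than searching every time.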

modifying json - deleting certain elements within a json structure using python

My json structure is as follows :
"AGENT": {
    "pending": [],
    "active": null,
    "completed": [
        {
            "result": {
                "job1.AGENT": "SUCCESS",
                "job2.AGENT": "SUCCESS"
            },
            "return_value": {
                "job1.AGENT": "",
                "job2.AGENT": ""
            },
            "visible": true,
            "global": true,
            "locale": [
                "en_US"
            ],
            "complete_time": "2018-01-24T17:44:33.484Z",
            "persist": true,
            "type": "script",
            "script": "<script_name>.py",
            "preset_status": "CONFIGURING",
            "parameters": {},
            "submit_time": "2018-01-24T17:44:26.747Z"
        },
        {
            "result": {
                ..
            },
            "return_value": {
                ..
            },
            "visible": true,
            "global": true,
            "locale": [
                "en_US"
            ],
            "complete_time": "2018-04-2T17:44:40.049Z",
            "submit_time": "2018-04-2T17:44:26.817Z"
        }
I need to delete an entire result block based on its complete_time, for example delete every block that completed before 2018-04-03.
How can I achieve this in Python?
I have tried the following so far:
json_data = json.dumps(data)
item_dict = json.loads(data)
print item_dict["AGENT"]["completed"][0]["complete_time"]
This prints the complete time. However, my problem is that "AGENT" is not a constant string; it can vary. I also need to figure out the logic to remove the entire JSON block based on complete_time.

OK, I assume that you were able to correctly load the JSON into a Python dictionary (let's call it item_dict), but that the top-level key may vary.
What you need now is to walk inside that Python object and decode the complete_time field. Python's strptime only accepts a trailing Z for %z from Python 3.7 on, and that would produce a timezone-aware datetime that can't be compared with a naive one, so here we simply skip that last character.
Additionally, you should never add or remove items in a collection while iterating over it, so the bulletproof way is to store the indices to remove and remove them afterwards. The code could be:
import datetime

datelimit = datetime.datetime(2018, 4, 3)  # limit date for complete_time
to_remove = []
dateformat = '%Y-%m-%dT%H:%M:%S.%f'
for k, v in item_dict.items():  # enumerate top-level objects
    for i, block in enumerate(v['completed']):  # enumerate inner blocks
        complete_time = datetime.datetime.strptime(  # skip last char ('Z') from complete_time
            block["complete_time"][:-1], dateformat)
        # print(k, i, complete_time)  # uncomment for tests
        if complete_time < datelimit:  # too old
            to_remove.append((k, i))  # store the index for later processing
for k, i in reversed(to_remove):  # start from the end to keep consistent indices
    del item_dict[k]["completed"][i]  # actual deletion
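On Python 3.7 or later, an alternative sketch is to parse the trailing Z directly with %z (which yields timezone-aware datetimes) and rebuild each completed list with a list comprehension, which avoids the index bookkeeping. The sample data, including the OTHER_AGENT key and the 2018-04-05 entry, is made up for illustration, and the cutoff follows the question's 2018-04-03.

```python
from datetime import datetime, timezone

# Hypothetical sample in the question's shape; "OTHER_AGENT" and the
# 2018-04-05 entry are invented to show a varying key and a block that survives.
item_dict = {
    "AGENT": {
        "pending": [],
        "active": None,
        "completed": [
            {"type": "script", "complete_time": "2018-01-24T17:44:33.484Z"},
            {"type": "script", "complete_time": "2018-04-05T09:00:00.000Z"},
        ],
    },
    "OTHER_AGENT": {
        "pending": [],
        "active": None,
        "completed": [
            {"complete_time": "2018-04-2T17:44:40.049Z"},
        ],
    },
}

# From Python 3.7, %z accepts a literal "Z" and treats it as UTC.
DATEFORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"
CUTOFF = datetime(2018, 4, 3, tzinfo=timezone.utc)


def is_recent(block):
    """True if the block completed on or after CUTOFF."""
    return datetime.strptime(block["complete_time"], DATEFORMAT) >= CUTOFF


for agent in item_dict.values():  # the top-level key name doesn't matter
    # Replace the list contents in place with only the blocks we keep.
    agent["completed"][:] = [b for b in agent["completed"] if is_recent(b)]

print(item_dict)
```

Because the comprehension builds a new list before the slice assignment replaces the old contents, nothing is removed from a list while it is being iterated.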

Appending a list inside of a JSON - Python 2.7

My JSON dict looks like this:
{
"end": 1,
"results": [
{
"expired": false,
"tag": "search"
},
{
"span": "text goes here"
}
],
"totalResults": 1
}
which is the product of this line:
tmp_response['results'].append({'span':"text goes here"})
My goal is to get the "span" key into the existing dict in the "results" list, as below. This is necessary for when totalResults > 1.
{
    "end": 1,
    "results": [
        {
            "expired": false,
            "tag": "search",
            "span": "text goes here"
        }
    ],
    "totalResults": 1
}
I've tried several methods, for example using 'dictname.update', but this overwrites the existing data in 'results'.

tmp_response['results'][0]['span'] = "text goes here"
or, if you really wanted to use update:
tmp_response['results'][0].update({'span':"text goes here"})
but note that this creates an unnecessary extra dict.
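Since the goal is to handle totalResults > 1, the same assignment can be done for each result in a loop. This sketch assumes the span texts are available as a list (spans, hypothetical) in the same order as results; zip() stops at the shorter of the two, so extra spans or results are silently ignored.

```python
# Hypothetical response in the question's shape, with two results.
tmp_response = {
    "end": 2,
    "results": [
        {"expired": False, "tag": "search"},
        {"expired": False, "tag": "search"},
    ],
    "totalResults": 2,
}

# Hypothetical span texts, one per result, in the same order as "results".
spans = ["text goes here", "more text goes here"]

# Add a "span" key to each existing result dict instead of appending new dicts.
for result, span in zip(tmp_response["results"], spans):
    result["span"] = span

print(tmp_response["results"])
```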

Here is one more solution, if you want to use it. It only works on Python 2, where dict.items() returns lists that can be concatenated with +; in Python 3, items() returns views, so use dict(old_dict, New_entry="New Value") or {**old_dict, 'New_entry': "New Value"} instead.
>>> tmp_response = {"end": 1,"results": [{"expired": False,"tag": "search"},{"span": "text goes here"}],"totalResults": 1}
>>> tmp_response['results'][0] = dict(tmp_response['results'][0].items() + {'New_entry': "New Value"}.items())
>>> tmp_response
{'totalResults': 1, 'end': 1, 'results': [{'tag': 'search', 'expired': False, 'New_entry': 'New Value'}, {'span': 'text goes here'}]}
