I'm revising my previous question. I have a collection named FileCollection with the following document:
{
"_id": {
"$oid": "5e791a53185fbb070378660a"
},
"selectedfiles": [{
"inputfile": "https://localhost/_HAC-154_1584994899979.jpg",
"Selectedby": "Joe"
}]}
I need to read the value of selectedfiles.inputfile as a string variable. I'm trying to do this in Python using this code:
from pymongo import MongoClient
mydb = MongoClient(mongodbConnection)
myCollection=mydb.FileCollection
myValue=myCollection.selectedfile[0].inputfile.value
print(myValue)
client.close
The output is JSON-like text that does not contain the actual value of inputfile. Please help.
Thanks
Isn't it just because you're missing an s?
You had:
myValue=myCollection.selectedfile[0].inputfile.value
instead of:
myValue=myCollection.selectedfiles[0].inputfile.value
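Beyond the typo, attribute access on PyMongo objects appears to name databases and sub-collections rather than read document fields, so the value has to come from a fetched document. The following is a minimal sketch of that idea; the helper names first_input_file and fetch_first_input_file are hypothetical, and the sample dict stands in for what find_one() would return.

```python
def first_input_file(document):
    """Return selectedfiles[0]['inputfile'] from a document dict, or None."""
    selected = (document or {}).get("selectedfiles") or []
    return selected[0].get("inputfile") if selected else None


def fetch_first_input_file(collection):
    """Fetch one document from a PyMongo collection and extract the value."""
    return first_input_file(collection.find_one())


# Stand-in for the document in the question, as find_one() would return it.
sample_document = {
    "selectedfiles": [
        {
            "inputfile": "https://localhost/_HAC-154_1584994899979.jpg",
            "Selectedby": "Joe",
        }
    ]
}

my_value = first_input_file(sample_document)
print(my_value)
```

With a live server, something like fetch_first_input_file(MongoClient(uri)["mydatabase"]["FileCollection"]) should work, where uri and mydatabase are assumptions since the question does not name them.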
I was looking for a solution for storage and retrieval of time series data.
As I have mongodb set up already in my project, I searched for a solution with mongodb and mongoengine (instead of pymongo).
So I wonder whether there is a similar solution to this for mongoengine or, if there isn't one, how to develop it.
{
"_id" : ObjectId("60c0d44894c10494260da31e"),
"source" : {sensorId: 123, region: "americas"},
"airPressure" : 99 ,
"windSpeed" : 22,
"temp" : { "degreesF": 39,
"degreesC": 3.8
},
"ts" : ISODate("2021-05-20T10:24:51.303Z")
}
db.createCollection("weather", {
timeseries: {
timeField: "ts",
metaField: "source",
granularity: "minutes"
},
expireAfterSeconds: 9000
});
The sample code is taken from MongoDB's New Time Series Collections, which describes the solution with pymongo, but I want to do it with mongoengine. Is that possible?
Try creating your time-series collection with pymongo like this:
import pymongo
connection= pymongo.MongoClient('mongodb://localhost')
db = connection.<dbName>
db.create_collection('<tsCollectionName>', timeseries={ 'timeField': '<timeField>', 'metaField': '<metaField>', 'granularity': '<granularity>' })
You can replace every value between <> with your own.
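As a rough sketch of how this could be wired up for the weather example in the question: the helpers below build the timeseries options and pass them to create_collection on any PyMongo-style database object. The function names are hypothetical. Since mongoengine sits on top of pymongo, the database object could plausibly be obtained from mongoengine.connection.get_db() after connect(); that pairing is an assumption, not something the original answer confirms.

```python
def timeseries_options(time_field, meta_field=None, granularity=None):
    """Build the 'timeseries' options dict for a time-series collection."""
    options = {"timeField": time_field}
    if meta_field is not None:
        options["metaField"] = meta_field
    if granularity is not None:
        options["granularity"] = granularity
    return options


def create_timeseries_collection(db, name, options, expire_after_seconds=None):
    """Call db.create_collection with time-series settings.

    'db' is expected to behave like a pymongo Database.
    """
    kwargs = {"timeseries": options}
    if expire_after_seconds is not None:
        kwargs["expireAfterSeconds"] = expire_after_seconds
    return db.create_collection(name, **kwargs)


# Same settings as the shell example in the question.
weather_options = timeseries_options("ts", "source", "minutes")
print(weather_options)

# With mongoengine (database name is an assumption):
# from mongoengine import connect
# from mongoengine.connection import get_db
# connect("mydb")
# create_timeseries_collection(get_db(), "weather", weather_options, 9000)
```

Time-series collections require MongoDB 5.0 or later, so the call is expected to fail against older servers.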
I have JSON in a file that I want to access in my Python code. The JSON file looks like this:
{
"fc1" : {
region : "Delhi",
marketplace : "IN"
},
"fc2" : {
region : "Rajasthan",
marketplace : "IN"
}
}
I want to use the above JSON in my Python code and access it by its keys ("fc1", "fc2").
Since this is not valid JSON, I am having difficulty accessing the values.
Is there any way in Python to read this kind of JSON?
Thanks.
I agree with the comment that, if you generated that file, then you should put quotes around region and marketplace when generating it (or have the person who generated it do the same). However, if this absolutely isn't an option for whatever reason, the following approach might work:
import json
data_string = """
{
"fc1":{
region:"Delhi",
marketplace: "IN"
},
"fc2" : {
region:"Rajasthan",
marketplace: "IN"
}
}
"""
data = json.loads(data_string.replace('region', '"region"').replace('marketplace', '"marketplace"'))
data
>>>{'fc1': {'region': 'Delhi', 'marketplace': 'IN'},
'fc2': {'region': 'Rajasthan', 'marketplace': 'IN'}}
Note that you would have to do the same for any unquoted key.
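To avoid listing every key by hand, the same idea can be generalized with a regular expression that quotes any bare identifier appearing directly after { or , and before a colon. This is only a sketch; the function name loads_with_bare_keys is hypothetical, and it can corrupt string values that happen to contain text such as ", word:".

```python
import json
import re

# A bare identifier that follows '{' or ',' (plus optional whitespace)
# and is followed by a colon is treated as an unquoted key.
UNQUOTED_KEY = re.compile(r'(?<=[{,])(\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*):')


def loads_with_bare_keys(text):
    """Quote unquoted keys, then parse the result as ordinary JSON."""
    quoted = UNQUOTED_KEY.sub(r'\1"\2"\3:', text)
    return json.loads(quoted)


data_string = """
{
  "fc1": {
    region: "Delhi",
    marketplace: "IN"
  },
  "fc2": {
    region: "Rajasthan",
    marketplace: "IN"
  }
}
"""

data = loads_with_bare_keys(data_string)
print(data["fc1"]["region"])
print(data["fc2"]["marketplace"])
```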
There is a module, dirtyjson, which can read this incorrect JSON.
import dirtyjson
data_string = """
{
"fc1":{
region:"Delhi",
marketplace: "IN"
},
"fc2" : {
region:"Rajasthan",
marketplace: "IN"
}
}
"""
data = dirtyjson.loads(data_string)
print(data)
print(data['fc1'])
print(data['fc2'])
I have JSON files in S3, each containing an array of objects, as shown below.
[{
"id": "c147162a-a304-11ea-aa90-0242ac110028",
"clientId": "xxx",
"contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
"tags": {},
"timestamp": 1592855898
}, {
"id": "c147162a-a304-11ea-aa90-0242ac110028",
"clientId": "yyy",
"contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
"tags": {},
"timestamp": 1592855898
}]
I used a crawler to detect the schema and load it into the catalog. It succeeded and created a schema with a single column named array with data type array<struct<id:string,clientId:string,contextUUID:string,tags:string,timestamp:int>>.
Next, I tried to load the data using the glueContext.create_dynamic_frame.from_catalog function, but I could not see any data. I tried printing the schema and data as shown below.
ds = glueContext.create_dynamic_frame.from_catalog(
database = "dbname",
table_name = "tablename")
ds.printSchema()
root
ds.schema()
StructType([], {})
ds.show()
empty
ds.toDF().show()
++
||
++
++
Any idea what I am doing wrong? I am planning to extract each object in the array and transform it to a different schema.
You can try passing a jsonPath in format_options to tell Glue how it should read the data. The following code worked for me:
glueContext.create_dynamic_frame_from_options('s3',
{
'paths': ["s3://glue-test-bucket-12345/events/101-1.json"]
},
format="json",
format_options={"jsonPath": "$[*]"}
).toDF()
I hope it solves the problem.
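For intuition only, here is a plain-Python sketch of what the "$[*]" path is understood to select: every element of the top-level array becomes its own record, which can then be mapped to another schema. It does not use Glue at all, and the target field names (event_id, client, ts) are hypothetical.

```python
import json


def explode_array(text):
    """Return each element of a top-level JSON array as its own record."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return records


def to_target_schema(record):
    """Map one source object to a (hypothetical) target schema."""
    return {
        "event_id": record["id"],
        "client": record["clientId"],
        "ts": record["timestamp"],
    }


file_text = """
[{
  "id": "c147162a-a304-11ea-aa90-0242ac110028",
  "clientId": "xxx",
  "contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
  "tags": {},
  "timestamp": 1592855898
}, {
  "id": "c147162a-a304-11ea-aa90-0242ac110028",
  "clientId": "yyy",
  "contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
  "tags": {},
  "timestamp": 1592855898
}]
"""

rows = [to_target_schema(r) for r in explode_array(file_text)]
for row in rows:
    print(row)
```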
I'm currently learning MongoDB, using PyMongo rather than the MongoDB shell.
When I started trying the basic CRUD operations, I found it hard to load the bios data using PyMongo, since the original data posted on the website uses an unusual ISODate notation for times.
Python's standard json library does not seem to support this, and mongoimport does not seem to support it either (not sure). But I found that after changing the value to {$date:"2017-04-01T05:00:00Z"}, mongoimport worked.
Right now I'm using subprocess to call an external command to import the data. So my question is: how do I correctly read the JSON data in Python and insert it using PyMongo?
Details
The bios data in the MongoDB documentation looks like this:
{
"_id" : 1,
"name" : {
"first" : "John",
"last" : "Backus"
},
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [
"Fortran",
"ALGOL",
"Backus-Naur Form",
"FP"
],
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{
"award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering"
}
]
}
When I try to parse it with Python's json library, I get a json.decoder.JSONDecodeError because of the "birth" : ISODate("1924-12-03T05:00:00Z"), line. mongoimport cannot parse it for the same reason.
When I changed
"birth" : ISODate("1924-12-03T05:00:00Z"), into
"birth" : {$date:"2017-04-01T05:00:00Z"}
mongoimport worked, but Python still wasn't able to parse it.
What I am asking for is a way to deal with this problem within Python and PyMongo rather than calling an external command.
The example that you're looking at was probably intended to be used within the mongo shell, where the use of the ISODate bson type can be parsed as shown.
Outside of that, we have the challenge that JSON does not have a date datatype, nor does it have a standard way of representing dates. To deal with this challenge, MongoDB created something called Extended JSON, which can encode dates in JSON similar to how you have shown with $date.
In order to work with Extended JSON in Python / PyMongo, you could use json_util.
Here's a brief example:
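from bson.json_util import loads
from pymongo import MongoClient
# Extended JSON: dates are written as {"$date": ...} objects.
bios_json = '''
{
"_id" : 1,
"name" : {
"first" : "John",
"last" : "Backus"
},
"birth" : {"$date":"2017-04-01T05:00:00.000Z"},
"death" : {"$date":"2017-04-01T05:00:00.000Z"}
}
'''
# json_util.loads turns each $date object into a datetime.
document = loads(bios_json)
print(document)
db = MongoClient().test
collection = db.bios
# insert() was removed in PyMongo 4; insert_one() replaces it for a single document.
collection.insert_one(document)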
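If the source file still contains shell-style ISODate("...") calls, one possible preprocessing step is to rewrite them into the Extended JSON $date form before parsing. The sketch below uses only the standard library to show the rewrite; the function name shell_to_extended_json is hypothetical, and it assumes every ISODate call takes a single double-quoted string. Passing the rewritten text to bson.json_util.loads instead of json.loads should then yield datetime values.

```python
import json
import re

# Matches ISODate("...") and captures the timestamp string.
ISODATE_CALL = re.compile(r'ISODate\(\s*"([^"]*)"\s*\)')


def shell_to_extended_json(text):
    """Rewrite ISODate("...") calls as {"$date": "..."} objects."""
    return ISODATE_CALL.sub(r'{"$date": "\1"}', text)


shell_text = '''
{
  "_id" : 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "birth" : ISODate("1924-12-03T05:00:00Z"),
  "death" : ISODate("2007-03-17T04:00:00Z")
}
'''

extended = shell_to_extended_json(shell_text)
# Plain json.loads now succeeds, leaving the dates as $date objects.
parsed = json.loads(extended)
print(parsed["birth"])
print(parsed["death"])
```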
I'm attempting to understand the basics of JSON and thought using some Google translate examples would be interesting. I'm not actually making requests via the API but they have the following example I have saved as "file.json":
{
"data": {
"detections": [
[
{
"language": "en",
"isReliable": false,
"confidence": 0.18397073
}
]
]
}
}
I'm reading in the file and used simplejson:
json_data = open('file.json').read()
json = simplejson.loads(json_data)
>>> json
{'data': {'detections': [[{'isReliable': False, 'confidence': 0.18397073, 'language': 'en'}]]}}
I've tried multiple ways to print the value of 'language' with no success. For example, this fails. Any pointers would be appreciated!
print json['detections']['language']
You need json['data']['detections'][0][0]['language']. As your example data shows, 'language' is a key of a dict that is inside a list, which is inside another list; that outer list is the value of the 'detections' key, which is inside the 'data' dict.
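The same walk through the nesting can be written step by step in Python 3 with the standard json module. This is a small sketch: the text is inlined here rather than read from file.json, and the variable names are illustrative.

```python
import json

json_text = '''
{
  "data": {
    "detections": [
      [
        {
          "language": "en",
          "isReliable": false,
          "confidence": 0.18397073
        }
      ]
    ]
  }
}
'''

data = json.loads(json_text)

detections = data["data"]["detections"]  # a list of lists
first_detection = detections[0][0]       # the innermost dict
print(first_detection["language"])

# Collect the language of every detection, however many lists there are.
languages = [d["language"] for group in detections for d in group]
print(languages)
```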