I'm using Python, and there is no documentation on doing this in Python. I have blob storage working with Python. Now I am trying to save data to Cosmos DB. I have no idea what I am supposed to do in the Azure Function.
cosmosdb_data = open(os.environ['outputDocument'], 'wb')
Would really appreciate any help on this!
EDIT:
I got it storing, but it complains that the document is corrupt and the _id field is missing. Does this mean you have to set your own id?
data = {
    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "image": "path/image.jpg",
    "device": subject.split(",")[1],
    "detected": "false",
    "detection_type": "null"
}
document = open(os.environ['outputCosmosDB'], 'w')
document.write('%s' % data)
document.close()
document.write doesn't output valid JSON, does it? Doesn't it output single quotes, not double quotes? You need to make sure it outputs valid JSON.
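A small sketch of the difference, using only the standard library. The shortened `data` dict is illustrative, and the final comment assumes the output binding accepts a JSON string written to the same path as in the question:

```python
import json

# Shortened, illustrative version of the dict from the question.
data = {"image": "path/image.jpg", "detected": "false"}

as_repr = '%s' % data       # Python repr: single-quoted keys, not JSON
as_json = json.dumps(data)  # double-quoted, valid JSON

# Check whether the repr form can be parsed back as JSON.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except ValueError:
    repr_is_valid_json = False

# Assumption: in the function, the write would then become
#   document.write(json.dumps(data))
```

Here `repr_is_valid_json` ends up `False`, while `json.loads(as_json)` round-trips to the original dict.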
_id is not necessary.
Also, Python on Azure Functions v1 is not very good and I'd recommend not using it. We're actively working on a new version of Python for v2 which will work properly for this kind of thing.
Related
I am using the Censys API in Python to programmatically look through hosts and grab information about them. The Censys website says it returns JSON-formatted data, and it looks like JSON-formatted data, but I can't seem to figure out how to turn the API response into a JSON object. However, if I write the JSON response to a file and load it, it works fine. Any ideas?
Update: Figured out issue is with nested json that the api returns. Looking for libraries to flatten it.
Main.py
c = censys.ipv4.CensysIPv4(api_id=UID, api_secret=SECRET)
for result in c.search("autonomous_system.asn:15169 AND tags.raw:iot", max_records=1):
    hostIPS.append(result["ip"])
for host in hostIPS:
    for details in c.view(host):
        # test = json.dumps(details)
        # test = json.load(test)
        # data = json.load(details)
        data = json.loads(details)
        print(data)
You don't need to convert it to an object, it's already json.loaded. See the implementation here: https://github.com/censys/censys-python/blob/master/censys/base.py
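To illustrate the point with the standard library only: calling `json.loads` on something that is already a dict fails, while plain indexing works. The `details` dict below is a hypothetical stand-in for what `c.view(host)` returns, not real Censys output:

```python
import json

# Hypothetical stand-in for an already-decoded API response.
details = {"ip": "8.8.8.8", "location": {"country": "US"}}

# json.loads expects str/bytes, so passing a dict raises TypeError.
try:
    json.loads(details)
    error = None
except TypeError as exc:
    error = exc

# The nested data is reachable with ordinary dict indexing.
country = details["location"]["country"]
```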
I am trying to create a Python script that can take a JSON object and insert it into a headless Couchbase server. I have been able to successfully connect to the server and insert some data. I'd like to be able to specify the path of a JSON object and upsert that.
So far I have this:
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
import json
cb = Bucket('couchbase://XXX.XXX.XXX?password=XXXX')
print cb.server_nodes
#tempJson = json.loads(open("myData.json","r"))
try:
    result = cb.upsert('healthRec', {'record': 'bob'})
    # result = cb.upsert('healthRec', {'record': tempJson})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
print(cb.get('healthRec').value)
I know that the first commented out line that loads the json is incorrect because it is expecting a string not an actual json... Can anyone help?
Thanks!
Figured it out:
with open('myData.json', 'r') as f:
    data = json.load(f)
try:
    result = cb.upsert('healthRec', {'record': data})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
I am looking into using cbdocloader, but this was my first step getting this to work. Thanks!
I know that you've found a solution that works for you in this instance but I thought I'd correct the issue that you experienced in your initial code snippet.
json.loads() takes a string as an input and decodes the json string into a dictionary (or whatever custom object you use based on the object_hook), which is why you were seeing the issue as you are passing it a file handle.
There is actually a method json.load() which works as expected, as you have used in your eventual answer.
You would have been able to use it as follows (if you wanted something slightly less verbose than the with statement):
tempJson = json.load(open("myData.json","r"))
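A self-contained sketch of the two calls side by side. It writes a throwaway temp file instead of assuming `myData.json` exists; the file contents are illustrative:

```python
import json
import os
import tempfile

# Create a small JSON file to read back (illustrative contents).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"record": "bob"}')
    path = tmp.name

# json.load takes a file object.
with open(path, "r") as f:
    from_file = json.load(f)

# json.loads takes a string, so the file must be read first.
with open(path, "r") as f:
    from_string = json.loads(f.read())

os.remove(path)
```

Both variables hold the same dict; the `with` form also closes the file handle, which the one-liner leaves to the garbage collector.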
As Kirk mentioned though if you have a large number of json documents to insert then it might be worth taking a look at cbdocloader as it will handle all of this boilerplate code for you (with appropriate error handling and other functionality).
This readme covers the uses of cbdocloader and how to format your data correctly to allow it to load your documents into Couchbase Server.
I have created a python script that automates a workflow converting PDF to txt files. I want to be able to store and query these files in MongoDB. Do I need to turn the .txt file into JSON/BSON? Should I be using a program like PyMongo?
I am just not sure what the steps of such a project would be let alone the tools that would help with this.
I've looked at this post: How can one add text files in Mongodb?, which makes me think I need to convert the file to a JSON file, and possibly integrate GridFS?
You don't need to JSON/BSON encode it if you're using a driver. If you're using the MongoDB shell, you'd need to worry about it when you pasted the contents.
You'd likely want to use the Python MongoDB driver:
from pymongo import MongoClient
client = MongoClient()
db = client.test_database  # use a database called "test_database"
collection = db.files  # and inside that DB, a collection called "files"
with open('test_file_name.txt') as f:  # open a file
    text = f.read()  # read the entire contents, should be UTF-8 text
# build a document to be inserted
text_file_doc = {"file_name": "test_file_name.txt", "contents": text}
# insert the document into the "files" collection
# (insert_one replaces the older insert(), which PyMongo 4 removed)
collection.insert_one(text_file_doc)
(Untested code)
If you made sure that the file names are unique, you could set the _id property of the document and retrieve it like:
text_file_doc = collection.find_one({"_id": "test_file_name.txt"})
Or, you could ensure the file_name property as shown above is indexed and do:
text_file_doc = collection.find_one({"file_name": "test_file_name.txt"})
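A rough sketch of building such a document with the file name as `_id`. The helper `make_text_doc` is hypothetical, and the MongoDB calls appear only as comments so the snippet runs without a server:

```python
import os
import shutil
import tempfile

def make_text_doc(path):
    """Build a document keyed by the file's base name (hypothetical helper)."""
    name = os.path.basename(path)
    with open(path, encoding="utf-8") as f:
        return {"_id": name, "file_name": name, "contents": f.read()}

# Create a sample text file to stand in for a converted PDF.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test_file_name.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello from a converted PDF")

doc = make_text_doc(path)
shutil.rmtree(tmp_dir)

# With a live server and the collection from the answer above:
#   collection.insert_one(doc)
#   collection.find_one({"_id": "test_file_name.txt"})
```

Using the name as `_id` only works if names never repeat; inserting a second document with the same `_id` would be rejected as a duplicate key.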
Your other option is to use GridFS, although it's often not recommended for small files.
There's a starter here for Python and GridFS.
Yes, you must convert your file to JSON. There is a trivial way to do that: use something like {"text": "your text"}. It's easy to extend / update such records later.
Of course you'd need to escape the " occurrences in your text. I suppose that you'd use a JSON library and/or the MongoDB library for your favorite language to do all the formatting.
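For instance, Python's standard `json` module handles the escaping when it serializes. A minimal sketch with made-up sample text:

```python
import json

# Sample text containing double quotes and a newline.
text = 'She said "hello" and left.\nSecond line.'

encoded = json.dumps({"text": text})  # quotes and newline are escaped
decoded = json.loads(encoded)         # round-trips to the original text
```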
I am very new to Python and am not very familiar with the data structures in Python.
I am writing an automatic JSON parser in Python, the JSON message is read into a dictionary using Ultra-JSON:
jsonObjs = ujson.loads(data)
Now, if I try something like:
jsonObjs[param1][0][param2] it works fine
However, I need to get the path from an external source (I read it from the DB), we initially thought we'll just write in the DB:
myPath = [param1][0][param2]
and then try to access:
jsonObjs[myPath]
But after a couple of failures I realized I'm trying to access:
jsonObjs[[param1][0][param2]]
Is there a way to fix this without parsing myPath?
Many thanks for your help and advice
Store the keys in a format that preserves type information, e.g. JSON, and then use reduce() to perform recursive accesses on the structure.
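One way this could look, as a sketch: the path is stored as a JSON array so that string keys and integer indexes keep their types, and `functools.reduce` applies one lookup per element. The sample message, the path, and the helper name `get_by_path` are all illustrative, and the standard `json` module stands in for `ujson`:

```python
import json
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    """Follow each key/index in path, starting from obj (hypothetical helper)."""
    return reduce(getitem, path, obj)

# Illustrative message and a path as it might be stored in the DB.
json_objs = json.loads('{"sensors": [{"temp": 21.5}]}')
my_path = json.loads('["sensors", 0, "temp"]')

# Equivalent to json_objs["sensors"][0]["temp"].
value = get_by_path(json_objs, my_path)
```

A missing key or an out-of-range index raises `KeyError` or `IndexError`, just as the hand-written chain of lookups would.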
I'm banging my head against the wall with this one:
What I want to do is store a file that is returned from an API in the data store as a blob.
Here is the code that I use on my local machine (which of course works due to an existing file system):
client.convertHtml(html, open('html.pdf', 'wb'))
Since I cannot write to a file on App Engine I tried several ways to store the response, without success.
Any hints on how to do this? I was trying to do it with StringIO and managed to store the response, but then wasn't able to store it as a blob in the data store.
Thanks,
Chris
Found the error. Here is what it looks like right now (simplified).
output = StringIO.StringIO()
try:
    client.convertURI("example.com", output)
    Report.pdf = db.Blob(output.getvalue())
    Report.put()
except pdfcrowd.Error, why:
    logging.error('PDF creation failed %s' % why)
I was trying to save the output without calling "getvalue()", that was the problem. Perhaps this is of use to someone in the future :)
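The same idea in a Python 3 sketch, using `io.BytesIO` in place of the Python 2 `StringIO` module. `fake_convert` is a hypothetical stand-in for `client.convertURI`; it just writes some bytes to whatever stream it is given:

```python
import io

def fake_convert(uri, out_stream):
    """Hypothetical stand-in for an API call that writes to a file-like object."""
    out_stream.write(b"%PDF-1.4 placeholder for " + uri.encode("utf-8"))

output = io.BytesIO()
fake_convert("example.com", output)

# The buffer object itself is not the data; getvalue() returns the bytes.
pdf_bytes = output.getvalue()
```

`pdf_bytes` is what would be handed to the blob property, whereas `output` is only the in-memory file wrapper around it.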