Python insert YAML in MongoDB

Folks,
Having a problem with inserting the following yaml document into MongoDB:
works:
---
URLs:
  - "foo":
      intensity: 5
      port: 80
does not:
---
URLs:
  - "http://www.yahoo.com":
      intensity: 5
      port: 80
The only difference is the url. Here is the python code:
stream = open(options.filename, 'r')
yamlData = yaml.load(stream)
jsonData = json.dumps(yamlData)
io = StringIO(jsonData)
me = json.load(io)
... calling classes, etc, then
self.appCollection.insert(me)
err:
bson.errors.InvalidDocument: key 'http://yahoo.com' must not contain '.'
So, what is the correct way to transform this YML file? :)
Thanks!

You cannot use "." in field names (i.e. keys). If you must, then replace occurrences of "." with its fullwidth Unicode equivalent, "\uff0E" (U+FF0E, fullwidth full stop).
Hope this helps.

As the error says, the problem is in your key. MongoDB uses the dot to address keys in nested documents, so a key cannot itself contain a dot.
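The replacement suggested above can be applied to the whole parsed document before inserting it. This is only a minimal sketch: sanitize_keys and FULLWIDTH_DOT are hypothetical names, not part of pymongo, and it assumes the parsed YAML contains only dicts, lists and scalars.

```python
FULLWIDTH_DOT = "\uff0e"  # U+FF0E FULLWIDTH FULL STOP, the substitute suggested above


def sanitize_keys(value):
    """Return a copy of value in which '.' in every dict key is replaced."""
    if isinstance(value, dict):
        # str() because YAML keys may be numbers, while document keys are strings
        return {
            str(key).replace(".", FULLWIDTH_DOT): sanitize_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize_keys(item) for item in value]
    return value  # scalars are returned unchanged


doc = {"URLs": [{"http://www.yahoo.com": {"intensity": 5, "port": 80}}]}
clean = sanitize_keys(doc)
print(clean)
```

You would then pass clean, rather than the raw parsed data, to the insert call. Another option worth considering is to store the URL as a value (for example under a "url" key) instead of using it as a key, which avoids the substitution altogether.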

Related

python urllib: build urls including parameters with and without keyword

I am building grafana links in python with urllib like the following:
from urllib.parse import urlencode, urlunsplit
parameters = {
    "parameter1":"value1",
    "parameter2":"value2"
}
query = urlencode(
    query = parameters,
    doseq = True
)
link = urlunsplit((
    "https",
    "my_grafana.com",
    "/graph",
    query,
    ""
))
link will be in this case 'https://my_grafana.com/graph?parameter1=value1&parameter2=value2'. I now want to add parameters with no keyword for example "kiosk". The link should look like 'https://my_grafana.com/graph?parameter1=value1&parameter2=value2&kiosk&other_parameter'
As urlencode returns a string with the parameters I could manipulate the string like in the following example before I give it to urlunsplit:
no_keyword_parameters = ["kiosk","other_parameter"]
query = "&".join([query, *no_keyword_parameters])
I wonder if you can put parameters with and without keyword directly with urlencode together. I tried giving "kiosk" as a dictionary entry with None as content ({"kiosk": None}) but it includes the None in the url. Approaches, where I give a list of tuples instead of a dictionary for the parameters, were also unsuccessful.
Thank you for any help.

As mentioned by Ondrej, urlencode builds the query using k + '=' + v.
You could add non value parameters manually:
from urllib.parse import urlencode, urlunsplit, quote_plus
parameters = {"parameter1": "value1", "parameter2": "value2"}
no_value_parameters = ["kiosk", "other_parameter"]
no_value_parameters_quoted = [quote_plus(p) for p in no_value_parameters]
query = urlencode(query=parameters, doseq=True)
link = urlunsplit(("https", "my_grafana.com", "/graph", query, ""))
link = f"{link}&{'&'.join(no_value_parameters_quoted)}"
print(link)
Out:
https://my_grafana.com/graph?parameter1=value1&parameter2=value2&kiosk&other_parameter

What you've done seems sound and you could either do it like that or formalize it a bit more in your own encoding function, but urllib.parse.urlencode does not seem to understand the notion of parameters without value. If you look at the implementation (with doseq you get a variation of the same for the part relevant to your question):
for k, v in query:
    ...
    l.append(k + '=' + v)
I.e. you have to have a key, value pair (to unpack two values), and whatever they are quoted to (that happens in the ellipsis) will be a str joined with =. So even with a custom quote_via you cannot really change this behavior.
That linked implementation is the one provided with CPython, but also the documentation expects: key/value pairs, so that behavior really is as specified / documented:
The resulting string is a series of key=value pairs separated by '&' characters...
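If you want to formalize the manual approach, a small wrapper is enough. The sketch below is an assumption about how one might do it: build_query is a hypothetical helper, not something urllib provides, and it simply appends the quoted bare parameters to whatever urlencode produces.

```python
from urllib.parse import quote_plus, urlencode, urlunsplit


def build_query(pairs, flags=()):
    """Join key=value pairs and bare, value-less flags into one query string."""
    parts = []
    if pairs:
        parts.append(urlencode(pairs, doseq=True))
    # Flags have no value, so they are only quoted, never joined with '='
    parts.extend(quote_plus(flag) for flag in flags)
    return "&".join(parts)


query = build_query(
    {"parameter1": "value1", "parameter2": "value2"},
    ["kiosk", "other_parameter"],
)
link = urlunsplit(("https", "my_grafana.com", "/graph", query, ""))
print(link)
```

Building the full query string before calling urlunsplit keeps the result correct even if a fragment is added later, which appending to the finished link would not.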

python variable body API

Good Morning,
I need to understand how to insert the value of a variable into this string, in place of CHANGEME.
payload = '{\n\t"client": {\n\t\t"clientId": "name"\n\t},\n\t"contentFieldOption": {\n\t\t"returnLinkedContents": false,\n\t\t"returnLinkedCategories": false,\n\t\t"returnEmbedCodes": false,\n\t\t"returnThumbnailUrl": false,\n\t\t"returnItags": false,\n\t\t"returnAclInfo": false,\n\t\t"returnImetadata": false,\n\t\t"ignoreITagCombining": false,\n\t\t"returnTotalResults": true\n\t},\n\t"criteria": {\n\t\t"linkedCategoryOp": {\n\t\t\t"linkedCategoryIds": [\n\t\t\t\t" CHANGEME ",\n\t\t\t\t"!_TRASH"\n\t\t\t],\n\t\t\t"cascade": true\n\t\t}\n\t},\n\t"numberOfresults": 50,\n\t"offset": 0,\n\t"orderBy": "creationDate_A"\n}'
It is part of the body to be sent in an API POST request.
I have tried various alternatives, but none of them solved my problem.

Don't try to hack this string with regexes; you'll end up with invalid data in no time. Use json.loads() to convert it into a dictionary, find the key CHANGEME, and do whatever you need to do (which you do not really explain).
>>> import json
>>> paydict = json.loads(payload)
>>> print(json.dumps(paydict, indent=4))
{
    "criteria": {
        "linkedCategoryOp": {
            "linkedCategoryIds": [
                " CHANGEME ",
                "!_TRASH"
...
API objects usually have a consistent structure, so your variable is probably always in the list paydict["criteria"]["linkedCategoryOp"]["linkedCategoryIds"]. Find the index of " CHANGEME " in this list, and take it from there.
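Put together, that approach might look like the sketch below. The function name set_category and the value "my-category-id" are made up for illustration, and the payload is shortened to the part that matters; the real one from the question works the same way.

```python
import json

# Shortened version of the payload from the question
payload = (
    '{"criteria": {"linkedCategoryOp": {"linkedCategoryIds": '
    '[" CHANGEME ", "!_TRASH"], "cascade": true}}, "offset": 0}'
)


def set_category(payload_text, new_id, placeholder=" CHANGEME "):
    """Parse the payload, swap the placeholder entry, and serialize it again."""
    body = json.loads(payload_text)
    ids = body["criteria"]["linkedCategoryOp"]["linkedCategoryIds"]
    # list.index raises ValueError if the placeholder is missing
    ids[ids.index(placeholder)] = new_id
    return json.dumps(body)


new_payload = set_category(payload, "my-category-id")
print(new_payload)
```

Because the value goes through json.dumps, quotes or backslashes in the new ID are escaped for you, which is not the case when you patch the raw string.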

You can use re - Python's regular expressions module :
import re
payload = '{\n\t"client": {\n\t\t"clientId": "name"\n\t},\n\t"contentFieldOption": {\n\t\t"returnLinkedContents": false,\n\t\t"returnLinkedCategories": false,\n\t\t"returnEmbedCodes": false,\n\t\t"returnThumbnailUrl": false,\n\t\t"returnItags": false,\n\t\t"returnAclInfo": false,\n\t\t"returnImetadata": false,\n\t\t"ignoreITagCombining": false,\n\t\t"returnTotalResults": true\n\t},\n\t"criteria": {\n\t\t"linkedCategoryOp": {\n\t\t\t"linkedCategoryIds": [\n\t\t\t\t" CHANGEME ",\n\t\t\t\t"!_TRASH"\n\t\t\t],\n\t\t\t"cascade": true\n\t\t}\n\t},\n\t"numberOfresults": 50,\n\t"offset": 0,\n\t"orderBy": "creationDate_A"\n}'
payload = re.sub(r"\n|\t", "", payload).strip() # do some cleanup
payload = re.sub(r"\s+CHANGEME\s+", "NEW VALUE", payload) # Replace the value
print(payload) # CHANGEME is replaced with NEW VALUE

You could use a simple string replace to swap "CHANGEME" with something else.
new_str = 'IMCHANGED'
payload = payload.replace('CHANGEME', new_str)
Strings are immutable, so replace() returns a new string that you have to assign. This solves your stated problem, unless there are extra constraints on the payload (right now this assumes it is a string and that replacing every occurrence of CHANGEME is acceptable). Please clarify if that is not the case.

How do i 'professionally' store small data in python? [duplicate]

I need to store basic data about customers, the cars they bought, and the payment schedules for these cars. The data comes from a GUI written in Python. I don't have enough experience to use a database system like SQL, so I want to store my data in a file as plain text. It doesn't have to be online.
To be able to search and filter the data, I first convert it (lists of lists) to a string, and when I need it I convert it back to regular Python list syntax. I know it is a very brute-force way, but is it safe to do it like that, or can you advise me of another way?

It is never safe to save your database in a text format (or using pickle or whatever). There is a risk that problems while saving the data may cause corruption. Not to mention risks with your data being stolen.
As your dataset grows there may be a performance hit.
Have a look at SQLite (the sqlite3 module in Python), which is small and easier to manage than MySQL, unless you have a very small dataset that will fit in a text file.
P.S. Using Berkeley DB in Python is also simple, and you don't have to learn all the DB things: just import bsddb.
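To give an idea of how little SQL is needed, here is a rough sketch with the standard-library sqlite3 module. The table and column names are invented for this example, and it uses an in-memory database; pass a file name such as "cars.db" to connect() to keep the data between runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: swap in a file name to persist
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE cars ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "model TEXT NOT NULL, "
    "price REAL NOT NULL)"
)

with conn:  # commits on success, rolls back if an exception is raised
    cur = conn.execute("INSERT INTO customers (name) VALUES (?)", ("Alice",))
    conn.execute(
        "INSERT INTO cars (customer_id, model, price) VALUES (?, ?, ?)",
        (cur.lastrowid, "Hatchback", 12500.0),
    )

# Searching and filtering is a query instead of string parsing
rows = conn.execute(
    "SELECT c.name, k.model, k.price FROM customers c "
    "JOIN cars k ON k.customer_id = c.id WHERE k.price > ?",
    (10000,),
).fetchall()
print(rows)
```

The "?" placeholders let sqlite3 handle quoting of values coming from the GUI, so you never build SQL by string concatenation.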

The answer to use pickle is good, but I personally prefer shelve. It allows you to keep variables in the same state they were in between launches and I find it easier to use than pickle directly. http://docs.python.org/library/shelve.html
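A shelf behaves like a dictionary that lives on disk. A minimal sketch, assuming a throwaway temporary directory and a made-up "alice" record:

```python
import os
import shelve
import tempfile

# Assumption: a temporary location; in a real program use a fixed path
path = os.path.join(tempfile.mkdtemp(), "customers")

with shelve.open(path) as db:
    db["alice"] = {"cars": ["Hatchback"], "payments": [500, 500]}

# Reopening the shelf, e.g. on the next launch, gives the data back
with shelve.open(path) as db:
    restored = db["alice"]
print(restored)
```

One thing to watch: changing a nested value in place (such as appending to db["alice"]["cars"]) is not saved unless you assign the whole entry back or open the shelf with writeback=True.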

I agree with the others that serious and important data would be more secure in some type of light database but can also feel sympathy for the wish to keep things simple and transparent.
So, instead of inventing your own text-based data format, I would suggest you use YAML.
The format is human-readable, for example:
List of things:
    - Alice
    - Bob
    - Evan
You load the file like this:
>>> import yaml
>>> with open('test.yaml', 'r') as f:
...     data = yaml.safe_load(f)
And data will look like this:
{'List of things': ['Alice', 'Bob', 'Evan']}
Of course you can do the reverse too and save data into YAML, the docs will help you with that.
At least another alternative to consider :)

very simple and basic - (more at http://pastebin.com/A12w9SVd)
import json, os
db_name = 'udb.db'
def check_db(name=db_name):
    # Create an empty database file if none exists yet.
    if not os.path.isfile(name):
        print('no db\ncreating..')
        open(name, 'w').close()
def read_db():
    check_db()
    try:
        with open(db_name, 'r') as udb:
            return json.load(udb)
    except ValueError:
        # An empty or corrupt file is treated as an empty database.
        return {}
def update_db(newdata):
    # Merge the new entries into the stored ones and write everything back.
    data = read_db()
    data.update(newdata)
    with open(db_name, 'w') as udb:
        json.dump(data, udb)
using:
def adduser():
    print('add user:')
    name = input('name > ')
    password = input('password > ')
    update_db({name: password})

You can use this lib to write an object into a file http://docs.python.org/library/pickle.html

Writing data to a file isn't a safe way to store data. Better to use a simple database library like SQLAlchemy. It is an ORM for easy database usage...

You can also keep simple data in a plain text file. However, you then get little support for checking the consistency of the data, duplicate values, etc.
Here is my simple 'card file' type code snippet for data in a text file. It uses namedtuple so that you can access values not only by index in the line but also by their header name:
# text based data input with data accessible
# with named fields or indexing
from __future__ import print_function ## Python 3 style printing
from collections import namedtuple
import string
filein = open("sample.dat")
datadict = {}
headerline = filein.readline().lower() ## lowercase field names Python style
## first non-letter and non-number is taken to be the separator
separator = headerline.strip(string.ascii_lowercase + string.digits)[0]
print("Separator is '%s'" % separator)
headerline = [field.strip() for field in headerline.split(separator)]
Dataline = namedtuple('Dataline', headerline)
print('Fields are:', Dataline._fields, '\n')
for data in filein:
    data = [f.strip() for f in data.split(separator)]
    d = Dataline(*data)
    datadict[d.id] = d ## do hash of id values for fast lookup (key field)
filein.close()
## examples based on sample.dat file example
key = '123'
print('Email of record with key %s by field name is: %s' %
      (key, datadict[key].email))
## by number
print('Address of record with key %s by field number is: %s' %
      (key, datadict[key][3]))
## print the dictionary in separate lines for clarity
for key, value in datadict.items():
    print('%s: %s' % (key, value))
input('Ready') ## let the output be seen when run directly
""" Output:
Separator is ';'
Fields are: ('id', 'name', 'email', 'homeaddress')
Email of record with key 123 by field name is: gishi@mymail.com
Address of record with key 123 by field number is: 456 happy st.
345: Dataline(id='345', name='tony', email='tony.veijalainen@somewhere.com', homeaddress='Espoo Finland')
123: Dataline(id='123', name='gishi', email='gishi@mymail.com', homeaddress='456 happy st.')
Ready
"""

Comparing two documents and writing output to a third [Python?]

I am seeking some advice whether it be in terms of a script (possibly python?) that I could use to do the following.
I basically have two documents, taken from a DB:
document one contains :
hash / related username.
example:
fb4aa888c283428482370 username1
fb4aa888c283328862370 username2
fb4aa888c283422482370 username3
fb4aa885djsjsfjsdf370 username4
fb4aa888c283466662370 username5
document two contains:
hash : plaintext
example:
fb4aa888c283428482370:plaintext
fb4aa888c283328862370:plaintext2
fb4aa888c283422482370:plaintext4
fb4aa885djsjsfjsdf370:plaintextetc
fb4aa888c283466662370:plaintextetc
can anyone think of an easy way for me to match up the hashes in document two with the relevant username from document one into a new document (say document three) and add the plain so it would look like the following...
Hash : Relevant Username : plaintext
This would save me a lot of time having to cross reference two files, find the relevant hash manually and the user it belongs to.
I've never actually used python before, so some examples would be great!
Thanks in advance

I don't have any code for you but a very basic way to do this would be to whip up a script that does the following:
Read the first doc into a dictionary with the hashes as keys.
Read the second doc into a dictionary with the hashes as keys.
Iterate through both dictionaries, by key, in the same loop, writing out the info you want into the third doc.
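Those three steps could be sketched roughly as follows. The helper names read_pairs and join_by_hash are made up, the sample lines come from the question, and hashes missing from either document are silently skipped, which is an assumption about what you want.

```python
def read_pairs(lines, sep=None):
    """Map hash -> value for lines shaped like '<hash><sep><value>'."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, value = line.split(sep, 1)  # sep=None splits on whitespace
        pairs[key.strip()] = value.strip()
    return pairs


def join_by_hash(user_lines, plain_lines):
    """Return 'hash : username : plaintext' rows for hashes found in both."""
    users = read_pairs(user_lines)          # document one: hash username
    plains = read_pairs(plain_lines, ":")   # document two: hash:plaintext
    return [
        "{} : {} : {}".format(h, users[h], plains[h])
        for h in users
        if h in plains
    ]


doc_one = ["fb4aa888c283428482370 username1", "fb4aa888c283328862370 username2"]
doc_two = ["fb4aa888c283428482370:plaintext", "fb4aa888c283328862370:plaintext2"]
for row in join_by_hash(doc_one, doc_two):
    print(row)
```

Open file objects iterate line by line, so you can pass them straight in instead of the sample lists and write the returned rows to the third document.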

You didn't really specify how you wanted the output, but this should get you close enough to modify to your liking. There are guys out there good enough to shorten this into a few lines of code, but I think the readability of keeping it long may be helpful to you just getting started.
Btw, I would probably avoid this altogether and do the join in SQL before creating the file, but that wasn't really your question : )
usernames = dict()
plaintext = dict()
result = dict()
with open('username.txt') as un:
    for line in un:
        arry = line.split() #Turns the line into an array of two parts
        hash, user = arry[0], arry[1]
        usernames[hash] = user.rsplit()[0] # add to dictionary
with open('plaintext.txt') as un:
    for line in un:
        arry = line.split(':')
        hash, txt = arry[0], arry[1]
        plaintext[hash] = txt.rsplit()[0]
for key, val in usernames.items():
    hash = key
    txt = plaintext[hash]
    result[val] = txt
with open("dict.txt", "w") as w:
    for name, txt in result.items():
        w.write('{0} = {1}\n'.format(name, txt))
print(usernames) #{'fb4aa888c283466662370': 'username5', 'fb4aa888c283422482370': 'username3' ...................
print(plaintext) #{'fb4aa888c283466662370': 'plaintextetc', 'fb4aa888c283422482370': 'plaintext4' ................
print(result) #{'username1': 'plaintext', 'username3': 'plaintext4', .....................

Extracting data from JSON / dictionary with Python

I have an output :
result = {
    "sip_domains":{
        "prefix":[{"name":""}],
        "domain":[{"name":"k200.com"},{"name":"Zinga.com"},{"name":"rambo.com"}]
    },
    "sip_security":{"level":2},
    "sip_trusted_hosts":{"host":[]},
    "sip_proxy_mode":{"handle_requests":1}
}
From this I just want to print the following to my screen:
domain : k200.com
domain : Zinga.com
domain : rambo.com
How can I get this output using a regular expression?
Help needed urgently.

If it's text that you need to parse, then use the json module to parse the JSON payload:
http://docs.python.org/library/json.html?highlight=json#json
Regular expressions are not needed for this in a language like Python.
Otherwise, if it's already a Python dictionary, then use dictionary-style [] item access to read the data from it.
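For the dictionary case, [] access with the data from the question could look like this. It is just a sketch; the lines variable is introduced here only to collect the output.

```python
result = {
    "sip_domains": {
        "prefix": [{"name": ""}],
        "domain": [{"name": "k200.com"}, {"name": "Zinga.com"}, {"name": "rambo.com"}],
    },
    "sip_security": {"level": 2},
    "sip_trusted_hosts": {"host": []},
    "sip_proxy_mode": {"handle_requests": 1},
}

# Walk down to the list of domain entries and format one line per entry
lines = ["domain : " + entry["name"] for entry in result["sip_domains"]["domain"]]
print("\n".join(lines))
```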

If you are getting this data as a string from somewhere you must convert it to a python dictionary object to access it. You should not have to use any regular expressions to get this output.
import json
# get the json str somehow
json_dict = json.loads(json_str)
for domain_dict in json_dict['sip_domains']['domain']:
    print('domain : %s' % domain_dict['name'])
