Detect empty string in numeric field using Cerberus - python

I am using the python library cerberus (http://docs.python-cerberus.org/en/stable/) and I want to check if a JSON field is a number (integer) or an empty string.
I tried using the condition:
{"empty": True, "type": "integer"}
But when the field is an empty string, for example: (""), I get the following error.
'must be of integer type'
Is there a way to use the basic validation rules so that an empty string is also accepted in a numeric field? I know it can be done with extended validation functions, but I want to avoid that solution for the moment.

Try something like this:
{"anyof":[
{"type":"string","allowed":[""]},
{"anyof_type":["float","integer"]}
]},

I would advise against overcomplicating schemas. 1) Multiple types can be declared for the type rule. 2) The empty rule is only applied to sized values, so it ignores any given integer. Hence this is the simplest possible rule set for your constraints:
{'type': ('integer', 'string'),
'empty': True}
Mind that this doesn't require the string to be empty; it only allows it, so a non-empty string would also pass. You may want to use the maxlength rule with 0 as its constraint instead.
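As a minimal sketch of the maxlength suggestion, the schema below accepts an integer or an empty string and rejects any other string. The field name quantity and the helper is_valid are made up for illustration, and the third-party cerberus package has to be installed for the check to run.

```python
# Hypothetical field name "quantity"; requires the third-party cerberus package.
SCHEMA = {
    "quantity": {
        "type": ["integer", "string"],  # either type is accepted
        "maxlength": 0,                 # only checked for sized values, so strings must be ""
    }
}


def is_valid(document):
    """Return True when the document satisfies SCHEMA."""
    # Imported lazily so the schema itself can be inspected without cerberus.
    from cerberus import Validator

    return Validator(SCHEMA).validate(document)


if __name__ == "__main__":
    for value in (5, "", "abc"):
        print(repr(value), is_valid({"quantity": value}))
```

The length check is skipped for integers because they have no length, which is why a single rule set can cover both cases.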

Related

Defining field values using another field's values in Esri ArcMap

I'm using Esri ArcMap. I have a polyline layer with information stored as text, which I need to convert to numeric values. I want to accomplish this with a Python script in the Field Calculator.
My challenge:
Using field values in one field I want to define values in another field.
In my case I need to define the width of a road in numbers, depending on the field value in text from another field.
The road's "widthNumber" will depend on the value of another field, "widthText".
There are a number of ways you can do this. I'm assuming both fields are in the same feature class/shapefile and that the widthNumber field is an integer of some type.
The ideal case would be a switch statement (as in Java/C#), but Python (2.7 in ArcMap) has none. So we can either use a dictionary to recreate the switch or simply chain ifs. I'm a fan of cleaner code, so I've included the logic for the dictionary, but you can always move that into a series of ifs as you see fit.
All you have to do is use the pre-logic script code area to write a function which accepts the widthText and returns the widthNumber.
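def calculateValue(text):
    # Map each widthText value to its numeric width.
    switcher = {
        "Ones": 1,
        "Twotow": 2,
        "Threebs": 3,
        "Four": 4,
        "Fivers": 5
    }
    # Unknown text returns "Invalid", which a numeric field will reject.
    return switcher.get(text, "Invalid")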
Then in the bottom section, call that function and pass in the attribute:
calculateValue(!widthText!)
In this example I did not write any error handling to deal with invalid values. Depending on how your values are stored, it may be smart to convert everything to the same case (upper or lower) to ensure consistency.
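One way to add the case handling mentioned above is sketched here. The table contents and the names WIDTH_BY_TEXT and calculate_value are placeholders, not values from the original layer; the keys would need to match whatever text is actually stored in widthText.

```python
# Hypothetical lookup table; replace the keys with the text values stored in widthText.
WIDTH_BY_TEXT = {
    "narrow": 3,
    "medium": 5,
    "wide": 8,
}


def calculate_value(text, default=None):
    """Map a width description to a number, ignoring case and surrounding spaces."""
    if text is None:
        # Empty attribute: fall back to the caller's default.
        return default
    return WIDTH_BY_TEXT.get(text.strip().lower(), default)
```

Returning a numeric default (or None) instead of a string keeps the result compatible with an integer field; which default is appropriate depends on how the field is defined.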

cerberus: How to ignore a field based on a YAML comment?

Overview
I have a lot of .yaml files, and a schema to validate them.
Sometimes an "incorrect" value is in fact correct.
I need some way to ignore some fields. No validations should be performed on these fields.
Example
## file -- a.yaml
some_dict:
  some_key: some_valid_value
## file -- b.yaml
some_dict:
  some_key: some_INVALID_value # cerberus: ignore
How can I do this?
Quick Answer (TL;DR)
The "composite validation" approach allows for conditional (context-aware) validation rules.
The python cerberus package supports composite validation "out of the box".
YAML comments cannot be used for composite validation, however YAML fields can.
Detailed Answer
Context
python 2.7
cerberus validation package
Problem
Developer PabloPajamasCreator wishes to apply conditional validation rules.
The conditional validation rules are activated based on the presence or value of other fields in the dataset.
The conditional validation rules need to be sufficiently flexible to change "on-the-fly" based on any arbitrary states or relationships in the source data.
Solution
This approach can be accomplished with composite data validation.
Under this use-case, composite validation simply means creating a sequential list of validation rules, such that:
Each individual rule operates on a composite data variable
Each individual rule specifies a "triggering condition" for when the rule applies
Each individual rule produces one of three mutually-exclusive validation outcomes: validation-success, validation-fail, or validation-skipped
Example
Sample validation rules
- rule_caption: check-required-fields
  rule_vpath: "#"
  validation_schema:
    person_fname:
      type: string
      required: true
    person_lname:
      type: string
      required: true
    person_age:
      type: string
      required: true
- rule_caption: check-age-range
  rule_vpath: '#|#.person_age'
  validation_schema:
    person_age:
      "min": 2
      "max": 120
- rule_caption: check-underage-minor
  rule_vpath: '[#]|[? #.person_age < `18`]'
  validation_schema:
    prize_category:
      type: string
      allowed: ['pets','toys','candy']
    prize_email:
      type: string
      regex: '[\w]+#.*'
The code above is a YAML formatted representation of multiple validation rules.
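The three-outcome idea can be sketched without any library. The version below swaps in plain Python predicates for the jmespath triggers and cerberus schemas of the answer, so it illustrates the control flow only; run_rules, RULES, and the applies/check keys are invented for this sketch.

```python
def run_rules(record, rules):
    """Apply each rule in order and report success, fail, or skipped."""
    outcomes = {}
    for rule in rules:
        if not rule["applies"](record):
            # Triggering condition not met: the rule does not run at all.
            outcomes[rule["caption"]] = "validation-skipped"
        elif rule["check"](record):
            outcomes[rule["caption"]] = "validation-success"
        else:
            outcomes[rule["caption"]] = "validation-fail"
    return outcomes


# Hypothetical rules loosely mirroring two of the YAML rules.
RULES = [
    {
        "caption": "check-age-range",
        "applies": lambda r: "person_age" in r,
        "check": lambda r: 2 <= r["person_age"] <= 120,
    },
    {
        "caption": "check-underage-minor",
        "applies": lambda r: "person_age" in r and r["person_age"] < 18,
        "check": lambda r: r.get("prize_category") in ("pets", "toys", "candy"),
    },
]

if __name__ == "__main__":
    print(run_rules({"person_age": 10, "prize_category": "cars"}, RULES))
```

A field that should be ignored is then simply one whose rule never triggers, for example because another field in the same record switches it off.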
Rationale
This approach can be extended to any arbitrary level of complexity.
This approach is easily comprehensible by humans (although the jmespath syntax can be a challenge)
Any arbitrarily complex set of conditions and constraints can be established using this approach.
Pitfalls
The above example uses jmespath syntax to specify rule_vpath, which tells the system when to trigger specific rules. This adds a dependency on jmespath.
See also
complete code example on github

How to correctly re-format a python dictionary that is in unicode format?

I have some input data which is a python dictionary formatted as unicode. Something like this:
Input = {u'city': u'London', u'offer_type': u'3'}
And I need to create a script that can reformat the values of the dictionary. If a value is an integer, as in the case of "offer_type", it should be set as an integer. If the value is a string, like "city", it should be set as a string. This can easily be done using the str() and int() functions.
But the problem is that my input data can vary, so the keys can be different, and also the associated values to those keys. So I need to somehow automatically distinguish when the unicode value is a number or a string, and reformat it.
My first idea was to take each value, try to convert it to an integer, and if that raises an error, convert it to a string instead. But this is not pythonic at all, and I have doubts about the performance.
Thank you,
Álvaro
You can try this
Output = {}
for key in Input.keys():
    Output[str(key)] = int(Input[key]) if Input[key].isdigit() else str(Input[key])
print Output
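For comparison, here is a sketch of the try/except idea from the question, written for Python 3. Attempting the conversion and catching the failure is a common Python idiom, and unlike isdigit() it also handles negative numbers such as "-7". The function names are illustrative.

```python
def convert_value(value):
    """Return an int when the text parses as one, otherwise the original value."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Not an integer literal (or not text at all): leave it unchanged.
        return value


def convert_dict(data):
    """Apply convert_value to every value of a dictionary."""
    return {key: convert_value(value) for key, value in data.items()}


if __name__ == "__main__":
    print(convert_dict({"city": "London", "offer_type": "3", "delta": "-7"}))
```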

String match is not working in python

Here is my Django code:
print request.user.role
print request.user.role is "Super"
print request.user.role == "Super"
print "Super" is "Super"
and the output on console is
Super
False
False
False
True
I am wondering why it is not matching the exact string.
It is because request.user.role is not a string. Comparing it with the string "Super" therefore returns False, as there is no implicit type conversion. You must convert it to a string if you want to compare it to one. To convert it to a string, you can try this:
str(request.user.role)
Your last print returns true because you are just comparing the string "Super" to itself, evidently. Also as a side note, you only want to use is when comparing identities, not values.
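The effect can be reproduced with a small stand-in class. Role here is a hypothetical object that prints as its name, which is an assumption about how request.user.role behaves, not the actual class used by the library.

```python
class Role:
    """Hypothetical stand-in for an object whose printed form is its name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


role = Role("Super")
print(role)                   # prints the text Super
print(role == "Super")        # False: a Role instance is not a str
print(str(role) == "Super")   # True: both sides are now strings
```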
Please do not use string comparison to check for user roles. This approach is error-prone, may use more memory for newly created strings, and is dangerous overall. For example, if the value that represents a role is not its name, you will have to keep track of the name-to-value mapping yourself, and your checks will break if the library changes its mind and swaps names for integers.
Libraries that provide such functionality usually have a roles enum somewhere with all the role values. So, for example, in django-user-roles you can do
user.role.is_super # maybe role.is_Super
# or
from userroles import roles
user.role == roles.super # maybe roles.Super
This is a much more readable and safer approach.

Sort lexicographically?

I am working on integrating with the Photobucket API and I came across this in their api docs:
"Sort the parameters by name
lexographically [sic] (byte ordering, the
standard sorting, not natural or case
insensitive). If the parameters have
the same name, then sort by the value."
What does that mean? How do I sort something lexicographically? byte ordering?
The rest of their docs have been ok so far, but (to me) it seems like this line bears further explanation. Unfortunately there was none to be had.
Anyway, I'm writing the application in Python (it'll eventually become a Django app) in case you want to recommend specific modules that will handle such sorting for me ^_^
I think that here "lexicographic" is an alias for ASCII sort?
Lexicographic    Natural
z1.doc           z1.doc
z10.doc          z2.doc
z100.doc         z3.doc
z101.doc         z4.doc
z102.doc         z5.doc
z11.doc          z6.doc
z12.doc          z7.doc
z13.doc          z8.doc
z14.doc          z9.doc
z15.doc          z10.doc
z16.doc          z11.doc
z17.doc          z12.doc
z18.doc          z13.doc
z19.doc          z14.doc
z2.doc           z15.doc
z20.doc          z16.doc
z3.doc           z17.doc
z4.doc           z18.doc
z5.doc           z19.doc
z6.doc           z20.doc
z7.doc           z100.doc
z8.doc           z101.doc
z9.doc           z102.doc
The word should be "lexicographic"
http://www.thefreedictionary.com/Lexicographic
Dictionary order. Using the letters as they appear in the strings.
As they suggest, don't fold upper- and lower-case together. Just use the Python built-in list.sort() method.
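A short sketch of the difference, using made-up file names. Plain sorted() (or list.sort()) gives the lexicographic order the API asks for; the natural_key helper is only there to show the contrasting natural, case-insensitive order.

```python
import re

names = ["z10.doc", "z2.doc", "Z3.doc", "z1.doc"]

# Lexicographic order: strings are compared character by character, so
# uppercase sorts before lowercase and "z10" sorts before "z2".
lexicographic = sorted(names)


def natural_key(name):
    """Split digit runs out so they compare as numbers, ignoring case."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


# Natural order, shown only for contrast with the required ordering.
natural = sorted(names, key=natural_key)

print(lexicographic)  # ['Z3.doc', 'z1.doc', 'z10.doc', 'z2.doc']
print(natural)        # ['z1.doc', 'z2.doc', 'Z3.doc', 'z10.doc']
```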
This is similar to the Facebook API — the query string needs to be normalized before generating the signature hash.
You probably have a dictionary of parameters like:
params = {
'consumer_key': "....",
'consumer_secret': "....",
'timestamp': ...,
...
}
Create the query string like so:
urllib.urlencode(sorted(params.items()))
params.items() returns the keys and values of the dictionary as a list of tuples, sorted() sorts the list, and urllib.urlencode() concatenates them into a single string while escaping.
Quote a bit more from the section:
2 Generate the Base String:
Normalize the parameters:
Add the OAuth specific parameters for this request to the input parameters, including:
oauth_consumer_key = <consumer_key>
oauth_timestamp = <timestamp>
oauth_nonce = <nonce>
oauth_version = <version>
oauth_signature_method = <signature_method>
Sort the parameters by name lexographically [sic] (byte ordering, the standard sorting, not natural or case insensitive). If the parameters have the same name, then sort by the value.
Encode the parameter values as in RFC3986 Section 2 (i.e., urlencode).
Create parameter string (). This is the same format as HTTP 'postdata' or 'querystring', that is, each parameter represented as name=value separated by &. For example, a=1&b=2&c=hello%20there&c=something%20else
I think that they are saying that the parameters must appear in the sorted order - oauth_consumer_key before oauth_nonce before ...
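The normalization step quoted above could be sketched as follows in Python 3, where urlencode lives in urllib.parse. The function name normalize_parameters is made up, and the sample pairs are taken from the a=1&b=2 example in the quoted docs; this covers the sorting and encoding only, not the rest of the signature process.

```python
from urllib.parse import quote, urlencode


def normalize_parameters(pairs):
    """Sort (name, value) pairs by name, then value, and join them as name=value&..."""
    # Tuples compare by their first element, then their second, which matches
    # "sort by name, and by value when the names are equal".
    ordered = sorted(pairs)
    # quote_via=quote encodes a space as %20 instead of the default "+".
    return urlencode(ordered, quote_via=quote)


pairs = [("c", "something else"), ("b", "2"), ("c", "hello there"), ("a", "1")]
print(normalize_parameters(pairs))  # a=1&b=2&c=hello%20there&c=something%20else
```

A list of pairs is used rather than a dictionary because a dictionary cannot hold the same parameter name twice.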
