Python object validation using a schema - python

I want to validate a Python object using a schema. For this I found the schema library.
I would like to validate a numeric string:
a = {
    'phone_number': '12233'
}
How can I validate this string with a regex?
At the moment, I only know how to perform a plain string validation:
Schema(str).validate('12')

Schema will call any callables; simply provide a function that uses a regular expression:
import re
from schema import Schema, And

pattern = re.compile(r'^12\d+$')
Schema(And(str, lambda x: pattern.match(x) is not None))
Demo:
>>> import re
>>> from schema import Schema, And
>>> pattern = re.compile('^12\d+$')
>>> s = Schema(And(str, lambda x: pattern.match(x) is not None))
>>> s.validate('123234')
'123234'
>>> s.validate('42')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/schema.py", line 153, in validate
raise SchemaError([None] + x.autos, [e] + x.errors)
schema.SchemaError: <lambda>('42') should evaluate to True
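Depending on the library version, schema also ships a Regex helper that wraps this pattern check; a minimal sketch, assuming a schema release that exports Regex (0.6 or later), validating the dict from the question:
from schema import Schema, And, Regex

# The value must be a string that starts with "12" followed only by digits.
phone_schema = Schema({'phone_number': And(str, Regex(r'^12\d+$'))})

a = {'phone_number': '12233'}
print(phone_schema.validate(a))  # {'phone_number': '12233'}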

Related

Python 3: Check if a string is an import command

I want to check whether a string is an import statement. I have tried:
# Helper - analyses a string - is it an import string?
"""
fromlike - from foo import bar
classic - import foo
classic_as - import foo as baz
"""
def check_is_import(string):
    importname = ''
    fromlike = False
    classic = False
    classic_as = False
    if string[0:4] is 'from':
        fromlike = True
        importname = ''
    if not fromlike and (string[0:6] is 'import'):
        classic = True
        importname = string.split(' ')[1]
    if classic:
        commandlist = string.split(' ')
        if commandlist[2] is 'as':
            classic_as = True
            importname = commandlist[3]
        del commandlist
    if fromlike:
        return ('fromlike', importname)
    elif classic and (not classic_as):
        return ('classic', importname)
    elif classic_as:
        return ('classic_as', importname)
    else:
        return ('no_import', importname)
but it only worked for "fromlike" imports. (Note: I'm not asking "why doesn't this code work?", I'm just looking for a solution.) What code will reliably detect all imports? Basically my code takes a slice of the string. If the [0:4] slice equals 'from', the string is a "fromlike" import. Otherwise, if the [0:6] slice equals 'import', the string is a "classic" import. If it detects 'as', it finds the pseudo-name. The function must return a tuple which contains the import type at index 0 and the imported module name at index 1.
If you want to be sure to handle all Python import forms, have Python do the parsing. Use the ast.parse() function and use the resulting parse tree; you'll either get Import or ImportFrom objects:
| Import(alias* names)
| ImportFrom(identifier? module, alias* names, int? level)
Each alias consists of a name and an optional identifier used to import the name as:
-- import name with optional 'as' alias.
alias = (identifier name, identifier? asname)
Note that there can be multiple imports! You either have classic or fromlike imports, and both can import multiple names. Your function needs to return a list of (type, name) tuples. For invalid inputs, raise an exception (ValueError is a good fit here):
import ast

def check_is_import(string):
    try:
        body = ast.parse(string).body
    except SyntaxError:
        # not valid Python
        raise ValueError('No import found')
    if len(body) > 1:
        # not a single statement
        raise ValueError('Multiple statements found')
    if not isinstance(body[0], (ast.Import, ast.ImportFrom)):
        raise ValueError('No import found')
    type_ = 'classic' if isinstance(body[0], ast.Import) else 'fromlike'
    results = []
    for alias in body[0].names:
        alias_type = type_
        if alias.asname:
            alias_type += '_as'
        results.append((alias_type, alias.asname or alias.name))
    return results
The function should probably be renamed to extract_import_names(), as that reflects what it does much better.
Demo:
>>> check_is_import('from foo import bar')
[('fromlike', 'bar')]
>>> check_is_import('import foo')
[('classic', 'foo')]
>>> check_is_import('import foo as baz')
[('classic_as', 'baz')]
>>> check_is_import('from foo import bar, baz as spam, monty as python')
[('fromlike', 'bar'), ('fromlike_as', 'spam'), ('fromlike_as', 'python')]
>>> check_is_import('import foo as baz, baz, spam as ham')
[('classic_as', 'baz'), ('classic', 'baz'), ('classic_as', 'ham')]
>>> check_is_import('invalid python')
Traceback (most recent call last):
File "<stdin>", line 3, in check_is_import
File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python3.6/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
invalid python
^
SyntaxError: invalid syntax
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in check_is_import
ValueError: No import found
>>> check_is_import('import foo; import bar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in check_is_import
ValueError: Multiple statements found
>>> check_is_import('1 + 1 == 2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 11, in check_is_import
ValueError: No import found
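For reference, you can see exactly what ast.parse() builds by printing the tree with ast.dump(); a quick sketch (the exact text varies between Python versions):
import ast

tree = ast.parse('import foo as baz')
print(ast.dump(tree))
# A Module whose body holds a single Import node containing
# one alias(name='foo', asname='baz') entry.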

return empty result via using Entrez,Efetch to search lineage from taxonomy db

I used Biopython to search for lineage information from the taxonomy database, but it returns an empty result!
It worked yesterday (2016/03/15), but today (2016/03/16) it doesn't!
The code I used is here:
>>> from Bio import Entrez
>>> Entrez.email = "myemail@gmail.com"
>>> handle = Entrez.esearch(db="Taxonomy", term="Cypripedioideae")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['158330']
>>> record["IdList"][0]
'158330'
>>> handle = Entrez.efetch(db="Taxonomy", id="158330", retmode="xml")
>>> records = Entrez.read(handle)
>>> records[0].keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> records
[]  # I can't understand why it returns an empty list today?
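No answer was recorded here, but a useful first diagnostic is to read the raw efetch response before parsing it, so you can see whether NCBI returned an error document or an empty record set; a minimal sketch along the lines of the question's code (the Lineage lookup at the end is only illustrative):
from Bio import Entrez

Entrez.email = "myemail@gmail.com"

# Fetch the raw XML first to inspect what the server actually returned.
handle = Entrez.efetch(db="Taxonomy", id="158330", retmode="xml")
print(handle.read()[:500])  # an error document or empty payload points to a server-side issue

# Re-fetch (the handle above has been consumed) and parse as usual.
handle = Entrez.efetch(db="Taxonomy", id="158330", retmode="xml")
records = Entrez.read(handle)
if records:
    print(records[0].get("Lineage"))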

passing string to re.sub not working in Python

This is my progress so far with this regex function:
import os, re

dpath = "/root/tree/def/"
fmatch = re.compile(r'\s+''[\[]+''[A-Z]+''[\]]+')
pmatch = fmatch.match('[FLAC]')

def replace(pmatch,df):
    m = re.sub(fmatch,df)
    print (m)

def regex(dpath):
    for df in os.listdir(dpath):
        replace(pmatch, df)

regex (dpath)
First I do a for loop over the files in dpath, then pass each name string to replace(). But I am getting a missing argument 'string' error:
root#debian:~# python regex3.py
Traceback (most recent call last):
File "regex3.py", line 18, in <module>
regex (dpath)
File "regex3.py", line 16, in regex
replace(pmatch, df)
File "regex3.py", line 9, in replace
m = re.sub(fmatch,df)
TypeError: sub() missing 1 required positional argument: 'string'
It seems that you want to replace all matches of the regex \s+[\[]+[A-Z]+[\]]+ with [FLAC].
Make sure you do the following:
def replace(pmatch,df):
    m = fmatch.sub('[FLAC]', df)
    print (m)
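The original TypeError comes from the call signature: the module-level re.sub(pattern, repl, string) takes three arguments, while a compiled pattern's .sub(repl, string) takes two; a small sketch of both forms (the file name used here is just an illustration):
import re

fmatch = re.compile(r'\s+[\[]+[A-Z]+[\]]+')
name = 'some [ALBUM] title.flac'  # hypothetical file name

# Module-level form: pattern, replacement, then the string to search.
print(re.sub(fmatch, '[FLAC]', name))

# Compiled-pattern form: replacement, then the string.
print(fmatch.sub('[FLAC]', name))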
Using @martin-konecny's example, I got this to work.
Create Files for Example
# Run this in your Shell/Terminal
touch /tmp/abc.FLAC
touch /tmp/abcd.FLAC
Run Python
import re
import os

dpath = '/tmp/'
fmatch = re.compile(r'.+\.FLAC')
pmatch = fmatch.match('[FLAC]')

def replace(pmatch, df):
    m = fmatch.sub('[REDACTED]', df)
    print(m)

def regex(dpath):
    for df in os.listdir(dpath):
        replace(pmatch, df)

regex(dpath)
Result:
# ...
# [REDACTED]
# [REDACTED]
# ...
Great if you want to run a search and keep a selection of your results secret.

Python - regex relation extraction

As a part of schoolwork we have been given this code:
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
... for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
... corpus='ieer', pattern = IN):
... print(nltk.sem.rtuple(rel))
We are asked to try it out with some sentences of our own to see the output, so for this I decided to define a function:
def extract(sentence):
    import re
    import nltk
    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))
When I try and run this code:
>>> from extract import extract
>>> extract("The Whitehouse in Washington")
I get the following error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
extract("The Whitehouse in Washington")
File "C:/Python34/My Scripts\extract.py", line 6, in extract
for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'
Can anyone help me understand where I am going wrong in my function?
The correct output for the test sentence should be:
[ORG: 'Whitehouse'] 'in' [LOC: 'Washington']
If you look at the method definition of extract_rels, it expects the parsed document as the third argument, but here you are passing a plain string (the sentence). To overcome this error, you can do the following:
import re
import nltk

# Build the input extract_rels expects: tokenize and POS-tag the sentence.
sentence = "The Whitehouse in Washington"
tokens = [nltk.word_tokenize(sentence)]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]

class doc():
    pass

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for i, sent in enumerate(tagged_sentences):
    doc.text = nltk.ne_chunk(sent)
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # change the output format as needed
Try it out..!!!

Working with Berkeley DB( bsddb module ), Python

I'm using Python 2.7.3 and Berkeley DB to store data. I didn't find much information about that module, only in the Python docs. I saw some functions described there, but I didn't see instructions on how to delete a record from the database. Please help if you know how to delete a record - is that possible using bsddb?
According to the documentation:
Once instantiated, hash, btree and record objects support the same methods as dictionaries.
So, you can use del db_object['key'] to delete a specific record, just like with a dictionary.
>>> import bsddb
>>> db = bsddb.hashopen('a.db', 'c')
>>> db['a'] = '1'
>>> db.keys()
['a']
>>> del db['a'] # <-----
>>> db.keys()
[]
db_object.pop('key') also works.
>>> db['b'] = '2'
>>> db.keys()
['b']
>>> db.pop('b')
'2'
del and .pop() with a non-existent key will raise a KeyError or similar exception. If you want to ignore a non-existent key, use .pop('key', None):
>>> db.pop('b') # This raises an exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/_abcoll.py", line 497, in pop
value = self[key]
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in __getitem__
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
File "/usr/lib/python2.7/bsddb/dbutils.py", line 68, in DeadlockWrap
return function(*_args, **_kwargs)
File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in <lambda>
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
KeyError: 'b'
>>> db.pop('b', None) # This does not.
>>>
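Since these objects behave like dictionaries, you can also guard the delete with an in check, and call .sync() / .close() to flush the change to disk; a short sketch under those assumptions:
import bsddb

db = bsddb.hashopen('a.db', 'c')
db['a'] = '1'

# Delete only if the key is present, avoiding a KeyError.
if 'a' in db:
    del db['a']

db.sync()   # flush pending changes to the file
db.close()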
