input-validation & string conversion - python

I have a script that takes input from the user. I want to validate the input first and then convert it to a predefined format. The input should look like this:
my_script -c 'formatDate(%d/%m) == 23/5 && userName == John Dee && status == c'
At the moment I'm dealing with the formatDate() part only. The main rules are:
The format directives for year, month and day can come in any order
If m [or y or d] is missing, it is filled in from the current month [or year or day]
The same delimiter must be used for the format string and the values
The keys and values must be separated by ==
The different constraints must be separated by &&
Single or multiple constraints are allowed
So, for the given example, it should return 20110523 as a valid constraint. After a bit of work, this is what I came up with, which is pretty much working:
#!/usr/bin/env python
#
import sys, re
from time import localtime, strftime
theDy = strftime('%d', localtime())
theMn = strftime('%m', localtime())
theYr = strftime('%Y', localtime())
def main(arg):
    print "input string: %s" % arg
    arg = "".join(arg.split()).lower()
    if arg.startswith('=') or re.search('===', arg) or "==" not in arg:
        sys.exit("Invalid query string!")
    else: my_arg = arg.split('&&')
    c_dict = {}
    for ix in range(len(my_arg)):
        LL = my_arg[ix].split('==')
        #LL = dict(zip(LL[:-1:2], LL[1::2]))
        # don't add duplicate key
        if not c_dict.has_key(LL[0]):
            c_dict[LL[0]] = LL[1]
    for k,v in sorted(c_dict.items()):
        if k.startswith('formatdate') :
            ymd = re.sub(r'[(,)]', ' ', k).replace('formatdate','')
            ymd = (str(ymd).strip()).split('/')
            if len(ymd) <= 3 and len(ymd) == len(v.split('/')):
                d_dict = dict(zip(ymd, v.split('/')))
                if not d_dict.has_key('%y'):
                    d_dict['%y'] = theYr
                if not d_dict.has_key('%m'):
                    d_dict['%m'] = theMn
                if not d_dict.has_key('%d'):
                    d_dict['%d'] = theDy
            else: sys.exit('date format mismatched!!')
            Y = d_dict['%y'];
            if d_dict['%m'].isdigit() and int(d_dict['%m']) <=12:
                M = d_dict['%m'].zfill(2)
            else: sys.exit("\"Month\" is not numeric or out of range.\nExiting...\n")
            if d_dict['%d'].isdigit() and int(d_dict['%d']) <=31:
                D = d_dict['%d'].zfill(2)
            else: sys.exit("\"Day\" is not numeric or out of range.\nExiting...\n")
            # next line needed for future use
            fmtFile = re.compile('%s%s%s' % (Y,M,D))
            print "file_name is: %s" % Y+M+D
if __name__ == "__main__":
    main('formatDate(%d/%m)== 23/5')
My questions are:
Am I making it unnecessarily complicated or expensive? Is there an easier way?
How can I identify which delimiter the user has used [as opposed to assuming a fixed one]?
Thanks for your time. Cheers!!

You're not overcomplicating it, but have you thought about the consequences of expanding the grammar of the user requests?
One of the simplest parsers is the shunting-yard algorithm. You can adapt it to the user requests and expand the grammar easily. You can find a Python implementation here.
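On the second question (finding the delimiter rather than hard-coding '/'), one possible approach is to strip the directives out of the format string and treat whatever remains as the delimiter. The following is only a sketch in Python 3, not a replacement for a real parser: the function name parse_format_date and its today parameter are made up for illustration, and it assumes a single-character delimiter and the lower-cased directives %y, %m and %d used in the question.

```python
import re
from datetime import date

DIRECTIVES = {"%y", "%m", "%d"}

def parse_format_date(fmt, value, today=None):
    """Turn e.g. ('%d/%m', '23/5') into a 'YYYYMMDD' string."""
    today = today or date.today()
    fmt = "".join(fmt.split()).lower()
    value = "".join(value.split())
    # Whatever is left once the directives are removed is the delimiter.
    delimiters = set(re.sub(r"%[a-z]", "", fmt))
    if len(delimiters) > 1:
        raise ValueError("mixed delimiters in format %r" % fmt)
    if delimiters:
        delimiter = delimiters.pop()
        keys, values = fmt.split(delimiter), value.split(delimiter)
    else:
        keys, values = [fmt], [value]
    if (len(keys) != len(values) or len(set(keys)) != len(keys)
            or not set(keys) <= DIRECTIVES):
        raise ValueError("format %r does not match value %r" % (fmt, value))
    if not all(v.isdigit() for v in values):
        raise ValueError("date parts must be numeric: %r" % value)
    # Start from today's date and overwrite the parts the user supplied.
    parts = {"%y": today.year, "%m": today.month, "%d": today.day}
    parts.update(zip(keys, map(int, values)))
    # date() itself rejects impossible dates such as day 31 in February.
    return date(parts["%y"], parts["%m"], parts["%d"]).strftime("%Y%m%d")
```

Because the value is split on the delimiter found in the format string, a value written with a different delimiter produces the wrong number of parts and is rejected. Letting datetime.date do the range checking also catches cases the <=12 and <=31 tests miss, such as a month of 0.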

Related

Python Trace Tables and ver 2 to ver 3 issue

I found this code for printing a program trace and it works fine in Python 2.
However, in Python 3 there are issues. I addressed the first one by replacing execfile(file_name) with exec(open(file_name).read()), but I still get the error KeyError: 'do_setlocale'
I'm out of my depth here. I just want an easy way to trace variables in programs line by line. I like the way this program works and it would be great to get it working with Python 3. I even tried an online conversion program but got the same KeyError: 'do_setlocale'
Can anyone please help me to get it working?
import sys
if len(sys.argv) < 2:
    print __doc__
    exit()
else:
    file_name = sys.argv[1]
past_locals = {}
variable_list = []
table_content = ""
ignored_variables = set([
    'file_name',
    'trace',
    'sys',
    'past_locals',
    'variable_list',
    'table_content',
    'getattr',
    'name',
    'self',
    'object',
    'consumed',
    'data',
    'ignored_variables'])
def trace(frame, event, arg_unused):
    global past_locals, variable_list, table_content, ignored_variables
    relevant_locals = {}
    all_locals = frame.f_locals.copy()
    for k,v in all_locals.items():
        if not k.startswith("__") and k not in ignored_variables:
            relevant_locals[k] = v
    if len(relevant_locals) > 0 and past_locals != relevant_locals:
        for i in relevant_locals:
            if i not in past_locals:
                variable_list.append(i)
        table_content += str(frame.f_lineno) + " || "
        for variable in variable_list:
            table_content += str(relevant_locals[variable]) + " | "
        table_content = table_content[:-2]
        table_content += '\n'
        past_locals = relevant_locals
    return trace
sys.settrace(trace)
execfile(file_name)
table_header = "L || "
for variable in variable_list:
    table_header += variable + ' | '
table_header = table_header[:-2]
print table_header
print table_content
# python traceTable.py problem1.py
# problem1.py
a = 1
b = 2
a = a + b
That program has a couple of major flaws – for example, if the program being traced includes any functions with local variables, it will crash, even in Python 2.
Therefore, since I have nothing better to do, I wrote a program to do something like this called pytrace. It's written for Python 3.6, although it probably wouldn't take too long to make it work on lower versions if you need to.
Its output is a little different to your program's, but not massively so – the only thing that's missing is line numbers, which I imagine you could add in fairly easily (print frame.f_lineno at appropriate points). The rest is purely how the data are presented (your program stores all the output until the end so it can work out table headers, whereas mine prints everything as it goes).
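For reference, here is a minimal Python 3 sketch of the same table idea. It is not pytrace and not a drop-in port of the script above; trace_source and format_table are names invented for this example. It assumes the KeyError comes from locals of other frames (standard-library helpers, or functions in the traced program) ending up in variable_list, so it only follows frames compiled from the traced source and leaves a cell blank when a variable is missing from a frame.

```python
import sys

def trace_source(source, name="<traced>"):
    """Run `source` under sys.settrace; return (variable names, rows)."""
    variables = []   # names in order of first appearance
    rows = []        # (line number, {name: value}) snapshots
    last = {}

    def tracer(frame, event, arg):
        nonlocal last
        # Ignore frames from other files, e.g. standard-library helpers
        # whose locals would otherwise leak into the table.
        if frame.f_code.co_filename != name:
            return None
        snapshot = {k: v for k, v in frame.f_locals.items()
                    if not k.startswith("__")}
        if snapshot and snapshot != last:
            variables.extend(k for k in snapshot if k not in variables)
            rows.append((frame.f_lineno, snapshot))
            last = snapshot
        return tracer

    code = compile(source, name, "exec")
    sys.settrace(tracer)
    try:
        exec(code, {"__name__": "__main__"})
    finally:
        sys.settrace(None)
    return variables, rows

def format_table(variables, rows):
    """Render the rows as a table; values missing from a frame stay blank."""
    lines = ["L || " + " | ".join(variables)]
    for lineno, snapshot in rows:
        cells = [str(snapshot.get(v, "")) for v in variables]
        lines.append("%d || %s" % (lineno, " | ".join(cells)))
    return "\n".join(lines)
```

To trace a file, read it and pass the text in, for example print(format_table(*trace_source(open("problem1.py").read(), "problem1.py"))).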

Make biopython Entrez.esearch loop through parameters

I'm trying to adapt a script (found here: https://gist.github.com/bonzanini/5a4c39e4c02502a8451d) to search and retrieve data from PubMed.
Here's what I have so far:
#!/usr/bin/env python
from Bio import Entrez
import datetime
import json
# Create dictionary of journals (the official abbreviations are not used here...)
GroupA=["Nature", "Science", "PNAS","JACS"]
GroupB=["E-life", "Mol Cell","Plos Computational","Nature communication","Cell"]
GroupC=["Nature Biotech", "Nature Chem Bio", "Nature Str Bio", "Nature Methods"]
Journals = {}
for item in GroupA:
    Journals.setdefault("A",[]).append(item)
for item in GroupB:
    Journals.setdefault("B",[]).append(item)
for item in GroupC:
    Journals.setdefault("C",[]).append(item)
# Set dates for search
today = datetime.datetime.today()
numdays = 15
dateList = []
for x in range (0, numdays):
    dateList.append(today - datetime.timedelta(days = x))
dateList[1:numdays-1] = []
today = dateList[0].strftime("%Y/%m/%d")
lastdate = dateList[1].strftime("%Y/%m/%d")
print 'Retrieving data from ' '%s to %s' % (lastdate,today)
for value in Journals['A']:
    Entrez.email = "email"
    handle = Entrez.esearch(db="pubmed",term="gpcr[TI] AND value[TA]",
        sort="pubdate",retmax="10",retmode="xml",datetype="pdat",mindate=lastdate,maxdate=today)
    record = Entrez.read(handle)
    print(record["IdList"])
I would like to use each "value" of the for loop (in this case the journal titles) as a parameter for the Entrez.esearch function. There is no built-in parameter for this, so it has to go inside the term parameter, but it doesn't work as shown: the literal text value is sent instead of the journal name.
Once I have an ID list, I will use Entrez.efetch to retrieve and print the data that I want, but that's another question...
I hope this is clear enough; it's my first question! Thanks!
If I understand you correctly, I think this is what you are looking for:
term="gpcr[TI] AND {}[TA]".format(value)
With this, the successive terms will be:
"gpcr[TI] AND Nature[TA]"
"gpcr[TI] AND Science[TA]"
"gpcr[TI] AND PNAS[TA]"
"gpcr[TI] AND JACS[TA]"

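As a small runnable illustration of the str.format answer in the Entrez.esearch question above, the sketch below builds the query string for each journal in group A without contacting PubMed. The helper name build_term and its keyword parameter are made up for this example.

```python
journals = {"A": ["Nature", "Science", "PNAS", "JACS"]}

def build_term(journal, keyword="gpcr"):
    """Return a query with the keyword in [TI] and the journal in [TA]."""
    return "{}[TI] AND {}[TA]".format(keyword, journal)

# One query string per journal, in the order of the list.
terms = [build_term(name) for name in journals["A"]]
for term in terms:
    print(term)
```

Inside the original loop, the same expression would be passed as the term argument of Entrez.esearch in place of the literal string.
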
python wordpress xmlrpc. How to pass variable to wp.call

This is about the wordpress_xmlrpc library for Python.
Let's start with the code:
wp1 = Client('http://example.com/xmlrpc.php', 'username1', 'pass1')
wp2 = Client('http://example.com/xmlrpc.php', 'username2', 'pass2')
This works fine when I add a new comment:
komment_id = wp1.call(NewComment(post_id, komment))
komment_id = wp2.call(NewComment(post_id, komment))
However, I would like to pick the client at random, so
I'm looking for a way to replace X with a number:
komment_id = wpX.call(NewComment(post_id, komment))
I have tried a lot of options and none of them work:
Fail1:
wpnumber = randint(1,10)
test = str('wp')+str(wpnumber)
komment_id = test.call(NewComment(post_id, komment))
Fail2:
komment_id = %d.call(NewComment(post_id, komment)) % (test)
Fail3: (and all its variations with " ' ( ) , etc.)
komment_id = test + .call(NewComment(post_id, komment))
Fail4:
komment_id = wp+wpnumber+.call(NewComment(post_id, komment))
To be honest, I tried 10 different ways with %s and %d, joining variables, splitting everything...
Can anyone help?
This is not possible with string concatenation. The test variable must be the same kind of object as wp1 or wp2 (a Client), and concatenating strings only ever produces a string.
You can do it like this: store the result of randint(1,10) in a variable and check that number against each option.
number = randint(1,10)
if (number == 1):
    test = wp1
elif (number == 2):
    test = wp2
# ...and so on, one branch for each client
This ensures both randomness and that your test variable is of the same type (Client) as wp1 and wp2.
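An alternative that avoids the chain of branches is to keep the client objects in a list and let random.choice pick one. The sketch below uses a made-up FakeClient class as a stand-in so that it runs without a WordPress site; in the real script the list would hold wp1, wp2 and so on, and the argument to call would be NewComment(post_id, komment).

```python
import random

class FakeClient:
    """Hypothetical stand-in for wordpress_xmlrpc.Client, for illustration only."""
    def __init__(self, username):
        self.username = username

    def call(self, request):
        return "%s posted %r" % (self.username, request)

# The list holds the objects themselves, not their names as strings.
clients = [FakeClient("username1"), FakeClient("username2")]
wp = random.choice(clients)
result = wp.call("a comment")
```

Adding a third client then only means appending to the list.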

formatting a float to string when formatting options are chosen externally (by user)

I am formatting float values to strings.
The formatting type & accuracy are chosen by the user.
How do I use the chosen formatting parameters during the conversion?
formatType = 'e' or 'f' [enum options for user]
formatAccuracy = 0 to 7 [enum options for user]
formatCode = join(formatAccuracy,formatType)
val = 1.23456789
formattedValue = '%%' %val %formatCode
But obviously this doesn't work; it gets confused by the double %%.
I had a bit more of a play around and came up with an answer before actually posting this question :)
formatCode = str(formatAccuracy) + formatType  # e.g. '3e'
formatToString = '%.' + formatCode  # e.g. '%.3e'
valString = formatToString % val
Merged into one line:
valString = ('%.' + str(formatAccuracy) + formatType) % val
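The same result can be had with the built-in format() function, which takes the format specification as a separate string so no % escaping is involved. This is a sketch only; the function name format_value and the range checks (taken from the 'e'/'f' and 0 to 7 options described above) are assumptions for illustration.

```python
def format_value(val, format_type, accuracy):
    """Format a float using a user-chosen type ('e' or 'f') and precision."""
    if format_type not in ("e", "f"):
        raise ValueError("format type must be 'e' or 'f'")
    if not 0 <= accuracy <= 7:
        raise ValueError("accuracy must be between 0 and 7")
    # Build a spec such as '.3e' and hand it to format().
    spec = ".{}{}".format(accuracy, format_type)
    return format(val, spec)

print(format_value(1.23456789, "e", 3))
print(format_value(1.23456789, "f", 2))
```

Validating the two options before building the spec keeps arbitrary user input out of the format string.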

How do I deal with list of lists correctly in python? Specifically with a list of functions

I am trying to grep some results pages for work, and eventually print them out to an HTML page so that nobody has to look through each section manually.
How I would eventually use it: I feed this function a results page, it greps through the 5 different sections, and then I can produce an HTML output (that's what the print/substitute area is for) with all the different results.
OK, MASSIVE EDIT: I removed the old code because I was asking too many questions. I fixed my code using some of the suggestions, but I am still interested in the advantage of using a human-readable dict instead of just a list. Here is my working code, which gets all the right results into a 'list of lists'; I then output the first section in my eventual HTML block:
import urllib
import re
import string
import sys
def ipv6_results(input_page):
    sections = ['/spec.p2/summary.html', '/nd.p2/summary.html',
                '/addr.p2/summary.html', '/pmtu.p2/summary.html',
                '/icmp.p2/summary.html']
    variables_output=[]
    for s in sections:
        temp_list = []
        page = input_page + s
        #print page
        url_reference = urllib.urlopen(page)
        html_page = url_reference.read()
        m = re.search(r'TOTAL</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        temp_list.append(int(m.group(1)) )
        m = re.search(r'PASS</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        temp_list.append(int(m.group(1)))
        m = re.search(r'FAIL</FONT></B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        temp_list.append(int(m.group(1)))
        variables_output.append(temp_list)
    #print variables to check them :)
    print "------"
    print variables_output
    print "Ready Logo Phase 2"
    print "Section | Total | Pass | Fail |"
    #this next part is eventually going to output an html block
    output = string.Template("""
1 - RFC2460-IPv6 Specs $spec_total $spec_pass $spec_fail
""")
    print output.substitute(spec_total=variables_output[0][0], spec_pass=variables_output[0][1],
                            spec_fail=variables_output[0][2])
    return 1
imagine the tabbing is correct :( I wish this was more like paste bin, suggestions welcome on pasting code in here
Generally, you don't declare the shape of the list first and then fill in the values. Instead, you build the list as you discover the values.
Your variables have a lot of structure. You've got inner lists of 3 elements, always in the order 'total', 'pass', 'fail'. Perhaps these 3-tuples should be made namedtuples. That way, you can access the three parts with human-recognizable names (data.total, data.passed, data.fail) instead of cryptic index numbers (data[0], data[1], data[2]). Note that pass is a reserved word in Python, so it cannot be used as a field name; passed is used here instead.
Next, your 3-tuples differ by prefixes: 'spec', 'nd', 'addr', etc.
These sound like keys to a dict rather than elements of a list.
So perhaps consider making variables a dict. That way, you can access the particular 3-tuple you want with the human-recognizable variables['nd'] instead of variables[1]. And you can access the nd_fail value with variables['nd'].fail instead of variables[1][2]:
import collections
# define the namedtuple class Point (used below).
# `passed` stands in for `pass`, which is a reserved word.
Point = collections.namedtuple('Point', 'total passed fail')
# Notice we declare `variables` empty at first; we'll fill in the values later.
variables={}
keys=('spec','nd','addr','pmtu','icmp')
# Pair each key with its section path; `sections` is the list from the question.
for key, s in zip(keys, sections):
    page = input_page + s
    url_reference = urllib.urlopen(page)
    html_page = url_reference.read()
    m = re.search(r'TOTAL</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    ntotal = int(m.group(1))
    m = re.search(r'PASS</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    npass = int(m.group(1))
    m = re.search(r'FAIL</FONT></B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    nfail = int(m.group(1))
    # We create an instance of the namedtuple on the right-hand side
    # and store the value in `variables[key]`, thus building the
    # variables dict incrementally.
    variables[key]=Point(ntotal,npass,nfail)
The first thing to note is that those lists only hold the values the variables had at the time of assignment. Changing a list element changes the list, not the original variables.
I would seriously consider using classes and building structures out of those, including lists of class instances.
For example:
class SectionResult:
    # `passed` is used because `pass` is a reserved word in Python.
    def __init__(self, total = 0, passed = 0, fail = 0):
        self.total = total
        self.passed = passed
        self.fail = fail
Since it looks like each group should link up with a section, you can create a list of dictionaries (or perhaps a list of classes?) with the bits associated with a section:
sections = [{'results' : SectionResult(), 'filename': '/addr.p2/summary.html'}, ....]
Then in the loop:
for section in sections:
    page = input_page + section['filename']
    url_reference = urllib.urlopen(page)
    html_page = url_reference.read()
    m = re.search(r'TOTAL</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    section['results'].total = int(m.group(1))
    m = re.search(r'PASS</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    section['results'].passed = int(m.group(1))
    m = re.search(r'FAIL</FONT></B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
    section['results'].fail = int(m.group(1))
I would use a list of dictionaries. Maybe something like:
def ipv6_results(input_page):
    # The keys are quoted strings; bare names would be undefined variables.
    results = [{'file_name':'/spec.p2/summary.html', 'total':0, 'pass':0, 'fail':0},
               {'file_name':'/nd.p2/summary.html', 'total':0, 'pass':0, 'fail':0},
               {'file_name':'/addr.p2/summary.html', 'total':0, 'pass':0, 'fail':0},
               {'file_name':'/pmtu.p2/summary.html', 'total':0, 'pass':0, 'fail':0},
               {'file_name':'/icmp.p2/summary.html', 'total':0, 'pass':0, 'fail':0}]
    for r in results:
        url_reference = urllib.urlopen(input_page + r['file_name'])
        html_page = url_reference.read()
        m = re.search(r'TOTAL</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        r['total'] = int(m.group(1))
        m = re.search(r'PASS</B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        r['pass'] = int(m.group(1))
        m = re.search(r'FAIL</FONT></B></TD><TD>:</TD><TD>([0-9,]+)', html_page)
        r['fail'] = int(m.group(1))
    for r in results:
        print r['total']
        print r['pass']
        print r['fail']
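To try the namedtuple-in-a-dict idea without fetching any pages, here is a self-contained Python 3 sketch that parses the counters from HTML text passed in as a string. The names SectionResult, PATTERNS and parse_summary are chosen for this example; the regular expressions are the ones from the question, and the field is called passed because pass is a reserved word.

```python
import collections
import re

# `passed` replaces `pass`, which cannot be used as a field name.
SectionResult = collections.namedtuple("SectionResult", "total passed fail")

PATTERNS = {
    "total": r"TOTAL</B></TD><TD>:</TD><TD>([0-9,]+)",
    "passed": r"PASS</B></TD><TD>:</TD><TD>([0-9,]+)",
    "fail": r"FAIL</FONT></B></TD><TD>:</TD><TD>([0-9,]+)",
}

def parse_summary(html_page):
    """Extract the three counters from the HTML text of one summary page."""
    counts = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, html_page)
        if match is None:
            raise ValueError("counter %r not found" % field)
        # The pattern allows thousands separators, so strip them before int().
        counts[field] = int(match.group(1).replace(",", ""))
    return SectionResult(**counts)
```

A results dict can then be built one key at a time, for example variables['nd'] = parse_summary(html_page), and read back as variables['nd'].fail.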
