I found myself writing some tricky algorithmic code, and I tried to comment it as well as I could since I really do not know who is going to maintain this part of the code.
Following this idea, I've wrote quite a lot of block and inline comments, also trying not to over-comment it. But still, when I go back to the code I wrote a week ago, I find it difficult to read because of the swarming presence of the comments, especially the inline ones.
I though that indenting them (to ~120char) could easy the readability, but would obviously make the line way too long according to style standards.
Here's an example of the original code:
fooDataTableOccurrence = nestedData.find("table=\"public\".")
if 0 > fooDataTableOccurrence: # selects only tables without tag value "public-"
otherDataTableOccurrence = nestedData.find("table=")
dbNamePos = nestedData.find("dbname=") + 7 # 7 is the length of "dbname="
if -1 < otherDataTableOccurrence: # selects only tables with tag value "table="
# database resource case
resourceName = self.findDB(nestedData, dbNamePos, otherDataTableOccurrence, destinationPartitionName)
if resourceName: #if the resource is in a wrong path
if resourceName in ["foo", "bar", "thing", "stuff"]:
return True, False, False # respectively isProjectAlreadyExported, isThereUnexpectedData and wrongPathResources
wrongPathResources.append("Database table: " + resourceName)
And here's how indenting inline comments would look like:
fooDataTableOccurrence = nestedData.find("table=\"public\".")
if 0 > seacomDataTableOccurrence: # selects only tables without tag value "public-"
otherDataTableOccurrence = nestedData.find("table=")
dbNamePos = nestedData.find("dbname=") + 7 # 7 is the length of "dbname="
if -1 < otherDataTableOccurrence: #selects only tables with tag value "table="
# database resource case
resourceName = self.findDB(nestedData, dbNamePos, otherDataTableOccurrence, destinationPartitionName)
if resourceName: # if the resource is in a wrong path
if resourceName in ["foo", "bar", "thing", "stuff"]:
return True, False, False # respectively isProjectAlreadyExported, isThereUnexpectedData and wrongPathResources
wrongPathResources.append("Database table: " + resourceName)
The code is in Python (my company legacy code is not drastically following the PEP8 standard so we had to stick with that), but my point is not about the cleanness of the code itself, but on the comments. I am looking for a trade-off between readability and easy understanding of the code, and sometimes I find difficult achieving both at the same time
Which of the examples is better? If none, what would be?
Maybe this is an XY_Problem?
Could the comments be eliminated altogether?
Here is a (quick & dirty) attempt at refactoring the code posted:
dataTableOccurrence_has_tag_public = nestedData.find("table=\"public\".") > 0
if dataTableOccurrence_has_tag_public:
datataTableOccurrence_has_tag_table = nestedData.find("table=") > 0
prefix = "dbname="
dbNamePos = nestedData.find(prefix) + len(prefix)
if datataTableOccurrence_has_tag_table:
# database resource case
resourceName = self.findDB(nestedData,
dbNamePos,
otherDataTableOccurrence,
destinationPartitionName)
resource_name_in_wrong_path = len(resourceName) > 0
if resourceNameInWrongPath:
if resourceName in ["foo", "bar", "thing", "stuff"]:
project_already_exported = True
unexpected_data = False
return project_already_exported,
unexpected_data,
resource_name_in_wrong_path
wrongPathResources.append("Database table: " + resourceName)
Further work could involve extracting functions out of the block of code.
Related
I'm trying to port to Python a SAP table download script, that already works on Excel VBA, but I want a command line version and I would prefer to avoid VBScript for a number of reasons that go beyond the goal of this post.
I'm stuck at the moment in which I need to fill the values in a table
from win32com.client import Dispatch
Functions = Dispatch("SAP.Functions")
Functions.Connection.Client = "400"
Functions.Connection.ApplicationServer = "myserver"
Functions.Connection.Language = "EN"
Functions.Connection.User = "myuser"
Functions.Connection.Password = "mypwd"
Functions.Connection.SystemNumber = "00"
Functions.Connection.UseSAPLogonIni = False
if (Functions.Connection.Logon (0,True) == True):
print("Logon OK")
RFC = Functions.Add("RFC_READ_TABLE")
RFC.exports("QUERY_TABLE").Value = "USR02"
RFC.exports("DELIMITER").Value = "~"
#RFC.exports("ROWSKIPS").Value = 2000
#RFC.exports("ROWCOUNT").Value = 10
tblOptions = RFC.Tables("OPTIONS")
#RETURNED DATA
tblData = RFC.Tables("DATA")
tblFields = RFC.Tables("FIELDS")
tblFields.AppendRow ()
print(tblFields.RowCount)
print(tblFields(1,"FIELDNAME"))
# the 2 lines above print 1 and an empty string, so the row in the table exists
Until here it is basically copied from VBA adapting the syntax.
In VBA at this point I'm able to do
tblFields(1,"FIELDNAME") = "BNAME"
if I do the same I get an error because the left part is a function and written that way it returns a string. In VBA it is probably a bi-dimensional array.
I unsuccessfully tried various approaches like
tblFields.setValue([{"FIELDNAME":"BNAME"}])
tblFields(1,"FIELDNAME").Value = "BNAME"
tblFields(1,"FIELDNAME").setValue("BNAME")
tblFields.FieldName = "BNAME" ##kinda desperate
The script works, without setting the FIELDS table, for outputs that produce rows shorter than 500 chars. This is a SAP limit in the function.
I know that this is not the best way, but I can't use the SAPNWRFC library and I can't use librfc32.dll.
I must be able to solve this way, or revert to the VB version.
Thanks to anyone who will provide a hint
After a lot of trial and error, i found a solution.
Instead of adding row by row to the "OPTIONS" or "FIELDS" tables, you can just submit a prefilled table.
This should work:
tblFields.Data = (('VBELN', '000000', '000000', '', ''),
('POSNR', '000000', '000000', '', ''))
same here:
tblOptions.Data = (("VBELN EQ '2557788'",),)
I have a Python script that runs many ElasticSearch aggregations, e.g.:
client = Elasticsearch(...)
q = {"aggs": {"my_name":{"terms": "field", "fieldname"}}}
res = client.search(index = "indexname*", doc_type = "doc_type", body = q)
But this returns the search query (match everything I think) res["hits"] and the aggregation results res["aggregations"].
What I want to run is the Python equivalent of the following
GET /index*/doc_type/_search?search_type=count
{"aggs": {"my_name":{"terms": "field", "fieldname"}}}
How do I make sure to include the ?search_type=count when using Python Elasticsearch?
I'd like to know this in general, but the current reason I'm looking into this is I occasionally get errors caused by timeouts or data size when running the queries. My suspicion is that if I can only ask for the counting then I'll avoid these.
The general consensus is to not use search_type=count anymore as it's been deprecated in 2.0. Instead you should simply use size: 0.
res = client.search(index = "indexname*", doc_type = "doc_type", body = q, size=0)
^
|
add this
Here is the documentation for search
Try this
res = client.search(index = "indexname*", doc_type = "doc_type", body = q, search_type='count')
Look at the answer of #Val if you are using ES 2.x
I need to parse a file that contains conditional statements, sometimes nested inside one another.
I have a file that stores configuration data but the configuration data is slightly different depending on user defined options. I can deal with the conditional statements, they're all just booleans with no operations but I don't know how to recursively evaluate the nested conditionals. For instance, a piece of the file might look like:
...
#if CELSIUS
#if FROM_KELVIN ; this is a comment about converting kelvin to celsius.
temp_conversion = 1, 273
#else
temp_conversion = 0.556, -32
#endif
#else
#if FROM_KELVIN
temp_conversion = 1.8, -255.3
#else
temp_conversion = 1.8, 17.778
#endif
#endif
...
... Also, some conditionals don't have an #else statement, just #if CONDITION statement(s) #endif.
I realize that this could be easy if the file were just written in XML or something else with a nice parser to begin with, but this is what I have to work with so I'm wondering if there's any relatively simple way to parse this file. It's similar to parenthesis matching so I imagine there would be some module for it but I haven't found anything.
I'm working in python but I can switch for this function if it's easier to solve this in another language.
Here's a simple recursive parser for this syntax:
def parse(lines):
result = []
while lines:
if lines[0].startswith('#if'):
block = [lines.pop(0).split()[1], parse(lines)]
if lines[0].startswith('#else'):
lines.pop(0)
block.append(parse(lines))
lines.pop(0) #endif
result.append(block)
elif not lines[0].startswith(('#else', '#endif')):
result.append(lines.pop(0))
else:
break
return result
tree = parse([x.strip() for x in your_code.splitlines() if x.strip()])
From your example it creates the following tree structure:
[['CELSIUS',
[['FROM_KELVIN',
['temp_conversion = 1, 273'],
['temp_conversion = 0.556, -32']]],
[['FROM_KELVIN',
['temp_conversion = 1.8, -255.3'],
['temp_conversion = 1.8, 17.778']]]]]
which should be easy to evaluate.
For more advanced parsing consider one of many parsing tools available for Python.
Since all of the conditions are binary and I know the values of all of them in advance (no need to evaluate them in order in order like a programming language), i was able to do it with a regular expression. This works better for me. It finds the lowest level conditionals (ones with no nested conditions), evaluates them and replaces them with the correct contents. Then repeats for the higher level conditionals and so on.
import re
conditions = ['CELSIUS', 'FROM_KELVIN']
def eval_conditional(matchobj):
statement = matchobj.groups()[1].split('#else')
statement.append('') # in case there was no else statement
if matchobj.groups()[0] in conditions: return statement[0]
else: return statement[1]
def parse(text):
pattern = r'#if\s*(\S*)\s*((?:.(?!#if|#endif))*.)#endif'
regex = re.compile(pattern, re.DOTALL)
while True:
if not regex.search(text): break
text = regex.sub(eval_conditional, text)
return text
if __name__ == '__main__':
i = open('input.txt', 'r').readlines()
g = ''.join([x.split(';')[0] for x in i if x.strip()])
o = parse(g)
open('output.txt', 'w').write(o)
Given the input in the original post, it outputs:
...
temp_conversion = 1, 273
...
which is what I need. Thanks to everyone for their responses, I really appreciate the help!
I'm not sure what to call what I'm looking for; so if I failed to find this question else where, I apologize. In short, I am writing python code that will interface directly with the Linux kernel. Its easy to get the required values from include header files and write them in to my source:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
Its easy to use these values when constructing structs to send to the kernel. However, they are of almost no help to resolve the values in the responses from the kernel.
If I put the values in to dict I would have to scan all the values in the dict to look up keys for each item in each struct from the kernel I presume. There must be a simpler, more efficient way.
How would you do it? (feel free to retitle the question if its way off)
If you want to use two dicts, you can try this to create the inverted dict:
b = {v: k for k, v in a.iteritems()}
Your solution leaves a lot of work do the repeated person creating the file. That is a source for error (you actually have to write each name three times). If you have a file where you need to update those from time to time (like, when new kernel releases come out), you are destined to include an error sooner or later. Actually, that was just a long way of saying, your solution violates DRY.
I would change your solution to something like this:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {globals()[x]:x for x in dir() if x.startswith('IFA_') or x.startswith('__IFA_')}
This was the values dict is generated automatically. You might want to (or have to) change the condition in the if statement there, according to whatever else is in that file. Maybe something like the following. That version would take away the need to list prefixes in the if statement, but it would fail if you had other stuff in the file.
values = {globals()[x]:x for x in dir() if not x.endswith('__')}
You could of course do something more sophisticated there, e.g. check for accidentally repeated values.
What I ended up doing is leaving the constant values in the module and creating a dict. The module is ip_addr.py (the values are from linux/if_addr.h) so when constructing structs to send to the kernel I can use if_addr.IFA_LABEL and resolves responses with if_addr.values[2]. I'm hoping this is the most straight forward so when I have to look at this again in a year+ its easy to understand :p
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {
IFA_UNSPEC : 'IFA_UNSPEC',
IFA_ADDRESS : 'IFA_ADDRESS',
IFA_LOCAL : 'IFA_LOCAL',
IFA_LABEL : 'IFA_LABEL',
IFA_BROADCAST : 'IFA_BROADCAST',
IFA_ANYCAST : 'IFA_ANYCAST',
IFA_CACHEINFO : 'IFA_CACHEINFO',
IFA_MULTICAST : 'IFA_MULTICAST',
__IFA_MAX : '__IFA_MAX'
}
I'm rewriting some code from Ruby to Python. The code is for a Perceptron, listed in section 8.2.6 of Clever Algorithms: Nature-Inspired Programming Recipes. I've never used Ruby before and I don't understand this part:
def test_weights(weights, domain, num_inputs)
correct = 0
domain.each do |pattern|
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
output = get_output(weights, input_vector)
correct += 1 if output.round == pattern.last
end
return correct
end
Some explanation: num_inputs is an integer (2 in my case), and domain is a list of arrays: [[1,0,1], [0,0,0], etc.]
I don't understand this line:
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
It creates an array with 2 values, every values |k| stores pattern[k].to_f, but what is pattern[k].to_f?
Try this:
input_vector = [float(pattern[i]) for i in range(num_inputs)]
pattern[k].to_f
converts pattern[k] to a float.
I'm not a Ruby expert, but I think it would be something like this in Python:
def test_weights(weights, domain, num_inputs):
correct = 0
for pattern in domain:
output = get_output(weights, pattern[:num_inputs])
if round(output) == pattern[-1]:
correct += 1
return correct
There is plenty of scope for optimising this: if num_inputs is always one less then the length of the lists in domain then you may not need that parameter at all.
Be careful about doing line by line translations from one language to another: that tends not to give good results no matter what languages are involved.
Edit: since you said you don't think you need to convert to float you can just slice the required number of elements from the domain value. I've updated my code accordingly.