Cleanest way to declare a tuple of one string - python

Declaring a tuple of one string using the tuple() constructor produces one element per character in the string:
(Pdb) tuple('VERSION',)
('V', 'E', 'R', 'S', 'I', 'O', 'N')
Declaring a tuple with a trailing comma feels like a side effect, and I feel like it is easy to miss.
(Pdb) ('VERSION',)
('VERSION',)
Is there a cleaner way to make a declaration like this?
For context I'm using a tuple of tuples and I'm iterating on all of the individual values. Rather than special case the single values I'm just making them a tuple of one item.
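For illustration, the situation is roughly like this (made-up names, not the real data):
data = (
    ('VERSION',),          # a single value still has to be a 1-tuple...
    ('MAJOR', 'MINOR'),    # ...so this loop treats every row the same way
)
for row in data:
    for value in row:
        print(value)
If the single value were a bare string instead of a 1-tuple, the inner loop would iterate over its characters.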
Edit: I see I was unclear about this.
I don't personally like the declaration of ('VERSION',) so I tried
(Pdb) tuple('VERSION',)
('V', 'E', 'R', 'S', 'I', 'O', 'N')
and found that the tuple() constructor has this behavior.
I was interested to find that you can wrap the value in a list or another tuple and pass that to tuple(), and that works.
(Pdb) tuple(['VERSION'])
('VERSION',)
(Pdb) tuple(('VERSION',))
('VERSION',)

Well, it's really a good question: how should Python distinguish between an ordinary parenthesized expression and a tuple with one element? Look at the following example:
>>> a = (1)+(2)
>>> a
3
>>> b = (1,)+(2,)
>>> b
(1, 2)
That's the beauty of Python's syntax: the comma may look like an extra thing, but it is what distinguishes the grouping parentheses from a tuple. So you should use it when you create a tuple of length one.

If you want to spell out the tuple constructor by name, you can write tuple(['foo']) or tuple(('foo',)).
The tuple constructor takes an iterable and, unless it's already a tuple, makes its elements the new tuple's elements. There's no way around that.
The idiomatic way to define a 1-tuple is (foo,). I personally find the ,) hard to miss, but tastes vary.
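As an aside, it is the comma, not the parentheses, that makes the tuple; the parentheses are only needed for grouping or to avoid ambiguity:
>>> t = 'VERSION',
>>> t
('VERSION',)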
For ultimate clarity, just make your own function:
def tu(value):
    return (value,)
Now you can tu('foo') and get a 1-tuple.


How to return a value of a key from a dictionary in python

Let me have a dictionary:
P={'S':['dB','N'],'B':['C','CcB','bA']}
How can I get the second value of the second key from dictionary P?
Also, if the value is a string with more than one character, like 'bA' (the third value of key 'B'), can I somehow return the first character of this value?
As @jonrsharpe has stated before, dictionaries aren't ordered by design.
What this means is that every time you attempt to access a dictionary "by order", you may encounter a different result.
Observe the following (python interactive interpreter):
>>>P={'S':['dB','N'],'B':['C','CcB','bA'], 'L':["qqq"]}
>>>P.keys()
['S', 'B', 'L']
It's easy to see that in this case, the "order" in which we defined the keys matches the order that we receive from the keys() method.
However, you may also observe this result:
>>> P={'S':['dB','N'],'B':['C','CcB','bA'], 'L':["qqq"], 'A':[]}
>>> P.keys()
['A', 'S', 'B', 'L']
In this example, the key 'A' should be fourth in our list, but it is actually first.
This is just a small example of why you may not treat dictionaries as ordered lists.
Maybe you could go ahead and tell us what your intentions are, and an alternative may be suggested.
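If the goal is really just "the second value stored under key 'B'", index by key and then by position instead of relying on key order; for example, with the dictionary from the question:
>>> P = {'S': ['dB', 'N'], 'B': ['C', 'CcB', 'bA']}
>>> P['B'][1]      # second value in the list stored under 'B'
'CcB'
>>> P['B'][2][0]   # first character of the third value, 'bA'
'b'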

Please explain "set difference" in python

Trying to learn Python I encountered the following:
>>> set('spam') - set('ham')
set(['p', 's'])
Why is it set(['p', 's'])? I mean: why is 'h' missing?
The - operator on Python sets is mapped to the difference method, which is defined as the members of set A which are not members of set B. So in this case, the members of "spam" which are not in "ham" are "s" and "p". Notice that this operation is not commutative (that is, a - b == b - a is not always true).
You may be looking for the symmetric_difference method or the ^ operator:
>>> set("spam") ^ set("ham")
{'h', 'p', 's'}
This operator is commutative.
Because that is the definition of a set difference. In plain English, it is equivalent to "what elements are in A that are not also in B?".
Note that reversing the operands makes this more obvious:
>>> set('spam') - set('ham')
{'s', 'p'}
>>> set('ham') - set('spam')
{'h'}
To get the elements that appear in exactly one of the two sets, regardless of the order in which you ask, you can use symmetric_difference:
>>> set('spam').symmetric_difference(set('ham'))
{'s', 'h', 'p'}
There are two different operators:
Set difference. This is defined as the elements of A not present in B, and is written as A - B or A.difference(B).
Symmetric set difference. This is defined as the elements of either set not present in the other set, and is written as A ^ B or A.symmetric_difference(B).
Your code is using the former, whereas you seem to be expecting the latter.
The set difference is the set of all characters in the first set that are not in the second set. 'p' and 's' appear in the first set but not in the second, so they are in the set difference. 'h' does not appear in the first set, so it is not in the set difference (regardless of whether or not it is in the second set).
You can also obtain the desired result as:
>>> (set('spam') | set('ham')) - (set('spam') & set('ham'))
set(['p', 's', 'h'])
Create the union using | and the intersection using &, and then take the set difference, i.e. the difference between all elements and the common elements.
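As a quick check that this composed expression matches the symmetric difference, and that order does not matter for it:
>>> a, b = set('spam'), set('ham')
>>> (a | b) - (a & b) == a ^ b
True
>>> a ^ b == b ^ a
True
>>> a - b == b - a
False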

Mutating a List in Python

I have a list of the form
['A', 'B', 'C', 'D']
which I want to mutate into:
[('Option1','A'), ('Option2','B'), ('Option3','C'), ('Option4','D')]
I can iterate over the original list and mutate successfully, but the closest that I can come to what I want is this:
["('Option1','A')", "('Option2','B')", "('Option3','C')", "('Option4','D')"]
I need the single quotes but don't want the double quotes around each tuple.
[EDIT] - here is the code that I used to generate the list, although I've tried many variations. Clearly, I've turned 'element' into a string; obviously, I'm not thinking about it the right way here.
array = ['A', 'B', 'C', 'D']
listOption = 0
finalArray = []
for a in array:
    listOption += 1
    element = "('Option" + str(listOption) + "','" + a + "')"
    finalArray.append(element)
Any help would be most appreciated.
[EDIT] - a question was asked (rightly) why I need it this way. The final array will be fed to an application (Indigo home control server) to populate a drop-down list in a config dialog.
[('Option{}'.format(i+1),item) for i,item in enumerate(['A','B','C','D'])]
# EDIT FOR PYTHON 2.5
[('Option%s' % (i+1), item) for i,item in enumerate(['A','B','C','D'])]
This is how I'd do it, but honestly I'd probably try not to do this and instead want to know why I NEEDED to do this. Any time you're making a variable with a number in it (or, in this case, a tuple with one element of data and one element naming the data BY NUMBER), think instead about how you could organize your consuming code not to need that.
For instance: when I started coding professionally, the company I work for had an issue with files not being purged on time at a few of our locations. Not all the files, mind you, just a few. In order to provide our software developer with the information to resolve the problem, we needed a list of which files the purge process was failing on, and at which sites.
Because I was still wet behind the ears, instead of doing something SANE like making a dictionary with keys of the files and values of the sizes, I used locals() to create new variables WITH MEANING. Don't do this -- your variables should mean nothing to anyone but future coders. Basically I had a whole bunch of variables named "J_ITEM", "J_INV", and so on, each with a value like 25009, one for each file, and then I grouped them all together with [item for item in locals() if item.startswith("J_")]. THAT'S INSANITY! Don't do this; build a saner data structure instead.
That said, I'm interested in how you put it all together. Do you mind sharing your code by editing your answer? Maybe we can work together on a better solution than this hackjob.
x = ['A','B','C','D']
option = 1
answer = []
for element in x:
    t = ('Option' + str(option), element)  # creating the tuple
    answer.append(t)
    option += 1
print answer
A tuple is different from a string, in that a tuple is an immutable list. You define it by writing:
t = (something, something_else)
You probably defined t to be a string "(something, something_else)" which is indicated by the quotations surrounding the expression.
In addition to adsmith's great answer, I would add the map way:
>>> map(lambda (index, item): ('Option{}'.format(index+1),item), enumerate(['a','b','c', 'd']))
[('Option1', 'a'), ('Option2', 'b'), ('Option3', 'c'), ('Option4', 'd')]
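If you are on Python 2.6 or newer, enumerate also accepts a start argument, which avoids the i+1 arithmetic used in the comprehensions above:
>>> array = ['A', 'B', 'C', 'D']
>>> [('Option%s' % i, item) for i, item in enumerate(array, 1)]
[('Option1', 'A'), ('Option2', 'B'), ('Option3', 'C'), ('Option4', 'D')]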

difflib with more than two file names

I have several file names that I am trying to compare. Here are some examples:
files = ['FilePrefix10.jpg', 'FilePrefix11.jpg', 'FilePrefix21.jpg', 'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg']
What I need to do is extract "FilePrefix" from each file name, which changes depending on the directory. I have several folders containing many jpg's. Within each folder, each jpg has a FilePrefix in common with every other jpg in that directory. I need the variable portion of the jpg file name. I am unable to predict what FilePrefix is going to be ahead of time.
I had the idea to just compare two file names using difflib (in Python) and extract FilePrefix (and subsequently the variable portion) that way. I've run into the following issue:
>>> comp1 = SequenceMatcher(None, files[0], files[1])
>>> comp1.get_matching_blocks()
[Match(a=0, b=0, size=11), Match(a=12, b=12, size=4), Match(a=16, b=16, size=0)]
>>> comp1 = SequenceMatcher(None, files[1], files[2])
>>> comp1.get_matching_blocks()
[Match(a=0, b=0, size=10), Match(a=11, b=11, size=5), Match(a=16, b=16, size=0)]
As you can see, the first size does not match up. It's confusing the ten's and digit's place, making it hard for me to match a difference between more than two files. Is there a correct way to find a minimum size among all files within the directory? Or alternatively, is there a better way to extract FilePrefix?
Thank you.
It's not that it's "confusing the ten's and digit's place", it's that in the first matchup the ten's place isn't different, so it's considered part of the matching prefix.
For your use case, there seems to be a pretty easy solution to this ambiguity: just match all adjacent pairs, and take the minimum. Like this:
def prefix(x, y):
    comp = SequenceMatcher(None, x, y)
    matches = comp.get_matching_blocks()
    prefix_match = matches[0]
    prefix_size = prefix_match[2]
    return prefix_size

pairs = zip(files, files[1:])
matches = (prefix(x, y) for x, y in pairs)
prefixlen = min(matches)
prefix = files[0][:prefixlen]
The prefix function is pretty straightforward, except for one thing: I used [2] instead of .size because there's an annoying bug in the 2.7 difflib where the second call to get_matching_blocks may return a tuple instead of a namedtuple. This won't affect the code as-is, but if you add some debugging prints it will break.
Now, pairs is a list of all adjacent pairs of names, created by zipping together files and files[1:]. (If this isn't clear, print(zip(files, files[1:])). If you're using Python 3.x, you'll need print(list(zip(files, files[1:]))) instead, because zip returns a lazy iterator instead of a printable list.)
Now we just want to call prefix on each of the pairs, and take the smallest value we get back. That's what min is for. (I'm passing it a generator expression, which can be a tricky concept at first—but if you just think of it as a list comprehension that doesn't build the list, it's pretty simple.)
You could obviously compact this into two or three lines while still leaving it readable:
prefixlen = min(SequenceMatcher(None, x, y).get_matching_blocks()[0][2]
                for x, y in zip(files, files[1:]))
prefix = files[0][:prefixlen]
However, it's worth considering that SequenceMatcher is probably overkill here. It's looking for the longest matches anywhere, not just the longest prefix matches, which means it's essentially O(N^3) on the length of the strings, when it only needs to be O(NM) where M is the length of the result. Plus, it's not inconceivable that there could be, say, a suffix that's longer than the longest prefix, so it would return the wrong result.
So, why not just do it manually?
def prefixes(name):
    while name:
        yield name
        name = name[:-1]

def maxprefix(names):
    first, names = names[0], names[1:]
    for prefix in prefixes(first):
        if all(name.startswith(prefix) for name in names):
            return prefix
prefixes(first) just gives you 'FilePrefix10.jpg', 'FilePrefix10.jp', 'FilePrefix10.j', and so on down to 'F'. So we just loop over those, checking whether each one is also a prefix of all of the other names, and return the first one that is.
And you can do this even faster by thinking character by character instead of prefix by prefix:
def maxprefix(names):
    for i, letters in enumerate(zip(*names)):
        if len(set(letters)) > 1:
            return names[0][:i]
    return min(names, key=len)  # no mismatch found: the shortest name is the common prefix
Here, we're just checking whether the first character is the same in all names, then whether the second character is the same in all names, and so on. Once we find one where that fails, the prefix is all characters up to that (from any of the names).
The zip reorganizes the list of names into a list of tuples, where the first one is the first character of each name, the second is the second character of each name, and so on. That is, [('F', 'F', 'F', 'F'), ('i', 'i', 'i', 'i'), …].
The enumerate just gives us the index along with the value. So, instead of getting ('F', 'F', 'F', 'F') you get 0, ('F', 'F', 'F', 'F'). We need that index for the last step.
Now, to check that ('F', 'F', 'F', 'F') are all the same, I just put them in a set. If they're all the same, the set will have just one element—{'F'}, then {'i'}, etc. If they're not, it'll have multiple elements—{'1', '2'}—and that's how we know we've gone past the prefix.
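It is also worth knowing that, despite its name, os.path.commonprefix compares strings character by character rather than path component by path component, so the standard library can already do this particular job (using the files list from the question):
>>> import os.path
>>> os.path.commonprefix(files)
'FilePrefix'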
The only way to be certain is to check ALL the filenames. So just iterate through them all, checking against the longest matching prefix kept so far as you go.
You might try something like this:
files = ['FilePrefix10.jpg',
         'FilePrefix11.jpg',
         'FilePrefix21.jpg',
         'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg',
         'FileProtector354.jpg'
         ]

prefix = files[0]
for f in files:
    for c in range(len(prefix) + 1):
        if prefix[:c] != f[:c]:
            prefix = f[:c - 1]   # keep only the part that still matches
            break
max = len(prefix)
print prefix, max
Please pardon the 'un-Pythonicness' of the solution, but I wanted the algorithm to be obvious to any level programmer.

Expanding a tree-like data structure

I am attempting to use Python to alter some text strings using the re module (i.e., re.sub). However, I think that my question is applicable to other languages that have regex implementations.
I have a number of strings that represent tree-like data structures. They look something like this:
(A,B)-C-D
A-B-(C,D)
A-(B,C,D-(E,F,G,H,I))
Each letter represents a branch or edge. Letters in parentheses represent branches coming into or out of another branch.
Everywhere that there is a 'plain' tuple of values (a tuple containing only comma-separated single letters), I would like to take the prefix (X-) or suffix (-X) of that tuple and apply it to each of the values in the tuple.
Under this transformation, the above strings would become
(A-C,B-C)-D
A-(B-C,B-D)
A-(B,C,(D-E,D-F,D-G,D-H,D-I))
Applying the methodology repeatedly would ultimately yield
(A-C-D,B-C-D)
(A-B-C,A-B-D)
(A-B,A-C,A-D-E,A-D-F,A-D-G,A-D-H,A-D-I)
The strings in these tuples then represent the paths through the tree starting at a root and ending at a leaf.
Any help accomplishing this task using regular expressions (or other approaches) would be greatly appreciated.
You can't do this with regular expressions alone, because you have to deal with nested structures. Instead, you could use pyparsing's nestedExpr.
The problem you are describing is one of enumerating paths within a graph.
You describe three graphs
A   B
 \ /
  C
  |
  D

  A
  |
  B
 / \
C   D

      A
    / | \
   B  C  D
        //|\\
      E F G H I
and for each you want to enumerate paths. This involves distributing a value across an arbitrarily nested structure. If this could be done with regexes, and I am not certain that it can, it would have to be done, I believe, in several passes.
My sense of your problem though, is that it is best solved by parsing your string into a graph structure and then enumerating the paths. If you do not want to physically build the graph, you can probably generate strings within user-supplied actions to a parser generator.
A regex-based solution would have to know how to handle both
(A,B)-C
and
(A,B,C,D,E,F,G,H)-I
You can match these strings with
\([A-Z](,[A-Z])*\)-[A-Z]
but how would you "distribute" over all submatches without some logic? Since you need this logic anyway, you might as well perform it on a real graph structure. You can also do this on a string itself, but it would be better to do this under the auspices of a parser generator which can handle context-free or context-sensitive structures.
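For what it's worth, a single suffix-distributing pass could look roughly like this; distribute_suffix is a hypothetical helper, and a companion pass for the X-(...) prefix case, applied repeatedly until the string stops changing, would still be needed:
import re

def distribute_suffix(s):
    # '(A,B)-C' -> '(A-C,B-C)'; the callback supplies the "distribute" logic
    pattern = re.compile(r'\(([A-Z](?:,[A-Z])*)\)-([A-Z])')
    def repl(m):
        members, suffix = m.group(1).split(','), m.group(2)
        return '(' + ','.join(x + '-' + suffix for x in members) + ')'
    return pattern.sub(repl, s)

print(distribute_suffix('(A,B)-C-D'))   # (A-C,B-C)-D
The point stands, though: the interesting work happens in plain Python inside the callback, not in the regex itself.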
After posting my comment referring to pyparsing's invRegex example, I looked a little closer at your input, and it looked like you could interpret this as an infix notation, with ',' and '-' as binary operators. Pyparsing has a helper method awkwardly named operatorPrecedence that parses expressions according to a precedence of operators, with grouping in parentheses. (This has a little more smarts to it than just using the nestedExpr helper method, which matches expressions nested within grouping symbols.) So here is a getting-started version of a parser using operatorPrecedence:
data = """\
(A,B)-C-D
A-B-(C,D)
A-(B,C,D-(E,F,G,H,I))""".splitlines()

from pyparsing import alphas, oneOf, operatorPrecedence, opAssoc

node = oneOf(list(alphas))
graphExpr = operatorPrecedence(node,
    [
    ('-', 2, opAssoc.LEFT),
    (',', 2, opAssoc.LEFT),
    ])

for d in data:
    print graphExpr.parseString(d).asList()
Pyparsing actually returns a complex structure of type ParseResults which supports access to the parsed tokens as elements in a list, items in a dict, or attributes in an object. By calling asList, we just get the elements in simple list form.
The output of the above shows that we look to be on the right track:
[[['A', ',', 'B'], '-', 'C', '-', 'D']]
[['A', '-', 'B', '-', ['C', ',', 'D']]]
[['A', '-', ['B', ',', 'C', ',', ['D', '-', ['E', ',', 'F', ',', 'G', ',', 'H', ',', 'I']]]]]
Pyparsing also allows you to attach callbacks or parse actions to individual expressions, to be called at parse time. For instance, this parse action does parse-time conversion to integer:
def toInt(tokens):
    return int(tokens[0])

integer = Word(nums).setParseAction(toInt)
When the value is returned in the ParseResults, it has already been converted to an integer.
Classes can also be specified as parse actions, and the ParseResults object is passed to the class's __init__ method and the resulting object returned. We can specify parse actions within operatorPrecedence by adding the parse action as a 4th element in each operator's descriptor tuple.
Here is base class for binary operators:
class BinOp(object):
    def __init__(self, tokens):
        self.tokens = tokens
    def __str__(self):
        return self.__class__.__name__ + str(self.tokens[0][::2])
    __repr__ = __str__
From this base class, we can derive two subclasses, one for each operator ('-' and ','):
class Path(BinOp):
    pass

class Branch(BinOp):
    pass
And add them to the operator definition tuples in operatorPrecedence:
node = oneOf(list(alphas))
graphExpr = operatorPrecedence(node,
    [
    ('-', 2, opAssoc.LEFT, Path),
    (',', 2, opAssoc.LEFT, Branch),
    ])

for d in data:
    print graphExpr.parseString(d).asList()
This gives us a nested structure of objects for each input string:
[Path[Branch['A', 'B'], 'C', 'D']]
[Path['A', 'B', Branch['C', 'D']]]
[Path['A', Branch['B', 'C', Path['D', Branch['E', 'F', 'G', 'H', 'I']]]]]
The generation of paths from this structure is left as an exercise for the OP. (The pyparsing regex inverter does this using a tangle of generators - hopefully some simple recursion will be sufficient.)
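For what it's worth, that recursion might look roughly like this (a sketch only; it assumes the Path and Branch classes above, and paths is a helper invented here, not part of pyparsing):
def paths(node):
    if isinstance(node, Branch):
        # a branch contributes the union of its alternatives' paths
        return [p for op in node.tokens[0][::2] for p in paths(op)]
    if isinstance(node, Path):
        # a path chains consecutive operands together
        result = [[]]
        for op in node.tokens[0][::2]:
            result = [r + p for r in result for p in paths(op)]
        return result
    return [[node]]   # a bare node name like 'A'

for d in data:
    print ['-'.join(p) for p in paths(graphExpr.parseString(d)[0])]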
