I have a list of the form
['A', 'B', 'C', 'D']
which I want to mutate into:
[('Option1','A'), ('Option2','B'), ('Option3','C'), ('Option4','D')]
I can iterate over the original list and mutate successfully, but the closest that I can come to what I want is this:
["('Option1','A')", "('Option2','B')", "('Option3','C')", "('Option4','D')"]
I need the single quotes but don't want the double quotes around each tuple.
[EDIT] - here is the code that I used to generate the list, although I've tried many variations. Clearly, I've turned 'element' into a string; obviously, I'm not thinking about it the right way here.
array = ['A', 'B', 'C', 'D']
listOption = 0
finalArray = []
for a in array:
    listOption += 1
    element = "('Option" + str(listOption) + "','" + a + "')"
    finalArray.append(element)
Any help would be most appreciated.
[EDIT] - a question was asked (rightly) why I need it this way. The final array will be fed to an application (Indigo home control server) to populate a drop-down list in a config dialog.
[('Option{}'.format(i+1),item) for i,item in enumerate(['A','B','C','D'])]
# EDIT FOR PYTHON 2.5
[('Option%s' % (i+1), item) for i,item in enumerate(['A','B','C','D'])]
This is how I'd do it, but honestly I'd probably try not to do this at all, and instead ask why I NEEDED to do it. Any time you're making a variable with a number in it (or, in this case, a tuple with one element of data and one element naming the data BY NUMBER), think about how you could organize your consuming code so that it doesn't need that.
For instance: when I started coding professionally, the company I work for had an issue with files not being purged on time at a few of our locations. Not all the files, mind you, just a few. In order to provide our software developer with the information to resolve the problem, we needed a list of which files the purge process was failing on, and at which sites.
Because I was still wet behind the ears, instead of doing something SANE like making a dictionary with keys of the files and values of the sizes, I used locals() to create new variables WITH MEANING. Don't do this -- your variable names should mean nothing to anyone but future coders. Basically I had a whole bunch of variables named "J_ITEM", "J_INV", and so on, with values like 25009, one for each file, and then I grouped them all together with [item for item in locals() if item.startswith("J_")]. THAT'S INSANITY! Don't do this; build a saner data structure instead.
That said, I'm interested in how you put it all together. Do you mind sharing your code by editing your question? Maybe we can work together on a better solution than this hackjob.
x = ['A','B','C','D']
option = 1
answer = []
for element in x:
    t = ('Option' + str(option), element)  # Creating the tuple
    answer.append(t)
    option += 1
print answer
A tuple is different from a string, in that a tuple is essentially an immutable list. You define it by writing:
t = (something, something_else)
You probably defined t to be a string "(something, something_else)" which is indicated by the quotations surrounding the expression.
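To make the distinction concrete, here is a small sketch (the variable names are mine, purely for illustration) contrasting the string that the question's loop builds with the tuple the answers build:
option = 1
a = 'A'

as_string = "('Option" + str(option) + "','" + a + "')"  # a string that merely looks like a tuple
as_tuple = ('Option' + str(option), a)                   # an actual 2-element tuple

print(type(as_string))  # <type 'str'> on Python 2, <class 'str'> on Python 3
print(type(as_tuple))   # <type 'tuple'> / <class 'tuple'>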
In addition to adsmith's great answer, I would add the map way:
>>> map(lambda (index, item): ('Option{}'.format(index+1),item), enumerate(['a','b','c', 'd']))
[('Option1', 'a'), ('Option2', 'b'), ('Option3', 'c'), ('Option4', 'd')]
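Note that the lambda (index, item) parameter unpacking above is Python 2 only (it was removed in Python 3). A rough Python 3-friendly equivalent, wrapping map in list() since it returns an iterator there, would look something like this:
list(map(lambda pair: ('Option{}'.format(pair[0] + 1), pair[1]),
         enumerate(['a', 'b', 'c', 'd'])))
# [('Option1', 'a'), ('Option2', 'b'), ('Option3', 'c'), ('Option4', 'd')]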
I have the following code:
def encrypt(plaintext, k):
    return "".join([alphabet[(alphabet.index(i)+k)] for i in plaintext.lower()])
I don't understand how python can read this kind of syntax, can someone break down what's the order of executions here?
I came across this kind of "one-line" writing style in python a lot, which always seemed to be so elegant and efficient but I never understood the logic.
Thanks in advance, have a wonderful day.
In Python we call this a list comprehension. There are other Stack Overflow posts that have covered this topic extensively, such as: What does “list comprehension” mean? How does it work and how can I use it? and Explanation of how nested list comprehension works?.
In your example the code is not complete, so it is hard to figure out what "alphabet" or "plaintext" are. However, let's try to break down what it does at a high level.
"".join([alphabet[(alphabet.index(i)+k)] for i in plaintext.lower()])
Can be broken down as:
"".join( # The join method will stitch all the elements from the container (list) together
[
alphabet[alphabet.index(i) + k] # alphabet seems to be a list, that we index increasingly by k
for i in plaintext.lower()
# we loop through each element in plaintext.lower() (notice the i is used in the alphabet[alphabet.index(i) + k])
]
)
Note that we can rewrite the list comprehension as a regular for loop. I have created a similar example that I hope clarifies things better:
alphabet = ['a', 'b', 'c']
some_list = []
for i in "ABC".lower():
    some_list.append(alphabet[alphabet.index(i)])  # the + k shift is left out here to keep the example simple
bringing_alphabet_back = "".join(some_list)
print(bringing_alphabet_back)  # abc
And last, the return just returns the joined string, much like returning bringing_alphabet_back in the example above.
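Putting it together, here is a minimal, self-contained sketch. The alphabet definition and the modulo wrap-around are assumptions on my part: the question never shows how alphabet is built, and without the modulo the index can run past the end of the list (this version also only handles letters):
import string

alphabet = list(string.ascii_lowercase)  # assumed: ['a', 'b', ..., 'z']

def encrypt(plaintext, k):
    # shift each character k places to the right, wrapping around at 'z'
    return "".join([alphabet[(alphabet.index(i) + k) % len(alphabet)]
                    for i in plaintext.lower()])

print(encrypt("abc", 2))  # cde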
Alright, so I have a question. I am working on creating a script that grabs a random name from a list of provided names, and generates them in a list of 5. I know that you can use the command
items = ['names','go','here']
rand_item = items[random.randrange(len(items))]
This, if I am not mistaken, should grab one random item from the list (correct me if I'm wrong). My question is: how would I get it to generate, say, a list of 5 names, going down like below:
random
names
generated
using
code
Also is there a way to make it where if I run this 5 days in a row, it doesn't repeat the names in the same order?
I appreciate any help you can give, or any errors in my existing code.
Edit:
The general use for my script will be to generate task assignments for a group of users every day, 5 days a week. What I am looking for is a way to generate these names in 5 different rotations.
I apologize for any confusion. Though some of the returned answers will be helpful.
Edit2:
Alright, so I think I have mostly what I want. Thank you Markus Meskanen & mescalinum; I used some of the code from both of you to resolve most of this issue. I appreciate it greatly. Below is the code I am using now.
import random
items = ['items', 'go', 'in', 'this', 'string']
rand_item = random.sample(items, 5)
for item in random.sample(items, 5):
    print item
random.choice() is good for selecting one element at random.
However, if you want to select multiple elements at random without repetition, you could use random.sample():
for item in random.sample(items, 5):
    print item
For the last question, you should trust the (pseudo-)random generator not to give the same sequence on two consecutive days. The random seed is initialized with the current time by default, so it's unlikely you'll observe the same sequence on two consecutive days, although not impossible, especially if the number of items is small.
If you absolutely need to avoid this, save the last sequence to a file, and load it before shuffling, and keep shuffling until it gives you a different order.
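A minimal sketch of that idea. The file name last_order.json and the use of JSON are my own choices here, not anything from the question:
import json
import random

items = ['items', 'go', 'in', 'this', 'string']
order = random.sample(items, 5)

try:
    with open('last_order.json') as f:   # hypothetical file holding yesterday's order
        last_order = json.load(f)
except IOError:                          # no previous run yet
    last_order = None

while order == last_order:               # reshuffle until today's order differs from yesterday's
    order = random.sample(items, 5)

with open('last_order.json', 'w') as f:
    json.dump(order, f)

for item in order:
    print(item)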
You could use random.choice() to get one item only:
items = ['names','go','here']
rand_item = random.choice(items)
Now just repeat this 5 times (a for loop!)
If you want the names just in a random order, use random.shuffle() to get a different result every time.
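For example, a rough sketch of both suggestions (note that repeating random.choice() can pick the same name more than once, whereas shuffle just reorders the whole list):
import random

items = ['names', 'go', 'here']

# Repeat random.choice() five times (duplicates are possible):
picks = []
for _ in range(5):
    picks.append(random.choice(items))
print(picks)

# Or shuffle the list in place and print it in its new order:
random.shuffle(items)
for item in items:
    print(item)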
It is not clear from your question whether you simply want to shuffle the items or choose a subset. From what I can make out, you want the second case.
You can use random.sample to get a given number of random items from a list in Python. If I wanted to get 3 random items from a list of five letters, I would do:
>>> import random
>>> random.sample(['a', 'b', 'c', 'd', 'e'], 3)
['b', 'a', 'e']
Note that the letters are not necessarily returned in the same order - 'b' is returned before 'a', although that wasn't the case in the original list.
Regarding the second part of your question, preventing it from generating the same letters in the same order, you can append every newly generated sublist to a file, read that file back in during your script's execution, and generate a new sublist until it is different from every previously generated one.
random.shuffle(items) will handle the random order generation
In [15]: print items
['names', 'go', 'here']
In [16]: for item in items: print item
names
go
here
In [17]: random.shuffle(items)
In [18]: for item in items: print item
here
names
go
For completeness, I agree with the above poster on random.choice().
So I have a list of around 75000 tuples that I want to push into a dict. It seems like after around 20,000 entries, the whole program slows down and I assume this is because the dict is being dynamically resized as it is filled.
The key value used for the dict is in a different position in the tuple depending on the data, so I can't just extract the keys from the list of tuples into list x and invoke d.fromkeys(x) to pre-initialise the large dict. I've tried to put together a solution, but after the dict is evaluated by ast.literal_eval, all I get is a single {'None': 'None'} :/
My solution (which doesn't work):
d_frame = '{'+('\'None\': \'None\',' * 100000)+'}'
d = ast.literal_eval(d_frame)
Is there a builtin method for something like this..
Thanks,
EDIT: I realise the stupidity of my idea.. Obviously you can't have identical keys in a dictionary.... :/
Just to clarify, I have a list of tuples with data like this:
(assembly,strand,start_pos,end_pos,read_count)
key_format : assembly_strand_val ( where val = start_pos or end_pos depending on other factors )
Because I don't know the key until I evaluate each tuple, I can't initialise the dict with known keys, so I was just wondering if I can create an empty dict and then add to it. It doesn't make sense to evaluate each tuple just to build a list of keys, then create a dict, then evaluate every tuple all over again...
EDIT: I realized where the bottleneck was. With each tuple, I was checking to see if the relevant key already existed in the dict, but I was using:
if key not in dict.keys():
    dict[key] = foo
I didn't realise this builds a list of the keys every time (at least on Python 2) and could be replaced with the far more economical
if key not in dict:
    dict[key] = foo
Changing this resulted in a staggering increase in speed....
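A rough way to see the difference for yourself; the dict size and repeat count are arbitrary, and the gap is dramatic on Python 2 because d.keys() builds a fresh list on every call:
import timeit

setup = "d = {i: i for i in range(75000)}"

# Membership test against the dict itself: a constant-time hash lookup.
print(timeit.timeit("74999 in d", setup=setup, number=10000))

# Membership test against d.keys(): on Python 2 this builds a 75000-element
# list on each call and scans it linearly.
print(timeit.timeit("74999 in d.keys()", setup=setup, number=10000))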
So I have a list of around 75000 tuples that I want to push into a dict.
Just call dict on the list. Like this:
>>> list_o_tuples = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
>>> d = dict(list_o_tuples)
>>> d
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
The key value used for the dict is in a different position in the tuple depending on the data
That isn't demonstrated at all by the code you showed us, but if you can write an expression or a function for pulling out the key, you can use it in a dict comprehension, or in a generator expression you can pass to the dict function.
For example, let's say the first element of the tuple is the index of the actual key element. Then you'd write this:
d = {tup[tup[0]]: tup for tup in list_o_tuples}
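For instance, with some made-up tuples where the first element says which position holds the key (purely illustrative data, not from the question):
list_o_tuples = [(2, 'chr1', 'assembly_+_100'), (1, 'assembly_-_250', 'chr2')]
d = {tup[tup[0]]: tup for tup in list_o_tuples}
# {'assembly_+_100': (2, 'chr1', 'assembly_+_100'),
#  'assembly_-_250': (1, 'assembly_-_250', 'chr2')}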
It seems like after around 20,000 entries, the whole program slows down and I assume this is because the dict is being dynamically resized as it is filled.
That seems unlikely. Yes, the dict is resized, but it's done so exponentially, and it's still blazingly fast at sizes well beyond 20000. Profile your program to see where it's actually slowing down. My guess would either be that you're doing some quadratic work to create or pull out the values, or you're generating huge amounts of storage and causing swapping, neither of which has anything to do with inserting the values into the dict.
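For instance, a quick way to do that (assuming your script's entry point is a function called main(), which is just a placeholder of mine):
import cProfile

cProfile.run('main()', sort='cumulative')  # shows which functions the time actually goes to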
At any rate, if you really do want to "pre-fill" the dict, you can always do this:
d = dict.fromkeys(range(100000))
for i, tup in enumerate(list_o_tuples):
    del d[i]
    d[tup[0]] = tup[1]
Then the dict never has to resize. (Obviously if your keys overlap the ints from 0-99999 you'll want to use different filler keys, but the same idea will work.)
But I'm willing to bet this makes absolutely no difference to your performance.
I've tried to put together a solution, but after the dict is evaluated by ast.literal_eval, all I get is a single {'None': 'None'}
That's because you're creating a dict with 100K copies of the same key. You can't have duplicate keys in a dict, so of course you end up with just one item.
However, this is a red herring. Creating a string to eval is almost never the answer. Your big mess of code is effectively just a slower, less memory-efficient, and harder-to-read version of this:
d = {'None': 'None' for _ in range(100000)}
Or, if you prefer:
d = dict([('None', 'None')] * 100000)
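Either way, you can confirm that all those "copies" collapse into a single entry, which is exactly why the literal_eval approach produced just {'None': 'None'}:
d = {'None': 'None' for _ in range(100000)}
print(len(d))  # 1

d = dict([('None', 'None')] * 100000)
print(len(d))  # 1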
I have several file names that I am trying to compare. Here are some examples:
files = ['FilePrefix10.jpg', 'FilePrefix11.jpg', 'FilePrefix21.jpg', 'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg']
What I need to do is extract "FilePrefix" from each file name, which changes depending on the directory. I have several folders containing many jpg's. Within each folder, each jpg has a FilePrefix in common with every other jpg in that directory. I need the variable portion of the jpg file name. I am unable to predict what FilePrefix is going to be ahead of time.
I had the idea to just compare two file names using difflib (in Python) and extract FilePrefix (and subsequently the variable portion) that way. I've run into the following issue:
>>> comp1 = SequenceMatcher(None, files[0], files[1])
>>> comp1.get_matching_blocks()
[Match(a=0, b=0, size=11), Match(a=12, b=12, size=4), Match(a=16, b=16, size=0)]
>>> comp1 = SequenceMatcher(None, files[1], files[2])
>>> comp1.get_matching_blocks()
[Match(a=0, b=0, size=10), Match(a=11, b=11, size=5), Match(a=16, b=16, size=0)]
As you can see, the first size does not match up. It's confusing the ten's and digit's place, making it hard for me to match a difference between more than two files. Is there a correct way to find a minimum size among all files within the directory? Or alternatively, is there a better way to extract FilePrefix?
Thank you.
It's not that it's "confusing the ten's and digit's place", it's that in the first matchup the ten's place isn't different, so it's considered part of the matching prefix.
For your use case, there seems to be a pretty easy solution to this ambiguity: just match all adjacent pairs, and take the minimum. Like this:
from difflib import SequenceMatcher

def prefix(x, y):
    comp = SequenceMatcher(None, x, y)
    matches = comp.get_matching_blocks()
    prefix_match = matches[0]
    prefix_size = prefix_match[2]
    return prefix_size

pairs = zip(files, files[1:])
matches = (prefix(x, y) for x, y in pairs)
prefixlen = min(matches)
prefix = files[0][:prefixlen]
The prefix function is pretty straightforward, except for one thing: I used [2] instead of .size because there's an annoying bug in 2.7 difflib where the second call to get_matching_blocks may return a tuple instead of a namedtuple. This won't affect the code as-is, but if you add some debugging prints it will break.
Now, pairs is a list of all adjacent pairs of names, created by zipping together files and files[1:]. (If this isn't clear, print(zip(files, files[1:])). If you're using Python 3.x, you'll need to print(list(zip(files, files[1:]))) instead, because zip returns a lazy iterator instead of a printable list.)
Now we just want to call prefix on each of the pairs, and take the smallest value we get back. That's what min is for. (I'm passing it a generator expression, which can be a tricky concept at first—but if you just think of it as a list comprehension that doesn't build the list, it's pretty simple.)
You could obviously compact this into two or three lines while still leaving it readable:
prefixlen = min(SequenceMatcher(None, x, y).get_matching_blocks()[0][2]
                for x, y in zip(files, files[1:]))
prefix = files[0][:prefixlen]
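As a sanity check, running this on the example names from the question should give a prefix length of 10, since the first matching blocks of the adjacent pairs are 11, 10 and 10 characters long (the question's own output shows the first two):
from difflib import SequenceMatcher

files = ['FilePrefix10.jpg', 'FilePrefix11.jpg', 'FilePrefix21.jpg',
         'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg']
prefixlen = min(SequenceMatcher(None, x, y).get_matching_blocks()[0][2]
                for x, y in zip(files, files[1:]))
print(files[0][:prefixlen])  # FilePrefix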
However, it's worth considering that SequenceMatcher is probably overkill here. It's looking for the longest matches anywhere, not just the longest prefix matches, which means it's essentially O(N^3) on the length of the strings, when it only needs to be O(NM) where M is the length of the result. Plus, it's not inconceivable that there could be, say, a suffix that's longer than the longest prefix, so it would return the wrong result.
So, why not just do it manually?
def prefixes(name):
    while name:
        yield name
        name = name[:-1]

def maxprefix(names):
    first, names = names[0], names[1:]
    for prefix in prefixes(first):
        if all(name.startswith(prefix) for name in names):
            return prefix
prefixes(first) just gives you 'FilePrefix10.jpg', 'FilePrefix10.jp', 'FilePrefix10.j', etc., down to 'F'. So we just loop over those, checking whether each one is also a prefix of all of the other names, and return the first one that is.
And you can do this even faster by thinking character by character instead of prefix by prefix:
def maxprefix(names):
    for i, letters in enumerate(zip(*names)):
        if len(set(letters)) > 1:
            return names[0][:i]
Here, we're just checking whether the first character is the same in all names, then whether the second character is the same in all names, and so on. Once we find one where that fails, the prefix is all characters up to that (from any of the names).
The zip reorganizes the list of names into a list of tuples, where the first one is the first character of each name, the second is the second character of each name, and so on. That is, [('F', 'F', 'F', 'F'), ('i', 'i', 'i', 'i'), …].
The enumerate just gives us the index along with the value. So, instead of getting ('F', 'F', 'F', 'F') you get 0, ('F', 'F', 'F', 'F'). We need that index for the last step.
Now, to check that ('F', 'F', 'F', 'F') are all the same, I just put them in a set. If they're all the same, the set will have just one element—{'F'}, then {'i'}, etc. If they're not, it'll have multiple elements—{'1', '2'}—and that's how we know we've gone past the prefix.
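A quick usage sketch with the file names from the question (either version of maxprefix above should behave the same way here):
files = ['FilePrefix10.jpg', 'FilePrefix11.jpg', 'FilePrefix21.jpg',
         'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg']
print(maxprefix(files))  # FilePrefix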
The only way to be certain is to check ALL the filenames. So just iterate through them all, checking against the kept maximum matching string as you go.
You might try something like this:
files = ['FilePrefix10.jpg',
         'FilePrefix11.jpg',
         'FilePrefix21.jpg',
         'FilePrefixOoufhgonstdobgfohj#lwghkoph[]**^.jpg',
         'FileProtector354.jpg'
         ]
prefix = files[0]
max = 0
for f in files:
    for c in range(1, len(prefix) + 1):
        if prefix[:c] != f[:c]:
            prefix = f[:c-1]  # keep only the part that still matches
            max = c - 1
            break             # no point comparing this file any further
print prefix, max
Please pardon the 'un-Pythonicness' of the solution, but I wanted the algorithm to be obvious to any level programmer.
I can't seem to find a question on SO about my particular problem, so forgive me if this has been asked before!
Anyway, I'm writing a script to loop through a set of URL's and give me a list of unique urls with unique parameters.
The trouble I'm having is actually comparing the parameters to eliminate multiple duplicates. It's a bit hard to explain, so some examples are probably in order:
Say I have a list of URL's like this
hxxp://www.somesite.com/page.php?id=3&title=derp
hxxp://www.somesite.com/page.php?id=4&title=blah
hxxp://www.somesite.com/page.php?id=3&c=32&title=thing
hxxp://www.somesite.com/page.php?b=33&id=3
I have it parsing each URL into a list of lists, so eventually I have a list like this:
sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
I need to figure out a way to give me just 2 lists in my list at that point:
new = [['id', 'c', 'title'], ['b', 'id']]
As of right now I've got a bit of code that sorts it out a little. I know I'm close, and I've been slamming my head against this for a couple of days now :(. Any ideas?
Thanks in advance! :)
EDIT: Sorry for not being clear! This script is aimed at finding unique entry points for web applications post-spidering. Basically if a URL has 3 unique entry points
['id', 'c', 'title']
I'd prefer that to the same link with 2 unique entry points, such as:
['id', 'title']
So I need my new list of lists to eliminate the one with 2 and prefer the one with 3, ONLY if the smaller set of variables is contained in the larger one. If it's still unclear let me know, and thank you for the quick responses! :)
I'll assume that subsets are considered "duplicates" (non-commutatively, of course)...
Start by converting each query into a set and ordering them all from largest to smallest. Then add each query to a new list if it isn't a subset of an already-added query. Since any set is a subset of itself, this logic covers exact duplicates:
a = []
for q in sorted((set(q) for q in sort), key=len, reverse=True):
    if not any(q.issubset(Q) for Q in a):
        a.append(q)
a = [list(q) for q in a]  # Back to lists, if you want
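For example, with the sort list from the question (since sets are unordered, the keys inside each resulting sublist may come back in a different order):
sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]

a = []
for q in sorted((set(q) for q in sort), key=len, reverse=True):
    if not any(q.issubset(Q) for Q in a):
        a.append(q)
a = [list(q) for q in a]

print(a)  # e.g. [['id', 'c', 'title'], ['b', 'id']], modulo ordering inside each sublist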