How to separate uppercase and lowercase letters in a string? - python

I have written code that separates the characters at 'even' and 'odd' indices, and I would like to modify it so that it separates characters by upper/lower case.
I can't figure out how to do this for a string such as "AbBZxYp". I have tried using .lower and .upper but I think I'm using them incorrectly.
def upperLower(string):
    odds = ""
    evens = ""
    for index in range(len(string)):
        if index % 2 == 0:
            evens = evens + string[index]
        if not (index % 2 == 0):
            odds = odds + string[index]
    print "Odds: ", odds
    print "Evens: ", evens

Are you looking to get two strings, one with all the uppercase letters and another with all the lowercase letters? Below is a function that will return two strings, the upper then the lowercase:
def split_upper_lower(input):
    upper = ''.join([x for x in input if x.isupper()])
    lower = ''.join([x for x in input if x.islower()])
    return upper, lower
You can then call it with the following:
upper, lower = split_upper_lower('AbBZxYp')
which gives you two variables, upper and lower. Use them as necessary.
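The function above walks the string twice, once per comprehension. As a minimal sketch, the same split can be done in one pass; the name split_upper_lower_once is made up for this illustration and is not part of the original answer:

```python
def split_upper_lower_once(text):
    """Return (uppercase_chars, lowercase_chars) using one pass over text."""
    upper_chars = []
    lower_chars = []
    for ch in text:
        if ch.isupper():
            upper_chars.append(ch)
        elif ch.islower():
            lower_chars.append(ch)
        # Digits, spaces and punctuation match neither test and are dropped.
    return ''.join(upper_chars), ''.join(lower_chars)


print(split_upper_lower_once('AbBZxYp'))  # ('ABZY', 'bxp')
```

For short strings the difference is negligible, so the two-comprehension version is probably the more readable choice.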

>>> ''.join(filter(str.isupper, "AbBZxYp"))
'ABZY'
>>> ''.join(filter(str.islower, "AbBZxYp"))
'bxp'
Btw, for odd/even index you could just do this:
>>> "AbBZxYp"[::2]
'ABxp'
>>> "AbBZxYp"[1::2]
'bZY'

There is an itertools recipe called partition that can do this. Here is the implementation, from the itertools recipes:
from itertools import filterfalse, tee
def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
Upper and Lowercase Letters
You can implement the recipe above yourself, or install a library that implements it for you, e.g. pip install more_itertools:
import more_itertools as mit
iterable = "AbBZxYp"
pred = lambda x: x.islower()
children = mit.partition(pred, iterable)
[list(c) for c in children]
# [['A', 'B', 'Z', 'Y'], ['b', 'x', 'p']]
Here partition uses a predicate function to determine if each item in an iterable is lowercase. If not, it is filtered into the false group. Otherwise, it is filtered into the group of true items. We iterate to expose these groups.
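If installing a third-party package is not an option, the same result can be had with the standard library alone. This is a sketch for Python 3, where filter returns an iterator; the variable names are illustrative:

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    """Split iterable into (items where pred is false, items where pred is true)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


not_lower, lower = partition(str.islower, "AbBZxYp")
upper_text = ''.join(not_lower)
lower_text = ''.join(lower)
print(upper_text)  # ABZY
print(lower_text)  # bxp
```

Note that the "false" group holds everything that is not lowercase, so digits or punctuation in the input would end up next to the uppercase letters.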
Even and Odd Indices
You can modify this to work for odd and even indices as well:
import itertools as it
import more_itertools as mit
iterable = "AbBZxYp"
pred = lambda x: x[0] % 2 != 0
children = mit.partition(pred, tuple(zip(it.count(), iterable)))
[[i[1] for i in list(c)] for c in children]
# [['A', 'B', 'x', 'p'], ['b', 'Z', 'Y']]
Here we zip an itertools.count() object to enumerate the iterable. Then we iterate the children so that the sub items yield the letters only.
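The built-in enumerate yields the same (index, letter) pairs as zipping with itertools.count(). A small sketch using only the standard library follows; is_odd_index is a helper name invented for this example:

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    """Split iterable into (items where pred is false, items where pred is true)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


def is_odd_index(pair):
    """pair is an (index, letter) tuple produced by enumerate."""
    index, _ = pair
    return index % 2 != 0


even_pairs, odd_pairs = partition(is_odd_index, enumerate("AbBZxYp"))
evens = [letter for _, letter in even_pairs]
odds = [letter for _, letter in odd_pairs]
print(evens)  # ['A', 'B', 'x', 'p']
print(odds)   # ['b', 'Z', 'Y']
```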
See also more_itertools docs for more tools.

Related

How do I compare two letters or values in two different lists?

I am making a program that has two lists (in Python), and each list contains 5 different letters. How do I make it so that any index number I choose for both lists gets compared and uppercase a letter if the condition is true? If the first two values in the list are the same (in my case a lowercase letter), then I want the letter in the second list to become uppercase.
example/attempt (I don't know what I'm doing):
if list1[0] = list2[0]:
upper(list2[0])
Without an example of your input and output, it's difficult to understand what your goal is. If the goal is to call .upper() on every string in list2 where list1[i] and list2[i] are equal, you can combine zip and enumerate to compare the items, and then assign the uppercase string back to list2[i] like so:
list1 = ['a', 'b', 'c']
list2 = ['a', 'p', 'q']
for i, (x, y) in enumerate(zip(list1, list2)):
    if x == y:
        list2[i] = y.upper()
print(list2)
Output:
['A', 'p', 'q']
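The loop above changes list2 in place. If the original list should stay untouched, a list comprehension can build a new one instead. This is a sketch only, and upper_where_equal is a name chosen for the illustration:

```python
def upper_where_equal(first, second):
    """Return a copy of second, uppercasing items that equal the item
    at the same position in first."""
    return [y.upper() if x == y else y for x, y in zip(first, second)]


list1 = ['a', 'b', 'c']
list2 = ['a', 'p', 'q']
print(upper_where_equal(list1, list2))  # ['A', 'p', 'q']
print(list2)                            # ['a', 'p', 'q'] (unchanged)
```

Because zip stops at the shorter input, the result is only as long as the shorter of the two lists.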
I think you could use something like this:
def compare_and_upper(lst1, lst2):
    for i in range(len(lst1)):
        if lst1[i].upper() == lst2[i].upper():
            return lst1[i].upper()
    return None
This is not a full solution of your problem, more of a representation of how to do the comparisons, which you can then reuse / modify to do the solution you want in the end.
import string
from random import choices
def create_random_string(str_len=10):
    # k = the number of letters that we want it to return.
    return "".join(choices(string.ascii_lowercase, k=str_len))
def compare(str_len=10):
    # Create the two strings that we want to compare
    first_string = create_random_string(str_len)
    second_string = create_random_string(str_len)
    # comp_string will hold the final string that we want to return.
    comp_string = ""
    # Because the length of the strings is based on the variable str_len,
    # we can use the range of that number to iterate over our comparisons.
    for i in range(str_len):
        # Compares the i'th position of the strings
        # If they match, then add the uppercase version to comp_string
        if first_string[i] == second_string[i]:
            comp_string += first_string[i].upper()
        else:
            comp_string += "-"
    return comp_string
for _ in range(10):
    print(compare(20))
Sample output:
--------------------
---AS---D---------D-
----W--Q--------E---
--------------------
-----------------E--
------T-------------
--------------------
-------------P------
-----S--------------
--B-----------------

How to use enumerate in a list comprehension with two lists?

I just started using list comprehensions and I'm struggling with them. In this case, I need the index n in each list (sequence_0 and sequence_1) that the iteration is at each time. How can I do that?
The idea is to get the longest sequence of equal nucleotides (a motif) between the two sequences. Once a pair is found, the program should continue with the next nucleotides of the sequences, checking whether they are also equal and, if so, elongating the motif with them. The final output should be a list of all the motifs found.
The problem is that, to continue with the next nucleotides once a pair is found, I need the position of the pair in both sequences. The index function does not work in this case, and that's why I need enumerate.
Also, I don't understand exactly the reason for the x and y between (), it would be good to understand that too :)
Just to explain, the content of the lists is DNA sequences, so it's basically something like:
sequence_1 = ['A', 'T', 'C', 'A', 'C']
def find_shared_motif(arq):
data = fastaread(arq)
seqs = [list(sequence) for sequence in data.values()]
motifs = [[]]
i = 0
sequence_0, sequence_1 = seqs[0], seqs[1] # just to simplify
for x, y in [(x, y) for x in zip(sequence_0[::], sequence_0[1::]) for y in zip(sequence_1[::], sequence_1[1::])]:
print(f'Pairs {"".join(x)} and {"".join(y)} being analyzed...')
if x == y:
print(f'Pairs {"".join(x)} and {"".join(y)} match!')
motifs[i].append(x[0]), motifs[i].append(x[1])
k = sequence_0.index(x[0]) + 2 # IS NOT RETURNING THE RIGHT NUMBER
u = sequence_1.index(y[0]) + 2
print(k, u)
# Determines if the rest of the sequence is compatible
print(f'Starting to elongate the motif {x}...')
for j, m in enumerate(sequence_1[u::]):
try:
# Checks if the nucleotide is equal for both of the sequences
print(f'Analyzing the pair {sequence_0[k + j]}, {m}')
if m == sequence_0[k + j]:
motifs[i].append(m)
print(f'The pair {sequence_0[k + j]}, {m} is equal!')
# Stop in the first nonequal residue
else:
print(f'The pair {sequence_0[k + j]}, {m} is not equal.')
break
except IndexError:
print('IndexError, end of the string')
else:
i += 1
motifs.append([])
return motifs
...
One way to go with it is to start zipping both lists:
a = ['A', 'T', 'C', 'A', 'C']
b = ['A', 'T', 'C', 'C', 'T']
c = list(zip(a,b))
In that case, c will have the list of tuples below
c = [('A','A'), ('T','T'), ('C','C'), ('A','C'), ('C','T')]
Then, you can go with list comprehension and enumerate:
d = [(i, t) for i, t in enumerate(c)]
This will bring something like this to you:
d = [(0, ('A','A')), (1, ('T','T')), (2, ('C','C')), ...]
Of course you can go for a one-liner, if you want:
d = [(i, t) for i, t in enumerate(zip(a,b))]
>>> [(0, ('A','A')), (1, ('T','T')), (2, ('C','C')), ...]
Now, you have to deal with the nested tuples. Focus on the internal ones. It is obvious that what you want is to compare the first element of the tuples with the second ones. But, also, you will need the position where the difference resides (that lies outside). So, let's build a function for it. Inside the function, i will capture the positions, and t will capture the inner tuples:
def compare(a, b):
    d = [(i, t) for i, t in enumerate(zip(a,b))]
    for i, t in d:
        if t[0] != t[1]:
            return i
    return -1
In that way, if you get -1 at the end, it means that all elements in both lists are equal, side by side. Otherwise, you will get the position of the first difference between them.
It is important to notice that, when the two lists have different sizes, zip produces tuples only up to the length of the shorter list. The extra elements of the longer list are ignored.
Ex.
list(zip([1,2], [3,4,5]))
>>> [(1,3), (2,4)]
You can use the function compare with your code to get the positions where the lists differ, and use that to build your motifs.
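One possible way to build on this, as a rough sketch: collect every run of equal nucleotides together with the position where it starts. This assumes that only matches at the same index in both sequences count, which is narrower than a general shared-motif search; aligned_motifs and min_length are names made up for the example. The parentheses in `for i, (x, y) in ...` unpack the pair that zip produces, which is also why the question's loop needed them.

```python
def aligned_motifs(a, b, min_length=2):
    """Return (start_index, run) for each run of equal items found at the
    same positions in a and b, keeping runs of at least min_length items."""
    motifs = []
    start = None  # index where the current run of matches began
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                motifs.append((start, a[start:i]))
            start = None
    # Close a run that reaches the end of the shorter sequence.
    end = min(len(a), len(b))
    if start is not None and end - start >= min_length:
        motifs.append((start, a[start:end]))
    return motifs


a = ['A', 'T', 'C', 'A', 'C']
b = ['A', 'T', 'C', 'C', 'T']
print(aligned_motifs(a, b))  # [(0, ['A', 'T', 'C'])]
```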

Adding certain lengthy elements to a list

I'm doing a project for my school and for now I have the following code:
def conjunto_palavras_para_cadeia1(conjunto):
    acc = []
    conjunto = sorted(conjunto, key=lambda x: (len(x), x))
    def by_size(words, size):
        result = []
        for word in words:
            if len(word) == size:
                result.append(word)
        return result
    for i in range(0, len(conjunto)):
        if i > 0:
            acc.append(("{} ->".format(i)))
            acc.append(by_size(conjunto, i))
    acc = ('[%s]' % ', '.join(map(str, acc)))
    print( acc.replace(",", "") and acc.replace("'", "") )
conjunto_palavras_para_cadeia1(c)
I have this list: c = ['A', 'E', 'LA', 'ELA'] and what I want is to return a string where the words go from the shortest to the longest, and words of the same length are ordered alphabetically. I haven't been able to do that.
OUTPUT: [;1 ->, [A, E], ;2 ->, [LA], ;3 ->, [ELA]]
WANTED OUTPUT: '[1->[A, E];2->[LA];3->[ELA]]'
Taking a look at your program, the only issue appears to be when you are formatting your output for display. Note that you can use str.format to insert lists into strings, something like this:
'{}->{}'.format(i, sublist)
Here's my crack at your problem, using sorted + itertools.groupby.
from itertools import groupby
r = []
for i, g in groupby(sorted(c, key=len), key=len):
    r.append('{}->{}'.format(i, sorted(g)).replace("'", ''))
print('[{}]'.format(';'.join(r)))
[1->[A, E];2->[LA];3->[ELA]]
A breakdown of the algorithm stepwise is as follows -
sort elements by length
group consecutive elements by length
for each group, sort sub-lists alphabetically, and then format them as strings
at the end, join each group string and surround with square brackets []
Shortest solution (using pure Python):
c = ['A', 'E', 'LA', 'ELA']
result = {}
for item in c:
    result[len(item)] = [item] if len(item) not in result else result[len(item)] + [item]
str_result = ', '.join(['{0} -> {1}'.format(res, sorted(result[res])) for res in result])
To explain:
We take the items one by one in a loop and add them to a dictionary of lists, keyed by word length.
We have in result:
{1: ['A', 'E'], 2: ['LA'], 3: ['ELA']}
And in str_result:
1 -> ['A', 'E'], 2 -> ['LA'], 3 -> ['ELA']
Should you have questions - ask
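The dictionary approach can also be pushed all the way to the wanted output format. The following is only a sketch; group_words_by_length is a name invented here, and collections.defaultdict stands in for the manual "key not in result" check:

```python
from collections import defaultdict


def group_words_by_length(words):
    """Format words grouped by length, e.g. '[1->[A, E];2->[LA]]'."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    parts = []
    for size in sorted(groups):  # shortest words first
        joined = ', '.join(sorted(groups[size]))  # alphabetical within a size
        parts.append('{}->[{}]'.format(size, joined))
    return '[{}]'.format(';'.join(parts))


c = ['A', 'E', 'LA', 'ELA']
print(group_words_by_length(c))  # [1->[A, E];2->[LA];3->[ELA]]
```

Building the inner brackets by hand avoids having to strip quote characters from the list's repr afterwards.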

Sorting string values according to a custom alphabet in Python

I am looking for an efficient way to sort a list of strings according to a custom alphabet.
For example, I have a string alphabet which is "bafmxpzv" and a list of strings composed from only the characters contained in that alphabet.
I would like a way to sort that list similarly to other common sorts, but using this custom alphabet. How can I do that?
Let's create an alphabet and a list of words:
In [32]: alphabet = "bafmxpzv"
In [33]: a = ['af', 'ax', 'am', 'ab', 'zvpmf']
Now let's sort them according to where the letters appear in alphabet:
In [34]: sorted(a, key=lambda word: [alphabet.index(c) for c in word])
Out[34]: ['ab', 'af', 'am', 'ax', 'zvpmf']
The above sorts in the correct order.
sorted enables a wide range of custom sorting. In Python 2, the sorted function has three optional arguments: cmp, key, and reverse. Python 3 removed cmp; functools.cmp_to_key converts a comparison function into a key function.
cmp is good for complex sorting tasks. If specified, cmp should be a function that takes two arguments. It should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument. For this case, cmp is overkill.
key, if specified, should be a function that takes one argument and returns something that Python knows natively how to sort. In this case, key returns a list of the indices of each of the word's characters in the alphabet.
reverse, if true, reverses the sort order.
A nonworking alternative
From the comments, this alternative form was mentioned:
In [35]: sorted(a, key=lambda word: [alphabet.index(c) for c in word[0]])
Out[35]: ['af', 'ax', 'am', 'ab', 'zvpmf']
Note that this does not sort in the correct order. That is because the key function here only considers the first letter of each word. This can be demonstrated by testing key:
In [2]: key=lambda word: [alphabet.index(c) for c in word[0]]
In [3]: key('af')
Out[3]: [1]
In [4]: key('ax')
Out[4]: [1]
Observe that key returns the same value for two different strings, af and ax. The value returned reflects only the first character of each word. Because of this, sorted has no way of determining that af belongs before ax.
Update: I misread your question. You have a list of strings, not a single string. The idea is the same: sort using a custom comparison function:
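from functools import cmp_to_key
alphabet = "bafmxpzv"  # the custom alphabet from the question
def acmp (a,b):
    la = len(a)
    lb = len(b)
    lm = min(la,lb)
    p = 0
    # compare character by character over the common prefix length
    while p < lm:
        pa = alphabet.index(a[p])
        pb = alphabet.index(b[p])
        if pa > pb:
            return 1
        if pb > pa:
            return -1
        p = p + 1
    # all shared positions are equal: the shorter string sorts first
    if la > lb:
        return 1
    if lb > la:
        return -1
    return 0
mylist = ['baf', 'bam', 'pxm']
mylist.sort(key=cmp_to_key(acmp))  # Python 2 also accepts mylist.sort(cmp=acmp)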
Instead of calling index(), which scans the alphabet for every character, a better alternative is to build a dictionary that maps each character to its position, so the index can be looked up directly while sorting.
Example:
>>> alphabet = "bafmxpzv"
>>> a = ['af', 'ax', 'am', 'ab', 'zvpmf']
>>> order = dict(zip(alphabet, range(len(alphabet))))
>>> sorted(a, key=lambda word: [order[c] for c in word])
['ab', 'af', 'am', 'ax', 'zvpmf']
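If several lists need sorting with the same alphabet, the lookup table can be wrapped in a small factory so it is built only once. This is a sketch; alphabet_key is a hypothetical helper name, not something from the answers above:

```python
def alphabet_key(alphabet):
    """Return a key function that ranks words by the given alphabet."""
    # Map each character to its position once, up front.
    order = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Raises KeyError if word contains a character outside the alphabet.
        return [order[ch] for ch in word]

    return key


words = ['af', 'ax', 'am', 'ab', 'zvpmf']
print(sorted(words, key=alphabet_key("bafmxpzv")))
# ['ab', 'af', 'am', 'ax', 'zvpmf']
```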

all combination of a complicated list

I want to find all possible combination of the following list:
data = ['a','b','c','d']
I know it looks like a straightforward task and it can be achieved by something like the following code:
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
but what I want is actually a way to give each element of the list data two possibilities ('a' or '-a').
An example of the combinations can be ['a','b'] , ['-a','b'], ['a','b','-c'], etc.
without something like the following case of course ['-a','a'].
You could write a generator function that takes a sequence and yields each possible combination of negations. Like this:
import itertools
def negations(seq):
    for prefixes in itertools.product(["", "-"], repeat=len(seq)):
        yield [prefix + value for prefix, value in zip(prefixes, seq)]
print(list(negations(["a", "b", "c"])))
Result (whitespace modified for clarity):
[
[ 'a', 'b', 'c'],
[ 'a', 'b', '-c'],
[ 'a', '-b', 'c'],
[ 'a', '-b', '-c'],
['-a', 'b', 'c'],
['-a', 'b', '-c'],
['-a', '-b', 'c'],
['-a', '-b', '-c']
]
You can integrate this into your existing code with something like
comb = [x for i in range(1, len(data)+1) for c in combinations(data, i) for x in negations(c)]
Once you have the regular combinations generated, you can do a second pass to generate the ones with "negation." I'd think of it like a binary number, with the number of elements in your list being the number of bits. Count from 0b0000 to 0b1111 via 0b0001, 0b0010, etc., and wherever a bit is set, negate that element in the result. This will produce 2^n combinations for each input combination of length n.
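The binary-counting idea described above could look roughly like this. It is a sketch of one way to read that description, with negations_by_bits as an invented name; bit i of the counter decides whether element i is negated:

```python
def negations_by_bits(items):
    """Yield every way of negating a subset of items, one per bit pattern."""
    items = list(items)
    n = len(items)
    for mask in range(2 ** n):  # 0b00...0 up to 0b11...1
        # Bit i of mask set means element i gets a leading '-'.
        yield ['-' + item if mask & (1 << i) else item
               for i, item in enumerate(items)]


print(list(negations_by_bits(['a', 'b'])))
# [['a', 'b'], ['-a', 'b'], ['a', '-b'], ['-a', '-b']]
```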
Here is a one-liner, but it can be hard to follow:
from itertools import product
comb = [sum(t, []) for t in product(*[([x], ['-' + x], []) for x in data])]
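First map each element of data to the lists it can become in the result: [x], ['-' + x], or [] when the element is left out. Then take the product of these (the * unpacks the list into separate arguments) to get all possibilities. Finally, flatten each combination with sum. Note that the result includes one empty list, from leaving out every element.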
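Spelling the one-liner out step by step may make it easier to follow. In this sketch the intermediate names choices and non_empty are added for illustration; with four elements there are 3**4 = 81 results, one of which is the empty list:

```python
from itertools import product

data = ['a', 'b', 'c', 'd']

# Each element can appear as itself, appear negated, or be left out.
choices = [([x], ['-' + x], []) for x in data]

# product picks one option per element; sum(..., []) flattens the picks.
comb = [sum(picked, []) for picked in product(*choices)]

# Drop the single empty combination if it is not wanted.
non_empty = [c for c in comb if c]

print(len(comb))       # 81
print(len(non_empty))  # 80
```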
My solution basically has the same idea as John Zwinck's answer. After you have produced the list of all combinations
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
you generate all possible positive/negative combinations for each element of comb. I do this by iterating through the total number of combinations, 2**N for an element of length N, and treating the counter as a binary number, where each binary digit stands for the sign of one element. (E.g. a two-element list would have 4 possible combinations, 0 to 3, represented by 0b00 => (+,+), 0b01 => (-,+), 0b10 => (+,-) and 0b11 => (-,-).)
def twocombinations(it):
    sign = lambda c, i: "-" if c & 2**i else ""
    l = list(it)
    if len(l) < 1:
        return
    # for each possible combination, make a tuple with the appropriate
    # sign before each element
    for c in range(2**len(l)):
        yield tuple(sign(c, i) + el for i, el in enumerate(l))
Now we apply this function to every element of comb and flatten the resulting nested iterator:
l = itertools.chain.from_iterable(map(twocombinations, comb))
