Zip string-subset from tuples in a list - python

With a structure like this
hapts = [('1|2', '1|2'), ('3|4', '3|4')]
I need to zip it (sort of...) to get the following:
end = ['1|1', '2|2', '3|3', '4|4']
I started working with the following code:
zipped = []
for i in hapts:
    tete = zip(i[0][0], i[1][0])
    zipped.extend(tete)
    some = zip(i[0][2], i[1][2])
    zipped.extend(some)
... and got it zipped like this:
zipped = [('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Any suggestions on how to continue? Furthermore, I'm sure there should be a more elegant way to do this, but it is hard to give Google an accurate definition of the question ;)
Thx!

You are very close to solving this. I would argue the best solution here is a simple str.join() in a list comprehension:
["|".join(values) for values in zipped]
This also has the bonus of working nicely with (potentially) more values, without modification.
If you wanted tuples (which is not what your requested output shows, as brackets don't make a tuple, a comma does), then it is trivial to add that in:
[("|".join(values), ) for values in zipped]
Also note that zipped can be produced more effectively too:
>>> import itertools
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1', '2|2', '3|3', '4|4']
And to show what I meant before about handling more values elegantly:
>>> hapts = [('1|2|3', '1|2|3', '1|2|3'), ('3|4|5', '3|4|5', '3|4|5')]
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1|1', '2|2|2', '3|3|3', '3|3|3', '4|4|4', '5|5|5']

The problem in this context is to:
1. unfold the list
2. reformat it
3. fold it
Here is how you may approach the problem
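>>> from itertools import chain, izip  # Python 2; on Python 3, use the built-in zip instead of izip
>>> reformat = lambda t: map('|'.join, izip(*(e.split("|") for e in t)))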
>>> list(chain(*(reformat(t) for t in hapts)))
['1|1', '2|2', '3|3', '4|4']
You don't need to rework your existing code, though.
If you would rather start from the output you already have, just iterate over it again and join each pair with "|":
>>> ['{}|{}'.format(*t) for t in zipped]
['1|1', '2|2', '3|3', '4|4']
Note: the parentheses in your output are redundant.

Your code basically works, but here's a more elegant way to do it.
First define a transposition function that takes an entry of hapts and flips it:
>>> transpose = lambda tup: zip(*(y.split("|") for y in tup))
Then map that function over hapts:
>>> map(transpose, hapts)
[[('1', '1'), ('2', '2')], [('3', '3'), ('4', '4')]]
and then, if you want to flatten this into one list:
>>> from itertools import chain
>>> y = list(chain.from_iterable(map(transpose, hapts)))
>>> y
[('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Finally, to join it back up into strings again:
>>> map("|".join, y)
['1|1', '2|2', '3|3', '4|4']
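The session above appears to come from Python 2, where map and zip return lists. On Python 3 both return lazy iterators, so the same steps need explicit list() calls before anything can be displayed. A sketch of the same pipeline for Python 3, with transpose written as a regular function:

```python
from itertools import chain

hapts = [('1|2', '1|2'), ('3|4', '3|4')]

def transpose(tup):
    # Split every string on "|" and regroup the pieces column by column.
    return zip(*(y.split("|") for y in tup))

# Flatten the per-group columns into one list of tuples.
pairs = list(chain.from_iterable(map(transpose, hapts)))
# Join every tuple back into a pipe-delimited string.
result = list(map("|".join, pairs))

print(pairs)
print(result)
```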

end = []
for groups in hapts:
    # The * unpacks the split lists so that zip pairs them up column by column.
    end.extend('|'.join(regrouped) for regrouped in zip(*[group.split('|') for group in groups]))
This should also continue to work with n-length groups of n-length pipe-delimited characters, and n-length groups of groups, though it will truncate the regrouped values to the shortest group of characters in each group of character groups.
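The truncation mentioned above comes from zip, which stops at the shortest input. If padding is preferred over truncation, itertools.zip_longest can be swapped in. A minimal sketch; the function name regroup and the fill value "?" are illustrative choices, not part of the original answers:

```python
from itertools import zip_longest

def regroup(hapts, fill="?"):
    """Transpose the pipe-delimited fields of each group, padding short ones."""
    end = []
    for groups in hapts:
        # zip_longest keeps going until the longest split list is exhausted.
        columns = zip_longest(*(g.split("|") for g in groups), fillvalue=fill)
        end.extend("|".join(col) for col in columns)
    return end

print(regroup([("1|2", "1|2"), ("3|4", "3|4")]))
print(regroup([("1|2|3", "1|2")]))  # the second string is one field short
```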

Related

How to extract each word consecutive to its own previous number in a string and sorting the result in Python

Input : x3b4U5i2
Output : bbbbiiUUUUUxxx
How can I solve this problem in Python? I have to print the word next to its number n times and sort the result.
It wasn't clear if multiple digit counts or groups of letters should be handled. Here's a solution that does all of that:
import re
def main(inp):
    parts = re.split(r"(\d+)", inp)
    parts_map = {parts[i]: int(parts[i+1]) for i in range(0, len(parts)-1, 2)}
    print(''.join([c*parts_map[c] for c in sorted(parts_map.keys(), key=str.lower)]))
main("x3b4U5i2")
main("x3brx4U5i2")
main("x23b4U35i2")
Result:
bbbbiiUUUUUxxx
brxbrxbrxbrxiiUUUUUxxx
bbbbiiUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUxxxxxxxxxxxxxxxxxxxxxxx
I'm assuming the formatting will always be <char><int> with <int> being in between 1 and 9...
input_ = "x3b4U5i2"
result_list = [input_[i]*int(input_[i+1]) for i in range(0, len(input_), 2)]
result_list.sort(key=str.lower)
result = ''.join(result_list)
There's probably a much more performance-oriented approach to solving this; it's just the first solution that came into my limited mind.
Edit
After the feedback in the comments I've tried to improve performance by sorting first, but I have actually decreased performance with the following implementation:
input_ = "x3b4U5i2"
def sort_first(value):
    return value[0].lower()
tuple_construct = [(input_[i], int(input_[i+1])) for i in range(0, len(input_), 2)]
tuple_construct.sort(key=sort_first)
result = ''.join([tc[0] * tc[1] for tc in tuple_construct])
Execution time for 100,000 iterations on it:
1) The execution time is: 0.353036
2) The execution time is: 0.4361724
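Timings like these can be reproduced with the standard timeit module. The sketch below wraps the two variants in functions (expand_then_sort and sort_then_expand are names made up here for illustration) and first checks that they agree; the absolute numbers will differ from machine to machine.

```python
import timeit

input_ = "x3b4U5i2"

def expand_then_sort(s):
    # Variant 1: repeat each letter first, then sort the repeated chunks.
    parts = [s[i] * int(s[i + 1]) for i in range(0, len(s), 2)]
    parts.sort(key=str.lower)
    return ''.join(parts)

def sort_then_expand(s):
    # Variant 2: sort the (letter, count) pairs first, then repeat.
    pairs = [(s[i], int(s[i + 1])) for i in range(0, len(s), 2)]
    pairs.sort(key=lambda p: p[0].lower())
    return ''.join(c * n for c, n in pairs)

# Both variants should produce the same string before being compared on speed.
assert expand_then_sort(input_) == sort_then_expand(input_)

for fn in (expand_then_sort, sort_then_expand):
    elapsed = timeit.timeit(lambda: fn(input_), number=100_000)
    print(f"{fn.__name__}: {elapsed:.4f} s")
```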
One option: extract the character/digit(s) pairs with a regex, sort them by letter (ignoring case), multiply each letter by its number of repeats, and join:
s = 'x3b4U5i2'
import re
out = ''.join([c*int(i) for c, i in
               sorted(re.findall(r'(\D)(\d+)', s),
                      key=lambda x: x[0].casefold())
               ])
print(out)
Output: bbbbiiUUUUUxxx
If you want to handle multiple characters, you can use r'(\D+)(\d+)'.
No list comprehensions or generator expressions in sight. Just use re.sub with a lambda to expand the length encoding, then sort the result, and then join it back into a string.
import re
s = "x3b4U5i2"
''.join(sorted(re.sub(r"(\D+)(\d+)",
                      lambda m: m.group(1)*int(m.group(2)),
                      s),
               key=lambda x: x[0].casefold()))
# 'bbbbiiUUUUUxxx'
If we use re.findall to extract a list of pairs of strings and multipliers:
import re
s = 'x3b4U5i2'
pairs = re.findall(r"(\D+)(\d+)", s)
Then we can use some functional style to sort that list before expanding it.
from operator import itemgetter
def compose(f, g):
    return lambda x: f(g(x))
sorted(pairs, key=compose(str.lower, itemgetter(0)))
# [('b', '4'), ('i', '2'), ('U', '5'), ('x', '3')]
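The snippet above stops at the sorted pairs. One way to finish the job, sketched here as a self-contained script that reuses the same compose helper, is to multiply each string by its count and join the pieces:

```python
import re
from operator import itemgetter

def compose(f, g):
    # compose(f, g)(x) is f(g(x))
    return lambda x: f(g(x))

s = 'x3b4U5i2'
pairs = re.findall(r"(\D+)(\d+)", s)

# Sort case-insensitively on the text part of each (text, count) pair.
ordered = sorted(pairs, key=compose(str.lower, itemgetter(0)))

# Expand every pair and glue the pieces together.
result = ''.join(text * int(count) for text, count in ordered)
print(result)  # bbbbiiUUUUUxxx
```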

How to add specific characters of the elements of a list to a tuple

I have a list containing some elements that looks like this:
data = ["1: 6987", "2: 5436", "7: 9086"]
Is it possible to turn the elements into tuples so that the result looks like this:
tuple_data = [("1", 6987) , ("2", 5436), ("7", 9086)]
splits = [record.split(": ") for record in data]
tuple_data = [(first, int(second)) for first, second in splits]
Can also do it in one line if you like:
tuple_data = [(first, int(second)) for first, second in [record.split(": ") for record in data]]
map and split can be used for this:
data = ["1: 6987", "2: 5436", "7: 9086"]
list(map(lambda i: (i.split(': ')[0], int(i.split(': ')[1])), data))
Result:
[('1', 6987), ('2', 5436), ('7', 9086)]
lambda defines an anonymous function which splits each element on ': ' and adds the first and second part of that split to a tuple, while map applies the anonymous (lambda) function to each element in data.
There may be a more elegant way :).
List comprehension is one way:
data = ["1: 6987", "2: 5436", "7: 9086"]
res = [(i.split(':')[0], int(i.split(':')[1])) for i in data]
# [('1', 6987), ('2', 5436), ('7', 9086)]
Syntax is simpler if you want only integers:
res = [tuple(map(int, i.split(':'))) for i in data]
# [(1, 6987), (2, 5436), (7, 9086)]
You can use a list comprehension:
data = [(i[:i.find(':')], int(i[i.find(':')+1:].strip())) for i in data]
result:
[('1', 6987), ('2', 5436), ('7', 9086)]
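The answers above all split each record twice or search for the colon twice. As an alternative sketch, str.partition splits once and also reports whether the separator was found at all. The helper name to_pair and the choice to raise ValueError on a malformed record are assumptions made for this example:

```python
data = ["1: 6987", "2: 5436", "7: 9086"]

def to_pair(record):
    # partition returns (before, separator, after); separator is '' if absent.
    key, sep, value = record.partition(":")
    if not sep:
        raise ValueError(f"no ':' in {record!r}")
    # int() ignores surrounding whitespace, so ' 6987' converts cleanly.
    return key.strip(), int(value)

tuple_data = [to_pair(record) for record in data]
print(tuple_data)
```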

How to extract a common name from multiple filenames and delete something I don't want

For example, I have 7 files named:
g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt
g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt
g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt
g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt
g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt
g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt
g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt
I want to extract a name from these files, named as:
g18_84pp_2A_MVP_GoodiesT0_MIX.txt
Any idea for this? Thanks.
Is it possible to rely only on the underscores?
For example, separating the filename as
"g18_84pp_2A_MVP2", "_", "GoodiesT0-HKJ-DFG", "_", "MIX-CMVP2_Y1000-MIX", ".txt".
Then take "g18_84pp_2A_MVP2" without the number 2, take "GoodiesT0" from "GoodiesT0-HKJ-DFG", and take the first "MIX" from "MIX-CMVP2_Y1000-MIX". Because I have a lot of files whose parts have different names, I want the solution to be general as well.
import re
names = ['g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt',
         'g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt',
         'g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt',
         'g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt',
         'g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt',
         'g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt',
         'g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt']
# Raw string, so that \. reaches the regex engine unchanged.
f = lambda x: re.findall(r'g18_84pp_2A_MVP(.*?)_GoodiesT0(.*?)_MIX(.*?)\.txt', x)
for x in names:
    print(f(x))
Produces
[('1', '-HKJ-DFG', '-CMVP1_Y1000-MIX')]
[('2', '-HKJ-DFG', '-CMVP2_Y1000-MIX')]
[('3', '-HKJ-DFG', '-CMVP3_Y1000-MIX')]
[('4', '-HKJ-DFG', '-CMVP4_Y1000-MIX')]
[('5', '-HKJ-DFG', '-CMVP5_Y1000-MIX')]
[('6', '-HKJ-DFG', '-CMVP6_Y1000-MIX')]
[('7', '-HKJ-DFG', '-CMVP7_Y1000-MIX')]
Filter out the names that don't match this pattern:
names = list(filter(f, names))
Since it's unclear what you're trying to do, this should be a good starting point.
UPDATE
The question was updated. Here is what you (probably) want to achieve:
import re
names = ['g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt',
         'g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt',
         'g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt',
         'g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt',
         'g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt',
         'g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt',
         'g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt']
expression = r'g18_84pp_2A_MVP(.*?)_Goodies(.*?)_MIX(.*?)\.txt'
f = lambda x: re.findall(expression, x)
# findall returns one 3-tuple per match, so its length is 1 here, not 3;
# testing for a match directly keeps the filter below from dropping every name.
_f = lambda x: re.search(expression, x) is not None
for x in names:
    print(f(x))
Outputs
[('1', 'T0-HKJ-DFG', '-CMVP1_Y1000-MIX')]
[('2', 'T0-HKJ-DFG', '-CMVP2_Y1000-MIX')]
[('3', 'T0-HKJ-DFG', '-CMVP3_Y1000-MIX')]
[('4', 'T0-HKJ-DFG', '-CMVP4_Y1000-MIX')]
[('5', 'T0-HKJ-DFG', '-CMVP5_Y1000-MIX')]
[('6', 'T0-HKJ-DFG', '-CMVP6_Y1000-MIX')]
[('7', 'T0-HKJ-DFG', '-CMVP7_Y1000-MIX')]
If you need to filter the original list:
names = list(filter(_f, names))
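For the underscore-based approach asked about in the question, a more general sketch is possible without hard-coding the name in a regex. It assumes (this is an assumption about the naming scheme, not something the question confirms) that the first four underscore-separated fields form the prefix, that trailing digits of the prefix are the running number, and that only the part before the first hyphen matters in the next two fields. The function name common_name and the parameter prefix_fields are made up for this example.

```python
import os

def common_name(filename, prefix_fields=4):
    stem, ext = os.path.splitext(filename)
    fields = stem.split("_")
    # Prefix: first `prefix_fields` fields, minus the trailing running number.
    prefix = "_".join(fields[:prefix_fields]).rstrip("0123456789")
    # Next two fields: keep only the part before the first hyphen.
    middle = fields[prefix_fields].split("-")[0]
    last = fields[prefix_fields + 1].split("-")[0]
    return f"{prefix}_{middle}_{last}{ext}"

names = [
    f"g18_84pp_2A_MVP{n}_GoodiesT0-HKJ-DFG_MIX-CMVP{n}_Y1000-MIX.txt"
    for n in range(1, 8)
]
# All seven files should collapse to a single common name.
print({common_name(name) for name in names})
```

Files with a different number of prefix fields would need a different prefix_fields value.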

Logical or groups regex

I am trying to make a regex that will find certain cases of incorrectly entered fractions, and return the numerator and denominator as groups.
These cases involve a space between the slash and a number: such as either 1 /2 or 1/ 2.
I use a logical-or operator in the regex, since I'd rather not have 2 separate patterns to check for:
r'(\d) /(\d)|(\d)/ (\d)'
(I'm not using \d+ since I'm more interested in the numbers directly bordering the division sign, though \d+ would work as well).
The problem is, when it matches one of the cases, say the second (1/ 2), looking at all the groups gives (None, None, '1', '2'), but I would like to have a regex that only returns 2 groups--in both cases, I would like the groups to be ('1', '2'). Is this possible?
Edit:
I would also like it to return groups ('1', '2') for the case 1 / 2, but to not capture anything for well-formed fractions like 1/2.
(\d)(?: /|/ | / )(\d) should do it (and only return incorrectly entered fractions). Notice the use of a non-capturing group.
Edit: updated with comments below.
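A quick check of that pattern against the four cases from the question might look like the following sketch. The non-capturing group (?: ... ) holds the three misplaced-space alternatives, so only the two digit groups are returned:

```python
import re

# (?: ... ) groups the alternatives without creating a capture group.
pattern = re.compile(r'(\d)(?: /|/ | / )(\d)')

for text in ('1 /2', '1/ 2', '1 / 2', '1/2'):
    match = pattern.search(text)
    # Malformed fractions yield ('1', '2'); the well-formed one yields None.
    print(repr(text), match.groups() if match else None)
```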
What about just using (\d)\s*/\s*(\d)?
That way you will always have only two groups:
>>> import re
>>> regex = r'(\d)\s*/\s*(\d)'
>>> re.findall(regex, '1/2')
[('1', '2')]
>>> re.findall(regex, '1 /2')
[('1', '2')]
>>> re.findall(regex, '1/ 2')
[('1', '2')]
>>> re.findall(regex, '1 / 2')
[('1', '2')]
>>>

String Replacement Combinations

So I have a string '1xxx1' and I want to replace a certain number (maybe all maybe none) of x's with a character, let's say '5'. I want all possible combinations (...maybe permutations) of the string where x is either substituted or left as x. I would like those results stored in a list.
So the desired result would be
>>> myList = GenerateCombinations('1xxx1', '5')
>>> print myList
['1xxx1','15xx1','155x1','15551','1x5x1','1x551','1xx51']
Obviously I'd like it to be able to handle strings of any length with any amount of x's as well as being able to substitute any number. I've tried using loops and recursion to figure this out to no avail. Any help would be appreciated.
How about:
from itertools import product
def filler(word, from_char, to_char):
    options = [(c,) if c != from_char else (from_char, to_char) for c in word]
    return (''.join(o) for o in product(*options))
which gives
>>> filler("1xxx1", "x", "5")
<generator object <genexpr> at 0x8fa798c>
>>> list(filler("1xxx1", "x", "5"))
['1xxx1', '1xx51', '1x5x1', '1x551', '15xx1', '15x51', '155x1', '15551']
(Note that you seem to be missing 15x51.)
Basically, first we make a list of every possible target for each letter in the source word:
>>> word = '1xxx1'
>>> from_char = 'x'
>>> to_char = '5'
>>> [(c,) if c != from_char else (from_char, to_char) for c in word]
[('1',), ('x', '5'), ('x', '5'), ('x', '5'), ('1',)]
And then we use itertools.product to get the Cartesian product of these possibilities and join the results together.
For bonus points, modify to accept a dictionary of replacements. :^)
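One possible take on that exercise, offered as a sketch rather than the answerer's intended solution: let the dictionary map each replaceable character to an iterable of substitutes, and always keep the original character as the first option. The name filler_multi is made up here.

```python
from itertools import product

def filler_multi(word, replacements):
    # For each character: itself, followed by any substitutes listed for it.
    options = [(c,) + tuple(replacements.get(c, ())) for c in word]
    return (''.join(o) for o in product(*options))

# Same result as filler("1xxx1", "x", "5") above.
print(list(filler_multi("1xxx1", {"x": "5"})))

# Several substitutes per character, and more than one replaceable character.
print(list(filler_multi("1x0", {"x": "56", "0": "9"})))
```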
