How to iteratively split a string using backward combinations? - python

I have a list of strings that look like this:
['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
I'm trying to split each string so I get different backward combinations of splits on the period delimiter. Basically, if I only take the example of the first string, I want to get:
['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04']
How can I achieve this? I've tried looking into itertools.combinations and the standard split features but no luck.

One-line, easy to understand (was less easy to tune :)), using str.rsplit with maxsplit gradually increasing up to the number of dots:
lst = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
result = [x.rsplit(".",i)[0] for x in lst for i in range(x.count(".")+1) ]
result:
['C04.123.123.123',
'C04.123.123',
'C04.123',
'C04',
'C03.456.456.456',
'C03.456.456',
'C03.456',
'C03',
'C05.789.789.789',
'C05.789.789',
'C05.789',
'C05']
The only thing that annoys me is that it calls split a lot just to keep the first element. Too bad there isn't a built-in lazy split function we could call next on.
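For what it's worth, the lazy split wished for above can be approximated with a small generator. This is only a sketch: iter_prefixes is a made-up helper name, not a built-in. It walks str.rfind from the right, so each prefix is sliced out on demand instead of re-splitting the whole string.

```python
def iter_prefixes(text, sep="."):
    """Yield text, then text with one more trailing field removed each time."""
    yield text
    end = text.rfind(sep)
    while end != -1:
        yield text[:end]
        # Look for the next separator to the left of the one just used.
        end = text.rfind(sep, 0, end)


lst = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
result = [prefix for x in lst for prefix in iter_prefixes(x)]
print(result[:4])
# ['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04']
```

Because it is a generator, you can also call next() on it and stop as soon as you have the prefix you need.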

You can use a list comprehension:
d = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
new_d = [a+('.' if i else '')+'.'.join(i) for a, *c in (s.split('.') for s in d)
         for i in [c[:h] for h in range(len(c)+1)][::-1]]
Output:
['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04', 'C03.456.456.456', 'C03.456.456', 'C03.456', 'C03', 'C05.789.789.789', 'C05.789.789', 'C05.789', 'C05']

start_list = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
final_list = []
for item in start_list:
    broke_up = item.split('.')
    temp = []
    full_item = []
    for sect in broke_up:
        temp.append(sect)
        full_item.append(".".join(temp))
    final_list.extend(full_item)
print(final_list)
This builds each group shortest-first; use full_item[::-1] if you want the longest-first order from the question. Alternatively you can final_list.append(full_item) to keep separate lists for each string in the original list.

Try this (s is a single string from the list):
from itertools import accumulate
list(accumulate(s.split('.'), lambda a, b: a + '.' + b))[::-1]

You can use itertools.accumulate:
from itertools import accumulate
s = 'C04.123.123.123'
# define the incremental step
append = lambda s, e: s + '.' + e
result = list(accumulate(s.split('.'), append))[::-1]
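The accumulate snippet handles one string; to cover the whole list from the question you could wrap it in a function and flatten the results. A minimal sketch, where backward_prefixes is just an illustrative name:

```python
from itertools import accumulate


def backward_prefixes(s, sep="."):
    """Return the dotted prefixes of s, longest first."""
    parts = s.split(sep)
    # accumulate builds 'C04', 'C04.123', ... one field at a time.
    forward = accumulate(parts, lambda acc, part: acc + sep + part)
    return list(forward)[::-1]


lst = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
result = [prefix for s in lst for prefix in backward_prefixes(s)]
print(result[:4])
# ['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04']
```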

Related

Regex: Split characters with "/"

I have these strings, for example:
['2300LO/LCE','2302KO/KCE']
I want to have output like this:
['2300LO','2300LCE','2302KO','2302KCE']
How can I do it with Regex in Python?
Thanks!
You can make a simple generator that yields the pairs for each string. Then you can flatten them into a single list with itertools.chain()
import re
from itertools import product, chain
def getCombos(s):
    # split the leading digits from the rest of the string
    nums, code = re.match(r'(\d+)(.*)', s).groups()
    for pair in product([nums], code.split("/")):
        yield ''.join(pair)
a = ['2300LO/LCE','2302KO/KCE']
list(chain.from_iterable(map(getCombos, a)))
# ['2300LO', '2300LCE', '2302KO', '2302KCE']
This has the added side benefit of working with strings like '2300LO/LCE/XX/CC' which will give you ['2300LO', '2300LCE', '2300XX', '2300CC',...]
You can try something like this:
import re
list1 = ['2300LO/LCE','2302KO/KCE']
list2 = []
for x in list1:
    a = x.split('/')
    tmp = re.findall(r'\d+', a[0]) # extracting digits
    list2.append(a[0])
    list2.append(tmp[0] + a[1])
print(list2)
This can be implemented with simple string splits, but since you asked for a regex solution, here is one.
import re
list1 = ['2300LO/LCE','2302KO/KCE']
r = re.compile("([0-9]{1,4})([a-zA-Z].*)/([a-zA-Z].*)")
out = []
for s in list1:
    items = r.findall(s)[0]
    out.append(items[0]+items[1])
    out.append(items[0]+items[2]) # re-attach the number to the second code
print(out)
The explanation for the regex: (a number of one to four digits), followed by (a letter and any further characters), followed by a / and (a letter and the rest of the characters).
They are grouped with (), so that when you use findall, they come back as individual elements of a tuple.
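If the input might contain more than two codes, or might not match at all, a stricter variant could look like the sketch below. The pattern and the expand helper are my own assumptions about the data (digits followed by letter-only codes separated by "/"), not something stated in the question.

```python
import re

# Assumed format: leading digits, then one or more letter-only codes joined by "/".
PATTERN = re.compile(r"(\d+)([A-Za-z]+(?:/[A-Za-z]+)*)")


def expand(code):
    """Return the number combined with each '/'-separated suffix."""
    match = PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected format: {code!r}")
    number, suffixes = match.groups()
    return [number + suffix for suffix in suffixes.split("/")]


list1 = ['2300LO/LCE', '2302KO/KCE']
out = [item for s in list1 for item in expand(s)]
print(out)
# ['2300LO', '2300LCE', '2302KO', '2302KCE']
```

Using fullmatch means a malformed string raises an error instead of silently producing a partial result.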

most pythonic way to compare substrings l in list L to string S & edit S according to l in L?

The list ['a','a #2','a(Old)'] should become {'a'} because '#' and '(Old)' are to be excised and a list of duplicates isn't needed. I struggled to develop a list comprehension with a generator and settled on this since I knew it'd work and valued time more than looking good:
l = []
groups = ['a','a #2','a(Old)']
for i in groups:
    if ('#') in i: l.append(i[:i.index('#')].strip())
    elif ('(Old)') in i: l.append(i[:i.index('(Old)')].strip())
    else: l.append(i)
groups = set(l)
What's the slick way to get this result?
Here is a general solution, if you want to strip the parts listed in wastes from the elements of lst:
lst = ['a','a #2','a(Old)']
wastes = ['#', '(Old)']
cleaned_set = {
    # the shortest candidate is the one cut at the earliest waste marker
    min([element.split(waste)[0].strip() for waste in wastes], key=len)
    for element in lst
}
You could write this whole expression in a single set comprehension
>>> groups = ['a','a #2','a(Old)']
>>> {i.split('#')[0].split('(Old)')[0].strip() for i in groups}
{'a'}
This will get everything preceding a # and everything preceding '(Old)', then trim off whitespace. The remainder is placed into a set, which only keeps unique values.
You could define a helper function to apply all of the splits and then use a set comprehension.
For example:
lst = ['a','a #2','a(Old)', 'b', 'b #', 'b(New)']
splits = {'#', '(Old)', '(New)'}
def split_all(a):
    for s in splits:
        a = a.split(s)[0]
    return a.strip()
groups = {split_all(a) for a in lst}
#{'a', 'b'}
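Another option along the same lines is to build one regular expression from all the markers and cut each string at the first match. This is a sketch, assuming the marker list is non-empty; strip_markers is a hypothetical helper name.

```python
import re


def strip_markers(items, markers):
    """Cut each item at its first marker and return the unique results."""
    # Escape each marker so characters like "(" are matched literally.
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    # maxsplit=1 splits at the first marker only; keep the part before it.
    return {pattern.split(item, maxsplit=1)[0].strip() for item in items}


lst = ['a', 'a #2', 'a(Old)', 'b', 'b #', 'b(New)']
groups = strip_markers(lst, ['#', '(Old)', '(New)'])
print(groups == {'a', 'b'})
# True
```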

Cut character string every two commas

I would like to split my string at every second comma, but I can't get it to work. Can you help me?
This is what I want: ['nb1,nb2','nb3,nb4','nb5,nb6']
Here is what I did:
a = 'nb1,nb2,nb3,nb4,nb5,nb6'
compteur = 0
for i in a:
    if i == ',':
        compteur += 1
        if compteur % 2 == 0:
            print(compteur)
            test = a.split(',', compteur % 2 == 0)
print(a)
print(test)
The result:
2
4
nb1,nb2,nb3,nb4,nb5,nb6
['nb1', 'nb2,nb3,nb4,nb5,nb6']
Thanks in advance for your answers.
You can use regex
In [12]: re.findall(r'([\w]+,[\w]+)', 'nb1,nb2,nb3,nb4,nb5,nb6')
Out[12]: ['nb1,nb2', 'nb3,nb4', 'nb5,nb6']
A quick fix could be to first split the string on commas and then join the elements back together two at a time, like this:
sub_result = a.split(',')
result = [','.join(sub_result[i:i+2]) for i in range(0,len(sub_result),2)]
This gives:
>>> result
['nb1,nb2', 'nb3,nb4', 'nb5,nb6']
This will also work if the number of elements is odd. For example:
>>> a = 'nb1,nb2,nb3,nb4,nb5,nb6,nb7'
>>> sub_result = a.split(',')
>>> result = [','.join(sub_result[i:i+2]) for i in range(0,len(sub_result),2)]
>>> result
['nb1,nb2', 'nb3,nb4', 'nb5,nb6', 'nb7']
You can zip the list with itself to create pairs:
a = 'nb1,nb2,nb3,nb4,nb5,nb6'
parts = a.split(',')
# parts = ['nb1', 'nb2', 'nb3', 'nb4', 'nb5', 'nb6']
pairs = list(zip(parts, parts[1:]))
# pairs = [('nb1', 'nb2'), ('nb2', 'nb3'), ('nb3', 'nb4'), ('nb4', 'nb5'), ('nb5', 'nb6')]
Now you can simply join every other pair again for your output:
list(map(','.join, pairs[::2]))
# ['nb1,nb2', 'nb3,nb4', 'nb5,nb6']
Split the string by comma first, then apply the common idiom to partition an iterable into sub-sequences of length n (where n is 2 in your case) with zip.
>>> s = 'nb1,nb2,nb3,nb4,nb5,nb6'
>>> [','.join(x) for x in zip(*[iter(s.split(','))]*2)]
['nb1,nb2', 'nb3,nb4', 'nb5,nb6']
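One thing to be aware of with the zip idiom is that zip stops at the shortest input, so a trailing element is dropped when the count is odd. If that matters, itertools.zip_longest can be swapped in. A sketch, with join_in_groups as an illustrative name:

```python
from itertools import zip_longest


def join_in_groups(text, n=2, sep=","):
    """Split text on sep and re-join the pieces n at a time."""
    parts = text.split(sep)
    # n references to the SAME iterator: each output tuple pulls n items in a row.
    chunks = zip_longest(*[iter(parts)] * n)
    # zip_longest pads the last chunk with None; drop the padding before joining.
    return [sep.join(p for p in chunk if p is not None) for chunk in chunks]


print(join_in_groups('nb1,nb2,nb3,nb4,nb5,nb6'))
# ['nb1,nb2', 'nb3,nb4', 'nb5,nb6']
print(join_in_groups('nb1,nb2,nb3,nb4,nb5,nb6,nb7'))
# ['nb1,nb2', 'nb3,nb4', 'nb5,nb6', 'nb7']
```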

Converting Strings to two lists in Python

(This is probably really simple, but) Say I have this input as a string:
"280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
and I want to separate each coordinate point onto separate lists, for each x and y value, so they'd end up looking like this:
listX = [280.2, 323.1, 135.8, 142.9]
listY = [259.8, 122.5, 149.5, 403.5]
I'd need this to be able to start out with any size string, thanks in advance!
Copy and paste this and it should work:
s_input = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
listX = [float(x.split(',')[0]) for x in s_input.split()]
listY = [float(y.split(',')[1]) for y in s_input.split()]
This would work.
my_string="280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
listX = [float(item.split(",")[0]) for item in my_string.split()]
listY = [float(item.split(",")[1]) for item in my_string.split()]
or
X_list = []
Y_list = []
for val in [item.split(",") for item in my_string.split()]:
    X_list.append(float(val[0]))
    Y_list.append(float(val[1]))
Which version to use would probably depend on your personal preference and the length of your string.
Have a look at the split method of strings. It should get you started.
You can do the following:
>>> a ="280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
>>> b = a.split(" ")
>>> b
['280.2,259.8', '323.1,122.5', '135.8,149.5', '142.9,403.5']
>>> c = [ x.split(',') for x in b]
>>> c
[['280.2', '259.8'], ['323.1', '122.5'], ['135.8', '149.5'], ['142.9', '403.5']]
>>> X = [ d[0] for d in c]
>>> X
['280.2', '323.1', '135.8', '142.9']
>>> Y = [ d[1] for d in c]
>>> Y
['259.8', '122.5', '149.5', '403.5']
There's a magical method called str.split, which splits a string by a delimiter.
Assume we have the string in a variable s.
To split by the spaces and make a list, we would do
coords = s.split()
At this point, the most straightforward method of putting it into the lists would be to do
listX = [float(sub.split(",")[0]) for sub in coords]
listY = [float(sub.split(",")[1]) for sub in coords]
You can use a combination of zip and split with a list comprehension:
s = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
l = list(zip(*[a.split(',') for a in s.split()]))
This will return a list of 2 tuples.
To get lists instead, use map on it.
l = list(map(list, zip(*[a.split(',') for a in s.split()])))
l[0] and l[1] will have your lists (of strings; apply float to each value if you need numbers).
In Python 3, zip is already lazy; on Python 2, consider using itertools.izip() if your list is huge.
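If you want floats and both lists in a single pass, something like the following should work. It's a sketch that assumes every point is a well-formed "x,y" pair and that the string is not empty; the names pairs, xs and ys are only illustrative.

```python
s = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"

# Convert each "x,y" chunk into a tuple of two floats.
pairs = [tuple(map(float, point.split(","))) for point in s.split()]

# zip(*pairs) transposes the list of (x, y) tuples into one tuple of x's and one of y's.
xs, ys = zip(*pairs)
listX, listY = list(xs), list(ys)

print(listX)
# [280.2, 323.1, 135.8, 142.9]
print(listY)
# [259.8, 122.5, 149.5, 403.5]
```

Note that the unpacking into xs and ys fails on an empty string, so guard for that case if your input can be empty.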

Removing character in list of strings

If I have a list of strings such as:
[("aaaa8"),("bb8"),("ccc8"),("dddddd8")...]
What should I do in order to get rid of all the 8s in each string? I tried using strip or replace in a for loop, but it doesn't work like it would on a normal string (one that is not in a list). Does anyone have a suggestion?
Try this:
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
print([s.strip('8') for s in lst]) # remove the 8 from the string borders
print([s.replace('8', '') for s in lst]) # remove all the 8s
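To make the difference between the two calls concrete, here is a small sketch. The sample strings are made up for illustration; str.translate is shown as one more standard way to delete a character everywhere.

```python
samples = ["aaaa8", "8bb8", "c8c8"]

# strip only removes the character from the two ends of each string.
stripped = [s.strip("8") for s in samples]
print(stripped)
# ['aaaa', 'bb', 'c8c']

# replace removes every occurrence, including those in the middle.
replaced = [s.replace("8", "") for s in samples]
print(replaced)
# ['aaaa', 'bb', 'cc']

# str.translate with a deletion table gives the same result as replace.
delete_eights = str.maketrans("", "", "8")
translated = [s.translate(delete_eights) for s in samples]
print(translated)
# ['aaaa', 'bb', 'cc']
```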
Besides using a loop or a list comprehension, you could also use map:
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
mylst = list(map(lambda each: each.strip("8"), lst))
print(mylst)
Another way, which avoids a Python-level loop, is to join the list, replace 8 and split the new string (this assumes none of the strings contain whitespace):
mylist = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
mylist = ' '.join(mylist).replace('8','').split()
print(mylist)
mylist = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
print(mylist)
j = 0
for i in mylist:
    mylist[j] = i.rstrip("8")
    j += 1
print(mylist)
Here's a short one-liner using regular expressions:
import re
print([re.compile(r"8").sub("", m) for m in mylist])
If we separate the regex operations and improve the naming:
pattern = re.compile(r"8") # Create the regular expression to match
res = [pattern.sub("", match) for match in mylist] # Remove matches from each element
print(res)
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
msg = list(filter(lambda x : x != "8", lst))
print(msg)
EDIT:
For anyone who comes across this post: the code above only removes elements of the list that are equal to "8".
With the example above, the first element ("aaaa8") is not equal to "8", so it is kept unchanged.
To make this work the way the question intended, we could filter the characters of each string instead:
msg = list(map(lambda y: list(filter(lambda x: x != "8", y)), lst))
What this does is go through each element of the list character by character, so ("aaaa8") is treated as ["a", "a", "a", "a", "8"], and drop every character equal to "8".
This results in data that looks like this:
msg = [["a", "a", "a", "a"], ["b", "b"]...]
So finally, to wrap that up, we have to map over it again to join the characters back into strings:
msg = list(map(lambda q: ''.join(q), map(lambda y: filter(lambda x: x != "8", y), lst)))
I would absolutely not recommend it, but if you really want to play with map and filter, that is how you could do it in a single line.
