How to split a string that contains blanks into a list? - python

I have the following code:
can="p1=a b c p2=d e f g"
new = can.split()
print(new)
When I run it, I get:
['p1=a', 'b', 'c', 'p2=d', 'e', 'f', 'g']
But what I really need is:
['p1=a b c', 'p2=d e f g']
a b c is the value of p1 and d e f g is the value of p2. How can I achieve this? Thank you!

If you want to have ['p1=a b c', 'p2=d e f g'], you can split using a regex:
import re
new = re.split(r'\s+(?=\w+=)', can)
If you want a dictionary {'p1': 'a b c', 'p2': 'd e f g'}, further split on =:
import re
new = dict(x.split('=', 1) for x in re.split(r'\s+(?=\w+=)', can))
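As a minimal sketch, the two steps above can be wrapped in one helper. The name parse_pairs is hypothetical, and the sketch assumes that keys consist of word characters and that no value contains a word immediately followed by =:

```python
import re

def parse_pairs(text):
    """Turn 'key=value with spaces' pairs into a dict (hypothetical helper)."""
    # Split on whitespace only where the next token looks like "name=".
    chunks = re.split(r'\s+(?=\w+=)', text.strip())
    # Split each chunk on the first "=" only, so values may contain "=" later on.
    return dict(chunk.split('=', 1) for chunk in chunks)

print(parse_pairs("p1=a b c p2=d e f g"))  # {'p1': 'a b c', 'p2': 'd e f g'}
```

If a value can itself contain something like x=1, the lookahead would split there too, so this only fits input shaped like the example.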

You can just match your desired results, looking for a variable name, then equals and characters until you get to either another variable name and equals, or the end-of-line:
import re
can="p1=a b c p2=d e f g"
re.findall(r'\w+=.*?(?=\s*\w+=|$)', can)
Output:
['p1=a b c', 'p2=d e f g']
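The same matching idea can also capture the name and the value separately. This is only a sketch that adds two capture groups to the pattern above; with groups present, re.findall returns tuples, which dict accepts directly:

```python
import re

can = "p1=a b c p2=d e f g"
# Group 1 captures the name, group 2 the value up to the next "name=" or the end.
pairs = re.findall(r'(\w+)=(.*?)(?=\s*\w+=|$)', can)
print(pairs)        # [('p1', 'a b c'), ('p2', 'd e f g')]
print(dict(pairs))  # {'p1': 'a b c', 'p2': 'd e f g'}
```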

Related

remove repeating pattern in a list

I have a list with repeating patterns. I want to remove these repeating patterns to make the list as short as possible. For example:
[a, b, a, b, a, b] => [a, b]
[a, b, c, a, b, c] => [a, b, c]
[a, b, c, d, a, b, c, d] => [a, b, c, d]
[a, a, a, b, b, b, c, c] => [a, b, c]
What is the best way to cover all the possible cases?
I have tried to convert the list to string, and apply regular expression on it:
import re
input = ['a', 'a', 'b', 'c', 'a', 'b', 'c']
temp = ",".join(input) + ","
last_temp = ""
while temp != last_temp:
    last_temp = temp
    temp = re.sub(r'(.+?)\1+', r'\1', temp)
    print(temp)
deduped = temp[:-1]
output = deduped.split(',')
This works as expected and gives the result: [a, b, c]
However, there is one issue. If the input list is:
['hello', 'sell', 'hello', 'sell', 'hello', 'sell']
The result will be: ['helo', 'sel']
As you can see, the regular expression also replaced 'll' with 'l', which is not desired.
How can I fix this issue with my function, or is there any better way? Thanks
sell will be substituted by sel because re.sub substitutes the repeating character l.
You can tweak your regular expression to avoid matching those cases.
For example matching repeating patterns starting from the beginning of the string:
temp = re.sub(r'^(.+?)\1+', r'\1', temp)
Or ensuring the pattern ends with a comma:
temp = re.sub(r'(.+?,)\1+', r'\1', temp)
Edit: given your last example, it's probably best to check patterns between commas:
import re
list_in = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
temp = "," + ",".join(list_in) + ","
last_temp = ""
while temp != last_temp:
    last_temp = temp
    temp = re.sub(r'(?<=,)(.+?,)\1+', r'\1', temp)
    print(temp)
deduped = temp[1:-1]
output = deduped.split(',')
A look-behind makes sure your pattern is preceded by a comma as well.
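For reuse, the look-behind version can be wrapped in a function and checked against the input that broke the original attempt. This is a sketch only: collapse_repeats is a hypothetical name, and it assumes a non-empty list whose items never contain a comma, because the comma is used as the separator:

```python
import re

def collapse_repeats(items):
    """Collapse adjacent repeated runs of items (hypothetical helper).

    Assumes no item contains a comma, since the comma is the separator.
    """
    temp = "," + ",".join(items) + ","
    last_temp = None
    # Repeat until a pass changes nothing, so nested repeats are collapsed too.
    while temp != last_temp:
        last_temp = temp
        # A repeated block must start right after a comma and end with one,
        # so it always covers whole items and never a part of a word.
        temp = re.sub(r'(?<=,)(.+?,)\1+', r'\1', temp)
    return temp[1:-1].split(',')

print(collapse_repeats(['hello', 'sell'] * 3))                # ['hello', 'sell']
print(collapse_repeats(['a', 'a', 'b', 'c', 'a', 'b', 'c']))  # ['a', 'b', 'c']
```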
I don't get why you would use regex in this case.
Why don't you use a "set" instead:
my_set=set(['hello', 'sell', 'hello', 'sell', 'hello', 'sell'])
print(my_set)
my_set=set(['a', 'a', 'b', 'c', 'a', 'b', 'c'])
print(my_set)
Gives :
{'hello', 'sell'}
{'b', 'a', 'c'}
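As the output shows, a set does not keep the original order. If the order matters, a sketch using dict.fromkeys may be closer to what is wanted (dedupe_keep_order is a hypothetical name). Note that, like set, it removes every duplicate and not only adjacent repeating patterns, so ['a', 'b', 'a'] becomes ['a', 'b']:

```python
def dedupe_keep_order(items):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(items))

print(dedupe_keep_order(['hello', 'sell', 'hello', 'sell', 'hello', 'sell']))  # ['hello', 'sell']
print(dedupe_keep_order(['a', 'a', 'b', 'c', 'a', 'b', 'c']))                  # ['a', 'b', 'c']
print(dedupe_keep_order(['a', 'b', 'a']))                                      # ['a', 'b']
```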

how to enumerate / zip as lambda

Is there a way to replace the for-loop in the groupList function with a lambda function, perhaps with map(), in Python 3?
def groupList(input_list, output_list=[]):
    for i, (v, w) in enumerate(zip(input_list[:-2], input_list[2:])):
        output_list.append(f'{input_list[i]} {input_list[i+1]} {input_list[i+2]}')
    return output_list
print(groupList(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
(Output from the groupList function would be ['A B C', 'B C D', 'C D E', 'D E F', 'E F G'])
Solution 1:
def groupList(input_list):
    return [' '.join(input_list[i:i+3]) for i in range(len(input_list) - 2)]
Solution 2:
def groupList(input_list):
    return list(map(' '.join, (input_list[i:i+3] for i in range(len(input_list) - 2))))
Besides the previous solutions, a possibly faster (but less concise) approach is to build the full concatenation once and then slice it.
from itertools import accumulate
def groupList(input_list):
    full_concat = ' '.join(input_list)
    idx = [0]
    idx.extend(accumulate(len(s) + 1 for s in input_list))
    return [full_concat[idx[i]:idx[i+3]-1] for i in range(len(idx) - 3)]
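Since the question asks about zip, here is a sketch that uses zip alone over shifted slices and works for any window size. The name group_list and the size parameter are assumptions added for illustration:

```python
def group_list(items, size=3):
    # Build `size` shifted views of the list; zip stops at the shortest one,
    # so only complete windows are produced.
    shifted = (items[i:] for i in range(size))
    return [' '.join(group) for group in zip(*shifted)]

print(group_list(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
# ['A B C', 'B C D', 'C D E', 'D E F', 'E F G']
```

It creates no shared default list, so repeated calls do not accumulate results the way the output_list=[] default in the question would.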

python regex: string with maximum one whitespace

Hello, I would like to know how to write a regex pattern in which each group may contain at most one consecutive whitespace character. More specifically:
import re
s = "a  b d d  c"
pattern = r"(?P<a>.*) +(?P<b>.*) +(?P<c>.*)"
print(re.match(pattern, s).groupdict())
returns:
{'a': 'a  b d d', 'b': '', 'c': 'c'}
I would like to have:
{'a': 'a', 'b': 'b d d', 'c': 'c'}
Another option could be to use zip and a dict and generate the characters based on the length of the matches.
You can get the matches that contain at most one consecutive space by matching a non-whitespace character \S followed by zero or more repetitions of a space and a non-whitespace character:
\S(?: \S)*
For example:
import re
a=97
regex = r"\S(?: \S)*"
test_str = "a  b d d  c"
matches = re.findall(regex, test_str)
chars = list(map(chr, range(a, a+len(matches))))
print(dict(zip(chars, matches)))
Result
{'a': 'a', 'b': 'b d d', 'c': 'c'}
With the help of The fourth bird's answer, I managed to do it the way I had imagined:
import re
s = "a  b d d  c"
pattern = r"(?P<a>\S(?: \S)*) +(?P<b>\S(?: \S)*) +(?P<c>\S(?: \S)*)"
print(re.match(pattern, s).groupdict())
It looks like you just want to split your string on runs of two or more spaces. You can do it this way:
import re
s = "a  b d d  c"
re.split(r' {2,}', s)
which returns:
['a', 'b d d', 'c']
It's probably easier to use re.split, since the delimiter is known (2 or more spaces), but the patterns in-between are not. I'm sure someone better at regex than myself can work out the look-aheads, but by splitting on \s{2,}, you can greatly simplify the problem.
You can make your dictionary of named groups like so:
import re
s = "a  b d d  c"
x = dict(zip('abc', re.split(r'\s{2,}', s)))
x
{'a': 'a', 'b': 'b d d', 'c': 'c'}
Here the first argument to zip supplies the group names. To extend this to more general names:
groups = ['group_1', 'another group', 'third_group']
x = dict(zip(groups, re.split(r'\s{2,}', s)))
{'group_1': 'a', 'another group': 'b d d', 'third_group': 'c'}
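One thing to keep in mind is that zip silently drops extra names or extra fields when the counts differ. A sketch that checks the count first might look like this; split_fields is a hypothetical helper, and it assumes fields are separated by two or more whitespace characters:

```python
import re

def split_fields(text, names):
    """Map names to fields separated by 2+ whitespace characters (hypothetical helper)."""
    parts = re.split(r'\s{2,}', text.strip())
    # zip would silently truncate on a mismatch, so fail loudly instead.
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(parts)}")
    return dict(zip(names, parts))

print(split_fields("a  b d d  c", ['a', 'b', 'c']))
# {'a': 'a', 'b': 'b d d', 'c': 'c'}
```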
I found another solution that I like even better:
import re
s = "a  b dll d  c"
pattern = r"(?P<a>(\S*[\t]?)*) +(?P<b>(\S*[\t ]?)*) +(?P<c>(\S*[\t ]?)*)"
print(re.match(pattern, s).groupdict())
Here it's even possible to have words longer than one letter.

Python list joining — include separator at the start or at the end

The output for ', '.join(['a', 'b', 'c', 'd']) is:
a, b, c, d
Is there a standard way in Python to achieve the following outputs instead?
# option 1, separator is also at the start
, a, b, c, d
# option 2, separator is also at the end
a, b, c, d,
# option 3, separator is both at the start and the end
, a, b, c, d,
There is no standard approach, but a natural way is to add empty strings at the end or at the beginning (or at the end and the beginning). Using some more modern syntax:
>>> ', '.join(['', *['a', 'b', 'c', 'd']])
', a, b, c, d'
>>> ', '.join([*['a', 'b', 'c', 'd'], ''])
'a, b, c, d, '
>>> ', '.join(['', *['a', 'b', 'c', 'd'], ''])
', a, b, c, d, '
Or just use string formatting:
>>> sep = ','
>>> data = ['a', 'b', 'c', 'd']
>>> f"{sep}{sep.join(data)}"
',a,b,c,d'
>>> f"{sep.join(data)}{sep}"
'a,b,c,d,'
>>> f"{sep}{sep.join(data)}{sep}"
',a,b,c,d,'
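If this is needed in several places, the formatting approach can be folded into a small function. This is a sketch; join_wrapped and its leading/trailing flags are hypothetical names, not part of the standard library:

```python
def join_wrapped(sep, items, leading=False, trailing=False):
    """Join items with sep, optionally adding sep at the start and/or end."""
    body = sep.join(items)
    return f"{sep if leading else ''}{body}{sep if trailing else ''}"

data = ['a', 'b', 'c', 'd']
print(repr(join_wrapped(', ', data, leading=True)))                 # ', a, b, c, d'
print(repr(join_wrapped(', ', data, trailing=True)))                # 'a, b, c, d, '
print(repr(join_wrapped(', ', data, leading=True, trailing=True)))  # ', a, b, c, d, '
```

The two approaches differ for an empty list with both ends enabled: padding with empty strings gives a single separator (', '.join(['', '']) is ', '), while the formatting approach gives two.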
Here is one way:
list1 = ['1','2','3','4']
s = ","
r = f"{s}{s.join(list1)}"
p = f"{s.join(list1)}{s}"
q = f"{s}{s.join(list1)}{s}"
print(r)
print(p)
print(q)

How to split a string into characters in python

I have a string 'ABCDEFG'
I want to be able to list each character sequentially followed by the next one.
Example
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks
In Python, a string is already an iterable sequence of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
You need to keep the last character, so use zip_longest from itertools (izip_longest in Python 2):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.zip_longest(s, s[1:], fillvalue=''):
...     print(c, cnext)
...
A B
B C
C D
D E
E F
F G
G
def doit(input):
    for i in range(len(input)):
        print(input[i] + (input[i + 1] if i != len(input) - 1 else ''))
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G
There's an itertools pairwise recipe for exactly this use case:
import itertools
def pairwise(myStr):
    a, b = itertools.tee(myStr)
    next(b, None)
    for s1, s2 in zip(a, b):
        print(s1, s2)
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print(l, f1, len(l))
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read()
for left, right in zip(f1, f1[1:] + ' '):
    print(left, right)
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
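The general-purpose version mentioned last could be sketched roughly as follows. The name windows and the parameter n are assumptions; the idea is to make n independent iterators with itertools.tee, advance the k-th one by k items, and zip them:

```python
from itertools import tee

def windows(iterable, n=2):
    """Yield overlapping tuples of n adjacent items from any iterable."""
    iterators = tee(iterable, n)
    # Advance the k-th iterator by k items so the copies are staggered.
    for shift, it in enumerate(iterators):
        for _ in range(shift):
            next(it, None)
    # zip stops when the most advanced iterator runs out.
    return zip(*iterators)

for left, right in windows('ABCDEFG'):
    print(left, right)
```

This yields only complete groups, so the final lone G from the question is not included; zip_longest in place of zip would keep it.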
Since you want a space between the characters, you can use zip with a list comprehension:
>>> s="ABCDEFG"
>>> l=[' '.join(i) for i in zip(s,s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print(i)
...
A B
B C
C D
D E
E F
F G
If you don't want the space, just use a list comprehension:
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
