I have a list with repeating patterns. I want to remove these repeating patterns to make the list as short as possible. For example:
[a, b, a, b, a, b] => [a, b]
[a, b, c, a, b, c] => [a, b, c]
[a, b, c, d, a, b, c, d] => [a, b, c, d]
[a, a, a, b, b, b, c, c] => [a, b, c]
What is the best way to cover all the possible cases?
I have tried converting the list to a string and applying a regular expression to it:
import re
input = ['a', 'a', 'b', 'c', 'a', 'b', 'c']
temp = ",".join(input) + ","
last_temp = ""
# Keep collapsing repeats until the string stops changing
while temp != last_temp:
    last_temp = temp
    temp = re.sub(r'(.+?)\1+', r'\1', temp)
print(temp)
deduped = temp[:-1]  # drop the trailing comma
output = deduped.split(',')
This works and gives the expected result: [a, b, c]
However, there is one issue. If the input list is:
['hello', 'sell', 'hello', 'sell', 'hello', 'sell']
The result will be: ['helo', 'sel']
As you can see, the regular expression also replaced 'll' with 'l', which is not what I want.
How can I fix this issue with my function, or is there any better way? Thanks
'sell' becomes 'sel' because re.sub also collapses the repeated character 'l'.
You can tweak your regular expression to avoid matching those cases.
For example, by matching repeating patterns only at the beginning of the string:
temp = re.sub(r'^(.+?)\1+', r'\1', temp)
Or by ensuring the pattern ends with a comma:
temp = re.sub(r'(.+?,)\1+', r'\1', temp)
Edit: given your last example, it's probably best to check patterns between commas:
import re
list_in = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
temp = "," + ",".join(list_in) + ","
last_temp = ""
# Keep collapsing repeats until the string stops changing
while temp != last_temp:
    last_temp = temp
    temp = re.sub(r'(?<=,)(.+?,)\1+', r'\1', temp)
print(temp)
deduped = temp[1:-1]  # drop the leading and trailing commas
output = deduped.split(',')
The look-behind (?<=,) makes sure the pattern is also preceded by a comma, so it can only ever match whole elements.
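For comparison, here is a minimal sketch that skips the string round-trip and compares whole list elements, so 'sell' can never lose an 'l'. collapse_repeats is a hypothetical name; it repeatedly deletes the first immediately repeated block it finds (shortest block first) until nothing changes. This approximates the regex loop but is not guaranteed to agree with it in every edge case, and it is roughly cubic in the list length, so it is meant for short lists.

```python
def collapse_repeats(items):
    items = list(items)  # work on a copy; the caller's list is not modified
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            # Try the shortest block first, similar to the lazy (.+?) in the regex.
            for size in range(1, (len(items) - i) // 2 + 1):
                if items[i:i + size] == items[i + size:i + 2 * size]:
                    # Delete the copy that immediately follows the block, then rescan.
                    del items[i + size:i + 2 * size]
                    changed = True
                    break
            if changed:
                break
    return items


print(collapse_repeats(['hello', 'sell', 'hello', 'sell', 'hello', 'sell']))  # ['hello', 'sell']
```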
I don't see why you would use a regex in this case.
Why not use a set instead:
my_set = set(['hello', 'sell', 'hello', 'sell', 'hello', 'sell'])
print(my_set)
my_set = set(['a', 'a', 'b', 'c', 'a', 'b', 'c'])
print(my_set)
Gives (the iteration order of a set is arbitrary, so yours may differ):
{'hello', 'sell'}
{'b', 'a', 'c'}
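If the original order matters, a set will not keep it. A minimal sketch, assuming Python 3.7+ (where dicts are guaranteed to preserve insertion order), uses dict.fromkeys instead. Like the set, this removes every duplicate rather than only consecutive repeats, so ['a', 'b', 'a', 'c'] becomes ['a', 'b', 'c'], which is not the same thing as removing repeating patterns.

```python
items = ['a', 'a', 'b', 'c', 'a', 'b', 'c']
# dict keys are unique and keep insertion order, so this keeps
# the first occurrence of each element in its original position.
deduped = list(dict.fromkeys(items))
print(deduped)  # ['a', 'b', 'c']
```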
I'm wondering if I can split a Python string in three steps: first by (), then by {}, and finally by ",".
string = "module ( a , b, c, d, {e, f, g}, {h,i}, j, k )"
result = re.split("",string)
print(result)
I want the result of this code to be:
['a', 'b', 'c', 'd', '{e,f,g}', '{h,i}', 'j', 'k']
This does what you ask, if things aren't nested any more deeply than this.
import re
pat = r'\w+|{[^}]*}'
string = "module ( a , b, c, d, {e, f, g}, {h,i}, j, k )"
result = re.findall(pat, string)
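print(result)  # re.findall already returns a list
Output:
C:\tmp>python x.py
['module', 'a', 'b', 'c', 'd', '{e, f, g}', '{h,i}', 'j', 'k']
Note that 'module' is matched too; use result[1:] if you only want the items inside the parentheses.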
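If you want exactly the output shown in the question (no 'module', no spaces inside the braces), one option is to first pull out the text between the outer parentheses and then normalize the brace groups. This is a sketch under the same assumption (no deeper nesting) plus a single pair of parentheses; split_module_args is a hypothetical name.

```python
import re


def split_module_args(text):
    # Assumption: one "( ... )" group, and braces never nest.
    match = re.search(r'\((.*)\)', text)
    if match is None:
        raise ValueError('no parenthesized argument list found')
    inner = match.group(1)
    # Match either a whole brace group or a run of word characters.
    tokens = re.findall(r'\{[^}]*\}|\w+', inner)
    # Remove whitespace inside brace groups, e.g. "{e, f, g}" -> "{e,f,g}".
    return [re.sub(r'\s+', '', token) for token in tokens]


print(split_module_args("module ( a , b, c, d, {e, f, g}, {h,i}, j, k )"))
```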
The output for ', '.join(['a', 'b', 'c', 'd']) is:
a, b, c, d
Is there a standard way in Python to achieve the following outputs instead?
# option 1, separator is also at the start
, a, b, c, d
# option 2, separator is also at the end
a, b, c, d,
# option 3, separator is both at the start and the end
, a, b, c, d,
There is no standard approach, but a natural way is to add empty strings at the end, at the beginning, or at both ends. Using iterable unpacking in a list display (Python 3.5+):
>>> ', '.join(['', *['a', 'b', 'c', 'd']])
', a, b, c, d'
>>> ', '.join([*['a', 'b', 'c', 'd'], ''])
'a, b, c, d, '
>>> ', '.join(['', *['a', 'b', 'c', 'd'], ''])
', a, b, c, d, '
Or just use string formatting:
>>> sep = ','
>>> data = ['a', 'b', 'c', 'd']
>>> f"{sep}{sep.join(data)}"
',a,b,c,d'
>>> f"{sep.join(data)}{sep}"
'a,b,c,d,'
>>> f"{sep}{sep.join(data)}{sep}"
',a,b,c,d,'
Here is one way:
list1 = ['1','2','3','4']
s = ","
r = f"{s}{s.join(list1)}"
p = f"{s.join(list1)}{s}"
q = f"{s}{s.join(list1)}{s}"
print(r)
print(p)
print(q)
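The empty-string trick can be wrapped in a small helper. join_with_edges and its leading/trailing flags are hypothetical names, not a standard-library API; this is only a sketch.

```python
def join_with_edges(sep, items, leading=False, trailing=False):
    # Hypothetical helper: empty strings at the ends make str.join
    # emit an extra separator at the start and/or the end.
    parts = list(items)
    if leading:
        parts.insert(0, '')
    if trailing:
        parts.append('')
    return sep.join(parts)


data = ['a', 'b', 'c', 'd']
print(join_with_edges(', ', data, leading=True))                 # , a, b, c, d
print(join_with_edges(', ', data, trailing=True))                # a, b, c, d,
print(join_with_edges(', ', data, leading=True, trailing=True))  # , a, b, c, d,
```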
I'm trying to find a simple way to convert a string like this:
a = '[[a b] [c d]]'
into the corresponding nested list structure, where the letters are turned into strings:
a = [['a', 'b'], ['c', 'd']]
I tried to use
import ast
l = ast.literal_eval('[[a b] [c d]]')
l = [i.strip() for i in l]
as found here
but it doesn't work because the characters a,b,c,d are not within quotes.
In particular, I'm looking for something that turns:
'[[X v] -s]'
into:
[['X', 'v'], '-s']
You can use a regex to find all items between the innermost brackets, then split each match:
>>> import re
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd']]
The regex r'\[([^\[\]]+)\]' matches any run of characters other than square brackets that sits between a pair of square brackets, which in this case is 'a b' and 'c d'; the list comprehension then splits each match into its items.
Note that this regex only works for inputs like this one, where every item sits inside an innermost pair of brackets. It flattens deeper nesting and drops items outside an innermost pair (for '[[X v] -s]' it returns [['X', 'v']] and loses '-s'), so other inputs need a different approach.
>>> a = '[[a b] [c d] [e g]]'
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd'], ['e', 'g']]
Use the isalpha string method to wrap every letter in double quotes:
a = '[[a b] [c d]]'
a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
Now a is:
'[["a" "b"] ["c" "d"]]'
Then you can use json.loads (as @a_guest suggested), after replacing the spaces with commas:
json.loads(a.replace(' ', ','))
>>> import json
>>> a = '[[a b] [c d]]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
>>> a
'[["a" "b"] ["c" "d"]]'
>>> json.loads(a.replace(' ', ','))
[[u'a', u'b'], [u'c', u'd']]
This works for any depth of nesting that follows the above pattern, as long as every item is a single letter (each letter is quoted separately), e.g.
>>> a = '[[[a b] [c d]] [[e f] [g h]]]'
>>> ...
>>> json.loads(a.replace(' ', ','))
[[[u'a', u'b'], [u'c', u'd']], [[u'e', u'f'], [u'g', u'h']]]
For the specific example of '[[X v] -s]':
>>> import json
>>> a = '[[X v] -s]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() or x=='-' else x, a))
>>> json.loads(a.replace('[ [', '[[').replace('] ]', ']]').replace(' ', ',').replace('][', '],[').replace('""',''))
[[u'X', u'v'], u'-s']
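A sketch of a more general variant of the same quote-then-json.loads idea, assuming items are separated only by whitespace and contain no brackets or whitespace themselves. Instead of quoting character by character, it quotes whole runs of non-bracket, non-space characters (so multi-letter items such as 'hello' or '-s' need no special-casing) and then puts commas between adjacent elements. parse_bracket_list is a hypothetical name.

```python
import json
import re


def parse_bracket_list(text):
    # Quote every run of characters that is neither a bracket nor whitespace;
    # json.dumps adds the quotes and escapes any embedded quote characters.
    quoted = re.sub(r'[^\[\]\s]+', lambda m: json.dumps(m.group()), text)
    # Replace the whitespace between two adjacent elements with a comma.
    # Assumption: elements are separated only by whitespace, never by commas.
    with_commas = re.sub(r'(?<=[\]"])\s+(?=[\["])', ',', quoted)
    return json.loads(with_commas)


print(parse_bracket_list('[[X v] -s]'))  # [['X', 'v'], '-s']
```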
I have a string 'ABCDEFG'
I want to be able to list each character sequentially followed by the next one.
Example
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks
In Python, a string is already an iterable sequence of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
To keep the last character, use zip_longest from itertools (it was called izip_longest in Python 2):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.zip_longest(s, s[1:], fillvalue=''):
...     print(c, cnext)
...
A B
B C
C D
D E
E F
F G
G
def doit(text):
    for i in range(len(text)):
        # Append the next character, or nothing for the last one
        print(text[i] + (text[i + 1] if i != len(text) - 1 else ''))
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G
There's an itertools pairwise recipe for exactly this use case:
import itertools
def pairwise(myStr):
    a, b = itertools.tee(myStr)
    next(b, None)  # advance the second iterator by one character
    for s1, s2 in zip(a, b):
        print(s1, s2)
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
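Note that this prints only complete pairs, so the lone final G from the question is missing. Since Python 3.10 this recipe is available as itertools.pairwise; here is a sketch that uses it when present, falls back to the recipe otherwise, and appends the last character by hand.

```python
try:
    from itertools import pairwise  # in the standard library since Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback for older Pythons: the same tee/next/zip recipe as above.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

s = 'ABCDEFG'
lines = [f'{first} {second}' for first, second in pairwise(s)]
if s:
    # pairwise() yields only complete pairs, so add the lone last character by hand.
    lines.append(s[-1])
for line in lines:
    print(line)
```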
Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print(l, f1, len(l))
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read()
for left, right in zip(f1, f1[1:] + ' '):
    print(left, right)
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
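A minimal sketch of that last option, the tee-and-zip generalization to overlapping windows of size N. sliding_window is a hypothetical name. Like zip, it drops incomplete windows, so the trailing G on its own is not produced.

```python
from itertools import tee


def sliding_window(iterable, n):
    # Make n independent iterators over the same input...
    iterators = tee(iterable, n)
    # ...and advance the k-th one by k positions.
    for k, it in enumerate(iterators):
        for _ in range(k):
            next(it, None)
    # zip() stops at the shortest iterator, so only complete windows are produced.
    return zip(*iterators)


for window in sliding_window('ABCDEFG', 3):
    print(' '.join(window))
```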
As you want a space between the characters, you can use the zip function and a list comprehension:
>>> s = "ABCDEFG"
>>> l = [' '.join(i) for i in zip(s, s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print(i)
...
A B
B C
C D
D E
E F
F G
If you don't want the space, just use a list comprehension over slices:
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
I'm looking for an elegant way to convert
lst = [A, B, C, D, E..]
to
lst = [A, B, (C, D), E]
Here I want to do this for indices 2 and 3 while keeping the rest of the list as it is. Is there an elegant way to do this? I looked into using a lambda function but couldn't see how.
Just alter in-place:
lst[2:4] = [tuple(lst[2:4])]
The slice assignment ensures we are replacing the old elements with the contents of the list on the right-hand side of the assignment, which contains just the one tuple.
Demo:
>>> lst = ['A', 'B', 'C', 'D', 'E']
>>> lst[2:4] = [tuple(lst[2:4])]
>>> lst
['A', 'B', ('C', 'D'), 'E']
You could use:
lst[2] = lst[2], lst.pop(3)
or more generally:
lst[i] = lst[i], lst.pop(i+1)
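However, you must ensure that both indices are valid to avoid IndexError exceptions. This works because the right-hand tuple (lst[2], lst.pop(3)) is built first, reading lst[2] before pop(3) removes the following element, and only then is the tuple assigned to lst[2].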
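If "preserve the list" means leaving the original list untouched, both answers above modify it in place. A sketch that builds a new list instead, with an explicit index check; group_pair is a hypothetical helper name.

```python
def group_pair(lst, i):
    # Hypothetical helper: return a NEW list in which lst[i] and lst[i + 1]
    # are replaced by one tuple; the original list is not modified.
    if not 0 <= i < len(lst) - 1:
        raise IndexError(f'indices {i} and {i + 1} are not both valid for length {len(lst)}')
    return lst[:i] + [tuple(lst[i:i + 2])] + lst[i + 2:]


lst = ['A', 'B', 'C', 'D', 'E']
print(group_pair(lst, 2))  # ['A', 'B', ('C', 'D'), 'E']
print(lst)                 # ['A', 'B', 'C', 'D', 'E']
```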