How to split a string into characters in python

I have a string 'ABCDEFG'.
I want to list each character in order, followed by the next one.
Example output:
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks

In Python, a string is already an iterable sequence of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
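If the input is long and you only need to walk over the pairs once, a generator avoids building the whole list up front. A minimal sketch using the same slicing idea; the name adjacent_pairs and the sep parameter are illustrative, not from the answer:

```python
def adjacent_pairs(s, sep=' '):
    """Yield each character joined with the one after it (the last one alone)."""
    for i in range(len(s)):
        # s[i:i+2] has two characters everywhere except at the final index,
        # where slicing past the end simply returns the single last character.
        yield sep.join(s[i:i + 2])

for pair in adjacent_pairs('ABCDEFG'):
    print(pair)
```

Because slicing never raises on out-of-range ends, no special case is needed for the last character.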

You need to keep the last character, so use izip_longest from itertools (in Python 3 it is called zip_longest):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.izip_longest(s, s[1:], fillvalue=''):
...     print c, cnext
...
A B
B C
C D
D E
E F
F G
G
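In Python 3, izip_longest became itertools.zip_longest and print became a function. A sketch of the same idea for Python 3 that returns the pairs instead of printing them (the function name pairs_keep_last is illustrative):

```python
from itertools import zip_longest

def pairs_keep_last(s):
    # s[1:] is one character shorter than s; zip_longest pads it with
    # fillvalue, so the last character is paired with '' instead of dropped.
    return list(zip_longest(s, s[1:], fillvalue=''))

for c, cnext in pairs_keep_last('ABCDEFG'):
    print(c, cnext)
```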

def doit(input):
    for i in xrange(len(input)):
        print input[i] + (input[i + 1] if i != len(input) - 1 else '')
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G

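There's an itertools pairwise recipe for this use case. Note that, unlike the output you asked for, it stops at the last full pair (F G) and does not print the lone G:
import itertools
def pairwise(myStr):
    a, b = itertools.tee(myStr)
    next(b, None)
    for s1, s2 in zip(a, b):
        print(s1, s2)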
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
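Since Python 3.10 this recipe ships in the standard library as itertools.pairwise. A sketch that uses it when available and falls back to the tee/next/zip recipe above on older versions; like the recipe, it yields six pairs and no lone G:

```python
from itertools import tee

try:
    from itertools import pairwise  # added in Python 3.10
except ImportError:
    def pairwise(iterable):
        # Two independent iterators over the input; advance the second by one
        # so that zip pairs each element with its successor.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

for s1, s2 in pairwise('ABCDEFG'):
    print(s1, s2)
```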

Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print l, f1, len(l)
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read()
for left, right in zip(f1, f1[1:] + ' '):
    print left, right
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
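The last option mentioned, a general "overlapping adjacent groups of size N from any iterable" helper built from itertools.tee and zip, might look like the sketch below. The name sliding_window and the use of itertools.islice to offset each copy are choices made here, not part of the answer. Like plain zip, it drops incomplete groups at the end:

```python
from itertools import islice, tee

def sliding_window(iterable, n):
    # n independent iterators over the same input...
    copies = tee(iterable, n)
    # ...where the k-th copy is advanced by k positions...
    shifted = [islice(it, k, None) for k, it in enumerate(copies)]
    # ...so zip yields (x[i], x[i+1], ..., x[i+n-1]) and stops at the shortest.
    return zip(*shifted)

for left, right in sliding_window('ABCDEFG', 2):
    print(left, right)
```

Because it only iterates, this also works on a file object or any other one-pass iterable, not just on strings that support slicing.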

Since you want a space between the characters, you can use the zip function and a list comprehension (zip stops at the shorter input, so the lone G is not included):
>>> s="ABCDEFG"
>>> l=[' '.join(i) for i in zip(s,s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print i
A B
B C
C D
D E
E F
F G
If you don't want a space, just use a list comprehension over slices (this also keeps the final G):
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']

Related

How to split a string which has blank to list?

I have the following code:
can="p1=a b c p2=d e f g"
new = can.split()
print(new)
When I execute the above, I get:
['p1=a', 'b', 'c', 'p2=d', 'e', 'f', 'g']
But what I really need is:
['p1=a b c', 'p2=d e f g']
a b c is the value of p1 and d e f g is the value of p2. How can I achieve this? Thank you!

If you want to have ['p1=a b c', 'p2=d e f g'], you can split using a regex:
import re
new = re.split(r'\s+(?=\w+=)', can)
If you want a dictionary {'p1': 'a b c', 'p2': 'd e f g'}, further split on =:
import re
new = dict(x.split('=', 1) for x in re.split(r'\s+(?=\w+=)', can))
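To check both variants on the sample input, here is the same code wrapped in a small function (the name parse_params is illustrative). The lookahead (?=\w+=) makes re.split break only on whitespace that is immediately followed by a name and an =, and split('=', 1) splits each piece at its first = only:

```python
import re

def parse_params(can):
    # Split only at whitespace that precedes "<word>=", keeping values intact.
    pieces = re.split(r'\s+(?=\w+=)', can)
    # Each piece looks like "p1=a b c"; split at the first '=' into key and value.
    return pieces, dict(x.split('=', 1) for x in pieces)

pieces, params = parse_params("p1=a b c p2=d e f g")
print(pieces)
print(params)
```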

You can just match your desired results, looking for a variable name, then equals and characters until you get to either another variable name and equals, or the end-of-line:
import re
can="p1=a b c p2=d e f g"
re.findall(r'\w+=.*?(?=\s*\w+=|$)', can)
Output:
['p1=a b c', 'p2=d e f g']
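If you would rather avoid regular expressions, one alternative is to start from can.split() as in the question and regroup the tokens. This sketch assumes that a new parameter starts at every whitespace-separated token containing '=' (the name group_tokens is illustrative); note that it collapses runs of spaces inside a value into single spaces:

```python
def group_tokens(can):
    groups = []
    for token in can.split():
        if '=' in token or not groups:
            # A token like "p2=d" starts a new parameter.
            groups.append(token)
        else:
            # Any other token continues the value of the current parameter.
            groups[-1] += ' ' + token
    return groups

print(group_tokens("p1=a b c p2=d e f g"))
```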

All possible substrings in Python

Can anyone help me find all the possible substrings of a string using Python?
E.g.:
string = 'abc'
output
a, b, c, ab, bc, abc
P.S.: I am a beginner and would appreciate a solution that is simple to understand.

You could do something like:
for length in range(len(string)):
    for index in range(len(string) - length):
        print(string[index:index+length+1])
Output:
a
b
c
ab
bc
abc
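A variant of the same double loop that yields the substrings instead of printing them, so the caller can collect, join, or filter them (the name substrings_by_length is illustrative). Joining with ', ' reproduces the format asked for in the question:

```python
def substrings_by_length(string):
    # Outer loop: substring length minus one (0 .. len(string) - 1).
    for length in range(len(string)):
        # Inner loop: every start index where a substring of that length fits.
        for index in range(len(string) - length):
            yield string[index:index + length + 1]

print(', '.join(substrings_by_length('abc')))
```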

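Another way is to use itertools.combinations. Be aware that this produces every combination of characters in order (subsequences), not only contiguous substrings, so the result includes 'ac', which does not appear in 'abc':
from itertools import combinations
s = 'abc'
[
    ''.join(x)
    for size in range(1, len(s) + 1)
    for x in combinations(s, size)
]
Output: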
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
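To see the difference concretely, a small check comparing the combinations result with the contiguous substrings built by slicing shows that, for 'abc', 'ac' is the only extra item:

```python
from itertools import combinations

s = 'abc'
# Every combination of characters, keeping their original order (subsequences).
subsequences = {''.join(x) for size in range(1, len(s) + 1)
                for x in combinations(s, size)}
# Only contiguous runs s[i:j].
substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

print(sorted(subsequences - substrings))
```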

Every substring contains a unique start index and a unique end index (which is greater than the start index). You can use two for loops to get all unique combinations of indices.
def all_substrings(s):
    all_subs = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            all_subs.append(s[start:end])
    return all_subs
s = 'abc'
print(all_substrings(s)) # prints ['a', 'ab', 'b', 'abc', 'bc', 'c']
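Because each substring corresponds to exactly one (start, end) pair with 0 <= start < end <= n, a string of length n has 1 + 2 + ... + n = n(n+1)/2 substrings, counting repeats. A quick check of that count against the function above, repeated here so the snippet runs on its own:

```python
def all_substrings(s):
    all_subs = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            all_subs.append(s[start:end])
    return all_subs

for s in ['', 'a', 'abc', 'abcdef']:
    n = len(s)
    # For each end there are `end` possible starts, so the total is 1 + 2 + ... + n.
    assert len(all_substrings(s)) == n * (n + 1) // 2
```

The quadratic count is worth keeping in mind: a 10,000-character string already has about 50 million substrings.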

You can do like:
def subString(s):
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            print(s[i:j])
subString("aashu")
Output:
a
aa
aas
aash
aashu
a
as
ash
ashu
s
sh
shu
h
hu
u
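As the "aashu" output shows, repeated letters produce duplicate substrings ('a' is printed twice). If you only want each distinct substring once, one option is to collect them in a set; sorting by length and then alphabetically is only there to make the output readable (the name unique_substrings is illustrative):

```python
def unique_substrings(s):
    # A set comprehension keeps each distinct slice s[i:j] only once.
    subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return sorted(subs, key=lambda sub: (len(sub), sub))

print(unique_substrings('aashu'))
```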

how to enumerate / zip as lambda

Is there a way to replace the for-loop in the groupList function with a lambda function, perhaps with map(), in Python 3?
def groupList(input_list, output_list=[]):
    for i, (v, w) in enumerate(zip(input_list[:-2], input_list[2:])):
        output_list.append(f'{input_list[i]} {input_list[i+1]} {input_list[i+2]}')
    return output_list
print(groupList(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
(Output from the groupList function would be ['A B C', 'B C D', 'C D E', 'D E F', 'E F G'])

Solution 1:
def groupList(input_list):
    return [' '.join(input_list[i:i+3]) for i in range(len(input_list) - 2)]
Solution 2:
def groupList(input_list):
    return list(map(' '.join, (input_list[i:i+3] for i in range(len(input_list) - 2))))
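Another option that avoids index arithmetic entirely is to zip the list with copies of itself shifted by one and two positions, then map ' '.join over the resulting triples (the name group_list_zip is illustrative):

```python
def group_list_zip(input_list):
    # zip stops at the shortest argument, so only complete triples are produced.
    return list(map(' '.join, zip(input_list, input_list[1:], input_list[2:])))

print(group_list_zip(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
```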

Besides the previous solutions, a more efficient (but less concise) solution is to compute a full concatenation first and then slice it.
from itertools import accumulate
def groupList(input_list):
    full_concat = ' '.join(input_list)
    idx = [0]
    idx.extend(accumulate(len(s) + 1 for s in input_list))
    return [full_concat[idx[i]:idx[i+3]-1] for i in range(len(idx) - 3)]
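The same slicing idea, sketched with a window-size parameter n (defaulting to 3) and comments on how the offsets line up; the n parameter and the name group_list are additions, not part of the answer:

```python
from itertools import accumulate

def group_list(input_list, n=3):
    # Join everything once; each window is then a single slice of this string.
    full = ' '.join(input_list)
    # starts[k] is the offset where input_list[k] begins inside `full`
    # (every element is followed by one separator space).
    starts = [0, *accumulate(len(s) + 1 for s in input_list)]
    # Window i covers elements i .. i+n-1 and ends one character before the
    # start of element i+n, which drops the trailing separator.
    return [full[starts[i]:starts[i + n] - 1] for i in range(len(input_list) - n + 1)]

print(group_list(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
```

It also works for multi-character elements, because the offsets come from the actual element lengths.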

How to remove certain characters from lists (Python 2.7)?

I've got a list that looks like this:
['a ',' b ',' c ',' d\n ']
I want to manipulate it so that each element just becomes:
['a','b','c','d']
I don't think the spaces matter, but for some reason I can't seem to remove the \n from the end of the 4th element. I've tried converting to string and removing it using:
str.split('\n')
No error is returned, but it doesn't do anything to the list, it still has the \n at the end.
I've also tried:
d.replace('\n','')
But this just returns an error.
This is clearly a simple problem but I'm a complete beginner to Python so any help would be appreciated, thank you.
Edit:
It seems I have a list of arrays (I think), so am I right in thinking that list[0], list[1], etc. are their own arrays? Does that mean I can use a for loop (for i in list) to strip \n from each one?

>>> my_array = ['a ',' b ',' c ',' d\n ']
>>> my_array = [c.strip() for c in my_array]
>>> my_array
['a', 'b', 'c', 'd']
If you have a list of arrays, then you can do something along the lines of:
>>> list_of_arrays = [['a', 'b', 'c', 'd'], ['a ', ' b ', ' c ', ' d\n ']]
>>> new_list = [[c.strip() for c in array] for array in list_of_arrays]
>>> new_list
[['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']]
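On the question's edit: yes, you can use a for loop, but reassigning the loop variable (for i in list: i = i.strip()) does not change the list, because strip() returns a new string. To update the list in place, assign back through the index, for example with enumerate (my_list is an illustrative name):

```python
my_list = ['a ', ' b ', ' c ', ' d\n ']

for i, item in enumerate(my_list):
    # str.strip() returns a new string; store it back at the same position.
    my_list[i] = item.strip()

print(my_list)
```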

Try this -
arr = ['a ',' b ',' c ',' d\n ']
arr = [s.strip() for s in arr]

A very simple answer is to join your list, strip the newline character, and split to get a new list:
Newlist = ''.join(myList).strip().split()
Your Newlist is now:
['a', 'b', 'c', 'd']
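One caveat with the join approach: joining with '' glues neighbouring elements together whenever they are not already separated by whitespace, so it only works when the elements carry surrounding spaces as in the sample. Stripping each element individually does not have this problem. A small comparison using a made-up second list:

```python
spaced = ['a ', ' b ', ' c ', ' d\n ']
tight = ['a', 'b\n']

# Join/strip/split: fine when the elements already contain separating spaces...
print(''.join(spaced).strip().split())
# ...but without them the elements merge into one string.
print(''.join(tight).strip().split())

# Stripping each element individually works for both lists.
print([s.strip() for s in tight])
```

Also, str.split() with no arguments already ignores leading and trailing whitespace, so the strip() in the join version is optional.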

how to turn a string of letters embedded in square brackets into embedded lists

I'm trying to find a simple way to convert a string like this:
a = '[[a b] [c d]]'
into the corresponding nested list structure, where the letters are turned into strings:
a = [['a', 'b'], ['c', 'd']]
I tried to use
import ast
l = ast.literal_eval('[[a b] [c d]]')
l = [i.strip() for i in l]
as found here
but it doesn't work because the characters a,b,c,d are not within quotes.
In particular, I'm looking for something that turns:
'[[X v] -s]'
into:
[['X', 'v'], '-s']

You can use a regex to find all items between brackets, then split the result:
>>> import re
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd']]
The regex r'\[([^\[\]]+)\]' matches any run of characters between square brackets that contains no square brackets itself, which in this case is 'a b' and 'c d'; you can then use a list comprehension to split each match on whitespace.
Note that this regex only works for cases like this one, where every character is inside an innermost pair of brackets; for other cases you would need a different regex, and a regex trick like this won't work in all cases. For example, for '[[X v] -s]' it returns only [['X', 'v']] and loses '-s'.
>>> a = '[[a b] [c d] [e g]]'
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd'], ['e', 'g']]
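A different technique that also handles the '[[X v] -s]' case is to tokenize the string and build the nesting with a stack. This is a minimal sketch, assuming well-formed brackets and a single outermost list; the name parse_nested is illustrative:

```python
import re

def parse_nested(text):
    # Tokens are '[', ']', or any run of characters that is neither
    # whitespace nor a bracket (so '-s' stays one token).
    tokens = re.findall(r'\[|\]|[^\s\[\]]+', text)
    stack = [[]]  # stack[-1] is the list currently being filled
    for tok in tokens:
        if tok == '[':
            stack.append([])            # open a new nested list
        elif tok == ']':
            done = stack.pop()          # close it and attach it to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)       # plain item in the current list
    return stack[0][0]

print(parse_nested('[[X v] -s]'))
```

Unbalanced input raises an IndexError here rather than being reported nicely.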

Use the isalpha method of str to wrap every letter in double quotes:
a = '[[a b] [c d]]'
a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
Now a is:
'[["a" "b"] ["c" "d"]]'
And you can use json.loads (as @a_guest suggested), after replacing the spaces with commas:
import json
json.loads(a.replace(' ', ','))

>>> import json
>>> a = '[[a b] [c d]]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
>>> a
'[["a" "b"] ["c" "d"]]'
>>> json.loads(a.replace(' ', ','))
[[u'a', u'b'], [u'c', u'd']]
This will work with any degree of nested lists following the above pattern, e.g.
>>> a = '[[[a b] [c d]] [[e f] [g h]]]'
>>> ...
>>> json.loads(a.replace(' ', ','))
[[[u'a', u'b'], [u'c', u'd']], [[u'e', u'f'], [u'g', u'h']]]
For the specific example of '[[X v] -s]':
>>> import json
>>> a = '[[X v] -s]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() or x=='-' else x, a))
>>> json.loads(a.replace('[ [', '[[').replace('] ]', ']]').replace(' ', ',').replace('][', '],[').replace('""',''))
[[u'X', u'v'], u'-s']
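The character-by-character quoting above only handles one-letter tokens ('ab' would become "a""b", which json.loads rejects), which is why the '-s' case needs the extra replace calls. A sketch of a variant that quotes whole tokens with a regex and inserts commas only between adjacent items, assuming tokens are any runs of characters other than whitespace and brackets (the name to_nested_list is illustrative; on Python 3 the results are plain str, not u'' strings):

```python
import json
import re

def to_nested_list(text):
    # Quote each run of non-space, non-bracket characters; json.dumps adds the
    # quotes and escapes anything that would break the JSON string.
    quoted = re.sub(r'[^\s\[\]]+', lambda m: json.dumps(m.group(0)), text)
    # Whitespace between the end of one item (" or ]) and the start of the
    # next (" or [) becomes a comma; other whitespace is left for json to skip.
    as_json = re.sub(r'(?<=["\]])\s+(?=["\[])', ',', quoted)
    return json.loads(as_json)

print(to_nested_list('[[X v] -s]'))
```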
