How to split text line by line in Python [duplicate]

Is it possible to split a string every nth character?
For example, suppose I have a string containing the following:
'1234567890'
How can I get it to look like this:
['12','34','56','78','90']
For the same question with a list, see How do I split a list into equally-sized chunks?. The same techniques generally apply, though there are some variations.

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']
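The slicing comprehension above can be wrapped in a reusable helper. This is a minimal sketch; the name `chunk_string` and the input check are illustrative additions, not part of the original answer:

```python
def chunk_string(s, n):
    """Return consecutive n-character slices of s; the last slice may be shorter."""
    if n <= 0:
        # a zero step makes range() raise ValueError, and a negative step
        # silently yields no chunks, so reject both up front
        raise ValueError("n must be a positive integer")
    return [s[i:i + n] for i in range(0, len(s), n)]


print(chunk_string('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(chunk_string('123456789', 2))   # ['12', '34', '56', '78', '9']
```

Slicing keeps a shorter trailing chunk rather than dropping or padding it.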

Just to be complete, you can do this with a regex:
>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']
For an odd number of characters, you can do this:
>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']
You can also do the following, to simplify the regex for longer chunks:
>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']
For a long string, you can use re.finditer to produce the chunks one match at a time instead of building the whole list up front.
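A sketch of the finditer approach as a generator, assuming n >= 1 (the name `iter_chunks` is hypothetical). One detail worth knowing: by default '.' does not match a newline, so re.DOTALL is passed to keep newlines inside the chunks:

```python
import re


def iter_chunks(s, n):
    """Yield successive chunks of at most n characters, one match at a time."""
    # '.{1,n}' grabs up to n characters, so a shorter final chunk is kept;
    # re.DOTALL lets '.' match '\n' too, otherwise newlines would be skipped
    pattern = re.compile('.{1,%d}' % n, re.DOTALL)
    for match in pattern.finditer(s):
        yield match.group(0)


print(list(iter_chunks('123456789', 2)))  # ['12', '34', '56', '78', '9']
print(list(iter_chunks('ab\ncd', 2)))     # ['ab', '\nc', 'd']
```

finditer still needs the whole string in memory; what it saves is building the full list of chunks at once.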

There is already a built-in function in the Python standard library for this:
>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']
This is what the docstring for wrap says:
>>> help(wrap)
'''
Help on function wrap in module textwrap:
wrap(text, width=70, **kwargs)
Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
'''
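Because wrap() is meant for paragraphs of text, it treats whitespace specially, as the docstring notes. A small sketch comparing it with plain slicing on a string that contains a space (the sample string is made up for illustration):

```python
from textwrap import wrap

s = 'ab cd'
n = 2
# wrap() breaks on whitespace and, by default, drops it at line boundaries
print(wrap(s, n))                                  # ['ab', 'cd']
# plain slicing keeps every character, including the space
print([s[i:i + n] for i in range(0, len(s), n)])   # ['ab', ' c', 'd']
```

So wrap() is a good fit when the string contains no whitespace (as in the question), but slicing is the safer choice for arbitrary text.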

Another common way of grouping elements into n-length groups:
>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']
This method comes straight from the docs for zip(). In Python 3, map() returns a lazy iterator, hence the list() call.
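The trick works because [iter(s)] * 2 holds the same iterator twice, so zip() pulls consecutive characters into each tuple. One consequence is that an incomplete final chunk is silently dropped; a quick sketch with an odd-length string:

```python
s = '12345'
it = iter(s)
# [it] * 2 contains the SAME iterator twice, so each tuple gets two
# consecutive characters; zip() stops as soon as the iterator runs out
pairs = zip(*[it] * 2)
result = list(map(''.join, pairs))
print(result)  # ['12', '34']; the trailing '5' is dropped
```

If the leftover characters matter, the zip_longest-based grouper() recipe (with fillvalue='') or plain slicing keeps them.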

I think this is shorter and more readable than the itertools version:
def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]
print(list(split_by_n('1234567890', 2)))

Using more-itertools from PyPI:
>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

I like this solution:
s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

You could use the grouper() recipe from itertools:
Python 2.x:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
Python 3.x:
from itertools import zip_longest
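def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
These functions return lazy iterators, so they are memory-efficient, and they work with any iterable. Each group is a tuple and the last one is padded with fillvalue, so to get strings back, pass fillvalue='' and call ''.join() on each group.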

This can be achieved by a simple for loop.
a = '1234567890a'
result = []
for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)
The output is:
['12', '34', '56', '78', '90', 'a']

I was stuck in the same scenario.
This worked for me:
x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)
Output:
['12', '34', '56', '78', '90']

Try the following code:
from itertools import islice
def split_every(n, iterable):
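    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))
s = '1234567890'
# each piece is a list of characters, so join it back into a string
print([''.join(piece) for piece in split_every(2, s)])  # ['12', '34', '56', '78', '90']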

Try this:
s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])
Output:
['12', '34', '56', '78', '90']

>>> from functools import reduce
>>> from operator import add
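>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']
itertools.izip exists only in Python 2; in Python 3 the built-in zip() is already lazy. Note that the trailing '0' is dropped in the second case because zip() stops at the shortest input.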

As always, for those who love one-liners:
n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n + n] for i, _ in enumerate(line[::n])]

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:
import more_itertools as mit
s = "1234567890"
["".join(c) for c in mit.grouper(s, 2)]
["".join(c) for c in mit.chunked(s, 2)]
["".join(c) for c in mit.windowed(s, 2, step=2)]
["".join(c) for c in mit.split_after(s, lambda x: int(x) % 2 == 0)]
Each of these options produces the following output:
['12', '34', '56', '78', '90']
Note that split_after splits after every even digit, so it only yields pairs here because the digits of this particular string alternate odd/even; it is not a general chunking method.
Documentation for the options discussed: grouper, chunked, windowed, split_after

A simple recursive solution for short strings (each call uses one stack frame, and Python's default recursion limit is 1000):
def split(s, n):
    if not s:
        return []
    # a trailing chunk shorter than n is kept
    return [s[:n]] + split(s[n:], n)
print(split('1234567890', 2))
Or in this form:
def split(s, n):
    if len(s) <= n:
        return [s] if s else []
    return split(s[:n], n) + split(s[n:], n)
which illustrates the typical divide-and-conquer pattern of a recursive approach more explicitly (though in practice it is not necessary to do it this way).

A solution with groupby:
from itertools import groupby, chain, repeat, cycle
text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)
Output:
['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']

These answers are all nice and working, but the syntax is so cryptic... Why not write a simple function?
def SplitEvery(string, length):
    if len(string) <= length:
        return [string]
    # integer division rounded up: / would give a float (which range() rejects)
    # and plain // would drop a shorter trailing chunk
    sections = (len(string) + length - 1) // length
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start + length]
        lines.append(line)
        start += length
    return lines
And call it simply:
text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)
# output: ['12', '34', '56', '78', '90']

Another solution using groupby and index//n as the key to group the letters:
from itertools import groupby
text = "abcdefghij"
n = 3
result = []
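# enumerate() supplies each character's position; x.index on a str is a method, not a position
for _, chunk in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(char for _, char in chunk))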
# result = ['abc', 'def', 'ghi', 'j']

Related

Break a binary string down to segments

The task here is to break down a string 110011110110000 into a list:
['11', '00', '1111', '0', '11', '0000']
My solution is
str1='110011110110000'
seg = []
a0=str1[0]
seg0=''
for a in str1:
    print('a=',a)
    if a==a0:
        seg0=seg0+a
    else:
        print('seg0=',seg0)
        seg.append(seg0)
        seg0=a
        a0=a
seg.append(seg0)
seg
It's ugly and I am sure you guys out there have a one-liner for this. Maybe regex?
You can use itertools.groupby:
str1='110011110110000'
from itertools import groupby
l = [v * len([*g]) for v, g in groupby(str1)]
print(l)
Prints:
['11', '00', '1111', '0', '11', '0000']
EDIT: version with regex:
str1='110011110110000'
import re
print([g[0] for g in re.findall(r'((\d)\2*)', str1)])
Here is a regex solution:
import re
result = [x[0] for x in re.findall(r'(([10])\2*)', str1)]
The regex is (([10])\2*): find a 0 or 1, then keep matching that same character. Because the pattern has two groups, findall returns a tuple of both groups for each match, so we keep only the first element: group 1 is the whole run, while group 2 is just the ([10]) character.
Here is an iterative regex approach, using the simple pattern 1+|0+:
import re
str1 = "110011110110000"
pattern = re.compile(r'(1+|0+)')
result = []
for m in pattern.finditer(str1):
    result.append(m.group(0))
print(result)
This prints:
['11', '00', '1111', '0', '11', '0000']
Note that we might instead want to use re.split here. Older Python versions could not split on a pattern that only matches an empty string, such as a pair of lookarounds; Python 3.7 added that support, so, as in other languages such as Java, we can split on this pattern:
(?<=0)(?=1)|(?<=1)(?=0)
This nicely generates the list we expect.
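Since Python 3.7, re.split can split on empty matches such as these lookarounds. A short sketch, assuming Python 3.7 or later:

```python
import re

str1 = '110011110110000'
# Split at every zero-width boundary where a 0 meets a 1 or a 1 meets a 0.
# Splitting on empty matches like this requires Python 3.7+.
segments = re.split(r'(?<=0)(?=1)|(?<=1)(?=0)', str1)
print(segments)  # ['11', '00', '1111', '0', '11', '0000']
```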
One-line solution using groupby:
from itertools import groupby
text='1100111101100001'
sol = [''.join(group) for key, group in groupby(text)]
print(sol)
output
['11', '00', '1111', '0', '11', '0000', '1']
Not a regex solution, but an improvement on your code:
str1='110011110110000'
def func(string):
    tmp = string[0]
    res = []
    # start from the second character; the first one is already in tmp
    for v in string[1:]:
        if v == tmp[-1]:
            tmp += v
        else:
            res.append(tmp)
            tmp = v
    res.append(tmp)
    return res
print(func(str1))
output
['11', '00', '1111', '0', '11', '0000']
You can use the general regex (.)\1*:
(.) - match any single character and store it in the first capturing group
\1* - repeat whatever was captured in the first capturing group zero or more times
The collection of matches is your desired result.
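In Python, the matches can be collected with re.finditer. A short sketch (the function name `runs` is hypothetical); note that re.findall would return only the captured group (.) for each match rather than the whole run, which is why group(0) is used:

```python
import re


def runs(s):
    """Return the maximal runs of identical characters in s."""
    # group(0) is the whole matched run; group(1) is only the repeated character
    return [m.group(0) for m in re.finditer(r'(.)\1*', s)]


print(runs('110011110110000'))  # ['11', '00', '1111', '0', '11', '0000']
```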

Generated List consists of [Apparently] unaccounted whitespaces in this code snippet

For a routine Python programming question, I was asked to generate a list of strings sliced from one string (let's call it target_string), with the length of the slices increasing from 1 to the length of the string.
For example, if target_string is '123', I would have to generate the list like this : ['1', '2', '3', '12', '23', '123'].
For this, I wrote a code snippet that was something like this:
target_string = raw_input("Target String:")
length = len(target_string)
number_list = []
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
print(number_list)
On execution of this the result was:
Target String:12345
['', '', '', '', '', '1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345', '1234', '2345']
The first thing I noticed is that the list contains whitespaces as elements, and the number of whitespaces is equal to the length of target_string. Why does this happen? Any clarification and help is welcome.
P.S.: I have a temporary workaround to generate the list that I need:
target_string = raw_input("Target String:")
length = len(target_string)
number_list = []
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
number_list.append(target_string)
del number_list[0:length]
target_list = [int(i) for i in number_list]
print(target_list)
Also feel free to suggest any changes or modifications to this, or any approach you would feel is more efficient and pythonic. Thanks in advance.
Edit: This is implemented in PyCharm, on Windows 10, using Python 2.7, but please feel free to give solutions for both Python 2.7 and 3.x.
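You can use itertools.combinations, keep only the combinations whose digits increase by exactly 1 from one to the next, use ''.join(..) to convert each one to a string, and add them with .extend(..). Note that this relies on the input being a run of consecutive ascending digits such as '123'; it is not a general substring method: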
Python 2.7:
import itertools
target_string = raw_input("Target String:")
l=[]
for i in range(1, len(target_string) + 1):
    l.extend([''.join(c) for c in itertools.combinations(target_string, i) if all(int(y) - int(x) == 1 for x, y in zip(c, c[1:]))])
print l
Output:
['1', '2', '3', '12', '23', '123']
Python 3.x:
import itertools
target_string = input("Target String:")
l=[]
for i in range(1, len(target_string) + 1):
    l.extend([''.join(c) for c in itertools.combinations(target_string, i) if all(int(y) - int(x) == 1 for x, y in zip(c, c[1:]))])
print(l)
Output:
['1', '2', '3', '12', '23', '123']
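Here is why you got "whitespaces" (they are actually empty strings) in your code snippet.
Have a look at the loop part:
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
Here, both i and j start at 0.
So when we trace it, it goes like this:
i = 0:
    j = 0: 0 + 0 <= length, so number_list.append(target_string[0:0]) appends ''
    j = 1: target_string[1:1] is also ''
    and so on.....
Every slice target_string[j:j + 0] is empty, so the first pass of the outer loop appends length empty strings. Also, because range(length) stops at length - 1, the full string (slice length equal to length) is never added, which is why your workaround has to append it separately.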
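A corrected loop only needs the slice length to run from 1 to length inclusive, and the start position to stop where the slice still fits. A minimal sketch (the function name `contiguous_substrings` is hypothetical) that works in both Python 2.7 and 3.x:

```python
def contiguous_substrings(s):
    """Return all non-empty substrings of s, shortest first, left to right."""
    result = []
    # sizes run from 1 to len(s) inclusive, so '' is never produced
    # and the full string is included
    for size in range(1, len(s) + 1):
        for start in range(len(s) - size + 1):
            result.append(s[start:start + size])
    return result


print(contiguous_substrings('123'))  # ['1', '2', '3', '12', '23', '123']
```

Unlike the combinations-based answer, this does not depend on the characters being consecutive digits; wrap the result in [int(x) for x in ...] if integers are needed.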

How to convert a string into a list without including a specific character plus without using replace and split methods?

Let's say you have:
x = "1,2,13"
and you want to achieve:
list = ["1","2","13"]
Can you do it without the split and replace methods?
What I have tried:
list=[]
for number in x:
    if number != ",":
        list.append(number)
print(list) # ['1', '2', '1', '3']
but this only works if each number is a single digit.
You could use a regular expression:
>>> import re
>>> re.findall(r'(\d+)', '123,456')
['123', '456']
Here is a way using itertools that assumes the values are non-negative integers:
>>> import itertools
>>> x = "1,88,22"
>>> ["".join(g) for b,g in itertools.groupby(x,str.isdigit) if b]
['1', '88', '22']
>>>
Here is a method that uses traditional looping:
>>> digit = ""
>>> digit_list = []
>>> for c in x:
...     if c.isdigit():
...         digit += c
...     elif c == ",":
...         digit_list.append(digit)
...         digit = ""
... else:
...     # for-else: runs once after the loop finishes, adding the last number
...     digit_list.append(digit)
...
>>> digit_list
['1', '88', '22']
>>>
In the real world, you'd probably just use regex...

Best way of getting the longest sequence of even digits in an integer

What would be the most efficient way of getting the longest sequence of even digits in an integer in Python? For example, if I have a number 2456890048, the longest sequence should be 0048.
Should the integer be converted to a string to determine the longest sequence? Or should it be converted into a list and then, based on the indexes of each item, we would determine which sequence is the longest? Or is there a more efficient way that I am not aware of? (I am quite new to Python, and I am not sure what would be the best way to tackle this problem.)
You can use itertools.groupby and max:
>>> from itertools import groupby
>>> def solve(strs):
...     return max((list(g) for k, g in groupby(strs, key=lambda x: int(x) % 2) if not k),
...                key=len)
...
>>> solve('2456890048')  # or pass `str(2456890048)` if you have an integer
['0', '0', '4', '8']
>>> solve('245688888890048')
['6', '8', '8', '8', '8', '8', '8']
Here:
[list(g) for k, g in groupby('2456890048', key=lambda x:int(x)%2) if not k]
returns:
[['2', '4'], ['6', '8'], ['0', '0', '4', '8']]
Now we can apply max to this list (with key=len) to get the longest sequence. (Note that solve() passes a generator expression to max, so the list is never built in memory.)
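I think this is one of the most efficient ways. Note that it returns the length of the longest run rather than the digits themselves, and it expects a non-negative integer:
def longest(i):
    curMax = m = 0
    while i != 0:
        d = i % 10 % 2
        i = i // 10  # integer division; / would produce a float in Python 3
        if d == 0:
            curMax += 1
        else:
            m = max(m, curMax)
            curMax = 0
    return max(m, curMax)
print(longest(2456890048))  # 4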
You can extract all runs of even digits using a regex, and find the longest using max.
import re
def longest_run(d):
    return max(re.findall('[02468]+', str(d)), key=len)
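Combining the regex approach with max()'s `default` argument (Python 3.4+) avoids a ValueError when the number has no even digits at all. A sketch; the name `longest_even_run` is hypothetical:

```python
import re


def longest_even_run(number):
    """Return the longest run of even digits in number as a string ('' if none)."""
    runs = re.findall('[02468]+', str(number))
    # default='' avoids ValueError when there are no even digits;
    # on ties, max() keeps the first run it sees
    return max(runs, key=len, default='')


print(longest_even_run(2456890048))   # prints 0048
print(repr(longest_even_run(13579)))  # prints ''
```

Returning a string keeps leading zeros such as the 00 in 0048, which converting back to int would lose.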

How to split a string in Python by 2 or 3, etc [duplicate]

This question already has answers here:
Split string every nth character?
(19 answers)
How to iterate over a list in chunks
(39 answers)
Closed 9 years ago.
Does anyone know if it's possible in Python to split a string, not necessarily by a space or commas, but just by every other entry in the string? Or every 3rd or 4th, etc.?
For example, if I had "12345678" as my string, is there a way to split it into "12", "34", "56", "78"?
You can use list comprehension:
>>> x = "123456789"
>>> [x[i : i + 2] for i in range(0, len(x), 2)]
['12', '34', '56', '78', '9']
You can use a list comprehension. Iterate over the start positions using the step argument of range() and grab two characters at a time by slicing.
s = "12345678"
print([s[i:i+2] for i in range(0, len(s), 2)]) # >>> ['12', '34', '56', '78']
What you want is the itertools grouper() recipe, which takes any arbitrary iterable and gives you groups of n items from that iterable:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
(Note that in 2.x, this is slightly different as zip_longest() is called izip_longest()!)
E.g:
>>> list(grouper("12345678", 2))
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8')]
You can then rejoin the strings with a simple list comprehension:
>>> ["".join(group) for group in grouper("12345678", 2)]
['12', '34', '56', '78']
If you might have fewer than a complete set of values, just use fillvalue="":
>>> ["".join(group) for group in grouper("123456789", 2, fillvalue="")]
['12', '34', '56', '78', '9']
