Is there a way to take a string that is 4*x characters long, and cut it into 4 strings, each x characters long, without knowing the length of the string?
For example:
>>>x = "qwertyui"
>>>split(x, one, two, three, four)
>>>two
'er'
>>> x = "qwertyui"
>>> chunks, chunk_size = len(x), len(x)//4
>>> [ x[i:i+chunk_size] for i in range(0, chunks, chunk_size) ]
['qw', 'er', 'ty', 'ui']
I tried Alexander's answer but got this error in Python 3:
TypeError: 'float' object cannot be interpreted as an integer
This is because the / division operator in Python 3 returns a float. This works for me:
>>> x = "qwertyui"
>>> chunks, chunk_size = len(x), len(x)//4
>>> [ x[i:i+chunk_size] for i in range(0, chunks, chunk_size) ]
['qw', 'er', 'ty', 'ui']
Notice the // at the end of line 2, to ensure truncation to an integer.
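The difference between `/` and `//` can be seen directly. A minimal sketch, assuming the string length is an exact multiple of the number of parts (the helper name `split_into` is hypothetical and not from the answers):

```python
def split_into(s, parts):
    """Cut s into `parts` equal-length pieces (hypothetical helper)."""
    size, remainder = divmod(len(s), parts)
    if size == 0 or remainder:
        raise ValueError("length of s must be a non-zero multiple of parts")
    return [s[i:i + size] for i in range(0, len(s), size)]

print(8 / 4)    # 2.0 in Python 3: a float, which range() rejects
print(8 // 4)   # 2: floor division keeps the result an int
print(split_into("qwertyui", 4))   # ['qw', 'er', 'ty', 'ui']
```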
:param s: str; source string
:param w: int; width to split on
Using the textwrap module:
PyDocs-textwrap
import textwrap
def wrap(s, w):
    return textwrap.fill(s, w)
:return str:
Inspired by Alexander's Answer
PyDocs-data structures
def wrap(s, w):
    return [s[i:i + w] for i in range(0, len(s), w)]
:return list:
Inspired by Eric's answer
PyDocs-regex
import re
def wrap(s, w):
    sre = re.compile(rf'(.{{{w}}})')
    return [x for x in re.split(sre, s) if x]
:return list:
some_string="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
x=3
res=[some_string[y-x:y] for y in range(x, len(some_string)+x,x)]
print(res)
will produce
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU', 'VWX', 'YZ']
In Split string every nth character?, "the wolf" gives the most concise answer:
>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']
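One caveat with `re.findall('..', ...)`: a pattern of exactly two characters silently drops a trailing odd character, and `.` does not match newlines by default. A sketch of a more general variant (the function name `chunk_re` is an illustrative assumption):

```python
import re

def chunk_re(s, n):
    """Split s into pieces of at most n characters using a regex."""
    # {1,n} keeps a shorter final piece; DOTALL lets '.' match newlines too.
    return re.findall(r'.{1,%d}' % n, s, flags=re.DOTALL)

print(re.findall('..', '123456789'))   # ['12', '34', '56', '78'] (the '9' is lost)
print(chunk_re('123456789', 2))        # ['12', '34', '56', '78', '9']
```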
Here is a one-liner that doesn't need to know the length of the string beforehand:
from functools import partial
from io import StringIO  # Python 3; in Python 2 use: from StringIO import StringIO
[l for l in iter(partial(StringIO(data).read, 4), '')]
If you have a file or socket, then you don't need the StringIO wrapper:
[l for l in iter(partial(file_like_object.read, 4), '')]
def split2len(s, n):
    def _f(s, n):
        while s:
            yield s[:n]
            s = s[n:]
    return list(_f(s, n))
Got an re trick:
In [28]: import re
In [29]: x = "qwertyui"
In [30]: [x for x in re.split(r'(\w{2})', x) if x]
Out[30]: ['qw', 'er', 'ty', 'ui']
Wrapped in a function, it might look like:
def split(string, split_len):
    # Regex: `r'.{1}'` for example works for all characters
    regex = r'(.{%s})' % split_len
    return [x for x in re.split(regex, string) if x]
Here are two generic approaches. Probably worth adding to your own lib of reusables. First one requires the item to be sliceable and second one works with any iterables (but requires their constructor to accept iterable).
def split_bylen(item, maxlen):
    '''
    Requires item to be sliceable (with __getitem__ defined)
    '''
    return [item[ind:ind+maxlen] for ind in range(0, len(item), maxlen)]
# You could also replace outer [ ] brackets with ( ) to use as generator.
def split_bylen_any(item, maxlen, constructor=None):
    '''
    Works with any iterables.
    Requires item's constructor to accept iterable or alternatively
    constructor argument could be provided (otherwise use item's class)
    '''
    if constructor is None:
        constructor = item.__class__
    return [constructor(part) for part in zip(* ([iter(item)] * maxlen))]
    # OR: return map(constructor, zip(* ([iter(item)] * maxlen)))
    # which would be faster if you need an iterable, not list
So, in topicstarter's case, the usage is:
string = 'Baboons love bananas'
parts = 5
splitlen = -(-len(string) // parts) # is alternative to math.ceil(len/parts)
first_method = split_bylen(string, splitlen)
#Result :['Babo', 'ons ', 'love', ' ban', 'anas']
second_method = split_bylen_any(string, splitlen, constructor=''.join)
#Result :['Babo', 'ons ', 'love', ' ban', 'anas']
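Note that the `zip(*[iter(item)] * maxlen)` idiom discards a final group shorter than `maxlen`. A sketch of an alternative that keeps the tail, using `itertools.islice` (the name `chunked` is hypothetical and not part of the answer above):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable, keeping the short tail."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

print([''.join(c) for c in chunked('abcdefg', 3)])   # ['abc', 'def', 'g']
print(list(zip(*[iter('abcdefg')] * 3)))             # [('a', 'b', 'c'), ('d', 'e', 'f')]
```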
length = 4
string = "abcdefgh"
str_dict = [ o for o in string ]
parts = [''.join(str_dict[(j * length):((j + 1) * length)]) for j in range(len(string) // length)]  # Python 3: range and // (Python 2 used xrange and /)
# splitting a string by the length of another string
def len_split(string, sub_string):
    n, sub, str1 = list(string), len(sub_string), ')/^0*/-'
    for i in range(sub, len(n) + ((len(n) - 1) // sub), sub + 1):
        n.insert(i, str1)
    n = "".join(n)
    n = n.split(str1)
    return n
x = "divyansh_looking_for_intership_actively_contact_Me_here"
sub = "four"
print(len_split(x, sub))
# Result-> ['divy', 'ansh', '_loo', 'king', '_for', '_int', 'ersh', 'ip_a', 'ctiv', 'ely_', 'cont', 'act_', 'Me_h', 'ere']
There is a built-in function in Python's standard library for that:
import textwrap
text = "Your Text.... and so on"
width = 5
textwrap.wrap(text, width)
Voilà
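Keep in mind that `textwrap.wrap` is word-aware: it breaks on whitespace and drops it at line boundaries, so it only matches plain fixed-width slicing when the text has no spaces. A small sketch comparing the two (the sample strings and the `fixed_width` helper are made up for illustration):

```python
import textwrap

def fixed_width(s, w):
    """Plain slicing: every piece is exactly w characters, except possibly the last."""
    return [s[i:i + w] for i in range(0, len(s), w)]

print(textwrap.wrap("abcdefgh", 3))   # ['abc', 'def', 'gh']
print(fixed_width("abcdefgh", 3))     # ['abc', 'def', 'gh']

print(textwrap.wrap("ab cd ef", 4))   # ['ab', 'cd', 'ef']
print(fixed_width("ab cd ef", 4))     # ['ab c', 'd ef']
```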
And for dudes who prefer it to be a bit more readable:
def itersplit_into_x_chunks(string, x=10): # we assume here that x is an int and > 0
    size = len(string)
    chunksize = size//x
    for pos in range(0, size, chunksize):
        yield string[pos:pos+chunksize]
output:
>>> list(itersplit_into_x_chunks('qwertyui',x=4))
['qw', 'er', 'ty', 'ui']
My solution
st = ' abs de fdgh 1234 556 shg shshh'
print(st)
def splitStringMax(si, limit):
    ls = si.split()
    lo = []
    st = ''
    ln = len(ls)
    if ln == 1:
        return [si]
    i = 0
    for l in ls:
        st += l
        i += 1
        if i < ln:
            lk = len(ls[i])
            # keep appending words while the next one still fits within the limit
            if len(st) + 1 + lk < limit:
                st += ' '
                continue
        lo.append(st)
        st = ''
    return lo
############################
print(splitStringMax(st, 7))
# ['abs de', 'fdgh', '1234', '556', 'shg', 'shshh']
print(splitStringMax(st, 12))
# ['abs de fdgh', '1234 556', 'shg shshh']
l = 'abcdefghijklmn'
def group(l, n):
    tmp = len(l) % n
    zipped = list(zip(*[iter(l)] * n))  # list() is needed in Python 3, where zip returns an iterator
    return zipped if tmp == 0 else zipped + [tuple(l[-tmp:])]
print(group(l, 3))
String splitting is needed in many cases, for example when you have to sort the characters of a given string or replace one character with another. All of these operations can be performed with the string splitting methods below.
String splitting can be done in two ways:
Slicing the given string based on the length of the split.
Converting the given string to a list with the list(str) function, which breaks the string down into a list of its characters. Then do the required operation and join the elements with 'separator'.join(list) to get a new processed string.
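Both approaches can be sketched briefly (the sample string, the sort step, and the `-` separator are only illustrative):

```python
s = "qwertyui"

# 1. Slice the string based on the split length.
size = 2
pieces = [s[i:i + size] for i in range(0, len(s), size)]
print(pieces)            # ['qw', 'er', 'ty', 'ui']

# 2. Convert to a list of characters, process it, then join it back.
chars = list(s)          # ['q', 'w', 'e', 'r', 't', 'y', 'u', 'i']
chars.sort()             # example operation: sort the characters
print('-'.join(chars))   # e-i-q-r-t-u-w-y
```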
Related
Say I have strings such as 'ABC)D.' or 'AB:CD/'. How can I split them at the first non-alphabetic character to end up with ['ABC', 'D.'] and ['AB', 'CD/']? Is there a way to do this without regex?
You can use a loop
a = 'AB$FDWRE'
i = 0
while i<len(a) and a[i].isalpha():
    i += 1
>>> a[:i]
'AB'
>>> a[i:]
'$FDWRE'
One option would be to find the location of the first non-alphabetic character:
def split_at_non_alpha(s):
    try:
        split_at = next(i for i, x in enumerate(s) if not x.isalpha())
        return s[:split_at], s[split_at+1:]
    except StopIteration: # if not found
        return (s,)
print(split_at_non_alpha('ABC)D.')) # ('ABC', 'D.')
print(split_at_non_alpha('AB:CD/')) # ('AB', 'CD/')
print(split_at_non_alpha('.ABCD')) # ('', 'ABCD')
print(split_at_non_alpha('ABCD.')) # ('ABCD', '')
print(split_at_non_alpha('ABCD')) # ('ABCD',)
With for loop, enumerate, and string indexing:
def first_non_alpha_splitter(word):
    for index, char in enumerate(word):
        if not char.isalpha():
            break
    else:
        # no non-alphabetic character found: nothing to split
        return [word]
    return [word[:index], word[index+1:]]
The result
first_non_alpha_splitter('ABC)D.')
# Output: ['ABC', 'D.']
first_non_alpha_splitter('AB:CD/')
# Output: ['AB', 'CD/']
Barmar's suggestion worked best for me. The other answers had nearly the same execution time, but I chose this one for readability.
from itertools import takewhile
s = 'ABC)D.'  # avoid naming a variable `str`, which shadows the built-in
alphStr = ''.join(takewhile(lambda x: x.isalpha(), s))
print(alphStr) # Outputs 'ABC'
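`takewhile` on its own only yields the alphabetic prefix. A possible way to recover both halves, as asked in the question, is to use the prefix length as the split point (the function name `split_first_non_alpha` is an assumption for this sketch):

```python
from itertools import takewhile

def split_first_non_alpha(s):
    """Return [prefix, rest], where rest starts after the first non-alphabetic character."""
    prefix_len = sum(1 for _ in takewhile(str.isalpha, s))
    if prefix_len == len(s):      # no non-alphabetic character found
        return [s]
    return [s[:prefix_len], s[prefix_len + 1:]]

print(split_first_non_alpha('ABC)D.'))   # ['ABC', 'D.']
print(split_first_non_alpha('AB:CD/'))   # ['AB', 'CD/']
print(split_first_non_alpha('ABCD'))     # ['ABCD']
```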
I have the string below, and I want to get lists, dicts, and variables from it.
How can I split this string into a specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall(r'(?=.*,)(.*?=\[.+?\],?)', s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle the type casting for values (regex, split, or try with casting may help).
Also, as others commented, using a dict may be easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
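For the type casting mentioned above, one option is `ast.literal_eval` with a fallback to the raw string. This is only a sketch: `cast_value` is a hypothetical helper, and values such as `{a:2,b:3}` are not valid Python literals, so they stay strings here.

```python
import ast

def cast_value(text):
    """Try to turn text into a Python literal; fall back to the original string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text

raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'dict_a': '{a:2,b:3}'}
typed = {name: cast_value(value) for name, value in raw.items()}
print(typed)
# {'list_c': [1, 2], 'a': 3, 'b': 1.3, 'c': 'abch', 'dict_a': '{a:2,b:3}'}
```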
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
One answer that also handles bare names without a value looks like this:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i, j in m1:
    temp = i.strip(',').split(',')
    if len(temp) > 1:
        # names without a value (e.g. Save, Record) get an empty string
        for k in temp[:-1]:
            temp_d[k] = ''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
#note if you want all valid variable names: r'([a-z]|[A-Z]|[0-9]|_)+'
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that chops up s step by step, using the
above list to partition the string:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1]) #strip leading bits
    return mystr.partition(mylist[0])[2][cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1 #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
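As a sketch of that last step, `str.partition` can turn each element into a key and a (still unparsed) string value without resorting to `exec`:

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

pairs = {}
for item in result:
    # partition splits on the first '=' only, so values containing '=' stay whole
    key, _, value = item.partition('=')
    pairs[key] = value

print(pairs)
# {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
```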
Let's say I have the list ['abc', 'def', 'gh']. I need to get a string made of the first char of the first string, the first of the second, and so on.
So the result would look like this: "adgbehcf". But the problem is that the last string in the array could have two chars or one.
I already tried a nested for loop, but that didn't work.
Code:
n = 3 # The encryption number
for i in range(n):
    x = [s[i] for s in partiallyEncrypted]
    fullyEncrypted.append(x)
A version using itertools.zip_longest:
from itertools import zip_longest
lst = ['abc', 'def', 'gh']
strg = ''.join(''.join(item) for item in zip_longest(*lst, fillvalue=''))
print(strg)
To get an idea of why this works, it may help to have a look at:
for tpl in zip_longest(*lst, fillvalue=''):
    print(tpl)
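For the list above, the intermediate tuples look like this (a small sketch; the shorter string is padded with the empty `fillvalue`, which is why joining works without leftovers):

```python
from itertools import zip_longest

lst = ['abc', 'def', 'gh']
columns = list(zip_longest(*lst, fillvalue=''))
print(columns)
# [('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', '')]

print(''.join(''.join(col) for col in columns))   # adgbehcf
```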
I guess you can use:
from itertools import zip_longest  # named izip_longest in Python 2
l = ['abc', 'def', 'gh']
print("".join(filter(None, [i for sub in zip_longest(*l) for i in sub])))
# adgbehcf
Having:
l = ['abc', 'def', 'gh']
This would work:
s = ''
In [18]: for j in range(0, len(max(l, key=len))):
    ...:     for elem in l:
    ...:         if len(elem) > j:
    ...:             s += elem[j]
In [28]: s
Out[28]: 'adgbehcf'
Please don't use this:
''.join(''.join(y) for y in zip(*x)) +
''.join(y[-1] for y in x if len(y) == max(len(j) for j in x))
I need to split text before the second occurrence of the '-' character. What I have now is producing inconsistent results. I've tried various combinations of rsplit and read through and tried other solutions on SO, with no results.
Sample file name to split: 'some-sample-filename-to-split' returned in data.filename. In this case, I would only like to have 'some-sample' returned.
fname, extname = os.path.splitext(data.filename)
file_label = fname.rsplit('/',1)[-1]
file_label2 = file_label.rsplit('-',maxsplit=3)
print(file_label2,'\n','---------------','\n')
You can do something like this:
>>> a = "some-sample-filename-to-split"
>>> "-".join(a.split("-", 2)[:2])
'some-sample'
a.split("-", 2) will split the string at most twice, i.e. up to the second occurrence of -.
a.split("-", 2)[:2] will give the first 2 elements in the list. Then simply join the first 2 elements.
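Step by step, the intermediate values are (a sketch using the sample file name from the question):

```python
a = "some-sample-filename-to-split"

parts = a.split("-", 2)      # at most 2 splits, so at most 3 pieces
print(parts)                 # ['some', 'sample', 'filename-to-split']

first_two = parts[:2]
print(first_two)             # ['some', 'sample']

print("-".join(first_two))   # some-sample
```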
OR
You could use a regular expression: ^([\w]+-[\w]+)
>>> import re
>>> reg = r'^([\w]+-[\w]+)'
>>> re.match(reg, a).group()
'some-sample'
EDIT: As discussed in the comments, here is what you need:
def hyphen_split(a):
    if a.count("-") == 1:
        return a.split("-")[0]
    return "-".join(a.split("-", 2)[:2])
>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'
A generic form to split a string into halves on the nth occurrence of the separator would be:
def split(strng, sep, pos):
    strng = strng.split(sep)
    return sep.join(strng[:pos]), sep.join(strng[pos:])
If pos is negative it will count the occurrences from the end of string.
>>> strng = 'some-sample-filename-to-split'
>>> split(strng, '-', 3)
('some-sample-filename', 'to-split')
>>> split(strng, '-', -4)
('some', 'sample-filename-to-split')
>>> split(strng, '-', 1000)
('some-sample-filename-to-split', '')
>>> split(strng, '-', -1000)
('', 'some-sample-filename-to-split')
You can use str.index():
def hyphen_split(s):
    pos = s.index('-')
    try:
        return s[:s.index('-', pos + 1)]
    except ValueError:
        return s[:pos]
test:
>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'
You could use regular expressions:
import re
file_label = re.search('(.*?-.*?)-', fname).group(1)
When working with a DataFrame where the split is needed
for every value in a column, a lambda function works better than a regex:
df['column_name'].apply(lambda x: "-".join(x.split('-',2)[:2]))
Here's a somewhat cryptic implementation avoiding the use of join():
from functools import reduce  # reduce is no longer a builtin in Python 3
def split(string, sep, n):
    """Split `string` at the `n`th occurrence of `sep`"""
    pos = reduce(lambda x, _: string.index(sep, x + 1), range(n + 1), -1)
    return string[:pos], string[pos + len(sep):]
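For comparison, a more explicit loop-based sketch of the same idea. It assumes, like the `reduce` version, that `n` is zero-based (`n=0` means the first occurrence); the name `split_at_nth` is hypothetical.

```python
def split_at_nth(string, sep, n):
    """Split string at the n-th (zero-based) occurrence of sep."""
    pos = -1
    for _ in range(n + 1):
        # str.index raises ValueError if there are fewer than n + 1 occurrences
        pos = string.index(sep, pos + 1)
    return string[:pos], string[pos + len(sep):]

print(split_at_nth('some-sample-filename-to-split', '-', 1))
# ('some-sample', 'filename-to-split')
```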
I am trying to convert 10000000C9ABCDEF to 10:00:00:00:c9:ab:cd:ef
This is needed because the 10000000C9ABCDEF format is how I see HBAs (host bus adapters) when I log in to my storage arrays. But the SAN switches understand the 10:00:00:00:c9:ab:cd:ef notation.
I have only been able to accomplish till the following:
#script to convert WWNs to lowercase and add the :.
def wwn_convert():
    while True:
        wwn = (input('Enter the WWN or q to quit- '))
        list_wwn = list(wwn)
        list_wwn = [x.lower() for x in list_wwn]
        lower_wwn = ''.join(list_wwn)
        print(lower_wwn)
        if wwn == 'q':
            break
wwn_convert()
I tried ':'.join, but that inserts : after each character, so I get 1:0:0:0:0:0:0:0:c:9:a:b:c:d:e:f
I want the .join to work in a loop, something like for i in range(0, 15, 2), so that it inserts the : after every two characters, but I'm not quite sure how to go about it. (It's nice that Python lets me loop in steps of 2, or any number I want.)
Additionally, I would be thankful if someone could give me pointers on how to script this better...
Please help.
I am using Python Version 3.2.2 on Windows 7 (64 Bit)
Here is another option:
>>> s = '10000000c9abcdef'
>>> ':'.join(a + b for a, b in zip(*[iter(s)]*2))
'10:00:00:00:c9:ab:cd:ef'
Or even more concise:
>>> import re
>>> ':'.join(re.findall('..', s))
'10:00:00:00:c9:ab:cd:ef'
>>> s = '10000000C9ABCDEF'
>>> ':'.join([s[x:x+2] for x in range(0, len(s)-1, 2)])
'10:00:00:00:C9:AB:CD:EF'
Explanation:
':'.join(...) returns a new string inserting ':' between the parts of the iterable
s[x:x+2] returns a substring of length 2 starting at x from s
range(0, len(s) - 1, 2) returns a list of integers with a step of 2
so the list comprehension would split the string s in substrings of length 2, then the join would put them back together but inserting ':' between them.
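Putting the pieces together for the original question, a hedged sketch of a helper that lowercases the WWN and inserts the colons (`format_wwn` is a hypothetical name, and the even-length check is an added assumption about valid input):

```python
def format_wwn(wwn):
    """Convert '10000000C9ABCDEF' to '10:00:00:00:c9:ab:cd:ef'."""
    wwn = wwn.strip().lower()
    if len(wwn) % 2:
        raise ValueError("expected an even number of hex digits")
    return ':'.join(wwn[i:i + 2] for i in range(0, len(wwn), 2))

print(format_wwn('10000000C9ABCDEF'))   # 10:00:00:00:c9:ab:cd:ef
```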
>>> s='10000000C9ABCDEF'
>>> si=iter(s)
>>> ':'.join(c.lower()+next(si).lower() for c in si)
'10:00:00:00:c9:ab:cd:ef'
In lambda form:
>>> (lambda x: ':'.join(c.lower()+next(x).lower() for c in x))(iter(s))
'10:00:00:00:c9:ab:cd:ef'
I think what would help you out the most is a construct in Python called a slice. You can use slices on any sequence (strings, lists, tuples), which makes them quite useful and generally a very good thing to know how to use.
>>> s = '10000000C9ABCDEF'
>>> [s.lower()[i:i+2] for i in range(0, len(s)-1, 2)]
['10', '00', '00', '00', 'c9', 'ab', 'cd', 'ef']
>>> ':'.join([s.lower()[i:i+2] for i in range(0, len(s)-1, 2)])
'10:00:00:00:c9:ab:cd:ef'
If you'd like to read some more about slices, they're explained very nicely in this question, as well as a part of the actual python documentation.
It can be done using the grouper recipe from the itertools documentation.
from itertools import zip_longest  # named izip_longest in Python 2
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
Using this function, the code will look like:
def join(it):
    for el in it:
        yield ''.join(el)
':'.join(join(grouper(2, s)))
It works this way:
grouper(2,s) returns tuples '1234...' -> ('1','2'), ('3','4') ...
def join(it) does this: ('1','2'), ('3','4') ... -> '12', '34' ...
':'.join(...) creates a string from iterator: '12', '34' ... -> '12:34...'
Also, it may be rewritten as:
':'.join(''.join(el) for el in grouper(2, s))
Here is my simple, straightforward solution:
s = '10000000c9abcdef'
new_s = str()
for i in range(0, len(s)-1, 2):
    new_s += s[i:i+2]
    if i+2 < len(s):
        new_s += ':'
>>> new_s
'10:00:00:00:c9:ab:cd:ef'