Split string by count of characters

I can't figure out how to do this with string methods:
In my file I have something like 1.012345e0070.123414e-004-0.1234567891.21423... which means there is no delimiter between the numbers.
Now if I read a line from this file I get a string like above which I want to split after e.g. 12 characters.
There is no way to do this with something like str.split() or any other string method as far as I've seen but maybe I'm overlooking something?
Thx

Since you want to iterate in an unusual way, a generator is a good way to abstract that:
def chunks(s, n):
    """Produce `n`-character chunks from `s`."""
    for start in range(0, len(s), n):
        yield s[start:start+n]
nums = "1.012345e0070.123414e-004-0.1234567891.21423"
for chunk in chunks(nums, 12):
    print chunk
produces:
1.012345e007
0.123414e-00
4-0.12345678
91.21423
(which doesn't look right, but those are the 12-char chunks)

You're looking for string slicing.
>>> x = "1.012345e0070.123414e-004-0.1234567891.21423"
>>> x[2:10]
'012345e0'

line = "1.012345e0070.123414e-004-0.1234567891.21423"
firstNumber = line[:12]
restOfLine = line[12:]
print firstNumber
print restOfLine
Output:
1.012345e007
0.123414e-004-0.1234567891.21423

You can do it like this:
step = 12
for i in range(0, len(string), step):
    chunk = string[i:i + step]
This way, each iteration gives you one slice of 12 characters (the last one may be shorter).

from itertools import izip_longest
def grouper(n, iterable, padvalue=None):
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
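On Python 3, `izip_longest` was renamed to `itertools.zip_longest`. A minimal sketch of the same grouping idea applied to a string, where `chunk_string` is a hypothetical helper name and the empty-string fill value is an assumption so that the last chunk can be shorter than `n`:

```python
from itertools import zip_longest

def chunk_string(s, n):
    """Split `s` into `n`-character strings; the last one may be shorter."""
    # n references to the same iterator make zip_longest pull n characters per tuple.
    groups = zip_longest(*[iter(s)] * n, fillvalue="")
    return ["".join(group) for group in groups]

nums = "1.012345e0070.123414e-004-0.1234567891.21423"
print(chunk_string(nums, 12))
```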

I stumbled on this while looking for a solution for a similar problem - but in my case I wanted to split string into chunks of differing lengths. Eventually I solved it with RE
In [13]: import re
In [14]: random_val = '07eb8010e539e2621cb100e4f33a2ff9'
In [15]: dashmap=(8, 4, 4, 4, 12)
In [16]: re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), random_val)
Out[16]: [('07eb8010', 'e539', 'e262', '1cb1', '00e4f33a2ff9')]
Bonus
For those who may find it interesting - I tried to create pseudo-random ID by specific rules, so this code is actually part of the following function
import re, time, random
def random_id_from_time_hash(dashmap=(8, 4, 4, 4, 12)):
    random_val = ''
    while len(random_val) < sum(dashmap):
        random_val += '{:016x}'.format(hash(time.time() * random.randint(1, 1000)))
    return '-'.join(re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), random_val)[0])
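The same differing-length split can also be sketched with plain slicing instead of a regular expression. `split_by_lengths` is a hypothetical helper, and ignoring any characters beyond `sum(lengths)` is an assumption chosen to mirror the regex match above:

```python
from itertools import accumulate

def split_by_lengths(s, lengths):
    """Cut `s` into consecutive pieces whose sizes are given by `lengths`."""
    ends = list(accumulate(lengths))   # running end offsets: 8, 12, 16, ...
    starts = [0] + ends[:-1]           # each piece starts where the previous one ended
    return [s[a:b] for a, b in zip(starts, ends)]

random_val = '07eb8010e539e2621cb100e4f33a2ff9'
print('-'.join(split_by_lengths(random_val, (8, 4, 4, 4, 12))))
```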

I always thought that, since string addition is defined by simple logic, division could work the same way: dividing a string by a number should split it into pieces of that length. So maybe this is what you are looking for.
class MyString:
    def __init__(self, string):
        self.string = string
    def __div__(self, div):
        l = []
        for i in range(0, len(self.string), div):
            l.append(self.string[i:i+div])
        return l
>>> s = 'abcbdbfbfbfb'
>>> m = MyString(s)
>>> m/3
['abc', 'bdb', 'fbf', 'bfb']
>>> m = MyString('abcd')
>>> m/3
['abc', 'd']
If you don't want to create an entirely new class, simply use this function that re-wraps the core of the above code,
>>> def string_divide(string, div):
...     l = []
...     for i in range(0, len(string), div):
...         l.append(string[i:i+div])
...     return l
>>> string_divide('abcdefghijklmnopqrstuvwxyz', 15)
['abcdefghijklmno', 'pqrstuvwxyz']
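Note that `__div__` is only consulted by Python 2; on Python 3 the `/` operator calls `__truediv__`. A sketch of the same idea for Python 3, with `DivisibleString` as a hypothetical name and subclassing `str` as an assumption rather than part of the answer above:

```python
class DivisibleString(str):
    """A str subclass where `s / n` yields the list of n-character pieces."""

    def __truediv__(self, n):
        # Step through the string n characters at a time and collect plain str slices.
        return [str(self[i:i + n]) for i in range(0, len(self), n)]

print(DivisibleString('abcd') / 3)
```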

Try this loop:
x = "1.012345e0070.123414e-004-0.1234567891.21423"
while len(x)>0:
    v = x[:12]
    print v
    x = x[12:]

Related

n length combinations from two or more digits with repetition limit

I have the letters a & b, and from them I want to build all strings of length n in which a and b each appear a fixed number of times.
For example, if n = 7, a = 4 and b = 3, the desired outcomes starting with 'b' would be:
bbbaaaa
bbabaaa
bbaabaa
bbaaaba
bbaaaab
babbaaa
bababaa
babaaba
babaaab
baabbaa
baababa
baabaab
baaabba
baaabab
baaaabb
I've looked into a lot of python & c functions, but none do exactly what I'm asking, and I don't know how to alter/use them into doing so.
What I initially thought of was storing all possible combinations and then picking them where a=(length) of them. However, that easily runs into memory issues...
Thanks

Use itertools.permutations() on the string 'aaaabbb'. It's not "efficient" and you'd need to remove duplicates.
from itertools import permutations
for l in set(permutations('a'*4 + 'b'*3, 7)):
    print(*l, sep='')
babbaaa
abaabba
bbaaaab
aaabbba
ababaab
abaaabb
baaaabb
babaaba
aababba
baaabba
aabbaab
abbbaaa
abbaaba
baababa
bababaa
aabaabb
aaabbab
abaabab
bbabaaa
baaabab
aaaabbb
aabbbaa
bbbaaaa
baabbaa
babaaab
aababab
abbabaa
bbaaaba
abababa
baabaab
aaababb
abbaaab
bbaabaa
ababbaa
aabbaba
Generalised into a function:
from itertools import permutations
def f(**kwargs):
    population = ''.join(s*n for s,n in kwargs.items())
    return (''.join(l) for l in set(permutations(population, len(population))))
>>> f(a=3, b=4)
<generator object f.<locals>.<genexpr> at 0x7fc1ec51fd60>
>>> list(f(a=3, b=4))
['aabbabb', 'bbaaabb', 'bbbbaaa', 'aaabbbb', 'bbbaaab', 'abaabbb', 'bbbaaba', 'baabbab', 'babbaab', 'bbabbaa', 'babaabb', 'babbaba', 'baaabbb', 'aabbbab', 'aabbbba', 'baabbba', 'bbaabab', 'baababb', 'bbabaab', 'aababbb', 'abbbbaa', 'bbaabba', 'bbababa', 'abbabab', 'abababb', 'bababab', 'abbabba', 'bababba', 'abbbaab', 'abbbaba', 'abbaabb', 'babbbaa', 'bbbabaa', 'ababbab', 'ababbba']
>>> print(*(f(a=3, b=4)))
aabbabb bbaaabb bbbbaaa aaabbbb bbbaaab abaabbb bbbaaba baabbab babbaab bbabbaa babaabb babbaba baaabbb aabbbab aabbbba baabbba bbaabab baababb bbabaab aababbb abbbbaa bbaabba bbababa abbabab abababb bababab abbabba bababba abbbaab abbbaba abbaabb babbbaa bbbabaa ababbab ababbba
>>> list(f(a=1,b=1,c=1))
['cab', 'bac', 'abc', 'acb', 'bca', 'cba']

You are looking for permutations, not combinations. Then you cast it as a set to get rid of identical permutations.
import itertools as it
def find_combos(n,a,b):
    lst = ["a"]*a + ["b"]*b
    return set(it.permutations(lst))
for p in find_combos(7,4,3):
    print(p)
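As a sanity check on the set-based approach, the number of distinct arrangements of a multiset is given by the multinomial coefficient, here 7!/(4!·3!) = 35. A small sketch comparing that count with the deduplicated permutations; `count_arrangements` is a hypothetical helper:

```python
from itertools import permutations
from math import factorial

def count_arrangements(*counts):
    """Multinomial coefficient: (sum of counts)! divided by each count!."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

distinct = set(permutations('a' * 4 + 'b' * 3))
print(len(distinct), count_arrangements(4, 3))
```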

I believe your most efficient method is going to be to use the combination values from a range() as the position values for inserting new characters. Also, by using a recursive function, I believe we can write a function to accommodate any size alphabet.
from itertools import combinations
letters = 'abcdefghijklmnop'
def combos(*sizes,level=[]):
    a = sum(sizes[len(level):])
    b = sizes[len(level)]
    if a!=b:
        for i in combinations(range(a),b):
            # recurse into combos itself, carrying the positions chosen so far
            for r in combos(*sizes, level=level + [i]):
                yield r
    else:
        r = [letters[len(sizes)-1]]*sizes[-1]
        for l,c in reversed(list(zip(level,letters))):
            for i in l:
                r.insert(i,c)
        yield ''.join(r)
print(list(combos(3,4)))
print(list(combos(2,2)))
print(list(combos(2,1,2)))

Repeating characters results in wrong repetition counts

My function looks like this:
def accum(s):
    a = []
    for i in s:
        b = s.index(i)
        a.append(i * (b+1))
    x = "-".join(a)
    return x.title()
with the expected input of:
'abcd'
the output should be and is:
'A-Bb-Ccc-Dddd'
but if the input has a recurring character:
'abccba'
it returns:
'A-Bb-Ccc-Ccc-Bb-A'
instead of:
'A-Bb-Ccc-Cccc-Bbbbb-Aaaaaa'
how can I fix this?

Don't use str.index(), it'll return the first match. Since c and b and a appear early in the string you get 2, 1 and 0 back regardless of the position of the current letter.
Use the enumerate() function to give you a position counter instead:
for i, letter in enumerate(s, 1):
    a.append(i * letter)
The second argument is the starting value; setting this to 1 means you can avoid having to + 1 later on. See What does enumerate mean? if you need more details on what enumerate() does.
You can use a list comprehension here rather than use list.append() calls:
def accum(s):
    a = [i * letter for i, letter in enumerate(s, 1)]
    x = "-".join(a)
    return x.title()
which could, at a pinch, be turned into a one-liner:
def accum(s):
    return '-'.join([i * c for i, c in enumerate(s, 1)]).title()
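To see the difference concretely, this sketch prints the position reported by `str.index()` next to the one supplied by `enumerate()` for the repeated-letter input from the question:

```python
s = 'abccba'

# str.index() always reports the first match, so repeated letters collapse
# onto their earliest position; enumerate() tracks the real position.
first_match = [s.index(letter) for letter in s]
real_position = [i for i, _ in enumerate(s)]

print(first_match)    # [0, 1, 2, 2, 1, 0]
print(real_position)  # [0, 1, 2, 3, 4, 5]
```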

This is because s.index(i) returns the first index at which the character occurs. You can use enumerate to pair elements with their indices.
Here is a Pythonic solution:
def accum(s):
    return "-".join(c*(i+1) for i, c in enumerate(s)).title()

simple:
def accum(s):
    a = []
    for i in range(len(s)):
        a.append(s[i]*(i+1))
    x = "-".join(a)
    return x.title()

Splitting a string before the nth occurrence of a character [duplicate]

Is there a Python-way to split a string after the nth occurrence of a given delimiter?
Given a string:
'20_231_myString_234'
It should be split into (with the delimiter being '_', after its second occurrence):
['20_231', 'myString_234']
Or is the only way to accomplish this to count, split and join?

>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')
This seems to be the most readable way; the alternative is a regex.

Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:
import re
n=2
s='20_231_myString_234'
m=re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n-1), s)
if m: print m.groups()
or have a nice function:
import re
def nthofchar(s, c, n):
    regex=r'^((?:[^%c]*%c){%d}[^%c]*)%c(.*)' % (c,c,n-1,c,c)
    l = ()
    m = re.match(regex, s)
    if m: l = m.groups()
    return l
s='20_231_myString_234'
print nthofchar(s, '_', 2)
Or without regexes, using iterative find:
def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]
s1, s2 = nth_split('20_231_myString_234', '_', 2)
print s1, ":", s2

I like this solution because it needs only a trivial regex and can easily be adapted to another "nth" or delimiter.
import re
string = "20_231_myString_234"
occur = 2 # on which occurrence you want to split
indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]
print (part1, ' ', part2)

I thought I would contribute my two cents. The second parameter to split() lets you stop splitting after a given number of splits:
def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r
On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.
It's worth noting that it can become even quicker if you don't need the initial part. Then the code becomes:
def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]
I'm not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.
I put up my testing script online. You are welcome to review and comment.
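One caveat with indexing the result of `split(delim, n)` at `[n]`: if the string contains fewer than `n` delimiters, it raises `IndexError`. A hedged sketch that makes this case explicit; `split_at_checked` is a hypothetical name and raising `ValueError` is an assumed policy, not part of the answer above:

```python
def split_at_checked(s, delim, n):
    """Split `s` at the n-th occurrence of `delim` (n >= 1)."""
    parts = s.split(delim, n)          # at most n splits, so at most n + 1 parts
    if len(parts) <= n:
        raise ValueError("fewer than %d occurrences of %r" % (n, delim))
    tail = parts[n]
    # Everything before the n-th delimiter is the head.
    head = s[:len(s) - len(tail) - len(delim)]
    return head, tail

print(split_at_checked('20_231_myString_234', '_', 2))
```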

>>> import re
>>> str = '20_231_myString_234'
>>> occurrence = [m.start() for m in re.finditer('_', str)] # this will give you a list of '_' positions
>>> occurrence
[2, 6, 15]
>>> result = [str[:occurrence[1]], str[occurrence[1]+1:]] # [str[:6], str[7:]]
>>> result
['20_231', 'myString_234']

It depends on the pattern you want to split on. If, for example, the first two elements are always numbers, you can build a regular expression and use the re module, which is able to split your string as well.

I had a larger string to split at every nth occurrence of a separator, and ended up with the following code:
# Split every 6 spaces
n = 6
sep = ' '
n_split_groups = []
groups = err_str.split(sep)
while len(groups):
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]
print n_split_groups
Thanks @perreal!

In function form of @AllBlackt's solution:
def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while len(groups):
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups
s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print (split_nth(s, " ", 2))
['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']

As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter maxsplit to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:
s = '20_231_myString_234'
first_part = s.rsplit('_', 2)[0] # Gives '20_231'
second_part = s.split('_', 2)[2] # Gives 'myString_234'
This is not only simple, but also avoids performance hits of regex solutions and other solutions using join to undo unnecessary splits.

Converting integer to digit list [duplicate]

What is the quickest and cleanest way to convert an integer into a list?
For example, change 132 into [1,3,2] and 23 into [2,3]. I have a variable which is an int, and I want to be able to compare the individual digits so I thought making it into a list would be best, since I can just do int(number[0]), int(number[1]) to easily convert the list element back into int for digit operations.

Convert the integer to string first, and then use map to apply int on it:
>>> num = 132
>>> map(int, str(num)) #note, This will return a map object in python 3.
[1, 3, 2]
or using a list comprehension:
>>> [int(x) for x in str(num)]
[1, 3, 2]
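On Python 3, `map()` returns a lazy iterator, so it needs wrapping in `list()` to get the list shown above. A small sketch, where `digits` is a hypothetical helper and dropping the sign of negative numbers is an assumption:

```python
def digits(number):
    """Return the decimal digits of an integer as a list of ints."""
    # abs() drops a leading '-' that int() could not convert on its own.
    return list(map(int, str(abs(number))))

print(digits(132))
print(digits(23))
```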

Several good methods have already been mentioned on this page, but it is not obvious which one to use, so I have added some measurements to help you decide for yourself.
A large number has been used (for overhead): 1111111111111122222222222222222333333333333333333333
Using map(int, str(num)):
import timeit
def method():
    num = 1111111111111122222222222222222333333333333333333333
    return map(int, str(num))
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.018631496999999997
Using list comprehension:
import timeit
def method():
    num = 1111111111111122222222222222222333333333333333333333
    return [int(x) for x in str(num)]
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.28403817900000006
Code taken from this answer
The results show that the first method involving inbuilt methods is much faster than list comprehension.
The "mathematical way":
import timeit
def method():
    q = 1111111111111122222222222222222333333333333333333333
    ret = []
    while q != 0:
        q, r = divmod(q, 10) # Divide by 10, see the remainder
        ret.insert(0, r) # The remainder is the first to the right digit
    return ret
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.38133582499999996
Code taken from this answer
The list(str(123)) method (does not provide the right output):
import timeit
def method():
    return list(str(1111111111111122222222222222222333333333333333333333))
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.028560138000000013
Code taken from this answer
The answer by Duberly González Molinari:
import timeit
def method():
    n = 1111111111111122222222222222222333333333333333333333
    l = []
    while n != 0:
        l = [n % 10] + l
        n = n // 10
    return l
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.37039988200000007
Code taken from this answer
Remarks:
In these measurements map(int, str(num)) is the fastest method, and the list comprehension is the next fastest of the methods that produce a list of integers. Note, however, that on Python 3 map() returns a lazy iterator, so the map timing above does not include the actual digit conversion; wrap the call in list() for a fair comparison.
Those that reinvent the wheel are interesting but are probably not so desirable in real use.
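A sketch of a like-for-like comparison on Python 3, where `map()` must be wrapped in `list()` to actually perform the conversion. The function names are hypothetical, and no timings are claimed here since they depend on the machine and interpreter:

```python
import timeit

NUM = 1111111111111122222222222222222333333333333333333333

def with_map():
    return list(map(int, str(NUM)))   # list() forces the conversion to run

def with_comprehension():
    return [int(x) for x in str(NUM)]

# timeit.timeit accepts a callable directly, so no setup string is needed.
for func in (with_map, with_comprehension):
    elapsed = timeit.timeit(func, number=10000)
    print(func.__name__, elapsed)
```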

The shortest and best way is already answered, but the first thing I thought of was the mathematical way, so here it is:
def intlist(n):
    q = n
    ret = []
    while q != 0:
        q, r = divmod(q, 10) # Divide by 10, see the remainder
        ret.insert(0, r) # The remainder is the first to the right digit
    return ret
print intlist(3)
print '-'
print intlist(10)
print '--'
print intlist(137)
It's just another interesting approach, you definitely don't have to use such a thing in practical use cases.

n = int(raw_input("n= "))
def int_to_list(n):
    l = []
    while n != 0:
        l = [n % 10] + l
        n = n // 10
    return l
print int_to_list(n)

If you have a string like this: '123456'
and you want a list of integers like this: [1,2,3,4,5,6], use this:
>>> s = '123456'
>>> list1 = [int(i) for i in list(s)]
>>> print(list1)
[1, 2, 3, 4, 5, 6]
or if you want a list of strings like this: ['1','2','3','4','5','6'], use this:
>>> s = '123456'
>>> list1 = list(s)
>>> print(list1)
['1', '2', '3', '4', '5', '6']

Use list on a number converted to string:
In [1]: [int(x) for x in list(str(123))]
Out[1]: [1, 2, 3]

>>> list(map(int, str(number))) # number is a given integer
It returns a list of all the digits of number.

You can first convert the value to a string so that you can iterate over it; then each character can be converted to an integer:
value = 12345
l = [ int(item) for item in str(value) ]

It can be done with a loop in the following way :)
num1 = int(input('Enter the number'))
sum1 = num1  # work on a copy so the original value is not affected
y = []  # list that collects the digits
while True:
    if sum1 == 0:  # stop once no digits are left
        break
    d = sum1 % 10  # the last digit of the integer
    sum1 = sum1 // 10  # drop the last digit, e.g. 4320 // 10 becomes 432
    y.append(d)  # digits are collected last-to-first
y.reverse()  # restore the original digit order
print(y)
Answer becomes
Enter the number2342
[2, 3, 4, 2]

num = 123
print(num)
num = list(str(num))
num = [int(i) for i in num]
print(num)

num = list(str(100))
index = len(num)
while index > 0:
    index -= 1
    num[index] = int(num[index])
print(num)
It prints [1, 0, 0].

This takes an integer as input and converts it into a list of its digits (as one-character strings).
Code:
num = int(input())
print(list(str(num)))
Output using 156789:
['1', '5', '6', '7', '8', '9']

Encoding a numeric string into a shortened alphanumeric string, and back again

Quick question. I'm trying to find or write an encoder in Python to shorten a string of numbers by using upper and lower case letters. The numeric strings look something like this:
20120425161608678259146181504021022591461815040210220120425161608667
The length is always the same.
My initial thought was to write some simple encoder to utilize upper and lower case letters and numbers to shorten this string into something that looks more like this:
a26Dkd38JK
That was completely arbitrary, just trying to be as clear as possible.
I'm certain that there is a really slick way to do this, probably already built in. Maybe this is an embarrassing question to even be asking.
Also, I need to be able to take the shortened string and convert it back to the longer numeric value.
Should I write something and post the code, or is this a one line built in function of Python that I should already know about?
Thanks!

This is a pretty good compression:
import base64
def num_to_alpha(num):
    num = hex(num)[2:].rstrip("L")
    if len(num) % 2:
        num = "0" + num
    return base64.b64encode(num.decode('hex'))
It first turns the integer into a bytestring and then base64 encodes it. Here's the decoder:
def alpha_to_num(alpha):
    num_bytes = base64.b64decode(alpha)
    return int(num_bytes.encode('hex'), 16)
Example:
>>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
>>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
20120425161608678259146181504021022591461815040210220120425161608667
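The code above relies on Python 2's `str.decode('hex')`. A sketch of the same approach for Python 3 using `int.to_bytes()` and `int.from_bytes()`; the function names with the `_py3` suffix are hypothetical, and non-negative input is assumed:

```python
import base64

def num_to_alpha_py3(num):
    """Encode a non-negative int as base64 text."""
    length = max(1, (num.bit_length() + 7) // 8)   # minimal byte count, at least 1
    return base64.b64encode(num.to_bytes(length, 'big')).decode('ascii')

def alpha_to_num_py3(alpha):
    """Decode base64 text produced by num_to_alpha_py3 back to an int."""
    return int.from_bytes(base64.b64decode(alpha), 'big')

n = 20120425161608678259146181504021022591461815040210220120425161608667
encoded = num_to_alpha_py3(n)
print(encoded, alpha_to_num_py3(encoded) == n)
```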

Here are two custom functions (not based on base64) that produce shorter output:
chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = len(chrs)
def int_to_cust(i):
    result = ''
    while i:
        result = chrs[i % l] + result
        i = i // l
    if not result:
        result = chrs[0]
    return result
def cust_to_int(s):
    result = 0
    for char in s:
        result = result * l + chrs.find(char)
    return result
And the results are:
>>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
'9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
>>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
20120425161608678259146181504021022591461815040210220120425161608667L
You can also shorten the generated string, if you add other characters to the chrs variable.

Do it with a class:
VALID_CHRS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
BASE = len(VALID_CHRS)
MAP_CHRS = {k: v
            for k, v in zip(VALID_CHRS, range(BASE))}
class TinyNum:
    """Compact number representation in alphanumeric characters."""
    def __init__(self, n):
        result = ''
        while n:
            result = VALID_CHRS[n % BASE] + result
            n //= BASE
        if not result:
            result = VALID_CHRS[0]
        self.num = result
    def to_int(self):
        """Return the number as an int."""
        result = 0
        for char in self.num:
            result = result * BASE + MAP_CHRS[char]
        return result
Sample usage:
>>> n = 4590823745
>>> tn = TinyNum(n)
>>> print(n)
4590823745
>>> print(tn.num)
50GCYh
>>> print(tn.to_int())
4590823745
(Based on Tadeck's answer.)

>>> s="20120425161608678259146181504021022591461815040210220120425161608667"
>>> import base64, zlib
>>> base64.b64encode(zlib.compress(s))
'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
>>> zlib.decompress(base64.b64decode(_))
'20120425161608678259146181504021022591461815040210220120425161608667'
so zlib isn't real smart at compressing strings of digits :(
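On Python 3, `zlib.compress()` requires bytes rather than `str`, so the digits have to be encoded first. A sketch of the same round trip under that assumption:

```python
import base64
import zlib

s = "20120425161608678259146181504021022591461815040210220120425161608667"

# Encode to bytes before compressing, and decode back to str after decompressing.
packed = base64.b64encode(zlib.compress(s.encode('ascii')))
restored = zlib.decompress(base64.b64decode(packed)).decode('ascii')

print(len(s), len(packed), restored == s)
```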
