Split String sequentially every possible split - python

I'm struggling with ideas for this python script:
I have a string of characters, say abcdefghijklmnopqrstuvwxyz
I need to split these into a list, with 7 characters each, resulting in a list that has
[abcdefg], [bcdefgh], [cdefghi], ... , [tuvwxyz]
as elements.
I have this method, but it currently outputs
['abcdefg', 'hijklmn', 'opqrstu', 'vwxyz']
...
def chunksOf7(toSplit):
    chunks = [toSplit[i:i+7] for i in range(0, len(toSplit), 7)]
    print(chunks)
Any ideas?

You can use a list comprehension to iterate over slices of the string of length 7.
>>> s = 'abcdefghijklmnopqrstuvwxyz'
>>> [s[i:i+7] for i in range(len(s)-6)]
['abcdefg', 'bcdefgh', 'cdefghi', 'defghij', 'efghijk', 'fghijkl', 'ghijklm', 'hijklmn', 'ijklmno', 'jklmnop', 'klmnopq', 'lmnopqr', 'mnopqrs', 'nopqrst', 'opqrstu', 'pqrstuv', 'qrstuvw', 'rstuvwx', 'stuvwxy', 'tuvwxyz']

If I understand correctly, you want each output chunk to shift by only one character. So instead of iterating over range(0, len(toSplit), 7) in steps of 7, iterate over range(0, len(toSplit)-6) in steps of 1.

A simple way:
>>> import string
>>> to_split = string.ascii_lowercase
>>> [to_split[i:i+7] for i in range(0, len(to_split)-6)]
['abcdefg', 'bcdefgh', 'cdefghi', 'defghij', 'efghijk', 'fghijkl', 'ghijklm', 'hijklmn', 'ijklmno', 'jklmnop', 'klmnopq', 'lmnopqr', 'mnopqrs', 'nopqrst', 'opqrstu', 'pqrstuv', 'qrstuvw', 'rstuvwx', 'stuvwxy', 'tuvwxyz']
As a function:
def chunksOf7(to_split):
    return [
        to_split[i:i+7]
        for i in range(0, len(to_split)-6)]
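The same idea can be sketched for any window size. The name sliding_windows and its size parameter are assumptions made for illustration; they do not appear in the answers above.

```python
def sliding_windows(text, size):
    """Return every contiguous substring of length `size`, shifting by one character."""
    if size <= 0:
        raise ValueError("size must be positive")
    # range() is empty when the string is shorter than the window,
    # so short inputs simply give an empty list.
    return [text[i:i + size] for i in range(len(text) - size + 1)]


windows = sliding_windows('abcdefghijklmnopqrstuvwxyz', 7)
print(windows[0], windows[-1], len(windows))  # abcdefg tuvwxyz 20
```

With size=7 this should produce the same list as chunksOf7 above.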

Related

Splitting string into two by comma using python

I have the following data in a list, and it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split this when there are delimiters in between, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
I assume you want to split after every 6th character.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using a list comprehension, which works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
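One difference between the slicing and regex approaches is worth a quick sketch: when the length is not a multiple of 6, slicing keeps the short trailing piece, while re.findall with \w{6} drops it. The helper name chunks and the longer sample value are assumptions made for illustration.

```python
import re


def chunks(text, size):
    """Split text into consecutive pieces of `size` characters; the last piece may be shorter."""
    return [text[i:i + size] for i in range(0, len(text), size)]


value = 'aaaaa955554e12'  # hypothetical input whose length is not a multiple of 6
print(chunks(value, 6))             # ['aaaaa9', '55554e', '12']
print(re.findall(r'\w{6}', value))  # ['aaaaa9', '55554e']
print(','.join(chunks(value, 6)))   # aaaaa9,55554e,12
```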
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.

Python - How to add space on each 3 characters?

I need to add a space on each 3 characters of a python string but don't have many clues on how to do it.
The string:
345674655
The output that I need:
345 674 655
Any clues on how to achieve this?
Best Regards,
You just need a way to iterate over your string in chunks of 3.
>>> a = '345674655'
>>> [a[i:i+3] for i in range(0, len(a), 3)]
['345', '674', '655']
Then ' '.join the result.
>>> ' '.join([a[i:i+3] for i in range(0, len(a), 3)])
'345 674 655'
Note that:
>>> [''.join(x) for x in zip(*[iter(a)]*3)]
['345', '674', '655']
also works for partitioning the string. This works for arbitrary iterables (not just strings), but it silently drops the trailing characters when the length isn't divisible by 3. To keep them, you can use itertools.zip_longest (itertools.izip_longest in Python 2), which pads the last chunk with the fill value:
>>> import itertools
>>> [''.join(x) for x in itertools.zip_longest(*[iter(a)]*3, fillvalue=' ')]
['345', '674', '655']
Of course, you pay a little in terms of easy reading for the improved generalization in these latter answers ...
A function based on @mgilson's answer:
def litering_by_three(a):
    # replace ' ' with your separator, e.g. ","
    return ' '.join([a[i:i + 3] for i in range(0, len(a), 3)])
Output example:
>>> x = "500000"
>>> print(litering_by_three(x))
500 000
Or, with a comma as the separator:
>>> def litering_by_three(a):
...     return ','.join([a[i:i + 3] for i in range(0, len(a), 3)])
...
>>> print(litering_by_three(x))
500,000
A one-line solution would be
" ".join(splitAt(x,3))
However, Python has no built-in splitAt() function, so define one yourself:
def splitAt(w,n):
    for i in range(0,len(w),n):
        yield w[i:i+n]
How about reversing the string so that the groups of 3 are counted from the units digit, then reversing the result again? The goal here is to obtain "12 345".
n="12345"
" ".join([n[::-1][i:i+3] for i in range(0, len(n), 3)])[::-1]
Join with ' ' the concatenation of the first, second and third characters of each group of 3:
' '.join(a+b+c for a,b,c in zip(x[::3], x[1::3], x[2::3]))
Make sure the string length is divisible by 3; otherwise zip() drops the trailing characters.
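The right-to-left grouping idea from the reversing answer above can also be sketched without reversing the string, by working out how long the short leading group is. The function name group_from_right and its parameters are assumptions made for illustration.

```python
def group_from_right(digits, size=3, sep=' '):
    """Insert `sep` every `size` characters, counting from the right-hand end."""
    head = len(digits) % size  # length of the short leading group, 0 if there is none
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + size] for i in range(head, len(digits), size)]
    return sep.join(groups)


print(group_from_right('12345'))            # 12 345
print(group_from_right('345674655'))        # 345 674 655
print(group_from_right('500000', sep=','))  # 500,000
```

If the value is an actual integer rather than a string, format(12345, ',') gives '12,345' directly.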

How to concatenate (join) items in a list to a single string

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.
Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'
A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910
It's very useful for beginners to know why join is a string method. It seems strange at first, but it makes sense once you see the reasons:
The result of join is always a string, but the object to be joined can be of many types (generators, lists, tuples, etc.).
.join is typically faster than repeated concatenation with +, because it can allocate memory for the result only once.
Once you learn it, it's very comfortable, and you can do tricks like this to add parentheses:
>>> ",".join("12345").join(("(", ")"))
'(1,2,3,4,5)'
>>> brackets = ["(", ")"]
>>> ",".join("12345").join(brackets)
'(1,2,3,4,5)'
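The point that join accepts any iterable of strings, not just lists, can be sketched as follows. The sample values are made up for illustration.

```python
words = ['this', 'is', 'a', 'sentence']

print('-'.join(tuple(words)))              # this-is-a-sentence
print('-'.join(w.upper() for w in words))  # THIS-IS-A-SENTENCE
print('-'.join({'only': 1}))               # only (iterating a dict yields its keys)

# Every item must already be a string; other types raise TypeError.
try:
    '-'.join([1, 2, 3])
except TypeError as exc:
    print('TypeError:', exc)
```

This is why map(str, ...) is needed when the list contains numbers.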
Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although @Burhan Khalid's answer is good, I think it's more understandable like this:
from string import join  # Python 2 only
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".
list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
# aaabbbccc
string = ','.join(list_abc)
print(string)
# aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
# aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
# aaa
# bbb
# ccc
We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)
We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)
If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert it back to a list with ast.literal_eval:
>>> import ast
>>> ast.literal_eval(st)
[None, 10, 'hello']
If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"
def eggs(someParameter):
    # Replace the item at index 3 in place
    del someParameter[3]
    someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam = ','.join(spam)
print(spam)  # apples,bananas,tofu, and cats.
Without the .join() method you can use a loop like this:
my_list = ["this", "is", "a", "sentence"]
concatenated_string = ""
for i in range(len(my_list)):
    if i == len(my_list) - 1:
        concatenated_string += my_list[i]
    else:
        concatenated_string += f'{my_list[i]}-'
print(concatenated_string)
# this-is-a-sentence
This is a range-based for loop: when Python reaches the last word of the list, it shouldn't add "-" to concatenated_string. For every other word, it appends the word followed by "-".

extracting data from matchobjects

I have a long sequence with multiple repeats of a specific string (say, 'GAATTC') scattered randomly throughout it. I'm currently using the match object's .span() method to get the indices where the pattern 'GAATTC' is found. Now I want to use those indices to cut the sequence between the G and the A of each match (i.e. 'G|AATTC').
How do I use the data from the match object to slice those out?
If I understand you correctly, you have the string and an index where the sequence GAATTC starts, so do you need this (i here is the m.start for the group)?
>>> seq = "GAATTC"
>>> s = "AATCCTGAGAATTCAAC"
>>> i = 8 # the index where seq starts in s
>>> s[i:]
'GAATTCAAC'
>>> s[i:i+len(seq)]
'GAATTC'
That extracts it. You can also slice the original sequence at the G like this:
>>> s[:i+1]
'AATCCTGAG'
>>> s[i+1:]
'AATTCAAC'
If what you want to do is replace the 'GAATTC' by the 'G|AATTC' one (not sure of what you want to do in the end), I think that you can manage this without regex:
>>> string = 'GAATTCAAGAATTCTTGAATTCGAATTCAATATATA'
>>> string.replace('GAATTC', 'G|AATTC')
'G|AATTCAAG|AATTCTTG|AATTCG|AATTCAATATATA'
EDIT: ok, this way can be adapted to suit what you want to do:
>>> groups = string.replace('GAATTC', 'G|AATTC').split('|')
>>> groups
['G', 'AATTCAAG', 'AATTCTTG', 'AATTCG', 'AATTCAATATATA']
>>> list(map(len, groups))
[1, 8, 8, 6, 13]
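Since the question asks about using match objects directly, here is a minimal sketch that takes the start index of each match from re.finditer and slices the sequence there. The function name cut_after_first_base and the offset parameter are assumptions made for illustration.

```python
import re


def cut_after_first_base(sequence, site="GAATTC", offset=1):
    """Split `sequence` `offset` characters into every non-overlapping match of `site`."""
    # m.start() is the index where a match begins; adding the offset
    # places the cut between the G and the A of 'GAATTC'.
    cut_points = [m.start() + offset for m in re.finditer(re.escape(site), sequence)]
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


fragments = cut_after_first_base('GAATTCAAGAATTCTTGAATTCGAATTCAATATATA')
print(fragments)  # ['G', 'AATTCAAG', 'AATTCTTG', 'AATTCG', 'AATTCAATATATA']
```

For this input the result matches the replace-and-split approach above, without needing a marker character that must not occur in the sequence.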

python: keep char only if it is within this list

I have a list:
a = ['a','b','c'.........'A','B','C'.........'Z']
and I have a string:
string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
I want to keep ONLY those characters in string1 that exist in a.
What is the most efficient way to do this? Perhaps instead of having a be a list, I should just make it a string, like a='abcdefg..........ABC..Z'?
This should be faster.
>>> import re
>>> string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
>>> a = ['E', 'i', 'W']
>>> r = re.compile('[^%s]+' % ''.join(a))
>>> print(r.sub('', string1))
EiiWW
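This is even faster than that (Python 2 only, where str.translate accepts a string of characters to delete as its second argument).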
>>> all_else = ''.join( chr(i) for i in range(256) if chr(i) not in set(a) )
>>> string1.translate(None, all_else)
'EiiWW'
44 microsec vs 13 microsec on my laptop.
How about that?
(Edit: turned out, translate yields the best performance.)
''.join([s for s in string1 if s in a])
Explanation:
[s for s in string1 if s in a]
creates a list of all characters in string1, but only if they are also in the list a.
''.join([...])
turns it back into a string by joining it with nothing ('') in between the elements of the given list.
List comprehension to the rescue!
wanted = ''.join(letter for letter in string1 if letter in a)
(Note that when passing a list comprehension to a function you can omit the brackets so that the full list isn't generated before being evaluated. Although it looks like a list comprehension, this form is called a generator expression.)
If you are going to do this with large strings, there is a faster solution using translate; see this answer.
@katrielalex: To spell it out (Python 2, since string.letters and this two-argument form of translate no longer exist in Python 3):
import string
string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
non_letters= ''.join(chr(i) for i in range(256) if chr(i) not in string.letters)
print(string1.translate(None,non_letters))
print('Simpler, but possibly less correct')
print(string1.translate(None, string.punctuation+string.digits+string.whitespace))
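The translate-based answers above rely on the Python 2 signature. A rough Python 3 equivalent, offered as a sketch rather than a benchmarked replacement, builds a mapping from each unwanted character's code point to None. The function name keep_only is an assumption made for illustration, and the timings quoted above do not apply to it.

```python
import string


def keep_only(text, allowed):
    """Return `text` with every character not in `allowed` removed."""
    allowed = set(allowed)
    # In Python 3, str.translate takes a mapping of code points;
    # mapping a code point to None deletes that character.
    table = {ord(ch): None for ch in set(text) if ch not in allowed}
    return text.translate(table)


string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
print(keep_only(string1, ['E', 'i', 'W']))       # EiiWW
print(keep_only(string1, string.ascii_letters))  # letters only
```

Building the table only from characters that actually occur in the text keeps it small, at the cost of one extra pass over the string.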
