Splitting a string after a certain number of characters - python

I have a lengthy string and want to split it after a certain number of characters. I already have done this:
if len(song.lyrics) > 2048:
    string1 = string[:2048]
    string2 = string[2048:]
The problem is that this sometimes breaks in the middle of a line of text, which I don't want. Is there a way to find the last line break before the character limit is reached and split there?
Thanks

Does this give you the result you're looking for? If not, could you please provide an example string with expected output?
import re
CHARACTER_LIMIT = 2048
for m in re.finditer(r'.{,%s}(?:\n|$)' % CHARACTER_LIMIT, string, re.DOTALL):
    print(m.group(0))

Find the index of the last newline character before your length limit, then use it to split.
if len(song.lyrics) > 2048:
    index = string[:2048].rfind('\n')
    string1 = string[:index]
    string2 = string[index+1:]
Example:
>>> s = 'aaaaaaa\nbbbbbbbbbbbbbbbb\nccccccc\ndddddddddddddddd'
>>> limit = 31 # ↑
>>> index = s[:limit].rfind('\n')
>>> index
24
>>> s1,s2 = s[:index],s[index+1:]
>>> s1
'aaaaaaa\nbbbbbbbbbbbbbbbb'
>>> s2
'ccccccc\ndddddddddddddddd'
>>>
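The same idea can be extended to cut a long text into any number of pieces, and to cope with a stretch that contains no newline at all (in which case rfind returns -1). This is only a minimal sketch: the function name split_at_newlines is made up for this illustration, and the hard cut at the limit is an assumed fallback, not something the question specifies.

```python
def split_at_newlines(text, limit=2048):
    """Split text into chunks of at most `limit` characters,
    breaking at the last newline before the limit when there is one."""
    chunks = []
    while len(text) > limit:
        # Look for the last newline within the first `limit` characters.
        index = text.rfind('\n', 0, limit)
        if index <= 0:
            # No usable newline: fall back to a hard cut at the limit (assumption).
            chunks.append(text[:limit])
            text = text[limit:]
        else:
            chunks.append(text[:index])
            text = text[index + 1:]  # skip the newline itself
    chunks.append(text)
    return chunks


s = 'aaaaaaa\nbbbbbbbbbbbbbbbb\nccccccc\ndddddddddddddddd'
print(split_at_newlines(s, 31))
# ['aaaaaaa\nbbbbbbbbbbbbbbbb', 'ccccccc\ndddddddddddddddd']
```

Passing the bounds to rfind directly avoids building the temporary slice string[:2048] on every pass.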

Related

Finding the position of a number in a string

I would like to separate the letters from the numbers like this:
inp = "AE123"
p =  # position where the number starts, in this case 2
I've already tried to use str.find(), but it has a limit of 3.
Extracting the letters and the digits
If the goal is to extract both the letters and the digits, regular expressions can solve the problem directly without need for indices or slices:
>>> import re
>>> inp = "AE123"
>>> re.match(r'([A-Za-z]+)(\d+)', inp).groups()
('AE', '123')
Finding the position of the number
If needed, regular expressions can also locate the indices for the match.
>>> import re
>>> inp = "AE123"
>>> mo = re.search(r'\d+', inp)
>>> mo.span()
(2, 5)
>>> inp[2 : 5]
'123'
You can run a loop that checks for digits:
for p, c in enumerate(inp):
    if c.isdigit():
        break
print(p)
Find out more about str.isdigit
this should work
for i in range(len(inp)):
    if inp[i].isdigit():
        p = i
        break
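The loops above do not say what happens when the string contains no digit at all: p is either left unset or ends up pointing at the last character. One way to make that case explicit is next() with a default value. The name first_digit_index and the choice of -1 as the "not found" value are assumptions for this sketch.

```python
def first_digit_index(text):
    """Return the index of the first digit in text, or -1 if there is none."""
    return next((i for i, c in enumerate(text) if c.isdigit()), -1)


inp = "AE123"
p = first_digit_index(inp)
print(p)                 # 2
print(inp[:p], inp[p:])  # AE 123
```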
#Assuming all characters come before the first numeral as mentioned in the question
def findWhereNoStart(string):
    start_index=-1
    for char in string:
        start_index+=1
        if char.isdigit():
            return string[start_index:]
    return "NO NUMERALS IN THE GIVEN STRING"
#TEST
print(findWhereNoStart("ASDFG"))
print(findWhereNoStart("ASDFG13213"))
print(findWhereNoStart("ASDFG1"))
#OUTPUT
"""
NO NUMERALS IN THE GIVEN STRING
13213
1
"""

Python: Search a string for a number, decrement that number and replace in the string

If I have a string such as:
string = 'Output1[10].mystruct.MyArray[4].mybool'
what I want to do is search the string for the number in the array, decrement by 1 and then replace the found number with my decremented number.
What I have tried:
import string
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
So I can get a list of the numbers and convert them to integers, but I don't know how to use re.sub to search the string and replace them. Note that there might be multiple arrays. If anyone knows how to do this, help would be much appreciated.
Cheers
I don't understand one thing: if there is more than one array, do you want to decrease the number in all arrays, or in just one of them?
If you want to decrease in all arrays, you can do this:
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
num = [int(elem) for elem in num]
num.sort()
for elem in num:
    aux = elem - 1
    string = string.replace(str(elem), str(aux))
If you want to decrease just the first array, you can do this
import string
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
new_num = int(num[0]) - 1
string = string.replace(num[0], str(new_num), 1)
Thanks to @João Castilho for his answer. Based on it, I changed the code slightly to work exactly how I want:
import string
import re
string = 'Output1[2].mystruct.MyArray[2].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
num = [int(elem) for elem in set(num)]
num.sort()
for elem in num:
    aux = elem - 1
    string = string.replace('[%d]'% elem, '[%d]'% aux)
print(string)
This will now replace any number between brackets with the decremented value in all of the conditions that the numbers may occur.
Cheers
ice.
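Since the question asks specifically about re.sub: it also accepts a function as the replacement, which is called once per match and returns the replacement text. A minimal sketch along those lines follows. The helper name decrement is made up here, and the variable is called text rather than string so it does not shadow the string module.

```python
import re


def decrement(match):
    """Return the matched '[N]' with N reduced by one."""
    return '[%d]' % (int(match.group(1)) - 1)


text = 'Output1[10].mystruct.MyArray[4].mybool'
result = re.sub(r'\[(\d+)\]', decrement, text)
print(result)  # Output1[9].mystruct.MyArray[3].mybool
```

Because each bracketed number is rewritten exactly once in a single pass, there is no need to sort or de-duplicate the numbers first, and digits outside brackets (such as the 1 in Output1) are left alone.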

Check special symbols in string endings

How can I check for special symbols such as !?,(). at the end of a word? For example, Hello??? or Hello,, or Hello! should return True, but H!??llo or Hel,lo should return False.
I know how to check only the last character of a string, but how do I check whether the last two or more characters are symbols?
You may have to use regex for this.
import re
def checkword(word):
    m = re.match(r"\w+[!?,().]+$", word)
    if m is not None:
        return True
    return False
That regex is:
\w+         # one or more word characters (letters, digits, underscore)
[!?,().]+   # one or more of the characters inside the brackets
            # (this is called a character class)
$           # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.
You can try something like this:
word = "Hello!"
def checkSym(word):
    return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
In case you want to find every symbol from the end of the string, you can use:
def symbolCount(word):
    i = len(word)-1
    c = 0
    # stop at the start of the word so the index never goes out of range
    while i >= 0 and word[i] in "!?,().":
        c = c + 1
        i = i - 1
    return c
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3
To get a count of the 'special' characters at the end of a given string:
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
    if c in special:
        count += 1
    else:
        break
print("Found {} special characters at the end of the string.".format(count))
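The same count can be obtained without a loop by stripping the trailing symbols with str.rstrip and comparing lengths. A short sketch; the function name count_trailing is an assumption for illustration.

```python
SPECIAL = '!?,().'


def count_trailing(word, chars=SPECIAL):
    """Count how many characters from `chars` end the given word."""
    return len(word) - len(word.rstrip(chars))


print(count_trailing('Hello???'))  # 3
print(count_trailing('H!??llo'))   # 0
```

A result greater than zero answers the original True/False question, and it also behaves sensibly for an empty string or a string made up only of symbols.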
You can use re.findall:
import re
s = "Hello???"
if re.findall(r'\W+$', s):
    pass
You could try this.
string="gffrwr."
print(string[-1] in "!?,().")

How to count the number of characters at the start of a string?

How can I count the number of characters at the start/end of a string in Python?
For example, if the string is
'ffffhuffh'
How would I count the number of fs at the start of the string? For the above string and the character f, the output should be 4.
str.count is not useful to me as a character could be in the middle of the string.
A short and simple way will be to use the str.lstrip method, and count the difference of length.
s = 'ffffhuffh'
print(len(s)-len(s.lstrip('f')))
# output: 4
str.lstrip([chars]):
Return a copy of the string with leading characters removed. The chars
argument is a string specifying the set of characters to be removed.
Try this, using itertools.takewhile():
import itertools as it
s = 'ffffhuffh'
sum(1 for _ in it.takewhile(lambda c: c == 'f', s))
=> 4
Similarly, for counting the characters at the end:
s = 'huffhffff'
sum(1 for _ in it.takewhile(lambda c: c == 'f', reversed(s)))
=> 4
You may use regular expression with re.match to find the occurrence of any character at the start of the string as:
>>> import re
>>> my_str = 'ffffhuffh'
>>> my_char = 'f'
>>> len(re.match('{}*'.format(my_char), my_str).group())
4
Building on Oscar Lopez's answer, I want to handle the case you mention of the end of the string: use reversed()
import itertools as it
my_string = 'ffffhuffh'
len(list(it.takewhile(lambda c: c == my_string[-1], reversed(my_string))))
=> 1
You can create a function and iterate through your string and return the count of the desired char in the input string's beginning or end like this example:
# start = True: Count the chars in the beginning of the string
# start = False: Count the chars in the end of the string
def count_char(string='', char='', start=True):
    count = 0
    if not start:
        string = string[::-1]
    for k in string:
        # compare by value; `is` tests object identity and is unreliable for strings
        if k == char:
            count += 1
        else:
            break
    return count
a = 'ffffhuffh'
print(count_char(a, 'f'))
b = a[::-1]
print(count_char(b, 'f', start=False))
Output:
4
4
You may also use itertools.groupby to find the count of the occurrence of the first element at the start of the string as:
from itertools import groupby
def get_first_char_count(my_str):
    return len([list(j) for _, j in groupby(my_str)][0])
Sample run:
>>> get_first_char_count('ffffhuffh')
4
>>> get_first_char_count('aywsnsb')
1
Use re.sub to keep only the first character together with its consecutive repeats (the group ((^\w)\2*)), then take the length of the result to get the count:
len(re.sub(r'((^\w)\2*).*',r'\1',my_string))

Python - Find words in string

I know that I can find a word in a string with
if word in my_string:
But I want to find all "word" in the string, like this.
counter = 0
while True:
    if word in my_string:
        counter += 1
How can I do it without "counting" the same word over and over again?
If you want to count only full words, so that is is counted once in "this is" even though this also contains is, you can split, filter and count:
>>> s = 'this is a sentences that has is and is and is (4)'
>>> word = 'is'
>>> counter = len([x for x in s.split() if x == word])
>>> counter
4
However, if you just want to count all occurrences of a substring, i.e. is should also match the is inside this, then:
>>> s = 'is this is'
>>> counter = len(s.split(word))-1
>>> counter
3
In other words, split the string at every occurrence of the word, then subtract one to get the count.
Edit - JUST USE COUNT:
It's been a long day so I totally forgot but str has a built-in method for this str.count(substring) that does the same as my second answer but way more readable. Please consider using this method (and look at other people's answers for how to)
Use the start argument of the str.find method.
counter = 0
search_pos = 0
while True:
    found = my_string.find(word, search_pos)
    if found != -1: # find returns -1 when it's not found
        #update counter and move search_pos to look for the next word
        search_pos = found + len(word)
        counter += 1
    else:
        #the word wasn't found
        break
This is kinda a general purpose solution. Specifically for counting in a string you can just use my_string.count(word)
String actually already has the functionality you are looking for. You simply need to use str.count(item) for example.
EDIT: This will search for all occurrences of said string including parts of words.
string_to_search = 'apple apple orange banana grapefruit apple banana'
number_of_apples = string_to_search.count('apple')
number_of_bananas = string_to_search.count('banana')
The following will search for only complete words, just split the string you want to search.
string_to_search = 'apple apple orange banana grapefruit apple banana'.split()
number_of_apples = string_to_search.count('apple')
number_of_bananas = string_to_search.count('banana')
Use regular expressions:
import re
word = 'test'
my_string = 'this is a test and more test and a test'
# Use escape in case your search word contains periods or symbols that are used in regular expressions.
re_word = re.escape(word)
# re.findall returns a list of matches
matches = re.findall(re_word, my_string)
# matches = ['test', 'test', 'test']
print(len(matches))  # 3
Be aware that this will catch other words that contain your word like testing. You could change your regex to just match exactly your word
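One way to do that is to wrap the escaped word in \b word-boundary anchors. A sketch, with the sample sentence made up for the illustration:

```python
import re

word = 'test'
my_string = 'this is a test of testing, and one more test'

# \b matches only at a word boundary, so 'testing' is not counted.
pattern = r'\b' + re.escape(word) + r'\b'
matches = re.findall(pattern, my_string)
print(len(matches))  # 2
```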
