Splits a string by number

Splits a string by number - python

I have just written a very simple function that splits a string into a given number of lines. It works, but it has a flaw: it sometimes breaks a word apart. For example:
string = "He could not meet her in conversation"
number_of_lines = 5
result = (textwrap.fill(string, count(string, number_of_lines)))
print result
He could
not meet
her in c
onversat
ion
Note that it breaks the word "conversation". I would like suggestions on how to avoid this, or a pointer to a built-in function that already does the job.
Here is the actual function:
import textwrap
import re
def count (s, no_of_lines):
    result = (textwrap.fill(s.upper(), 1))
    count = 1
    while (len(re.split('[\n]', result)) != no_of_lines):
        count = count + 1
        result = (textwrap.fill(s.upper(), count))
    return count

You could use the break_long_words option of the TextWrapper constructor:
import textwrap
import re
# define a customised object with option set to not break long words
textwrap = textwrap.TextWrapper(break_long_words=False)
def count (s, no_of_lines):
    # set the width option instead of using a count
    textwrap.width = 1
    result = textwrap.fill(s.upper())
    while len(re.split('\n', result)) > no_of_lines:
        textwrap.width += 1
        result = textwrap.fill(s.upper())
    return textwrap.width
string = "He could not meet her in conversation"
number_of_lines = 5
textwrap.width = count(string, number_of_lines)
result = textwrap.fill(string)
print (result)
Output:
He could
not meet
her in
conversation
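A minimal sketch of the same idea as a reusable function, assuming the goal is "at most max_lines lines, with no word split". The name wrap_into_lines is hypothetical (it does not appear in the answer above). It tries increasing widths and returns the first wrapping that fits, without rebinding the textwrap module name:

```python
import textwrap

def wrap_into_lines(text, max_lines):
    """Return the narrowest wrapping of text that needs at most max_lines lines."""
    # break_long_words=False keeps a long word whole, on its own line
    wrapper = textwrap.TextWrapper(break_long_words=False)
    for width in range(1, len(text) + 1):
        wrapper.width = width
        lines = wrapper.wrap(text)
        if len(lines) <= max_lines:
            return lines
    # only reached for empty input, where the range above is empty
    return [text]

print("\n".join(wrap_into_lines("He could not meet her in conversation", 5)))
```

As in the answer above, the result may use fewer lines than requested, because widening the text by one column can merge several lines at once.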

Related

combine multiple outputs in python

I am a noob and was wondering how to combine multiple outputs into one string that prints on a single line.
Here is my code:
print ("password size (use numbers)")
passwordsize = int(input(""))
passwordsize = passwordsize -1
papers = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','?','!','1','2','3','4','5','6','7','8','9','0',]
q_list = [random.choice(papers) for i in range(passwordsize)]
' '.join(q_list)
poopers = q_list[0].replace("'", '')
print("/")
print(q_list)
for word in q_list:
    result = word.replace("'", '')
    print(result)
Let's say that the random characters picked were 3, a, b and c.
It outputs:
3
a
b
c
I want it to output:
3abc
Any help is very much appreciated.

Along the same lines as what @Chuck and @Ash proposed, but streamlining things a bit by taking fuller advantage of Python's standard library as well as the handy join string method:
import random
import string
passwordsize = 4 # int(input()) - 1
a = [*string.ascii_uppercase, *string.digits, "?", "!"]
print("".join(random.sample(a, passwordsize)))
Output:
W16H

import random
passwordsize = 4 # int(input("")) - 1
papers = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T',
          'U','V','W','X','Y','Z','?','!','1','2','3','4','5','6','7','8','9','0',]
q_list = [random.choice(papers) for i in range(passwordsize)]
result = ""
for word in q_list:
    result += word
print(result)
String objects in Python support the "+" operator for concatenation.
For example, if you want to create s = "ABC", you can build it with s = 'A' + 'B' + 'C'.
The += operator applies + repeatedly to the same variable.
Thus, you can create "ABC" with a for loop:
s = ""
for w in ['A', 'B', 'C']:
    s += w

If you want every generated character to be unique, use random.sample. If characters may occur more than once, use random.choices instead:
from random import choices; from string import ascii_uppercase as caps, digits as nums
result = ''.join(choices(caps+'?!'+nums,k=int(input('password size (use numbers)\n'))))
print(result)
# password size (use numbers)
# 15
# DK6V1DZOFKA?3HG
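To make the difference between the two calls concrete, here is a small sketch. The helper names and the ALPHABET constant are hypothetical; the character set mirrors the one used in the question.

```python
import random
import string

# uppercase letters, digits, and the two punctuation marks from the question
ALPHABET = string.ascii_uppercase + string.digits + "?!"

def password_with_repeats(size):
    # random.choices samples with replacement, so characters may repeat
    return "".join(random.choices(ALPHABET, k=size))

def password_without_repeats(size):
    # random.sample samples without replacement; it raises ValueError
    # if size is larger than len(ALPHABET)
    return "".join(random.sample(ALPHABET, size))

print(password_with_repeats(15))
print(password_without_repeats(15))
```

Both helpers return a single string, so one print call puts the whole password on one line.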

How to group consecutive letters in a string in Python?

For example: given string = "aaaacccc", I need the output to be "4a4c". Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse, turning "4a4c" into "aaaacccc", that would be great to know.

This does the work in a single pass.
Keep two temporary variables, one for the current character and one for the count of that character, plus one variable for the result.
Iterate through the string and keep increasing the count while the character matches the previous one.
When it doesn't, append the count and the character to the result, then reset the character and the count.
Finally, append the last character and its count to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
    current_str = input_str[0]
    count = 0
    final_string = ""
    for i in input_str:
        if i==current_str:
            count+=1
        else:
            final_string+=str(count)+current_str
            current_str = i
            count = 1
    final_string+=str(count)+current_str
    print (final_string)

Here is another solution, which also includes a rough version of the reverse operation you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode is basically identical to the one posted by Akanasha, who was just a bit faster in posting while I was writing decode().
def encode(x):
    if not x.isalpha():
        raise ValueError()
    output = ""
    current_l = x[0]
    counter = 0
    for pos in x:
        if current_l != pos:
            output += str(counter) + current_l
            counter = 1
            current_l = pos
        else:
            counter += 1
    return output + str(counter) + current_l
def decode(x):
    output = ""
    i = 0
    while i < len(x):
        if x[i].isnumeric():
            n = i + 1
            while x[n].isnumeric():
                n += 1
            output += int(x[i:n])*x[n]
            i = n
        i += 1
    return output
test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)

Yes, you do not need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
    if i not in letters:
        letters.append(i)
string = ""
for i in letters:
    string += str(list1.count(i))
    string+=str(i)
print(string)
Basically, it loops through the list, finds the unique letters and then prints each count followed by the letter itself. Note that it counts each letter across the whole string, so it matches the grouped form only when every letter appears in a single consecutive run. Reversing works the same way: read each count and print the letter that many times.
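For comparison, a short sketch that does use the standard library, which the question asked to avoid, so treat it as an alternative rather than as the requested approach. The names rle_encode and rle_decode are hypothetical, and the sketch assumes the text being encoded contains no digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    # groupby yields one (character, run) pair per consecutive run
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    # each token is one or more digits followed by a single non-digit character
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))

print(rle_encode("aaaacccc"))
print(rle_decode("4a4c"))
```

Because groupby works on consecutive runs, a letter that appears in two separate runs is reported twice, once per run.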

Removing all occurrences of a letter and replacing with a count of how many errors

I've got code that should take a DNA sequence containing errors (N in my case), remove each run of errors, and put a count of how many N's were removed at that location.
My code:
class dnaString (str):
    def __new__(self,s):
        #the inputted DNA sequence is converted as a string in all upper cases
        return str.__new__(self,s.upper())
    def getN (self):
        #returns the count of value of N in the sequence
        return self.count("N")
    def remove(self):
        print(self.replace("N", "{}".format(coolString.getN())))
#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()
When I input AaNNNNNNGTC I should get AA{6}GTC as the answer, but my code prints AA666666GTC because it replaces every N with the count. How do I insert the count just once?

If you want to complete the task without external libraries, you can do it with the following:
def fix_dna(dna_str):
    fixed_str = ''
    n_count = 0
    n_found = False
    for i in range(len(dna_str)):
        if dna_str[i].upper() == 'N':
            if not n_found:
                n_found = True
            n_count += 1
        elif n_found:
            fixed_str += '{' + str(n_count) + '}' + dna_str[i]
            n_found = False
            n_count = 0
        elif not n_found:
            fixed_str += dna_str[i]
    # flush a run of N's that reaches the end of the string
    if n_found:
        fixed_str += '{' + str(n_count) + '}'
    return fixed_str

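Not the cleanest solution, but it does the job:
from itertools import accumulate
from operator import add
s = "AaNNNNNNGTC"
# replace the longest runs first (up to 100 N's) so shorter runs don't split them
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
    s = s.replace(i[1], '{'+str(i[0] + 1)+'}')
print(s)  # Aa{6}GTC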

That's expected. From the documentation of replace:
Return a copy of string s with all occurrences of substring old replaced by new.
One solution is to use regexes. re.sub can take a callable that generates the replacement string:
import re
def replace_with_count(x):
    return "{%d}" % len(x.group())
test = 'AaNNNNNNGTNNC'
print(re.sub('N+', replace_with_count, test))  # Aa{6}GT{2}C
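A compact sketch of the regex approach applied to the input from the question. The function name collapse_n_runs is hypothetical, and the upper-casing step is an assumption taken from the expected output AA{6}GTC in the question.

```python
import re

def collapse_n_runs(dna):
    # upper-case first, then replace every run of N with "{run length}"
    return re.sub(r"N+", lambda match: "{%d}" % len(match.group()), dna.upper())

print(collapse_n_runs("AaNNNNNNGTC"))
```

Because the pattern N+ matches a whole run at once, the callable runs once per run instead of once per letter, which is what keeps the count from being repeated.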

Regex counts only one match when two identical patterns are consecutive. Why?

The following is an input.
INPUT
2
businessman video demeanor demeanor dishonest acknowledge dvd honor sister opportunity
keen labour artistic favourite red definition impatient take behaviour warmth
1
demeanour
OUTPUT
2
This is because demeanour is converted to its US spelling, 'demeanor', and then the occurrences of both 'demeanour' and 'demeanor' have to be counted.
I wrote the following code, but it outputs 1 instead of 2:
import re
n = int(raw_input())
b = []
for i in range(n):
    b.append(raw_input())
b = " ".join(b)
b = b + " "
t = int(raw_input())
c = []
for i in range(t):
    c = raw_input()
    d = c[:-2]+"r"
    match = re.findall(r"\s"+re.escape(c)+"\s",b)
    match2 = re.findall(r"\s"+re.escape(d)+"\s",b)
    print len(match)+len(match2)
I may not have explained the scenario completely. For more details, please visit
https://www.hackerrank.com/challenges/uk-and-us-2
PS: This is my first question on Stack Overflow. Please correct me if the problem is presented incorrectly.
EDIT:
Correct Answer:
import re
n = int(raw_input())
b = []
for i in range(n):
    b.append(raw_input())
b = " ".join(b)
b = b + " "
t = int(raw_input())
for i in range(t):
    c = raw_input()
    d = c.replace("ou","o")
    k = re.compile(r'\b%s\b'%c,re.I)
    l = re.compile(r'\b%s\b'%d,re.I)
    match = k.findall(b)
    match2 = l.findall(b)
    print len(match)+len(match2)

Use Alternation in your regex:
import re
input='''\
businessman video demeanor demeanour dishonest acknowledge dvd honor sister opportunity keen labour artistic favourite red definition impatient take behaviour warmth'''
matches=re.findall(r'(demeanour|demeanor)', input)
print matches, len(matches)
# ['demeanor', 'demeanour'] 2
Or, use an optional quantifier:
matches=re.findall(r'(demeanou?r)', input)
print matches, len(matches)
To keep from matching xyzdemeanour use a word boundary:
matches=re.findall(r'(\bdemeanou?r\b)', 'demeanor demeanour xyzdemeanour demeanourxyz')
print matches, len(matches)
# ['demeanor', 'demeanour'] 2
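The reason the original pattern undercounts can be shown in isolation. In this sketch (variable names are illustrative), \s consumes the space that separates two adjacent words, so the second word has no leading whitespace left to match, while \b is zero-width and consumes nothing:

```python
import re

text = " demeanor demeanor "

# \s...\s: the first match swallows the space between the two words
consuming = re.findall(r"\sdemeanor\s", text)

# \b...\b: word boundaries match positions, not characters
boundary = re.findall(r"\bdemeanor\b", text)

print(len(consuming), len(boundary))
```

This is why switching to \b in the corrected code above counts both adjacent occurrences.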

If all you have to consider is words like odour -> odor (dropping the u), you can do something like:
import re
n = int(raw_input()) # Read number of lines
b = ""
for i in range(n): # Read lines and concatenate them to a string
    b += raw_input() + " "
t = int(raw_input()) # Read number of words
c = []
for i in range(t): # Read words
    word = raw_input()
    c.append(word) # Add each word to a list
    c.append(word.replace("u","")) # Add also the word without the u to the list
totallen = 0
for i in c: # Search for all words
    match = re.findall(r""+i+"\s",b) # find all occurrences of a word
    totallen += len(match) # Add it to total count
print totallen # Print total
I tested it on the website you linked and it passed all the tests, but I recommend choosing variable names that better explain what they hold, such as numberoflines, numberofwords, text, words, etc.

Count number of occurrences of a substring in a string

How can I count the number of times a given substring is present within a string in Python?
For example:
>>> 'foo bar foo'.numberOfOccurrences('foo')
2
To get indices of the substrings, see How to find all occurrences of a substring?.

Use string.count(substring), as in:
>>> "abcdabcva".count("ab")
2
This counts non-overlapping occurrences only.
If you need to count overlapping occurrences, check the answers here, or see my other answer below.
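The difference between the two kinds of counting comes down to how far the search advances after each hit. A minimal sketch using str.find follows; the function name and the overlapping parameter are hypothetical, and returning 0 for an empty substring is an assumption (str.count returns len(text) + 1 in that case).

```python
def count_occurrences(text, sub, overlapping=False):
    if not sub:
        return 0
    # advance by one character to allow overlaps, by len(sub) to forbid them
    step = 1 if overlapping else len(sub)
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + step)
    return count

print(count_occurrences("abcdabcva", "ab"))
print(count_occurrences("caatatab", "ata", overlapping=True))
```

With overlapping=False the result agrees with str.count for any non-empty substring.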

s = 'arunununghhjj'
sb = 'nun'
results = 0
sub_len = len(sb)
for i in range(len(s)):
    if s[i:i+sub_len] == sb:
        results += 1
print results

Depending on what you really mean, I propose the following solutions:
You have a list of space-separated substrings and want to know the position of one substring among all of them:
s = 'sub1 sub2 sub3'
s.split().index('sub2')
>>> 1
You mean the character position of the substring in the string:
s.find('sub2')
>>> 5
You mean the (non-overlapping) number of appearances of a substring:
s.count('sub2')
>>> 1
s.count('sub')
>>> 3

The best way to find overlapping sub-strings in a given string is to use a regular expression. With lookahead, it will find all the overlapping matches using the regular expression library's findall(). Here, left is the substring and right is the string to match.
>>> len(re.findall(r'(?=aa)', 'caaaab'))
3

To count overlapping occurrences of a substring in a string in Python 3, this algorithm will do:
def count_substring(string,sub_string):
    l=len(sub_string)
    count=0
    for i in range(len(string)-len(sub_string)+1):
        if(string[i:i+len(sub_string)] == sub_string ):
            count+=1
    return count
I checked this algorithm myself and it worked.

You can count the frequency using two ways:
Using the count() in str:
a.count(b)
Or, you can use:
len(a.split(b))-1
Where a is the string and b is the substring whose frequency is to be calculated.

Scenario 1: Occurrences of a word in a sentence.
E.g. str1 = "This is an example and is easy", counting the occurrences of the word "is". Let str2 = "is":
count = str1.count(str2)
Scenario 2: Occurrences of a pattern in a sentence.
string = "ABCDCDC"
substring = "CDC"
def count_substring(string,sub_string):
    len1 = len(string)
    len2 = len(sub_string)
    j =0
    counter = 0
    while(j < len1):
        if(string[j] == sub_string[0]):
            if(string[j:j+len2] == sub_string):
                counter += 1
        j += 1
    return counter
Thanks!

The current best answer, which uses the count method, doesn't count overlapping occurrences and doesn't treat empty substrings specially.
For example:
>>> a = 'caatatab'
>>> b = 'ata'
>>> print(a.count(b)) #overlapping
1
>>> print(a.count('')) #empty string
9
The first answer should be 2, not 1, if we consider overlapping substrings.
As for the second, it would be better if an empty substring returned 0 as the answer.
The following code takes care of these things.
def num_of_patterns(astr,pattern):
    astr, pattern = astr.strip(), pattern.strip()
    if pattern == '': return 0
    ind, count, start_flag = 0,0,0
    while True:
        try:
            if start_flag == 0:
                ind = astr.index(pattern)
                start_flag = 1
            else:
                ind += 1 + astr[ind+1:].index(pattern)
            count += 1
        except ValueError:
            # index() raises ValueError when there are no more matches
            break
    return count
Now when we run it:
>>> num_of_patterns('caatatab', 'ata') #overlapping
2
>>> num_of_patterns('caatatab', '') #empty string
0
>>> num_of_patterns('abcdabcva','ab') #normal
2

The question isn't very clear, but I'll answer what you are, on the surface, asking.
A string S, which is L characters long, and where S[1] is the first character of the string and S[L] is the last character, has the following substrings:
The null string ''. There is one of these.
For every value A from 1 to L, for every value B from A to L, the string S[A]..S[B]
(inclusive). There are L + L-1 + L-2 + ... 1 of these strings, for a
total of 0.5*L*(L+1).
Note that the second item includes S[1]..S[L],
i.e. the entire original string S.
So, there are 0.5*L*(L+1) + 1 substrings within a string of length L. Render that expression in Python, and you have the number of substrings present within the string.
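Rendered in Python, the expression above might look like this. The function name is hypothetical, and note that it counts substrings by position, so equal substrings found at different positions are counted separately.

```python
def number_of_substrings(length):
    # L*(L+1)/2 non-empty substrings, plus one for the empty string
    return length * (length + 1) // 2 + 1

print(number_of_substrings(len("abc")))
```

Integer division is safe here because one of L and L+1 is always even.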

One way is to use re.subn. For example, to count the number of
occurrences of 'hello' in any mix of cases you can do:
import re
_, count = re.subn(r'hello', '', astring, flags=re.I)
print('Found', count, 'occurrences of "hello"')

How about a one-liner with a list comprehension? Technically it's 93 characters long, so spare me the PEP-8 purism. The regex findall answer is the most readable if it's a high-level piece of code. If you're building something low level and don't want dependencies, this one is pretty lean and mean. I'm giving the overlapping answer. Obviously, just use count like the highest-scoring answer if there is no overlap.
def count_substring(string, sub_string):
    return len([i for i in range(len(string)) if string[i:i+len(sub_string)] == sub_string])

If you want to count all occurrences of the substring (including overlapping ones), use this method.
import re
def count_substring(string, sub_string):
    # re.escape keeps characters such as '.' or '+' in sub_string literal
    regex = '(?='+re.escape(sub_string)+')'
    # print(regex)
    return len(re.findall(regex,string))

I will keep my accepted answer as the "simple and obvious way to do it"; however, it does not cover overlapping occurrences.
Those can be found naively, by checking every slice, as in:
sum("GCAAAAAGH"[i:].startswith("AAA") for i in range(len("GCAAAAAGH")))
which yields 3.
Or it can be done with a regular-expression trick, as can be seen at How to use regex to find all overlapping matches, and it can also make for fine code golfing.
This is my "hand made" count for overlapping occurrences of patterns in a string, which tries not to be extremely naive (at least it does not create new string objects at each iteration):
from array import array
def find_matches_overlapping(text, pattern):
    lpat = len(pattern) - 1
    matches = []
    text = array("u", text)
    pattern = array("u", pattern)
    indexes = {}
    # scan the whole text so that a match ending at the last character can complete
    for i in range(len(text)):
        if text[i] == pattern[0]:
            indexes[i] = -1
        for index, counter in list(indexes.items()):
            counter += 1
            if text[i] == pattern[counter]:
                if counter == lpat:
                    matches.append(index)
                    del indexes[index]
                else:
                    indexes[index] = counter
            else:
                del indexes[index]
    return matches
def count_matches(text, pattern):
    return len(find_matches_overlapping(text, pattern))

For an overlapping count we can use:
def count_substring(string, sub_string):
    count=0
    beg=0
    while(string.find(sub_string,beg)!=-1) :
        count=count+1
        beg=string.find(sub_string,beg)
        beg=beg+1
    return count
For the non-overlapping case we can use the count() method:
string.count(sub_string)

Overlapping occurrences:
def olpcount(string,pattern,case_sensitive=True):
    if case_sensitive != True:
        string = string.lower()
        pattern = pattern.lower()
    l = len(pattern)
    ct = 0
    for c in range(0,len(string)):
        if string[c:c+l] == pattern:
            ct += 1
    return ct
test = 'my maaather lies over the oceaaan'
print test
print olpcount(test,'a')
print olpcount(test,'aa')
print olpcount(test,'aaa')
Results:
my maaather lies over the oceaaan
6
4
2

Here's a solution intended to handle both non-overlapping and overlapping occurrences. Caveat: it assumes a substring can overlap itself only when its last character equals its first, and it advances by len(sub) - 1, so it counts overlaps of exactly one character. Longer overlaps are undercounted (for example 'aaa' in 'aaaa', or 'abab' in 'ababab'), a single-character substring never advances, and an absent substring whose first and last characters match raises ValueError at the first index call.
def substr_count(st, sub):
    # If a non-overlapping substring then just
    # use the standard string `count` method
    # to count the substring occurrences
    if sub[0] != sub[-1]:
        return st.count(sub)
    # Otherwise, create a copy of the source string,
    # and starting from the index of the first occurrence
    # of the substring, adjust the source string to start
    # from subsequent occurrences of the substring and
    # keep count of these occurrences
    _st = st[::]
    start = _st.index(sub)
    cnt = 0
    while start is not None:
        cnt += 1
        try:
            _st = _st[start + len(sub) - 1:]
            start = _st.index(sub)
        except (ValueError, IndexError):
            return cnt
    return cnt

If you're looking for a robust solution that works in every case, this function should do:
def count_substring(string, sub_string):
    ans = 0
    for i in range(len(string)-(len(sub_string)-1)):
        if sub_string == string[i:len(sub_string)+i]:
            ans += 1
    return ans

If you want to find the count of a substring inside any string, please use the code below.
The code is easy to understand, which is why I skipped the comments. :)
string=raw_input()
sub_string=raw_input()
start=0
answer=0
length=len(string)
index=string.find(sub_string,start,length)
while index!=-1:
    start=index+1
    answer=answer+1
    index=string.find(sub_string,start,length)
print answer

You could use the startswith method:
def count_substring(string, sub_string):
    x = 0
    for i in range(len(string)):
        if string[i:].startswith(sub_string):
            x += 1
    return x

def count_substring(string, sub_string):
    inc = 0
    for i in range(0, len(string)):
        slice_object = slice(i,len(sub_string)+i)
        count = len(string[slice_object])
        if(count == len(sub_string)):
            if(sub_string == string[slice_object]):
                inc = inc + 1
    return inc
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)

def count_substring(string, sub_string):
    k=len(string)
    m=len(sub_string)
    i=0
    l=0
    count=0
    while l<k:
        if string[l:l+m]==sub_string:
            count=count+1
        l=l+1
    return count
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)

2+ others have already provided this solution, and I even upvoted one of them, but mine is probably the easiest for newbies to understand.
def count_substring(string, sub_string):
    slen = len(string)
    sslen = len(sub_string)
    range_s = slen - sslen + 1
    count = 0
    for i in range(range_s):
        if string[i:i+sslen] == sub_string:
            count += 1
    return count

I'm not sure if this has been suggested already, but I thought of this as a solution for a word that is 'disposable':
count = 0
for i in xrange(len(word)):
    if word[:len(term)] == term:
        count += 1
    word = word[1:]
print count
Here word is the string you are searching in and term is the substring you are looking for.

string="abc"
mainstr="ncnabckjdjkabcxcxccccxcxcabc"
count=0
# stop early enough that mainstr[i+k] can never run past the end
for i in range(0,len(mainstr)-len(string)+1):
    k=0
    while(k<len(string)):
        if(string[k]==mainstr[i+k]):
            k+=1
        else:
            break
    if(k==len(string)):
        count+=1
print(count)

my_string = """Strings are amongst the most popular data types in Python.
We can create the strings by enclosing characters in quotes.
Python treats single quotes the same as double quotes."""
# use separate variables so the second count does not overwrite the first
count_string = my_string.lower().strip("\n").split(" ").count("string")
count_strings = my_string.lower().strip("\n").split(" ").count("strings")
print("The number of occurrences of the word String is : " , count_string)
print("The number of occurrences of the word Strings is : " , count_strings)

For a simple space-delimited string, using a dict is quite fast. Please see the code below:
def getStringCount(mnstr:str, sbstr:str='')->int:
    """ Assumes two input strings: the string to search and
    the substring whose occurrences are counted.
    Returns the number of occurrences of the given substring.
    """
    x = dict()
    x[sbstr] = 0
    sbstr = sbstr.strip()
    for st in mnstr.split(' '):
        if st not in [sbstr]:
            continue
        try:
            x[st]+=1
        except KeyError:
            x[st] = 1
    return x[sbstr]
s = 'foo bar foo test one two three foo bar'
getStringCount(s,'foo')
The logic below removes all whitespace from both strings before counting, so it also works for strings with special characters:
def cnt_substr(inp_str, sub_str):
    inp_join_str = ''.join(inp_str.split())
    sub_join_str = ''.join(sub_str.split())
    return inp_join_str.count(sub_join_str)
print(cnt_substr("the sky is $blue and not greenthe sky is $blue and not green", "the sky"))

Here's a case-insensitive solution in Python 3:
s = 'foo bar foo'.upper()
sb = 'foo'.upper()
results = 0
sub_len = len(sb)
for i in range(len(s)):
    if s[i:i+sub_len] == sb:
        results += 1
print(results)

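def count_substring(string, sub_string):
    # assumed: the function header and the count/i initialisations are
    # missing from the original fragment and are reconstructed here
    count = 0
    i = 0
    j = 0
    while i < len(string):
        sub_string_out = string[i:len(sub_string)+j]
        if sub_string == sub_string_out:
            count += 1
        i += 1
        j += 1
    return count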

#counting occurrences of a substring in another string (overlapping/non overlapping)
s = input('enter the main string: ')# e.g. 'bobazcbobobegbobobgbobobhaklpbobawanbobobobob'
p=input('enter the substring: ')# e.g. 'bob'
counter=0
for i in range(len(s)-len(p)+1):
    c=0 # matched characters for the window starting at i; reset for every window
    for j in range(len(p)):
        if s[i+j]==p[j]:
            if c<len(p):
                c=c+1
                if c==len(p):
                    counter+=1
                    c=0
                    break
                continue
        else:
            break
print('number of occurrences of the substring in the main string is: ',counter)
