How to check only the first index in each split string?

I am trying to define a function that:
Splits a string called text at every newline (e.g. text="1\n2\n3")
Checks ONLY the first index of each split item to see if it is a digit 0-9.
Returns the index of any line that starts with 0-9; there can be more than one such line
ex: count_digit_leading_lines("AAA\n1st") → 1 # 2nd line starts w/ digit 1
So far my code looks like this, but I can't figure out how to get it to check only the first index in each split string:
def count_digit_leading_lines(text):
    for line in range(len(text.split('\n'))):
        for index, line in enumerate(line):
            if 0<=num<=9:
                return index
It accepts the argument text and iterates over each individual line (the new split strings). I think it goes on to check only the first index, but this is where I get lost...

The code should be as simple as:
def count_digit_leading_lines(text):
    text = text.strip()  # strip surrounding whitespace, e.g. a trailing '\n'
    text = text.replace('\t', '')  # remove tabs
    s = text.split('\n')  # split into lines at each '\n'
    # s = [line.strip() for line in s]  # can also use this instead of replace('\t', '')
    for i, sentence in enumerate(s):
        char = sentence[:1]  # first char of the line ('' if the line is empty)
        if char.isdigit():  # if the first char is a digit (0-9)
            return i
UPDATE:
Just noticed OP's comment on another answer saying they don't want to use enumerate (though using enumerate is good practice). The for loop without enumerate is:
    for i in range(len(s)):
        char = s[i][:1]  # first char of the line ('' if the line is empty)
        if char.isdigit():  # if the first char is a digit (0-9)
            return i

This should do it:
texts = ["1\n2\n3", 'ABC\n123\n456\n555']
def _get_index_if_matching(text):
    split_text = text.split('\n')
    for line_index, line in enumerate(split_text):
        try:
            num = int(line[0])
            if 0 <= num <= 9:
                return line_index
        except (ValueError, IndexError):  # non-digit first char or empty line
            pass
for text in texts:
    print(_get_index_if_matching(text))
It will print 0 and then 1.

You could swap your return statement for a yield, making your function a generator. Then you could get the indexes one by one in a loop, or collect them into a list. Here's a way you could do it:
def count_digit_leading_lines(text):
    for index, line in enumerate(text.split('\n')):
        try:
            int(line[0])
            yield index
        except (ValueError, IndexError):  # non-digit first char or empty line
            pass
# Usage:
for index in count_digit_leading_lines(text):
    print(index)
# Or to get a list
print(list(count_digit_leading_lines(text)))
Example:
In : list(count_digit_leading_lines('he\n1\nhto2\n9\ngaga'))
Out: [1, 3]

Related

Replace sequence of the same letter with single one

I am trying to replace each run of the same letter with a single one, but either it is hard or I am totally blocked on how this should be done.
Example input:
aaaabbcddefff
The output should be abcdef
Here is what I was able to do, but I can't get it to work for the last piece of the string. I tried different variants, but I am stuck. Can someone help me finish this code?
text = "aaaabbcddefff"
new_string = ""
count = 0
while text:
    for i in range(len(text)):
        l = text[i]
        for n in range(len(text)):
            if text[n] == l:
                count += 1
                continue
            new_string += l
            text = text.replace(l, "", count)
            break
        count = 0
        break

Using regex:
>>> import re
>>> text = "aaaabbcddefff"
>>> re.sub(r"(.)(?=\1+)", "", text)
'abcdef'

Side note: You should consider building your string up in a list and then joining the list, because it is expensive to append to a string, since strings are immutable.
One way to do this is to check if every letter you look at is equal to the previous letter, and only append it to the new string if it is not equal:
def remove_repeated_letters(s):
    if not s: return ""
    ret = [s[0]]
    for index, char in enumerate(s[1:], 1):
        if s[index-1] != char:
            ret.append(char)
    return "".join(ret)
Then, remove_repeated_letters("aaaabbcddefff") gives 'abcdef'.
remove_repeated_letters("aaaabbcddefffaaa") gives 'abcdefa'.
Alternatively, use itertools.groupby, which groups consecutive equal elements together, and join the keys of that operation:
import itertools
def remove_repeated_letters(s):
    return "".join(key for key, group in itertools.groupby(s))

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that compiles without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
n   output     expected
1   'abcdd'    'abcd'
2   'acdd'     'aabccdd'
3   'acddd'    'aaabccddd'
I am new to python but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions or know of any methods that would make this easier?

This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: First, the pattern formatted is "(.)\1{n-1,}". If n=3 then the pattern becomes "(.)\1{2,}"
(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times
For example: str = "aaaab" and n = 3. The first "a" is the capture group (.). The next 3 "aaa" matches \1{2,} - in this example a{2,}. So the whole thing matches "a" + "aaa" = "aaaa". That is replaced with "aaa".
regex101 can explain it better than me.
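To make the explanation above concrete, here is a small sketch that prints the formatted pattern and the match for the "aaaab" example. The variable names pattern and match are only for illustration; strcut is the same substitution as in the answer.

```python
import re

def strcut(string, n):
    # Same substitution as in the answer above
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1" * n, string)

n = 3
pattern = fr"(.)\1{{{n-1},}}"  # the doubled braces become literal { and }
print(pattern)                 # (.)\1{2,}

match = re.search(pattern, "aaaab")
print(match.group(0))          # aaaa  (the whole run that gets replaced)
print(match.group(1))          # a     (the captured character)

print(strcut("aaaab", 3))       # aaab
print(strcut("aaabccdddd", 2))  # aabccdd
```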

You can implement a stack data structure.
The idea: for each new character, check whether it is the same as the character on top of the stack. If it is, push it only if that keeps the run counter within the limit n, and increment the counter. If it is different, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
Output:
True
True
True
True

The method I would use is to create a new empty string at the start of the function and then, whenever a run of the same character exceeds n, simply not insert the extra characters into the output string. This is efficient because it makes a single pass over the input:
def strcut(string,n) :
    new_string = ""
    first_c, s = string[:1], 0  # string[:1] also works for an empty string
    for c in string :
        if c != first_c :
            first_c, s= c, 0
        s += 1
        if s > n : continue
        else : new_string += c
    return new_string
print(strcut("aabcaaabbba",2)) # output: aabcaabba

Simply, to answer the question
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
def strcut(string: str, n: int) -> None:
    tmp = "*" * (n+1)
    for char in string:
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that is not in the string; it's just a placeholder to handle the start case when there are no characters yet.
The print(tmp[n+1:]) line simply removes the "*" characters added at the beginning.

You don't need nested loops. Keep track of the current character and its count. Include characters when the count is less than or equal to n, and reset the current character and count when it changes.
def strcut(s,n):
    result = ''                        # resulting string
    char,count = '',0                  # initial character and count
    for c in s:                        # only loop once on the characters
        if c == char: count += 1       # increase count
        else: char,count = c,1         # reset character/count
        if count<=n: result += c       # include character if count is ok
    return result
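The same single-pass idea can also be sketched with itertools.groupby, which yields one group per run of equal consecutive characters, and itertools.islice, which keeps at most the first n items of each run. The name strcut_groupby is made up for this illustration and is not from the answers above.

```python
from itertools import groupby, islice

def strcut_groupby(s, n):
    # groupby yields (key, run) for each run of consecutive equal characters;
    # islice(run, n) keeps at most the first n characters of that run.
    return "".join("".join(islice(run, n)) for _, run in groupby(s))

print(strcut_groupby("aaabccdddd", 1))  # abcd
print(strcut_groupby("aaabccdddd", 2))  # aabccdd
print(strcut_groupby("aaabccdddd", 3))  # aaabccddd
```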

Just to give some ideas, here is a different approach. I didn't like how the inner loop over n restarts each time: even at i=3 with n=2, the code still moves on to i=4 although that character was already checked while going through n. And since you are checking the next n characters in the string, your method doesn't fit with keeping the string in order. Here is a rough method that I find easier to read. Note that string.count counts every occurrence in the whole string, not just the current run, so this matches the expected output only when each character appears in a single run, as in the example.
def strcut(string, n):
    for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
        if string.count(string[i]) > n:
            string = remove(string,i)
    print(string)
def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]
strcut('aaabccdddd',2)

What is the meaning of the syntax as mentioned below?

I am working on the well-known Hamlet bot program in Python 3.7. I have a partial script (as a string input) from Shakespeare's Hamlet.
My task is to split the script into a list of sentences and then split each sentence into a list of words.
I am using the following code copied from the internet:
'''
### BEGIN SOLUTION
def hamsplits__soln0():
    cleanham = ""
    for char in hamlet_text:
        swaplist = ["?","!", "."] # the punctuation marks we need to replace
        if char in swaplist:
            cleanham += "." # replace every punctuation mark with .
        elif char == " ":
            cleanham += char # keep spaces as they are
        elif char.isalpha():
            cleanham += char.lower() # convert all letters to lower case
    hamlist = cleanham.split(". ") # split the text into a list of sentences
    for sentence in hamlist:
        hamsplits.append(sentence.split()) # split each sentence into a list of words
        if hamsplits[-1][-1][-1] == '.':
            hamsplits[-1][-1] = hamsplits[-1][-1][:-1] # Remove trailing punctuation
'''
I want to understand the meaning of the last two lines of the code:
if hamsplits[-1][-1][-1] == '.':
    hamsplits[-1][-1] = hamsplits[-1][-1][:-1] # Remove trailing punctuation
Can anyone help me with this?

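Suppose hamsplits is a three-level nested structure (a list of lists of sequences).
The first line checks whether the last element of the last inner sequence of the last outer item is a dot. The second line then replaces that inner sequence with a copy that omits its last element:
>>> x = [1, 2, 3]
>>> x = x[:-1] # Remove last element
>>> x
[1, 2]
If the innermost sequence is a list, this has the same effect as
del hamsplits[-1][-1][-1]
In the question's code the innermost sequence is a string (a word), and strings are immutable, so del raises a TypeError there and the slice assignment is required.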
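In the question's code, hamsplits appears to be a list of sentences, each of which is a list of word strings, so the three [-1] indexes pick the last sentence, its last word, and that word's last character. A minimal sketch with made-up sample data (the sentences below are hypothetical, not taken from the asker's input):

```python
# Hypothetical sample data in the shape the question's code builds:
# a list of sentences, each a list of lowercase words.
hamsplits = [["to", "be", "or", "not", "to", "be"],
             ["that", "is", "the", "question."]]

last_sentence = hamsplits[-1]      # ['that', 'is', 'the', 'question.']
last_word = hamsplits[-1][-1]      # 'question.'
last_char = hamsplits[-1][-1][-1]  # '.'

if hamsplits[-1][-1][-1] == '.':
    # Strings are immutable, so build a new string without the final
    # character and store it back into the list.
    hamsplits[-1][-1] = hamsplits[-1][-1][:-1]

print(hamsplits[-1])  # ['that', 'is', 'the', 'question']
```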

Let's take an example. Suppose hamsplits looks like this:
hamsplits=['test',['test1',['test2','.']]]
print(hamsplits[-1][-1][-1]) # it would be equal to '.'
if hamsplits[-1][-1][-1] == '.': # here we are comparing it with "."
    hamsplits[-1][-1] = hamsplits[-1][-1][:-1] # take every element except the last one ('.') and overwrite hamsplits[-1][-1] with the result
print(hamsplits[-1][-1]) # it would print ['test2'] (the list with its last element removed)
**Note**:
hamsplits[:-1] is a slice that returns everything except the last element
hamsplits[-1] accesses the last element
Hope this helps!

Why is my index printing when I try to print the contents of the list?

I am trying to square every digit in a number provided by the user. I am getting the correct output, but I get an additional index at the end and I'm not really sure why. I've put comments in my code to explain what I'm trying to do at every step. How do I get rid of that index on the end?
def square_digits(num):
    print(num)
    string_num =(str(num)) #convert the num to a string
    for word in string_num: #iterate through every digit
        word = int(word) #convert each digit to an int so we can square it
        square_num = word * word
        str_sq_num = list(str(square_num)) #create a list of those nums
        for count in str_sq_num: #iterate through list to make it one number
            print(count, end = "")
    print(str_sq_num)
    return str_sq_num
For example, given the number 3212, I should output 9414; instead my output is 9414['4']. Another example is the number 6791: the output should be 3649811, but my output is 3649811['1'].

The problem is the way for loops work in Python. The variable str_sq_num is left over from the last iteration of for word in string_num.
For example, assuming your number is 12, in the first iteration str_sq_num will be ['1'], or 1 squared. But this value is overwritten in the second iteration, when it is set to ['4'], or 2 squared. Thus the list will always contain only the square of the last digit.
If your goal is to get a list of all the squared digits, try this:
def square_digits(num):
    print(num)
    string_num =(str(num)) #convert the num to a string
    str_sq_num = []
    for word in string_num: #iterate through every digit
        word = int(word) #convert each digit to an int so we can square it
        square_num = word * word
        str_sq_num.extend(str(square_num)) #add the digits of the square to the list
    for count in str_sq_num: #iterate through list to make it one number
        print(count, end = "")
    print(str_sq_num) #prints the full list; remove this line to print only the digits
    return str_sq_num

I'm not sure exactly what you're trying to achieve here, but looking at your examples I suppose this should work:
def square_digits(num):
    print(num)
    string_num = (str(num))
    str_sq_num = []
    for word in string_num:
        word = int(word)
        square_num = word * word
        str_sq_num.append(square_num)
    for count in str_sq_num:
        print(count, end = "")
    return str_sq_num

Couldn't test it, but this should do it:
def square_digits(num):
    # str(num) lets this accept an int as well as a string of digits
    return "".join([str(int(digit)**2) for digit in str(num)])

Binary Search using a for loop, searching for words in a list and comparing

I'm trying to compare the words in "alice_list" to "dictionary_list", and if a word isn't found in "dictionary_list", print it and say it is probably misspelled. My problem is that nothing is printed when a word is not found; maybe you could help me out. I convert the words in "alice_list" to uppercase, as "dictionary_list" is all in capitals. Any help with why it's not working would be appreciated, as I'm about to pull my hair out over it!
import re
# This function takes in a line of text and returns
# a list of words in the line.
def split_line(line):
    return re.findall('[A-Za-z]+(?:\'[A-Za-z]+)?', line)
# --- Read in a file from disk and put it in an array.
dictionary_list = []
alice_list = []
misspelled_words = []
for line in open("dictionary.txt"):
    line = line.strip()
    dictionary_list.extend(split_line(line))
for line in open("AliceInWonderLand200.txt"):
    line = line.strip()
    alice_list.extend(split_line(line.upper()))
def searching(word, wordList):
    first = 0
    last = len(wordList) - 1
    found = False
    while first <= last and not found:
        middle = (first + last)//2
        if wordList[middle] == word:
            found = True
        else:
            if word < wordList[middle]:
                last = middle - 1
            else:
                first = middle + 1
    return found
for word in alice_list:
    searching(word, dictionary_list)
--------- EDITED CODE THAT WORKED ----------
Updated a few things in case anyone has the same issue, and used "if word not in" to double-check what the search was outputting.
"""-----Binary Search-----"""
# binary-search each word; print it if it was not found in the dictionary
words = alice_list
for word in alice_list:
    first = 0
    last = len(dictionary_list) - 1
    found = False
    while first <= last and not found:
        middle = (first + last) // 2
        if dictionary_list[middle] == word:
            found = True
        else:
            if word < dictionary_list[middle]:
                last = middle - 1
            else:
                first = middle + 1
    if not found:
        print("NEW:", word)
# checking to make sure words match
for word in alice_list:
    if word not in dictionary_list:
        print(word)

Your function split_line() returns a list. You then take the output of the function and append it to the dictionary list, which means each entry in the dictionary is a list of words rather than a single word. The quick fix is to use extend instead of append.
dictionary_list.extend(split_line(line))
A set might be a better choice than a list here; then you wouldn't need the binary search.
--EDIT--
To print words not in the list, just filter the list based on whether your function returns False. Something like:
notfound = [word for word in alice_list if not searching(word, dictionary_list)]
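As a rough sketch of the set suggestion above: building a set from the dictionary words once makes each membership test a hash lookup instead of a search. The word lists below are made-up sample data standing in for the contents of the two files, and dictionary_set is a name introduced only for this example.

```python
# Hypothetical sample data standing in for the two word lists
dictionary_list = ["ALICE", "RABBIT", "THE", "WAS"]
alice_list = ["ALICE", "WAS", "TEH", "RABBIT", "TEH"]

# Build the set once; each "in" test is then a hash lookup
dictionary_set = set(dictionary_list)

misspelled_words = [word for word in alice_list if word not in dictionary_set]
print(misspelled_words)  # ['TEH', 'TEH']
```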

Are you required to use binary search for this program? Python has a handy operator called "in". Given an element as the first operand and a list/set/dictionary/tuple as the second, it returns True if that element is in the structure, and False if it is not.
Examples:
1 in [1, 2, 3, 4] -> True
"APPLE" in ["HELLO", "WORLD"] -> False
So, for your case, most of the script can be simplified to:
for word in alice_list:
    if word not in dictionary_list:
        print(word)
This will print each word that is not in the dictionary list.
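Note that "in" on a list checks the elements one by one. If a binary search is required and the dictionary list is sorted, the standard library's bisect module can do the search. A minimal sketch, where in_sorted is a hypothetical helper name and the word lists are made-up sample data:

```python
from bisect import bisect_left

def in_sorted(word, sorted_words):
    """Return True if word is in sorted_words, which must be sorted ascending."""
    i = bisect_left(sorted_words, word)  # leftmost position where word could go
    return i < len(sorted_words) and sorted_words[i] == word

# Hypothetical sample data; sorted() guarantees the order bisect relies on
dictionary_list = sorted(["THE", "ALICE", "WAS", "RABBIT"])
alice_list = ["ALICE", "WAS", "TEH"]

for word in alice_list:
    if not in_sorted(word, dictionary_list):
        print(word)  # prints TEH
```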
