Python - Capture string with or without specific character - python

I am trying to capture the part of a sentence that follows a specific word. Each sentence in my code is different, and a sentence doesn't necessarily contain the word to split by. If the word doesn't appear, I just need an empty string or an empty list.
Example 1: working
my_string="Python is a amazing programming language"
print(my_string.split("amazing",1)[1])
programming language
Example 2:
my_string="Java is also a programming language."
print(my_string.split("amazing",1)[1]) # amazing word doesn't appear in the sentence.
Error: IndexError: list index out of range
Output needed: an empty string, an empty list, etc.
I tried something like this, but it still fails.
my_string.split("amazing",1)[1] if my_string.split("amazing",1)[1] == None else my_string.split("amazing",1)[1]

When you call the .split() method, you can pick the part of the resulting list you want with an integer index or a slice. If you want to check for a specific word in your string, you can do something like this:
my_str = "Python is cool"
my_str_list = my_str.split()
if 'cool' in my_str_list:
    print(my_str)
output:
Python is cool
Otherwise, you can run a for loop over a list of strings to check whether the word appears in each of them.

You have some options here. You can split and check the result:
tmp = my_string.split("amazing", 1)
result = tmp[1] if len(tmp) > 1 else ''
Or you can check for containment up front:
result = my_string.split("amazing", 1)[1] if 'amazing' in my_string else ''
The first option is more efficient if most of the sentences contain the word; the second is more efficient if most don't.
Another option similar to the first is
result = my_string.split("amazing", 1)[-1]
if result == my_string:
result = ''
In all cases, consider doing something equivalent to
result = result.lstrip()
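Another way to get the same behavior, as a minimal sketch: str.partition always returns a 3-tuple, and its last element is an empty string when the separator is missing, so no length check is needed. The helper name text_after is made up for illustration.

```python
def text_after(sentence, word):
    """Return the text following the first `word`, or '' if `word` is absent."""
    # str.partition returns (before, separator, after); when the separator
    # is not found, it returns (sentence, '', '').
    _, _, after = sentence.partition(word)
    return after.strip()


print(text_after("Python is a amazing programming language", "amazing"))
print(text_after("Java is also a programming language.", "amazing") == "")
```

Because the missing-word case falls out of partition itself, the same call works for both kinds of sentence.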

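Instead of using index 1, use index -1, which refers to the last item in the list. When the word is missing, split() returns a one-element list, so this no longer raises an IndexError:
my_string="Java is also a programming language."
print(my_string.split("amazing",1)[-1])
This prints the whole sentence, 'Java is also a programming language.', so compare the result with the original string if you need an empty string instead.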

Related

How to search and get the second word in a string that has two of the search word in Python 3?

I am making a Python script that finds the word "Hold" in a list string and confirms if it is holdable or not.
File = [
"Hold_Small_Far_BG1_123456789.jpg",
"Firm_Large_Near_BG1_123456789.jpg",
"Move_Large_Far_BG1_123456789.jpg",
"Firm_Large_Far_BG1_123456789.jpg",
"Hold_Small_Hold_BG1_123456789.jpg",
"Hold_Small_Near_BG1_123456789.jpg",
"Small_Small_Far_BG1_123456789.jpg",
]
for item in File:
    if "Hold" in item:
        print('Yes, object is holdable.')
    else:
        print('No, object is not holdable.')
The holdable objects are the ones that have 'Hold' as the third word.
The problem is that the code sees the first 'Hold' word and reports a match. I want the code to check whether the word 'Hold' appears in the filename while ignoring the first 'Hold' word.
Please note that I cannot split the string using the '_' because it is generated by people. So, sometimes it can be a comma, dot, or space even.
Is there an expression for this? Sorry for the bad English.
Thank you. :)
You can use a regex pattern:
import re
holdables = ['yes' if re.findall(r'\w.*(Hold)', x) else 'no' for x in File]
for x in holdables:
    print(x)
The regex here only assumes that 'Hold' is not the first word in the string but does exist elsewhere, since you said you can't be sure whether underscores or other delimiters will be present. If you need more stringent conditions for the regex pattern, you can always update it.
If I have understood the question correctly, we want to find if the filename contains "Hold" anywhere ignoring the first occurrence. Without definite separators, however, it is difficult. Here are two approaches that I think could work:
Using regex like:
import re
for fname in File:
    if re.match("(^.+Hold.*$)", fname):
        pass  # code to run if 'Hold' is found
Assumptions: This answer relies on the assumption that 'Hold', if it occurs at all, can only occur in the first and third positions. We ignore the first occurrence and search for the one in the third position.
>>> re.match("(^.+Hold.*$)", "Hold_Small_Hold_BG1_123456789.jpg")
<re.Match object; span=(0, 33), match='Hold_Small_Hold_BG1_123456789.jpg'>
>>> re.match("(^.+Hold.*$)", "Hold_Small_Far_BG1_123456789.jpg")
>>>
Use split()
We can split the string on "Hold". When "Hold" is present in both the first and the third position, we get a list with 3 elements; when it appears only once, we get 2.
for fname in File:
    if len(fname.split("Hold")) == 3:
        pass  # code to run if 'Hold' is found
Again, the assumption is that 'Hold', if it occurs at all, can only occur in the first and third positions.
>>> "Hold_Small_Hold_BG1_123456789.jpg".split("Hold")
['', '_Small_', '_BG1_123456789.jpg'] #list with 3 elements
>>> "Hold_Small_Far_BG1_123456789.jpg".split("Hold")
['', '_Small_Far_BG1_123456789.jpg'] #list with 2 elements
i = 0
while i < len(File):
    s = File[i]
    ct = s.count('Hold')
    if ct > 1:
        print('Yes, object is holdable.')
    else:
        print('No, object is not holdable.')
    i += 1
Edited, now it works only if 'Hold' appears more than once.
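If the rule really is "'Hold' must be the third word", one hedged alternative is to split on every delimiter the question mentions (underscore, comma, dot, space) and look at the third token. The function name is_holdable and the exact delimiter set are assumptions, not part of the original answers.

```python
import re

def is_holdable(filename):
    """Return True when the third word of `filename` is 'Hold'."""
    # Split on runs of underscores, commas, dots or whitespace (assumed delimiters).
    words = [w for w in re.split(r"[_,.\s]+", filename) if w]
    return len(words) >= 3 and words[2] == "Hold"


names = [
    "Hold_Small_Far_BG1_123456789.jpg",
    "Hold_Small_Hold_BG1_123456789.jpg",
    "Firm Large,Hold.BG1_123456789.jpg",
]
for name in names:
    print(name, is_holdable(name))
```

Unlike counting occurrences, this also accepts a name whose only 'Hold' is the third word.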

Finding letters within items in lists in Python

I'm trying to go through a list to find letter combinations that don't exist in English. After a fair amount of arguing, I have a word list that I can mess with. Each word is listed as 'word\n' since each word is on a line. If I wanted to find, say, the word 'winter', the in operator works, but only if I look for 'winter\n'. I can't search for just 'winter', so I can't find individual letter pairs, which is the goal.
There's over a quarter million items, so I can't cycle through the list every time, it would take ages. I don't care about index, I just need a true/false of if a letter pair is anywhere in the list.
Sorry if this was a bit rambly, I hope I got my point across. Thanks!
Assuming you don't want to alter your wordlist, it sounds like you're looking for something like this:
def search(word_list, word):  # word_list is your list of words, word is the word you're searching for
    for w in word_list:  # iterate over the list
        if w.startswith(word):  # check if any of them start with the word you're looking for
            return True  # return true if a match is found
    return False  # return false if no matches are found
If you instead want to find a substring anywhere in a word instead of at the beginning, replace w.startswith(word) with word in w.
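Since the stated goal is a true/false answer for letter pairs across a quarter-million words, a possible sketch is to walk the list once, collect every adjacent letter pair into a set, and then test membership in that set. The names collect_pairs and word_list are illustrative assumptions.

```python
def collect_pairs(word_list):
    """Return the set of all adjacent two-letter combinations in the words."""
    pairs = set()
    for raw in word_list:
        word = raw.strip()  # drop the trailing '\n' left from reading the file
        for i in range(len(word) - 1):
            pairs.add(word[i:i + 2])
    return pairs


word_list = ["winter\n", "cat\n"]
pairs = collect_pairs(word_list)
print("nt" in pairs)  # pair taken from 'winter'
print("qz" in pairs)  # pair that occurs in neither word
```

The list is scanned only once; every later lookup is a set membership test rather than another pass over the words.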
There are several ways to do that but the easiest one is like this:
flag = True
STRING = 'YOUR STRING'
def check(letter):
    for k in range(33, 127):
        if chr(k) == letter:
            return True
    return False
for i in STRING:
    if not check(i):
        flag = False
        break
The loop runs from 33 to 126 because those are the ASCII codes of the printable characters: English letters, digits and punctuation (such as ?, !, *, (, ), etc.).
Notice: This code is just for one string!
You can also use the re (regular expression) module to do that.
You can create a pattern variable like this:
import re
pattern = '[A-Za-z]'
This pattern matches any single English letter.
And then:
new_string = re.sub(pattern, '', STRING)
if new_string == '':
    flag = True
else:
    flag = False
The sub method works like replace: you give it a pattern, the replacement, and the string to search.
So we replace all of the English letters in the string with '', and if nothing is left, the string is made up only of English letters.
See the re module documentation for the full syntax.
If you are looking for a fast algorithm, do not use the first way: it compares every character of the string against up to 94 candidates.
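For the letters-only check, a shorter sketch is re.fullmatch, which succeeds only when the whole string matches the pattern. The function name only_english_letters is made up for illustration; note that, unlike the substitution approach, this version treats the empty string as not matching.

```python
import re

def only_english_letters(text):
    """Return True if `text` is non-empty and contains only A-Z or a-z."""
    return re.fullmatch(r"[A-Za-z]+", text) is not None


print(only_english_letters("winter"))
print(only_english_letters("win ter"))
```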

I want to split a string by a character on its first occurence, which belongs to a list of characters. How to do this in python?

Basically, I have a list of special characters. I need to split a string by a character if it belongs to this list and exists in the string. Something on the lines of:
def find_char(string):
    if string.find("some_char") != -1:
        pass  # do xyz with some_char
    elif string.find("another_char") != -1:
        pass  # do xyz with another_char
    else:
        return False
and so on. The way I think of doing it is:
def find_char_split(string):
    char_list = [",", "*", ";", "/"]
    for my_char in char_list:
        if string.find(my_char) != -1:
            my_strings = string.split(my_char)
            break
    else:
        my_strings = False
    return my_strings
Is there a more pythonic way of doing this? Or the above procedure would be fine? Please help, I'm not very proficient in python.
(EDIT): I want it to split on the first occurrence of the character, which is encountered first. That is to say, if the string contains multiple commas, and multiple stars, then I want it to split by the first occurrence of the comma. Please note, if the star comes first, then it will be broken by the star.
I would favor using the re module for this because the expression for splitting on multiple arbitrary characters is very simple:
r'[,*;/]'
The brackets create a character class that matches anything inside of them. The code is like this:
import re
results = re.split(r'[,*;/]', my_string, maxsplit=1)
The maxsplit argument makes it so that the split only occurs once.
If you are doing the same split many times, you can compile the regex and search on that same expression a little bit faster (but see Jon Clements' comment below):
c = re.compile(r'[,*;/]')
results = c.split(my_string, maxsplit=1)
If this speedup is important (it probably isn't), you can use the compiled version in a function instead of recompiling it every time. The following function builds the split function and stores the compiled expression in a closure:
def split_chars(chars, maxsplit=0, flags=0, string=None):
    # see note about the + symbol below
    c = re.compile('[{}]+'.format(''.join(chars)), flags=flags)
    def f(string, maxsplit=maxsplit):
        return c.split(string, maxsplit=maxsplit)
    return f if string is None else f(string)
Then:
special_split = split_chars(',*;/', maxsplit=1)
result = special_split(my_string)
But also:
result = split_chars(',*;/', string=my_string, maxsplit=1)
The purpose of the + character is to treat multiple adjacent delimiters as one, if that is desired (thank you Jon Clements). If it is not desired, you can just use re.compile('[{}]'.format(''.join(chars))) above. Note that this matters even with maxsplit=1: 'a,,b' splits into ['a', 'b'] with the + and into ['a', ',b'] without it.
Finally: have a look at this talk for a quick introduction to regular expressions in Python, and this one for a much more information packed journey.
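Putting the pieces together, here is a minimal sketch that splits on the first delimiter encountered and returns False when none is present, as the question asks. The name split_on_first is hypothetical; re.escape is used so that characters with special meaning inside a regex are treated literally.

```python
import re

def split_on_first(text, chars=",*;/"):
    """Split `text` once, at the first character that appears in `chars`."""
    pattern = "[" + re.escape(chars) + "]"
    parts = re.split(pattern, text, maxsplit=1)
    # re.split returns a single-element list when nothing matched.
    return parts if len(parts) == 2 else False


print(split_on_first("a*b,c*d"))
print(split_on_first("no delimiters here"))
```

Because the character class matches whichever delimiter comes first in the string, the order of the characters in chars does not matter.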

Procedure in Python

Question
Write a procedure that takes a string of words separated by spaces (assume no punctuation or capitalization), together with a "target" word, and shows the position of the target word in the string of words. For example, if the string is:
'we dont need no education we dont need no thought control no we dont'
and the target is the word "dont" then your procedure should return the list [1, 6, 13] because "dont" appears at the 1st, 6th, and 13th position in the string. (We start counting positions of words in the string from 0.) Your procedure should return False if the target word doesn’t appear in the string.
My solution-
def procedure(string,target):
    words=string.split(" ") #turn the string into a list of words
    solution=[] #list that will be displayed
    for i in range(len(words)):
        if words[i]==target: solution.append(i)
    if len(solution)==0: return False
    return solution
string="we dont need no education we dont need no thought control no we dont"
print procedure(string, "dont")
assert procedure(string, "dont")
Why is this not running in python?! The problem is on print procedure(string, "dont") it mentions invalid syntax. I am running it in the IDLE.
The following is your code with the indentation fixed; compare it with what you posted and you should see why it now works.
It is unclear to me why your original code has a problem, because indentation controls how Python groups blocks of code, and the code will fail to run or behave differently if the indentation is incorrect. I suspect that your problem is that you had these lines in your code:
for i in range(len(words)):
    if words[i]==target: solution.append(i)
    if len(solution)==0: return False
The above returns False on the first iteration whenever the first word is not the target, because the solution list is still empty at that point. You should check the length of solution after the for loop has finished.
In [42]:
def procedure(string,target):
    words=string.split(" ") #turn the string into a list of words
    solution=[] #list that will be displayed
    for i in range(len(words)):
        if words[i]==target: solution.append(i)
    if len(solution)==0: return False
    return solution
string="we dont need no education we dont need no thought control no we dont"
print(procedure(string, "dont"))
assert(procedure(string, "dont"))
[1, 6, 13]
You can use a list comprehension for this:
def list_word_indexes(word, text):
    return [index for index, text_word in enumerate(text.split())
            if text_word == word]
The problem is on print procedure(string, "dont") it mentions invalid syntax
This means you are using Python 3, where print is a function and not a statement. You should add parentheses around the argument(s) to print, or make sure to use Python 2.
e.g.
print(procedure(string, "dont"))
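For completeness, a sketch that combines the list comprehension with the required False result, written for Python 3. The name word_positions is an assumption for illustration.

```python
def word_positions(text, target):
    """Return the 0-based positions of `target` in `text`, or False if absent."""
    positions = [i for i, word in enumerate(text.split()) if word == target]
    # An empty list is falsy, so `or False` turns "no matches" into False.
    return positions or False


lyrics = "we dont need no education we dont need no thought control no we dont"
print(word_positions(lyrics, "dont"))
print(word_positions(lyrics, "teacher"))
```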

Replacing reoccuring characters in strings in Python 3.1

Is it possible to replace a single character inside a string that occurs many times?
Input:
Sentence=("This is an Example. Thxs code is not what I'm having problems with.") #Example input
                                 ^
Sentence=("This is an Example. This code is not what I'm having problems with.") #Desired output
Replace the 'x' in "Thxs" with an i, without replacing the x in "Example".
You can do it by including some context:
s = s.replace("Thxs", "This")
Alternatively you can keep a list of words that you don't wish to replace:
import re
whitelist = ['example', 'explanation']
def replace_except_whitelist(m):
    s = m.group()
    if s in whitelist: return s
    else: return s.replace('x', 'i')
s = 'Thxs example'
result = re.sub(r"\w+", replace_except_whitelist, s)
print(result)
Output:
This example
Sure, but you essentially have to build up a new string out of the parts you want:
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> s[22]
'x'
>>> s[:22] + "i" + s[23:]
"This is an Example. This code is not what I'm having problems with."
For information about the notation used here, see good primer for python slice notation.
If you know whether you want to replace the first occurrence of x, or the second, or the third, or the last, you can combine str.find (or str.rfind if you wish to start from the end of the string) with slicing and str.replace. Call str.find with the character you wish to replace as many times as needed to get a position just before the occurrence you want (for the specific sentence you suggest, just once). Then slice the string in two at that position and replace only one occurrence in the second slice.
An example is worth a thousand words, or so they say. In the following, I assume you want to substitute the (n+1)th occurrence of the character.
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> n = 1
>>> pos = 0
>>> for i in range(n):
...     pos = s.find('x', pos) + 1
...
>>> s[:pos] + s[pos:].replace('x', 'i', 1)
"This is an Example. This code is not what I'm having problems with."
Note that you need to add an offset to pos, otherwise you will replace the occurrence of x you have just found.
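The same idea can be wrapped in a small function. This is a sketch under the assumption that occurrences are counted from 1; the name replace_nth is made up for illustration.

```python
def replace_nth(text, old, new, n):
    """Replace the n-th occurrence (1-based) of `old` in `text` with `new`."""
    pos = -1
    for _ in range(n):
        # Search strictly after the previous hit so it is not found again.
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: leave the text unchanged
    return text[:pos] + new + text[pos + len(old):]


s = "This is an Example. Thxs code is not what I'm having problems with."
print(replace_nth(s, "x", "i", 2))
```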
