I have created the following program, which imports a CSV file containing words related to common phone problems. My problem is that it will pick out "smashed" but not "smashed," because of the comma.
So my question is: how can I make it read the word without the comma, and without causing any errors?
Any help will be appreciated :)
import csv
screen_list = {}
with open('keywords.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        screen_list[row[0]] = row[1]
print("Welcome to the troubleshooting program. Here we will help you solve your problems which you are having with your phone. Let's get started: ")
what_issue = input("What is the issue with your phone?: ")
what_issue = what_issue.split(' ')
results = [(solution, screen_list[solution]) for solution in what_issue if solution in screen_list]
if len(results) > 6:
    print('Please only insert a maximum of 6 problems at once. ')
else:
    for solution, problems in results:
        print('As you mentioned the word in your sentence which is: {}, the possible outcome solution for your problem is: {}'.format(solution, problems))
exit_program = input("Type 0 and press ENTER to exit/switch off the program.")
Your problem is in how you split the what_issue string. The best solution here is to use a regular expression:
>>> import re
>>> what_issue = "My screen is smashed, usb does not charge"
>>> what_issue.split(' ')
['My', 'screen', 'is', 'smashed,', 'usb', 'does', 'not', 'charge']
>>> print(re.findall(r"[\w']+", what_issue))
['My', 'screen', 'is', 'smashed', 'usb', 'does', 'not', 'charge']
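To tie this back to the original program, here is a minimal sketch of the keyword lookup using re.findall instead of split. The dictionary contents and the helper name find_solutions are made up for illustration (the real data comes from keywords.csv), and lowercasing the input is an added assumption.

```python
import re

# Hypothetical stand-in for the dictionary loaded from keywords.csv.
screen_list = {
    "smashed": "Take the phone to a repair shop.",
    "charge": "Try a different cable or charger.",
}


def find_solutions(sentence, keywords):
    """Return (word, solution) pairs for every keyword found in the sentence."""
    # [\w']+ keeps letters, digits, underscores and apostrophes together,
    # so trailing commas and full stops never become part of a word.
    words = re.findall(r"[\w']+", sentence.lower())
    return [(word, keywords[word]) for word in words if word in keywords]


results = find_solutions("My screen is smashed, usb does not charge.", screen_list)
for word, solution in results:
    print("{}: {}".format(word, solution))
```

The rest of the program (the limit of 6 results and the final prompt) can stay as it is.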
You've encountered a topic in Computer Science called tokenization.
It looks like you want to remove all non-alphabetical characters from the user input. An easy way to do that is to use Python's re library, which has support for regular expressions.
Here's an example of using re to do this:
import re
regex = re.compile('[^a-zA-Z]')
regex.sub('', some_string)
First we create a regular expression that matches all characters that aren't letters. Then we use this regex to replace all the matching characters in some_string with an empty string, which deletes them from the string.
A quick-and-dirty method for doing the same thing would be to use the isalpha method that belongs to all Python strings to filter out the unwanted characters.
some_string = ''.join([char for char in some_string if char.isalpha()])
Here we make a list that only includes the alphabetical characters from some_string. Then we join it together to create a new string, which we assign to some_string.
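One caveat: applied to the whole input, either snippet also deletes the spaces, so the sentence collapses into a single string. A sketch of one way around that, assuming the goal is still a list of words, is to split first and clean each word separately. The sample sentence is borrowed from the earlier answer.

```python
import re

non_letters = re.compile('[^a-zA-Z]')

what_issue = "My screen is smashed, usb does not charge"

# Clean each word on its own so the word boundaries survive.
cleaned = [non_letters.sub('', word) for word in what_issue.split()]
# Drop words that consisted only of punctuation or digits.
cleaned = [word for word in cleaned if word]

print(cleaned)
```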
How to remove user defined letters from a user defined sentence in Python?
Hi, I would appreciate it if anyone could take the time to help me out with some Python code.
I am currently doing a software engineering bootcamp, and the current requirement is to create a program where a user inputs a sentence and then inputs the letters he/she wishes to remove from the sentence.
I have searched online and there are tons of articles and threads about removing letters from strings but I cannot find one article or thread about how to remove user defined letters from a user defined string.
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
sentence1 = re.sub(letters, '', sentence)
print(sentence1)
The program should remove multiple letters from a user-defined string, but it only works if you input a single letter. If you input multiple letters it will just print the original sentence. Any help or guidance would be much appreciated.
If I understood correctly, we can use the str.maketrans and str.translate methods here, like this:
from itertools import repeat
sentence1 = sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
What this does line by line:
create mapping of letters to None which will be interpreted as "remove this character"
translation_mapping = dict(zip(letters, repeat(None)))
create translation table from it
translation_table = str.maketrans(translation_mapping)
use translation table for given str
sentence1 = sentence.translate(translation_table)
Test
>>> sentence = 'Some Text'
>>> letters = 'te'
>>> sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
'Som Tx'
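As a side note, str.maketrans also accepts three arguments, where every character in the third argument is mapped to None, so the same table can be built without itertools. A small sketch, using the same sample values as the test above:

```python
sentence = 'Some Text'
letters = 'te'

# The third argument lists the characters to delete.
translation_table = str.maketrans('', '', letters)
sentence1 = sentence.translate(translation_table)

print(sentence1)  # Som Tx
```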
Comparison
from timeit import timeit
print('this solution:',
      timeit('sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))',
             'from itertools import repeat\n'
             'sentence = "Hello World" * 100\n'
             'letters = "el"'))
print('#FailSafe solution using `re` module:',
      timeit('re.sub(str([letters]), "", sentence)',
             'import re\n'
             'sentence = "Hello World" * 100\n'
             'letters = "el"'))
print('#raratiru solution using `str.join` method:',
      timeit('"".join([x for x in sentence if x not in letters])',
             'sentence = "Hello World" * 100\n'
             'letters = "el"'))
gives on my PC
this solution: 3.620041800000024
#FailSafe solution using `re` module: 66.5485033
#raratiru solution using `str.join` method: 70.18480099999988
so we probably should think twice before using regular expressions everywhere and str.join'ing one-character strings.
>>> sentence1 = re.sub(str([letters]), '', sentence)
Preferably with letters entered in the form letters = 'abcd'. No spaces or punctuation marks if necessary.
Edit:
These are actually better:
>>> re.sub('['+letters+']', '', sentence)
>>> re.sub('['+str(letters)+']', '', sentence)
The original str([letters]) version also removes ' if it appears in the string, although it is the prettier solution.
You can use a list comprehension:
result = ''.join([x for x in sentence if x not in letters])
Your code doesn't work as expected because the regex you provide only matches the exact combination of letters you give it. What you want is to match either one of the letters, which can be achieved by putting them in brackets, for example:
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
regex_str = '[' + letters + ']'
sentence1 = re.sub(regex_str, '', sentence)
print(sentence1)
For more regex help I would suggest visiting https://regex101.com/
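One thing to watch for: since the letters come straight from the user, characters such as ], ^, - or \ would be interpreted as regex syntax inside the brackets. A minimal sketch that guards against this with re.escape follows; the function name remove_letters is only for illustration.

```python
import re


def remove_letters(sentence, letters):
    """Remove every character of `letters` from `sentence`."""
    if not letters:
        # An empty character class "[]" is not a valid pattern.
        return sentence
    # re.escape neutralises characters that are special inside [...].
    regex_str = '[' + re.escape(letters) + ']'
    return re.sub(regex_str, '', sentence)


print(remove_letters("Hello World", "lo"))
print(remove_letters("a-b^c]d", "^]-"))
```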
user_word = input("What is your preferred sentence? ")
user_letter_to_remove = input("Which letters would you like to delete? ")
# list of letters to remove
letters = str(user_letter_to_remove)
for i in letters:
    user_word = user_word.replace(i, "")
print(user_word)
I'm trying to solve this problem where I'm given a set of strings and have to count how many times a certain word, like 'code', appears within a string. The program should also count any variant where the 'd' changes, like 'coze', but something like 'coz' doesn't count. This is what I made:
def count(word):
    count=0
    for i in range(len(word)):
        lo=word[i:i+4]
        if lo=='co': # this is what gives me trouble
            count+=1
    return count
Test if the first two characters match co and the 4th character matches e.
def count(word):
    count=0
    for i in range(len(word)-3):
        # first two characters are 'co', fourth character is 'e'
        if word[i:i+2] == 'co' and word[i+3] == 'e':
            count+=1
    return count
The loop only goes up to len(word)-3 so that word[i+3] won't go out of range.
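A quick check of this approach, written as a self-contained sketch. The function is renamed count_code here (a hypothetical name) so that the local counter does not share the function's name, and the sample strings are made up.

```python
def count_code(word):
    """Count 4-character windows of the form 'co?e' in word."""
    total = 0
    # Stop 3 characters before the end so word[i + 3] always exists.
    for i in range(len(word) - 3):
        if word[i:i + 2] == 'co' and word[i + 3] == 'e':
            total += 1
    return total


print(count_code('code coze coz'))
print(count_code('cozexxcope'))
```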
You could use regex for this, through the re module.
import re
string = 'this is a string containing the words code, coze, and coz'
re.findall(r'co.e', string)
['code', 'coze']
From there you could write a function such as:
def count(string, word):
    return len(re.findall(word, string))
Regex is the answer to your question, as mentioned above, but what you need is a more refined pattern. Since you are looking for occurrences of a whole word, you need to match on word boundaries. So your pattern should be something like this:
pattern = r'\bco.e\b'
This way your search will not match inside words like testcodetest or cozetest; it will only match standalone words such as code, coze, or coke, with no leading or trailing characters.
If you are going to run the search many times, it's better to use a compiled pattern, so that the pattern is compiled once and reused.
In [1]: import re
In [2]: string = 'this is a string containing the codeorg testcozetest words code, coze, and coz'
In [3]: pattern = re.compile(r'\bco.e\b')
In [4]: pattern.findall(string)
Out[4]: ['code', 'coze']
Hope that helps.
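Putting the pieces together, a counting function built on the compiled boundary pattern might look like the sketch below. The name count_code_like and the constant CODE_LIKE are assumptions for illustration. Note that . also matches non-letters, so a string such as co-e would count too; [a-z] could be substituted if only letters should qualify.

```python
import re

# Compile once and reuse across calls.
CODE_LIKE = re.compile(r'\bco.e\b')


def count_code_like(text):
    """Count standalone 4-character words of the form 'co?e'."""
    return len(CODE_LIKE.findall(text))


text = 'this is a string containing the codeorg testcozetest words code, coze, and coz'
print(count_code_like(text))
```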
I have a sort of telephone directory in a .txt file.
What I want to do is find all the numbers matching this pattern, e.g. 829-2234, and add the digit 5 to the beginning of each number,
so that the result becomes 5829-2234.
My code begins like this:
import os
import re
count=0
#setup our regex
regex=re.compile(r"\d{3}-\d{4}\s")
#open file for scanning
f= open("samplex.txt")
#begin find numbers matching pattern
for line in f:
    pattern=regex.findall(line)
    #isolate results
    for word in pattern:
        print(word)
        count=count+1 #calculate number of occurrences of 7-digit numbers
        # replace 7-digit numbers with 8-digit numbers
        word= '%dword' %5
I don't really know how to add the prefix 5 and then overwrite each 7-digit number with its 5-prefixed version. I tried a few things but they all failed :/
Any tip/help would be greatly appreciated :)
Thanks
You're almost there, but you got your string formatting the wrong way round. Since you know that 5 will always be in the string (because you're adding it), you can do:
word = '5%s' % word
Note that you can also use string concatenation here:
word = '5' + word
Or even use str.format():
word = '5{}'.format(word)
If you're doing it with regex then use re.sub:
>>> strs = "829-2234 829-1000 111-2234 "
>>> regex = re.compile(r"\b(\d{3}-\d{4})\b")
>>> regex.sub(r'5\1', strs)
'5829-2234 5829-1000 5111-2234 '
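To actually update the file, the substitution can be applied to the whole file content and the result written back. The following is a sketch under a few assumptions: the file is small enough to read into memory, the helper names add_prefix and update_file are hypothetical, and the path is the samplex.txt from the question. It is written for Python 3.

```python
import re

PHONE = re.compile(r"\b(\d{3}-\d{4})\b")


def add_prefix(text):
    """Return (new_text, count) with a 5 put in front of every 7-digit number."""
    # subn works like sub but also reports how many replacements were made.
    return PHONE.subn(r"5\1", text)


def update_file(path):
    """Rewrite the file at `path` in place and return the number of changes."""
    with open(path) as f:
        content = f.read()
    new_content, count = add_prefix(content)
    with open(path, "w") as f:
        f.write(new_content)
    return count


# Example call (assumes the file exists):
# update_file("samplex.txt")
print(add_prefix("829-2234 829-1000 111-2234 "))
```

Because of the \b anchors, a number that already carries the prefix (such as 5829-2234) is not matched again, so running the update twice should not add a second 5.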
I'm looking for a clean way to get a set (list, array, whatever) of words starting with # inside a given string.
In C#, I would write
var hashtags = input
    .Split (' ')
    .Where (s => s[0] == '#')
    .Select (s => s.Substring (1))
    .Distinct ();
What is comparatively elegant code to do this in Python?
EDIT
Sample input: "Hey guys! #stackoverflow really #rocks #rocks #announcement"
Expected output: ["stackoverflow", "rocks", "announcement"]
With @inspectorG4dget's answer, if you want no duplicates, you can use set comprehensions instead of list comprehensions.
>>> tags="Hey guys! #stackoverflow really #rocks #rocks #announcement"
>>> {tag.strip("#") for tag in tags.split() if tag.startswith("#")}
set(['announcement', 'rocks', 'stackoverflow'])
Note that { } syntax for set comprehensions only works starting with Python 2.7.
If you're working with older versions, feed the list comprehension ([ ]) output to the set function, as suggested by @Bertrand.
[i[1:] for i in line.split() if i.startswith("#")]
This version will get rid of any empty strings (as I have read such concerns in the comments) and strings that are only "#". Also, as in Bertrand Marron's code, it's better to turn this into a set as follows (to avoid duplicates and for O(1) lookup time):
set([i[1:] for i in line.split() if i.startswith("#")])
The findall method of regular expression objects can get them all at once:
>>> import re
>>> s = "this #is a #string with several #hashtags"
>>> pat = re.compile(r"#(\w+)")
>>> pat.findall(s)
['is', 'string', 'hashtags']
I'd say
hashtags = [word[1:] for word in input.split() if word[0] == '#']
Edit: this will create a set without any duplicates.
set(hashtags)
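A set loses the order in which the hashtags appeared. If order matters, one option (an assumption on my part, not something the question asked for) is dict.fromkeys, which drops duplicates while keeping first-seen order on Python 3.7+. The variable name text is used instead of input to avoid shadowing the built-in.

```python
text = "Hey guys! #stackoverflow really #rocks #rocks #announcement"

# Keep words that start with '#' and have something after it.
hashtags = [word[1:] for word in text.split()
            if word.startswith('#') and len(word) > 1]

# dict keys are unique and keep insertion order.
unique_hashtags = list(dict.fromkeys(hashtags))

print(unique_hashtags)  # ['stackoverflow', 'rocks', 'announcement']
```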
There are some problems with the answers presented here.
1. These two won't work if you have a hashtag like '#one#two#':
{tag.strip("#") for tag in tags.split() if tag.startswith("#")}
[i[1:] for i in line.split() if i.startswith("#")]
2. re.compile(r"#(\w+)") won't work for many Unicode languages (even using re.UNICODE).
I have seen more ways to extract hashtags, but found that none of them handles all cases,
so I wrote some small Python code to handle most of the cases. It works for me.
def get_hashtagslist(string):
    ret = []
    s=''
    hashtag = False
    for char in string:
        if char=='#':
            hashtag = True
            if s:
                ret.append(s)
                s=''
            continue
        # take only the prefix of the hashtag if it contains one of these chars
        # (e.g. for '#happy,but i..' it takes only 'happy')
        if hashtag and char in [' ','.',',','(',')',':','{','}'] and s:
            ret.append(s)
            s=''
            hashtag=False
        if hashtag:
            s+=char
    if s:
        ret.append(s)
    return set(ret)
Another option is regex:
import re
inputLine = "Hey guys! #stackoverflow really #rocks #rocks #announcement"
re.findall(r'(?i)\#\w+', inputLine) # results include the #
re.findall(r'(?i)(?<=\#)\w+', inputLine) # results do not include the #
v = 'vi nod-u'
I want to split this string to obtain
l = ['vi', 'nod', 'u']
v.split(" ") only splits on spaces,
and I don't know how to use the regular expression functions properly.
Could anyone explain how to do that?
Are you trying to split the string to get words? If so, try the following:
>>> import re
>>> pattern = re.compile(r'\W+')
>>> pattern.split('vi nod-u')
['vi', 'nod', 'u']
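One thing to be aware of: if the string starts or ends with a separator, re.split returns empty strings at the edges. A small sketch comparing it with re.findall, which collects the words directly; the second sample string is made up.

```python
import re

v = 'vi nod-u'
print(re.split(r'\W+', v))        # ['vi', 'nod', 'u']

padded = '-vi nod-u!'
parts = re.split(r'\W+', padded)
print(parts)                      # ['', 'vi', 'nod', 'u', '']

# findall matches the words themselves, so no empty strings appear.
words = re.findall(r'\w+', padded)
print(words)                      # ['vi', 'nod', 'u']
```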