Remove punctuation from list using for loop - python

I'm using python 2.7.13 and I'm stuck on an assignment. I'm completely new to python.
I'm supposed to remove punctuation from names in a list, this is the code that has been given to me:
import string
name = ""
result = []
persons = [["Lisa", "Georgia"],
           ["Chris", "New York"],
           ["Wes", "Oregon"],
           ["Jo-Ann", "Texas"],
           ["Angie", "Florida"]]
I want to print the exact same list, except "Jo-Ann" needs to be printed as "JoAnn". The assignment says that I need to check every character and if it's not a punctuation I need to add it to the variable "name". I'm completely lost; I have no idea how to do this with a for loop.
My teacher gave me some pointers:
for every letter in name
if letter is not a punctuation, add to variable "name"
print
This doesn't make things clearer for a complete newbie like me. Is there someone that can give me some pointers? I would very much appreciate it.

try this:
import string
new_persons = [[x[0].translate(None, string.punctuation), x[1]] for x in persons]
Explanation:
To remove punctuation from a string in Python 2, we can use 'one-example'.translate(None, string.punctuation).
[... for x in persons] is a list comprehension (shorthand for a loop) that creates a new list from the elements of persons, with each element assigned to x in turn.
Within each iteration, x is just the inner list of two elements, e.g. ["Jo-Ann", "Texas"].
x[0] is "Jo-Ann" and x[1] is "Texas".
[x[0].translate(None, string.punctuation), x[1]] means we create a list of two elements from x, with the punctuation removed from the first one.
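Note that str.translate(None, ...) only works on Python 2 byte strings. On Python 3 the same idea might be sketched with str.maketrans, as below. This is an illustrative variant rather than part of the original answer; the persons list is copied from the question and remove_punctuation is a name chosen here.

```python
import string

persons = [["Lisa", "Georgia"],
           ["Chris", "New York"],
           ["Wes", "Oregon"],
           ["Jo-Ann", "Texas"],
           ["Angie", "Florida"]]

# Python 3 only: build a table that deletes every punctuation character
remove_punctuation = str.maketrans("", "", string.punctuation)

# Clean the name, keep the state unchanged
new_persons = [[name.translate(remove_punctuation), state]
               for name, state in persons]
print(new_persons)
```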

I think this is a pretty obvious and simple way for a beginner to do it.
import string
result = []
# Loop over the [name, state] pairs.
for [name, state] in persons:
    # Make a new name by only keeping desired
    # characters.
    newName = ""
    for letter in name:
        if letter not in string.punctuation:
            newName += letter
    # Add to result.
    result.append([newName, state])
It makes use of a few very handy Python tricks to know!
The first one is unpacking of compound values in a loop, in this case the [name, state] pairs. It roughly amounts to doing something like
[a, b] = [1, 2]
to extract values from a list.
The second is implicit looping over characters in a string. If you write
for l in "word":
    print(l)
you'll see that each letter is printed on a new line. Iterating over a string yields its characters one at a time.
Afterwards, you can start looking into list comprehensions.
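As a taste of that next step, the nested loop above could be condensed roughly as follows. This is only a sketch: strip_punctuation is a hypothetical helper name, and the persons list is copied from the question.

```python
import string

persons = [["Lisa", "Georgia"],
           ["Chris", "New York"],
           ["Wes", "Oregon"],
           ["Jo-Ann", "Texas"],
           ["Angie", "Florida"]]

def strip_punctuation(text):
    # Keep only the characters that are not punctuation (hypothetical helper)
    return "".join(letter for letter in text if letter not in string.punctuation)

# Unpack each [name, state] pair and rebuild it with the cleaned name
result = [[strip_punctuation(name), state] for name, state in persons]
print(result)
```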

Get all the punctuation marks in one string (string.punctuation already provides this).
If you write that string out by hand, use triple quotes to mark its start and end.
import string
punc_marks=string.punctuation
for i in persons:
    for j in i:
        name = ''  # start a fresh string for each entry
        for k in j:
            if k not in punc_marks:
                name += k
        print(name + "\n")

Here is an example using your teacher's approach, basically what you should learn right now and what your teacher wants:
import string
name = ""
result = []
persons = [["Lisa", "Georgia"],
           ["Chris", "New York"],
           ["Wes", "Oregon"],
           ["Jo-Ann", "Texas"],
           ["Angie", "Florida"]]
for person in persons:
    for every_letter in person[0]:  # name
        if every_letter not in string.punctuation:  # check that it isn't punctuation
            name += every_letter  # add it to name
    result.append([name, person[1]])  # add the name with no punctuation to result
    name = ""  # reset name
print(result)
Please try to learn from this rather than just copying and pasting it into your homework.

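from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
# tokenize() expects a string, so apply it to each name and rejoin the pieces
result = [["".join(tokenizer.tokenize(name)), state] for name, state in persons]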

Related

How do I count the number of words spoken by each character in a dialogue and store the count in a dictionary?

I'm trying to count the number of words spoken by the characters "Michael" and "Jim" in the following dialogue and store them in a dictionary that looks like {"Michael:":15, "Jim:":10}.
string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
I thought of creating an empty dictionary containing the character names as keys, splitting the string by " ", counting the number of resulting list elements between the character names by using the keys as a reference, and then storing the word counts as values. This is the code I've used so far:
dict = {"Michael:" : 0,
        "Jim:" : 0}
list = string.split(" ")
indices = [i for i, x in enumerate(list) if x in dict.keys()]
nums = []
for i in range(1,len(indices)):
    nums.append(indices[i] - indices[i-1])
print(nums)
The result is a list that prints as [15, 10, 15, 9]
I think I need help with the following:
A better approach if possible
A way to count the number of words spoken by a character when that line is the last line of the dialogue
A way to update the dict with an automatic count of words spoken by the character
The last point is crucial because I'm trying to replicate this process for an episode's worth of quotes.
Thank you in advance!
Loop through the words, incrementing the appropriate counts as you go.
dialogue_dict = {"Michael:" : 0, "Jim:" : 0}
words = string.split(" ")
current_character = None
for word in words:
    if word in dialogue_dict:
        current_character = word
    elif current_character:
        dialogue_dict[current_character] += 1
BTW, don't use list and dict as variable names; that shadows the built-in types with those names.
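Wrapped in a function, the loop above might look like the sketch below. The function name count_words and the short sample dialogue are made up for illustration. Because the count is updated word by word, the last speaker's line is counted without any special handling.

```python
def count_words(dialogue, speakers):
    """Count the words following each speaker tag (e.g. "Jim:")."""
    counts = {speaker: 0 for speaker in speakers}
    current_character = None
    for word in dialogue.split():
        if word in counts:
            current_character = word  # a new speaker starts talking
        elif current_character:
            counts[current_character] += 1
    return counts

# Hypothetical sample, much shorter than the dialogue in the question
sample = "Michael: All right Jim. Jim: Oh, I told you. Michael: So?"
print(count_words(sample, ["Michael:", "Jim:"]))
```

Note that "Jim." inside Michael's line is counted as an ordinary word, because only the exact tags passed in speakers switch the current character.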
Use a regex to split on the character names, keeping the names themselves as separators,
then iterate over the character/line pairs in chunks of 2.
A collections.defaultdict(int) automatically starts each new character at 0, so you can simply add the word count of the current line:
string_ = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
import re
from collections import defaultdict
# This assumes a character name has no blanks and is followed by a `:`
pat = re.compile("([A-Z][a-z'-]+:)")
# splitting with a capturing group returns the delimiters (character names) as well
li = [v for v in pat.split(string_) if v]
# split 2 by 2
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))
# use a defaultdict to start new characters at 0
# collections.Counter could also work
counter = defaultdict(int)
pairs = chunks(li,2)
for character, line in pairs:
    counter[character.rstrip(":")] += len(line.split())
print(f"{counter=}")
output:
counter=defaultdict(<class 'int'>, {'Michael': 38, 'Jim': 17})
We can also do this with a regex, without providing the speaker names up front:
import re
string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
dialog_count = {}
# extract speakers using regex
speakers = re.findall(r'\w+:',string)
# split sentences using regex
sentencs = re.split(r'\w+:',string)
speakers = filter(lambda x: x.strip()!='' ,speakers)
sentencs = filter(lambda x: x.strip()!='' ,sentencs)
# map each speaker to its sentence
dialogs = zip(list(speakers),list(sentencs))
# count total words
for speaker,dialog in dialogs:
    dialog = dialog.split(" ")
    dialog = list(filter(lambda x: x.strip()!='',dialog))
    dialog_count[speaker] = dialog_count.get(speaker,0) + len(dialog)
print(dialog_count)
{'Michael:': 38, 'Jim:': 17}

Python coding flaw for making acronyms

The code below should turn an input like ' Lion head and Snake tail' into the output 'LHAST'.
Instead the result is 'LLLLL'. Please check my code and, if possible, suggest better practice and better code.
Code is as follows:
#ask for Input
name = input('Input words to make acroname :')
#make all in caps
name = name.upper()
#turn them in list
listname = name.split()
#cycle through
for namee in listname:
    #Get the first letter & type in same line
    print(name[0],end="")
print()
input (' press a key to move out' )
You can correct your code: instead of print(name[0]) you should use print(namee[0]), since you want to print the first letter of each word, not the first letter of the whole input.
It is good practice to give variables descriptive names to avoid such typos.
If you want to build the acronym as a single string, I would suggest the code below (Python 2), which stores the desired output in the variable acronym:
phrase = raw_input('Input words to make acronym:')
phrase = phrase.upper()
list_words = phrase.split()
acronym = [word[0] for word in list_words]
acronym = "".join(acronym)
print acronym
You could use str.join with a generator-expression for a one-line solution to the problem:
>>> name = "Lion head and Snake tail"
>>> ''.join(i[0].upper() for i in name.split())
'LHAST'
why?
Well, if we start from inside the generator, we are iterating through name.split(). The .split method of a str returns a list of the substrings found by splitting on whatever is passed into the method. With no argument it splits on runs of whitespace, and since we want the words, this is fine for us.
We then say that for each word i in this list, take the first character from the string with: i[0]. We then convert this to upper case with str.upper().
Then, the final step is to join all these characters together and that is done with the str.join method.
Simply:
print(''.join([P[0] for P in input('Input words to make acroname :').upper().split()]))
Use input('') for Python 3 and raw_input('') for Python 2.
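Pulling the answers together, the acronym logic could be kept in a small function so it can be tested without typing input each time. This is a sketch for Python 3; the name make_acronym is chosen here and does not appear in the answers above.

```python
def make_acronym(phrase):
    """Return the upper-cased first letters of the words in phrase."""
    # split() with no argument ignores leading, trailing and repeated whitespace
    return "".join(word[0] for word in phrase.upper().split())

print(make_acronym(" Lion head and Snake tail"))
```

Keeping the input() call outside the function means the same code works whether the phrase comes from the keyboard or from a test.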

How to print random items from a dictionary?

Hi everyone. I'm trying to complete a basic assignment. The program should allow a user to type in a phrase. If the phrase contains the word "happy" or "sad", that word should then be randomly replaced by a synonym (stored in a dictionary). The new phrase should then be printed out. What am I doing wrong? Every time I try to run it, the program crashes. This is the error I get:
0_part1.py", line 13, in <module>
phrase["happy"] = random.choice(thesaurus["happy"])
TypeError: 'str' object does not support item assignment
Here is what I have so far:
import random
thesaurus = {
    "happy": ["glad", "blissful", "ecstatic", "at ease"],
    "sad": ["bleak", "blue", "depressed"]
}
phrase = input("Enter a phrase: ")
phrase2 = phrase.split(' ')
if "happy" in phrase:
    phrase["happy"] = random.choice(thesaurus["happy"])
if "sad" in phrase:
    phrase["sad"] = random.choice(thesaurus["sad"])
print(phrase)
The reason for your error is that phrase is a string, and strings are immutable. On top of that, strings are sequences, not mappings; you can index them or slice them (e.g., happy_index = phrase.find("happy"); phrase[happy_index:happy_index+len("happy")]), but you can't use them like dictionaries.
If you want to create a new string, replacing the substring happy with another word, use the replace method.
And there's no reason to check first; if happy isn't found, replace will do nothing.
So:
phrase = phrase.replace("happy", random.choice(thesaurus["happy"]))
While we're at it, instead of explicitly looking up each key, you may want to loop over the dictionary and apply all the synonyms:
for key, replacements in thesaurus.items():
    phrase = phrase.replace(key, random.choice(replacements))
Finally, notice that this code will replace all instances of happy with the same replacement, which I think is what your original code was also trying to do. If you want to replace each of them with a separately, randomly chosen synonym, that's a bit more complicated. You could loop over phrase.find("happy", offset) until it returns -1, but a neat trick might make it simpler: split the string around each instance of happy, append a different synonym to each part except the last, then join them all back together. Like this:
parts = phrase.split("happy")
parts[:-1] = [part + random.choice(thesaurus["happy"]) for part in parts[:-1]]
phrase = ''.join(parts)
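One caveat with str.replace and str.split is that they match substrings, so "unhappy" would also be changed. A possible alternative, sketched here under the assumption that only whole words should be replaced, is re.sub with a replacement function. The function is called once per match, so each occurrence gets its own random synonym. The name replace_with_synonyms is hypothetical.

```python
import random
import re

thesaurus = {
    "happy": ["glad", "blissful", "ecstatic", "at ease"],
    "sad": ["bleak", "blue", "depressed"],
}

def replace_with_synonyms(phrase, thesaurus):
    # \b restricts matches to whole words, so "unhappy" is left alone
    pattern = r"\b(" + "|".join(re.escape(word) for word in thesaurus) + r")\b"
    # The lambda runs once per match, so every occurrence gets its own choice
    return re.sub(pattern, lambda m: random.choice(thesaurus[m.group(1)]), phrase)

print(replace_with_synonyms("I am happy, so happy, never sad or unhappy", thesaurus))
```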
Generate a random number from (0..[size of list - 1]). Then, access that index of the list. To get the length of a list, just do len(list_name).

Create new words from start word python

def make_new_words(start_word):
    """create new words from given start word and returns new words"""
    new_words=[]
    for letter in start_word:
        pass
        #for letter in alphabet:
            #do something to change letters
            #new_words.append(new_word)
I have a three-letter word as input, for example car, which is the start word.
I then have to create new words by replacing one letter at a time with every letter of the alphabet. Using my example car, I want to create the words aar, bar, car, dar, ear, ..., zar, then the words car, cbr, ccr, cdr, cer, ..., czr, and finally caa, cab, cac, cad, cae, ..., caz.
I don't really know what the for loop should look like. I was thinking about creating some sort of alphabet list and looping through it to create new words, but I don't know how to choose which parts of the original word should remain. The new words can be appended to a list to be returned.
import string
def make_new_words(start_word):
    """create new words from given start word and returns new words"""
    new_words = []
    for i, letter in enumerate(start_word):
        word_as_list = list(start_word)
        for char in string.ascii_lowercase:
            word_as_list[i] = char
            new_words.append("".join(word_as_list))
    return new_words
lowercase is just a string containing the lowercase letters...
We want to change each letter of the original word (here w) so we
iterate on the letters of w, but we'll mostly need the index of the letter, so we do our for loop on enumerate(w).
First of all, in Python strings are immutable, so we build a list x from w (lists are mutable).
Then comes a second, inner loop over the lowercase letters: we change the current element of the list x accordingly (having changed x, we need to reset it before the next inner loop) and finally we print it.
Because we want to print a string rather than the characters in a list, we use the join method of the empty string '', which glues together the elements of x using, of course, the empty string as the separator.
I have not reported the output but it's exactly what you've asked for, just try...
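# ascii_lowercase exists in both Python 2 and 3 (string.lowercase is Python 2 only)
from string import ascii_lowercase as lowercase
w = 'car'
for i, _ in enumerate(w):
    x = list(w)
    for s in lowercase:
        x[i] = s
        print(''.join(x))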
import string
all_letters = string.ascii_lowercase
def make_new_words(start_word):
    for index, letter in enumerate(start_word):
        template = start_word[:index] + '{}' + start_word[index+1:]
        for new_letter in all_letters:
            print(template.format(new_letter))
You can do this with two loops, by looping over the word and then looping over a range for all letters. By keeping an index for the first loop, you can use a slice to construct your new strings:
for index, _ in enumerate(start_word):
    for let in range(ord('a'), ord('z')+1):
        new_words.append(start_word[:index] + chr(let) + start_word[index+1:])
This could work as a brute-force approach, although you might end up with some performance issues when you go to try it with longer words.
It also sounds like you might want to constrain it only to words that exist in a dictionary at some point, which is a whole other can of worms.
But for right now, for three-letter words, you're on the right track, although I worry that the question might be a little too specific for Stack Overflow.
First, you will probably have more success if you loop through the index for the word, rather than the letter:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
for i in range(len(start_word)):
Then, you can use a slice to grab the letters before and after the index.
    for letter in alphabet:
        new_word = start_word[:i] + letter + start_word[i + 1:]
Another approach is given above, which casts the string to a list. That works around the fact that python will disallow simply setting start_word[i] = letter, which you can read about here.
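Put together, the slicing fragments above might form a complete function along these lines. This is a sketch that fills in the skeleton from the question; it assumes lowercase ASCII letters are the intended alphabet.

```python
import string

def make_new_words(start_word):
    """Return every word made by replacing one letter of start_word."""
    new_words = []
    for i in range(len(start_word)):
        for letter in string.ascii_lowercase:
            # keep everything before and after position i, swap the letter at i
            new_words.append(start_word[:i] + letter + start_word[i + 1:])
    return new_words

words = make_new_words("car")
print(len(words))
print(words[:5])
```

As in the example from the question, the start word itself appears once for each position, because one of the 26 replacements at each position is the original letter.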

Separate a line into words by slash using \W, but avoid splitting on :

I am trying to parse a txt file with a lot of lines like this:
470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003
I am making a dictionary where the key is the first number on the line and the values are, for each key, the words separated by the slash "/". Each of these words is saved into a list; for example, list1 gets every cms_trk_dcs_05:CAEN, list2 gets every CMS_TRACKER_SY1527_7, etc.
But when I use pattern = re.split('\W',line) to split the line, it also splits on
the ":" character; that is, when I try to print cms_trk_dcs_05:CAEN it only returns cms_trk_dcs_05. How can I keep the whole word cms_trk_dcs_05:CAEN and save all the words separated by slashes in my list?
I am new to Python, so I apologize if this is a beginner question.
Anyway, thank you in advance.
Use split() to match first the space after the number, and then the '/':
>>> stringin = "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003"
>>> splitstring = stringin.split(' ')
>>> num = splitstring[0]
>>> stringlist = splitstring[1].split('/')
>>> num
'470115572'
>>> stringlist
['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']
>>>
Or as a (less obvious) one-liner:
>>> [x.split('/') for x in stringin.split(' ')]
[['470115572'], ['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']]
Note, though, that the second approach creates the first element as a list.
As noted in Trimax's comment, : (colon) is a non-word character, so to split the line correctly you need to include it in the pattern. Or use SiHa's answer.
About the pattern: \W is equivalent to [^a-zA-Z0-9_] (https://docs.python.org/2/library/re.html#regular-expression-syntax), so you can just add the colon to it: [^a-zA-Z0-9_:]
As for the second part, just use the first element of the resulting list as the dict key and assign the rest of the list to it as a slice.
Something like this:
import re
result_dict = {}
for line in file_lines:
    # strip the trailing newline so it doesn't produce an empty last element
    line_splitted = re.split('[^a-zA-Z0-9_:]+', line.strip())
    result_dict[line_splitted[0]] = line_splitted[1:]
Note, though, that if your text contains several lines with the same number, you'll lose data: assigning a new value (the list of words in this case) to an existing key overwrites the previous value.
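If repeated numbers do occur, one way to avoid the overwrite is to collect a list of entries per key. The sketch below uses collections.defaultdict(list) together with plain str.split; the function name parse_lines and the second input line are made up for illustration.

```python
from collections import defaultdict

def parse_lines(lines):
    """Map each leading number to a list of slash-separated word lists."""
    result = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, path = line.split(None, 1)  # split once, on the first whitespace
        result[number].append(path.split("/"))
    return result

# Hypothetical input: the line from the question plus a made-up second line
lines = [
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003\n",
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel004\n",
]
parsed = parse_lines(lines)
print(parsed["470115572"][0][0])
```

Because the line is only ever split on whitespace and on "/", the colon in cms_trk_dcs_05:CAEN is never treated as a separator.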
