Converting to lower-case: every letter gets tokenized - python

I have a text document that I want to convert to lower case, but when I do it in the following way every letter of my document gets tokenized. Why does it happen?
with open('assign_1.txt') as g:
    assign_1 = g.read()
assign_new = [word.lower() for word in assign_1]
What I get:
assign_new
['b',
'a',
'n',
'g',
'l',
'a',
'd',
'e',
's',
'h',]

You iterated through the entire input one character at a time, converted each character to lower case, and collected the results in a list. It's simpler than that:
assign_lower = g.read().lower()
Naming the loop variable "word" doesn't make you iterate over words; assign_1 is still a string, that is, a sequence of characters.
If you want to break the text into words, use the split method, which is independent of the lower-case operation.
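A minimal sketch of both steps together. The sample string stands in for the file contents, which the question does not show, so treat it as an assumption:

```python
# Sample text standing in for the contents of assign_1.txt (an assumption).
assign_1 = "Bangladesh Is A Country\nIn South Asia"

# Lower-case the whole string in one call; no loop is needed.
assign_lower = assign_1.lower()

# split() with no arguments breaks the text on any run of whitespace.
words = assign_lower.split()
print(words)
```

Iterating over words then yields whole words rather than single characters.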

Related

Can you add characters from a string to a list?

I'm wondering if it's possible to take a string e.g. str(input()) and split it into individual chars, then add them to a list. I'm trying to make a simple script (something similar to a hangman game) and at the beginning I wrote this:
x=input('Choose word: ').lower()
letters=[]
letters.append(list(x))
print(letters)
but this code appends the whole list as a single item instead of the individual chars.
Edit: this outputs [['o', 'u', 't', 'p', 'u', 't']], meaning that the whole list got appended as one item, but I want it to output ['o', 'u', 't', 'p', 'u', 't']. How do I make it append the individual chars and not the whole list?
You are simply wrapping the char list in another list.
Try this one-liner instead:
print(list(x))
If you want to remove a character:
letters = list(x)
letters.remove('o')
print(letters)
Use the extend method instead of append.
#Extend
x=input('Choose word: ').lower()
letters=[]
letters.extend(list(x))
print(letters)
# ['p', 'y', 't', 'h', 'o', 'n']
To remove a character while keeping its position as an empty string, call replace on each item in a list comprehension:
y=input("Choose a letter to remove: ").lower()
removed=[s.replace(y,'') for s in letters]
print(removed)
#['p', '', 't', 'h', 'o', 'n']
I hope this helps. If it's different from what you want, let me know. Otherwise, happy coding!
You don't need to create an empty list and then populate it with individual letters. Simply apply the list() function directly to the user input:
letters = list(input('Choose word: ').lower())
print(letters)
To add letters from another user input, use the same approach with the .extend() method:
letters.extend(input('Choose word: ').lower()) # No need to use list() here
A simple one liner:
x = input().lower().split()
print(x)
Here we take the input, convert it to lowercase, and then call split, which splits the string on whitespace. You can split on any other string by passing it as the argument to split, for example:
x = input().lower().split(',')
print(x)
This splits on ',', so you can give the input in CSV format.
You may use the augmented assignment operator += to extend the list with an iterable. (A plain + would not work here, because it can only concatenate a list with another list.)
No need to use the list() function here, because the string is iterable:
letters = []
letters += input('Choose word: ').lower()
print(letters)
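To see the difference between the approaches discussed above side by side, here is a small sketch. The fixed string "output" stands in for the user input, and the variable names are illustrative only:

```python
word = "output"  # stands in for input('Choose word: ').lower()

nested = []
nested.append(list(word))  # append adds the whole list as ONE item

flat = []
flat.extend(word)          # extend adds each character as its own item

also_flat = []
also_flat += word          # += on a list behaves like extend

print(nested)
print(flat)
print(also_flat)
```

Only append produces the nested list from the question; extend and += both give the flat list of characters.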
this outputs [['o', 'u', 't', 'p', 'u', 't']] meaning that the whole list got appended as one item, but I want this to output ['o', 'u', 't', 'p', 'u', 't']
Based on your comment, you can use:
x = [*input('Choose word: ').lower()]
print(x)
# ['p', 'y', 't', 'h', 'o', 'n']

How to split a string by spaces and remove non-ASCII characters?

When I am given a string like "Ready[[[, steady, go!", I want to turn it into a list like this: [Ready, steady, go!]. Currently, the best I could do is two list comprehensions, but I couldn't figure out a way to combine them.
text_list = [i for i in text.split()]
output: ['Ready[[[,', 'steady,', 'go!']
clean_list = [x for x in list(text) if x in string.ascii_letters]
output: ['R', 'e', 'a', 'd', 'y', 's', 't', 'e', 'a', 'd', 'y', 'g', 'o']
clean_list does succeed in removing non-ASCII letters but literally turns every single character into a list element. text_list keeps the format intact but does not remove non-ASCII characters. How do I combine the two logics to give me the output that I want?
This should work:
import re, string
# filter out all unwanted characters using regex
pattern = re.compile(f"[^{string.ascii_letters} !]")
filtered = pattern.sub('', "Ready[[[, steady, go!")
# split
result = filtered.split()
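If you would rather combine the two list comprehensions from the question than use a regex, one possible sketch is to filter characters inside each word. The allowed set below keeps '!' only because the desired output in the question keeps it; that choice is an assumption:

```python
import string

text = "Ready[[[, steady, go!"

# Characters to keep inside each word: ASCII letters plus '!'
# (the '!' is kept only because the question's desired output keeps it).
allowed = set(string.ascii_letters + "!")

# The outer comprehension walks the words; the inner generator filters characters.
cleaned = ["".join(ch for ch in word if ch in allowed) for word in text.split()]

# Drop any word that became empty after filtering.
result = [word for word in cleaned if word]
print(result)
```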

Python Function That Receives Letter, Returns (0-Based) Numerical Position Within Alphabet

I'm trying to create a Python function that receives a letter (a string with only one alphabetic character) and returns the 0-based numerical position of that letter in the alphabet. It should not be case-sensitive, and I can't use import.
So entering "a" should return
0
Entering "A" should also return
0
Entering "O" should return
14
And so on.
I had noticed this question but the top answer uses import and the second answer doesn't make any sense to me / doesn't work. I tried to apply the second answer like this:
letter = input("enter a letter")
def alphabet_position(letter):
    return ord(letter) - 97
print((alphabet_position)(letter))
but I got a TypeError:
TypeError: ord() expected a character, but string of length 2 found
Just like the asker in the question that I linked, I'm also trying to send the characters "x" amount of steps back in the alphabet, but in order to do that I need to create this helper function first.
I'm thinking there must be a way to store the letters in two separate lists, one lower-case and one upper-case, and then see if the string that the user entered matches one of the items in that list? Then once we find the match, we return its (0-based) numerical position?
letter = input("enter a letter")
def alphabet_position(letter):
    position = 0
    #letter_position = index value that matches input
    lower_case_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
        'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
        'y', 'z']
    upper_case_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
        'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
        'Y', 'Z']
    #if letter is in lower_case_list, return its 0-based numerical position.
    #else, if letter is in upper_case_list, return its 0-based numerical position.
    #else, print("Enter a valid letter")
    return letter_position
Please help if you have any suggestions. Thank you.
It's probably simpler to just convert from uppercase or lowercase to specifically lowercase with the .lower() method, and use the built in list of letters (string.ascii_lowercase). You can find the index of a list's element by using the .index() method.
import string
letter = input('enter a letter: ')
def alphabet_position(letter):
    letter = letter.lower()
    return list(string.ascii_lowercase).index(letter)
print(alphabet_position(letter))
When you call alphabet_position, it expects an argument, so the call needs the form func_name(arg).
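Since the question rules out import, here is a minimal sketch of the ord-based idea from the question with case handling added. Stripping whitespace and raising ValueError for invalid input are assumptions about the desired behaviour; the question only mentions printing a message:

```python
def alphabet_position(letter):
    """Return the 0-based position of a single ASCII letter, ignoring case."""
    letter = letter.strip().lower()  # drop stray whitespace, then normalise case
    if len(letter) != 1 or not 'a' <= letter <= 'z':
        raise ValueError("Enter a valid letter")
    # ord('a') is 97, so 'a' maps to 0, 'b' to 1, and so on.
    return ord(letter) - ord('a')

print(alphabet_position("a"))
print(alphabet_position("A"))
print(alphabet_position("O"))
```

The length check also guards against the TypeError from the question, which ord raises when it receives more than one character.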
Another way you could do this is to use dictionary comprehension to create a dict of letter-position pairs, like so:
# string.lowercase exists only in Python 2; ascii_lowercase works in Python 2 and 3
from string import ascii_lowercase as l
alphabet_lookup = {letter:pos for letter,pos in zip(l, range(len(l)))}
and then
f = lambda letter: alphabet_lookup[letter.lower()]
is the desired function.
I suggest using a dictionary. It might be a large amount of code to do something relatively simple, but I find it easier to make sense of it this way (and if you are new to python it will help you learn). If you google python dictionaries you can learn lots more about them, but here is the basic concept:
Code for python version 3.X:
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1}
    pos = alphabet_pos[letter]
    return pos
letter = input('Enter a letter: ')
print(alphabet_position(letter))
Code for python version 2.7:
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1}
    pos = alphabet_pos[letter]
    return pos
letter = raw_input('Enter a letter: ')
print alphabet_position(letter)
If you run that and enter B, it will print 1, because it looks up the entry with the key 'B' in the alphabet_pos dictionary and returns the corresponding value. Notice that you can have multiple entries with the same value, so you can handle uppercase and lowercase in the same dictionary. I only did the letters A and B for the sake of time, so you can fill out the rest yourself ;)
I once had to enter every element on the periodic table and their corresponding atomic mass, and that took forever (felt much longer than it actually was).

Python: How to print a list with labels for each item within the list

I am doing a python project for my Intro to CSC class. We are given a .txt file that is basically 200,000 lines of single words. We have to read in the file line by line, and count how many times each letter in the alphabet appears as the first letter of a word. I have the count figured out and stored in a list. But now I need to print it in the format
"a:10,898 b:9,950 c:17,045 d:10,596 e:8,735
f:11,257 .... "
Another aspect is that it has to print 5 of the letter counts per line, as I did above.
This is what I am working with so far...
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    counter = 0
    totals = [0]*26
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for i in file_name:
        for n in range(0,26):
            if i.startswith(alphabet[n]):
                totals[n] = totals[n]+1
    print(totals)
main()
This code currently outputs
[10898, 9950, 17045, 10675, 7421, 7138, 5998, 6619, 6619, 7128, 1505, 1948, 5393, 10264, 4688, 6079, 15418, 890, 10790, 20542, 9463, 5615, 2924, 3911, 142, 658]
I would highly recommend using a dictionary to store the counts. It will greatly simplify your code, and make it much faster. I'll leave that as an exercise for you since this is clearly homework. (other hint: Counter is even better). In addition, right now your code is only correct for lowercase letters, not uppercase ones. You need to add additional logic to either treat uppercase letters as lowercase ones, or treat them independently. Right now you just ignore them.
Having said that, the following will get it done for your current format:
print(', '.join('{}:{}'.format(letter, count) for letter, count in zip(alphabet, totals)))
zip takes n lists and generates a new list of tuples with n elements, with each element coming from one of the input lists. join concatenates a list of strings together using the supplied separator. And format does string interpolation to fill in values in a string with the provided ones using format specifiers.
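The question also asks for thousands separators and five counts per line. A possible sketch is below. The totals are placeholder values, not real results, and string.ascii_lowercase is used because the alphabet list in the question contains 'h' twice:

```python
import string

# Placeholder counts standing in for the totals list built in the question.
totals = [10898, 9950, 17045, 10596, 8735, 11257] + [0] * 20

# '{:,}' inserts thousands separators, e.g. 10898 -> '10,898'.
labels = ['{}:{:,}'.format(letter, count)
          for letter, count in zip(string.ascii_lowercase, totals)]

# Slice the labels into groups of five and print one group per line.
lines = [' '.join(labels[i:i + 5]) for i in range(0, len(labels), 5)]
for line in lines:
    print(line)
```

With 26 letters this prints five full lines and a final line holding only the entry for z.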
Python 3.4:
The solution is to read each line of the file into the words variable below in a loop and accumulate the first letters with Counter:
from collections import Counter
import string
words = 'this is a test of functionality'
result = Counter(map(lambda x: x[0], words.split(' ')))
words = 'and this is also very cool'
result = result + Counter(map(lambda x: x[0], words.split(' ')))
counters = ['{letter}:{value}'.format(letter=x, value=result.get(x, 0)) for x in string.ascii_lowercase]
If you print counters, you get:
['a:3', 'b:0', 'c:1', 'd:0', 'e:0', 'f:1', 'g:0', 'h:0', 'i:2', 'j:0', 'k:0', 'l:0', 'm:0', 'n:0', 'o:1', 'p:0', 'q:0', 'r:0', 's:0', 't:3', 'u:0', 'v:1', 'w:0', 'x:0', 'y:0', 'z:0']

Comparing and printing elements in nested loops

The program identifies if one of the elements in the string word is a consonant by looping through the word string and, for each character, iterating through the consonants list and checking whether the current character of word equals the current element of the consonants list.
If yes, the current element of the word string is a consonant and the consonant gets printed (not the index of the consonant, but the actual consonant, e.g. "d").
The problem is, I get this instead:
1
1
What am I doing wrong? Shouldn't the nested loops work so that the inner loop iterates over every element for each element of the outer loop? That is, each index in the outer loop makes the inner loop iterate through every index?
That's the program:
word = "Hello"
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'z']
for character in range(len(word)):
    for char in range(len(consonants)):
        if consonants[char] == word[character]:
            consonant = word[character]
            print consonant
You are misreading the output. The character is the lowercase letter L, not the number 1.
In other words, your code is working as designed. The capital letter H is not in your consonants list, but the two lowercase letters l in Hello are.
Note that it'd be much more efficient to use a set for consonants here; you'd not have to loop over that whole list and just use in to test for membership. That works with lists too, but is much more efficient with a set. If you lowercase the word value you'd also be able to match the H.
Last but not least, you can loop over the word string directly rather than use range(len(word)) then use the generated index:
word = "Hello"
consonants = set('bcdfghjklmnpqrstvwxz')
for character in word.lower():
    if character in consonants:
        print character
Demo:
>>> word = "Hello"
>>> consonants = set('bcdfghjklmnpqrstvwxz')
>>> for character in word.lower():
...     if character in consonants:
...         print character
...
h
l
l
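The code above uses the Python 2 print statement. A Python 3 sketch of the same set-based approach could look like this; collecting the matches in a list named found is just an illustrative choice:

```python
word = "Hello"
# A set gives fast membership tests; 'y' is left out to match the question's list.
consonants = set('bcdfghjklmnpqrstvwxz')

# Lower-case the word so the capital H is matched as well.
found = [character for character in word.lower() if character in consonants]
for character in found:
    print(character)
```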
