How to find duplicate letters in Python using Hashtables - python

How would you count the duplicate letters in a string and store the result in a hashtable, in this form: {(DUPLICATE LETTER HERE) : (NUMBER OF DUPLICATES FOR THAT LETTER HERE)}?
I have had a lot of trouble finding an answer and am looking for code for this specific function.
Thanks!
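A minimal sketch of one way to do this, assuming "number of duplicates" means the total number of times a repeated letter occurs, and that only alphabetic characters count. The function name duplicate_letters is made up for illustration; Python's dict is its hashtable, and collections.Counter is a dict subclass built for counting.

```python
from collections import Counter

def duplicate_letters(text):
    # Count every alphabetic character; Counter is a dict subclass (a hashtable).
    counts = Counter(ch for ch in text if ch.isalpha())
    # Keep only the letters that occur more than once.
    return {letter: n for letter, n in counts.items() if n > 1}

print(duplicate_letters("hello world"))
```

If "number of duplicates" should instead mean the extra occurrences beyond the first, store n - 1 rather than n.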

Related

How to account for amount of letters in a string [duplicate]

This question already has answers here:
How can I check if two strings are anagrams of each other?
(27 answers)
Closed 11 months ago.
I want to write a function that finds anagrams. Anagrams are words that are written with the same characters. For example, "abba" and "baba".
I have gotten as far as writing a function that can recognize if a certain string has the same letters as another string. However, I can't account for the number of repeated letters in the string.
How should I do this?
This is the code I have written so far:
def anagrams(word, words):
    list1 = []
    for i in words:
        if set(word) == set(i):
            list1.append(i)
    return list1
The inputs look something like this:
('abba', ['aabb', 'abcd', 'bbaa', 'dada'])
I want to find an anagram for the first string, within the list.
You kind of point to the solution in your question. Your current problem is that using a set, you ignore the count of times each individual letter is contained in the input and target strings. So to fix this, start using these counts. For instance you can have a mapping between letter and the number of its occurrences in each of the two strings and then compare those mappings.
For the purposes of learning, I would encourage you to use a dict to solve the problem. Once you know how to do that, note that collections has a built-in container called Counter that does the same for you.
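A sketch of the counting idea described above, under the assumption that the comparison is case-sensitive like the original code. The helper letter_counts is a made-up name; the anagrams signature is taken from the question.

```python
from collections import Counter

def letter_counts(word):
    # Map each letter to the number of times it occurs in the word.
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    return counts

def anagrams(word, words):
    # Two words are anagrams when their letter-count mappings are equal.
    target = letter_counts(word)
    return [w for w in words if letter_counts(w) == target]

print(anagrams('abba', ['aabb', 'abcd', 'bbaa', 'dada']))

# The same comparison with the built-in container:
print(Counter('abba') == Counter('baba'))
```

Unlike the set comparison, 'dada' is rejected here because its counts for 'a' and 'b' differ from those of 'abba'.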

Itertools compress

I am trying to extract all the words that are possible within a string as part of a vocabulary game. Consider the string "driver". I would like to find all the English words that can be formed by using the available letters from left to right.
From “driver” we could extract drive, dive, river & die.
But we could not extract “rid”, because its letters do not all appear in order from left to right.
For now I would be content with extracting all the letter combinations, regardless of whether or not they are words.
I was considering using a loop to extract binary pattern
1=“r”
10=“e”
11=“re”
100=“v”
101=“vr”
110=“ve”
111=“ver”
1000=“i”
1001=“ir”
1010=“ie”
1011=“ier”
1100=“iv”
1101=“ivr”
1110=“ive”
1111=“iver”
…
111110=”drive”
Please help!
Thank-you
Simple maths suggests that your approach is about as good as it gets.
Each index i is either present or absent, so a string of length n has 2^n combinations (the order is fixed, since we are not shuffling), and any method that lists them all has to do that much work.
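A minimal sketch of the binary-pattern idea using itertools.compress, which keeps the items of a sequence whose matching selector is true. The function name ordered_subsequences is made up for illustration, and the empty pattern is skipped on the assumption that an empty string is not wanted.

```python
from itertools import compress, product

def ordered_subsequences(word):
    # Each mask is a tuple of 0/1 flags, one per letter (2**n masks in total).
    results = set()
    for mask in product((0, 1), repeat=len(word)):
        if any(mask):  # skip the all-zero mask, which selects nothing
            results.add("".join(compress(word, mask)))
    return results

subs = ordered_subsequences("driver")
print("drive" in subs, "rid" in subs)
```

A set is used because "driver" has two r's, so different masks can produce the same string. Checking each result against a word list would then give the actual English words.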

How can I join different segments of a list?

I'm having trouble in a school project because I don't know how to join elements of a list in segments. Here's an example: Let's say I have the following list:
list = ["T","h","i","s","I","s","A","L","i","s","t",]
How could I join this list so that the program outputs the following?
Output: ["This","Is","A","List"]
Assuming list is your input, and without giving you the answer outright since it's a school project you should do yourself, here are some hints.
You'll want to check whether a character is uppercase to know where a word starts. In Python, you can use isupper() (ex: 'C'.isupper() would return True).
Python strings are iterable.
You can add a character to the end of a string using += (ex: myWord += 'a')
You can add a string to a list using append (ex: myList.append(myWord))
Remember this is a learning experience and there's no real value to being given the answer outright, if that's what you were hoping for. Best of luck and welcome to StackOverflow.
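For readers who have already worked through the hints, here is one possible way they could fit together. This is only a sketch: the function name join_segments and the variable names are made up, and it assumes every word starts with an uppercase letter.

```python
def join_segments(chars):
    words = []
    current = ""
    for ch in chars:
        # An uppercase character starts a new word, so store the previous one.
        if ch.isupper() and current:
            words.append(current)
            current = ""
        current += ch
    # Store the last word, which has no uppercase letter after it.
    if current:
        words.append(current)
    return words

print(join_segments(["T", "h", "i", "s", "I", "s", "A", "L", "i", "s", "t"]))
```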
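You can use a regex for this:
import re
# "letters" avoids shadowing the built-in list
letters = ["T","h","i","s","I","s","A","L","i","s","t"]
# Split before each capital; the capture group keeps the pieces, and the filter drops empty strings
sep = [s for s in re.split("([A-Z][^A-Z]*)", ''.join(letters)) if s]
print(sep)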

Identify 5 "forbidden" characters that result in *fewest* exclusions from word list

From "Think Python" - The author provides word filtering exercises (tasks are to include/exclude words from a list based on minimum length, characters required, or forbidden, etc.)
An extra question he includes: Can you find a combination of 5 forbidden letters that excludes the smallest number of words? (I found topics here and elsewhere generally related to above exercises, but not an algorithm/answer for this extra question.) Here's my start in working it out, and where I got stuck:
For each character in the word list, identify the number of words it occurs in.
Build a dictionary with each key equal to a given character and each value equal to the total number of words containing that character.
Sort by value, in ascending order, to identify the 5 characters that occur in the fewest words.
I'm a bit stuck at this point - because if characters jointly occur in some words in various combinations, that can reduce the total number of words they cause to get excluded from this list.
I wasn't sure how to follow that reasoning to 'abstract' the problem and figure out a general solution. Any pointers?
Your approach will find an upper bound on the number of words excluded by the best set of forbidden characters. You can use sets and unions of sets to find out whether there is a set of characters that is better than your upper-bound set.
The following approach should work, but it will create large sets:
Create a dictionary with the 26 letters as keys and with an empty set as value. Read the words and add them to the sets for the letters that they contain.
Find the letters with the five smallest word sets. The sum of the set lengths for these letters is your upper bound. Filter all letters whose sets are larger than that upper bound out of the dictionary.
Now find the union of all combinations of five of the remaining letters and find the one whose union is smallest. You can do that recursively.
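The steps above can be sketched as follows. This is an illustration under stated assumptions, not a tuned solution: the function name fewest_exclusions and its parameter k are made up, only the 26 lowercase ASCII letters are considered, and itertools.combinations is swapped in for the recursive search mentioned above.

```python
from itertools import combinations
from string import ascii_lowercase

def fewest_exclusions(words, k=5):
    # Map each letter to the set of words that contain it.
    words_by_letter = {letter: set() for letter in ascii_lowercase}
    for word in words:
        for letter in set(word.lower()):
            if letter in words_by_letter:
                words_by_letter[letter].add(word)

    # Upper bound: the sum of the k smallest set sizes.
    sizes = sorted(len(s) for s in words_by_letter.values())
    upper_bound = sum(sizes[:k])

    # A letter whose own set exceeds the bound cannot be part of a best answer.
    candidates = [l for l, s in words_by_letter.items() if len(s) <= upper_bound]

    def excluded(combo):
        # Words excluded by a combination: the union of its letters' sets.
        return set().union(*(words_by_letter[l] for l in combo))

    best = min(combinations(candidates, k), key=lambda c: len(excluded(c)))
    return best, len(excluded(best))

letters, count = fewest_exclusions(["apple", "banana", "cherry", "date", "fig"])
print(letters, count)
```

The union matters because a word containing two forbidden letters is excluded only once, which is exactly why summing the individual counts gives only an upper bound.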

Going through a list of words and counting the words that are in alphabetical order

I have a text file with thousands of words in it. I have to count the number of words that are in alphabetical order. The following is cut out from a bunch of other code I've got:
Counter = 0
for word in wordStr:
    word = word.strip()
    if len(word) > 4:
        a = 0
        b = 1
        while word[a] < word[b]:
            a += 1
            b += 1
            Counter += 1
return Counter
There are some obvious things wrong here and I know it, but don't know how to fix it. My reasoning is this: if the first letter of a word is < the second letter of the word, that part of the word is alphabetical. So I need to go through and perform this kind of operation on a word until I find the entire word to be alphabetical or run into a situation where letter a is > letter b.
At the moment, my code increases the Counter whenever word[a] < word[b]. However, I need to change this so it only increases when the entire word is alphabetical, not just the first two letters. My other problem is that I get errors because the while loop eventually tries to compare string indexes that don't exist, because of the way I am incrementing a and b. I know a lot needs to be rewritten. I have the logic down and am just struggling to implement it.
EDIT: I forgot I have had this problem before and someone on my other question helped me solve it. Sorry for the confusion.
An easy way to see if a word is in alphabetical order is to sort it, then see if the sorted version is the same as the original version. Python has a function sorted() that can be used to sort a string; however, the result will come out as a list. So you'll need to convert the sorted version back to a string, or else convert the original string to a list (the second is a bit easier, just pass the string into list()), before comparing them.
You might also want to convert the string to lower case (or upper case -- doesn't matter as long as it's consistent) first because that will affect the sorting order: all capital letters come before lower case ones, so Cat would test as already being in alphabetical order even though it isn't. You can do this using the .lower() method on the string object.
Since this looks like homework I won't post working code but it should be very simple to put together from what I've given you.
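For later readers, the description above might be put together roughly like this. It is a sketch under assumptions: the function names and the min_length parameter are made up, the length filter mirrors the len(word) > 4 check in the question, and repeated letters (as in "abbey") count as alphabetical here, which differs from the strict < comparison in the question's loop.

```python
def is_alphabetical(word):
    # Lower-case first so capitals do not sort ahead of lower-case letters.
    letters = list(word.strip().lower())
    # sorted() returns a list, so compare it with a list of the letters.
    return letters == sorted(letters)

def count_alphabetical(words, min_length=5):
    # Count only words of at least min_length letters that are in order.
    return sum(
        1 for w in words
        if len(w.strip()) >= min_length and is_alphabetical(w)
    )

print(count_alphabetical(["almost\n", "begin", "zebra", "act"]))
```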
