Python - Printing words by length

I have a task where I have to print words in a sentence out by their length.
For example:
Sentence: I like programming in python because it is very fun and simple.
>>> I
>>> in it is
>>> fun and
>>> like very
>>> python simple
>>> because
>>> programming
And if there are no repetitions:
Sentence: Nothing repeated here
>>> here
>>> Nothing
>>> repeated
So far I have got this:
wordsSorted = sorted(sentence.split(), key=len)
That sorts the words by their length, but I don't know how to get the correct output from the sorted words. Any help is appreciated. I also understand that dictionaries may be needed, but I'm not sure.
Thanks in advance.

First sort the words by length, then group them, again keyed on length, using itertools.groupby:
>>> from itertools import groupby
>>> s = 'I like programming in python because it is very fun and simple'
>>> for _, g in groupby(sorted(s.split(), key=len), key=len):
...     print ' '.join(g)
...
I
in it is
fun and
like very
python simple
because
programming
You can also do it using a dict:
>>> d = {}
>>> for word in s.split():
...     d.setdefault(len(word), []).append(word)
...
Now d contains:
>>> d
{1: ['I'], 2: ['in', 'it', 'is'], 3: ['fun', 'and'], 4: ['like', 'very'], 6: ['python', 'simple'], 7: ['because'], 11: ['programming']}
Now we need to iterate over the sorted keys and fetch the related values:
>>> for _, v in sorted(d.items()):
...     print ' '.join(v)
...
I
in it is
fun and
like very
python simple
because
programming
If you want to ignore punctuation then you can strip it using str.strip with string.punctuation:
>>> from string import punctuation
>>> s = 'I like programming in python. Because it is very fun and simple.'
>>> sorted((word.strip(punctuation) for word in s.split()), key=len)
['I', 'in', 'it', 'is', 'fun', 'and', 'like', 'very', 'python', 'simple', 'Because', 'programming']
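The strip can be combined with the groupby approach above to get the grouped output with punctuation ignored (a sketch reusing the same s, groupby, and punctuation from the snippets above):
>>> for _, g in groupby(sorted((w.strip(punctuation) for w in s.split()), key=len), key=len):
...     print ' '.join(g)
...
I
in it is
fun and
like very
python simple
Because
programming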

This can be done using a defaultdict (or a regular dict) in O(N) time; sort+groupby is O(N log N).
words = "I like programming in python because it is very fun and simple".split()
from collections import defaultdict
D = defaultdict(list)
for w in words:
    D[len(w)].append(w)
for k in sorted(D):
    print " ".join(D[k])
I
in it is
fun and
like very
python simple
because
programming

Try this:
s = 'I like programming in python because it is very fun and simple'
l = s.split(' ')
sorted(l, key=len)
it will return
['I', 'in', 'it', 'is', 'fun', 'and', 'like', 'very', 'python', 'simple', 'because', 'programming']

Using a dictionary simplifies it:
sentence = "I like programming in python because it is very fun and simple."
output_dict = {}
for word in sentence.split(" "):
    if not word[-1].isalnum():
        word = word[:-1]
    if len(word) not in output_dict:
        output_dict[len(word)] = []
    output_dict[len(word)].append(word)
for key in sorted(output_dict.keys()):
    print " ".join(output_dict[key])
This also removes a trailing comma, semicolon, or full stop from each word.
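If a word could end in more than one punctuation character, a variant of the same idea (a sketch, borrowing str.strip with string.punctuation from the earlier answer) removes all leading and trailing punctuation rather than just the final character:
from string import punctuation

output_dict = {}
for word in sentence.split():
    word = word.strip(punctuation)  # drop all leading/trailing punctuation
    if word:  # skip tokens that were pure punctuation
        output_dict.setdefault(len(word), []).append(word)
for key in sorted(output_dict):
    print " ".join(output_dict[key])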

Padding a list of lists to make it equal to the size of the largest list

I have a list of lists of sentences and I want to pad all sentences so that they are of the same length.
I was able to do this but I am trying to find most optimal ways to do things and challenge myself.
max_length = max(len(sent) for sent in sents)
list_length = len(sents)
sents_padded = [[pad_token for i in range(max_length)] for j in range(list_length)]
for i, sent in enumerate(sents):
    sents_padded[i][0:len(sent)] = sent
and I used the inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
Is my method an efficient way to do it, or are there better ways?
This is provided by itertools (in Python 3) as zip_longest; you can transpose the result back with zip(*...) and pass it to list if you prefer that over an iterator.
import itertools
from pprint import pprint
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
padded = zip(*itertools.zip_longest(*sents, fillvalue=pad_token))
pprint (list(padded))
[['Hello', 'World', 'Hi', 'Hi'],
 ['Where', 'are', 'you', 'Hi'],
 ['I', 'am', 'doing', 'fine']]
Here is how you can use str.ljust() to pad each string, using max() with key=len to find the length to pad each string to:
lst = ['Hello World', 'Good day!', 'How are you?']
l = len(max(lst, key=len))      # The length of the longest sentence
lst = [s.ljust(l) for s in lst] # Pad each sentence to length l
print(lst)
Output:
['Hello World ',
 'Good day!   ',
 'How are you?']
Assumption:
The output should be the same as the OP's output (i.e. the same number of words in each sublist).
Inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
The following one-liner produces the same output as the OP's code (with max_length computed as in the question):
max_length = max(len(sent) for sent in sents)
sents_padded = [sent + [pad_token]*(max_length - len(sent)) for sent in sents]
print(sents_padded)
# [['Hello', 'World', 'Hi', 'Hi'], ['Where', 'are', 'you', 'Hi'], ['I', 'am', 'doing', 'fine']]
This seemed to be faster when I timed it:
maxi = 0
for sent in sents:
    if len(sent) > maxi:
        maxi = len(sent)
for sent in sents:
    while len(sent) < maxi:
        sent.append(pad_token)
print(sents)
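Timing claims like this are easy to check; a quick sketch with timeit (the input sizes here are arbitrary) measures the list-comprehension one-liner, and any alternative can be swapped into stmt for comparison (copy sents first if the alternative mutates it, as the append-based version does):
import timeit

setup = 'sents = [["w"] * n for n in range(1, 100)]; pad_token = "Hi"'
stmt = ('max_length = max(len(s) for s in sents); '
        'padded = [s + [pad_token] * (max_length - len(s)) for s in sents]')
print(timeit.timeit(stmt, setup=setup, number=1000))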

How to sort unique words in order of appearance?

restart = True
while restart == True:
    option = input("Would you like to compress or decompress this file?\nIf you would like to compress type c \nIf you would like to decompress type d.\n").lower()
    if option == 'c':
        text = input("Please type the text you would like to compress.\n")
        text = text.split()
        for count, word in enumerate(text):
            if text.count(word) < 2:
                order.append(max(order) + 1)
            else:
                order.append(text.index(word) + 1)
        print (uniqueWords)
        print (order)
        break
    elif option == 'd':
        pass
    else:
        print("Sorry that was not an option")
For part of my assignment I need to identify unique words and send them to a text file. I understand how to write text to a text file, but I do not understand how to order this code appropriately so that it produces the following in a text file (if I were to input "the world of the flowers is a small world to be in"):
the,world,of,flowers,is,a,small,to,be,in
1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10
The top line states the unique words and the second line shows the position of each word, so the text can later be decompressed. I have no issue with the decompression or with producing the numbers, only with getting the unique words in order.
Any assistance would be much appreciated!
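For reference, a minimal sketch of the two-line file format described above (the file name compressed.txt is illustrative); the answers below concentrate on the harder part, collecting the unique words in order:
text = "the world of the flowers is a small world to be in"
unique_words = []
order = []
for word in text.split():
    if word not in unique_words:
        unique_words.append(word)
    order.append(unique_words.index(word) + 1)  # 1-based position in the unique list

with open("compressed.txt", "w") as f:
    f.write(",".join(unique_words) + "\n")
    f.write(",".join(str(n) for n in order) + "\n")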
text = "the world of the flowers is a small world to be in"
words = text.split()
unique_ordered = []
for word in words:
    if word not in unique_ordered:
        unique_ordered.append(word)
from collections import OrderedDict
text = "the world of the flowers is a small world to be in"
words = text.split()
print list(OrderedDict.fromkeys(words))
Output:
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
That's an interesting problem; in fact it can be solved using a dictionary that keeps the index of the first occurrence and tells you whether a word was already encountered:
string = "the world of the flowers is a small world to be in"
dct = {}
words = []
indices = []
idx = 1
for substring in string.split():
    # Check if you've seen it already.
    if substring in dct:
        # Already seen it, so append the index of the first occurrence
        indices.append(dct[substring])
    else:
        # Add it to the dictionary with the index and just append the word and index
        dct[substring] = idx
        words.append(substring)
        indices.append(idx)
        idx += 1
>>> print(words)
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> print(indices)
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
If you don't want the indices, there are also some external modules that have such a function to get the unique words in order of appearance:
>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> from more_itertools import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> from toolz import unique
>>> list(unique(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
To remove duplicate entries from a list whilst preserving the order, you may check the answers to How do you remove duplicates from a list whilst preserving order?. For example:
my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()
# Accepted approach in linked post
def get_ordered_unique(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
unique_list = get_ordered_unique(wordlist)
# where `unique_list` holds:
# ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
Then in order to print the position of each word, you may use list.index() within a list comprehension:
>>> [unique_list.index(word)+1 for word in wordlist]
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
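Note that list.index() rescans unique_list for every word, so for long texts it may be worth building a word-to-position dict once (a sketch using the same names as above):
>>> positions = {word: i + 1 for i, word in enumerate(unique_list)}
>>> [positions[word] for word in wordlist]
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]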

Translate both ways, print word if not in keys or values

I am having trouble with this one. I have looked up many possible solutions and can't seem to find the right one. My trouble here is that I can't get the program to print the word typed in the input if the word isn't a key or a value. I'm using Python 2.7.
Tuc = {"i": ["o"], "love": ["wau"], "you": ["uo"], "me": ["ye"], "my": ["yem"], "mine": ["yeme"], "are": ["sia"]}
while True:
    # Translates English to Tuccin and vice versa
    translation = str(raw_input("Enter content for translation.\n").lower())
    # this is for translating full phrases, both ways.
    input_list = translation.split()
    for word in input_list:
        # English to Tuccin
        if word in Tuc and word not in v:
            print ("".join(Tuc[word]))
        # Tuccin to English
        for k, v in Tuc.iteritems():
            if word in v and word not in Tuc:
                print k
You can create a set of your keys and values with a set comprehension, then check for intersection:
>>> set_list = {k[0] if isinstance(k, list) else k for it in Tuc.items() for k in it}
>>> set_list
set(['me', 'love', 'i', 'ye', 'mine', 'o', 'sia', 'yeme', 'are', 'uo', 'yem', 'wau', 'my', 'you'])
if set_list.intersection(input_list):
#do stuff
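For the original goal of echoing words that are neither keys nor values, the same set can be used directly (a sketch, Python 2 like the question):
for word in input_list:
    if word not in set_list:
        print word  # neither a key nor a value, so echo it back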
Let's do this in a simple way: create two dicts, one for English-to-Tuccin and one for Tuccin-to-English translations.
In [28]: Tuc_1 = {k:Tuc[k][0] for k in Tuc} # this dict will help in translation from English to Tuccin
In [29]: Tuc_1
Out[29]:
{'are': 'sia',
'i': 'o',
'love': 'wau',
'me': 'ye',
'mine': 'yeme',
'my': 'yem',
'you': 'uo'}
In [30]: Tuc_2 = {Tuc[k][0]:k for k in Tuc} # this dict will help in translation from Tuccin to English
In [31]: Tuc_2
Out[31]:
{'o': 'i',
'sia': 'are',
'uo': 'you',
'wau': 'love',
'ye': 'me',
'yem': 'my',
'yeme': 'mine'}
Example usage:
In [53]: translation = "I love You"
In [54]: input_list = translation.split()
In [55]: print " ".join(Tuc_1.get(x.lower()) for x in input_list if x.lower() in Tuc_1)
o wau uo
In [56]: print " ".join(Tuc_2.get(x.lower()) for x in input_list if x.lower() in Tuc_2)
In [57]: translation = "O wau uo"
In [58]: input_list = translation.split()
In [59]: print " ".join(Tuc_1.get(x.lower()) for x in input_list if x.lower() in Tuc_1)
In [60]: print " ".join(Tuc_2.get(x.lower()) for x in input_list if x.lower() in Tuc_2)
i love you
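If you also want unknown words echoed back rather than silently dropped (the original requirement), dict.get with the word itself as the default does it; a sketch merging the two dicts above (the word "dearly" is just an illustrative unknown):
In [61]: both = dict(Tuc_1, **Tuc_2)  # one lookup table for both directions
In [62]: translation = "I love you dearly"
In [63]: print " ".join(both.get(w.lower(), w) for w in translation.split())
o wau uo dearly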
You can use the following lambda for finding translations. If the word does not exist, it will return an empty list.
find_translation = lambda w: [(k, v) for k, v in Tuc.items() if w==k or w in v]
Usage:
>>> find_translation('i')
[('i', ['o'])]
Edit:
Modifying the result to convert a whole string.
Since you mentioned that you want to convert a list of words, let's take the same lambda and use it for multiple words.
line = 'i me you rubbish' # Only first three words will return something
# Let's change the lambda to either return something from Tuc or the same word back
find_translation = lambda w: ([v[0] for k, v in Tuc.items() if w==k or w in v] or [w])[0]
# Split the words and keep using find_translation to either get a conversion or return the same word
results_splits = [find_translation(part) for part in line.split()]
You will get the following results:
['o', 'ye', 'uo', 'rubbish']
You can put the string back together by joining results_splits
' '.join(results_splits)
And you get the translation back
'o ye uo rubbish'

Turn a list, that is in another list, into a string, then reverse the string

I'm new to programming in Python (and programming in general) and we were asked to develop a function to encrypt a string by rearranging the text. We were given this as a test:
encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'
We have to make sure it works for any string of any length though. I got as far as this before getting stuck:
## Define encrypt
def encrypt(text, encrypt_value):
    ## Split string into list
    text_list = text.split()
    ## Group text_list according to encrypt_value
    split_list = [text_list[index:index + encrypt_value]
                  for index in xrange(0, len(text_list), encrypt_value)]
If I printed the result now, this would give me:
encrypt("I got a jar of dirt and you don't HA", 3)
[['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
So I need to combine each of the lists in the list into a string (which I think is ' '.join(text)?), reverse it with [::-1], and then join the whole thing together into one string. But how in the world do I do that?
To combine your elements, you can try using reduce:
l = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur, ''), l, '')
It will result in:
"  I got a  jar of dirt  and you don't  HA"
If you want to remove the extra spaces:
result.replace('  ', ' ').strip()
This reduce use can be easily modified to reverse each sublist right before combining its elements:
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur[::-1], ''), l, '')
Or to reverse each combined substring just before joining everything together:
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur, '')[::-1], l, '')
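As an aside, in Python 3 reduce is no longer a builtin; a rough equivalent of the last variant would import it from functools (using ' '.join for the inner step, which also sidesteps the extra-space cleanup):
from functools import reduce

l = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
result = reduce(lambda prev, cur: prev + ' ' + ' '.join(cur)[::-1], l, '').strip()
# "a tog I trid fo raj t'nod uoy dna AH"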
You can do what you're looking for fairly simply with a few nested list comprehensions.
For example, you already have
split_list = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
What you want now is to reverse each triplet of words with a list comprehension, e.g. like so:
reversed_sublists = [sublist[::-1] for sublist in split_list]
# [['a', 'got', 'I'], ['dirt', 'of', 'jar'], ["don't", 'you', 'and'], ['HA']]
Then reverse each string in each sublist
reversed_strings = [[substr[::-1] for substr in sublist] for sublist in reversed_sublists]
# [['a', 'tog', 'I'], ['trid', 'fo', 'raj'], ["t'nod", 'uoy', 'dna'], ['AH']]
And then join them all up, as you said, with ' '.join(), e.g.
' '.join([' '.join(sublist) for sublist in reversed_strings])
// "a tog I trid fo raj t'nod uoy dna AH"
But nothing says you can't just do all those things at the same time with some nesting:
' '.join([' '.join([substring[::-1] for substring in sublist[::-1]]) for sublist in split_list])
// "a tog I trid fo raj t'nod uoy dna AH"
I personally prefer the aesthetic of this (and the fact you don't need to go back to strip spaces), but I'm not sure whether it performs better than Pablo's solution.
b = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
print "".join([j[::-1]+' ' for i in b for j in reversed(i)])
a tog I trid fo raj t'nod uoy dna AH
Is this what you wanted...
Is there any reason you are trying to do it in one list comprehension?
It's probably easier to conceptualize (and implement) by breaking it down into parts:
def encrypt(text, encrypt_value):
    reversed_words = [w[::-1] for w in text.split()]
    rearranged_words = reversed_words[encrypt_value:] + reversed_words[:encrypt_value]
    return ' '.join(rearranged_words[::-1])
Example output:
In [6]: encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
Out[6]: 'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'

Can python regex negate a list of words?

I have to match all the alphanumeric words from a text.
>>> import re
>>> text = "hello world!! how are you?"
>>> final_list = re.findall(r"[a-zA-Z0-9]+", text)
>>> final_list
['hello', 'world', 'how', 'are', 'you']
>>>
This is fine, but I also have a few words to negate, i.e. words that shouldn't be in my final list.
>>> negate_words = ['world', 'other', 'words']
A bad way to do it:
>>> negate_str = '|'.join(negate_words)
>>> filter(lambda x: not re.match(negate_str, x), final_list)
['hello', 'how', 'are', 'you']
But I can save a loop if my very first regex pattern can be changed to consider negation of those words. I found negation of characters, but I have words to negate; I also found regex lookbehind in other questions, but that doesn't help either.
Can it be done using python re?
Update
My text can span a few hundred lines, and the list of negate_words can be lengthy too.
Considering this, is using regex for such a task the right approach in the first place? Any suggestions?
I don't think there is a clean way to do this using regular expressions. The closest I could find was a bit ugly and not exactly what you wanted:
>>> re.findall(r"\b(?:world|other|words)|([a-zA-Z0-9]+)\b", text)
['hello', '', 'how', 'are', 'you']
Why not use Python's sets instead? They are very fast:
>>> list(set(final_list) - set(negate_words))
['hello', 'how', 'are', 'you']
If order is important, see the reply from glglgl below. His list comprehension version is very readable. Here's a fast but less readable equivalent using itertools:
>>> negate_words_set = set(negate_words)
>>> list(itertools.ifilterfalse(negate_words_set.__contains__, final_list))
['hello', 'how', 'are', 'you']
Another alternative is to build up the word list in a single pass using re.finditer:
>>> negate_words_set = set(negate_words)
>>> result = []
>>> for mo in re.finditer(r"[a-zA-Z0-9]+", text):
...     word = mo.group()
...     if word not in negate_words_set:
...         result.append(word)
...
>>> result
['hello', 'how', 'are', 'you']
Maybe it's worth trying pyparsing for this:
>>> from pyparsing import *
>>> negate_words = ['world', 'other', 'words']
>>> parser = OneOrMore(Suppress(oneOf(negate_words)) ^ Word(alphanums)).ignore(CharsNotIn(alphanums))
>>> parser.parseString('hello world!! how are you?').asList()
['hello', 'how', 'are', 'you']
Note that oneOf(negate_words) must be before Word(alphanums) to make sure that it matches earlier.
Edit: Just for the fun of it, I repeated the exercise using lepl (also an interesting parsing library)
>>> from lepl import *
>>> negate_words = ['world', 'other', 'words']
>>> parser = OneOrMore(~Or(*negate_words) | Word(Letter() | Digit()) | ~Any())
>>> parser.parse('hello world!! how are you?')
['hello', 'how', 'are', 'you']
Don't ask too much of regex.
Instead, think of generators.
import re
unwanted = ('world', 'other', 'words')
text = "hello world!! how are you?"
gen = (m.group() for m in re.finditer("[a-zA-Z0-9]+",text))
li = [w for w in gen if w not in unwanted]
And a generator can be created instead of li as well:
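For instance (a sketch continuing the snippet above):
li_gen = (w for w in gen if w not in unwanted)
for w in li_gen:
    print w  # words are filtered lazily, one at a time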
