Take in a file and shuffle all the middle letters in between - python

I need to read a file and shuffle the middle letters of each word, keeping the first and last letters in place, and I should only shuffle words longer than 3 characters. I think I can figure out a way to shuffle them if I can put each word into its own list with all the letters separated. Any help would be appreciated. Thanks.

text = "Take in a file and shuffle all the middle letters in between"
words = text.split()

def shuffle(word):
    # get your word as a list
    word = list(word)
    # perform the shuffle operation
    # return the list as a string
    word = ''.join(word)
    return word

for word in words:
    if len(word) > 3:
        # keep the first and last letters, shuffle only the middle
        print(word[0] + shuffle(word[1:-1]) + word[-1])
    else:
        print(word)

The shuffle algorithm is intentionally not implemented.

Look at random.shuffle. It shuffles a list object in place, which seems to be what you're aiming for. You can do something like this to shuffle the letters around:

import random

def scramble(word):
    # words of 3 characters or fewer are returned unchanged
    if len(word) <= 3:
        return word
    output = list(word[1:-1])
    random.shuffle(output)
    output.append(word[-1])
    return word[0] + "".join(output)

# with open("words.txt", 'w') as f:
#     f.write("one two three four five\nsix seven eight nine")
import random
import shutil
import tempfile

def get_words(f):
    for line in f:
        for word in line.split():
            yield word

def shuffle_word(word):
    if len(word) > 3:
        word = list(word)
        middle = word[1:-1]
        random.shuffle(middle)
        word[1:-1] = middle
        word = "".join(word)
    return word

with open("words.txt") as f:
    # print(list(get_words(f)))
    # ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
    # print(list(map(shuffle_word, get_words(f))))
    # ['one', 'two', 'trhee', 'four', 'fvie', 'six', 'sveen', 'eihgt', 'nnie']
    # mode="w" so that the temporary file accepts text rather than bytes
    with tempfile.NamedTemporaryFile(mode="w", delete=False) as tmp:
        tmp.write(" ".join(map(shuffle_word, get_words(f))))
        fname = tmp.name
shutil.move(fname, "words.txt")
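
The answers above split on whitespace, so punctuation stays attached to the words and line breaks are lost when the words are joined again. As a rough sketch only (the names shuffle_middle and scramble_text are made up for this example, and treating a "word" as a run of ASCII letters is an assumption), a regular expression can shuffle each word in place and leave everything else untouched:

```python
import random
import re

def shuffle_middle(word, rng=random):
    """Shuffle the inner letters of word, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, rng=random):
    """Apply shuffle_middle to every run of letters; other characters are kept as-is."""
    return re.sub(r"[A-Za-z]+", lambda m: shuffle_middle(m.group(0), rng), text)

if __name__ == "__main__":
    sample = "Take in a file, and shuffle all the middle letters in between.\n"
    # A seeded random.Random makes the run repeatable
    print(scramble_text(sample, random.Random(0)), end="")
```

To process a file, read its contents with open(...).read(), pass the string to scramble_text, and write the result back.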

Related

Python count words of split sentence?

I'm not sure how to remove the "\n" at the end of the output.
Basically, I have a txt file with sentences such as:
"What does Bessie say I have done?" I asked.
"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child
taking up her elders in that manner.
Be seated somewhere; and until you can speak pleasantly, remain silent."
I managed to split the sentences by semicolon with this code:
import re
with open("testing.txt") as file:
    read_file = file.readlines()
    for i, word in enumerate(read_file):
        low = word.lower()
        re.split(';', low)
But I'm not sure how to count the words of the split sentences, as len() doesn't work:
The output of the sentences:
['"what does bessie say i have done?" i asked.\n']
['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a
child taking up her elders in that manner.\n']
['be seated somewhere', ' and until you can speak pleasantly, remain silent."\n']
In the third sentence, for example, I am trying to count the 3 words on the left and the 8 words on the right.
Thanks for reading!
The number of words is the number of spaces plus one, provided the words are separated by single spaces:
e.g.
Two spaces, three words:
World is wonderful
Code:
import re
import string
lines = []
with open('file.txt', 'r') as f:
    lines = f.readlines()
DELIMITER = ';'
word_count = []
for i, sentence in enumerate(lines):
    # Skip empty sentences
    if not sentence.strip():
        continue
    # Remove punctuation besides our delimiter ';'
    sentence = sentence.translate(str.maketrans('', '', string.punctuation.replace(DELIMITER, '')))
    # Split by our delimiter
    splitted = re.split(DELIMITER, sentence)
    # The number of words is the number of spaces plus one
    word_count.append([1 + x.strip().count(' ') for x in splitted])
# [[9], [7, 9], [7], [3, 8]]
print(word_count)
Use str.rstrip('\n') to remove the \n at the end of each sentence.
To count the words in a sentence, you can use len(sentence.split(' '))
To transform a list of sentences into a list of counts, you can use the map function.
So here it is:
import re
with open("testing.txt") as file:
    for i, line in enumerate(file.readlines()):
        # Ignore empty lines
        if line.strip(' ') != '\n':
            # Lowercase the line and drop the trailing newline
            line = line.lower().rstrip('\n')
            # Split by semicolons
            parts = re.split(';', line)
            print("SENTENCES:", parts)
            counts = list(map(lambda part: len(part.split()), parts))
            print("COUNTS:", counts)
Outputs
SENTENCES: ['"what does bessie say i have done?" i asked.']
COUNTS: [9]
SENTENCES: ['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child ']
COUNTS: [7, 9]
SENTENCES: [' taking up her elders in that manner.']
COUNTS: [7]
SENTENCES: ['be seated somewhere', ' and until you can speak pleasantly, remain silent."']
COUNTS: [3, 8]
You'll need the nltk library:
from nltk import sent_tokenize, word_tokenize
mytext = """I have a dog.
The dog is called Bob."""
for sent in sent_tokenize(mytext):
    print(len(word_tokenize(sent)))
Output
5
6
Step by step explanation:
for sent in sent_tokenize(mytext):
    print('Sentence >>>', sent)
    print('List of words >>>', word_tokenize(sent))
    print('Count words per sentence>>>', len(word_tokenize(sent)))
Output:
Sentence >>> I have a dog.
List of words >>> ['I', 'have', 'a', 'dog', '.']
Count words per sentence>>> 5
Sentence >>> The dog is called Bob.
List of words >>> ['The', 'dog', 'is', 'called', 'Bob', '.']
Count words per sentence>>> 6
import re
sentences = []  # empty list for storing the result
with open('testtext.txt') as fileObj:
    # make a list of lines already stripped of '\n', skipping blank lines
    lines = [line.strip() for line in fileObj if line.strip()]
for line in lines:
    sentences += re.split(';', line)  # split each line by ';' and store the parts in sentences
for sentence in sentences:
    print(sentence + ' ' + str(len(sentence.split())))  # print each part followed by its word count
Try this one:
import re
with open("testing.txt") as file:
    read_file = file.readlines()
    for i, word in enumerate(read_file):
        low = word.lower()
        low = low.strip()
        low = low.replace('\n', '')
        # keep the result of the split so it can be used
        parts = re.split(';', low)
        print(parts)
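
Putting the pieces from the answers together, here is a minimal sketch of the counting step on its own. The helper name count_words_per_part is made up for this example, and the sample lines are copied from the question rather than read from a file:

```python
def count_words_per_part(line, delimiter=";"):
    """Return the word count of each delimiter-separated part of line."""
    # str.split() with no argument splits on any run of whitespace and ignores
    # leading/trailing whitespace, so the trailing "\n" needs no special handling.
    return [len(part.split()) for part in line.split(delimiter)]

lines = [
    '"What does Bessie say I have done?" I asked.\n',
    'Be seated somewhere; and until you can speak pleasantly, remain silent."\n',
]
for line in lines:
    if line.strip():  # skip blank lines
        print(count_words_per_part(line))
```

For the two sample lines this prints [9] and then [3, 8]. Because split() is called without an argument, repeated spaces do not inflate the counts.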

Comparing lists with text files

I have the following list: t = ['one', 'two', 'three']
I want to read a file and add a point for every word that exists in the list. E.g. if "one" and "two" exist in "CV.txt", points = 2. If all of them exist, then points = 3.
import nltk
from nltk import word_tokenize
t = ['one', 'two', 'three']
CV = open("cv.txt","r").read().lower()
points = 0
for words in t:
    if words in CV:
        #print(words)
        words = nltk.word_tokenize(words)
        print(words)
        li = len(words)
        print(li)
        points = li
        print(points)
Assuming 'CV.txt' contains the words "one" and "two", and it is split by words (tokenized), 2 points should be added to the variable "points"
However, this code returns:
['one']
1
1
['two']
1
1
As you can see, the length is only 1, but it should be 2. I'm sure there's a more efficient way to do this with loops or something rather than len.
Any help with this would be appreciated.
I don't think you need to tokenize within the loop, so an easier way to do it may be the following:
First, tokenize the words in the txt file.
Then check each word to see whether it is also in t.
Finally, the points are the number of words in common_words.
import nltk
from nltk import word_tokenize
t = ['one', 'two', 'three']
CV = open("untitled.txt","r").read().lower()
points = 0
words = nltk.word_tokenize(CV)
common_words = [word for word in words if word in t]
points = len(common_words)
Note: if you want to avoid counting duplicates, build a set of common words instead of the list in the code above:
common_words = set(word for word in words if word in t)
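
If installing nltk is not an option, the same idea can be sketched with the standard library. The function name count_points and the regular expression used as a tokenizer are assumptions for this example, not part of the answer above. Matching whole tokens also avoids the substring problem of `words in CV`, where "one" would be found inside "someone":

```python
import re

def count_points(text, targets):
    """Count how many of the target words occur at least once in text."""
    # Lowercase the text and treat runs of letters/apostrophes as words
    # (a rough stand-in for a real tokenizer).
    words_in_text = set(re.findall(r"[a-z']+", text.lower()))
    return len(words_in_text & set(targets))

cv_text = "One skill, two languages, and someone who is tired."
print(count_points(cv_text, ["one", "two", "three"]))  # prints 2
```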

How to get my definite loop to print one per line

I'm trying to process a list of words and return a new list containing only unique words. My definite loop works, however it will only print the words all together, instead of one per line. Can anyone help me out? This is probably a simple question but I am very new to Python. Thank you!
def get_unique_words(allWords):
    uniqueWords = []
    for word in allWords:
        if word not in uniqueWords:
            uniqueWords.append(word)
        else:
            uniqueWords.remove(word)
    return uniqueWords
You can use str.join:
>>> all_words = ['two', 'two', 'one', 'uno']
>>> print('\n'.join(get_unique_words(all_words)))
one
uno
Or plain for loop:
>>> for word in get_unique_words(all_words):
... print(word)
...
one
uno
However, your method won't work for odd counts:
>>> get_unique_words(['three', 'three', 'three'])
['three']
If your goal is to get all words that appear exactly once, here's a shorter method that works using collections.Counter:
from collections import Counter
def get_unique_words(all_words):
    return [word for word, count in Counter(all_words).items() if count == 1]
This code may help. It prints the unique words line by line, which is what I understood from your question:
allWords = ['hola', 'hello', 'distance', 'hello', 'hola', 'yes']
uniqueWords = []
for word in allWords:
    if word not in uniqueWords:
        uniqueWords.append(word)
    else:
        uniqueWords.remove(word)
for i in uniqueWords:
    print(i)
If the order of the words is not important, I recommend creating a set to store the unique words. Note that a set keeps one copy of every distinct word, whereas the loop above removes a word again when it is seen a second time:
uniqueWords = set(allWords)
As you can see by running the code below, it can be much faster, although this may depend on the original list of words:
import timeit
setup = """
word_list = [str(x) for x in range(1000, 2000)]
allWords = []
for word in word_list:
    allWords.append(word)
    allWords.append(word)
"""
smt1 = "unique = set(allWords)"
smt2 = """
uniqueWords = []
for word in allWords:
    if word not in uniqueWords:
        uniqueWords.append(word)
    else:
        uniqueWords.remove(word)
"""
print("SET:", timeit.timeit(smt1, setup, number=1000))
print("LOOP:", timeit.timeit(smt2, setup, number=1000))
OUTPUT:
SET: 0.03147706200002176
LOOP: 0.12346845000001849
Maybe this fits your idea:
allWords = ['hola', 'hello', 'distance', 'hello', 'hola', 'yes']
uniqueWords = dict()
for word in allWords:
    if word not in uniqueWords:
        uniqueWords.update({word: 1})
    else:
        uniqueWords[word] += 1
for k, v in uniqueWords.items():
    if v == 1:
        print(k)
Prints:
distance
yes
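
The answers above read "unique" in two different ways: every distinct word once, or only the words that occur exactly once. A small sketch of both, keeping the original order (the function names are made up for this example, and the ordering of dict.fromkeys relies on Python 3.7 or later):

```python
from collections import Counter

def distinct_words(words):
    """Each word once, in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(words))

def words_seen_once(words):
    """Only the words that occur exactly once, in their original order."""
    counts = Counter(words)
    return [word for word in words if counts[word] == 1]

all_words = ['hola', 'hello', 'distance', 'hello', 'hola', 'yes']
print('\n'.join(distinct_words(all_words)))   # hola, hello, distance, yes (one per line)
print('\n'.join(words_seen_once(all_words)))  # distance, yes (one per line)
```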

best way count the number of matches between the list and the string in python

What is the best way to count the number of matches between a list and a string in Python?
For example, if I have this list:
list = ['one', 'two', 'three']
and this string:
line = "some one long. two phrase three and one again"
I want to get 4 because I have
one 2 times
two 1 time
three 1 time
I tried the code below, based on the answers to this question, and it worked, but I get an error if I add many words (4000 words) to the list:
import re
word_list = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"
words_re = re.compile("|".join(word_list))
print(len(words_re.findall(line)))
This is my error:
words_re = re.compile("|".join(word_list))
File "/usr/lib/python2.7/re.py", line 190, in compile
If you want a case-insensitive match on whole words that ignores punctuation, split the string, strip the punctuation, and use a dict to store the words you want to count:
lst = ['one', 'two', 'three']
from string import punctuation
cn = dict.fromkeys(lst, 0)
line = "some one long. two phrase three and one again"
for word in line.lower().split():
    word = word.strip(punctuation)
    if word in cn:
        cn[word] += 1
print(cn)
{'three': 1, 'two': 1, 'one': 2}
If you just want the sum, use a set with the same logic:
from string import punctuation
st = {'one', 'two', 'three'}
line = "some one long. two phrase three and one again"
print(sum(word.strip(punctuation) in st for word in line.lower().split()))
This does a single pass over the words after they are split, and the set lookup is O(1), so it is substantially more efficient than list.count.
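
If a regular expression is still preferred, one possible sketch is to escape each word and anchor the alternation on word boundaries, so that special characters in the list cannot change the pattern and "one" does not match inside "someone". The function name count_whole_word_matches is made up for this example, and since the traceback in the question is cut off, it is only a guess that this would avoid the original error:

```python
import re

def count_whole_word_matches(line, targets):
    """Count whole-word, case-insensitive occurrences of any target in line."""
    if not targets:
        return 0
    # re.escape makes every target a literal; \b restricts matches to whole words
    alternation = "|".join(re.escape(target) for target in targets)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return len(pattern.findall(line))

word_list = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"
print(count_whole_word_matches(line, word_list))  # prints 4
```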

Python, While-True loop that gets an integer and returns a word with same length

I'm not sure how to begin this function. Conceptually, I think it should look through a list of words until it finds a word with the same length as the given integer, and then return that word.
Example for lenword(num):
def lenword(4):
    wrdlst = ['i', 'to', 'two', 'four']
    while True:
        #stuck here
    #returns 'four'
Please help!
def lenword(n):
    wrdlst = ['i', 'to', 'two', 'four']
    # for every item in the list, if the length of that item is
    # equal to n (in this case 4) then print the item, and return it.
    for word in wrdlst:
        if len(word) == n:
            print(word)
            return word
# call the function
four_letter_word = lenword(4)
Or, if there's more than one four-letter word in your list:
def lenword(n):
    wrdlst = ['i', 'to', 'two', 'four']
    found_words = []
    # for every item in the list, if the length of that item is
    # equal to n (in this case 4) then add the item to found_words.
    for word in wrdlst:
        if len(word) == n:
            found_words.append(word)
    return found_words
# call the function
four_letter_words = lenword(4)
# print your words from the list
for item in four_letter_words:
    print(item)
If your list is going to be in order, such that the element at index n has a length of n+1, then you can just:
wrdlist = ['i', 'to', 'two', 'four']
def lenword(n):
    return wrdlist[n-1]
lenword(4)
You probably want something like this:
def lenword(word_length):
    wordlist = ['i', 'to', 'two', 'four']
    for word in wordlist:
        if len(word) == word_length:
            # found word with matching length
            return word
    # not found, returns None
Using a closure to create a specialized searcher for a specific list of words can make it nicer:
def create_searcher(words):
    def searcher(word_length, words=words):
        for word in words:
            if len(word) == word_length:
                # found word with matching length
                return word
        # not found, returns None
    return searcher
And use it like this:
# create search function specialized for a list of words
words_searcher = create_searcher(['list', 'of', 'words'])
# use it
words_searcher(4) # returns 'list'
words_searcher(3) # returns None
words_searcher(2) # returns 'of'
def lenword(n, words):
    for word in words:
        if len(word) == n:
            return word
>>> wrdlst = ['i', 'to', 'two', 'four']
>>> lenword(4, wrdlst)
'four'
First, your function definition is off. When defining the function, you choose a name for the argument, and then you pass the value 4 when you call the function.
def lenword(length):
    wrdlst = ['i', 'to', 'two', 'four']
    for word in wrdlst:
        if len(word) == length:
            return word
    return None
And then you would call this function like:
print(lenword(4))
First, you have to define the list of words like you did. The next step is to understand the concepts you need, such as the length of a string and loops.
So it should look like this:
def lenword(word_length):
    wrdlst = ['i', 'to', 'two', 'four']
    for word in wrdlst:
        if len(word) == word_length:
            return word
You could improve this by passing the list of strings as a parameter to the function.
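
Following that last suggestion, here is a compact sketch that takes the word list as a parameter and uses the built-in next() with a generator expression. This is just one way to write it; the explicit for loop in the answers above behaves the same:

```python
def lenword(length, words):
    """Return the first word with the given length, or None if there is none."""
    # next() takes the first item from the generator; the second argument
    # is the default returned when no word matches.
    return next((word for word in words if len(word) == length), None)

wrdlst = ['i', 'to', 'two', 'four']
print(lenword(4, wrdlst))  # prints four
print(lenword(5, wrdlst))  # prints None
```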
