python - interpreting two spaces as one from text file

I'm trying to make a program that translates Morse code from a text file. In theory it should be pretty easy, but I find the formatting of the text file a bit silly (it's school work, so I can't change it). In the file, one space separates two characters (like this: -. ---), but two spaces mark the end of a word (so a space in the translated text). Like this: .--. .-.. . .- ... .  .... . .-.. .--. .-.-.-
This is what I have, but it gives me the translated text without the spaces.
translator = {}  # alphabet and the equivalent code, which I got from another file
message = []
translated = ("")
msg_file = open("msg.txt", "r")
for line in msg_file:
    line = line.rstrip()
    part = line.rsplit(" ")
    message.extend(part)
for i in message:
    if i in translator.keys():
        translated += (translator[i])
print(translated)
I also don't know how to handle the line breaks (\n).

Why don't you split on two spaces to get the words, then on space to get the characters? Something like:
translated = ""  # store for the translated text
with open("msg.txt", "r") as f:  # open your file for reading
    for line in f:  # read the file line by line
        words = line.rstrip("\n").split("  ")  # split by two spaces to get our words
        parsed = []  # storage for our parsed words
        for word in words:  # iterate over the words
            chars = []  # store for the translated characters of this word
            for char in word.split(" "):  # get characters by splitting and iterate over them
                chars.append(translator.get(char, " "))  # translate the character
            parsed.append("".join(chars))  # join the characters and add the word to `parsed`
        translated += " ".join(parsed)  # join the parsed words and add them to `translated`
        # uncomment if you want to add new line after each line of the file:
        # translated += "\n"
print(translated)  # print the translated string
# PLEASE HELP!
Of course, all this assumes your translator dict has a proper mapping.

Split on a double space first to get the list of words in each line; then you can split each word on a single space to get the characters to feed to your translator.
translator = {}  # alphabet and the equivalent code, which I got from another file
message = []
translated = ("")
with open('msg.txt', "r") as msg_file:
    for line in msg_file:
        line = line.strip()
        words = line.split('  ')  # two spaces separate words
        line = []
        for word in words:
            characters = word.split()
            word = []
            for char in characters:
                word.append(translator[char])
            line.append(''.join(word))
        message.append(' '.join(line))
print('\n'.join(message))
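For anyone who wants to try the two-level split without a file, here is a minimal, self-contained sketch. The `decode_morse` helper and the tiny `MORSE_TO_TEXT` table are hypothetical names made up for illustration, and the table only covers the letters in the example message; unknown codes are shown as "?", which is an assumption rather than anything the question asks for.

```python
# Tiny illustrative table (an assumption); a real one would cover the whole alphabet.
MORSE_TO_TEXT = {
    ".--.": "P", ".-..": "L", ".": "E", ".-": "A",
    "...": "S", "....": "H", ".-.-.-": ".",
}


def decode_morse(text, table):
    """Decode Morse where one space separates letters and two spaces separate words."""
    decoded_lines = []
    for line in text.splitlines():
        words = line.strip().split("  ")  # two spaces: word boundary
        decoded_words = [
            # one space (or more) between letters; unknown codes become "?"
            "".join(table.get(code, "?") for code in word.split())
            for word in words
        ]
        decoded_lines.append(" ".join(decoded_words))
    return "\n".join(decoded_lines)  # keep the line breaks of the input


message = ".--. .-.. . .- ... .  .... . .-.. .--. .-.-.-"
print(decode_morse(message, MORSE_TO_TEXT))
```

Splitting on the double space first matters: once a line has been split on single spaces, the word boundaries only survive as empty strings, which is why the code in the question loses them.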

Related

Why are the last characters in a cell in the input file treated differently in the output?

I'm trying to create a pronunciation dictionary with python which takes the words and splits them up into individual letters. The data are from a csv file with just one column and 37 rows.
This is the code I'm using
import io
import sys
master = io.open(sys.argv[1], 'r', encoding='UTF-8', errors='ignore')
words = set()
for line in master:
    line = line.split(",")
    for word in line[0].split(' '):
        words.add(word)
def splitword(word):
    output = []
    for i in range(len(word)):
        output.append(word[i])
    return output
orthographic = {w: splitword(w) for w in words}
outfile = io.open('dictionary.txt', 'w', encoding='UTF-8')
for w in sorted(orthographic):
    outfile.write(w + ' ' + ' '.join(orthographic[w]) + '\n')
What I want is to generate an output file with the word and the word segmented into letters next to it on the same line. For example
mast m a s t
But it seems to treat words at the end of a cell differently and in the output the word is on one line and the split up letters on the line below it. This only seems to happen for words at the end of a cell. Any way I can make it so that the words are on one line and the separated letters on the same line, regardless of the position of the word in the cell in my csv file?
Edit
I tried to fix this with the strip() command but the output was exactly the same. This is the edited portion of my code:
for line in master:
    line = line.split(",")
    line.append([c.strip() for c in line])
    for word in line[0].split(' '):
        words.add(word)
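One likely cause, assuming each row of the csv file ends in a newline: with a single column, line.split(",") leaves the "\n" attached to the last word of the row, so that newline is written out between the word and its letters. The edited code builds a stripped list but never uses it. A minimal sketch of one way around this, where `collect_words` is a hypothetical helper and the sample rows are made up:

```python
def collect_words(lines):
    """Collect the unique words from the first column of each CSV line."""
    words = set()
    for line in lines:
        first_cell = line.split(",")[0]
        # split() with no argument splits on any run of whitespace, so the
        # trailing newline is discarded, unlike with split(' ').
        words.update(first_cell.split())
    return words


rows = ["mast sail\n", "boat\n"]  # made-up sample rows
for w in sorted(collect_words(rows)):
    # joining a string with ' ' puts a space between its letters
    print(w + " " + " ".join(w))
```

This sketch does not handle quoted cells; for real csv data the standard csv module would be the safer choice.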

How to get first word from text file removing \n - python

If the text file is "\n\n Hello world!\n I like python.\n"
How do I get the first word from that text?
I tried this code:
def word_file(file):
    files = open(file, 'r')
    l = files.readlines()
    for i in range(len(l)):
        a = l[i].rstrip("\n")
    line = l[0]
    word = line.strip().split(" ")[0]
    return word
There is a space in front of Hello.
The result I get is None. How should I correct it?
Can anybody help?

Assuming there is a word in the file:
def word_file(f):
    with open(f) as file:
        return file.read().split()[0]
file.read reads the entire file as a string. Do a split with no parameters on that string (i.e. sep=None). Then according to the Python manual "runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace." So the splitting will be done on consecutive white space and there will be no empty strings returned as a result of the split. Therefore the first element of the returned list will be the first word in the file.
If there is a possibility that the file is empty or contains nothing but white space, then you would need to check the return value from file.read().split() to ensure it is not an empty list.
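To make the difference concrete, here is a small sketch that runs both kinds of split on the text from the question. The `text` string is assumed to be the file contents, and the guard for an empty result is just one way to do the check mentioned above.

```python
text = "\n\n Hello world!\n I like python.\n"  # assumed file contents

# split(" ") treats every single space as a separator and keeps the newlines.
print(text.split(" "))

# split() with no argument collapses any run of whitespace, newlines included.
print(text.split())

# Guard against a file that is empty or contains only whitespace.
words = text.split()
first_word = words[0] if words else None
print(first_word)
```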
If you need to avoid having to read the entire file into memory at once, then the following, less terse code can be used:
def word_file(f):
    with open(f) as file:
        for line in file:
            words = line.split()
            if words:
                return words[0]
    return None  # No words found

Edit: @Booboo's answer is far better than mine.
This should work:
def word_file(file):
    with open(file, 'r') as f:
        for line in f:
            for index, character in enumerate(line):
                if not character.isspace():
                    line = line[index:]
                    for ind, ch in enumerate(line):
                        if ch.isspace():
                            return line[:ind]
                    return line  # could not find whitespace character at end
    return None  # no words found
Output:
Hello

Python - Calculating length of string is inaccurate for certain strings only

I'm new to programming and trying to make a basic hangman game. For some reason, when I calculate the length of a string read from a text file, the length of some words comes out wrong. Some values are too high and some too low. I can't figure out why. I have already made sure there are no spaces in the text file that could be counted as characters.
import random
# chooses word from textfile for hangman
def choose_word():
    words = []
    with open("words.txt", "r") as file:
        words = file.readlines()
    # number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    # chooses the selected word from the text file
    chosen_word = (words[n-1])
    print(chosen_word)
    # calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()
# obama returns 5
# snake, turtle, hoodie, all return 7
# intentions returns 11
# racecar returns 8
words.txt
snake
racecar
turtle
cowboy
intentions
hoodie
obama

Use strip().
string.strip(s[, chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from both ends of the string this method is called on.
Example:
>>> ' Hello'.strip()
'Hello'
Try this:
import random
# chooses word from textfile for hangman
def choose_word():
    words = []
    with open("words.txt", "r") as file:
        words = file.readlines()
    # number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    # chooses the selected word from the text file
    chosen_word = (words[n-1].strip())
    print(chosen_word)
    # calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()

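You are reading a random line from a text file.
Some lines probably have trailing whitespace after the word: every line read by readlines() keeps its newline, and some lines may also have a space before it.
For example, if the word "snake" is stored in the file as "snake " followed by the newline, it is read as "snake \n", which has a length of 7.
To solve it you can either:
A) Remove the trailing spaces in the file, manually or with a script (the newline will still be counted unless you strip it)
B) After you read a random line from the text, and before you check the length of the word, write: chosen_word = chosen_word.strip().
This removes the spaces and the newline around your word. Note that replace(" ", "") would only remove the spaces and leave the newline in place.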

You need to strip the whitespace from each line. strip() removes leading and trailing whitespace, including the newline. Here is your corrected code.
import random
# chooses word from textfile for hangman
def choose_word():
    words = []
    with open("./words.txt", "r") as file:
        words = file.readlines()
    # number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    # chooses the selected word from the text file
    # Added strip() to remove whitespace surrounding your words
    chosen_word = (words[n-1]).strip()
    print(chosen_word)
    # calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()

I'm assuming that the .txt file contains one word per line, without commas.
Maybe try changing a few things here:
First, notice that the readlines() method returns a list of all the lines, but each line still includes the newline character "\n".
# This deletes the newline from each line
# strip() also matches new lines as Hampus Larsson suggested
words = [x.strip() for x in file.readlines()]
You can calculate the number of words from the length of the words list itself:
num_words = len(words)
You do not need parentheses to get the random word. Since n is drawn from 1 to num_words, keep the n-1 index:
chosen_word = words[n-1]
It should now work correctly!

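In the file, every word ends with a \n that marks the line break.
To cut it off, replace:
chosen_word = (words[n-1])
with
chosen_word = (words[n-1][:-1])
This cuts off the last character of the chosen word, which is the newline. Note that if the last line of the file has no trailing newline, this removes a real letter from that word.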
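Putting the suggestions from this thread together, here is a minimal sketch that strips each line once and lets random.choice do the picking, so there is no index arithmetic to get wrong. The `choose_word` signature (taking an iterable of lines rather than a file name) and the sample lines are assumptions made so the example runs without a words.txt file.

```python
import random


def choose_word(lines):
    """Pick a random word from an iterable of lines, ignoring surrounding whitespace."""
    # strip() removes spaces and the newline; blank lines are skipped
    words = [line.strip() for line in lines if line.strip()]
    return random.choice(words)


# Made-up stand-in for the lines of words.txt: note the trailing space after
# "turtle", the blank line, and the missing newline on the last line.
sample_lines = ["snake\n", "racecar\n", "turtle \n", "\n", "obama"]
word = choose_word(sample_lines)
print(word, len(word))
```

With a real file, the same function can be called as choose_word(open("words.txt")), ideally inside a with block. Unlike slicing with [:-1], strip() leaves a last line without a newline intact.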

Python: how to find a keyword in a text file, save 60 characters to the left of that keyword, loop until end of text file

After defining two keywords, my goal is to:
read full contents of an unstructured text file (1000+ lines of text)
loop through contents, fetch 60 characters to the left of keyword each time it is hit
append each 60 character string in a separate line of a new text file
I have the code to read unstructured text file and write to the new text file.
I am having trouble creating code which will seek each keyword, fetch contents, then loop through end of file.
Very simply, here is what I have so far:
#read file, store in variable
content=open("demofile.txt", "r")
#seek "KW1" or "KW2", take 60 characters to the left, append to text file, loop
#open a text file, write variable contents, close file
file=open("output.txt","w")
file.writelines(content)
file.close()
I need help with the middle portion of this code. For example, if source text file says:
"some text, some text, some text, KEYWORD"
I would like to return:
"some text, some text, some text, "
In a new row for each keyword found.
Thank you.

result = []
# Open the file
with open('your_file') as f:
    # Iterate through lines
    for line in f.readlines():
        # Find the start of the word
        index = line.find('your_word')
        # If the word is inside the line
        if index != -1:
            if index < 60:
                result.append(line[:index])
            else:
                result.append(line[index-60:index])
After that, you can write result to a file.
If you have several words, you can modify your code like this:
words = ['waka1', 'waka2', 'waka3']
result = []
# Open the file
with open('your_file') as f:
    # Iterate through lines
    for line in f.readlines():
        for word in words:
            # Find the start of the word
            index = line.find(word)
            # If the word is inside the line
            if index != -1:
                if index < 60:
                    result.append(line[:index])
                else:
                    result.append(line[index-60:index])
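Note that str.find only reports the first occurrence, so the loops above record at most one hit per keyword per line. If a keyword can appear several times, a minimal sketch like the following walks through every occurrence. It works on the whole text at once, so the 60 characters may span line breaks; `contexts_before` and the sample string are hypothetical names made up for illustration.

```python
def contexts_before(text, keywords, width=60):
    """Return up to `width` characters preceding every occurrence of each keyword."""
    results = []
    for keyword in keywords:
        start = text.find(keyword)
        while start != -1:
            # max(0, ...) handles keywords closer than `width` to the start
            results.append(text[max(0, start - width):start])
            # continue searching after this occurrence
            start = text.find(keyword, start + len(keyword))
    return results


sample = "some text, some text, some text, KW1 and later more text KW2, then KW1"
for snippet in contexts_before(sample, ["KW1", "KW2"]):
    print(repr(snippet))
```

The results are grouped by keyword rather than ordered by position in the text. Writing them out one per line is then a matter of joining with "\n" and writing the result to the output file.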

You could go for a regex based solution as well!
import re
# r before the string makes it a raw string so the \'s aren't used as escape chars.
# \b indicates a word border to regex. like a new line, space, tab, punctuation, etc...
kwords = [r"\bparameter\b", r"\bpointer\b", r"\bfunction\b"]
in_file = "YOUR_IN_FILE"
out_file = "YOUR_OUT_FILE"
patterns = [r"([\s\S]{{0,60}}?){}".format(i) for i in kwords]
# patterns is now a list of regex pattern strings which will match between 0-60
# characters (as many as possible) followed by a word border, followed by your
# keyword, and finally followed by another word border. If you don't care about
# the word borders then remove both the \b from each string. The actual object
# matched will only be the 0-60 characters before your parameter and not the
# actual parameter itself.
# This WILL include newlines when trying to scan backwards 60 characters.
# If you DON'T want to include newlines, change the `[\s\S]` in patterns to `.`
with open(in_file, "r") as f:
    data = f.read()
with open(out_file, "w") as f:
    for pattern in patterns:
        matches = re.findall(pattern, data)
        # The above will find all occurrences of your pattern and return a list of
        # occurrences, as strings.
        matches = [i.replace("\n", " ") for i in matches]
        # The above replaces any newlines we found with a space.
        # Now we can print the messages for you to see
        print("Matches for " + pattern + ":", end="\n\t")
        for match in matches:
            print(match, end="\n\t")
            # and write them to a file (text mode translates "\n" per platform)
            f.write(match + "\n")
        print("\n")
Depending on the specifics of what you need captured, you should have enough information here to adapt it to your problem. Leave a comment if you have any questions about regex.

python file manipulation

I have a file with entries such as:
26 1
33 2
.
.
.
and another file with sentences in english
I have to write a script to print the 1st word in sentence number 26
and the 2nd word in sentence 33.
How do I do it?

The following code should do the task, assuming the files are not too large. You may have to modify it to deal with edge cases (abbreviations, other sentence delimiters, etc.).
# Get the (sentence number, word number) pairs from the first file
with open('1.txt') as file:
    pairs = [tuple(map(int, line.split())) for line in file if len(line.split()) == 2]
# Get text from file
with open('2.txt') as file:
    text = file.readlines()
# Parse text into a list of sentences, each a list of words
data = []
for line in text:  # For each paragraph in the text
    sentences = line.strip().split('.')  # Split it into sentences
    for sentence in sentences:  # For each sentence in the paragraph
        words = sentence.split()  # Split it into a list of words
        if len(words) > 0:
            data.append(words)
# Get the desired result (the numbers in the first file are 1-based)
for sentence_no, word_no in pairs:
    print(data[sentence_no - 1][word_no - 1])

Here's a general sketch:
Read the first file into a list (a numeric entry in each element)
Read the second file into a list (a sentence in each element)
Iterate over the entry list, for each number find the sentence and print its relevant word
Now, if you show some effort of how you tried to implement this in Python, you will probably get more help.

The big issue is that you have to decide what separates "sentences". For example, is a '.' the end of a sentence? Or maybe part of an abbreviation, e.g. the one I've just used?-) Secondarily, and less difficult, what separates "words", e.g., is "TCP/IP" one word, or two?
Once you have sharply defined these rules, you can easily read the file of text into a list of "sentences", each of which is a list of "words". Then, you read the other file as a sequence of pairs of numbers, and use them as indices into the overall list and inside the sublist thus identified. But the problem of sentence and word separation is really the hard part.
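As a rough illustration of that structure, here is a minimal sketch under deliberately naive rules, which are assumptions rather than anything the question specifies: a sentence ends at '.', '!' or '?' followed by whitespace, words are separated by whitespace, and the numbers in the pairs are 1-based. `split_sentences`, `lookup` and the sample text are hypothetical names made up for the example.

```python
import re


def split_sentences(text):
    """Split text into sentences, each a list of words (naive assumed rules)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    raw = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.split() for s in raw if s]


def lookup(sentences, pairs):
    """Yield the word for each 1-based (sentence number, word number) pair."""
    for sentence_no, word_no in pairs:
        yield sentences[sentence_no - 1][word_no - 1]


text = "The cat sat. Dogs bark loudly! Is TCP/IP one word?"
pairs = [(1, 1), (2, 2), (3, 2)]
print(list(lookup(split_sentences(text), pairs)))
```

Under these rules "TCP/IP" counts as one word, and an abbreviation such as "e.g." followed by a space would wrongly end a sentence, which is exactly the kind of decision described above.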

In the following code, I am assuming that sentences end with '. '. You can modify it easily to accommodate other sentence delimiters as well. Note that abbreviations will therefore be a source of bugs.
Also, I am going to assume that words are delimited by spaces.
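sentences = []
queries = []
english = ""
for line in file2:  # file2: open file object for the English text
    english += line
while english:
    period = english.find('.')
    if period == -1:  # no more periods: treat the rest as the last sentence
        period = len(english) - 1
    words = english[:period + 1].split()
    if words:
        sentences.append(words)  # one list of words per sentence
    english = english[period + 1:]
q = ""
for line in file1:  # file1: open file object for the number pairs
    q += " " + line.strip()
q = q.split()
for i in range(0, len(q) - 1, 2):
    sentence = int(q[i])
    word = int(q[i + 1])
    queries.append((sentence, word))
for s, w in queries:
    print(sentences[s - 1][w - 1])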
I haven't tested this, so please let me know (preferably with the case that broke it) if it doesn't work and I will look into bugs
Hope this helps
