I have a file, file.md, that I want to read into a string.
Then I want to save that string in another file, but as a quoted string literal (quotes and all). The reason is that I want to turn the content of my markdown file into a markdown string so that I can include it in HTML using the JavaScript marked library.
How can I do that with a Python script?
Here's what I have tried so far:
with open('file.md', 'r') as md:
    text = ""
    lines = md.readlines()
    for line in lines:
        line = "'" + line + "'" + '+'
        text = text + line
with open('file.txt', 'w') as txt:
    txt.write(text)
Input file.md
This is one line of markdown
This is another line of markdown
This is another one
Desired output: file.txt
"This is one line of markdown" +
"This is another line of markdown" +
(what should come here by the way to encode an empty line?)
"This is another one"
There are two things you need to pay attention to here.
First, avoid reassigning the loop variable line while it is running through lines. Python allows it, but the code is easier to follow if you build the result in a separate variable (below I append directly to text).
Second, if you add more characters at the end of each line, they are placed after the end-of-line character and thus end up on the next line when you write the text to a new file. Instead, skip the last character of each line and add the line break manually.
If I understand you right, this should give you the wanted output:
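with open('file.md', 'r') as md:
    text = ""
    lines = md.readlines()
    for line in lines:
        if line[-1] == "\n":
            # strip the newline, then close the quote and add '+' before the line break
            text += "'" + line[:-1] + "'+\n"
        else:
            # last line without a trailing newline: no '+' and no line break
            text += "'" + line + "'"
with open('file.txt', 'w') as txt:
    txt.write(text)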
Note how the last line is treated differently from the others (no end-of-line character and no + sign).
text += ... adds more characters to the existing string.
This also works and might be a bit nicer, because it avoids the if-statement. You can remove the newline character right when reading the content from file.md. At the end, drop the last three characters of the content, which are the space, the + and the \n.
with open('file.md', 'r') as md:
    text = ""
    lines = [line.rstrip('\n') for line in md]
    for line in lines:
        text += "'" + line + "' +\n"
with open('file.txt', 'w') as txt:
    txt.write(text[:-3])
...and with using a formatter:
text += "'{}' +\n".format(line)
...checking for empty lines as you asked in the comments:
for line in lines:
    if line == '':
        text += '\n'
    else:
        text += "'{}' +\n".format(line)
This works:
>>> a = '''This is one line of markdown
... This is another line of markdown
...
... This is another one'''
>>> lines = a.split('\n')
>>> lines = [ '"' + i + '" +' if len(i) else i for i in lines]
>>> lines[-1] = lines[-1][:-2] # drop the '+' at the end of the last line
>>> print '\n'.join( lines )
"This is one line of markdown" +
"This is another line of markdown" +
"This is another one"
You may add reading/writing to files yourself.
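If the goal is a string that JavaScript can parse, one alternative is to let json.dumps do the quoting, since a JSON string is also a valid JavaScript string literal. This is only a sketch, not part of the answers above: the function names and the sample text are made up for illustration, and it assumes Python 3. It also answers the blank-line question: an empty line simply becomes a "\n" piece.

```python
import json


def markdown_to_js_literal(markdown_text):
    """Return the whole text as one double-quoted, fully escaped literal."""
    # json.dumps escapes quotes, backslashes and line breaks for us.
    return json.dumps(markdown_text)


def markdown_to_concatenated_lines(markdown_text):
    """Return one quoted literal per line, joined with ' +' as in the question."""
    lines = markdown_text.split("\n")
    # Re-attach the line break to every line except the last, so the
    # concatenated pieces reproduce the original text exactly.
    pieces = [line + "\n" for line in lines[:-1]] + [lines[-1]]
    return " +\n".join(json.dumps(piece) for piece in pieces)


# Hypothetical sample with a blank line in the middle.
sample = (
    "This is one line of markdown\n"
    "This is another line of markdown\n"
    "\n"
    "This is another one"
)
print(markdown_to_concatenated_lines(sample))
```

Reading file.md and writing file.txt around these functions works the same way as in the answers above.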
I've written a script that appends a given character to every line of a text file (I'm adding ',' to the end of IP addresses, one per row).
I want to avoid accidentally running the script multiple times and adding the same character to the end of each line more than once, i.e. adding one , is what I want; accidentally adding ten ,'s is annoying, and I'd need to undo what I've done.
I'm trying to update the code so that it checks whether the last character in a line is the same as the character being added and, if it is, does not add it.
This code adds char to the end of each line.
file = 'test.txt' # file to append text to, keep the ''
char = ','
newf=""
with open(file,'r') as f:
    for line in f:
        newf+=line.strip()+ char + '\n'
    f.close()
with open(file,'w') as f:
    f.write(newf)
And I've written this code to check what the last character of each line is; it successfully returns a ',' for each line. What I can't figure out is how to combine the two so that nothing is appended when char and check have the same value.
f = open("test.txt", "r")
check = ","
for line in f:
    l = line.strip()
    if l[-1:].isascii():
        check = l[-1:]
    else:
        check = 0
    print(check)
f.close()
Use the str.endswith() method to check whether the line already ends with ,.
check = ","
newf = ""
with open(file) as f:
    for line in f:
        line = line.strip()
        if not line.endswith(check):
            line += check
        newf += line + "\n"
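To see that the endswith() check makes the script safe to run repeatedly, here is a small sketch that works on a list of lines instead of a file. The function name append_once and the sample addresses are made up for illustration. Unlike the snippet above, it leaves blank lines untouched, which is an assumption about what is wanted.

```python
def append_once(lines, char=","):
    """Return the lines with char appended only where it is missing."""
    result = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and lines that already end with char.
        if stripped and not stripped.endswith(char):
            stripped += char
        result.append(stripped)
    return result


addresses = ["10.0.0.1", "10.0.0.2,", ""]  # hypothetical input
once = append_once(addresses)
twice = append_once(once)  # a second run changes nothing
print(once)
print(once == twice)
```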
I have this project I am working on but need help. My main goal is to make the translated text file look the same as the original file with the exception of the translated words.
Here is what a snippet of the original file looks like:
[Image: original text file]
Here is my python code:
# Step 1: Import the english.txt file
import json
english_text = open('/home/jovyan/english_to_lolspeak_fellow/english.txt', 'r')
text = english_text.readlines()
english_text.close()
# Step 2: Import the glossary (the tranzlashun.json file)
with open('/home/jovyan/english_to_lolspeak_fellow/tranzlashun.json') as translationFile:
    data = json.load(translationFile)
# Step 3:Translate the English text into Lolspeak
translated_text= ''
for line in text:
    for word in line.split():
        if word in data:
            translated_text += data[word.lower()]+" "
        else:
            translated_text += word.lower()+ " "
    pass
# Step 4 :Save the translated text as the "lolcat.txt" file
with open('/home/jovyan/english_to_lolspeak_fellow/lolcat.txt', 'w') as lolcat_file:
    lolcat_file.write(translated_text)
    lolcat_file.close()
And lastly, here is what my output looks like:
[Image: translated output file]
As you can see, I was able to translate the file but the original spacing is ignored. How do I change my code to keep the spacing as it was before?
You can keep the spaces by reading one line at a time.
with open('lolcat.txt', 'w') as fw, open('english.txt') as fp:
    for line in fp:
        for word in line.split():
            line = line.replace(word, data.get(word.lower(), word))
        fw.write(line)
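One caveat with str.replace() is that it also replaces a word where it appears inside a longer word, or inside text that has already been translated. A possible variation, sketched here under the assumption that words are separated by whitespace, is to substitute whole runs of non-whitespace with re.sub and a callback. The function name translate_line and the glossary entries are invented for the example.

```python
import re


def translate_line(line, glossary):
    """Translate each word in line, leaving all whitespace untouched."""
    def replace(match):
        word = match.group(0)
        # Fall back to the original word when it is not in the glossary.
        return glossary.get(word.lower(), word)

    # \S+ matches runs of non-whitespace, so spaces, tabs and the trailing
    # newline are never part of a match and pass through unchanged.
    return re.sub(r"\S+", replace, line)


glossary = {"hello": "oh hai", "cat": "kitteh"}  # hypothetical entries
print(repr(translate_line("Hello   my cat\n", glossary)))
print(repr(translate_line("cat catalog\n", glossary)))
```

Words with punctuation attached (for example "cat.") are looked up as they are, so they only match if the glossary contains them in that form.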
I'd suggest combining steps 3 and 4: translate each line, write it, and then write \n to start the next line.
I haven't run the following, so you might have to modify it to get it to work.
Note that I changed the 'w' to 'a' so it appends instead of overwriting, and as far as I know using 'with' means the file is closed automatically, so you don't need the explicit close().
for line in text:
    translated_line = ""
    for word in line.split():
        # look up the lowercased word so the test matches the key that is used
        if word.lower() in data:
            translated_line += data[word.lower()] + " "
        else:
            translated_line += word.lower() + " "
    with open('/home/jovyan/english_to_lolspeak_fellow/lolcat.txt', 'a') as lolcat_file:
        lolcat_file.write(translated_line)
        lolcat_file.write("\n")
I have made my own corpus of misspelled words.
misspellings_corpus.txt:
English, enlist->Enlish
Hallowe'en, Halloween->Hallowean
I'm having an issue with my format. Thankfully, it is at least consistent.
Current format:
correct, wrong1, wrong2->wrong3
Desired format:
wrong1,wrong2,wrong3->correct
The order of wrong<N> isn't of concern,
There might be any number of wrong<N> words per line (separated by a comma: ,),
There's only 1 correct word per line (which should be to the right of ->).
Failed Attempt:
with open('misspellings_corpus.txt') as oldfile, open('new.txt', 'w') as newfile:
    for line in oldfile:
        correct = line.split(', ')[0].strip()
        print(correct)
        W = line.split(', ')[1].strip()
        print(W)
        wrong_1 = W.split('->')[0] # however, there might be loads of wrong words
        wrong_2 = W.split('->')[1]
        newfile.write(wrong_1 + ', ' + wrong_2 + '->' + correct)
Output new.txt (isn't working):
enlist, Enlish->EnglishHalloween, Hallowean->Hallowe'en
Solution: (Inspired by #alexis)
import re

with open('misspellings_corpus.txt') as oldfile, open('new.txt', 'w') as newfile:
    for line in oldfile:
        #line = 'correct, wrong1, wrong2->wrong3'
        line = line.strip()
        # split on commas (with optional spaces) and on the arrow
        terms = re.split(r", *|->", line)
        newfile.write(",".join(terms[1:]) + "->" + terms[0] + '\n')
Output new.txt:
enlist,Enlish->English
Halloween,Hallowean->Hallowe'en
Let's assume all the commas are word separators. I'll break each line on commas and arrows, for convenience:
import re
line = 'correct, wrong1, wrong2->wrong3'
terms = re.split(r", *|->", line)
new_line = ", ".join(terms[1:]) + "->" + terms[0]
print(new_line)
You can put that back in a file-reading loop, right?
I'd suggest building up a list, rather than assuming the number of elements. When you split on the comma, the first element is the correct word, elements [1:-1] are misspellings, and [-1] is going to be the one you have to split on the arrow.
I think you're also finding that write needs a newline character as in "\n" as suggested in the comments.
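The list-building idea described above can be sketched without regular expressions. The function name reorder is made up, and the sketch assumes every line has the correct word followed by at least one misspelling.

```python
def reorder(line):
    """Turn 'correct, wrong1, wrong2->wrong3' into 'wrong1,wrong2,wrong3->correct'."""
    parts = [part.strip() for part in line.strip().split(",")]
    correct = parts[0]
    misspellings = parts[1:-1]
    # The last comma-separated element still holds 'wrongN->wrongM',
    # so it is the only one that needs splitting on the arrow.
    misspellings.extend(piece.strip() for piece in parts[-1].split("->"))
    return ",".join(misspellings) + "->" + correct


for sample_line in ["English, enlist->Enlish", "Hallowe'en, Halloween->Hallowean"]:
    print(reorder(sample_line))
```

When writing to a file, append "\n" to each result so the entries do not run together on one line.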
I want each line of a .txt file to end with ", but the file is encoded in gb2312 or gbk because it contains Chinese. So I created a file named heheda.txt, whose content is as follows (each line ends with a line break):
从前有座山"
shan里有个庙
"庙里有个"
laohe尚
Then what I tried is as follows:
for line in open('heheda.txt', 'r'):
    if not line[-2] == r'"':
        print line
        line = line[:-1] + r'"' + line[-1:]
        print line
and it returns:
shan里有个庙
shan里有个庙"
laohe尚
laohe�"�
I don't know why the end of each line is line[-2], since I had already tried line.endswith(r'"') and line[-1] == r'"'. Also, the first sentence gets the right format, while the second one comes out wrong (�).
Then I tried reading in binary mode with rb, which surprised me again:
a_file = open(data_path+'heheda.txt', 'rb')
for line in a_file:
    if line[-3] != r'"':
        print line
        line = line[:-2] + r'"' + line[-2:]
        print line
and it returns:
shan里有个庙
shan里有个庙"
laohe尚
laohe�"��
This time, I have to use line[-3] != r'"' as the condition to judge whether sentence end with " or not.
I cannot figure out what happens.
By the way I work in Windows7 with python 2.7.11
Does anyone know what's going on??
Windows uses "\r\n" as its line ending, which text mode automatically translates to "\n" on reading. Your last line, however, has no newline character at all, so fixed indices such as line[-2] point at the wrong character there.
Just strip the newline characters and then test for ":
with open('heheda.txt', 'r') as lines:
    for line in lines:
        line = line.rstrip()
        if not line.endswith('"'):
            line += '"'
        print line
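The � characters appear because Python 2 reads the file as bytes: slicing the last line, which has no newline, cuts a multi-byte Chinese character in two and puts the quote between its bytes. A sketch of the same fix with decoding on read follows. It is written for Python 3, the function name ensure_trailing_quote is invented, and it assumes the file really is GBK-encoded as the question says (pass a different encoding otherwise).

```python
import io
import os
import tempfile


def ensure_trailing_quote(path, encoding="gbk"):
    """Return the lines of the file, each ending with a double quote."""
    fixed = []
    # Decoding on read means every element of a line is a whole character,
    # never one byte of a multi-byte sequence.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.endswith('"'):
                line += '"'
            fixed.append(line)
    return fixed


# Demo on a temporary GBK file whose last line has no newline.
sample = '从前有座山"\r\nshan里有个庙\r\nlaohe尚'
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(sample.encode("gbk"))
result = ensure_trailing_quote(path)
os.remove(path)
print(result)
```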
I want to collapse runs of spaces into a single space but preserve one empty line as a separator in a file. I have tried the following code and it seems to work.
How can I do this without writing to the file twice?
I want to collect all my substitutions, maybe in a string, and write them all at once.
import re

i = open('inputfile.txt','r')
infile = i.readlines()
o = open('outputfile.txt','w')
for line in infile:
    if line == '\n':
        o.write('\n\n')
    else:
        o.write(re.sub(r'\s+',' ',line))
o.close()
i.close()
See my answer in this question here: Python save file to csv
I think the re.sub() call is tripping you up: '\s' also matches the newline at the end of each line, so that gets replaced too. Split on whitespace and join with a single ' ' instead.
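i = open('inputfile.txt','r')
infile = i.readlines()
i.close()
o = open('outputfile.txt','w')
newoutputfile = ""
for line in infile:
    if line == '\n':
        newoutputfile += '\n\n'
    else:
        # collapse whitespace runs; the trailing space keeps words on adjacent lines apart
        newoutputfile += ' '.join(line.split()) + ' '
o.write(newoutputfile)  # single write at the end
o.close()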
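The same idea can be separated into a pure function plus a thin file wrapper, which makes the single write explicit and easier to test. This is a sketch only: the names normalize and rewrite are made up, and the output deliberately mirrors the code in the question (each non-blank line ends with one space, each blank line becomes a paragraph break).

```python
def normalize(text):
    """Collapse whitespace runs per line; blank lines become paragraph breaks."""
    pieces = []
    for line in text.splitlines():
        if line.strip() == "":
            pieces.append("\n\n")
        else:
            # split() drops all whitespace runs, join() puts single spaces back
            pieces.append(" ".join(line.split()) + " ")
    return "".join(pieces)


def rewrite(in_path, out_path):
    """Read in_path, normalize it, and write out_path with a single write."""
    with open(in_path) as src:
        result = normalize(src.read())
    with open(out_path, "w") as dst:
        dst.write(result)


print(repr(normalize("a   b\nc\n\nd  e\n")))
```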