I've got an input file with some string containing double quotes in it, and want to generate a C-style header file with Python.
Say,
input file: Hello "Bob"
output file: Hello \"Bob\"
I can't work out the code to produce such a file. Here's what I've tried so far:
key = 'key'
val = 'Hello "Bob"'
buf_list = list()
...
val = val.replace('"', b'\x5c\x22')
# also tried: val = val.replace('"', r'\"')
# also tried: val = val.replace('"', '\\"')
buf_list.append((key + '="' + val + '";\n').encode('utf-8'))
...
for keyval in buf_list:
    lang_file.write(keyval)
lang_file.close()
The output file always contains:
Hello \\\"Bob\\\"
I had no problems writing \n or \t strings to the output file.
It seems I can only write zero or two backslashes. Can someone help, please?
You need to escape both the double-quote and the backslash. The following works for me (using Python 2.7):
with open('temp.txt', 'r') as f:
    data = f.read()
with open('temp2.txt', 'w') as g:
    g.write(data.replace('\"', '\\\"'))
Using a raw string literal for the replacement text should do it:
a = 'Hello "Bob"'
print(a.replace('"', r'\"'))
The above will give you:
Hello \"Bob\"
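Putting the answers above together, here is a minimal sketch of building one header line. The helper names c_escape and header_line are hypothetical, and the sketch assumes Python 3 and that only backslashes and double quotes need escaping. Backslashes are escaped first so that the ones added for the quotes are not doubled again.

```python
def c_escape(value):
    """Escape backslashes and double quotes for a C string literal."""
    # Order matters: escape existing backslashes before adding new ones.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def header_line(key, value):
    """Build one key="value"; line (hypothetical output format)."""
    return '{}="{}";\n'.format(key, c_escape(value))


print(header_line("key", 'Hello "Bob"'), end="")
```

Opening the output file in text mode, for example open(path, 'w', encoding='utf-8'), keeps everything as str, which avoids mixing str and bytes in replace().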
I want to save some MathJax code to a .txt file in Python.
x = "$\infty$"
with open("sampletext.txt", "a+") as f:
    f.write(x)
This works exactly as expected:
sampletext.txt
$\infty$
However, when I try to save the escape sequence in a list:
x = ["$\infty$"]
with open("sampletext.txt", "a+") as f:
    f.write(str(x))
sampletext.txt
['$\\infty$']
How do I remove the double backslash in the latter case and save it as ['$\infty$']?
Try this:
x = [r"$\infty$"]
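with open("sampletext.txt", "a+") as f:
    f.write(str(x))
The r prefix makes the literal a raw string, so backslashes in it are not treated as escape characters. Note that str() on a list writes the repr() of each element, which still displays the backslash doubled.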
Maybe this can help you:
x = [r"$\infty$"]
with open("sampletext.txt", "a+") as f:
    f.write(''.join(x))
The "r" (raw) prefix can be used to write string literals containing special symbols like "\".
Or, if you don't know how many items are in the list:
x = ["$\infty$"]
with open("sampletext.txt", "a+") as f:
    f.write(f"{''.join(x)}")
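If the brackets and quotes should be kept in the file, one option is to build the list text by hand rather than relying on str(), which uses repr() for each element and so shows backslashes doubled. This is only a sketch: format_list is a hypothetical helper, and it assumes the items contain no single quotes.

```python
def format_list(items):
    """Render a list of strings like ['a', 'b'] without repr() escaping."""
    return "[" + ", ".join("'{}'".format(s) for s in items) + "]"


x = [r"$\infty$"]
print(str(x))          # repr-based: backslash shown doubled
print(format_list(x))  # hand-built: single backslash
```

The string returned by format_list(x) can be passed to f.write() in place of str(x).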
I have a CSV file that has some data in it. I want to replace all the newlines within "" by some character. But the new lines outside of these quotes should stay. What is the best way to achieve this?
import sys, getopt
def main(argv):
    inputfile = ''
    outputfile = ''
    print(argv[0:])
    inputfile = argv[0:]
    file_object = open(argv[0:], "r")
    print(file_object)
    data = file.read(file_object)
    strings = data.split('"')[1::2]
    for string in strings:
        string.replace("\r", "")
        string.replace("\n", "")
        print(string)
    f = open("output.csv", "w")
    for string in strings:
        string = string.replace("\r", "")
        string = string.replace("\n", "")
        f.write(string)
    f.close()
if __name__ == "__main__":
    main(sys.argv[1])
This does not quite work, since the quotes ("") get lost, as do the commas.
Expected input:
"dssdlkfjsdfj \r\n ashdiowuqhduwqh \r\n",
"3"
Expected output:
"dssdlkfjsdfj ashdiowuqhduwqh",
"3"
A real sample would help, but given in.csv:
"multi
line
data","more data"
"more multi
line data","other data"
The following will replace newlines in quotes:
import csv
with open('in.csv',newline='') as fin:
    with open('out.csv','w',newline='') as fout:
        r = csv.reader(fin)
        w = csv.writer(fout)
        for row in r:
            row = [col.replace('\r\n','**') for col in row]
            w.writerow(row)
out.csv:
multi**line**data,more data
more multi**line data,other data
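The output above loses the surrounding quotes because csv.writer only quotes fields when it has to. If the quotes should be kept, as in the question's expected output, a variation is to pass quoting=csv.QUOTE_ALL. The sketch below assumes Python 3 and works on in-memory text so it can be run on its own; flatten_quoted_newlines is a hypothetical name.

```python
import csv
import io


def flatten_quoted_newlines(csv_text, replacement=""):
    """Replace line breaks inside CSV fields and quote every field on output."""
    src = io.StringIO(csv_text, newline="")
    dst = io.StringIO(newline="")
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in csv.reader(src):
        # Handle \r\n first so it counts as one line break, not two.
        writer.writerow([
            col.replace("\r\n", replacement)
               .replace("\n", replacement)
               .replace("\r", replacement)
            for col in row
        ])
    return dst.getvalue()


print(flatten_quoted_newlines('"multi\r\nline\r\ndata","more data"\r\n', "**"), end="")
```

With real files, the same reader and writer calls apply to file objects opened with newline='' as in the answer above.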
The problem got solved in a very easy way. Create an output file, and read the input file for each character. Write each character to the output file, but toggle replace mode by using the ~ operator when a " appears. When in replace mode, replace all \r\n with '' (nothing).
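A rough sketch of the character-by-character approach described above, assuming Python 3. The function name is hypothetical. The flag is flipped with not here, since ~ applied to a Python bool yields an integer (-1 or -2, both truthy) rather than the opposite boolean.

```python
def strip_newlines_in_quotes(text):
    """Drop \\r and \\n characters that appear between double quotes."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # Every quote character toggles replace mode.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch in "\r\n":
            # Inside quotes: skip the line-break character.
            continue
        else:
            out.append(ch)
    return "".join(out)


sample = '"dssdlkfjsdfj \r\n ashdiowuqhduwqh \r\n",\r\n"3"'
print(repr(strip_newlines_in_quotes(sample)))
```

Quotes, commas, and line breaks outside quoted fields pass through unchanged, which is what the original split-based attempt lost.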
I want to extract specific fields from a text file.
Here is an example text file:
https://drive.google.com/file/d/0BzQ6rtO2VN95d3NrTjktMExfNkU/view?usp=sharing
Kindly review it.
I am trying to extract strings of the form:
"Name": "the name that follows it"
"Link": "the link that follows it"
From the input file, I expect to get output like this:
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
.
.
.
"Name":"directNIC"
"Link":"http://directnic.com"
If these fields appear anywhere in the file, they should be extracted to another file.
How can I achieve this sort of extraction? Please treat the sample as a small part of a much bigger file.
Also, it is a text file, not JSON.
Kindly help me.
Since the text file is not properly formatted, a regex is the practical option. The snippet below works for the given sample file.
Keep in mind that this requires loading the entire file into memory.
import re, json
f = open(r'filepath')
textCorpus = f.read()
f.close()
# replace empty strings with non-empty ones so the regex matches easily
textCorpus = textCorpus.replace('""', '" "')
lstMatches = re.findall(r'"Name".+?"Link":".+?"', textCorpus)
with open(r'new_file.txt', 'a') as wf:
    for eachMatch in lstMatches:
        convJson = "{" + eachMatch + "}"
        json_data = json.loads(convJson)
        wf.write(json_data["Name"] + "\n")
        wf.write(json_data["Link"] + "\n")
Short solution using re.findall():
import re
with open('test.txt', 'r') as fh:
    p = re.compile(r'(?:"Categories":[^,]+,)("Name":"[^"]+"),(?:[^,]+,)("Link":"[^"]+")')
    result = [pair for l in re.findall(p, fh.read()) for pair in l]
print('\n'.join(result))
The output (fragment):
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
"Name":"PHP"
....
Your file is malformed JSON with extraneous double quotes. That is enough to stop the json module from loading it, so you are left with lower-level regex parsing.
Assumptions:
- the interesting part after "Name" or "Link" is:
  - separated from the identifier by a colon (:)
  - enclosed in double quotes (") with no embedded double quote
- the file is structured in lines
- Name and Link fields are always on a single line (no newline inside a field)
You can process your file line by line with a simple re.finditer on each line:
import re
rx = re.compile(r'(("Name":".*?")|("Link":".*?"))')
with open(inputfile) as fd:
    for line in fd:
        l = rx.finditer(line)
        for elt in l:
            print(elt.group(0))
To write the data to another file, open it before the snippet above using with open(outputfile, "w") as fdout: and replace the print line with:
fdout.write(elt.group(0) + "\n")
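As a self-contained illustration of the line-by-line approach, the sketch below runs the same kind of pattern over an in-memory list of lines. The sample line is made up for illustration (the real file layout is only assumed from the patterns in the answers), and extract_fields is a hypothetical name.

```python
import re

# Same idea as the pattern above, written with one non-capturing alternation.
rx = re.compile(r'"(?:Name|Link)":".*?"')


def extract_fields(lines):
    """Yield every "Name":"..." or "Link":"..." fragment, in order."""
    for line in lines:
        for match in rx.finditer(line):
            yield match.group(0)


# Hypothetical sample line, not taken from the real file.
sample = ['{"Categories":"x","Name":"JTLnet","Id":"1","Link":"http://jtlnet.com"}']
for field in extract_fields(sample):
    print(field)
```

Because a file object is itself an iterable of lines, extract_fields(fd) should work on an open file without loading it all into memory.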
I want my code to accept input that is a string (for example: "ABC") and then read through a txt file and find that string; once it finds the string, it should output the closest integer (for example: 456) to the string in the file. Is this possible?
So far, I've found code that can print "related lines," so lines that are 2 lines away from my string. This code is:
f = open("textfile.txt", "r")
searchlines = f.readlines()
f.close()
for i, line in enumerate(searchlines):
    if "string" in line:
        for l in searchlines[i:i+2]: print(l, end="")
        print()
This code prints the line containing my string and the line after it. However, I need to print a specific integer, so I'm not sure how to proceed from there.
For my purposes, I need the "closest integer" to the right of the string.
You could read the file into one string with f.read() and then get the integer via a regular expression that captures the integer into a group:
import re
f = open("textfile.txt", "r")
content = f.read()
f.close()
find = 'hello' # the string to find
result = re.search(find + r'\D*(\d+)', content)
print(result.group(1))
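Two details are worth guarding against in the snippet above: re.search() returns None when nothing matches, and a search string containing regex metacharacters (such as a dot) would be interpreted as a pattern. A hedged sketch that handles both, with closest_int_after as a hypothetical helper name:

```python
import re


def closest_int_after(content, find):
    """Return the first integer to the right of `find`, or None if absent."""
    # re.escape makes the search string match literally.
    match = re.search(re.escape(find) + r'\D*(\d+)', content)
    return int(match.group(1)) if match else None


print(closest_int_after("xx ABC foo 456 bar 789", "ABC"))
```

The \D* part skips any non-digit characters between the string and the number, including line breaks, so the integer may be on a later line.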
Is there a way to detect the new line character after I've read from a file and stored the results into a string? Here is the code:
with open("text.txt") as file:
    content_string = file.read()
file.close()
re.search("\n", content_string)
The content_string looks like this:
Hello world!
Hello WORLD!!!!!
I want to extract the new line character after the first line "Hello world!". Does this character even exist at that point?
As per Jongware's comment, the regex search you perform does find the newline. You just need to use the result.
From the re module documentation:
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
In terms of code, checking that translates into:
import re
with open("text.txt") as file:  # the with block closes the file itself
    content_string = file.read()
m = re.search("\n", content_string)
if m:
    print("Found a newline")
else:
    print("No newline found")
Now, your file might well contain "\r" rather than "\n". The two look much the same when printed, but the regex would not match. In that case, try this test as well, replacing the line:
m = re.search("\n", content_string)
with:
m = re.search("[\r\n]", content_string)
which will look for either.
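To actually use the match rather than just test for it, the match object gives the position and the matched text. A small sketch, assuming Python 3; first_line_break is a hypothetical name:

```python
import re


def first_line_break(text):
    """Return (index, line_break) for the first \\r\\n, \\r or \\n, else None."""
    match = re.search(r"\r\n|\r|\n", text)
    if match is None:
        return None
    return match.start(), match.group()


content_string = "Hello world!\nHello WORLD!!!!!"
print(first_line_break(content_string))
```

In Python 3 text mode, open() translates \r\n and \r to \n by default, so the \r cases mostly matter for files opened with newline='' or in binary mode.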
Is there a way to detect the new line character after I've read from a
file and stored the results into a string?
If I understand you correctly, you want to concatenate multiple lines into one string.
Input:
Hello world!
Hello WORLD!!!!!
test.py:
result = []
with open("text.txt", "r") as inputs:
    for line in inputs:
        result.append(line.strip())  # strip() removes the newline character
print(" ".join(result))
output:
Hello world! Hello WORLD!!!!!
How about if I have more lines, and I want to detect the first
newline? For some reason, in my text it won't detect it.
with open("text.txt") as f:
    first_line = f.readline()
print(first_line)