Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I need to parse a bunch of unformatted text similar to the one below.
those|DT|O considered|VBN|O anarchists|NNS|O at|IN|O best|JJS|O share|NN|O a|DT|O certain|JJ|O family|NN|O resemblance|NN|O .|.|O "|RQU|O
I need to use a regular expression to parse the data into a format like this:
The DT I-MISC
certain JJ O
in IN O
the DT B
pound NN I
with open('infile.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    # Split the input on whitespace, then write each word|TAG|label token on its own line
    for token in infile.read().split():
        outfile.write(token.replace('|', ' ') + '\n')
You basically just want to split on whitespace and then replace each | with a space, correct? That seems to be what you're looking for.
EDIT:
Code now writes to file.
EDIT 2:
Code now reads from a file
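Since the question asks for a regular expression, here is a minimal sketch of the same transformation using `re`. The names `TOKEN` and `parse_tagged` are hypothetical, and the pattern assumes every token has exactly the form word|TAG|label with no spaces inside it.

```python
import re

# Hypothetical pattern: a word, a tag and a label separated by "|".
# The tag and the label are assumed to contain neither whitespace nor "|".
TOKEN = re.compile(r'(\S+)\|([^\s|]+)\|([^\s|]+)')

def parse_tagged(text):
    # Return one "word TAG label" string per token found in the text
    return [' '.join(groups) for groups in TOKEN.findall(text)]

sample = 'those|DT|O considered|VBN|O .|.|O "|RQU|O'
for row in parse_tagged(sample):
    print(row)
```

Unlike the split-and-replace version, this skips anything that does not look like a three-part token instead of copying it through.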
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
I want to create a new text file and cut the string in each line.
for line in rescut:
    rescutfinal.write("pretext_" + line.rsplit('delimiter', 1)[-1] + "1\t0\t10\tLinear\t0\t0")
But my code doesn't work as expected. My output contains a newline after the "line.rsplit" string:
pretext_linesplitstring
string after linesplitstring pretext_linesplitstring
string after linesplitstring pretext_linesplitstring
string after linesplitstring pretext_linesplitstring"
How do I get rid of the "\n" after the linesplitstring?
Each line read from the file still ends with "\n". You can use str.rstrip() or str.strip() to remove it before splitting:
line.rstrip().rsplit('delimiter', 1)[-1]
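A minimal sketch of the fix in context. The helper name `cut_line` and the sample lines are made up for illustration, and "_" stands in for the real delimiter. Note that once the newline is stripped from each line, you probably need to add one explicitly at the end of every output row.

```python
def cut_line(line, delimiter):
    # Strip the trailing newline first, then keep the text after the last delimiter
    return line.rstrip("\n").rsplit(delimiter, 1)[-1]

# Hypothetical input lines, as they would come from iterating over a file
lines = ["aaa_first\n", "bbb_second\n"]

# Build the output rows; the "\n" at the end is now added on purpose
rows = ["pretext_" + cut_line(line, "_") + "1\t0\t10\tLinear\t0\t0\n" for line in lines]
```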
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
I am trying to write a function that takes copy-pasted text, which very often includes \n characters. An example call is as follows:
func('''This
is
some
text
that I entered''')
The problem is that the text can sometimes be rather large, so going through it line by line to avoid ', " or ''' isn't feasible. A piece of text that can cause issues is as follows:
func('''This
is'''
some"
text'
that I entered''')
I wanted to know if there is any way I can take the text as seen in the second example and use it as a string regardless of what it is comprised of.
Thanks!
To my knowledge, you won't be able to paste the text directly into your source file. However, you could paste it into a separate text file and read it from there.
Use regex to find triple quotes ''' and other invalid characters.
Example python:
import re

def read_paste(file):
    # Read the pasted text from a file and backslash-escape every quote character
    with open(file, 'r') as f:
        data = f.readlines()
    for i, line in enumerate(data):
        data[i] = re.sub('("|\')', r'\\\1', line)
    return ''.join(data)
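As a side note, quotes only need escaping inside Python source code. Text read from a file is already an ordinary string, whatever characters it contains. A minimal sketch of that simpler variant, assuming the pasted text was saved as UTF-8 (the function name `read_pasted_text` is hypothetical):

```python
from pathlib import Path

def read_pasted_text(path):
    # The file content comes back as a normal str; ', " and ''' need no escaping
    return Path(path).read_text(encoding="utf-8")
```

The result can then be passed straight to the function, for example `func(read_pasted_text("paste.txt"))`, where `paste.txt` is a placeholder file name.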
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
How can I save the result from this code to .csv format?
import re
import csv
text = open('example.txt').read()
pattern = r'([0-9]+)[:]([0-9]+)[:](.*)'
regex = re.compile(pattern)
for match in regex.finditer(text):
    result = ("{},{}".format(match.group(2),match.group(3)))
If I understood your question correctly, you can generate the CSV as follows:
import re
text = open('example.txt').read()
pattern = r'([0-9]+)[:]([0-9]+)[:](.*)'
regex = re.compile(pattern)
with open('csv_file.csv', 'w') as csv_file:
    # Add header row with two columns
    csv_file.write('{},{}\n'.format('Id', 'Tile'))
    for match in regex.finditer(text):
        result = ("{},{}".format(match.group(2),match.group(3)))
        csv_file.write('{}\n'.format(result))
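One caveat with writing the rows by hand: if the third group itself contains a comma, the row ends up with extra columns. A possible alternative, sketched here with the standard `csv` module, lets the writer take care of quoting. The function name `write_matches` and the sample text are assumptions for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io
import re

PATTERN = re.compile(r'([0-9]+)[:]([0-9]+)[:](.*)')

def write_matches(text, csv_file):
    # csv.writer quotes any field that contains a comma or a quote character
    writer = csv.writer(csv_file)
    writer.writerow(['Id', 'Tile'])
    for match in PATTERN.finditer(text):
        writer.writerow([match.group(2), match.group(3)])

# Hypothetical sample input; the first row has a comma in its last field
buffer = io.StringIO()
write_matches("1:23:hello, world\n2:45:plain\n", buffer)
```

To write to a real file instead, pass a file opened with `open('csv_file.csv', 'w', newline='')`, which is how the `csv` documentation recommends opening files for the writer.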
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
import json

with open('twit/example.json', encoding='utf8') as json_data:
    for line in json_data:
        try:
            dataText = json.loads(line)
        except ValueError:
            continue
        for a in dataText:
            print(a["user"]["location"])
The result is an error: string indices must be integers
Update: the answer below is for printing this:
print(dataText["user"]["location"])
but now I want this one to work:
print(a["user"]["location"])
If your JSON file contains a single JSON document, use this instead:
import json

with open('twit/example.json', encoding='utf8') as json_data:
    # Parse the whole file at once instead of line by line
    dataText = json.load(json_data)
print(dataText["user"]["location"])
The way your code is currently written makes me think you have multiple JSON structures in a single file, separated by newlines. This is not how JSON is usually formatted.
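If the file really does hold one JSON structure per line, the error has a simple cause: iterating over a dict yields its keys, which are strings, so `a["user"]` tries to index a string. A minimal sketch that handles both cases, a line holding a single object or a list of objects. The function name `iter_locations` is hypothetical, and every record is assumed to have a "user" object with a "location" key.

```python
import json

def iter_locations(lines):
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            # Skip blank or malformed lines, as the original loop does
            continue
        # Wrap a single object in a list so both shapes are handled the same way
        records = record if isinstance(record, list) else [record]
        for item in records:
            yield item["user"]["location"]
```

With a file, this could be used as `for location in iter_locations(json_data): print(location)` inside the `with open(...)` block.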