How to use a regular expression to parse text with the symbol "|" [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I need to parse a bunch of unformatted text similar to the one below.
those|DT|O considered|VBN|O anarchists|NNS|O at|IN|O best|JJS|O share|NN|O a|DT|O certain|JJ|O family|NN|O resemblance|NN|O .|.|O "|RQU|O
I need to use a regular expression to parse the data into a format like this:
The DT I-MISC
certain JJ O
in IN O
the DT B
pound NN I

# Open the output in text mode ('w'); writing str to a 'wb' file fails on Python 3.
with open('outfile.txt', 'w') as outfile, open('infile.txt', 'r') as infile:
    for token in infile.read().split():
        outfile.write(token.replace('|', ' ') + '\n')
You basically just want to split on whitespace and then replace each | with a space, correct? That seems to be what you're looking for.
EDIT:
Code now writes to file.
EDIT 2:
Code now reads from a file
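Since the question asks for a regular expression, here is a minimal sketch of a regex-based variant. It assumes every token has exactly three |-separated fields, as in the sample line; the names TOKEN_RE and parse_tagged are made up for illustration.

```python
import re

# Three non-empty fields separated by '|', none containing whitespace or '|'.
TOKEN_RE = re.compile(r'([^\s|]+)\|([^\s|]+)\|([^\s|]+)')

def parse_tagged(text):
    """Return one 'word TAG LABEL' string per token found in text."""
    return [' '.join(match.groups()) for match in TOKEN_RE.finditer(text)]

sample = 'those|DT|O considered|VBN|O .|.|O "|RQU|O'
for row in parse_tagged(sample):
    print(row)
```

Unlike the split-and-replace approach, this silently skips any token that does not have exactly three fields, which may or may not be what you want.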

Related

No "\n" after line.rsplit in python [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I want to create a new text file and cut the string in each line.
for line in rescut:
    rescutfinal.write("pretext_" + line.rsplit('delimiter', 1)[-1] + "1\t0\t10\tLinear\t0\t0")
But my code doesn't work as expected. My output contains a newline after the string returned by line.rsplit:
pretext_linesplitstring
string after linesplitstring pretext_linesplitstring
string after linesplitstring pretext_linesplitstring
string after linesplitstring pretext_linesplitstring"
How do I get rid of the "\n" after the linesplitstring?
Each line read from a file still ends with its "\n". You can use str.rstrip() or str.strip() to remove it before splitting:
line.rstrip().rsplit('delimiter', 1)[-1]
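As a rough sketch of how that fits into the loop from the question: the helper name build_line is hypothetical, and appending "\n" at the end of each record is an assumption about the desired output (the original code never writes one explicitly).

```python
def build_line(line, delimiter='delimiter'):
    """Build one output record from one input line."""
    # Strip only the trailing newline so other whitespace is preserved.
    tail = line.rstrip('\n').rsplit(delimiter, 1)[-1]
    # Assumption: each record should end with its own newline.
    return "pretext_" + tail + "1\t0\t10\tLinear\t0\t0\n"

print(repr(build_line("some text delimiter value\n")))
```

In the original loop this would be used as rescutfinal.write(build_line(line)).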

Python take the string of anything, ignoring all escape characters [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am trying to make a function that takes typically copy-pasted text that very often includes \n characters. An example of such is as follows:
func('''This
is
some
text
that I entered''')
The problem with this function is that the text can sometimes be rather large, so going through it line by line to avoid ', " or ''' isn't practical. A piece of text that can cause issues is as follows:
func('''This
is'''
some"
text'
that I entered''')
I wanted to know if there is any way I can take the text as seen in the second example and use it as a string regardless of what it is comprised of.
Thanks!
To my knowledge, you won't be able to paste the text directly into your file. However, you could paste it into a text file.
Then use a regex to find quote characters (including the triple quotes ''') and escape them.
Example in Python:
import re

def read_paste(file):
    # Read the pasted text from a file, one line at a time
    with open(file, 'r') as f:
        data = f.readlines()
    # Put a backslash in front of every single or double quote
    for i, line in enumerate(data):
        data[i] = re.sub('("|\')', r'\\\1', line)
    return ''.join(data)
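One thing worth noting: text read from a file is already an ordinary str, so quotes inside it cannot end a literal the way they do in source code. If the goal is only to get the pasted text into a variable, a plain read may be enough. The sketch below demonstrates this with a temporary file; read_raw is a hypothetical helper name.

```python
import os
import tempfile

def read_raw(path):
    """Return the file contents unchanged, quotes and all."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

# Text containing ''' , " and ' that would break a string literal in source code.
sample = "This\nis'''\nsome\"\ntext'\nthat I entered"

# Write the sample to a temporary file, then read it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name
try:
    loaded = read_raw(path)
finally:
    os.remove(path)

print(loaded == sample)
```

Escaping, as in read_paste, only matters if the result is later pasted back into Python source.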

How to save the following python output result to .csv format [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
How can I save the result from this code to .csv format?
import re
import csv
text = open('example.txt').read()
pattern = r'([0-9]+)[:]([0-9]+)[:](.*)'
regex = re.compile(pattern)
for match in regex.finditer(text):
    result = ("{},{}".format(match.group(2),match.group(3)))
If I understood your question correctly, you can generate the CSV as follows:
import re
text = open('example.txt').read()
pattern = r'([0-9]+)[:]([0-9]+)[:](.*)'
regex = re.compile(pattern)
with open('csv_file.csv', 'w') as csv_file:
    # Add header row with two columns
    csv_file.write('{},{}\n'.format('Id', 'Tile'))
    for match in regex.finditer(text):
        result = ("{},{}".format(match.group(2),match.group(3)))
        csv_file.write('{}\n'.format(result))
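If the third field can itself contain commas or quotes, writing the row by hand produces a malformed file. A possible alternative, sketched here with the standard csv module, lets the writer handle quoting. The function name matches_to_csv and the sample input are made up; the header names are copied from the answer above.

```python
import csv
import io
import re

PATTERN = re.compile(r'([0-9]+):([0-9]+):(.*)')

def matches_to_csv(text, out):
    """Write one CSV row per match to the file-like object out."""
    writer = csv.writer(out)
    writer.writerow(['Id', 'Tile'])
    for match in PATTERN.finditer(text):
        # csv.writer quotes fields containing commas or quotes automatically.
        writer.writerow([match.group(2), match.group(3)])

# Demonstration with an in-memory buffer instead of a real file.
buffer = io.StringIO()
matches_to_csv("1:10:hello, world\n2:20:plain\n", buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with open('csv_file.csv', 'w', newline='') as the csv documentation recommends, so that line endings are not translated twice.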

Getting users location in twitter [duplicate]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
with open('twit/example.json', encoding='utf8') as json_data:
    for line in json_data:
        try:
            dataText = json.loads(line)
        except ValueError:
            continue
        for a in dataText:
            print(a["user"]["location"])
The result is: string indices must be integers
Update: The answer below works for printing
print(dataText["user"]["location"])
but what I want is this one:
print(a["user"]["location"])
If your json file is in a normal format, use this instead:
import json

with open('twit/example.json', encoding='utf8') as json_data:
    # Parse the whole file as a single JSON document
    dataText = json.load(json_data)
print(dataText["user"]["location"])
The way your code is currently written makes me think you have multiple JSON structures in a single file, separated by newlines. This is not how JSON is usually formatted: a standard JSON file holds a single document.
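The error itself most likely comes from the inner loop: iterating over a dict yields its keys, which are strings, so a["user"] indexes a string. If the file really does contain one JSON object per line, a sketch like the following may be closer to what is wanted. The function name user_locations is hypothetical, and it assumes each parsed line is an object with an optional "user" key.

```python
import json

def user_locations(lines):
    """Collect the user location from each line that holds a JSON object."""
    locations = []
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:
            # Skip blank or malformed lines.
            continue
        if isinstance(tweet, dict):
            # "user" may be missing or null, so fall back to an empty dict.
            user = tweet.get("user") or {}
            locations.append(user.get("location"))
    return locations

sample_lines = [
    '{"user": {"location": "Paris"}}',
    'not json',
    '{"user": {"location": null}}',
]
print(user_locations(sample_lines))
```

With a real file, pass the open file object directly, since iterating over it yields one line at a time.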
