Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a file like this:
Hi:
     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
     Exampples:

     >>fdsfds
     >>ok

     This is it.

Hello:
     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
     fdsfdsfdsfdsfds
     fdsfdsfsd
The section of Hi runs from fds... to This is it. The section of Hello runs from fds... to fds...
I want to get only the section under each heading. I thought of the following approach:
Start from the : and then look up to the next \n\n, which would give me each section. But this won't work, because a section can itself contain the same pattern. I don't want to do this using regex or ConfigParser. I am looking for simple parsing. How can I tackle this problem?
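You could check whether each line starts with five spaces; the indented lines are the section bodies, and the unindented lines are the headings:
tab = "     "  # five spaces
with open('input.txt', 'r') as f:
    for line in f:
        # Indented lines belong to a section
        if line.startswith(tab):
            print(line, end='')  # the line already ends with a newline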
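Building on the indentation check, here is a minimal sketch of grouping the lines under their headings without a regex. It assumes that a heading is any unindented line ending in a colon and that everything up to the next heading belongs to it. The function name split_sections and the sample text are made up for illustration.

```python
def split_sections(lines):
    """Group lines under the most recent unindented 'Heading:' line."""
    sections = {}
    current = None
    for line in lines:
        stripped = line.rstrip("\n")
        # Assumption: a heading starts at column 0 and ends with ':'
        if stripped and not stripped[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]
            sections[current] = []
        elif current is not None:
            # Indented or blank line: part of the current section
            sections[current].append(stripped)
    return sections


# Hypothetical sample input, shaped like the file in the question
sample = """Hi:
     first line
     Examples:

     >>ok

Hello:
     other line
"""

sections = split_sections(sample.splitlines())
print(list(sections))
```

Because the indented Examples: line does not start at column 0, it stays inside the Hi section instead of opening a new one.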
This is really easy with a regex:
txt = '''\
Hi:
     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
     Exampples:

     >>fdsfds
     >>ok

     This is it.

Hello:
     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
     fdsfdsfdsfdsfds
     fdsfdsfsd'''
import re
print(re.findall(r'^(\w+:.*?)(?=^\w+:|\Z)', txt, re.S | re.M))
Prints:
['Hi:\n     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds\n     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds\n     Exampples:\n\n     >>fdsfds\n     >>ok\n\n     This is it.\n\n', 'Hello:\n     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd\n     fdsfdsfdsfdsfds\n     fdsfdsfsd']
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a text file in the following format:
Car: Replace:Brakes<10
Car: Renew: Engine=100000
Truck: Renew: Engine=1000
Truck: Replace: Brakes<504
I want to write a regex that parses this file, extracts only the lines that contain Car, and returns the values after Car as a Python dictionary.
So my output would look like
'Replace' :' Brakes<10'
'Renew' : 'Engine=100000'
Any inputs on how I can achieve this?
I tried re.search, but I get a re.Match object that I am not sure how to interpret.
Thank you!
There we go:
https://regex101.com/r/UTdN6B/1
Use ^Car: (.*)|.* as the pattern and \1 as the substitution, with the g and m flags.
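For the dictionary the question asks for, here is a minimal Python sketch. It assumes every Car line has the shape Car: key: value, with optional spaces around the colons. The variable names are made up for illustration. On the re.Match object: re.search returns the first match only, and its captured text is read with .group(1), .group(2) and so on. re.findall instead returns all captured groups directly.

```python
import re

text = """Car: Replace:Brakes<10
Car: Renew: Engine=100000
Truck: Renew: Engine=1000
Truck: Replace: Brakes<504"""

# With re.MULTILINE, ^ and $ anchor at every line, so only lines
# that start with "Car:" can match.
pattern = re.compile(r"^Car:\s*(\w+)\s*:\s*(.*)$", re.MULTILINE)

# findall returns a list of (key, value) tuples, which dict() accepts.
car_values = dict(pattern.findall(text))
print(car_values)  # {'Replace': 'Brakes<10', 'Renew': 'Engine=100000'}
```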
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am trying to make a function that takes typically copy-pasted text that very often includes \n characters. An example of such is as follows:
func('''This
is
some
text
that I entered''')
The problem is that the text can be rather large, so going through it line by line to remove ', " or ''' isn't practical. A piece of text that causes issues is as follows:
func('''This
is'''
some"
text'
that I entered''')
I wanted to know if there is any way I can take the text as seen in the second example and use it as a string regardless of what it is comprised of.
Thanks!
To my knowledge, you won't be able to paste the text directly into your source file. However, you could paste it into a text file.
Then read the file and use a regex to escape the quote characters (' and ", which also covers ''').
Example in Python:
import re

def read_paste(file):
    # Read the pasted text from a file instead of embedding it in the source
    with open(file, 'r') as f:
        data = f.readlines()
    # Put a backslash in front of every single or double quote
    for i, line in enumerate(data):
        data[i] = re.sub('("|\')', r'\\\1', line)
    return ''.join(data)
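As a small check of the text-file approach, the sketch below writes the problematic text from the question to a temporary file (standing in for the file you would paste into) and reads it back. It suggests that the string returned by read() already equals the original text, quotes included, so escaping is likely only needed if the text is later embedded in Python source code. The variable names are made up for illustration.

```python
import os
import tempfile

# The problematic text from the question, written with escapes only
# because it has to live inside this example's source code.
pasted = "This\nis'''\nsome\"\ntext'\nthat I entered"

# A temporary file stands in for the text file you would paste into.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(pasted)
    path = tmp.name

with open(path, "r") as f:
    text = f.read()
os.remove(path)

same = (text == pasted)
print(same)
```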
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Consider the output below, which I get from Python's subprocess module. From it I want to grab only the first field of each line, the username (root, daemon, and so on). I tried, but I was not able to write a regex that fetches only the usernames.
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
You do not need a regex in the first place, as per your comments. You can iterate over the output line by line (str.splitlines()), split each line on : (str.split(':')) and take the first result (result[0]). This expects the output to be consistent; otherwise it will fail.
Use split instead of a regex:
for line in output.splitlines():
    print(line.split(':')[0])
>>>root
>>>daemon
....
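A self-contained version of the split approach might look like the sketch below. The output string is hard-coded here as a stand-in for what the subprocess call returns. Note that in Python 3, subprocess.check_output returns bytes unless text=True is passed, so the real output may need a .decode() first.

```python
# Stand-in for the subprocess output shown in the question
output = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
"""

# Take the text before the first ':' on every line that has one;
# lines without a ':' (for example blank lines) are skipped.
usernames = [line.split(":", 1)[0] for line in output.splitlines() if ":" in line]
print(usernames)  # ['root', 'daemon', 'bin']
```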
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am working with a ~0.5 GB text file, and I want to extract a representative subset of lines, say one millionth of them. I've created a small script to do this:
import random

result = []
with open("data.txt") as f:
    for line in f:
        # Keep each line independently with probability 1e-6
        if random.random() < 0.000001:
            result.append(line)
But it would be more useful for my purpose if I could do this from the command line, without a script. Note that I don't care how many lines are output; I just want to be able to set the probability of outputting each line.
My question: how can I do this with a short one-liner that is suitable for the command line?
Is Perl OK? Try this:
cat yourfile.txt | perl -ne 'print if (rand() < 0.000001)'
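If Perl is not available, the same idea can be sketched in Python. The function name sample_lines is made up for illustration, and the one-liner in the comment assumes a python3 executable on the PATH.

```python
import random

# Possible shell one-liner with the same logic (assumes python3 is installed):
#   python3 -c "import random, sys; sys.stdout.writelines(l for l in sys.stdin if random.random() < 0.000001)" < data.txt


def sample_lines(lines, probability, rng=random):
    """Yield each line independently with the given probability."""
    for line in lines:
        # random() returns a float in [0.0, 1.0)
        if rng.random() < probability:
            yield line


# Tiny demonstration on in-memory data instead of a 0.5 GB file
demo = ["a\n", "b\n", "c\n"]
kept_all = list(sample_lines(demo, 1.0))
kept_none = list(sample_lines(demo, 0.0))
print(len(kept_all), len(kept_none))  # 3 0
```

Like the Perl command, this decides on each line independently, so the number of lines returned varies from run to run.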
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Basically, I want a Python script to search a .txt file for any line containing text of the form
" #1111. "
where 1111 stands for any four digits (each 0-9), with # at the start and . at the end.
You'll want to use what's called a Regular Expression.
Python has a regular expression module called re.
import re

with open('file.txt', 'r') as f:
    # Keep lines containing '#', four digits, then a literal '.'
    matches = [line for line in f if re.search(r'#\d{4}\.', line)]
print(matches)
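To see what the pattern accepts and rejects, here is a small sketch run on a few sample lines. The lines are invented for this example.

```python
import re

# Same pattern as above: '#', four digits, then a literal '.'
pattern = re.compile(r'#\d{4}\.')

# Hypothetical sample lines, invented for this sketch
lines = [
    "order #1234. shipped",   # four digits followed by '.': matches
    "order #123. shipped",    # only three digits: no match
    "order #12345. shipped",  # five digits before the '.': no match
    "no marker here",
]

matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['order #1234. shipped']
```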