How to get filename from stdin - python

I am writing a script and I am running it from the console like this:
cat source_text/* | ./mapper.py
I would like to get the filename of the file being read at any given time. The source_text folder contains a bunch of text files whose filenames I need to extract as well in my mapper script.
Is that possible?
import sys
import re
import os

# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)

# Read pairs as lines of input from STDIN
for line in sys.stdin:
    ...

You cannot do that directly, but the fileinput module can help you.
You just have to call your script this way:
./mapper.py source_text/*
And change it like this:
import fileinput
...

# Read lines of input from the files given on the command line
for line in fileinput.input():
    ...
Then the name of the file being processed is available as fileinput.filename(), and you can also access the number of the current line in that file as fileinput.filelineno(), and still other goodies...
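For example, here is a minimal sketch of the modified mapper (the regex is the one from the question; the output format and the use of Python 3's print() are just assumptions):

import fileinput
import re

pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)

# Called as: ./mapper.py source_text/*
for line in fileinput.input():
    if pattern.search(line):
        # filename() is the file currently being read,
        # filelineno() is the line number within that file
        print(fileinput.filename(), fileinput.filelineno(), line.rstrip())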

That is not possible. You can modify your program to read directly from the files like this:
import sys
import re

# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)

for filename in sys.argv[1:]:
    with open(filename, "rU") as f:
        for line in f.readlines():
            if pattern.search(line) is not None:
                print filename, line,
Then you can call it with:
$ ./grep_files.py source_text/*

If you use this instead of cat:
grep -r '' source_text/ | ./mapper.py
The input to mapper.py will then look like this:
source_text/answers.txt:42
source_text/answers.txt:42
source_text/file1.txt:Hello world
You can then retrieve the filename using:
for line in sys.stdin:
    filename, line = line.split(':', 1)
    ...
However, Python is more than capable of iterating over the files in a directory and reading them line by line, for example:
import os

for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:
        for line in f:
            ...
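As a rough sketch of this approach applied to the original mapper (the source_text path and the pattern come from the question; the output format is an assumption):

import os
import re

pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")
path = "source_text"

for filename in os.listdir(path):
    full_path = os.path.join(path, filename)
    if not os.path.isfile(full_path):
        continue  # skip subdirectories
    with open(full_path) as f:
        for line in f:
            if pattern.search(line):
                print(filename, line, end="")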

Related

How do you read lines from all files in a directory?

I have two files in a directory. I'd like to read the lines from each of the files. Unfortunately when I try to do so with the following code, there is no output.
from pathlib import Path

p = Path('tmp')
for file in p.iterdir():
    print(file.name)
functions.py
test.txt
for file in p.iterdir():
    f = open(file, 'r')
    f.readlines()
You're reading all the lines from the file, but you're not outputting them. If you want to print the lines to standard output, you need to use print() as you did in your first example.
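For example, a minimal sketch of that fix, kept close to the code in the question:

from pathlib import Path

p = Path('tmp')
for file in p.iterdir():
    f = open(file, 'r')
    for line in f.readlines():
        print(line, end="")
    f.close()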
You can also write this somewhat more elegantly using contexts and more iterators:
from pathlib import Path

file = Path('test.txt')
with file.open() as open_file:
    for line in open_file:
        print(line, end="")
test.txt:
Spam
Spam
Spam
Wonderful
Spam!
Spamity
Spam
Result:
Spam
Spam
Spam
Wonderful
Spam!
Spamity
Spam
Using a context for opening the file (with file.open()) means the file is closed for you automatically, and iterating over the file object (for line in open_file) means you're not loading the whole file at once (an important consideration with larger files).
Setting end="" in print() is optional depending on how your source files are structured, as you might otherwise end up printing extra blank lines in your output.
You could use fileinput:
import os
import fileinput
for line in fileinput.input(os.listdir('.')):
    print(line)
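Note that os.listdir('.') also returns directory names, which fileinput cannot open; a minimal sketch that restricts the input to regular files (and avoids the doubled newlines from print):

import os
import fileinput

files = [name for name in os.listdir('.') if os.path.isfile(name)]
for line in fileinput.input(files):
    print(line, end="")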
You could also print the data with line numbers, like this:
count = 0
with open(file, 'r') as f:
    for line in f:
        count += 1
        print("Line {}: {}".format(count, line.strip()))
Output will look like:
Line 1: ...
Line 2: ...
Line 3: ...
You can see a line-reading example here: Line reading

looping regex in directory and saving output

I was able to run the regex on multiple files; now I want to save the output for each file as name_of_file_clean.txt.
Trying to find the best way.
import os, re
import glob

pattern = re.compile(r'(?<=CN=)(.*?)(?=,)')
for file in glob.glob('*.txt'):
    with open(file) as fp:
        for result in pattern.findall(fp.read()):
            print(result)
We'll just open the output file and use the print function's file keyword argument to write to the file:
import os, re
import glob

pattern = re.compile(r'(?<=CN=)(.*?)(?=,)')
for file in glob.glob('*.txt'):
    with open(file) as fp:
        with open(file[:-4] + '_clean.txt', 'w') as outfile:
            for result in pattern.findall(fp.read()):
                print(result, file=outfile)
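If you would rather not slice the filename by hand, os.path.splitext is a slightly more robust way to build the output name; a minimal sketch of that variant:

import os, re
import glob

pattern = re.compile(r'(?<=CN=)(.*?)(?=,)')
for file in glob.glob('*.txt'):
    base, _ = os.path.splitext(file)
    with open(file) as fp, open(base + '_clean.txt', 'w') as outfile:
        for result in pattern.findall(fp.read()):
            print(result, file=outfile)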

facing issue with "wget" in python

I am very new to Python. I am facing an issue with wget as well as urllib.urlretrieve(str(myurl), tail).
When I run the script it downloads the files, but the filenames end with "?".
My complete code:
import os
import wget
import urllib
import subprocess
with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6])  # 7th token
    for result in sorted(results):
        print >>outfile, result

with open('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        # urllib.urlretrieve(str(myurl), tail)
Output:
# python last.py
0011400026_recap.xml
http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
latest_1.xml
http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
currenttime.js
Listing the files:
# ls
0011400026_recap.xml? currenttime.js? latest_1.xml? today.xml?
A possible explanation of the behaviour you experience is that you do not sanitize your input line:
with open('/tmp/reddy_log.txt') as infile:
    ...
    for line in infile:
        ...
        myurl = "http://data.xyz.com" + str(line)
        wget.download(str(myurl))
When you iterate over a file object (for line in infile:), each string you get is terminated by a newline ('\n') character; if you do not remove the newline before using line, the newline character is still there in whatever you build from line.
As an illustration of this concept, have a look at the transcript
of a test I've done
08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close()
08:31 $ ls
a_file test.py
08:31 $ python test.py
08:31 $ ls
a? a_file b? c? test.py
08:31 $ ls -b
a\n a_file b\n c\n test.py
08:31 $
As you can see, I read lines from a file and create some files using line as the filename, and sure enough, the filenames as listed by ls have a ? at the end. But we can do better, as explained in the fine manual page of ls:
-b, --escape
print C-style escapes for nongraphic characters
and, as you can see in the output of ls -b, the filenames are not
terminated by a question mark (it's just a placeholder used by default
by the ls program) but are terminated by a newline character.
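A minimal sketch of the fix, stripping the newline before the line is used (the wget module and the paths are the ones from the question; keeping tail around is only needed if you also want to pass it to urlretrieve):

import os
import wget

with open('/tmp/reddy_log.txt') as infile:
    for raw_line in infile:
        path = raw_line.rstrip('\n')   # drop the trailing newline
        if not path:
            continue                   # skip empty lines
        head, tail = os.path.split(path)
        myurl = "http://data.xyz.com" + path
        wget.download(myurl)           # or: urllib.urlretrieve(myurl, tail)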
While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.
A nice feature of Python is the presence of generator expressions; if you want, you can write your code as follows:
import wget
# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column
# here it is a sequence of generator expressions, each one relying
# on the previous one
# 1. the lines in the file, stripped from the white space
# on the right (the newline is considered white space)
# === not strictly necessary, just convenient because
# === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))
# 2. the lines are converted to a list of 'tokens'
all_tokens = (line.split() for line in lines if line)
# 3. for each 'tokens' in the 'all_tokens' generator expression, we
# check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)
Don't be fooled by the amount of comments; without them my code is just:
import wget
from my_constants import error_column, payload_column
lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
for target in targets: wget.download(target)

Python - Read value of variable from file

In bash, I have a file that stores my passwords in variable format.
e.g.
cat file.passwd
password1=EncryptedPassword1
password2=EncryptedPassword2
Now if I want to use the value of password1, this is all that I need to do in bash.
grep password1 file.passwd | cut -d'=' -f2
I am looking for an alternative to this in Python. Is there any library that provides functionality to simply extract the value, or do we have to do it manually like below?
with open(file, 'r') as input:
    for line in input:
        if 'password1' in line:
            re.findall(r'=(\w+)', line)
Read the file and add the check statement:
if line.startswith("password1"):
    print re.findall(r'=(\w+)', line)
Code:
import re

with open(file, "r") as input:
    lines = input.readlines()
    for line in lines:
        if line.startswith("password1"):
            print re.findall(r'=(\w+)', line)
There's nothing wrong with what you've written. If you want to play code golf:
line = next(line for line in open(file, 'r') if 'password1' in line)
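If several values are needed, a rough sketch that reads the whole file into a dict instead (the passwords name is just illustrative):

passwords = {}
with open(file) as f:
    for line in f:
        if '=' in line:
            key, value = line.rstrip('\n').split('=', 1)
            passwords[key] = value

print(passwords['password1'])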
I found this module very useful! It made life much easier.

Replace a whole line in a txt file

I'm new to Python 3 and could really use a little help. I have a txt file containing:
InstallPrompt=
DisplayLicense=
FinishMessage=
TargetName=D:\somewhere
FriendlyName=something
I have a Python script that, in the end, should change just two lines to:
TargetName=D:\new
FriendlyName=Big
Could anyone help me, please? I have tried to search for it, but I didn't find anything I could use. The text that should be replaced can have a different length.
import fileinput

for line in fileinput.FileInput("file", inplace=1):
    sline = line.strip().split("=")
    if sline[0].startswith("TargetName"):
        sline[1] = "new.txt"
    elif sline[0].startswith("FriendlyName"):
        sline[1] = "big"
    line = '='.join(sline)
    print(line)
A very simple solution for what you're doing:
#!/usr/bin/python
import re
import sys

for line in open(sys.argv[1], 'r').readlines():
    line = re.sub(r'TargetName=.+', r'TargetName=D:\\new', line)
    line = re.sub(r'FriendlyName=.+', r'FriendlyName=big', line)
    print line,
You would invoke this from the command line as ./test.py myfile.txt > output.txt
Writing to a temporary file and then renaming it is the best way to make sure you won't get a damaged file if something goes wrong:
import os
from tempfile import NamedTemporaryFile

fname = "lines.txt"

with open(fname) as fin, NamedTemporaryFile(dir='.', delete=False) as fout:
    for line in fin:
        if line.startswith("TargetName="):
            line = "TargetName=D:\\new\n"
        elif line.startswith("FriendlyName"):
            line = "FriendlyName=Big\n"
        fout.write(line.encode('utf8'))

os.rename(fout.name, fname)
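One note on the rename step: on Windows, os.rename raises an error if the destination file already exists, so if the script must also run there a possible alternative is:

os.replace(fout.name, fname)  # overwrites fname if it already exists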
Is this a config (.ini) file you're trying to parse? The format looks suspiciously similar, except without a header section. You can use configparser, though it may add extra space around the "=" sign (i.e. "TargetName=D:\new" vs. "TargetName = D:\new"), but if those changes don't matter to you, using configparser is way easier and less error-prone than trying to parse it by hand every time.
txt (ini) file:
[section name]
FinishMessage=
TargetName=D:\something
FriendlyName=something
Code:
import sys
from configparser import ConfigParser

def main():
    cp = ConfigParser()
    cp.optionxform = str  # Preserves case sensitivity
    with open(sys.argv[1], 'r') as f:
        cp.read_file(f)
    section = 'section name'
    options = {'TargetName': r'D:\new',
               'FriendlyName': 'Big'}
    for option, value in options.items():
        cp.set(section, option, value)
    with open(sys.argv[1], 'w') as f:
        cp.write(f)

if __name__ == '__main__':
    main()
txt (ini) file (after):
[section name]
FinishMessage =
TargetName = D:\new
FriendlyName = Big
The subs_names.py script works with both Python 2.6+ and Python 3.x:
#!/usr/bin/env python
from __future__ import print_function
import sys, fileinput

# the new values go here
substitutions = dict(TargetName=r"D:\new", FriendlyName="Big")

inplace = '-i' in sys.argv  # make substitutions in place
if inplace:
    sys.argv.remove('-i')

for line in fileinput.input(inplace=inplace):
    name, sep, value = line.partition("=")
    if name in substitutions:
        print(name, sep, substitutions[name], sep='')
    else:
        print(line, end='')
Example:
$ python3.1 subs_names.py input.txt
InstallPrompt=
DisplayLicense=
FinishMessage=
TargetName=D:\new
FriendlyName=Big
If you are satisfied with the output, then add the -i parameter to make the changes in place:
$ python3.1 subs_names.py -i input.txt
