Facing issue with "wget" in Python

I am new to Python. I am having trouble with "wget" as well as "urllib.urlretrieve(str(myurl),tail)".
When I run the script it downloads the files, but the filenames end with "?".
My complete code:
import os
import wget
import urllib
import subprocess
with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6]) # 7th token
    for result in sorted(results):
        print >>outfile, result
with open ('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        # urllib.urlretrieve(str(myurl),tail)
output :
# python last.py
0011400026_recap.xml
http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
latest_1.xml
http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
currenttime.js
Listing the files :
# ls
0011400026_recap.xml? currenttime.js? latest_1.xml? today.xml?

A possible explanation of the behaviour you are seeing is that you do
not sanitize your input line:
with open ('/tmp/reddy_log.txt') as infile:
    ...
    for line in infile:
        ...
        myurl = "http://data.xyz.com" + str(line)
        wget.download(str(myurl))
When you iterate over a file object (for line in infile:), each string
you get is terminated by a newline ('\n') character. If you do not
remove the newline before using line, it is still there in whatever
you build from line, including the URL and the downloaded file's name.
As an illustration of this concept, have a look at the transcript
of a test I've done
08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close()
08:31 $ ls
a_file test.py
08:31 $ python test.py
08:31 $ ls
a? a_file b? c? test.py
08:31 $ ls -b
a\n a_file b\n c\n test.py
08:31 $
As you can see, I read lines from a file and create some files using
line as the filename and, guess what, the filenames as listed by ls
have a ? at the end. We can do better, as explained in the
fine manual page of ls:
-b, --escape
print C-style escapes for nongraphic characters
As the output of ls -b shows, the filenames are not
terminated by a question mark (that is just a placeholder that ls uses
by default for nongraphic characters) but by a newline character.
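A minimal sketch of the fix, assuming the intermediate file holds one path per line as in the question: strip the line terminator before building the URL or the local filename. The helper name build_url is hypothetical; the base URL is the one from the question's code.

```python
import os

BASE_URL = "http://data.xyz.com"  # base URL as written in the question


def build_url(line, base=BASE_URL):
    """Return (url, local_name) for one line read from the path file."""
    path = line.rstrip("\r\n")            # drop the trailing line terminator
    local_name = os.path.basename(path)   # filename part, without any newline
    return base + path, local_name


if __name__ == "__main__":
    url, name = build_url("/feeds/mobile/iphone/article/league/news/latest_1.xml\n")
    print(repr(url), repr(name))
```

The returned url can then be passed to wget.download(url), or url and name to urllib.urlretrieve(url, name), as in the question.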
While I'm at it, I should say that you can avoid using a
temporary file to store the intermediate results of your computation.
A nice feature of Python is its generator expressions;
if you want, you can write your code as follows:
import wget
# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column
# here it is a sequence of generator expressions, each one relying
# on the previous one
# 1. the lines in the file, stripped from the white space
# on the right (the newline is considered white space)
# === not strictly necessary, just convenient because
# === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))
# 2. the lines are converted to a list of 'tokens'
all_tokens = (line.split() for line in lines if line)
# 3. for each 'tokens' in the 'all_tokens' generator expression, we
# check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)
Don't be fooled by the amount of comments; without them my code is just
import wget
from my_constants import error_column, payload_column
lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
for target in targets: wget.download(target)
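For a self-contained run of the same pipeline, here is a sketch that replaces my_constants with explicit indices. The value 8 for the status column is an assumption based on a typical Apache-style access log; 6 is the index the question itself uses for the path. Check both against your own log format. The download call is left out so the example runs without network access, and the sample log lines are made up.

```python
STATUS_COLUMN = 8   # assumed position of the HTTP status code
PAYLOAD_COLUMN = 6  # position of the request path, as in the question


def successful_paths(lines):
    """Return a generator of request paths for lines whose status is '200'."""
    stripped = (line.rstrip() for line in lines)
    all_tokens = (line.split() for line in stripped if line)
    return (tokens[PAYLOAD_COLUMN] for tokens in all_tokens
            if len(tokens) > STATUS_COLUMN and tokens[STATUS_COLUMN] == '200')


# made-up sample lines standing in for the access log
sample = [
    '1.2.3.4 - - [10/Oct/2014:13:55:36 -0700] "GET /feeds/a.xml HTTP/1.1" 200 2326\n',
    '1.2.3.4 - - [10/Oct/2014:13:55:37 -0700] "GET /feeds/b.xml HTTP/1.1" 404 200\n',
    '\n',
]
print(sorted(set(successful_paths(sample))))
```

Testing a specific column avoids false matches on lines where 200 only appears as the response size, as in the second sample line.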

Related

How to check if a block of lines has a particular keyword using python?

I am checking a text file with blocks of commands like the following:
File start -
!
interface Vlan100
description XYZ
ip vrf forwarding XYZ
ip address 10.208.56.62 255.255.255.192
!
interface Vlan101
description ABC
ip vrf forwarding ABC
ip address 10.208.55.126 255.255.255.192
no ip redirects
no ip unreachables
no ip proxy-arp
!
File End
I want to create a text file such that, if the source file contains the pattern vrf forwarding ABC, the output is interface Vlan101.
So far I have written the following script, but it only shows the line that contains the pattern.
import re
f = open("output_file.txt","w") #output file to be generated
shakes = open("input_file.txt","r") #input file to read
for lines in shakes:
    if re.match("(.*)ABC(.*)",lines):
        f.write(lines)
f.close()
Easiest: read the file, split it at each !, and for every block that contains the desired text, take the first line:
with open("input_file.txt") as r, open("output_file.txt", "w") as w:
    txt = r.read()
    result = [block.strip().split("\n")[0]
              for block in txt.split('!')
              if 'vrf forwarding ABC' in block]
    w.write("\n".join(result))
Just to be clear, I imagine that you want to replace any instances of "interface Vlan101" with "vrf forwarding ABC". In this case, I had test.txt as the input file and out.txt as the output file with all the replaced instances, as needed. I used a list comprehension with the str.replace method to replace the substrings "interface Vlan101" with "vrf forwarding ABC".
with open("test.txt") as f:
    lines = f.readlines()
new_lines = [line.replace("interface Vlan101", "vrf forwarding ABC") for line in lines]
with open("out.txt", "w") as f1:
    f1.writelines(new_lines)
Hope this helps.
If you are just interested in the interfaces, you can also do the following.
#Read File
with open('sample.txt', 'r') as f:
    lines = f.readlines()
#Capture 'interfaces'
interfaces = [i for i in lines if i.strip().startswith('inter')]
#Write it to a file
with open('output.txt', 'w') as f:
    f.writelines(interfaces)
With your code you are going through the document line by line.
If you want to parse blocks (between "!" signs), you could first join each block into a single line (though if it's a really large document, you may need to consider something else, as this reads the entire document into memory):
import re
f = open("output_file.txt","w") #output file to be generated
source = open("input_file.txt","r") #input file to read
lines = "".join(source) #creates a string from the document
shakes = lines.replace("\n","").replace("!","\n")
# remove all newlines and create new ones from "!"-block delimiter
# retrieve all text before "vrf forwarding ABC"
finds = re.findall("(.*)vrf forwarding ABC",shakes)
# return start of line
# if the part you want is the same length in all,
# then you could use find[:17] instead of
# find to get only the beginning. otherwise you need to modify your
# regex to only take the first 2 words of the line.
for find in finds:
    f.write(find)
f.close()
Alternatively, if you want to match line by line, you can do the same as above, but instead of replacing "!" with a newline, split the text on "!" and then run your previous code over the resulting pieces one at a time.
Hope this helps!
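If reading the whole document into memory is a concern, the blocks can also be collected while iterating line by line. The following is only a sketch under the assumption that each block is delimited by a line containing just "!", as in the sample; the function name interfaces_with and the shortened sample config are made up for illustration.

```python
def interfaces_with(lines, keyword):
    """Return the first line of every '!'-delimited block containing keyword."""
    found = []
    block = []
    for raw in lines:
        line = raw.strip()
        if line == '!':
            # end of the current block: keep its first line if it matched
            if block and any(keyword in item for item in block):
                found.append(block[0])
            block = []
        elif line:
            block.append(line)
    # handle a final block that is not closed by '!'
    if block and any(keyword in item for item in block):
        found.append(block[0])
    return found


config = """!
interface Vlan100
 description XYZ
 ip vrf forwarding XYZ
!
interface Vlan101
 description ABC
 ip vrf forwarding ABC
 no ip redirects
!
"""
print(interfaces_with(config.splitlines(), 'vrf forwarding ABC'))
```

Because the function only iterates over lines, an open file object can be passed instead of config.splitlines(); only one block is held in memory at a time.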

python replacement for xml

I have a file, FF_tuningConfig_AMPKi.xml, that contains records such as:
<KiConfig active="%{active}" id="AMP_RET_W_LIN_SUSPICIOUS_MULTIPLE_LOGIN_IN_SHORT_PERIOD$KiConfig"/>
<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>
I have the following code:
def replace_content(path,se,search,String_Replace):
    for root, dirs, files in os.walk(path):
        for filename in files:
            if((se in filename)):
                file=open(os.path.join(root, filename),'r')
                lines = file.readlines()
                file=open(os.path.join(root, filename),'w')
                for line in lines:
                    if search in line:
                        #print "found="+line
                        words=line.split('=')
                        # print words
                        # print "line=" + words[0] +"="+ "8\n"
                        line=line.replace(line,String_Replace)
                        #print "after="+line
                    file.write(line)
                file.close()
                print (os.path.join(root,filename) + " was replaced")
replace_content(Path,'FF_tuningConfig_AMPKi.xml','<KiConfig active="%{active}"','<KiConfig active="true"')
I am getting the below:
active="true" <Thresholds>
Instead of:
<KiConfig active="true" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>
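Your problem is with line=line.replace(line,String_Replace): it replaces the whole line with String_Replace instead of replacing only the part that matched. Take a look at the documentation for str.replace(); the first argument should be the substring to search for:
line = line.replace(search,String_Replace)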
To test your code, you could have written a separate script with only the part that seemed to be failing.
# test input
s = '''<KiConfig active="%{active}" id="AMP_RET_W_LIN_SUSPICIOUS_MULTIPLE_LOGIN_IN_SHORT_PERIOD$KiConfig"/>
<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>'''
lines = s.split('\n')
# parameters
search, String_Replace = '<KiConfig active="%{active}"','<KiConfig active="true"'
# Then the part of your code that seems to be failing
for line in lines:
    if search in line:
        line = line.replace(line, String_Replace)
    print(line)
That lets you focus on the problem and makes it easy and fast to modify then test your code. Once you have that functionality working, copy and paste it into your working code. If that part of your code actually works then you have eliminated it as a source for errors and you can test other parts.
As an aside, no need to test if your search string is in the line before attempting to replace. If the search string isn't in the line, str.replace() will return the line without modification.
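As a quick check of both points, here is a small sketch using the strings from the question: the corrected call replaces only the matched substring, and a line that does not contain the search string comes back unchanged. The variable names replacement and fixed are mine.

```python
search = '<KiConfig active="%{active}"'
replacement = '<KiConfig active="true"'

lines = [
    '<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>',
    '<Thresholds>',
]

# replace only the matched substring; lines without a match are returned as-is,
# so no "if search in line" test is needed
fixed = [line.replace(search, replacement) for line in lines]
for line in fixed:
    print(line)
```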

Python Find script

I have a script that looks into a .txt file like this:
house.txt:
1289
534
9057
12873
(every line is meant to be a "CODE" for a product)
and it looks for a filename with that code in a given folder and copies it to another folder.
Everything works fine, except if this happens:
0001_filename_blablalba.jpg
00011 filename.jpg
000123Filename.jpg
I want to copy the file with the string "0001" but the script copies all the above because indeed they have 0001, but it's not the whole code.
Here's my script:
import subprocess
with open('CASA.txt','r') as f:
    lines = [line.rstrip('\n') for line in f]
for ID in lines:
    id_produto = str(ID+'*')
    command = "find . -maxdepth 1 -name '%s' -exec ditto -v {} ./imagenss/ \;"%id_produto
    print "A copiar: %s"%id_produto
    proc = subprocess.Popen(command,shell=True,stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Is there a simple way to do this?
You are somewhat mixing Python and shell scripting, but you could still try another filename pattern:
Instead of
id_produto = str(ID+'*')
try
id_produto = str(ID+'[!0-9]*')
This will match any name that starts with the ID followed by a character that is not a digit.
If you want a more Pythonic way, use the glob module for filename matching and shutil for copying ...
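A rough pure-Python sketch of that idea, using os.listdir with a regular expression in place of glob, and shutil.copy for the copying. The function names matches_code and copy_by_code are hypothetical. Unlike the [!0-9]* pattern, the negative lookahead used here also accepts a filename that consists of the code alone.

```python
import os
import re
import shutil


def matches_code(filename, code):
    """True if filename starts with code and the next character is not a digit."""
    return re.match(re.escape(code) + r'(?!\d)', filename) is not None


def copy_by_code(codes, src_dir, dst_dir):
    """Copy every file in src_dir whose leading code is exactly one of codes."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if any(matches_code(name, code) for code in codes):
            shutil.copy(os.path.join(src_dir, name), dst_dir)
            copied.append(name)
    return copied
```

With the codes read from the text file into lines, a call such as copy_by_code(lines, '.', './imagenss/') would mirror the find command, assuming the destination folder already exists.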
You can try to use the "split" function to split your filename string. You'll have to analyze the remaining part of the string and see if there are digits left (i.e. the ID only matches a part of the complete ID of the file) or if there are no digits left (i.e. you found the full ID so you can copy the file):
completeFilename = '12345_filename.jpg'
ID = '123'
fileName = completeFilename.split(ID)[1]
if fileName[0].isdigit():
    #There are some digits left, so this file should not be copied
    pass
else:
    #No digits left, copy this file
    pass
It's better to use Python code instead of a shell command to find the file.
import os
def get_base(filename):
    'get the "code" for a filename'
    out=''
    for char in filename:
        if char.isdigit():
            out+=char
        else:
            return out
    # the filename consists of digits only
    return out
with open('path/to/the/txt_file.txt','r') as f:
    # a file object has no splitlines(); read the text first
    lines=f.read().splitlines()
files=os.listdir('path/to/the/folder')
files_dict={get_base(x):x for x in files}
for line in lines:
    print('copy %s'%files_dict.get(line,None))

Python 2.7 Loop through multiple subprocess.check_output calls

I am having an issue with printing output from subprocess.check_output calls.
I have a list of IP addresses in ip.txt that I read from and save to list ips.
I then iterate over that list and call wmic command to get some details from that machine, however only the last command called prints output. By looking at CLI output, I can see that print 'Complete\n' is called for each, but check_output is not returning anything to output variable.
Any ideas? Thanks
Python Code:
from subprocess import check_output
f_in = open('ip.txt', 'r')
ips = []
for ip in f_in:
    ips.append(ip)
f_in.close()
f_out = open('pcs.txt','w')
for ip in ips:
    cmd = 'wmic /node:%s computersystem get name,username' % (ip)
    f_out.write('Trying %s\n'%ip)
    print 'Trying: %s' % (ip)
    try:
        output = check_output(cmd,shell=True)
        f_out.write(output)
        print 'Output\n--------\n%s' % output
        print 'Complete\n'
    except:
        f_out.write('Could not complete wmic call... \n\n')
        print 'Failed\n'
f_out.close()
File Output:
Trying 172.16.5.133
Trying 172.16.5.135
Trying 172.16.5.98
Trying 172.16.5.131
Name UserName
DOMAINWS48 DOMAIN\staff
CLI Output
Trying: 172.16.5.133
Output
Complete
Trying: 172.16.5.135
Output
Complete
Trying: 172.16.5.98
Output
Complete
Trying: 172.16.5.131
Output
Name UserName
DOMAINWS48 DOMAIN\staff
Complete
In these lines you read a file line by line:
f_in = open('ip.txt', 'r')
ips = []
for ip in f_in:
    ips.append(ip)
Unfortunately, each line you read still ends with its end-of-line character, so you pass the newline in as part of the IP address. Consider stripping the newline \n from the end of each line you read:
f_in = open('ip.txt', 'r')
ips = []
for ip in f_in:
    ips.append(ip.strip('\n'))
strip('\n') will strip all the newlines from the beginning and end of the string. Information on this string method can be found in the Python documentation:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:
You can also read all the lines from the file with something like:
ips = [line.strip('\n') for line in f_in.readlines()]
My guess is that your ip.txt file has one IP address per line and that the last line is not terminated with a newline \n, which is why your code worked only for that last address.
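To see the effect without touching the network, the sketch below builds the wmic command strings from an in-memory stand-in for ip.txt and prints their repr, so any stray newline would be visible. The helper name read_ips is hypothetical; it also skips blank lines, which the original code does not.

```python
import io


def read_ips(handle):
    """Return the non-empty lines of handle with surrounding whitespace removed."""
    return [line.strip() for line in handle if line.strip()]


# stand-in for open('ip.txt'); note the last line has no trailing newline
f_in = io.StringIO(u"172.16.5.133\n172.16.5.135\n172.16.5.131")

for ip in read_ips(f_in):
    cmd = 'wmic /node:%s computersystem get name,username' % ip
    print(repr(cmd))
```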

How does one parse a .lua file with Python and pull out the require statements?

I am not very good at parsing files but have something I would like to accomplish. The following is a snippet of a .lua script that has some require statements. I would like to use Python to parse this .lua file and pull the 'require' statements out.
For example, here are the require statements:
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
From the example above I would then like to split the directory from the required file. In the example 'require "common.acme_1"' the directory would be common and the required file would be acme_1. I would then just add the .lua extension to acme_1. I need this information so I can check whether the file exists on the file system (which I know how to do) and then run it against luac (the compiler) to make sure it is a valid Lua file (which I also know how to do).
I simply need help pulling these require statements out using Python and splitting the directory name from the filename.
You can do this with built in string methods, but since the parsing is a little bit complicated (paths can be multi-part) the simplest solution might be to use regex. If you're using regex, you can do the parsing and splitting using groups:
import re
data = \
'''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
'''
finds = re.findall(r'require\s+"(([^."]+\.)*)?([^."]+)"', data, re.MULTILINE)
print [dict(path=x[0].rstrip('.'),file=x[2]) for x in finds]
The first group is the path (including the trailing .), the second group is the inner group needed for matching repeated path parts (discarded), and the third group is the file name. If there is no path you get path=''.
Output:
[{'path': 'common', 'file': 'acme_1'}, {'path': 'common', 'file': 'acme_2'}, {'path': '', 'file': 'acme_3'}, {'path': 'common.core', 'file': 'acme_4'}]
Here ya go!
import sys
import os.path
if len(sys.argv) != 2:
    print "Usage:", sys.argv[0], "<inputfile.lua>"
    exit()
f = open(sys.argv[1], "r")
lines = f.readlines()
f.close()
for line in lines:
    if line.startswith("require "):
        path = line.replace('require "', '').replace('"', '').replace("\n", '').replace(".", "/") + ".lua"
        fName = os.path.basename(path)
        path = path.replace(fName, "")
        print "File: " + fName
        print "Directory: " + path
        #do what you want to each file & path here
Here's a crazy one-liner, not sure if this was exactly what you wanted and most certainly not the most optimal one...
In [270]: import re
In [271]: [[s[::-1] for s in rec[::-1].split(".", 1)][::-1] for rec in re.findall(r"require \"([^\"]*)", text)]
Out[271]:
[['common', 'acme_1'],
['common', 'acme_2'],
['acme_3'],
['common.core', 'acme_4']]
This is straightforward.
One-liners are great, but they take too much effort to understand at first, and in my opinion this is not a job for regular expressions.
# keep the text after 'require', without surrounding whitespace and quotes
mylines = [line.split('require')[-1].strip().strip('"') for line in open('mylua.lua').readlines() if line.startswith('require')]
paths = []
for line in mylines:
    if 'common.' in line:
        paths.append(('common', line.split('common.')[-1]))
    else:
        paths.append(('', line))
You could use finditer:
lua='''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require 'common.core.acme_4'
'''
import re
print [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]
# ['common.acme_1', 'common.acme_2', 'acme_3', 'common.core.acme_4']
Then just split on the '.' to split into paths:
for e in [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]:
    parts=e.split('.')
    if parts[:-1]:
        print '/'.join(parts[:-1]), parts[-1]
    else:
        print parts[0]
Prints:
common acme_1
common acme_2
acme_3
common/core acme_4
file = '/path/to/test.lua'
def parse():
    with open(file, 'r') as f:
        requires = [line.split()[1].strip('"') for line in f.readlines() if line.startswith('require ')]
    for r in requires:
        filename = r.replace('.', '/') + '.lua'
        print(filename)
The with statement opens the file in question. The next line builds a list from all lines that start with 'require ': each line is split, the 'require' is dropped, and the double quotes are stripped from the remaining part. The loop then goes through the list, replaces the dots with slashes, and appends '.lua'. The print statement shows the results.
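Pulling the pieces from the answers above together, here is one possible sketch that returns the directory and the filename separately, which is what the question asks for. The function name parse_requires is made up, and the regular expression is my own: it assumes each require statement starts its line, and it accepts single or double quotes and an optional opening parenthesis.

```python
import re

# the opening quote is captured so that the same quote must close the name
REQUIRE_RE = re.compile(r'''^\s*require\s*\(?\s*(['"])([^'"]+)\1''', re.MULTILINE)


def parse_requires(source):
    """Return a list of (directory, filename) pairs, one per require statement."""
    results = []
    for match in REQUIRE_RE.finditer(source):
        parts = match.group(2).split('.')
        directory = '/'.join(parts[:-1])      # '' when there is no directory part
        results.append((directory, parts[-1] + '.lua'))
    return results


lua = '''
require "common.acme_1"
require "acme_3"
require 'common.core.acme_4'
'''
for directory, filename in parse_requires(lua):
    print(repr(directory), repr(filename))
```

Each pair can then be joined with os.path.join for the existence check and the luac validation mentioned in the question.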
