Python Find script

I have a script that looks into a .txt file like this:
house.txt:
1289
534
9057
12873
(every line is meant to be a "CODE" for a product)
and it looks for a filename with that code in a given folder and copies it to another folder.
Everything works fine, except if this happens:
0001_filename_blablalba.jpg
00011 filename.jpg
000123Filename.jpg
I want to copy the file with the string "0001" but the script copies all the above because indeed they have 0001, but it's not the whole code.
Here's my script:
import subprocess
with open('CASA.txt','r') as f:
    lines = [line.rstrip('\n') for line in f]
for ID in lines:
    id_produto = str(ID+'*')
    command = "find . -maxdepth 1 -name '%s' -exec ditto -v {} ./imagenss/ \;"%id_produto
    print "A copiar: %s"%id_produto
    proc = subprocess.Popen(command,shell=True,stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Is there a simple way to do this?

You are mixing Python and shell scripting somewhat, but you could still try another filename pattern:
Instead of
id_produto = str(ID+'*')
try
id_produto = str(ID+'[!0-9]*')
This will match any name that starts with the ID followed by a character that is not a digit.
If you want a more Pythonic approach, use the glob module for filename matching and shutil for copying.
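The glob suggestion above could be sketched as follows in Python 3. This is only an illustration: the function name copy_by_code and its parameters are hypothetical, and it assumes every product code is immediately followed by a non-digit character in the filename.

```python
import glob
import os
import shutil

def copy_by_code(code, src_dir, dst_dir):
    """Copy files whose name starts with `code` followed by a non-digit.

    Hypothetical helper for illustration; returns the copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # [!0-9] matches exactly one character that is not a digit,
    # so code '0001' does not match '00011 filename.jpg'.
    pattern = os.path.join(glob.escape(src_dir), code + "[!0-9]*")
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            shutil.copy2(path, dst_dir)
            copied.append(os.path.basename(path))
    return copied
```

Calling copy_by_code('0001', '.', './imagenss') for each line of the text file would replace the find/ditto command.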

You can use the "split" function to split the filename string on the ID, then look at the remaining part of the string: if it starts with a digit, the ID matched only part of the file's complete ID; if it does not, you found the full ID and can copy the file:
completeFilename = '12345_filename.jpg'
ID = '123'
# Text after the first occurrence of the ID (assumes the name starts with the ID)
fileName = completeFilename.split(ID, 1)[1]
if fileName[:1].isdigit():
    # There are some digits left, so this file should not be copied
    pass
else:
    # No digits left, copy this file
    pass

It's better to use Python code instead of a shell command to find the file.
import os

def get_base(filename):
    'get the "code" for a filename'
    out = ''
    for char in filename:
        if char.isdigit():
            out += char
        else:
            break
    # Also reached when the whole name consists of digits
    return out

with open('path/to/the/txt_file.txt', 'r') as f:
    # File objects have no splitlines(); read the text first
    lines = f.read().splitlines()
files = os.listdir('path/to/the/folder')
files_dict = {get_base(x): x for x in files}
for line in lines:
    print('copy %s' % files_dict.get(line, None))

Related

python replacement for xml

I have a file, FF_tuningConfig_AMPKi.xml, that contains records such as:
<KiConfig active="%{active}" id="AMP_RET_W_LIN_SUSPICIOUS_MULTIPLE_LOGIN_IN_SHORT_PERIOD$KiConfig"/>
<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>
I have the following code:
def replace_content(path,se,search,String_Replace):
    for root, dirs, files in os.walk(path):
        for filename in files:
            if((se in filename)):
                file=open(os.path.join(root, filename),'r')
                lines = file.readlines()
                file=open(os.path.join(root, filename),'w')
                for line in lines:
                    if search in line:
                        #print "found="+line
                        words=line.split('=')
                        # print words
                        # print "line=" + words[0] +"="+ "8\n"
                        line=line.replace(line,String_Replace)
                        #print "after="+line
                    file.write(line)
                file.close()
                print (os.path.join(root,filename) + " was replaced")
replace_content(Path,'FF_tuningConfig_AMPKi.xml','<KiConfig active="%{active}"','<KiConfig active="true"')
I am getting the below:
active="true" <Thresholds>
Instead of:
<KiConfig active="true" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>

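Your problem is with line=line.replace(line,String_Replace), which replaces the whole line (including the id attribute and the trailing newline) with String_Replace. Take a look at the documentation for str.replace() and replace only the search string:
line = line.replace(search,String_Replace)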
To test your code, you could have written a separate script with only the part that seemed to be failing.
# test input
s = '''<KiConfig active="%{active}" id="AMP_RET_W_LIN_SUSPICIOUS_MULTIPLE_LOGIN_IN_SHORT_PERIOD$KiConfig"/>
<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>'''
lines = s.split('\n')
# parameters
search, String_Replace = '<KiConfig active="%{active}"','<KiConfig active="true"'
# Then the part of your code that seems to be failing
for line in lines:
    if search in line:
        line = line.replace(line, String_Replace)
    print(line)
That lets you focus on the problem and makes it easy and fast to modify then test your code. Once you have that functionality working, copy and paste it into your working code. If that part of your code actually works then you have eliminated it as a source for errors and you can test other parts.
As an aside, no need to test if your search string is in the line before attempting to replace. If the search string isn't in the line, str.replace() will return the line without modification.
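A small Python 3 check of both points, using one record from the question and a made-up second line ('<Thresholds>' is taken from the question's output; treating it as a full line is an assumption):

```python
search = '<KiConfig active="%{active}"'
replacement = '<KiConfig active="true"'

line = '<KiConfig active="%{active}" id="AMP_RET_W_LIN_UNUSUAL_SESSION_HOUR_OF_DAY$KiConfig"/>'
other = '<Thresholds>'

# Replace only the search string; the rest of the line is preserved.
fixed = line.replace(search, replacement)
# No match: str.replace() returns the string unchanged, so no "in" test is needed.
untouched = other.replace(search, replacement)

print(fixed)
print(untouched)
```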

facing issue with "wget" in python

I am new to Python. I am having a problem with "wget" as well as with "urllib.urlretrieve(str(myurl),tail)".
When I run the script it downloads the files, but the filenames end with "?".
My complete code:
import os
import wget
import urllib
import subprocess
with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6]) # 7th token
    for result in sorted(results):
        print >>outfile, result
with open ('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        # urllib.urlretrieve(str(myurl),tail)
output :
# python last.py
0011400026_recap.xml
http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
latest_1.xml
http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
currenttime.js
Listing the files :
# ls
0011400026_recap.xml? currenttime.js? latest_1.xml? today.xml?

A possible explanation of the behaviour you experience is that you do
not sanitize your input line
with open ('/tmp/reddy_log.txt') as infile:
    ...
    for line in infile:
        ...
        myurl = "http://data.xyz.com" + str(line)
        wget.download(str(myurl))
When you iterate on a file object (for line in infile:), each string
you get is terminated by a newline ('\n') character. If you do not
remove the newline before using line, for example with line.rstrip('\n'),
it is still there in everything you build from line, including the URL
and the name of the downloaded file.
As an illustration of this concept, have a look at the transcript
of a test I've done
08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close()
08:31 $ ls
a_file test.py
08:31 $ python test.py
08:31 $ ls
a? a_file b? c? test.py
08:31 $ ls -b
a\n a_file b\n c\n test.py
08:31 $
As you can see, I read lines from a file and create some files using
line as the filename and guess what, the filenames as listed by ls
have a ? at the end — but we can do better, as it's explained in the
fine manual page of ls
-b, --escape
print C-style escapes for nongraphic characters
and, as you can see in the output of ls -b, the filenames are not
terminated by a question mark (it's just a placeholder used by default
by the ls program) but are terminated by a newline character.
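A minimal Python 3 sketch of the fix, stripping the newline before the line is used. The helper name build_targets is hypothetical, the sample paths are made up, and the base URL is the one from the question; the actual download call is left out.

```python
import io

def build_targets(lines, base_url="http://data.xyz.com"):
    """Return (url, filename) pairs with the trailing newline removed."""
    targets = []
    for line in lines:
        path = line.strip()          # drops the '\n' that iteration keeps
        if not path:
            continue                 # skip empty lines
        filename = path.rsplit("/", 1)[-1]
        targets.append((base_url + path, filename))
    return targets

# Stand-in for the intermediate file; any iterable of lines works.
log = io.StringIO("/feeds/news/latest_1.xml\n/feeds/currenttime.js\n")
for url, filename in build_targets(log):
    print(url, "->", filename)
```

Each url and filename could then be handed to the download function without a stray newline ending up in the saved name.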
While I'm at it, I have to say that you should avoid using a
temporary file to store the intermediate results of your computation.
A nice feature of Python is the presence of generator expressions;
if you want, you can write your code as follows
import wget
# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column
# here it is a sequence of generator expressions, each one relying
# on the previous one
# 1. the lines in the file, stripped from the white space
# on the right (the newline is considered white space)
# === not strictly necessary, just convenient because
# === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))
# 2. the lines are converted to a list of 'tokens'
all_tokens = (line.split() for line in lines if line)
# 3. for each 'tokens' in the 'all_tokens' generator expression, we
# check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)
Don't be fooled by the amount of comments, w/o comments my code is just
import wget
from my_constants import error_column, payload_column
lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
for target in targets: wget.download(target)

Infinite Loop when opening file

Okay so I'm trying to write a script that takes two files and modifies the first before writing it into the destination file, but whenever I run it, the script only prints the first modified line over and over again.
#3a
def modify(string):
    """Takes a string and returns a modified version of the string using two modifications. One must be a replacement of some kind.
    string -> string"""
    while string != "":
        string = string.upper()
        string = string.replace("A","4").replace("B","8").replace("C","<").replace("E","3").replace("G","6").replace("I","1").replace("O","0").replace("R","|2").replace("S","5").replace("T","7").replace("Z","2")
        print(string)
#3b - asks the user to type in a source code filename and destination filename; opens the files; loops through the contents of the source file line-by-line, using modify() to modify each line before writing it to the destination file; then closes both files.
source = input("What file would you like to use?")
destination = input("Where would you like it to go?")
filesource = ""
while filesource == "":
    try:
        file_source = open(source, "r")
        file_destination = open(destination, "w")
        for item in file_source:
            mod = modify(item)
            file_destination.write(mod)
        file_source.close()
        file_destination.close()
        break
    except IOError:
        source = input("I'm sorry, something went wrong. Give me the source file again please?")
Any help?

Hint: if you run modify("TEST ME"), what does it return?
Add return string to the end of the modify function.

The string never becomes empty, so the condition while string != "" is always true and the loop never ends. Try walking through it character by character with an integer index, using while i < len(string) as the condition.
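Putting both answers together, one possible corrected version in Python 3 drops the loop and returns the result. Note that it swaps the question's chain of replace() calls for str.translate with the same substitutions; the function name convert_file is hypothetical.

```python
# Same substitutions as the chain of replace() calls in the question.
LEET = str.maketrans({
    "A": "4", "B": "8", "C": "<", "E": "3", "G": "6", "I": "1",
    "O": "0", "R": "|2", "S": "5", "T": "7", "Z": "2",
})

def modify(string):
    """Upper-case the string, apply the substitutions, and return the result."""
    return string.upper().translate(LEET)

def convert_file(source, destination):
    """Write a modified copy of every line of source to destination."""
    with open(source, "r") as file_source, open(destination, "w") as file_destination:
        for item in file_source:
            file_destination.write(modify(item))

print(modify("Test me"))  # prints 7357 M3
```

No while loop is needed inside modify: the function is called once per line, and the for loop over the file already handles the iteration.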

How does one parse a .lua file with Python and pull out the require statements?

I am not very good at parsing files but have something I would like to accomplish. The following is a snippet of a .lua script that has some require statements. I would like to use Python to parse this .lua file and pull the 'require' statements out.
For example, here are the require statements:
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
From the example above I would then like to split the directory from the required file. In the example 'require "common.acme_1"' the directory would be common and the required file would be acme_1. I would then just add the .lua extension to acme_1. I need this information so I can validate if the file exists on the file system (which I know how to do) and then against luac (compiler) to make sure it is a valid lua file (which I also know how to do).
I simply need help pulling these require statements out using Python and splitting the directory name from the filename.

You can do this with built in string methods, but since the parsing is a little bit complicated (paths can be multi-part) the simplest solution might be to use regex. If you're using regex, you can do the parsing and splitting using groups:
import re
data = \
'''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
'''
finds = re.findall(r'require\s+"(([^."]+\.)*)?([^."]+)"', data, re.MULTILINE)
print [dict(path=x[0].rstrip('.'),file=x[2]) for x in finds]
The first group is the path (including the trailing .), the second group is the inner group needed for matching repeated path parts (discarded), and the third group is the file name. If there is no path you get path=''.
Output:
[{'path': 'common', 'file': 'acme_1'}, {'path': 'common', 'file': 'acme_2'}, {'path': '', 'file': 'acme_3'}, {'path': 'common.core', 'file': 'acme_4'}]

Here ya go!
import sys
import os.path
if len(sys.argv) != 2:
    print "Usage:", sys.argv[0], "<inputfile.lua>"
    exit()
f = open(sys.argv[1], "r")
lines = f.readlines()
f.close()
for line in lines:
    if line.startswith("require "):
        path = line.replace('require "', '').replace('"', '').replace("\n", '').replace(".", "/") + ".lua"
        fName = os.path.basename(path)
        path = path.replace(fName, "")
        print "File: " + fName
        print "Directory: " + path
        #do what you want to each file & path here

Here's a crazy one-liner, not sure if this was exactly what you wanted and most certainly not the most optimal one...
In [270]: import re
In [271]: [[s[::-1] for s in rec[::-1].split(".", 1)][::-1] for rec in re.findall(r"require \"([^\"]*)", text)]
Out[271]:
[['common', 'acme_1'],
['common', 'acme_2'],
['acme_3'],
['common.core', 'acme_4']]

This is straightforward.
One-liners are great, but they take too much effort to understand at first, and in my opinion this is not a job for regular expressions.
# Keep the text after 'require', without surrounding whitespace and double quotes
mylines = [line.split('require')[-1].strip().strip('"') for line in open('mylua.lua').readlines() if line.startswith('require')]
paths = []
for line in mylines:
    if 'common.' in line:
        paths.append(('common', line.split('common.')[-1]))
    else:
        paths.append(('', line))

You could use finditer:
lua='''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require 'common.core.acme_4'
'''
import re
print [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]
# ['common.acme_1', 'common.acme_2', 'acme_3', 'common.core.acme_4']
Then just split on the '.' to split into paths:
for e in [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]:
    parts=e.split('.')
    if parts[:-1]:
        print '/'.join(parts[:-1]), parts[-1]
    else:
        print parts[0]
Prints:
common acme_1
common acme_2
acme_3
common/core acme_4

file = '/path/to/test.lua'
def parse():
    with open(file, 'r') as f:
        requires = [line.split()[1].strip('"') for line in f.readlines() if line.startswith('require ')]
    for r in requires:
        filename = r.replace('.', '/') + '.lua'
        print(filename)
The with statement opens the file in question. The next line builds a list from all lines that start with 'require ': each line is split, the 'require' is ignored, and only the last part is kept with its double quotes stripped off. The loop then goes through the list, replaces the dots with slashes, and appends '.lua'. The print statement shows the results.
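For reference, the ideas from the answers above can be combined into one Python 3 sketch that returns the directory and the file name separately. The function name parse_requires is hypothetical, and the pattern assumes each require statement starts its own line with the module name in single or double quotes, optionally wrapped in parentheses.

```python
import re

# Assumed form: require "a.b.c", require 'a.b.c' or require("a.b.c") at line start.
REQUIRE_RE = re.compile(r'''^\s*require\s*\(?\s*(["'])([^"']+)\1''', re.MULTILINE)

def parse_requires(lua_source):
    """Return a list of (directory, filename) tuples for each require statement."""
    results = []
    for match in REQUIRE_RE.finditer(lua_source):
        module = match.group(2)
        # Split on the last dot: everything before it is the directory part.
        directory, _, name = module.rpartition(".")
        results.append((directory.replace(".", "/"), name + ".lua"))
    return results

sample = '''
require "common.acme_1"
require "acme_3"
require 'common.core.acme_4'
'''
for directory, filename in parse_requires(sample):
    print(repr(directory), filename)
```

The resulting pairs can be joined with os.path.join before checking that the file exists.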

Python writing to file using stdout and fileinput

I have the following code, which modifies each line of the file test.tex by making a regular expression substitution.
import re
import fileinput
regex=re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')
for line in fileinput.input('test.tex',inplace=1):
    print regex.sub(r'\3\2\1\4\5',line),
The only problem is that I only want the substitution to apply to certain lines in the file, and there's no way to define a pattern to select the correct lines. So, I want to display each line and prompt the user at the command line, asking whether to make the substitution at the current line. If the user enters "y", the substitution is made. If the user simply enters nothing, the substitution is not made.
The problem, of course, is that by using the code inplace=1 I've effectively redirected stdout to the opened file. So there's no way to show output (e.g. asking whether to make the substitution) to the command line that doesn't get sent to the file.
Any ideas?

The fileinput module is really for dealing with more than one input file.
You can use the regular open() function instead.
Something like this should work.
By reading the file and then resetting the pointer with seek(), we can overwrite the file instead of appending to the end, and so edit the file in place.
import re
regex = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')
with open('test.tex', 'r+') as f:
    old = f.readlines() # Pull the file contents to a list
    f.seek(0) # Jump to start, so we overwrite instead of appending
    for line in old:
        s = raw_input(line)
        if s == 'y':
            f.write(regex.sub(r'\3\2\1\4\5',line))
        else:
            f.write(line)
    f.truncate() # Drop leftover bytes if the new content is shorter
http://docs.python.org/tutorial/inputoutput.html
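If you would rather keep fileinput, another option is to send the prompt to stderr, since inplace mode redirects only stdout. The following Python 3 sketch assumes that approach; the names ask_on_stderr and swap_columns_interactively are hypothetical, and the ask parameter exists only so the prompt can be replaced.

```python
import fileinput
import re
import sys

REGEX = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')

def ask_on_stderr(line):
    """Show the line and a prompt on stderr, then read the answer from stdin."""
    sys.stderr.write(line)
    sys.stderr.write("Make substitution (enter y for yes)? ")
    sys.stderr.flush()
    return sys.stdin.readline().strip() == "y"

def swap_columns_interactively(path, ask=ask_on_stderr):
    """Rewrite path in place, substituting only on lines the user confirms."""
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            if REGEX.search(line) and ask(line):
                line = REGEX.sub(r'\3\2\1\4\5', line)
            # Inside the loop, sys.stdout points at the file being rewritten.
            sys.stdout.write(line)
```

Calling swap_columns_interactively('test.tex') would prompt once for every line that contains at least two & characters.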

Based on the help everyone provided, here's what I ended up going with:
#!/usr/bin/python
import re
import sys
import os
# regular expression
regex = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')
# name of input and output files
if len(sys.argv)==1:
    print 'No file specified. Exiting.'
    sys.exit()
ifilename = sys.argv[1]
ofilename = ifilename+'.MODIFIED'
# read input file
ifile = open(ifilename)
lines = ifile.readlines()
ifile.close()
ofile = open(ofilename,'w')
# prompt to make substitutions wherever a regex match occurs
for line in lines:
    match = regex.search(line)
    if match is not None:
        print ''
        print '***CANDIDATE FOR SUBSTITUTION***'
        print '--: '+line,
        print '++: '+regex.sub(r'\3\2\1\4\5',line),
        print '********************************'
        input = raw_input('Make substitution (enter y for yes)? ')
        if input == 'y':
            ofile.write(regex.sub(r'\3\2\1\4\5',line))
        else:
            ofile.write(line)
    else:
        ofile.write(line)
# make sure everything is written before the files are swapped
ofile.close()
# replace original file with modified file
os.remove(ifilename)
os.rename(ofilename, ifilename)
Thanks a lot!
