grep bash command into Python script - python

I have a bash command whose logic I am trying to convert into Python, but I don't know how to go about it and need some help.
The bash command is this:
ls
ls *
TODAY=`date +%Y-%m-%d`
cd xx/xx/autotests/
grep -R "TEST_F(" | sed s/"TEST_F("/"#"/g | cut -f2 -d "#" | while read LINE
The logic is: inside a directory, read the files one by one, including those in all sub-folders, then list the matches. Any help here will be much appreciated.
I tried something like the following, but it is not what I want. There are also some subfolders, and the code does not read the files inside them.
import fnmatch
import os
from datetime import datetime
time = datetime.now()
dir_path = "/xx/xx/autotests"
dirs = os.listdir(dir_path)
TODAY = time.strftime("%Y-%m-%d")
filesOfDirectory = os.listdir(dir_path)
print(filesOfDirectory)
pattern = "TEST_F("
for file in filesOfDirectory:
    if fnmatch.fnmatch(file, pattern):
        print(file)

Use os.walk() to scan the directory recursively.
Open each file, loop through the lines of the file looking for lines with "TEST_F(". Then extract the part of the line after that (that's what sed and cut are doing).
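import os

dir_path = "/xx/xx/autotests"

for root, dirs, files in os.walk(dir_path):
    for file in files:
        with open(os.path.join(root, file)) as f:
            for line in f:
                if "TEST_F(" in line:
                    # Keep what follows the first "TEST_F(" (the sed + cut step).
                    data = line.split("TEST_F(")[1]
                    print(data)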
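One caveat with the loop above: grep -R copes with binary or oddly encoded files, whereas open() in text mode may raise UnicodeDecodeError on them. Here is a minimal sketch that tolerates such files, assuming it is acceptable to drop undecodable bytes. The function name find_test_names is hypothetical and not part of the original answer:

```python
import os


def find_test_names(dir_path, marker="TEST_F("):
    """Return the text following the first `marker` on every matching line."""
    results = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            path = os.path.join(root, name)
            # errors="ignore" drops undecodable bytes instead of raising.
            with open(path, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    if marker in line:
                        results.append(line.split(marker, 1)[1].rstrip("\n"))
    return results
```

Returning a list instead of printing makes it easier to post-process the matches, which is presumably what the `while read LINE` loop in the bash version goes on to do.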

Related

How to convert multiple .doc files to .docx using antiword?

This manual command is working:
!antiword "test" > "test.docx"
but the following script converts the files to empty .docx files:
for file in os.listdir(directory):
    subprocess.run(["bash", "-c", "antiword \"$1\" > \"$1\".docx", "_", file])
It also stores the .docx file in the parent directory, e.g. if the file is in \a\b, this command stores the output in \a.
I have tried many different ways, including running it directly in the terminal and bash loops. Only the manual way works.
Something like this should work (adjust dest_path etc. accordingly).
import os
import shlex
for filename in os.listdir(directory):
    # Skip anything that is not a .doc file (including already converted output).
    if not filename.endswith(".doc"):
        continue
    path = os.path.join(directory, filename)
    dest_path = os.path.splitext(path)[0] + ".txt"
    cmd = "antiword %s > %s" % (shlex.quote(path), shlex.quote(dest_path))
    print(cmd)
    # If the above seems to print correct commands, add:
    # os.system(cmd)
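The shell can also be avoided entirely by letting Python handle the redirection. This is a minimal sketch, assuming antiword writes the extracted text to standard output (which the manual command in the question relies on). The function name convert_docs and its command parameter are hypothetical, introduced only so the converter can be swapped out:

```python
import os
import subprocess


def convert_docs(directory, command=("antiword",), suffix=".txt"):
    """Run `command <file>` for each .doc file and save its stdout next to it."""
    written = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".doc"):
            continue
        path = os.path.join(directory, filename)  # listdir() returns bare names
        dest_path = os.path.splitext(path)[0] + suffix
        with open(dest_path, "wb") as out:
            # stdout=out replaces the shell's `>` redirection.
            subprocess.run([*command, path], stdout=out, check=True)
        written.append(dest_path)
    return written
```

Joining directory onto each name matters because os.listdir() returns names without the directory part, which may explain why the question's script could not find the input files and wrote its output into the wrong folder.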

Copy files from one folder to another with matching names in .txt file

I want to copy files from one big folder to another folder based on matching file names in a .txt file.
My list.txt file contains file names:
S001
S002
S003
and another big folder contains many files, for example S001, S002, S003, S004, S005.
I only want to copy the files from this big folder that match the file names in my list.txt file.
I have tried Bash and Python, but neither is working.
for /f %%f in list.txt do robocopy SourceFolder/ DestinationFolder/ %%f
is not working either.
My logic in Python is not working:
import os
import shutil
def main():
    destination = "DestinationFolder/copy"
    source = "SourceFolder/MyBigData"
    with open(source, "r") as lines:
        filenames_to_copy = set(line.rstrip() for line in lines)
    for filenames in os.walk(destination):
        for filename in filenames:
            if filename in filenames_to_copy:
                shutil.copy(source, destination)
Any answers in Bash, Python or R?
Thanks
I think the issue with your Python code is that with os.walk() your filename will be a list (or the directory path) every time, which will not be found in your filenames_to_copy.
I'd recommend trying os.listdir() instead, as this returns a list of file and folder names as strings, which is easier to compare against your filenames_to_copy.
Other note - perhaps you want to do os.listdir() (or os.walk()) on the source instead of the destination. Currently, you're only copying files from the source to the destination if the file already exists in the destination.
os.walk() will return a tuple of three elements: the name of the current directory inspected, the list of folders in it, and the list of files in it. You are only interested in the last one. So you should iterate with:
for _, _, filenames in os.walk(destination):
As pointed out by JezMonkey, os.listdir() is easier to use as it lists the files and folders in the requested directory. However, you will lose the recursive search that os.walk() enables. If all your files are in the same folder and not hidden in subfolders, you are better off using os.listdir().
The second problem I see in your code is that you copy source when I think you want to copy os.path.join(source, filename).
Can you publish the exact error you get with the Python script so that we can better help you?
UPDATE
You actually don't need to list all the files in the source folder. With os.path.exists you can check that the file exists and copy it if it does.
import os
import shutil
def main():
    destination = "DestinationFolder/copy"
    source = "SourceFolder/MyBigData"
    with open("list.txt", "r") as lines:  # adapt the name of the file to open to your exact location.
        filenames_to_copy = set(line.rstrip() for line in lines)
    for filename in filenames_to_copy:
        source_path = os.path.join(source, filename)
        if os.path.exists(source_path):
            print("copying {} to {}".format(source_path, destination))
            shutil.copy(source_path, destination)


if __name__ == "__main__":
    main()
Thank you @PySaad and @Guillaume for your contributions. My script is working now. I added:
if os.path.exists(copy_to):
    shutil.rmtree(copy_to)
shutil.copytree(file_to_copy, copy_to)
to the script and it's working like a charm :)
Thanks a lot for your help!
You can try the code below:
import glob
import os
import shutil

def copy_(source_dir, dest_dir):
    # Copy every file in source_dir into dest_dir/<name of source_dir>.
    files = glob.iglob(os.path.join(source_dir, '*'))
    for file in files:
        dest = os.path.join(dest_dir, os.path.basename(os.path.dirname(file)))
        if not os.path.exists(dest):
            os.mkdir(dest)
        shutil.copy2(file, dest)

# Raw strings keep "\b" and "\c" literal; expanduser() resolves "~".
big_dir = os.path.expanduser(r"~\big_dir")
copy_to = os.path.expanduser(r"~\copy_to")
copy_ref = os.path.expanduser(r"~\copy_ref.txt")
big_dir_files = [os.path.basename(f) for f in glob.glob(os.path.join(big_dir, '*'))]
print('big_dir', big_dir_files)  # Returns all filenames from big directory
with open(copy_ref, "r") as lines:
    filenames_to_copy = set(line.rstrip() for line in lines)
print(filenames_to_copy)  # prints filename which you have in .txt file
for file in filenames_to_copy:
    if file in big_dir_files:  # Matches filename from ref.txt with filename in big dir
        file_to_copy = os.path.join(big_dir, file)
        copy_(file_to_copy, copy_to)
Reference:
https://docs.python.org/3/library/glob.html
If you want an overkill bash script/tool, check out https://github.com/jordyjwilliams/copy_filenames_from_txt.
This can be invoked by ./copy_filenames_from_txt.sh -f ./text_file_with_names -d search_dir -o output_dir
The script can be summarised (without error handling/args etc) to:
cat $SEARCH_FILE | while read i; do
    find $SEARCH_DIR -name "*$i*" | while read file; do
        cp -r $file $OUTPUT_DIR/
    done
done
The 2nd while loop here is not even strictly necessary... One could just pass $i to the cp. (eg a list of files if multiple matches) I just wanted to have each file handled separately for the tool I was writing...
To make this a bit nicer and add error handling...
The active part of my tool in the repo is (ignore the color markers):
cat $SEARCH_FILE | while read i; do
    # Handle non-matching file
    if [[ -z $(find $SEARCH_DIR -name "*$i*") && $VERBOSE ]]; then
        echo -e "❌: ${RED}$i${RESET} ||${YELLOW} No matching file found${RESET}"
        continue
    fi
    # Handle case of more than one matching file
    find $SEARCH_DIR -name "*$i*" | while read file; do
        if [[ -n "$file" ]]; then # Check file matching
            FILENAME=$(basename ${file}); DIRNAME=$(dirname "${file}")
            if [[ -f $OUTPUT_DIR/"$FILENAME" ]]; then
                if [[ $DIRNAME/ != $OUTPUT_DIR && $VERBOSE ]]; then
                    echo -e "📁: ${CYAN}$FILENAME ${GREEN}already exists${RESET} in ${MAGENTA}$OUTPUT_DIR${RESET}"
                fi
            else
                if [[ $VERBOSE ]]; then
                    echo -e "✅: ${GREEN}$i${RESET} >> ${CYAN}$file${RESET} >> ${MAGENTA}$OUTPUT_DIR${RESET}"
                fi
                cp -r $file $OUTPUT_DIR/
            fi
        fi
    done
done

Python - Read file names in directory, write twice to text file (one without file extensions), and separate with pipe

I have a directory (c:\temp) with some files:
a.txt
b.py
c.html
I need to read all of the files in a directory and output it to a text file. I've got that part handled (I think):
WD = "c:\\temp"
import glob
files = glob.glob('*.*')
with open('dirList.txt', 'w') as in_files:
    for eachfile in files: in_files.write(eachfile + '\n')
I need the output to look like:
a|a.txt
b|b.py
c|c.html
I'm not quite sure where to look next.
I'd split the file name by . and take the first part:
for eachfile in files:
    in_files.write('%s|%s\n' % (eachfile.split('.')[0], eachfile))
You have almost solved your problem. I am not quite sure where you are getting stuck. If all you need to write is the file name (without extension) followed by | and the full name, then you just need to update your code like this:
import glob
files = glob.glob('*.*')
with open('dirList.txt', 'w') as in_files:
    for eachfile in files:
        file_name_without_extension = eachfile.split(".")[0]
        in_files.write(file_name_without_extension + "|" + eachfile + '\n')
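Note that split('.')[0] cuts at the first dot, so a name like archive.tar.gz would be reduced to archive. Below is a minimal sketch using os.path.splitext, which strips only the last extension. The helper names format_entry and write_listing are hypothetical:

```python
import os


def format_entry(filename):
    """Return 'stem|filename', where stem is the name without its last extension."""
    stem, _ext = os.path.splitext(filename)
    return "%s|%s" % (stem, filename)


def write_listing(directory, out_path):
    """Write one 'stem|filename' line per regular file in directory."""
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    with open(out_path, "w") as out:
        for name in names:
            out.write(format_entry(name) + "\n")
```

For the files in the question, format_entry("a.txt") gives a|a.txt, and write_listing("c:\\temp", "dirList.txt") would produce the requested output, assuming dirList.txt itself is not written into the directory being listed.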

Removing six.b from multiple files

I have dozens of files in the project and I want to change all occurrences of six.b("...") to b"...". Can I do that with some sort of regex bash script?
It's possible entirely in Python, but I would first make a backup of the project tree, and then:
import re
import os
indir = 'files'
for root, dirs, files in os.walk(indir):
    for name in files:
        fname = os.path.join(root, name)
        with open(fname) as f:
            txt = f.read()
        # Turn six.b("...") into b"..." by dropping the call parentheses.
        txt = re.sub(r'six\.b\(("[^"]*")\)', r'b\1', txt)
        with open(fname, 'w') as f:
            f.write(txt)
        print(fname)
A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
    cp $file $file.bak
    sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
    echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem, without knowing more about the files you're manipulating. Is the end of six.b("") guaranteed to be the last ") on the line? etc.
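For the embedded \" case mentioned in the notes, one possible approach in Python is a pattern that treats a backslash plus the following character as a single unit. This is only a sketch, under the assumption that the argument is always a single double-quoted literal on one line. The function name replace_six_b is hypothetical:

```python
import re

# "(?:[^"\\]|\\.)*" matches a double-quoted literal, allowing escapes such as \".
SIX_B = re.compile(r'six\.b\(("(?:[^"\\]|\\.)*")\)')


def replace_six_b(text):
    """Rewrite six.b("...") calls as b"..." literals."""
    return SIX_B.sub(r'b\1', text)
```

Calls whose argument is not a plain double-quoted literal, such as six.b(name), are left untouched, so they can be reviewed by hand afterwards.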

Read all files in directory and subdirectories in Python

I'm trying to translate this bash line into Python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The result is to show all the video apps installed on an Ubuntu system.
-> read all the .desktop files in the /usr/share/applications/ directory
-> filter on the strings "video" and "player" to find the video applications
-> filter out the strings "nodisplay=true" and "audio" to hide audio players and no-GUI apps
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch
apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
    for dirname in dirnames:
        for filename in filenames:
            with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
                a = auto.read(50000)
                if "Player" in a or "Video" in a or "video" in a or "player" in a:
                    if "NoDisplay=true" not in a or "audio" not in a:
                        print "OK: ", filename
                        filename = filename.replace(".desktop", "")
                        apps.append(filename)
print apps
But I have a problem with reading the files recursively...
How can I fix it?
Thanks
It looks like you are writing the os.walk() loop incorrectly. There is no need for the nested dir loop.
Please refer to the Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
            a = auto.read(50000)  # then apply your checks to `a` as before
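Putting the corrected loop together with the checks from the bash line, the whole translation might look like the sketch below. It mirrors the bash pipeline rather than the prose description, so it does not filter on "audio", and it sorts the final names instead of the full paths. The function name find_video_players is hypothetical:

```python
import os
import re


def find_video_players(app_dir="/usr/share/applications/"):
    """Return sorted names of .desktop files that look like GUI video players."""
    apps = []
    for root, _dirs, files in os.walk(app_dir):
        for name in files:
            if not name.endswith(".desktop"):
                continue
            with open(os.path.join(root, name), errors="ignore") as f:
                text = f.read().lower()  # lowercase once for case-insensitive checks
            if "player" not in text:  # grep -i "player"
                continue
            if not re.search(r"\bvideo\b", text):  # grep -iw "video"
                continue
            if "nodisplay=true" in text:  # skip entries hidden from menus
                continue
            apps.append(os.path.splitext(name)[0])
    return sorted(apps)
```

Printing one name per line, as in the desired output, is then just `print("\n".join(find_video_players()))`.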
