Read all files in directory and subdirectories in Python

Read all files in directory and subdirectories in Python - python

I'm trying to translate this bash line in python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The result is to show all the videos apps installed in a Ubuntu system.
-> read all the .desktop files in /usr/share/applications/ directory
-> filter the strings "video" "player" to find the video applications
-> filter the string "nodisplay=true" and "audio" to not show audio players and no-gui apps
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch
apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
for dirname in dirnames:
for filename in filenames:
with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
a = auto.read(50000)
if "Player" in a or "Video" in a or "video" in a or "player" in a:
if "NoDisplay=true" not in a or "audio" not in a:
print "OK: ", filename
filename = filename.replace(".desktop", "")
apps.append(filename)
print apps
But I've a problem with the recursive files...
How can I fix it?
Thanks

Looks like you are doing os.walk() loop incorrectly. There is no need for nested dir loop.
Please refer to Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
for file in files:
with open(os.path.join(root, file), "r") as auto:

Related

grep bash command into Python script

There is a bash command, I am trying to convert the logic into python. But I don't know what to do, I need some help with that.
bash command is this :
ls
ls *
TODAY=`date +%Y-%m-%d`
cd xx/xx/autotests/
grep -R "TEST_F(" | sed s/"TEST_F("/"#"/g | cut -f2 -d "#" | while read LINE
The logic is inside a directory, reads the filename one by one, includes all sub-folders, then lists the file matches. Any help here will be much appreciated
I tried something like follows, but it is not what I would like to have. And there are some subfolders inside, which the code is not reading the file inside them
import fnmatch
import os
from datetime import datetime
time = datetime.now()
dir_path = "/xx/xx/autotests"
dirs = os.listdir(dir_path)
TODAY = time.strftime("%Y-%m-%d")
filesOfDirectory = os.listdir(dir_path)
print(filesOfDirectory)
pattern = "TEST_F("
for file in filesOfDirectory:
if fnmatch.fnmatch(file, pattern):
print(file)

Use os.walk() to scan the directory recursively.
Open each file, loop through the lines of the file looking for lines with "TEST_F(". Then extract the part of the line after that (that's what sed and cut are doing).
for root, dirs, files in os.walk(dir_path):
for file in files:
with open(os.path.join(root, file)) as f:
for line in f:
if "TEST_F(" in line:
data = line.split("TEST_F(")[1]
print(data)

fins and move files with path name with python3

I am trying recreate python script from my perl script to find all files with common name model1_r.pdb and them move it to a new folder with newname of their previous existing folder.
This is the python code I wrote;
import os, shutil, glob
# This program will list all pdb file named (model1_r.pdb) in a given directory.
# inpout: directory name
# output: pdb files
source = input("enter filepath: ")
# build the command to look for model1_r.pdb
for file_path in glob.glob(os.path.join(source, '*model1_r.pdb')):
new_dir = file_path.rsplit('/', 1)[1]
find = os.system("find -type f -name \"model1_r.pdb\" ")
print(filepath)
destination = "/gs/gsfs0/users/kdhusia/dir_all_pdb/"
#Move the files to new folder based with filepath folder name
shutil.move(find, ios.path.join(destination, os.path.basename(filepath)))
Here is my perl script that works fine, but is slow
use strict;
use warnings;
use Data::Dumper;
use File::Basename;
# This program will list all pdb file named (model1_r.pdb) in a given directory.
# inpout: directory name
# output: pdb files
my $dir = $ARGV[0] || undef;
die "a directory is required\n" unless $dir;
chomp $dir;
# build the command to look for model1_r.pdb
my $cmd = "find $dir -type f -name \"model1_r.pdb\" ";
#my $cmd = "find $dir -type f -name \"*.pdb\" ";
print $cmd, "\n";
my #output = `$cmd`;
my $DEST = "/gs/gsfs0/users/kdhusia/dcomplex_all";
for my $line (#output) {
# remove CR from $line
chomp $line;
#print $line;
# get pair-name from line
my #pairs = split("\/", $line);
#print Dumper(\#pairs);
my $new_name = $pairs[2].".pdb";
#print "new_name = $new_name\n";
# copy model1_r.pdb to dinal destination
# change the name of pdb to "pair-name.pdb"
my $cmd = "cp $line $DEST/$new_name";
# my $cmd = "./dcomplex $pdbfile A B "
print "$cmd\n";
# do the copy
system($cmd);
}
I did the conversion using previous discussions at stack(Python: Moving files to folder based on filenames)
I am missing out something during the storing the filepath. Any comments would be highly appreciated.

Here is the reciprocal python3 code:
import os
import shutil
import glob
import argparse
# This program will list all pdb file named (model1_r.pdb) in a given directory.
# inpout: directory name
# output: pdb files
parser = argparse.ArgumentParser()
parser.add_argument("file", help="filename is required", type=str)
args = parser.parse_args()
print (args.file)
dest = "/gs/gsfs0/users/.../dir_all_pdb/"
# build the command to look for model1_r.pdb
#for f in os.path.listdir(source):
for root, dirs, files in os.walk(args.file):
for file in files:
if file.endswith("l1_r.pdb"):
#print (file)
print ("moving ...")
print(os.path.join(root, file)) # printing full pathname
# extratc the name ofthe pair. position 6 identifies tha name of the pair
pair_name = os.path.join(root, file).split("/")
name = pair_name[6]
# buildong final destination ful path name
f_name = dest + name+'.pdb'
print ("to ...", f_name)
src = os.path.join(root, file)
shutil.copy( src, f_name)
This takes path of directory to look for files name and then store it to the desination folder with the names of folder in which the file was found.

Copy files from one folder to another with matching names in .txt file

I want to copy files from one big folder to another folder based on matching file names in a .txt file.
My list.txt file contains file names:
S001
S002
S003
and another big folder contains many files for ex. S001, S002, S003, S004, S005.
I only want to copy the files from this big folder that matches the file names in my list.txt file.
I have tried Bash, Python - not working.
for /f %%f in list.txt do robocopy SourceFolder/ DestinationFolder/ %%f
is not working either.
My logic in Python is not working:
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open(source, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
for filenames in os.walk(destination):
for filename in filenames:
if filename in filenames_to_copy:
shutil.copy(source, destination)
Any answers in Bash, Python or R?
Thanks

I think the issue with your Python code is that with os.walk() your filename will be a list everytime, which will not be found in your filenames_to_copy.
I'd recommend trying with os.listdir() instead as this will return a list of the names of filenames/folders as strings - easier to compare against your filenames_to_copy.
Other note - perhaps you want to do os.listdir() (or os.walk()) on the source instead of the destination. Currently, you're only copying files from the source to the destination if the file already exists in the destination.

os.walk() will return a tuple of three elements: the name of the current directory inspected, the list of folders in it, and the list of files in it. You are only interested in the latter. So your should iterate with:
for _, _, filenames in os.walk(destination):
As pointed out by JezMonkey, os.listdir() is easier to use as it will list of the files and folders in the requested directory. However, you will lose the recursive search that os.walk() enables. If all your files are in the same folder and not hidden in some folders, you'd rather use os.listdir().
The second problem I see in you code is that you copy source when I think you want to copy os.path.join(source, filename).
Can you publish the exact error you have with the Python script so that we can better help you.
UPDATE
You actually don't need to list all the files in the source folder. With os.path.exists you can check that the file exists and copy it if it does.
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open("list.txt", "r") as lines: # adapt the name of the file to open to your exact location.
filenames_to_copy = set(line.rstrip() for line in lines)
for filename in filenames_to_copy:
source_path = os.path.join(source, filename)
if os.path.exists(source_path):
print("copying {} to {}".format(source_path, destination))
shutil.copy(source_path, destination)

Thank you #PySaad and #Guillaume for your contributions, although my script is working now: I added:
if os.path.exists(copy_to):
shutil.rmtree(copy_to)
shutil.copytree(file_to_copy, copy_to)
to the script and its working like a charm :)
Thanks a lot for your help!

You can try with below code -
import glob
big_dir = "~\big_dir"
copy_to = "~\copy_to"
copy_ref = "~\copy_ref.txt"
big_dir_files = [os.path.basename(f) for f in glob.glob(os.path.join(big_dir, '*'))]
print 'big_dir', big_dir_files # Returns all filenames from big directory
with open(copy_ref, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
print filenames_to_copy # prints filename which you have in .txt file
for file in filenames_to_copy:
if file in big_dir_files: # Matches filename from ref.txt with filename in big dir
file_to_copy = os.path.join(big_dir, file)
copy_(file_to_copy, copy_to)
def copy_(source_dir, dest_dir):
files = glob.iglob(os.path.join(source_dir, '*'))
for file in files:
dest = os.path.join(dest_dir, os.path.basename(os.path.dirname(file)))
if not os.path.exists(dir_name):
os.mkdir(dest)
shutil.copy2(file, dest)
Reference:
https://docs.python.org/3/library/glob.html

If you want an overkill bash script/tool. Check https://github.com/jordyjwilliams/copy_filenames_from_txt out.
This can be invoked by ./copy_filenames_from_txt.sh -f ./text_file_with_names -d search_dir-o output_dir
The script can be summarised (without error handling/args etc) to:
cat $SEARCH_FILE | while read i; do
find $SEARCH_DIR -name "*$i*" | while read file; do
cp -r $file $OUTPUT_DIR/
done
done
The 2nd while loop here is not even strictly necessary... One could just pass $i to the cp. (eg a list of files if multiple matches) I just wanted to have each file handled separately for the tool I was writing...
To make this a bit nicer and add error handling...
The active part of the my tool in the repo is (ignore the color markers):
cat $SEARCH_FILE | while read i; do
# Handle non-matching file
if [[ -z $(find $SEARCH_DIR -name "*$i*") && $VERBOSE ]]; then
echo -e "❌: ${RED}$i${RESET} ||${YELLOW} No matching file found${RESET}"
continue
fi
# Handle case of more than one matching file
find $SEARCH_DIR -name "*$i*" | while read file; do
if [[ -n "$file" ]]; then # Check file matching
FILENAME=$(basename ${file}); DIRNAME=$(dirname "${file}")
if [[ -f $OUTPUT_DIR/"$FILENAME" ]]; then
if [[ $DIRNAME/ != $OUTPUT_DIR && $VERBOSE ]]; then
echo -e "📁: ${CYAN}$FILENAME ${GREEN}already exists${RESET} in ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
else
if [[ $VERBOSE ]]; then
echo -e "✅: ${GREEN}$i${RESET} >> ${CYAN}$file${RESET} >> ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
cp -r $file $OUTPUT_DIR/
fi
fi
done
done

Removing six.b from multiple files

I have dozens of files in the project and I want to change all occurences of six.b("...") to b"...". Can I do that with some sort of regex bash script?

It's possible entirely in Python, But I would first make a backup of my project tree, and then:
import re
import os
indir = 'files'
for root, dirs, files in os.walk(indir):
for f in files:
fname = os.path.join(root, f)
with open(fname) as f:
txt = f.read()
txt = re.sub(r'six\.(b\("[^"]*"\))', r'\1', txt)
with open(fname, 'w') as f:
f.write(txt)
print(fname)

A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
cp $file $file.bak
sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem, without knowing more about the files you're manipulating. Is the end of six.b("") guaranteed to be the last ") on the line? etc.

Moving files by starting letter in powershell, python or other scripting language running windows

I need a script than can recursively traverse c:\somedir\ and move files to c:\someotherdir\x\ - where x is the starting letter of the file.
Can anyone help?
Ended up with this one:
import os
from shutil import copy2
import uuid
import random
SOURCE = ".\\pictures\\"
DEST = ".\\pictures_ordered\\"
for path, dirs, files in os.walk(SOURCE):
for f in files:
print(f)
starting_letter = f[0].upper()
source_path = os.path.join(path, f)
dest_path = os.path.join(DEST, starting_letter)
if not os.path.isdir(dest_path):
os.makedirs(dest_path)
dest_fullfile = os.path.join(dest_path, f)
if os.path.exists(dest_fullfile):
periodIndex = source_path.rfind(".")
renamed_soruce_path = source_path[:periodIndex] + "_" + str(random.randint(100000, 999999)) + source_path[periodIndex:]
os.rename(source_path, renamed_soruce_path)
copy2(renamed_soruce_path, dest_path)
os.remove(renamed_soruce_path)
else:
copy2(source_path, dest_path)
os.remove(source_path)`

Here's a simple script that does what you want. It doesn't tell you anything about what it's doing, and will just overwrite the old file if there are two files with the same name.
import os
from shutil import copy2
SOURCE = "c:\\source\\"
DEST = "c:\\dest\\"
# Iterate recursively through all files and folders under the source directory
for path, dirs, files in os.walk(SOURCE):
# For each directory iterate over the files
for f in files:
# Grab the first letter of the filename
starting_letter = f[0].upper()
# Construct the full path of the current source file
source_path = os.path.join(path, f)
# Construct the destination path using the first letter of the
# filename as the folder
dest_path = os.path.join(DEST, starting_letter)
# Create the destination folder if it doesn't exist
if not os.path.isdir(dest_path):
os.makedirs(dest_path)
# Copy the file to the destination path + starting_letter
copy2(source_path, dest_path)

I suspect this will work in PowerShell.
gci -path c:\somedir -filter * -recurse |
where { -not ($_.PSIsContainer) } |
foreach { move-item -path $_.FullName -destination $_.Substring(0, 1) }

ls c:\somedir\* -recurse | ? { -not ($_.PSIsContainer)} | mv -destination "C:\someotherdir\$($_.Name.substring(0,1))" } ... -whatif :P

Here's an answer in Python, note that warning message, you may want to deal with overwrites differently. Also save this to a file in the root directory and run it there, otherwise you have to change the argument to os.walk and also how the paths are joined together.
import os
import sys
try:
letter = sys.argv[1]
except IndexError:
print 'Specify a starting letter'
sys.exit(1)
try:
os.makedirs(letter)
except OSError:
pass # already exists
for dirpath, dirnames, filenames in os.walk('.'):
for filename in filenames:
if filename.startswith(letter):
src = os.path.join(dirpath, filename)
dst = os.path.join(letter, filename)
if os.path.exists(dst):
print 'warning, existing', dst, 'being overwritten'
os.rename(src, dst)

sure, I'll help: look at os.path.walk in Python2, which I believe is simply os.walk in Python3.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read all files in directory and subdirectories in Python - python

Related

grep bash command into Python script

fins and move files with path name with python3

Copy files from one folder to another with matching names in .txt file

Removing six.b from multiple files

Moving files by starting letter in powershell, python or other scripting language running windows

Categories

Resources