Find and move files with path name with python3 - python

I am trying to recreate my Perl script in Python. It should find all files with the common name model1_r.pdb and then move each one to a new folder, renamed after the folder it was previously in.
This is the Python code I wrote:
import os, shutil, glob
# This program will list all pdb files named (model1_r.pdb) in a given directory.
# input: directory name
# output: pdb files
source = input("enter filepath: ")
# build the command to look for model1_r.pdb
for file_path in glob.glob(os.path.join(source, '*model1_r.pdb')):
    new_dir = file_path.rsplit('/', 1)[1]
    find = os.system("find -type f -name \"model1_r.pdb\" ")
    print(filepath)
    destination = "/gs/gsfs0/users/kdhusia/dir_all_pdb/"
    # Move the files to the new folder, named after the filepath folder name
    shutil.move(find, ios.path.join(destination, os.path.basename(filepath)))
Here is my Perl script, which works fine but is slow:
use strict;
use warnings;
use Data::Dumper;
use File::Basename;
# This program will list all pdb files named (model1_r.pdb) in a given directory.
# input: directory name
# output: pdb files
my $dir = $ARGV[0] || undef;
die "a directory is required\n" unless $dir;
chomp $dir;
# build the command to look for model1_r.pdb
my $cmd = "find $dir -type f -name \"model1_r.pdb\" ";
#my $cmd = "find $dir -type f -name \"*.pdb\" ";
print $cmd, "\n";
my @output = `$cmd`;
my $DEST = "/gs/gsfs0/users/kdhusia/dcomplex_all";
for my $line (@output) {
    # remove CR from $line
    chomp $line;
    #print $line;
    # get pair-name from line
    my @pairs = split("\/", $line);
    #print Dumper(\@pairs);
    my $new_name = $pairs[2].".pdb";
    #print "new_name = $new_name\n";
    # copy model1_r.pdb to final destination
    # change the name of pdb to "pair-name.pdb"
    my $cmd = "cp $line $DEST/$new_name";
    # my $cmd = "./dcomplex $pdbfile A B "
    print "$cmd\n";
    # do the copy
    system($cmd);
}
I did the conversion using a previous discussion on Stack Overflow (Python: Moving files to folder based on filenames).
I am missing something when storing the file path. Any comments would be highly appreciated.

Here is the equivalent Python 3 code:
import os
import shutil
import glob
import argparse
# This program will list all pdb files named (model1_r.pdb) in a given directory.
# input: directory name
# output: pdb files
parser = argparse.ArgumentParser()
parser.add_argument("file", help="filename is required", type=str)
args = parser.parse_args()
print (args.file)
dest = "/gs/gsfs0/users/.../dir_all_pdb/"
# build the command to look for model1_r.pdb
#for f in os.path.listdir(source):
for root, dirs, files in os.walk(args.file):
    for file in files:
        if file.endswith("l1_r.pdb"):
            #print (file)
            print ("moving ...")
            print(os.path.join(root, file)) # printing full pathname
            # extract the name of the pair. position 6 identifies the name of the pair
            pair_name = os.path.join(root, file).split("/")
            name = pair_name[6]
            # building final destination full path name
            f_name = dest + name+'.pdb'
            print ("to ...", f_name)
            src = os.path.join(root, file)
            shutil.copy( src, f_name)
This takes the path of a directory to search for the file name, then copies each match to the destination folder, naming it after the folder in which the file was found.
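The hard-coded index 6 in the code above only works for one particular directory depth. A minimal sketch of the same idea, assuming the pair name is simply the name of the folder that directly contains model1_r.pdb (an assumption, the original does not say so), could use pathlib instead. The function name collect_models and its parameters are hypothetical:

```python
import shutil
from pathlib import Path


def collect_models(source, dest, filename="model1_r.pdb"):
    """Copy every `filename` found under `source` into `dest`,
    renaming each copy after the folder that contained it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in Path(source).rglob(filename):
        if not src.is_file():
            continue
        # Assumption: the pair name is the immediate parent folder name.
        target = dest / (src.parent.name + ".pdb")
        shutil.copy(src, target)
        copied.append(target)
    return copied
```

Two folders with the same name at different depths would overwrite each other in the destination, so this sketch only suits trees where the pair folder names are unique.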

Related

Python os.walk from current directory

How can I edit this script so that it will run from the current directory? If I run the script as it is now, I get an error saying that it cannot find the files I have specified. My feeling is that os.walk is not searching in the subfolders of the current directory. I do not want to specify the path name, since I want to run this script in different directories.
To sum up: please help me change this script so that it runs from the current directory and finds the files that are in the subfolders of the current directory. Thanks!
import os
import csv
from itertools import chain
from collections import defaultdict
for root, dirs, files in os.walk('.'):
    d1 = {}
    with open (os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f1:
        for line in f1:
            ta = line.split()
            d1[ta[1]] = int(ta[0])
    d2 = {}
    with open(os.path.join(root, 'hmmer.analyze.txt.result.txt'), 'r') as f2:
        for line in f2:
            tb = line.split()
            d2[tb[1]] = int(tb[0])
    d3 = defaultdict(list)
    for k, v in chain(d1.items(), d2.items()):
        d3[k].append(v)
    with open(os.path.join(root, 'output_contigsvsgenes.csv'), 'w+') as fnew:
        writer = csv.writer(fnew)
        for k,v in d3.items():
            writer.writerow([k] + v)
import os
os.getcwd()  # returns the current working directory
So in your case the loop changes to:
for root, dirs, files in os.walk(os.getcwd()):
In your case you might also have to check whether the file exists:
if os.path.isfile(os.path.join(root, 'genes.gff.genespercontig.csv')):
    with open (os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f1:
        for line in f1:
            ta = line.split()
            d1[ta[1]] = int(ta[0])
The same check applies to all the other with statements.
I don't think the issue is working from the current directory; I think the issue is with the way you're using os.walk. You should check that the files exist before you start working with them. I think the error occurs because the first root folder is the current working directory itself, which may not contain the files. We can rearrange it into a function, as follows:
import os
import csv
from itertools import chain
from collections import defaultdict
def get_file_values(find_files, output_name):
    for root, dirs, files in os.walk(os.getcwd()):
        if all(x in files for x in find_files):
            outputs = []
            for f in find_files:
                d = {}
                with open(os.path.join(root, f), 'r') as f1:
                    for line in f1:
                        ta = line.split()
                        d[ta[1]] = int(ta[0])
                outputs.append(d)
            d3 = defaultdict(list)
            for k, v in chain(*(d.items() for d in outputs)):
                d3[k].append(v)
            with open(os.path.join(root, output_name), 'w+') as fnew:
                writer = csv.writer(fnew)
                for k, v in d3.items():
                    writer.writerow([k] + v)
get_file_values(['genes.gff.genespercontig.csv', 'hmmer.analyze.txt.result.txt'], 'output_contigsvsgenes.csv')
Not having your data I have been unable to test this, though I think it should work.
EDIT
To get the folder included in each row of the output csv files, we can just change our call to writer.writerow a little, to:
writer.writerow([root, k] + v)
Thus, the first column of each csv file created contains the name of the folder the values were obtained from.
You could use os.getcwd() to get the current directory (the one you're in when calling your script), but it would be better to pass the target directory as an argument.
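One way to combine both suggestions is an optional positional argument that falls back to the current directory. This is only a sketch; the function name parse_target_dir and the argument name directory are made up for illustration:

```python
import argparse
import os


def parse_target_dir(argv=None):
    """Return the directory to walk: the first positional argument,
    or the current working directory when none is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "directory",
        nargs="?",                # the argument is optional
        default=os.getcwd(),      # fall back to where the script was called
        help="directory to search (default: current directory)",
    )
    args = parser.parse_args(argv)
    return os.path.abspath(args.directory)
```

Inside the script, os.walk(parse_target_dir()) would then walk either the given directory or the one the script was started from.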
Within a Python script there are many options for introspecting the environment in which the script is running. The current directory is available via
os.getcwd()
You suggested in the comments that the files to work on are not in the current directory but in its subdirectories. In that case adjust your script like this (move the entire block of your loop one level deeper into for dir in dirs: and adjust os.path.join() accordingly):
for root, dirs, files in os.walk(os.getcwd()):
    for dir in dirs:
        print(os.path.join(root, dir, 'genes.gff.genespercontig.csv'))
Just for the fun of it, below is a short overview of some other useful insights into the environment a Python script runs in:
import __future__
import os, sys
print( "Executable running THIS script : { " + sys.executable + " }" )
print( "Full path file name of THIS script: { " + os.path.realpath(__file__) + " }" )
print( "Full path directory to THIS script: { " + os.path.dirname(os.path.abspath(__file__)) + " }" )
print( "Current working directory : { " + os.getcwd() + " }" )
print( "Has THIS file started Python? : { " + { True: "Yes", False: "No" }[(__name__ == "__main__")] + " }" )
print( "Which Python version is running? : { " + sys.version.replace("\n", "") + " }" )
print( "Which operating system is there? : { " + sys.platform + " }" )

Read all files in directory and subdirectories in Python

I'm trying to translate this bash line into Python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The result is a list of all the video apps installed on an Ubuntu system.
-> read all the .desktop files in /usr/share/applications/ directory
-> filter the strings "video" "player" to find the video applications
-> filter the string "nodisplay=true" and "audio" to not show audio players and no-gui apps
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch
apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
    for dirname in dirnames:
        for filename in filenames:
            with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
                a = auto.read(50000)
                if "Player" in a or "Video" in a or "video" in a or "player" in a:
                    if "NoDisplay=true" not in a or "audio" not in a:
                        print "OK: ", filename
                        filename = filename.replace(".desktop", "")
                        apps.append(filename)
print apps
But I've a problem with the recursive files...
How can I fix it?
Thanks
It looks like you are writing the os.walk() loop incorrectly. There is no need for the nested directory loop, because os.walk() already visits every subdirectory and yields its path as root.
Please refer to the Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
            a = auto.read(50000)  # then apply your filters to a as before
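Putting the corrected loop together with the filters from the bash line gives roughly the following Python 3 sketch. It is a simplification, not a faithful translation: it uses plain case-insensitive substring checks, whereas the bash version matches "video" as a whole word. The function name find_video_apps is hypothetical:

```python
import os


def find_video_apps(base_dir="/usr/share/applications"):
    """Return sorted names of .desktop files that mention both 'player'
    and 'video' and do not contain 'nodisplay=true' (case-insensitive)."""
    apps = []
    for root, dirs, files in os.walk(base_dir):
        for name in files:
            if not name.endswith(".desktop"):
                continue
            # root is the directory currently being visited, so this path
            # is correct for files in subdirectories too.
            path = os.path.join(root, name)
            with open(path, "r", errors="replace") as handle:
                text = handle.read().lower()
            if ("player" in text and "video" in text
                    and "nodisplay=true" not in text):
                apps.append(name[:-len(".desktop")])
    return sorted(apps)
```

Note that both keywords are required here (and), matching the chained grep calls, while the code in the question accepts either one (or).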

Creating a directory within directories using the python scripting language

Please find my python script below:
import os;
import sys;
dir_dst = sys.argv[1]
for x in range(150) :
dirname = str(x)
dst_dir = os.path.join(dir_dst, dirname)
dirname = "annotation"
dst = os.path.join(dst_dir, dirname)
print dst
if not os.path.exists(dst_dir):
os.mkdir(dst)
The aim is to create a directory called "annotation" within each of the numbered directories (0 to 149, as in the code above). This code doesn't do it, and printing the value of "dst" shows, for example:
NLP/test data/reconcile/0\annotation
NLP/test data/reconcile/1\annotation
How can this be resolved?
Change the second to last line to
if not os.path.exists(dst):
Right now you're checking if the original directory exists.
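In Python 3 the existence check can be dropped altogether by using os.makedirs with exist_ok=True. The sketch below assumes it is acceptable to also create a numbered parent directory when it is missing, which the original script does not do; the function name make_annotation_dirs is hypothetical:

```python
import os


def make_annotation_dirs(base_dir, count=150, name="annotation"):
    """Create base_dir/<n>/<name> for n in range(count)."""
    created = []
    for x in range(count):
        dst = os.path.join(base_dir, str(x), name)
        # exist_ok=True makes this a no-op when dst already exists;
        # makedirs also creates the numbered parent if it is missing.
        os.makedirs(dst, exist_ok=True)
        created.append(dst)
    return created
```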

Moving files by starting letter in powershell, python or other scripting language running windows

I need a script that can recursively traverse c:\somedir\ and move files to c:\someotherdir\x\ - where x is the starting letter of the file.
Can anyone help?
Ended up with this one:
import os
from shutil import copy2
import uuid
import random
SOURCE = ".\\pictures\\"
DEST = ".\\pictures_ordered\\"
for path, dirs, files in os.walk(SOURCE):
    for f in files:
        print(f)
        starting_letter = f[0].upper()
        source_path = os.path.join(path, f)
        dest_path = os.path.join(DEST, starting_letter)
        if not os.path.isdir(dest_path):
            os.makedirs(dest_path)
        dest_fullfile = os.path.join(dest_path, f)
        if os.path.exists(dest_fullfile):
            periodIndex = source_path.rfind(".")
            renamed_soruce_path = source_path[:periodIndex] + "_" + str(random.randint(100000, 999999)) + source_path[periodIndex:]
            os.rename(source_path, renamed_soruce_path)
            copy2(renamed_soruce_path, dest_path)
            os.remove(renamed_soruce_path)
        else:
            copy2(source_path, dest_path)
            os.remove(source_path)
Here's a simple script that does what you want. It doesn't tell you anything about what it's doing, and will just overwrite the old file if there are two files with the same name.
import os
from shutil import copy2
SOURCE = "c:\\source\\"
DEST = "c:\\dest\\"
# Iterate recursively through all files and folders under the source directory
for path, dirs, files in os.walk(SOURCE):
    # For each directory iterate over the files
    for f in files:
        # Grab the first letter of the filename
        starting_letter = f[0].upper()
        # Construct the full path of the current source file
        source_path = os.path.join(path, f)
        # Construct the destination path using the first letter of the
        # filename as the folder
        dest_path = os.path.join(DEST, starting_letter)
        # Create the destination folder if it doesn't exist
        if not os.path.isdir(dest_path):
            os.makedirs(dest_path)
        # Copy the file to the destination path + starting_letter
        copy2(source_path, dest_path)
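The script above copies rather than moves, and it overwrites files with the same name. A hedged Python 3 variant that moves the files and appends a numeric suffix on a name clash might look like this. It assumes the destination lies outside the source tree, and the helper names unique_path and move_by_first_letter are invented for the example:

```python
import os
import shutil


def unique_path(path):
    """Return `path`, or `path` with _1, _2, ... before the extension
    if something already exists at that location."""
    stem, ext = os.path.splitext(path)
    candidate, n = path, 1
    while os.path.exists(candidate):
        candidate = "{}_{}{}".format(stem, n, ext)
        n += 1
    return candidate


def move_by_first_letter(source, dest):
    """Move every file under `source` into dest/<FIRST LETTER>/."""
    for path, dirs, files in os.walk(source):
        for f in files:
            letter_dir = os.path.join(dest, f[0].upper())
            os.makedirs(letter_dir, exist_ok=True)
            # Pick a free name instead of overwriting an existing file.
            target = unique_path(os.path.join(letter_dir, f))
            shutil.move(os.path.join(path, f), target)
```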
I suspect this will work in PowerShell.
gci -path c:\somedir -filter * -recurse |
where { -not ($_.PSIsContainer) } |
foreach { move-item -path $_.FullName -destination $_.Substring(0, 1) }
ls c:\somedir\* -recurse | ? { -not ($_.PSIsContainer)} | mv -destination "C:\someotherdir\$($_.Name.substring(0,1))" } ... -whatif :P
Here's an answer in Python. Note the warning message: you may want to deal with overwrites differently. Also, save this to a file in the root directory and run it from there; otherwise you have to change the argument to os.walk and also how the paths are joined together.
import os
import sys
try:
    letter = sys.argv[1]
except IndexError:
    print 'Specify a starting letter'
    sys.exit(1)
try:
    os.makedirs(letter)
except OSError:
    pass # already exists
for dirpath, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        if filename.startswith(letter):
            src = os.path.join(dirpath, filename)
            dst = os.path.join(letter, filename)
            if os.path.exists(dst):
                print 'warning, existing', dst, 'being overwritten'
            os.rename(src, dst)
Sure, I'll help: look at os.walk, which is available in both Python 2 and Python 3. (Python 2 also has a callback-based os.path.walk, which was removed in Python 3.)

Python file-io code listing current folder path instead of the specified

I have the code:
import os
import sys
fileList = os.listdir(sys.argv[1])
for file in fileList:
    if os.path.isfile(file):
        print "File >> " + os.path.abspath(file)
    else:
        print "Dir >> " + os.path.abspath(file)
The script is located in my music folder ("/home/tom/Music").
When I call it with:
python test.py "/tmp"
I expected it to list my "/tmp" files and folders with the full path. But it printed lines like:
Dir >> /home/tom/Music/seahorse-gw2jNn
Dir >> /home/tom/Music/FlashXX9kV847
Dir >> /home/tom/Music/pulse-DcIEoxW5h2gz
That is, the file names are correct but the path is wrong (and these files are not in my Music folder either). What's wrong with this code?
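os.listdir() returns bare names, and os.path.isfile() and os.path.abspath() resolve a bare name against the current working directory. You need to include the full path to the file when you check for existence and print the path:
dir = sys.argv[1]
fileList = os.listdir(dir)
for file in fileList:
    file = os.path.join(dir, file) # Get the full path to the file.
    # etc...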
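A complete Python 3 version of that fix could look like the sketch below. The function name classify_entries is made up for illustration, and it returns the results instead of printing them so that they are easy to inspect:

```python
import os


def classify_entries(directory):
    """Return (kind, absolute_path) pairs for each entry in `directory`."""
    results = []
    for name in sorted(os.listdir(directory)):
        # Join with the listed directory; a bare name would be resolved
        # against the current working directory instead.
        full = os.path.join(directory, name)
        kind = "File" if os.path.isfile(full) else "Dir"
        results.append((kind, os.path.abspath(full)))
    return results
```

Printing each pair as kind + " >> " + path reproduces the output format of the original script, with the paths now pointing into the listed directory.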
