Python os.walk doesn't copy files from upper directory

I have some files that are generated in two different directories.
D:\project\external\gerenateExternal
contains: 1.txt, 2.txt, 3.txt
D:\project\main\generateMain
contains: a.txt, b.txt, 3.txt
I want to copy all files from both directories to D:\project\main\targetDir
My Python script is D:\project\main\copy.py
import os
import shutil
import os.path as path
pyfile = os.path.dirname(os.path.abspath(__file__))
pathExternal = os.path.abspath(pyfile + '\..\external\gerenateExternal')
pathMain = os.path.abspath(pyfile + '\generateMain')
targetdir = os.path.abspath(pyfile + '\targetDir')
for p in [ pathMain , pathExternal ]:
    print(p)
    for path, dirs, files in os.walk(p):
        print(files)
        for file in files:
            if file.endswith(".txt"):
                shutil.copy(os.path.join(path, file), targetdir)
Only the files from pathMain are copied.
I found that files in a folder at the same level as, or below, the current Python file (copy.py) can be copied,
but files in a directory above the current Python file (copy.py) cannot.
How can I copy files from a directory above the current Python file?

I don't quite understand why you are using os.walk: you already know the two folders that contain the files, so you could use them directly.
You could try the following:
from pathlib import Path
from shutil import copy
from itertools import chain
pyfile = Path(__file__).resolve().parent # Should be: D:\project\main
pathExternal = pyfile.parent / Path(r'external\generateExternal') # Should be: D:\project\external\generateExternal (use 'gerenateExternal' if the folder is really spelled that way)
pathMain = pyfile / Path('generateMain') # Should be: D:\project\main\generateMain
targetdir = pyfile / Path('targetDir') # Should be: D:\project\main\targetDir
targetdir.mkdir(exist_ok=True) # In case targetdir doesn't exist
for file in chain(pathExternal.glob('*.txt'), pathMain.glob('*.txt')):
    copy(file, targetdir)
If you want something more os.walk-like you could replace the loop with
...
for file in pyfile.parent.rglob('*.txt'):
    copy(file, targetdir)

The code should work just fine. There is a minor typo in "gerenateExternal". Please check whether the actual directory has the same name.
In addition, to prevent the "\t" in '\targetDir' from being interpreted as a tab, I would suggest escaping the backslash, using a forward slash, or joining the path components, e.g.
targetdir = os.path.abspath(pyfile + '\\targetDir')
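Both points can be checked quickly. The sketch below shows that '\targetDir' really starts with a tab character, builds the path with os.path.join instead, and reports source folders that do not exist. The last check matters because os.walk yields nothing for a missing directory unless an onerror callback is passed, so a misspelled folder name produces no error. The helper names build_target and missing_dirs are made up for this illustration.

```python
import os

# '\t' inside a normal string literal is a tab, not a backslash followed by 't'.
broken = '\targetDir'
assert broken[0] == '\t'


def build_target(base):
    """Join the components instead of concatenating backslashes."""
    return os.path.join(base, 'targetDir')


def missing_dirs(paths):
    """Return the paths that are not existing directories (hypothetical helper)."""
    return [p for p in paths if not os.path.isdir(p)]
```

Calling missing_dirs([pathMain, pathExternal]) before the loop would show right away whether one of the source folders is misspelled.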

Related

Get files from specific folders in python

I have the following directory structure with the following files:
Folder_One
├─file1.txt
├─file1.doc
└─file2.txt
Folder_Two
├─file2.txt
├─file2.doc
└─file3.txt
I would like to get only the .txt files from each folder listed. Example:
Folder_One-> file1.txt and file2.txt
Folder_Two-> file2.txt and file3.txt
Note: This entire directory is inside a folder called dataset. My code looks like this, but I believe something is missing. Can someone help me?
path_dataset = "./dataset/"
filedataset = os.listdir(path_dataset)
for i in filedataset:
    pasta = ''
    pasta = pasta.join(i)
    for file in glob.glob(path_dataset+"*.txt"):
        print(file)
from pathlib import Path
for path in Path('dataset').rglob('*.txt'):
    print(path.name)
Using glob
import glob
for x in glob.glob('dataset/**/*.txt', recursive=True):
    print(x)
You can use the re module to check that the filename ends with .txt.
import re
import os
path_dataset = "./dataset/"
l = os.listdir(path_dataset)
for e in l:
    if os.path.isdir("./dataset/" + e):
        ll = os.listdir(path_dataset + e)
        for file in ll:
            if re.match(r".*\.txt$", file):
                print(e + '->' + file)
Another option is to find all the files with the os module (convenient if you already use this module):
import os
# get the current directory; you may also provide an absolute path
path = os.getcwd()
# walk recursively through all folders and gather information
for root, dirs, files in os.walk(path):
    # keep only the files whose name ends with .txt
    check = [f for f in files if f.endswith(".txt")]
    if check:
        print(root, check)
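To get the grouping asked for in the question (each folder followed by its .txt files), the results can be collected per folder. This is a minimal sketch, assuming the layout from the question (dataset/Folder_One, dataset/Folder_Two); the function name txt_files_by_folder is made up for this example.

```python
from pathlib import Path


def txt_files_by_folder(dataset):
    """Map each direct subfolder of `dataset` to a sorted list of its .txt file names."""
    result = {}
    for folder in sorted(Path(dataset).iterdir()):
        if folder.is_dir():
            result[folder.name] = sorted(p.name for p in folder.glob('*.txt'))
    return result


# Possible usage, printing one line per folder:
# for folder, names in txt_files_by_folder('dataset').items():
#     print(folder + '-> ' + ' and '.join(names))
```

Only the direct subfolders are inspected here; for deeper nesting, rglob as shown above would be the better fit.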

Copy all the files which begin with the same name to a different directory in python

My directory looks like this and contains a few files.
Directory
--111_file.txt
--222_file.txt
--111_file2.txt
--222_sample.txt
I want to copy all the files that start with 111 to one directory and all the files that start with 222 to a different directory. I am confused about how to traverse the directory and find the files that start with the same prefix.
The following bash script copies all the files with pattern matching:
cp 111* dir1;
cp 222* dir2;
In Python you can use the shutil library.
For example:
import shutil
import os
prefix_1 = '111'
prefix_2 = '222'
curr_working_dir = os.getcwd()
target_1 = 'target_path_1' # both targets must be existing directories
target_2 = 'target_path_2'
files = os.listdir(curr_working_dir) # Path which includes your source files
for file in files:
    src = os.path.join(curr_working_dir, file)
    if file.startswith(prefix_1):
        shutil.copy(src, target_1)
    elif file.startswith(prefix_2):
        shutil.copy(src, target_2)
If you want to implement that in a program using Python, you can use the shutil module, e.g:
# importing shutil module
import shutil
# Source path
source = '/Users/path/to/source'
# Destination path
destination = '/Users/path/to/destination'
# Move the content of source to destination
dest = shutil.move(source, destination)
To search a directory and its subdirectories recursively for files that match a condition, you can combine glob with re:
import os
import re
from glob import glob
# Source path
source = '/Users/path/to/source'
files = glob(source + '/**', recursive=True) # '/**' and recursive=True allow searching in subdirectories
files_to_move = [f for f in files if re.match(r'^\d', os.path.split(f)[1])] # r'^\d' matches every name that starts with a digit
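Putting the pieces together for the original question, here is a minimal sketch that copies files into different directories depending on their prefix. The function name copy_by_prefix and the prefix-to-directory mapping are assumptions made for this example, not part of the answers above.

```python
import shutil
from pathlib import Path


def copy_by_prefix(source, targets):
    """Copy files in `source` whose names start with a key of `targets`.

    `targets` maps a prefix such as '111' to a destination directory.
    Returns the names of the copied files.
    """
    copied = []
    for path in sorted(Path(source).iterdir()):
        if not path.is_file():
            continue
        for prefix, dest in targets.items():
            if path.name.startswith(prefix):
                # Create the destination on demand so shutil.copy gets a directory.
                Path(dest).mkdir(parents=True, exist_ok=True)
                shutil.copy(path, dest)
                copied.append(path.name)
                break
    return copied
```

A call such as copy_by_prefix('Directory', {'111': 'dir1', '222': 'dir2'}) would mirror the two cp commands from the bash answer.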

shutil.move() only works with existing folder?

I would like to use the shutil.move() function to move some files that match a certain pattern to a folder newly created inside the Python script, but it seems that this function only works with existing folders.
For example, I have 'a.txt', 'b.txt', 'c.txt' in the folder '/test', and I would like to create a folder '/test/b' in my Python script using os.path.join() and move all .txt files to the folder '/test/b'
import os
import shutil
import glob
files = glob.glob('./*.txt') #assume that we in '/test'
for f in files:
    shutil.move(f, './b') #assume that './b' already exists
#the above code works as expected, but the following not:
import os
import shutil
import glob
new_dir = 'b'
parent_dir = './'
path = os.path.join(parent_dir, new_dir)
files = glob.glob('./*.txt')
for f in files:
    shutil.move(f, path)
#After that, I got only 'b' in '/test', and 'cd b' gives:
#[Errno 20] Not a directory: 'b'
Any suggestion is appreciated!
The problem is that when you build the destination path:
path = os.path.join(parent_dir, new_dir)
the path doesn't exist yet, because os.path.join only builds a string and does not create anything on disk. So shutil.move works, but not the way you expect. It behaves like a standard mv command: each file is moved into the parent directory under the name "b", overwriting the previous one, so only the last file remains (very dangerous, because of the risk of data loss).
Create the directory first if it doesn't exist:
path = os.path.join(parent_dir, new_dir)
if not os.path.exists(path):
    os.mkdir(path)
Now shutil.move will move the files into b, because b is an existing directory.
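The same fix can be wrapped in a small function. This is a sketch under the assumptions of the question (all .txt files of one folder go into a new subfolder); the function name move_txt_files is made up, and os.makedirs(..., exist_ok=True) replaces the explicit existence check.

```python
import glob
import os
import shutil


def move_txt_files(src_dir, new_dir_name):
    """Move every .txt file in `src_dir` into `src_dir/new_dir_name`, creating it first."""
    dest = os.path.join(src_dir, new_dir_name)
    os.makedirs(dest, exist_ok=True)  # no error if the directory already exists
    moved = []
    for f in sorted(glob.glob(os.path.join(src_dir, '*.txt'))):
        shutil.move(f, dest)  # dest is a directory, so the file keeps its name
        moved.append(os.path.basename(f))
    return moved
```

Because the destination is created before the first move, none of the files can be renamed to "b" and overwritten by the next one.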

Keeping renamed text files in original folder

This is my current code (from a Jupyter notebook) for renaming some text files.
The issue is that when I run the code, the renamed files are placed in my current working Jupyter folder. I would like the files to stay in the original folder.
import glob
import os
path = 'C:\data_research\text_test\*.txt'
files = glob.glob(r'C:\data_research\text_test\*.txt')
for file in files:
    os.rename(file, file[-27:])
You should change only the name and keep the path the same. Your file names will not always be exactly 27 characters long, so hard-coding that number in your code is not ideal. What you want is something that separates the name from the path, no matter the name, no matter the path. Something like:
import os
import glob
path = r'C:\data_research\text_test\*.txt' # raw string, so '\t' is not read as a tab
files = glob.glob(path)
for file in files:
    old_name = os.path.basename(file) # now this is just the name of your file
    # now you can do something with the name... here I'll just add new_ to it.
    new_name = 'new_' + old_name # or do something else with it
    new_file = os.path.join(os.path.dirname(file), new_name) # now we put the path and the name together again
    os.rename(file, new_file) # and now we rename.
On Windows, os.path already is the ntpath implementation, so you only need to import ntpath explicitly when you handle Windows-style paths on another platform.
file[-27:] takes the last 27 characters of the full path, so unless every file name is exactly 27 characters long, the result is not the name you expect. When it does work, you've stripped off the directory name, so the file is moved to your current working directory. os.path has utilities to manage file names and you should use them:
import glob
import os
path = r'C:\data_research\text_test\*.txt'
files = glob.glob(path)
for file in files:
    dirname, basename = os.path.split(file)
    # I don't know how you want to rename so I made something up
    newname = basename + '.bak'
    os.rename(file, os.path.join(dirname, newname))
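For comparison, the same idea can be written with pathlib, where Path.with_name replaces only the last path component. This is a minimal sketch; the function name rename_in_place is hypothetical, and how the new name is derived is left to the caller.

```python
from pathlib import Path


def rename_in_place(file, new_name):
    """Rename `file` to `new_name` inside the same directory and return the new path."""
    src = Path(file)
    dst = src.with_name(new_name)  # same parent directory, different file name
    src.rename(dst)
    return dst
```

For example, rename_in_place(file, 'new_' + Path(file).name) keeps the file in its original folder regardless of the current working directory.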

how to get a folder name and file name in python

I have a python program named myscript.py which would give me the list of files and folders in the path provided.
import os
import sys
def get_files_in_directory(path):
    for root, dirs, files in os.walk(path):
        print(root)
        print(dirs)
        print(files)
path=sys.argv[1]
get_files_in_directory(path)
The path I provided is D:\Python\TEST, and it contains some folders and subfolders, as you can see in the output below:
C:\Python34>python myscript.py "D:\Python\Test"
D:\Python\Test
['D1', 'D2']
[]
D:\Python\Test\D1
['SD1', 'SD2', 'SD3']
[]
D:\Python\Test\D1\SD1
[]
['f1.bat', 'f2.bat', 'f3.bat']
D:\Python\Test\D1\SD2
[]
['f1.bat']
D:\Python\Test\D1\SD3
[]
['f1.bat', 'f2.bat']
D:\Python\Test\D2
['SD1', 'SD2']
[]
D:\Python\Test\D2\SD1
[]
['f1.bat', 'f2.bat']
D:\Python\Test\D2\SD2
[]
['f1.bat']
I need to get the output this way :
D1-SD1-f1.bat
D1-SD1-f2.bat
D1-SD1-f3.bat
D1-SD2-f1.bat
D1-SD3-f1.bat
D1-SD3-f2.bat
D2-SD1-f1.bat
D2-SD1-f2.bat
D2-SD2-f1.bat
How do I get the output this way? (Keep in mind that the directory structure here is just an example. The program should be flexible enough for any path.)
Is there any os command for this? Can you please help me solve this? (Additional information: I am using Python 3.4.)
You could try using the glob module instead:
import glob
glob.glob(r'D:\Python\Test\*\*\*.bat')
Or, to just get the filenames:
import os
import glob
[os.path.basename(x) for x in glob.glob(r'D:\Python\Test\*\*\*.bat')]
To get what you want, you could do the following:
import os
import sys

def get_files_in_directory(path):
    # Get the root dir (in your case: test)
    rootDir = path.split('\\')[-1]
    # Walk through all subfolder/files
    for root, subfolder, fileList in os.walk(path):
        for file in fileList:
            # Skip empty dirs
            if file != '':
                # Get the full path of the file
                fullPath = os.path.join(root,file)
                # Split the path and the file (may do this and the step above in one go)
                path, file = os.path.split(fullPath)
                # For each subfolder in the path (in REVERSE order)
                subfolders = []
                for subfolder in path.split('\\')[::-1]:
                    # As long as it isn't the root dir, append it to the subfolders list
                    if subfolder == rootDir:
                        break
                    subfolders.append(subfolder)
                # Print the list of subfolders (joined by '-')
                # + '-' + file
                print('{}-{}'.format( '-'.join(subfolders), file) )
path=sys.argv[1]
get_files_in_directory(path)
Output for my test folder:
SD1-D1-f1.bat
SD1-D1-f2.bat
SD2-D1-f1.bat
SD3-D1-f1.bat
SD3-D1-f2.bat
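It may not be the best way to do it, but it gets you close to what you want. Note that the subfolders are printed innermost first (SD1-D1 rather than D1-SD1), so reverse the subfolders list before joining if you need the order from the question.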
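An alternative that yields the order requested in the question (D1-SD1-f1.bat) is to let os.path.relpath compute the part of each folder below the starting directory. This is a minimal sketch, not the answerer's method; the function name dashed_names is made up for this example, and it returns the names instead of printing them.

```python
import os


def dashed_names(top):
    """Return 'D1-SD1-f1.bat'-style names for every file below `top`."""
    names = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subfolders in a stable, alphabetical order
        rel = os.path.relpath(root, top)
        # Files directly in `top` have no folder prefix.
        parts = [] if rel == os.curdir else rel.split(os.sep)
        for name in sorted(files):
            names.append('-'.join(parts + [name]))
    return names
```

Since it relies on os.sep rather than a hard-coded backslash, the same code also works on non-Windows systems, and print('\n'.join(dashed_names(sys.argv[1]))) would produce the listing from the question.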
