FileNotFoundError with filepath that has whitespaces using ZipFile extract

I have a zip file with this structure:
Report
│
├───folder1
│   │
│   └───subfolder1
│       │
│       │   file 1 2022.txt
│
└───folder2
        file2.txt
And their relative file paths are as follows: Report/folder1 / subfolder1 / file 1 2022.txt and Report/folder2/file2.txt
I tried to extract the zip file to another destination using the following code:
from zipfile import ZipFile
with ZipFile(attachment_filepath, 'r') as z:
    z.extractall('Destination')
However, it gives me a FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\myname\\Desktop\\Report\\folder1 \\ subfolder1 '
I can extract just file2.txt without any problems, but trying to extract file 1 2022.txt gives me that error, presumably due to all the extra whitespace.

"folder1 " (note the space) isn't the same as "folder1" (no space). When passing a path, it has to be the exact path. You can't add whitespace between path separators because the file system will assume you want a path name with spaces. Whatever put those spaces into the path is the problem.

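If the archive cannot be regenerated without the stray spaces, one possible workaround is to extract the members one by one and clean up each path component on the way. The following is a minimal sketch, assuming that stripping leading and trailing whitespace from every folder and file name is acceptable for your data. extract_stripping_spaces is a hypothetical helper written for this illustration, not part of zipfile.

```python
import os
import shutil
import zipfile
from pathlib import PurePosixPath


def extract_stripping_spaces(zip_path, dest_dir):
    """Extract zip_path into dest_dir, stripping whitespace around each path component.

    Hypothetical helper: names such as 'folder1 ' become 'folder1'.
    Returns the list of files written.
    """
    written = []
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            # Zip member names always use '/' as the separator.
            parts = [p.strip() for p in PurePosixPath(info.filename).parts]
            # Drop empty, root and parent components so nothing escapes dest_dir.
            parts = [p for p in parts if p and p not in (".", "..", "/")]
            if not parts:
                continue
            target = os.path.join(dest_dir, *parts)
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with z.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Note that two folders whose names differ only in surrounding whitespace would be merged by this approach, so it is only a stopgap until whatever produces the archive is fixed.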
Related

How to get the output file by running Python file in Visual Studio Code?

I am a beginner Python user and use Visual Studio Code as my editor. I recently wrote a Python script that finds all the file/directory names at the same level as the script and then writes text files listing the names that match my rule.
I remember that last month, when I ran this script from Visual Studio Code, the output files appeared in the parent folder (one level up). But today, no output files appear after running it from Visual Studio Code. So I double-clicked the Python file to run it directly, without Visual Studio Code, and the output files appeared at the same level as the script.
So my questions are:
How can I make sure I get the output files when running the script from Visual Studio Code?
How can I generate the output files at the same level as the script being run?
Code:
import os
CurrentScriptDir = os.path.dirname(os.path.realpath(__file__))
All_DirName = []
for root, dirs, files in os.walk(CurrentScriptDir):
    for each_dir in dirs:
        All_DirName.append(each_dir)
for Each_DirName in All_DirName:
    Each_DirName_Split = Each_DirName.split('_')
    if Each_DirName_Split[3] == 'twc':
        unitname = "_".join(Each_DirName_Split[0:-1])
        with open(unitname + ".txt", "a") as file:
            file.write(Each_DirName + "_K3" + "\n")
        file.close()
    else:
        next

tl;dr
Below is how I would write your program while keeping the original code's flow. An explanation follows, and I can update this answer if you provide more details.
To avoid confusion with paths, I would suggest simply requiring the user to provide the path when running the script. The path provided by the user is the path that gets scanned, and is also the location of all text files the script creates; the cwd and the location of the script file are then irrelevant.
import os
import sys
# Usage:
# python Program.py <path>
def find_twc_folders(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            parts = dir.split('_')
            if len(parts) == 4 and parts[3] == 'twc': # 'a_twc', 'a_b_c_twc_d', etc are skipped
                with open(os.path.join(path, dir[:-4] + '.txt'), 'a') as file: # substring with '_twc' removed
                    file.write(dir + '_K3\n')
if __name__ == '__main__':
    if len(sys.argv) > 1:
        find_twc_folders(sys.argv[1])
    else:
        find_twc_folders(os.path.dirname(os.path.realpath(__file__)))
(EDIT: Changed to use the script's directory if program is called with no args).
Folder setup:
Given the following directory setup, with your current working directory (cwd) in the VSCode terminal being one level above root:
PS C:\Users\there\source\repos\SO\75241788> tree /f
C:.
├───.vscode
└───root
│ Program.py
│
├───0_duplicate_path_twc
├───1a_one_two_three
│ ├───0_duplicate_path_twc
│ ├───2a_one_two_three
│ │ ├───0_duplicate_path_twc
│ │ ├───3a_one_two_three
│ │ └───3b_one_two_twc
│ └───2b_one_two_twc
├───1b_one_two_twc
│ ├───2a_one_two_three
│ ├───2b_one_two_three
│ ├───2c_one_two_twc
│ └───2d_one_two_twc
└───1c_one_two_twc
A dry run gives us the following, after replacing the actual file operations with print():
PS C:\Users\there\source\repos\SO\75241788> python root/Program.py
CurrentScriptDir: C:\Users\there\source\repos\SO\75241788\root
in "0_duplicate_path_twc" # <- in top level directory
in "1a_one_two_three"
in "1b_one_two_twc"
open 1b_one_two.txt
print: 1b_one_two_twc_K3\n
in "1c_one_two_twc"
open 1c_one_two.txt
print: 1c_one_two_twc_K3\n
in "0_duplicate_path_twc" # <- in sub level directory
in "2a_one_two_three"
# ...
In the current implementation, you are only pushing the directory name into your array, not the full path. A relative path that is unqualified will be considered rooted under the cwd by the OS, so your script will create all files at the location you see in your terminal to the left of the >.
Operating on folder names alone in this manner also means identical-named folders at different levels will result in multiple (duplicate) entries being added to the same file.
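The cwd behaviour described above can be checked with a small experiment. This is only an illustrative sketch: write_relative is a hypothetical helper, the file name is borrowed from the dry run, and a temporary directory stands in for whatever folder the terminal happens to be in.

```python
import os
import tempfile


def write_relative(name):
    """Create a file from a bare (relative) name and report where it landed."""
    with open(name, "a") as f:
        f.write("demo\n")
    return os.path.realpath(name)


with tempfile.TemporaryDirectory() as tmp:
    previous_cwd = os.getcwd()
    os.chdir(tmp)  # pretend the terminal is "in" tmp
    try:
        landed = write_relative("1b_one_two.txt")
    finally:
        os.chdir(previous_cwd)  # restore the cwd before tmp is removed
    # The file ends up under the cwd, wherever the script itself lives.
    landed_in_cwd = os.path.dirname(landed) == os.path.realpath(tmp)
    print(landed_in_cwd)  # True
```

This is why the same script can write its files to different places depending on how it is launched: the editor's terminal and a double-click may start it with different working directories.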
Code fixes
The final else in your program is unnecessary, since the for loop moves on to the next item anyway. As mentioned by @rioV8, next is also being used incorrectly here. He also pointed out that there is no need to close the file in this case, since with does that for you.
As it stands, removing the unneeded All_DirName array, removing the last 3 lines previously mentioned, moving your join operation inline, and prepending your filepaths with CurrentScriptDir, result in:
import os
CurrentScriptDir = os.path.dirname(os.path.realpath(__file__))
for root, dirs, files in os.walk(CurrentScriptDir):
    for each_dir in dirs:
        Each_DirName_Split = each_dir.split('_')
        # todo: check length > 3 first (or) compare last index instead
        if Each_DirName_Split[3] == 'twc':
            unitname = "_".join(Each_DirName_Split[0:-1])
            with open(os.path.join(CurrentScriptDir, unitname) + '.txt', 'a') as file:
                file.write(each_dir + '_K3\n')
Running it in the aforementioned setup will walk all folders found in the folder the script is located in, and save all files to that same folder.
EDIT: Added os.path.join(CurrentScriptDir, ...) in the previous code example to ensure the files are written next to the source program, regardless of current working directory.

Recursive Function to get all files from main folder and subdirectories inside it in Python

I have a file directory that looks something like this. I have a larger directory, but showing this one just for explanation purposes:
.
├── a.txt
├── b.txt
├── foo
│   ├── w.txt
│   └── a.txt
└── moo
    ├── cool.csv
    ├── bad.csv
    └── more
        └── wow.csv
I want to write a recursive function to get year counts for files within each subdirectory within this directory.
I want the code to basically check if it's a directory or file. If it's a directory then I want to call the function again and get counts until there's no more subdirectories.
I have the following code (which keeps breaking my kernel when I test it). There's probably some logic error as well I would think..
import os
import pandas as pd
dir_path = 'S:\\Test'
def getFiles(dir_path):
    contents = os.listdir(dir_path)
    # check if content is directory or not
    for file in contents:
        if os.path.isdir(os.path.join(dir_path, file)):
            # get everything inside subdirectory
            getFiles(dir_path = os.path.join(dir_path, file))
        # it's a file
        else:
            # do something to get the year of the file and put it in a list or something
            pass  # placeholder so the function is syntactically valid
    # at the end create pandas data frame and return
Expected output would be a pandas dataframe that looks something like this..
Subdir  2020  2021  2022
foo        0     1     1
moo        0     2     0
more       1     0     0
How can I do this in Python?
EDIT:
Just realized os.walk() is probably extremely useful for my case here.
Trying to figure out a solution with os.walk() instead of doing it the long way..
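One way the os.walk() idea could look is sketched below. The question does not say how the year of a file is determined, so this sketch assumes it is the year of the file's last-modification time; year_counts_by_subdir is a hypothetical name, and each subdirectory is labelled by its own folder name, as in the expected table.

```python
import os
import time
from collections import Counter, defaultdict


def year_counts_by_subdir(top):
    """Count files per subdirectory and year (year taken from the modification time)."""
    counts = defaultdict(Counter)
    top = os.path.abspath(top)
    for root, dirs, files in os.walk(top):
        if os.path.abspath(root) == top:
            continue  # files directly in the top folder are not part of the table
        subdir = os.path.basename(root)
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            counts[subdir][time.localtime(mtime).tm_year] += 1
    # Plain nested dict: {subdir: {year: count}}
    return {subdir: dict(years) for subdir, years in counts.items()}
```

os.walk() already descends into every subdirectory, so no explicit recursion is needed. Passing the resulting nested dict to pandas.DataFrame.from_dict(..., orient="index") and filling the missing values with 0 should give a table shaped like the expected output. Subdirectories that share a name at different levels would be merged under one label here; keying on the relative path instead would avoid that.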

Getting string between a character and a symbol, without caring about what follows the symbol

To sort some files into folders, I need to get the ID number of both the folders (named p.X, where "p." is fixed and X is a number that can range from 1 to 200150) and the files (named PX_N.gmspr, where P is fixed, X is the ID number of the folder, and N is an identifier of the file, which can be 2, 3, 6, 8, 9, A or H).
An example would be p.24 and P24_2.gmspr, P24_3.gmspr, P24_6.gmspr, P24_8.gmspr, P24_9.gmspr, P24_A.gmspr and P24_H.gmspr, where all P24_N.gmspr files should be moved to p.24.
The PX_N.gmspr files are in a different folder than the target folders p.X. With a little os.chdir and os.rename the files can be moved easily, so I believe that is not a problem.
What I want is to obtain the X number from the filename to compare it with the folder number, ignoring both the P and the _N.gmspr part.
While I can obtain the folder number via
foldername.split(".",1)[1], I don't really know how to do it for the file number.
To sum up, I want to move files called PX_N.gmspr to another folder with an almost identical identifier, p.X.
Any idea? Thank you!!!
EDIT:
Regarding the answer given, I have to clarify what I am trying to do, especially the file and folder format:
Mother folder
├── Unclassified
│   └── All PX_N.gmspr files that have to be moved to other folders; X is a number that ranges from 1 to 200150 (not exactly 200150, it is just an ID number) and N can only be 2, 3, 6, 9, A or H, nothing more. In total 15435 elements, with each X having one of the 6 possible N .gmspr files.
├── First Folder
│   └── p.X folders (X from 1 to 151); the aim is to select all the PX_N.gmspr files whose X number matches the p.X of a folder and move them to that folder.
├── Second Folder
│   └── p.X folders (X from 152 to 251, plus p.602 to p.628, p.823, p.824,
│       p.825, p.881 and p.882)
└── Third Folder
    └── p.X folders (X from 252 to 386, plus p.585, p.586 and p.587)
There are some other folders for sorting more of the 15435 files.
I am currently searching about regex; unluckily for me, it is the first time I actually have to use them.
EDIT (solved): the point was to use regex to get only the numbers; since the matches came back as nested lists, only the first number was useful.

This is the perfect job for regexes.
First, let's create a temporary dir and fill it with some files to demonstrate.
from pathlib import Path
from random import choices, randint
from string import ascii_letters
from tempfile import TemporaryDirectory
tmpdir = TemporaryDirectory()
for _ in range(4):
    n = randint(1, 999)
    for _ in range(randint(1, 5)):
        Path(
            tmpdir.name, f"P{n}.{''.join(choices(ascii_letters, k=10))}"
        ).touch()
Now we have 4 groups of files (named PN.<suffix>), with between 1 and 5 files in each group.
Then we just need to iterate through those files, extract the N from each file name with the regex P(\d+)\..+, and finally create the destination directory and move the file.
from pathlib import Path
import re
dir_re = re.compile(r"P(\d+)\..+")
for filepath in Path(tmpdir.name).iterdir():
    m = dir_re.match(filepath.name)
    if m is None:
        continue  # skip anything that does not look like PN.<suffix>
    dirpath = filepath.parent / f"p.{m.group(1)}"
    if not dirpath.is_dir():
        dirpath.mkdir()
    filepath.rename(dirpath / filepath.name)
For instance, the previously flat temp directory is now sorted as follows.
/var/folders/lf/z7ftpkws0vn7svq8n212czm40000gn/T/tmppve5_m1u/
├── p.413
│   └── P413.yJvxPtuzfz
├── p.705
│   ├── P705.DbwPyiFxum
│   ├── P705.FVwMuSqFms
│   ├── P705.PZyGIQEqSG
│   ├── P705.baRrkcNaZR
│   └── P705.tZKFTKwDah
├── p.794
│   ├── P794.CQTBgXOckQ
│   ├── P794.JNoKsUtgRU
│   └── P794.iSdrdohKYq
└── p.894
    └── P894.XbzFxnqYOY
And finally, clean up the temporary directory.
tmpdir.cleanup()
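The same idea can be adapted to the PX_N.gmspr naming and the p.X folders described in the question's edit. The following is a rough sketch under stated assumptions: N is taken to be a single digit or uppercase letter, the p.X folders may sit anywhere below the "Mother folder", and file_id and sort_files are hypothetical names chosen for this illustration.

```python
import re
import shutil
from pathlib import Path

FILE_RE = re.compile(r"P(\d+)_[0-9A-Z]\.gmspr")  # P24_A.gmspr -> "24"
FOLDER_RE = re.compile(r"p\.(\d+)")              # p.24 -> "24"


def file_id(name):
    """Return the X part of PX_N.gmspr as a string, or None if the name does not match."""
    m = FILE_RE.fullmatch(name)
    return m.group(1) if m else None


def sort_files(unclassified, mother):
    """Move each PX_N.gmspr from `unclassified` into the p.X folder found under `mother`."""
    # Map every folder ID to its p.X directory, wherever it sits below `mother`.
    targets = {}
    for d in Path(mother).rglob("p.*"):
        m = FOLDER_RE.fullmatch(d.name)
        if d.is_dir() and m:
            targets[m.group(1)] = d
    moved = []
    for f in sorted(Path(unclassified).iterdir()):
        x = file_id(f.name)
        if f.is_file() and x in targets:
            shutil.move(str(f), str(targets[x] / f.name))
            moved.append(f.name)
    return moved  # files without a matching p.X folder are left where they are
```

For example, file_id("P24_A.gmspr") returns "24", which is the same string that foldername.split(".", 1)[1] gives for p.24, so the two can be compared directly.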

Tensorflow load many CSVs to `tf.data.Dataset` and use directory as label

I'm trying to load CSV data into a tensorflow Dataset object, but don't know how to associate the label with the CSV files given my directory structure.
I've got a directory structure like:
gesture_data/
├── train/
│ └── gesture{0001..9999}/ <- each directory name is the label
│ └── {timestamp}.txt <- each file is an observation associated with that label
├── test/
└── valid/
Despite having a .txt extension, all the files
gesture_data/{test,train,valid}/gesture{0001..9999}/*.txt are CSV files, with a format like:
│ File: train/gesture0002/2022-05-24T01:59:08.244689+02:00.txt
───────┼─────────────────────────────────────────────────────────────
1 │ 0,391,478,528,374,495,471,405,471,438,396,510,473,401,475,192,383,516,501,412,496,453,395,496,445,376,479,470,402,488,445
2 │ 19,402,488,514,371,494,471,407,472,441,390,514,475,406,488,185,395,499,496,399,488,451,409,490,463,382,490,467,403,487,467
3 │ 40,404,490,526,372,484,487,408,472,441,395,506,477,406,474,193,398,496,504,414,493,459,405,476,446,393,495,467,399,473,447
4 │ 56,400,491,525,370,479,486,386,457,439,383,511,466,406,473,192,398,505,503,411,476,450,412,494,461,389,491,467,397,483,392
5 │ 82,391,478,524,371,483,486,408,473,437,394,513,456,410,483,186,397,500,494,398,491,442,402,490,468,386,495,452,386,491,409
... about 200 more lines after this
Where the first value on a line is milliseconds since the start of recording, and after that are 30 sensor readings taken at that millisecond offset.
Each file is one observation, and the directory the file is in is the label of that observation. So all the files under gesture0001 should have the label gesture0001, all the files under gesture0002 should have the label gesture0002, and so on.
I can't see how to do that easily without making my own custom mapping, but this seems like a common data format and directory structure so I'd imagine there'd be an easier way to do it?
Currently I read in the files like:
gesture_ds = tf.data.experimental.make_csv_dataset(
    file_pattern = "../gesture_data/train/*/*.txt",
    header=False,
    column_names=['millis'] + fingers, # `fingers` is an array labeling each of the sensor measurements
    batch_size=10,
    num_epochs=1,
    num_parallel_reads=20,
    shuffle_buffer_size=10000
)
But I don't know how to label the data from here. I found the label_name parameter to make_csv_dataset but that requires the label name to be one of the columns of the CSV file.
I can restructure the CSV file to include the label name as a column, but I'm expecting a lot of data and don't want to bloat the files if I can possibly help it.
Thanks!
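The "custom mapping" mentioned above might not need much code. The sketch below uses only the standard library to pair every file with a label index derived from its parent directory name, without touching the CSV contents. list_labelled_files and read_observation are hypothetical helpers, and the sketch assumes every value in the files is an integer, as in the sample shown.

```python
import csv
from pathlib import Path


def list_labelled_files(split_dir):
    """Return (paths, labels, label_names) for files laid out as split_dir/<label>/*.txt."""
    label_names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    index = {name: i for i, name in enumerate(label_names)}
    paths, labels = [], []
    for path in sorted(Path(split_dir).glob("*/*.txt")):
        paths.append(str(path))
        labels.append(index[path.parent.name])  # the directory name is the label
    return paths, labels, label_names


def read_observation(path):
    """Parse one CSV file into a list of integer rows (millis followed by sensor readings)."""
    with open(path, newline="") as f:
        return [[int(value) for value in row] for row in csv.reader(f) if row]
```

With TensorFlow, one possible next step is to hand the two parallel lists to tf.data.Dataset.from_tensor_slices((paths, labels)) and map a file-reading function over the result; that part is left out here and has not been tested against this data.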

Append Excel Files in Multiple Directories in Python

My goal is to append 9 excel files together that exist in different directories. I have a directory tree with the following structure:
Big Folder
|
├── folder_1/
| ├── file1.xls
| ├── file2.xls
| └── file3.xls
|
├── folder_2/
| ├── file4.xls
| ├── file5.xls
| └── file6.xls
|
├── folder_3/
| ├── file7.xls
| ├── file8.xls
| └── file9.xls
I successfully wrote a loop that appends file1, file2, and file3 together within folder_1. My idea is to nest this loop into another loop that flows through each folder as a list. I'm currently trying to use os.walk to accomplish this but am running into the following error in folder_1:
[Errno 2] No such file or directory
Do community members have recommendations on how to extend this loop to execute in each directory? Thanks!

It is hard for me to know how you have implemented the program without some code to work with. However, I believe you have misused the os.walk() method; please read its documentation.
I would use the os.walk() method the following way for getting the path to various files in a current directory and subdirectories.
import os
all_files = [(path, files) for path, dirs, files in os.walk(".")]
and then get all the files whose names end with ".xls" like so
all_xls_files = [
    os.path.join(path, xls_file)
    for (path, xls_files_list) in all_files
    for xls_file in xls_files_list
    if xls_file.endswith(".xls")
]
this is equivalent to
all_xls_files = []
for (path, xls_files_list) in all_files:
    for xls_file in xls_files_list:
        if xls_file.endswith(".xls"):
            all_xls_files.append(os.path.join(path, xls_file))
Once you have obtained a list of Excel files with their paths, you can open them like this:
with open("my_output_file", "w") as output_file:
    for file in all_xls_files:
        with open(file) as f:
            # Do your append here
            pass  # placeholder so the block is syntactically valid
