Find Python files according to their content - python

Using File Explorer in Windows, we can find files by typing part of a file name in the Search box. With the Advanced options, we can even find a file according to its content.
Is it possible to search Python files based on their content without "manually opening each file and viewing it in a viewer or editor program"? I use Jupyter Lab to create Python files.
For example, I want to find Python files that contain dayfirst.
Thanks for help.

To enable content search using Windows Explorer, you can set up your Windows indexing options to include the contents of .py files. Here is a step by step guide:
https://www.howtogeek.com/99406/how-to-search-for-text-inside-of-any-file-using-windows-search/
(Also make sure that the folder where you keep your .py files is indexed by Windows.)

Take a look at pathlib.
Relevant points/example:
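from pathlib import Path

p = Path('.')
# Only look at regular .py files in the current directory
files = [x for x in p.iterdir() if x.is_file() and x.suffix == '.py']
found_files = []
for file in files:
    with file.open(encoding='utf-8', errors='ignore') as f:
        for line in f:
            if 'dayfirst' in line:
                found_files.append(file)
                break  # one match is enough; avoids duplicate entries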
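The example above only looks at the current directory. A minimal sketch of a recursive variant follows, assuming the files are UTF-8 text; the function name `find_files_containing` and its `pattern` parameter are hypothetical names chosen for illustration, not part of pathlib.

```python
from pathlib import Path


def find_files_containing(root, needle, pattern="*.py"):
    """Return files under root (searched recursively) whose text contains needle."""
    matches = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="ignore" keeps an unexpected encoding from aborting the scan
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            matches.append(path)
    return matches


# Example call (searches the current directory and all subfolders):
#     for hit in find_files_containing(".", "dayfirst"):
#         print(hit)
```

Reading each file in one go keeps the code short; for very large files, scanning line by line as in the answer above would use less memory.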

Related

Modify all files in specified directory (including subfolders) and saving them in new directory while preserving folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best looking for similar issues but surprisingly couldn't find someone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have the code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)
To manipulate files and directories, you will need to import some system functionality from the built-in module os.
import os
The functionality under the os module includes:
Listing the content of a directory:
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist):
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError as e:
    print('Directory exists:', path_to_output_dir)
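As a possible alternative to the try/except above, `os.makedirs` with `exist_ok=True` creates the directory along with any missing parent directories and does nothing if it already exists. A minimal sketch; the helper name `ensure_dir` is hypothetical:

```python
import os


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Example call: ensure_dir('www/blog/2020')
```

This matters once subfolders are involved, because `os.mkdir` fails when a parent directory is missing.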
Since you know the names of the directories you want to use, and these two functions give you the names of the files you want to read and generate, you can concatenate each file name with its directory name to build the final file path as a string, which you can then open() for reading and/or writing.
It's hard to give a perfect code example for your question, since the logic of how you want to combine each template and content file is missing, but here is an example of writing a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And here is an example of reading all the template files inside the template/ directory and printing them to the screen:
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about creating the path string you want to read from or write to, and then accessing it.
Any other functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found under the os module.
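Putting the pieces together for the original question, here is a minimal sketch that walks a content directory recursively and recreates its folder structure in the output directory. It uses pathlib rather than os, assumes UTF-8 text files, and reuses the '!PlaceholderContent!' marker from the question; the function name `build_site` and its parameters are hypothetical.

```python
from pathlib import Path


def build_site(content_dir, template_file, output_dir,
               placeholder="!PlaceholderContent!"):
    """Render every .txt under content_dir into output_dir as an .html file."""
    content_dir = Path(content_dir)
    output_dir = Path(output_dir)
    template = Path(template_file).read_text(encoding="utf-8")
    written = []
    for txt in sorted(content_dir.rglob("*.txt")):
        # Path relative to the content root, e.g. blog/post.txt
        relative = txt.relative_to(content_dir)
        target = (output_dir / relative).with_suffix(".html")
        # Recreate the subfolder structure before writing
        target.parent.mkdir(parents=True, exist_ok=True)
        page = template.replace(placeholder, txt.read_text(encoding="utf-8"))
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written


# Example call: build_site('content', 'template/page.html', 'www')
```

The key step is `relative_to`, which yields the part of each path below the content root so that the same subpath can be appended to the output directory.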

Duplicate in list created from filenames (python)

I'm trying to create a list of Excel files that are saved to a specific directory, but when the list is generated it contains a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path =r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates, I was hoping to figure out why it's happening to begin with.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$ (probably an auto-backup).
Whenever you open an Excel file, Excel creates a hidden temporary companion file, prefixed with ~$, next to it. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
This means that the file is currently open in some program, and glob is picking up that temporary file (it is usually hidden in Explorer as well).
Just find the program and close it. If you want to guard against this in general, also add a check so that files matching "~$*.xlsx" are ignored.
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped using the following code:
import os

filenames = []
# Raw string so the backslashes in the Windows path are taken literally
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)
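If you would rather keep the glob call from the question, which returns full paths instead of bare file names, the same filter can be applied to the base name of each match. A minimal sketch; the function name `list_excel_files` is hypothetical:

```python
import glob
import os


def list_excel_files(folder):
    """Return .xlsx paths in folder, skipping Excel's '~$' temporary files."""
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))


# Example call: filenames = list_excel_files(r'D:\larvalSchooling\data')
```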

File based strings/variables to set file path etc in python operation

I am trying to create part of a program that will take the values found in two CFG files and use them to determine what file type to search for as well as what folder location to use. The code I found online more or less suits my needs; however, I would like to avoid a hard-coded file path. Here is the code I have modified so far:
import glob
location = open("config.cfg", encoding = 'cp1252')
location = location.read()
filetype = open("filetype.cfg", encoding = 'cp1252')
filetype = filetype.read()
fileset = [file for file in glob.glob(location + filetype, recursive=True)]
print(location)
print(filetype)
for file in fileset:
    print(file)
The config.cfg contains one line, which is the file path to a folder with 3 sample JPG files in it.
C:/test
The filetype.cfg contains one line as well, which is the file type to search for
"**/*.jpg"
I've gotten to the point where this code throws no errors, but it doesn't work as intended either: it seems to read the two config files properly, but it doesn't list the files in the folder. As described above, config.cfg contains the folder path (C:/test) and filetype.cfg contains "**/*.jpg", the type of file I would like to search for. I found the original code here: https://www.techbeamers.com/python-list-all-files-directory/ (look under the 'glob' method).
The original (fully working) code from the link above:
import glob
location = 'c:/test/temp/'
fileset = [file for file in glob.glob(location + "**/*.py", recursive=True)]
for file in fileset:
    print(file)
Using Python 3.8 64bit on Windows 10.
Moved from an edit to the question by the OP to an answer.
Remove the quotes around "**/*.jpg" in the filetype.cfg file:
**/*.jpg
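A more forgiving variant could strip stray whitespace and surrounding quotes when reading each config file, and join the two values with os.path.join so that a folder path without a trailing slash (such as C:/test) still combines cleanly with the pattern. This is a minimal sketch under those assumptions; the helper names `read_setting` and `find_files` are hypothetical.

```python
import glob
import os


def read_setting(path):
    """Read a one-line config file, dropping whitespace and wrapping quotes."""
    with open(path, encoding="cp1252") as f:
        return f.read().strip().strip('"\'')


def find_files(location_cfg, filetype_cfg):
    """Return the files matching the pattern file inside the location file's folder."""
    location = read_setting(location_cfg)
    pattern = read_setting(filetype_cfg)
    # os.path.join supplies the separator that 'C:/test' + '**/*.jpg' lacks
    return sorted(glob.glob(os.path.join(location, pattern), recursive=True))


# Example call:
#     for file in find_files("config.cfg", "filetype.cfg"):
#         print(file)
```

Without the separator, the combined pattern becomes C:/test**/*.jpg, where ** is no longer a path component of its own and therefore does not recurse into subfolders.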

vscode - read file from current folder where .py file is

I'm very new to programming, and to vscode.
I'm learning Python and currently I am learning about working with files.
The path looks like this: /home/anewuser/learning/chapter10.
The problem: a completely basic "read a file in Python" lesson does not work in vscode. Running my .py file, located in ~/learning/chapter10, raises a "no such file or directory" error. vscode expects the .txt file I am supposed to open to be in the ~/learning directory; if I put it there, it works. I don't like this behaviour.
All I want is to be able to read a file placed in the directory where the .py file is. How do I do this?
Because in your case ~/learning is the default cwd (current working directory), VSCode looks for pi_digits.txt in that location. If you put pi_digits.txt beside file_reader.py (which is located at ~/learning/chapter10), you'll have to specify the path (by prepending chapter10/ to the .txt file).
So you should do this:
with open('chapter10/pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents)
If you want to change the default current working directory (for example you want to change it to ~/learning/chapter10), you'll have to do the following:
~/learning/chapter10/file_reader.py
import os # first you need to import the module 'os'
# set the cwd to 'chapter10'
os.chdir('chapter10')
# now 'file_reader.py' and 'pi_digits.txt' are both in the cwd
with open('pi_digits.txt') as file_object:
    contents = file_object.read()
print(contents)
With os.chdir('chapter10') you've set chapter10 as the default cwd, in which VSCode now will look for pi_digits.txt.
For detailed information about os.chdir() you can read through the official documentation or take a look at this post on stackoverflow.
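Another option that does not depend on the current working directory at all is to build the path from the location of the script itself, which Python exposes as `__file__` when a .py file is run. A minimal sketch; the helper name `path_beside` is hypothetical:

```python
from pathlib import Path


def path_beside(script_file, name):
    """Return the path of `name` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / name


# Inside file_reader.py this would be called as:
#     with open(path_beside(__file__, 'pi_digits.txt')) as file_object:
#         contents = file_object.read()
```

With this approach the script finds pi_digits.txt next to itself regardless of which folder VSCode or a terminal was started in.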
In "User Settings", use the search bar to look for "python.terminal.executeInFileDir" and set (=) its value to "true" instead of "false".
I took this answer from here: How to run python interactive in current file's directory in Visual Studio Code?
This is my first time posting an answer on StackOverflow, so I apologize if I didn't do it the right way.

Compare archiwum.rar content and extracted data from .rar in the folder on Windows 7

Does anyone know how to compare the number and sizes of the files in archiwum.rar with its extracted content in the folder?
The reason I want to do this is that the server I'm working on was restarted a couple of times during extraction, and I am not sure whether all the files were extracted correctly.
The .rar files are more than 100 GB each, and the server is not that fast.
Any ideas?
P.S. If the solution is code rather than a standalone program, my preference is Python.
Thanks
In Python you can use the rarfile module. Its usage is similar to the built-in zipfile module.
import os.path
import rarfile  # third-party module

extracted_dir_name = "samples/sample"  # Directory with extracted files
archive = rarfile.RarFile("samples/sample.rar", "r")
# List file information
for info in archive.infolist():
    print(info.filename, info.date_time, info.file_size)
    # Compare with the extracted file here
    extracted_file = os.path.join(extracted_dir_name, info.filename)
    if os.path.isdir(extracted_file):
        continue  # directory entry, nothing to compare
    if not os.path.exists(extracted_file):
        print("Missing:", extracted_file)
    elif info.file_size != os.path.getsize(extracted_file):
        print("Different size!")
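To get a summary instead of line-by-line output, the comparison can be separated from the archive reading. The sketch below takes a plain dictionary of expected sizes, so it does not depend on rarfile itself; the function name `compare_sizes` and the dictionary layout are assumptions made for illustration.

```python
import os


def compare_sizes(expected_sizes, extracted_dir):
    """Compare {relative_path: size} against the files under extracted_dir.

    Returns (missing, mismatched) as sorted lists of relative paths.
    """
    missing, mismatched = [], []
    for relative_path, size in expected_sizes.items():
        local = os.path.join(extracted_dir, relative_path)
        if not os.path.isfile(local):
            missing.append(relative_path)
        elif os.path.getsize(local) != size:
            mismatched.append(relative_path)
    return sorted(missing), sorted(mismatched)
```

The dictionary could be built from the archive listing shown above, for example with `{info.filename: info.file_size for info in archive.infolist()}`. Directory entries would need to be filtered out first, since this sketch reports anything that is not a regular file as missing. Comparing `len(expected_sizes)` with the number of files found on disk then covers the file count as well.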
