Randomly choose a file inside a folder using Hypothesis - python

I want to add tests using the Hypothesis library (already used in the software for testing).
For these tests, I have to use a set of .txt files contained in a folder.
I need to randomly choose one of these files each time I run my tests.
How can I do that using Hypothesis?
Edit
Here is basically how it would look, to comply with the templates of the already existing tests:
@given(doc=...)
def mytest(doc):
    # assert some stuff according to doc
    assert some_stuff

Static case
If the file list is assumed to be "frozen" (no files will be deleted/added), then we can use os.listdir + hypothesis.strategies.sampled_from like
import os
from hypothesis import strategies
directory_path = 'path/to/directory/with/txt/files'
txt_files_names = strategies.sampled_from(sorted(os.listdir(directory_path)))
or if we need full paths
from functools import partial
...
txt_files_paths = (strategies.sampled_from(sorted(os.listdir(directory_path)))
                   .map(partial(os.path.join, directory_path)))
or if the directory may have files of different extensions and we need only .txt ones we can use glob.glob
import glob
...
txt_files_paths = strategies.sampled_from(sorted(glob.glob(os.path.join(directory_path, '*.txt'))))
Dynamic case
If the directory contents may change and we want to rescan the directory on each data generation attempt, it can be done like
dynamic_txt_files_names = (strategies.builds(os.listdir,
                                             strategies.just(directory_path))
                           .map(sorted)
                           .flatmap(strategies.sampled_from))
or with full paths
dynamic_txt_files_paths = (strategies.builds(os.listdir,
                                             strategies.just(directory_path))
                           .map(sorted)
                           .flatmap(strategies.sampled_from)
                           .map(partial(os.path.join, directory_path)))
or with glob.glob
dynamic_txt_files_paths = (strategies.builds(glob.glob,
                                             strategies.just(os.path.join(directory_path,
                                                                          '*.txt')))
                           .map(sorted)
                           .flatmap(strategies.sampled_from))
Edit
Added sorted following a comment by @Zac Hatfield-Dodds.
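Tying this back to the question's template, a complete test could look like the following sketch. The throwaway temporary directory and the assertion are placeholders; in real tests, directory_path would point at the existing folder of .txt fixtures:

```python
import os
import tempfile

from hypothesis import given, strategies

# Throwaway directory with a few .txt files so the sketch runs as-is;
# in real tests this would be the existing fixture folder.
directory_path = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt', 'c.txt'):
    with open(os.path.join(directory_path, name), 'w') as f:
        f.write('contents of ' + name)

# Strategy yielding the full path of one randomly chosen .txt file.
txt_files_paths = (strategies.sampled_from(sorted(os.listdir(directory_path)))
                   .map(lambda name: os.path.join(directory_path, name)))

@given(doc=txt_files_paths)
def test_doc(doc):
    # assert some stuff according to doc (placeholder assertion)
    with open(doc) as file:
        assert file.read().startswith('contents of')
```

Note that the directory is listed once, when the strategy is built; for a directory whose contents change between runs, use one of the dynamic strategies above instead.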

Related

Ignore all files other than specific type of file, for directory comparison in Python

I want to compare two directories for all ".bin" files in them. There can be some other extension type files such as ".txt", ".tar.bz2" in those directories. I want to get the common files as well as files which are not common.
I tried using filecmp.dircmp(), but I am not able to use the ignore parameter with a wildcard to ignore those files. Is there any solution I can use to serve my purpose?
Select the common subset of *.bin files in the two folders and remove the first part of the path (the folder name), then pass them to cmpfiles():
import filecmp
from pathlib import Path
dir1_files = [f.relative_to('folder1') for f in Path('folder1').glob('*.bin')]
dir2_files = [f.relative_to('folder2') for f in Path('folder2').glob('*.bin')]
common_files = set(dir1_files).intersection(dir2_files)
match, mismatch, error = filecmp.cmpfiles('folder1', 'folder2', common_files)
If you want to avoid the preselection of common files, you can instead take the union of the two sets:
common_files = set(dir1_files).union(dir2_files)

A format to specify search pattern for folders and files within them

I am using a testing framework to test designs written in VHDL. In order for this to work, a Python script creates several "libraries" and then adds files in these libraries. Finally the simulator program is invoked, it starts up, compiles all the files into the specified libraries and then runs the tests.
I want to make changes in the way we specify what "libraries" to create and where to add the files for each library from. I think that it should be possible to write the description for these things in JSON and then let Python script process it. In this way, the same Python script can be used for all projects and I don't have to worry about someone not knowing Python.
The main issue is deciding how to express the information in JSON file. The JSON file shall create entries for library name and then location of source files. The fundamental problem is how to express these things using some type of pattern like glob or regular expression:
Pattern for name of folder to search
Pattern for name of subfolders to search
Express if all subfolders should be searched in a folder or not
What subfolders to exclude from search
This would express something like, e.g., "files in folder A but not its subfolders, folder B and its subfolders but not subfolder X in folder B".
Then we come to the pattern for the actual file names. The pattern of file names shall follow the pattern for the folder. If same file pattern applies to multiple folders, then after multiple lines of folder name patterns, the filename pattern applying to all of them shall occur once.
Pattern for name of file to add into library.
Pattern for name of file to exclude from library.
This would express something like, e.g., "all files ending with ".vhd" but no files that have "_bb_inst.vhd" in their name, and do not add p.vhd and q.vhd".
Finally, the Python script parsing the files should be able to detect conflicts in the rules, e.g. a folder is specified for search and exclusion at the same time, the same files are being added into multiple libraries, etc. This will of course be done within the Python script.
Now my question is: does a well-defined, pre-existing method to define something like what I have described here already exist? The only reason to choose JSON to express this is that Python has packages to traverse JSON files.
Have you looked at the glob library?
For your more tricky use cases you could specify in/out lists using glob patterns.
For example
import glob
inlist_pattern = "/some/path/on_yoursystem/*.vhd"
outlist_pattern = "/some/path/on_yoursystem/*_bb_inst.vhd"
filtered_files = set(glob.glob(inlist_pattern )) - set(glob.glob(outlist_pattern))
And other set operations allow you to perform more interesting in/out operations.
To do recursive scans, try amending your patterns accordingly:
inlist_pattern = "/some/path/on_yoursystem/**/*.vhd"
outlist_pattern = "/some/path/on_yoursystem/**/*_bb_inst.vhd"
list_of_all_vhds_in_sub_dirs = glob.glob(inlist_pattern, recursive=True)
With the recursive=True keyword option, the scan is performed from the point in the path where the ** notation is used, matching zero or more subfolders, and the files matching the overall pattern are returned.
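Building on that, one possible JSON layout for the library description, paired with a small loader, could look like the sketch below. The schema and key names ("libraries", "include", "exclude") are assumptions for illustration, not an established standard:

```python
import glob
import json

# Hypothetical JSON description: each library lists "include" and
# "exclude" glob patterns, resolved relative to the project root.
description = json.loads("""
{
    "libraries": {
        "my_lib": {
            "include": ["src/**/*.vhd"],
            "exclude": ["src/**/*_bb_inst.vhd"]
        }
    }
}
""")

def collect_files(library):
    """Resolve the include patterns, then subtract the exclude patterns."""
    included = set()
    for pattern in library.get("include", []):
        included.update(glob.glob(pattern, recursive=True))
    for pattern in library.get("exclude", []):
        included -= set(glob.glob(pattern, recursive=True))
    return sorted(included)

for name, library in description["libraries"].items():
    print(name, collect_files(library))
```

Conflict detection (the same file landing in two libraries, a folder both searched and excluded) would then be a set-intersection check over the collected file lists.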

Comparing directories in Python

I have two directories that I want to compare and I want to find the following using Python (while ignoring the structure of each directory):
files with the same name, but different content
files with the same content, but different name
files with both unique content and name, that exist only in one directory but not the other
Is there a robust Python library to do this? I looked everywhere, but I can't find anything that can do all of the above. If possible, I wouldn't want to create one from scratch, since it is potentially a very complex endeavour.
All I can do so far is make a list of files, but I'm utterly lost how to proceed from there.
from pathlib import Path

file_list = []
file_path = Path.cwd()
for file in file_path.rglob('*'):
    if file.is_file():
        file_list.append(file)
This method prints the result of a comparison between the two directories:
import filecmp

result = filecmp.dircmp('dir1', 'dir2')
result.report()
It produces output like:
diff dir1 dir2
Only in dir1 : ['newfile.txt']
Identical files : ['file1.txt']
Differing files : ['file2.txt']

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question, but sadly I couldn't come up with a solution (even though some of you gave me awesome answers, most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them to more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files, I want to rename them, as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1 = './path/GMATd_1.txt'
GMATds2 = './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch

examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you are trying to fetch files with similar names (at least a recurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os

all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows for searching files with * wildcards and makes it hence possible to search for files with a specific pattern. It lists all the files in a certain directory (or multiple directories if you include a * wildcard as a directory). When you iterate over the files, you could either directly work with the input of the files (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you would need to generate a new path - so you would have to write the create_new_path function that takes the old path and creates a new one.
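For illustration, create_new_path could look like the hypothetical helper below; the renaming scheme (dropping the underscore and moving the file into a renamed/ directory) is an assumption, since the question doesn't pin one down:

```python
import os

def create_new_path(old_path, target_directory='renamed'):
    """Hypothetical helper: keep the file name minus its underscore and
    place it under target_directory (e.g. path/GMATd_1.txt -> renamed/GMATd1.txt)."""
    _, filename = os.path.split(old_path)
    return os.path.join(target_directory, filename.replace('_', ''))
```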
Since Python 3.4 you should be using the built-in pathlib package instead of os or glob.
from pathlib import Path
import shutil

for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
You can use
import os

path = '.....'   # path where these files are located
path1 = '.....'  # path where you want these files to be stored

i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
It will rename all the .txt files in the source folder to 1.txt, 2.txt, ..., n.txt.

Search for file names that contain words from a list and have a certain file extension

Beginner at Python here. I'm trying to search users' folders for illegal content saved in folders. I want to find all files whose names contain one or more words from the list below and that also have an extension that's listed.
I can match the extension using file.endswith, but I don't know how to add in the word condition.
I've looked through the site but have only come across how to search for a single word, not a list of words.
Thank you in advance
import os

L = ['720p','aac','ac3','bdrip','brrip','demonoid','disc','hdtv','dvdrip',
     'edition','sample','torrent','www','x264','xvid']

for root, dirs, files in os.walk(r"Y:\User Folders"):
    for file in files:
        if file.endswith(('.7z','.3gp','.alb','.ape','.avi','.cbr','.cbz','.cue','.divx','.epub','.flac',
                          '.flv','.idx','.iso','.m2ts','.m2v','.m3u','.m4a','.m4b','.m4p','.m4v','.md5',
                          '.mkv','.mobi','.mov','.mp3','.mp4','.mpeg','.mpg','.mta','.nfo','.ogg','.ogm',
                          '.pla','.rar','.rm','.rmvb','.sfap0','.sfk','.sfv','.sls','.smfmf','.srt','.sub',
                          '.torrent','.vob','.wav','.wma','.wmv','.wpl','.zip')):
            print(os.path.join(root, file))
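To add the word condition from the question, one option is to lowercase the file name and test it against the word list alongside the extension check. A sketch, with shortened stand-in lists for the full ones above:

```python
import os

words = ['720p', 'torrent', 'xvid']        # stand-in for the full word list L
extensions = ('.avi', '.mkv', '.torrent')  # stand-in for the full extension tuple

def is_flagged(filename):
    """True if the name has a flagged extension AND contains any flagged word."""
    lowered = filename.lower()
    return lowered.endswith(extensions) and any(word in lowered for word in words)

for root, dirs, files in os.walk('.'):
    for file in files:
        if is_flagged(file):
            print(os.path.join(root, file))
```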
Perhaps it might be better to do a reverse search and display a warning about files that DON'T match the file types you want. For instance you could do this (note that endswith needs a tuple of suffixes; passing two separate arguments would treat the second as a start index):
if file.endswith((".txt", ".py")):
    print("File is ok!")
else:
    print("File is not ok!")
Using py.path.local from the py package
The py package (install it with pip install py) offers a very nice interface for working with files.
from py.path import local

def isbadname(path):
    bad_extensions = [".pyc", ".txt"]
    bad_names = ["code", "xml"]
    return (path.ext in bad_extensions) or (path.purebasename in bad_names)

for path in local(".").visit(isbadname):
    print(path.strpath)
Explained:
Import
from py.path import local
The py.path.local function creates "objectified" file names. To keep the code short, I import it this way so that only local is needed for objectifying file name strings.
Create objectified path to local directory:
local(".")
Created object is not a string, but an object, which has many interesting properties and methods.
Listing all files within some directory:
local(".").visit("*.txt")
returns a generator providing all paths to files having the extension ".txt".
An alternative way to select files is to provide a function which takes a path argument (an objectified file name) and returns True if the file is to be used and False otherwise.
The function isbadname serves exactly this purpose.
If you want to google for more information, search for py path local (the name py alone does not give good hits).
For more, see https://py.readthedocs.io/en/latest/path.html.
Note that if you use the pytest package, py is installed with it (for good reason: it makes tests related to file names much more readable and shorter).
