Finding path for corpus in NLTK - python

I am using the Natural Language Toolkit for python to write a program. In it I am trying to load a corpus of my own files. To do that I am using code to the following effect:
from nltk.corpus import PlaintextCorpusReader
corpus_root=(insert filepath here)
wordlists=PlaintextCorpusReader(corpus_root, '.*')
Let's say my file is called reader.py and my corpus of files is located in a directory called 'corpus' in the same directory as reader.py. I would like to know a way to generalize finding the filepath above, so that my code could find the path for the 'corpus' directory for any location for anyone using the code. I have tried these posts, but they only allow me to get absolute file paths:
Find current directory and file's directory
Any help would be greatly appreciated!

C:\Users\UserName\AppData\Roaming\nltk_data\corpora
That is where my corpora are located. I use the Anaconda platform with a conda environment.

From what I understand
Your reader.py file and corpus directory are always in the same directory
You're looking for a way to refer to corpus from reader.py regardless of where you put them in your directory structure
In that case, the question that you referred to seems to be what you need. Another way of doing it is in this other answer. Using that second option, your code would then be:
from nltk.corpus import PlaintextCorpusReader
import os.path

# Directory that contains reader.py
basepath = os.path.dirname(__file__)
# Absolute path of the 'corpus' folder that sits next to reader.py
corpus_root = os.path.abspath(os.path.join(basepath, "corpus"))
wordlists = PlaintextCorpusReader(corpus_root, '.*')
Keep in mind that although the result is an absolute path, it is built from the basepath = os.path.dirname(__file__) line above, which yields the directory containing reader.py, so it follows the files wherever they are placed. See the official os.path documentation for details.
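The same idea can also be written with pathlib. This is only a minimal sketch: the helper name corpus_dir_next_to is hypothetical, and it assumes the 'corpus' folder sits beside the script, as described in the question.

```python
from pathlib import Path


def corpus_dir_next_to(script_file, dirname="corpus"):
    """Return the absolute path of `dirname` located beside `script_file`."""
    # resolve() makes the path absolute; parent is the folder holding the script.
    return Path(script_file).resolve().parent / dirname


# Inside reader.py you would pass __file__ and hand the result to NLTK, e.g.:
# wordlists = PlaintextCorpusReader(str(corpus_dir_next_to(__file__)), '.*')
```

Because the path is derived from the script's own location, it does not depend on the directory the program is started from.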

Related

set directory for search program in python

I am trying to develop a CNN for image processing. I have about 130 GB of data stored on a separate drive on my computer, and I'm having trouble getting a simple Python search program to search through that specified directory. I'm trying to have it find a bunch of random XML files scattered across many levels of subdirectories on that drive. How do I specify, for just this one Python program, the directory it should search in, keeping it only to the context of the program?
I've tried setting a variable Path = "B:\\MainFolder\SubFolder" and using os.walk, but it makes it through the first directory and then stops.
Can you try the following:
import os
import glob
base_dir = 'your/start/directory'
# '**' together with recursive=True matches files at any depth below base_dir
req_files = glob.glob(os.path.join(base_dir, '**/*.xml'), recursive=True)
Jeril and Eduardo, thank you for the help. I took a shot at pathlib and it worked. I don't know what was wrong with my glob code; it looked basically the same as yours, Jeril:
from pathlib import Path

filelist = []
# rglob searches every subdirectory below the given folder
for path in Path(r'B:\CTImageDataset\LIDC-IDRI').rglob('*.xml'):
    filelist.append(path.name)
print(filelist)
Worked great, thanks again
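The original os.walk attempt is not shown, so the reason it stopped after the first directory is unknown. For comparison, a minimal os.walk sketch that visits every subdirectory might look like this (the function name find_xml_files is hypothetical):

```python
import os


def find_xml_files(base_dir):
    """Return the full paths of all .xml files below base_dir, at any depth."""
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            if filename.lower().endswith(".xml"):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

The search directory is passed in as an argument, so it applies only to this program and does not change the working directory.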

Is there a way to be able to use a variable path using os

The goal is to run through a half stable and half variable path.
I am trying to run through a path (go to lowest folder which is called Archive) and fill a list with files that have a certain ending. This works quite well for a stable path such as this.
fileInPath='\\server123456789\provider\COUNTRY\CATEGORY\Archive'
My code runs through the path (recursive) and lists all files that have a certain ending. This works well. For simplicity I will just print the file name in the following code.
import csv
import os

fileInPath = '\\\\server123456789\\provider\\COUNTRY\\CATEGORY\\Archive'
fileOutPath = 'some path'  # placeholder
csvSeparator = ';'
fileList = []
# Walk the tree below fileInPath and print every file whose name ends in PAR
for subdir, dirs, files in os.walk(fileInPath):
    for file in files:
        if file[-3:].upper() == 'PAR':
            print(file)
The problem is that I cannot manage to make country and category variable, e.g. by using *.
The standard library module pathlib provides a simple way to do this.
Your file list can be obtained with
from pathlib import Path
list(Path("//server123456789/provider/").glob("*/*/Archive/*.PAR"))
Note that I'm using / instead of \\; pathlib handles the conversion for you on Windows.
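The pattern above only matches files directly inside each Archive folder, whereas the os.walk code in the question also descends into subfolders and ignores the case of the ending. A hedged sketch that keeps both behaviours could look like this; the function name find_par_files is hypothetical, and it tests the file suffix rather than the last three characters of the name:

```python
from pathlib import Path


def find_par_files(provider_root):
    """List .PAR files under <provider_root>/<country>/<category>/Archive."""
    root = Path(provider_root)
    found = []
    # The two "*" segments stand for the variable country and category folders.
    for archive in root.glob("*/*/Archive"):
        # rglob also descends into any subfolders of Archive.
        for path in archive.rglob("*"):
            if path.is_file() and path.suffix.upper() == ".PAR":
                found.append(path)
    return sorted(found)
```

It would be called with the stable part of the path, for example find_par_files("//server123456789/provider").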

How do I make the current folder path to work for my command

I'm new to Python and really want this command to work. I have been looking around on Google but still can't find a solution. I'm trying to make a script that deletes a folder inside the folder my Blender game is in, so I have been trying out these commands:
import shutil
from bge import logic
path = bge.logic.expandPath("//")
shutil.rmtree.path+("/killme") # remove dir and all contains
The folder I want to delete is called "killme", and I know you can just do: shutil.rmtree(Path)
but I want the path to start at the folder that the game is in, not the full C:/programs/blabla/blabla/test/killme path.
I'd be happy if someone could explain.
I think you are using the shutil.rmtree command in the wrong way. You can use the following instead:
shutil.rmtree(path+"/killme")
Look at the reference https://docs.python.org/3/library/shutil.html#shutil.rmtree
Syntax: shutil.rmtree(path, ignore_errors=False, onerror=None)
Assuming that your current project directory is 'test', your code will look like the following:
import os
import shutil

path = os.getcwd()  # e.g. C:/programs/blabla/blabla/test
shutil.rmtree(path + "/killme")  # remove the directory and all its contents
NOTE: It will fail if the folder contains read-only files.
Hope it helps!
What you could do is set a base path like
basePath = "/bla_bla/"
and then append the path and use something like:
shutil.rmtree(basePath+yourGamePath)
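If you are running the Python file as a standalone script from inside the folder you want to delete, you can do the following:
#!/usr/bin/env python
import os
import shutil

cwd = os.getcwd()  # the current working directory
shutil.rmtree(cwd)  # deletes that directory and everything in it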
Hope my answer was helpful
The best thing you could do is use the os library.
With the os.walk function you can list all the directories and filenames, and then delete or modify the required folders, extracting the folder names in whatever way you need.
import os

for root, dirnames, files in os.walk("issues"):
    for name in dirnames:
        ...  # what you want to do with each folder
    for filename in files:
        ...  # what you want to do with each file
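Pulling these answers together, deleting a named subfolder relative to some base folder can be sketched as below. This is an illustration only: the helper name remove_subfolder is hypothetical, and the base folder is passed in as an argument because the Blender call from the question, logic.expandPath("//"), is only available inside the game engine.

```python
import shutil
from pathlib import Path


def remove_subfolder(base_dir, name):
    """Delete base_dir/name and everything in it; return True if it existed."""
    target = Path(base_dir) / name
    if not target.is_dir():
        # Nothing to delete, so leave the file system untouched.
        return False
    shutil.rmtree(target)
    return True
```

Inside the game, the call would presumably be remove_subfolder(logic.expandPath("//"), "killme"), so only the folder name is hard-coded and the rest of the path follows the game's location.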

How to rapidly switch from one directory to another Python

I have a huge list of images in one directory and a corresponding list of annotations (.txt files) in another.
I need to perform an operation on each image following the matching image annotations and save it into another directory. Is there an elegant way to avoid calling chdir three times at each step?
Maybe using cPickle or whatever library is used for fast file management?
import glob
from PIL import Image
os.chdir('path_images')
list_im=glob.glob('*.jpg')
list_im.sort()
list_im=path_images+list_im
os.chdir('path_txt')
list_annot=glob.glob('*.txt')
list_annot.sort()
list_annot=path_txt+list_im
for i in range(0,len(list_images)):
    # Joel pointed out that the os operations are not mandatory if you include the path in the name
    #os.chdir('path_images')
    im=Image.open(list_im[i])
    #os.chdir('path_text')
    action_on_image(im,list_annot[i])
    #os.chdir('path_to_save_image')
    im.save(path_to_save+nom_image)
I am a true beginner in Python but I am confident that my code is super inefficient and can be improved.
You don't have to chdir (and FWIW you really don't want to depend on the current working directory). Use absolute paths everywhere in your code and you'll be fine.
import os
import glob
from PIL import Image
abs_images_path = "<absolute path to your images directory here>"
abs_txt_path = "<absolute path to your txt directory here>"
abs_dest_path = "<absolute path to where you want to save your images>"
list_im=sorted(glob.glob(os.path.join(abs_images_path, '*.jpg')))
list_annot=sorted(glob.glob(os.path.join(abs_txt_path, '*.txt')))
for im_path, txt_path in zip(list_im, list_annot):
    im = Image.open(im_path)
    action_on_image(im, txt_path)
    # Assumption: save under the original file name in the destination directory
    im.save(os.path.join(abs_dest_path, os.path.basename(im_path)))
Note that if your paths are relative to where your script is installed, you can get the script's directory path with os.path.dirname(os.path.abspath(__file__)).
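Zipping two sorted lists only pairs the files correctly when every image has exactly one annotation and both lists sort the same way. If the annotation files share their base name with the images (an assumption, the question does not say so), the pairs can be matched by name instead. A minimal sketch, with the hypothetical helper name pair_by_stem:

```python
from pathlib import Path


def pair_by_stem(images_dir, annotations_dir):
    """Return (image_path, annotation_path) pairs that share a file stem."""
    # Index the annotations by name without extension, e.g. "img001".
    annotations = {p.stem: p for p in Path(annotations_dir).glob("*.txt")}
    pairs = []
    for image in sorted(Path(images_dir).glob("*.jpg")):
        annotation = annotations.get(image.stem)
        if annotation is not None:  # skip images that have no annotation
            pairs.append((image, annotation))
    return pairs
```

Each pair holds full paths, so the loop that opens, processes, and saves the images never needs to call chdir.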

Python operating on files in a folder - 'for file in folder'

I know a folder's path, and for every file in the folder I would like to do some operations. So essentially what I'm looking for is a for file in folder type of code that gives me access to the files in variables.
What is the Python way of doing this?
Thanks
EDIT - example: my folder will contain a bunch of XML files, and I have a python routine already to parse them into variables I need.
This will allow you to access and print all the file names in your current directory:
import os
for filename in os.listdir('.'):
    print(filename)
The os module contains much more information about the various functions available. The os.listdir() function can also take any other paths you want to specify.
Does the glob library look helpful?
It will perform some pattern matching, and accepts both absolute and relative addresses.
>>> import glob
>>> for file in glob.glob("*.xml"):  # only loops over XML documents
...     print(file)
For people coming at this with Python 3.5 or later, there is now os.scandir(), which can be much faster than os.listdir() when you also need file type or attribute information, because it avoids extra stat calls.
For more information about the improvements and benefits, see https://benhoyt.com/writings/scandir/
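As a rough illustration of os.scandir() for the XML case in the question, the sketch below lists the XML files directly inside one folder. The function name xml_files_in is hypothetical, and using scandir as a context manager assumes Python 3.6 or later.

```python
import os


def xml_files_in(folder):
    """Return the names of the .xml files directly inside folder."""
    names = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # entry.is_file() can usually answer without an extra stat call.
            if entry.is_file() and entry.name.lower().endswith(".xml"):
                names.append(entry.name)
    return sorted(names)
```

Each entry also exposes entry.path, which can be passed straight to an existing XML parsing routine.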
