How can I isolate a section of a path - python

I want to extract the path that contains "tk-nuke-writenode" from the example below.
I need to isolate that particular path and only that path. The paths are not fixed, so I can't split the string and pick the "tk-nuke-writenode" entry by index (e.g. [2]). See the example below:
NUKE_PATH = os.environ['NUKE_PATH']
Result:
'X:\pipeline\app_config\release\extensions\global\nuke;X:\pipeline\app_config\release\extensions\projects\sgtk\powerPlant\install\app_store\tk-nuke\v0.11.4\classic_startup\restart;X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
NUKE_PATH.split(os.pathsep)[2]
Result:
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
Wanted output:
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
Thanks in advance for any help you can offer!

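The paths are separated by ';' (which is os.pathsep on Windows), so you can use something like this:
import os
NUKE_PATH = os.environ['NUKE_PATH']
paths = NUKE_PATH.split(os.pathsep)
# filter() returns a lazy iterator in Python 3, so wrap it in list()
result = list(filter(lambda path: "tk-nuke-writenode" in path, paths))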

Assuming your paths do not contain any semicolons, you can do
# Raw string, so that backslash sequences such as \n, \a and \r are not treated as escapes
path = r'X:\pipeline\app_config\release\extensions\global\nuke;X:\pipeline\app_config\release\extensions\projects\sgtk\powerPlant\install\app_store\tk-nuke\v0.11.4\classic_startup\restart;X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
matches = [p for p in path.split(';') if 'tk-nuke-writenode' in p]
matches[0]
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
Indexing with matches[0] will raise an IndexError if no matches are found, and it may be a good idea to handle multiple matches as well.
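One way to handle both cases is sketched below. The helper name find_single_path and its error messages are made up for illustration and are not part of any library; the example path list is shortened and hypothetical.

```python
import os

def find_single_path(path_list, needle, sep=os.pathsep):
    """Return the one entry of a separator-joined path list that contains needle."""
    matches = [p for p in path_list.split(sep) if needle in p]
    if not matches:
        raise LookupError("no entry contains %r" % needle)
    if len(matches) > 1:
        raise LookupError("%d entries contain %r" % (len(matches), needle))
    return matches[0]

# Shortened, made-up path list (raw string to keep the backslashes literal)
example = r'X:\ext\nuke;X:\ext\tk-nuke\v0.11.4;X:/ext/tk-nuke-writenode/v1.4.1/gizmos'
print(find_single_path(example, 'tk-nuke-writenode', sep=';'))
```

Raising on an ambiguous match is a design choice; returning the whole list and letting the caller decide would work just as well.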

Related

Return strings based on folders in directory

I'm trying to write code that looks into a specified directory and returns the names of the folders in it. I need them as separate strings, rather than on one line, so that I can then use them to create new folders based on the names the code returned.
So far this is what I have:
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return str(my_list).replace('[','').replace(']','').replace("'", "")
Library_Folder = '%s' % ( Lib_Folder() )
print Library_Folder
and it returns this
# Result: testfolder1, testfolder2
What I would like it to return is
testfolder1
testfolder2
Does anyone know how I can achieve this?
It's best to process the values as they are rather than the string representation of a Python list; that route has lots of little edge cases that will cause problems.
Also, your description says you only want the directories in the target folder, but you're not checking for that, so you will output the files as well.
This is a matter of preference, but I like to avoid building up lists in memory when I can. It really doesn't matter for a handful of directory names, but it's a good habit to get into, so this solution shows how to use a generator:
import os
def Lib_Folder():
    target = '/Users/Tim/Test/Test Library'
    for cur in os.listdir(target):
        # Check to see if the current item is a directory
        if os.path.isdir(os.path.join(target, cur)):
            # Ignore this folder if it exists
            if cur != '.DS_Store':
                # It's a good folder, yield it
                yield cur
# Print out each item as it occurs
for cur in Lib_Folder():
    print(cur)
# Or, print them out as one string:
values = "\n".join(Lib_Folder())
print(values)
Yeah, there's a much easier way to do this. Don't try to manipulate the string that represents the list. Work with the list itself. Use str.join to separate each entry with newlines.
import os
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return "\n".join(my_list) # Puts each on its own line.
Library_Folder = Lib_Folder() # The %s thing you were doing is unnecessary.
# Let's check whether it worked!
print(Library_Folder)
# testfolder1
# testfolder2
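If the goal is to recreate the folder names elsewhere, the same idea can be sketched with os.scandir, which reports whether each entry is a directory. The function name mirror_subfolders and both directory arguments are hypothetical, chosen only for this example.

```python
import os

def mirror_subfolders(source, destination):
    """Create, under destination, one empty folder per subfolder of source."""
    created = []
    with os.scandir(source) as entries:
        for entry in entries:
            # Skip plain files such as .DS_Store; keep only real directories
            if entry.is_dir():
                os.makedirs(os.path.join(destination, entry.name), exist_ok=True)
                created.append(entry.name)
    # Sorted, because the order in which scandir yields entries is arbitrary
    return sorted(created)
```

Working with the names as a list keeps each one a separate string, so no joining or splitting is needed before the folders are created.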

With Python how can I replace part of a path based on a piece that matches?

I have a path that looks like: data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac
What I want to do is replace the second part, so that it's data/dev-clean/1988/24833/1988-24833-0013.flac. I can't guarantee anything about the second section, other than it starts with dev-.
I need to make it general-purpose, so that it'll work with any arbitrary stem, such as train-, and so on.
You can use re to match and replace it:
import re
def func(pattern, file):
    return re.sub(f'{pattern}[^/]+/', f'{pattern}clean/', file)
func('dev-', 'data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/dev-clean/1988/24833/1988-24833-0013.flac
func('train-', 'data/train-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/train-clean/1988/24833/1988-24833-0013.flac
func('train-', 'data/xxx/xxx/train-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/xxx/xxx/train-clean/1988/24833/1988-24833-0013.flac
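Alternatively, if the part to replace is always the second component, split the path and replace that component by index:
def replace_second_part(path, with_what):
    parts = path.split('/')
    parts[1] = with_what
    return '/'.join(parts)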
If you want this to be more portable (to work under Windows, for example), using os.path is preferred.
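A rough sketch of a portable variant follows. It swaps in pathlib as a higher-level alternative to os.path, and the function name replace_second_component is hypothetical.

```python
from pathlib import PurePath

def replace_second_component(path, with_what):
    """Replace the second component of a relative path, whatever the separator."""
    parts = list(PurePath(path).parts)
    parts[1] = with_what
    return PurePath(*parts)

new_path = replace_second_component(
    'data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac',
    'dev-clean')
# as_posix() always renders forward slashes, even on Windows
print(new_path.as_posix())
```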

Python - Regex - Match anything except

I'm trying to get my regular expression to work but can't figure out what I'm doing wrong. I am trying to find any file that is NOT in a specific format. For example, all files are named after dates in the format MM-DD-YY.pdf (e.g. 05-13-17.pdf). I want to find any files whose names do not follow that format.
I can create a regex to find those with:
(\d\d-\d\d-\d\d\.pdf)
I tried using the negative lookahead so it looked like this:
(?!\d\d-\d\d-\d\d\.pdf)
That no longer matches those files, but it doesn't find the files that are not in that format either.
I also tried adding a .* after the group but then that finds the whole list.
(?!\d\d-\d\d-\d\d\.pdf).*
I'm searching through a small list right now for testing:
05-17-17.pdf Test.pdf 05-48-2017.pdf 03-14-17.pdf
Is there a way to accomplish what I'm looking for?
Thanks!
You can try this:
import re
s = "Test.docx 04-05-2017.docx 04-04-17.pdf secondtest.pdf"
new_data = re.findall(r"[a-zA-Z]+\.[a-zA-Z]+|\d{1,}-\d{1,}-\d{4}\.[a-zA-Z]+", s)
Output:
['Test.docx', '04-05-2017.docx', 'secondtest.pdf']
First find all the names that match, then remove them from your list separately. The firstFindtheMatching function finds the matching names using the re library:
def firstFindtheMatching(listoffiles):
    """
    :listoffiles: space-separated string of file names to check against the format
    :final_string: every file name that matches the format 01-01-17.pdf (MM-DD-YY.pdf), joined into one string. listoffiles is also returned so that you can see the whole output in one place, but you really won't need that.
    """
    import re
    matchednames = re.findall(r"\d{1,2}-\d{1,2}-\d{1,2}\.pdf", listoffiles)
    # join all matches into one string for simpler handling using sets
    final_string = ' '.join(matchednames)
    return(final_string, listoffiles)
Here is the output:
('05-08-17.pdf 04-08-17.pdf 08-09-16.pdf', '05-08-17.pdf Test.pdf 04-08-17.pdf 08-09-16.pdf 08-09-2016.pdf some-all-letters.pdf')
set(['08-09-2016.pdf', 'some-all-letters.pdf', 'Test.pdf'])
I used the main() below, in case you want to reproduce the results. The good thing about doing it this way is that you can add more regexes to firstFindtheMatching(). It helps you keep things separate.
def main():
    filenames = "05-08-17.pdf Test.pdf 04-08-17.pdf 08-09-16.pdf 08-09-2016.pdf some-all-letters.pdf"
    [matchednames , alllist] = firstFindtheMatching(filenames)
    print(matchednames, alllist)
    notcommon = set(filenames.split()) - set(matchednames.split())
    print(notcommon)
if __name__ == '__main__':
    main()
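For the original question, a shorter route is to test each name against the date pattern and keep the ones that fail. This is a minimal sketch, assuming the names are already split into a list; the variable names are illustrative.

```python
import re

# Matches exactly MM-DD-YY.pdf; fullmatch requires the whole name to fit
date_name = re.compile(r'\d\d-\d\d-\d\d\.pdf')

files = ['05-17-17.pdf', 'Test.pdf', '05-48-2017.pdf', '03-14-17.pdf']
not_dated = [f for f in files if not date_name.fullmatch(f)]
print(not_dated)
# ['Test.pdf', '05-48-2017.pdf']
```

The unanchored lookahead from the question fails because the regex engine is free to start matching one character later, where the lookahead no longer sees a date. Negating a full match on each name avoids that problem.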

Regex to parse out a part of URL using python

I have data as follows:
data['url']
http://hostname.com/aaa/uploads/2013/11/a-b-c-d.jpg https://www.aaa.com/
http://hostname.com/bbb/uploads/2013/11/e-f-g-h.gif https://www.aaa.com/
http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html
http://hostname.com/ddd/uploads/2013/11/w-e-r-t.ico
http://hostname.com/ddd/uploads/2013/11/r-t-y-u.aspx https://www.aaa.com/
http://hostname.com/bbb/uploads/2013/11/t-r-w-q.jpeg https://www.aaa.com/
I want to find file names with extensions such as .jpg, .gif, .png, .ico, .aspx, .html, .jpeg and extract the part before the extension, going backwards until a "/" is found. I also want to catch multiple occurrences within the same string. My output should be:
data['parsed']
a-b-c-d
e-f-g-h
e-f-g-h a-a-a-a
w-e-r-t
r-t-y-u
t-r-w-q
Instead of writing an individual command for each format, is there a way to do everything with a single command?
Can anybody help me write these commands? I am new to regex and any help would be appreciated.
This builds a list of (name, extension) pairs:
import re
results = []
for link in data['url']:
    matches = re.search(r'/(\w-\w-\w-\w)\.(\w{2,})\b', link)
    # re.search returns None when a link contains no such file name
    if matches:
        results.append((matches.group(1), matches.group(2)))
This pattern returns the file names. I have used just one of your URLs to demonstrate; for more, you could simply append the matches to a list of results:
import re
url = "http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html"
# The final dot is escaped so that it only matches the "." before the extension
p = r'((?:[a-z]-){3}[a-z])\.'
matches = re.findall(p, url)
print('\n'.join(matches))
e-f-g-h
a-a-a-a
This assumes that the URLs all have the general form you provided.
You might try this:
data['parse'] = re.findall(r'[^/]+\.[a-z]+ ',data['url'])
That will pick out the file names with their extensions, each followed by the space that the pattern requires. If you want to remove the extensions, the code above returns a list which you can then process with a list comprehension and re.sub like so:
[re.sub(r'\.[a-z]+\s*$','',exp) for exp in data['parse']]
Use str.join to create a string, as demonstrated in Totem's answer.
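To cover all the listed formats with a single pattern, the extensions can be put into one alternation. This is a minimal sketch, assuming each row is a plain string; the rows list below stands in for data['url'], and the extension list is taken from the question.

```python
import re

# One alternation for every extension; group 1 captures the name after the last "/"
name_pattern = re.compile(r'/([^/\s]+)\.(?:jpg|jpeg|gif|png|ico|aspx|html)\b')

rows = [
    'http://hostname.com/aaa/uploads/2013/11/a-b-c-d.jpg https://www.aaa.com/',
    'http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html',
    'http://hostname.com/ddd/uploads/2013/11/w-e-r-t.ico',
]
# findall returns every captured name in the row; join them with spaces
parsed = [' '.join(name_pattern.findall(row)) for row in rows]
print(parsed)
# ['a-b-c-d', 'e-f-g-h a-a-a-a', 'w-e-r-t']
```

If data is a pandas DataFrame, which the question suggests but does not state, the same pattern string could be passed to data['url'].str.findall instead of looping.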

Exact match in strings in python

I am trying to find a sub-string in a string, but I am not achieving the results I want.
I have several strings that contain the paths to different directories:
'/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_1/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_9/logs'
Here is the part of my code where I am trying to find the exact match for the sub-string:
for name in sample_names:
    if (dire.find(name)!=-1):
        for files in os.walk(dire):
            for file in files:
                list_files=[]
                list_files.append(file)
                file_dict[name]=list_files
Everything works fine except that when it looks for Sample_8_1 in the string that contains the directory, the if condition also accepts the name Sample_8_11. How can I make it match exactly, so that it doesn't enter the same directory more than once?
You could try searching for Sample_8_1/ (i.e., include the following slash). Given your code, I guess that would be dire.find(name+'/'). This is just a quick and dirty approach.
Assuming that dire is a list populated with absolute path names, the in operator tests for an exact match against each entry:
for name in sample_names:
    if name in dire:
        ...
e.g.
samples = ['/home/msvalkon/work/tmp_1',
           '/home/msvalkon/work/tmp_11']
dirs = ['/home/msvalkon/work/tmp_11']
for name in samples:
    if name in dirs:
        print("Entry %s matches" % name)
Entry /home/msvalkon/work/tmp_11 matches
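Another option, sketched here as an assumption rather than taken from either answer, is to require that the sample name be followed by a path separator or by the end of the string. The helper name contains_sample is hypothetical.

```python
import re

def contains_sample(dire, name):
    """True if name occurs in dire and ends exactly at a '/' or at the end."""
    # re.escape protects any regex metacharacters in the sample name
    return re.search(re.escape(name) + r'(?=/|$)', dire) is not None

dire = '/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs'
print(contains_sample(dire, 'Sample_8_1'))   # False
print(contains_sample(dire, 'Sample_8_11'))  # True
```

This generalises the name+'/' trick so that it also works when the sample directory is the last component of the path.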
