I need to split a path up in python and then remove the last two levels.
Here is an example, the path I want to parse. I want to parse it to level 6.
C:\Users\Me\level1\level2\level3\level4\level5\level6\level7\level8
Below is what I want the output to be. Currently, I can only go one level up.
C:\Users\Me\level1\level2\level3\level4\level5\level6\
import os
a = r"C:\Users\Me\level1\level2\level3\level4\level5\level6\level7\level8"
split_path = os.path.split(a)
print(split_path)
Output:
('C:\Users\Me\level1\level2\level3\level4\level5\level6\level7','level8')
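Split the path into all its parts, then join all the parts except the last two.
import os
separator = os.path.sep
parts = a.split(separator)
# Join with the separator itself: on Windows, os.path.join(*parts) would turn "C:" into a drive-relative path
output = separator.join(parts[:-2])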
You can either use the split function twice:
os.path.split(os.path.split(a)[0])[0]
This works since os.path.split() returns a tuple with two items, head and tail, and by taking [0] of that we'll get the head. Then just split again and take the first item again with [0].
Or join your path with the parent directory twice:
os.path.abspath(os.path.join(a, '..', '..'))
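Both approaches can be checked side by side. This is only a sketch under two assumptions of mine: it uses ntpath (the Windows flavour of os.path, importable on any platform) so the backslash path is parsed the same way everywhere, and normpath in place of abspath so the result never depends on the current directory.

```python
import ntpath  # Windows implementation of os.path, usable on any OS

a = r"C:\Users\Me\level1\level2\level3\level4\level5\level6\level7\level8"

# Approach 1: take the head of split() twice
via_split = ntpath.split(ntpath.split(a)[0])[0]

# Approach 2: append ".." twice and let normpath collapse them
via_join = ntpath.normpath(ntpath.join(a, "..", ".."))

print(via_split)
print(via_join)
```

On Windows itself, plain os.path behaves like ntpath, so the two lines from the answer can be used unchanged.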
You can easily create a function that will step back as many steps as you want:
def path_split(path, steps):
    # Drop the last component once per step
    for i in range(steps):
        path = os.path.split(path)[0]
    return path
So
>>> path_split(r"C:\Users\Me\level1\level2\level3\level4\level5\level6\level7\level8", 2)
'C:\\Users\\Me\\level1\\level2\\level3\\level4\\level5\\level6'
os.path.split(path) returns a tuple: the whole path except the last component, and the last component itself. So if you want to remove the last two, apply it twice:
os.path.split(os.path.split(your_path)[0])[0]
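If you are on Python 3, pathlib may be worth a look as an alternative to nesting os.path.split calls. A minimal sketch, assuming a helper name of my own (strip_levels) and using PureWindowsPath so that it parses the Windows path on any OS:

```python
from pathlib import PureWindowsPath

a = PureWindowsPath(r"C:\Users\Me\level1\level2\level3\level4\level5\level6\level7\level8")

def strip_levels(path, steps):
    """Return `path` with its last `steps` components removed (steps >= 1)."""
    # parents[0] is the direct parent, parents[1] the grandparent, and so on
    return path.parents[steps - 1]

print(strip_levels(a, 2))
```

Asking for more levels than the path has raises an IndexError, which is arguably clearer than silently stopping at the drive root.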
I want the value of the actual variable to be printed like this.
Variable value
rootdir = '/home/runner/TestP1'
Required value to be printed
/home/runner
I used the code like this
hello = rootdir.split("/")[-1]
print(hello)
Given value
TestP1
but I want to remove just the last word from that string and print the remaining path.
You should use the dirname function from os.path
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split().
>>> from os.path import dirname
>>> rootdir = '/home/runner/TestP1'
>>> dirname(rootdir)
'/home/runner'
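To make the relationship between dirname and split concrete, here is a small sketch. It uses posixpath (the POSIX flavour of os.path) so it gives the same result on any OS; the trailing-slash case is an extra I added because it is an easy thing to trip over.

```python
import posixpath  # POSIX implementation of os.path, usable on any OS

rootdir = '/home/runner/TestP1'

# dirname() is simply the first element of split()
head, tail = posixpath.split(rootdir)
parent = posixpath.dirname(rootdir)

# With a trailing slash the last component is empty, so dirname() only drops the slash
with_slash = posixpath.dirname('/home/runner/TestP1/')

# Stripping the trailing slash first gives the parent directory again
clean = posixpath.dirname('/home/runner/TestP1/'.rstrip('/'))

print(parent, with_slash, clean)
```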
split() gives a list of elements after separating the string by /.
So, you need to take all elements till the last and re-join:
hello = rootdir.split("/")[:-1]
hello = '/'.join(hello)
print(hello)
Alternatively, use rsplit() and split only once, at the rightmost /:
hello = rootdir.rsplit("/", 1)[0]
print(hello)
Further, if you are trying to extract the directory name, use os.path.dirname instead, as in @sayse's answer.
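For illustration, this is roughly what the two string-based variants produce at each step (variable names other than rootdir are mine):

```python
rootdir = '/home/runner/TestP1'

# split() cuts at every "/"; the leading "/" yields an empty first element
all_parts = rootdir.split("/")
joined = "/".join(all_parts[:-1])

# rsplit() with maxsplit=1 cuts only once, at the rightmost "/"
right_split = rootdir.rsplit("/", 1)

print(all_parts)
print(joined)
print(right_split)
```

The empty first element is what keeps the leading slash when the parts are joined again.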
You can use rsplit (which splits from the right side) and set the maximum number of splits to 1, like this:
rootdir = '/home/runner/TestP1'
hello = rootdir.rsplit("/", 1)[0]
print(hello)
Output:
/home/runner
I have a number of html files in a directory. I am trying to store the filenames in a list so that I can use it later to compare with another list.
Eg: Prod224_0055_00007464_20170930.html is one of the filenames. From the filename, I want to extract '00007464' and store this value in a list and repeat the same for all the other files in the directory. How do I go about doing this? I am new to Python and any help would be greatly appreciated!
Please let me know if you need more information to answer the question.
Split the filename on underscores and select the third element (index 2).
>>> 'Prod224_0055_00007464_20170930.html'.split('_')[2]
'00007464'
In context that might look like this:
nums = [f.split('_')[2] for f in os.listdir(dir) if f.endswith('.html')]
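Here is a self-contained way to try that out. It is only a sketch: the temporary folder and the second and third file names are made up for the demonstration, and the result is sorted because os.listdir does not guarantee any order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files; only the first name comes from the question
    for name in ("Prod224_0055_00007464_20170930.html",
                 "Prod224_0055_00007465_20171031.html",
                 "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    # Keep only .html files and take the third underscore-separated field
    nums = sorted(f.split('_')[2] for f in os.listdir(folder) if f.endswith('.html'))

print(nums)
```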
You may try this (assuming you are in the folder with the files):
import os
num_list = []
r, d, files = next(os.walk('.'))
for f in files:
    parts = f.split('_')  # now `parts` contains ['Prod224', '0055', '00007464', '20170930.html']
    print(parts[2])  # this outputs '00007464'
    num_list.append(parts[2])
Assuming you have a certain pattern for your files, you can use a regex:
>>> import re
>>> s = 'Prod224_0055_00007464_20170930.html'
>>> desired_number = re.findall("\d+", s)[2]
>>> desired_number
'00007464'
Using a regex will help you get not only the specific number you want, but also the other numbers in the file name.
This will work if the names of your files follow the pattern "[some text][number]_[number]_[desired_number]_[a date].html". After getting the number, it should be simple to use the append method to add that number to any list you want.
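If you would rather have the pattern checked explicitly than rely on the position in the findall result, something along these lines might do. The exact regex and the helper name extract_id are my own reading of the pattern described above, so adjust them to your real file names.

```python
import re

# Assumed layout: letters+digits, number, desired number, date, then ".html"
PATTERN = re.compile(r"[A-Za-z]+\d+_\d+_(\d+)_\d+\.html")

def extract_id(filename):
    """Return the third number as a string, or None if the name does not fit."""
    match = PATTERN.fullmatch(filename)
    return match.group(1) if match else None

print(extract_id('Prod224_0055_00007464_20170930.html'))
print(extract_id('readme.html'))
```

Returning None for non-matching names lets you filter them out before appending to your list.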
I have searched possible ways but I am unable to mix those up yet. I have a string that is a path to the image.
myString= "D:/Train/16_partitions_annotated/partition1/images/AAAAA/073-1_00191.jpeg"
What I want to do is replace images with IMAGES and cut off the 073-1_00191.jpeg part at the end. Thus, the new string should be
newString = "D:/Train/16_partitions_annotated/partition1/IMAGES/AAAAA/"
And the chopped part (073-1_00191.jpeg) will be used separately as the name of processed image. The function .replace() doesn't work here as I need to provide path and filename as separate parameters.
The reason why I want to do is that I am accessing images through their paths and doing some stuff on them and when saving them I need to create another directory (in this case IMAGES) and the next directories after that (in this case AAAAA) should remain the same ( together with the name of corresponding image).
Note that images may have different names and extensions
If something is not clear by my side please ask, I will try to clear up
As alluded to in the comments, os.path is useful for manipulating paths represented as strings.
>>> import os
>>> myString= "D:/Train/16_partitions_annotated/partition1/images/AAAAA/073-1_00191.jpeg"
>>> dirname, basename = os.path.split(myString)
>>> dirname
'D:/Train/16_partitions_annotated/partition1/images/AAAAA'
>>> basename
'073-1_00191.jpeg'
At this point, how you want to handle capitalizing "images" is a function of your broader goal. If you want to simply capitalize that specific word, dirname.replace('images', 'IMAGES') should suffice. But you seem to be asking for a more generalized way to capitalize the second to last directory in the absolute path:
>>> def cap_penultimate(dirname):
... h, t = os.path.split(dirname)
... hh, ht = os.path.split(h)
... return os.path.join(hh, ht.upper(), t)
...
>>> cap_penultimate(dirname)
'D:/Train/16_partitions_annotated/partition1/IMAGES/AAAAA'
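The same idea can also be written with pathlib. This is a sketch rather than a drop-in replacement: it uses PurePosixPath because the example path is written with forward slashes, and it assumes the directory to capitalize is always the second to last one.

```python
from pathlib import PurePosixPath

myString = "D:/Train/16_partitions_annotated/partition1/images/AAAAA/073-1_00191.jpeg"

path = PurePosixPath(myString)
filename = path.name                 # file name, kept for saving the processed image

parts = list(path.parent.parts)      # directory components without the file name
parts[-2] = parts[-2].upper()        # capitalize the second to last directory
newString = str(PurePosixPath(*parts))

print(newString)
print(filename)
```

Unlike the string in the question, newString has no trailing slash; joining it with the file name via PurePosixPath(newString, filename) adds the separator for you.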
It's a game of slicing. Here you can try this:
myString= "D:/Train/16_partitions_annotated/partition1/images/AAAAA/073-1_00191.jpeg"
myString1=myString.split('/')
pre_data=myString1[:myString1.index('images')]
after_data=myString1[myString1.index('images'):]
after_data=['IMAGES'] + after_data[1:2]
print("/".join(pre_data+after_data))
output:
D:/Train/16_partitions_annotated/partition1/IMAGES/AAAAA
The simple way :
myString= "D:/Train/16_partitions_annotated/partition1/images/AAAAA/073-1_00191.jpeg"
a = myString.rfind('/')
filename = myString[a+1:]
restofstring = myString[0:a]
alteredstring = restofstring.replace('images', 'IMAGES')
print(alteredstring)
output:
D:/Train/16_partitions_annotated/partition1/IMAGES/AAAAA
My question is closely related to Python identify file with largest number as part of filename
I want to append files to a certain directory. The files are named file1, file2, ..., fileN. This works if I do it in one go, but when I want to add files again and need to find the last file added (in this case the file with the highest number), it treats 'file6' as higher than 'file100'.
How can I solve this?
import glob
import os
latest_file = max(sorted(list_of_files, key=os.path.getctime))
print latest_file
As you can see, I tried looking at the creation time, and I also tried the modification time, but these can be the same, so that doesn't help.
EDIT: my filenames have the extension ".txt" after the number.
I'll try to solve it only using filenames, not dates.
You have to convert the number to an integer before applying the comparison; otherwise alphanumeric ordering is applied to the whole filename.
Proof of concept:
import re
list_of_files = ["file1","file100","file4","file7"]
def extract_number(f):
    s = re.findall(r"\d+$", f)
    return (int(s[0]) if s else -1, f)
print(max(list_of_files, key=extract_number))
result: file100
The key function extracts the digits found at the end of the file name and converts them to an integer; if nothing is found, it returns -1.
You don't need to sort to find the max; just pass the key to max directly.
If two files have the same index, the full filename is used to break the tie (which explains the tuple key).
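Since the edit to the question says the names end in ".txt", a pattern anchored on trailing digits would no longer match. A possible adaptation, assuming the extension is always exactly ".txt" (the sample names are made up):

```python
import re

list_of_files = ["file1.txt", "file100.txt", "file6.txt", "notes.txt"]

def extract_number(f):
    # Digits directly before the ".txt" extension; -1 if there are none
    m = re.search(r"(\d+)\.txt$", f)
    return (int(m.group(1)) if m else -1, f)

latest = max(list_of_files, key=extract_number)
print(latest)
```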
Using the following regular expression you can get the number of each file:
import re
maxn = 0
for file in list_of_files:
    # assuming filename is "filexxx.txt"; \d+ requires at least one digit
    num = int(re.search(r'file(\d+)', file).group(1))
    # compare num to the previous max
    maxn = max(maxn, num)
At the end of the loop, maxn will be your highest filename number.
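Once you have the highest number, building the next name is one more step. A rough sketch, assuming the names are bare file names of the form "fileN.txt" and using a helper name (next_filename) that I made up:

```python
import re

def next_filename(list_of_files):
    """Return the name that follows the highest-numbered 'fileN.txt'."""
    matches = (re.fullmatch(r"file(\d+)\.txt", name) for name in list_of_files)
    numbers = [int(m.group(1)) for m in matches if m]
    # default=0 makes an empty directory start at file1.txt
    return "file{}.txt".format(max(numbers, default=0) + 1)

print(next_filename(["file1.txt", "file6.txt", "file100.txt"]))
print(next_filename([]))
```

If your list holds full paths (for example from glob), apply os.path.basename to each entry first.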
I want to remove the last component of each path, i.e. the library name (delimited by '/'). The text string that I have contains the paths of the libraries used at compilation time. These paths are separated by spaces. I want to keep each path, but only up to the directory that contains the library, not the library name itself.
Example:
text = " /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtbeginT.o /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtfastmath.o /opt/cray/cce/8.2.5/craylibs/x86-64/no_mmap.o /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymath.a /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymp.a /opt/cray/atp/1.7.1/lib/libAtpSigHandler.a /opt/cray/atp/1.7.1/lib/libAtpSigHCommData.a "
I want my output to be like -
Output_list =
[/opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4,
/opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4,
/opt/cray/cce/8.2.5/craylibs/x86-64,
/opt/cray/cce/8.2.5/craylibs/x86-64,
/opt/cray/cce/8.2.5/craylibs/x86-64,
/opt/cray/atp/1.7.1/lib,
/opt/cray/atp/1.7.1/lib]
and finally I want to remove the duplicates in the output_list so that the list looks like.
New_output_list =
[/opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4,
/opt/cray/cce/8.2.5/craylibs/x86-64,
/opt/cray/atp/1.7.1/lib]
I am getting the results using split() function but I am struggling to discard the library names from the path.
any help would be appreciated.
You seem to want the following (don't try to do string operations on paths; it's bound to end badly):
import os
New_output_List = list(set(os.path.dirname(pt) for pt in text.split()))
os.path.dirname gets the directory name from a path. We do this for every item in the text, which is split into a list on whitespace.
To remove the duplicates, we just convert it to a set and then finally to a list.
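One thing to keep in mind is that a set does not preserve order, so the directories may come out in a different order than in the question. If the order matters, a variation like the following might help; it relies on dict keys keeping insertion order (Python 3.7+) and uses posixpath so the sketch behaves the same on any OS.

```python
import posixpath  # POSIX implementation of os.path, usable on any OS

text = " /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtbeginT.o /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtfastmath.o /opt/cray/cce/8.2.5/craylibs/x86-64/no_mmap.o /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymath.a /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymp.a /opt/cray/atp/1.7.1/lib/libAtpSigHandler.a /opt/cray/atp/1.7.1/lib/libAtpSigHCommData.a "

# dict.fromkeys drops duplicates while keeping the first occurrence of each key
New_output_list = list(dict.fromkeys(posixpath.dirname(pt) for pt in text.split()))

print(New_output_list)
```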
try with this
text = " /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtbeginT.o /opt/gcc/4.4.4/snos/lib/gcc/x86_64-suse-linux/4.4.4/crtfastmath.o /opt/cray/cce/8.2.5/craylibs/x86-64/no_mmap.o /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymath.a /opt/cray/cce/8.2.5/craylibs/x86-64/libcraymp.a /opt/cray/atp/1.7.1/lib/libAtpSigHandler.a /opt/cray/atp/1.7.1/lib/libAtpSigHCommData.a "
New_output_list = []
for x in text.split():
    directory = x.rsplit("/", 1)[0]  # drop the library name
    if directory not in New_output_list:  # skip duplicates
        New_output_list.append(directory)