extraction of file from filepath - python

I need to extract the file name without any extension.
For example:
/home/si/text.txt
/home/si/text.vx.txt
In both cases the output should be just text. I don't know how many trailing extensions a file can have, but I need only the base file name. I tried splitext(filename)[0], but it gave me text.vx rather than text.

This should work for your needs:
from os.path import basename
print(basename("/home/si/text.vx.txt").split('.')[0])
# prints: text

I use the split method after getting the file name:
filename.split('.')[0]
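The two answers above can be combined into a small helper. This is only a sketch: the function name base_name_without_extensions is made up for illustration, and it assumes that everything after the first dot in the file name counts as an extension (so archive.v1.2.tar.gz would come back as archive). A leading dot is kept so that hidden files such as .bashrc are not reduced to an empty string.

```python
import os.path


def base_name_without_extensions(path):
    """Return the final path component with every dotted suffix removed."""
    name = os.path.basename(path)
    # Keep a leading dot so hidden files such as ".bashrc" are not reduced to "".
    if name.startswith("."):
        return "." + name[1:].split(".")[0]
    return name.split(".")[0]


print(base_name_without_extensions("/home/si/text.txt"))     # text
print(base_name_without_extensions("/home/si/text.vx.txt"))  # text
```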

Related

Python - doc to docx file converter input, file path from a txt file

Hi stackoverflow community,
Situation:
I'm trying to run this converter, found here.
However, what I want is for it to read a list of file paths from a text file and convert them.
The reason is that these file paths are filtered manually, so I don't have to convert unnecessary files. There is a large number of unnecessary files in the folder.
How can I go about this? Thank you.
with open("file_path", 'r') as file_content:
    content = file_content.read()
content = content.split('\n')
You can read the contents of the file using the method above, then convert them into a list (or any other iterable) so that you can loop over it. I used content.split('\n') to split the contents on the newline character '\n' (one is written every time you press Enter); you can split on any other character instead.
for i in content:
    # the code you want to execute
    pass
Judging by your situation, I guess this is what you want (to convert only certain files in a directory), in which case you don't need an extra '.txt' file:
import os
for f in os.listdir(path):
    if f.startswith("Prelim") and f.endswith(".doc"):
        # os.listdir returns bare names, so join them with the directory
        convert(os.path.join(path, f))
But if for some reason you want to stick with the ".txt" processing, this may help:
with open("list.txt") as f:
    lines = f.readlines()
for line in lines:
    convert(line.strip())  # strip the trailing newline from each path
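The text-file approach above can be sketched as a pair of small functions. The names read_paths and convert_all are illustrative, and convert stands in for whatever conversion function the converter script provides; its real signature is an assumption here. Blank lines in the list are skipped so a trailing newline does not produce an empty path.

```python
def read_paths(list_file):
    """Return the non-empty lines of list_file with surrounding whitespace removed."""
    with open(list_file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def convert_all(list_file, convert):
    """Call convert(path) for every path listed in list_file.

    convert is a placeholder for the converter's own function (assumed
    to accept a single path string).
    """
    for path in read_paths(list_file):
        convert(path)
```

With the question's setup this might be called as convert_all("list.txt", convert).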

How can I rename every file that matches a regex?

I want to rename filenames of the form xyz.ogg.mp3 to xyz.mp3.
I have a regex that looks for .ogg in every file then it replaces the .ogg with an empty string but I get the following error:
Traceback (most recent call last):
  File ".\New Text Document.py", line 7, in <module>
    os.rename(files, '')
TypeError: rename() argument 1 must be string, not _sre.SRE_Match
Here is what I tried:
for file in os.listdir("./"):
    if file.endswith(".mp3"):
        files = re.search('.ogg', file)
        os.rename(files, '')
How can I make this loop look for every .ogg in each file then replace it with an empty string?
The file structure looks like this: audiofile.ogg.mp3
You can do something like this:
for file in os.listdir("./"):
    if file.endswith(".mp3") and '.ogg' in file:
        os.rename(file, file.replace('.ogg', ''))
It would be much quicker to do this from the command line (using Perl's rename):
rename 's/\.ogg//' *.ogg.mp3
An example using Python 3's pathlib (but not regular expressions, as it's kind of overkill for the stated problem):
from pathlib import Path
for path in Path('.').glob('*.mp3'):
    if '.ogg' in path.stem:
        new_name = path.name.replace('.ogg', '')
        path.rename(path.with_name(new_name))
A few notes:
Path('.') gives you a Path object pointing to the current working directory
Path.glob() matches the pattern inside that directory only (it recurses only if the pattern contains **), and the * there is a wildcard (so you get anything ending in .mp3)
Path.stem gives you the file name minus the last extension (so if your path were /foo/bar/baz.bang, the stem would be baz)
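For completeness, here is how a regex-based version might look. The error in the question comes from passing the match object returned by re.search to os.rename, which expects two path strings. The sketch below builds the new name with re.sub instead; strip_ogg and OGG_BEFORE_MP3 are names invented for this example, and the pattern assumes the .ogg always sits directly before a final .mp3.

```python
import re

# Match ".ogg" only when it is immediately followed by a final ".mp3".
OGG_BEFORE_MP3 = re.compile(r"\.ogg(?=\.mp3$)")


def strip_ogg(name):
    """Return name with the ".ogg" in a trailing ".ogg.mp3" removed."""
    return OGG_BEFORE_MP3.sub("", name)


print(strip_ogg("audiofile.ogg.mp3"))  # audiofile.mp3
print(strip_ogg("xyz.mp3"))            # xyz.mp3
```

Inside the loop from the question, the rename would then be os.rename(file, strip_ogg(file)).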

Should I Use Regex to Get a File Name?

I'm currently working with tkinter in Python (beginner), and I'm writing a small applet that requires one of the labels to dynamically change based on what the name of a selected .csv file is, without the '.csv' tag.
I can currently get the filepath to the .csv file using askopenfilename(), which returns a string that looks something like "User/Folder1/.../filename.csv". I need some way to extract "filename" from this filepath string, and I'm a bit stuck on how to do it. Is this simply a regex problem? Or is there a way to do this using string indices? Which is the "better" way to do it? Any help would be great. Thank you.
EDIT: The reason I was wondering if regex is the right way to do it is because there could be duplicates, e.g. if the user had something like "User/Folder1/hello/hello.csv". That's why I was thinking maybe just use string indices, since the file name I need will always end at [:-4]. Am I thinking about this the right way?
Solution:
import os
file = open('/some/path/to/a/test.csv')
fname = os.path.splitext(file.name)[0].split('/')[-1]
print(fname)
# test
If you get the file path and name as a string, then:
import os
file = "User/Folder1/test/filename.csv"
fname = os.path.splitext(file)[0].split('/')[-1]
print(fname)
# filename
Explanation of how it works:
Note that the function is os.path.splitEXT, not os.path.splitTEXT; this is a very common mistake.
The function takes a string argument, so if we use file = open(...), we have to pass it the path string stored on the file object rather than the file object itself. Therefore in the first scenario we use:
file.name
Now, this call splits the complete path string into two parts, the root and the extension:
os.path.splitext(file.name)
# result:
('/some/path/to/a/test', '.csv')
In our case we only need the first part, so we take it by index:
os.path.splitext(file.name)[0]
# result:
'/some/path/to/a/test'
Now, since we only need the file name and not the whole path, we split it on /:
os.path.splitext(file.name)[0].split('/')
# result:
['', 'some', 'path', 'to', 'a', 'test']
And out of this we only need the last element, or in other words, the first from the end:
os.path.splitext(file.name)[0].split('/')[-1]
Hope this helps.
Check for more here: Extract file name from path, no matter what the os/path format
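As an alternative to splitting strings by hand, the standard pathlib module can do both steps at once. This is a sketch rather than the answerer's method; display_name is a made-up helper name, and the path below is just the example from the question. Because only the last path component is examined, a folder with the same name as the file (hello/hello.csv) causes no confusion, and the extension does not have to be exactly four characters long.

```python
from pathlib import Path


def display_name(file_path):
    """Return the last path component without its final extension."""
    return Path(file_path).stem


print(display_name("User/Folder1/hello/hello.csv"))  # hello
```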

Splitting .txt file includes the .txt extension

So I have a lot of data files, which have a name similar to this:
lvh_GTV_TwoField-3-401-86.txt
The parts that change from file to file are the number 86 and GTV.
I'm trying to use this code to distinguish between files:
f.split('-')[3]
This, if I'm not mistaken, should split the file name at each - and take the element at index 3, which is 86. I would really like to use int(f.split('-')[3]) because I need to compare it against another number; however, the element at index 3 is actually 86.txt, so I can't convert it to an integer.
So my question is: how do I split the file name so that I get only the value 86, without the .txt extension?
Thanks in advance.
You may also use the os.path.splitext function to remove the extension:
import os
os.path.splitext(f)[0].split('-')[3]
Or, more verbosely,
base, ext = os.path.splitext(f)
base.split('-')[3]
Given that this is very controlled, you could slice the resulting string, something like:
f.split('-')[3][:-4] # '86', take all chars except the last 4 (.txt)
Using PyPI package parse:
from parse import parse
parse("lvh_{}_TwoField-3-401-{:d}.txt", "lvh_GTV_TwoField-3-401-86.txt")[1]
# => 86 (as an int)
Using Python's built-in re (regular expression) module:
import re
m = re.match(
    r"lvh_.+_TwoField-3-401-(?P<the_number>\d+)\.txt",
    "lvh_GTV_TwoField-3-401-86.txt"
)
the_number = int(m.group('the_number'))
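The splitext approach can be wrapped into a helper that returns the integer directly. This is a sketch with an invented function name, trailing_number, and it swaps the fixed index 3 for rsplit on the last hyphen, on the assumption that the number is always the final hyphen-separated field before the extension.

```python
import os.path


def trailing_number(filename):
    """Return the integer after the last '-' in filename, ignoring the extension."""
    base, _ext = os.path.splitext(filename)
    # rsplit with maxsplit=1 cuts only at the last hyphen.
    return int(base.rsplit("-", 1)[1])


print(trailing_number("lvh_GTV_TwoField-3-401-86.txt"))  # 86
```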

python tool to generate txt file by copying only directory/folder names but not the other file names

This is my code:
import os
filenames = os.listdir(".")
file = open("XML.txt", "w")
result = []
for filename in filenames:
    result = "<capltestcase name =\""+filename+"\"\n"
    file.write(result)
    result = "title = \""+filename+"\"\n"
    file.write(result)
    result = "/>\n"
    file.write(result)
file.close()
My questions / help needed:
1) I want to add standard text ""
to the generated txt file, but I can't add it; I get syntax errors. Can somebody help with the code, please?
2) How can I copy just the folder names from the directory instead of the file names? With my code, it copies all the file names into the txt file.
Thank you, friends.
file.write("\\")
Use the escape character (\) to write special characters:
print("\\<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\\")
Rather than escaping all those double-quotes, why not embed your string inside single quotes instead? In Python (unlike many other languages) there is no difference between using single or double quotes, provided they are balanced (the same at each end).
If you need the backslashes in the text then use a raw string
file.write(r'"\<?xml version="1.0" encoding="iso-8859-1"?>\"')
That will preserve all the double-quotes and the back-slashes.
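The second part of the question (listing only folders) is not covered by the answers above. One possible approach, sketched here, is to filter the directory entries with os.scandir and is_dir(). The function names folder_names and write_listing are invented for this example, and the header line is an assumption: it uses the XML declaration quoted in the answers, without the surrounding backslashes. The entry layout is copied from the question's code.

```python
import os


def folder_names(directory="."):
    """Return the sorted names of the immediate subdirectories of directory."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def write_listing(directory, out_path):
    """Write an XML declaration followed by one entry per subdirectory."""
    with open(out_path, "w", encoding="iso-8859-1") as out:
        # Single quotes outside mean the double quotes inside need no escaping.
        out.write('<?xml version="1.0" encoding="iso-8859-1"?>\n')
        for name in folder_names(directory):
            out.write('<capltestcase name ="{0}"\ntitle = "{0}"\n/>\n'.format(name))
```

For the question's setup this might be called as write_listing(".", "XML.txt").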
