How to use filename.find()? - python

I am new to python, and am trying to understand a script that has the following lines:
dotInd = fileName.find(".")
if dotInd <> -1:
    newFC = fileName[0:dotInd]
    outFC = newFC + "_buffer"
else:
    outFC = fileName + "_buffer"
I have not been able to find out what fileName.find(".") is doing, or what the condition dotInd <> -1 means.
(I am confused about the <> operator.)
Any help would be appreciated. Also, is there a place where you can find a list of what all Python functions do? Thanks

fileName is an identifier, and refers to an object of type str. You are looking for str.find(). The method returns -1 if the sought-after text is not found, a position otherwise.
<> is an archaic and deprecated way of spelling !=, so it tests if the '.' has been found; if so, the returned position is used to slice the string, removing everything from the '.' to the end.
The code could be better written as:
outFC = fileName.partition('.')[0] + '_buffer'
which will result in the same output without str.find() and testing the output. See the str.partition() function documentation for more information.
It would be more correct still to use the os.path.splitext() function, which avoids splitting on a leading . (signifying a hidden file on POSIX systems):
import os.path
outFC = os.path.splitext(fileName)[0] + '_buffer'
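To make the difference between the three approaches concrete, here is a minimal sketch that wraps each one in a function. The function names and the sample file names are made up for illustration, and != is used in place of <>, which was removed in Python 3.

```python
import os.path


def buffer_name_find(file_name):
    """Cut at the first dot, as the original script does."""
    dot_ind = file_name.find(".")  # -1 means no dot was found
    if dot_ind != -1:
        return file_name[:dot_ind] + "_buffer"
    return file_name + "_buffer"


def buffer_name_partition(file_name):
    """Same result: partition() also splits at the first dot."""
    return file_name.partition(".")[0] + "_buffer"


def buffer_name_splitext(file_name):
    """Strip only the final extension and leave a leading dot alone."""
    return os.path.splitext(file_name)[0] + "_buffer"


# Hypothetical sample names, chosen to show where the approaches differ.
for name in ("roads.shp", "roads", "roads.v2.shp", ".hidden"):
    print(name, buffer_name_find(name), buffer_name_partition(name), buffer_name_splitext(name))
```

The first two functions agree on every input. splitext() differs when the name contains several dots (it cuts at the last one) or starts with a dot, so it is not a drop-in replacement if the first-dot behaviour is what the script relies on.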

Related

Issue using a variable with an r-string in Python

I'm fairly new to Python, and I have a batch job from which I now need to save some extracts to a company SharePoint site. I've searched around and cannot find a solution to the issue I keep running into. I need to put a date into the filename, and first had trouble doing that with a normal string. If I type out the entire path as a raw string, I get the output I want:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx"
print (x)
The output is: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx
However, if I break the string into its parts so I can get a parameter in there, I wind up having to add an extra double quote to the "x" string to keep the code from running into a "SyntaxError: EOL while scanning string literal" error:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\""
timestamp = date_time_obj.date().strftime('%Y-%m-%d')
filename = "_aRoute.xlsx"
print (x + timestamp + filename)
But the output I get passes that unwanted double quote into my string: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\"2021-02-15_aRoute.xlsx
The syntax I need is clearly escaping me, I'm just trying to get the path built so I can save the file itself. If it happens to matter, I'm using pandas to write the file:
data = pandas.read_sql(sql, cnxn)
data.to_excel(string_goes_here)
Any help would be greatly appreciated!
Per the comment from @Matthias, as it turns out, a raw string can't end with a single backslash (more precisely, with an odd number of backslashes). The quick workaround, therefore, was:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts" + "\\"
The comment from @sammywemmy also linked to what looks to be a much more thorough solution.
Thank you both!
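For reference, a minimal sketch of the whole path being assembled. It assumes a fixed date in place of date_time_obj, and it uses ntpath.join as one possible alternative to adding the separator by hand; neither choice comes from the original post.

```python
import ntpath
from datetime import date

# The directory from the question; the raw string does not end in a backslash.
base_dir = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts"

# Assumption: a fixed date stands in for date_time_obj.date().
timestamp = date(2021, 2, 15).strftime("%Y-%m-%d")
filename = "_aRoute.xlsx"

# Option 1: add the separator as an ordinary (non-raw) string.
full_path = base_dir + "\\" + timestamp + filename

# Option 2: let ntpath insert the Windows separator, on any platform.
joined_path = ntpath.join(base_dir, timestamp + filename)

print(full_path)
```

Both options produce the same string, which can then be passed to data.to_excel().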

Select the second specific string if found more than one in a string variable [duplicate]

I'm looking for a simple method of identifying the last position of a string inside another string ... for instance. If I had: file = C:\Users\User\Desktop\go.py
and I wanted to crop this so that file = go.py
Normally I would have to run C:\Users\User\Desktop\go.py through a loop + find statement, and every time it encountered a \ it would ask ... is this the last \ in the string? ... Once I found the last \ I would then do file = file[last\:len(file)]
I'm curious to know if there is a faster neater way to do this.. preferably without a loop.
Something like file = [file('\',last):len(file)]
If there is nothing like what I've shown above ... then can we place the loop inside the [:] somehow. Something like file = [for i in ...:len(file)]
thanks :)
If it is only about file paths, you can use os.path.basename:
>>> import os
>>> os.path.basename(file)
'go.py'
Or if you are not running the code on Windows, you have to use ntpath instead of os.path.
You could split the string into a list and then take the last element of the list.
Sample:
>>> file = r'C:\Users\User\Desktop\go.py'
>>> print(file.split('\\')[-1])
go.py
I agree with Felix that file paths should be handled using os.path.basename. However, you might want to have a look at the built-in string method rpartition.
>>> file = r'C:\Users\User\Desktop\go.py'
>>> before, separator, after = file.rpartition('\\')
>>> before
'C:\\Users\\User\\Desktop'
>>> separator
'\\'
>>> after
'go.py'
There's also the rfind function which gives you the last index of a substring.
>>> file.rfind('\\')
21
I realize that I'm a bit late to the party, but since this is one of the top results when searching for e.g. "find last in str python" on Google, I think it might help someone to add this information.
For the general purpose case (as the OP said they like the generalisation of the split solution)...... try the rfind(str) function.
"myproject-version2.4.5.customext.zip".rfind(".")
edit: apologies, I hadn't realized how old this thread was... :-/
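The index returned by rfind can be used directly in a slice to get the text after the last separator. This is a small sketch; the helper name after_last is made up for illustration.

```python
def after_last(text, sep):
    """Return the part of text that follows the last occurrence of sep."""
    index = text.rfind(sep)
    if index == -1:
        # rfind() returns -1 when sep does not occur at all.
        return text
    return text[index + len(sep):]


print(after_last(r"C:\Users\User\Desktop\go.py", "\\"))
print(after_last("myproject-version2.4.5.customext.zip", "."))
```

No explicit loop is needed, and the helper works for any separator string, not only for path separators.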
For pathname manipulations you want to be using os.path.
For this specific problem you want to use the os.path.basename(path) function, which will return the last component of a path, or an empty string if the path ends in a slash (i.e. the path of a folder rather than a file).
import os.path
print(os.path.basename(r"C:\Users\User\Desktop\go.py"))
Gives (on Windows, where os.path treats \ as a separator):
go.py
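Because os.path follows the rules of the platform the code runs on, a Windows path is only split correctly on Windows. A hedged sketch of the platform-independent variant, using the standard ntpath module mentioned in an earlier answer:

```python
import ntpath
import posixpath

windows_path = r"C:\Users\User\Desktop\go.py"

# ntpath always applies Windows rules, whatever the host platform is.
name = ntpath.basename(windows_path)

# posixpath does not treat the backslash as a separator.
unsplit = posixpath.basename(windows_path)

# A trailing separator means "folder", so the last component is empty.
folder_tail = ntpath.basename("C:\\Users\\User\\Desktop\\")

print(name)
```

On Windows, os.path is ntpath, so the two give the same result there.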

Control order of pathlib and string concatenation

I have a directory I want to save files to, saved as a Path object called dir. I want to autogenerate file names at that path using string concatenation.
The only way I can get this to work in a single line is just through string concatenation:
from pathlib import Path
dir = Path('./Files')
constantString = 'FileName'
changingString = '_001'
path2newfile = dir.as_posix() + '/' + constantString + changingString
print(path2newfile) # Files/FileName_001
... which is overly verbose and not platform independent.
What I'd want to do is use pathlib's / operator for easy manipulation of the new file path that is also platform independent. This would require ensuring that the string concatenation happens first, but the only way I know how to do that is to set a (pointless) variable:
filename = constantString + changingString
path2newfile = dir / filename
But I honestly don't see why this should have to take two lines.
If I instead use "actual" strings (i.e. string literals rather than variables containing strings), I can do something like this:
path2newfile = dir / 'Filename' '_001'
But this doesn't work with variables.
path2newfile = dir / constantString changingString
# SyntaxError: invalid syntax
So I think the base question is how do I control the order of operators in python? Or at least make the concatenation operator + act before the Path operator /.
Keep in mind this is a MWE. My actual problem is a bit more complicated and has to be repeated several times in the code.
Just use parentheses around your string concatenation:
path2newfile = dir / (constantString + changingString)
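The reason the parentheses are needed is that / binds more tightly than +. A minimal sketch of both cases; it uses PurePosixPath only so that the printed result is the same on every platform, and the snake_case variable names are stand-ins for the ones in the question.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on every platform;
# Path behaves the same way for the / operator.
base_dir = PurePosixPath("Files")
constant_string = "FileName"
changing_string = "_001"

# Parentheses force the concatenation to happen before the join.
path_to_new_file = base_dir / (constant_string + changing_string)
print(path_to_new_file)

# Without them, / binds tighter than +, so this is (base_dir / a) + b,
# and a path object cannot be added to a str.
try:
    base_dir / constant_string + changing_string
    raised_type_error = False
except TypeError:
    raised_type_error = True
```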
Have you considered using Python f-strings?
It seems like your real-world example has a "template-y" feel to it, so something like:
path / f"constant part {variable_part}"
may work.
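As a sketch of the f-string idea applied to the names from the question: the counter variable index and the zero-padded format are assumptions, since the question only shows the fixed suffix '_001'.

```python
from pathlib import PurePosixPath

base_dir = PurePosixPath("Files")  # PurePosixPath only for platform-independent output
constant_string = "FileName"
index = 1  # hypothetical counter behind the '_001' suffix

# An f-string is a single expression, so no parentheses are needed after /.
path_to_new_file = base_dir / f"{constant_string}_{index:03d}"
print(path_to_new_file)
```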
Use os.path.join().
It is platform-independent, and you can pass the desired path parts as arguments.

Function creation - "Undefined name" - Python

I'm writing some code that reads words from a text file and sorts them into a dictionary. It actually all runs fine, but for reference here it is:
def find_words(file_name, delimiter = " "):
    """
    A function for finding the number of individual words, and the most popular words, in a given file.
    The process will stop at any line in the file that starts with the word 'finish'.
    If there is no finish point, the process will go to the end of the file.
    Inputs: file_name: Name of file you want to read from, e.g. "mywords.txt"
            delimiter: The way the words in the file are separated e.g. " " or ", "
                     : Delimiter will default to " " if left blank.
    Output: Dictionary with all the words contained in the given file, and how many times each word appears.
    """
    words = []
    dictt = {}
    with open(file_name, 'r') as wordfile:
        for line in wordfile:
            words = line.split(delimiter)
            if words[0]=="finish":
                break
            # This next part is for filling the dictionary
            # and correctly counting the amount of times each word appears.
            for i in range(len(words)):
                a = words[i]
                if a=="\n" or a=="":
                    continue
                elif a not in dictt:  # dict.has_key() no longer exists in Python 3
                    dictt[a] = 1
                else:
                    dictt[a] = dictt[a] + 1
    return dictt
The problem is that it only works if the arguments are given as string literals, e.g, this works:
test = find_words("hello.txt", " " )
But this doesn't:
test = find_words(hello.txt, )
The error message is undefined name 'hello'
I don't know how to alter the function arguments such that I can enter them without speech marks.
Thanks!
Simple, you define that name:
class hello:
    txt = "hello.txt"
But joking aside, all the argument values in a function call are expressions. If you want to pass a string literally you'll have to make a string literal, using the quotes. Python is not a text preprocessor like m4 or cpp, and expects the entire program text to follow its syntax.
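A small sketch of what that means in practice. The helper describe and the variable file_name are made up for illustration; eval is used only so that the failing call can be shown without stopping the script.

```python
def describe(arg):
    """Report the type and value that the function actually received."""
    return type(arg).__name__, arg


file_name = "hello.txt"  # a name bound to a string object

# Both calls hand the same str object to the function.
from_literal = describe("hello.txt")
from_variable = describe(file_name)

# Without quotes, hello.txt means "attribute txt of the name hello",
# and no such name exists here.
try:
    eval("describe(hello.txt)", {"describe": describe})
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(from_literal, from_variable, error_name)
```

The function itself never sees how the argument was written; it only receives the value of the expression, which is why the quotes (or a variable holding the string) are needed at the call site.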
So it turns out I just misunderstood what was being asked. I've had it clarified by the course leader now.
As I am now fully aware, a function definition needs to be told when a string is being entered, hence the quote marks being required.
I admit full ignorance over my depth of understanding of how it all works - I thought you could pretty much put any assortment of letters and/or numbers in as an argument and then you can manipulate them within the function definition.
My ignorance may stem from the fact that I'm quite new to Python, having learned my coding basics on C++ where, if I remember correctly (it was well over a year ago), functions are defined with each argument being specifically set up as their type, e.g.
int max(int num1, int num2)
Whereas in Python you don't quite do it like that.
Thanks for the attempts at help (and ridicule!)
Problem is sorted now.

Question about paths in Python

Let's say I have directory paths looking like this:
this/is/the/basedir/path/a/include
this/is/the/basedir/path/b/include
this/is/the/basedir/path/a
this/is/the/basedir/path/b
In Python, how can I split these paths up so they look like this instead:
a/include
b/include
a
b
If I run os.path.split(path)[1] it will display:
include
include
a
b
What should I be trying out here? Should I be looking at some regex command, or can this be done without it? Thanks in advance.
EDIT ALL: I solved it using regular expressions, damn handy tool :)
Perhaps something like this, depends on how hardcoded your prefix is:
import os

def removePrefix(path, prefix):
    plist = path.split(os.sep)
    pflist = prefix.split(os.sep)
    rest = plist[len(pflist):]
    return os.path.join(*rest)
Usage:
print(removePrefix("this/is/the/basedir/path/b/include", "this/is/the/basedir/path"))
b/include
This assumes you're on a platform where the directory separator (os.sep) really is the forward slash.
This code tries to handle paths as something a little more high-level than mere strings. It's not optimal though, you could (or should) do more cleaning and canonicalization to be safer.
Maybe something like this:
import os.path

result = []
prefix = os.path.commonprefix(list_of_paths)
for path in list_of_paths:
    result.append(os.path.relpath(path, prefix))
This works only in Python 2.6 and later, because os.path.relpath was added in 2.6.
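One caveat: os.path.commonprefix() compares strings character by character, so its result is not guaranteed to end on a directory boundary. A hedged sketch using os.path.commonpath() (available since Python 3.5), which compares whole path components. posixpath is used here only so that the forward-slash paths from the question behave the same on every platform, and the two paths in sibling_paths are invented to show the caveat.

```python
import posixpath

paths = [
    "this/is/the/basedir/path/a/include",
    "this/is/the/basedir/path/b/include",
    "this/is/the/basedir/path/a",
    "this/is/the/basedir/path/b",
]

# commonpath() works on whole components, not on characters.
base = posixpath.commonpath(paths)
relative = [posixpath.relpath(path, base) for path in paths]
print(relative)

# Hypothetical paths showing the character-wise behaviour of commonprefix().
sibling_paths = ["base/path_a/x", "base/path_b/y"]
char_prefix = posixpath.commonprefix(sibling_paths)  # stops in the middle of a name
path_prefix = posixpath.commonpath(sibling_paths)
```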
What about partition?
It splits the string at the first occurrence of sep and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, it returns a 3-tuple containing the string itself, followed by two empty strings.
data = """this/is/the/basedir/path/a/include
this/is/the/basedir/path/b/include
this/is/the/basedir/path/a
this/is/the/basedir/path/b"""
for line in data.splitlines():
    print(line.partition("this/is/the/basedir/path/")[2])
#output
a/include
b/include
a
b
Updated for the new comment by author:
It looks like you need rsplit to treat directories differently depending on whether the path ends with "include" or not:
import os.path
data = """this/is/the/basedir/path/a/include
this/is/the/basedir/path/b/include
this/is/the/basedir/path/a
this/is/the/basedir/path/b"""
for line in data.splitlines():
    if line.endswith('include'):
        print('/'.join(line.rsplit("/",2)[-2:]))
    else:
        print(os.path.split(line)[1])
        #or just
        # print(line.rsplit("/",1)[-1])
#output
a/include
b/include
a
b
While the criterion is not 100% clear, it seems from the OP's comment that the key issue is specifically whether the path's last component ends in "include". If that is the case, and to avoid going wrong when the last component is e.g. "dontinclude" (as another answer does by trying string matching instead of path matching), I suggest:
def lastpart(apath):
    # split into all components; os.path.split() would return only a (head, tail) pair
    pieces = apath.split('/')
    final = -1
    if pieces[-1] == 'include':
        final = -2
    return '/'.join(pieces[final:])
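The same component-based idea can also be sketched with pathlib, whose parts attribute yields every path component. The function name last_part is made up, and the sketch assumes a non-empty, forward-slash path like the ones in the question.

```python
from pathlib import PurePosixPath


def last_part(a_path):
    """Return the last component, or the last two if the path ends in 'include'."""
    parts = PurePosixPath(a_path).parts  # assumes a_path is not empty
    keep = 2 if parts[-1] == "include" else 1
    return "/".join(parts[-keep:])


for path in (
    "this/is/the/basedir/path/a/include",
    "this/is/the/basedir/path/b",
    "this/is/the/basedir/path/dontinclude",
):
    print(last_part(path))
```

Because the comparison is made against a whole component, a directory named "dontinclude" is treated like any other name.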
