java programmer learning python: string split and join - python

Can someone explain to me why in python when we want to join a string we write:
'delim'.join(list)
and when we want to split a string we write:
str.split('delim')
coming from java it seems that one of these is backwards because in java we write:
//split:
str.split('delim');
//join
list.join('delim');
edit:
you are right. join takes a list. (though it doesnt change the question)
Can someone explain to me the rationale behind this API?

Join only makes sense when joining some sort of iterable. However, since iterables don't necessarily contain all strings, putting join as a method on an iterable doesn't make sense. (what would you expect the result of [1,"baz",my_custom_object,my_list].join("foo") to be?) The only other place to put it is as a string method with the understanding that everything in the iterable is going to be a string. Additionally, putting join as a string method allows it to be used with any iterable -- tuples, lists, generators, custom objects which support iteration or even strings.
Also note that you are completely free to split a string in the same way that you join it:
list_of_strings='this, is , a, string, separated, by , commas.'.split(',')
Of course, the utility here isn't quite as easy to see.

http://docs.python.org/faq/design.html#why-is-join-a-string-method-instead-of-a-list-or-tuple-method
From the source:
join() is a string method because in using it you are telling the
separator string to iterate over a sequence of strings and insert
itself between adjacent elements. This method can be used with any
argument which obeys the rules for sequence objects, including any new
classes you might define yourself.
Because this is a string method it can work for Unicode strings as
well as plain ASCII strings. If join() were a method of the sequence
types then the sequence types would have to decide which type of
string to return depending on the type of the separator.

So that you don't have to reimplement the join operation for every sequence type you create. Python uses protocols, not types, as its main behavioral pattern, and so every sequence can be expected to act the same even though they don't derive from an existing sequence class.

Related

Concatenate string and list to create a list of paths [duplicate]

I would like to put an int into a string. This is what I am doing at the moment:
num = 40
plot.savefig('hanning40.pdf') #problem line
I have to run the program for several different numbers, so I'd like to do a loop. But inserting the variable like this doesn't work:
plot.savefig('hanning', num, '.pdf')
How do I insert a variable into a Python string?
See also
If you tried using + to concatenate a number with a string (or between strings, etc.) and got an error message, see How can I concatenate str and int objects?.
If you are trying to assemble a URL with variable data, do not use ordinary string formatting, because it is error-prone and more difficult than necessary. Specialized tools are available. See Add params to given URL in Python.
If you are trying to assemble a SQL query, do not use ordinary string formatting, because it is a major security risk. This is the cause of "SQL injection" which costs real companies huge amounts of money every year. See for example Python: best practice and securest way to connect to MySQL and execute queries for proper techniques.
If you just want to print (output) the string, you can prepare it this way first, or if you don't need the string for anything else, print each piece of the output individually using a single call to print. See How can I print multiple things (fixed text and/or variable values) on the same line, all at once? for details on both approaches.
Using f-strings:
plot.savefig(f'hanning{num}.pdf')
This was added in 3.6 and is the new preferred way.
Using str.format():
plot.savefig('hanning{0}.pdf'.format(num))
String concatenation:
plot.savefig('hanning' + str(num) + '.pdf')
Conversion Specifier:
plot.savefig('hanning%s.pdf' % num)
Using local variable names (neat trick):
plot.savefig('hanning%(num)s.pdf' % locals())
Using string.Template:
plot.savefig(string.Template('hanning${num}.pdf').substitute(locals()))
See also:
Fancier Output Formatting - The Python Tutorial
Python 3's f-Strings: An Improved String Formatting Syntax (Guide) - RealPython
With the introduction of formatted string literals ("f-strings" for short) in Python 3.6, it is now possible to write this with a briefer syntax:
>>> name = "Fred"
>>> f"He said his name is {name}."
'He said his name is Fred.'
With the example given in the question, it would look like this
plot.savefig(f'hanning{num}.pdf')
plot.savefig('hanning(%d).pdf' % num)
The % operator, when following a string, allows you to insert values into that string via format codes (the %d in this case). For more details, see the Python documentation:
printf-style String Formatting
You can use + as the normal string concatenation function as well as str().
"hello " + str(10) + " world" == "hello 10 world"
In general, you can create strings using:
stringExample = "someString " + str(someNumber)
print(stringExample)
plot.savefig(stringExample)
If you would want to put multiple values into the string you could make use of format
nums = [1,2,3]
plot.savefig('hanning{0}{1}{2}.pdf'.format(*nums))
Would result in the string hanning123.pdf. This can be done with any array.
Special cases
Depending on why variable data is being used with strings, the general-purpose approaches may not be appropriate.
If you need to prepare an SQL query
Do not use any of the usual techniques for assembling a string. Instead, use your SQL library's functionality for parameterized queries.
A query is code, so it should not be thought about like normal text. Using the library will make sure that any inserted text is properly escaped. If any part of the query could possibly come from outside the program in any way, that is an opportunity for a malevolent user to perform SQL injection. This is widely considered one of the important computer security problems, costing real companies huge amounts of money every year and causing problems for countless customers. Even if you think you know the data is "safe", there is no real upside to using any other approach.
The syntax will depend on the library you are using and is outside the scope of this answer.
If you need to prepare a URL query string
See Add params to given URL in Python. Do not do it yourself; there is no practical reason to make your life harder.
Writing to a file
While it's possible to prepare a string ahead of time, it may be simpler and more memory efficient to just write each piece of data with a separate .write call. Of course, non-strings will still need to be converted to string before writing, which may complicate the code. There is not a one-size-fits-all answer here, but choosing badly will generally not matter very much.
If you are simply calling print
The built-in print function accepts a variable number of arguments, and can take in any object and stringify it using str. Before trying string formatting, consider whether simply passing multiple arguments will do what you want. (You can also use the sep keyword argument to control spacing between the arguments.)
# display a filename, as an example
print('hanning', num, '.pdf', sep='')
Of course, there may be other reasons why it is useful for the program to assemble a string; so by all means do so where appropriate.
It's important to note that print is a special case. The only functions that work this way are ones that are explicitly written to work this way. For ordinary functions and methods, like input, or the savefig method of Matplotlib plots, we need to prepare a string ourselves.
Concatenation
Python supports using + between two strings, but not between strings and other types. To work around this, we need to convert other values to string explicitly: 'hanning' + str(num) + '.pdf'.
Template-based approaches
Most ways to solve the problem involve having some kind of "template" string that includes "placeholders" that show where information should be added, and then using some function or method to add the missing information.
f-strings
This is the recommended approach when possible. It looks like f'hanning{num}.pdf'. The names of variables to insert appear directly in the string. It is important to note that there is not actually such a thing as an "f-string"; it's not a separate type. Instead, Python will translate the code ahead of time:
>>> def example(num):
... return f'hanning{num}.pdf'
...
>>> import dis
>>> dis.dis(example)
2 0 LOAD_CONST 1 ('hanning')
2 LOAD_FAST 0 (num)
4 FORMAT_VALUE 0
6 LOAD_CONST 2 ('.pdf')
8 BUILD_STRING 3
10 RETURN_VALUE
Because it's a special syntax, it can access opcodes that aren't used in other approaches.
str.format
This is the recommended approach when f-strings aren't possible - mainly, because the template string needs to be prepared ahead of time and filled in later. It looks like 'hanning{}.pdf'.format(num), or 'hanning{num}.pdf'.format(num=num)'. Here, format is a method built in to strings, which can accept arguments either by position or keyword.
Particularly for str.format, it's useful to know that the built-in locals, globals and vars functions return dictionaries that map variable names to the contents of those variables. Thus, rather than something like '{a}{b}{c}'.format(a=a, b=b, c=c), we can use something like '{a}{b}{c}'.format(**locals()), unpacking the locals() dict.
str.format_map
This is a rare variation on .format. It looks like 'hanning{num}.pdf'.format_map({'num': num}). Rather than accepting keyword arguments, it accepts a single argument which is a mapping.
That probably doesn't sound very useful - after all, rather than 'hanning{num}.pdf'.format_map(my_dict), we could just as easily write 'hanning{num}.pdf'.format(**my_dict). However, this is useful for mappings that determine values on the fly, rather than ordinary dicts. In these cases, unpacking with ** might not work, because the set of keys might not be determined ahead of time; and trying to unpack keys based on the template is unwieldy (imagine: 'hanning{num}.pdf'.format(num=my_mapping[num]), with a separate argument for each placeholder).
string.Formatter
The string standard library module contains a rarely used Formatter class. Using it looks like string.Formatter().format('hanning{num}.pdf', num=num). The template string uses the same syntax again. This is obviously clunkier than just calling .format on the string; the motivation is to allow users to subclass Formatter to define a different syntax for the template string.
All of the above approaches use a common "formatting language" (although string.Formatter allows changing it); there are many other things that can be put inside the {}. Explaining how it works is beyond the scope of this answer; please consult the documentation. Do keep in mind that literal { and } characters need to be escaped by doubling them up. The syntax is presumably inspired by C#.
The % operator
This is a legacy way to solve the problem, inspired by C and C++. It has been discouraged for a long time, but is still supported. It looks like 'hanning%s.pdf' % num, for simple cases. As you'd expect, literal '%' symbols in the template need to be doubled up to escape them.
It has some issues:
It seems like the conversion specifier (the letter after the %) should match the type of whatever is being interpolated, but that's not actually the case. Instead, the value is converted to the specified type, and then to string from there. This isn't normally necessary; converting directly to string works most of the time, and converting to other types first doesn't help most of the rest of the time. So 's' is almost always used (unless you want the repr of the value, using 'r'). Despite that, the conversion specifier is a mandatory part of the syntax.
Tuples are handled specially: passing a tuple on the right-hand side is the way to provide multiple arguments. This is an ugly special case that's necessary because we aren't using function-call syntax. As a result, if you actually want to format a tuple into a single placeholder, it must be wrapped in a 1-tuple.
Other sequence types are not handled specially, and the different behaviour can be a gotcha.
string.Template
The string standard library module contains a rarely used Template class. Instances provide substitute and safe_substitute methods that work similarly to the built-in .format (safe_substitute will leave placeholders intact rather than raising an exception when the arguments don't match). This should also be considered a legacy approach to the problem.
It looks like string.Template('hanning$num.pdf').substitute(num=num), and is inspired by traditional Perl syntax. It's obviously clunkier than the .format approach, since a separate class has to be used before the method is available. Braces ({}) can be used optionally around the name of the variable, to avoid ambiguity. Similarly to the other methods, literal '$' in the template needs to be doubled up for escaping.
I had a need for an extended version of this: instead of embedding a single number in a string, I needed to generate a series of file names of the form 'file1.pdf', 'file2.pdf' etc. This is how it worked:
['file' + str(i) + '.pdf' for i in range(1,4)]
You can make dict and substitute variables in your string.
var = {"name": "Abdul Jalil", "age": 22}
temp_string = "My name is %(name)s. I am %(age)s years old." % var

How many values can be returned from a function?

In python you can do this:
def myFunction():
return "String", 5.5, True, 11
val1, val2, val3, val4 = myFunction()
I argue that this is a function which returns 4 values, however my python instructor says I am wrong and that this only returns one tuple.
Personally I think this is a distinction without a difference because there is no indication that these four values are converted into a tuple and then deconstructed as 4 values. I am unaware of how this is any different than the same type of construct in languages like JavaScript.
Am I right?
I'll take a stab at this, since I think the issue here is a lack of defining terms properly. To be completely pedantic about it, the correct answer to the question as worded is "zero". Python doesn't return a value; it returns an object, and an object is not the same thing as a value. Going back to basics on this one:
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects.
also:
Every object has an identity, a type and a value.
A value, as defined above, is something different than an object. Functions return objects and not values, so the answer to the question (as asked, if taken to literal extremes) is zero. If the question had been, "How many objects are returned to the caller by this function?" then the answer would be one. This is why defining terms is important, and why vague questions generate multiple (potentially correct) answers. In another sense, the correct answer to the question would be five, because there's five things one might think of as "values" coming back from this function. There's a tuple, and there's the four items inside the tuple. In still another sense, the answer is four (as you've said) because the code flat out says return and then has four values afterwards.
So really, you're both right, and you're both wrong, but only because the question isn't sufficiently clear as to what it wants to know. The instructor is likely trying to put forth the idea that Python returns single objects, which may contain multiple other objects. This is important to know, because it contributes to Python's flexibility when passing data around. I'm not so sure the way the instructor worded it is achieving that goal, but I'm also not present in the class, so it's tough to say. Ideally, instruction should cover neurodiverse ways of understanding, but I'll save that soapbox for a different discussion.
Let's distill it like this in hopes of providing a clear summary. Python doesn't return values, it returns objects. To that end, a function can only return one object. That object can contain multiple values, or even refer to other objects, so while your function can pass back multiple values to the caller, it must do so inside of a single object. "Objects" are the internal unit of data inside of Python, and are distinctly different from the "value" contained in the object, so it's a good practice in Python to always keep in mind the distinction between the two and how they're used, regardless of how the question is worded.
Your instructor is correct.
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Tuples
The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a ‘singleton’) can be formed by affixing a comma to an expression (an expression by itself does not create a tuple, since parentheses must be usable for grouping of expressions). An empty tuple can be formed by an empty pair of parentheses.
Applied to your example:
foo = "String", 5.5, True, 11
assert isinstance(foo, tuple)
So you're clearly returning a single object.
Here's how the assignment expression is specified when there are multiple targets (variables on the left hand side):
https://docs.python.org/3/reference/simple_stmts.html#assignment-statements
Assignment of an object to a target list, optionally enclosed in parentheses or square brackets, is recursively defined as follows.
If the target list is a single target with no trailing comma, optionally in parentheses, the object is assigned to that target.
Else: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.
(All emphasis mine)

Python opposite of eval()

Is it possible to do the opposite of eval() in python which treats inputs as literal strings? For example f(Node(2)) -> 'Node(2)' or if there's some kind of higher-order operator that stops interpreting the input like lisp's quote() operator.
The answer to does Python have quote is no, Python does not have quote. There's no homoiconicity, and no native infrastructure to represent the languages own source code.
Now for what is the opposite of eval: you seem to be missing some part the picture here. Python's eval is not the opposite of quote. There's three types values can be represented as: as unparsed strings, as expressions, and as values (of any type).
quote and LISP-style "eval" convert between expressions and values.
Parsing and pretty printing form another pair between expressions and strings.
Python's actual eval goes directly from strings to values, and repr goes back (in the sense of printing a parsable value1).
Since SO does not support drawing diagrams:
So if you are looking for the opposite of eval, it's either trivially repr, but that might not really help you, or it would be the composition of quote and pretty priniting, if there were quoting in Python.
This composition with string functions a bit cumbersome, which is why LISPs let the reader deal with parsing once and for all, and from that point on only go between expressions and values. I don't know any language that has such a "quote + print" function, actually (C#'s nameof comes close, but is restricted to names).
Some caveats about the diagram:
It does not commute: you have deal with pure values or disregard side effects, for one, and most importantly quote is not really a function in some sense, of course. It's a special form. That's why repr(x) is not the same as prettyprint(quote(x)), in general.
It's not depicting pairs of isomorphisms. Multiple expressions can eval to the same value etc.
1That's not always the case in reality, but that's what it's suppsedly there for.

How to include more than one argument in raw_input()? [duplicate]

I would like to put an int into a string. This is what I am doing at the moment:
num = 40
plot.savefig('hanning40.pdf') #problem line
I have to run the program for several different numbers, so I'd like to do a loop. But inserting the variable like this doesn't work:
plot.savefig('hanning', num, '.pdf')
How do I insert a variable into a Python string?
See also
If you tried using + to concatenate a number with a string (or between strings, etc.) and got an error message, see How can I concatenate str and int objects?.
If you are trying to assemble a URL with variable data, do not use ordinary string formatting, because it is error-prone and more difficult than necessary. Specialized tools are available. See Add params to given URL in Python.
If you are trying to assemble a SQL query, do not use ordinary string formatting, because it is a major security risk. This is the cause of "SQL injection" which costs real companies huge amounts of money every year. See for example Python: best practice and securest way to connect to MySQL and execute queries for proper techniques.
If you just want to print (output) the string, you can prepare it this way first, or if you don't need the string for anything else, print each piece of the output individually using a single call to print. See How can I print multiple things (fixed text and/or variable values) on the same line, all at once? for details on both approaches.
Using f-strings:
plot.savefig(f'hanning{num}.pdf')
This was added in 3.6 and is the new preferred way.
Using str.format():
plot.savefig('hanning{0}.pdf'.format(num))
String concatenation:
plot.savefig('hanning' + str(num) + '.pdf')
Conversion Specifier:
plot.savefig('hanning%s.pdf' % num)
Using local variable names (neat trick):
plot.savefig('hanning%(num)s.pdf' % locals())
Using string.Template:
plot.savefig(string.Template('hanning${num}.pdf').substitute(locals()))
See also:
Fancier Output Formatting - The Python Tutorial
Python 3's f-Strings: An Improved String Formatting Syntax (Guide) - RealPython
With the introduction of formatted string literals ("f-strings" for short) in Python 3.6, it is now possible to write this with a briefer syntax:
>>> name = "Fred"
>>> f"He said his name is {name}."
'He said his name is Fred.'
With the example given in the question, it would look like this
plot.savefig(f'hanning{num}.pdf')
plot.savefig('hanning(%d).pdf' % num)
The % operator, when following a string, allows you to insert values into that string via format codes (the %d in this case). For more details, see the Python documentation:
printf-style String Formatting
You can use + as the normal string concatenation function as well as str().
"hello " + str(10) + " world" == "hello 10 world"
In general, you can create strings using:
stringExample = "someString " + str(someNumber)
print(stringExample)
plot.savefig(stringExample)
If you would want to put multiple values into the string you could make use of format
nums = [1,2,3]
plot.savefig('hanning{0}{1}{2}.pdf'.format(*nums))
Would result in the string hanning123.pdf. This can be done with any array.
Special cases
Depending on why variable data is being used with strings, the general-purpose approaches may not be appropriate.
If you need to prepare an SQL query
Do not use any of the usual techniques for assembling a string. Instead, use your SQL library's functionality for parameterized queries.
A query is code, so it should not be thought about like normal text. Using the library will make sure that any inserted text is properly escaped. If any part of the query could possibly come from outside the program in any way, that is an opportunity for a malevolent user to perform SQL injection. This is widely considered one of the important computer security problems, costing real companies huge amounts of money every year and causing problems for countless customers. Even if you think you know the data is "safe", there is no real upside to using any other approach.
The syntax will depend on the library you are using and is outside the scope of this answer.
If you need to prepare a URL query string
See Add params to given URL in Python. Do not do it yourself; there is no practical reason to make your life harder.
Writing to a file
While it's possible to prepare a string ahead of time, it may be simpler and more memory efficient to just write each piece of data with a separate .write call. Of course, non-strings will still need to be converted to string before writing, which may complicate the code. There is not a one-size-fits-all answer here, but choosing badly will generally not matter very much.
If you are simply calling print
The built-in print function accepts a variable number of arguments, and can take in any object and stringify it using str. Before trying string formatting, consider whether simply passing multiple arguments will do what you want. (You can also use the sep keyword argument to control spacing between the arguments.)
# display a filename, as an example
print('hanning', num, '.pdf', sep='')
Of course, there may be other reasons why it is useful for the program to assemble a string; so by all means do so where appropriate.
It's important to note that print is a special case. The only functions that work this way are ones that are explicitly written to work this way. For ordinary functions and methods, like input, or the savefig method of Matplotlib plots, we need to prepare a string ourselves.
Concatenation
Python supports using + between two strings, but not between strings and other types. To work around this, we need to convert other values to string explicitly: 'hanning' + str(num) + '.pdf'.
Template-based approaches
Most ways to solve the problem involve having some kind of "template" string that includes "placeholders" that show where information should be added, and then using some function or method to add the missing information.
f-strings
This is the recommended approach when possible. It looks like f'hanning{num}.pdf'. The names of variables to insert appear directly in the string. It is important to note that there is not actually such a thing as an "f-string"; it's not a separate type. Instead, Python will translate the code ahead of time:
>>> def example(num):
... return f'hanning{num}.pdf'
...
>>> import dis
>>> dis.dis(example)
2 0 LOAD_CONST 1 ('hanning')
2 LOAD_FAST 0 (num)
4 FORMAT_VALUE 0
6 LOAD_CONST 2 ('.pdf')
8 BUILD_STRING 3
10 RETURN_VALUE
Because it's a special syntax, it can access opcodes that aren't used in other approaches.
str.format
This is the recommended approach when f-strings aren't possible - mainly, because the template string needs to be prepared ahead of time and filled in later. It looks like 'hanning{}.pdf'.format(num), or 'hanning{num}.pdf'.format(num=num)'. Here, format is a method built in to strings, which can accept arguments either by position or keyword.
Particularly for str.format, it's useful to know that the built-in locals, globals and vars functions return dictionaries that map variable names to the contents of those variables. Thus, rather than something like '{a}{b}{c}'.format(a=a, b=b, c=c), we can use something like '{a}{b}{c}'.format(**locals()), unpacking the locals() dict.
str.format_map
This is a rare variation on .format. It looks like 'hanning{num}.pdf'.format_map({'num': num}). Rather than accepting keyword arguments, it accepts a single argument which is a mapping.
That probably doesn't sound very useful - after all, rather than 'hanning{num}.pdf'.format_map(my_dict), we could just as easily write 'hanning{num}.pdf'.format(**my_dict). However, this is useful for mappings that determine values on the fly, rather than ordinary dicts. In these cases, unpacking with ** might not work, because the set of keys might not be determined ahead of time; and trying to unpack keys based on the template is unwieldy (imagine: 'hanning{num}.pdf'.format(num=my_mapping[num]), with a separate argument for each placeholder).
string.Formatter
The string standard library module contains a rarely used Formatter class. Using it looks like string.Formatter().format('hanning{num}.pdf', num=num). The template string uses the same syntax again. This is obviously clunkier than just calling .format on the string; the motivation is to allow users to subclass Formatter to define a different syntax for the template string.
All of the above approaches use a common "formatting language" (although string.Formatter allows changing it); there are many other things that can be put inside the {}. Explaining how it works is beyond the scope of this answer; please consult the documentation. Do keep in mind that literal { and } characters need to be escaped by doubling them up. The syntax is presumably inspired by C#.
The % operator
This is a legacy way to solve the problem, inspired by C and C++. It has been discouraged for a long time, but is still supported. It looks like 'hanning%s.pdf' % num, for simple cases. As you'd expect, literal '%' symbols in the template need to be doubled up to escape them.
It has some issues:
It seems like the conversion specifier (the letter after the %) should match the type of whatever is being interpolated, but that's not actually the case. Instead, the value is converted to the specified type, and then to string from there. This isn't normally necessary; converting directly to string works most of the time, and converting to other types first doesn't help most of the rest of the time. So 's' is almost always used (unless you want the repr of the value, using 'r'). Despite that, the conversion specifier is a mandatory part of the syntax.
Tuples are handled specially: passing a tuple on the right-hand side is the way to provide multiple arguments. This is an ugly special case that's necessary because we aren't using function-call syntax. As a result, if you actually want to format a tuple into a single placeholder, it must be wrapped in a 1-tuple.
Other sequence types are not handled specially, and the different behaviour can be a gotcha.
string.Template
The string standard library module contains a rarely used Template class. Instances provide substitute and safe_substitute methods that work similarly to the built-in .format (safe_substitute will leave placeholders intact rather than raising an exception when the arguments don't match). This should also be considered a legacy approach to the problem.
It looks like string.Template('hanning$num.pdf').substitute(num=num), and is inspired by traditional Perl syntax. It's obviously clunkier than the .format approach, since a separate class has to be used before the method is available. Braces ({}) can be used optionally around the name of the variable, to avoid ambiguity. Similarly to the other methods, literal '$' in the template needs to be doubled up for escaping.
I had a need for an extended version of this: instead of embedding a single number in a string, I needed to generate a series of file names of the form 'file1.pdf', 'file2.pdf' etc. This is how it worked:
['file' + str(i) + '.pdf' for i in range(1,4)]
You can make dict and substitute variables in your string.
var = {"name": "Abdul Jalil", "age": 22}
temp_string = "My name is %(name)s. I am %(age)s years old." % var

Why is it string.join(list) instead of list.join(string)?

This has always confused me. It seems like this would be nicer:
["Hello", "world"].join("-")
Than this:
"-".join(["Hello", "world"])
Is there a specific reason it is like this?
It's because any iterable can be joined (e.g, list, tuple, dict, set), but its contents and the "joiner" must be strings.
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
This was discussed in the String methods... finally thread in the Python-Dev achive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6 which was released in Sep 2000 (and supported Unicode). Python 2.0 (supported str methods including join) was released in Oct 2000.
There were four options proposed in this thread:
str.join(seq)
seq.join(str)
seq.reduce(str)
join as a built-in function
Guido wanted to support not only lists and tuples, but all sequences/iterables.
seq.reduce(str) is difficult for newcomers.
seq.join(str) introduces unexpected dependency from sequences to str/unicode.
join() as a free-standing built-in function would support only specific data types. So using a built-in namespace is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method then it would be O(n²).
The separator string (sep) should not be omitted. Explicit is better than implicit.
Here are some additional thoughts (my own, and my friend's):
Unicode support was coming, but it was not final. At that time UTF-8 was the most likely about to replace UCS-2/-4. To calculate total buffer length for UTF-8 strings, the method needs to know the character encoding.
At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide basic iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on str.join(seq):
Funny, but it does seem right! Barry, go for it...
Guido van Rossum
Because the join() method is in the string class, instead of the list class.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned
Python, I expected join to be a method
of a list, which would take the
delimiter as an argument. Lots of
people feel the same way, and there’s
a story behind the join method. Prior
to Python 1.6, strings didn’t have all
these useful methods. There was a
separate string module which contained
all the string functions; each
function took a string as its first
argument. The functions were deemed
important enough to put onto the
strings themselves, which made sense
for functions like lower, upper, and
split. But many hard-core Python
programmers objected to the new join
method, arguing that it should be a
method of the list instead, or that it
shouldn’t move at all but simply stay
a part of the old string module (which
still has lots of useful stuff in it).
I use the new join method exclusively,
but you will see code written either
way, and if it really bothers you, you
can use the old string.join function
instead.
--- Mark Pilgrim, Dive into Python
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
it must work for different iterables too (tuples, generators, etc.)
it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join was a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join byte and str together, so the way they have it now makes sense.
Why is it string.join(list) instead of list.join(string)?
This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list method, you would have to cast every such iterator of strings as a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
def join(self, s):
return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes sense to have it on the str object than on miscellaneous iterables.
Think of it as the natural orthogonal operation to split.
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
- in "-".join(my_list) declares that you are converting to a string from joining elements a list.It's result-oriented. (just for easy memory and understanding)
I made an exhaustive cheatsheet of methods_of_string for your reference.
string_methods_44 = {
'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
'islower','istitle', 'isupper','isprintable', 'isspace', ],
'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
'encode': ['translate', 'maketrans', 'encode'],
'format': ['format', 'format_map']}
Primarily because the result of a someString.join() is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
The variables my_list and "-" are both objects. Specifically, they're instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
You can't only join lists and tuples. You can join almost any iterable.
And iterables include generators, maps, filters etc
>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'
>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'
And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.
Just another reason why it's conceptually:
str.join(<iterator>)
It's efficient only granting str this ability. Instead of granting join to all the iterators: list, tuple, set, dict, generator, map, filter all of which only have object as common parent.
Of course range(), and zip() are also iterators, but they will never return str so they cannot be used with str.join()
>>> '-'.join(range(48, 55))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".
str.join isn't just confusing or not-nice looking, it's impractical in real-world code. It defeats readable function or method chaining because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".
I clean up my code — enabling reading it in one direction — using tools.functoolz:
>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
... a,
... join("; ")
>>> )
'one; two; three'
I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.

Categories

Resources