Why is .join() a property of string not list? [duplicate] - python

This question already has answers here:
Why is it string.join(list) instead of list.join(string)?
(11 answers)
Closed 2 years ago.
When joining a list in python, join is a function of a str, so you would do
>>> ', '.join(['abc', '123', 'zyx'])
'abc, 123, zyx'
I feel like it would be more intuitive to have it as a property of a list (or any iterator really),
>>> ['abc', '123', 'zyx'].join(', ')
'abc, 123, zyx'
Why is this?

.join() is a method of the str object, not of list. Unfortunately, unlike in JavaScript, it isn't possible to add custom methods to built-in objects, but you can create new classes like:
class MyString:
    def __init__(self, string):
        self.string = string

    def join(self, sep):
        # Inserts sep between the items of the wrapped value
        # (for a plain string, between its characters)
        return sep.join(self.string)

mystring = MyString("this is the string")
print(mystring.join("-"))
To get the original string, use mystring.string; you can then apply the normal Python attributes and methods to it.

Basically, join only works on iterables of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.
You can find the reason for that in the article: http://www.faqs.org/docs/diveintopython/odbchelper_join.html
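A minimal sketch of that behaviour, together with the usual workaround of converting each element with str before joining. The variable names are only illustrative.

```python
items = ["abc", 123, "zyx"]  # illustrative list with one non-string element

try:
    ", ".join(items)
except TypeError as exc:
    # join does not coerce: the int element triggers a TypeError
    error_message = str(exc)
    print(error_message)

# Convert every element explicitly, then join
joined = ", ".join(str(item) for item in items)
print(joined)
```

The explicit str(item) makes the conversion visible at the call site instead of hiding it inside join.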

The true why can probably only be given to you by developers and thinkers behind Python, but I will try to give it a jab.
Firstly, if you had join on lists, then what would this operation mean for lists of arbitrary objects? For example, if you had a list of HttpClient objects, what would the result of the join be? The answer is that it is probably not even valid to ask such a question in the first place, since we can assign no meaning to joining arbitrary objects.
Secondly, even though the operation is part of the str API, that does not make it impossible for it to also be an operation on objects of some other class. This means that if you have a use case where you require a .join() operation on lists, you can create a special list class which implements this behavior. You can achieve this using either inheritance or composition, whichever you prefer.

Related

String addresses [duplicate]

This question already has answers here:
Does Python do slice-by-reference on strings?
(2 answers)
Closed 2 years ago.
While going through the mutable and immutable topic in depth, I found that the address of a variable is different when you refer to "s" directly versus when you slice it by index. Why is this so?
s = "hello!"
print(id(s))
print(id(s[0:5]))
s[0:5] creates a new, temporary string. That, of course, has a different ID. One "normal" case would be to assign this expression to a variable; the separate object is ready for that assignment.
Also note that a common way to copy a sequence is with a whole-sequence slice, such as
s_copy = s[:]
It's creating a new string, as Python strings are immutable (cannot be changed once created)
However, many implementations of Python will reuse the same object if they find they're creating the same string, which makes for a good example. Here's what happens in a CPython 3.9.1 shell (which I suspect is one of the most common at the time of writing):
>>> s = "hello!"
>>> print(id(s))
4512419376
>>> print(id(s[0:5])) # new string from slice of original
4511329712
>>> print(id(s[0:5])) # same id: most likely the freed temporary's address being reused
4511329712
>>> print(id(s[0:6])) # references original string
4512419376
>>> print(id(s[:])) # trivial slice also refers to original
4512419376
In general you should not rely on this behavior and only consider it an example.
Check for string equality with ==!
See also Why does comparing strings using either '==' or 'is' sometimes produce a different result?
Strings are immutable, so slicing cannot alter the old string; it creates a new substring instead, and that new string has a new address in memory.
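A small sketch of the difference between comparing contents and comparing identities. Only the first comparison is something to rely on; whether two equal strings share an id is an implementation detail.

```python
s = "hello!"
prefix = s[0:5]  # a new string object holding "hello"

# Equality compares contents, which is almost always what you want
print(prefix == "hello")  # True

# Identity compares objects; prefix and s hold different text,
# so they cannot be the same object
print(prefix is s)  # False
```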

Modify str.format function in python [duplicate]

This question already has answers here:
String format with optional dict key-value
(5 answers)
Closed 7 years ago.
str.format does almost exactly what I'm looking for.
A functionality I would like to add to format() is to use optional keywords and for that I have to use another special character (I guess).
So str.format can do that:
f = "{ID1}_{ID_optional}_{ID2}"
f.format(**{"ID1" : " ojj", "ID2" : "makimaki", "ID_optional" : ""})
# Result: ' ojj__makimaki' #
I can't really use optional IDs. If the dictionary does not contain "ID_optional", it raises a KeyError. I think it should be something like this to mark the optional ID:
f = "{ID1}_[IDoptional]_{ID2}"
Another thing: I have a lot of template strings to process which use [] rather than {}. So the best way would be to pass the special characters as an argument to the format function.
So the basic question is: is there a sophisticated way to modify the original function? Or should I write my own format function based on str.format and regular expressions?
One option would be to define your own Formatter. You can inherit from the standard one (string.Formatter) and override get_field to return some reasonable default for your use case. See the link for some more documentation.
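A minimal sketch of that idea, assuming that a missing field should simply render as an empty string. The class name OptionalFormatter and the choice of default are hypothetical; this does not handle the [] delimiters mentioned in the question.

```python
import string


class OptionalFormatter(string.Formatter):
    """Hypothetical Formatter that renders missing fields as ''."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError):
            # Assumption: an absent field becomes an empty string
            return "", field_name


fmt = OptionalFormatter()
template = "{ID1}_{ID_optional}_{ID2}"

without_optional = fmt.format(template, ID1="ojj", ID2="makimaki")
with_optional = fmt.format(template, ID1="ojj", ID2="makimaki", ID_optional="x")
print(without_optional)
print(with_optional)
```

Note that the separators around the missing field are kept, so the first result still contains a double underscore.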
You can use if/else and pick the format string based on whether the dict has the key or not:
f = "{ID1}_{ID_optional}_{ID2}" if "ID_optional" in d else "{ID1}_{ID2}"
A dict lookup is O(1), so it is cheap just to check.

Semantics of turning list into string [duplicate]

This question already has answers here:
Why is it string.join(list) instead of list.join(string)?
(11 answers)
Closed 9 years ago.
Many programming languages, including Python, support an operation like this:
", ".join(["1","2","3"])
which returns the string
"1, 2, 3"
I understand that this is the case, but I don't understand the design decision behind it - surely it would be more semantically valid to perform the join operation on the list, like so:
["1","2","3"].join(", ")
If anyone could explain the design decision and shed some light on it I'd appreciate it.
Edit: It looks like Javascript has the join method on the list; if anyone has examples for which convention particular languages follow, feel free to comment/answer about the choice in that particular language too.
For python, there are a few arguments against it. First, if it is a string method that accepts an arbitrary iterable, then you only need to support the join method on string objects and it automagically works with anything iterable. Otherwise, custom iterable objects would also need to support a join method, etc. etc.
consider:
", ".join(["1","2","3"])
", ".join(("1","2","3"))
", ".join(str(x) for x in xrange(1,4))
", ".join({'1':None,'2':None,'3':None})
", ".join({'1','2','3'}) #set literal syntax -- python2.7+
", ".join("123")
6 different types all supported very simply by a single method (and I've only touched builtin types).
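The same point holds for user-defined iterables. As a sketch, here is a hypothetical class Countdown that only implements __iter__; it needs no join method of its own.

```python
class Countdown:
    """Hypothetical iterable that yields numbers as strings."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for n in range(self.start, 0, -1):
            yield str(n)


result = ", ".join(Countdown(3))
print(result)  # 3, 2, 1
```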
Second, you can only join a list if everything in it is a basestring type. It seems silly to provide a method on a list which would raise an exception if you used it on a list with the wrong contents. At least, the API here seems somewhat tricky (to me). Of course, you could argue that list.remove can raise an exception if called with the wrong arguments. That's true, but in general you're only removing a single item; it operates on a lower level than join should. Of course, this argument is weaker than the first one I proposed, so if you don't like it, just fall back on argument 1.

java programmer learning python: string split and join

Can someone explain to me why in python when we want to join a string we write:
'delim'.join(list)
and when we want to split a string we write:
str.split('delim')
coming from java it seems that one of these is backwards because in java we write:
//split:
str.split('delim');
//join
list.join('delim');
edit:
You are right: join takes a list (though it doesn't change the question).
Can someone explain to me the rationale behind this API?
Join only makes sense when joining some sort of iterable. However, since iterables don't necessarily contain all strings, putting join as a method on an iterable doesn't make sense. (what would you expect the result of [1,"baz",my_custom_object,my_list].join("foo") to be?) The only other place to put it is as a string method with the understanding that everything in the iterable is going to be a string. Additionally, putting join as a string method allows it to be used with any iterable -- tuples, lists, generators, custom objects which support iteration or even strings.
Also note that you are completely free to split a string in the same way that you join it:
list_of_strings='this, is , a, string, separated, by , commas.'.split(',')
Of course, the utility here isn't quite as easy to see.
http://docs.python.org/faq/design.html#why-is-join-a-string-method-instead-of-a-list-or-tuple-method
From the source:
join() is a string method because in using it you are telling the
separator string to iterate over a sequence of strings and insert
itself between adjacent elements. This method can be used with any
argument which obeys the rules for sequence objects, including any new
classes you might define yourself.
Because this is a string method it can work for Unicode strings as
well as plain ASCII strings. If join() were a method of the sequence
types then the sequence types would have to decide which type of
string to return depending on the type of the separator.
So that you don't have to reimplement the join operation for every sequence type you create. Python uses protocols, not types, as its main behavioral pattern, and so every sequence can be expected to act the same even though they don't derive from an existing sequence class.
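To illustrate the protocol point, here is a sketch with a hypothetical class Weekdays that does not derive from any sequence class. It only implements __len__ and __getitem__, which is enough for Python to iterate over it and therefore for str.join to accept it.

```python
class Weekdays:
    """Hypothetical sequence-like class; it inherits only from object."""

    _names = ("Mon", "Tue", "Wed")

    def __len__(self):
        return len(self._names)

    def __getitem__(self, index):
        # An IndexError past the end is what stops the iteration
        return self._names[index]


joined = "-".join(Weekdays())
print(joined)  # Mon-Tue-Wed
```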

Why is it string.join(list) instead of list.join(string)?

This has always confused me. It seems like this would be nicer:
["Hello", "world"].join("-")
Than this:
"-".join(["Hello", "world"])
Is there a specific reason it is like this?
It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
This was discussed in the "String methods... finally" thread in the Python-Dev archive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6, which was released in Sep 2000 (and supported Unicode). Python 2.0 (which supported str methods including join) was released in Oct 2000.
There were four options proposed in this thread:
str.join(seq)
seq.join(str)
seq.reduce(str)
join as a built-in function
Guido wanted to support not only lists and tuples, but all sequences/iterables.
seq.reduce(str) is difficult for newcomers.
seq.join(str) introduces unexpected dependency from sequences to str/unicode.
join() as a free-standing built-in function would support only specific data types. So using a built-in namespace is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method then it would be O(n²).
The separator string (sep) should not be omitted. Explicit is better than implicit.
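The O(n²) concern can be sketched as follows. Building the result by repeated addition may copy the accumulated string at every step, whereas str.join sees all the pieces and can build the result in one go. This only illustrates the reasoning; it is not a benchmark, and CPython sometimes optimises repeated concatenation.

```python
from functools import reduce
import operator

words = ["welcome", "to", "stack", "overflow"]
sep = "_"

# Repeated concatenation: each + produces a new, longer string
by_add = reduce(operator.add, (sep + w for w in words[1:]), words[0])

# str.join builds the final string from all pieces at once
by_join = sep.join(words)

print(by_add == by_join)  # True
```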
Here are some additional thoughts (my own, and my friend's):
Unicode support was coming, but it was not final. At that time, UTF-8 was the most likely candidate to replace UCS-2/-4. To calculate the total buffer length for UTF-8 strings, the method needs to know the character encoding.
At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide a basic iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on str.join(seq):
Funny, but it does seem right! Barry, go for it...
Guido van Rossum
Because the join() method is in the string class, instead of the list class.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.
--- Mark Pilgrim, Dive into Python
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
it must work for different iterables too (tuples, generators, etc.)
it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join were a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join bytes and str together, so the way they have it now makes sense.
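A short sketch of the two methods side by side. The separator's type decides which join runs and what type comes back, and mixing the two types is rejected.

```python
text = ", ".join(["a", "b"])      # str separator and str items give a str
data = b", ".join([b"a", b"b"])   # bytes separator and bytes items give bytes

try:
    ", ".join([b"a", "b"])        # a bytes item under a str separator
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

print(text, data, mixed_ok)
```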
Why is it string.join(list) instead of list.join(string)?
This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list method, you would have to cast every such iterator of strings as a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
    def join(self, s):
        return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes more sense to have it on the str object than on miscellaneous iterables.
Think of it as the natural orthogonal operation to split.
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
The "-" in "-".join(my_list) declares that you are converting to a string by joining the elements of a list. It's result-oriented (just an easy way to remember and understand it).
I made an exhaustive cheatsheet of methods_of_string for your reference.
string_methods_44 = {
'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
'islower','istitle', 'isupper','isprintable', 'isspace', ],
'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
'encode': ['translate', 'maketrans', 'encode'],
'format': ['format', 'format_map']}
Primarily because the result of a someString.join() is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
The variables my_list and "-" are both objects. Specifically, they're instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
You can join more than just lists and tuples. You can join almost any iterable.
And iterables include generators, maps, filters etc
>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'
>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'
And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.
Just another reason why it's conceptually:
str.join(<iterator>)
It's efficient to grant only str this ability, instead of granting join to all the iterables: list, tuple, set, dict, generator, map, filter, all of which have only object as a common parent.
Of course, range() and zip() are also iterables, but they never yield str items, so they cannot be passed directly to str.join():
>>> '-'.join(range(48, 55))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".
str.join isn't just confusing or not-nice looking, it's impractical in real-world code. It defeats readable function or method chaining because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".
I clean up my code, so that it can be read in one direction, using toolz.functoolz:
>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
... a,
... join("; ")
... )
'one; two; three'
I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.
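For readers without toolz, a rough standard-library approximation of the same style is sketched below. The pipe function here is a hypothetical helper written for this example, not the toolz one, and functools.partial stands in for curry.

```python
from functools import partial, reduce


def pipe(value, *funcs):
    """Hypothetical helper: pass value through each function in order."""
    return reduce(lambda acc, func: func(acc), funcs, value)


# Binding the separator first leaves a function that takes the iterable
join_semicolon = partial(str.join, "; ")

a = ["one", "two", "three"]
result = pipe(
    a,
    partial(map, str.upper),
    join_semicolon,
)
print(result)  # ONE; TWO; THREE
```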
