Non-string "Symbol" object in Python?

Does Python have a builtin type for representing symbolic values, when strings cannot be used?
A quick implementation of my own would look like
class Symbol:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
Use case
Such symbols are useful when a value – say a keyword argument or a list entry – needs to be initialized to a value that indicates that it hasn't been explicitly set.
If the values have constraints on allowed types, commonly None or some string would be used, but if any value is allowed, some other unique object is needed. My common method is to use an object() assigned to some private variable, but the symbol pattern is more convenient for debugging due to providing a meaningful printed representation.
As an alternative, one could use e.g. a tuple ('default value',) and compare against it with the is operator, but this wouldn't work e.g. for dictionary keys.
While the pattern is simple enough to copy/paste into each shell-script I am writing, a builtin solution with established behavior would be preferable.
Non-builtins
I know that there are packages that provide a symbol type. An obvious one would be the Symbol type of SymPy, and there is https://pypi.org/project/SymbolType/. However, adding a dependency to avoid a 5-line pattern seems like overkill, hence my question about a builtin type.

You could use the enum library:
https://docs.python.org/3/library/enum.html
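As a rough sketch of how the enum suggestion could cover the use case from the question: the class name Symbol, its members, and the configure function below are made up for illustration and are not part of the standard library.

```python
from enum import Enum


class Symbol(Enum):
    """Hypothetical symbol set; the member names are illustrative only."""
    UNSET = "unset"
    AUTO = "auto"

    def __repr__(self):
        # Print as "Symbol.UNSET" instead of the longer default Enum repr.
        return f"{type(self).__name__}.{self.name}"


def configure(timeout=Symbol.UNSET):
    # Identity comparison separates "not passed" from any real value,
    # including None.
    if timeout is Symbol.UNSET:
        return "default timeout"
    return f"timeout={timeout!r}"


# Enum members are hashable, so they also work as dictionary keys.
handlers = {Symbol.AUTO: "pick automatically"}

print(configure(), "|", configure(None), "|", repr(Symbol.UNSET))
```

Enum members are singletons, so comparing with the is operator is reliable, and the printed representation stays meaningful while debugging.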

Related

How to create nested type of data in Python?

I want to make sure that one of the arguments passed at class creation is of a certain type. Here is an example:
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class ListItems:
    items: list | str | int | ListItems

class PList:
    def __init__(self, name: str, items: ListItems):
        self.type = "list"
        self.name = name
        self.items = items

a = PList('asd', ['asd'])
The idea is this: items can only be a list of str and int values, or of other lists of str and int, nested to any depth. For example:
[] OK
[1,2,'asd'] OK
[[1,2,3],'asd',[]] OK
[{}] NOT OK
['test', [{}]] NOT OK
Is it possible to implement something like this in Python?
I am not really familiar with Python OOP, but from what I have found, there is no native implementation of interfaces and/or abstract class like in other programming languages.
PS: The code above was just my attempt at an implementation; it did not work.
def __init__(self, name: str, items: ListItems):
The items: ListItems bit says that items should be a ListItems object. It does not run any of the logic inside ListItems; it only names the expected type.
I don't have much experience with typing, but I think you're looking for items: list[str | int]. Note that for lists there is the built-in list type hint and also List in the typing module. The lowercase built-in form is relatively new: list[...] works as a hint from Python 3.9, and the | union syntax from Python 3.10.
Short answer to your question: Python is a dynamically typed language. It doesn't know the type of a variable until the code runs, so a type declaration is not enforced. Python stores the value at some memory location and binds the variable name to that object, which makes the object accessible through that name. The type of the value only becomes known at run time.
Names are bound to objects at execution time by means of assignment statements, and it is possible to bind a name to objects of different types during the execution of the program. Functions and objects can be altered at runtime.
In a dynamically typed language, a variable is simply a value bound to a name; the value has a type -- like "integer" or "string" or "list" -- but the variable itself doesn't. You could have a variable which, right now, holds a number, and later assign a string to it if you need it to change. In a statically typed language, the variable itself has a type; if you have a variable that's an integer, you won't be able to assign any other type of value to it later.
The following point shows that explicitly declaring a data type in Python code does not make Python enforce it:
(since type errors are a small fraction of all the things that might go wrong in a program); as a result, programmers in dynamic languages rely on their test suites to catch these and all other errors, rather than using a dedicated type-checking compiler.
Check this reference for more info.
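To illustrate that annotations are not enforced at run time, here is a minimal sketch; the function greet is a made-up example, not code from the question.

```python
def greet(name: str) -> str:
    # The annotation documents intent; Python does not check it on call.
    return f"hello {name}"


print(greet("Ada"))
print(greet(42))  # no error, even though 42 is not a str
print(greet.__annotations__)  # the hints are only stored as metadata
```

A static checker such as mypy would flag greet(42), but the interpreter itself runs it without complaint.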
I'm not sure you even need the class ListItems. Just use a simple if statement in your __init__ method.
class PList:
    def __init__(self, name, items):
        self.type = 'list'
        self.name = name
        # Accept only a list, str, or int; reject everything else
        # so that self.items is never left unset.
        if not isinstance(items, (list, str, int)):
            raise TypeError('items must be a list, str, or int')
        self.items = items
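The nested rule from the question (lists of str and int, nested to any depth) can also be checked at run time with a recursive helper. This is only a sketch under that reading of the requirements; is_valid_items is a hypothetical name.

```python
def is_valid_items(value):
    """Return True if value is a list holding only str, int, or nested valid lists."""
    if not isinstance(value, list):
        return False
    for item in value:
        if isinstance(item, list):
            # Recurse into nested lists.
            if not is_valid_items(item):
                return False
        elif not isinstance(item, (str, int)):
            return False
    return True


class PList:
    def __init__(self, name, items):
        if not is_valid_items(items):
            raise TypeError("items must be a (nested) list of str and int")
        self.type = "list"
        self.name = name
        self.items = items


for sample in ([], [1, 2, "asd"], [[1, 2, 3], "asd", []], [{}], ["test", [{}]]):
    print(sample, "OK" if is_valid_items(sample) else "NOT OK")
```

Note that bool is a subclass of int, so True and False would pass this check; whether that is acceptable depends on the intended rule.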

Python type hinting for upper vs lower-cased strings?

In Python, if I am creating a list of strings, I would type hint like this:
from typing import List
list_of_strings : List[str] = []
list_of_strings.append('some string')
list_of_strings.append('some other string')
Is there some way to type hint the expected case of the strings? That way, when I write a comparison operator to search for a specific string, for example, I don't accidentally search for the mixed or upper-cased version of a string I know will be lower-cased because all strings in list_of_strings should be lower-cased. I realize I can just add comments and refer back to the list's declaration, but I was wondering if there was a more integrated way to do it.
An alternate way to solve this problem would be to make a class which extends str and rejects any values which aren't in the proper case, and then type hint for that class. Is there any reason why this would be a bad idea aside from it being more of a pain to create than a simple string?
The reason I run into this problem, is that I create lists, dicts, and other structures to store data, then need to add to or search them for a particular key/value and not knowing the expected case creates problems where I add duplicate entries because a simple if 'example' in string_list doesn't find it. And doing if 'example'.upper() in string_list is easy to forget and not very pretty. Jumping back and forth between the declaration (if I wrote a comment there describing expected case) and where I'm coding distracts from my flow, it would be nice to have the information when I'm referencing that object later.
You can in Python 3.10 using typing.TypeGuard.
from typing import TypeGuard

class LowerStr(str):
    '''a dummy subclass of str, not actually used at runtime'''

def is_lower_str(val: str) -> TypeGuard[LowerStr]:
    return val.islower()

l: list[LowerStr] = []

def append(lst: list[LowerStr], v: str):
    if not is_lower_str(v):
        raise TypeError('oh no')
    lst.append(v)
You could indeed enforce runtime safety using a subclass of str. The disadvantage would mostly be performance. Off the top of my head, you would want to avoid adding an unnecessary __dict__ by putting __slots__ = () in the class definition.
Either way, string literals are not going to be validated automatically, so it will cause some overhead, either by calling the typeguard, passing them to the constructor of the subtype, or using cast(LowerStr, 'myliteral').
No, I've been searching on the net for this but it doesn't exist.
Python's typing module doesn't provide any type for lowercase or uppercase strings.
[...] type hint the expected case of the strings
Remember that a type is something like str, so what you describe is a hint about the value rather than a type hint.
You can still create a custom class for this purpose:
class UppercaseString(str): pass
UppercaseString inherits all the functionality of the built-in class str because str is listed as its base class; the pass statement only means that the class body adds nothing new.
You can then add a method that checks whether the string really is uppercase and raises an error otherwise.
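One possible way to do that check, sketched here as an assumption rather than as the answerer's code, is to validate in __new__, since str instances are immutable and cannot be changed after creation.

```python
class UppercaseString(str):
    """A str subclass that refuses text which is not entirely upper-case."""
    __slots__ = ()  # avoid a per-instance __dict__

    def __new__(cls, value):
        # Validation has to happen here: by the time __init__ runs,
        # the immutable string value has already been fixed.
        if value != value.upper():
            raise ValueError(f"expected upper-case text, got {value!r}")
        return super().__new__(cls, value)


keys = [UppercaseString("EXAMPLE")]
print("EXAMPLE" in keys)  # plain str comparison still works

try:
    UppercaseString("Example")
except ValueError as exc:
    print("rejected:", exc)
```

String methods such as lower() or concatenation return plain str objects, so the guarantee only holds for values that went through the constructor.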

Attributes which aren't valid python identifiers

The usual method of attribute access requires attribute names to be valid python identifiers.
But attributes don't have to be valid python identifiers:
>>> class Thing:
...     def __init__(self):
...         setattr(self, '0potato', 123)
...
>>> t = Thing()
>>> Thing.__getattribute__(t, '0potato')
123
>>> getattr(t, '0potato')
123
Of course, t.0potato remains a SyntaxError, but the attribute is there nonetheless:
>>> vars(t)
{'0potato': 123}
What is the reason for this being permissible? Is there really any valid use case for attributes with spaces, the empty string, Python reserved keywords, etc.? I thought the reason was that attributes were just keys in the object/namespace dict, but this makes no sense because other objects which are valid dict keys are not allowed:
>>> setattr(t, ('tuple',), 321)
TypeError: attribute name must be string, not 'tuple'
The details from a comment on the post fully answer this question, so I'm posting it as an answer:
Guido says:
...it is a feature that you can use any arbitrary string with getattr() and setattr(). However these functions should (and do!) reject non-strings.
Possible use-cases include hiding attributes from regular dotted access, and making attributes in correspondence with external data sources (which may clash with Python keywords). So, the argument seems to be there's simply no good reason to forbid it.
As for a reason to disallow non-strings, this seems to be a sensible restriction which is ensuring greater performance of the implementation:
Although Python's dicts already have some string-only optimizations -- they just dynamically adapt to a more generic and slightly slower approach once the first non-string key shows up.
So, to answer the use case question, looking at the reasoning behind how Python works in the references from the comments above, we can infer some of the situations that might make this Pythonic quirk useful.
You want an object to have an attribute that cannot be accessed with dot notation, say, to protect it from the naive user. (Quoting Guido: "some people might use this to hide state they don't want accessible using regular attribute notation (x.foo)". Of course, he goes on to say, "but that feels like abuse of the namespace to me, and there are plenty of other ways to manage such state.")
You want an object's attribute names to correspond to external data over which you have no control. Thus, you have to be able to use whatever strings appear in the external data as an attribute name even if it matches a Python reserved word or contains embedded spaces or dashes, etc.
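The second use case can be sketched as follows. The Record class and the sample row are invented for illustration; only setattr, getattr, and vars are real built-ins.

```python
class Record:
    """Hypothetical container whose attributes mirror external column names."""

    def __init__(self, row):
        for key, value in row.items():
            # Any string is accepted, even one that is not a valid identifier.
            setattr(self, key, value)


row = {"class": "A", "first name": "Ada", "id": 7}  # made-up sample data
rec = Record(row)

print(rec.id)                      # valid identifier: dotted access works
print(getattr(rec, "class"))       # keyword: reachable only through getattr
print(getattr(rec, "first name"))  # contains a space: same
print(vars(rec) == row)
```

Every column survives the round trip, although only names that are valid identifiers can be reached with dot notation.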

Defining my own None-like Python constant

I have a situation in which I'm asked to read collections of database update instructions from a variety of sources. All sources will contain a primary key value so that the code that applies the updates to the database can find the correct record. The files will vary, however, in what additional columns are reported.
When I read and create my update instructions I must differentiate between an update in which a column (for instance, MiddleName) was provided but was empty (meaning no middle name and the field should be updated to NULL) and an update in which the MiddleName field was not included (meaning the update should not touch the middle name column at all).
The former situation (column provided but no value) seems appropriately represented by the None value. For the second situation, however, I'd like to have a NotInFile "value" that I can use similar to the way I use None.
Is the correct way to implement this as follows?
NotInFile = 1
class PersonUpdate(object):
    def __init__(self):
        self.PersonID = None
        self.FirstName = NotInFile
        self.MiddleName = NotInFile
and then in another module
import othermod
upd = othermod.PersonUpdate()
if upd.MiddleName is othermod.NotInFile:
    print('Hey, middle name was not supplied')
I don't see anything particularly wrong with your implementation. However, 1 isn't necessarily the best sentinel value, as it is a cached constant in CPython (e.g. -1+2 is 1 will return True). In these cases, I might consider using a sentinel object instance:
NotInFile = object()
Python also provides a few other named constants which you could use if it seems appropriate: NotImplemented and Ellipsis come to mind immediately. (Note that I'm not recommending you use these constants ... I'm just providing more options).
No, using the integer one is a bad idea. It might work out in this case if MiddleName is always a string or None, but in general the implementation is free to intern integers, strings, tuples and other immutable values as it pleases. CPython does it for small integers and constants of the aforementioned types. PyPy defines is by value for integers and a few other types. So if MiddleName is 1, you're bound to see your code consider it not supplied.
Use an object instead, each new object has a distinct identity:
NotInFile = object()
Alternatively, for better debugging output, define your own class:
class NotInFileType(object):
    # __slots__ = () if you want to save a few bytes
    def __repr__(self):
        return 'NotInFile'
NotInFile = NotInFileType()
del NotInFileType # look ma, no singleton
If you're paranoid, you could make it a proper singleton (ugly). If you need several such instances, you could rename the class to Sentinel or something similar, make the representation an instance variable, and use multiple instances.
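A minimal sketch of that multi-instance variant might look like this. The class name Sentinel, the second sentinel Unchanged, and the describe function are illustrative assumptions.

```python
class Sentinel:
    """Unique marker object with a readable repr (a hypothetical helper)."""
    __slots__ = ("_name",)

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


NotInFile = Sentinel("NotInFile")
Unchanged = Sentinel("Unchanged")  # a second, purely illustrative sentinel


def describe(middle_name=NotInFile):
    # Three distinct cases: column absent, column empty, column has a value.
    if middle_name is NotInFile:
        return "leave column untouched"
    if middle_name is None:
        return "set column to NULL"
    return f"set column to {middle_name!r}"


print(describe(), "|", describe(None), "|", describe("Quincy"))
```

Each instance has its own identity, so comparisons with the is operator never confuse one sentinel with another or with ordinary data.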
If you want type-checking, this idiom is now blessed by PEP 484 and supported by mypy:
from enum import Enum
class NotInFileType(Enum):
    _token = 0
NotInFile = NotInFileType._token
If you are using mypy 0.740 or earlier, you need to work around this bug in mypy by using typing.Final:
from typing import Final
NotInFile: Final = NotInFileType._token
If you are using Python 3.7 or earlier, you can use typing_extensions.Final from pip package typing_extensions instead of typing.Final

How does Python differentiate between the different data types?

Sorry if this is quite noobish to you, but I'm just starting out to learn Python after learning C++ & Java, and I am wondering how in the world I could just declare variables like id = 0 and name = 'John' without any int's or string's in front! I figured out that perhaps it's because there are no ''s in a number, but how would Python figure that out in something like def increase(first, second) instead of something like int increase(int first, int second) in C++?!
The literal objects you mention carry (pointers to;-) their own types with them of course, so when a name's bound to that object the problem of type doesn't arise -- the object always has a type, the name doesn't -- just delegates that to the object it's bound to.
There's no "figuring out" in def increase(first, second): -- name increase gets bound to a function object, names first and second are recorded as parameters-names and will get bound (quite possibly to objects of different types at various points) as increase gets called.
So say the body is return first + second -- a call to increase('foo', 'bar') will then happily return 'foobar' (delegating the addition to the objects, which in this case are strings), and maybe later a call to increase(23, 45) will just as happily return 68 -- again by delegating the addition to the objects bound to those names at the point of call, which in this case are ints. And if you call with incompatible types you'll get an exception as the delegated addition operation can't make sense of the situation -- no big deal!
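The behaviour described above can be tried directly; this small sketch just uses the function body from the answer.

```python
def increase(first, second):
    # No declared types: "+" is delegated to whatever objects are passed in.
    return first + second


print(increase(23, 45))        # int addition
print(increase('foo', 'bar'))  # str concatenation

try:
    increase('foo', 1)         # str + int is not defined
except TypeError as exc:
    print('incompatible types:', exc)
```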
Python is dynamically typed: all variables can refer to an object of any type. id and name can be anything, but the actual objects are of types like int and str. 0 is a literal that is parsed to make an int object, and 'John' a literal that makes a str object. Many object types do not have literals and are returned by a callable (like frozenset—there's no way to make a literal frozenset, you must call frozenset.)
Consequently, there is no such thing as declaration of variables, since you aren't defining anything about the variable. id = 0 and name = 'John' are just assignment.
increase returns an int because that's what you return in it; nothing in Python forces it not to be any other object. first and second are only ints if you make them so.
Objects, to a certain extent, share a common interface. You can use the same operators and functions on them all, and if they support that particular operation, it works. It is a common, recommended technique to use different types that behave similarly interchangeably; this is called duck typing. For example, if something takes a file object you can instead pass an io.StringIO object (cStringIO.StringIO in Python 2), which supports the same methods as a file (like read and write) but is a completely different type. This is somewhat like Java interfaces, but requires no formal declaration: you just define the appropriate methods.
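As a small illustration of duck typing with file-like objects: write_greeting and ListWriter are made-up names, while io.StringIO is the standard-library in-memory text stream.

```python
import io


def write_greeting(stream):
    # Only requirement: the object has a write() method that accepts str.
    stream.write("hello\n")


class ListWriter:
    """Hypothetical file-like object that collects writes in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


buffer = io.StringIO()
collector = ListWriter()
write_greeting(buffer)      # a real in-memory text stream
write_greeting(collector)   # an unrelated class with a matching method

print(buffer.getvalue() == "".join(collector.chunks))
```

Neither class inherits from the other or from a shared interface; having a compatible write method is enough.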
Python uses the duck-typing method - if it walks, looks and quacks like a duck, then it's a duck. If you pass in a string, and try to do something numerical on it, then it will fail.
Have a look at: http://en.wikipedia.org/wiki/Python_%28programming_language%29#Typing and http://en.wikipedia.org/wiki/Duck_typing
When it comes to assigning literal values to variables, the type of the literal can be determined at the time of lexical analysis. For example, a token matching a regular expression such as 0|[1-9][0-9]* is an integer literal (a leading minus sign is a separate unary operator applied to it). If you want a float instead, you need an explicit conversion such as float(x). Similarly, a string literal is any sequence of characters enclosed in single or double quotes.
In a method call, the parameters are not type-checked. You only need to pass in the correct number of them to be able to call the method. So long as the body of the method does not cause any errors with respect to the arguments, you can call the same method with lots of different types of arguments.
In Python, unlike in C++ and Java, numbers and strings are both objects. So this:
id = 0
name = 'John'
is equivalent to:
id = int(0)
name = str('John')
Since variables id and name are references that may address any Python object, they don't need to be declared with a particular type.
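A short sketch makes this visible. The variable is called ident here only to avoid shadowing the built-in id function.

```python
ident = 0
name = 'John'

# The type belongs to the object, not to the name that refers to it.
print(type(ident).__name__, type(name).__name__)

# Calling the type on a literal yields an equal object of the same type.
print(int(0) == 0, str('John') == 'John')

# The same name can later be bound to an object of another type.
ident = 'now a string'
print(type(ident).__name__)
```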
