x = bytes("Hello! Welcome to Python", "utf-8")
In the line of code above, a string object is converted into a bytes object. But how is that useful, given that the string would ultimately be stored in memory in binary form under some encoding (ASCII or Unicode) even if it were not converted to a bytes object?
When working with empty strings or ASCII strings of one character Python uses string interning. Interned strings act as singletons, that is, if you have two identical strings that are interned, there is only one copy of them in the memory.
>>> a = 'hello'
>>> b = 'world'
>>> a[4],b[1]
('o', 'o')
>>> id(a[4]), id(b[1]), a[4] is b[1]
(4567926352, 4567926352, True)
>>> id('')
4545673904
>>> id('')
4545673904
As you can see, both string slices point to the same address in the memory. It's possible because Python strings are immutable.
In Python, string interning is not limited to single characters or empty strings. Strings that are created during code compilation can also be interned if their length does not exceed 20 characters (this limit is a CPython implementation detail and differs between versions).
This includes:
function and class names
variable names
argument names
constants (all strings that are defined in the code)
keys of dictionaries
names of attributes
When you hit enter in Python REPL, your statement gets compiled down to the bytecode. That's why all short strings in REPL are also interned.
>>> a = 'teststring'
>>> b = 'teststring'
>>> id(a), id(b), a is b
(4569487216, 4569487216, True)
>>> a = 'test'*5
>>> b = 'test'*5
>>> len(a), id(a), id(b), a is b
(20, 4569499232, 4569499232, True)
>>> a = 'test'*6
>>> b = 'test'*6
>>> len(a), id(a), id(b), a is b
(24, 4569479328, 4569479168, False)
This example will not work, because such strings are not constants:
>>> open('test.txt','w').write('hello')
5
>>> open('test.txt','r').read()
'hello'
>>> a = open('test.txt','r').read()
>>> b = open('test.txt','r').read()
>>> id(a), id(b), a is b
(4384934576, 4384934688, False)
>>> len(a), id(a), id(b), a is b
(5, 4384934576, 4384934688, False)
String interning saves tens of thousands of duplicate string allocations. Internally, interned strings are kept in a global dictionary in which the strings themselves are the keys. To check whether an identical string already exists in memory, Python performs a dictionary membership test.
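The dictionary-based lookup described above can be sketched in pure Python. This is only a toy illustration of the idea, not CPython's actual C implementation; the names `intern_table` and `toy_intern` are hypothetical and exist only for this example.

```python
# Toy model of an intern table: a dictionary mapping each string to
# its single canonical copy. All names here are illustrative only.
intern_table = {}

def toy_intern(s):
    """Return the canonical copy of s, registering it on first sight."""
    # dict.setdefault stores s if it is absent and returns the stored value.
    return intern_table.setdefault(s, s)

# Build two equal strings at run time so they start out as separate objects.
first = toy_intern("".join(["inter", "ned"]))
second = toy_intern("".join(["inter", "ned"]))

print(first == second)  # True: equal by value
print(first is second)  # True: both came out of the same table entry
```

The second call finds an equal key already in the dictionary and returns the stored object, which is why the identity check succeeds.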
The unicode object is almost 16 000 lines of C code, so there are a lot of small optimizations which are not mentioned in this article. If you want to learn more about Unicode in Python, I would recommend you to read PEPs about strings and check the code of the unicode object.
Conversions between types are common, and converting strings to bytes is especially frequent these days because file handling and machine learning (pickle files) often require it. Let’s discuss certain ways in which this can be performed.
A bytes object is a sequence of bytes. Bytes objects are machine-readable and can be stored directly on disk. Strings, on the other hand, are in human-readable form and need to be encoded before they can be stored on disk.
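As a minimal sketch of the encoding step described above, the following example converts a string to bytes and back. It assumes Python 3 and the UTF-8 encoding; the sample text is arbitrary.

```python
text = "H\u00e9llo"  # the word "Hello" with an accented e

# Encoding turns the text into a concrete byte sequence.
data = text.encode("utf-8")  # same result as bytes(text, "utf-8")
print(data)  # b'H\xc3\xa9llo'

# Characters and bytes are counted differently: the accented e
# takes two bytes in UTF-8.
print(len(text), len(data))  # 5 6

# Decoding with the same encoding restores the original text.
print(data.decode("utf-8") == text)  # True
```

The bytes form is what actually gets written to a file or a socket; the string form is an abstract sequence of code points with no fixed byte layout.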
I learnt that in some immutable classes, __new__ may return an existing instance - this is what the int, str and tuple types sometimes do for small values.
But why do the following two snippets differ in the behavior?
With a space at the end:
>>> a = 'string '
>>> b = 'string '
>>> a is b
False
Without a space:
>>> c = 'string'
>>> d = 'string'
>>> c is d
True
Why does the space bring the difference?
This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they don't have to. 'string' happens to be automatically interned when 'string ' isn't because 'string' contains only characters allowed in a Python identifier. I have no idea why that's the criterion they chose, but it is. The behavior may be different in different Python versions or implementations.
From the CPython 2.7 source code, stringobject.h, line 28:
Interning strings (ob_sstate) tries to ensure that only one string
object with a given value exists, so equality tests can be one pointer
comparison. This is generally restricted to strings that "look like"
Python identifiers, although the intern() builtin can be used to force
interning of any string.
You can see the code that does this in Objects/codeobject.c:
/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}
Also, note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the a and b assignments together, e.g. by placing them in a module or an if True:, you would find that a and b would be the same string.
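One way to observe the effect of compiling both assignments together is to pass them to `compile` as a single unit. This is a sketch assuming Python 3; the identity result is an implementation detail, so no particular output is promised for the last line.

```python
# Compile both assignments as a single unit, the way a module would be.
source = "a = 'string '\nb = 'string '\n"
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)

print(namespace["a"] == namespace["b"])  # True: equal values

# Identity is an implementation detail. CPython typically merges equal
# constants within one code object, but nothing guarantees it.
print(namespace["a"] is namespace["b"])
```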
This behavior is not consistent, and as others have mentioned depends on the variant of Python being executed. For a deeper discussion, see this question.
If you want to make sure that the same object is being used you can force the interning of strings by the appropriately named intern:
intern(...)
intern(string) -> string
``Intern'' the given string. This enters the string in the (global)
table of interned strings whose purpose is to speed up dictionary lookups.
Return the string itself or the previously interned string object with the
same value.
>>> a = 'string '
>>> b = 'string '
>>> id(a) == id(b)
False
>>> a = intern('string ')
>>> b = intern('string ')
>>> id(a) == id(b)
True
Note that in Python 3 you have to import intern explicitly: from sys import intern.
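A Python 3 version of the same idea might look like the sketch below. The strings are built at run time with `str.join` so that the compiler cannot share them beforehand.

```python
import sys

# Strings built at run time are not interned automatically.
parts = ["string", " "]
a = "".join(parts)
b = "".join(parts)
print(a == b)  # True: same contents

# sys.intern returns one canonical object for each distinct value.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True: both names now refer to the interned copy
```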
I am getting a strange result in this boolean in python. I keep getting the wrong result.
string = '94070'
string[0:2] is '95' or string[0:2] is '94'
returns False, but when I hardcode in the value '94', it works
'94' is '95' or '94' is '94'
returns True. I've checked the data types and they are both of type 'str' so I'm not sure what is going on here.
Use == instead of is. In Python, the is operator does an object identity check. The == operator checks two objects (which may be different objects) to see whether they contain the same contents.
is is an identity test (is this the exact same object?), not equality test. While is works coincidentally, as an implementation detail for some things that aren't logically singletons, it shouldn't be used like this; use value equality testing with ==.
Your test of '94' is '94' can work due to a couple of related possibilities:
Python often coalesces constant literals in a function (sometimes only on a single line)
String literals are often interned by Python, so the same string literal expressed anywhere in the code references a common copy of that string
When you slice off bits of a string, interning isn't involved, so the identity test fails.
Use is to see if two arguments refer to the same object and == to see if they have the same value.
>>> a = 'this is some text.'
>>> b = 'this is some text.'
>>> a == b
True
>>> a is b
False
>>> a = 'this is some text.'
>>> b = a
>>> a == b
True
>>> a is b
True
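Applied to the zip-code check from the question, a value-based comparison could look like this sketch. The variable names `prefix` and `matches` are introduced here for illustration.

```python
string = "94070"
prefix = string[0:2]  # a new string object created by slicing

# Compare by value, never by identity.
matches = prefix == "95" or prefix == "94"
print(matches)  # True

# Membership in a tuple is a compact way to test several values.
print(prefix in ("94", "95"))  # True

# Two separately built lists are always distinct objects, even when equal.
x = [9, 4]
y = [9, 4]
print(x == y, x is y)  # True False
```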
I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for cpython3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler does very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then it is usually not the case that the object identity yields true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects reference to the same string content. So the is operator returns true for those.
This however is an implementation detail of CPython and is generally neither guaranteed for CPython nor any other implementation. So using this fact is a bad idea as it can break any other day.
To compare strings, use the == operator, which compares objects for equality. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
The is operator compares object identities, the same value that the id function returns; id is guaranteed to be unique among simultaneously existing objects. In CPython, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running Python 2.7.3 on a MacBook. Others might get entirely different results.
In fact your code amounts to comparing objects id (i.e. their physical address). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But, note that if a and b were directly in the comparison it would work.
>>> id('is it the space?') == id('is it the space?')
True
In fact, in an expression there's sharing between the same static strings. But, at the program scale there's only sharing for word-like strings (so neither spaces nor punctuations).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
The is operator compares the actual objects, not their values.
c is d should also be False. My guess is that Python performs some optimization, and in that case they are the same object.
How do I see the type of a variable? (e.g. unsigned 32 bit)
Use the type() builtin function:
>>> i = 123
>>> type(i)
<type 'int'>
>>> type(i) is int
True
>>> i = 123.456
>>> type(i)
<type 'float'>
>>> type(i) is float
True
To check if a variable is of a given type, use isinstance:
>>> i = 123
>>> isinstance(i, int)
True
>>> isinstance(i, (float, str, set, dict))
False
Note that Python doesn't have the same types as C/C++, which appears to be your question.
You may be looking for the type() built-in function.
See the examples below, but note that, as in Java, there is no "unsigned" type in Python.
Positive integer:
>>> v = 10
>>> type(v)
<type 'int'>
Large positive integer:
>>> v = 100000000000000
>>> type(v)
<type 'long'>
Negative integer:
>>> v = -10
>>> type(v)
<type 'int'>
Literal sequence of characters:
>>> v = 'hi'
>>> type(v)
<type 'str'>
Floating point number:
>>> v = 3.14159
>>> type(v)
<type 'float'>
It is so simple. You do it like this.
print(type(variable_name))
How to determine the variable type in Python?
So if you have a variable, for example:
one = 1
You want to know its type?
There are right ways and wrong ways to do just about everything in Python. Here's the right way:
Use type
>>> type(one)
<type 'int'>
You can use the __name__ attribute to get the name of the object. (This is one of the few special attributes that you need to use the __dunder__ name to get to - there's not even a method for it in the inspect module.)
>>> type(one).__name__
'int'
Don't use __class__
In Python, names that start with underscores are semantically not a part of the public API, and it's a best practice for users to avoid using them. (Except when absolutely necessary.)
Since type gives us the class of the object, we should avoid getting this directly:
>>> one.__class__
This is usually the first idea people have when accessing the type of an object in a method - they're already looking for attributes, so type seems weird. For example:
class Foo(object):
    def foo(self):
        self.__class__
Don't. Instead, do type(self):
class Foo(object):
    def foo(self):
        type(self)
Implementation details of ints and floats
How do I see the type of a variable whether it is unsigned 32 bit, signed 16 bit, etc.?
In Python, these specifics are implementation details. So, in general, we don't usually worry about this in Python. However, to sate your curiosity...
In Python 2, int is usually a signed integer equal to the implementation's word width (limited by the system). It's usually implemented as a long in C. When integers get bigger than this, we usually convert them to Python longs (with unlimited precision, not to be confused with C longs).
For example, in a 32 bit Python 2, we can deduce that int is a signed 32 bit integer:
>>> import sys
>>> format(sys.maxint, '032b')
'01111111111111111111111111111111'
>>> format(-sys.maxint - 1, '032b') # minimum value, see docs.
'-10000000000000000000000000000000'
In Python 3, the old int goes away, and we just use (Python's) long as int, which has unlimited precision.
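A quick way to see the unlimited precision of the Python 3 int is shown in the sketch below. The value `2 ** 100` is an arbitrary choice that does not fit in a 64-bit machine word.

```python
import sys

big = 2 ** 100  # far beyond any 64-bit machine word
print(type(big).__name__)  # int
print(big.bit_length())    # 101

# sys.maxsize is the largest container index, not a limit on int values.
print(sys.maxsize + 1 > sys.maxsize)  # True
```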
We can also get some information about Python's floats, which are usually implemented as a double in C:
>>> sys.float_info
sys.floatinfo(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15,
mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)
Conclusion
Don't use __class__, a semantically nonpublic API, to get the type of a variable. Use type instead.
And don't worry too much about the implementation details of Python. I've not had to deal with issues around this myself. You probably won't either, and if you really do, you should know enough not to be looking to this answer for what to do.
print type(variable_name)
I also highly recommend the IPython interactive interpreter when dealing with questions like this. It lets you type variable_name? and will return a whole list of information about the object including the type and the doc string for the type.
e.g.
In [9]: var = 123
In [10]: var?
Type: int
Base Class: <type 'int'>
String Form: 123
Namespace: Interactive
Docstring:
int(x[, base]) -> integer
Convert a string or number to an integer, if possible. A floating point argument will be truncated towards zero (this does not include a string
representation of a floating point number!) When converting a string, use the optional base. It is an error to supply a base when converting a
non-string. If the argument is outside the integer range a long object
will be returned instead.
a = "cool"
type(a)
# Result:
<class 'str'>
Or call `dir(a)` to see the list of built-in methods available on the variable.
One more way using __class__:
>>> a = [1, 2, 3, 4]
>>> a.__class__
<type 'list'>
>>> b = {'key1': 'val1'}
>>> b.__class__
<type 'dict'>
>>> c = 12
>>> c.__class__
<type 'int'>
Examples of simple type checking in Python:
assert type(variable_name) == int
assert type(variable_name) == bool
assert type(variable_name) == list
It may be a little off topic, but you can check the type of an object with isinstance(object, type), as mentioned here.
The question is somewhat ambiguous -- I'm not sure what you mean by "view". If you are trying to query the type of a native Python object, @atzz's answer will steer you in the right direction.
However, if you are trying to generate Python objects that have the semantics of primitive C types (such as uint32_t, int16_t), use the struct module. You can determine the number of bytes in a given C-type primitive like this:
>>> import struct
>>> struct.calcsize('c') # char
1
>>> struct.calcsize('h') # short
2
>>> struct.calcsize('i') # int
4
>>> struct.calcsize('l') # long
4
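The sizes shown above are native sizes and can vary by platform. As a sketch of a platform-independent alternative, the struct module uses standard sizes when the format string starts with a byte-order prefix such as `<` (little-endian). The variable name `packed` is introduced here for illustration.

```python
import struct

# With a byte-order prefix such as "<", struct uses standard sizes
# that do not depend on the platform.
print(struct.calcsize("<h"))  # 2 bytes: int16
print(struct.calcsize("<I"))  # 4 bytes: uint32

# Pack the largest uint32 into 4 bytes and read it back.
packed = struct.pack("<I", 0xFFFFFFFF)
print(packed)                          # b'\xff\xff\xff\xff'
print(struct.unpack("<I", packed)[0])  # 4294967295

# Values outside the range of the format are rejected.
try:
    struct.pack("<I", -1)
except struct.error as exc:
    print("rejected:", exc)
```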
This is also reflected in the array module, which can make arrays of these lower-level types:
>>> array.array('c').itemsize # char
1
The maximum integer supported (Python 2's int) is given by sys.maxint.
>>> import sys, math
>>> math.ceil(math.log(sys.maxint, 2)) + 1 # Signedness
32.0
There is also sys.getsizeof, which returns the size of the Python object itself in memory, in bytes:
>>> a = 5
>>> sys.getsizeof(a) # Size in bytes.
12
For float data and precision data, use sys.float_info:
>>> sys.float_info
sys.floatinfo(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)
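The individual fields of `sys.float_info` can also be read by name. The sketch below assumes Python 3 on a platform whose floats are IEEE 754 doubles, which is the usual case; the last two lines illustrate what epsilon means.

```python
import sys

info = sys.float_info
print(info.mant_dig)  # bits of precision in the significand
print(info.dig)       # decimal digits that survive a round trip
print(info.max)       # largest finite float

# epsilon is the gap between 1.0 and the next representable float.
print(1.0 + info.epsilon > 1.0)        # True
print(1.0 + info.epsilon / 2 == 1.0)   # True: the half step rounds back to 1.0
```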
Do you mean in Python or using ctypes?
In the first case, you simply cannot - because Python does not have signed/unsigned, 16/32 bit integers.
In the second case, you can use type():
>>> import ctypes
>>> a = ctypes.c_uint() # unsigned int
>>> type(a)
<class 'ctypes.c_ulong'>
For more reference on ctypes and its types, see the official documentation.
Python doesn't have such types as you describe. In Python 2 there are two types used to represent integral values: int, which corresponds to the platform's long type in C, and long, which is an arbitrary-precision integer (i.e. it grows as needed and doesn't have an upper limit). ints are silently converted to long if an expression produces a result that cannot be stored in an int.
Simple. For Python 3:
print(type(variable_name))
For Python 2.7:
print type(variable_name)
It really depends on what level you mean. In Python 2.x, there are two integer types, int (constrained to sys.maxint) and long (unlimited precision), for historical reasons. In Python code, this shouldn't make a bit of difference because the interpreter automatically converts to long when a number is too large. If you want to know about the actual data types used in the underlying interpreter, that's implementation dependent. (CPython's are located in Objects/intobject.c and Objects/longobject.c.) To find out about the system's types, look at cdleary's answer, which uses the struct module.
For python2.x, use
print type(variable_name)
For python3.x, use
print(type(variable_name))
You should use the type() function. Like so:
my_variable = 5
print(type(my_variable)) # Would print out <class 'int'>
This function will view the type of any variable, whether it's a list or a class. Check this website for more information: https://www.w3schools.com/python/ref_func_type.asp
Python is a dynamically typed language. A variable, initially created as a string, can be later reassigned to an integer or a float. And the interpreter won’t complain:
name = "AnyValue"
# Dynamically typed language lets you do this:
name = 21
name = None
name = Exception()
To check the type of a variable, you can use either type() or isinstance() built-in function. Let’s see them in action:
Python3 example:
variable = "hello_world"
print(type(variable) is str) # True
print(isinstance(variable, str)) # True
Let's compare both methods performances in python3
python3 -m timeit -s "variable = 'hello_world'" "type(variable) is int"
5000000 loops, best of 5: 54.5 nsec per loop
python3 -m timeit -s "variable = 'hello_world'" "isinstance(variable, str)"
10000000 loops, best of 5: 39.2 nsec per loop
type is approximately 39% slower (54.5 / 39.2 ≈ 1.390).
We could use type(variable) == str instead. It would work, but it’s a bad idea:
== should be used when you want to check the value of a variable. We would use it to see if the value of the variable is equal to "hello_world". But when we want to check if the variable is a string, the is operator is more appropriate. For a more detailed explanation of when to use one or the other, check this article.
== is slower: python3 -m timeit -s "variable = 'hello_world'" "type(variable) == str" 5000000 loops, best of 5: 64.4 nsec per loop
Difference between isinstance and type
Speed is not the only difference between these two functions. There is actually an important distinction between how they work:
type only returns the type of an object (its class). We can use it to check if the variable is of type str.
isinstance checks if a given object (first parameter) is:
an instance of a class specified as a second parameter. For example, is variable an instance of the str class?
or an instance of a subclass of a class specified as a second parameter. In other words - is variable an instance of a subclass of str?
What does it mean in practice? Let’s say we want to have a custom class that acts as a list but has some additional methods. So we might subclass the list type and add custom functions inside:
class MyAwesomeList(list):
    # Add additional functions here
    pass
But now the type and isinstance return different results if we compare this new class to a list!
my_list = MyAwesomeList()
print(type(my_list) is list) # False
print(isinstance(my_list, list)) # True
We get different results because type(my_list) is list is true only when my_list is exactly a list (it’s not), while isinstance also accepts instances of subclasses of list (and MyAwesomeList is a subclass of list). If you forget about this difference, it can lead to some subtle bugs in your code.
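The same distinction shows up with built-in types. As a small sketch, bool is a subclass of int in Python, so the two checks disagree for `True`; the variable name `flag` is just for illustration.

```python
# bool is a subclass of int in Python, which makes the difference visible.
flag = True

print(type(flag) is int)      # False: the exact type is bool
print(isinstance(flag, int))  # True: bool inherits from int
print(issubclass(bool, int))  # True

# isinstance also accepts a tuple of types.
print(isinstance(3.5, (int, float)))  # True
```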
Conclusions
isinstance is usually the preferred way to compare types. It’s not only faster but also considers inheritance, which is often the desired behavior. In Python, you usually want to check if a given object behaves like a string or a list, not necessarily if it’s exactly a string. So instead of checking for string and all its custom subclasses, you can just use isinstance.
On the other hand, when you want to explicitly check that a given variable is of a specific type (and not its subclass) - use type. And when you use it, use it like this: type(var) is some_type not like this: type(var) == some_type.
I saw this one when I was new to Python (I still am):
x = …
print(type(x))
Python has no separate 16-bit, 32-bit, or 64-bit integer types, so you don't have to worry about it. Here is how to check the type:
integer = 1
print(type(integer)) # Result: <class 'int'>, and if it's a string then class will be str and so on.
# Checking the type
float_class = 1.3
print(isinstance(float_class, float)) # True
But if you really have to, you can use the ctypes library, which has types like unsigned integer.
Ctypes types documentation
You can use it like this:
from ctypes import *
uint = c_uint(1) # Unsigned integer
print(uint) # Output: c_uint(1)
# To actually get the value, you have to call .value
print(uint.value)
# Change value
uint.value = 2
print(uint.value) # 2
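To tie this back to the original question about widths and signedness, the sketch below uses the fixed-width ctypes aliases and `ctypes.sizeof`. Note that ctypes integer types do no overflow checking, so out-of-range values wrap around silently.

```python
import ctypes

# Fixed-width types have the same size on every platform.
print(ctypes.sizeof(ctypes.c_uint32))  # 4
print(ctypes.sizeof(ctypes.c_int16))   # 2

# ctypes integer types do no overflow checking, so values wrap around.
print(ctypes.c_uint32(-1).value)    # 4294967295
print(ctypes.c_int16(32768).value)  # -32768
```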
There are many data types in python like:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType
Here is some code that builds a list containing an example of each data type and prints the type of each item:
L = [
    "Hello World",
    20,
    20.5,
    1j,
    ["apple", "banana", "cherry"],
    ("apple", "banana", "cherry"),
    range(6),
    {"name": "John", "age": 36},
    {"apple", "banana", "cherry"},
    frozenset({"apple", "banana", "cherry"}),
    True,
    b"Hello",
    bytearray(5),
    memoryview(bytes(5)),
    None,
]
# Iterate over the items directly instead of indexing by position.
for item in L:
    print(type(item))
OUTPUT:
<class 'str'>
<class 'int'>
<class 'float'>
<class 'complex'>
<class 'list'>
<class 'tuple'>
<class 'range'>
<class 'dict'>
<class 'set'>
<class 'frozenset'>
<class 'bool'>
<class 'bytes'>
<class 'bytearray'>
<class 'memoryview'>
<class 'NoneType'>
Just do not do it. Asking for something's type is wrong in itself. Instead use polymorphism. Find or if necessary define by yourself the method that does what you want for any possible type of input and just call it without asking about anything. If you need to work with built-in types or types defined by a third-party library, you can always inherit from them and use your own derivatives instead. Or you can wrap them inside your own class. This is the object-oriented way to resolve such problems.
If you insist on checking exact type and placing some dirty ifs here and there, you can use __class__ property or type function to do it, but soon you will find yourself updating all these ifs with additional cases every two or three commits. Doing it the OO way prevents that and lets you only define a new class for a new type of input instead.
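A minimal sketch of the polymorphic style described above follows. The classes `Circle` and `Square` and the function `total_area` are hypothetical examples, not part of any library.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

def total_area(shapes):
    # No type checks: any object with an area() method works.
    return sum(shape.area() for shape in shapes)

print(total_area([Square(2), Square(3)]))  # 13
```

Adding a new shape means adding a new class with its own area method; total_area itself never needs another if branch.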