Making Sense of Python's Object Model

Introduction

We're going to explain a few things about Python's object model here. Python's object model (at least with regard to data) can be viewed as being similar to Java's and C#'s if one considers only those languages' "class types", which are reference types.

In this view, Python has no value types, only reference types that are class types, and uses references (or reference variables, if you like) to refer to objects of those classes.

           reference
           variable     object
           +------+     +-----+
reference -|-> o--|---->| 42  |
           +------+     +--|--+
               i         value
          identifier

The complete use of class types (whether viewed as reference or value types) is one reason it is sometimes said that in Python "everything is an object". By the traditional C definition that's nothing special ("An object, sometimes called a variable, is a location in storage, ...", The C Programming Language, Kernighan & Ritchie, Second Edition, Appendix A, Section A4, p.195), but with regard to Python it means that Python has a unified type system: there are no objects, like those of C's "primitive" types, that are not instances of classes. It's also usually taken to mean that modules, functions, classes, etc. are objects at runtime in Python, on the same footing as traditional data objects. Note this comment from Guido van Rossum (http://python-history.blogspot.de/2009/02/first-class-everything.html):

"One of my goals for Python was to make it so that all objects were "first class." By this, I meant that I wanted all objects that could be named in the language (e.g., integers, strings, functions, classes, modules, methods, etc.) to have equal status. That is, they can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth."

Python has primitive/fundamental/axiomatic types in the sense that objects of type int, float, str, bool, list, etc. are not composed of objects of other types (attributes) (ignoring special attributes). Collection types like list and dict whose items are accessed via subscripting are not viewed as being composed in this sense.
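The "first class" status described in the quotation above can be checked directly. The following is only a small sketch; the names greet, Point, and things are made up for illustration.

```python
import math


def greet():
    return "hello"


class Point:
    pass


# An int, a str, a function, a class, and a module stored side by side
# in a list, like any other objects.
things = [42, "spam", greet, Point, math]

# Each one is an instance of object and has a class of its own.
all_objects = all(isinstance(t, object) for t in things)
type_names = [type(t).__name__ for t in things]

# They can also be stored in dictionaries and passed as arguments.
registry = {"f": greet, "cls": Point, "mod": math}
called = registry["f"]()

print(all_objects)
print(type_names)
print(called)
```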

Note that you won't read much about references, or the binding of references to objects, in Python documentation. The documentation only refers to "name binding" operations, as if a name (identifier) had an existence independent of the reference through which it is bound to its object. Perhaps, to the authors, name binding implies the use of reference types. Or perhaps they mean that Python doesn't use reference types but rather value types and objects can have multiple names, like in C++ (this view can be made to work; it leads to interesting pictures on web sites of objects with name tags tied to them or yellow sticky notes stuck to them). Perhaps the intent is to say that either view is acceptable - it's a matter of what you mean by "reference" and "bind" and you should simply choose the conceptual/mental model that works best for you. Note this passage from the Python Tutorial:

"Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has a possibly surprising effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most other types. This is usually used to the benefit of the program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change — this eliminates the need for two different argument passing mechanisms as in Pascal."

In any event, for me, viewing Python as using pointer-like references makes understanding and explaining Python operation, as well as comparison with other languages, much easier.

Construction and Assignment

Because we're viewing Python's object model as being reference-based (or heap-based, if you like, at least conceptually), it presents many of the same kinds of issues around object creation/construction, modification, copying, and destruction that one finds when using references in other languages.

One thing that's noteworthy in a reference-based view of Python is that references are always created in conjunction with objects. There is no definition vs. initialization distinction with regard to references in Python, as in e.g. Java, because declarations aren't used. References are untyped and are always initialized with a non-null value - null references are not possible.

Terminology when using a reference-based explanation is sometimes an issue in Python, as it is with other reference-based languages. The term "object" is sometimes used where "reference" would be more accurate, and "reference variable" and "reference" are used interchangeably. And identifiers, for example, are the names of references, not objects. This issue is nicely addressed in a passage from The Java Programming Language (Arnold & Gosling, Second Edition, Chapter 1, p.10): "All objects in Java are accessed via object references - any variable that may appear to hold an object actually contains a reference to that object. ... Most of the time, you can be imprecise in the distinction between actual objects and references to objects. You can say, "Pass the object to the method" when you really mean "Pass an object reference to the method." We are careful about this distinction only when it makes a difference. Most of the time, you can use "object" and "object reference" interchangeably."

Data objects in Python are most commonly created via assignment statements. Assignment statements in reference-based languages like Python are not like assignment statements in value-based languages like C. An assignment statement in Python is indeed simply a binding operation, i.e. an operation that stores the "address" of an object in a reference. The object is created via the evaluation of an implicit or explicit constructor call. That is, because every object is an object of a class type, every object must be (as in other object-oriented languages) constructed. Python provides constructors for the primitive types, but their usage is optional. Thus i = 1 is effectively i = int(1), i.e. it's an implicit constructor call that creates an int object and stores its address in i (the constructor can be thought of as returning a reference to the created object). Note that there is no new operator in Python. Also note that an assignment statement cannot be used as an expression in Python (this prevents e.g. if (s = input("The Answer: ")) == "42":; Python 3.8 and later provide a separate := operator for that purpose).

For newcomers to Python, implicit construction can obscure the fact that a subsequent, similar assignment to an existing reference creates a new object (ignoring optimizations); it doesn't modify the object that the reference currently refers to. Thus a subsequent assignment i = 2 is effectively i = int(2) and the reference is bound to a new object. The previous object is garbage collected once no other references to it remain. Assignments assign references.
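A minimal sketch of this rebinding behaviour, using a second reference j (introduced here only for illustration) to keep the first object alive so that it can be inspected afterwards:

```python
i = 1
j = i      # j now refers to the same object as i
i = 2      # rebinds i to another object; the object j refers to is untouched

print(i, j)    # 2 1
print(i is j)  # False
```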

So how does one change the value of, for example, an int object, if not by assignment? Is there some method available for that? No, there isn't - one can't change the value of an int object. Classes like int, float, and str (in fact all primitive non-collection classes, as well as the primitive collection classes tuple and frozenset) are immutable in Python, i.e. the classes provide no "mutating" methods. Objects of immutable classes cannot be "mutated" or changed (some of the efficiency advantages provided by immutability are discussed later). The primitive collection classes list, dict, and set are mutable.
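The difference between immutable and mutable classes can be seen in a short sketch like the following (the variable names are illustrative only). A str object rejects item assignment, while a list object is changed in place and keeps its identity:

```python
s = "spam"
error = None
try:
    s[0] = "S"  # str provides no __setitem__, so this raises TypeError
except TypeError as exc:
    error = exc

has_setitem = hasattr(str, "__setitem__")

a = [1, 2, 3]
identity_before = id(a)
a.append(4)  # mutates the existing list object
same_object = id(a) == identity_before

print(type(error).__name__, has_setitem, a, same_object)
```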

Because assignments assign references, an assignment only mutates the value of an object if the object is composed of or contains references. This takes the view that the object's value consists of the object's references. In fact, an assignment like i = 1 in module-level code mutates the module object of which the int object is an attribute by adding/modifying a reference (conceptually, via the __setattr__ method). It is attribute assignment. The module class is thus mutable (although, like function, it has no built-in name; the standard types module exposes it as types.ModuleType). If one imports sys and gets the main module via the sys.modules dictionary (a case of aliasing; more later), then one can use the attribute operator and treat i as an attribute.

>>> i = 1
>>> i
1
>>> import sys
>>> __main__ = sys.modules["__main__"]
>>> __main__.i
1
>>> __main__.__setattr__("i", 2)
>>> __main__.i
2

The reference i can of course be deleted by using del or __delattr__. This kind of attribute usage is easier to see when one imports a module (as with sys above) - what remains after the module's code is executed is a module object whose attributes are the module-level objects created in the module. Again, "implicit" attribute usage in the main module obscures this fact.
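The same attribute behaviour can be sketched without touching the main module by building a fresh module object by hand. This assumes the standard types.ModuleType class; the module name "demo" is hypothetical and exists only for this illustration.

```python
import types

# Create an empty module object; "demo" is a made-up name.
demo = types.ModuleType("demo")

# Run "module-level" code inside the module's namespace.
exec("i = 1", demo.__dict__)
first = demo.i  # the module-level name is now an attribute

# Rebind the attribute, then delete the reference again.
setattr(demo, "i", 2)
second = demo.i
delattr(demo, "i")  # same effect as: del demo.i
still_there = hasattr(demo, "i")

print(first, second, still_there)
```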

Item assignment to a mutable collection mutates the collection object by changing a reference (via the __setitem__ method; references are added via e.g. append).
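As a small illustration, the subscript form and the explicit __setitem__ call below have the same effect on a list and on a dict:

```python
a = [10, 20, 30]
a[0] = 11             # item assignment
a.__setitem__(1, 21)  # the method call it corresponds to
a.append(40)          # adds a new reference at the end

d = {}
d["answer"] = 42
d.__setitem__("question", "unknown")

print(a)
print(d)
```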

So all assignment is either attribute or item assignment, and one can therefore say that assignment is a method and = is an operator in Python (although one sometimes reads the opposite).

In an assignment such as i = i + 1 (in reality i = i.__add__(1), in reality i = int.__add__(i, 1), in reality __main__.__setattr__("i", int.__add__(i, 1)) assuming we're in the main module), a new object is constructed and i is rebound, i.e. __add__ is a non-mutating method that initiates construction of a new object and returns a reference to it. Mutating method calls such as a.append(42) (where a is a list) are not normally used alone on the right-hand side of an assignment because they don't initiate construction - they have a side effect (mutation of a in this case) and return None.
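A brief sketch of the two kinds of method call: __add__ returns a reference to a new object and leaves its operands alone, whereas append mutates its list and returns None. The names j, k, and result are illustrative.

```python
i = 41
j = i                  # keep a second reference to the original object
i = i + 1              # i is rebound to a newly constructed object
k = int.__add__(j, 1)  # the explicit form of j + 1

a = [1]
result = a.append(42)  # mutates a as a side effect; returns None

print(i, j, k)
print(a, result)
```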

Note that changing a mutable object that's an attribute or item of an immutable object is allowed because the immutable object isn't being changed with respect to its references (i.e. its "value" in this context isn't changing; in other contexts the value of the immutable object's attributes/items themselves can fairly be interpreted to be the immutable object's value).
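For example, a tuple that holds a list can be sketched as follows. The list can be mutated through the tuple, but the tuple's own reference to it cannot be replaced:

```python
t = (1, [2, 3])
inner = t[1]

t[1].append(4)  # mutates the list; the tuple's references are unchanged
same_list = t[1] is inner

rebind_failed = False
try:
    t[1] = [9]  # attempts to replace one of the tuple's references
except TypeError:
    rebind_failed = True

print(t, same_list, rebind_failed)
```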

In contrast to data objects, function and class/type objects are usually defined with an explicit indication of type, i.e. by using the def and class keywords. These are also binding operations. Objects created in class-level code are attributes of their class object. Objects created in function (or method)-level code, however, are not attributes at call time of their function object (this could be viewed as an inconsistency).
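A minimal sketch of these points, with made-up names f, C, x, and local: def and class bind names to function and class objects, class-level names become attributes of the class object, and names created inside a function body do not become attributes of the function object.

```python
def f():
    local = 1  # created in function-level code
    return local


class C:
    x = 1  # created in class-level code


g = f  # def bound the name f; g is simply another reference to that object

function_type = type(f).__name__
class_has_x = hasattr(C, "x")
returned = f()
function_has_local = hasattr(f, "local")

print(function_type, class_has_x, returned, function_has_local)
```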

All of this leads to a runtime object graph in which every (user-accessible) object except the main module object is naturally an attribute or item of at least one other object. The main module object can be viewed as the root object in this picture (a picture often blurred by talk of namespaces and symbol tables and dictionaries). For example, any module imported by the main module - including the automatically imported __builtins__ module - is an attribute of the main module. Import is attribute assignment.

>>> import sys
>>> __main__ = sys.modules["__main__"]
>>> 
>>> __main__.sys.version
'3.6.1 (default, Apr 24 2017, 11:44:31) \n[GCC 4.8.4]'
>>> 
>>> __main__.__builtins__.print(__main__.sys.version)
3.6.1 (default, Apr 24 2017, 11:44:31) 
[GCC 4.8.4]

Note that after __main__ = sys.modules["__main__"] above, __main__ is an attribute of __main__, i.e. __main__ has a reference to itself, and so __main__.__main__ works, as does __main__.__main__.__main__, etc. (this is similar to how Perl's main package works). Perhaps avoiding confusion about this is why __main__ isn't automatically defined in Python.

Copying and Passing

Can we copy an object in Python? What happens when an identifier is used alone on the right-hand side of an assignment?

We've seen that i = 1 is effectively i = int(1). j = i, however, is not an implicit constructor call - it simply copies a reference. This operation is indeed known as "copy-by-reference" (with respect to the object) or "aliasing" and it leads to multiple references referring to the same object.

In fact, None, True, and False are names in Python rather than literals of the usual kind (in Python 3 they are keywords and cannot be rebound), and assignments from them are reference copies: None is a reference to a singleton, and True and False are references to the two members of a "doubleton" (so one never creates new objects of NoneType or bool in Python).
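This can be checked with identity tests. In the sketch below, calling the NoneType and bool classes hands back the existing objects rather than constructing new ones:

```python
none_type = type(None)
none_again = none_type()  # returns the one existing None object

truthy = bool(1)
falsy = bool(0)

x = None  # a reference copy; no object is created
y = True

print(none_again is None, truthy is True, falsy is False)
print(x is None, y is truthy)
```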

What happens when a constructor is used with a reference, e.g. j = int(i)? Constructors of the primitive immutable classes don't actually construct anything in this case (at least in CPython, when i already refers to an object of exactly that class); they simply return the reference to the existing object, i.e. j = int(i) has the same effect as j = i. It's not normally possible to copy objects of immutable classes.
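The following sketch checks this with identity tests. It assumes CPython, where this behaviour is an implementation detail rather than a language guarantee; the values themselves are arbitrary.

```python
i = 10**20
s = "some text"
t = (1, 2, 3)
fs = frozenset({1, 2})

# On CPython, each constructor returns the very object it was given.
int_same = int(i) is i
str_same = str(s) is s
tuple_same = tuple(t) is t
frozenset_same = frozenset(fs) is fs

print(int_same, str_same, tuple_same, frozenset_same)
```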

This could be viewed as an inconsistency (i.e. constructors should construct), but it makes perfect sense for immutable classes. One can either take the view that Python can perform aliasing because objects of these classes are immutable (the objects cannot be changed, so there is no sense in copying them), or that objects of these classes must be immutable because Python performs aliasing (allowing e.g. numeric objects to be changed via multiple references would make programs hard to understand; value semantics should be preserved). Put another way, "immutability allows aliasing" vs. "aliasing requires immutability".

Things work differently for the primitive mutable classes. Since lists, for example, are mutable, one has to have the ability to truly copy lists ("copy-by-value"). If a is a list, then b = a is not identical to b = list(a). The first is aliasing and the second is copy construction. This also goes for dictionaries and sets.
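The difference between aliasing and copy construction can be sketched as follows (alias, clone, and the nested lists are illustrative names). Note that list(a) makes a shallow copy: the new list holds copies of the references, so nested objects are still shared.

```python
a = [1, 2, 3]
alias = a        # aliasing: a second reference to the same list
clone = list(a)  # copy construction: a new list object

a.append(4)

nested = [[1], [2]]
shallow = list(nested)
shares_items = shallow[0] is nested[0]  # the inner lists are not copied

print(alias, clone)
print(alias is a, clone is a, shares_items)
```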

Argument passing to functions works exactly like assignment in Python. If we have the function signature def f(i) then the call f(42) is effectively f(int(42)) and the constructed int object's reference is stored in i in f. The call f(j) is copy-by-reference/aliasing (again, with respect to the object; the reference is copied by value) and i and j refer to the same object. If j refers to an int object then this is efficient and can have no side effects - there is no way for code in f to change the int object that j refers to. If j refers to an object of a mutable class, e.g. a list, then code in f can change the object that j refers to. Function return works the same way - it returns the reference of an object constructed in the return statement or an already-existing reference (e.g. None).
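A short sketch of both cases, using two hypothetical functions rebind and mutate. Rebinding a parameter inside a function has no effect on the caller's reference, while mutating the object a parameter refers to is visible to the caller.

```python
def rebind(i):
    i = i + 1  # rebinds the local reference only
    return i


def mutate(items):
    items.append(42)  # mutates the object shared with the caller


j = 1
returned = rebind(j)  # j still refers to 1 afterwards

a = []
nothing = mutate(a)   # a is changed; the function returns None implicitly

print(j, returned)
print(a, nothing)
```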

And there we are. A few things about Python's object model.
