Philosophy of Cinema: Ontology, Semantics, Aesthetics

Wittgenstein’s Picture Theory of Pictures

Enrico Terrone

1. Objects as Qualia

In his papers On the Nature of Tractatus Objects (2004) and An Adequacy Condition for the Interpretation of the Tractatus Ontology (2010) Pasquale Frascolla ([2004]: 369) argues for «the identification of Tractatus objects with qualia, i.e. with repeatable phenomenal qualities in the sense of Goodman’s The Structure of Appearance». Hence Tractatus objects have to be conceived of as «abstract entities (universals), whose instances appear in the stream of phenomena» ([2004]: 370). According to Frascolla ([2004]: 374), Tractatus objects are not substances existing necessarily, that is, existing at every possible world. By contrast, «objects, as repeatable phenomenal qualities, are abstract entities, whereas existence, within the theoretical framework of the Tractatus, is strictly confined to minimal concrete complexes or states of affairs». Therefore existence does not concern objects but states of affairs, conceived by Frascolla ([2004]: 374) as phenomenal complexes «which can be analyzed in repeatable qualitative parts (qualia, in Goodman’s sense)».

To sum up, objects, as abstract qualia, constitute states of affairs as phenomenal complexes that compose «the stream of phenomena, what is perceived, the given» ([2004]: 374).

If we consider just the visual experience, then an atomic state of affairs is a «minimal concrete visual complex» that «can be divided into three constituent qualitative parts: a phenomenal time, a visual-field place and a phenomenal colour» ([2004]: 374). In this scope, Tractatus objects are chromatic qualia (phenomenal colour), spatial qualia (visual-field place), and temporal qualia (phenomenal time), while an atomic state of affairs is the combination of a given chromatic quale, a given spatial quale and a given temporal quale.

So conceived, objects satisfy the following «adequacy conditions» for the Tractatus ontology (cf. Frascolla [2010]):

I) Objects are colourless and, by a natural generalization, they are non-spatial and timeless as well (cf. TLP 2.0232), since only a state of affairs (constituted by the combination of a chromatic quale, a spatial quale and a temporal quale) has a color, whereas an object is a color at most (in the case in which it is a chromatic quale).

II) Space and time are on a par with color as forms of objects (cf. TLP 2.0251), since space, time and color are categories, each of which collects objects (spatial qualia, temporal qualia, chromatic qualia), «all enjoying the same combinatorial possibilities» (Frascolla [2004]: 378). For example, the quale of red can combine with every place in the visual field and with every moment in the phenomenal time to constitute an atomic state of affairs, but it cannot combine with the quale of green: its form does not allow this combination.

III) The «Principle of Identity of Indiscernibles» does not hold of objects (TLP 2.0233, 2.02331, 5.5302), since two color qualia, for instance a red quale and a green quale, share the same logical form – that is, color – and nevertheless they are different (one is red, the other is green). Yet the Principle applies to states of affairs, which have to be identical if they are constituted by the same combinations of objects: if two atomic visual complexes are constituted by the same phenomenal time, visual-field place and phenomenal color, then they must be the same visual complex.

2. Propositions as Pixels

According to the Tractatus, facts are existing states of affairs. Some facts are special since they present other states of affairs. Wittgenstein calls these special facts «pictures» and claims that they are constituted by elements corresponding to the objects that constitute the presented state of affairs. On the one hand, «The picture presents the facts in logical space» (TLP 2.11), namely, it presents the states of affairs. On the other hand «The picture is a fact» (TLP 2.141).

If we endorse Frascolla’s account, according to which objects are chromatic qualia, spatial qualia and temporal qualia, then picture elements are signs of color, signs of space and signs of time. In this sense a paradigmatic case of a Tractatus picture is a movie composed of pixels.

I use the term «pixel» to designate a pictorial unit independently of any digital encoding of it. In this sense a pixel of a movie is an «atomic pictorial fact» (or, in Wittgenstein’s terms, an elementary proposition) constituted by the combination of three elements: a sign of colour, a sign of space and a sign of time. In a screened movie, the pixel is itself an atomic fact F, that is, a visual complex constituted by a spatial quale S (a certain position on the screen), a temporal quale T (a certain instant in the screening), and a chromatic quale C (a certain screened color). Yet the pixel is more than this, since its elements S and T respectively correspond to another spatial quale S’ (a certain position in the depicted scene) and another temporal quale T’ (a certain instant in the depicted scene). Therefore the fact F (constituted by the combination of S, T and C) presents another state of affairs F’ (constituted by the combination of S’, T’ and C).
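The relation between the screened fact F and the presented state of affairs F’ can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the encoding of spatial qualia as grid coordinates, of temporal qualia as frame indices and of chromatic qualia as color names, as well as the names `Complex`, `present`, `screen_to_scene` and `screening_to_story`, are hypothetical and are not part of Frascolla's or Wittgenstein's text.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Place = Tuple[int, int]   # assumed encoding of a spatial quale as grid coordinates
Time = int                # assumed encoding of a temporal quale as a frame index


@dataclass(frozen=True)
class Complex:
    """A combination of one spatial, one temporal and one chromatic quale."""
    place: Place
    time: Time
    color: str


def present(fact: Complex,
            place_map: Callable[[Place], Place],
            time_map: Callable[[Time], Time]) -> Complex:
    """Return the state of affairs F' presented by the screened fact F.

    S and T are mapped to S' and T'; the chromatic quale C is carried over.
    """
    return Complex(place_map(fact.place), time_map(fact.time), fact.color)


# Hypothetical correspondences between the screen and the depicted scene.
def screen_to_scene(place: Place) -> Place:
    x, y = place
    return (x - 100, y - 40)


def screening_to_story(time: Time) -> Time:
    return time + 1000


F = Complex(place=(120, 45), time=3, color="red")
F_prime = present(F, screen_to_scene, screening_to_story)
```

In this sketch F and F’ are distinct complexes that share only their chromatic constituent, which mirrors the claim that S and T stand for S’ and T’ while C is common to both.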

So movies, and more generally pictures, are facts made of pixels. A static picture (a picture in the ordinary sense) is defective with respect to a movie since all its pixels can present states of affairs having only one temporal quale T’ while movies have pixels that can present states of affairs having different temporal qualia T’, T’’, T’’’ etc.

Indeed, the movie itself is a defective depiction since, as a fact, it is a two-dimensional surface so that its pixels just present spatially two-dimensional visual complexes instead of the three-dimensional ones composing the visual field. The ideal Tractarian picture is a sort of hologram made of pixels having a three-dimensional spatial element. But movies (and even static pictures like photographs) can nonetheless be treated as Tractarian pictures to the extent that they present, although not a visual field as such, an ersatz visual structure that can be experienced approximately as we experience our visual field – a «quite competent» visual structure, according to the basic principles of projective geometry:

If we look at an object, say a tree, every (visible) point of it sends to the eye a ray which is called the 'projector,' or the 'projecting ray' of this point. The projector of the whole tree is compounded out of many rays, each of which 'projects' one or more points to the eye […] We can now intercept, or 'intersect,' the projector of the tree by a plane, each projecting ray being cut in a point […] By this means we obtain in the plane, as the 'section' or 'trace' of this projector, a perspective picture, a 'projection' of the tree, and this projection evidently throws the same projector into the eye as the tree itself, and is therefore quite competent to convey a notion of the latter to us. Ordinary photographs of three-dimensional objects are essentially such perspective, plane pictures of the objects (Reye [1898]: 9-10, my emphasis).

3. Pictures as Visual Structures

If we endorse Frascolla's interpretation of the Tractatus ontology, then Wittgenstein's «picture theory of propositions» turns out to be a genuine depiction theory. Elementary propositions are indeed pixels, that is, the basic components of both static and moving pictures, which are therefore to be considered as complex propositions composed of logical conjunctions of pixels.

Certainly this is not a perceptual account of depiction like the ones that scholars like Gombrich (1960) or Wollheim (1980) built by directly starting from the considerations about «noticing aspects» in the Philosophical Investigations. This is rather a theory of depiction that reveals affinities with the structural accounts proposed by scholars like Goodman (1968) and especially Haugeland (1991) and Kulvicki (2006).

According to structural accounts, there is a basic level at which what a picture depicts does not depend on what a competent viewer can recognize, but simply on the picture’s structure. Haugeland calls this basic level «bare bones content» (complementary to a «fleshed out content» in which the recognition takes place), and he claims that, at this level, «all the photos ‘strictly’ represent is certain variations of incident light with respect to direction» (Haugeland [1991]: 189).

In order to relate Haugeland’s claim to the picture theory outlined so far, we need, first of all, to relate the Goodmanian notion of phenomenal qualia (on which our interpretation of the Tractatus ontology relies) to the objective physical notion of «variations of incident light with respect to direction» used by Haugeland. That is to say, we need to address what Goodman ([1968]: 380) calls «the problem of accounting for the physical world upon a phenomenalist basis», and David Chalmers (2006) effectively characterizes as «the fall from Eden».

In order to account for the fall from the phenomenal Eden to the physical Earth, we should treat our visual qualia (phenomenal times, visual-field places, phenomenal colors) not only as Tractatus objects, but also as Tractatus elements corresponding to other kinds of objects, namely, physical times, physical places, physical wavelengths. An enlightening characterization of such a correspondence between phenomenal qualities and physical properties is provided by one of the thinkers who most influenced Wittgenstein's Tractatus, namely, Hermann von Helmholtz ([1878]: 223-224):

Schopenhauer and many followers of Kant have been led to the improper conclusion that there is no real content at all in our space-perceptions, that space and its relations are purely transcendental and have nothing corresponding to them in the sphere of the real. We are, however, justified in taking our space-perceptions as signs of certain otherwise unknown relations in the world of reality, though we may not assume any sort of similarity between the sign and what is signified.

The correspondence between phenomenal qualities and physical properties allows us to claim that pictures are visual propositions about light or, more precisely, about spatio-temporal distributions of light energy. What we ordinarily call «pictures' subjects» are just interpretations (in Haugeland's terms: «fleshed out contents») of these visual propositions about light (in Haugeland's terms: «bare bones contents»). But our visual perceptions are in their turn visual propositions about light, and of a more fundamental kind, so that pictures can also be conceived of – as we have done so far and will continue to do in what follows – as propositions about the contents of our visual perceptions.

In order to better understand how pictures can count as propositions of this sort, let us come back to the Tractatus. First of all, a picture is a fact, that is, an aggregate of atomic visual complexes (atomic facts) in our visual field. In other words, a picture is a surface perceived in our environment. Yet this surface has something special: it is composed of atomic facts constituted by elements. These atomic facts are pixels, that is, elementary propositions. The picture is more than a mere fact (i.e., it is more than a simple surface in our environment) since it is composed of pixels that are more than mere atomic facts.

An atomic fact is something absolutely singular and concrete: a phenomenal color at a given visual-field place and at a given time. On the other hand, a pixel has a distinctive degree of abstractness, since it can be instantiated by different atomic facts (F1, F2, F3…) in different visual-field places and times, and nevertheless it still presents the same atomic state of affairs F’, in which a certain screened color C is at a certain position S’ and time T’ in the depicted scene. We can see the same picture in different moments of our life and even at different places; nevertheless, it still presents the same visual structure, since its pixels still present the same combinations of color, space and time.

That being the case, the pixels have a peculiar abstractness that is intermediate between, on the one hand, the concreteness and singularity of facts and, on the other hand, the absolute abstractness of Tractatus objects conceived as «repeatable phenomenal qualities». Pixels work as «repeatable phenomenal facts». They are not absolutely repeatable like objects are, since at every new «repetition» of a pixel only the phenomenal color remains the same whereas there is a new phenomenal time and probably also a new visual-field place. Hence there is a new fact.

Nevertheless, pixels are in some sense repeatable since – although their spatial and temporal constituents change – they still present the same state of affairs F’ constituted by the same location S’ and the same phenomenal time T’. The pixel as a fact is not repeatable, but the pixel as a presentational function is repeatable since a given state of affairs F’ can be presented by a series of facts (F1, F2, F3…) all working as if they were the same pixel. In this sense, the pixel as a presentational function can be conceived of as an abstract type that presents a visual state of affairs F’ by being instantiated by visual factual tokens (F1, F2, F3…). 
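The type/token relation just described can be made concrete with a small sketch. It is offered only as an illustration under simplifying assumptions: qualia are encoded as integers and color names, and the class `Screening`, the function `same_pixel` and the sample values are hypothetical devices introduced here, not notions drawn from the Tractatus or from Frascolla.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Complex:
    """A combination of a spatial, a temporal and a chromatic quale."""
    place: Tuple[int, int]
    time: int
    color: str


@dataclass(frozen=True)
class Screening:
    """Hypothetical occasion on which the picture is seen."""
    origin: Tuple[int, int]  # visual-field place of the picture's corner
    start: int               # phenomenal time at which the screening begins

    def presented_by(self, token: Complex) -> Complex:
        """State of affairs F' presented by a token fact on this occasion."""
        x, y = token.place
        ox, oy = self.origin
        return Complex((x - ox, y - oy), token.time - self.start, token.color)


def same_pixel(a: Complex, screening_a: Screening,
               b: Complex, screening_b: Screening) -> bool:
    """Two token facts instantiate the same pixel type iff they present the same F'."""
    return screening_a.presented_by(a) == screening_b.presented_by(b)


# Two viewings of the same picture at different places and times.
monday = Screening(origin=(100, 40), start=500)
friday = Screening(origin=(10, 300), start=9000)

F1 = Complex((120, 45), 503, "red")
F2 = Complex((30, 305), 9003, "red")
F3 = Complex((30, 305), 9003, "green")
```

Here F1 and F2 are different facts, yet `same_pixel(F1, monday, F2, friday)` holds because both present the same F’; F3 differs in its chromatic quale and therefore instantiates a different pixel.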

The confusion between the pixel as a type and the tokens instantiating it is the ontological fallacy that leads Berys Gaut ([2010]: 58) to argue that in digital pictures the pixel is not a «minimal denotative unit» because «the parts of a pixel denote the parts of the area of the object that the pixel denotes […]  The denotation relation still holds at the sub-pixel level. The parts of a pixel do denote, unlike the parts of a word». In other words, if we look closely at a pixel on the screen, then, according to Gaut, we can see a small colored square that has colored parts denoting in their turn. But what we truly see in looking closely at a pixel on the screen is not the pixel itself, but the token that instantiates it! Such a token is a small colored area having colored parts, but the pixel instantiated by this token is an elementary proposition having no parts at all.

The picture, as a conjunction of pixels, can be conceived in its turn as an abstract type instantiated by factual tokens. In what follows, I will call such a type the picture’s design, and I will call each of its factual tokens a picture’s experienced surface. Hence a picture is an abstract design that presents a visual state of affairs, and that can be instantiated by a series of surface-facts. The design, so defined, is a visual array that mediates between a visual fact (the picture's surface, by which the design is instantiated) and a visual state of affairs (the picture's subject, the depicted scene presented by the design). The surface is in our actual spatio-temporal environment, the scene is in another spatio-temporal environment, whereas the design, qua abstract type, does not belong to any spatio-temporal environment: it is just a structure of colored points.

Although a picture is made of pixels, we do not normally notice pixels while looking at pictures. We normally grasp the picture’s meaning directly at the overall picture level or at some intermediate level (e.g., figures, details etc.). But we can grasp such a meaning just because the picture is composed of pixels.

The underlying level of pixels, which makes the meaning of a picture noticeable, normally is not noticeable itself. But it can be noticed when the viewer wants to extract as much information as possible from the picture, and it can also be noticed when the maker wants to control the picture at the most detailed level, as often happens in computer graphics practices. Although the pixel level is not noticed by usual viewers and usual makers of pictures, it makes the depicted things noticeable, and it is the ultimate level at which depiction can be exploited both by the picture’s maker and by the picture’s viewer.

A similar issue is discussed by Goodman in his account of the visual field as composed of visual complexes constituted by spatial, temporal and chromatic qualia. On the one hand, Goodman ([1967]: 261) admits that qualia are not normally noticeable by the subject of the experience: «I am not suggesting that in actual experience we first take inventory of the specific qualia of an individual and then determine its size and shape by counting these qualia and studying out their arrangement».

On the other hand, Goodman ([1967]: 263) suggests that the qualia ground the possibility of every experience:

Whatever may be the original givens of experience, qualia may still be the elements into which we ordinarily tend to dissect the content of experience in order to comprehend it according to a structural scheme that will be applicable to further experience. This would make it easy to explain, for instance, the ready apprehension of shapes; for while the combination of qualia in a certain presentation might be novel, the qualia themselves and their relations within their several fixed arrays would be familiar. If new content is analyzed as a new combination of familiar and already ordered qualia, its whole structure becomes immediately comprehensible; and this is quite consistent with our earlier observation that the pattern of qualia in a presentation is often noticed before the several qualia themselves.

Pixels are constitutive elements not only of digital pictures (in which we can actually distinguish discrete constitutive elements), but also of analog photographic pictures, since a traditional photo «is comprised of sometimes billions of individual grains […] In this respect there is also an array of picture elements in the traditional photograph, albeit one with vastly more elements than is usual in digital photographs, and which are not arrayed in a grid. Keep on enlarging such a photograph, and in the end one will see individual grains, from which the object is not recognizable, even though the grains denote parts of the object» (Gaut [2010]: 59). In this sense even a painting can be considered as composed of pixels to the extent that there is a microscopic level at which we no longer have painted areas but rather individual grains of paint.

In his paper Digital Pictures, Sampling, and Vagueness: The Ontology of Digital Pictures, John Zeimbekis ([2012]: 51) shows how an appropriate technology could allow us to produce different instances of a given picture that are all «phenomenally identical in respect of color, shape, and size», namely, that instantiate the same type. Technology already allows us to do that for digital pictures, and nothing prevents us from doing the same in the future with the other kinds of pictures: «what allows digital pictures to be types is not so much their dependence on binary-code representations as it is the technology that manipulates subphenomenal quantities. This, jointly with the fact that autography is not necessary for pictures, suggests that by using the same principles [...] it is possible to make type-identical paintings and analog photographs» (Zeimbekis [2012]: 51).

4. Standard of Correctness

Why are some visual facts pictures while others are not? What makes a visual fact a picture? Tractatus ontology and semantics – interpreted according to Frascolla’s hypothesis – do not allow us to wholly answer these questions.

In principle, every fact, that is, every phenomenal complex in the visual field, could be interpreted as a picture representing a given state of affairs. For example, the white wall in front of me could be interpreted not only as a mere fact constituted by the combination of temporal and spatial qualia and white qualia, but also as a pictorial fact presenting a given state of affairs (the combination of other temporal and spatial qualia with the same white qualia, composing for instance a blanket of snow). That is to say that the mere phenomenology of an experience does not allow us to distinguish in principle between mere visual facts and pictures. We need to refer to intentions, concepts, histories of production, standards of correctness, norms, practices, agreements. In other words, we need to temporarily leave the Tractatus and to address the Philosophical Investigations.

Yet leaving temporarily does not mean giving up definitively. In the Wittgensteinian account of depiction I am proposing, the Philosophical Investigations does not work as a refutation of the Tractatus but rather as its completion. We can apply to pictures-as-such Kenny’s ([1973]: 179) point about propositions-as-pictures:

One of the rare remarks in the Philosophical Investigations explicitly about the proposition as a picture takes up this point. ‘Thinking of a proposition as a word-picture of the facts has something misleading about it: one tends to think only of such pictures as hang on our walls: which seem simply to portray how a thing looks, what it is like. These pictures are as it were idle’ (PI, i, 291; Z 244). All these passages seem to suggest that the picture theory needs supplementing, rather than that it is false; that the theory of meaning as use is a complement rather than a rival to the picture theory. They stress the point, so often made since the 1930s, that the signs by themselves are dead and need the use to give them life.

That being the case, what is the use that can make a «dead picture» alive? We can try to answer this question by combining the picture theory of the Tractatus with the meaning-as-use theory of the Philosophical Investigations. In order to make a «dead picture» alive, we need a practice – we could call it «the depiction game» – allowing practitioners to (explicitly or at least implicitly) make the following moves.

I) Signaling that a given visual fact is not a simple fact but a special pictorial fact that has been intentionally realized in order to instantiate a design presenting a visual state of affairs. In our culture the picture frame is an ordinary pragmatic device allowing us to distinguish pictorial facts from mere visual facts, but often it is simply the coupling between the picture's content and the context in which it is displayed that allows the viewer to recognize the picture as a picture. Yet the content on its own – without the coupling with the context – is not sufficient in principle to distinguish the picture from a mere visual fact, as suggested for example by the case of trompe-l'oeil, by Magritte's two paintings called The Human Condition, and especially by Arthur Danto's ([1981]: 1) thought experiment about the various indiscernible red canvases, among which there is also the depictive painting Red Table Cloth, «a still-life executed by an embittered disciple of Matisse».

II) Indicating, with a certain approximation, which spatial location S’ and which time T’ constitute the visual state of affairs presented by the picture (this latter being conceived of – according to the Tractatus – as a visual fact situated in spatial location S and time T but capable of presenting a different state of affairs situated in S' and T'). At least, the maker has to indicate whether the presented state of affairs is claimed to subsist in the actual world or in a certain fictional world. In principle, the picture's content does not allow the viewer to establish whether the former or the latter is the case. Yet this is the basic requirement in order to make a move in the depiction game. Without this indication, there are only two options, both unsatisfactory. On the one hand, we could presuppose that a state of affairs can only exist in the actual world, so that a picture just says: «in the actual world, this visual state of affairs exists»; but this presupposition has the unsound consequence that fictional pictures are just a pile of lies without any sense. On the other hand, we could assume that a picture just says: «in some possible world, this visual state of affairs subsists»; but for every visual state of affairs we can conceive of a possible world in which it subsists; so, depictively speaking, such a picture says nothing; it cannot count as a move in the depiction game; it is, according to the Tractatus, a mere tautology.

III) Sharing a conceptual framework allowing the viewer to recognize the things intended by the maker in the visual state of affairs presented by the picture. The picture's title is the typical pragmatic device used in order to perform this move, though often the mere sharing of the same socio-historical context allows the viewer to rightly recognize the depictive intentions of the maker by simply looking at the picture. Yet there are cases, for example the use of a picture to carry out singular reference, in which the maker has to provide the viewer with further information: otherwise, «since picture perception in itself gives no information about the location of the depictum in objective space, but only appearance-based, qualitative information, there should be no epistemic resources left with which to exclude multiple reference» (Zeimbekis [2010]: 15).

These three kinds of move give us a standard of correctness providing a minimal amount of normativity that according to the later Wittgenstein is the basic requirement for something to have meaning. Scholars like Wollheim (1987), Hopkins (1998), Lopes (1996), Newall (2011) – in the framework of their mainly perceptual accounts of pictures – have acknowledged that a standard of correctness is a necessary condition of depiction, and have tried to make it explicit either in terms of the maker's intentions (Wollheim, Hopkins) or in terms of causal processes (Lopes), or as a combination of both (Newall). Yet, in the wake of Brandom's ([1994]: 13-18) pragmatic interpretation of the later Wittgenstein, the standard of correctness for pictures could be better specified in terms of a socio-historical practice instituting a depiction game. Such a game is not to be confused with Goodman's (1968) account of pictures as belonging to a special symbol system, nor with Walton's (1990) games of make-believe. The former indeed relies on conventions, the latter on imagination, whereas the depiction game essentially relies on perception.

That is because making a picture counts as a move in the depiction game that commits the maker to the claim that in a certain world there is a certain visual state of affairs in which we can recognize certain things. On the other hand, the viewer can assess this claim by looking at the picture. But how can she carry out this assessment? Once again, the complementarity of the Tractatus and the Philosophical Investigations – in this specific case, the complementarity of the picture theory and the noticing-aspects theory – gives us a way to address the question.

5. Noticing Aspects

Wittgenstein’s considerations about noticing aspects and seeing-as (PI, part II, section XI) outline an account of the pictorial experience that can be developed in the light of the Tractarian distinction between facts (the picture’s surface as a mere visual complex experienced in the visual field) and depicting facts (the picture’s design as the presentation of a different visual state of affairs). Wittgenstein ([1953]: 193) introduces the notion of «noticing aspects» in general: «I contemplate a face, and then suddenly notice its likeness to another. I see that it has not changed; and yet I see it differently. I call this experience ‘noticing an aspect’». Then Wittgenstein ([1953]: 196) applies this notion to pictures: «I suddenly see the solution of a puzzle-picture. Before, there were branches there; now there is a human shape. My visual impression has changed and now I recognize that it has not only shape and colour but also a quite particular ‘organization’».

Finding the solution of the picture-puzzle amounts to determining which is the particular organization, but just looking for a solution already entails that the picture’s surface is seen as a special visual fact presenting another visual state of affairs in which we have to recognize «a quite particular organization». There is a fundamental difference between ordinary cases of noticing aspects and the pictorial ones: in the former we notice aspects in the visual facts we see, in the latter we notice aspects in the visual states of affairs presented by the visual facts we see.

In this sense the Tractarian distinction between mere facts and depicting facts grounds the theory of depiction outlined in the Philosophical Investigations. In order to see the picture-puzzle as a human face, I have to consider the picture-surface’s shapes and colours as a visual design presenting something else. I have to treat the combinations of spatial, temporal and chromatic qualia constituting the surface in my visual field as pixels presenting the combinations of other spatial and temporal qualia with the corresponding chromatic qualia. The conjunction of these pixels individuates the picture as a design. Solving the picture-puzzle consists in experiencing this design as presenting a combination of phenomenal qualia, and in interpreting this combination as a given thing recognized in virtue of the application of a concept (at least, an elementary concept like «the thing that normally causes such a combination of phenomenal qualia»).

The notion of «noticing aspects» is not sufficient to characterize depiction. Wittgenstein is very explicit about this point. We can «notice aspects» also in ordinary visual experience: «I meet someone whom I have not seen for years; I see him clearly, but fail to know him. Suddenly I know him, I see the old face in the altered one» (Wittgenstein [1953]: 197). Yet there is a crucial difference between «ordinary noticing aspects» and «pictorial noticing aspects». The former is an interpretation of (the application of a concept to) a fact F, that is, a visual complex F directly experienced in the visual field: I meet someone and I match his face with the visual concept of an old friend’s face. The latter is an interpretation of (the application of a concept to) a state of affairs F’ presented by a fact F: I apply a concept to a visual state of affairs F’ presented by a visual complex F directly experienced in my visual field.

In order to distinguish between the ordinary noticing aspects and the pictorial one, we need the Tractarian distinction between mere facts and states of affairs presented by pictorial facts. The act of noticing aspects is the same in both cases, but it applies to different ontological domains: mere visual facts in the ordinary noticing aspects, visual states of affairs presented by pictorial facts in the pictorial noticing aspects.

I look at a picture. If I see it as a mere surface, then its points of color belong to a space that is my space and to a time that is my time. But if I see it as a depiction, then its points of color belong to a space that is not my space, and to a time that is not my time. Indeed, they belong to the space and time of the depicted scene. Nevertheless, I visually experience that space and that time, and I try to apply my visual concepts to the color distribution experienced in such a space-time different from my actual one.

In this sense, every picture works as a puzzle-picture. Normally we are not aware of this since our minds are so fast in applying concepts that we do not realize that we are applying these concepts to a visual state of affairs presented by the picture’s design rather than directly to the picture’s surface as a fact in our visual field. Nevertheless, when we recognize a thing in a picture, we do not place this thing in our environment, but in a peculiar pictorial space, and making this move requires that we implicitly conceive of the picture not simply as a fact in our visual field but rather as a different visual state of affairs presented by means of this visual fact.

6. Seeing-in

In developing Philosophical Investigations’ insights about pictures, Wollheim proposes to explain depiction in terms of a distinctive experience that he calls «seeing-in»:

Seeing-in is a natural capacity we have – it precedes pictures, though pictures foster it – which allows us, when confronted by certain differentiated surfaces, to have experiences that possess a dual aspect, or “twofoldness,” so that, on the one hand, we are aware of the differentiation of the surface, and, on the other hand, we observe something in front of, or behind, something else (Wollheim [1993]: 188).

Hence, in the seeing-in experience, the viewer relates to a picture along two dimensions: a configurational fold representing the picture’s surface as such, and a recognitional fold representing the depicted scene. These two folds constitute the peculiar twofoldness of the seeing-in experience. Unlike Gombrich's ([1960]) account of seeing-as, in which the experience of the surface and that of the depicted subject can only alternate (like the experience of the duck and that of the rabbit in Jastrow's picture, cf. Wittgenstein [1953]: 194), in Wollheim's account of seeing-in the two experiential folds are concurrent.

Wollheim’s theory has great explanatory power and has strongly influenced the contemporary philosophical debate about depiction (cf. Lopes [1996], Hopkins [1998], Abell and Bantinaki [2010]). Yet Wollheim’s theory also raises an important problem that its followers find hard to solve (cf. Budd [1992]): how can we satisfactorily characterize the two folds of the seeing-in experience? Assuming that we concurrently represent both the depicting surface and the depicted scene, how do we represent the surface? How do we represent the scene?

A joint reading of the considerations about pictures in the Tractatus and the Philosophical Investigations gives us useful insights for addressing these questions.

Both the configurational fold and the recognitional one have to deal with the noticing-aspects tasks described in the Philosophical Investigations. In the configurational fold the viewer notices the picture’s surface, its differentiation, the marks placed on it. In the recognitional fold, the viewer notices the things represented in the picture by applying the appropriate concepts.

Yet in order to individuate the crucial difference between the configurational and the recognitional folds, we need the Tractatus picture theory. The two folds of the pictorial experience indeed apply their noticing-aspects tasks to different visual structures. On the one hand, the configurational fold applies to the picture as a fact in the visual field: a visual complex constituted by chromatic qualia, spatial qualia and temporal qualia. On the other hand, the recognitional fold applies to the visual state of affairs presented by this fact: a different visual complex constituted by the same chromatic qualia but other spatial qualia and temporal qualia. Here is the main difference between the two folds: the different ontological substrata of their phenomenology. Wittgenstein’s picture theory of pictures traces the epistemological question (how do we understand pictures?) back to the ontological question (what are pictures?). It allows us to better address the former by answering the latter.

7. Seeing-as

In order to account for the picture as a visual proposition, we have observed that the relation between the picture's surface and the depicted scene is mediated by an abstract type, a visual array, a structure of points of color that we have called the picture's design. Although the picture's design is an abstract type, it can be perceived by attending to the picture's surface, just as a musical work conceived of as an abstract type can be perceived by attending to its performances (cf. Dodd [2007]: 11-16): «in listening to a symphony one hears two things at once, the symphony and a performance thereof» (Wolterstorff [1980]: 41). Since a picture, unlike a symphony, normally also has a representational content, in looking at a picture one can see three things at once: its surface, its design and its depicted scene.

Seeing-in probably provides the best explanation of the relation between the experience of the picture's surface (the visual fact F directly experienced in our visual field) and that of the depicted scene (the scene recognized in the visual state of affairs F' presented by the picture’s surface). Yet with respect to the relation between the experience of the picture's design and that of the depicted scene, the best explanation should be seeing-as. That is because the design as a visual array has its colored points all on the same plane, whereas in visually recognizing the depicted scene we have to perceive these very points as three-dimensionally organized. In order to see all the colored points of a picture on the same plane, as if the picture were a colored map, you need the same kind of perceptual switch needed in order to perceive the duck rather than the rabbit in Jastrow's picture. You cannot concurrently perceive the visual structure (the «colored map») and the depicted scene: you can only alternately perceive them.

Such a perception of the picture's visual structure as a colored map is relevant to the picture's aesthetic appreciation, at least in the case of paintings, since it corresponds to the way in which the painter viewed the picture while making it. How can we take it into account? Once again, the Tractatus ontology allows us to address a perceptual issue, by showing that a picture is not simply a surface representing a scene, but rather a visual fact (the surface) instantiating an abstract type (the design) that presents a visual state of affairs (in which we can recognize the depicted scene). Wollheim's seeing-in can take into account the concurrent experiences of the scene and of the surface, but in order to take into account the alternation between the experience of the scene and that of the design, we need Gombrich's seeing-as. We need two distinct ways of perceiving since depiction involves two distinct relations to be experienced: that between the surface and the scene, and that between the design and the scene. That is why both seeing-in and seeing-as contribute to explaining the experience of a given picture.


Abell, C., Bantinaki, K. (eds.), 2010: Philosophical Perspectives on Depiction, Oxford University Press, Oxford.

Budd, M., 1992: On Looking at a Picture, in Hopkins, J., Savile, A. (eds.), Psychoanalysis, Mind, and Art: Perspectives on Richard Wollheim, Blackwell, London.

Chalmers, D., 2006: Perception and the Fall from Eden, in Gendler, T., Hawthorne, J. (eds.), Perceptual Experience, pp. 49-125, Oxford University Press, Oxford.

Danto, A.C., 1981: The Transfiguration of the Commonplace: a Philosophy of Art, Harvard University Press, Cambridge.

Dodd, J., 2007: Works of Music: An Essay in Ontology, Clarendon, Oxford.

Frascolla, P., 2004: On the Nature of Tractatus Objects, “Dialectica”, 58 (3), pp. 369-382.

Frascolla, P., 2010: An Adequacy Condition for the Interpretation of the Tractatus Ontology, in Frascolla, P., Marconi, D., Voltolini, A. (eds.), Wittgenstein: Mind, Meaning and Metaphilosophy, Palgrave Macmillan, Basingstoke.

Gaut, B., 2010: A Philosophy of Cinematic Art, Cambridge University Press, Cambridge.

Gombrich, E., 1960: Art and Illusion, Phaidon Press, London.

Goodman, N., 1967: The Structure of Appearance (second edition), Harvard University Press, Cambridge.

Goodman, N., 1968: Languages of Art, Bobbs-Merrill, Indianapolis.

Haugeland, J., 1991: Representational Genera, in Ramsey, W., Stich, S.P., Rumelhart, D.E. (eds.), Philosophy and Connectionist Theory, Lawrence Erlbaum, Hillsdale.

Helmholtz, H., 1878: The Origin and Meaning of Geometrical Axioms, “Mind”, 3 (10), pp. 212-225.

Hopkins, R., 1998: Picture, Image and Experience. A Philosophical Inquiry, Cambridge University Press, Cambridge.

Kenny, A.J.P., 1973: Wittgenstein, Penguin, Harmondsworth.

Kulvicki, J., 2006: On Images: Their Structure and Content, Clarendon, Oxford.

Lopes, D., 1996: Understanding Pictures, Clarendon, Oxford.

Newall, M., 2011: Pictures and the Standard of Correctness, in “Esthetica


Reye, T., 1898: Lectures on the Geometry of Position, MacMillan, New York.

Walton, K., 1990: Mimesis as Make-Believe, Harvard University Press, Cambridge.

Wittgenstein, L., 1922: Tractatus logico-philosophicus, Kegan Paul, Trench, Trubner & Co, London.

Wittgenstein, L., 1953: Philosophical Investigations, Blackwell, Oxford.

Wollheim, R., 1980: Seeing-as, Seeing-in, and Pictorial Representation, in Art and its Object (second edition), Cambridge University Press, Cambridge.

Wollheim, R., 1987: Painting as an Art, Thames and Hudson, London.

Wollheim, R., 1993: Pictures and Language, in The Mind and Its Depths, Harvard University Press, Cambridge.

Wolterstorff, N., 1980: Works and Worlds of Art, Clarendon Press, Oxford.

Zeimbekis, J., 2010: Pictures and Singular Thought, “The Journal of Aesthetics and Art Criticism”, 68 (1), pp. 11-21.

Zeimbekis, J., 2012: Digital Pictures, Sampling, and Vagueness: The Ontology of Digital Pictures, “The Journal of Aesthetics and Art Criticism”, 70 (1), pp. 43-53.


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY-4.0)

Firenze University Press