We present a unified computational theory of an agent's perception and memory. In our model, perception and memory are realized by different operational modes of the oscillating interactions between a symbolic index layer and a subsymbolic representation layer. The two layers form a bilayer tensor network (BTN). The index layer encodes indices for concepts, predicates, and episodic instances. The representation layer broadcasts information and reflects the cognitive brain state; it is our model of what other authors have called the “mental canvas” or the “global workspace.” As a bridge between perceptual input and the index layer, the representation layer enables the grounding of indices by their subsymbolic embeddings, which are implemented as connection weights linking the two layers. The propagation of activation to earlier perceptual processing layers in the brain can lead to embodiments of indices. Perception and memory first create subsymbolic representations, which are subsequently decoded semantically to produce sequences of activated indices that form symbolic triple statements. The brain is a sampling engine: only activated indices are communicated to the remaining parts of the brain. Triple statements are dynamically embedded in the representation layer and embodied in earlier processing layers: the brain speaks to itself. Although memory appears to be about the past, its main purpose is to support the agent in the present and the future. Recent episodic memory provides the agent with a sense of the here and now. Remote episodic memory retrieves relevant past experiences to provide information about possible future scenarios, which aids the agent in decision making. “Future” episodic memory, based on expected future events, guides planning and action. Semantic memory retrieves specific information that is not delivered by current perception and defines priors for future observations.
We argue that it is important for the agent to encode individual entities, not just classes and attributes. Perception is learning: episodic memories are constantly being formed, and we demonstrate that a form of self-supervised learning can acquire new concepts and refine existing ones. We test our model on a standard benchmark data set, which we expanded to contain richer representations for attributes, classes, and individuals. Our key hypothesis is that obtaining a better understanding of perception and memory is a crucial prerequisite to comprehending human-level intelligence.
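To make the interplay between the two layers concrete, the following is a minimal illustrative sketch, not the model itself. It assumes that each index owns one embedding row in a weight matrix, that bottom-up activation is a softmax over inner products with the representation vector, and that a sampled index feeds its embedding back into the representation layer by simple addition. The class name `BilayerSketch`, the random initialization, and the additive feedback are all hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


class BilayerSketch:
    """Toy index layer / representation layer pair (illustrative assumption)."""

    def __init__(self, labels, dim, seed=0):
        self.labels = list(labels)
        self.rng = np.random.default_rng(seed)
        # One embedding row per index: the connection weights between
        # that index and the representation layer (random here).
        self.A = self.rng.normal(size=(len(self.labels), dim))

    def embed(self, label):
        """Top-down: an activated index writes its embedding to the representation layer."""
        return self.A[self.labels.index(label)]

    def sample_index(self, q, candidates):
        """Bottom-up: sample one index given the representation vector q."""
        rows = [self.labels.index(c) for c in candidates]
        probs = softmax(self.A[rows] @ q)
        choice = int(self.rng.choice(len(candidates), p=probs))
        return candidates[choice]

    def decode_triple(self, q, subjects, predicates, objects):
        """Decode (subject, predicate, object) by alternating the two directions."""
        triple = []
        for candidates in (subjects, predicates, objects):
            label = self.sample_index(q, candidates)
            triple.append(label)
            # Feed the sampled index back so later samples depend on it
            # (additive feedback is an assumption of this sketch).
            q = q + self.embed(label)
        return tuple(triple)


subjects = ["Sparky", "Jack"]
predicates = ["looksAt", "nextTo"]
objects = ["Dog", "Person"]
model = BilayerSketch(subjects + predicates + objects, dim=8)

# Pretend perceptual input that strongly resembles the embedding of "Sparky".
q0 = 5.0 * model.embed("Sparky")
triple = model.decode_triple(q0, subjects, predicates, objects)
print(triple)
```

Only the sampled indices, not the full activation pattern, are passed on at each step, which mirrors the "sampling engine" reading of the text; the entity and predicate names are made-up placeholders.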