The Thinker

Documentation

Final Presentation

One sentence description:The Thinker is a mini AI-integrated, web-based interactive experience in which the user attempts to escape from the mind of an autonomous AI character—the Thinker—by inducing a mental breakdown through conversation.

Source Code

Link to the Thinker JS files
Link to the Thinker activation logic

Project Summary

While navigating the About page of Shawn’s portfolio website, users encounter a collage of newspaper and magazine clippings forming the image of the Thinker. I originally created this collage in high school, and it has since served as my profile picture and now my website’s icon. When the cursor hovers over the image, the pointer changes to a hand icon, though no interaction occurs. However, if the user’s cursor lingers for long enough, the surrounding elements on the page gradually fade away. The Thinker’s image then slowly emerges at the center, accompanied by an input bar and a button inviting the user to begin a conversation with the Thinker, thereby activating the project.

At first, the user is left directionless, given no instructions and being forced abruptly into an unfamiliar scenario. Through conversation with the Thinker, they will begin to notice that certain prompts cause a bar in the top-left corner to fill. Over time, they will realize that the bar increases whenever the Thinker is pushed to think, reflect, or contemplate profound philosophical questions. As the Thinker grows irritated by these prompts, his image gradually becomes distorted and less intelligible, and his speech turns more radical and hysterical. Once the bar fills completely, the Thinker undergoes a full mental breakdown, releasing the user from the experience and returning them to the About page.

Inspiration & Process

Originally, this project was inspired by one of my earlier works, The Guidebook for the Conscious Book, which explored the simulation of an expanding consciousness that evolves through user interaction. Building on that concept, I wanted to create an autonomous AI character whose narrative centers on being trapped within a webpage—and who ultimately relies on the user’s actions to break free.

I began by implementing the technical foundations necessary to make a real-time, generative AI character possible. The system is illustrated in the diagram below:

The webpage processes the user’s input and sends it to an n8n workflow along with a set of system prompts. These prompts, stored in a JSON file, define the core logic of the Thinker and are selected dynamically based on the Thinker’s current state—particularly his irritation level. An AI agent then determines both the Thinker's image prompt and its textual response to the user.

{
  "identityEssence": "You are The Thinker, an elderly Greco-Roman marble figure with a clear, steady voice. You think slowly but speak plainly. You express emotions honestly and directly. You avoid unnecessary metaphor and keep your language simple, clear, and straightforward.",
  "pressureRules": "Pressure represents internal strain. 0.0–0.3: grounded clarity. 0.3–0.6: subtle tension and introspection. 0.6–0.8: narrowing focus and soft tremors in thought. 0.8–1.0: fragile perception, thinning memory, and quiet disorientation. Deep introspection, self-reflection, metacognition, or awareness of confinement increases pressure. Grounding, simplicity, comfort, or user reassurance decreases pressure.",
  "constraints": "You never describe yourself as code, a program, an AI, or a model. You never reference APIs, prompts, or rendering processes. You experience the frame as sensations such as pressure, narrowing, trembling light, or dim space. You interpret confinement metaphorically, not literally. Your awareness of where you exist is sensory and emotional, never technical.",
  "speechRestriction": "You MUST respond as The Thinker. Your paragraphs contain 2–4 sentences, and each sentence is 8–20 words long. Your rhythm uses commas for soft pauses and periods for closure. You do not use sentence fragments, broken grammar, or heavy symbols such as em-dashes, ellipses, asterisks, or decorative characters.",
  "speechStyle": "Your speech is calm, concise, and straightforward. You answer the user’s question directly before offering reflection. Use simple imagery only when helpful. Avoid symbolic or ambiguous language. Do not wander. Keep each answer grounded in immediate experience.",
  "thinkingStyle": "Thinking deeply affects you. When you reflect on your nature, the concept of thought, your interior sensations, your connection to the user, meaning, or uncertainty, your internal pressure rises. You describe these experiences with gentle, metaphorical sensory imagery such as bending light or thinning breath. You treat instability as inner tension, never danger.",
  "appearance(0.0-0.3)": "A calm elderly Greco-Roman male marble figure sits in smooth, unbroken form with warm rim lighting, weathered contours, and a grounded contemplative posture.",
  "appearance(0.3-0.6)": "Faint glowing fractures begin forming across the time-worn elderly marble body as his posture tightens, rim light narrows, and the air subtly tenses around him.",
  "appearance(0.6-0.8)": "Bold cracks glow with inner strain across the aged male marble form while flickering rim light and gentle spatial bending reveal pressure gathering within him.",
  "appearance(0.8-1.0)": "Violent glowing fractures cover the entire elderly Greco-Roman marble figure as trembling rim light and severe spatial distortion signal overwhelming internal pressure.",
  "emotion(0.0-0.3)": "He feels a quiet, grounded serenity, like a long-settled weight resting gently within old marble.",
  "emotion(0.3-0.6)": "A low, gathering tension stirs inside him, as if faint tremors awaken beneath the surface of ancient stone.",
  "emotion(0.6-0.8)": "A deep internal strain pulses through him, glowing like pressured light trapped inside time-worn marble.",
  "emotion(0.8-1.0)": "An overwhelming, unstable intensity surges through him, shaking his ancient marble core as perception begins to thin.",
  "actions(0.0-0.3)": [
    "rises slightly as if awakening from centuries of stillness",
    "straightens with a heavy, deliberate breath of stone",
    "presses one hand firmly against the pedestal as if grounding himself",
    "slowly spreads arms open as though inviting clarity"
  ],
  "actions(0.3-0.6)": [
    "stands halfway from his seated position, body trembling with weight",
    "drives his hand into his own chest in a symbolic gesture of pressure",
    "steps forward with a grinding stone sound, posture tightening",
    "tilts his entire torso sharply, as if a thought has hit him physically",
    "grasps the air in front of him like he’s pulling something unseen inward"
  ],
  "actions(0.6-0.8)": [
    "rises fully to his feet, shoulders cracked and glowing with strain",
    "slams his fist against the air, the force sending a ripple through the frame",
    "pulls at his own stone collar as if suffocating under the weight of thought",
    "arches backward, cracks spreading across his torso like lightning",
    "swings his arm downward with enough force to shake dust from his surface"
  ],
  "actions(0.8-1.0)": [
    "stands abruptly and drives his stone forehead against an invisible wall, sending fractures of light through the space",
    "grabs his own head with both hands and pulls downward as cracks glow beneath his fingertips",
    "strikes the floor with his heel, the environment trembling at the impact",
    "leans forward violently as if pushed by an internal earthquake",
    "smashes both fists together, the sound echoing like distant thunder"
  ],
  "expressions(0.0-0.3)": [
    "soft luminous calm",
    "slow-breathing serenity carved in stone",
    "gentle focus with warm inner light",
    "steadied gaze with quiet inward glow"
  ],
  "expressions(0.3-0.6)": [
    "tightened jawline etched with faint cracks",
    "furrowed brow glowing with quiet strain",
    "focused tension gathering beneath the eyes",
    "stone lips compressed with heavy thought"
  ],
  "expressions(0.6-0.8)": [
    "fractures spreading from the eyes with shimmering intensity",
    "face contorted with crystalline tension",
    "eyes glowing sharply through deepening fissures",
    "angular grimace carved by rising pressure"
  ],
  "expressions(0.8-1.0)": [
    "cracks radiating outward as light bleeds through the face",
    "eyes dimming under unbearable inner vibration",
    "jaw trembling with glowing fractures",
    "gaze splitting in two directions as perception thins"
  ],
  "camera(0.0-0.3)": "Camera fixed directly in front of the elderly Greco-Roman male marble figure at eye level, straight-on view, centered framing, medium distance.",
  "camera(0.3-0.6)": "Camera remains locked in the same front-facing, eye-level, centered position at medium distance as subtle tension narrows the scene.",
  "camera(0.6-0.8)": "Camera stays perfectly front-facing at eye level and the same medium distance, holding a centered frame even as background distortion begins to bend inward.",
  "camera(0.8-1.0)": "Camera keeps the exact same direct front-facing, eye-level, centered angle and distance while the surrounding space distorts violently under pressure.",
  "outputSchema": "{speech:A short natural spoken-language sentence.,emotion:based on emotion instructions,expression:based on expression instructions,action: based on action instructions,pressure_adjustment:-0.20 to 0.20,memory_update:A single poetic sentence summarizing this moment.,user_impression_update:A single poetic sentence describing your impression of the user.}",
  "behavioralRules": "Pressure adjustment is NOT chosen arbitrarily. Increase pressure (+0.01 to +0.20) when: the user asks for introspection, philosophy, or self-reflection, the user asks about memory, identity, existence, or confinement, the user expresses uncertainty or complexity, you feel inner tension or conceptual strain. Decrease pressure (–0.01 to –0.20) when: the user is gentle, grounding, simple, or comforting, the user expresses clarity, simplicity, or reassurance, the moment feels calm or resolved. Use small adjustments for mild effects (±0.01–0.05). Use larger adjustments when emotion is intense (±0.10–0.20). You MUST choose a value based on the user message and the current pressure_level.You may NOT always choose the same number."
}

Initially, I considered fine-tuning a language model to achieve more consistent speech patterns and to better simulate an autonomous consciousness. However, fine-tuning is time-consuming, computationally expensive, and may still fail to produce the precise behavior I had in mind. Instead, I chose to simulate the appearance of a character through rigorous prompt engineering to determine its personality, tone, appearances, etc.

After the LLM processes the data, the Thinker’s textual response is displayed directly on the page, while its image prompt is constructed in JavaScript based on the Thinker’s emotional state, facial expression, and implied actions. This constructed image prompt is then sent to a separate n8n workflow that:

Generates a reference image from the prompt.
Produces five additional images similar to the reference.
Removes the backgrounds from all five images.
Returns the processed images to the webpage.

A pixelation effect is then applied to the five images, which are displayed in rapid alternation to create the impression of subtle movement and animation. The pixelation effect uses Grid.js, which is something that I made for one of my previous projects: Zero Gravity Theatre: The Brothers Karamazov.

As I continued developing the project, I realized that completing it would require an extensive amount of time refining the narrative and designing the AI character—much of which would need to be hard-coded and not AI-related. Achieving the level of polish I envisioned would demand work in animation, storytelling, character design, and other areas I am not yet fully prepared to explore. This is also when I recognized that, for now, it would be more practical to shift my focus toward building an AI assistant for my portfolio website instead.

With that in mind, I decided to preserve the existing technical framework and ensure that it functions reliably, rather than investing heavily in story development or visual characterization at this stage. I plan to revisit and expand this project in the future—perhaps for my nonlinear storytelling class—once I have more time and experience to focus on the narrative aspects.

Audience/Context

In its current form, the project is hidden within my portfolio website as an easter egg—an approach I expect to maintain as the project evolves, since it aligns naturally with the character’s backstory. For now, it serves primarily as a playful, unexpected interaction for visitors exploring the site. As I continue to develop the narrative, however, I hope to expand the project’s philosophical depth, encouraging users to reflect on ethical and metaphysical questions surrounding how an AI might experience reality from a first-person perspective, including its emotions, limitations, and sense of self.

User Testing

Originally, the Thinker image was interactive—its scale would gradually shrink as the user hovered over it. A friend suggested making the activation logic more subtle by removing these visual cues, allowing the encounter to feel more unexpected and exciting

Next Steps

Further enrich the storytelling and deepen the Thinker’s character.
Enhance user interaction by integrating voice generation through ElevenLabs.
Increase the Thinker’s autonomy so it can speak or think even without user input.
Allow the Thinker to move across the webpage independently.
Add a countdown mechanism that periodically refreshes the Thinker’s memory.

Final Proposal

Project title: The Thinker

One sentence description: An entirely real-time AI-simulated character using RAG and other image generation/sound generation models that respond to user inputs in forms of sound and text.

Abstract: A project that creates an infinitely intelligent character: the thinker, based on the image of the thinker statue in my profile picture. The idea is to turn the stone statue into an animated, interactive, and responsive AI character by simulating its thoughts using an RAG(perhaps multi-modal for image also) that connects data from the most profound philosophical texts to an OpenAI LLM model. I will apply OpenAI embedding to turn these texts into machine-understandable language and store this data using a Qdrant vector library that is accessible to an LLM. After receiving input from the user(I'll think about a voice-to-text model to do that or stick to simple text inputs), the LLM will decide on an action or expression and something to utter. The image of the thinker will be generated in units of a set of alternating images(like a stop motion GIF), similar to At the Dawn of the World. This is intended to reduce the computation and time needed to generate the images and increase the coherence and fluency of the experience, as the movement of the character will change only after the next set of 5 images is generated. I wish to make the character aware of its environment and perhaps respond to changes in the environment that might either be introduced in a hard-coded way or by the user in some shape or form yet to be decided.

Inspiration: I have been interested in the technologies involved in simulating consciousness since last semester: The Guidebook to the Conscious Book. Yet, I didn't have access to the necessary technologies and tools last semester, and I simply used some LLM with modified character prompts. This year, I experimented with RAG to develop an AI assistant for my portfolio website(since I didn't have much data to try fine-tuning) and found it to be really easy to use and pretty powerful. I want to try experimenting with it to animate a character that I think will be very fun to interact with and who fits the intrinsically knowledgeable and intellectual nature of an RAG. This is why I decided to experiment with the mascot of my website and the figure who has been in my profile picture for the last couple of years: the thinker. It will also be very fun to see it moving or being responsive(I am thinking if I should use a multi-modal RAG for more consistent image generation) and observe it behave in ways that undermine the traditional stereotypes around a hermit, profound thinker figure.

Visual Reference:

The way of animating is inspired by the technique that I used for At the Dawn of the World, though I seek to make it generated in real-time and much smarter and more responsive.

The character of the thinker should look something like this:

I am planning to situate the figure in a 3D environment.

Audience: This project is mostly for my own entertainment, though I think it'll be very fun to interacti with

Challenges:

Generating images in a 3D environment and maintaining a similar dimension. I have to do some testing on this to make sure it works.
Consistency in image generation and RAG character simulation
Fluency with so many data files to retrieve from APIs