Gemini Plays Pokemon: First Impressions of Gemini 3 Pro "Riftrunner"

With the release of Gemini 3 Pro Preview (codenamed "Riftrunner"), I wanted to share some initial observations from my testing over the past few days. I received early access from Google as one of their "Trusted Testers" because of my work on Gemini Plays Pokemon, a long-horizon agentic project on Twitch. In that project, Gemini 2.5 Pro has already beaten Pokemon Blue and Yellow Legacy and is currently progressing through Crystal.

The harness used for these tests is the exact same iteration I first deployed during the Yellow Legacy run, and which 2.5 Pro has been piloting for over 522 hours in Pokemon Crystal. This system allows the model to interact with the game environment through a suite of high-level capabilities. It can maintain persistent state via map markers and a notepad, instantiate custom sub-agents for specialized reasoning tasks, and create custom Python tools to automate input sequences or process game data.

I've been running the new model through a fresh run of Crystal to see how it handles the specific challenges of navigating a game world compared to the 2.5 Pro baseline. While these are just first impressions, the improvement is noticeable.

Spatial Reasoning and Navigation

A significant improvement I've observed is in spatial reasoning. Where Gemini 2.5 Pro often relied on brute-forcing navigation tools against a map it didn't fully understand, Riftrunner demonstrates a much more grounded intuition for the game's geometry.

Here’s an example screenshot the harness provides during normal gameplay. In Gemini CLI I prompted, “give the exact steps needed to reach the right of the NPC at the top of the screen.”

Riftrunner replied:

After reviewing the grid, you cannot move straight up because the building and its door warp block that route, so you must go around on the right side.

Here is the exact move list to reach (25, 7):

Right 4 times (moves you to 28, 11)

Up 4 times (moves you to 28, 7)

Left 3 times (moves you to 25, 7)

This will place you directly to the right of the NPC. Be careful not to step on the door tile at (25, 9), or you will trigger the warp.

In contrast, here is the session transcript of the same prompt with 2.5 Pro:

@ saves/screenshot_enlarged.png give the exact steps needed to reach the right of the NPC at the top of the screen

Based on the image, here are the steps:

Move left 1 time

Move up 4 times

Move right 2 times

> are you sure? what coordinate does the player start from, and what tile does each move arrive at?

You are correct to question that. I made an error.

The player starts at (24, 11).

Here is the correct path:

Move left 2 times, arriving at (22, 11)

Move up 4 times, arriving at (22, 7)

Move right 3 times, arriving at (25, 7)

> you cannot walk through NPCs

You are correct. That was another mistake. Here is a path that is actually valid.

The player starts at (24, 11). Here is a path that works:

Move right 4 times, arriving at (28, 11)

Move up 4 times, arriving at (28, 7)

Move left 3 times, arriving at (25, 7)
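The move lists in both transcripts can be checked mechanically. The following is a minimal sketch, not part of the harness: it replays a move list from the stated start tile (24, 11) and reports the final tile and any collisions. It assumes that y decreases when moving up (as the quoted coordinates imply) and that the NPC stands at (24, 7), which is inferred from the target (25, 7) being "directly to the right of the NPC." The door warp at (25, 9) comes from Riftrunner's reply. Building tiles are not modeled.

```python
# Direction deltas on the grid; "up" lowers y, matching the quoted coordinates.
DELTAS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

START = (24, 11)
TARGET = (25, 7)
# Assumption: NPC at (24, 7), inferred from the target tile being to its right.
# The door warp at (25, 9) is taken from Riftrunner's reply.
BLOCKED = {(24, 7), (25, 9)}


def replay(start, moves):
    """Return every tile stepped on, in order, for a list of (direction, count)."""
    x, y = start
    visited = []
    for direction, count in moves:
        dx, dy = DELTAS[direction]
        for _ in range(count):
            x, y = x + dx, y + dy
            visited.append((x, y))
    return visited


def check(moves):
    """Return (final tile, list of blocked tiles the route steps on)."""
    visited = replay(START, moves)
    collisions = [tile for tile in visited if tile in BLOCKED]
    return visited[-1], collisions


riftrunner = [("right", 4), ("up", 4), ("left", 3)]
pro_second_attempt = [("left", 2), ("up", 4), ("right", 3)]

print(check(riftrunner))          # ends on the target with no collisions
print(check(pro_second_attempt))  # ends on the target but crosses the NPC tile
```

Under these assumptions, 2.5 Pro's second attempt reaches the right coordinate only by walking through the NPC tile, which is exactly the objection raised in the transcript.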

Overall, Riftrunner appears to have much better intuition for map geometry:

Map Segmentation: Gemini 2.5 Pro struggled heavily whenever the map was segmented. When a target was separated from the current location—whether by walls or void—its custom pathfinder tool would return a failure because no valid path existed in the current data. 2.5 Pro would frequently interpret this tool failure as a bug in the code itself. Riftrunner, however, correctly identifies that the map is simply segmented and the tool is working as intended, allowing it to move on to finding a valid route.
Handling Unseen Areas: This applies specifically to "unseen" tiles (fog of war). While 2.5 Pro often treated the inability to pathfind through unrevealed tiles as a hard boundary, Riftrunner seems to understand the distinction between "no path found right now" and "impossible to reach," realizing it simply needs to explore to reveal the connecting geometry.
Gap Detection: It correctly identified a one-tile wide gap between ledges that it could walk through to reach a warp. In previous tests, 2.5 Pro would often flag this as a dead end and attempt to backtrack.
Manual Pathing: Interestingly, Riftrunner often prefers manual step-by-step navigation over using its custom pathfinding tools. Despite this manual approach, it rarely walks into walls—a frequent issue for 2.5 Pro—and successfully negotiates complex obstacles by identifying walkable tiles in its immediate vicinity.
Visual Comprehension: In tests with game screenshots, Riftrunner correctly identified NPC locations and accurately recognized specific environmental details (see above), both massive improvements over 2.5 Pro. This fidelity opens the door to experimenting with a "raw" vision-based harness that relies less on direct RAM extraction, which may make it the most significant improvement of all.
Grid Navigation: When nicknaming a Pokemon, 2.5 Pro would frequently miscalculate the cursor movement required on the keyboard interface, overshooting letters. Riftrunner has consistently planned the correct sequence of button presses to enter the desired text. Later on, it even created a custom tool to automate this process, which is something 2.5 Pro never considered.
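The cursor arithmetic behind the nicknaming task is simple to automate. Here is a rough sketch of what such a tool might look like. It is not the tool Riftrunner wrote: the `LAYOUT` grid is a made-up stand-in for the real naming screen, the button names are hypothetical, and it assumes the cursor starts at the top-left key and does not wrap around edges.

```python
# Hypothetical keyboard layout; the real Crystal naming screen differs.
LAYOUT = [
    "ABCDEFGHI",
    "JKLMNOPQR",
    "STUVWXYZ ",
]


def key_positions(layout):
    """Map each character to its (row, column) on the grid."""
    return {ch: (r, c) for r, row in enumerate(layout) for c, ch in enumerate(row)}


def plan_presses(text, layout=LAYOUT, start=(0, 0)):
    """Return the button presses needed to type `text`, moving without wraparound."""
    positions = key_positions(layout)
    row, col = start
    presses = []
    for ch in text:
        target_row, target_col = positions[ch]  # KeyError if the key is missing
        d_row, d_col = target_row - row, target_col - col
        presses += ["Down" if d_row > 0 else "Up"] * abs(d_row)
        presses += ["Right" if d_col > 0 else "Left"] * abs(d_col)
        presses.append("A")  # confirm the highlighted letter
        row, col = target_row, target_col
    return presses


print(plan_presses("CAT"))
```

Tracking the cursor position after every letter is the step that prevents overshooting: each movement is computed relative to where the previous letter left the cursor, not from the start position.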

Metacognition and Error Recovery

A common failure mode for 2.5 Pro is "blaming the tools." If a custom pathfinding tool returns no result, the older model often assumes the tool itself is broken and enters a loop of attempting to debug the Python code.

Riftrunner demonstrates a higher likelihood of trusting the tool and re-evaluating the environment. A clear example occurred in Sprout Tower, pictured above (link to the full turn screenshot here):

The model needed to reach a ladder on the west side of the first floor.
The direct path was blocked by a central column of walls.
Instead of trying to force a path or rewrite the navigation tool ad nauseam, Riftrunner analyzed the map structure and realized it was effectively "trapped" in the central area.
It formulated a multi-floor plan: ascend to 2F, traverse the second floor to a different ladder, and descend back to 1F in the isolated western corridor.

A similar moment of realization happened in Azalea Town, pictured above. The model initially flagged a Rocket Grunt as blocking the path to town. In previous runs, 2.5 Pro would often accept obstructions like this as an absolute dead end. Riftrunner, however, re-examined the map coordinates and realized the Grunt was only guarding the Slowpoke Well, correctly identifying a walkable path around the NPC to continue into town. See the full turn screenshot here.
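The difference between "the tool is broken," "the map is segmented," and "the route is not revealed yet" can be made explicit in a pathfinder's return value. The sketch below is a hypothetical illustration, not the harness's actual pathfinder: it runs a breadth-first search over a small character grid (an assumed encoding where `.` is walkable, `#` is a wall, and `?` is an unseen tile) and reports which of the three situations applies.

```python
from collections import deque
from enum import Enum


class PathStatus(Enum):
    FOUND = "path found"
    UNEXPLORED = "no path yet; unseen tiles border the reachable area"
    SEGMENTED = "no path; the reachable area is fully explored and enclosed"


def find_path(grid, start, goal):
    """Breadth-first search on a list of strings; coordinates are (x, y)."""
    height, width = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    touches_unseen = False
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            path, node = [], goal
            while node is not None:  # walk the parent links back to the start
                path.append(node)
                node = parents[node]
            return PathStatus.FOUND, path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            tile = grid[ny][nx]
            if tile == "?":
                touches_unseen = True  # exploring here might open a route
            elif tile == "." and (nx, ny) not in parents:
                parents[(nx, ny)] = (x, y)
                queue.append((nx, ny))
    status = PathStatus.UNEXPLORED if touches_unseen else PathStatus.SEGMENTED
    return status, None


walled_off = ["..#..", "..#..", "..#.."]
fogged = ["..?..", "..#..", "..#.."]
print(find_path(walled_off, (0, 0), (4, 0))[0])  # PathStatus.SEGMENTED
print(find_path(fogged, (0, 0), (4, 0))[0])      # PathStatus.UNEXPLORED
```

In both failing cases the search itself worked correctly. A result of `SEGMENTED` suggests looking for another way in, such as the ladder route through 2F in Sprout Tower, while `UNEXPLORED` suggests revealing more of the map first.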

Instruction Following

Adherence to complex system prompts seems better, though still not perfect, with Riftrunner paying significantly closer attention to the myriad instructions in the prompt.

The harness also includes a warning system for "unmarked warps." Because it evaluates each coordinate independently, a warp that spans two adjacent tiles is flagged as two separate entries. 2.5 Pro treated these as distinct entities, leading to redundant checks. Riftrunner correctly deduces they are the same logical warp.
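The deduction Riftrunner makes here amounts to clustering neighboring coordinates. As a minimal sketch, assuming warnings arrive as plain (x, y) tuples (the harness's real warning format is not shown in this post), adjacent tiles can be merged into logical warps like this:

```python
def group_adjacent(tiles):
    """Cluster coordinates that touch horizontally or vertically."""
    remaining = set(tiles)
    groups = []
    while remaining:
        seed = remaining.pop()
        group = {seed}
        stack = [seed]
        while stack:  # flood fill outward from the seed tile
            x, y = stack.pop()
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    group.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical warnings: a two-tile-wide doorway plus one unrelated warp.
print(group_adjacent([(5, 9), (6, 9), (12, 3)]))
# [[(5, 9), (6, 9)], [(12, 3)]]
```

Adjacency alone does not prove that two tiles lead to the same destination, so this is a heuristic for deciding what to check once, not a guarantee.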

Current Limitations

While it's better at reasoning, the model is not without flaws:

Proactive vs. Reactive: Like 2.5 Pro, Riftrunner struggles to proactively manage long-term secondary goals. For example, it identified a goal to train a low-level Onix. In battle, it correctly identified a safe switch opportunity against a Zubat (pictured above). To be fair, that is much improved behavior over 2.5 Pro, which would more often than not ignore its less important goals. However, it failed to take the more obvious proactive step of moving the Onix to the lead party slot before the battle.
Code Editing: When using Gemini CLI to refactor code, it still introduces occasional syntax errors, such as deleting closing brackets. Additionally, in large codebases, it sometimes struggles to identify the correct locations for a refactor, failing to investigate the full context before acting.
Tool Usage: I observed instances where the model called custom automation tools (like button presses) but forgot to set the required parameter to actually execute the action, defaulting to a "dry run" return. However, unlike 2.5 Pro, it typically recognizes this mistake quickly and self-corrects in the subsequent turn.
Vision: Estimating enemy HP percentage from screenshots remains inconsistent, even with grid overlays. This may be due to how images are processed in the model pipeline or API; further testing is needed.
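The dry-run behavior described under Tool Usage follows a common pattern, sketched below. This is an illustration only: the function name `press_buttons`, the `execute` flag, and the `send` callback are all hypothetical and are not taken from the harness.

```python
def press_buttons(buttons, execute=False, send=None):
    """Hypothetical automation tool that only previews inputs unless execute=True."""
    buttons = list(buttons)
    if not execute:
        # Nothing is sent; the return value says so explicitly.
        return {
            "executed": False,
            "buttons": buttons,
            "note": "dry run; pass execute=True to send these inputs",
        }
    for button in buttons:
        if send is not None:
            send(button)  # hand each press to the emulator interface
    return {"executed": True, "buttons": buttons}


sent = []
preview = press_buttons(["A", "Down", "A"], send=sent.append)
print(preview["executed"], sent)  # False []

result = press_buttons(["A", "Down", "A"], execute=True, send=sent.append)
print(result["executed"], sent)   # True ['A', 'Down', 'A']
```

An explicit `executed` field in the result gives the model something concrete to notice on the next turn, which fits the quick self-correction described above.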

Conclusion

While it is too early to render a final verdict after only a few days of testing, "Riftrunner" is shaping up to be a clearly superior model to the 2.5 Pro baseline. It demonstrates improved reasoning, higher capability, and overall greater intelligence in how it approaches complex, open-ended tasks. It feels like a concrete update worthy of the 3.0 moniker, and I look forward to seeing how it handles the rest of the Johto journey.
