AI Agent Infrastructure / Evaluation / Observability

I build the infrastructure that makes AI agents reliable, observable, and enterprise-ready.

Staff Software Engineer with Apple-scale platform experience, founder roots, and a current focus on the reliability stack for agents, humanoids, and space-grade autonomy.

Role

Staff Software Engineer

AI agent infrastructure

Systems

Evaluation + observability

Reliable enterprise agents

Scale

Apple-scale platforms

Media, ads, and commerce web systems

Founder

300+ builds / 100M+ txns

Founder and CTO roots

Current build surface

Agent Reliability Stack

The stack behind my thesis: build agents with modern frameworks, then make them inspectable, governed, and measurable.

Operating loop

Trace every run
Evaluate every outcome
Govern every tool
Escalate to humans
Monitor cost and latency
Learn from failures
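
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production system: `supervise`, `RunTrace`, `ALLOWED_TOOLS`, the shape of `tool_calls`, and the budget thresholds are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass, field

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TOOLS = {"search", "calculator"}
MAX_COST_USD = 0.50
MAX_LATENCY_S = 30.0


@dataclass
class RunTrace:
    """Everything recorded about a single agent run."""
    task: str
    events: list = field(default_factory=list)
    cost_usd: float = 0.0
    latency_s: float = 0.0
    passed_eval: bool = False
    escalated: bool = False


def supervise(task, tool_calls, evaluate):
    """Apply the operating loop to one run.

    tool_calls: list of (tool_name, cost_usd, zero-arg callable) tuples
                (an assumed shape, not a real framework API).
    evaluate:   function mapping the list of tool outputs to True/False.
    """
    trace = RunTrace(task=task)
    start = time.monotonic()
    outputs = []
    for name, cost, call in tool_calls:
        # Govern every tool: block anything outside the allowlist.
        if name not in ALLOWED_TOOLS:
            trace.events.append(("blocked", name))
            trace.escalated = True
            continue
        # Trace every run: record each call and its result.
        result = call()
        trace.events.append(("tool", name, result))
        trace.cost_usd += cost
        outputs.append(result)
    trace.latency_s = time.monotonic() - start
    # Evaluate every outcome.
    trace.passed_eval = evaluate(outputs)
    # Monitor cost and latency; escalate to humans on a breach or failed eval.
    over_budget = trace.cost_usd > MAX_COST_USD or trace.latency_s > MAX_LATENCY_S
    if over_budget or not trace.passed_eval:
        trace.escalated = True
    return trace


def failure_log(traces):
    """Learn from failures: collect escalated runs for later review."""
    return [t for t in traces if t.escalated]
```

In this sketch a blocked tool call is never executed and always marks the run for human review; a real policy would likely distinguish between hard denials and soft warnings.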

Working stack

Runtime

Python, TypeScript, Claude Agents, Codex Agents, Google Agents & ADK

Agent interface

Agent Skills, MCPs, Generative UI

Trust layer

Agent Identity/Security, LangChain

Infra + evals

Anyscale Ray, Weights & Biases, Braintrust, LangSmith

Data plane

ClickHouse, Spark, Airflow, Data Analytics

AI systems thesis

The future belongs to autonomous systems people can inspect, trust, and improve.

My work sits at the intersection of agents, humanoids, and space: three domains where autonomy is only valuable when it is observable, governed, and resilient.

RK-01 / RELIABILITY CORE

Autonomy becomes useful when it can be inspected.

01

Agents

Designing harnesses, memory, context engineering, orchestration, and runtime supervision for long-horizon autonomous work.

02

Humanoids

Thinking about the reliability layer for embodied systems: tool policy, world-state memory, fleet telemetry, and human override.

03

Space

Studying mission-grade autonomy patterns where delayed feedback, resilience, and auditability become existential system requirements.

AI agents must be observable before they can be trusted.
Evaluation is the executive dashboard for autonomous work.
Humanoids and space will need the same reliability stack as enterprise agents.
Young technologists need proof that reinvention is possible.

The arc

Founder roots, Apple-scale platforms, then the post-AI world.

My journey connects founder speed, platform-scale engineering, and the technical pattern that matters now: making autonomous work reliable at scale.

ARC-01 / 2008 - 2012

YEH Technologies, instantPay

Founder roots

Built 300+ websites and enterprise apps, then architected fintech infrastructure processing 100M+ transactions per year.

ARC-02 / 2012 - 2017

Genpact, Flipkart

Platform builder

Moved from mobile and digital transformation into high-scale commerce systems used by thousands of operators and millions of customers.

ARC-03 / 2017 - 2023

Apple Ads, Media Products

Apple scale

Led web systems across Search Ads, News Ads, Apple Music, Apple TV, Podcasts, and Books.

ARC-04 / 2023 - now

AI Agent Infrastructure

Post-AI world

Architecting evaluation, observability, guardrails, memory, and supervision patterns for production AI agents.

Profile highlights

Real proof points behind the AI systems thesis.

The story is not abstract. It comes from founder execution, fintech scale, enterprise systems, Flipkart commerce, Apple platforms, AI infrastructure, and mentorship.

FOUNDER ROOTS / YEH TECHNOLOGIES

Built 300+ websites and enterprise apps for global clients.

entrepreneurship

The early chapter was hands-on founder work: selling, designing, shipping, and learning how real customers judge software.

FINTECH SCALE / INSTANTPAY

Architected infrastructure processing 100M+ annual transactions.

CTO experience

Led the technology foundation for a multi-service fintech platform where reliability, throughput, and operator trust mattered every day.

ENTERPRISE + COMMERCE / GENPACT, FLIPKART

Built mobile, healthcare, and ecommerce systems used by thousands.

platform builder

Moved from enterprise transformation into Flipkart-scale commerce, including operator workflows and customer-facing post-order experiences.

APPLE SCALE / ADS AND MEDIA PRODUCTS

Led web systems across Apple Ads, Music, TV, Podcasts, Books, and editorial launches.

Staff engineer trajectory

This chapter proves product judgment at global scale: high-visibility consumer experiences, business tools, and cross-functional platform delivery.

AI INFRASTRUCTURE / NOW

Focused on agent observability, evaluation, guardrails, and enterprise-ready autonomy.

post-AI world

The current chapter connects everything before it: systems thinking, product taste, scale, and the reliability stack for AI agents.

MENTORSHIP / HUMAN IMPACT

Mentored 40+ engineers and professionals through the AI shift.

impact

The mission is not only technical credibility. It is also helping younger technologists see a bigger path for themselves.

What I build

A selected portfolio of systems thinking.

The work is framed around transferable architecture patterns: traces, evals, supervision, memory, policy, and high-scale product engineering.

SIG-01

Flagship case study

Agent Observability & Evaluation

A practical walkthrough of how to make agents measurable: traces, task success, hallucination checks, tool precision, and human escalation.
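
As a rough sketch of what "measurable" could mean here, the snippet below aggregates per-run records into the metrics named above. The `RunRecord` fields and the metric definitions are assumptions made for illustration (for example, the hallucination rate is taken as the share of runs with at least one unsupported claim); they are not the definitions used in the case study.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    task_succeeded: bool     # did the run meet its acceptance check?
    tool_calls: int          # total tool calls made during the run
    correct_tool_calls: int  # calls judged appropriate by a grader
    unsupported_claims: int  # claims that failed a grounding check
    escalated: bool          # was a human pulled in?


def summarize(records):
    """Aggregate run records into fleet-level reliability metrics."""
    n = len(records)
    if n == 0:
        raise ValueError("no runs to summarize")
    total_calls = sum(r.tool_calls for r in records)
    correct_calls = sum(r.correct_tool_calls for r in records)
    return {
        # Share of runs that met their acceptance check.
        "task_success_rate": sum(r.task_succeeded for r in records) / n,
        # Share of tool calls judged appropriate; None if no tools were used.
        "tool_precision": correct_calls / total_calls if total_calls else None,
        # Share of runs with at least one unsupported claim.
        "hallucination_rate": sum(r.unsupported_claims > 0 for r in records) / n,
        # Share of runs handed to a human.
        "escalation_rate": sum(r.escalated for r in records) / n,
    }
```

Keeping the raw per-run records alongside the aggregates makes it possible to drill from a dashboard number back to the individual traces that produced it.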

SIG-02

Architecture thesis

Deep Agent Runtime Patterns

Planner-executor-reviewer loops, durable memory, tool sandboxes, policy admission, and replayable execution graphs.
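
A minimal sketch of three of these patterns working together, assuming a plan is a list of dictionaries with a `"tool"` key: a planner, executor, and reviewer loop, a policy admission check before each step, and an append-only event log that can be replayed without re-running tools. The function names (`run_loop`, `admit`, `replay`, `serialize`) and the event shapes are hypothetical; durable memory and tool sandboxes are out of scope for this example.

```python
import json


def admit(step, allowed_tools):
    """Policy admission: only steps that use an allowed tool may execute."""
    return step["tool"] in allowed_tools


def run_loop(goal, planner, executor, reviewer, allowed_tools, max_rounds=3):
    """Plan, execute admitted steps, and review until accepted or out of rounds.

    Returns (accepted, log), where log is an append-only list of events.
    """
    log = []
    feedback = None
    for round_no in range(max_rounds):
        plan = planner(goal, feedback)
        log.append({"round": round_no, "event": "plan", "plan": plan})
        results = []
        for step in plan:
            if not admit(step, allowed_tools):
                # Denied steps are logged but never executed.
                log.append({"round": round_no, "event": "denied", "step": step})
                continue
            output = executor(step)
            log.append({"round": round_no, "event": "exec",
                        "step": step, "output": output})
            results.append(output)
        accepted, feedback = reviewer(goal, results)
        log.append({"round": round_no, "event": "review",
                    "accepted": accepted, "feedback": feedback})
        if accepted:
            return True, log
    return False, log


def replay(log):
    """Rebuild executed-step outputs from the log alone, without calling tools."""
    return [e["output"] for e in log if e["event"] == "exec"]


def serialize(log):
    """Persist the log as JSON; assumes steps and outputs are JSON-serializable."""
    return json.dumps(log)
```

The reviewer's feedback is passed back to the planner on the next round, so a rejected attempt can inform the next plan rather than being retried blindly.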

SIG-03

Pre-AI credibility

Apple-Scale Web Platforms

The product-scale foundation behind the AI chapter: media, ads, commerce, platform systems, and operator workflows.

Mission control

Reliable autonomy needs a control surface.

Agent fleets, guardrails, eval loops, human escalation, and future humanoid or space systems all need the same operating grammar: know what happened, why it happened, and when a human should intervene.

RK-MISSION-CONTROL / HUD KIT

Autonomy Operations Interface

STATUS / GREEN
SIGNAL / LIVE

MISSION QUEUE

1. observe
2. evaluate
3. constrain
4. escalate

AGENT FLEET

Reliable agent swarm

Long-horizon agents with trace replay, tool policy, evaluation loops, and escalation channels.

LIVE TELEMETRY

trace fidelity: 97%
tool precision: high
eval cadence: live

DIAGNOSTICS

observability: 92%
evaluation: 86%
guardrails: 88%
human oversight: 81%

GUARDRAIL MATRIX

tools
secrets
memory
cost
latency
policy
review
rollback

Speaking

Conference-ready talks for the agent infrastructure era.

These topics are designed for AI agent conferences, labs, startup summits, universities, and engineering leadership communities.

TALK-01

Production AI Is Nothing Like Demo AI

AI agent conferences, engineering leadership summits, startup operator events

A field guide for agent reliability, observability, escalation, and executive trust.

TALK-02

The New Staff Engineer: Architect, Evaluator, Operator

Labs, engineering orgs, universities, founder communities

How senior technologists create leverage when AI writes more of the code.

TALK-03

From India Founder to Apple Staff Engineer

Universities, developer communities, early-career technologists

A practical reinvention story for young technologists building ambitious careers.

Impact

Signals that speak to executives, labs, and the next generation.

This is the credibility layer: shipped scale, Staff-level judgment, founder range, and mentorship.

SIG-01

14+ years building software at scale

Public credibility signal

SIG-02

Staff software engineer focused on AI agent infrastructure

SIG-03

300+ products built as founder and CTO

SIG-04

100M+ annual transactions in early fintech infrastructure

SIG-05

40+ technologists mentored through the AI shift

OPEN TO / SENIOR AI LEADERSHIP, LABS, AND SPEAKING

Bring me into the room where reliable autonomy is being shaped.

Best fit: senior AI infrastructure leadership, AI agent conference speaking, founder and lab advisory, and mentorship for ambitious young technologists.

Senior AI infrastructure leadership
AI agent conference speaking
Founder and lab advisory
Young technologist mentorship