Agents / Evaluation / Observability / Guardrails / Humanoids / Space

AI Agent Infrastructure / Evaluation / Observability

I build the infrastructure that makes AI agents reliable, observable, and enterprise-ready.

Staff Software Engineer with Apple-scale platform experience, founder roots, and a current focus on the reliability stack for agents, humanoids, and space-grade autonomy.

Role

Staff Software Engineer

AI agent infrastructure

Systems

Evaluation + observability

Reliable enterprise agents

Scale

Apple-scale platforms

Media, ads, commerce web systems

Founder

300+ builds / 100M+ transactions per year

Founder and CTO roots

See the flagship case study

Invite me to speak

Current build surface

Agent Reliability Stack

The stack behind my thesis: build agents with modern frameworks, then make them inspectable, governed, and measurable.

Operating loop

Trace every run

Evaluate every outcome

Govern every tool

Escalate to humans

Monitor cost and latency

Learn from failures
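
As a minimal sketch of this loop, assuming a toy agent whose plan is a list of (tool, argument, cost) tuples: every name here (RunTrace, run_task, the tools, the budget) is hypothetical and chosen for illustration, not taken from any specific framework. It shows one way the six steps could fit together in a single run wrapper.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunTrace:
    """Everything recorded about one agent run (hypothetical schema)."""
    task: str
    steps: list = field(default_factory=list)  # one entry per attempted tool call
    cost_usd: float = 0.0
    latency_s: float = 0.0
    passed: bool = False
    escalated: bool = False


def run_task(task, plan, tools, allowed_tools, evaluate, max_cost_usd=0.05):
    """Run a plan of (tool_name, arg, cost_usd) steps under the operating loop."""
    trace = RunTrace(task=task)                       # trace every run
    started = time.perf_counter()
    output = None
    for tool_name, arg, cost_usd in plan:
        if tool_name not in allowed_tools:            # govern every tool
            trace.steps.append({"tool": tool_name, "status": "blocked"})
            trace.escalated = True
            break
        if trace.cost_usd + cost_usd > max_cost_usd:  # monitor cost
            trace.steps.append({"tool": tool_name, "status": "over_budget"})
            trace.escalated = True
            break
        output = tools[tool_name](arg)
        trace.cost_usd += cost_usd
        trace.steps.append({"tool": tool_name, "status": "ok", "output": output})
    trace.latency_s = time.perf_counter() - started   # monitor latency
    # Evaluate every outcome; a blocked or over-budget run never counts as passed.
    trace.passed = (not trace.escalated) and bool(evaluate(output))
    if not trace.passed:
        trace.escalated = True                        # escalate to humans
    return trace


# Illustrative usage with two toy tools and one disallowed tool.
tools = {"upper": str.upper, "exclaim": lambda s: s + "!"}
allowed = {"upper", "exclaim"}

good = run_task("shout hello", [("upper", "hello", 0.01)],
                tools, allowed, evaluate=lambda out: out == "HELLO")
bad = run_task("wipe disk", [("delete_files", "/", 0.01)],
               tools, allowed, evaluate=lambda out: out is not None)

# Learn from failures: failed traces become the review and regression set.
failures = [t for t in (good, bad) if not t.passed]
```

In this sketch a policy violation stops the run and escalates instead of silently skipping the step, so the trace always explains why a human was pulled in.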

Working stack

Runtime

Python, TypeScript, Claude Agents, Codex Agents, Google Agents & ADK

Agent interface

Agent Skills, MCPs, Generative UI

Trust layer

Agent Identity/Security, LangChain

Infra + evals

Anyscale Ray, Weights & Biases, Braintrust, LangSmith

Data plane

ClickHouse, Spark, Airflow, Data Analytics

AI systems thesis

The future belongs to autonomous systems people can inspect, trust, and improve.

My work sits at the intersection of agents, humanoids, and space: three domains where autonomy is only valuable when it is observable, governed, and resilient.

RK-01 / RELIABILITY CORE

Autonomy becomes useful when it can be inspected.

The ideas below orbit a central operating thesis instead of sitting in equal boxes.

Agents

Designing harnesses, memory, context engineering, orchestration, and runtime supervision for long-horizon autonomous work.

Humanoids

Thinking about the reliability layer for embodied systems: tool policy, world-state memory, fleet telemetry, and human override.

Space

Studying mission-grade autonomy patterns where delayed feedback, resilience, and auditability become existential system requirements.

AI agents must be observable before they can be trusted.

Evaluation is the executive dashboard for autonomous work.

Humanoids and space will need the same reliability stack as enterprise agents.

Young technologists need proof that reinvention is possible.

The arc

Founder roots, Apple-scale platforms, then the post-AI world.

My journey connects founder speed, platform-scale engineering, and the technical pattern that matters now: making autonomous work reliable at scale.

ARC-01 / 2008 - 2012

YEH Technologies, instantPay

Founder roots

Built 300+ websites and enterprise apps, then architected fintech infrastructure processing 100M+ transactions per year.

ARC-02 / 2012 - 2017

Genpact, Flipkart

Platform builder

Moved from mobile and digital transformation into high-scale commerce systems used by thousands of operators and millions of customers.

ARC-03 / 2017 - 2023

Apple Ads, Media Products

Apple scale

Led web systems across Search Ads, News Ads, Apple Music, Apple TV, Podcasts, and Books.

ARC-04 / 2023 - now

AI Agent Infrastructure

Post-AI world

Architecting evaluation, observability, guardrails, memory, and supervision patterns for production AI agents.

Profile highlights

Real proof points behind the AI systems thesis.

The story is not abstract. It comes from founder execution, fintech scale, enterprise systems, Flipkart commerce, Apple platforms, AI infrastructure, and mentorship.

FOUNDER ROOTS / YEH TECHNOLOGIES

Built 300+ websites and enterprise apps for global clients.

entrepreneurship

The early chapter was hands-on founder work: selling, designing, shipping, and learning how real customers judge software.

FINTECH SCALE / INSTANTPAY

Architected infrastructure processing 100M+ annual transactions.

CTO experience

Led the technology foundation for a multi-service fintech platform where reliability, throughput, and operator trust mattered every day.

ENTERPRISE + COMMERCE / GENPACT, FLIPKART

Built mobile, healthcare, and ecommerce systems used by thousands.

platform builder

Moved from enterprise transformation into Flipkart-scale commerce, including operator workflows and customer-facing post-order experiences.

APPLE SCALE / ADS AND MEDIA PRODUCTS

Led web systems across Apple Ads, Music, TV, Podcasts, Books, and editorial launches.

Staff engineer trajectory

This chapter proves product judgment at global scale: high-visibility consumer experiences, business tools, and cross-functional platform delivery.

AI INFRASTRUCTURE / NOW

Focused on agent observability, evaluation, guardrails, and enterprise-ready autonomy.

post-AI world

The current chapter connects everything before it: systems thinking, product taste, scale, and the reliability stack for AI agents.

MENTORSHIP / HUMAN IMPACT

Mentored 40+ engineers and professionals through the AI shift.

impact

The mission is not only technical credibility. It is helping younger technologists see a bigger path for themselves.

What I build

A selected portfolio of systems thinking.

The work is framed around transferable architecture patterns: traces, evals, supervision, memory, policy, and high-scale product engineering.

SIG-01

Flagship case study

Agent Observability & Evaluation

A practical walkthrough of how to make agents measurable: traces, task success, hallucination checks, tool precision, and human escalation.

SIG-02

Architecture thesis

Deep Agent Runtime Patterns

Planner, executor, reviewer loops, durable memory, tool sandboxes, policy admission, and replayable execution graphs.
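
One way to picture these patterns together is the sketch below. It assumes a deliberately tiny setup: the planner, reviewer, and tools are plain functions, policy admission is an allow-list check, and the "replayable execution graph" is reduced to an append-only event log. All names (ExecutionLog, admit, run_agent) are hypothetical, and durable memory and sandboxing are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionLog:
    """Append-only event log; replaying it reconstructs what the run did."""
    events: list = field(default_factory=list)

    def record(self, kind, **data):
        self.events.append({"kind": kind, **data})


def admit(step, allowed_tools):
    """Policy admission: only allow-listed tools may execute."""
    return step["tool"] in allowed_tools


def run_agent(goal, planner, tools, reviewer, allowed_tools, max_rounds=3):
    """Planner -> executor -> reviewer loop with a bounded number of rounds."""
    log = ExecutionLog()
    feedback = None
    for round_no in range(1, max_rounds + 1):
        plan = planner(goal, feedback)
        log.record("plan", round=round_no, steps=plan)
        results = []
        for step in plan:
            if not admit(step, allowed_tools):
                log.record("denied", round=round_no, tool=step["tool"])
                continue
            result = tools[step["tool"]](step["arg"])
            log.record("executed", round=round_no, tool=step["tool"], result=result)
            results.append(result)
        ok, feedback = reviewer(goal, results)
        log.record("review", round=round_no, ok=ok, feedback=feedback)
        if ok:
            return results, log
    return None, log  # review never passed; the caller decides whether to escalate


# Illustrative usage: the first plan is wrong and includes a forbidden tool,
# then the planner corrects itself using the reviewer's feedback.
def toy_planner(goal, feedback):
    if feedback is None:
        return [{"tool": "delete_files", "arg": "/"}, {"tool": "add_one", "arg": 21}]
    return [{"tool": "double", "arg": 21}]


def toy_reviewer(goal, results):
    ok = results == [42]
    return ok, None if ok else "expected 42"


toy_tools = {"double": lambda x: x * 2, "add_one": lambda x: x + 1}
results, log = run_agent("double 21", toy_planner, toy_tools, toy_reviewer,
                         allowed_tools={"double", "add_one"})
kinds = [event["kind"] for event in log.events]
```

The admission check runs before the tool is even looked up, so a denied step leaves a log entry without executing anything.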

SIG-03

Pre-AI credibility

Apple-Scale Web Platforms

The product-scale foundation behind the AI chapter: media, ads, commerce, platform systems, and operator workflows.

Mission control

Reliable autonomy needs a control surface.

Agent fleets, guardrails, eval loops, human escalation, and future humanoid or space systems all need the same operating grammar: know what happened, why it happened, and when a human should intervene.

RK-MISSION-CONTROL / HUD KIT

Autonomy Operations Interface

STATUS / GREEN

SIGNAL / LIVE

MISSION QUEUE

1. observe

2. evaluate

3. constrain

4. escalate

AGENT FLEET

Reliable agent swarm

Long-horizon agents with trace replay, tool policy, evaluation loops, and escalation channels.

LIVE TELEMETRY

trace fidelity: 97%

tool precision: high

eval cadence: live

DIAGNOSTICS

observability: 92%

evaluation: 86%

guardrails: 88%

human oversight: 81%

GUARDRAIL MATRIX

tools

secrets

memory

cost

latency

policy

review

rollback

Speaking

Conference-ready talks for the agent infrastructure era.

These topics are designed for AI agent conferences, labs, startup summits, universities, and engineering leadership communities.

TALK-01

Production AI Is Nothing Like Demo AI

AI agent conferences, engineering leadership summits, startup operator events

A field guide for agent reliability, observability, escalation, and executive trust.

TALK-02

The New Staff Engineer: Architect, Evaluator, Operator

Labs, engineering orgs, universities, founder communities

How senior technologists create leverage when AI writes more of the code.

TALK-03

From India Founder to Apple Staff Engineer

Universities, developer communities, early-career technologists

A practical reinvention story for young technologists building ambitious careers.

View speaker profile

Thought leadership

Essays on agents, autonomy, and human ambition.

The writing hub gives founders, builders, labs, and conference organizers a clear view into how I think about the future.

ESSAY-01

Why Agent Observability Is the Bottleneck for Enterprise AI

essay direction

The move from demos to deployed agents depends on traces, evals, cost visibility, and supervision.

ESSAY-02

The Evaluation Problem for Long-Horizon Agents

essay direction

Task success, tool precision, hallucination rate, and execution quality need to become first-class metrics.
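
To make the idea of first-class metrics concrete, here is a small sketch that aggregates three of them over a batch of runs. The definitions are my working assumptions for illustration, not a standard: task success is the share of runs that succeeded, tool precision is correct tool calls over all tool calls, and hallucination rate is unsupported claims over all claims. RunRecord and summarize are hypothetical names, and execution quality is left out because it needs a rubric of its own.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Per-run counts, assumed to come from traces plus human or model grading."""
    succeeded: bool
    tool_calls: int
    correct_tool_calls: int
    claims: int
    unsupported_claims: int


def summarize(runs):
    """Aggregate per-run counts into fleet-level metrics."""
    if not runs:
        raise ValueError("need at least one run to compute metrics")
    total_calls = sum(r.tool_calls for r in runs)
    total_claims = sum(r.claims for r in runs)
    return {
        "task_success": sum(r.succeeded for r in runs) / len(runs),
        # None means "not measurable": no tool calls or no claims were observed.
        "tool_precision": (sum(r.correct_tool_calls for r in runs) / total_calls
                           if total_calls else None),
        "hallucination_rate": (sum(r.unsupported_claims for r in runs) / total_claims
                               if total_claims else None),
    }


# Illustrative usage with made-up counts for two runs.
runs = [
    RunRecord(succeeded=True, tool_calls=4, correct_tool_calls=4,
              claims=10, unsupported_claims=0),
    RunRecord(succeeded=False, tool_calls=6, correct_tool_calls=3,
              claims=10, unsupported_claims=2),
]
report = summarize(runs)
```

Pooling counts across runs (rather than averaging per-run ratios) keeps long runs from being underweighted, which matters for long-horizon agents where run length varies widely.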

ESSAY-03

Agents, Humanoids, and Space: The Coming Autonomy Stack

essay direction

A long-term view of the reliability systems that will connect digital, embodied, and mission-grade autonomy.

Read essay directions

Impact

Signals that speak to executives, labs, and the next generation.

This is the credibility layer: shipped scale, Staff-level judgment, founder range, and mentorship.

SIG-01

14+ years building software at scale

Public credibility signal

SIG-02

Staff software engineer focused on AI agent infrastructure

Public credibility signal

SIG-03

300+ products built as founder and CTO

Public credibility signal

SIG-04

100M+ annual transactions in early fintech infrastructure

Public credibility signal

SIG-05

40+ technologists mentored through the AI shift

Public credibility signal

OPEN TO / SENIOR AI LEADERSHIP, LABS, AND SPEAKING

Bring me into the room where reliable autonomy is being shaped.

Best fit: senior AI infrastructure leadership, AI agent conference speaking, founder and lab advisory, and mentorship for ambitious young technologists.

Senior AI infrastructure leadership

AI agent conference speaking

Founder and lab advisory

Young technologist mentorship

Start a conversation