Toptal · Lennar · Phase II

Engagement Proposal

Agentic Competitive Intelligence.

Replacing Apify with an internally-owned ingestion platform — agent-driven, configuration-first, and native to the AI-Lennar stack.

For Lennar Engineering Leadership

01 Where we are today

A vendor pipeline that scales the wrong axis.

Current state · Apify

Each competitor = a custom build. Each day = a delivery charge.

Cost shape

Linear in competitors. Linear in days. Onboarding gated by an external dev team.

Operational shape

Site change → external ticket → wait. Quality and freshness sit outside our control.

Coverage shape

~4–5 competitors today. Adding the next one looks like the last one — a project, not a config.

Today's pipeline

02 Objective

Stand up a parallel, internally-owned platform that makes Apify unnecessary.

Coverage

Pricing, inventory & QMI, and floor plan / community detail — the full Apify surface.

Cost

Materially below per-competitor and per-delivery Apify spend, with onboarding cost that stays flat as competitor count grows.

Onboarding

New competitor = configuration. No code change, no external dependency.

Quality

Equal or higher fidelity and freshness than today's delivery.

Greenfield · runs in parallel · Apify untouched during the engagement

Out of scope: consumer surfaces · email-agent system

03 Precedent

The pattern is already in production.

The promotions email-agent runs on this exact pattern today — autonomous agents, LLM extraction, Athenix serving. We're extending it, not inventing it.

Promotions agent · in production · same building blocks: agent runtime + LLM extraction + Athenix tables.

→ extend, don't rebuild

The pattern

04 Architecture

End-to-end, on the AI-Lennar stack.

Architecture

05 Inside the platform

The agent loop, end to end.

Each step is its own Bedrock agent · tool-using · memory-backed

Selectors strengthen with every run.

The agent loop

06 Stack alignment

Lands on what AI-Lennar already runs.

Layer	Constraint	What we use	Why
Agent runtime	Bedrock / AgentCore	Bedrock + AgentCore	Same runtime as the email-agent system. Tool use, memory, IAM all in place.
Transformation	Snowflake + dbt (ATX 2.0)	Snowflake + dbt	Selectors land raw records; dbt models normalize, SCD-track, and surface marts.
Landing & serving	Athenix	Athenix	Same landing contract as today's Apify drop and the promotions agent.
Fetching	—	Oxylabs (residential proxy)	Anti-bot resilience with usage-priced fetches. Replaces Apify's per-delivery line.
Schema discovery	—	AgentQL	Lets the schema agent understand a new site without bespoke code.

Stack alignment · Two new vendors: Oxylabs & AgentQL · usage-priced

07 Onboarding

Adding a competitor is a config, not a project.

competitors/kb_homes.yml

# competitor configuration
name: kb_homes
root_url: https://www.kbhome.com
regions: [TX, FL, AZ, CA, NV]
data_types:
  - pricing
  - inventory
  - community_detail
cadence: daily
discovery:
  start_path: /find-your-home
  agent: route_agent_v2
selectors:
  agent: selector_agent_v3
  bootstrap_mode: agentql
fetch:
  proxy: oxylabs.residential
  render: js
# that's the whole onboarding.
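
Because the YAML is the entire human surface, it is worth checking before any agent runs. A minimal sketch in Python, assuming the file has already been parsed into a dict by whatever YAML loader the platform uses. The required keys and the allowed data types are taken from the example above; the function name and the exact rules are hypothetical.

```python
# Hypothetical validation rules, derived from the kb_homes example only.
REQUIRED_KEYS = ("name", "root_url", "regions", "data_types", "cadence")
ALLOWED_DATA_TYPES = {"pricing", "inventory", "community_detail"}


def validate_competitor_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in cfg]

    root_url = cfg.get("root_url")
    if root_url is not None and not str(root_url).startswith("https://"):
        problems.append("root_url must start with https://")

    if "regions" in cfg and not cfg["regions"]:
        problems.append("regions must list at least one region")

    for data_type in cfg.get("data_types", []):
        if data_type not in ALLOWED_DATA_TYPES:
            problems.append(f"unknown data type: {data_type}")

    return problems


kb_homes = {
    "name": "kb_homes",
    "root_url": "https://www.kbhome.com",
    "regions": ["TX", "FL", "AZ", "CA", "NV"],
    "data_types": ["pricing", "inventory", "community_detail"],
    "cadence": "daily",
}
print(validate_competitor_config(kb_homes))  # no problems expected for the example config
```

Returning a list of problems rather than raising on the first one lets an operator fix a new config in a single pass.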

01 / DROP IN A YAML

Domain, regions, data types, cadence. That's the human surface.

02 / AGENTS BOOTSTRAP

Route → Schema → Selector agents run once. Site map and selectors land in the registry.

03 / DRY-RUN VALIDATION

Sample fetch, parse, validate. Diff against expected schema. Operator approves.

04 / GO LIVE

Daily run kicks in. Athenix tables populate. No code merged, no external ticket.
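
The dry-run step in 03 can be sketched as a schema diff over a handful of sample records. This is an illustration only: the field names and types in `EXPECTED_SCHEMA` are hypothetical placeholders, not the actual Athenix contract, and the function names are made up for the example.

```python
# Hypothetical expected schema; the real one is the existing Athenix contract.
EXPECTED_SCHEMA = {"community": str, "plan_name": str, "price_usd": int}


def diff_against_schema(record: dict, expected: dict) -> dict:
    """Compare one parsed record with the expected schema, field by field."""
    return {
        "missing": sorted(k for k in expected if k not in record),
        "unexpected": sorted(k for k in record if k not in expected),
        "wrong_type": sorted(
            k for k, t in expected.items()
            if k in record and not isinstance(record[k], t)
        ),
    }


def dry_run_passes(samples: list[dict], expected: dict) -> bool:
    """The dry run passes only if every sample record has an empty diff."""
    return all(
        not any(diff_against_schema(r, expected).values()) for r in samples
    )


sample = {"community": "Oak Ridge", "plan_name": "Plan 1", "price_usd": "450000", "lot": 5}
print(diff_against_schema(sample, EXPECTED_SCHEMA))
```

The diff is what the operator would review before approving go-live; an empty diff on every sample is the cheapest possible approval signal.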

08 Resilience

Sites change. The platform notices and self-heals.

Selector versioning

Every selector is versioned and addressable. Field-level success rates are tracked per run; degradation triggers a re-derivation.
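
One way to picture field-level tracking is a small in-memory tally, sketched below. The class name, the selector ID format, the 0.9 threshold, and the minimum sample count are all assumptions for illustration; the real registry and thresholds would be decided during the build.

```python
from collections import defaultdict


class SelectorHealth:
    """Tallies per-field extraction success for versioned selectors."""

    def __init__(self, threshold: float = 0.9, min_samples: int = 20):
        self.threshold = threshold      # assumed cutoff for "degraded"
        self.min_samples = min_samples  # avoid alarms on tiny samples
        self._hits = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, selector_id: str, field: str, ok: bool) -> None:
        key = (selector_id, field)
        self._total[key] += 1
        self._hits[key] += int(ok)

    def success_rate(self, selector_id: str, field: str):
        key = (selector_id, field)
        total = self._total.get(key, 0)
        return self._hits.get(key, 0) / total if total else None

    def needs_rederivation(self) -> list:
        """Selector/field pairs whose success rate fell below the threshold."""
        return sorted(
            key for key, total in self._total.items()
            if total >= self.min_samples
            and self._hits[key] / total < self.threshold
        )
```

In use, the parser would call `record(...)` once per field per page, and the output of `needs_rederivation()` would be the work queue for the Selector agent.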

LLM fallback on miss

A failed selector falls through to an LLM extractor on the same HTML. The LLM's result is captured as a labeled example and feeds the next selector version automatically.
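
The fall-through can be sketched as a two-step chain. Both extractors are injected as plain callables, so no specific Bedrock call is assumed here; the stub functions and the shape of the returned dict are hypothetical.

```python
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(html: str, selector_extract: Extractor,
                          llm_extract: Extractor) -> dict:
    """Try the selector first; on a miss, ask the LLM extractor on the same HTML."""
    value = selector_extract(html)
    if value is not None:
        return {"value": value, "source": "selector", "training_example": None}

    value = llm_extract(html)
    if value is None:
        return {"value": None, "source": "none", "training_example": None}
    # Keep the (html, value) pair so the next selector version can be derived from it.
    return {"value": value, "source": "llm",
            "training_example": {"html": html, "value": value}}


def stale_selector(html: str) -> Optional[str]:
    """Stand-in for a selector that still expects the old class name."""
    m = re.search(r'class="price">([^<]+)<', html)
    return m.group(1) if m else None


def stub_llm(html: str) -> Optional[str]:
    """Stand-in for the LLM extractor; a real one would call the model."""
    m = re.search(r"\$[\d,]+", html)
    return m.group(0) if m else None


result = extract_with_fallback('<span class="cost">$450,000</span>',
                               stale_selector, stub_llm)
print(result["source"], result["value"])
```

Tagging each value with its source makes the LLM fallback rate measurable per field, which is one signal a selector needs regenerating.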

Anti-bot via Oxylabs

Residential proxies, JS rendering, configurable cadence and jitter. Block rates feed back into the fetch policy.
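
A rough sketch of jittered cadence with block-rate feedback follows. It does not call Oxylabs; it only shows the scheduling arithmetic. The 5% target block rate, the doubling factor, and the delay bounds are assumed values chosen for illustration.

```python
import random


def next_delay_seconds(base: float, jitter_fraction: float,
                       rng: random.Random) -> float:
    """Base delay plus or minus a random fraction, so requests are not evenly spaced."""
    spread = base * jitter_fraction
    return base + rng.uniform(-spread, spread)


def adjust_base_delay(base: float, block_rate: float, target: float = 0.05,
                      factor: float = 2.0, floor: float = 1.0,
                      ceiling: float = 300.0) -> float:
    """Back off when the observed block rate exceeds the target, relax otherwise."""
    if block_rate > target:
        return min(base * factor, ceiling)
    return max(base / factor, floor)


rng = random.Random(0)  # seeded only to make the example repeatable
delay = next_delay_seconds(base=10.0, jitter_fraction=0.2, rng=rng)
print(round(delay, 2), adjust_base_delay(10.0, block_rate=0.2))
```

Keeping the policy per site means one defensive competitor slows only its own fetches, not the whole daily run.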

Drift detection

Distribution checks per field (null rate, value range, type), per competitor, per day. Drift raises an alarm before downstream consumers see the data.
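
The three checks named here (null rate, value range, type) can be sketched as a profile-and-compare step. The tolerance and the comparison rules are assumptions; in practice these checks might equally live in dbt tests rather than Python.

```python
def field_profile(values: list) -> dict:
    """Summarize one field for one competitor on one day."""
    non_null = [v for v in values if v is not None]
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "types": sorted({type(v).__name__ for v in non_null}),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


def drift_alarms(baseline: dict, today: dict,
                 null_rate_tolerance: float = 0.05) -> list[str]:
    """Name each check on which today's profile departs from the baseline."""
    alarms = []
    if today["null_rate"] - baseline["null_rate"] > null_rate_tolerance:
        alarms.append("null_rate")
    if today["types"] != baseline["types"]:
        alarms.append("type")
    if baseline["min"] is not None and today["min"] is not None:
        if today["min"] < baseline["min"] or today["max"] > baseline["max"]:
            alarms.append("value_range")
    return alarms


baseline = field_profile([400000, 450000, 500000, 520000])
today = field_profile([410000, None, "Call for pricing", 45])
print(drift_alarms(baseline, today))
```

A real baseline would be a rolling window rather than a single day, so that ordinary price movement does not trip the range check.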

Replayable raw store

Every fetched HTML is versioned in S3. Any record can be re-parsed against a new selector — no re-fetching the site.
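
Replay can be sketched with a content-addressed key layout and a re-parse loop. A plain dict stands in for the S3 bucket so the example runs without AWS access; the key layout, the date, and the parser are hypothetical.

```python
import hashlib
import re


def raw_key(competitor: str, url: str, fetched_on: str, html: str) -> str:
    """Build a versioned object key: same URL, different content, different key."""
    url_id = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()[:16]
    return f"raw/{competitor}/{fetched_on}/{url_id}/{digest}.html"


def reparse(store: dict, prefix: str, parser) -> dict:
    """Run a (new) parser over stored HTML without touching the live site."""
    return {key: parser(html)
            for key, html in sorted(store.items())
            if key.startswith(prefix)}


store = {}  # stand-in for the S3 bucket
html = '<span class="price">$450,000</span>'
key = raw_key("kb_homes", "https://www.kbhome.com/find-your-home", "2025-01-15", html)
store[key] = html


def price_parser_v2(page: str) -> str:
    """Hypothetical new selector version: pull the digits after the dollar sign."""
    return re.search(r"\$([\d,]+)", page).group(1)


print(reparse(store, "raw/kb_homes/", price_parser_v2))
```

Because the key includes a hash of the content, re-fetching an unchanged page overwrites nothing and a changed page gets its own version.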

Triage console

A single screen lists failing competitors / fields, the offending HTML diff, and a one-click "regenerate selector" action.

Resilience · Quality is a closed loop, not an SLA hope

09 Phased plan · 16 weeks

Four months, four phases, one early proof point.

Solo extension · Amin · 16 weeks · ends with internal Lennar team owning it

★ M1 at week 6 · ★ M2 = handoff at week 16

Plan

10 Early proof point

Week 6. One competitor, all three data types, end-to-end in Athenix.

6 wk

to first end-to-end run

3/3

data types covered at M1

1→N

framework, not a one-off

M1 acceptance criteria

For one chosen competitor, the daily run produces pricing, inventory and community detail rows in Athenix tables that downstream BI can already query — at parity or better than today's Apify drop, on the same schema contract.
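
"Parity or better" needs a concrete comparison. One possible shape for that check is sketched below, assuming both drops can be read as lists of row dicts sharing a key column. The column names `listing_id` and `price_usd` are placeholders, not the actual Athenix schema.

```python
def parity_report(apify_rows: list[dict], platform_rows: list[dict],
                  key: str = "listing_id",
                  fields: tuple = ("price_usd",)) -> dict:
    """Compare the new platform's rows against the Apify drop for the same day."""
    apify = {r[key]: r for r in apify_rows}
    platform = {r[key]: r for r in platform_rows}
    shared = apify.keys() & platform.keys()
    return {
        "missing_from_platform": sorted(apify.keys() - platform.keys()),
        "extra_in_platform": sorted(platform.keys() - apify.keys()),
        "mismatched": sorted(
            k for k in shared
            if any(apify[k].get(f) != platform[k].get(f) for f in fields)
        ),
    }


apify_rows = [{"listing_id": "A1", "price_usd": 450000},
              {"listing_id": "A2", "price_usd": 510000}]
platform_rows = [{"listing_id": "A1", "price_usd": 450000},
                 {"listing_id": "A2", "price_usd": 505000},
                 {"listing_id": "A3", "price_usd": 399000}]
print(parity_report(apify_rows, platform_rows))
```

A mismatch is not automatically a platform defect: the Apify value may be the stale one, which is why the raw HTML for each mismatched row should be checked before deciding.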

What ships with M1

Route, Schema, Selector, Fetcher, Parser, Validator agents in Bedrock.
Selector registry & raw HTML store in S3.
dbt models landing into Athenix on the existing contract.

11 Cost shape

Fixed build cost. Variable run cost. Flat onboarding.

Illustrative shape. Real curves land after the first trial runs.

FIXED · ENGAGEMENT

Toptal · Amin · 16 weeks · solo extension. One bill, ends at handoff.

VARIABLE · RUNTIME

Bedrock tokens (agent reasoning) · Oxylabs fetches (per-page) · AgentQL calls (mostly onboarding + drift events).

FLAT · ONBOARDING

Adding the 5th, 6th, Nth competitor consumes the same fixed framework — no new external SOW, no Apify-style step-up.

→ FOR THE PROPOSAL

Concrete unit economics fall out of the first 2–3 trial runs in Phase 01, well before any spend at scale.
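
Until those trial runs exist, the variable side can only be written as arithmetic. The sketch below shows the shape, not the numbers: every value passed in is a placeholder in arbitrary cost units, and AgentQL calls are left out because they occur mostly at onboarding and on drift events. The ceiling check mirrors the per-competitor stop / re-cost gate listed under risks.

```python
def daily_run_cost(pages_per_competitor: list[int],
                   fetch_price_per_page: float,
                   bedrock_cost_per_page: float) -> float:
    """Variable daily cost: every fetched page pays for a fetch and for agent tokens."""
    return sum(pages_per_competitor) * (fetch_price_per_page + bedrock_cost_per_page)


def within_ceiling(pages: int, fetch_price_per_page: float,
                   bedrock_cost_per_page: float,
                   ceiling_per_competitor: float) -> bool:
    """Gate for one competitor: stop and re-cost if the daily cost exceeds the ceiling."""
    cost = pages * (fetch_price_per_page + bedrock_cost_per_page)
    return cost <= ceiling_per_competitor


# Placeholder inputs only; real unit prices come from the Phase 01 trial runs.
print(daily_run_cost([100, 200], fetch_price_per_page=0.25, bedrock_cost_per_page=0.5))
print(within_ceiling(100, 0.25, 0.5, ceiling_per_competitor=80))
```

Adding a competitor appends one entry to `pages_per_competitor`. Run cost grows with that competitor's page count, while the build cost and the onboarding effort do not change.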

12 Risks & dependencies

What I want flagged before kickoff.

Risk	Why it matters	Mitigation
Site-side anti-bot	Competitor sites can tighten defenses mid-engagement.	Oxylabs residential proxy · jittered cadence · per-site fetch policy that adapts on block signal.
Selector drift on UI changes	Selectors break silently when sites redesign.	Per-field success monitoring · LLM fallback · auto-regeneration via Selector agent · raw HTML replay.
Athenix contract drift	Downstream BI & Pricing Machine assume today's Apify schema.	Land on identical contract at M1 · diff report against Apify · cutover only on parity.
Bedrock / AgentCore access	Need IAM, model access, and AgentCore parity with the email-agent system on day 1.	Replicate the email-agent IAM & deployment template in week 1.
Vendor cost surprises	Oxylabs / AgentQL / Bedrock token cost only fully visible after live runs.	Phase 01 sets a per-competitor cost ceiling before scaling. Stop / re-cost gate.
Legal / ToS posture	Public-surface scraping raises ToS and rate-of-access questions.	Legal review of target list before Phase 02 · public data only · no auth, no PII, no dark-pattern bypass.

13 Engagement & handoff

A solo extension that ends with Lennar owning the platform.

Shape

Solo extension for Amin. Tech proposal, execution, and handoff — all under one owner.

Duration

16 weeks. M1 at week 6 (one competitor end-to-end). M2 at week 16 (handoff complete).

Boundary

Apify untouched throughout. Email-agent system untouched. New platform runs in parallel until parity is proven.

Handoff package · ships in Phase 04

01 Architecture & ADRs

Decision records for every key technical choice.

02 Runbooks

Onboarding a competitor. Triaging a drift alarm. Re-parsing history.

03 Triage console

The single screen the on-call uses.

04 Enablement sessions

Pair-coding weeks with the receiving Lennar team.

Same pattern. Same stack. New surface area. Owned in-house.

→ READY TO KICK OFF

Agentic Competitive Intelligence.