Toptal  ·  Lennar  ·  Phase II
Engagement Proposal

Agentic Competitive Intelligence.

Replacing Apify with an internally owned ingestion platform: agent-driven, configuration-first, and native to the AI-Lennar stack.
For Lennar Engineering Leadership
01  Where we are today

A vendor pipeline that scales the wrong axis.

Current state · Apify
Competitor A · B · C · … · N → Apify (per-scraper build $ · per-delivery $) → Athenix
Each competitor = a custom build. Each day = a delivery charge.
Cost shape
Linear in competitors. Linear in days. Onboarding gated by an external dev team.
Operational shape
Site change → external ticket → wait. Quality and freshness sit outside our control.
Coverage shape
~4–5 competitors today. Adding the next one looks like the last one — a project, not a config.
02  Objective

Stand up a parallel, internally-owned platform that makes Apify unnecessary.

Coverage
Pricing, inventory & quick move-in (QMI) homes, and floor plan / community detail: the full Apify surface.
Cost
Materially below per-competitor and per-delivery Apify spend, and flat as competitor count grows.
Onboarding
New competitor = configuration. No code path, no external dependency.
Quality
Fidelity and freshness equal to or better than today's delivery.
Greenfield · runs in parallel · Apify untouched during the engagement
Out of scope: consumer surfaces · email-agent system
03  Precedent

The pattern is already in production.

The promotions email-agent runs on this exact pattern today — autonomous agents, LLM extraction, Athenix serving. We're extending it, not inventing it.
Provision (masked-identity agents) → Subscribe (competitor communications) → Extract (LLM-driven structuring) → Land (Athenix tables) → Serve (end users + downstream BI)
Promotions agent · in production · same building blocks: agent runtime + LLM extraction + Athenix tables.
→ extend, don't rebuild
04  Architecture

End-to-end, on the AI-Lennar stack.

Discovery: Route agent (area → state → city → community index); Schema agent · AgentQL (site schema discovery, page-type classification); Selector agent (build & version field selectors)
Acquisition: Fetcher · Oxylabs (residential proxy fetch, JS rendering, retries); Raw store · S3 (versioned HTML snapshots, replayable and auditable)
Extraction & modeling: Parser · selectors + LLM fallback (apply selectors → typed records; LLM rescue on miss); Validator (schema, range, drift, freshness checks before promotion); Snowflake + dbt · ATX 2.0 (normalize, model, SCD-track, surface marts)
Serving: Athenix (landing + serving) → BigLen · Pricing Machine · downstream BI
Across all stages: AWS Bedrock · AgentCore (orchestration · runtime · tool use · memory)
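The Validator is the gate between parsed records and Snowflake. As a minimal sketch of what per-record schema, range, and freshness checks could look like, the Python below uses a hypothetical `PricingRecord` shape and illustrative thresholds (`PRICE_RANGE`, `MAX_AGE`); none of these names or values come from the Athenix contract. Drift is a batch-level check across many records and is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical record shape: field names are illustrative, not the Athenix contract.
@dataclass
class PricingRecord:
    competitor: str
    community_id: str
    plan_name: str
    base_price: Optional[float]
    fetched_at: datetime


# Illustrative thresholds (assumptions); real values would come from per-competitor history.
PRICE_RANGE = (50_000.0, 5_000_000.0)
MAX_AGE = timedelta(hours=36)


def validate(record: PricingRecord, now: datetime) -> List[str]:
    """Return failure reasons; an empty list means the record may be promoted."""
    errors: List[str] = []
    # Schema: required identifiers must be present and non-empty.
    for name in ("competitor", "community_id", "plan_name"):
        if not getattr(record, name):
            errors.append(f"missing {name}")
    # Range: price must be present and inside plausible bounds.
    low, high = PRICE_RANGE
    if record.base_price is None:
        errors.append("missing base_price")
    elif not low <= record.base_price <= high:
        errors.append(f"base_price out of range: {record.base_price}")
    # Freshness: reject snapshots older than MAX_AGE.
    if now - record.fetched_at > MAX_AGE:
        errors.append("stale snapshot")
    return errors
```

Returning a list of reasons rather than a boolean lets the triage view show why a record was held back.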
05  Inside the platform

The agent loop, end to end.

01  Discover · crawl area / state / city → community
02  Schema · AgentQL infers page schema
03  Select · build versioned field selectors
04  Fetch · Oxylabs proxy + render → S3
05  Parse · selectors → records, LLM fallback on miss
06  Land · validate → Snowflake → Athenix marts
Feedback · parse misses and quality signals flow back to enhance the selectors
Each step is its own Bedrock agent · tool-using · memory-backed
selectors strengthen with every run
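To make the control flow concrete, here is a minimal sketch of one pass through the six steps. Each step is a Bedrock agent in the proposal; the sketch abstracts them as plain Python callables with hypothetical names (`discover`, `infer_schema`, `build_selectors`, `fetch`, `snapshot`, `parse`, `land`) so that only the ordering and the feedback edge are shown, not any Bedrock or AgentCore API.

```python
from typing import Callable, Dict, List, Optional

Selectors = Dict[str, str]          # field name -> selector (hypothetical format)
Record = Dict[str, Optional[str]]   # field name -> extracted value, None on miss


def run_loop(
    discover: Callable[[], List[str]],                             # 01: community URLs
    infer_schema: Callable[[List[str]], List[str]],                # 02: field names to extract
    build_selectors: Callable[[List[str], List[str]], Selectors],  # 03: fields + prior misses
    fetch: Callable[[str], str],                                   # 04: URL -> rendered HTML
    snapshot: Callable[[str, str], None],                          # 04: write raw HTML (S3 stand-in)
    parse: Callable[[str, Selectors], Record],                     # 05: HTML -> record
    land: Callable[[List[Record]], None],                          # 06: validate + load
    prior_misses: List[str],
) -> List[str]:
    """Run one pass of the loop and return the fields that missed in this run.

    The returned list is passed back as `prior_misses` on the next run; this is
    the feedback edge that tells the selector step which fields to re-derive.
    """
    urls = discover()
    fields = infer_schema(urls)
    selectors = build_selectors(fields, prior_misses)
    records: List[Record] = []
    misses = set()
    for url in urls:
        html = fetch(url)
        snapshot(url, html)  # raw HTML is stored before parsing, so it can be replayed
        record = parse(html, selectors)
        misses.update(f for f in fields if record.get(f) is None)
        records.append(record)
    land(records)
    return sorted(misses)
```

Keeping the misses as an explicit return value, rather than hidden state, makes each run replayable and testable on its own.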
06  Stack alignment

Lands on what AI-Lennar already runs.

Layer | Constraint | What we use | Why
Agent runtime | Bedrock / AgentCore | Bedrock + AgentCore | Same runtime as the email-agent system. Tool use, memory, IAM all in place.
Transformation | Snowflake + dbt (ATX 2.0) | Snowflake + dbt | Selectors land raw records; dbt models normalize, SCD-track, and surface marts.
Landing & serving | Athenix | Athenix | Same landing contract as today's Apify drop and the promotions agent.
Fetching | (new vendor) | Oxylabs (residential proxy) | Anti-bot resilience with usage-priced fetches. Replaces Apify's per-delivery line.
Schema discovery | (new vendor) | AgentQL | Lets the schema agent understand a new site without bespoke code.
Stack alignment · Two new vendors: Oxylabs & AgentQL · usage-priced
07  Onboarding

Adding a competitor is a config, not a project.

competitors/kb_homes.yml
#  competitor configuration
name: kb_homes
root_url: https://www.kbhome.com
regions: [TX, FL, AZ, CA, NV]
data_types:
  - pricing
  - inventory
  - community_detail
cadence: daily
discovery:
  start_path: /find-your-home
  agent: route_agent_v2
selectors:
  agent: selector_agent_v3
  bootstrap_mode: agentql
fetch:
  proxy: oxylabs.residential
  render: js
#  that's the whole onboarding.
01 / DROP IN A YAML
Domain, regions, data types, cadence. That's the human surface.
02 / AGENTS BOOTSTRAP
Route → Schema → Selector agents run once. Site map and selectors land in the registry.
03 / DRY-RUN VALIDATION
Sample fetch, parse, validate. Diff against expected schema. Operator approves.
04 / GO LIVE
Daily run kicks in. Athenix tables populate. No code merged, no external ticket.
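The config gate in step 01 and the schema diff in step 03 are the two mechanical checks in this flow. A minimal sketch follows; the required keys mirror the kb_homes.yml example, while `ALLOWED_DATA_TYPES`, `ALLOWED_CADENCES`, `check_config`, and `schema_diff` are hypothetical names and rules, and the YAML is shown as the dict a YAML loader would produce, so no parser dependency is assumed.

```python
from typing import Any, Dict, List, Set

# Required keys mirror the kb_homes.yml example; the allowed value sets are assumptions.
REQUIRED_KEYS = ("name", "root_url", "regions", "data_types", "cadence",
                 "discovery", "selectors", "fetch")
ALLOWED_DATA_TYPES = {"pricing", "inventory", "community_detail"}
ALLOWED_CADENCES = {"daily"}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Step 01 gate: return problems with a competitor config (empty = ready for bootstrap)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in cfg]
    if problems:
        return problems
    if not str(cfg["root_url"]).startswith(("http://", "https://")):
        problems.append("root_url must be an http(s) URL")
    if not cfg["regions"]:
        problems.append("regions must not be empty")
    unknown = set(cfg["data_types"]) - ALLOWED_DATA_TYPES
    if unknown:
        problems.append(f"unknown data_types: {sorted(unknown)}")
    if cfg["cadence"] not in ALLOWED_CADENCES:
        problems.append(f"unsupported cadence: {cfg['cadence']}")
    return problems


def schema_diff(sample_fields: Set[str], expected_fields: Set[str]) -> Dict[str, List[str]]:
    """Step 03 dry-run: compare fields parsed from a sample fetch with the expected schema."""
    return {
        "missing": sorted(expected_fields - sample_fields),
        "unexpected": sorted(sample_fields - expected_fields),
    }


# The kb_homes.yml example as a Python dict; in practice a YAML loader would produce it.
KB_HOMES = {
    "name": "kb_homes",
    "root_url": "https://www.kbhome.com",
    "regions": ["TX", "FL", "AZ", "CA", "NV"],
    "data_types": ["pricing", "inventory", "community_detail"],
    "cadence": "daily",
    "discovery": {"start_path": "/find-your-home", "agent": "route_agent_v2"},
    "selectors": {"agent": "selector_agent_v3", "bootstrap_mode": "agentql"},
    "fetch": {"proxy": "oxylabs.residential", "render": "js"},
}
```

The operator approval in step 03 would then look at an empty `check_config` result plus the `schema_diff` output before go-live.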
08  Resilience

Sites change. The platform notices and self-heals.

Selector versioning
Every selector is versioned and addressable. Field-level success rates are tracked per run; degradation triggers a re-derivation.
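A minimal sketch of a versioned, addressable selector history with a degradation trigger. The class and function names are hypothetical, the selector strings are illustrative CSS, and the degradation rule (mean success rate over the last `window` runs below `threshold`) is one assumed choice among many.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SelectorHistory:
    """Versioned selectors for one (competitor, field) pair plus per-run success rates."""
    versions: List[str] = field(default_factory=list)         # list index = version number
    success_rates: List[float] = field(default_factory=list)  # one entry per run

    @property
    def current_version(self) -> int:
        return len(self.versions) - 1

    def publish(self, selector: str) -> int:
        """Add a new selector version and return its version number."""
        self.versions.append(selector)
        return self.current_version

    def record_run(self, hits: int, attempts: int) -> float:
        """Store this run's field-level success rate (hits / attempts)."""
        rate = hits / attempts if attempts else 0.0
        self.success_rates.append(rate)
        return rate


def needs_rederivation(history: SelectorHistory, threshold: float = 0.95, window: int = 3) -> bool:
    """Assumed rule: mean success over the last `window` runs is below `threshold`."""
    recent = history.success_rates[-window:]
    return bool(recent) and sum(recent) / len(recent) < threshold


# Registry keyed by (competitor, field), e.g. ("kb_homes", "base_price").
Registry = Dict[Tuple[str, str], SelectorHistory]
```

Averaging over a short window avoids re-deriving on a single noisy run while still reacting within a few days.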
LLM fallback on miss
A failed selector falls through to an LLM extractor on the same HTML. The result trains the next selector version automatically.
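A minimal sketch of the fallback path. To avoid an HTML-parsing dependency, a "selector" is modeled here as a regex with one capture group (an assumption; the platform's selectors would be CSS/XPath-style), and the LLM extractor is any callable with the hypothetical shape `(html, field) -> value`. Rescued values are queued as training examples for the next selector version.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumption: a "selector" is a regex with one capture group; the LLM extractor is any callable.
LLMExtractor = Callable[[str, str], Optional[str]]  # (html, field) -> value or None
TrainingExample = Tuple[str, str, str]               # (field, html, rescued value)


def extract_field(html: str, field: str, selector: str, llm_extract: LLMExtractor,
                  training_queue: List[TrainingExample]) -> Tuple[Optional[str], str]:
    """Return (value, source), where source is 'selector', 'llm', or 'miss'."""
    match = re.search(selector, html)
    if match:
        return match.group(1), "selector"
    # Selector missed: try the LLM extractor on the same HTML (no re-fetch).
    value = llm_extract(html, field)
    if value is not None:
        # Keep the rescued example so the selector agent can derive the next version.
        training_queue.append((field, html, value))
        return value, "llm"
    return None, "miss"
```

Tagging each value with its source also gives the per-field success rates their numerator: only `"selector"` counts as a hit.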
Anti-bot via Oxylabs
Residential proxies, JS rendering, configurable cadence and jitter. Block rates feed back into the fetch policy.
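One way block rates could feed back into a per-site fetch policy is jittered pacing with multiplicative back-off. This is a sketch under stated assumptions: `FetchPolicy` and its defaults are hypothetical and are not Oxylabs settings or API parameters.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FetchPolicy:
    """Per-site pacing. All defaults are illustrative assumptions, not Oxylabs settings."""
    base_delay_s: float = 2.0
    jitter_frac: float = 0.5     # randomize +/- 50% around the current delay
    max_delay_s: float = 60.0
    backoff_above: float = 0.10  # block rate that triggers a slowdown
    recover_below: float = 0.02  # block rate that allows speeding back up
    delay_s: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.delay_s = self.base_delay_s

    def next_delay(self, rng: random.Random) -> float:
        """Jittered wait before the next request to this site."""
        spread = self.delay_s * self.jitter_frac
        return rng.uniform(self.delay_s - spread, self.delay_s + spread)

    def update(self, blocked: int, attempts: int) -> float:
        """Feed one window's block rate back into pacing: double on blocks, halve on recovery."""
        rate = blocked / attempts if attempts else 0.0
        if rate > self.backoff_above:
            self.delay_s = min(self.delay_s * 2, self.max_delay_s)
        elif rate < self.recover_below:
            self.delay_s = max(self.delay_s / 2, self.base_delay_s)
        return self.delay_s
```

The gap between `backoff_above` and `recover_below` keeps the policy from oscillating when the block rate hovers near a single threshold.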
Drift detection
Per-field distribution checks (null rate, value range, type) run per competitor, per day. Drift raises an alarm before downstream consumers see it.
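A minimal sketch of the three distribution checks, comparing today's profile of one field against a baseline profile. The function names and tolerances (`max_null_increase`, `range_tolerance`) are assumptions; a production version would likely use longer baselines and per-field settings.

```python
from typing import Any, Dict, List, Sequence


def field_profile(values: Sequence[Any]) -> Dict[str, Any]:
    """Summarize one field for one competitor on one day: null rate, types, numeric range."""
    present = [v for v in values if v is not None]
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "null_rate": 1 - len(present) / len(values) if values else 1.0,
        "types": sorted({type(v).__name__ for v in present}),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


def drift_alarms(today: Dict[str, Any], baseline: Dict[str, Any],
                 max_null_increase: float = 0.05, range_tolerance: float = 0.5) -> List[str]:
    """Compare today's profile with a baseline profile; tolerances are assumptions."""
    alarms: List[str] = []
    if today["null_rate"] - baseline["null_rate"] > max_null_increase:
        alarms.append(f"null_rate {baseline['null_rate']:.2f} -> {today['null_rate']:.2f}")
    if today["types"] != baseline["types"]:
        alarms.append(f"type {baseline['types']} -> {today['types']}")
    if None not in (today["min"], baseline["min"]):
        low = baseline["min"] * (1 - range_tolerance)
        high = baseline["max"] * (1 + range_tolerance)
        if today["min"] < low or today["max"] > high:
            alarms.append(f"range [{today['min']}, {today['max']}] outside [{low}, {high}]")
    return alarms
```

A price field that starts arriving as `"$400,000"` strings, for example, trips the type check even though every row still has a value.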
Replayable raw store
Every fetched HTML is versioned in S3. Any record can be re-parsed against a new selector — no re-fetching the site.
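Replay reduces to "iterate stored snapshots, apply the new parser". In the sketch below, an in-memory dict keyed by `(url, fetched_at)` stands in for the versioned S3 bucket (an assumption for illustration), and `replay` is a hypothetical name.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

# In-memory stand-in for the versioned S3 raw store: (url, fetched_at) -> HTML.
RawStore = Dict[Tuple[str, datetime], str]
Parser = Callable[[str], Dict[str, Optional[str]]]


def replay(store: RawStore, parse: Parser, since: datetime) -> List[Dict[str, Optional[str]]]:
    """Re-parse every snapshot fetched at or after `since` with a (new) parser.

    Only stored HTML is read, so history can be rebuilt after a selector change
    without touching the competitor site.
    """
    records = []
    for (url, fetched_at), html in sorted(store.items()):
        if fetched_at >= since:
            record = dict(parse(html))
            record["url"] = url
            record["fetched_at"] = fetched_at.isoformat()
            records.append(record)
    return records
```

Because the fetch timestamp travels with each re-parsed record, rebuilt history can be compared row for row against what was originally landed.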
Triage console
A single screen lists failing competitors / fields, the offending HTML diff, and a one-click "regenerate selector" action.
Resilience · Quality is a closed loop, not an SLA hope
09  Phased plan · 16 weeks

Four months, four phases, one early proof point.

Phase 01  Foundation · Bedrock agents · S3 raw store · selector registry
Phase 02  First competitor end-to-end · pricing · inventory · community detail · landed in Athenix · ★ M1: end-to-end proof
Phase 03  Generalization & onboarding · config-driven framework · 2nd, 3rd, 4th competitor live
Phase 04  Hardening & handoff · triage console · runbooks · documentation · enablement
Solo extension · Amin · 16 weeks · ends with internal Lennar team owning it
★ M1 at week 6 · ★ M2 = handoff at week 16
10  Early proof point

Week 6. One competitor, all three data types, end-to-end in Athenix.

6 weeks to first end-to-end run
3/3 data types covered at M1
1→N: a framework, not a one-off
M1 acceptance criteria
For one chosen competitor, the daily run produces pricing, inventory and community detail rows in Athenix tables that downstream BI can already query, at parity with or better than today's Apify drop and on the same schema contract.
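"At parity, on the same schema contract" can be checked mechanically per competitor and day. A minimal sketch, assuming rows are dicts and a business key (for example community and plan) identifies a row; the parity rule encoded here (same columns and no rows lost, with changed values sent to review) is an assumption, not a defined acceptance test.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def parity_report(apify_rows: Sequence[Row], new_rows: Sequence[Row],
                  key: Sequence[str]) -> Dict[str, Any]:
    """Compare today's Apify drop with the new platform's rows for one competitor and day."""
    def index(rows: Sequence[Row]) -> Dict[Tuple[Any, ...], Row]:
        return {tuple(r[k] for k in key): r for r in rows}

    old, new = index(apify_rows), index(new_rows)
    old_cols = set().union(*(r.keys() for r in apify_rows))
    new_cols = set().union(*(r.keys() for r in new_rows))
    missing = sorted(old.keys() - new.keys())  # rows Apify had that we lack
    extra = sorted(new.keys() - old.keys())    # rows only the new platform found
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {
        "schema_match": old_cols == new_cols,
        "missing_keys": missing,
        "extra_keys": extra,
        "changed_keys": changed,
        # Assumed parity rule: same columns and no lost rows; changed values go to review.
        "parity": old_cols == new_cols and not missing,
    }
```

Extra rows count toward "better than" rather than against parity, since they represent coverage Apify did not deliver.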
What ships with M1
  • Route, Schema, Selector, Fetcher, Parser, Validator agents in Bedrock.
  • Selector registry & raw HTML store in S3.
  • dbt models landing into Athenix on the existing contract.
11  Cost shape

Fixed build cost. Variable run cost. Flat onboarding.

Chart · $ / month vs. number of competitors (1, 3, 6, N): Apify vs. proposed, marking today (~4–5 competitors) and the scale frontier.
Illustrative shape. Real curves land after the first trial runs.
FIXED · ENGAGEMENT
Toptal · Amin · 16 weeks · solo extension. One bill, ends at handoff.
VARIABLE · RUNTIME
Bedrock tokens (agent reasoning) · Oxylabs fetches (per-page) · AgentQL calls (mostly onboarding + drift events).
FLAT · ONBOARDING
Adding the 5th, 6th, Nth competitor consumes the same fixed framework — no new external SOW, no Apify-style step-up.
→ FOR THE PROPOSAL
Concrete unit economics fall out of the first 2–3 trial runs in Phase 01, well before any spend at scale.
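The fixed / variable / onboarding split can be written as two cumulative cost functions and a break-even point. This is a sketch of the shape only: every parameter is a placeholder to be filled from the Phase 01 trial runs, no numbers are implied, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApifyShape:
    build_per_competitor: float  # one-off custom scraper build
    per_delivery: float          # charge per competitor per daily delivery


@dataclass
class PlatformShape:
    engagement_fixed: float           # one engagement bill, ends at handoff
    onboarding_per_competitor: float  # mostly AgentQL calls during bootstrap
    run_per_competitor_day: float     # Bedrock tokens + Oxylabs fetches for one daily run


def apify_cost(a: ApifyShape, competitors: int, days: int) -> float:
    """Cumulative cost: linear in competitors and linear in days."""
    return competitors * (a.build_per_competitor + days * a.per_delivery)


def platform_cost(p: PlatformShape, competitors: int, days: int) -> float:
    """Cumulative cost: fixed build, per-competitor onboarding, variable run cost."""
    return p.engagement_fixed + competitors * (
        p.onboarding_per_competitor + days * p.run_per_competitor_day)


def break_even_days(a: ApifyShape, p: PlatformShape, competitors: int) -> Optional[float]:
    """Days until cumulative costs are equal; None if the run cost never undercuts Apify."""
    daily_saving = competitors * (a.per_delivery - p.run_per_competitor_day)
    if daily_saving <= 0:
        return None
    upfront_gap = p.engagement_fixed + competitors * (
        p.onboarding_per_competitor - a.build_per_competitor)
    return max(upfront_gap, 0.0) / daily_saving
```

Once the trial runs supply real values for `run_per_competitor_day` and `onboarding_per_competitor`, the break-even can be read off for today's ~4–5 competitors and for larger N.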
12  Risks & dependencies

What I want flagged before kickoff.

Risk | Why it matters | Mitigation
Site-side anti-bot | Competitor sites can tighten defenses mid-engagement. | Oxylabs residential proxy · jittered cadence · per-site fetch policy that adapts on block signal.
Selector drift on UI changes | Selectors break silently when sites redesign. | Per-field success monitoring · LLM fallback · auto-regeneration via Selector agent · raw HTML replay.
Athenix contract drift | Downstream BI & Pricing Machine assume today's Apify schema. | Land on identical contract at M1 · diff report against Apify · cutover only on parity.
Bedrock / AgentCore access | Need IAM, model access, and AgentCore parity with the email-agent system on day 1. | Replicate the email-agent IAM & deployment template in week 1.
Vendor cost surprises | Oxylabs / AgentQL / Bedrock token costs are only fully visible after live runs. | Phase 01 sets a per-competitor cost ceiling before scaling. Stop / re-cost gate.
Legal / ToS posture | Public-surface scraping raises ToS and rate-of-access questions. | Legal review of target list before Phase 02 · public data only · no auth, no PII, no dark-pattern bypass.
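The stop / re-cost gate for vendor cost surprises can be stated as a small rule over trial-run costs. A minimal sketch, assuming trial costs are recorded per competitor-day and split by vendor line; `cost_gate` and its return values are hypothetical, and the ceiling itself is whatever Phase 01 sets.

```python
from typing import Dict, List, Tuple

# Trial-run cost per competitor-day, split by vendor line (e.g. bedrock, oxylabs, agentql).
TrialCosts = Dict[str, Dict[str, float]]


def cost_gate(trial: TrialCosts, ceiling_per_competitor_day: float) -> Tuple[str, List[str]]:
    """Scale only if every trial competitor is at or under the ceiling.

    Returns ("scale", []) or ("stop_and_recost", offending competitors).
    With no trial data there is nothing to justify scaling, so the gate stays closed.
    """
    if not trial:
        return "stop_and_recost", []
    totals = {name: sum(parts.values()) for name, parts in trial.items()}
    offenders = sorted(name for name, total in totals.items()
                       if total > ceiling_per_competitor_day)
    return ("stop_and_recost", offenders) if offenders else ("scale", [])
```

Keeping the per-vendor split in the input makes it obvious which line (tokens, fetches, or AgentQL calls) pushed a competitor over the ceiling.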
13  Engagement & handoff

A solo extension that ends with Lennar owning the platform.

Shape
Solo extension for Amin. Tech proposal, execution, and handoff — all under one owner.
Duration
16 weeks. M1 at week 6 (one competitor end-to-end). M2 at week 16 (handoff complete).
Boundary
Apify untouched throughout. Email-agent system untouched. New platform runs in parallel until parity is proven.
Handoff package · ships in Phase 04
01  Architecture & ADRs
Decision records for every key technical choice.
02  Runbooks
Onboarding a competitor. Triaging a drift alarm. Re-parsing history.
03  Triage console
The single screen the on-call uses.
04  Enablement sessions
Pair-coding weeks with the receiving Lennar team.
Same pattern. Same stack. New surface area. Owned in-house.
→ READY TO KICK OFF