Case Study

Distressed Asset Intelligence Platform

Automated scoring and prioritization across 335,000+ NYC commercial properties

The Problem

Manual Research Cannot Scale to Market Size

New York City contains over one million commercial and residential properties spread across five boroughs. Within that universe, a meaningful but unknown number of assets are under financial or physical distress at any given moment: behind on taxes, accumulating code violations, facing litigation, or carrying unsustainable debt loads.

Identifying these properties has traditionally been an analyst-driven process. A researcher manually pulls records from city databases, cross-references court filings, checks violation histories, and builds a picture of each property one at a time. A skilled analyst might evaluate 20 to 30 properties per day with any depth. At that rate, covering the full market would take decades.

The result is that most distressed asset sourcing operates on fragments of the available data. Firms rely on personal networks, one-off tip sheets, or narrowly scoped searches. Opportunities surface late, inconsistently, or not at all. The data exists to do better: NYC publishes millions of records across housing violations, building permits, property transactions, and court filings. But no one had assembled it into a single, scorable view of the entire market.

Why This Matters Now

According to McKinsey's 2025 research on AI adoption, 80% of companies cite data limitations as the primary roadblock to scaling agentic AI systems. The challenge is not building models; it is assembling the structured data pipelines those models need to operate on. In commercial real estate, this data problem is particularly acute: the information is public but scattered across dozens of agencies with incompatible formats and identifiers.

Source: McKinsey Global Survey on AI, 2025

The Approach

Build the Data Infrastructure First

Rather than starting with a model and hoping the data would follow, FGP took the opposite approach: build a complete data ingestion layer that normalizes every relevant NYC public dataset around a single universal property key, then design scoring logic on top of that foundation.

The system ingests data from the NYC Department of Finance, Housing Preservation and Development, Department of Buildings, Automated City Register Information System (ACRIS), the County Clerk, and HMDA census tract lending data. Every record is normalized to a 10-digit Borough-Block-Lot (BBL) identifier, the universal key that links a tax lien sale record to the same property's building permits, court filings, and mortgage history.
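As a sketch, building the BBL key from a record's borough, block, and lot fields might look like the following. The function and field handling are illustrative, but the 10-digit format itself (1-digit borough code, zero-padded 5-digit tax block, zero-padded 4-digit tax lot) is the standard NYC convention.

```python
def to_bbl(borough: int, block: int, lot: int) -> str:
    """Build the 10-digit BBL key: 1-digit borough code,
    zero-padded 5-digit tax block, zero-padded 4-digit tax lot."""
    if not 1 <= borough <= 5:
        raise ValueError(f"invalid borough code: {borough}")
    return f"{borough}{block:05d}{lot:04d}"

# Manhattan (borough 1), block 47, lot 7 -> "1000470007"
```

Once every source record carries this key, a tax lien row, a permit row, and a court filing row for the same building all join on a single string.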

Each property is then scored across 13 distinct distress signals. Signals are weighted by severity, adjusted for building size through per-unit normalization, decayed by recency, and amplified when multiple signals converge through combo multipliers. The output is a single, rank-orderable distress score that reflects both the depth and concentration of stress on each asset.
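The scoring mechanics described above can be sketched roughly as follows. The half-life, combo bonus, and tuple representation of a signal are illustrative assumptions for this sketch, not the platform's actual parameters.

```python
from datetime import date

def decayed(points: float, event_date: date, as_of: date,
            half_life_days: float = 365.0) -> float:
    """Recency decay: a signal's points halve every half_life_days."""
    age_days = (as_of - event_date).days
    return points * 0.5 ** (age_days / half_life_days)

def score_property(signals, units: int, as_of: date,
                   combo_bonus: float = 0.10) -> float:
    """signals: list of (weight, event_date, per_unit) tuples.
    Per-unit signals are divided by unit count so large buildings
    are not over-penalized; converging signals earn a combo multiplier."""
    base = 0.0
    for weight, event_date, per_unit in signals:
        pts = decayed(weight, event_date, as_of)
        if per_unit:
            pts /= max(units, 1)
        base += pts
    extra_signals = max(len(signals) - 1, 0)
    return base * (1 + combo_bonus * extra_signals)
```

With these stand-in parameters, a property with two fresh signals worth 9 and 6 points would score (9 + 6) × 1.1 = 16.5, reflecting the convergence amplification the text describes.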

The design prioritizes transparency. Every score includes a full signal breakdown showing exactly which factors contributed and at what weight. An analyst reviewing a flagged property can immediately see whether the score is driven by active litigation, delinquent taxes, code violations, overleveraged debt, or some combination. No black box.

The Research Supports This Sequence

McKinsey's analysis of AI-driven operations finds that workflow redesign, not model sophistication, is the strongest predictor of measurable AI impact. Organizations that restructured their data and process architecture before deploying AI saw significantly higher returns than those that layered AI onto existing workflows. The Distressed Asset Intelligence Platform followed this principle: the data pipeline and scoring architecture were designed as a complete system, not an add-on to manual research.

Source: McKinsey, “Why agents are the next frontier of generative AI,” 2025

System Architecture

How the Platform Works

1. Data Ingestion: 23 datasets from NYC open data APIs
2. Normalization: BBL key unification across all sources
3. Signal Scoring: 13 weighted signals with decay and normalization
4. Tiered Output: Ranked properties with full breakdown

Operational Throughput

335,454 properties

Every scored property is evaluated across all 13 distress signals simultaneously, with each signal applying tiered scoring, per-unit normalization, and temporal decay: 335,454 properties × 13 signals works out to over 4.3 million individual signal calculations per run.

Time Compression

Weeks → Hours

What previously required weeks of manual research to evaluate a few hundred properties now executes across the entire NYC market in a single automated run. Full pipeline ingestion and scoring completes in hours, not weeks.

Automation Rate

95%+ Automated

From API ingestion through scoring and tiered output, the pipeline runs end-to-end without manual intervention. Human review is reserved for the final stage: evaluating the top-tier opportunities that the system surfaces.
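One way to picture a hands-off pipeline of this shape is a simple stage chain in which each stage consumes the previous stage's output. The stage functions below are trivial stand-ins, not the platform's actual code.

```python
from typing import Any, Callable, Iterable

def run_pipeline(seed: Any, stages: Iterable[Callable[[Any], Any]]) -> Any:
    """Run each stage on the previous stage's output, with no manual steps."""
    out = seed
    for stage in stages:
        out = stage(out)
    return out

# Stand-in stages mimicking ingest -> score -> tier:
ingest = lambda _: [{"bbl": "1000470007"}, {"bbl": "2000010001"}]
score = lambda props: [{**p, "score": 10 * i} for i, p in enumerate(props, 1)]
tier = lambda props: sorted(props, key=lambda p: -p["score"])

ranked = run_pipeline(None, [ingest, score, tier])
```

The same chain runs identically on a schedule, which is what makes the 95%+ automation figure possible: humans enter only after `tier` produces its ranked output.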

System Scale

23 Sources, 13 Models

Data is drawn from 23 distinct datasets across 6 NYC agencies. Scoring uses 13 independent signal models with 4 combo multipliers, 3 temporal decay schedules, and per-unit normalization across 3 signal types. The full pipeline processes ~15 GB of raw data into ~2 GB of structured, queryable output.

The 13 Distress Signals

Each property is evaluated against every signal. Points are weighted by severity, with the most actionable indicators carrying the highest weight.

10 pts: Lis Pendens (Pre-Foreclosure Litigation)
9 pts: Tax Lien Sale List
9 pts: Class C HPD Violations (Immediately Hazardous)
7 pts: Mechanic's Liens (Unpaid Contractors)
7 pts: Mortgage Stacking (3+ Mortgages in 36 Months)
6 pts: ECB Violations (Outstanding Balance)
6 pts: High Loan-to-Value Ratio
5 pts: Rent-Impairing Violations
5 pts: DOB Safety Violations
4 pts: Class B HPD Violations (Hazardous)
3 pts: Assessment Value Decline
2 pts: No Building Permits in 10+ Years
1 pt: High Mortgage Denial Census Tract
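The weight list above translates directly into a plain mapping, which is also what makes the analyst-facing signal breakdown straightforward to produce. The key names here are shorthand invented for illustration.

```python
# Weights copied from the signal list above; key names are shorthand.
SIGNAL_WEIGHTS = {
    "lis_pendens": 10,        # pre-foreclosure litigation
    "tax_lien_sale": 9,
    "hpd_class_c": 9,         # immediately hazardous violations
    "mechanics_lien": 7,
    "mortgage_stacking": 7,   # 3+ mortgages in 36 months
    "ecb_outstanding": 6,
    "high_ltv": 6,
    "rent_impairing": 5,
    "dob_safety": 5,
    "hpd_class_b": 4,         # hazardous violations
    "assessment_decline": 3,
    "no_permits_10y": 2,
    "high_denial_tract": 1,
}

def breakdown(active: list[str]) -> dict[str, int]:
    """Per-signal contributions for an analyst-facing score breakdown."""
    return {name: SIGNAL_WEIGHTS[name] for name in active}
```

A flagged property with active litigation and a tax lien, for example, would show `{"lis_pendens": 10, "tax_lien_sale": 9}` in its breakdown, making the "no black box" claim concrete.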

Results

Tiered Output for Prioritized Action

The platform produces a fully scored and tiered view of every property in the NYC market. Rather than delivering a flat list, the system segments properties into actionable tiers based on distress severity, allowing different teams and strategies to focus on the segment that matches their risk appetite and deal structure.

The primary output segment, properties scoring 30 or above, contains 6,535 assets showing meaningful convergence of multiple distress signals. Within that group, 1,750 properties score above 40, representing severe multi signal distress with combo multiplier amplification. The system also tracks signal convergence across the full market: 139,000 properties show two or more simultaneous signals, and 21,020 exhibit five or more converging indicators of deep distress.

Score Band | Classification | Properties | Interpretation
40+ | Severe | 1,750 | Maximum signal density with combo amplification
30 – 39.9 | High | ~4,785 | Strong financial and physical distress convergence
20 – 29.9 | Elevated | ~14,500 | Multiple converging distress signals
10 – 19.9 | Moderate | ~65,000 | Two or more meaningful signals present
1 – 9.9 | Low | ~249,000 | Early or isolated signal activity
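The score bands above map to a simple classifier. This is a sketch with band names and thresholds taken straight from the table; the handling of scores below 1 is an assumption.

```python
def classify(score: float) -> str:
    """Map a distress score to its tier per the band table."""
    if score >= 40:
        return "Severe"
    if score >= 30:
        return "High"
    if score >= 20:
        return "Elevated"
    if score >= 10:
        return "Moderate"
    if score >= 1:
        return "Low"
    return "Unscored"
```

Segmenting this way lets an acquisitions team filter straight to "Severe" while a longer-horizon strategy watches the "Moderate" band for escalation.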

Context: What This Type of Build Typically Costs

Industry benchmarks for mid-market AI implementations (systems involving data pipeline construction, custom scoring models, and production deployment) typically range from $60,000 to $150,000. FGP built this system internally using open source tools at near-zero infrastructure cost, demonstrating the kind of efficiency that becomes possible when the build is led by practitioners who understand both the domain and the technology.

Source: Sparkout Tech Solutions, “AI Implementation Cost Analysis,” 2025

How This Applies to Your Business

The Same Patterns, Applied to Your Operations

The Distressed Asset Intelligence Platform is FGP's internal production system, built by the same team that works with clients. Every component of this system maps directly to the services FGP offers through its AI and automation practice. The patterns used here (structured data ingestion, scoring model design, automated pipeline orchestration, and tiered output for human review) are the same patterns we apply to client engagements across industries.

Readiness Assessment

Identifying the Opportunity

Before building anything, FGP mapped the full landscape of NYC's public data sources, evaluated data quality and coverage, and identified which signals would carry predictive value for distressed asset identification. This is the same discovery process we run with clients: understanding what data exists, where the gaps are, and which workflows have the highest automation potential.

Workflow Build

Constructing the System

The pipeline itself (23 data source integrations, 13 scoring models, normalization logic, temporal decay, and combo multipliers) represents a full workflow build. FGP designed, developed, and deployed this system end to end. For clients, this phase involves the same pattern: translating an assessed opportunity into a working, automated system with measurable output.

Managed Operations

Keeping It Running

The platform runs on a recurring schedule, ingesting updated data from city APIs, recalculating scores, and producing fresh tiered output. This ongoing operation (monitoring data quality, adjusting signal weights as market conditions shift, and ensuring system reliability) mirrors the managed operations FGP provides to clients who need their systems maintained and optimized over time.

Get Started

Ready to Build Your Intelligence Layer?

Whether you are looking to automate a manual research process, build a scoring system for your market, or scale an existing data pipeline, FGP can help you get there.

Start a Conversation