Performance problems in embedded reporting are disproportionately visible. A slow dashboard is immediately obvious to the customer using it. A report that takes eight seconds to load becomes a complaint on your next support call. And unlike bugs that surface only under specific conditions, slow reporting hits every user who loads that report — at the worst possible time, which is usually the busiest part of the day, when concurrent usage peaks.

The challenge is that performance problems at scale are almost invisible during development and early deployment. A reporting layer that feels fast against a dev environment with sample data and two test tenants behaves very differently against production data volumes with 80 concurrent tenants all loading dashboards at 9am Monday morning.

What Degrades as You Scale

Query performance against growing data

Analytical queries are inherently data-intensive — they aggregate, join, and filter across large datasets. A report that returns in 400ms against a tenant with one year of transaction data may take 6–8 seconds against a tenant with five years of data. The underlying query hasn't changed; the dataset has. Without a caching layer, every report load hits the database fresh, and load time scales directly with data volume.

This is the most predictable performance problem in embedded reporting, and also the most commonly underestimated. Teams test against their current data volumes and project forward linearly, without accounting for the fact that analytical query performance degrades non-linearly as data grows.

Concurrent tenant load

Tenant usage patterns cluster. When your customers are in the same time zones, they're loading dashboards at the same times — Monday morning, end of quarter, right after a scheduled delivery window. In a shared database architecture, concurrent heavy queries compete for database resources. One tenant running a complex aggregation can slow query response for others sharing the same database instance.

In a per-tenant database architecture, concurrency at the database level is isolated by design. But the analytics platform itself can still be a bottleneck — connection handling, query routing, result serialization — if it isn't architected for concurrent multi-tenant load.

Scheduled report delivery at scale

When 60 tenants all have reports scheduled for 7am, the job queue processing those exports needs to handle them concurrently, not sequentially. A sequential processor creates a delivery window that expands with tenant count — the last tenant in the queue gets their "7am" report at 9am. Production-ready scheduling means concurrent job processing with configurable worker counts, failure handling, and retry logic that doesn't require manual intervention when an individual export fails.
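The gap between sequential and concurrent delivery can be sketched with a minimal worker pool. Everything here is illustrative — the function names, the worker count, and the retry policy are assumptions for the sketch, not any platform's actual scheduler:

```python
import concurrent.futures
import time

MAX_WORKERS = 8   # configurable worker count
MAX_RETRIES = 2   # automatic retries before flagging for review

def run_export(tenant_id):
    """Placeholder for the real work: query, render, deliver the report."""
    return f"report for {tenant_id} delivered"

def run_with_retry(tenant_id):
    # Retry an individual failed export without manual intervention;
    # only flag it for review after retries are exhausted.
    for attempt in range(MAX_RETRIES + 1):
        try:
            return run_export(tenant_id)
        except Exception:
            if attempt == MAX_RETRIES:
                return f"export for {tenant_id} failed; flagged for review"
            time.sleep(2 ** attempt)  # exponential backoff between retries

def deliver_all(tenant_ids):
    # Concurrent processing: 60 exports finish in roughly
    # ceil(60 / MAX_WORKERS) batches instead of 60 sequential runs,
    # so the last tenant's "7am" report still arrives near 7am.
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(run_with_retry, tenant_ids))

results = deliver_all([f"tenant-{i}" for i in range(60)])
```

The key property is that one tenant's failed export retries in its own worker without blocking the rest of the queue.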

In-Memory Caching — The Primary Performance Solution

The most effective solution to query performance at scale is in-memory caching: the first query runs against the database and the result is stored in memory; subsequent requests for the same report serve the cached result instantly, without touching the database at all.

For this to work correctly in a multi-tenant environment, three things must be true:

Per-tenant cache isolation. Cached results from Tenant A must never be served to Tenant B. The cache must be namespaced by tenant identity — not just by report ID. A caching implementation without tenant namespacing is both a correctness risk (serving another tenant's stale query result as if it were fresh data) and a security risk.

Configurable invalidation per report. Different reports have different freshness requirements. A real-time inventory dashboard should invalidate its cache frequently; a monthly summary report can cache for hours without becoming misleading. All-or-nothing cache invalidation forces a tradeoff: either you cache aggressively and serve stale data on time-sensitive reports, or you invalidate frequently and lose the performance benefit on stable reports.

Memory residence, not disk. Disk-based caching is faster than re-running queries but still adds measurable latency. In-memory result storage means retrieval is fast enough to be effectively instantaneous from the end user's perspective — the difference between a 50ms response and a 400ms response is the difference between "fast" and "noticeable."
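Taken together, these three requirements can be sketched as a small in-memory cache keyed by (tenant, report) with a per-report TTL. This is an illustrative model of the pattern, not FastCache's implementation — the report names and TTL values are made up:

```python
import time

# Per-report freshness requirements in seconds -- illustrative values.
REPORT_TTL = {
    "inventory_dashboard": 60,     # near-real-time: invalidate frequently
    "monthly_summary": 6 * 3600,   # stable: safe to cache for hours
}

class TenantReportCache:
    def __init__(self):
        # (tenant_id, report_id) -> (expires_at, result), held in memory
        self._store = {}

    def get(self, tenant_id, report_id):
        # The key includes the tenant, so Tenant A's cached result can
        # never satisfy Tenant B's request for the same report.
        entry = self._store.get((tenant_id, report_id))
        if entry and entry[0] > time.monotonic():
            return entry[1]   # in-memory hit: effectively instant
        return None           # miss or expired: caller re-runs the query

    def put(self, tenant_id, report_id, result):
        ttl = REPORT_TTL.get(report_id, 300)  # default 5-minute TTL
        self._store[(tenant_id, report_id)] = (time.monotonic() + ttl, result)

cache = TenantReportCache()
cache.put("tenant-a", "monthly_summary", {"total": 1200})
hit = cache.get("tenant-a", "monthly_summary")    # served from memory
miss = cache.get("tenant-b", "monthly_summary")   # isolation: None
```

The per-report TTL table is what avoids the all-or-nothing tradeoff: the inventory dashboard and the monthly summary each get the invalidation window their freshness requirements demand.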

Yurbi FastCache

FastCache stores query results in memory with per-tenant isolation enforced — cached data from one tenant is never served to another. Cache invalidation is configurable per report. Included in every plan at no additional cost. For ISVs with large data volumes or high concurrent user counts, FastCache is the primary performance lever — and the reason Yurbi deployments at 100+ tenants perform consistently without requiring custom query optimization work from your team.

Multi-Server Deployments

At sufficient scale — very high concurrent user counts, many tenants, or enterprise customers requiring dedicated performance — a single production server becomes a constraint. Multi-server deployment distributes load across multiple instances behind a load balancer.

Before you need this, understand the model your platform uses. Can you add production servers without changing your licensing tier? Is the cost per additional server predictable? Can individual customers be assigned to dedicated instances for performance isolation or data sovereignty reasons?

Yurbi's model is $500/server/year at list rate, with volume discounts above 10 additional servers. There's no cap on server count — ISV deployments in our sweet spot run 100+ servers. The per-server model means you grow server capacity with your customer base at a predictable incremental cost, without a licensing restructure every time you need more capacity.

Testing Performance Before You're in Production

The practical challenge is that meaningful performance testing requires realistic data volumes and concurrent load — conditions that don't exist in a trial environment with sample data. A few approaches that help:

Test against a copy of your largest customer's production database, not sample data. Load times against realistic data volumes are the number that matters.

Ask the vendor for benchmark data at tenant counts and data volumes similar to your 18-month projection.

Understand how caching is configured by default and what your team needs to do to optimize it for your specific query patterns.

Confirm the scheduled delivery architecture handles concurrent jobs — ask specifically what happens when 50 reports are all scheduled for the same time.
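A minimal concurrency smoke test can simulate the "50 reports at the same time" scenario before production. The sketch below fires simultaneous requests and summarizes the latency distribution; the `load_report` body is a stand-in you would replace with a real HTTP call against a staging environment:

```python
import concurrent.futures
import statistics
import time

def load_report(tenant_id):
    """Stand-in for one tenant loading a report. Replace the sleep
    with a real request against staging to get meaningful numbers."""
    start = time.monotonic()
    time.sleep(0.01)  # simulated report load
    return time.monotonic() - start

def concurrent_load(n_tenants=50):
    # Fire all requests at once, the way 50 tenants would at 7am.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_tenants) as pool:
        latencies = sorted(pool.map(load_report, range(n_tenants)))
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(len(latencies) * 0.95)],
        "max": latencies[-1],
    }

stats = concurrent_load()
```

Watch the p95 and max values, not the median: under concurrent load, the slowest tenant's experience is the one that generates the support call.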

FastCache. $500/server. Predictable at scale.

In-memory query caching with per-tenant isolation, included in every plan. Additional production servers at $500/server/year — no tier restructure required as you grow.

See Full Pricing