Lessons from the Inside: What I Learned Building Analytics at Google and Twitter
February 28, 2024 • 10 min read
Ten years of building large-scale analytics systems at Google and Twitter taught me lessons that no computer science course could. The gap between academic knowledge and production reality is vast, especially when you're dealing with systems that serve billions of users and process petabytes of data daily.
This is what I wish I had known when I started: lessons learned from building systems that millions of creators, advertisers, and businesses depend on every day.
Scale Changes Everything
At university, we learn about Big O notation and algorithmic complexity. In production, you discover that constant factors matter more than you think, and the real bottlenecks are rarely where you expect them.
At Google, I worked on AdSense analytics systems that processed billions of ad impressions daily. The challenge wasn't just computational—it was operational. How do you deploy changes to systems that can't go down? How do you debug issues when your dataset is too large to examine manually?
Real-time Analytics Pipeline Architecture
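```python
# Streaming data ingestion at scale
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# PROJECT, parse_event, add_processing_timestamp, and combine_analytics_metrics
# are application-specific and assumed to be defined elsewhere.
# to_bigquery_row is a hypothetical helper added so the write step receives dicts.


def process_analytics_events():
    # Reading from Pub/Sub is unbounded, so the pipeline must run in streaming mode.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | 'Read from Pub/Sub' >> beam.io.ReadFromPubSub(
                subscription=f'projects/{PROJECT}/subscriptions/analytics-events'
            )
            # parse_event must emit (key, value) pairs for CombinePerKey below.
            | 'Parse JSON' >> beam.Map(parse_event)
            | 'Add Timestamps' >> beam.Map(add_processing_timestamp)
            | 'Window by Minutes' >> beam.WindowInto(
                beam.window.FixedWindows(60)  # 1-minute windows
            )
            | 'Aggregate Metrics' >> beam.CombinePerKey(combine_analytics_metrics)
            # WriteToBigQuery expects dict rows, so convert each (key, metrics) pair.
            | 'Format Rows' >> beam.Map(to_bigquery_row)
            | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
                table='analytics.real_time_metrics',
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```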
The Human Factor is Everything
The most sophisticated analytics system is useless if people don't trust it or understand it. At Twitter, I learned that building great analytics tools is 20% engineering and 80% psychology.
Creators and small businesses using Twitter's analytics weren't data scientists. They needed insights, not dashboards full of metrics. This taught me the importance of contextual intelligence over raw data access.
"The best analytics system is the one that answers the question you didn't know you had."
Real-Time vs. Right-Time
One of the biggest mistakes I see teams make is obsessing over real-time data when what they actually need is right-time data: data that is fresh enough for the decision it supports. Real-time analytics are expensive to build and maintain, and often provide little additional value over near-real-time systems.
Building Efficient Analytics APIs
```typescript
interface AnalyticsQuery {
  metrics: string[];
  dimensions: string[];
  dateRange: {
    start: string;
    end: string;
  };
  filters?: Record<string, unknown>;
  granularity: 'hour' | 'day' | 'week' | 'month';
}

// AnalyticsResult is the application's result type, assumed to be defined elsewhere.
interface CachedResult {
  data: AnalyticsResult;
  timestamp: number; // when the entry was cached (ms since epoch)
  ttl: number; // how long the entry stays fresh (ms)
}

class AnalyticsAPI {
  private cache: Map<string, CachedResult> = new Map();

  async query(params: AnalyticsQuery): Promise<AnalyticsResult> {
    const cacheKey = this.generateCacheKey(params);

    // Lesson: Cache aggressively, invalidate smartly
    const cached = this.cache.get(cacheKey);
    if (cached && !this.isStale(cached, params.granularity)) {
      return cached.data;
    }

    const result = await this.executeQuery(params);

    // Cache with TTL based on data freshness requirements
    this.cache.set(cacheKey, {
      data: result,
      timestamp: Date.now(),
      ttl: this.getTTL(params.granularity)
    });

    return result;
  }

  private getTTL(granularity: AnalyticsQuery['granularity']): number {
    // Lesson: Match cache TTL to user expectations
    switch (granularity) {
      case 'hour': return 5 * 60 * 1000; // 5 minutes
      case 'day': return 30 * 60 * 1000; // 30 minutes
      case 'week': return 4 * 60 * 60 * 1000; // 4 hours
      default: return 24 * 60 * 60 * 1000; // 24 hours (covers 'month')
    }
  }

  // generateCacheKey, isStale, and executeQuery are omitted for brevity.
}
```
Observability Over Dashboards
After years of building dashboards that nobody looked at, I learned that what people really need is observability—the ability to understand what's happening in their business when something changes.
This insight eventually led to Findly. Instead of building better dashboards, we built systems that could answer questions conversationally and proactively surface insights when patterns change.
Data Quality is Job #1
No amount of sophisticated analysis can overcome poor data quality. At both Google and Twitter, I spent more time on data validation, cleaning, and quality monitoring than on actual analytics features.
- Validate at ingestion: Catch bad data before it enters your system
- Monitor distributions: Track how your metrics change over time
- Build automated alerts: Know when data quality degrades
- Document everything: Future you will thank present you
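The first three bullets can be sketched in a few lines of Python. This is a minimal illustration, not code from any production system: the field names in `REQUIRED_FIELDS`, the function names, and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical event schema, used only for this illustration.
REQUIRED_FIELDS = {"event_id", "user_id", "event_type", "timestamp"}


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event is accepted."""
    missing = sorted(REQUIRED_FIELDS - set(event))
    problems = [f"missing field: {name}" for name in missing]
    ts = event.get("timestamp")
    if ts is not None and (not isinstance(ts, (int, float)) or ts <= 0):
        problems.append("timestamp must be a positive number")
    return problems


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


if __name__ == "__main__":
    bad_event = {"event_id": "e2", "timestamp": -5}
    print(validate_event(bad_event))

    daily_counts = [1000, 1020, 980, 1010, 990]
    if is_anomalous(daily_counts, 400):
        print("ALERT: daily event count is far outside its recent range")
```

Rejecting or quarantining events at ingestion keeps bad rows out of every downstream table, while the distribution check catches problems that are valid row by row but wrong in aggregate, such as a tracker that silently stops firing.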
The Latency-Accuracy Tradeoff
Users want their data instantly, but they also want it to be accurate. In practice, you have to choose. The secret is understanding which metrics need to be fast versus which need to be accurate.
Page views? Fast is fine, even if you're off by 5%. Revenue? Accuracy matters more than speed. The key is being transparent about these tradeoffs and setting proper expectations.
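One way to make that tradeoff explicit is a per-metric policy that decides whether a number is computed exactly or estimated from a sample, and that labels the result accordingly. The sketch below is illustrative only: `MetricPolicy`, `POLICIES`, the 10% sample rate, and `compute_total` are hypothetical names and values, not a description of any real system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPolicy:
    exact: bool  # True: scan every value; False: estimate from a sample
    sample_rate: float = 1.0


# Hypothetical policies mirroring the two examples in the text.
POLICIES = {
    "page_views": MetricPolicy(exact=False, sample_rate=0.1),
    "revenue": MetricPolicy(exact=True),
}


def compute_total(metric: str, values: list, rng: random.Random) -> dict:
    """Sum `values` exactly, or estimate the sum from a random sample,
    depending on the metric's policy."""
    policy = POLICIES[metric]
    if policy.exact:
        return {"metric": metric, "value": sum(values), "approximate": False}
    sample = [v for v in values if rng.random() < policy.sample_rate]
    estimate = sum(sample) / policy.sample_rate  # scale the sample back up
    return {"metric": metric, "value": estimate, "approximate": True}


if __name__ == "__main__":
    rng = random.Random(42)
    print(compute_total("revenue", [19.99, 5.00, 120.50], rng))
    print(compute_total("page_views", [1.0] * 100_000, rng))
```

Returning the `approximate` flag alongside the value is what lets the interface be transparent: the fast number can be shown with an "estimated" label while the slow one is presented as final.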
Build for the Question Behind the Question
When someone asks "How many users did we have yesterday?", they're usually really asking "Are we growing?" or "Did that campaign work?" or "Is something broken?"
The best analytics systems anticipate these underlying questions and surface contextual information that helps users understand not just what happened, but why it matters.
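As a rough sketch of what "surfacing context" can mean in code, the function below answers the literal question and attaches the comparisons a person would ask for next. The function name, the output keys, and the choice of baselines (previous day, same weekday last week, trailing seven-day average) are assumptions made for this example.

```python
from typing import Optional


def pct_change(current: float, baseline: float) -> Optional[float]:
    """Percentage change from baseline, rounded to one decimal place."""
    if baseline == 0:
        return None  # a percentage change from zero is undefined
    return round(100 * (current - baseline) / baseline, 1)


def answer_with_context(daily_users: list) -> dict:
    """Answer "how many users yesterday?" together with recent comparisons.

    `daily_users` is ordered oldest to newest; the last entry is yesterday.
    At least 8 days are needed for the same-weekday comparison.
    """
    if len(daily_users) < 8:
        raise ValueError("need at least 8 days of history")
    yesterday = daily_users[-1]
    trailing_avg = sum(daily_users[-8:-1]) / 7  # the 7 days before yesterday
    return {
        "users_yesterday": yesterday,
        "vs_previous_day_pct": pct_change(yesterday, daily_users[-2]),
        "vs_same_weekday_last_week_pct": pct_change(yesterday, daily_users[-8]),
        "vs_trailing_7_day_avg_pct": pct_change(yesterday, trailing_avg),
    }


if __name__ == "__main__":
    history = [1000, 1100, 1050, 1200, 1150, 900, 950, 1100]
    print(answer_with_context(history))
```

The raw count addresses the question that was asked; the three percentages address "Are we growing?" and "Is something broken?" without the user having to build a second query.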
Key Takeaways
- Scale changes everything: Techniques that work for millions of events break at billions
- Psychology matters more than technology: Build for trust and understanding, not just functionality
- Real-time is expensive: Most use cases don't actually need it
- Data quality is foundational: Garbage in, garbage out applies at any scale
- Build for questions, not answers: Enable exploration, don't just display metrics
- Observability > Dashboards: Help users understand what's changing and why
These lessons shaped how I think about analytics and ultimately led to founding Findly. The future of business intelligence isn't about building better dashboards—it's about building systems that understand context, ask clarifying questions, and provide insights that drive action.
If you're building analytics systems, remember: your users don't want data, they want understanding. Your job is to bridge that gap.
Want to see these principles in action? Check out Findly or reach out to me at pedromnasc@gmail.com.