Introduction
In today's fast-paced digital world, businesses need real-time insights to make quick decisions. Traditional batch processing, where data is analyzed hours or days after events occur, is no longer sufficient. This is where Apache Kafka shines, especially with its powerful components: Kafka Connect for data integration and Kafka Streams for real-time processing.
Architecture Overview
```mermaid
graph TB
DB[(PostgreSQL)] --> KC[Kafka Connect]
KC --> KT[Kafka Topic<br/>user-profiles]
WEB[Web Application] --> KC2[Kafka Producer]
KC2 --> KT2[Kafka Topic<br/>user-clicks]
KT --> KS[Kafka Streams<br/>in Quarkus App]
KT2 --> KS
KS --> KT3[Kafka Topic<br/>enriched-clicks]
KS --> KT4[Kafka Topic<br/>user-metrics]
KT3 --> ES[(Elasticsearch)]
KT4 --> API[REST API]
API --> DASH[Real-time Dashboard]
style KS fill:#e1f5fe
style KC fill:#f3e5f5
```
Complete Data Flow
```mermaid
flowchart TD
PG[(PostgreSQL<br/>User Profiles)] --> KCS[Kafka Connect<br/>Source Connector]
WA[Web App<br/>User Clicks] --> KPP[Kafka Producer]
KCS --> KTP[Kafka Topic<br/>user-profiles]
KPP --> KTC[Kafka Topic<br/>user-clicks]
KTP --> KSP
KTC --> KSP
subgraph KSP [Kafka Streams Processing]
direction TB
S1[Read Streams] --> J1[Join Clicks with Profiles]
J1 --> F1[Filter & Transform]
F1 --> A1[Aggregate & Count]
A1 --> W1[5-min Windows]
W1 --> OS1[Output Streams]
end
KSP --> KTE[Kafka Topic<br/>enriched-clicks]
KSP --> KTM[Kafka Topic<br/>user-metrics]
KSP --> SS[(Local State Store<br/>RocksDB)]
KTE --> KCSINK[Kafka Connect<br/>Sink Connector]
KCSINK --> ES[(Elasticsearch)]
SS --> RA[Quarkus REST API]
KTM --> RA
RA --> DASH[Real-time Dashboard]
ES --> SI[Search Interface]
```
How It Works
1. Data Ingestion with Kafka Connect
Kafka Connect automatically streams data from PostgreSQL into Kafka. A profile row lands on the `user-profiles` topic as a record like this:

```json
{
  "user_id": "alice123",
  "name": "Alice Smith",
  "email": "alice@example.com",
  "country": "USA",
  "plan_tier": "premium"
}
```
2. Real-time Event Collection
Web applications send click events directly to Kafka. Each click is published to the `user-clicks` topic:

```json
{
  "user_id": "alice123",
  "page_url": "/products/123",
  "action": "view",
  "timestamp": 1690000000000
}
```
3. Stream Processing Magic
Kafka Streams performs real-time processing:
```java
// Enrich clicks with user profiles (stream-table join; the click stream
// must be keyed by user_id, matching the KTable's key)
KStream<String, EnrichedClick> enriched = clickStream.join(
        userProfiles,
        (click, profile) -> new EnrichedClick(
                click.getUserId(),
                profile.getName(),
                profile.getCountry(),
                click.getPageUrl(),
                click.getAction()));

// Count each user's clicks in 5-minute tumbling windows
enriched
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
        .count();
```
4. Instant Query Capability
The REST API queries the local state store and returns results instantly:

```http
GET /dashboard/users/alice123/metrics
```

Response:

```json
{
  "user_id": "alice123",
  "user_name": "Alice Smith",
  "clicks_last_5min": 15,
  "country": "USA",
  "plan_tier": "premium"
}
```
Key Benefits
Performance
- Millisecond latency vs seconds with databases
- Pre-computed results ready instantly
- No database load - queries hit local state stores
Reliability
- Fault-tolerant - automatic recovery
- Exactly-once processing - no data loss or double counting (see the config sketch after this list)
- Scalable - add instances as needed
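Exactly-once is opt-in rather than the default. A minimal sketch of the relevant settings, assuming a standalone Streams app with the illustrative id `clickstream-analytics` (`exactly_once_v2` requires brokers on version 2.5 or newer):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsSettings {
    static Properties exactlyOnce() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-analytics");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Enables transactional writes plus atomic offset commits
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```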
Business Value
- Real-time decisions - detect trends immediately
- Better user experience - instant personalization
- Cost effective - reduces database load
Technical Implementation
Kafka Connect Configuration

A minimal source-connector configuration, shown here for the Debezium PostgreSQL connector (host, port, credentials, and database name are environment-specific placeholders):

```json
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "secret",
    "database.dbname": "mydb",
    "topic.prefix": "app",
    "table.include.list": "public.user_profiles"
  }
}
```

Debezium names topics `<prefix>.<schema>.<table>` (here `app.public.user_profiles`); a RegexRouter transform can rename them to `user-profiles` to match the topic names used above.
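Registering the connector is a plain HTTP POST to Connect's REST endpoint. Here is a sketch using the JDK HttpClient, assuming Connect listens on its default port 8083 and the JSON above is saved as `postgres-source.json` (file name and class are illustrative):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // POST the {"name": ..., "config": ...} payload shown above
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("postgres-source.json")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```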
Kafka Streams Topology
```java
StreamsBuilder builder = new StreamsBuilder();

// Click events as a stream, profiles as a continuously updated table
KStream<String, UserClick> clicks = builder.stream("user-clicks");
KTable<String, UserProfile> profiles = builder.table("user-profiles");

// enrichmentLogic is the ValueJoiner shown in "Stream Processing Magic"
KStream<String, EnrichedClick> enriched = clicks.join(profiles, enrichmentLogic);

enriched.groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
        .count();
```
REST API Endpoint
```java
@GET
@Path("/users/{userId}/metrics")
public UserMetrics getMetrics(@PathParam("userId") String userId) {
    // Interactive query against the locally materialized state store;
    // no database round-trip is involved
    ReadOnlyKeyValueStore<String, UserMetrics> store = kafkaStreams.store(
            StoreQueryParameters.fromNameAndType(
                    "user-metrics-store", QueryableStoreTypes.keyValueStore()));
    return store.get(userId);
}
```
Real-world Results
| Metric | Traditional Batch | Kafka Solution |
|---|---|---|
| Query time | 2-5 seconds | 10-50 milliseconds |
| Data freshness | 1-24 hours | Real-time |
| Database load | 80% CPU | 20% CPU |
| Development time | 2-3 weeks | 2-3 days |
Getting Started
- Set up a Kafka cluster (Confluent, AWS MSK, or self-hosted)
- Configure Kafka Connect for your data sources
- Develop the Kafka Streams application (see the Quarkus wiring sketch below)
- Build a REST API to expose the processed data
- Create dashboards and monitoring
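With Quarkus specifically, the quarkus-kafka-streams extension manages the KafkaStreams lifecycle for you: it looks for a CDI-produced Topology bean at startup. A minimal wiring sketch (the class name `TopologyProducer` is illustrative; the jakarta imports assume a recent Quarkus version):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

@ApplicationScoped
public class TopologyProducer {

    // Discovered by the quarkus-kafka-streams extension and run automatically
    @Produces
    public Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // ... build the topology from "Kafka Streams Topology" above ...
        return builder.build();
    }
}
```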
Learn More
- Apache Kafka Documentation
- Confluent Kafka Streams Guide
- Quarkus Kafka Streams Extension
Summary
This architecture demonstrates how to build real-time analytics pipelines that:
- Ingest data automatically with Kafka Connect
- Process and enrich data in real-time with Kafka Streams
- Serve results instantly via REST APIs
- Scale efficiently without database bottlenecks
Perfect for: E-commerce analytics, IoT monitoring, fraud detection, real-time recommendations, and live dashboards.
Happy streaming!