Engineering in the Wild

Engineering Database Interactions: Revisiting ORM-Repo-UoW Patterns

TLDR

Recently, I had to rework some database patterns in our application code, and by default I implemented a Repository/Unit-of-Work pattern.

"Why not just use the ORM directly?", asked one the other engineers. I didn't have a good answer to that question other than a cookie cutter reply: "It's a proven pattern that will save us on technical debt..."... but was I so sure myself?

This post is my way to rethink this assumption and understand the patterns.

... Let's get on with it!

Raw SQL: Physical plumbing, connections and sessions.

At the heart of database interactions lie three objects: the connection, the cursor, and the session. Let's review:

Connection: The Database Hotline

A connection object represents an active connection to a database server and provides the foundation for all database interactions.

Key Attributes

- isolation_level: controls how transactions are opened (deferred, immediate, exclusive, or autocommit)
- in_transaction: whether an uncommitted transaction is currently open
- row_factory: how result rows are materialized (tuples by default)

Key Methods

- cursor(): create a new cursor bound to this connection
- commit() / rollback(): finalize or abort the current transaction
- close(): release the underlying socket or file handle

Cursor: The SQL Commander

A cursor is created from, and operates through, a connection. Cursors transform passive connections into stateful command channels: each maintains its own execution context, like separate browser tabs.

Key Attributes

- rowcount: number of rows affected by the last DML statement
- lastrowid: primary key of the most recently inserted row
- description: column metadata for the current result set

Key Methods

- execute() / executemany(): run a (parameterized) SQL statement
- fetchone() / fetchmany() / fetchall(): pull results from the current result set
- close(): discard the cursor's execution state
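The browser-tab analogy can be demonstrated directly. Here's a minimal sketch: two cursors share one in-memory sqlite3 connection, and each keeps its own position in its own result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE nums (n INTEGER);"
    "INSERT INTO nums VALUES (1), (2), (3);"
)

# Two cursors on the same connection hold independent execution state,
# like two browser tabs sharing one login.
c1 = conn.cursor()
c2 = conn.cursor()
c1.execute("SELECT n FROM nums ORDER BY n")
c2.execute("SELECT n FROM nums ORDER BY n DESC")

first_from_c1 = c1.fetchone()[0]   # 1
first_from_c2 = c2.fetchone()[0]   # 3
second_from_c1 = c1.fetchone()[0]  # 2 -- c1's position was not disturbed by c2

conn.close()
```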

Session: The Bundler for Database Changes

A session provides a higher-level abstraction over connections and transactions. It acts as a logical container for related database operations that should be treated as a unit. It's like version control branches - all changes remain isolated until you commit the final PR, with automatic conflict detection via transaction isolation levels.

Key Attributes

- identity map: tracks loaded objects and which are new, dirty (modified), or deleted
- is_active: whether the session currently has an open transaction

Key Methods

- add() / delete(): stage object changes for the next flush
- flush(): emit pending SQL without committing
- commit() / rollback(): finalize or abort the unit of work
- close(): release the underlying connection back to the pool

To illustrate, here's a simplified system diagram showing their conceptual relationship:

[Image: system diagram of connection, cursor, and session]

And again in code:

import sqlite3

# Establish a connection to the Chinook database
conn = sqlite3.connect('chinook.db')
cursor = conn.cursor()

# Test the connection with a simple query
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()
print("Available tables:", tables)

# Always close connections when done
conn.close()

From the above, the basics are evident: connect, execute, close. Nice!

Now imagine this at scale: each service or method that touches the database must manage its own connection, its own SQL strings, its own error handling, and more. At scale, this will NOT turn out well.

Engineering For Robustness: The UoW & Repository Patterns

Focusing on application engineering with a reasonable number of users, there are a few more considerations we have to carefully manage, such as:

Resource Management: connections are finite and must be acquired, shared, and released reliably

Data Integrity: related changes must succeed or fail together

Security: user input must be parameterized, never interpolated into SQL strings

This is where the Unit of Work (UoW) and Repository patterns provide significant engineering benefits. These patterns introduce useful abstractions at the logical layer:

Unit of Work provides: transaction boundaries, atomic commit/rollback across repositories, and a single home for connection lifecycle management.

Repository pattern delivers: semantic, domain-named data access (get_by_id instead of SELECT strings), centralized and parameterized SQL, and a seam for mocking in tests.

(For more details on these, check the Deepdives section below.)

Together, these patterns work with other components like Transaction Isolation Levels to balance data consistency with performance based on your application's needs.
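For reference, sqlite3 exposes its transaction behavior through the connection's isolation_level attribute. A tiny sketch of the available settings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# sqlite3 models isolation via how transactions are opened:
#   "DEFERRED"  (default) - take locks lazily, at the first read/write
#   "IMMEDIATE" - take a write lock as soon as the transaction begins
#   "EXCLUSIVE" - lock out other connections entirely
#   None        - autocommit: no implicit transactions at all
conn.isolation_level = "IMMEDIATE"
```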

Here's where the system architecture starts evolving to look something like this:

[Image: architecture diagram with Unit of Work and Repository layered over the connection]

Let's make this relatable with some code and a real use case.

In raw SQL, transaction management is explicit: a BEGIN TRANSACTION statement initiates the transaction, COMMIT finalizes it, and ROLLBACK aborts it if errors occur. While functional, this approach requires explicit transaction management in every function that needs transactional integrity; the Unit of Work below packages exactly that bookkeeping into a reusable context manager.
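Before layering on the pattern, here's what the explicit flow looks like on its own: a minimal sketch using sqlite3 in autocommit mode (isolation_level = None) so we can issue BEGIN/COMMIT/ROLLBACK by hand. The accounts table and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode: we manage transactions by hand
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

try:
    conn.execute("BEGIN TRANSACTION")
    # Both updates must land together, or neither should.
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 2")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")

balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
# [(60,), (40,)]
```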

import sqlite3
from typing import Dict, Any, List, Optional

# Unit of Work pattern implementation
class CustomerUnitOfWork:
    def __init__(self, db_path: str):
        self.db_path = db_path
        self.conn = None
        self.customer_repository = None
    
    def __enter__(self):
        # Initialize connection when entering context
        self.conn = sqlite3.connect(self.db_path)
        self.conn.isolation_level = 'DEFERRED'
        # Create repositories that share this connection
        self.customer_repository = CustomerRepository(self.conn)
        return self
        
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Commit or rollback based on whether an exception occurred
        try:
            if exc_type is not None:
                self.rollback()
            else:
                self.commit()
        finally:
            self.conn.close()
            
    def commit(self):
        self.conn.commit()
        
    def rollback(self):
        self.conn.rollback()

# Sample repository using the connection from UoW
class CustomerRepository:
    def __init__(self, connection):
        self.conn = connection
        
    def get_by_id(self, customer_id: int) -> Optional[Dict[str, Any]]:
        cursor = self.conn.cursor()
        cursor.execute("SELECT * FROM customers WHERE CustomerId = ?", (customer_id,))
        row = cursor.fetchone()
        if not row:
            return None
        return {"CustomerId": row[0], "Email": row[1]}  # simplified
        
    def update_email(self, customer_id: int, new_email: str) -> bool:
        cursor = self.conn.cursor()
        cursor.execute(
            "UPDATE customers SET Email = ? WHERE CustomerId = ?",
            (new_email, customer_id)
        )
        return cursor.rowcount > 0

# Usage example
try:
    # The Unit of Work manages transaction boundaries
    with CustomerUnitOfWork('chinook.db') as uow:
        # Multiple operations within a single transaction
        customer = uow.customer_repository.get_by_id(1)
        if customer:
            uow.customer_repository.update_email(1, 'updated@example.com')
            
        # No explicit commit needed - handled by context manager if no exceptions
        
except Exception as e:
    print(f"Transaction failed: {e}")

With this approach, we've addressed several critical risks: connections are always closed, partial writes are rolled back, and transaction handling lives in one place instead of being copy-pasted everywhere.

However, even with these improvements, raw SQL transactions have significant limitations: you still hand-write every SQL string, manually map rows to dictionaries, and must revisit every query when the schema changes.

Fundamentally, while direct SQL interaction gives you complete control, it introduces several challenges that become more pronounced as applications grow in complexity. The tight coupling between application code and database schema makes it difficult to evolve the schema without breaking existing code. Also, SQL strings embedded throughout the codebase become difficult to maintain and test, and ensuring proper connection management across all database operations becomes increasingly error-prone.

This leads to our next pattern: ORMs, which map object-oriented code to SQL.

ORM: Baking in the Domain Abstraction

While the Unit of Work and Repository patterns bring significant improvements to raw SQL interactions, they still leave your application working with primitive data structures and manual SQL statement management. Not my idea of fun. This is where Object-Relational Mapping (ORM) frameworks help by providing a semantic approach to how we interact with databases.

I'll talk about SQLAlchemy as the implementation, but note there are many other flavours of ORM out there, like Prisma for the JS/TS ecosystem. Back to SQLAlchemy: it provides two distinct ways to interact with your database: the Core (a SQL expression language) and the ORM (full object mapping built on top of Core). This post focuses on the ORM.

When using SQLAlchemy, it transforms how we work with our data access layer:

[Image: diagram of the SQLAlchemy data access layer]

Let's see how this looks in practice with a concrete example:

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

# Configuration and setup
engine = create_engine('sqlite:///chinook.db')
Base = declarative_base()
Session = sessionmaker(bind=engine)

# Domain model definitions
class Customer(Base):
    __tablename__ = 'customers'
    
    customer_id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)
    email = Column(String)
    
    invoices = relationship("Invoice", back_populates="customer")
    
    def __repr__(self):
        return f"<Customer {self.first_name} {self.last_name}>"

class Invoice(Base):
    __tablename__ = 'invoices'
    
    invoice_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.customer_id'))
    invoice_date = Column(String)
    total = Column(Integer)
    
    customer = relationship("Customer", back_populates="invoices")

# Application code
session = Session()

# Query with relationships
customers_with_invoices = (
    session.query(Customer)
    .filter(Customer.invoices.any(Invoice.total > 10))
    .order_by(Customer.last_name)
    .all()
)

for customer in customers_with_invoices:
    print(f"{customer.first_name} {customer.last_name} has purchased:")
    for invoice in customer.invoices:
        print(f"  ${invoice.total:.2f} on {invoice.invoice_date}")

session.close()

By using the ORM approach, you get several significant engineering advantages: you work with domain objects instead of tuples, navigate relationships as plain attributes, get parameterization for free, and keep queries portable across database backends.

The Complete Picture: SQL databases via ORM, Repo, UoW

We now have 3 building blocks: an ORM, a Repo and a UoW. They each address different needs, resulting in a powerful matrix:

Concern           Raw SQL          Repository         ORM
Schema Changes    Break all SQL    Update one class   Migrations + ORM
Query Safety      String hacking   Parameterization   Type-safe DSL
Connection Life   Manual           Pool per request   Session context

Mixing and matching is totally valid: you can use the ORM directly for simple CRUD, wrap it in repositories where domain logic grows, and drop to raw SQL for the few queries that need hand-tuning.

For large, complex systems, a common approach combines all three:

# Repository pattern with SQLAlchemy ORM
class CustomerRepository:
    def __init__(self, session):
        self.session = session
        
    def get_by_id(self, customer_id):
        return self.session.query(Customer).filter_by(customer_id=customer_id).first()
    
    def get_with_high_value_invoices(self, min_amount=100):
        return (
            self.session.query(Customer)
            .join(Customer.invoices)
            .filter(Invoice.total >= min_amount)
            .distinct()
            .all()
        )
    
    def update_email(self, customer_id, new_email):
        customer = self.get_by_id(customer_id)
        if customer:
            customer.email = new_email
        return customer

# Unit of Work with SQLAlchemy session
class UnitOfWork:
    def __init__(self):
        self.session_factory = Session
        
    def __enter__(self):
        self.session = self.session_factory()
        self.customers = CustomerRepository(self.session)
        return self
        
    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            if exc_type is not None:
                self.rollback()
            else:
                self.commit()
        finally:
            self.session.close()
            
    def commit(self):
        self.session.commit()
        
    def rollback(self):
        self.session.rollback()

# Application code
with UnitOfWork() as uow:
    high_value_customers = uow.customers.get_with_high_value_invoices(500)
    for customer in high_value_customers:
        print(f"VIP Customer: {customer.first_name} {customer.last_name}")

This layered approach gives you the best of all worlds: domain-focused code, clean separation of concerns, and the power of SQLAlchemy's ORM to handle the complex object-relational mapping.

But whichever pattern you pull out, always retain an understanding of the layers beneath. Ignorance is NOT bliss, and even the best ORM can't optimize what you don't comprehend.


Deepdives

The following are standalone sections. I recommend reading them after reviewing the main content so you understand how the pieces fit and the motivations behind them.

The Repository: From Raw SQL to Semantics

The Repository pattern transforms database interactions from imperative SQL statements into semantic method calls. For developer collaboration, repositories create a clear contract between data access and business logic. This separation pays off as applications grow, enabling independent evolution of both layers.

By implementing a repo, you get some great engineering benefits such as:

Here is a codestub to illustrate:

from typing import Any, Dict, Optional

class CustomerRepository:
    def __init__(self, connection):
        self.conn = connection
        
    def get_by_id(self, customer_id: int) -> Optional[Dict[str, Any]]:
        cursor = self.conn.cursor()
        cursor.execute("SELECT * FROM customers WHERE CustomerId = ?", (customer_id,))
        row = cursor.fetchone()
        if not row:
            return None
        return {"CustomerId": row[0], "Email": row[1]}
        
    def update_email(self, customer_id: int, new_email: str) -> bool:
        cursor = self.conn.cursor()
        cursor.execute(
            "UPDATE customers SET Email = ? WHERE CustomerId = ?",
            (new_email, customer_id)
        )
        return cursor.rowcount > 0

When combined with the Unit of Work pattern, repositories gain transaction coordination without tight coupling to transaction management code. When combined with ORMs, repositories provide a clean interface between business logic and database operations.

Unit of Work: Transaction Best Practices in Reusable Form

The Unit of Work pattern wraps database transaction best practices into a consistent, reusable form. It acts as a transaction coordinator that maintains clear boundaries around related operations, ensuring they succeed or fail as a complete unit.

Its responsibilities are: opening the connection, marking the transaction boundary, committing on success, rolling back on failure, and releasing resources either way.

I like to also think of this as a "shopping cart" for database changes - you add, modify, or delete items, and then either check out (commit) or abandon the cart (rollback). By standardizing this pattern in your codebase, you eliminate duplicated transaction handling logic and ensure consistent resource management practices in every database interaction. Put another way, with the UoW, multiple operations become grouped into one atomic transaction.

In Python, this is done via the context manager protocol (__enter__ and __exit__): the UoW creates a clean, readable syntax for transaction management:

class CustomerUnitOfWork:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def __enter__(self):
        # Initialize connection when entering context
        self.conn = sqlite3.connect(self.db_path)
        self.conn.isolation_level = 'DEFERRED'
        # Create repositories that share this connection
        self.customer_repository = CustomerRepository(self.conn)
        return self
        
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Commit or rollback based on whether an exception occurred
        try:
            if exc_type is not None:
                self.rollback()
            else:
                self.commit()
        finally:
            self.conn.close()

# Implementation Example
with CustomerUnitOfWork('chinook.db') as uow:
    customer = uow.customer_repository.get_by_id(1)
    if customer:
        uow.customer_repository.update_email(1, 'updated@example.com')
    # No explicit commit needed - happens automatically if no exceptions

ORMs: Key Concepts and Their Implementation

Engine: The Database Connection Factory

The create_engine() function produces the central component in SQLAlchemy's architecture - the Engine. The Engine is designed to be a long-lived, application-wide resource. It's thread-safe and typically created once at application startup.

When you create an engine, you're setting up a connection management system that: maintains a pool of reusable connections, translates between Python and your database's dialect, and connects lazily, only when first needed.

Key Properties

- url: the database URL the engine was created with
- dialect: the database-specific behavior (sqlite, postgresql, ...)
- pool: the connection pool managing live connections

Key Methods

- connect(): check a connection out of the pool
- begin(): check out a connection wrapped in an explicit transaction
- dispose(): close all pooled connections
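A minimal usage sketch, assuming SQLAlchemy 1.4+ (where statements are wrapped in text()): the engine owns the pool, and engine.connect() borrows a pooled connection for the duration of the with block.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")  # pool is managed internally

# Borrow a connection from the pool, run a query, return it automatically.
with engine.connect() as conn:
    value = conn.execute(text("SELECT 1 + 1")).scalar()

print(value)      # 2
engine.dispose()  # release all pooled connections (e.g. at shutdown)
```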

Declarative Base: The Model Foundation

The declarative_base() function creates a base class that serves as the foundation for all your ORM models. Think of it as the parent class that brings SQLAlchemy's mapping machinery to your domain classes.

When you create models that inherit from this base:

class Customer(Base):
    __tablename__ = 'customers'
    customer_id = Column(Integer, primary_key=True)

You're doing several things at once: registering the class with the Base's metadata catalog, mapping its attributes to table columns, and wiring in the instrumentation that tracks attribute changes.

The Base class maintains a catalog (metadata) of all mapped classes and their corresponding tables, which is essential for: emitting schema DDL (create_all / drop_all), resolving string-based relationship() targets, and driving migration tools.

Without this foundation, your models would be just regular Python classes with no connection to the database layer.
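A runnable sketch of the catalog at work, assuming SQLAlchemy 1.4+ (where declarative_base lives in sqlalchemy.orm):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    customer_id = Column(Integer, primary_key=True)
    name = Column(String)

# The Base's metadata catalog now knows about every mapped table...
table_names = list(Base.metadata.tables)  # ['customers']

# ...which is what lets SQLAlchemy emit DDL for the whole schema at once.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```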

SessionMaker: The Session Factory

The sessionmaker is a configurable factory that produces Session objects with consistent settings.

Key Properties

- kw: the default keyword arguments (bind, autoflush, expire_on_commit, ...) applied to every session it produces
- class_: the Session class being instantiated

Key Methods

- __call__(): produce a new Session with the configured defaults
- configure(): change the defaults afterwards (e.g. rebind to a test engine)
- begin(): produce a session already wrapped in a transaction context

Sessions created by the same sessionmaker share consistent behavior:

# Create two sessions with identical configuration
session1 = Session()
session2 = Session()

# Both have the same autoflush, expire_on_commit settings
assert session1.autoflush == session2.autoflush

When your application needs to change database connections (like for testing):

# Reconfigure the session factory to use a different engine
Session.configure(bind=test_engine)

This architecture gives you thread-safe database access with consistent configuration across your application.

Zooming out: the engine is a factory for your database connectivity. It establishes and maintains the network connections your application needs while efficiently managing their lifecycle.

In Code

Here's how these components work together in SQLAlchemy:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

# Engine - thread-safe, application-wide resource
engine = create_engine(
    'postgresql://user:password@localhost/dbname',
    pool_size=10,               # Maintain 10 connections
    max_overflow=20,            # Allow 20 more when busy
    pool_timeout=30,            # Wait 30 sec for connection
    pool_recycle=3600           # Recycle connections after 1 hour
)

# Session factory - configured template for sessions
Session = sessionmaker(
    bind=engine,
    autocommit=False,           # Transactions not auto-committed
    autoflush=True,             # Changes flushed before queries
    expire_on_commit=True       # Refresh objects after commit
)

# Base class for declarative models
Base = declarative_base()
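To see the three components cooperate end to end, here's a self-contained sketch, assuming SQLAlchemy 1.4+, an in-memory SQLite database, and a hypothetical User model:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker, declarative_base

engine = create_engine("sqlite:///:memory:")
Session = sessionmaker(bind=engine)
Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# The metadata catalog emits the schema; the factory hands out sessions.
Base.metadata.create_all(engine)

session = Session()
session.add(User(name="Ada"))  # staged in the identity map, not yet flushed
session.commit()               # flushed and committed here

user_count = session.query(User).count()  # 1
session.close()
```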

ORMs: A few useful patterns

Query Objects: Encapsulate complex query logic in dedicated classes

from sqlalchemy import func

class CustomerQueries:
    def __init__(self, session):
        self.session = session
    
    def high_value_customers(self, min_purchase=1000):
        return (
            self.session.query(Customer)
            .join(Customer.orders)
            .group_by(Customer.id)
            .having(func.sum(Order.amount) > min_purchase)
            .all()
        )
Loading Strategies: Control when related objects are fetched

from sqlalchemy.orm import joinedload

# Lazy loading (default) - loads related objects on access
customer = session.query(Customer).get(1)
# SQL query happens here, when the relationship is accessed
orders = customer.orders

# Eager loading - loads related objects immediately
customer = (
    session.query(Customer)
    .options(joinedload(Customer.orders))
    .get(1)
)
# No additional SQL query needed
orders = customer.orders

Bulk Operations: Optimize for mass updates or inserts

# Instead of:
for user in users:
    user.is_active = False
    session.add(user)

# Use bulk update:
session.query(User).filter(User.id.in_([u.id for u in users])).update(
    {"is_active": False}, synchronize_session=False
)

Event Listeners: React to model lifecycle events

import hashlib

from sqlalchemy import event

@event.listens_for(User, 'before_insert')
def hash_password_before_insert(mapper, connection, target):
    # Hash the plaintext before it reaches the database. Note the listener
    # must not share a name with the helper it calls (the original version
    # recursed into itself). sha256 is illustrative; use a real password
    # hasher such as bcrypt or argon2 in production.
    if target.password:
        target.password = hashlib.sha256(target.password.encode()).hexdigest()