Engineering in the Wild

Remote developer environments

Problem Statement

Imagine you have a team of 10 developers working on large machine learning (ML) projects. Each developer requires substantial compute resources to train and run their models. As an organization, you can either give each person their own powerful setup or create one central resource that everyone shares. The second option—setting up a shared resource—is usually more cost-effective and allows better resource utilization, since developers can benefit from pooled hardware without each having to purchase a separate high-end machine.

Solution

Remote Development is a strategy where a local machine serves primarily as an interface (like a window), while all the heavy tasks run on a powerful shared server. This setup is typically implemented using a secure connection method, such as SSH.

By using remote development, your team can focus on writing code while the central server handles the time-consuming ML workloads. It also reduces configuration overhead on individual machines and centralizes data, which can simplify collaboration and data security.
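In practice, the "window" is often just an entry in ~/.ssh/config, so each developer can type a short alias instead of the full server address. The minimal sketch below prints such an entry. The alias gpu-box, the address 10.0.0.5, the user alice and the key path are hypothetical placeholders. Host, HostName, User, IdentityFile and ServerAliveInterval are standard OpenSSH client options.

```shell
#!/usr/bin/env bash
# Sketch: print an OpenSSH client config entry for a shared dev server.
# The alias, address, user and key path used below are hypothetical.

ssh_config_entry() {
  local name="$1" host="$2" user="$3" key="$4"
  cat <<EOF
Host ${name}
    HostName ${host}
    User ${user}
    IdentityFile ${key}
    # Probe the server after 60 s without data so idle sessions are less likely to be dropped
    ServerAliveInterval 60
EOF
}

# Print the entry; review it before appending it to ~/.ssh/config yourself.
ssh_config_entry gpu-box 10.0.0.5 alice "~/.ssh/id_ed25519"
```

Once the entry is in ~/.ssh/config, ssh gpu-box opens a session, and editors such as VS Code's Remote - SSH extension can reuse the same alias.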

Basic Concepts: Networking

SSH (Secure Shell) is an application-layer protocol in the TCP/IP stack. In simple terms, it is an agreed-upon set of rules that lets two machines communicate securely. It runs on top of TCP, on port 22 by default.

Here is where it sits in the TCP/IP model:

1. Application Layer     (HTTP, SSH, FTP)
2. Transport Layer       (TCP, UDP)
3. Internet Layer        (IP, ICMP)
4. Network Access Layer  (Ethernet, Wi-Fi)

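Think of it like a shipping company. The application layer is the letter you write. The transport layer is the courier that tracks every parcel and, as TCP does, resends any that get lost. The internet layer is the addressing and routing system that picks a path between cities. The network access layer is the trucks and roads that physically carry the parcels.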

If you've worked in a Linux terminal, you've probably touched each of these layers at some point:

# Everyday commands that touch each layer of the TCP/IP stack
curl https://api.github.com     # Application Layer (HTTPS)
netstat -t                      # Transport Layer (TCP connections; ss -t on newer systems)
ping 192.168.1.1                # Internet Layer (IP, via ICMP echo requests)
tcpdump -i eth0                 # Network Access Layer (capture traffic on the eth0 interface)

Basic Concepts: SSH

What happens when you type ssh <user>@<ip>?

When you run an SSH command, your data passes down through each layer before it leaves your machine:

Your SSH Client
    │
    ▼
SSH Protocol (Encrypts data)
    │
    ▼
TCP (Ensures reliable delivery)
    │
    ▼
IP (Routes packets to destination)
    │
    ▼
Network Interface (Physical transmission)
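Because SSH rides on top of TCP, a session cannot start until a TCP connection to the server's SSH port (22 by default) succeeds. The minimal sketch below checks only that lower layer, using bash's built-in /dev/tcp redirection. It assumes bash and the coreutils timeout command are available; the function name port_open and the host in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP connection to host:port can be opened.
# Exit status 0 means the TCP handshake completed; it says nothing about
# SSH key exchange or authentication, which happen afterwards.

port_open() {
  local host="$1" port="${2:-22}"   # default to the standard SSH port
  # /dev/tcp/HOST/PORT is a bash-only pseudo-path that opens a TCP socket.
  # timeout gives up after 3 seconds if packets are silently dropped.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical usage:
# if port_open 192.168.1.42 22; then echo "TCP ok, try ssh"; else echo "unreachable or port closed"; fi
```

If the port is reachable but login still fails, the problem sits in the SSH layer itself; ssh -v user@host prints each step of the key exchange and authentication.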

SSH over a LAN

Think of the early days of LAN parties, where computers connected through a local network. In a local area network, devices communicate with each other directly through a router or a network switch.

• Router or Switch: Acts like a traffic junction directing data to each device.
• Identifiers: Computers talk to each other using IP addresses (e.g., 192.168.x.x).

Sequence

  1. Computer A obtains the IP of Computer B from the local network.
  2. The user on Computer A runs ssh user@<IP of Computer B>.
  3. The switch (or router) forwards the packets directly to Computer B.
  4. SSH authenticates and establishes the secure session if the credentials are correct.

This is simple because both devices typically have stable, known IP addresses, and there are usually no firewalls between them.
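Addresses like 192.168.x.x come from the private ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). They work inside a LAN but are not routed on the public internet, which is what makes the internet case harder. A minimal bash sketch that classifies an IPv4 address; the function name ip_scope is hypothetical, and it assumes well-formed dotted-quad input.

```shell
#!/usr/bin/env bash
# Sketch: classify an IPv4 address as "private" (RFC 1918) or "public".
# Assumes a well-formed dotted-quad address; no validation is done, and
# special ranges such as loopback (127.0.0.0/8) are not handled.

ip_scope() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"   # split into octets; only the first two matter
  if (( a == 10 )); then
    echo private                    # 10.0.0.0/8
  elif (( a == 172 && b >= 16 && b <= 31 )); then
    echo private                    # 172.16.0.0/12
  elif (( a == 192 && b == 168 )); then
    echo private                    # 192.168.0.0/16
  else
    echo public
  fi
}

ip_scope 192.168.1.42   # prints: private
ip_scope 8.8.8.8        # prints: public
```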

SSH over the Internet

When connecting over the internet, the idea is the same, but there are more layers in between and more things that can go wrong. Let's trace the path from your laptop to the remote server:

Each hop adds its own layer of abstraction and indirection:

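  1. Your Laptop: has a private address (e.g., 192.168.1.20) assigned by your home router.
  2. Home Router (NAT): rewrites that private address to your home's single public IP, so the outside world sees the router, not your laptop.
  3. Your ISP: assigns that public IP, which is often dynamic and can change without notice.
  4. Internet Backbone: the large networks that route packets between ISPs.
  5. Their ISP: delivers the packets to the company's public IP.
  6. Company Router: usually performs NAT as well and runs a firewall that blocks unsolicited inbound connections.
  7. Remote Server: sits on a private address inside the company network, unreachable from the internet unless something forwards traffic to it.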
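The home and company routers on this path perform NAT: they keep a table that maps each outgoing connection to a port on their single public address. Replies that match an entry are forwarded inside; unsolicited inbound packets match nothing and are dropped, which is why a server behind NAT cannot simply be dialed from outside. The toy model below illustrates only that table. The addresses, the starting port and the function names are hypothetical, real NAT also tracks the protocol and often the remote endpoint and entry timeouts, and the script needs bash 4 or newer for associative arrays.

```shell
#!/usr/bin/env bash
# Toy model of a NAT router's translation table (requires bash >= 4).
# Real NAT also tracks protocol, often the remote endpoint, and timeouts.

declare -A NAT_TABLE          # public port -> private "ip:port"
NEXT_PORT=40000               # hypothetical first public port handed out
NAT_LAST_PORT=""

# Outbound connection from inside: allocate a public port and remember it.
# Results go into globals (not stdout) so the table survives; $(...) would
# run the function in a subshell and lose the update.
nat_outbound() {
  NAT_TABLE[$NEXT_PORT]="$1"
  NAT_LAST_PORT=$NEXT_PORT
  NEXT_PORT=$((NEXT_PORT + 1))
}

# Inbound packet arriving at the router's public address on a given port.
nat_inbound() {
  local port="$1"
  if [[ -n "${NAT_TABLE[$port]+set}" ]]; then
    echo "forward to ${NAT_TABLE[$port]}"
  else
    echo "drop (no mapping)"
  fi
}

nat_outbound 192.168.1.42:51515   # laptop opens a connection outward
nat_inbound "$NAT_LAST_PORT"      # prints: forward to 192.168.1.42:51515
nat_inbound 22                    # prints: drop (no mapping)
```

A manual port-forwarding rule on a router amounts to a permanent entry in this table, which is one reason the static-address approach usually goes hand in hand with router configuration.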

As you can see, there are many intermediary steps. Yet the main goal stays the same: how can a server keep a persistent, reachable address?

There are three fundamental approaches to giving a server a persistent address:

  1. Static IP / DDNS (the classic approach)
  2. Reverse tunnel (let a middleman relay the traffic)
  3. VPN (skip the public internet and treat the machines as one private network)

Solution: Persistent Server Addressing

1. Static IP / DDNS Approach

Concept: This is the traditional approach. Either pin your IP address (static IP) or keep a DNS record updated whenever your IP changes (dynamic DNS, or DDNS). It is like having a fixed postal address versus using a mail forwarding service.
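A DDNS client is essentially a loop: look up the current public IP, compare it with the last value pushed to DNS, and call the provider's update API only when it has changed. The sketch below assumes api.ipify.org as the IP lookup service and uses a hypothetical update URL (ddns.example.com), host name (devbox) and cache path. Every real provider defines its own update endpoint and authentication, so treat that call as a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of a DDNS updater. The update URL, host name and cache path are
# hypothetical placeholders; substitute your provider's documented endpoint.

CACHE_FILE="${CACHE_FILE:-/tmp/ddns_last_ip}"

# Decide whether DNS needs updating: prints "update" or "skip".
ddns_decide() {
  local current="$1" cached="$2"
  if [[ -n "$current" && "$current" != "$cached" ]]; then
    echo update
  else
    echo skip
  fi
}

ddns_run() {
  local current cached
  current="$(curl -fsS https://api.ipify.org)" || return 1   # public IP as plain text
  cached="$(cat "$CACHE_FILE" 2>/dev/null)"
  if [[ "$(ddns_decide "$current" "$cached")" == update ]]; then
    # Hypothetical provider endpoint; real APIs differ in path and auth.
    curl -fsS "https://ddns.example.com/update?host=devbox&ip=${current}" \
      && echo "$current" > "$CACHE_FILE"
  fi
}

# Only touch the network when explicitly asked, e.g. from cron: ./ddns.sh run
if [[ "${1:-}" == run ]]; then
  ddns_run
fi
```

Caching the last value keeps the script cheap to run every few minutes and avoids hammering the provider with identical updates.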

2. Reverse Tunnels

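Concept: Instead of clients connecting directly to your server, your server opens an outbound connection to a trusted middle service (such as ngrok or Cloudflare Tunnel) and keeps that tunnel open. Clients connect to the service, which relays their traffic down the tunnel. Because the server only makes outbound connections, this works even behind NAT and firewalls that block inbound traffic. It is like having a P.O. Box: mail goes to the post office first, then on to you.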

This is particularly popular in development environments, for example when testing webhook deliveries from GitHub to a local server or when showing a client a work-in-progress website running on your laptop. Services like Gitpod and GitHub Codespaces use similar technology to expose development ports.
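You can build the same idea yourself with plain OpenSSH, assuming you have a small relay machine with a public address (relay.example.com below is hypothetical, as are the users and port 2222). The dev server opens an outbound connection to the relay with -R, asking the relay to forward its port 2222 back to the dev server's port 22; -N tells ssh not to run a remote command. Developers then reach the dev server by jumping through the relay with -J. The sketch only prints the two commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: build the commands for a DIY reverse SSH tunnel through a relay.
# Host names, users and ports are hypothetical placeholders.

# Run on the dev server (behind NAT): keep an outbound tunnel open.
tunnel_cmd() {
  local relay="$1" relay_port="$2"
  echo "ssh -N -R ${relay_port}:localhost:22 ${relay}"
}

# Run on a developer laptop: hop through the relay, then connect to the
# forwarded port on the relay's own loopback interface.
connect_cmd() {
  local relay="$1" relay_port="$2" dev_user="$3"
  echo "ssh -J ${relay} -p ${relay_port} ${dev_user}@localhost"
}

tunnel_cmd tunnel@relay.example.com 2222
# prints: ssh -N -R 2222:localhost:22 tunnel@relay.example.com
connect_cmd alice@relay.example.com 2222 alice
# prints: ssh -J alice@relay.example.com -p 2222 alice@localhost
```

By default the relay's sshd binds remote-forwarded ports to its loopback interface only (the GatewayPorts setting defaults to no), which is why the client hops through the relay with -J instead of dialing relay.example.com:2222 directly. Tools such as autossh are commonly used to re-establish the tunnel if it drops.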