Remote developer environments

12 Nov, 2023

Problem Statement

Imagine you have a team of 10 developers working on large machine learning (ML) projects. Each developer needs substantial compute resources to train and run their models. As an organization, you can either give each person their own powerful setup or create one central resource that everyone shares. The second option, a shared resource, is usually more cost-effective and makes better use of the hardware, since developers draw on pooled compute instead of each buying a separate high-end machine.

Solution

Remote development is a setup in which your local machine serves mainly as an interface (a window onto the server), while the heavy work runs on a powerful shared server. The connection between the two is typically made over a secure protocol such as SSH.

With remote development, your team focuses on writing code while the central server handles time-consuming ML workloads such as training. It also reduces configuration overhead on individual machines and keeps data in one place, which can simplify collaboration and data security.

Basic Concepts: Networking

SSH (Secure Shell) is an application-layer protocol in the TCP/IP stack. In simple terms, it is an agreed-upon way for two machines to communicate securely.

Here is how the layers of the network hierarchy stack up:

1. Application Layer     (HTTP, SSH, FTP)
2. Transport Layer      (TCP, UDP)
3. Network Layer       (IP, ICMP)
4. Network Access Layer (Ethernet, WiFi)

Think of it like a shipping company:

Application Layer: What you want to send (letter, package)
Transport Layer: How it should be delivered (express, standard)
Network Layer: The routing and addressing (which cities/hubs to go through)
Network Access Layer: The actual vehicles and roads used

If you've worked in a Linux terminal, you've probably touched each of these layers at some point:

```bash
# Commands that touch each layer of the TCP/IP stack
curl https://api.github.com      # Application Layer (HTTPS)
netstat -t                       # Transport Layer (TCP connections; `ss -t` on newer systems)
ping 192.168.1.1                 # Network Layer (IP/ICMP)
sudo tcpdump -i eth0             # Network Access Layer (capture on an Ethernet interface; needs root)
```

Basic Concepts: SSH

What happens when you type `ssh <user>@<ip>`?

When you run an SSH command, your data passes down through each layer of the stack:

```text
Your SSH Client
    │
    ▼
SSH Protocol (Encrypts data)
    │
    ▼
TCP (Ensures reliable delivery)
    │
    ▼
IP (Routes packets to destination)
    │
    ▼
Network Interface (Physical transmission)
```
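To make the diagram concrete, here is a toy Python model of encapsulation: each layer wraps the payload from the layer above with its own header. The class and function names, header fields, the address 203.0.113.10 (a documentation range) and the MAC address are all made up for illustration, and nothing is actually encrypted or sent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Packet:
    layer: str
    header: Dict[str, Any]
    payload: Any  # raw bytes at the top layer, otherwise the Packet from the layer above


def encapsulate(data: bytes) -> Packet:
    """Wrap application data the way the diagram does, top to bottom."""
    ssh = Packet("SSH", {"encrypted": True}, data)       # placeholder flag: no real encryption here
    tcp = Packet("TCP", {"dst_port": 22}, ssh)           # 22 is the default SSH port
    ip = Packet("IP", {"dst": "203.0.113.10"}, tcp)      # documentation-range address
    return Packet("Ethernet", {"dst_mac": "aa:bb:cc:dd:ee:ff"}, ip)  # made-up MAC address


def layers(packet: Any) -> List[str]:
    """List layer names from the outermost wrapper inward."""
    names = []
    while isinstance(packet, Packet):
        names.append(packet.layer)
        packet = packet.payload
    return names
```

`layers(encapsulate(b"ls -la"))` returns `['Ethernet', 'IP', 'TCP', 'SSH']`: the outermost header is the one the wire sees first. The receiving machine unwraps the layers in the opposite order.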

SSH over a LAN

Think of the early days of LAN parties, where computers connected through a local network. In a local area network, devices communicate with each other directly through a router or a network switch.

• Router or Switch: Acts like a traffic junction directing data to each device.
• Identifiers: Computers talk to each other using IP addresses (e.g., 192.168.x.x).

Sequence

Computer A learns the IP address of Computer B on the local network.
User on Computer A runs `ssh user@<ip>`.
The switch or router forwards the packets directly to Computer B.
If the credentials are correct, SSH authenticates the user and establishes an encrypted session.
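The sequence only works if Computer B is actually listening for SSH. A quick check is to try opening a TCP connection to port 22, the default SSH port. This is a minimal sketch using Python's standard `socket` module; the function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

For example, `port_is_open("192.168.1.20")` (a hypothetical LAN address for Computer B). A successful connection only shows that something is listening on that port, not that it is an SSH server.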

This is simple because both devices have stable, known IP addresses, and there are usually no firewalls between them.

SSH over the Internet

When connecting over the internet, the concept is the same, but there are more hops and more things that can go wrong. Let's trace the path from your laptop to the remote server, hop by hop:

Each hop adds its own abstractions, indirection, and constraints:

Your Laptop Layer

Has only a private IP address (e.g., 192.168.1.100)
Cannot be directly reached from the internet
Private IP address can change each time you connect (via DHCP)
Needs to know the public IP of the destination
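Addresses like 192.168.1.100 belong to the private ranges defined in RFC 1918, which are not routed on the public internet. Carrier-grade NAT (mentioned under the ISP layer) uses a separate shared range, 100.64.0.0/10 (RFC 6598). This sketch classifies an IPv4 address against those ranges using Python's `ipaddress` module; the function name is hypothetical, and other reserved ranges (loopback, link-local, and so on) are deliberately ignored.

```python
import ipaddress

# RFC 1918 private ranges: usable only inside a local network.
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# RFC 6598 shared address space, used by ISPs for carrier-grade NAT.
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")


def classify(addr: str) -> str:
    """Roughly classify an IPv4 address as 'private', 'cgnat', or 'public'."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in PRIVATE_NETS):
        return "private"  # not reachable from the internet
    if ip in CGNAT_NET:
        return "cgnat"    # behind your ISP's NAT: also not reachable directly
    return "public"       # simplification: other reserved ranges are not checked
```

If your router's WAN address comes back as `"cgnat"`, you are behind double NAT, and port forwarding on your home router alone will not make your machine reachable.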

Home Router (NAT) Layer

Must maintain a NAT table of connections
Public IP may change (ISP assigns dynamically)
Blocks incoming connections by default
Needs port forwarding rules for incoming connections
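The NAT behavior above can be sketched as a toy model: outbound connections create table entries automatically, replies are matched against them, and unsolicited inbound traffic is dropped unless a port-forwarding rule exists. This is a simplification (real routers also track the remote address and port and expire idle entries), and the class name, the starting port 40000, and the addresses are illustrative.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class HomeRouter:
    """Toy NAT router: many private devices share one public IP."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.nat_table: Dict[int, Endpoint] = {}      # public port -> private endpoint, learned automatically
        self.port_forwards: Dict[int, Endpoint] = {}  # public port -> private endpoint, configured by you
        self._next_port = 40000                       # arbitrary start of the allocation range

    def outbound(self, private: Endpoint) -> Endpoint:
        """A LAN device opens a connection: rewrite its source to the public IP and a fresh port."""
        public_port = self._next_port
        self._next_port += 1
        self.nat_table[public_port] = private
        return (self.public_ip, public_port)

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """A packet arrives from the internet: return where to deliver it, or None to drop it."""
        if public_port in self.nat_table:      # reply to a connection the LAN started
            return self.nat_table[public_port]
        if public_port in self.port_forwards:  # explicit port-forwarding rule
            return self.port_forwards[public_port]
        return None                            # unsolicited inbound traffic is dropped
```

With `router = HomeRouter("198.51.100.7")`, an incoming SSH attempt on port 22 returns `None` until you add `router.port_forwards[22] = ("192.168.1.50", 22)`, which is exactly why a server behind a home router needs port forwarding.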

Your ISP Layer

May block certain ports (especially port 22 for SSH)
Might use carrier-grade NAT (double NAT)
Can throttle or shape traffic
May have unstable routing

Internet Backbone Layer

Variable latency
Possible packet loss
Route changes
Multiple hops between networks
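You can observe variable latency yourself by timing several TCP handshakes to the same host. This is a rough sketch with a hypothetical function name: if you pass a hostname, each sample also includes DNS resolution time, and errors such as timeouts are raised rather than handled.

```python
import socket
import statistics
import time
from typing import Dict


def connect_latency_ms(host: str, port: int = 22, samples: int = 5, timeout: float = 3.0) -> Dict[str, float]:
    """Time `samples` TCP handshakes to host:port and summarize them in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake completed; close the connection immediately
        times.append((time.perf_counter() - start) * 1000)
    return {"min": min(times), "median": statistics.median(times), "max": max(times)}
```

A large gap between `min` and `max` for a host such as `connect_latency_ms("gpu.example.com")` (hypothetical) points to jitter along the route, which you will feel as lag in an interactive SSH session.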

Their ISP Layer

Similar issues to your ISP
Different routing policies
Different port restrictions
Possibly different quality of service

Company Router Layer

Corporate firewall rules
May block incoming SSH
Needs explicit port forwarding
Access control lists (ACLs)
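Many firewalls and ACLs are evaluated as an ordered list of rules where the first match wins and anything unmatched is denied. The sketch below models that behavior only; real firewalls have much richer matching, and the rule set (including 10.8.0.0/24 standing in for a company VPN subnet) is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    action: str          # "allow" or "deny"
    source: str          # CIDR block of source addresses this rule matches
    port: Optional[int]  # destination port, or None to match any port


def evaluate(rules: List[Rule], src_ip: str, dst_port: int, default: str = "deny") -> str:
    """Return the action of the first rule matching (src_ip, dst_port)."""
    ip = ipaddress.ip_address(src_ip)
    for rule in rules:  # order matters: the first match wins
        if ip in ipaddress.ip_network(rule.source) and rule.port in (None, dst_port):
            return rule.action
    return default  # implicit deny when no rule matches


# Hypothetical company policy: SSH only from the VPN subnet, HTTPS from anywhere.
COMPANY_RULES = [
    Rule("allow", "10.8.0.0/24", 22),
    Rule("deny", "0.0.0.0/0", 22),
    Rule("allow", "0.0.0.0/0", 443),
]
```

Under this policy, `evaluate(COMPANY_RULES, "203.0.113.9", 22)` returns `"deny"`: an SSH attempt from your home IP is blocked even if the server itself is configured correctly.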

Remote Server Layer

Only knows its private IP
Cannot initiate connections to your laptop
Needs firewall rules configured
SSH daemon must be properly configured

As you can see, there are many intermediaries along the way. The core question remains: how can a server keep a persistent address that clients can reach?

There are three fundamental approaches to giving a server a persistent address:

Classic static IP
Reverse tunnel (let's use a proxy)
VPN (forget the public internet; treat it like a private network)

Solution: Persistent Server Addressing

1. Static IP / DDNS Approach

Concept: The traditional approach. Either get a fixed IP address from your ISP (static IP), or automatically update a DNS record whenever your IP changes (Dynamic DNS, or DDNS). It's like having a fixed postal address versus using a mail-forwarding service.
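The DDNS half of this approach is a small loop: look up your current public IP, compare it with the DNS record, and update the record only if they differ. This sketch injects the three operations as callables because the real calls depend on your IP-lookup service and DNS provider; the function and parameter names are hypothetical.

```python
from typing import Callable


def sync_ddns(
    hostname: str,
    get_public_ip: Callable[[], str],
    get_dns_record: Callable[[str], str],
    update_dns_record: Callable[[str, str], None],
) -> bool:
    """Point `hostname` at the current public IP. Returns True if an update was sent."""
    current = get_public_ip()
    if get_dns_record(hostname) == current:
        return False  # record is already correct; avoid needless API calls
    update_dns_record(hostname, current)
    return True
```

In practice you would run something like this every few minutes from cron or a systemd timer; existing clients such as ddclient implement the same loop against many DNS providers.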

2. Reverse Tunnels

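Concept: Instead of clients connecting directly to your server, your server opens an outbound connection to a trusted middle service (like ngrok or Cloudflare Tunnel) and keeps that tunnel open. Clients connect to the middle service, which relays their traffic back through the tunnel. Because the connection starts from inside your network, it typically passes through NAT and firewalls without any port forwarding. It's like having a P.O. Box: mail goes to the post office first, then to you.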
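The key idea is the direction of the connection: the private server dials out, and clients only ever talk to the relay. This toy in-process model captures just that direction; it uses no real networking, the class and method names are hypothetical, and it is not the API of ngrok, Cloudflare Tunnel, or any other service.

```python
from typing import Callable, Dict


class Relay:
    """Publicly reachable middle service (toy model)."""

    def __init__(self) -> None:
        self.tunnels: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        # Called by the *server* over its own outbound connection, which NAT
        # and firewalls usually allow even when inbound traffic is blocked.
        self.tunnels[name] = handler

    def request(self, name: str, message: str) -> str:
        # Called by a *client*: the relay pushes the message back down the tunnel
        # the server already opened, so the client never reaches the server directly.
        if name not in self.tunnels:
            raise LookupError(f"no tunnel registered for {name!r}")
        return self.tunnels[name](message)


class PrivateServer:
    """Sits behind NAT with no public address, so it dials out to the relay."""

    def __init__(self, relay: Relay, name: str) -> None:
        relay.register(name, self.handle)

    def handle(self, message: str) -> str:
        return f"server got: {message}"
```

OpenSSH supports the same pattern natively through its `-R` (remote port forwarding) option, where any machine you can SSH into plays the role of the relay.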

This approach is particularly popular in development workflows: for example, testing webhook deliveries from GitHub to a local server, or showing a client a work-in-progress website running on your laptop. Services like Gitpod and GitHub Codespaces use similar technology to expose development ports.