A simple lint check takes 5 minutes instead of 30 seconds. Your runner is barely using any CPU. The problem isn’t hardware — it’s configuration.
These optimizations cut our pipeline times by 60%. No hardware upgrades needed.
Why Pipelines Are Slow (It’s Not Your CPU)
Our first runner setup crawled. Both VMs showed almost no load in htop. The real bottlenecks:
- Network latency through Cloudflare - Runner hit the public internet to reach GitLab
- No caching - Every job ran npm ci from scratch
- Default runner config - Conservative settings left performance on the table
- DNS resolution delays - Docker containers couldn’t resolve internal hostnames
Fix these bottlenecks and you get:
- 50-70% faster pipelines - Direct connections + caching
- No more “Pending” jobs - Runner picks up work instantly
- Predictable build times - Cache hits = consistent performance
- Lower resource usage - Less network traffic, fewer redundant operations
For self-hosted GitLab behind Cloudflare or any reverse proxy, the biggest performance killer is usually network routing - not CPU, memory, or disk.
Our Infrastructure Setup
Two Proxmox VMs on the same local network. External access via Cloudflare Tunnel.
| Component | Specs | IP Address |
|---|---|---|
| GitLab Server | 4 vCPUs, 12GB RAM, 80GB SSD | 192.168.1.10 |
| GitLab Runner | 4 vCPUs, 8GB RAM, 80GB SSD | 192.168.1.11 |
| External Access | Cloudflare Tunnel | gitlab.example.com |
The problem: The runner resolved gitlab.example.com to Cloudflare IPs. Traffic went out to the internet and back — despite both VMs sitting on the same network.
Optimization 1: Direct Internal Connection
Biggest single win. Route runner traffic directly to GitLab over the local network.
The Problem
Check your runner logs:
sudo journalctl -u gitlab-runner -n 20 --no-pager
If you see Cloudflare IPs, your traffic is taking the long way around:
dial tcp 104.21.9.142:443: i/o timeout
Those IPs (104.21.x.x, 172.67.x.x) are Cloudflare, not your GitLab server.
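You can also see which path DNS is sending you down without digging through logs. `classify` below is a hypothetical helper, not part of any tooling; the ranges it checks are Cloudflare's 104.16.0.0/13 and 172.64.0.0/13 edge blocks, which cover the 104.21.x.x and 172.67.x.x addresses above:

```shell
#!/bin/sh
# classify: hypothetical helper that labels an IP as Cloudflare edge space,
# private LAN space, or something else
classify() {
  case "$1" in
    104.1[6-9].*|104.2[0-3].*|172.6[4-9].*|172.7[0-1].*) echo "cloudflare" ;;
    192.168.*|10.*) echo "internal" ;;
    *) echo "other" ;;
  esac
}

# Resolve the GitLab hostname the same way the runner does and label the result
ip=$(getent hosts gitlab.example.com 2>/dev/null | awk '{print $1; exit}')
echo "gitlab.example.com -> ${ip:-unresolved} ($(classify "$ip"))"
```

If this prints a cloudflare label on the runner VM, traffic is leaving the local network; after the fix it should resolve internally.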
The Fix
Step 1: Find what port GitLab is listening on
On your GitLab server:
ss -tlun | grep -E '80|443|5443'
Typical output:
tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:*
GitLab is listening on port 80 (HTTP) internally.
Step 2: Update runner config.toml
On the runner VM, edit /etc/gitlab-runner/config.toml:
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
# ... rest of config
Key changes:
- url - Internal IP with HTTP (not HTTPS)
- clone_url - Forces git operations onto the internal network too
Step 3: Configure Docker containers to resolve hostnames
Containers need to reach GitLab too. Add extra_hosts:
[runners.docker]
extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]
This injects /etc/hosts entries into every container so gitlab.example.com resolves to your internal IP.
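You can see exactly what those injected entries look like without starting a container. This small sketch reproduces the injection by hand; the temp file stands in for the container's /etc/hosts:

```shell
#!/bin/sh
# Reproduce what extra_hosts does: each "host:ip" entry becomes an
# "IP<TAB>hostname" line in the container's /etc/hosts
hosts_file=$(mktemp)
for entry in "gitlab.example.com:192.168.1.10" "registry.example.com:192.168.1.10"; do
  host=${entry%%:*}   # everything before the first colon
  ip=${entry##*:}     # everything after the last colon
  printf '%s\t%s\n' "$ip" "$host" >> "$hosts_file"
done
cat "$hosts_file"
# 192.168.1.10  gitlab.example.com
# 192.168.1.10  registry.example.com
```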
Step 4: Restart the runner
sudo gitlab-runner restart
sudo journalctl -u gitlab-runner -f
Job checks should now succeed without timeouts.
Before vs After
| Metric | Before (via Cloudflare) | After (Direct) |
|---|---|---|
| Git clone | 15-30 seconds | 2-5 seconds |
| Artifact upload | 10-20 seconds | 1-3 seconds |
| Cache restore | 20-40 seconds | 5-10 seconds |
| Total pipeline | 5-8 minutes | 2-3 minutes |
Optimization 2: Runner Resource Configuration
Defaults are conservative. Tune them.
Recommended config.toml
concurrent = 2
check_interval = 3
connection_max_age = "15m0s"
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
executor = "docker"
request_concurrency = 2
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 536870912
network_mtu = 0
cpus = "1.5"
memory = "2560m"
pull_policy = ["if-not-present"]
extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]
Key Settings Explained
| Setting | Value | Why |
|---|---|---|
| concurrent | 2 | Run 2 jobs simultaneously (adjust based on RAM) |
| check_interval | 3 | Poll for jobs every 3 seconds |
| request_concurrency | 2 | Fixes the "long polling" warning |
| cpus | "1.5" | Allocate 1.5 CPUs per container |
| memory | "2560m" | 2.5GB per container |
| shm_size | 536870912 | 512MB shared memory (enough for Node.js) |
| pull_policy | ["if-not-present"] | Don't re-pull images every time |
Memory Budget Calculation
For a runner with 8GB RAM:
System/Docker overhead: ~1.5GB
Runner process: ~0.5GB
Container 1: 2.5GB
Container 2: 2.5GB
Buffer: 1.0GB
─────────────────────────────────
Total: 8.0GB ✓
If concurrent × memory exceeds your available RAM, containers will be OOM-killed. Start conservative and increase based on monitoring.
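The budget above reduces to quick shell arithmetic when you resize the runner. The overhead and buffer figures below are the estimates from the table, not measurements:

```shell
#!/bin/sh
# Sketch: derive a safe `concurrent` value from total RAM (all sizes in MB,
# numbers match the 8GB budget above)
TOTAL_MB=8192
OVERHEAD_MB=2048   # system/Docker (~1.5GB) + runner process (~0.5GB)
BUFFER_MB=1024     # headroom so a spike doesn't trigger the OOM killer
PER_JOB_MB=2560    # matches memory = "2560m" in config.toml

AVAILABLE=$((TOTAL_MB - OVERHEAD_MB - BUFFER_MB))
echo "safe concurrent: $((AVAILABLE / PER_JOB_MB))"   # → safe concurrent: 2
```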
Optimization 3: Pipeline Caching Strategy
npm ci on every job wastes 30-60 seconds. Cache it.
The Problem
Without caching, every job:
- Downloads packages from npm registry
- Installs all dependencies from scratch
- Repeats this even when package-lock.json hasn't changed
The Solution: Dedicated Install Stage
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
# Global cache - all jobs can pull from this
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull # Most jobs only read cache
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push # This job updates the cache
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH
How Other Jobs Use the Cache
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci' # Fallback if cache miss
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
- if: $CI_COMMIT_BRANCH
Cache Key Strategy
Cache key is based on package-lock.json:
cache:
key:
files:
- package-lock.json
- Same package-lock.json = cache hit = fast
- Changed package-lock.json = cache miss = full install (expected)
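A cache miss doesn't have to mean a fully cold start. On GitLab 16.0+ you can declare fallback_keys so that a lock-file change still restores the most recent generic cache (a sketch; the `npm-default` key name is an assumption):

```yaml
cache:
  key:
    files:
      - package-lock.json
  fallback_keys:
    - npm-default        # hypothetical catch-all key, restored on an exact-key miss
  paths:
    - .npm/
    - node_modules/
```

Note that npm ci always deletes node_modules before installing, so on a fallback restore the speedup comes from the warm .npm/ download cache, not from reusing node_modules.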
Cache Hit vs Miss Performance
| Scenario | npm ci Time | Total Job Time |
|---|---|---|
| Cache miss (first run) | 45-60 seconds | 70-90 seconds |
| Cache hit (subsequent) | 0 seconds | 15-25 seconds |
| Partial cache hit | 10-20 seconds | 30-45 seconds |
Optimization 4: Use needs for Parallel Execution
GitLab runs stages sequentially by default. needs unlocks parallelism.
Without needs (Sequential)
install → lint → build → test → deploy
30s 40s 60s 20s 30s = 180s total
With needs (Parallel)
lint_code:
needs:
- job: install_deps
optional: true # Don't fail if install_deps was skipped
build_site:
needs:
- job: install_deps
optional: true
test_build:
needs:
- job: build_site
artifacts: true # Download artifacts from build_site
install ──→ lint ──────────→ deploy
30s ╲ 40s 30s
╲
→ build → test ──→
60s 20s
Independent jobs run in parallel. With the timings above, the critical path becomes install → build → test → deploy: 30 + 60 + 20 + 30 = 140s instead of 180s.
Optimization 5: Sync Develop After Production Deploy
Prevents “source branch is X commits behind target” in future merge requests.
# ===============================
# STAGE: POST-DEPLOY
# ===============================
# Add GITLAB_INTERNAL_IP as a CI/CD variable (e.g., 192.168.1.10)
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "ci@example.com"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
allow_failure: true
This job needs two CI/CD variables:
- PUSH_TOKEN - Project Access Token with write_repository scope
- GITLAB_INTERNAL_IP - Your GitLab server's internal IP (e.g., 192.168.1.10)
Create them at Settings → CI/CD → Variables.
Why Use Internal IP in sync_develop?
CI_SERVER_HOST resolves to gitlab.example.com (external URL) on port 443 (Cloudflare HTTPS). Inside a Docker container, that routes through Cloudflare — slow and unreliable. Internal IP keeps git operations on the local network.
Complete Optimized .gitlab-ci.yml
Everything together:
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
- post-deploy
default:
interruptible: true
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: LINT
# ===============================
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
lint_commit:
stage: lint
needs:
- job: install_deps
optional: true
variables:
GIT_DEPTH: 0
script:
- '[ -d node_modules ] || npm ci'
- npx commitlint --from $CI_MERGE_REQUEST_DIFF_BASE_SHA --to $CI_MERGE_REQUEST_DIFF_HEAD_SHA
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
# ===============================
# STAGE: BUILD
# ===============================
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: TEST
# ===============================
test_build:
stage: test
needs:
- job: build_site
artifacts: true
script:
- test -d dist
- test "$(ls -A dist)"
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: DEPLOY
# ===============================
deploy_develop:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=develop
environment:
name: develop
url: https://develop.my-project.pages.dev
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy_production:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=main
environment:
name: production
url: https://my-project.example.com
rules:
- if: $CI_COMMIT_BRANCH == "main"
# ===============================
# STAGE: POST-DEPLOY
# ===============================
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "ci@example.com"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
allow_failure: true
Troubleshooting Common Issues
Issue 1: Pipeline Stuck in “Pending”
Symptoms: Jobs show “Pending” indefinitely, runner appears online.
Check runner logs:
sudo journalctl -u gitlab-runner -f
Common causes:
- Runner URL mismatch - config.toml URL doesn’t match GitLab’s expected URL
- Network timeout - Runner can’t reach GitLab
- Tag mismatch - Jobs require tags the runner doesn’t have
Fix: Verify url in config.toml:
sudo gitlab-runner verify
Issue 2: “connection refused” Errors
dial tcp 192.168.1.10:443: connect: connection refused
Cause: Wrong port. In this setup GitLab listens internally on port 80 (HTTP), not 443 (HTTPS).
Fix: Use HTTP URL:
url = "http://192.168.1.10/"
Issue 3: “unauthorized” in Container Jobs
fatal: unable to access 'https://gitlab.example.com/...':
Failed to connect to gitlab.example.com port 443
Cause: Container can’t resolve hostname or is hitting the wrong port.
Fix: Use internal IP in job scripts:
git remote set-url origin "http://oauth2:${TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
Issue 4: Cache Never Hits
Symptoms: npm ci runs every time, “No cache found” in logs.
Common causes:
- Cache key changed (check package-lock.json)
- Cache expired (default 2 weeks)
- Different runner picked up the job
Check cache status:
# In job log, look for:
Checking cache for <key>...
Successfully extracted cache
# or
No URL provided, cache will not be downloaded
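If the culprit is a different runner picking up the job — the default local cache lives on each runner's disk unless you configure distributed (e.g., S3) caching — the simplest fix is pinning cache-dependent jobs to one runner with tags. A sketch; `local-cache` is a hypothetical tag name:

```yaml
install_deps:
  stage: install
  tags:
    - local-cache   # hypothetical tag, registered only on the runner holding the cache
  script:
    - npm ci
```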
Performance Checklist
- Runner connects to internal IP (e.g., http://192.168.1.10/)
- clone_url is set to internal URL
- extra_hosts configured for Docker containers
- pull_policy set to if-not-present
- request_concurrency set to 2+
- npm cache includes both .npm/ and node_modules/
- install_deps job uses policy: pull-push
- Other jobs use policy: pull
- needs configured for parallel execution
- interruptible: true set in defaults
The Bottom Line
Default configs prioritize compatibility over speed. The biggest wins:
- Direct internal connections - Bypass Cloudflare for runner-to-GitLab traffic
- Aggressive caching - Cache node_modules, not just .npm
- Parallel execution - Use needs for independent jobs
- Proper resource allocation - Tune concurrent, cpus, and memory
Our pipeline went from 5-8 minutes to under 2 minutes. Expect 50-70% improvement on most setups.
Start with the internal connection fix. Lowest effort, highest impact.
Next Steps: Implement This Today
Fix 1: Internal Connection
- Update config.toml with internal IP
- Add extra_hosts for Docker containers
- Restart the runner: sudo gitlab-runner restart
Fix 2: Enable Caching
- Add install_deps stage to pipeline
- Configure cache with package-lock.json key
- Set pull_policy: if-not-present
Fix 3: Tune and Parallelize
- Tune concurrent, cpus, and memory based on your workload
- Set up sync_develop job with PUSH_TOKEN
- Add needs dependencies for parallel execution
- Configure artifact expiration policies