From time to time, hardware needs replacing—and sometimes that means migrating Virtual Machines.
If you’re using a virtualization solution like VMware with vCenter, this is usually an easy task: you can essentially drag-and-drop VMs from your old cluster to your new one.
With Proxmox VE, migrating VMs between clusters isn’t currently supported through the web UI in a “cluster-to-cluster” workflow. However, there is an experimental CLI feature that makes this possible, and it saved my bacon.
[TL;DR]
Proxmox doesn’t support VM drag-and-drop migrations between clusters in the web UI, but you can live-migrate VMs to a remote Proxmox cluster using the experimental CLI command 'qm remote-migrate'.
By using an API token, the target host fingerprint, and specifying the destination storage + bridge, you can migrate running VMs with minimal downtime (often just a reboot) and very little packet loss.
The challenge: migrating a distributed Mastodon environment
I’m running Mastodon on Proxmox in a distributed setup (not a single VM). This design lets me make changes with minimal impact and downtime for platform users.
There are a few common options to migrate VMs in Proxmox:
Option 1: Add new hosts to the old cluster
This can work well, but it wasn’t ideal for my case because:
- I also replaced my underlying iSCSI storage
- I wanted a clean new cluster (and not a stretched transitional cluster)
Option 2: Back up and restore
I could’ve backed up and restored using the Proxmox VE plugin for Veeam Backup & Replication, but that introduces:
- Downtime during restore
- Risk of data loss between the last backup and shutdown.
Option 3: Rebuild and let application replication handle it
Some parts of the stack can be rebuilt cleanly (e.g., Elasticsearch nodes), but doing that across the entire platform is:
- More work
- More change than necessary
- More room for mistakes
So I went looking for a better option.
The solution: remote live migration (CLI only)
Proxmox has an experimental feature that allows remote migration to another cluster using "qm remote-migrate".
This can even be done online (live), meaning the VM keeps running during migration—with minimal disruption.
Requirements/prerequisites
Before you try this, make sure:
- The target host is reachable on port 8006 (Proxmox API)
- The target storage exists and is configured on the destination cluster
- CPU features on the destination host are compatible or better
- Live migration will fail if the destination CPU can’t support the running VM CPU model/features
- The VM isn’t blocked by HA policies (we’ll remove it from HA first)
- Ideally, you have the QEMU guest agent installed (not required, but it helps with clean freeze/thaw)
- The VM’s disks are on storage that can be migrated (see storage notes below)
Live migration to a remote host/cluster
To migrate VMs between clusters, you’ll need an API token for the destination cluster.
Token creation is straightforward and won’t be covered here, but the main caveat is permissions.
Because migration makes many API calls, I took the easy route and assigned the token the Administrator role.
Security note: Administrator is convenient, but you may want to create a dedicated role with only the required privileges in more locked-down environments.
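For reference, creating and authorizing such a token from the destination cluster's shell could look roughly like this (the token name 'migrate' is just an example, and the exact pveum syntax can vary slightly between Proxmox VE releases):
# Create an API token named "migrate" for root@pam; with --privsep 1 (the default)
# the token needs its own ACL entry, granted on the next line
pveum user token add root@pam migrate --privsep 1
pveum acl modify / --tokens 'root@pam!migrate' --roles Administrator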
Migration script (remote live migration)
You’ll need:
- Target host IP
- Target storage name (e.g., 'local-lvm', 'ceph', 'iscsi-storage01', etc.).
- Target network bridge (e.g. 'vmbr0')
- Your VM ID (e.g. 100)
#!/usr/bin/env bash
set -euo pipefail
TARGET_HOST="[IP of target host]"
TARGET_STORAGE="[Name of the storage]"
TARGET_BRIDGE="[Name of the network bridge]"
TOKEN='PVEAPIToken=root@pam!migrate=[Token]'
VMID=[Your VM ID]
# Fetch and store the SHA256 fingerprint for the destination node
FP="$(openssl s_client -connect "${TARGET_HOST}:8006" </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/^.*=//')"
# Remove from HA if present (ignore error if it isn't HA-managed)
ha-manager remove "vm:${VMID}" 2>/dev/null || true
# Remote migrate (live)
qm remote-migrate "${VMID}" "${VMID}" \
"apitoken=${TOKEN},host=${TARGET_HOST},fingerprint=${FP}" \
--target-storage "${TARGET_STORAGE}" \
--target-bridge "${TARGET_BRIDGE}" \
--online \
--delete
What this does
- Migrates VM '${VMID}' to a new Proxmox cluster via API token auth
- Uses '--online' for live migration
- Copies/moves storage to '--target-storage'
- Rewrites network bridge mapping to '--target-bridge'
- Deletes the VM from the old cluster ('--delete')
About '--delete' and VM locks
If you include '--delete', the VM will be removed from the old cluster after the migration completes.
If you omit '--delete', the VM remains on the source cluster in a locked “migrate” state.
If that happens, you can unlock it with:
qm unlock "${VMID}" 2>/dev/null || true
Migrating multiple VMs
If you’re migrating multiple VMs to the same destination host/storage/bridge, you can reuse the script and just change:
VMID=101
…and run it again.
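If you prefer, you can also wrap the whole thing in a small loop. Here's a sketch that reuses the same placeholders as the script above and runs through a hypothetical list of VM IDs one after another:
#!/usr/bin/env bash
set -euo pipefail
TARGET_HOST="[IP of target host]"
TARGET_STORAGE="[Name of the storage]"
TARGET_BRIDGE="[Name of the network bridge]"
TOKEN='PVEAPIToken=root@pam!migrate=[Token]'
VMIDS=(100 101 102)   # example VM IDs; adjust to your environment
# Fetch the destination node's SHA256 certificate fingerprint once
FP="$(openssl s_client -connect "${TARGET_HOST}:8006" </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/^.*=//')"
for VMID in "${VMIDS[@]}"; do
  # Remove from HA if present, then live-migrate to the remote cluster
  ha-manager remove "vm:${VMID}" 2>/dev/null || true
  qm remote-migrate "${VMID}" "${VMID}" \
  "apitoken=${TOKEN},host=${TARGET_HOST},fingerprint=${FP}" \
  --target-storage "${TARGET_STORAGE}" \
  --target-bridge "${TARGET_BRIDGE}" \
  --online \
  --delete
done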
Post-migration steps
After migration completes, you may still need to reapply cluster-specific configuration (see the short CLI sketch after this list), like:
- Adding the VM back into HA
- Assigning it to a pool
- Updating firewall rules, tags, notes, etc.
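For the HA and pool items, a minimal CLI sketch could look like this (the pool name 'mastodon' is purely an example; double-check the ha-manager and pvesh syntax against your Proxmox VE version):
# Put the migrated VM back under HA management on the new cluster
ha-manager add "vm:${VMID}" --state started
# Add the VM to a resource pool ("mastodon" is just an example name)
pvesh set /pools/mastodon --vms "${VMID}"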
CPU type/performance note
To take advantage of new CPU capabilities, the VM typically needs a reboot (depending on your CPU type settings). Since I was making hardware changes anyway, I rebooted via the Proxmox web UI.
I haven’t tested whether an in-guest reboot is always sufficient, but in my case, the UI reboot worked perfectly.
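If you want to be absolutely sure the VM picks up the new host's CPU capabilities, a full shutdown and start (so QEMU itself is restarted) is the safest route. A sketch from the CLI:
# A clean shutdown followed by a fresh start guarantees a new QEMU process
# with the current CPU settings; an in-guest reboot may not be enough.
qm shutdown "${VMID}" && qm start "${VMID}"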
Result
Using 'qm remote-migrate', I was able to migrate my Mastodon VMs with:
- minimal downtime (just a reboot where needed)
- only a single lost ping during the migration
- no backup/restore window
- no “rebuild everything” effort
Dry run/safety checklist (recommended)
Before migrating anything production-related, here’s a quick checklist that can save a lot of pain.
On the destination cluster
- Target node is healthy and visible in the Proxmox UI
- Target storage is configured and working (Datacenter -> Storage)
- Target bridge exists and matches the expected naming (vmbr0, etc.)
- Firewall rules allow access to port 8006 between clusters (API)
- Time sync / NTP is working (clock skew can cause weird TLS/auth issues)
On the source cluster
- VM is running and stable (no backup jobs running, no pending snapshots)
- HA can be removed temporarily (or you understand how it should behave)
- You know where the VM disks currently live
- Quick check: 'qm config <VMID>'.
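A quick pre-flight sketch that covers most of the checks above from the source node (the curl call only proves that the destination API port answers; any HTTP status code is fine here):
# Where do the VM's disks live, and which bridge does it use?
qm config <VMID>
# Storage overview on the node you run this on
pvesm status
# Is the destination API reachable on port 8006?
curl -k -s -o /dev/null -w '%{http_code}\n' "https://[IP of target host]:8006/api2/json/version"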
CPU compatibility sanity check
- Destination CPU is at least as capable as the source
- VM CPU type is not “too specific” (e.g., pinned to a model not available on the destination)
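One rough way to compare the two hosts is to diff their CPU feature flags and look at the VM's configured CPU type; this sketch assumes root SSH access from the source node to the destination:
# Compare host CPU feature flags between source and destination (run on the source node)
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > /tmp/src-flags
ssh root@[IP of target host] "grep -m1 '^flags' /proc/cpuinfo" | tr ' ' '\n' | sort > /tmp/dst-flags
diff /tmp/src-flags /tmp/dst-flags || echo "CPU feature sets differ - review before live migrating"
# Which CPU type is the VM configured with? (no output means the default type)
qm config <VMID> | grep '^cpu'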
Migration validation plan (before production)
- Start with a non-critical VM first
- Confirm the migrated VM:
- boots
- has network
- has the correct disk layout
- has correct IP/routes/gateway
- responds to ping and/or service checks
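A rough validation sketch for that test VM (the guest agent call only works if the agent is installed, and the service hostname is just a placeholder):
# Is the VM running on the destination?
qm status <VMID>
# Does the guest see its interfaces and addresses? (requires the QEMU guest agent)
qm agent <VMID> network-get-interfaces
# Basic reachability and service checks from outside
ping -c 4 <vm-ip>
curl -I https://<your-service-hostname>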
Troubleshooting (common errors + fixes)
Here are a few common problems you may run into, along with what to check first.
TLS / fingerprint issues
Symptoms
- Migration fails early with TLS verification errors
- Fingerprint mismatch / invalid fingerprint errors
Fixes
- Re-run the fingerprint command and confirm it matches the destination node certificate:
openssl s_client -connect <TARGET_HOST>:8006 </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout
- If the destination cluster recently regenerated certificates, your cached fingerprint may be outdated.
Permission denied (API token)
Symptoms
- Migration fails with auth errors
- API calls return “permission check failed.”
Fixes
- Ensure your token is formatted correctly: "TOKEN='PVEAPIToken=root@pam!migrate=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'"
- Make sure the token has enough permissions
(I used Administrator because it’s the least painful option for migration.)
“VM is locked” / stuck migration state
Symptoms
- VM becomes unmanageable in the UI
- Operations fail with “VM is locked (migrate).”
Fix
Unlock it manually:
qm unlock <VMID>
Note: only do this once you're sure the migration job has stopped or failed.
Live migration fails due to CPU incompatibility
Symptoms
- Migration starts but fails quickly
- You see errors related to CPU flags/features
Fixes
- Make sure the destination CPU supports the VM’s current CPU model
- Consider changing the VM CPU type to something more compatible (like 'x86-64-v2' or 'kvm64') before migrating
(that may require downtime and depends on your workload requirements)
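For reference, switching the CPU type from the CLI could look like this ('x86-64-v2' exists on recent Proxmox VE releases; 'kvm64' is a fallback on older ones):
# Switch the VM to a broadly compatible CPU model (requires a full stop/start to take effect)
qm set <VMID> --cpu x86-64-v2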
Storage migration fails (storage not found / unsupported)
Symptoms
- Error about the target storage not existing
- Disk copy fails, or storage mapping fails
Fixes
- Verify the storage name is correct on the destination:
- Datacenter -> Storage
- or: 'pvesm status'
- Ensure the destination storage supports VM disks (not all storage types are suitable)
Networking comes up wrong after migration
Symptoms
- VM boots but has no connectivity
- Wrong bridge used / interface not connected
Fixes
- Ensure '--target-bridge' matches a real bridge on the destination node
- Verify VM config on the destination: 'qm config <VMID>'
- Confirm VLAN tagging (if used) is consistent between clusters
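A couple of quick checks on the destination node (this assumes the standard /etc/network/interfaces setup and 'vmbr0' as an example bridge name):
# Does the expected bridge exist on the destination node?
ip -br link show type bridge
# How is it defined? (VLAN-aware bridges show "bridge-vlan-aware yes")
grep -A4 'iface vmbr0' /etc/network/interfaces
# Which bridge and VLAN tag did the migrated VM end up with?
qm config <VMID> | grep '^net'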
Storage notes: Ceph vs iSCSI vs local disks
Storage is where migrations often get interesting. Here’s how this typically plays out:
Ceph (shared storage inside a cluster)
If both source and destination nodes are in the same Ceph cluster (or shared environment), migration is usually smooth.
Pros
- VM disks may not need to be copied in the same way
- Great for traditional intra-cluster live migration
Cons
- If you’re building a new cluster with a new Ceph backend, you’re still doing data movement
iSCSI (common in “separate storage backend” setups)
This was my scenario: a new cluster and new iSCSI storage.
Pros
- Great for centralized shared storage setups
- Makes rebuilds and restores easier
Cons
- You must ensure storage is configured cleanly on the new cluster
- Disk naming / LUN mapping issues can complicate things
- Some iSCSI setups behave differently depending on LVM/LVM-thin/ZFS layering
Local disks (local-lvm, ZFS local, etc.)
If your VM disks are on local storage, migration means copying disk data over the network.
Pros
- Simple and fast on a single-node setup
- No external dependencies
Cons
- Potentially slow migrations for large disks
- Live migration becomes more sensitive to storage throughput and network bandwidth
- If the VM is write-heavy during migration, it may take longer
Final thoughts
If you're moving VMs between clusters, especially when the storage backend is changing too and you want to keep disruption low, remote live migration is a fantastic tool to know about.
It’s not available in the UI today, but it’s absolutely worth keeping in your toolbox.