From time to time, hardware needs replacing—and sometimes that means migrating Virtual Machines.
If you’re using a virtualization solution like VMware with vCenter, this is usually an easy task: you can essentially drag-and-drop VMs from your old cluster to your new one.
With Proxmox VE, migrating VMs between clusters isn’t currently supported through the web UI in a “cluster-to-cluster” workflow. However, there is an experimental CLI feature that makes this possible, and it saved my bacon.
[TL;DR]
Proxmox doesn’t support VM drag-and-drop migrations between clusters in the web UI, but you can live-migrate VMs to a remote Proxmox cluster using the experimental CLI command 'qm remote-migrate'.
By using an API token, the target host fingerprint, and specifying the destination storage + bridge, you can migrate running VMs with minimal downtime (often just a reboot) and very little packet loss.
The challenge: migrating a distributed Mastodon environment
I’m running Mastodon on Proxmox in a distributed setup (not a single VM). This design lets me make changes with minimal impact and downtime for platform users.
There are a few common options to migrate VMs in Proxmox:
Option 1: Add new hosts to the old cluster
This can work well, but it wasn’t ideal for my case because:
- I also replaced my underlying iSCSI storage
- I wanted a clean new cluster (and not a stretched transitional cluster)
Option 2: Back up and restore
I could’ve backed up and restored using the Proxmox VE plugin for Veeam Backup & Replication, but that introduces:
- Downtime during restore
- Risk of data loss between the last backup and shutdown.
Option 3: Rebuild and let application replication handle it
Some parts of the stack can be rebuilt cleanly (e.g., Elasticsearch nodes), but doing that across the entire platform is:
- More work
- More change than necessary
- More room for mistakes
So I went looking for a better option.
The solution: remote live migration (CLI only)
Proxmox has an experimental feature that allows remote migration to another cluster using "qm remote-migrate".
This can even be done online (live), meaning the VM keeps running during migration—with minimal disruption.
Requirements/prerequisites
Before you try this, make sure:
- The target host is reachable on port 8006 (Proxmox API)
- The target storage exists and is configured on the destination cluster
- CPU features on the destination host are compatible or better
- Live migration will fail if the destination CPU can’t support the running VM CPU model/features
- The VM isn’t blocked by HA policies (we’ll remove it from HA first)
- Ideally, you have the QEMU guest agent installed (not required, but it helps with clean freeze/thaw)
- The VM’s disks are on storage that can be migrated (see storage notes below)
Live migration to a remote host/cluster
To migrate VMs between clusters, you’ll need an API token for the destination cluster.
Token creation is straightforward and won’t be covered here, but the main caveat is permissions.
Because migration makes many API calls, I took the easy route and assigned the token the Administrator role.
Security note: Administrator is convenient, but you may want to create a dedicated role with only the required privileges in more locked-down environments.
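For reference, creating and authorizing such a token from the destination cluster's shell could look roughly like this (the token name 'migrate' is just an example, and the exact pveum syntax can vary slightly between Proxmox VE releases):
# Create an API token named "migrate" for root@pam; with --privsep 1 (the default)
# the token needs its own ACL entry, granted on the next line
pveum user token add root@pam migrate --privsep 1
pveum acl modify / --tokens 'root@pam!migrate' --roles Administrator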
Migration script (remote live migration)
You’ll need:
- Target host IP
- Target storage name (e.g., 'local-lvm', 'ceph', 'iscsi-storage01', etc.).
- Target network bridge (e.g. 'vmbr0')
- Your VM ID (e.g. 100)
#!/usr/bin/env bash
set -euo pipefail
TARGET_HOST="[IP of target host]"
TARGET_STORAGE="[Name of the storage]"
TARGET_BRIDGE="[Name of the network bridge]"
TOKEN='PVEAPIToken=root@pam!migrate=[Token]'
VMID=[Your VM ID]
# Fetch and store the SHA256 fingerprint for the destination node
FP="$(openssl s_client -connect "${TARGET_HOST}:8006" </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/^.*=//')"
# Remove from HA if present (ignore error if it isn't HA-managed)
ha-manager remove "vm:${VMID}" 2>/dev/null || true
# Remote migrate (live)
qm remote-migrate "${VMID}" "${VMID}" \
"apitoken=${TOKEN},host=${TARGET_HOST},fingerprint=${FP}" \
--target-storage "${TARGET_STORAGE}" \
--target-bridge "${TARGET_BRIDGE}" \
--online \
--delete
What this does
- Migrates VM '${VMID}' to a new Proxmox cluster via API token auth
- Uses '--online' for live migration
- Copies/moves storage to '--target-storage'
- Rewrites network bridge mapping to '--target-bridge'
- Deletes the VM from the old cluster ('--delete')
About '--delete' and VM locks
If you include '--delete', the VM will be removed from the old cluster after the migration completes.
If you omit '--delete', the VM remains on the source cluster in a locked “migrate” state.
If that happens, you can unlock it with:
qm unlock "${VMID}" 2>/dev/null || true
Migrating multiple VMs
If you’re migrating multiple VMs to the same destination host/storage/bridge, you can reuse the script and just change:
VMID=101
…and run it again.
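If you prefer, you can also wrap the whole thing in a small loop. Here's a sketch that reuses the same placeholders as the script above and runs through a hypothetical list of VM IDs one after another:
#!/usr/bin/env bash
set -euo pipefail
TARGET_HOST="[IP of target host]"
TARGET_STORAGE="[Name of the storage]"
TARGET_BRIDGE="[Name of the network bridge]"
TOKEN='PVEAPIToken=root@pam!migrate=[Token]'
VMIDS=(100 101 102)   # example VM IDs; adjust to your environment
# Fetch the destination node's SHA256 certificate fingerprint once
FP="$(openssl s_client -connect "${TARGET_HOST}:8006" </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/^.*=//')"
for VMID in "${VMIDS[@]}"; do
  # Remove from HA if present, then live-migrate to the remote cluster
  ha-manager remove "vm:${VMID}" 2>/dev/null || true
  qm remote-migrate "${VMID}" "${VMID}" \
  "apitoken=${TOKEN},host=${TARGET_HOST},fingerprint=${FP}" \
  --target-storage "${TARGET_STORAGE}" \
  --target-bridge "${TARGET_BRIDGE}" \
  --online \
  --delete
done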
Post-migration steps
After migration completes, you may still need to reapply cluster-specific configuration (see the short CLI sketch after this list), like:
- Adding the VM back into HA
- Assigning it to a pool
- Updating firewall rules, tags, notes, etc.
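For the HA and pool items, a minimal CLI sketch could look like this (the pool name 'mastodon' is purely an example; double-check the ha-manager and pvesh syntax against your Proxmox VE version):
# Put the migrated VM back under HA management on the new cluster
ha-manager add "vm:${VMID}" --state started
# Add the VM to a resource pool ("mastodon" is just an example name)
pvesh set /pools/mastodon --vms "${VMID}"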
CPU type/performance note
To take advantage of new CPU capabilities, the VM typically needs a reboot (depending on your CPU type settings). Since I was making hardware changes anyway, I rebooted via the Proxmox web UI.
I haven’t tested whether an in-guest reboot is always sufficient, but in my case, the UI reboot worked perfectly.
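If you want to be absolutely sure the VM picks up the new host's CPU capabilities, a full shutdown and start (so QEMU itself is restarted) is the safest route. A sketch from the CLI:
# A clean shutdown followed by a fresh start guarantees a new QEMU process
# with the current CPU settings; an in-guest reboot may not be enough.
qm shutdown "${VMID}" && qm start "${VMID}"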
Result
Using 'qm remote-migrate', I was able to migrate my Mastodon VMs with:
- minimal downtime (just a reboot where needed)
- only a single lost ping during the migration
- no backup/restore window
- no “rebuild everything” effort
Dry run/safety checklist (recommended)
Before migrating anything production-related, here’s a quick checklist that can save a lot of pain.
On the destination cluster
- Target node is healthy and visible in the Proxmox UI
- Target storage is configured and working (Datacenter -> Storage)
- Target bridge exists and matches the expected naming (vmbr0, etc.)
- Firewall rules allow access to port 8006 between clusters (API)
- Time sync / NTP is working (clock skew can cause weird TLS/auth issues)
On the source cluster
- VM is running and stable (no backup jobs running, no pending snapshots)
- HA can be removed temporarily (or you understand how it should behave)
- You know where the VM disks currently live
- Quick check: 'qm config <VMID>'.
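A quick pre-flight sketch that covers most of the checks above from the source node (the curl call only proves that the destination API port answers; any HTTP status code is fine here):
# Where do the VM's disks live, and which bridge does it use?
qm config <VMID>
# Storage overview on the node you run this on
pvesm status
# Is the destination API reachable on port 8006?
curl -k -s -o /dev/null -w '%{http_code}\n' "https://[IP of target host]:8006/api2/json/version"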
CPU compatibility sanity check
- Destination CPU is at least as capable as the source
- VM CPU type is not “too specific” (e.g., pinned to a model not available on the destination)
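One rough way to compare the two hosts is to diff their CPU feature flags and look at the VM's configured CPU type; this sketch assumes root SSH access from the source node to the destination:
# Compare host CPU feature flags between source and destination (run on the source node)
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > /tmp/src-flags
ssh root@[IP of target host] "grep -m1 '^flags' /proc/cpuinfo" | tr ' ' '\n' | sort > /tmp/dst-flags
diff /tmp/src-flags /tmp/dst-flags || echo "CPU feature sets differ - review before live migrating"
# Which CPU type is the VM configured with? (no output means the default type)
qm config <VMID> | grep '^cpu'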
Migration validation plan (before production)
- Start with a non-critical VM first
- Confirm the migrated VM:
- boots
- has network
- has the correct disk layout
- has correct IP/routes/gateway
- responds to ping and/or service checks
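A rough validation sketch for that test VM (the guest agent call only works if the agent is installed, and the service hostname is just a placeholder):
# Is the VM running on the destination?
qm status <VMID>
# Does the guest see its interfaces and addresses? (requires the QEMU guest agent)
qm agent <VMID> network-get-interfaces
# Basic reachability and service checks from outside
ping -c 4 <vm-ip>
curl -I https://<your-service-hostname>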
Troubleshooting (common errors + fixes)
Here are a few common problems you may run into, along with what to check first.
TLS / fingerprint issues
Symptoms
- Migration fails early with TLS verification errors
- Fingerprint mismatch / invalid fingerprint errors
Fixes
- Re-run the fingerprint command and confirm it matches the destination node certificate:
openssl s_client -connect <TARGET_HOST>:8006 </dev/null 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout
- If the destination cluster recently regenerated certificates, your cached fingerprint may be outdated.
Permission denied (API token)
Symptoms
- Migration fails with auth errors
- API calls return “permission check failed.”
Fixes
- Ensure your token is formatted correctly: "TOKEN='PVEAPIToken=root@pam!migrate=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'"
- Make sure the token has enough permissions
(I used Administrator because it’s the least painful option for migration.)
“VM is locked” / stuck migration state
Symptoms
- VM becomes unmanageable in the UI
- Operations fail with “VM is locked (migrate).”
Fix
Unlock it manually:
qm unlock <VMID>
Note: only do this once you're sure the migration job has stopped or failed.
Live migration fails due to CPU incompatibility
Symptoms
- Migration starts but fails quickly
- You see errors related to CPU flags/features
Fixes
- Make sure the destination CPU supports the VM’s current CPU model
- Consider changing the VM CPU type to something more compatible (like 'x86-64-v2' or 'kvm64') before migrating
(that may require downtime and depends on your workload requirements)
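For reference, switching the CPU type from the CLI could look like this ('x86-64-v2' exists on recent Proxmox VE releases; 'kvm64' is a fallback on older ones):
# Switch the VM to a broadly compatible CPU model (requires a full stop/start to take effect)
qm set <VMID> --cpu x86-64-v2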
Storage migration fails (storage not found / unsupported)
Symptoms
- Error about the target storage not existing
- Disk copy fails, or storage mapping fails
Fixes
- Verify the storage name is correct on the destination:
- Datacenter -> Storage
- or: 'pvesm status'
- Ensure the destination storage supports VM disks (not all storage types are suitable)
Networking comes up wrong after migration
Symptoms
- VM boots but has no connectivity
- Wrong bridge used / interface not connected
Fixes
- Ensure '--target-bridge' matches a real bridge on the destination node
- Verify VM config on the destination: 'qm config <VMID>'
- Confirm VLAN tagging (if used) is consistent between clusters
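A couple of quick checks on the destination node (this assumes the standard /etc/network/interfaces setup and 'vmbr0' as an example bridge name):
# Does the expected bridge exist on the destination node?
ip -br link show type bridge
# How is it defined? (VLAN-aware bridges show "bridge-vlan-aware yes")
grep -A4 'iface vmbr0' /etc/network/interfaces
# Which bridge and VLAN tag did the migrated VM end up with?
qm config <VMID> | grep '^net'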
Storage notes: Ceph vs iSCSI vs local disks
Storage is where migrations often get interesting. Here’s how this typically plays out:
Ceph (shared storage inside a cluster)
If both source and destination nodes are in the same Ceph cluster (or shared environment), migration is usually smooth.
Pros
- VM disks may not need to be copied in the same way
- Great for traditional intra-cluster live migration
Cons
- If you’re building a new cluster with a new Ceph backend, you’re still doing data movement
iSCSI (common in “separate storage backend” setups)
This was my scenario: a new cluster and new iSCSI storage.
Pros
- Great for centralized shared storage setups
- Makes rebuilds and restores easier
Cons
- You must ensure storage is configured cleanly on the new cluster
- Disk naming / LUN mapping issues can complicate things
- Some iSCSI setups behave differently depending on LVM/LVM-thin/ZFS layering
Local disks (local-lvm, ZFS local, etc.)
If your VM disks are on local storage, migration means copying disk data over the network.
Pros
- Simple and fast on a single-node setup
- No external dependencies
Cons
- Potentially slow migrations for large disks
- Live migration becomes more sensitive to storage throughput and network bandwidth
- If the VM is write-heavy during migration, it may take longer
Final thoughts
If you're moving VMs between clusters, especially when the storage backend is changing too and you want to keep disruption low, remote live migration is a fantastic tool to know about.
It’s not available in the UI today, but it’s absolutely worth keeping in your toolbox.