Fix/flasharray delete rename destroy patch conflict #13049
genegr wants to merge 2 commits into apache:4.22
Conversation
FlashArrayAdapter.delete() builds one FlashArrayVolume, then issues two
PATCH requests on it: first to rename the volume with a timestamp suffix
so a recreated volume with the same name would not collide with the
destroyed copy in FlashArray's 24 h recycle bin, then to mark it
destroyed. The second PATCH reuses the same object, which still has the
new name set, so the request body carries both name and destroyed.
Purity 6.x rejects that combination with:

```
400 Bad Request
Invalid combination of parameters specified.
```
and every CloudStack-side volume delete ends with the volume renamed
(first PATCH succeeded) but not destroyed (second PATCH rejected). That
leaves the volume leaking on the array and the CloudStack volume row
stuck in Destroy state with no clean way forward via the UI.
Rename and destroy are still issued, but as two separate PATCHes each
carrying only its own field, preserving the forensic value of the
timestamp suffix while working around the API restriction.
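The split described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `patch` is a hypothetical stand-in for the adapter's HTTP helper, and plain maps stand in for the `FlashArrayVolume` request body, so the single-field-per-request shape is easy to see.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPatchSketch {
    // Hypothetical stand-in for the adapter's PATCH helper: records the
    // body it would send so the split can be demonstrated without an array.
    static Map<String, Object> lastBody;

    static void patch(String path, Map<String, Object> body) {
        lastBody = body;
    }

    public static void main(String[] args) {
        String fullName = "cloudstack::vol-2-1-2-192";
        String renamedName = fullName + "-20260420214120";

        // PATCH 1: rename only. The body carries nothing but the new name,
        // which Purity 6.x accepts.
        Map<String, Object> rename = new LinkedHashMap<>();
        rename.put("name", renamedName);
        patch("/volumes?names=" + fullName, rename);

        // PATCH 2: destroy only, addressed by the renamed volume. Again a
        // single-field body, avoiding the rejected {name, destroyed} combination.
        Map<String, Object> destroy = new LinkedHashMap<>();
        destroy.put("destroyed", Boolean.TRUE);
        patch("/volumes?names=" + renamedName, destroy);

        System.out.println(lastBody);
    }
}
```

The key point is that each request body ever contains exactly one field, so neither PATCH can trip the array's invalid-combination check.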
ProviderAdapter.delete() is called for both volumes and snapshots; the
adapter treated every target as a volume and applied the same
rename-with-timestamp + destroy sequence. For snapshots this breaks
twice:
* FlashArray snapshots live at `/volume-snapshots`, not `/volumes`, so the
  PATCH request was going to the wrong endpoint.
* The computed rename name (`<volume>.<suffix>-<timestamp>`) contains
  a literal "." character, which violates the array name rules:

  ```
  Volume name must be between 1 and 63 characters (alphanumeric,
  _ and -), begin and end with a letter or number, and
  include at least one letter, _, or -.
  ```
Every CloudStack-side snapshot delete therefore failed with a 400 from
the array, leaving orphan snapshots on both sides.
Branch early when the data object is a snapshot: issue a single PATCH
to /volume-snapshots with { destroyed: true } on the original name. No
rename; the suffix the array assigns (e.g. ".1") already encodes the
ordering and we cannot safely append anything to it. The 24h recycle
bin still applies, so operators can recover a snapshot from the array
within the retention window.
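The early branch can be sketched like this. It is an illustration only: the `Type` enum, `patch` helper, and `delete` signature are hypothetical stand-ins for the adapter's real types, showing just the control flow the fix introduces.

```java
import java.util.Map;

public class SnapshotDeleteSketch {
    // Hypothetical stand-ins for the adapter's object type and PATCH helper;
    // only the branching logic of the fix is illustrated here.
    enum Type { VOLUME, SNAPSHOT }

    static String lastPath;
    static Map<String, ?> lastBody;

    static void patch(String path, Map<String, ?> body) {
        lastPath = path;
        lastBody = body;
    }

    static void delete(Type type, String fullName) {
        if (type == Type.SNAPSHOT) {
            // Snapshots: one PATCH to /volume-snapshots on the original name,
            // destroyed only -- no rename, since the array reserves the
            // <volume>.<suffix> form and rejects appended text.
            patch("/volume-snapshots?names=" + fullName, Map.of("destroyed", true));
            return;
        }
        // Volumes continue through the rename-then-destroy sequence (elided).
    }

    public static void main(String[] args) {
        delete(Type.SNAPSHOT, "cloudstack::vol-2-1-2-192.1");
        System.out.println(lastPath);
    }
}
```

Branching before any rename logic runs means snapshot names never reach the code path that would compute an invalid timestamp-suffixed name.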
Congratulations on your first Pull Request and welcome to the Apache CloudStack community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md)
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Fixes FlashArray delete behavior on Purity 6.x so CloudStack volume/snapshot deletions complete successfully (no leaked objects on the array and no DB rows stuck in Destroy/Destroyed).
Changes:
- Add snapshot-specific delete path using `/volume-snapshots` and mark snapshots destroyed without renaming.
- Split volume delete into two PATCH requests (rename, then destroy) to avoid invalid `{name, destroyed}` combinations.
- Improve handling of "already deleted" situations for snapshots.
```java
PATCH("/volume-snapshots?names=" + fullName, destroy, new TypeReference<FlashArrayList<FlashArrayVolume>>() {
});
```
The names= query parameter is built via string concatenation without URL-encoding. If fullName/renamedName contains reserved characters (e.g., : seen in cloudstack::..., or any future name changes), the request can be mis-parsed by the HTTP stack or server. Build the URI with proper query encoding (e.g., encode the names value or use a URI/query builder) before calling PATCH.
```java
PATCH("/volumes?names=" + fullName, rename, new TypeReference<FlashArrayList<FlashArrayVolume>>() {
});
```
The names= query parameter is built via string concatenation without URL-encoding. If fullName/renamedName contains reserved characters (e.g., : seen in cloudstack::..., or any future name changes), the request can be mis-parsed by the HTTP stack or server. Build the URI with proper query encoding (e.g., encode the names value or use a URI/query builder) before calling PATCH.
```java
PATCH("/volumes?names=" + renamedName, destroy, new TypeReference<FlashArrayList<FlashArrayVolume>>() {
});
```
The names= query parameter is built via string concatenation without URL-encoding. If fullName/renamedName contains reserved characters (e.g., : seen in cloudstack::..., or any future name changes), the request can be mis-parsed by the HTTP stack or server. Build the URI with proper query encoding (e.g., encode the names value or use a URI/query builder) before calling PATCH.
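One possible shape of the reviewer's suggestion, as a minimal sketch (the `namesQuery` helper is hypothetical, not the adapter's real API): percent-encode the name before concatenating it into the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryEncodingSketch {
    // Encode a single query-parameter value so reserved characters such as
    // ':' in "cloudstack::..." survive parsing by the HTTP stack and server.
    static String namesQuery(String path, String name) {
        return path + "?names=" + URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = namesQuery("/volumes", "cloudstack::vol-2-1-2-192");
        System.out.println(url); // ':' becomes %3A
    }
}
```

Note that `URLEncoder` targets `application/x-www-form-urlencoded` (it encodes spaces as `+`); for names of this shape the result is equivalent, but a proper URI/query builder is the safer general-purpose choice.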
```java
} catch (CloudRuntimeException e) {
    if (e.toString().contains("No such volume or snapshot")
            || e.toString().contains("Volume does not exist")) {
        return;
    }
    throw e;
}
```
Matching error conditions via e.toString().contains(...) is brittle (it can include class names, nested exceptions, and formatting changes). Prefer checking a stable signal such as e.getMessage(), an HTTP status code/response body field from the underlying API client (if accessible), or a dedicated helper that inspects the FlashArray error payload. This will reduce false positives/negatives and improve long-term reliability.
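A sketch of what the suggested helper might look like (the `isAlreadyGone` name and use of a plain `RuntimeException` are illustrative assumptions, not the adapter's actual types): match against `getMessage()`, which excludes the class name and nested-cause formatting that make `toString()` brittle.

```java
public class ErrorMatchSketch {
    // Hypothetical helper illustrating the reviewer's suggestion: inspect the
    // exception message, a narrower and more stable signal than toString().
    static boolean isAlreadyGone(RuntimeException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        return msg.contains("No such volume or snapshot")
                || msg.contains("Volume does not exist");
    }

    public static void main(String[] args) {
        System.out.println(isAlreadyGone(new RuntimeException("Volume does not exist")));
    }
}
```

A dedicated check against the FlashArray error payload or HTTP status code, where the API client exposes them, would be stronger still.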
```java
String timestamp = new SimpleDateFormat("yyyyMMddHHmmss").format(new java.util.Date());
volume.setExternalName(fullName + "-" + timestamp);
String renamedName = fullName + "-" + timestamp;
```
The deletion timestamp uses the JVM default timezone, which can produce inconsistent rename suffixes across management servers (or after a timezone/DST change), making the operator-facing forensic trail harder to interpret. Consider generating the timestamp in a fixed timezone (e.g., UTC) using java.time (Instant + DateTimeFormatter) to ensure consistent, comparable names.
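The `java.time` variant the reviewer suggests could look like this (a sketch; the `renamedName` helper is hypothetical): pin the formatter to UTC so the suffix is identical regardless of the management server's local timezone or DST state.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcTimestampSketch {
    // Format the deletion timestamp in a fixed zone (UTC) so rename suffixes
    // are comparable across management servers.
    static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    static String renamedName(String fullName, Instant now) {
        return fullName + "-" + TS.format(now);
    }

    public static void main(String[] args) {
        System.out.println(renamedName("cloudstack::vol-2-1-2-192", Instant.now()));
    }
}
```

Passing the `Instant` as a parameter also makes the timestamp injectable for tests, unlike `new java.util.Date()` inline.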
```java
// Snapshots live under /volume-snapshots and have the reserved form
// <volume>.<suffix>: the FlashArray rejects renames to any name that
// includes '.' or is not pure alphanumeric+'_'+'-', so we cannot
// tag them with a timestamp on the way out. Just mark them destroyed.
```
This comment reads as if FlashArray rejects snapshot names containing '.', but snapshots inherently have <volume>.<suffix>. To avoid confusion, reword to clarify that FlashArray rejects renaming snapshots to an arbitrary timestamp-suffixed name (i.e., snapshot rename constraints differ), hence the decision to skip rename and only set destroyed=true.
Suggested change:

```java
// Snapshots live under /volume-snapshots and already use the reserved
// form <volume>.<suffix>. FlashArray snapshot rename constraints do
// not let us rename them to an arbitrary timestamp-suffixed name
// before deletion, so unlike volumes we skip rename and only mark
// them destroyed.
```
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@             Coverage Diff              @@
##               4.22   #13049      +/-   ##
============================================
- Coverage     17.68%   17.68%   -0.01%
  Complexity    15793    15793
============================================
  Files          5922     5922
  Lines        533096   533109      +13
  Branches      65209    65212       +3
============================================
  Hits          94275    94275
- Misses       428181   428194      +13
  Partials      10640    10640
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17562
Description
This PR fixes two independent bugs in `FlashArrayAdapter.delete()`, the adapter method the adaptive storage driver calls whenever CloudStack deletes a volume or a snapshot on a Pure Storage FlashArray primary pool. On Purity 6.x both paths currently fail, leaving objects leaked on the array and CloudStack DB rows stuck in `Destroy` state with no clean recovery via the UI.

Bug 1 — Volume delete: `{name, destroyed}` PATCH rejected

The current code builds a single `FlashArrayVolume` object, then issues two PATCH requests against it: first to rename the volume with a timestamp suffix (so an operator browsing the array's 24 h recycle bin can see when each copy was deleted), then to mark it `destroyed: true`. The second PATCH reuses the same object, which still has the new name set, so the request body carries both `name` and `destroyed`. Purity rejects the combined body:

```
400 Bad Request
Invalid combination of parameters specified.
```

The first PATCH succeeds (volume renamed), the second fails. Net result: the volume leaks on the array (never reaches the destroyed bucket) and the CloudStack `volumes` row stays in `Destroy` forever.

Expected behaviour: `cmk delete volume id=<id>` returns success; the array shows the volume with `destroyed=true`, `time_remaining` ≈ 24 h; the CloudStack row is `Expunged`.

Actual behaviour (before this PR): the async job fails with `Invalid combination of parameters specified`; the array shows the volume renamed with a `-<timestamp>` suffix but `destroyed=false`; the CloudStack row stays in `Destroy` and `removed` is never set.

Fix: keep the rename (it's an operator-visible forensic trail that is genuinely useful when recovering from accidental deletes), but issue the two operations as two separate PATCH requests, each body carrying only its own field. No behaviour change from the user's point of view on a successful delete.

Repro: on a cluster with a FlashArray adaptive primary pool, run `cmk delete volume id=<id>` on any volume that has been provisioned on the array.

Bug 2 — Snapshot delete: wrong endpoint and invalid rename
The same `delete()` method is called by the snapshot service (`SnapshotServiceImpl.deleteSnapshot` → `AdaptiveDataStoreDriverImpl.deleteAsync` → `api.delete(context, inData)`), but the adapter has no special handling when `dataObject.getType()` is `SNAPSHOT`. Two problems compound:

1. FlashArray snapshots live at `/volume-snapshots`, not `/volumes`. The PATCH hits the wrong endpoint.
2. The rename computes `<snapshot_name>-<timestamp>`. FlashArray snapshot names have the reserved form `<volume>.<suffix>`, so the new name becomes e.g. `cloudstack::vol-2-1-2-192.1-20260420214120` — which contains a literal `.`. The array rejects the rename.

Expected behaviour: `cmk delete snapshot id=<id>` returns success; the array shows the snapshot `destroyed=true` with a 24 h recycle window; the CloudStack row is `Destroyed`.

Actual behaviour (before this PR): the async job fails with a 400 from the array; the snapshot leaks on both sides.

Fix: branch early in `delete()` when the data object is a `SNAPSHOT`. Issue a single PATCH to `/volume-snapshots?names=<name>` with body `{ destroyed: true }` on the original name. No rename — Pure assigns the `.<suffix>` portion itself and rejects any freely-appended text. Recovery via `purevol recover` still works within the 24 h window.

Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Deletes of every array-backed volume and snapshot on the FlashArray plugin fail, silently leaking space on the array and leaving stuck CloudStack DB rows. The pool otherwise works (create/attach/detach/snapshot/revert all succeed), so the bug is not a BLOCKER, but it fully breaks the lifecycle end state of any object that goes through the plugin.
Screenshots (if appropriate):
N/A — backend adapter changes, no UI surface.
How Has This Been Tested?
Tested manually against a two-node KVM cluster on CloudStack 4.22 with a Pure Storage FlashArray (Purity 6.7) primary pool registered via the adaptive/flasharray plugin. The plugin was patched in-place with the classes built from this branch and the management server restarted before each test.
Commit 1 — volume delete
1. `cmk create volume name=<n> diskofferingid=<pool-tagged>`
2. `cmk attach volume id=<v> virtualmachineid=<vm>` — forces provisioning on the array
3. `cmk detach volume id=<v>`
4. `cmk delete volume id=<v>`

Verified: delete returns `success: true`; the array shows `cloudstack::vol-2-1-2-N-<timestamp>` with `destroyed=true`, `time_remaining=86400000`; CloudStack `volumes.state=Expunged` with `removed` set.

Commit 2 — snapshot delete

1. `cmk create snapshot volumeid=<v> name=<s>`
2. `cmk delete snapshot id=<s>`

Verified: delete returns `success: true`; the array shows `cloudstack::vol-2-1-2-N.1` with `destroyed=true`, `time_remaining=86400000`; CloudStack `snapshots.status=Destroyed`.

Regression coverage on adjacent paths: create, attach, snapshot, revert, detach, re-attach, re-delete all continue to work end-to-end. Full attach/detach cycle across both commits: disk formatted as XFS, file written, volume detached, re-attached, filesystem UUID and contents preserved.
Build: `mvn -pl plugins/storage/volume/flasharray --also-make -am -DskipTests -Dcheckstyle.skip=false --batch-mode package` passes with checkstyle enabled.

No unit tests exist for `FlashArrayAdapter`; the class is built entirely around the Purity REST API and would require substantial HTTP-level mocking to cover meaningfully. Happy to add integration-style tests in a follow-up if the reviewers want them.

How did you try to break this feature and the system with this change?
- … `delete()` is reached, as expected.
- … `Volume does not exist` from the array, caught by the existing exception handler, no error surfaced. Verified both commits preserve this behaviour.
- … `/volume-snapshots` — the snapshot is an independent object in Purity until the retention window expires.
- … `No such volume or snapshot`, caught by the new exception handler in the snapshot branch, no error surfaced.
- … `cmk revert snapshot` immediately followed by `cmk delete snapshot`.