feat: add Trivy JSON report export with configurable storage and endpoint delivery #2893
Conversation
…ook delivery

- Capture raw Trivy JSON from scan job logs and optionally write to file storage
- Storage options: operator filesystem (emptyDir), alternate storage (PVC subpath), or webhook-only
- Deliver raw JSON to a configurable HTTP endpoint with retries and custom headers
- TTL-based cleanup of stored reports when file storage is used
- No CRD storage for Trivy JSON (avoids etcd performance issues)
- Helm: trivyJSONReportStorage (operator|alternate|""), trivyJSONReportStorageDir override
Pull request overview
This pull request adds a new feature to capture and export raw Trivy JSON reports from scan job logs. The feature provides configurable storage options (operator filesystem via emptyDir, alternate storage via PVC, or webhook-only) and supports HTTP endpoint delivery with retry logic. TTL-based cleanup is implemented for stored reports.
Changes:
- New pkg/trivyjsonreport package for handling raw Trivy JSON report capture, storage, delivery, and cleanup
- Integration into the scan job controller to buffer logs, extract JSON, write to storage, and deliver to endpoints
- Helm chart configuration for multiple storage and delivery options with TTL cleanup support
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| pkg/trivyjsonreport/writer.go | Implements file-based storage for Trivy JSON reports with metadata tracking |
| pkg/trivyjsonreport/json.go | JSON extraction logic to parse valid JSON from raw log output |
| pkg/trivyjsonreport/delivery.go | HTTP delivery service with retry logic and custom headers support |
| pkg/trivyjsonreport/cleanup.go | TTL-based cleanup service for old report files |
| pkg/vulnerabilityreport/controller/scanjob.go | Integration point for buffering logs and triggering report handling; refactored alternate storage logic |
| pkg/operator/operator.go | Initialization of TrivyJSON writer, delivery service, and cleanup goroutine |
| pkg/operator/etc/config.go | Configuration structure and parsing for all TrivyJSON report settings |
| deploy/helm/values.yaml | Documentation and default values for new configuration options |
| deploy/helm/templates/deployment.yaml | Volume mount configuration for operator filesystem storage |
| deploy/helm/templates/configmaps/trivy-operator-config.yaml | Environment variable mapping for configuration |
| deploy/helm/README.md | Documentation table updates for new Helm values |
| tests/itest/helper/helper.go | Security annotation for test registry password field |
| pkg/webhook/webhookreporter.go | Security annotation for webhook URL usage |
| pkg/docker/config.go | Security annotation for Docker auth password field |
| pkg/apis/aquasecurity/v1alpha1/zz_generated.deepcopy.go | Auto-generated whitespace changes |
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
```go
rawLogData, err = io.ReadAll(io.LimitReader(logsStream, maxTrivyJSONLogSize))
if err != nil {
	return VulnerabilityReports{}, nil, nil, fmt.Errorf("reading logs for TrivyJSONReport: %w", err)
}
```
The log buffering uses io.LimitReader(maxTrivyJSONLogSize) but doesn’t detect truncation. If the scan output exceeds 100MiB, the read will succeed with partial data, leading to confusing downstream parse/extract errors. Consider reading max+1 bytes and returning a clear error when the limit is exceeded (or make the limit configurable).
Suggested change:

```go
// Read up to maxTrivyJSONLogSize+1 bytes so we can detect if the limit is exceeded.
rawLogData, err = io.ReadAll(io.LimitReader(logsStream, maxTrivyJSONLogSize+1))
if err != nil {
	return VulnerabilityReports{}, nil, nil, fmt.Errorf("reading logs for TrivyJSONReport: %w", err)
}
if len(rawLogData) > maxTrivyJSONLogSize {
	return VulnerabilityReports{}, nil, nil, fmt.Errorf("Trivy JSON log output exceeds %d bytes limit", maxTrivyJSONLogSize)
}
```
```go
if r.ConfigData.CompressLogs() {
	decompressedReader, err := utils.ReadCompressData(io.NopCloser(bytes.NewReader(rawLogData)))
	if err != nil {
		log.Error(err, "Failed to decompress raw log data for TrivyJSONReport")
	} else {
		jsonReportData, err = io.ReadAll(decompressedReader)
		if err != nil {
			log.Error(err, "Failed to read decompressed data for TrivyJSONReport")
			jsonReportData = rawLogData
		}
	}
```
When logs are compressed, this path base64-decodes + bzip2-decompresses into an unbounded byte slice (io.ReadAll). Because the 100MiB cap is on the encoded log stream, the decompressed output can still expand very large and cause high memory usage. Consider applying a size limit while reading decompressedReader (and/or reusing the decompression done in Plugin.ParseReportData to avoid doing the work twice).
```go
for attempt := 1; attempt <= maxAttempts; attempt++ {
	now := time.Now().UTC()
	metadata.DeliveryAttempts = attempt
	metadata.LastDeliveryAttempt = &now

	err := d.send(rawJSON)
	if err == nil {
		// Success
		metadata.Delivered = true
		deliveredAt := time.Now().UTC()
		metadata.DeliveredAt = &deliveredAt
		metadata.LastDeliveryError = ""

		log.Info("Successfully delivered TrivyJSON report",
			"endpoint", d.Config.TrivyJSONReportDeliveryURL,
			"attempts", attempt)

		return nil
	}

	lastErr = err
	metadata.LastDeliveryError = err.Error()

	if attempt < maxAttempts {
		log.Info("Delivery attempt failed, retrying",
			"attempt", attempt,
			"maxAttempts", maxAttempts,
			"error", err.Error())
		// Linear backoff: 5s, 10s, 15s, ...
		time.Sleep(time.Duration(attempt) * 5 * time.Second)
	}
```
This retry loop uses time.Sleep for backoff and doesn’t honor cancellation. In controller reconcile paths this can block a worker for up to the full retry duration, even if the manager is shutting down. Consider threading a context through DeliverReport/send and using a context-aware backoff (e.g., select on ctx.Done() during waits, or wait.ExponentialBackoffWithContext).
```go
err := filepath.Walk(c.BaseDir, func(path string, info os.FileInfo, err error) error {
	if err != nil {
		c.Logger.V(1).Error(err, "Failed to access path during TrivyJSON cleanup", "path", path)
		return nil
	}

	if info.IsDir() {
		return nil
	}

	// Only process .json files (both reports and metadata)
	if filepath.Ext(path) != ".json" {
		return nil
	}

	if info.ModTime().Before(cutoff) {
		c.Logger.V(1).Info("Removing old TrivyJSON file", "path", path, "age", time.Since(info.ModTime()))
		if err := os.Remove(path); err != nil {
			c.Logger.Error(err, "Failed to remove old file", "path", path)
		} else {
			removedCount++
		}
	}

	return nil
})

if err != nil {
	c.Logger.Error(err, "Error walking directory for cleanup")
}

if removedCount > 0 {
	c.Logger.Info("TrivyJSON cleanup completed", "removedFiles", removedCount)
}

// Clean up empty directories
c.cleanupEmptyDirs()
```
cleanup() walks BaseDir and deletes any file with a .json extension older than TTL, then removes any empty directories under BaseDir. If a user points TrivyJSONReportStorageDir at a directory that also contains other JSON artifacts, this can delete unrelated data. Consider scoping cleanup to only the subdirectories and filename patterns created by the Writer (e.g., BaseDir/{namespaced,cluster}/...) and only removing empty dirs beneath those roots.
Suggested change:

```go
// Only clean up files under known TrivyJSON report subdirectories to avoid
// deleting unrelated JSON files if BaseDir is shared.
reportSubdirs := []string{"namespaced", "cluster"}
for _, subdir := range reportSubdirs {
	root := filepath.Join(c.BaseDir, subdir)
	info, err := os.Stat(root)
	if err != nil || !info.IsDir() {
		// If the subdirectory doesn't exist or isn't a directory, skip it.
		continue
	}
	err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			c.Logger.V(1).Error(err, "Failed to access path during TrivyJSON cleanup", "path", path)
			return nil
		}
		if info.IsDir() {
			return nil
		}
		// Only process .json files (both reports and metadata)
		if filepath.Ext(path) != ".json" {
			return nil
		}
		if info.ModTime().Before(cutoff) {
			c.Logger.V(1).Info("Removing old TrivyJSON file", "path", path, "age", time.Since(info.ModTime()))
			if err := os.Remove(path); err != nil {
				c.Logger.Error(err, "Failed to remove old file", "path", path)
			} else {
				removedCount++
			}
		}
		return nil
	})
	if err != nil {
		c.Logger.Error(err, "Error walking directory for cleanup", "root", root)
	}
}
if removedCount > 0 {
	c.Logger.Info("TrivyJSON cleanup completed", "removedFiles", removedCount)
}
// Note: we intentionally avoid removing empty directories at BaseDir level
// here to prevent accidental deletion of unrelated directories if BaseDir
// is shared with other components.
```
```go
done := make(chan struct{})
go func() {
	svc.Start(ctx)
	close(done)
}()
// Allow first cleanup to run
time.Sleep(200 * time.Millisecond)
cancel()
<-done
```
These tests rely on fixed time.Sleep delays to wait for cleanup to run, which can be flaky on slow/loaded CI runners. Since Start() runs cleanup immediately, consider synchronizing deterministically (e.g., poll for the expected file state with a timeout, or provide a test hook to run a single cleanup pass) instead of sleeping a fixed duration.