cocos/pkg/oci/extract_test.go
Sammy Kerata Oina da31d76c94
NOISSUE - Agent Pull mode for remote resources (#575)
* feat(kbs): implement KBS client for attestation and resource retrieval

- Added KBS client implementation in pkg/kbs/client.go with methods for attestation and resource retrieval.
- Introduced necessary data structures for requests and responses.
- Implemented error handling for various scenarios.

test(kbs): add unit tests for KBS client

- Created comprehensive tests for the KBS client in pkg/kbs/client_test.go.
- Included tests for attestation success and failure cases, as well as resource retrieval.

feat(registry): introduce HTTP and S3 registry implementations

- Added HTTPRegistry for downloading resources over HTTP/HTTPS with retry logic in pkg/registry/http.go.
- Implemented S3Registry for downloading resources from AWS S3 and S3-compatible services in pkg/registry/s3.go.
- Included error handling and configuration options for both registries.

chore(registry): define registry interface and configuration

- Created registry interface and configuration struct in pkg/registry/registry.go.
- Added default configuration settings for registry clients.

docs(cvms): update README for CVMS server configuration and usage

- Enhanced documentation for CVMS server with detailed command-line flags and usage examples.
- Clarified direct upload and remote resource modes, including KBS integration.

fix(cvms): integrate KBS for remote resource handling in main.go

- Updated main.go to support remote datasets and algorithms using KBS.
- Added validation for command-line flags to ensure proper configuration.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: Move ifeq conditional outside define block in attestation-service.mk

Make conditionals cannot be evaluated inside define...endef blocks
when used as recipe bodies. Restructured to define the
ATTESTATION_SERVICE_INSTALL_INIT_SYSTEMD block conditionally based
on BR2_PACKAGE_CC_ATTESTATION_AGENT configuration.

* feat: Implement remote resource downloading for algorithms and datasets using AWS S3/MinIO credentials.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Add comprehensive documentation and agent support for testing remote resource download with KBS attestation.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Improve agent logging for remote resource configuration and KBS status, and add a testing guide for remote resource downloads with KBS attestation.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Add a comprehensive guide for testing remote resource download with KBS attestation and update multiple package versions to a specific commit.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Add failure transitions for resource reception states and a comprehensive guide for testing remote resource downloads with KBS attestation.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Implement remote resource download with KBS attestation in the agent and add a comprehensive testing guide.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* test: Add comprehensive guide for testing remote resource download with KBS attestation and include a debug log in the attestation client.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Delegate KBS attestation and token retrieval to a new attestation-agent service and document remote resource testing.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* client fixes

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* raw evidence

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: Build all Go files in cmd directories, not just main.go

This fixes the issue where fetch_raw_evidence.go wasn't being included
in the attestation-service build.

* fix: Wrap binary evidence in JSON for KBS compatibility

Fixes 'invalid character' error by wrapping raw binary evidence
in a JSON structure with base64 encoding, as expected by KBS.

* chore: Update buildroot packages to c28cefae

Includes fixes for:
1. attestation-service build (including fetch_raw_evidence.go)
2. Agent KBS evidence format (wrapping binary in JSON)

* fix: Implement KBS RCAR handshake with cookies

Fixes 'cookie not found' error (401) from KBS by:
1. Adding CookieJar support to KBS client
2. Implementing GetChallenge() to perform /auth handshake and capture session cookie
3. Updating Agent to get challenge, decode nonce, and use it for evidence generation
4. Regenerating mocks

* chore: Update buildroot packages to f6981ac5

Includes KBS RCAR handshake fix (cookie support + GetChallenge loop)

* fix: Update KBS client JSON tags to kebab-case

Fixes deserialization error (401) from KBS by:
1. Using kebab-case (e.g. extra-params) for JSON tags as per protocol.
2. Initializing ExtraParams as empty object {} instead of null/omitted.
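A minimal sketch of such a request body with kebab-case tags, assuming a `kbsRequest` type of our own (not the cocos client struct). It shows the difference between a nil map, which encodes as `null`, and an initialized empty map, which encodes as `{}`.

```go
package main

import "encoding/json"

// kbsRequest is an illustrative RCAR Request body using kebab-case JSON tags.
type kbsRequest struct {
	Version     string         `json:"version"`
	Tee         string         `json:"tee"`
	ExtraParams map[string]any `json:"extra-params"`
}

// newKBSRequest initializes ExtraParams to an empty map so that it encodes
// as {} instead of null (nil map) or being dropped (omitempty).
func newKBSRequest(version, tee string) kbsRequest {
	return kbsRequest{Version: version, Tee: tee, ExtraParams: map[string]any{}}
}

// marshalRequest returns the JSON body that would be posted to /auth.
func marshalRequest(r kbsRequest) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```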

* fix: Wrap attestation evidence in primary_evidence format

Updates Agent to construct 'tee-evidence' payload with:
- primary_evidence: containing the actual quote/data
- additional_evidence: empty JSON object

This matches the Confidential Containers KBS Attestation Protocol requirements.

* fix: Update KBS protocol version to 0.4.0

KBS rejected 0.1.0 with a version mismatch error. Bumping to 0.4.0 to match server expectation.

* fix: Generate ephemeral key for KBS RuntimeData

Updates RuntimeData to include a valid ephemeral EC P-256 public key in JWK format, as required by the KBS RCAR protocol.
Also fixes the KBS client struct to support TEEPubKey as an object.
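The ephemeral key could be generated with the standard library along these lines. This sketch assumes the JWK carries only `kty`, `crv`, `x` and `y` (RFC 7518, section 6.2); the helper and type names are hypothetical, and any extra JWK members the KBS may expect are not shown.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"encoding/base64"
)

// ecJWK is a minimal JSON Web Key for an EC public key.
type ecJWK struct {
	Kty string `json:"kty"`
	Crv string `json:"crv"`
	X   string `json:"x"`
	Y   string `json:"y"`
}

// newEphemeralKey generates a fresh P-256 key pair and returns the private
// key together with its public half encoded as a JWK.
func newEphemeralKey() (*ecdsa.PrivateKey, ecJWK, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, ecJWK{}, err
	}
	// JWK coordinates are unpadded base64url of fixed-length (32-byte)
	// big-endian values, so leading zero bytes must be preserved.
	x := priv.PublicKey.X.FillBytes(make([]byte, 32))
	y := priv.PublicKey.Y.FillBytes(make([]byte, 32))
	enc := base64.RawURLEncoding
	return priv, ecJWK{Kty: "EC", Crv: "P-256", X: enc.EncodeToString(x), Y: enc.EncodeToString(y)}, nil
}
```

Only the public JWK is meant to go into RuntimeData; the private key stays with the caller.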

* fix: Update sample attestation quote to valid JSON

The default attestation.bin was binary, but the KBS Sample Verifier expects a valid JSON quote containing 'svn' and 'report_data'.
Updated the embedded bin file to contain this JSON structure.

* fix: Generate dynamic JSON quote for Sample TEE in FetchRawEvidence

The KBS Sample Verifier expects a JSON object with 'svn' and 'report_data'.
Previously, we were returning raw binary data (reportData+nonce).
This commit updates FetchRawEvidence to return a marshaled JSON structure with:
- svn: "1"
- report_data: base64(req.ReportData)
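The quote described above can be sketched as follows. The field names come from the commit text; the use of standard padded base64 (rather than base64url) is an assumption, and `buildSampleQuote` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// sampleQuote holds the two fields the KBS Sample Verifier is described as expecting.
type sampleQuote struct {
	Svn        string `json:"svn"`
	ReportData string `json:"report_data"`
}

// buildSampleQuote marshals a sample quote for the given report data.
func buildSampleQuote(reportData []byte) ([]byte, error) {
	return json.Marshal(sampleQuote{
		Svn:        "1",
		ReportData: base64.StdEncoding.EncodeToString(reportData),
	})
}
```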

* refactor: Delegate Sample Attestation to Provider

Refactored sample attestation logic:
- Moved JSON Quote generation into EmptyProvider (standalone mode).
- Updated FetchRawEvidence to call provider.TeeAttestation instead of manual generation.
This enables using the real CC Attestation Agent for the UNSPECIFIED platform if configured.

* feat: Add comprehensive debug logging and enforce CC AA usage

Changes:
- Updated EmptyProvider to return error instead of generating mock data
  This forces proper use of CC Attestation Agent's sample attester
- Added detailed logging to attestation-service FetchRawEvidence:
  * Hex dump of evidence (first 200 bytes)
  * String preview of evidence
  * Total evidence length
- Added detailed logging to agent service:
  * Raw evidence hex and string previews
  * KBS evidence JSON preview (first 500 bytes)
  * Evidence lengths at each transformation step

This logging will help diagnose why KBS Sample Verifier is rejecting evidence.

* fix: Enable CC AA by default and add attestation-service log forwarding

Changes:
- Set USE_CC_ATTESTATION_AGENT=true by default in systemd service
- Added StandardOutput/StandardError to forward logs to /var/log/cocos/
- Updated HAL makefile to handle new default value
- This ensures attestation-service uses CC AA's sample attester
- Logs will now be visible in CVMS output for debugging

* feat: Add gRPC log forwarding to attestation-service

Implemented the same log forwarding mechanism used by the agent:
- Added ProtoHandler to write logs to both stdout and logQueue
- Connected to log client (/run/cocos/log.sock) for gRPC forwarding
- Added goroutine to forward logs to CVMS via log client
- Logs will now appear in CVMS output during computation runs

This enables visibility into attestation-service debug output including:
- CC AA connection status
- Evidence generation details (hex dumps, string previews)
- Any errors from providers
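The dual-destination logging described above can be sketched as an `slog.Handler` wrapper. This is not the cocos ProtoHandler: `queueHandler`, `newForwardingLogger`, and the drop-when-full policy are assumptions, and the goroutine that would drain the queue to the log socket is left out.

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// queueHandler writes every record through an inner handler (stdout here)
// and also pushes the message onto a queue that a separate goroutine could
// forward elsewhere, e.g. over gRPC.
type queueHandler struct {
	inner slog.Handler
	queue chan<- string
}

func (h *queueHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h *queueHandler) Handle(ctx context.Context, r slog.Record) error {
	// Non-blocking send: drop the forwarded copy if the queue is full
	// rather than stalling the code that is logging.
	select {
	case h.queue <- r.Message:
	default:
	}
	return h.inner.Handle(ctx, r)
}

func (h *queueHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &queueHandler{inner: h.inner.WithAttrs(attrs), queue: h.queue}
}

func (h *queueHandler) WithGroup(name string) slog.Handler {
	return &queueHandler{inner: h.inner.WithGroup(name), queue: h.queue}
}

// newForwardingLogger logs to stdout and mirrors each message into queue.
func newForwardingLogger(queue chan<- string) *slog.Logger {
	return slog.New(&queueHandler{inner: slog.NewTextHandler(os.Stdout, nil), queue: queue})
}
```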

* fix: Parse sample evidence JSON instead of base64-encoding it

The attestation-service returns sample evidence as JSON:
{"svn":"1","report_data":"base64..."}

The agent was incorrectly base64-encoding this JSON string again.
KBS Sample Verifier expects the parsed JSON object directly.

Fixed by:
- Parsing the JSON evidence from attestation-service
- Passing the parsed object directly in primary_evidence.evidence
- This matches what KBS Sample Verifier expects

* debug: Increase KBS evidence logging preview to 1000 bytes

Show the complete JSON structure being sent to KBS to debug
the attestation failure.

* debug: Add comprehensive CC AA configuration logging

Added debug logs to show:
- Whether CC AA is enabled in config
- CC AA address being used
- Connection success/failure
- Which provider is ultimately selected
- Warning when falling back to EmptyProvider

This will help diagnose why EmptyProvider is being used
instead of CC Attestation Agent.

* debug: Add startup logging for log client connection

Added log message to show if log client connection succeeds
at attestation-service startup. This will help diagnose why
logs aren't appearing in CVMS output.

* feat: Add retry logic with exponential backoff to log client

Added a simple retry mechanism to handle concurrent log requests:
- 3 retry attempts with exponential backoff (10ms, 20ms, 40ms)
- Applies to both SendLog and SendEvent methods
- Centralized in log client so all services benefit
- Should eliminate 'failed to send log' errors from concurrent requests

This fixes the issue where attestation-service logs weren't
appearing in CVMS output due to dropped messages.

* fix: Flatten sample evidence fields in primary_evidence for KBS

KBS Sample Verifier expects svn and report_data at the top level
of primary_evidence, not nested under an 'evidence' key.

Changed structure from:
{"primary_evidence": {"tee": "sample", "evidence": {"svn": "1", ...}}}

To:
{"primary_evidence": {"tee": "sample", "svn": "1", "report_data": "...", ...}}

This matches what KBS expects when deserializing the Quote structure.

* fix: Use sample quote directly as primary_evidence per KBS protocol

According to KBS attestation protocol spec, for sample TEE type,
primary_evidence should be the sample quote JSON directly:
{"svn": "1", "report_data": "..."}

Removed extra 'tee' and 'platform' fields that were causing KBS
to fail deserializing the Quote structure. The 'tee' field is
already sent in the Request payload during RCAR handshake.

Refs:
- https://github.com/confidential-containers/trustee/blob/main/kbs/docs/kbs_attestation_protocol.md
- https://github.com/confidential-containers/guest-components/blob/main/attestation-agent/attester/src/sample/mod.rs
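Embedding the quote bytes with `json.RawMessage` keeps them as a JSON object instead of re-encoding them as a string. A sketch, assuming the `primary_evidence`/`additional_evidence` envelope named in the earlier commit and a hypothetical `wrapSampleQuote` helper:

```go
package main

import "encoding/json"

// teeEvidence embeds the sample quote as-is under primary_evidence, with no
// extra "tee" or "evidence" wrapper, and an empty additional_evidence object.
type teeEvidence struct {
	PrimaryEvidence    json.RawMessage `json:"primary_evidence"`
	AdditionalEvidence json.RawMessage `json:"additional_evidence"`
}

// wrapSampleQuote embeds an already-serialized quote without re-encoding it.
// json.Marshal validates and compacts RawMessage values, so invalid quote
// bytes surface as an error here.
func wrapSampleQuote(quote []byte) ([]byte, error) {
	return json.Marshal(teeEvidence{
		PrimaryEvidence:    json.RawMessage(quote),
		AdditionalEvidence: json.RawMessage(`{}`),
	})
}
```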

* fix: Make CC AA required for sample attestation when configured

When USE_CC_ATTESTATION_AGENT=true, attestation-service now
requires AA to be available for NoCC/sample platform. This ensures
sample evidence always comes from AA with the correct KBS format.

Changes:
- Error out if AA connection fails for NoCC platform when AA is configured
- Only use EmptyProvider if AA is explicitly NOT configured
- Prevents incorrect sample evidence format from EmptyProvider

This ensures attestation-service delegates to AA for sample evidence
generation instead of creating it itself.

* fix: Implement proper RCAR protocol with tee-pubkey and runtime-data hash

Fixed KBS attestation error 'REPORT_DATA is different from that in Sample Quote'

Changes:
1. Generate ephemeral EC key pair BEFORE getting evidence from AA
2. Create runtime-data with nonce + tee-pubkey (JWK format)
3. Hash runtime-data (SHA-256) and use as report_data for AA
4. This binds the tee-pubkey to the TEE evidence per RCAR protocol

The report_data in the evidence now matches what KBS expects:
hash(runtime-data) instead of computation ID.

This completes the full RCAR protocol implementation:
- Request → Challenge → Attestation (with bound tee-pubkey) → Response

* fix(agent): use simple nonce for Sample attestation report_data

For Sample/NoCC attestation, use the raw nonce bytes directly as
report_data instead of hashing runtime-data. This avoids JSON
serialization mismatches with the KBS Sample verifier.

Real TEEs (TDX/SNP) still use runtime-data hash binding to
cryptographically bind the ephemeral tee-pubkey to the evidence.

* fix(agent): use RFC 8785 canonical JSON for runtime-data hashing

The KBS Sample attestation verifier (and likely others) expects the
report_data to be the SHA-256 hash of the *canonical* JSON serialization
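(RFC 8785) of the runtime-data. Standard Go JSON marshaling does not
produce that canonical form (struct fields keep declaration order rather
than being sorted, and HTML-sensitive characters are escaped by default),
leading to hash mismatches.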

This change uses github.com/gowebpki/jcs to canonicalize the runtime-data
before hashing, ensuring compatibility with the KBS RCAR implementation.
Also reverted the temporary 'simple nonce' workaround.

* feat(hal): add CoCo Keyprovider and Skopeo packages

- Add coco-keyprovider buildroot package with systemd service
- Add skopeo buildroot package for OCI image handling
- Add ocicrypt_keyprovider.conf for encrypted image decryption
- Update Config.in to include new packages

This enables standard CoCo ecosystem integration for encrypted
OCI images instead of custom S3/HTTP registry clients.

* feat(oci): add OCI image handling package with Skopeo integration

- Add pkg/oci/types.go with ResourceSource and ImageManifest types
- Add pkg/oci/skopeo.go with Skopeo wrapper for pull/decrypt
- Add pkg/oci/extract.go for extracting algorithms and datasets from layers

This package provides OCI image handling using Skopeo and CoCo
Keyprovider for encrypted image decryption, replacing custom
S3/HTTP registry clients.

* chore: regenerate protobuf files for updated cvms.proto

* refactor(agent): replace S3/HTTP/KBS with OCI package

- Remove pkg/kbs and pkg/registry imports
- Add pkg/oci import for OCI image handling
- Replace downloadAndDecryptResource with OCI-based implementation
- Use Skopeo + CoCo Keyprovider for automatic decryption
- Reduce code from ~240 lines to ~70 lines

This eliminates custom KBS RCAR handshake, S3/HTTP registry clients,
and manual decryption logic. CoCo Keyprovider handles all decryption
automatically via ocicrypt protocol.

* chore: remove obsolete pkg/kbs and pkg/registry packages

- Delete pkg/kbs/ (custom KBS client, ~300 lines)
- Delete pkg/registry/ (S3/HTTP registry clients, ~400 lines)
- Remove unused imports from agent/service.go
- Run go mod tidy to clean up dependencies

These packages have been replaced by pkg/oci with Skopeo and
CoCo Keyprovider for standard CoCo ecosystem integration.

* fix(agent): update ResourceSource struct to include type and encryption fields

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix(hal): update CoCo Keyprovider to v0.16.0 and fix build path

- Update version from v0.11.0 to v0.16.0 (matches attestation agent)
- Fix install path: target is at repo root, not in coco_keyprovider subdir
- This fixes the build error where coco_keyprovider binary wasn't found

The cargo workspace in guest-components builds to a shared target/
directory at the repository root, not within each crate's subdirectory.

* feat: Update remote resources testing guide to use kbs-client and coco-keyprovider for key management and encryption, enable insecure TLS for Skopeo, and enhance CVMS with

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Update component versions, revise image encryption documentation, and sanitize OCI image paths for Skopeo compatibility.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Add `decompress` option to Dataset and `algo_type`/`algo_args` to Algorithm protobuf messages, updating client, test, and build configurations.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* Update multiple package versions and enhance OCI image extraction error reporting for missing algorithm files.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* chore: Bump package versions, improve OCI image extraction debugging by returning seen files, and remove unused dataset type parsing from test code.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* refactor: Migrate OCI extraction to use structured logging with `slog` and `context`, and update package versions.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Bump multiple component versions, add encrypted status for computation inputs and algorithms, and refine OCI layer extraction warnings.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* logging

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: Add `Encrypted` field to algorithm and dataset resource sources and update all component versions.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: update component versions, integrate coco-keyprovider service, and configure ocicrypt key provider.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: add support for KBS parameters and dataset/algorithm hash calculations in CVMS

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: update resource download and extraction logic to support requirements.txt and improve hash verification

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* chore: Update dependencies, improve code style, and add GetRawEvidence to attestation client mocks.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* Refactor code structure for improved readability and maintainability

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: update golangci configuration to include errcheck for build path and remove unnecessary exclusions

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: streamline kernel command line handling in QEMU args construction

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* feat: add attestation binary and update checksum tests and policy structure

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* Add unit tests for attestation agent, attestation, log, crypto, OCI, and Skopeo clients

- Implement tests for the attestation agent client including Unix socket and TCP address handling, token retrieval, and error scenarios.
- Enhance attestation client tests to cover fetching raw evidence for various platforms (SNP, TDX, VTPM, SNPvTPM) and validate error handling.
- Introduce log client tests to verify retry behavior for sending logs and events.
- Create comprehensive tests for crypto package focusing on AES-GCM decryption, encrypted resource parsing, and key unwrapping.
- Add tests for OCI package to validate algorithm and dataset extraction, including JSON serialization of OCILayout.
- Implement Skopeo client tests to ensure proper functionality for image pulling, inspecting, and resource source handling.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: handle JSON marshal errors in test cases for decrypt and extract functions

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* test: add comprehensive tests for algorithm and dataset extraction with various scenarios

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* refactor: replace hardcoded Python script content with constant variable

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* fix: remove redundant mock expectation for SendAgentConfig in TestCreateVMWithAaKbsParams

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* test: add tests for event sending failure, dataset extraction with path traversal, and Skopeo client behavior

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* test: add tests for download and decryption of resources with various URL formats

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* refactor: Introduce OCIClient interface for agent service to improve testability of OCI image operations and enhance related tests.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>

* refactor: Change `get_uint64_from_tcb` to accept `TcbVersion` by value and use `u64::from` for type conversions.

---------

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
2026-03-16 14:48:55 +01:00

921 lines
26 KiB
Go

// Copyright (c) Ultraviolet
// SPDX-License-Identifier: Apache-2.0

package oci

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"context"
	"encoding/json"
	"log/slog"
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

const testPythonScript = "print('hello')"

func TestIsAlgorithmFile(t *testing.T) {
	tests := []struct {
		name     string
		filename string
		want     bool
	}{
		{"Python file", "algorithm.py", true},
		{"WASM file", "module.wasm", true},
		{"WAT file", "module.wat", true},
		{"JavaScript file", "script.js", true},
		{"Shell script", "run.sh", true},
		{"Main python file", "main.py", true},
		{"Execute file", "execute.py", true},
		{"Algorithm name in path", "src/algorithm_v2.py", true},
		{"Random python file", "helper.py", true},
		{"CSV data file", "data.csv", false},
		{"JSON config file", "config.json", false},
		{"Text file", "readme.txt", false},
		{"Binary file", "data.bin", false},
		{"Uppercase extension", "MAIN.PY", true},
		{"Mixed case", "Algorithm.Py", true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := isAlgorithmFile(tt.filename)
			assert.Equal(t, tt.want, got)
		})
	}
}

func TestIsDataFile(t *testing.T) {
	tests := []struct {
		name     string
		filename string
		want     bool
	}{
		{"CSV file", "data.csv", true},
		{"JSON file", "config.json", true},
		{"Text file", "readme.txt", true},
		{"Parquet file", "data.parquet", true},
		{"Arrow file", "data.arrow", true},
		{"DAT file", "data.dat", true},
		{"Python file", "script.py", false},
		{"WASM file", "module.wasm", false},
		{"Binary file", "data.bin", false},
		{"Uppercase CSV", "DATA.CSV", true},
		{"Nested path", "data/input/dataset.csv", true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := isDataFile(tt.filename)
			assert.Equal(t, tt.want, got)
		})
	}
}

func TestExtractAlgorithm(t *testing.T) {
	logger := slog.Default()

	t.Run("missing index.json", func(t *testing.T) {
		tempDir := t.TempDir()
		_, err := ExtractAlgorithm(context.Background(), logger, tempDir, t.TempDir())
		// require (not assert) so a nil error fails the test instead of panicking on err.Error().
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to read index.json")
	})

	t.Run("invalid index.json", func(t *testing.T) {
		tempDir := t.TempDir()
		err := os.WriteFile(filepath.Join(tempDir, "index.json"), []byte("not json"), 0o644)
		require.NoError(t, err)
		_, err = ExtractAlgorithm(context.Background(), logger, tempDir, t.TempDir())
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to parse index.json")
	})

	t.Run("empty manifests", func(t *testing.T) {
		tempDir := t.TempDir()
		index := OCIIndex{SchemaVersion: 2}
		data, err := json.Marshal(index)
		require.NoError(t, err)
		err = os.WriteFile(filepath.Join(tempDir, "index.json"), data, 0o644)
		require.NoError(t, err)
		_, err = ExtractAlgorithm(context.Background(), logger, tempDir, t.TempDir())
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no manifests found")
	})

	t.Run("successful extraction", func(t *testing.T) {
		ociDir, destDir := setupTestOCIImage(t, "algorithm.py", testPythonScript)
		algoPath, err := ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.NoError(t, err)
		assert.NotEmpty(t, algoPath)
		assert.Contains(t, algoPath, "algorithm.py")
	})
}

func TestExtractDataset(t *testing.T) {
	t.Run("missing index.json", func(t *testing.T) {
		tempDir := t.TempDir()
		_, err := ExtractDataset(tempDir, t.TempDir())
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to read index.json")
	})

	t.Run("successful extraction", func(t *testing.T) {
		ociDir, destDir := setupTestOCIImage(t, "data.csv", "col1,col2\n1,2")
		files, err := ExtractDataset(ociDir, destDir)
		require.NoError(t, err)
		assert.NotEmpty(t, files)
	})
}

func TestExtractDatasetWithPathTraversal(t *testing.T) {
	t.Run("path traversal skipped, valid file extracted", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))

		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)

		// Path traversal entry (should be skipped)
		maliciousHdr := &tar.Header{
			Name: "../../../tmp/evil.csv",
			Mode: 0o644,
			Size: int64(len("evil")),
		}
		require.NoError(t, tw.WriteHeader(maliciousHdr))
		_, err = tw.Write([]byte("evil"))
		require.NoError(t, err)

		// Valid CSV file
		csvContent := "col1,col2\n1,2"
		csvHdr := &tar.Header{
			Name: "data.csv",
			Mode: 0o644,
			Size: int64(len(csvContent)),
		}
		require.NoError(t, tw.WriteHeader(csvHdr))
		_, err = tw.Write([]byte(csvContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())

		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))

		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

		files, err := ExtractDataset(ociDir, destDir)
		require.NoError(t, err)
		require.Len(t, files, 1)
		assert.Contains(t, files[0], "data.csv")

		// Verify the traversal target, resolved against destDir the way a
		// naive extractor would resolve it, was NOT created outside destDir.
		_, err = os.Stat(filepath.Join(destDir, maliciousHdr.Name))
		assert.True(t, os.IsNotExist(err))
	})
}

func TestExtractDatasetInvalidManifest(t *testing.T) {
	t.Run("invalid manifest JSON", func(t *testing.T) {
		ociDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), []byte("not json"), 0o644))

		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len("not json")}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

		_, err = ExtractDataset(ociDir, t.TempDir())
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to parse manifest")
	})
}

func TestExtractDatasetWithDirectory(t *testing.T) {
	t.Run("layer with directory entries for dataset", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))

		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)

		// Directory entry
		dirHdr := &tar.Header{
			Name:     "data/",
			Mode:     0o755,
			Typeflag: tar.TypeDir,
		}
		require.NoError(t, tw.WriteHeader(dirHdr))

		// CSV inside directory
		csvContent := "a,b\n1,2"
		csvHdr := &tar.Header{
			Name: "data/dataset.csv",
			Mode: 0o644,
			Size: int64(len(csvContent)),
		}
		require.NoError(t, tw.WriteHeader(csvHdr))
		_, err = tw.Write([]byte(csvContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())

		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))

		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

		files, err := ExtractDataset(ociDir, destDir)
		require.NoError(t, err)
		require.Len(t, files, 1)
		assert.Contains(t, files[0], "dataset.csv")
	})
}

func TestExtractDatasetMissingManifest(t *testing.T) {
	t.Run("manifest file not found", func(t *testing.T) {
		ociDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))

		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:nonexistent", Size: 0}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

		_, err = ExtractDataset(ociDir, t.TempDir())
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to read manifest")
	})
}

func TestOCILayoutStructure(t *testing.T) {
	t.Run("OCILayout JSON serialization", func(t *testing.T) {
		layout := OCILayout{ImageLayoutVersion: "1.0.0"}
		data, err := json.Marshal(layout)
		require.NoError(t, err)

		var decoded OCILayout
		err = json.Unmarshal(data, &decoded)
		require.NoError(t, err)
		assert.Equal(t, layout.ImageLayoutVersion, decoded.ImageLayoutVersion)
	})
}

func setupTestOCIImage(t *testing.T, filename, content string) (ociDir, destDir string) {
	t.Helper()

	ociDir = t.TempDir()
	destDir = t.TempDir()
	blobsDir := filepath.Join(ociDir, "blobs", "sha256")
	require.NoError(t, os.MkdirAll(blobsDir, 0o755))

	layerPath := filepath.Join(blobsDir, "layer123")
	layerFile, err := os.Create(layerPath)
	require.NoError(t, err)
	gw := gzip.NewWriter(layerFile)
	tw := tar.NewWriter(gw)

	hdr := &tar.Header{
		Name: filename,
		Mode: 0o644,
		Size: int64(len(content)),
	}
	require.NoError(t, tw.WriteHeader(hdr))
	_, err = tw.Write([]byte(content))
	require.NoError(t, err)
	require.NoError(t, tw.Close())
	require.NoError(t, gw.Close())
	require.NoError(t, layerFile.Close())

	manifest := struct {
		Layers []struct {
			Digest string `json:"digest"`
		} `json:"layers"`
	}{
		Layers: []struct {
			Digest string `json:"digest"`
		}{{Digest: "sha256:layer123"}},
	}
	manifestData, err := json.Marshal(manifest)
	require.NoError(t, err)
	manifestPath := filepath.Join(blobsDir, "manifest123")
	require.NoError(t, os.WriteFile(manifestPath, manifestData, 0o644))

	index := OCIIndex{
		SchemaVersion: 2,
		Manifests: []struct {
			MediaType string `json:"mediaType"`
			Digest    string `json:"digest"`
			Size      int    `json:"size"`
		}{{
			MediaType: "application/vnd.oci.image.manifest.v1+json",
			Digest:    "sha256:manifest123",
			Size:      len(manifestData),
		}},
	}
	indexData, err := json.Marshal(index)
	require.NoError(t, err)
	require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

	return ociDir, destDir
}

func TestExtractAlgorithmWithRequirements(t *testing.T) {
	logger := slog.Default()

	t.Run("extract algorithm with requirements.txt", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))

		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)

		// Add algorithm file
		algoContent := testPythonScript
		algoHdr := &tar.Header{
			Name: "main.py",
			Mode: 0o644,
			Size: int64(len(algoContent)),
		}
		require.NoError(t, tw.WriteHeader(algoHdr))
		_, err = tw.Write([]byte(algoContent))
		require.NoError(t, err)

		// Add requirements.txt
		reqContent := "numpy==1.21.0\npandas==1.3.0"
		reqHdr := &tar.Header{
			Name: "requirements.txt",
			Mode: 0o644,
			Size: int64(len(reqContent)),
		}
		require.NoError(t, tw.WriteHeader(reqHdr))
		_, err = tw.Write([]byte(reqContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())

		// Create manifest and index
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))

		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))

		algoPath, err := ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.NoError(t, err)
		assert.Contains(t, algoPath, "main.py")

		// Verify requirements.txt was also extracted
		reqPath := filepath.Join(destDir, "requirements.txt")
		_, err = os.Stat(reqPath)
		assert.NoError(t, err)
	})
}


func TestExtractAlgorithmNoAlgoFile(t *testing.T) {
	logger := slog.Default()
	t.Run("no algorithm file in layers", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)
		// Add only a non-algorithm file (a README)
		readmeContent := "This is a readme"
		readmeHdr := &tar.Header{
			Name: "README.md",
			Mode: 0o644,
			Size: int64(len(readmeContent)),
		}
		require.NoError(t, tw.WriteHeader(readmeHdr))
		_, err = tw.Write([]byte(readmeContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		// require (not assert) so a nil error stops the test before err.Error() panics
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no algorithm file found")
	})
}

func TestExtractDatasetNoDataFiles(t *testing.T) {
	t.Run("no data files in layers", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)
		// Add a Python script, which is not a dataset file
		pyContent := testPythonScript
		pyHdr := &tar.Header{
			Name: "script.py",
			Mode: 0o644,
			Size: int64(len(pyContent)),
		}
		require.NoError(t, tw.WriteHeader(pyHdr))
		_, err = tw.Write([]byte(pyContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractDataset(ociDir, destDir)
		// require (not assert) so a nil error stops the test before err.Error() panics
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no dataset files found")
	})
}

func TestExtractAlgorithmInvalidManifest(t *testing.T) {
	logger := slog.Default()
	t.Run("invalid manifest JSON", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		// Write a manifest blob that is not valid JSON
		invalidManifest := []byte("not json")
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), invalidManifest, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(invalidManifest)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to parse manifest")
	})
}

func TestExtractAlgorithmMissingManifest(t *testing.T) {
	logger := slog.Default()
	t.Run("manifest file not found", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		// Intentionally omit the manifest blob that index.json references
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:missing123", Size: 8}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "failed to read manifest")
	})
}

func TestExtractAlgorithmWithDirectory(t *testing.T) {
	logger := slog.Default()
	t.Run("layer with directory entries", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)
		// Add a directory entry
		dirHdr := &tar.Header{
			Name:     "src/",
			Mode:     0o755,
			Typeflag: tar.TypeDir,
		}
		require.NoError(t, tw.WriteHeader(dirHdr))
		// Add algorithm file in subdirectory
		algoContent := testPythonScript
		algoHdr := &tar.Header{
			Name: "src/main.py",
			Mode: 0o644,
			Size: int64(len(algoContent)),
		}
		require.NoError(t, tw.WriteHeader(algoHdr))
		_, err = tw.Write([]byte(algoContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		algoPath, err := ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.NoError(t, err)
		assert.Contains(t, algoPath, "main.py")
	})
}

func TestExtractAlgorithmPathTraversal(t *testing.T) {
	logger := slog.Default()
	t.Run("path traversal attempt", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		layerPath := filepath.Join(blobsDir, "layer123")
		layerFile, err := os.Create(layerPath)
		require.NoError(t, err)
		gw := gzip.NewWriter(layerFile)
		tw := tar.NewWriter(gw)
		// Add an entry whose name tries to escape destDir via path traversal
		maliciousContent := "malicious"
		maliciousHdr := &tar.Header{
			Name: "../../../etc/malicious.py",
			Mode: 0o644,
			Size: int64(len(maliciousContent)),
		}
		require.NoError(t, tw.WriteHeader(maliciousHdr))
		_, err = tw.Write([]byte(maliciousContent))
		require.NoError(t, err)
		// Add a legitimate algorithm file
		algoContent := testPythonScript
		algoHdr := &tar.Header{
			Name: "algorithm.py",
			Mode: 0o644,
			Size: int64(len(algoContent)),
		}
		require.NoError(t, tw.WriteHeader(algoHdr))
		_, err = tw.Write([]byte(algoContent))
		require.NoError(t, err)
		require.NoError(t, tw.Close())
		require.NoError(t, gw.Close())
		require.NoError(t, layerFile.Close())
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:layer123"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		algoPath, err := ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.NoError(t, err)
		assert.Contains(t, algoPath, "algorithm.py")
		// Verify the malicious entry was NOT written outside destDir. Resolve the
		// entry name against destDir rather than assuming it lands in /etc, since
		// the escaped location depends on where t.TempDir() is created.
		escapedPath := filepath.Join(destDir, "../../../etc/malicious.py")
		_, err = os.Stat(escapedPath)
		assert.True(t, os.IsNotExist(err))
	})
}

func TestExtractAlgorithmErrorPathsAdditional(t *testing.T) {
	logger := slog.Default()
	t.Run("invalid layer gzip", func(t *testing.T) {
		ociDir, destDir := setupTestOCIImage(t, "main.py", "print('hello')")
		// Overwrite the layer blob with data that is not gzip-compressed
		layerPath := filepath.Join(ociDir, "blobs", "sha256", "layer123")
		err := os.WriteFile(layerPath, []byte("not gzip"), 0o644)
		require.NoError(t, err)
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no algorithm file found")
	})
	t.Run("invalid tar formatting", func(t *testing.T) {
		ociDir, destDir := setupTestOCIImage(t, "main.py", "print('hello')")
		layerPath := filepath.Join(ociDir, "blobs", "sha256", "layer123")
		// Write a valid gzip stream whose payload is not a tar archive
		var buf bytes.Buffer
		gw := gzip.NewWriter(&buf)
		_, err := gw.Write([]byte("not a tar archive but it is gzipped"))
		require.NoError(t, err)
		require.NoError(t, gw.Close())
		err = os.WriteFile(layerPath, buf.Bytes(), 0o644)
		require.NoError(t, err)
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no algorithm file found")
	})
	t.Run("non-existent layer file", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:nonexistent"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractAlgorithm(context.Background(), logger, ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no algorithm file found")
	})
}

func TestExtractDatasetErrorPathsAdditional(t *testing.T) {
	t.Run("invalid layer gzip", func(t *testing.T) {
		ociDir, destDir := setupTestOCIImage(t, "data.csv", "a,b,c")
		// Overwrite the layer blob with data that is not gzip-compressed
		layerPath := filepath.Join(ociDir, "blobs", "sha256", "layer123")
		err := os.WriteFile(layerPath, []byte("not gzip"), 0o644)
		require.NoError(t, err)
		_, err = ExtractDataset(ociDir, destDir)
		assert.Error(t, err)
	})
	t.Run("non-existent layer file", func(t *testing.T) {
		ociDir := t.TempDir()
		destDir := t.TempDir()
		blobsDir := filepath.Join(ociDir, "blobs", "sha256")
		require.NoError(t, os.MkdirAll(blobsDir, 0o755))
		manifest := struct {
			Layers []struct {
				Digest string `json:"digest"`
			} `json:"layers"`
		}{
			Layers: []struct {
				Digest string `json:"digest"`
			}{{Digest: "sha256:nonexistent"}},
		}
		manifestData, err := json.Marshal(manifest)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(blobsDir, "manifest123"), manifestData, 0o644))
		index := OCIIndex{
			SchemaVersion: 2,
			Manifests: []struct {
				MediaType string `json:"mediaType"`
				Digest    string `json:"digest"`
				Size      int    `json:"size"`
			}{{Digest: "sha256:manifest123", Size: len(manifestData)}},
		}
		indexData, err := json.Marshal(index)
		require.NoError(t, err)
		require.NoError(t, os.WriteFile(filepath.Join(ociDir, "index.json"), indexData, 0o644))
		_, err = ExtractDataset(ociDir, destDir)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "no dataset files found")
	})
}