Manager
The Manager service provides a barebones gRPC API and Service interface implementation for the development of the manager service.
Configuration
The service is configured using the environment variables from the following table. Note that any unset variables will be replaced with their default values.
| Variable | Description | Default |
|---|---|---|
| COCOS_JAEGER_URL | The URL for the Jaeger tracing endpoint. | http://localhost:4318 |
| COCOS_JAEGER_TRACE_RATIO | The ratio of traces to sample. | 1.0 |
| MANAGER_INSTANCE_ID | The instance ID for the manager service. | |
| MANAGER_ATTESTATION_POLICY_BINARY | The file path for the attestation policy binary. | ../../build/attestation_policy |
| MANAGER_IGVMMEASURE_BINARY | The file path for the igvmmeasure binary. | ../../build/igvmmeasure |
| MANAGER_PCR_VALUES | The file path for the file with the expected PCR values. | |
| MANAGER_HTTP_HOST | Manager service HTTP host | "" |
| MANAGER_HTTP_PORT | Manager service HTTP port | 7003 |
| MANAGER_HTTP_SERVER_CERT | Path to HTTP server certificate in pem format | "" |
| MANAGER_HTTP_SERVER_KEY | Path to HTTP server key in pem format | "" |
| MANAGER_HTTP_SERVER_CA_CERTS | Path to HTTP server CA certificate | "" |
| MANAGER_HTTP_CLIENT_CA_CERTS | Path to HTTP client CA certificate | "" |
| MANAGER_GRPC_HOST | Manager service gRPC host | "" |
| MANAGER_GRPC_PORT | Manager service gRPC port | 7001 |
| MANAGER_GRPC_SERVER_CERT | Path to gRPC server certificate in pem format | "" |
| MANAGER_GRPC_SERVER_KEY | Path to gRPC server key in pem format | "" |
| MANAGER_GRPC_SERVER_CA_CERTS | Path to gRPC server CA certificate | "" |
| MANAGER_GRPC_CLIENT_CA_CERTS | Path to gRPC client CA certificate | "" |
| MANAGER_EOS_VERSION | The EOS version used for booting CVMs. | |
| MANAGER_INSTANCE_ID | Manager service instance ID | |
| MANAGER_QEMU_MEMORY_SIZE | The total memory size for the virtual machine. Can be specified in a human-readable format like "2048M" or "4G". | 2048M |
| MANAGER_QEMU_MEMORY_SLOTS | The number of memory slots for the virtual machine. | 5 |
| MANAGER_QEMU_MAX_MEMORY | The maximum memory size for the virtual machine. Can be specified in a human-readable format like "30G". | 30G |
| MANAGER_QEMU_OVMF_CODE_IF | The interface type for the OVMF code. | pflash |
| MANAGER_QEMU_OVMF_CODE_FORMAT | The format of the OVMF code file. | raw |
| MANAGER_QEMU_OVMF_CODE_UNIT | The unit number for the OVMF code. | 0 |
| MANAGER_QEMU_OVMF_CODE_FILE | The file path for the OVMF code. | /usr/share/OVMF/OVMF_CODE.fd |
| MANAGER_QEMU_OVMF_VERSION | The version number of EDKII from which OVMF was built | edk2-stable202408 |
| MANAGER_QEMU_OVMF_CODE_READONLY | Whether the OVMF code should be read-only. | on |
| MANAGER_QEMU_OVMF_VARS_IF | The interface type for the OVMF variables. | pflash |
| MANAGER_QEMU_OVMF_VARS_FORMAT | The format of the OVMF variables file. | raw |
| MANAGER_QEMU_OVMF_VARS_UNIT | The unit number for the OVMF variables. | 1 |
| MANAGER_QEMU_OVMF_VARS_FILE | The file path for the OVMF variables. | /usr/share/OVMF/OVMF_VARS.fd |
| MANAGER_QEMU_NETDEV_ID | The ID for the network device. | vmnic |
| MANAGER_QEMU_HOST_FWD_AGENT | The port number for the host forward agent. | 7020 |
| MANAGER_QEMU_GUEST_FWD_AGENT | The port number for the guest forward agent. | 7002 |
| MANAGER_QEMU_VIRTIO_NET_PCI_DISABLE_LEGACY | Whether to disable the legacy PCI device. | on |
| MANAGER_QEMU_VIRTIO_NET_PCI_IOMMU_PLATFORM | Whether to enable the IOMMU platform for the virtio-net PCI device. | true |
| MANAGER_QEMU_VIRTIO_NET_PCI_ADDR | The PCI address for the virtio-net PCI device. | 0x2 |
| MANAGER_QEMU_VIRTIO_NET_PCI_ROMFILE | The file path for the ROM image for the virtio-net PCI device. | |
| MANAGER_QEMU_DISK_IMG_KERNEL_FILE | The file path for the kernel image. | img/bzImage |
| MANAGER_QEMU_DISK_IMG_ROOTFS_FILE | The file path for the root filesystem image. | img/rootfs.cpio.gz |
| MANAGER_QEMU_SEV_SNP_ID | The ID for the Secure Encrypted Virtualization (SEV-SNP) device. | sev0 |
| MANAGER_QEMU_SEV_SNP_CBITPOS | The position of the C-bit in the physical address. | 51 |
| MANAGER_QEMU_SEV_SNP_REDUCED_PHYS_BITS | The number of reduced physical address bits for SEV-SNP. | 1 |
| MANAGER_QEMU_ENABLE_HOST_DATA | Enable additional data for the SEV-SNP host. | false |
| MANAGER_QEMU_HOST_DATA | Additional data for the SEV-SNP host. | |
| MANAGER_QEMU_TDX_ID | The ID for the Trust Domain Extensions (TDX) device. | tdx0 |
| MANAGER_QEMU_QUOTE_GENERATION_PORT | The port number for virtual socket used to communicate with the Quote Generation Service (QGS). | 4050 |
| MANAGER_QEMU_OVMF_FILE | The file path for the OVMF file (combined OVMF_CODE and OVMF_VARS file). | /usr/share/ovmf/OVMF.fd |
| MANAGER_QEMU_IGVM_ID | The ID of the IGVM file. | igvm0 |
| MANAGER_QEMU_IGVM_FILE | The file path to the IGVM file. | /root/coconut-qemu.igvm |
| MANAGER_QEMU_BIN_PATH | The file path for the QEMU binary. | qemu-system-x86_64 |
| MANAGER_QEMU_USE_SUDO | Whether to use sudo to run QEMU. | false |
| MANAGER_QEMU_ENABLE_SEV_SNP | Whether to enable Secure Nested Paging (SEV-SNP). | true |
| MANAGER_QEMU_ENABLE_TDX | Whether to enable Trust Domain Extensions (TDX). | false |
| MANAGER_QEMU_ENABLE_KVM | Whether to enable the Kernel-based Virtual Machine (KVM) acceleration. | true |
| MANAGER_QEMU_MACHINE | The machine type for QEMU. | q35 |
| MANAGER_QEMU_CPU | The CPU model for QEMU. | EPYC |
| MANAGER_QEMU_SMP_COUNT | The number of virtual CPUs. | 4 |
| MANAGER_QEMU_SMP_MAXCPUS | The maximum number of virtual CPUs. | 64 |
| MANAGER_QEMU_MEM_ID | The ID for the memory device. | ram1 |
| MANAGER_QEMU_NO_GRAPHIC | Whether to disable the graphical display. | true |
| MANAGER_QEMU_MONITOR | The type of monitor to use. | pty |
| MANAGER_QEMU_HOST_FWD_RANGE | The range of host ports to forward. | 6100-6200 |
| MANAGER_MAX_VMS | The maximum number of VMs running concurrently on the manager. | 10 |
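The fallback to default values can be mirrored in a wrapper script, for example to log the effective configuration before launching the manager. The following is a minimal sketch using POSIX parameter expansion, not the manager's own loading code. The function name `apply_manager_defaults` is hypothetical, only three variables are shown, and the defaults are copied from the table above.

```shell
# Hypothetical helper: assign the documented default to each variable
# that is unset. Assumption: an empty value is treated like an unset one,
# which is what the ":=" expansion does.
apply_manager_defaults() {
    : "${MANAGER_GRPC_PORT:=7001}"
    : "${MANAGER_QEMU_MEMORY_SIZE:=2048M}"
    : "${MANAGER_MAX_VMS:=10}"
    export MANAGER_GRPC_PORT MANAGER_QEMU_MEMORY_SIZE MANAGER_MAX_VMS
}

apply_manager_defaults
echo "gRPC port:   $MANAGER_GRPC_PORT"
echo "memory size: $MANAGER_QEMU_MEMORY_SIZE"
echo "max VMs:     $MANAGER_MAX_VMS"
```

Values already present in the environment are left untouched, so the script can be run after exporting any overrides.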
Setup
git clone https://github.com/ultravioletrs/cocos
cd cocos
NB: all relative paths in this document are relative to the cocos repository directory.
QEMU-KVM
QEMU-KVM is a virtualization platform that allows you to run multiple operating systems on the same physical machine. It is a combination of two technologies: QEMU and KVM.
- QEMU is an emulator that can run a variety of operating systems, including Linux, Windows, and macOS.
- KVM is a Linux kernel module that lets QEMU run virtual machines using the CPU's hardware virtualization extensions.
To install QEMU-KVM on a Debian-based machine, run:
sudo apt update
sudo apt install qemu-kvm
Create an img directory in cmd/manager.
Virtual filesystem
9P (or Plan 9 Filesystem) in QEMU is a lightweight, network-based file-sharing protocol. In Cocos, 9P is used to transfer environment variables and TLS certificates for cloud communication from the Manager to the Agent.
You should define the environment variables in a file called environment. For the number and meaning of the environment variables, please refer to the Agent Readme.
Prepare Cocos HAL
Cocos HAL for Linux is a framework for building a custom in-enclave Linux distribution. Use the instructions in Readme.
Once the image is built, copy the kernel and rootfs images from buildroot/output/images/bzImage and buildroot/output/images/rootfs.cpio.gz, respectively, to cmd/manager/img.
Another option is to use release versions of EOS that can be downloaded from the Cocos GitHub repository.
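For a locally built image, the copy step can be sketched as a small helper that first checks that both build outputs exist. This is only an illustration: the function name `copy_hal_images` is hypothetical, and the source and destination directories are the ones named in the text above.

```shell
# Hypothetical helper: copy the kernel and rootfs produced by buildroot
# into the manager's img directory. Both directories are arguments.
copy_hal_images() {
    src=$1   # e.g. buildroot/output/images
    dst=$2   # e.g. cmd/manager/img
    for f in bzImage rootfs.cpio.gz; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    # Create the destination if needed, then copy both files.
    mkdir -p "$dst" &&
        cp "$src/bzImage" "$src/rootfs.cpio.gz" "$dst/"
}

# Usage (paths from the text above):
# copy_hal_images buildroot/output/images cmd/manager/img
```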
Test VM creation
cd cmd/manager
sudo find / -name OVMF_CODE.fd
# => /usr/share/OVMF/OVMF_CODE.fd
OVMF_CODE=/usr/share/OVMF/OVMF_CODE.fd
sudo find / -name OVMF_VARS.fd
# => /usr/share/OVMF/OVMF_VARS.fd
# Create a local copy of OVMF_VARS.
cp /usr/share/OVMF/OVMF_VARS.fd .
# Create directories for the environment file and for the certificates used for cloud communication.
mkdir env
mkdir certs
# Enter the env directory and create the environment file.
cd env
touch environment
# Define Computations endpoint URL for agent.
# Make sure the Computation endpoint is running (like Cocos Prism).
echo AGENT_CVM_GRPC_URL=localhost:7001 >> ./environment
# Define log level for the agent.
echo AGENT_LOG_LEVEL=debug >> ./environment
# Optional: Add AWS/S3 credentials for remote resource access
# NOTE: AWS credentials can also be passed via the CreateVM API using CLI flags
# (--aws-access-key-id, --aws-secret-access-key, --aws-endpoint-url, --aws-region)
# If using the API approach, you don't need to add them to this file.
# Replace HOST_IP with your host machine IP address (not localhost)
echo AWS_ACCESS_KEY_ID=minioadmin >> ./environment
echo AWS_SECRET_ACCESS_KEY=minioadmin >> ./environment
echo AWS_ENDPOINT_URL=http://HOST_IP:9000 >> ./environment
echo AWS_REGION=us-east-1 >> ./environment
# Return to cmd/manager
cd ..
OVMF_VARS=./OVMF_VARS.fd
KERNEL="img/bzImage"
INITRD="img/rootfs.cpio.gz"
ENV_PATH=./env
CERTH_PATH=./certs
qemu-system-x86_64 \
-enable-kvm \
-cpu EPYC-v4 \
-machine q35 \
-smp 4 \
-m 2048M,slots=5,maxmem=10240M \
-no-reboot \
-drive if=pflash,format=raw,unit=0,file=$OVMF_CODE,readonly=on \
-drive if=pflash,format=raw,unit=1,file=$OVMF_VARS \
-netdev user,id=vmnic,hostfwd=tcp::7020-:7002 \
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
-kernel $KERNEL \
-append "earlyprintk=serial console=ttyS0" \
-initrd $INITRD \
-nographic \
-monitor pty \
-monitor unix:monitor,server,nowait \
-fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
-device virtio-9p-pci,fsdev=env_fs,mount_tag=env_share \
-fsdev local,id=cert_fs,path=$CERTH_PATH,security_model=mapped \
-device virtio-9p-pci,fsdev=cert_fs,mount_tag=certs_share
Once the VM has booted, press Enter and log in with the username root.
Build and run Agent
Agent is started automatically in the VM.
# List running processes and use 'grep' to filter for processes containing 'agent' in their names.
ps aux | grep cocos-agent
# This command helps verify that the 'agent' process is running.
# The output shows the process ID (PID), resource usage, and other information about the 'cocos-agent' process.
# For example: 118 root cocos-agent
We can also check if Agent is reachable from the host machine:
# Use netcat (nc) to test the connection to localhost on port 7020.
nc -zv localhost 7020
# Output:
# nc: connect to localhost (::1) port 7020 (tcp) failed: Connection refused
# Connection to localhost (127.0.0.1) 7020 port [tcp/*] succeeded!
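Because the VM needs some time to boot, a single nc call may fail before the Agent is listening. A minimal sketch of a retry loop follows. The function name `wait_for_port` and the attempt count are illustrative assumptions, and the loop relies on the same `nc -z` check used above.

```shell
# Hypothetical helper: retry `nc -z` once per second until the port
# accepts a TCP connection or the attempts are exhausted.
wait_for_port() {
    host=$1
    port=$2
    attempts=$3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if nc -z "$host" "$port" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage: wait up to 30 attempts for the forwarded Agent port from the example above.
# wait_for_port localhost 7020 30 && echo "agent reachable"
```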
Conclusion
Now you are able to use Manager with Agent. Namely, Manager will create a VM with a separate OVMF variables file on manager /run request.
OVMF
We need Open Virtual Machine Firmware (OVMF). OVMF is a port of Intel's TianoCore firmware, an open source implementation of the Unified Extensible Firmware Interface (UEFI), used by a QEMU virtual machine. We need OVMF in order to run a virtual machine with focal-server-cloudimg-amd64. When we install QEMU, we get two files that we need to start a VM: OVMF_VARS.fd and OVMF_CODE.fd. We make a local copy of OVMF_VARS.fd since a VM will modify this file. OVMF_CODE.fd, on the other hand, is only used as a read-only reference, so we only record its path in an environment variable.
sudo find / -name OVMF_CODE.fd
# => /usr/share/OVMF/OVMF_CODE.fd
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/OVMF/OVMF_CODE.fd
sudo find / -name OVMF_VARS.fd
# => /usr/share/OVMF/OVMF_VARS.fd
MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/OVMF/OVMF_VARS.fd
NB: we set environment variables that we will use in the shell process where we run manager.
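Searching the whole filesystem with find can be slow. As an alternative sketch, the helper below returns the first existing file from a list of candidate paths. The function name `first_existing` is hypothetical, and the lowercase /usr/share/ovmf candidate is an assumption, since distributions install OVMF in different locations.

```shell
# Hypothetical helper: print the first argument that is an existing
# regular file; return non-zero when none of them exists.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Record the path of the read-only code file if one of the candidates exists.
code=$(first_existing /usr/share/OVMF/OVMF_CODE.fd /usr/share/ovmf/OVMF_CODE.fd) &&
    export MANAGER_QEMU_OVMF_CODE_FILE="$code"

# The variables file is modified by the VM, so copy it before use, e.g.:
# vars=$(first_existing /usr/share/OVMF/OVMF_VARS.fd) && cp "$vars" ./OVMF_VARS.fd
```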
Trusted Platform Module (TPM)
The Trusted Platform Module (TPM) plays a fundamental role in this process by providing a tamper-resistant foundation for cryptographic operations, securing sensitive artifacts, measuring system state, and enabling attestation mechanisms.
IGVM
An IGVM file contains all the necessary information to launch a virtual machine on different virtualization platforms. It includes setup commands for the guest system and verification data to ensure the VM is loaded securely and correctly.
Cocos uses COCONUT-SVSM for the vTPM. The IGVM file contains the OVMF file and the vTPM.
Deployment
To start the service, execute the following shell script (note that a server needs to be running; see here):
The manager can be started as a systemd service or a standalone executable. To start the manager as a systemd service, look at the systemd service script here. The environment variables are defined in the cocos-manager.env file. Below are examples of how to start the manager.
# Download the latest version of the service
git clone git@github.com:ultravioletrs/cocos.git
cd cocos
# Compile the manager
make manager
# Set the environment variables and run the service
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_USE_SUDO=false \
./build/cocos-manager
To start a CVM with SEV-SNP, define the IGVM file that contains the vTPM and the OVMF (combined OVMF_CODE and OVMF_VARS) of the CVM.
To enable AMD SEV-SNP support, start the manager like this:
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_ENABLE_SEV_SNP=true \
MANAGER_QEMU_SEV_SNP_CBITPOS=51 \
MANAGER_QEMU_BIN_PATH=<path to QEMU binary> \
MANAGER_QEMU_IGVM_FILE=<path to IGVM file> \
./build/cocos-manager
To enable TDX support, start the manager like this:
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_ENABLE_SEV_SNP=false \
MANAGER_QEMU_ENABLE_TDX=true \
MANAGER_QEMU_CPU=host \
MANAGER_QEMU_BIN_PATH=<path to QEMU binary> \
MANAGER_QEMU_OVMF_FILE=<path to OVMF file> \
./build/cocos-manager
Troubleshooting
If ps aux | grep qemu-system-x86_64 gives you something like this
darko 13913 0.0 0.0 0 0 pts/2 Z+ 20:17 0:00 [qemu-system-x86] <defunct>
it means that the QEMU virtual machine is defunct, i.e. it is no longer running. More precisely, a defunct process is also known as a "zombie" process: it has terminated, but its parent has not yet collected its exit status.
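To spot such entries without reading the full ps output, the state column can be filtered. This is a small sketch under the assumption of a Linux procps-style ps that supports -eo pid,ppid,stat,comm; the function name list_qemu_zombies is hypothetical.

```shell
# Read `ps -eo pid,ppid,stat,comm` output on stdin and print "PID PPID" for
# every qemu-system process whose state starts with Z (zombie).
# (list_qemu_zombies is a hypothetical helper name.)
list_qemu_zombies() {
    awk '$3 ~ /^Z/ && $4 ~ /^qemu-system/ { print $1, $2 }'
}

# Example:
# ps -eo pid,ppid,stat,comm | list_qemu_zombies
```

The parent PID is printed as well because a zombie entry is cleared by its parent, so that is usually the process to inspect.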
You can troubleshoot the VM launch procedure by running the qemu-system-x86_64 command directly. When you run the manager with the MANAGER_LOG_LEVEL=info env var set, it prints out the entire command used to launch a VM. The relevant part of the log might look like this
{"level":"info","message":"/usr/bin/qemu-system-x86_64 -enable-kvm -machine q35 -cpu EPYC -smp 4,maxcpus=64 -m 4096M,slots=5,maxmem=30G -drive if=pflash,format=raw,unit=0,file=/usr/share/OVMF/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=img/OVMF_VARS.fd -device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -drive file=img/focal-server-cloudimg-amd64.img,if=none,id=disk0,format=qcow2 -device scsi-hd,drive=disk0 -netdev user,id=vmnic,hostfwd=tcp::2222-:22,hostfwd=tcp::9301-:9031,hostfwd=tcp::7020-:7002 -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= -nographic -monitor pty","ts":"2023-08-14T18:29:19.2653908Z"}
You can run the command - the value of the "message" key - directly in the terminal:
/usr/bin/qemu-system-x86_64 -enable-kvm -machine q35 -cpu EPYC -smp 4,maxcpus=64 -m 4096M,slots=5,maxmem=30G -drive if=pflash,format=raw,unit=0,file=/usr/share/OVMF/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=img/OVMF_VARS.fd -device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -drive file=img/focal-server-cloudimg-amd64.img,if=none,id=disk0,format=qcow2 -device scsi-hd,drive=disk0 -netdev user,id=vmnic,hostfwd=tcp::2222-:22,hostfwd=tcp::9301-:9031,hostfwd=tcp::7020-:7002 -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= -nographic -monitor pty
and look for possible problems. These problems can usually be solved by setting the appropriate env vars. Look in the manager/qemu/config.go file to see the recognized env vars. Don't forget to prepend MANAGER_QEMU_ to the name of the env vars.
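Copying the command out of the JSON log line by hand is error-prone, so the value of the "message" key can be pulled out with sed. This is a sketch that assumes one JSON object per line and a message without escaped double quotes, as in the sample log line; the function name extract_message and the file name manager.log are hypothetical. If jq is installed, jq -r .message is the more robust choice.

```shell
# Print the value of the "message" key from JSON log lines read on stdin.
# Assumes one JSON object per line and no escaped double quotes in the message.
# (extract_message is a hypothetical helper name.)
extract_message() {
    sed -n 's/.*"message":"\([^"]*\)".*/\1/p'
}

# Example, assuming the manager output was saved to a file named manager.log:
# grep qemu-system-x86_64 manager.log | extract_message
```

Lines without a "message" key produce no output, so the result can be pasted into the terminal as is.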
Kill qemu-system-x86_64 processes
To kill any leftover qemu-system-x86_64 processes, use
pkill -f qemu-system-x86_64
The pkill command signals processes selected by name or by pattern. The -f flag matches the pattern qemu-system-x86_64 against the full command line rather than only the process name. By default, pkill sends the SIGTERM signal to every matching process.
If this does not work, i.e. if ps aux | grep qemu-system-x86_64 still outputs qemu-system-x86_64 related process(es), you can kill the unwanted process with kill -9 <PID>, which sends a SIGKILL signal to the process.
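The two steps above (a polite signal first, SIGKILL only as a last resort) can be combined for a single PID. This is a minimal sketch; the function name terminate_pid and the default 5 second timeout are assumptions, and kill -0 is used only to check whether the process still exists.

```shell
# Send SIGTERM to a PID, wait up to <timeout> seconds for it to exit,
# then fall back to SIGKILL.
# Usage: terminate_pid <pid> [timeout-seconds]
# (terminate_pid is a hypothetical helper name.)
terminate_pid() {
    pid=$1
    timeout=${2:-5}
    # If the signal cannot be delivered, the process is gone or not ours.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Example:
# terminate_pid 13913 10
```

Note that a defunct process has already exited, so no signal removes it; the entry disappears once the parent process collects its exit status or exits itself.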
Usage
For more information about service capabilities and its usage, please check out the README documentation.