mirror of
https://github.com/ultravioletrs/cocos.git
synced 2026-06-23 04:10:25 +00:00
da31d76c94
# Manager

The Manager service provides a barebones gRPC API and a Service interface implementation for developing the manager service.

## Configuration

The service is configured using the environment variables from the following table. Note that any unset variables will be replaced with their default values.

| Variable | Description | Default |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- | ------------------------------ |
| COCOS_JAEGER_URL | The URL for the Jaeger tracing endpoint. | http://localhost:4318 |
| COCOS_JAEGER_TRACE_RATIO | The ratio of traces to sample. | 1.0 |
| MANAGER_INSTANCE_ID | The instance ID for the manager service. | |
| MANAGER_ATTESTATION_POLICY_BINARY | The file path for the attestation policy binary. | ../../build/attestation_policy |
| MANAGER_IGVMMEASURE_BINARY | The file path for the igvmmeasure binary. | ../../build/igvmmeasure |
| MANAGER_PCR_VALUES | The file path for the file with the expected PCR values. | |
| MANAGER_HTTP_HOST | Manager service HTTP host | "" |
| MANAGER_HTTP_PORT | Manager service HTTP port | 7003 |
| MANAGER_HTTP_SERVER_CERT | Path to HTTP server certificate in PEM format | "" |
| MANAGER_HTTP_SERVER_KEY | Path to HTTP server key in PEM format | "" |
| MANAGER_HTTP_SERVER_CA_CERTS | Path to HTTP server CA certificate | "" |
| MANAGER_HTTP_CLIENT_CA_CERTS | Path to HTTP client CA certificate | "" |
| MANAGER_GRPC_HOST | Manager service gRPC host | "" |
| MANAGER_GRPC_PORT | Manager service gRPC port | 7001 |
| MANAGER_GRPC_SERVER_CERT | Path to gRPC server certificate in PEM format | "" |
| MANAGER_GRPC_SERVER_KEY | Path to gRPC server key in PEM format | "" |
| MANAGER_GRPC_SERVER_CA_CERTS | Path to gRPC server CA certificate | "" |
| MANAGER_GRPC_CLIENT_CA_CERTS | Path to gRPC client CA certificate | "" |
| MANAGER_EOS_VERSION | The EOS version used for booting CVMs. | |
| MANAGER_QEMU_MEMORY_SIZE | The total memory size for the virtual machine. Can be specified in a human-readable format like "2048M" or "4G". | 2048M |
| MANAGER_QEMU_MEMORY_SLOTS | The number of memory slots for the virtual machine. | 5 |
| MANAGER_QEMU_MAX_MEMORY | The maximum memory size for the virtual machine. Can be specified in a human-readable format like "30G". | 30G |
| MANAGER_QEMU_OVMF_CODE_IF | The interface type for the OVMF code. | pflash |
| MANAGER_QEMU_OVMF_CODE_FORMAT | The format of the OVMF code file. | raw |
| MANAGER_QEMU_OVMF_CODE_UNIT | The unit number for the OVMF code. | 0 |
| MANAGER_QEMU_OVMF_CODE_FILE | The file path for the OVMF code. | /usr/share/OVMF/OVMF_CODE.fd |
| MANAGER_QEMU_OVMF_VERSION | The version number of EDK II from which OVMF was built. | edk2-stable202408 |
| MANAGER_QEMU_OVMF_CODE_READONLY | Whether the OVMF code should be read-only. | on |
| MANAGER_QEMU_OVMF_VARS_IF | The interface type for the OVMF variables. | pflash |
| MANAGER_QEMU_OVMF_VARS_FORMAT | The format of the OVMF variables file. | raw |
| MANAGER_QEMU_OVMF_VARS_UNIT | The unit number for the OVMF variables. | 1 |
| MANAGER_QEMU_OVMF_VARS_FILE | The file path for the OVMF variables. | /usr/share/OVMF/OVMF_VARS.fd |
| MANAGER_QEMU_NETDEV_ID | The ID for the network device. | vmnic |
| MANAGER_QEMU_HOST_FWD_AGENT | The port number for the host forward agent. | 7020 |
| MANAGER_QEMU_GUEST_FWD_AGENT | The port number for the guest forward agent. | 7002 |
| MANAGER_QEMU_VIRTIO_NET_PCI_DISABLE_LEGACY | Whether to disable the legacy PCI device. | on |
| MANAGER_QEMU_VIRTIO_NET_PCI_IOMMU_PLATFORM | Whether to enable the IOMMU platform for the virtio-net PCI device. | true |
| MANAGER_QEMU_VIRTIO_NET_PCI_ADDR | The PCI address for the virtio-net PCI device. | 0x2 |
| MANAGER_QEMU_VIRTIO_NET_PCI_ROMFILE | The file path for the ROM image for the virtio-net PCI device. | |
| MANAGER_QEMU_DISK_IMG_KERNEL_FILE | The file path for the kernel image. | img/bzImage |
| MANAGER_QEMU_DISK_IMG_ROOTFS_FILE | The file path for the root filesystem image. | img/rootfs.cpio.gz |
| MANAGER_QEMU_SEV_SNP_ID | The ID for the Secure Encrypted Virtualization (SEV-SNP) device. | sev0 |
| MANAGER_QEMU_SEV_SNP_CBITPOS | The position of the C-bit in the physical address. | 51 |
| MANAGER_QEMU_SEV_SNP_REDUCED_PHYS_BITS | The number of reduced physical address bits for SEV-SNP. | 1 |
| MANAGER_QEMU_ENABLE_HOST_DATA | Enable additional data for the SEV-SNP host. | false |
| MANAGER_QEMU_HOST_DATA | Additional data for the SEV-SNP host. | |
| MANAGER_QEMU_TDX_ID | The ID for the Trust Domain Extensions (TDX) device. | tdx0 |
| MANAGER_QEMU_QUOTE_GENERATION_PORT | The port number for the virtual socket used to communicate with the Quote Generation Service (QGS). | 4050 |
| MANAGER_QEMU_OVMF_FILE | The file path for the OVMF file (combined OVMF_CODE and OVMF_VARS file). | /usr/share/ovmf/OVMF.fd |
| MANAGER_QEMU_IGVM_ID | The ID of the IGVM file. | igvm0 |
| MANAGER_QEMU_IGVM_FILE | The file path to the IGVM file. | /root/coconut-qemu.igvm |
| MANAGER_QEMU_BIN_PATH | The file path for the QEMU binary. | qemu-system-x86_64 |
| MANAGER_QEMU_USE_SUDO | Whether to use sudo to run QEMU. | false |
| MANAGER_QEMU_ENABLE_SEV_SNP | Whether to enable Secure Nested Paging (SEV-SNP). | true |
| MANAGER_QEMU_ENABLE_TDX | Whether to enable Trust Domain Extensions (TDX). | false |
| MANAGER_QEMU_ENABLE_KVM | Whether to enable the Kernel-based Virtual Machine (KVM) acceleration. | true |
| MANAGER_QEMU_MACHINE | The machine type for QEMU. | q35 |
| MANAGER_QEMU_CPU | The CPU model for QEMU. | EPYC |
| MANAGER_QEMU_SMP_COUNT | The number of virtual CPUs. | 4 |
| MANAGER_QEMU_SMP_MAXCPUS | The maximum number of virtual CPUs. | 64 |
| MANAGER_QEMU_MEM_ID | The ID for the memory device. | ram1 |
| MANAGER_QEMU_NO_GRAPHIC | Whether to disable the graphical display. | true |
| MANAGER_QEMU_MONITOR | The type of monitor to use. | pty |
| MANAGER_QEMU_HOST_FWD_RANGE | The range of host ports to forward. | 6100-6200 |
| MANAGER_MAX_VMS | The maximum number of VMs running concurrently on the manager. | 10 |

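The default-substitution rule can be previewed from the shell before starting the service. The sketch below is only an illustration: `effective` is a hypothetical helper that is not part of Cocos, the override values are arbitrary examples, and the defaults passed to it are copied from the table above. Manager itself reads the environment when it starts; nothing here is required.

```sh
# Override only the settings that differ from the defaults (example values).
export MANAGER_QEMU_MEMORY_SIZE=4G
export MANAGER_QEMU_SMP_COUNT=8
export MANAGER_QEMU_ENABLE_SEV_SNP=false

# effective NAME DEFAULT
# Hypothetical helper: print the value of the variable NAME if it is set,
# otherwise print DEFAULT. This mirrors "unset variables fall back to defaults".
effective() {
    eval "printf '%s\n' \"\${$1-\$2}\""
}

effective MANAGER_QEMU_MEMORY_SIZE 2048M   # exported above, so the override is printed
effective MANAGER_GRPC_PORT 7001           # falls back to the default unless set elsewhere
```

Exporting the variables in the shell that launches `manager` is enough; the helper only makes it easier to see which value would win.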
## Setup

```sh
git clone https://github.com/ultravioletrs/cocos
cd cocos
```

NB: all relative paths in this document are relative to the `cocos` repository directory.

### QEMU-KVM

[QEMU-KVM](https://www.qemu.org/) is a virtualization platform that allows you to run multiple operating systems on the same physical machine. It is a combination of two technologies: QEMU and KVM.

- QEMU is an emulator that can run a variety of operating systems, including Linux, Windows, and macOS.
- [KVM](https://wiki.qemu.org/Features/KVM) is a Linux kernel module that allows QEMU to run virtual machines.

To install QEMU-KVM on a Debian-based machine, run:

```sh
sudo apt update
sudo apt install qemu-kvm
```

Create an `img` directory in `cmd/manager`.

#### Virtual filesystem

9P (or Plan 9 Filesystem) in QEMU is a lightweight, network-based file-sharing protocol. In Cocos, 9P is used to transfer environment variables and TLS certificates for cloud communication from the Manager to the Agent.

Define the environment variables in a file called `environment`. For the list of environment variables and their meaning, refer to the Agent [Readme](https://github.com/ultravioletrs/cocos/blob/main/agent/README.md).

### Prepare Cocos HAL

Cocos HAL for Linux is a framework for building a custom in-enclave Linux distribution. Use the instructions in the [Readme](https://github.com/ultravioletrs/cocos/blob/main/hal/linux/README.md).
Once the image is built, copy the kernel and rootfs images from `buildroot/output/images/bzImage` and `buildroot/output/images/rootfs.cpio.gz`, respectively, to `cmd/manager/img`.

Another option is to use release versions of EOS that can be downloaded from the [Cocos GitHub repository](https://github.com/ultravioletrs/cocos/releases).

#### Test VM creation

```sh
cd cmd/manager

sudo find / -name OVMF_CODE.fd
# => /usr/share/OVMF/OVMF_CODE.fd
OVMF_CODE=/usr/share/OVMF/OVMF_CODE.fd

sudo find / -name OVMF_VARS.fd
# => /usr/share/OVMF/OVMF_VARS.fd

# Create a local copy of OVMF_VARS.
cp /usr/share/OVMF/OVMF_VARS.fd .

# Create directories for the environment file and the cloud certificates.
mkdir env
mkdir certs

# Enter the env directory and create the environment file.
cd env
touch environment

# Define the computations endpoint URL for the agent.
# Make sure the computations endpoint is running (for example, Cocos Prism).
echo AGENT_CVM_GRPC_URL=localhost:7001 >> ./environment
# Define the log level for the agent.
echo AGENT_LOG_LEVEL=debug >> ./environment

# Optional: add AWS/S3 credentials for remote resource access.
# NOTE: AWS credentials can also be passed via the CreateVM API using CLI flags
# (--aws-access-key-id, --aws-secret-access-key, --aws-endpoint-url, --aws-region).
# If using the API approach, you don't need to add them to this file.
# Replace HOST_IP with your host machine IP address (not localhost).
echo AWS_ACCESS_KEY_ID=minioadmin >> ./environment
echo AWS_SECRET_ACCESS_KEY=minioadmin >> ./environment
echo AWS_ENDPOINT_URL=http://HOST_IP:9000 >> ./environment
echo AWS_REGION=us-east-1 >> ./environment

# Return to cmd/manager.
cd ..

OVMF_VARS=./OVMF_VARS.fd
KERNEL="img/bzImage"
INITRD="img/rootfs.cpio.gz"
ENV_PATH=./env
CERTS_PATH=./certs

qemu-system-x86_64 \
  -enable-kvm \
  -cpu EPYC-v4 \
  -machine q35 \
  -smp 4 \
  -m 2048M,slots=5,maxmem=10240M \
  -no-reboot \
  -drive if=pflash,format=raw,unit=0,file=$OVMF_CODE,readonly=on \
  -drive if=pflash,format=raw,unit=1,file=$OVMF_VARS \
  -netdev user,id=vmnic,hostfwd=tcp::7020-:7002 \
  -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
  -kernel $KERNEL \
  -append "earlyprintk=serial console=ttyS0" \
  -initrd $INITRD \
  -nographic \
  -monitor pty \
  -monitor unix:monitor,server,nowait \
  -fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
  -device virtio-9p-pci,fsdev=env_fs,mount_tag=env_share \
  -fsdev local,id=cert_fs,path=$CERTS_PATH,security_model=mapped \
  -device virtio-9p-pci,fsdev=cert_fs,mount_tag=certs_share
```

Once the VM has booted, press Enter and log in with the username `root`.

#### Build and run Agent

The Agent is started automatically in the VM.

```sh
# List running processes and use 'grep' to filter for the 'cocos-agent' process.
ps aux | grep cocos-agent
# This command helps verify that the 'cocos-agent' process is running.
# The output shows the process ID (PID), resource usage, and other information about the 'cocos-agent' process.
# For example: 118 root cocos-agent
```

We can also check if `Agent` is reachable from the host machine:
|
|
|
|
```sh
|
|
# Use netcat (nc) to test the connection to localhost on port 7020.
|
|
nc -zv localhost 7020
|
|
# Output:
|
|
# nc: connect to localhost (::1) port 7020 (tcp) failed: Connection refused
|
|
# Connection to localhost (127.0.0.1) 7020 port [tcp/*] succeeded!
|
|
```
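
Right after the VM boots, the agent may not be listening yet, so a single probe can fail spuriously. The following is a minimal sketch of a retry helper for that case. The function name `retry_until` and its attempt and delay arguments are illustrative assumptions, not part of Cocos.

```sh
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the probe's own output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry_until 30 2 nc -z localhost 7020 && echo "agent reachable"` probes the forwarded port up to 30 times, two seconds apart.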

#### Conclusion

You can now use `Manager` with `Agent`. On each `/run` request, `Manager` creates a VM with its own separate OVMF variables file.

### OVMF

We need [Open Virtual Machine Firmware](https://wiki.ubuntu.com/UEFI/OVMF) (OVMF). OVMF is a port of Intel's TianoCore firmware, an open source implementation of the Unified Extensible Firmware Interface (UEFI), for use in QEMU virtual machines. We need OVMF in order to run a virtual machine with *focal-server-cloudimg-amd64*. When we install QEMU, we get two files that we need to start a VM: `OVMF_VARS.fd` and `OVMF_CODE.fd`. We will make a local copy of `OVMF_VARS.fd`, since a VM modifies this file. `OVMF_CODE.fd`, on the other hand, is only read, so we only record its path in an environment variable.

```sh
sudo find / -name OVMF_CODE.fd
# => /usr/share/OVMF/OVMF_CODE.fd
export MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/OVMF/OVMF_CODE.fd

sudo find / -name OVMF_VARS.fd
# => /usr/share/OVMF/OVMF_VARS.fd
export MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/OVMF/OVMF_VARS.fd
```

NB: we export these environment variables in the shell process where we run `manager`, so that `manager` can read them.
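
The local copy of `OVMF_VARS.fd` mentioned above, used when launching QEMU by hand, can be sketched as follows. This is a minimal example; the helper name `copy_ovmf_vars` is hypothetical, and the source path is assumed to be the one reported by `find`.

```sh
# copy_ovmf_vars SRC_FILE DEST_DIR
# Copies the OVMF variables file into DEST_DIR so that the VM can modify
# its own copy, and prints the path of the copy on stdout.
copy_ovmf_vars() {
    src=$1
    dest_dir=$2
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/OVMF_VARS.fd" || return 1
    echo "$dest_dir/OVMF_VARS.fd"
}
```

For example, `OVMF_VARS=$(copy_ovmf_vars /usr/share/OVMF/OVMF_VARS.fd .)` yields `./OVMF_VARS.fd`, the value used in the manual QEMU invocation.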

### Trusted Platform Module (TPM)

The Trusted Platform Module (TPM) plays a fundamental role in establishing trust in the VM: it provides a tamper-resistant foundation for cryptographic operations, secures sensitive artifacts, measures system state, and enables attestation mechanisms.

### IGVM

An IGVM file contains all the information necessary to launch a virtual machine on different virtualization platforms. It includes setup commands for the guest system and verification data to ensure the VM is loaded securely and correctly.

Cocos uses [COCONUT-SVSM](https://github.com/coconut-svsm/svsm/blob/main/Documentation/docs/installation/INSTALL.md) for the vTPM. The IGVM file contains both the OVMF file and the vTPM.

## Deployment

Before starting the service, make sure that a CVMS server is running (see [here](../test/cvms/README.md)).

The manager can be started as a *systemd* service or as a standalone executable. To start the manager as a systemd service, look at the systemd service script [here](https://github.com/ultravioletrs/cocos/blob/main/init/systemd/cocos-manager.service). The environment variables are defined in the `cocos-manager.env` file. Below are examples of how to start the manager as a standalone executable.

```bash
# Download the latest version of the service
git clone git@github.com:ultravioletrs/cocos.git

cd cocos

# Compile the manager
make manager

# Set the environment variables and run the service
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_USE_SUDO=false \
./build/cocos-manager
```
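
When running the standalone executable, the same variables can be kept in a file instead of being typed on the command line. The sketch below loads such a file into the current shell. The helper name `load_env_file` is hypothetical, and the file is assumed to contain only plain `KEY=VALUE` lines without spaces or quoting.

```sh
# load_env_file FILE
# Exports every KEY=VALUE assignment found in FILE into the current shell.
# Pass a path that contains a slash (e.g. ./cocos-manager.env), because
# the '.' builtin searches PATH for bare file names.
load_env_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    set -a    # export every variable assigned while the file is sourced
    . "$1"
    set +a
}
```

For example, `load_env_file ./cocos-manager.env && ./build/cocos-manager` starts the manager with the variables from that file.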

To launch an SEV-SNP CVM, provide the IGVM file that contains the vTPM and the OVMF (combined OVMF_CODE and OVMF_VARS) of the CVM.

To enable [AMD SEV-SNP](https://www.amd.com/en/developer/sev.html) support, start the manager like this:

```sh
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_ENABLE_SEV_SNP=true \
MANAGER_QEMU_SEV_SNP_CBITPOS=51 \
MANAGER_QEMU_BIN_PATH=<path to QEMU binary> \
MANAGER_QEMU_IGVM_FILE=<path to IGVM file> \
./build/cocos-manager
```

To enable [TDX](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html) support, start the manager like this:

```sh
MANAGER_GRPC_URL=localhost:7001 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_ENABLE_SEV_SNP=false \
MANAGER_QEMU_ENABLE_TDX=true \
MANAGER_QEMU_CPU=host \
MANAGER_QEMU_BIN_PATH=<path to QEMU binary> \
MANAGER_QEMU_OVMF_FILE=<path to OVMF file> \
./build/cocos-manager
```
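
A wrong or missing firmware path typically only shows up when the VM fails to launch, so it can help to check the variables first. The following pre-flight check is a sketch, not something the manager provides: the function name `check_cvm_env` is hypothetical, and treating SEV-SNP and TDX as mutually exclusive is an assumption based on the TDX example above, which disables SEV-SNP explicitly.

```sh
# check_cvm_env
# Reads the MANAGER_QEMU_* variables from the current shell and reports
# obvious misconfigurations. Returns 0 if nothing suspicious is found.
check_cvm_env() {
    snp=${MANAGER_QEMU_ENABLE_SEV_SNP:-false}
    tdx=${MANAGER_QEMU_ENABLE_TDX:-false}
    # Assumption: only one CVM technology is enabled at a time.
    if [ "$snp" = true ] && [ "$tdx" = true ]; then
        echo "SEV-SNP and TDX are both enabled; pick one" >&2
        return 1
    fi
    if [ "$snp" = true ] && [ ! -r "${MANAGER_QEMU_IGVM_FILE:-}" ]; then
        echo "SEV-SNP needs a readable MANAGER_QEMU_IGVM_FILE" >&2
        return 1
    fi
    if [ "$tdx" = true ] && [ ! -r "${MANAGER_QEMU_OVMF_FILE:-}" ]; then
        echo "TDX needs a readable MANAGER_QEMU_OVMF_FILE" >&2
        return 1
    fi
    return 0
}
```

Set the variables in the current shell, then run `check_cvm_env && ./build/cocos-manager`, exporting the variables first so the manager also sees them.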

### Troubleshooting

If `ps aux | grep qemu-system-x86_64` gives you something like this

```
darko 13913 0.0 0.0 0 0 pts/2 Z+ 20:17 0:00 [qemu-system-x86] <defunct>
```

it means that the QEMU virtual machine process is defunct: it has exited and is no longer running, but its parent process has not yet collected its exit status. Such a defunct process is also known as a ["zombie" process](https://en.wikipedia.org/wiki/Zombie_process).
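
Zombie QEMU processes can be picked out of the `ps aux` output by looking at the STAT column, which is the eighth field and starts with `Z` for a zombie. A minimal sketch, with the hypothetical helper name `zombie_qemu_pids`:

```sh
# zombie_qemu_pids
# Reads `ps aux` output on stdin and prints the PID (field 2) of every
# QEMU process whose STAT column (field 8) starts with Z.
zombie_qemu_pids() {
    awk '$8 ~ /^Z/ && /qemu-system-x86/ { print $2 }'
}
```

Run it as `ps aux | zombie_qemu_pids`. A zombie has already exited, so sending it signals has no effect. Its entry disappears once the parent process reaps it or exits, and `ps -o ppid= -p <PID>` shows which process that parent is.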

You can troubleshoot the VM launch procedure by running the `qemu-system-x86_64` command directly. When you run `manager` with the `MANAGER_LOG_LEVEL=info` env var set, it prints out the entire command used to launch a VM. The relevant part of the log might look like this:

```
{"level":"info","message":"/usr/bin/qemu-system-x86_64 -enable-kvm -machine q35 -cpu EPYC -smp 4,maxcpus=64 -m 4096M,slots=5,maxmem=30G -drive if=pflash,format=raw,unit=0,file=/usr/share/OVMF/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=img/OVMF_VARS.fd -device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -drive file=img/focal-server-cloudimg-amd64.img,if=none,id=disk0,format=qcow2 -device scsi-hd,drive=disk0 -netdev user,id=vmnic,hostfwd=tcp::2222-:22,hostfwd=tcp::9301-:9031,hostfwd=tcp::7020-:7002 -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= -nographic -monitor pty","ts":"2023-08-14T18:29:19.2653908Z"}
```

To look for possible problems, run the command (the value of the `"message"` key) directly in the terminal:

```sh
/usr/bin/qemu-system-x86_64 -enable-kvm -machine q35 -cpu EPYC -smp 4,maxcpus=64 -m 4096M,slots=5,maxmem=30G -drive if=pflash,format=raw,unit=0,file=/usr/share/OVMF/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=img/OVMF_VARS.fd -device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -drive file=img/focal-server-cloudimg-amd64.img,if=none,id=disk0,format=qcow2 -device scsi-hd,drive=disk0 -netdev user,id=vmnic,hostfwd=tcp::2222-:22,hostfwd=tcp::9301-:9031,hostfwd=tcp::7020-:7002 -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= -nographic -monitor pty
```

Any problems it reports can usually be solved by setting the appropriate env vars. Look in the `manager/qemu/config.go` file to see the recognized env vars. Don't forget to prepend `MANAGER_QEMU_` to the names of the env vars.
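
Copying the command out of a JSON log line by hand is error-prone, so the extraction can be scripted. The sketch below pulls the `"message"` value out of log lines shaped like the example above. The helper name `extract_qemu_cmd` is hypothetical, and it assumes the message contains no escaped double quotes.

```sh
# extract_qemu_cmd
# Reads manager log lines (JSON) on stdin and prints the "message" value
# of each line whose message contains a qemu-system-x86_64 command.
# Lines without such a message are skipped.
extract_qemu_cmd() {
    sed -n 's/.*"message":"\([^"]*qemu-system-x86_64[^"]*\)".*/\1/p'
}
```

For example, if the manager log was saved to a file (here a hypothetical `manager.log`), `extract_qemu_cmd < manager.log` prints the QEMU command lines it contains, ready to be pasted into the terminal.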

#### Kill `qemu-system-x86_64` processes

To kill any leftover `qemu-system-x86_64` processes, use

```sh
pkill -f qemu-system-x86_64
```

The `pkill` command signals processes selected by name or by pattern. The `-f` flag matches the pattern `qemu-system-x86_64` against the full command line instead of only the process name. By default, `pkill` sends the SIGTERM signal to all matching processes.

If this does not work, i.e. if `ps aux | grep qemu-system-x86_64` still outputs `qemu-system-x86_64` related process(es) other than the `grep` command itself, you can kill the unwanted process with `kill -9 <PID>`, which sends the SIGKILL signal to the process.

## Usage

For more information about the service's capabilities and usage, please check out the [README documentation](../README.md).