COCOS-346 - Explore cloud init for Cloud setup (#357)

* Add qemu cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Update qemu cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Add qemu cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Update qemu cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Update qemu cloud config

* Update cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Update cloud init

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Add cloud init README.md

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

* Add cocos release workflow

Signed-off-by: Jilks Smith <smithjilks@gmail.com>

---------

Signed-off-by: Jilks Smith <smithjilks@gmail.com>
Smith Jilks
2025-01-31 17:48:26 +03:00
committed by GitHub
parent 5969ae3bcb
commit da88fe1e45
6 changed files with 518 additions and 8 deletions
+15 -8
@@ -1,9 +1,9 @@
name: Build and Release
name: Build and Release Hal
on:
push:
tags:
- '*'
- "*"
jobs:
build:
@@ -32,8 +32,8 @@ jobs:
with:
root-reserve-mb: 35000
swap-size-mb: 1024
remove-dotnet: 'true'
remove-android: 'true'
remove-dotnet: "true"
remove-android: "true"
- name: Check free space
run: |
echo "Free space:"
@@ -48,26 +48,33 @@ jobs:
- name: Checkout cocos
uses: actions/checkout@v4
with:
repository: 'ultravioletrs/cocos'
repository: "ultravioletrs/cocos"
path: cocos
- name: Checkout buildroot
uses: actions/checkout@v4
with:
repository: 'buildroot/buildroot'
repository: "buildroot/buildroot"
path: buildroot
ref: 2024.11-rc2
- name: Build
- name: Build hal
run: |
cd buildroot
make BR2_EXTERNAL=../cocos/hal/linux cocos_defconfig
make
- name: Build cocos
run: |
cd cocos
make
- name: Release
uses: softprops/action-gh-release@v2
with:
files: |
buildroot/output/images/bzImage
buildroot/output/images/rootfs.cpio.gz
cocos/build/cocos-agent
cocos/build/cocos-cli
cocos/build/cocos-manager
+89
@@ -0,0 +1,89 @@
#### memory config
MEMORY_SIZE=2048M
MEMORY_SLOTS=5
MAX_MEMORY=30G
#### ovmf code config
OVMF_CODE_IF=pflash
OVMF_CODE_FORMAT=raw
OVMF_CODE_UNIT=0
OVMF_CODE_FILE=/usr/share/OVMF/OVMF_CODE.fd
OVMF_CODE_READONLY=on
OVMF_VERSION=
#### ovmf vars config
OVMF_VARS_IF=pflash
OVMF_VARS_FORMAT=raw
OVMF_VARS_UNIT=1
OVMF_VARS_FILE=/usr/share/OVMF/OVMF_VARS.fd
#### net dev config
NET_DEV_ID=vmnic
NET_DEV_HOST_FWD_AGENT=7020
NET_DEV_GUEST_FWD_AGENT=7002
#### Virtio Net Pci Config
VIRTIO_NET_PCI_DISABLE_LEGACY=on
VIRTIO_NET_PCI_IOMMU_PLATFORM=true
VIRTIO_NET_PCI_ADDR=0x2
VIRTIO_NET_PCI_ROMFILE=
#### Disk image config
DISK_IMG_KERNEL_FILE=
DISK_IMG_ROOTFS_FILE=
KERNEL_COMMAND_LINE="quiet console=null"
#### Sev Config
SEV_ID=sev0
SEV_CBIT_POS=51
SEV_REDUCED_PHYS_BITS=1
SEV_HOST_DATA=
#### VSock Config
VSOCK_ID=vhost-vsock-pci0
VSOCK_GUEST_CID=3
BIN_PATH=qemu-system-x86_64
USE_SUDO=false
ENABLE_SEV=false
ENABLE_SEV_SNP=false
ENABLE_KVM=true
MACHINE=q35
CPU=EPYC
SMP_COUNT=8
SMP_MAXCPUS=64
MEM_ID=ram1
KERNEL_HASH=false
NO_GRAPHIC=true
MONITOR=pty
HOST_FWD_RANGE=6100-6200
CERTS_MOUNT=/etc/cocos/certs
ENV_MOUNT=/etc/cocos/environment
COCOS_AGENT_VERSION=v0.3.1
#### Base image URL and names
BASE_IMAGE_URL=https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
BASE_IMAGE=ubuntu-base.qcow2
CUSTOM_IMAGE=ubuntu-custom.qcow2
#### Paths for OVMF firmware
OVMF_CODE=/usr/share/ovmf/x64/OVMF_CODE.4m.fd
OVMF_VARS=/usr/share/ovmf/x64/OVMF_VARS.4m.fd
#### VM parameters
VM_NAME=cocos-vm
RAM=16G
DISK_SIZE=10G # Size for root filesystem
QEMU_BINARY=qemu-system-x86_64
AGENT_GRPC_SERVER_CERT=/etc/cocos/certs/cert.pem
AGENT_GRPC_SERVER_KEY=/etc/cocos/certs/key.pem
AGENT_GRPC_SERVER_CA_CERTS=/etc/cocos/certs/ca.pem
AGENT_GRPC_CLIENT_CA_CERTS=/etc/cocos/certs/ca.pem
+114
@@ -0,0 +1,114 @@
# Agent Cloud Init Setup
## Overview
The `hal/cloud` directory contains essential files required for setting up a virtual machine (VM) with cloud-init. This setup ensures the automated installation of dependencies, configuration of the environment, and deployment of the Cocos agent as a systemd service.
### Directory Contents
- **`config.yaml`**: This YAML file provides configuration instructions for the cloud image.
- **`meta-data`**: Contains VM metadata, such as instance-specific details and identifiers.
- **`qemu.sh`**: A Bash script that downloads the base cloud image, creates a custom disk image and a cloud-init seed image, and boots the VM with QEMU.
- **`.env`**: Contains environment variables for starting the VM in different modes, configuring disk space, memory allocation, and other parameters.
## Configuration
### Preparing the Cloud-Config File
The `config.yaml` file defines system configurations, including user creation, package installations, file management, and command execution.
Ensure that the cloud-config file is set up with the following configurations:
- **User Credentials**: Specify the default username and password.
- **Certificates and Keys**: Certificate and key files that the agent uses for secure communication.
- **Environment Variables**: Configuration parameters required by the system.
The `config.yaml` file is divided into multiple sections, each addressing a specific aspect of the setup process.
### 1. User Configuration
This section creates a default user with specific permissions and configurations:
- Creates a user named **`cocos_user`**.
- Adds `cocos_user` to the `sudo` and `docker` groups.
- Sets a default password (should be changed for production use).
- Configures the user's shell as `/bin/bash`.
### 2. Package Installation
Installs essential system packages required for various operations:
- **`curl`**: For downloading files from the web.
- **`make`**: A utility for building software.
- **`git`**: Version control system for managing code repositories.
- **`python3` and `python3-dev`**: Required for running Python-based tools.
- **`net-tools`**: Provides networking utilities such as `ifconfig` and `route`.
### 3. File Management (write_files)
Creates and configures critical files required for the setup:
- **Certificates**: Certificate and key files (`cert.pem`, `ca.pem`, `key.pem`) located at `/etc/cocos/certs/`.
- **Environment Variables**: An env file stored at `/etc/cocos/environment`.
- **Systemd Service File**: Cocos agent service configuration file at `/etc/systemd/system/cocos-agent.service` for managing the Cocos agent.
- **Agent Scripts**:
  - `agent_setup.sh`: Configures network interfaces and resizes the root filesystem.
  - `agent_start_script.sh`: Sets up Docker and starts the Cocos agent.
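The start script refuses to launch the agent when the VM has more than one non-Docker network interface. That check can be tried without booting a VM. The sketch below is a minimal restatement under assumptions: `count_ifaces` is a hypothetical helper (it is not part of the repository) that reads `ip route`-style text from standard input, and the sample routes are invented for illustration.

```bash
#!/bin/bash
# Hypothetical helper: count the distinct non-Docker interfaces named in
# `ip route` output supplied on stdin.
count_ifaces() {
  grep -Eo 'dev [a-z0-9]+' | awk '{ print $2 }' | grep -v '^docker' | sort -u | wc -l | tr -d ' '
}

# Illustrative sample only; real input would come from `ip route` inside the VM.
sample_routes='default via 10.0.2.2 dev ens2 proto dhcp
10.0.2.0/24 dev ens2 proto kernel scope link src 10.0.2.15
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1'

n=$(printf '%s\n' "$sample_routes" | count_ifaces)
echo "non-Docker interfaces: $n"
```

Reading from standard input keeps the parsing logic separate from the `ip` call, which makes it easy to test against captured output.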
### 4. Execution of Commands (runcmd)
A sequence of commands is executed to finalize the setup:
- Creates necessary directories: `/cocos`, `/cocos_init`, `/var/log/cocos`, `/etc/cocos`.
- Downloads and installs the Cocos agent binary.
- Installs **Wasmtime** and configures its environment variables.
- Installs **Docker** and adds `cocos_user` to the Docker group.
- Reloads systemd and enables the Cocos agent service.
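Most `runcmd` entries in `config.yaml` use the form `cmd && echo "ok" || echo "failed"`. Because the trailing `echo` succeeds, such a line exits with status 0 even when `cmd` fails, so the failure is visible only in the log text. The sketch below shows one possible alternative, not something the repository provides: `run_step` is a hypothetical helper that logs the outcome and still returns the command's own exit status.

```bash
#!/bin/bash
# Hypothetical helper: run a command, log the outcome with the setup prefix,
# and return the command's own exit status instead of masking it.
run_step() {
  local label="$1"
  shift
  if "$@"; then
    echo "[ COCOS AGENT SETUP ] ${label} succeeded"
  else
    local status=$?
    echo "[ COCOS AGENT SETUP ] ${label} failed (exit ${status})" >&2
    return "$status"
  fi
}

# A step that succeeds, then one that fails and lets the caller react.
run_step "no-op step" true
run_step "always-failing step" false || echo "stopping early"
```

With a helper like this, a caller can decide whether to continue after a failed step rather than relying on someone reading the log.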
## Running the Agent
To test the cloud-init configuration, execute the `qemu.sh` script to bring up a VM using QEMU:
```bash
sudo ./qemu.sh
```
**Important:** The script must be executed as root.
Once QEMU boots the VM, the Cocos agent runs as a systemd service. The service is configured to start automatically on boot and to restart on failure.
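`qemu.sh` assembles the QEMU command line from the variables in `.env`. The sketch below reproduces only a small part of that assembly to show how the user-mode network option forwards a host port to the agent's port in the guest. `build_net_args` is a hypothetical helper, and the values are copied from the `.env` defaults so the example runs on its own.

```bash
#!/bin/bash
# Values mirror the defaults in .env; set here so the sketch is self-contained.
NET_DEV_ID=vmnic
NET_DEV_HOST_FWD_AGENT=7020
NET_DEV_GUEST_FWD_AGENT=7002
ENABLE_KVM=true

# Hypothetical helper: fill the global array `args` with a subset of the
# QEMU options that qemu.sh assembles.
build_net_args() {
  args=()
  if [ "$ENABLE_KVM" = "true" ]; then
    args+=("-enable-kvm")
  fi
  args+=("-netdev" "user,id=$NET_DEV_ID,hostfwd=tcp::$NET_DEV_HOST_FWD_AGENT-:$NET_DEV_GUEST_FWD_AGENT")
}

build_net_args
# Print one argument per line; quoting "${args[@]}" keeps each element intact.
printf '%s\n' "${args[@]}"
```

With these defaults, connections to TCP port 7020 on the host are forwarded to port 7002 in the guest.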
## Debugging and Monitoring
For troubleshooting and monitoring the Cocos agent service, use the following commands within the VM:
### Manually Start the Service
To manually start the agent service, execute:
```bash
sudo systemctl start cocos-agent.service
```
### Verify Service Status
To check if the service is running properly, use:
```bash
sudo systemctl status cocos-agent.service
```
### View Service Logs
To inspect logs generated by the agent service, execute:
```bash
journalctl -u cocos-agent.service
```
### Check Standard Output and Error Logs
To check logs stored in the system, use:
```bash
cat /var/log/cocos/agent.stdout
cat /var/log/cocos/agent.stderr
```
+174
@@ -0,0 +1,174 @@
#cloud-config
package_update: true
package_upgrade: false
users:
  - default
  - name: cocos_user
    gecos: Default User
    groups:
      - sudo
      - docker # Add cocos user to the docker group
    sudo:
      - ALL=(ALL:ALL) ALL
    shell: /bin/bash
chpasswd:
  list: |
    cocos_user:password
  expire: False
ssh_pwauth: True
packages:
  - curl
  - make
  - git
  - python3
  - python3-dev
  - net-tools # Add net-tools to install the 'route' command
write_files:
  - path: /etc/cocos/certs/cert.pem
    content: |
      # Add certificate content here
    permissions: "0644"
  - path: /etc/cocos/certs/ca.pem
    content: |
      # Add CA certificate content here
    permissions: "0644"
  - path: /etc/cocos/certs/key.pem
    content: |
      # Add private key content here
    permissions: "0600"
  - path: /etc/cocos/environment
    content: |
      # Add environment variables here
    permissions: "0644"
  - path: /etc/systemd/system/cocos-agent.service
    content: |
      [Unit]
      Description=Cocos AI agent
      After=network.target
      Before=docker.service
      [Service]
      WorkingDirectory=/cocos
      StandardOutput=file:/var/log/cocos/agent.stdout
      StandardError=file:/var/log/cocos/agent.stderr
      EnvironmentFile=/etc/cocos/environment
      ExecStartPre=/cocos_init/agent_setup.sh
      ExecStart=/cocos_init/agent_start_script.sh
      Restart=always
      [Install]
      WantedBy=default.target
    permissions: "0644"
  # Agent setup script
  - path: /cocos_init/agent_setup.sh
    content: |
      #!/bin/sh
      WORK_DIR="/cocos"
      # IFACES are all network interfaces excluding lo (LOOPBACK) and sit interfaces
      IFACES=$(ip link show | grep -vE 'LOOPBACK|sit*' | awk -F': ' '{print $2}')
      # This loop brings up every interface in IFACES that is down and runs dhclient on every interface that has no IPv4 address
      for IFACE in $IFACES; do
        STATE=$(ip link show $IFACE | grep DOWN)
        if [ -n "$STATE" ]; then
          ip link set $IFACE up
        fi
        IP_ADDR=$(ip addr show $IFACE | grep 'inet ')
        if [ -z "$IP_ADDR" ]; then
          dhclient $IFACE
        fi
      done
      if [ ! -d "$WORK_DIR" ]; then
        mkdir -p $WORK_DIR
      fi
      # Resize the root filesystem to 100% of available space
      ROOT_DEV=$(findmnt / -o SOURCE -n) # Get the root filesystem device
      resize2fs "$ROOT_DEV" && echo "Root filesystem resized successfully" || echo "Failed to resize root filesystem"
    permissions: "0755"
  # Agent start script
  - path: /cocos_init/agent_start_script.sh
    content: |
      #!/bin/sh
      # Change the docker.service file to allow Docker to run in RAM
      mkdir -p /etc/systemd/system/docker.service.d
      # Create or overwrite the override.conf file with the new Environment variable
      tee /etc/systemd/system/docker.service.d/override.conf > /dev/null <<EOF
      [Service]
      Environment=DOCKER_RAMDISK=true
      EOF
      systemctl daemon-reload
      NUM_OF_PERMITED_IFACE=1
      NUM_OF_IFACE=$(ip route | grep -Eo 'dev [a-z0-9]+' | awk '{ print $2 }' | grep -v '^docker' | sort | uniq | wc -l)
      if [ $NUM_OF_IFACE -gt $NUM_OF_PERMITED_IFACE ]; then
        echo "More than one network interface in the VM"
        exit 1
      fi
      DEFAULT_IFACE=$(route | grep '^default' | grep -o '[^ ]*$')
      AGENT_GRPC_HOST=$(ip -4 addr show $DEFAULT_IFACE | grep inet | awk '{print $2}' | cut -d/ -f1)
      export AGENT_GRPC_HOST
      exec /bin/cocos-agent
    permissions: "0755"
runcmd:
  # Create necessary directories
  - mkdir -p /cocos
  - mkdir -p /cocos_init
  - mkdir -p /var/log/cocos
  - mkdir -p /etc/cocos
  # Download the cocos-agent binary
  - echo "[ COCOS AGENT SETUP ] Downloading the cocos-agent binary..."
  - curl -L -O -J https://github.com/smithjilks/cocos/releases/download/v1.0.0/cocos-agent --progress-bar && echo "[ COCOS AGENT SETUP ] cocos-agent binary downloaded successfully" || echo "Failed to download cocos-agent binary"
  # Install the agent binary
  - echo "[ COCOS AGENT SETUP ] Installing cocos-agent binary..."
  - install -D -m 0755 cocos-agent /bin/cocos-agent && echo "[ COCOS AGENT SETUP ] cocos-agent binary installed successfully" || echo "[ COCOS AGENT SETUP ] Failed to install cocos-agent binary"
  # Install Wasmtime
  - echo "Installing Wasmtime runtime..."
  - curl https://wasmtime.dev/install.sh -sSf | bash && echo "Wasmtime installed successfully" || echo "Failed to install Wasmtime"
  - echo "Configuring Wasmtime environment variables..."
  - echo "export WASMTIME_HOME=$HOME/.wasmtime" >> /etc/profile.d/wasm_env.sh
  - echo "export PATH=\$WASMTIME_HOME/bin:\$PATH" >> /etc/profile.d/wasm_env.sh
  - . /etc/profile.d/wasm_env.sh && echo "Wasmtime environment variables configured successfully" || echo "Failed to configure Wasmtime environment variables"
  # Install Docker
  - echo "Starting Docker installation..."
  - curl -fsSL https://get.docker.com -o get-docker.sh && echo "Docker install script downloaded successfully" || echo "Failed to download Docker install script"
  - sh ./get-docker.sh && echo "Docker installed successfully" || echo "Failed to install Docker"
  - usermod -aG docker cocos_user && echo "Added cocos_user to the docker group" || echo "Failed to add cocos_user to the docker group"
  # Reload systemd and enable the service
  - echo "[ COCOS AGENT SETUP ] Reloading systemd daemon..."
  - systemctl daemon-reload && echo "[ COCOS AGENT SETUP ] Systemd daemon reloaded successfully" || echo "[ COCOS AGENT SETUP ] Failed to reload systemd daemon"
  - echo "[ COCOS AGENT SETUP ] Enabling cocos-agent.service..."
  - systemctl enable cocos-agent.service && echo "[ COCOS AGENT SETUP ] cocos-agent.service enabled successfully" || echo "[ COCOS AGENT SETUP ] Failed to enable cocos-agent.service"
  - echo "[ COCOS AGENT SETUP ] Starting cocos-agent.service..."
  - systemctl start cocos-agent.service && echo "[ COCOS AGENT SETUP ] cocos-agent.service started successfully" || echo "[ COCOS AGENT SETUP ] Failed to start cocos-agent.service"
final_message: "Cocos agent setup complete. Verify logs to confirm successful service startup."
+2
@@ -0,0 +1,2 @@
instance-id: iid-cocos-vm
local-hostname: cocos-vm
+124
@@ -0,0 +1,124 @@
#!/bin/bash
# Source environment variables
source ./.env
# Required commands
REQUIRED_CMDS=("wget" "cloud-localds" "$QEMU_BINARY" "qemu-img")
# Check for required commands
for cmd in "${REQUIRED_CMDS[@]}"; do
if ! command -v "$cmd" &> /dev/null; then
echo "Error: $cmd is not installed. Please install it and try again."
exit 1
fi
done
# Ensure script is run as root
if [[ $EUID -ne 0 ]]; then
echo "Error: This script must be run as root."
exit 1
fi
# Create the root filesystem image if it doesn't exist
if [ ! -f "$BASE_IMAGE" ]; then
echo "Downloading base Ubuntu image..."
wget -q "$BASE_IMAGE_URL" -O "$BASE_IMAGE" --show-progress
fi
# Create custom image
echo "Creating custom QEMU image..."
qemu-img create -f qcow2 -b "$BASE_IMAGE" -F qcow2 "$CUSTOM_IMAGE" "$DISK_SIZE"
# Cloud-init configuration files
CLOUD_CONFIG="config.yaml"
META_DATA="meta-data"
SEED_IMAGE="seed.img"
# Create seed image for cloud-init
echo "Creating seed image..."
cloud-localds "$SEED_IMAGE" "$CLOUD_CONFIG" "$META_DATA"
# Construct QEMU arguments from environment variables
construct_qemu_args() {
args=()
args+=("-name" "$VM_NAME")
# Virtualization (Enable KVM)
if [ "$ENABLE_KVM" == "true" ]; then
args+=("-enable-kvm")
fi
# Machine, CPU, RAM
if [ -n "$MACHINE" ]; then
args+=("-machine" "$MACHINE")
fi
if [ -n "$CPU" ]; then
args+=("-cpu" "$CPU")
fi
args+=("-boot" "d")
args+=("-smp" "$SMP_COUNT,maxcpus=$SMP_MAXCPUS")
args+=("-m" "$MEMORY_SIZE,slots=$MEMORY_SLOTS,maxmem=$MAX_MEMORY")
# OVMF (if applicable)
if [ "$ENABLE_SEV_SNP" != "true" ]; then
args+=("-drive" "if=$OVMF_CODE_IF,format=$OVMF_CODE_FORMAT,unit=$OVMF_CODE_UNIT,file=$OVMF_CODE,readonly=$OVMF_CODE_READONLY")
args+=("-drive" "if=$OVMF_VARS_IF,format=$OVMF_VARS_FORMAT,unit=$OVMF_VARS_UNIT,file=$OVMF_VARS")
fi
# Network configuration
args+=("-netdev" "user,id=$NET_DEV_ID,hostfwd=tcp::$NET_DEV_HOST_FWD_AGENT-:$NET_DEV_GUEST_FWD_AGENT")
args+=("-device" "virtio-net-pci,disable-legacy=$VIRTIO_NET_PCI_DISABLE_LEGACY,iommu_platform=$VIRTIO_NET_PCI_IOMMU_PLATFORM,netdev=$NET_DEV_ID,addr=$VIRTIO_NET_PCI_ADDR,romfile=$VIRTIO_NET_PCI_ROMFILE")
args+=("-device" "vhost-vsock-pci,id=$VSOCK_ID,guest-cid=$VSOCK_GUEST_CID")
# SEV (if enabled)
if [ "$ENABLE_SEV" == "true" ] || [ "$ENABLE_SEV_SNP" == "true" ]; then
sev_type="sev-guest"
kernel_hash=""
host_data=""
args+=("-machine" "confidential-guest-support=$SEV_ID,memory-backend=$MEM_ID")
if [ "$ENABLE_SEV_SNP" == "true" ]; then
args+=("-bios" "$OVMF_CODE_FILE")
sev_type="sev-snp-guest"
if [ -n "$SEV_HOST_DATA" ]; then
host_data=",host-data=$SEV_HOST_DATA"
fi
fi
# KERNEL_HASH is the variable defined in .env
if [ "$KERNEL_HASH" == "true" ]; then
kernel_hash=",kernel-hashes=on"
fi
args+=("-object" "memory-backend-memfd,id=$MEM_ID,size=$MEMORY_SIZE,share=true,prealloc=false")
args+=("-object" "$sev_type,id=$SEV_ID,cbitpos=$SEV_CBIT_POS,reduced-phys-bits=$SEV_REDUCED_PHYS_BITS$kernel_hash$host_data")
fi
# Disk image configuration
args+=("-drive" "file=$SEED_IMAGE,media=cdrom")
args+=("-drive" "file=$CUSTOM_IMAGE,if=none,id=disk0,format=qcow2")
args+=("-device" "virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true")
args+=("-device" "scsi-hd,drive=disk0")
# Display options
if [ "$NO_GRAPHIC" == "true" ]; then
args+=("-nographic")
fi
args+=("-monitor" "$MONITOR")
args+=("-no-reboot")
args+=("-vnc" ":9")
}
# Call the function directly (not in a subshell) so the global args array
# keeps every argument as a separate element
construct_qemu_args
echo "Running QEMU with the following arguments: ${args[*]}"
echo "Starting QEMU VM..."
"$QEMU_BINARY" "${args[@]}"