* Implemented mTLS support across services
Extended gRPC configuration to support mutual TLS (mTLS) in agent and manager components for enhanced security. This includes the loading of Certificate Authority (CA) certificates, server, and client certificates, and keys. Updated README documentation to reflect the new environment variables required for mTLS configuration. Additionally, streamlined secure gRPC client connection setup and logging messages to indicate whether a service is running with TLS, mTLS, or without TLS.
The change ensures secure communication between services by verifying both client and server identities, thus addressing potential security concerns in network-level interactions.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Enhance agent cert handling and update copyright
- Implement function to create certificate files for the agent configuration dynamically, ensuring file paths are updated to reflect newly created files. This improves the agent's setup process by automating the certificate handling.
- Update copyright clause to reflect the new owning entity, Ultraviolet, affirming correct attribution and compliance with legal requirements.
- Refactor gRPC client connection code to remove redundant package alias, streamlining the codebase and improving readability.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor cert loading with fallbacks
Removed redundant certificate file creation logic in the agent module and introduced a more robust loading mechanism in the gRPC server module to support direct byte content aside from file paths. This change simplifies the initial setup process for the agent by removing the need to create certificate files preemptively, thereby streamlining deployment in environments with varying filesystem access. It supports using certificate contents directly, enhancing compatibility with in-memory configurations or environments where file storage may not be ideal.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix lint
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor gRPC manager service and client
The manager service and client have been restructured for stream communication, facilitating real-time agent events, logs, and run responses. The `Run` RPC is replaced by the `Process` stream RPC, enabling bidirectional streaming between clients and the manager service. This allows continuous interchange of different message types including `WhoAmIRequest`, `AgentLog`, `AgentEvent`, and `RunResponse`.
Several message types have been adjusted and new fields introduced, like `AgentPort` in `RunResponse` and various agent-config attributes including CA files and instance IDs, to support TLS client authentication and distinguish between agent instances.
We've also incorporated `google.protobuf.Timestamp` in `AgentEvent` for precise event logging. The client code reflects these modifications with updated method calls and stream handling logic for ongoing communication. Moreover, the updates necessitate corresponding changes throughout service, grpc, and sdk layers to interoperate with the new streaming approach.
The transition to streaming paves the way for a more interactive, flexible communication system that can accommodate future expansion and real-time monitoring features.
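As a rough illustration of handling several message types on one stream, the sketch below dispatches on the concrete type of each received message. The struct definitions are simplified stand-ins for the generated protobuf types, and a channel stands in for the receive side of the `Process` stream; both are assumptions.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the generated protobuf messages.
type WhoAmIRequest struct{}
type AgentLog struct{ Message string }
type AgentEvent struct{ EventType string }
type RunResponse struct{ AgentPort string }

// describe returns a short description of a message received on the stream.
func describe(msg any) string {
	switch m := msg.(type) {
	case *WhoAmIRequest:
		return "whoami"
	case *AgentLog:
		return "log: " + m.Message
	case *AgentEvent:
		return "event: " + m.EventType
	case *RunResponse:
		return "run response on port " + m.AgentPort
	default:
		return fmt.Sprintf("unknown message %T", m)
	}
}

func main() {
	// The channel plays the role of repeated Recv calls on the stream.
	stream := make(chan any, 3)
	stream <- &WhoAmIRequest{}
	stream <- &AgentLog{Message: "started"}
	stream <- &RunResponse{AgentPort: "7002"}
	close(stream)

	for msg := range stream {
		fmt.Println(describe(msg))
	}
}
```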
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix lint
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Update GitHub Actions to Latest Versions
Upgraded GitHub Actions 'checkout' to version 4 and 'setup-go' to version 5 across various workflow files to leverage the latest features and improvements for better performance and reliability. This also ensures compatibility with Go version 1.21.x which is specified in the workflows.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor event handling and logging
Reworked event and log processing to use channels instead of direct HTTP calls. Removed obsolete events package and consolidated event structures, leading to cleaner and more maintainable code. Updated agent events to use channels, enhanced error handling in log forwarding, and simplified manager `New` function signature to accept an event channel directly.
- Removed `events` and `agentevents` packages to reduce complexity.
- Replaced direct event server communication with internal channel usage.
- Introduced `AgentEvent` struct in events.go for standardized event objects.
- Adapted `managerService` to dispatch events and logs through channels.
- Streamlined manager construction by removing the now-unnecessary event service and host IP parameters.
This change results in a more robust and easier to extend event and log management system within the agent-manager interaction.
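A small sketch of the channel-based approach, assuming a simplified `AgentEvent` with hypothetical fields: producers write events to a channel, and a single consumer drains it instead of each producer making its own HTTP call.

```go
package main

import "fmt"

// AgentEvent is a simplified, hypothetical version of the event struct.
type AgentEvent struct {
	EventType string
	Status    string
}

// collect drains events until the channel is closed and returns them in
// the order they were received.
func collect(events <-chan AgentEvent) []AgentEvent {
	var out []AgentEvent
	for ev := range events {
		out = append(out, ev)
	}
	return out
}

func main() {
	events := make(chan AgentEvent)
	done := make(chan []AgentEvent)

	// The consumer runs in its own goroutine, as a manager-side loop would.
	go func() { done <- collect(events) }()

	events <- AgentEvent{EventType: "init", Status: "starting"}
	events <- AgentEvent{EventType: "run", Status: "in-progress"}
	close(events)

	for _, ev := range <-done {
		fmt.Printf("%s: %s\n", ev.EventType, ev.Status)
	}
}
```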
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix ci
Signed-off-by: SammyOina <sammyoina@gmail.com>
* remove unused code
Signed-off-by: SammyOina <sammyoina@gmail.com>
* add comments
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
The HTTP server-related code, documentation, and configurations have been removed as part of a shift towards prioritizing gRPC for service communication. This update includes deletions of HTTP host and port configs across various components, the manager HTTP API alongside its Swagger definition, and the removal of related scaffolding and utility code. This change simplifies the overall architecture and eliminates redundant HTTP support, focusing on optimizing gRPC performance and security features.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Ensure graceful shutdown and improve connection handling
Refined the network connection handling in both agent events and logs to continuously process incoming data in a loop, enabling the services to handle more than a single message per connection. Additionally, instituted a deferred close operation for the event service to guarantee resources are cleanly released upon the application's termination.
Resolves potential resource leakage and enhances log processing efficiency.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Improve robustness in event and log handling
Altered handling in agent event and log services to continue processing incoming data rather than returning early upon encountering errors. This ensures that a single erroneous data point does not prematurely halt the processing loop, improving the robustness and reliability of the services. Potential errors are now reported and logged, yet the system remains operational to handle subsequent data.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Upgrade gRPC telemetry to use StatsHandler
Switched from using gRPC's UnaryInterceptor for telemetry to the more comprehensive StatsHandler provided by otelgrpc. This enhances telemetry collection by allowing the capture of a wider range of RPC stats, leading to improved monitoring and observability of the gRPC server.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* use constants
Signed-off-by: sammy <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
Signed-off-by: sammy <sammyoina@gmail.com>
* Simplify event handling and config
Streamlined event service interface by consolidating `SendEvent` and introducing `SendRaw`. Removed `notification_server_url` and `instance_id` parameters from several event publication calls to leverage centralized event construction. This change not only cleans up redundancy in event-related code but also simplifies the configuration data flow across the system, making it easier to manage and less error-prone. Uniform event generation now improves consistency and maintainability.
Refactored configuration management in the agent and manager services. Removed the notifications URL from the agent configuration, on the simplifying assumption that there is a single source of events. Renamed Manager Port to VsockConfigPort for clarity and consistency across vsock communication.
These modifications should facilitate easier integration and extension of event and configuration systems in the future.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix lint
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor error handling in agent event forwarding
Introduced context and error channel handling to the agent event forwarding process. The logger now warns on errors during forward operations asynchronously, allowing for non-blocking error reporting. Additionally, reliance on the global logger was removed in favor of passing error information via channels, improving modularity and error flow control.
Resolves issue with silent forwarding failures by providing a means to alert system operators without halting the service. This enhancement makes the error reporting more robust and reactive while maintaining service continuity.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* remove unused field
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Enhance agent logging via vsock connection
Redirected agent logging to use a vsock connection instead of standard output, improving the process isolation and enabling centralized log management. The change involved dialing to the specified vsock log port and initializing the logger with the vsock connection rather than stdout.
Additionally, the manager service now maintains a map of agent vsock CIDs to computation IDs, which makes it easier to track computation resources. A routine that retrieves logs from agents is also started during service setup to facilitate log collection.
As a consequence of these changes, the now-redundant `os` package import was removed from the agent's main.go.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fail gracefully
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Updated backoff strategy and VM configurations
- Added `github.com/cenkalti/backoff` to direct dependencies for robust retry logic in agent configuration sending.
- Modified the vsock logs port to align with the updated port range standards.
- Enclosed kernel console arguments in quotes to ensure proper parsing in QEMU configurations.
- Implemented exponential backoff when sending agent configurations to handle transient failures.
Refactors:
- Streamlined creation of `AgentConfig` within the computation setup to avoid unnecessary initializations when `c.AgentConfig` is not nil.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor command execution and improve argument construction
Consolidated the error handling in the command execution function for better readability. In the QEMU configuration, the argument assembly process is enhanced for clarity and correctness; the VNC parameter is now separate, and string quoting is handled properly for kernel parameters. These changes result in more maintainable code and prevent potential formatting issues during QEMU argument parsing.
Resolves issues with argument construction in QEMU config module.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refine default config handling and unpacking
Improved the agent configuration by dynamically setting default values for the log level and port if they are not specified in the incoming configuration. Also streamlined configuration unpacking in the endpoint and service layers, reducing redundancy and ensuring all required fields are correctly copied over to the Manager's configuration structure. This change ensures better fault tolerance and more maintainable code by handling edge cases where configuration values might be missing.
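Filling in missing values might look like the sketch below. The struct fields and the default values `info` and `7002` are assumptions chosen for the example, not values confirmed by this change.

```go
package main

import "fmt"

// AgentConfig is a reduced, hypothetical view of the agent configuration.
type AgentConfig struct {
	LogLevel string
	Port     string
}

// Assumed defaults for illustration only.
const (
	defaultLogLevel = "info"
	defaultPort     = "7002"
)

// withDefaults returns a copy of c with empty fields replaced by defaults.
func withDefaults(c AgentConfig) AgentConfig {
	if c.LogLevel == "" {
		c.LogLevel = defaultLogLevel
	}
	if c.Port == "" {
		c.Port = defaultPort
	}
	return c
}

func main() {
	// Only the port is supplied; the log level falls back to the default.
	cfg := withDefaults(AgentConfig{Port: "9000"})
	fmt.Printf("%+v\n", cfg)
}
```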
Signed-off-by: SammyOina <sammyoina@gmail.com>
* rename dir
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix lint
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Ensure runRes.Empty() reflects non-empty state
Changed the always-true return value of the `runRes.Empty()` method to `false` to accurately indicate the presence of a response body. This adjustment ensures downstream handling of API responses aligns with actual content state.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Replace mglog with slog across codebase
Updated various components to replace the `mglog` logger implementation with the `slog` logger. This change affects logging initialization and calls throughout the codebase including the agent, manager, and internal server components. Transitioning to `slog` is part of a broader shift to standardize the logging mechanism to improve maintainability and consistency.
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
* add status endpoint
Signed-off-by: sammy <sammyoina@gmail.com>
* feat: Update code generation tools to latest versions
Update the code generation tools, including protoc-gen-go and
protoc-gen-go-grpc, to their latest versions (v1.31.0 and v1.3.0,
respectively). This ensures compatibility with the latest features and
improvements. The updated tools also require gRPC-Go v1.32.0 or later.
The new versions bring important updates and bug fixes, enhancing the
performance and stability of the generated code. By staying up-to-date
with the latest tooling, we can take advantage of the latest
functionality and ensure a smooth development experience.
No code changes are included in this commit. These updates only impact
the code generation process.
Signed-off-by: sammy <sammyoina@gmail.com>
* Update Go version to 1.21.x
Signed-off-by: sammy <sammyoina@gmail.com>
* Refactor agent and manager services to publish event notifications
The refactoring includes changes to the agent and manager services to incorporate event notifications. By publishing events, the services can inform subscribers about the current state of the computation or any updates. Specifically, the `agentService` now includes a `cmpHash` field to store the SHA-256 hash of the computation, which is subsequently used when publishing events. The `agentService` and `managerService` now use the `publisher` interface to publish events to the topic "manager". The `pubsub.go` file is no longer needed and has been removed.
This commit improves the service architecture by allowing subscribers to receive relevant updates and monitor the progress of computations. It enhances the overall system by providing more transparency and enabling better coordination between the agent and manager services.
Signed-off-by: sammy <sammyoina@gmail.com>
* Improve generated Go file comparison in checkproto workflow
Refactor the file comparison logic in the checkproto workflow to use the `-p` flag instead of `-s` for improved accuracy. This change ensures that the generated Go files are thoroughly compared with the original ones, detecting any discrepancies and preventing out-of-sync files from passing the validation. By using the `-p` flag, we now check both the contents and the metadata of the files, providing more robust synchronization checks. This update enhances the reliability of the checkproto workflow and helps maintain consistency between the proto files and their corresponding generated Go files.
Signed-off-by: sammy <sammyoina@gmail.com>
* Update file comparison command to detect differences line by line
The code change updates the file comparison command used in the CI workflow to detect differences line by line instead of only reporting the first difference encountered. This change improves the accuracy of detecting inconsistencies between the original protobuf files and the generated Go files. Previously, only the first difference was reported, leading to potential missed issues. By comparing the files line by line, we can now detect and report all differences accurately. This change enhances the reliability of our CI pipeline and ensures that the generated Go files stay in sync with the protobuf files.
Signed-off-by: sammy <sammyoina@gmail.com>
* add event exporting to external server
Signed-off-by: sammy <sammyoina@gmail.com>
* feat: Add support for notification server URL
The commit adds a new environment variable, `COCOS_NOTIFICATION_SERVER_URL`, which allows specifying the server to receive notification events from the agent. This addition provides flexibility to configure the notification server URL based on the deployment environment. This change enables seamless integration with different notification server instances and enhances the extensibility of the system. It resolves the need to modify the code directly when changing the server URL.
Signed-off-by: sammy <sammyoina@gmail.com>
* Refactor gRPC client and server, remove unused handlers
The commit refactors the gRPC client and server code by removing the unused `nopDecoder` and `status` handlers from the client and server, respectively. This cleanup reduces code clutter and improves maintenance. No significant consequences are expected.
Signed-off-by: sammy <sammyoina@gmail.com>
* Ensure generated Go files stay in sync with proto files during the CI workflow
Fixes an issue in the CI workflow where proto files and their corresponding generated Go files were not being properly compared for synchronization. Previously, the `cmp -l` command was used, which prints the offset and values of every differing byte, leading to false negatives. This has been corrected by using `cmp -s` instead, which prints nothing and reports the result only through its exit status (0 when the files are identical, 1 when they differ). This change ensures that any differences between the proto files and their generated Go files will be detected, helping to maintain consistency and accuracy in the codebase.
Signed-off-by: sammy <sammyoina@gmail.com>
* Enhance notification payload and endpoint
Extended the notification system to include 'status' and 'details' in the payload, improving traceability and debugging. Adapted the serialized JSON structure for clarity and added an 'originator' field to track the source service. Transitioned to a generalized event endpoint, facilitating a more streamlined event handling process.
Refactors POST request to a more appropriate endpoint and updates the notification service interface to reflect new payload requirements.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor event notification logic
Removed the legacy notifications package and consolidated event notification functionality using the new internal events service. Modified agent, manager, and main application code to use this service for consistent event reporting and error handling workflows across services. This change simplifies event management, improves error visibility, and allows for more maintainable code by centralizing event-related logic. The substitution of verbose state-specific publishEvent calls with generic status reporting aligns with the new service's capabilities.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Expand agent protobuf message types and improve error handling
The protobuf definition for agent messages has been updated to include an additional message type, facilitating future data structure expansions. Additionally, error handling for event sending in the main agent execution has been enhanced to log an error when sending an 'init' event fails, ensuring issues are properly tracked. The unused `notificationTopic` constant in the manager service has been removed for cleaner code maintenance.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Optimize JSON parsing and fix header omission
Removed unnecessary unquoting of a JSON string before unmarshaling, streamlining the computation value extraction process. Also corrected a missing Content-Type header in the event sending function, ensuring proper handling of JSON requests by recipients. These changes improve performance and communication reliability.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* align vars
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Remove computation request timeout feature
The timeout feature for computation requests has been removed to simplify the computation execution flow. This involved changes across multiple files, including protobuf definitions, HTTP endpoint handling, and the internal computation logic. We eliminated the timeout field, associated logic, and error handling to ensure the system no longer supports timeouts for computations, mitigating any unintended timeout impacts on long-running processes.
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: sammy <sammyoina@gmail.com>
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Add CoCos-AI Manager API specification
This commit adds the CoCos-AI Manager API specification
in the form of a YAML file. The specification includes
information about the title, description, contact,
license, and version of the API. It also defines the
servers where the API is hosted and the paths and
operations available, such as running computation on
a virtual machine.
The API specification is based on OpenAPI 3.0.1 and
provides a clear and concise overview of the CoCos-AI
Manager service.
The commit also includes a link to the CoCos-AI repository
and the license information.
This commit is necessary to provide a clear and documented
API specification for the CoCos-AI Manager service.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor go.mod and go.sum files
The go.mod and go.sum files have been refactored to remove the go.opentelemetry.io/contrib/propagators/jaeger package, which is no longer needed. This package was causing compatibility issues with the current version of the project. The refactoring ensures that the project is using the latest compatible versions of the required dependencies.
This commit removes the go.opentelemetry.io/contrib/propagators/jaeger package from the go.mod file and updates the go.sum file accordingly.
Note: The go.mod file now uses go.opentelemetry.io/otel v1.19.0 and go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.19.0.
Please review the changes to ensure compatibility and functionality.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix typo
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
* add stringer
Signed-off-by: SammyOina <sammyoina@gmail.com>
* rename module to cocos
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix fmt.Stringer implementation in computations.go
The fmt.Stringer implementation for Datasets and Algorithms in computations.go was fixed to correctly use pointers.
This commit addresses the issue where the String() method for Datasets and Algorithms in computations.go was not correctly implemented. The fix ensures that the String() method now correctly marshals the data to JSON and returns the string representation.
The changes made in this commit will improve the functionality and accuracy of the String() method for Datasets and Algorithms.
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix bug in agent state machine
The bug in the agent state machine caused an error when attempting an invalid transition. This commit fixes the bug by properly locking and unlocking the state machine before and after transitioning to the next state. Additionally, the logger now correctly logs the current and next state during a valid transition.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix race condition in state machine
The commit fixes a race condition in the state machine implementation in the `Start` method. The race condition occurs when multiple goroutines try to access and modify the state concurrently. To fix this, a mutex lock and unlock are added around the critical sections of code to ensure exclusive access to the state variable. This prevents race conditions and ensures the state transitions are executed correctly.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix race condition in StateMachine.Start()
The StateMachine.Start() method was experiencing a race condition
when multiple events were being processed concurrently. This was
caused by not properly locking and unlocking the state machine
before and after updating the state. This commit fixes the issue
by adding proper locking and unlocking around the state update
operation. Additionally, the logging statement has been updated
to include the previous and next states for better debugging.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* add magistrala dep
Signed-off-by: SammyOina <sammyoina@gmail.com>
* remove mainflux
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix agentService New function to include messaging.Publisher parameter
The agentService New function has been updated to include a messaging.Publisher parameter. This change allows the agent service to publish messages to a messaging system. The messaging.Publisher parameter has been added to the agentService struct and the New function signature has been updated accordingly. This change ensures that the agent service can communicate with other components using the messaging system.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Refactor service.go state functions
The commit refactors the state functions in the service.go file.
The functions for each state have been modified to use the svc.publishEvent
method to publish events with appropriate messages.
- Refactor state functions in service.go
- Use svc.publishEvent to publish events with messages for each state
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix computation run event publishing and add pubsub functionality
The computation run event publishing in the agent service was fixed to correctly call the publishEvent function. Additionally, the pubsub functionality was added to the manager package.
- Fixed computation run event publishing in agent service
- Added pubsub functionality to manager package
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix license header in pubsub.go file
The commit fixes the license header in the pubsub.go file.
The copyright and SPDX-License-Identifier have been added
to comply with the Apache-2.0 license.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Add Docker environment variables for NATS, RabbitMQ, Message Broker, and Jaeger
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix Makefile to properly set DOCKER_PROJECT and COCOS_MESSAGE_BROKER_TYPE
The Makefile has been updated to fix an issue with setting the DOCKER_PROJECT and COCOS_MESSAGE_BROKER_TYPE variables. The USER_REPO variable is now used to generate the DOCKER_PROJECT name following the Docker Compose guidelines. Additionally, the COCOS_MESSAGE_BROKER_TYPE variable is now properly set to "nats" if it is empty. This ensures that the correct values are used when compiling and installing the service.
Details:
- Update USER_REPO variable to generate DOCKER_PROJECT name
- Set COCOS_MESSAGE_BROKER_TYPE to "nats" if empty
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix Makefile Docker profile assignment and build flags
The Makefile was updated to fix the assignment of the Docker profile and build flags. The Docker profile is now assigned based on the value of COCOS_MESSAGE_BROKER_TYPE, and if it is not provided, the default value is set to "nats". The build flags were also updated to include the COCOS_MESSAGE_BROKER_TYPE value as a tag for the Go build process.
This commit addresses the issue with the Docker profile assignment and ensures that the correct build flags are used during the build process.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* fix makefile
Signed-off-by: SammyOina <sammyoina@gmail.com>
* Fix notification topic in agent service and update NATS ports in Docker environment variables
The agent service's notification topic was incorrectly set to "channels.manager" instead of "agent". This commit fixes the issue by updating the notification topic.
Additionally, the NATS ports in the Docker environment variables were incorrect. The COCOS_NATS_PORT and COCOS_NATS_HTTP_PORT have been updated to the correct values.
These changes ensure that the agent service uses the correct notification topic and the NATS ports are properly configured.
Signed-off-by: SammyOina <sammyoina@gmail.com>
* add pubsub
Signed-off-by: SammyOina <sammyoina@gmail.com>
* update protoc
Signed-off-by: SammyOina <sammyoina@gmail.com>
---------
Signed-off-by: SammyOina <sammyoina@gmail.com>