NOISSUE - Enable WASM Support and FileSystem Support (#189)

* feat(algorithm): Add wasm as an algo type

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* feat(algorithm): Use filesystem to store results

Move from unix socket for results storage to filesystem

* test: test new filesystem changes

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* refactor(files): rename resultFile to resultsFilePath

* feat(wasm-runtime): change from wasmtime to wasmedge

Wasmedge enables easier directory mapping to get results

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* feat(algorithm): send results as zipped directory

Create a new function to zip the results directory and send it back to the user

* fix(wasm): runtime argument

Fix the directory mapping for wasm runtime arguments

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* fix(errors): provide useful error message

* chore(gitignore): add results zip to gitignore

* feat(filesystem): Enable storing results on filesystem for python algos

* refactor: revert to upstream cocos repo

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* fix: remove AddDataset from algorithm interface

* fix: agent to handle results zipping

* test: test zipping directories

* refactor(agent): Handle file operations from agent

* test: run test inside eos

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* refactor(test): Document and test algos are running

Document the steps for running the two Python examples and ensure they run on eos

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

* fix: remove witheDataset option

* test: test without dataset argument

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>

---------

Signed-off-by: Rodney Osodo <socials@rodneyosodo.com>
This commit is contained in:
b1ackd0t
2024-08-06 20:06:48 +03:00
committed by GitHub
parent 3c855e3b68
commit afc306a85b
23 changed files with 519 additions and 267 deletions
+98 -10
@@ -1,17 +1,105 @@
# Algorithm
Agent accepts binary programs. To use a Python program, you need to bundle or compile it.
In this example we'll use [pyinstaller](https://pypi.org/project/pyinstaller/)
Agent accepts binary programs, Python scripts, and wasm files. It runs them in a sandboxed environment and returns the output.
```shell
pip install pandas scikit-learn
pip install -U pyinstaller
pyinstaller --onefile lin_reg.py
## Python Example
To check that these examples work on your local machine, install the following dependencies:
```bash
pip install -r requirements.txt
```
Make the binary static:
This can be done in a virtual environment.
```shell
pip install staticx
staticx <dynamic_binary_file_path> <output_file_path>
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To run the example, you can use the following command:
```bash
python3 test/manual/algo/addition.py
```
The addition example is a simple algorithm that shows you can run an algorithm with no external dependencies and no input arguments. It returns the sum of two numbers.
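In this commit the Python examples stop sending results over a unix socket and instead write them into a `results` directory. The following is a minimal sketch of that pattern, not the exact code from `addition.py`: the `base_dir` parameter is a hypothetical addition so the demo can write into a temporary directory, and `os.makedirs(..., exist_ok=True)` stands in for the `try`/`except FileExistsError` used in the example.

```python
import os
import tempfile

RESULTS_DIR = "results"      # directory name used by the examples in this commit
RESULTS_FILE = "result.txt"  # file name used by addition.py


def save_result(value, base_dir="."):
    """Write `value` as text to <base_dir>/results/result.txt and return the path."""
    results_dir = os.path.join(base_dir, RESULTS_DIR)
    # exist_ok=True makes repeated runs safe without catching FileExistsError.
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, RESULTS_FILE)
    with open(path, "w") as f:
        f.write(str(value))
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_result(2 + 3, base_dir=tmp)
        with open(path) as f:
            print(f.read())
```

Building the path with `os.path.join` is equivalent to the `RESULTS_DIR + os.sep + RESULTS_FILE` concatenation used in the example scripts.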
```bash
python3 test/manual/algo/lin_reg.py
```
The linear regression example is a more complex algorithm that requires external dependencies. Despite its name, it trains a logistic regression model on the iris dataset found [here](../data/) for demonstration purposes and returns the trained model.
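In this commit `lin_reg.py` looks for its input in a `datasets` directory and exits unless that directory holds exactly one file. A minimal sketch of that lookup follows. The helper name `single_dataset_path` is hypothetical, and it raises an exception instead of calling `exit(1)` so the caller can decide how to report the problem.

```python
import os

DATA_DIR = "datasets"  # directory name used by lin_reg.py in this commit


def single_dataset_path(data_dir=DATA_DIR):
    """Return the path of the only file in data_dir; raise if the count is not one."""
    files = sorted(os.listdir(data_dir))
    if len(files) != 1:
        raise ValueError(
            f"expected exactly one file in {data_dir!r}, found {len(files)}"
        )
    return os.path.join(data_dir, files[0])
```

The returned path can then be passed to `pandas.read_csv`, as the example script does.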
```bash
python3 test/manual/algo/lin_reg.py predict result.zip test/manual/data
```
This runs inference with the trained model stored in the results archive.
To run the examples in the agent, you can use the following command:
```bash
go run ./test/computations/main.go ./test/manual/algo/lin_reg.py public.pem false ./test/manual/data/iris.csv
```
This command is run from the root directory of the project. This will start the computation server.
In another window, you can run the following command:
```bash
sudo MANAGER_QEMU_SMP_MAXCPUS=4 MANAGER_GRPC_URL=localhost:7001 MANAGER_LOG_LEVEL=debug MANAGER_QEMU_USE_SUDO=false MANAGER_QEMU_ENABLE_SEV=false MANAGER_QEMU_SEV_CBITPOS=51 MANAGER_QEMU_ENABLE_SEV_SNP=false MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/x64/OVMF_CODE.fd MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/edk2/x64/OVMF_VARS.fd go run main.go
```
This command is run from the [manager main directory](../../../cmd/manager/). This will start the manager. Make sure you have already built the [qemu image](../../../hal/linux/README.md).
In another window, you can run the following command:
```bash
./build/cocos-cli algo ./test/manual/algo/lin_reg.py ./private.pem -a python -r ./test/manual/algo/requirements.txt
```
Make sure you have built the cocos-cli. This will upload the algorithm and the requirements file.
Next, upload the dataset:
```bash
./build/cocos-cli data ./test/manual/data/iris.csv ./private.pem
```
Once the results are ready, run the following command to fetch them:
```bash
./build/cocos-cli results ./private.pem
```
This will return the results of the algorithm.
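The results come back as a zipped directory. Assuming the archive is saved as `result.zip` in the working directory (the name used by the `predict` command in this README), a minimal sketch for unpacking it and listing what it contains could look like this. The function name `extract_results` and the default destination `results` are illustrative choices.

```python
import os
import zipfile


def extract_results(zip_path, dest_dir="results"):
    """Extract a results archive into dest_dir and return the sorted member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())
```

For example, `extract_results("result.zip")` unpacks the archive into `results/` and returns the names of the files it contained, which is a quick way to check which output files the algorithm produced.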
To run inference with the results, you can use the following command:
```bash
python3 test/manual/algo/lin_reg.py predict result.zip test/manual/data
```
For the addition example, you can use the following commands:
```bash
go run ./test/computations/main.go ./test/manual/algo/addition.py public.pem false
```
```bash
./build/cocos-cli algo ./test/manual/algo/addition.py ./private.pem -a python
```
```bash
./build/cocos-cli results ./private.pem
```
## Wasm Example
More information on how to run wasm files can be found [here](https://github.com/ultravioletrs/ai/tree/main/burn-algorithms).
## Binary Example
More information on how to run binary files can be found [here](https://github.com/ultravioletrs/ai/tree/main/burn-algorithms).
+31 -41
@@ -1,9 +1,14 @@
import sys, io
import joblib
import socket
import os
import sys
import zipfile
RESULTS_DIR = "results"
RESULTS_FILE = "result.txt"
class Computation:
result = 0
def __init__(self):
"""
Initializes a new instance of the Computation class.
@@ -16,45 +21,35 @@ class Computation:
"""
self.result = a + b
def send_result(self, socket_path):
def save_result(self):
"""
Sends the result to a socket.
Saves the result to a file.
"""
buffer = io.BytesIO()
try:
joblib.dump(self.result, buffer)
except Exception as e:
print("Failed to dump the result to the buffer: ", e)
return
os.makedirs(RESULTS_DIR)
except FileExistsError:
pass
data = buffer.getvalue()
with open(RESULTS_DIR + os.sep + RESULTS_FILE, "w") as f:
f.write(str(self.result))
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
try:
client.connect(socket_path)
except Exception as e:
print("Failed to connect to the socket: ", e)
return
try:
client.send(data)
except Exception as e:
print("Failed to send data to the socket: ", e)
return
finally:
client.close()
def read_results_from_file(self, results_file):
"""
Reads the results from a file.
"""
try:
results = joblib.load(results_file)
print("Results: ", results)
except Exception as e:
print("Failed to load results from file: ", e)
return
if results_file.endswith(".zip"):
try:
os.makedirs(RESULTS_DIR)
except FileExistsError:
pass
with zipfile.ZipFile(results_file, "r") as zip_ref:
zip_ref.extractall(RESULTS_DIR)
with open(RESULTS_DIR + os.sep + RESULTS_FILE, "r") as f:
print(f.read())
else:
with open(results_file, "r") as f:
print(f.read())
if __name__ == "__main__":
a = 5
@@ -62,15 +57,10 @@ if __name__ == "__main__":
computation = Computation()
if len(sys.argv) == 1:
print("Please provide a socket path or a file path")
exit(1)
if sys.argv[1] == "test" and len(sys.argv) == 3:
computation.read_results_from_file(sys.argv[2])
elif len(sys.argv) == 2:
computation.compute(a, b)
computation.send_result(sys.argv[1])
computation.save_result()
elif len(sys.argv) == 3 and sys.argv[1] == "test":
computation.read_results_from_file(sys.argv[2])
else:
print("Invalid arguments")
exit(1)
+100 -31
@@ -1,47 +1,116 @@
import sys, io
import os
import sys
import joblib
import socket
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import zipfile
from sklearn import metrics
csv_file_path = sys.argv[2]
iris = pd.read_csv(csv_file_path)
DATA_DIR = "datasets"
RESULTS_DIR = "results"
RESULTS_FILE = "model.bin"
# Dropping the Species column since we only need the measurements
X = iris.drop(['Species'], axis=1)
# converting into numpy array and assigning petal length and petal width
X = X.to_numpy()[:, (3,4)]
y = iris['Species']
class Computation:
model = None
# Splitting into train and test
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5, random_state=42)
def __init__(self):
"""
Initializes a new instance of the Computation class.
"""
pass
log_reg = LogisticRegression()
log_reg.fit(X_train,y_train)
def _read_csv(self, data_path=""):
"""
Reads the CSV file.
"""
files = os.listdir(data_path)
if len(files) != 1:
print("Expected exactly one file in the directory")
exit(1)
csv_file_path = data_path + os.sep + files[0]
return pd.read_csv(csv_file_path)
# Serialize the trained model to a byte buffer
model_buffer = io.BytesIO()
joblib.dump(log_reg, model_buffer)
def compute(self):
"""
Trains a logistic regression model.
"""
iris = self._read_csv(DATA_DIR)
# Get the serialized model as a bytes object
model_bytes = model_buffer.getvalue()
# Dropping the Species column since we only need the measurements
X = iris.drop(["Species"], axis=1)
# Define the path for the Unix domain socket
socket_path = sys.argv[1]
# converting into numpy array and assigning petal length and petal width
X = X.to_numpy()[:, (3, 4)]
y = iris["Species"]
# Create a Unix domain socket client
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=42)
try:
# Connect to the server
client.connect(socket_path)
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
self.model = log_reg
# Send the serialized model over the socket
client.send(model_bytes)
def save_result(self):
"""
Saves the result to a file.
"""
try:
os.makedirs(RESULTS_DIR)
except FileExistsError:
pass
finally:
# Close the socket
client.close()
results_file = RESULTS_DIR + os.sep + RESULTS_FILE
joblib.dump(self.model, results_file)
def read_results_from_file(self, results_file):
"""
Reads the results from a file.
"""
if results_file.endswith(".zip"):
try:
os.makedirs(RESULTS_DIR)
except FileExistsError:
pass
with zipfile.ZipFile(results_file, "r") as zip_ref:
zip_ref.extractall(RESULTS_DIR)
self.model = joblib.load(RESULTS_DIR + os.sep + RESULTS_FILE)
else:
self.model = joblib.load(results_file)
def predict(self, data_path=""):
iris = self._read_csv(data_path)
# Dropping the Species column since we only need the measurements
X = iris.drop(["Species"], axis=1)
# converting into numpy array and assigning petal length and petal width
X = X.to_numpy()[:, (3, 4)]
y = iris["Species"]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.5, random_state=42
)
training_prediction = self.model.predict(X_train)
test_prediction = self.model.predict(X_test)
print("Precision, Recall, Confusion matrix, in training\n")
print(metrics.classification_report(y_train, training_prediction, digits=3))
print(metrics.confusion_matrix(y_train, training_prediction))
print("Precision, Recall, Confusion matrix, in testing\n")
print(metrics.classification_report(y_test, test_prediction, digits=3))
print(metrics.confusion_matrix(y_test, test_prediction))
if __name__ == "__main__":
computation = Computation()
if len(sys.argv) == 1:
computation.compute()
computation.save_result()
elif len(sys.argv) == 4 and sys.argv[1] == "predict":
computation.read_results_from_file(sys.argv[2])
computation.predict(sys.argv[3])
else:
print("Invalid arguments")
exit(1)
-51
@@ -1,51 +0,0 @@
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import metrics
import joblib
import sys
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
csv_file_path = sys.argv[1]
model_filename = sys.argv[2]
# Load the CSV file into a Pandas DataFrame
iris = pd.read_csv(csv_file_path)
log_reg = joblib.load(model_filename)
# The Iris dataset is now loaded into the iris DataFrame
print(iris.head()) # Display the first few rows of the DataFrame
# Dropping the Species column since we only need the measurements
X = iris.drop(['Species'], axis=1)
# converting into numpy array and assigning petal length and petal width
X = X.to_numpy()[:, (3,4)]
y = iris['Species']
# Splitting into train and test
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5, random_state=42)
training_prediction = log_reg.predict(X_train)
test_prediction = log_reg.predict(X_test)
print("Precision, Recall, Confusion matrix, in training\n")
# Precision Recall scores
print(metrics.classification_report(y_train, training_prediction, digits=3))
# Confusion matrix
print(metrics.confusion_matrix(y_train, training_prediction))
print("Precision, Recall, Confusion matrix, in testing\n")
# Precision Recall scores
print(metrics.classification_report(y_test, test_prediction, digits=3))
# Confusion matrix
print(metrics.confusion_matrix(y_test, test_prediction))