Merged
15 changes: 15 additions & 0 deletions .github/workflows/linkspector.yml
@@ -0,0 +1,15 @@
name: Linkspector
on: [pull_request]
jobs:
  check-links:
    name: runner / linkspector
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run linkspector
        uses: umbrelladocs/action-linkspector@v1
        with:
          github_token: ${{ secrets.github_token }}
          reporter: github-pr-check
          fail_level: any
          filter_mode: added
2 changes: 1 addition & 1 deletion Makefile
@@ -313,7 +313,7 @@ docs/coconut:
docs/grpc:
	@echo -e "generating gRPC API documentation \033[1;33m==>\033[0m \033[1;34m./docs\033[0m"
	@cd apricot/protos && PATH="$(ROOT_DIR)/tools:$$PATH" protoc --doc_out="$(ROOT_DIR)/docs" --doc_opt=markdown,apidocs_apricot.md "apricot.proto"
	@cd core/protos && PATH="$(ROOT_DIR)/tools:$$PATH" protoc -I=. -I=../../common --doc_out="$(ROOT_DIR)/docs" --doc_opt=markdown,apidocs_aliecs.md "o2control.proto"
	@cd core/protos && PATH="$(ROOT_DIR)/tools:$$PATH" protoc -I=. -I=../../common --doc_out="$(ROOT_DIR)/docs" --experimental_allow_proto3_optional --doc_opt=markdown,apidocs_aliecs.md o2control.proto ../../common/protos/events.proto ../../common/protos/common.proto
	@cd occ/protos && PATH="$(ROOT_DIR)/tools:$$PATH" protoc --doc_out="$(ROOT_DIR)/docs" --doc_opt=markdown,apidocs_occ.md "occ.proto"

docs/swaggo:
211 changes: 175 additions & 36 deletions README.md
@@ -2,54 +2,193 @@
[![godoc](https://img.shields.io/badge/godoc-Reference-5272B4.svg)](https://godoc.org/github.com/AliceO2Group/Control)
# AliECS

The ALICE Experiment Control System
The ALICE Experiment Control System (**AliECS**) is the software that drives and controls data-taking activities in the experiment.
It is a distributed system that combines state-of-the-art cluster resource management and experiment control functionalities into a single comprehensive solution.

## Install instructions
Please refer to the [CHEP 2023 paper](https://doi.org/10.1051/epjconf/202429502027) for the latest design overview.

What is your use case?
## How to get started

* I want to **run AliECS** and other O²/FLP software
Regardless of your particular interests, it is recommended to get acquainted with the main [AliECS concepts](docs/handbook/concepts.md).

:arrow_right: [O²/FLP Suite deployment instructions](https://alice-flp.docs.cern.ch/system-configuration/utils/o2-flp-setup/)
After that, please find your concrete use case:

These instructions apply to both single-node and multi-node deployments.
### I want to **run AliECS** and other O²/FLP software

Contact [alice-o2-flp-support](mailto:alice-o2-flp-support@cern.ch) for assistance with provisioning and deployment.

* I want to ensure AliECS can **run and control my process**
See [O²/FLP Suite deployment instructions](https://alice-flp.docs.cern.ch/system-configuration/utils/o2-flp-setup/)

* My software is based on FairMQ and/or O² DPL

:palm_tree: Nothing to do, AliECS natively supports FairMQ (and DPL) devices.

* My software does not use FairMQ and/or DPL, but should be controlled through a state machine

:telescope: See [the OCC documentation](occ/README.md) to learn how to integrate the O² Control and Configuration library with your software. [Readout](https://github.com/AliceO2Group/Readout) is currently the only example of this setup.

* My software is a command line utility with no state machine

:palm_tree: Nothing to do, AliECS natively supports generic commands. Make sure the task template for your command sets the control mode to `basic` ([see example](https://github.com/AliceO2Group/ControlWorkflows/blob/basic-tasks/tasks/sleep.yaml)).

* I want to build and run AliECS for **development** purposes
These instructions apply to both single-node and multi-node deployments.
Contact [alice-o2-flp-support](mailto:alice-o2-flp-support@cern.ch) for assistance with provisioning and deployment.

:hammer_and_wrench: [Building instructions](https://alice-flp.docs.cern.ch/aliecs/building/)

:arrow_right: [Running instructions](https://alice-flp.docs.cern.ch/aliecs/running/)
There are two ways of interacting with AliECS:

* I want to communicate with AliECS via one of the plugins

* [Receive updates on running environments via Kafka](docs/kafka.md)
- The AliECS GUI (a.k.a. Control GUI, COG) - not in this repository, but included in most deployments, recommended

## Using AliECS
:arrow_right: [AliECS GUI documentation](hacking/COG.md)

There are two ways of interacting with AliECS:
* The AliECS GUI - not in this repository, but included in most deployments, recommended
- `coconut` - the command-line control and configuration utility, included with AliECS core, typically for developers and advanced users

:arrow_right: [Using `coconut`](https://alice-flp.docs.cern.ch/aliecs/coconut/)

:arrow_right: [AliECS GUI documentation](hacking/COG.md)
:arrow_right: [`coconut` command reference](https://alice-flp.docs.cern.ch/aliecs/coconut/doc/coconut/)

* `coconut` - the command-line control and configuration utility, included with AliECS core
### I want to ensure AliECS can **run and control my process**

:arrow_right: [Using `coconut`](https://alice-flp.docs.cern.ch/aliecs/coconut/)
* **My software is based on FairMQ and/or O² DPL (Data Processing Layer)**

AliECS natively supports FairMQ (and DPL) devices.
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure your software to be controlled by AliECS.

* **My software does not use FairMQ and/or DPL, but should be controlled through a state machine**

See [the OCC documentation](occ/README.md) to learn how to integrate the O² Control and Configuration library with your software. [Readout](https://github.com/AliceO2Group/Readout) is an example of this setup.

Once ready, head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure it to be controlled by AliECS.

* **My software is a command line utility with no state machine**

AliECS natively supports generic commands.
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to have your command run by AliECS.
Make sure the task template for your command sets the control mode to `basic` ([see example](https://github.com/AliceO2Group/ControlWorkflows/blob/master/tasks/o2-roc-cleanup.yaml)).

:arrow_right: [`coconut` command reference](https://alice-flp.docs.cern.ch/aliecs/coconut/doc/coconut/)
### I want to develop AliECS

:hammer_and_wrench: Welcome to the team! Please head to the [contributing instructions](/docs/CONTRIBUTING.md).

### I want to receive updates about environments or services controlled by AliECS

:pager: Learn more about the [Kafka event service](/docs/kafka.md)

### I want my application to send requests to AliECS

:scroll: See the API docs of AliECS components:

- [core gRPC server](/docs/apidocs_aliecs.md)
- [apricot gRPC server](/docs/apidocs_apricot.md)
- [apricot HTTP server](/apricot/docs/apricot_http_service.md)

### I want my service to be sent requests by AliECS

:electric_plug: Learn more about the [plugin system](/core/integration/README.md)

## Table of Contents

* Introduction
* [Basic Concepts](/docs/handbook/concepts.md#basic-concepts)
* [Tasks](/docs/handbook/concepts.md#tasks)
* [Workflows, roles and environments](/docs/handbook/concepts.md#workflows-roles-and-environments)
* [Design Overview](/docs/handbook/overview.md#design-overview)
* [AliECS Structure](/docs/handbook/overview.md#aliecs-structure)
* [Resource Management](/docs/handbook/overview.md#resource-management)
* [FairMQ](/docs/handbook/overview.md#fairmq)
* [State machines](/docs/handbook/overview.md#state-machines)

* Component reference
* AliECS GUI
* [AliECS GUI overview](/hacking/COG.md)
* AliECS core
* [Workflow Configuration](/docs/handbook/configuration.md#workflow-configuration)
* [The AliECS workflow template language](/docs/handbook/configuration.md#the-aliecs-workflow-template-language)
* [Workflow template structure](/docs/handbook/configuration.md#workflow-template-structure)
* [Task roles](/docs/handbook/configuration.md#task-roles)
* [Call roles](/docs/handbook/configuration.md#call-roles)
* [Aggregator roles](/docs/handbook/configuration.md#aggregator-roles)
* [Iterator roles](/docs/handbook/configuration.md#iterator-roles)
* [Include roles](/docs/handbook/configuration.md#include-roles)
* [Template expressions](/docs/handbook/configuration.md#template-expressions)
* [Task Configuration](/docs/handbook/configuration.md#task-configuration)
* [Task template structure](/docs/handbook/configuration.md#task-template-structure)
* [Variables pushed to controlled tasks](/docs/handbook/configuration.md#variables-pushed-to-controlled-tasks)
* [Resource wants and limits](/docs/handbook/configuration.md#resource-wants-and-limits)
* [Integration plugins](/core/integration/README.md#integration-plugins)
* [Plugin system overview](/core/integration/README.md#plugin-system-overview)
* [Integrated service operations](/core/integration/README.md#integrated-service-operations)
* [Bookkeeping](/core/integration/README.md#bookkeeping)
* [CCDB](/core/integration/README.md#ccdb)
* [DCS](/core/integration/README.md#dcs)
* [DCS operations](/core/integration/README.md#dcs-operations)
* [DCS PrepareForRun behaviour](/core/integration/README.md#dcs-prepareforrun-behaviour)
* [DCS StartOfRun behaviour](/core/integration/README.md#dcs-startofrun-behaviour)
* [DCS EndOfRun behaviour](/core/integration/README.md#dcs-endofrun-behaviour)
* [DD Scheduler](/core/integration/README.md#dd-scheduler)
* [Kafka (legacy)](/core/integration/README.md#kafka-legacy)
* [ODC](/core/integration/README.md#odc)
* [Test plugin](/core/integration/README.md#test-plugin)
* [Trigger](/core/integration/README.md#trigger)
* [Environment operation order](/docs/handbook/operation_order.md#environment-operation-order)
* [State machine triggers](/docs/handbook/operation_order.md#state-machine-triggers)
* [START_ACTIVITY (Start Of Run)](/docs/handbook/operation_order.md#start_activity-start-of-run)
* [STOP_ACTIVITY (End Of Run)](/docs/handbook/operation_order.md#stop_activity-end-of-run)
* [Protocol documentation](/docs/apidocs_aliecs.md)
* coconut
* [The O² control and configuration utility overview](/coconut/README.md#the-o-control-and-configuration-utility-overview)
* [Configuration file](/coconut/README.md#configuration-file)
* [Using coconut](/coconut/README.md#using-coconut)
* [Creating an environment](/coconut/README.md#creating-an-environment)
* [Controlling an environment](/coconut/README.md#controlling-an-environment)
* [Command reference](/coconut/doc/coconut.md)
* apricot
* [ALICE configuration service overview](/apricot/README.md#alice-configuration-service-overview)
* [HTTP service](/apricot/docs/apricot_http_service.md#apricot-http-service)
* [Configuration](/apricot/docs/apricot_http_service.md#configuration)
* [Usage and options](/apricot/docs/apricot_http_service.md#usage-and-options)
* [Examples](/apricot/docs/apricot_http_service.md#examples)
* [Protocol documentation](/docs/apidocs_apricot.md)
* [Command reference](/apricot/docs/apricot.md)
* occ
* [O² Control and Configuration Components](/occ/README.md#o-control-and-configuration-components)
* [Developer quick start instructions for OCClib](/occ/README.md#developer-quick-start-instructions-for-occlib)
* [Manual build instructions](/occ/README.md#manual-build-instructions)
* [Run example](/occ/README.md#run-example)
* [The OCC state machine](/occ/README.md#the-occ-state-machine)
* [Single process control with peanut](/occ/README.md#single-process-control-with-peanut)
* [OCC API debugging with grpcc](/occ/README.md#occ-api-debugging-with-grpcc)
* [Dummy process example for OCC library](/occ/occlib/examples/dummy-process/README.md#dummy-process-example-for-occ-library)
* [Protocol documentation](/docs/apidocs_occ.md)
* peanut
* [Process control and execution utility overview](/occ/peanut/README.md)
* Event service
* [Kafka producer functionality in AliECS core](/docs/kafka.md#kafka-producer-functionality-in-aliecs-core)
* [Making sure that AliECS sends messages](/docs/kafka.md#making-sure-that-aliecs-sends-messages)
* [Currently available topics](/docs/kafka.md#currently-available-topics)
* [Decoding the messages](/docs/kafka.md#decoding-the-messages)
* [Legacy events: Kafka plugin](/docs/kafka.md#legacy-events-kafka-plugin)
* [Making sure that AliECS sends messages](/docs/kafka.md#making-sure-that-aliecs-sends-messages-1)
* [Currently available topics](/docs/kafka.md#currently-available-topics-1)
* [Decoding the messages](/docs/kafka.md#decoding-the-messages-1)
* [Getting Start of Run and End of Run notifications](/docs/kafka.md#getting-start-of-run-and-end-of-run-notifications)
* [Using Kafka debug tools](/docs/kafka.md#using-kafka-debug-tools)

* Developer documentation
* [Contributing](/docs/CONTRIBUTING.md)
* [Package pkg.go.dev documentation](https://pkg.go.dev/github.com/AliceO2Group/Control)
* [Building AliECS](/docs/building.md#building-aliecs)
* [Overview](/docs/building.md#overview)
* [Building with aliBuild](/docs/building.md#building-with-alibuild)
* [Manual build](/docs/building.md#manual-build)
* [Go environment](/docs/building.md#go-environment)
* [Clone and build (Go components only)](/docs/building.md#clone-and-build-go-components-only)
* [Makefile reference](/docs/makefile_reference.md)
* [Component Configuration](/docs/handbook/appconfiguration.md#component-configuration)
* [Apache Mesos](/docs/handbook/appconfiguration.md#apache-mesos)
* [Connectivity to controlled nodes](/docs/handbook/appconfiguration.md#connectivity-to-controlled-nodes)
* [Running AliECS as a developer](/docs/running.md#running-aliecs-as-a-developer)
* [Running the AliECS core](/docs/running.md#running-the-aliecs-core)
* [Running AliECS in production](/docs/running.md#running-aliecs-in-production)
* [Health checks](/docs/running.md#health-checks)
* [Development Information](/docs/development.md#development-information)
* [Release Procedure](/docs/development.md#release-procedure)
* [Metrics in ECS](/docs/metrics.md#metrics-in-ecs)
* [Overview and simple usage](/docs/metrics.md#overview-and-simple-usage)
* [Types and aggregation of metrics](/docs/metrics.md#types-and-aggregation-of-metrics)
* [Metric types](/docs/metrics.md#metric-types)
* [Aggregation](/docs/metrics.md#aggregation)
* [Implementation details](/docs/metrics.md#implementation-details)
* [Event loop](/docs/metrics.md#event-loop)
* [Hashing to aggregate](/docs/metrics.md#hashing-to-aggregate)
* [Sampling reservoir](/docs/metrics.md#sampling-reservoir)
* [OCC API debugging with grpcc](/docs/using_grpcc_occ.md#occ-api-debugging-with-grpcc)

* Resources
* T. Mrnjavac et al., [AliECS: A New Experiment Control System for the ALICE Experiment](https://doi.org/10.1051/epjconf/202429502027), CHEP23

18 changes: 7 additions & 11 deletions apricot/README.md
@@ -1,14 +1,10 @@
# `APRICOT`
# ALICE configuration service overview

**A** **p**rocessor and **r**epos**i**tory for **co**nfiguration **t**emplates
**A** **p**rocessor and **r**epos**i**tory for **co**nfiguration **t**emplates, or apricot, implements the configuration service for the ALICE data taking activities.
It adds templating, load balancing and caching on top of the configuration store.

The `o2-apricot` binary implements a centralized configuration (micro)service for ALICE O².
See also:

```
Usage of bin/o2-apricot:
--backendUri string URI of the Consul server or YAML configuration file (default "consul://127.0.0.1:8500")
--listenPort int Port of apricot server (default 32101)
--verbose Verbose logging
```

Protofile: [apricot.proto](apricot/protos/apricot.proto)
* [apricot HTTP service](docs/apricot_http_service.md) - make essential cluster information available via a web server
* Protofile: [apricot.proto](protos/apricot.proto)
* [Command reference](docs/apricot.md)
6 changes: 1 addition & 5 deletions apricot/docs/apricot.md
@@ -13,8 +13,4 @@ Usage of bin/o2-apricot:
--backendUri string URI of the Consul server or YAML configuration file (default "consul://127.0.0.1:8500")
--listenPort int Port of apricot server (default 32101)
--verbose Verbose logging
```

### SEE ALSO

* [apricot HTTP service](apricot_http_service.md) - make essential cluster information available via a web server
```
2 changes: 1 addition & 1 deletion apricot/docs/apricot_http_service.md
@@ -46,4 +46,4 @@ Besides configuration retrieval, the API also includes calls for browsing the co
Getting a template-processed configuration payload for a component (entry `tpc-full-qcmn` for component `qc`, with `list_of_detectors` and `run_type` passed as template variables):

* In a browser: `http://localhost:32188/components/qc/ANY/any/tpc-full-qcmn?process=true&list_of_detectors=tpc,its&run_type=PHYSICS`
* With `curl`: `curl http://127.0.0.1:32188/components/qc/ANY/any/tpc-full-qcmn\?process\=true\&list_of_detectors\=tpc,its\&run_type\=PHYSICS`
* With `curl`: `curl http://127.0.0.1:32188/components/qc/ANY/any/tpc-full-qcmn\?process\=true\&list_of_detectors\=tpc,its\&run_type\=PHYSICS`
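
The URL in the example above can also be assembled programmatically. The following is a minimal sketch, not part of apricot itself: the function name `build_apricot_url` and its parameter names are hypothetical, and only the resulting URL shape is taken from the example. It assumes that `process=true` requests template processing and that any further query parameters are passed as template variables.

```python
from urllib.parse import urlencode


def build_apricot_url(base, path_segments, template_vars):
    """Build a URL for a template-processed configuration payload.

    `base`, `path_segments` and `template_vars` are hypothetical parameter
    names chosen for this sketch; only the URL shape comes from the example.
    """
    path = "/".join(["components", *path_segments])
    # process=true is placed first, followed by the template variables.
    query = {"process": "true", **template_vars}
    # Keep commas literal so list values read the same as in the example.
    return f"{base}/{path}?{urlencode(query, safe=',')}"


url = build_apricot_url(
    "http://127.0.0.1:32188",
    ["qc", "ANY", "any", "tpc-full-qcmn"],
    {"list_of_detectors": "tpc,its", "run_type": "PHYSICS"},
)
print(url)
```

Building the query with `urlencode` avoids the manual backslash escaping that the `curl` invocation needs when the URL is typed directly into a shell.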
3 changes: 2 additions & 1 deletion coconut/README.md
@@ -1,4 +1,4 @@
# `coconut` - the O² control and configuration utility
# The O² control and configuration utility overview

The O² **co**ntrol and **con**figuration **ut**ility is a command line program for interacting with the AliECS core.

@@ -98,6 +98,7 @@ A valid workflow template (sometimes called simply "workflow" for brevity) must

Workflows and tasks are managed with a git based configuration system, so the workflow template may be provided simply by name or with repository and branch/tag/hash constraints.
Examples:

* `coconut env create -w myworkflow` - loads workflow `myworkflow` from default configuration repository at HEAD of master branch
* `coconut env create -w github.com/AliceO2Group/MyConfRepo/myworkflow` - loads a workflow from a specific git repository, HEAD of master branch
* `coconut env create -w myworkflow@rev` - loads a workflow from default repository, on branch, tag or revision `rev`
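
The three reference forms listed above can be illustrated with a small parser. This is only a sketch of how such a reference might be split, not how `coconut` or the AliECS core actually parses it: the names `WorkflowRef` and `parse_workflow_ref` are hypothetical, and the sketch assumes that the last path segment is the workflow name and that everything after the last `@` is the revision.

```python
from typing import NamedTuple, Optional


class WorkflowRef(NamedTuple):
    repo: Optional[str]      # None: use the default configuration repository
    workflow: str
    revision: Optional[str]  # None: use HEAD of the master branch


def parse_workflow_ref(ref: str) -> WorkflowRef:
    """Split '[repo/]workflow[@rev]' into its parts (illustrative only)."""
    # Everything after the last '@' is treated as branch, tag or revision.
    body, sep, rev = ref.rpartition("@")
    if not sep:
        body, rev = ref, None
    # Assumption: the last path segment is the workflow name and
    # anything before it identifies the repository.
    repo, _, workflow = body.rpartition("/")
    if not workflow:
        raise ValueError(f"missing workflow name in {ref!r}")
    return WorkflowRef(repo or None, workflow, rev or None)


for example in (
    "myworkflow",
    "github.com/AliceO2Group/MyConfRepo/myworkflow",
    "myworkflow@rev",
):
    print(parse_workflow_ref(example))
```

Returning `None` for the missing parts keeps the defaults described in the list (default repository, HEAD of master) explicit for the caller instead of hard-coding them in the parser.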
1 change: 1 addition & 0 deletions coconut/doc/coconut_environment_create.md
@@ -13,6 +13,7 @@ A valid workflow template (sometimes called simply "workflow" for brevity) must

Workflows and tasks are managed with a git based configuration system, so the workflow template may be provided simply by name or with repository and branch/tag/hash constraints.
Examples:

* `coconut env create -w myworkflow` - loads workflow `myworkflow` from default configuration repository at HEAD of master branch
* `coconut env create -w github.com/AliceO2Group/MyConfRepo/myworkflow` - loads a workflow from a specific git repository, HEAD of master branch
* `coconut env create -w myworkflow@rev` - loads a workflow from default repository, on branch, tag or revision `rev`
1 change: 1 addition & 0 deletions coconut/doc/coconut_repository.md
@@ -9,6 +9,7 @@ The repository command performs operations on the repositories used for task and
A valid workflow configuration repository must contain the directories `tasks` and `workflows` in its `master` branch.

When referencing a repository, the clone method should never be prepended. Supported repo backends and their expected format are:

- https: [hostname]/[repo_path]
- ssh: [hostname]:[repo_path]
- local [repo_path] (local repo entries are ephemeral and will not survive a core restart)
1 change: 1 addition & 0 deletions coconut/doc/coconut_repository_add.md
@@ -16,6 +16,7 @@ the ensuing list is followed until a valid revision has been identified:
Exhaustion of the aforementioned list results in a repo add failure.

`coconut repo add` can be called with

1) a repository identifier
2) a repository identifier coupled with the `--default-revision` flag (see examples below)
