===================
Understanding Flows
===================

The basic pattern used in AI/ML is Ingest -> Data Management -> AI/ML Model ->
Output. Umbra seeks to make these stages pluggable and reusable through a
system we call `flows`.

Flow Stages
===========

The `Flow` defines how this pipeline is executed, taking input data from any
pluggable source and moving it through the process. Umbra breaks this process
up into multiple stages, called `ingress`, `data`, `model`, and `egress`.

Ingress
-------

The ingress stage attaches to an ongoing data source, typically an
event-based system. The event system emits events as they occur and pushes
them through the pipeline. The ingress source can also be a single-instance
source, such as a JSON file.

Data
----

The data stage creates the input and output data paths. For most AI systems,
all of the input data needs to be reduced to numbers, so the data plugins are
made to take arbitrary data types and reform them into standardized data
sets. For instance, the `salt_event` data plugin takes the information in the
Salt event stream and converts the words into numbers dynamically. The word
datasets are created on the fly, allowing different matched event streams to
be prepared.

Model
-----

The model is the meat of the process. This is the area where Umbra calls out
to tools like TensorFlow, PyOD, and scikit-learn. The model receives the
conditioned data from the data stage and crunches it. The model also
determines whether the data is used for training or for predictions, based on
options like `train_for`.

Egress
------

Once the model has run, it can emit predictions and suggestions. The egress
stage allows these suggestions to be emitted on another event-based system.
This can be an alerting system, a notification system, or simply a datastore
holding the information.

Flows and Pipes
===============

In the flow configurations you define pipes, and each pipe has options for
the named stages, as well as additional options for the pipe itself. All of
the pipes defined in the flow files need to have unique names. A flow file
with a single pipe called `sh` looks like this:

.. code-block:: yaml

    sh:
      ingress:
        salt_event: 'salt/beacon/*/sh*'
      data: salt_event
      model: knn
      egress: salt_event
      train_for: 50000
      enabled: True

This defines that we attach to the Salt event bus as our ingress point,
looking for events that match the given tag. The data modifier to use is
`salt_event`, because we are attached to the `salt_event` ingress system. In
this case we are using the simple `knn` model for outlier detection. Finally,
the data is emitted back on the `salt_event` system as well.

The additional options here are `train_for` and `enabled`. The `train_for`
option sets a finite number of data entries to train on before running
predictions. The `enabled` option allows you to enable or disable the given
pipe.
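
Because every pipe in a flow file must have a unique name, a single file can
define several pipes side by side. The sketch below is illustrative: the
`pkg` pipe name and its tag pattern are hypothetical, and only the stage and
option names shown above are assumed to exist.

.. code-block:: yaml

    # Hypothetical flow file with two pipes. The pipe names and tag
    # patterns are invented for illustration.
    sh:
      ingress:
        salt_event: 'salt/beacon/*/sh*'
      data: salt_event
      model: knn
      egress: salt_event
      train_for: 50000
      enabled: True

    pkg:
      ingress:
        salt_event: 'salt/beacon/*/pkg*'  # hypothetical tag match
      data: salt_event
      model: knn
      egress: salt_event
      train_for: 50000
      enabled: False  # defined but disabled until needed

Flipping `enabled` on a single pipe toggles just that pipe, which makes it
easy to stage a new pipe in the same file and turn it on later.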