Understanding Flows

The basic pattern used in AI/ML is Ingest -> Data Management -> AI/ML Model -> Output. Umbra seeks to make these stages pluggable and reusable through a system we call flows.

Flow Stages

A flow defines how this pipeline is executed, taking input data from any pluggable source and moving it through the process. Umbra breaks the process into multiple stages: ingress, data, model, and egress.
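
Schematically, each pipe in a flow file wires these four stages together by name. A bare skeleton (my_pipe and the <plugin> values are placeholders, not real configuration) looks like this:

my_pipe:              # hypothetical pipe name
  ingress: <plugin>   # where input data comes from
  data: <plugin>      # how raw data is turned into numeric datasets
  model: <plugin>     # which model consumes the conditioned data
  egress: <plugin>    # where predictions and suggestions are emitted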

Ingress

The ingress stage attaches to an ongoing source of input data, typically an event-based system. The event system emits events as they occur and pushes them through the pipeline. The ingress can also be a single-shot source, such as a JSON file.
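
For example, the salt_event ingress plugin (used in the flow file later on this page) attaches to the Salt event bus and filters events on a tag glob:

ingress:
  salt_event: 'salt/beacon/*/sh*'   # match sh beacon events on the Salt event bus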

Data

The data stage is used to create the input and output data paths. For most AI systems, all of the input data needs to be reduced to numbers. The data plugins take arbitrary data types and reshape them into standardized datasets.

For instance, the salt_event data plugin takes the information in the Salt event stream and dynamically converts the words into numbers. The word datasets are created on the fly, allowing different matched event streams to be prepared.
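
In a pipe definition, the data plugin is selected simply by name, as in the example later on this page:

data: salt_event   # convert matched Salt events into numeric datasets on the fly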

Model

The model stage is the meat of the process; this is where Umbra calls out to tools like TensorFlow, PyOD, and scikit-learn. The model receives the conditioned data from the data stage and crunches it. The model also determines whether the data is used for training or for predictions, based on options like train_for.
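
The model and its training window are chosen per pipe. The values here are taken from the flow file shown later on this page:

model: knn         # simple k-nearest-neighbours model for outlier detection
train_for: 50000   # train on this many entries before switching to predictions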

Egress

Once the model has run, it can emit predictions and suggestions. The egress stage emits those suggestions on another event-based system. This can be an alerting system, a notification service, or simply a datastore holding the information.
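
As with the other stages, the egress plugin is picked by name in the pipe, for example emitting back onto the Salt event bus:

egress: salt_event   # emit predictions and suggestions back on the Salt event bus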

Flows and Pipes

In the flow configurations you define pipes. Each pipe has options for the named stages, as well as additional options for the pipe itself. All of the pipes defined in the flow files need to have unique names.

A flow file with a single pipe called ‘sh’ looks like this:

sh:
  ingress:
    salt_event: 'salt/beacon/*/sh*'
  data: salt_event
  model: knn
  egress: salt_event
  train_for: 50000
  enabled: True

This defines that we will attach to the Salt event bus as our ingress point and look for events that match the given tag. The data plugin is salt_event because we are attached to the salt_event ingress system. In this case we use the simple knn model for outlier detection. Finally, the predictions are emitted back on the salt_event system as well.

The additional options here are train_for and enabled. The train_for option sets a finite number of data entries to train on before running predictions. The enabled option turns the given pipe on or off.
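
Since pipe names must be unique across the flow files, several pipes can sit side by side in one file. A sketch of that (the second pipe's name and tag are hypothetical), with one pipe temporarily disabled, looks like this:

sh:
  ingress:
    salt_event: 'salt/beacon/*/sh*'
  data: salt_event
  model: knn
  egress: salt_event
  train_for: 50000
  enabled: True

net:
  ingress:
    salt_event: 'salt/beacon/*/network_settings*'   # hypothetical second tag match
  data: salt_event
  model: knn
  egress: salt_event
  train_for: 50000
  enabled: False   # this pipe is defined but switched off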