January 26, 2024

Introducing Coflux

Joe Freeman

Coflux is the foundation for a product I've spent years thinking about. I've worked as a software engineer across a number of domains - more recently science-based ones (using model predictive control to operate industrial-scale greenhouses, and using bioinformatics to do pathogen analysis). I've long been drawn to orchestration frameworks, but I haven't been satisfied with the existing projects.

Pain points

Some of the pain points I've hit with existing orchestration projects are:

  • Static graphs - graphs defined in configuration files or with a DSL give predictable execution, but they add complexity to development and can limit the types of workflow that can be implemented.
  • Configuration/setup - aside from the execution graph itself, standard features often need an excess of configuration that is difficult to set up.
  • Latency - tasks are slow to get assigned or start up.
  • Portability - frameworks can require significant buy-in.
  • Opacity - a lack of visibility into the underlying mechanism and trade-offs.

Most of these problems can be addressed by defining workflows in plain Python code.

Principles

Coflux has a number of guiding principles, informing features and trade-offs:

  1. Developer productivity - Writing workflows should be just like writing plain Python code. You shouldn't need to learn a new language or framework to get workflows up and running.
  2. Low risk - It should be easy to try and adopt Coflux. Moving away, if needed, should also require little more than removing annotations.
  3. Data ownership - With agents running in your own infrastructure, data should generally stay out of the orchestrator.
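
As a rough illustration of the first two principles, here is a minimal sketch using a made-up decorator. The names `task` and `REGISTRY` are hypothetical and are not Coflux's API; the point is only that an annotation can mark a function as a task while leaving it an ordinary Python function, so deleting the annotation leaves working code behind.

```python
from typing import Callable, Dict

# Hypothetical registry and decorator, for illustration only (not Coflux's API).
REGISTRY: Dict[str, Callable] = {}


def task(fn: Callable) -> Callable:
    """Register a function as a task without changing how it is called."""
    REGISTRY[fn.__name__] = fn
    return fn


@task
def add(a: int, b: int) -> int:
    return a + b


@task
def double_sum(a: int, b: int) -> int:
    # Calling another task looks like an ordinary function call.
    return 2 * add(a, b)


if __name__ == "__main__":
    print(double_sum(1, 2))
    print(sorted(REGISTRY))
```

Removing the two `@task` lines changes nothing about what `double_sum` computes, which is the sense in which moving away could be low risk.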

Use cases

The project has a number of domains and use cases in mind:

  • Background tasks - Coflux provides a natural evolution beyond a typical job scheduler, allowing developers to break up these jobs into sub-tasks.
  • Data pipelines - Whether using scheduled batch processes or sensors that react to events in real time, Coflux workflows are suitable for typical ETL/ELT processes.
  • Chat bots - Workflows can be used to manage user input, interact with language models, and maintain context through a conversation.
  • Simulation/control - Specifically for longer-period control loops, where decisions are made on the order of seconds or minutes (e.g., climate control; not autonomous vehicles).
  • Bioinformatics - Running compute-intensive algorithms on large datasets.
  • Machine learning - Training/evaluating models, collecting/processing data, etc.

Why orchestrate?

  • Remove the need to implement or manage logic for orthogonal tasks - for example, automating an effective retry strategy or reliably implementing caching - and hence reduce the amount of code to maintain (and the cost of making changes).
  • Provide visibility of a workflow - monitor workflows in production, diagnose issues, and generally increase confidence that a process flow operates as expected.
  • Coordinate tasks - e.g., parallelising and synchronising tasks.
  • Optimise resources - run each part of your workflow on the most suitable hardware - e.g., switching between CPU- or memory-optimised instances, operating in specific geographical regions, or providing access to features like GPUs.
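
To make the first point concrete, here is a sketch of the kind of retry and caching logic that otherwise ends up hand-written alongside business code. It uses only the Python standard library; `with_retries`, `cached`, `fetch` and `calls` are illustrative names chosen for this example, and none of it reflects how Coflux implements these features.

```python
import functools
import time
from typing import Any, Callable, Dict, Tuple


def with_retries(attempts: int, delay_s: float = 0.0) -> Callable:
    """Retry a function up to `attempts` times, doubling the delay each time."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            wait = delay_s
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # Out of attempts: surface the last error.
                    time.sleep(wait)
                    wait *= 2
        return wrapper
    return decorator


def cached(fn: Callable) -> Callable:
    """Cache results in memory, keyed by the positional arguments."""
    store: Dict[Tuple[Any, ...], Any] = {}

    @functools.wraps(fn)
    def wrapper(*args: Any) -> Any:
        if args not in store:
            store[args] = fn(*args)
        return store[args]
    return wrapper


# Counts real executions so the effect of caching is visible.
calls = {"fetch": 0}


@cached
@with_retries(attempts=3)
def fetch(key: str) -> str:
    """Stand-in for a flaky call: fails twice, then succeeds."""
    calls["fetch"] += 1
    if calls["fetch"] < 3:
        raise ConnectionError("transient failure")
    return key.upper()


if __name__ == "__main__":
    print(fetch("a"), calls["fetch"])  # Succeeds on the third attempt.
    print(fetch("a"), calls["fetch"])  # Served from the cache; no new call.
```

Even this small version leaves open questions (which exceptions to retry, where the cache lives, when entries expire), which is the maintenance cost the bullet refers to.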

Trade-offs

It's also important to consider the trade-offs of adopting any new technology. The obvious overhead of orchestration comes from coordinating function execution, and from serialising and persisting the arguments and results passed between functions.

In situations where you need to minimise latency (for example, serving API requests), you wouldn't want an orchestration framework sitting in the critical path. That said, there are cases where a background process (managed by an orchestration framework) could reduce API latency - for example, by pre-computing results and keeping them up-to-date.

A trade-off that's more specific to Coflux comes from the emphasis on dynamic execution graphs. Rather than requiring static graphs - defined either at design time or at the start of a workflow - Coflux tasks decide as they execute which additional tasks need to run. In most cases this added flexibility makes a lot of sense, but it can add an element of unpredictability to workflows.
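
The idea of a dynamic execution graph can be sketched in plain Python. This example uses the standard library's `ThreadPoolExecutor` rather than Coflux, and the functions `check_sensor`, `recalibrate` and `workflow` are invented for illustration. What it shows is that the set of follow-up tasks is only known after the first round of tasks has returned.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def check_sensor(sensor_id: int) -> float:
    # Stand-in for real work; returns a fake reading.
    return sensor_id * 1.5


def recalibrate(sensor_id: int) -> str:
    return f"recalibrated sensor {sensor_id}"


def workflow(sensor_ids: List[int], threshold: float) -> List[str]:
    """Check every sensor, then recalibrate only those above the threshold."""
    with ThreadPoolExecutor() as pool:
        # First round: run the checks in parallel.
        readings = list(pool.map(check_sensor, sensor_ids))
        # The follow-up tasks depend on the results, so the graph's shape
        # cannot be known before this point.
        flagged = [s for s, r in zip(sensor_ids, readings) if r > threshold]
        return list(pool.map(recalibrate, flagged))


if __name__ == "__main__":
    print(workflow([1, 2, 3, 4], threshold=4.0))
```

The same input list can produce zero, some, or all of the `recalibrate` tasks depending on the readings, which is both the flexibility and the unpredictability described above.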

Lift off

An early version of Coflux is available to try. Refer to the docs for a guide on getting started. Sign up to the mailing list to get notified of new releases.

If you're looking to set up orchestration, I'd be delighted to hear from you and understand your needs (joe@coflux.com).
