Release Date: 2021-11-20T13:17:39Z

Implement adr2 relative pipelines + api changes.

In brief, this release lets pipelines reference custom modules & child pipelines relative to the pipeline itself, rather than the current directory. This lets you create portable, re-usable & composable pipeline libraries.

breaking changes

This is a major version increment because it comes with BREAKING CHANGES:

  1. API: a single pipelinerunner.run() function replaces both pipelinerunner.main() and pipelinerunner.main_with_context()
  2. API: the def get_pipeline_definition(pipeline_name, working_dir) signature for custom pype loaders changes to def get_pipeline_definition(pipeline_name, parent)
  3. CLI: the --dir flag now only sets the directory for ad hoc custom Python modules; it does NOT also set the directory for pipelines anymore
  4. Final removal of deprecated get_formatted_iterable, get_formatted_string #195 & pypyr.steps.contextset #184. Where previously these would just give deprecation warnings, they are now completely removed.
  5. pypyr.pypeloaders.fileloader is renamed to pypyr.loaders.file

non-breaking changes

  • You can now access the current pipeline’s metadata & loader information from within a pipeline with context.current_pipeline
  • Improve handling of absolute paths in file loader only to search path once, rather than unnecessarily go through the same relative path lookup sequence with the same path.
  • Typing support added for the pypyr API entrypoint.

upgrade guide

pipeline authors

Your pipelines can now reference custom modules (e.g. custom steps) and child pipelines you call with pypyr.steps.pype relative to the pipeline itself.

Assume a file structure like this:

    |- captainhook/
        |- my-pipelines/
            |- subdir/
                |- my-pipe.yaml
                |- sub-pipe.yaml

And suppose you used it in a pipeline like this:

# ~/my-pipelines/subdir/my-pipe.yaml
steps:
    - name: pypyr.steps.pype
      comment: old way. reference child relative to working dir.
      in:
        pype:
            name: subdir/sub-pipe # Looks for subdir/sub-pipe relative to $PWD
    - subdir.mystep # looks for subdir/ relative to $PWD

This old way of doing things meant that you could ONLY run this pipeline from the ~/my-pipelines directory, because all the references in the pipeline are relative to the working directory.

$ cd ~/my-pipelines
$ pypyr my-pipe # this will work

$ cd ~
$ pypyr my-pipelines/my-pipe # this will not work

# you would've had to set --dir to get this to work
$ pypyr my-pipe --dir my-pipelines # this will work

So forget about all that nonsense. As of now you can put your references relative to the pipeline itself, like this:

# ~/my-pipelines/subdir/my-pipe.yaml
steps:
    - name: pypyr.steps.pype
      comment: new better way. resolves relative to current pipeline dir.
      in:
        pype:
            name: sub-pipe # sub-pipe.yaml in same dir as current pipeline
    - mystep # looks for mystep relative to current pipeline

You can now run your pipeline from anywhere:

$ cd ~/my-pipelines
$ pypyr my-pipe # this will work

$ cd ~
$ pypyr my-pipelines/my-pipe # this will also work

Woohoo! 🎉 This means you can now make re-usable, composable shared pipeline libraries that you can call from anywhere on your system!

See here for more details on the pipeline name resolution look-up order.
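As a rough illustration of what "relative to the pipeline itself" means, the sketch below looks for a child pipeline next to its parent first and only then falls back to the current working directory. The helper name resolve_pipeline_path and the two-step order are assumptions made for illustration; pypyr's real file loader has its own look-up sequence, which may check further locations.

```python
from pathlib import Path
from typing import Optional


def resolve_pipeline_path(name: str,
                          parent: Optional[Path] = None,
                          cwd: Optional[Path] = None) -> Path:
    """Hypothetical helper: find {name}.yaml relative to parent, then cwd."""
    cwd = cwd or Path.cwd()
    candidates = []
    if parent is not None:
        # A child pipeline is first looked up next to the calling pipeline.
        candidates.append(parent / f"{name}.yaml")
    # Fall back to the current working directory.
    candidates.append(cwd / f"{name}.yaml")

    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()

    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{name}.yaml not found in: {searched}")
```

With a look-up like this, a reference such as sub-pipe resolves the same way whatever directory you run the parent pipeline from, because the parent's own directory is tried first.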

cli

If you were using the --dir flag, now you only need to set --dir if your pipeline references custom modules or child pipelines that are NOT in the pipeline’s direct parent directory and NOT in the current working directory.

# to run pipeline mydir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir mypipe

# the new way
$ pypyr mydir/mypipe

# to run pipeline mydir/subdir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir subdir/mypipe

# the new way
$ pypyr mydir/subdir/mypipe --dir mydir

In short, you very probably shouldn’t be using the --dir flag anymore. Instead, change your pipelines to resolve child pipelines & custom modules relative to themselves, as described in the previous section for pipeline authors. The only likely reason to use --dir is if your custom modules live in a completely different filesystem location from the pipelines.

api entrypoint

A single run() function replaces both main() and main_with_context().

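The old:

pipelinerunner.main(pipeline_name='pipeline-dir/my-pipe',
                    pipeline_context_input=['arbkey=pipe', 'anotherkey=song'])

The new:

pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
                   args_in=['arbkey=pipe', 'anotherkey=song'])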

The old:

context_out = pipelinerunner.main_with_context(
    'pipeline-dir/my-pipe',
    dict_in={'arbkey': 'pipe',
             'anotherkey': 'song'})

The new:

context_out = pipelinerunner.run('pipeline-dir/my-pipe',
                                 dict_in={'arbkey': 'pipe',
                                          'anotherkey': 'song'})

See here for full details on how to use the new pipeline runner api.

py_dir replaces working_dir

Be aware that the working_dir input doesn’t exist anymore. The new py_dir input ONLY refers to the directory from which to load custom Python modules. The pipeline name does NOT resolve relative to py_dir as it used to for working_dir:

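The old:

context_out = pipelinerunner.main_with_context(
    'pipeline-dir/my-pipe',
    dict_in={'arbkey': 'pipe',
             'anotherkey': 'song'},
    working_dir='path/to/dir')  # placeholder path

The new:

context_out = pipelinerunner.run('pipeline-dir/my-pipe',
                                 dict_in={'arbkey': 'pipe',
                                          'anotherkey': 'song'},
                                 py_dir='path/to/dir')  # placeholder path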

You do NOT need to set py_dir if your pipeline only references custom modules that are in the pipeline’s directory itself.

You do NOT need to set py_dir if you are not using any custom modules, or if your custom modules are installed in the current python environment.

Only use py_dir if your custom modules are NOT installed in the current python environment and they are NOT relative to the pipeline’s directory.

See here for a full description of the pypyr custom module resolution order.
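Since py_dir (like the CLI --dir flag) basically adds the specified directory to sys.path, its effect can be pictured with plain standard-library Python. This is a minimal sketch of the mechanism, not pypyr's own code; the helper name import_from_dir is hypothetical.

```python
import importlib
import sys
from pathlib import Path


def import_from_dir(module_name: str, py_dir: Path):
    """Hypothetical helper: make py_dir importable, then import module_name."""
    py_dir_str = str(py_dir)
    if py_dir_str not in sys.path:
        # Appending (rather than prepending) means modules that are already
        # importable from the environment keep priority over those in py_dir.
        sys.path.append(py_dir_str)
    # Ensure the import system notices files created after start-up.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

This is why py_dir has no bearing on where pipelines are found: it only widens the set of places Python searches for modules.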

custom pype loader

If you have a custom pype loader with a signature like:

def get_pipeline_definition(pipeline_name, working_dir):

Replace it with the function signature:

def get_pipeline_definition(pipeline_name, parent):

Just this change should be sufficient for most cases. Be aware that unlike the old working_dir, parent will always be None for the 1st/root pipeline in a call-chain.

The full new function signature is:

get_pipeline_definition(name: str, parent: Any) -> pypyr.pipedef.PipelineDefinition | Mapping

If you want to, you can now add extra metadata properties from your custom loader by setting a pypyr.pipedef.PipelineInfo object and returning it in a pypyr.pipedef.PipelineDefinition from your custom get_pipeline_definition function. This is entirely optional: if you want to keep on returning just the bare yaml dict-like payload, feel free to keep on doing so.
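For orientation, here is a minimal sketch of a custom loader that uses the new signature and keeps returning the bare dict-like payload. The in-memory _PIPELINES store and the way parent is used as a key prefix are assumptions for illustration only; a real loader decides for itself what parent means.

```python
from typing import Any, Mapping

# Hypothetical in-memory store standing in for a database, API, etc.
_PIPELINES = {
    "my-pipe": {
        "steps": [
            {"name": "pypyr.steps.echo", "in": {"echoMe": "hello from root"}},
        ],
    },
    "shared/child": {
        "steps": [
            {"name": "pypyr.steps.echo", "in": {"echoMe": "hello from child"}},
        ],
    },
}


def get_pipeline_definition(pipeline_name: str, parent: Any) -> Mapping:
    """Return the bare dict-like pipeline payload for pipeline_name.

    parent is None for the root pipeline in a call-chain. In this sketch a
    non-None parent is treated as a key prefix, which is an assumption.
    """
    key = pipeline_name if parent is None else f"{parent}/{pipeline_name}"
    try:
        return _PIPELINES[key]
    except KeyError:
        raise LookupError(f"no pipeline stored under '{key}'") from None
```

Handling parent=None explicitly matters here, because the root pipeline always arrives without a parent.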

See here for full details on how to use the new custom pipeline loader.

Oh, and BONUS! If you are using a custom pype loader, your pipeline authors do NOT explicitly need to set the custom loader on every child pipeline anymore on pypyr.steps.pype - the parent’s loader will automatically cascade to the child!

detailed technical breakdown

  • Introduce new classes to model pipeline payload, rather than just using the bare dict-like yaml directly.
    • PipelineInfo - pipeline metadata set by loader. This maintains a pipeline’s parent/path info so that child pipelines can load relative to the parent.
    • PipelineDefinition - this wraps the pipeline payload and its metadata (PipelineInfo) to allow pypyr to cache it all with one reference
  • Add new Pipeline class for the run-time properties of a single run.
    • The Pipeline references the shared cached PipelineDefinition.
    • Move run + load_and_run logic from pypyr.pipelinerunner to the new Pipeline class. This massively streamlines the pipeline invocation process, since run/load_and_run can just operate on the shared Pipeline state rather than sling a bunch of args between different functions as before.
  • Add a call-stack of running Pipeline instances on Context, i.e. Parent -> child1 -> child2, where the root pipeline calls child pipelines via pype.
    • Add current_pipeline attribute to Context, controlled with a context manager to scope itself to an individual pipeline run’s lifespan.
      • This means that steps can access current pipeline’s properties.
      • This allows pypyr.steps.pype to find the current (i.e. parent) pipeline’s metadata, such as path, to load the child pipeline relative to the calling pipeline’s location.
      • When a child pipeline completes, the calling pipeline (i.e. the previous Pipeline in the call-stack) becomes the current pipeline again.
  • Amend pypyr.steps.pype to instantiate Pipeline object to load_and_run() child.
    • pype now deals with the metadata to work out whether to cascade the parent path down to the child, so the child can load relative to the parent.
    • Notably, the parent loader now cascades to the child, so pipeline authors don’t need to set the same custom loader explicitly for each child.
    • Given the context manager controlling current pipeline scope in Pipeline.load_and_run_pipeline, remove the clumsy side-shuffle that swapped out pipeline_name & working_dir as the child pipe runs and swapped them back when it completes/errors.
  • Remove global PipelineCache. Replace with distinct pipeline cache per loader.
    • This resolves a long standing limitation where pypyr assumed unique pipeline names across all loaders.
    • The per-loader pipeline cache stores PipelineDefinition objects.
    • Introduce Loader class, which wraps loader & its pipeline cache. Loader is what the LoaderCache caches.
    • Thus LoaderCache -> Loader -> _pipelineCache -> PipelineDefinition
  • File loader has a private file cache keyed on absolute path of file
    • This is to prevent >1 load+parse where the same underlying pipeline.yaml file has different names in the loader’s pipeline cache
      • e.g. (name='dir/mypipe', parent=None) and (name='mypipe', parent='dir') both resolve to dir/mypipe.yaml
    • Caching a reference to the PipelineDefinition object, so not duplicating memory
  • Improve handling of absolute paths in file loader to search the path only once, rather than unnecessarily go through the same relative path lookup sequence with the same path.
  • File loader get_pipeline_definition now returns a PipelineDefinition with PipelineFileInfo to store file-system specific metadata for the loader pipeline
  • Remove working_dir global. The py_dir input on run() now refers ONLY to module paths, NOT pipeline locations.
    • The CLI --dir flag, or py_dir input on run(), basically adds the specified directory to sys.path.
  • Add current pipeline’s parent directory to sys.path on load. This allows child pipelines to resolve custom modules relative to themselves.
  • pypyr.dsl.Step does not need StepsRunner anymore, because it can get it from the context.current_pipeline instead.
  • Recode (some) integration tests to take advantage of the pypyr.steps.append step, checking the resulting list on the returned context rather than intercepting logger.NOTIFY.
  • Rename master branch to main in CI/CD GitHub actions
  • Add typing annotations to the public run() function and the Pipeline class public accessors. The idea is NOT to type pypyr exhaustively, just to provide annotations for the sensible/likely entrypoint to enhance API user experience. Include py.typed in pypyr package.
  • Remedy packaging snafu where tests.common was deploying alongside pypyr because exclude condition in find_packages didn’t include wildcard for sub packages.
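The call-stack of running pipelines on Context, scoped by a context manager, can be pictured with the sketch below. It is a simplified illustration under assumed names (this Pipeline, Context and pipeline_scope are stand-ins written for this example), not pypyr's actual implementation.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class Pipeline:
    """Hypothetical stand-in for the run-time state of one pipeline run."""

    def __init__(self, name: str) -> None:
        self.name = name


class Context(dict):
    """Hypothetical dict-like context that tracks the running pipelines."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._stack: List[Pipeline] = []
        self.current_pipeline: Optional[Pipeline] = None

    @contextmanager
    def pipeline_scope(self, pipeline: Pipeline) -> Iterator["Context"]:
        """Make pipeline the current one for the duration of the with block."""
        self._stack.append(pipeline)
        self.current_pipeline = pipeline
        try:
            yield self
        finally:
            # Runs on success and on error, restoring the caller as current.
            self._stack.pop()
            self.current_pipeline = self._stack[-1] if self._stack else None
```

Because the restore happens in a finally clause, the calling pipeline becomes current again whether the child completes or raises, which is what removes the need to swap values back manually.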

what's changed

Full Changelog:

how to install upgrade

If you want to upgrade (and you totally should!):

$ pip install --upgrade pypyr

source

You can find pypyr release v5.0.0 on github, where you can click through to associated Issues, Pull Requests and Users.

Released by yaythomas.

