pypyr release v5.0.0
Release Date: 2021-11-20T13:17:39Z
Implement adr2 relative pipelines + api changes.
In brief, this release lets pipelines reference custom modules & child pipelines relative to the pipeline itself, rather than the current directory. This lets you create portable, re-usable & composable pipeline libraries.
breaking changes
This is a major version increment because it comes with BREAKING CHANGES:
- API:
pipelinerunner.run()replaces bothpipelinerunner.main()andpipelinerunner.main_with_context() - API:
def get_pipeline_definition(pipeline_name, working_directory)signature for custom pype loaders changes todef get_pipeline_definition(pipeline_name, parent) - CLI: the
—dirflag now only sets the directory for ad hoc custom Python modules, it does NOT also set the directory for pipelines anymore - Final removal of deprecated
get_formatted_iterable,get_formatted_string#195 &pypyr.steps.contextset#184. Where previously these would just give deprecation warnings, they are now completely removed. pypyr.pypeloaders.fileloaderrenamedpypyr.loaders.file
non-breaking changes
- You can now access the current pipeline’s metadata & loader information from within a pipeline with
context.current_pipeline - Improve handling of absolute paths in file loader only to search path once, rather than unnecessarily go through the same relative path lookup sequence with the same path.
- Typing support added for the pypyr API entrypoint.
upgrade guide
pipeline authors
Your pipelines can now references custom modules (e.g custom steps) and child pipelines you call with pypyr.steps.pype relative to the pipeline itself.
Assume a file structure like this:
/Users
|- captainhook/
|- my-pipelines/
|- subdir/
|- my-pipe.yaml
|- mystep.py
|- sub-pipe.yamlThat you used in a pipeline like this
# ~/my-pipelines/subdir/my-pipe.yaml
steps:
- name: pypyr.steps.pype
comment: old way. reference child relative to working dir.
in:
pype:
name: subdir/sub-pipe # Looks for subdir/sub-pipe relative to $PWD
- subdir.mystep # looks for subdir/mystep.py relative to $PWDThis old way of doing things meant that you could ONLY run this pipeline from
the ~/my-pipelines directory, because all the references in the pipeline are
relative to the working directory.
$ cd ~/my-pipelines
$ pypyr my-pipe # this will work
$ cd ~
$ pypyr my-pipelines/my-pipe # this will not work
# you would've had to set --dir to get this to work
$ pypyr my-pipe --dir my-pipelines # this will work
So forget about all that nonsense. As of now you can put your references relative to the pipeline itself, like this:
# ~/shared-pipelines/subdir/my-pipe.yaml
steps:
- name: pypyr.steps.pype
comment: new better way. resolves relative to current pipeline dir.
in:
pype:
name: sub-pipe # sub-pipe.yaml in same dir as current pipeline
- mystep # looks for mystep.py relative to current pipelineYou can now run your pipeline from anywhere:
$ cd ~/my-pipelines
$ pypyr my-pipe # this will work
$ cd ~
$ pypyr my-pipelines/my-pipe # this will also work
Woohoo! 🎉 This means you can now make re-usable, composable shared pipeline libraries that you can call from anywhere on your system!
See here for more details on the pipeline name resolution look-up order.
cli
If you were using the --dir flag, now you only need to set --dir if your
pipeline references custom modules or child pipelines that are NOT in the
pipeline’s direct parent directory and NOT in the current working directory.
# to run pipeline mydir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir mypipe
# the new way
$ pypyr mydir/mypipe
# to run pipeline mydir/subdir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir subdir/mypipe
# the new way
$ pypyr mydir/subdir/mypipe --dir mydir
In short, you very probably shouldn’t be using the --dir flag anymore.
Instead, change your pipelines to resolve child pipelines & custom modules
relative to itself as described in the previous section for pipeline
authors. The only likely reason to be using --dir is if
your custom modules live in a completely separate different filesystem location
than the pipelines.
api entrypoint
A single run() function replaces both main() and main_with_context().
The old:
pipelinerunner.main(pipeline_name='pipeline-dir/my-pipe',
pipeline_context_input=['arbkey=pipe', 'anotherkey=song'])becomes:
pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
args_in=['arbkey=pipe', 'anotherkey=song'])The old:
context_out = pipelinerunner.main_with_context(
pipeline_name='pipeline-dir/my-pipe',
dict_in={'arbkey': 'pipe',
'anotherkey': 'song'})becomes:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
dict_in={'arbkey': 'pipe',
'anotherkey': 'song'})See here for full details on how to use the new pipeline runner api.
py_dir replaces working_dir
Be aware that the working_dir input doesn’t exist anymore. The new py_dir
input ONLY refers to the directory from which to load custom Python modules. The
pipeline name does NOT resolve relative to py_dir as it used to for
working_dir:
The old:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
dict_in={'arbkey': 'pipe',
'anotherkey': 'song'},
working_dir=Path.cwd())becomes:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
dict_in={'arbkey': 'pipe',
'anotherkey': 'song'},
py_dir=Path.cwd())You do NOT need to set py_dir if your pipeline only references custom modules
that are in the pipeline’s directory itself.
You do NOT need to set py_dir if you are not using any custom modules, or if
your custom modules are installed in the current python environment.
Only use py_dir if your custom modules are NOT installed in the current python
environment and they are NOT relative to the pipeline’s directory.
See here for a full description of the pypyr custom module resolution order.
custom pype loader
If you have a custom pype loader with a signature like:
def get_pipeline_definition(pipeline_name, working_dir):Replace it with the function signature:
def get_pipeline_definition(pipeline_name, parent):Just this change should be sufficient for most cases. Be aware that unlike the
old working_dir, parent will always be None for the 1st/root pipeline in a
call-chain.
The full new function signature is:
get_pipeline_definition(name: str, parent: Any) -> pypyr.pipedef.PipelineDefinition | MappingIf you wanted to, you can now add extra metadata properties from your custom
loader by setting a pypyr.pipedef.PipelineInfo object and returning it in
a pypyr.pipedef.PipelineDefinition from your custom get_pipeline_definition
function. This is entirely optional - if you want to keep on returning just the
bare yaml dict-like payload, feel free to keep on doing so.
See here for full details on how to use the new custom pipeline loader
Oh, and BONUS! If you are using a custom pype loader, your pipeline authors do NOT explicitly need to set the custom loader on every child pipeline anymore on pypyr.steps.pype - the parent’s loader will automatically cascade to the child!
detailed technical breakdown
- Introduce new classes to model pipeline payload, rather than just using the bare dict-like yaml directly.
PipelineInfo- pipeline metadata set by loader. This maintains a pipeline’s parent/path info so that child pipelines can load relative to the parent.PipelineDefinition- this wraps the pipeline payload and its metadata (PipelineInfo) to allow pypyr to cache it all with one reference
- Add new
Pipelineclass for the run-time properties of a single run.- The
Pipelinereferences the shared cachedPipelineDefinition. - Move run + load_and_run logic from pypyr.pipelinerunner to the new
Pipelineclass. This massively streamlines the pipeline invocation process, since run/load_and_run can just operate on the sharedPipelinestate rather than sling a bunch of args between different functions as before.
- The
- Add a call-stack of running
Pipelineinstances onContext. i.e Parent -> child1 -> child2 where the root pipeline calls child pipelines viapype- Add
current_pipelineattribute toContext, controlled with a context manager to scope itself to an individual pipeline run’s lifespan.- This means that steps can access current pipeline’s properties.
- This allows
pypyr.steps.pypeto find the current (i.e parent) pipeline’s metadata such as path, to load child pipeline relative to the calling pipeline’s location. - When a child pipeline completes, the calling pipeline (i.e the previous
Pipelinein the call-stack) becomes the current pipeline
- Add
- Amend
pypyr.steps.pypeto instantiatePipelineobject toload_and_run()child.pypenow deals thecontext.current_pipeline.pipeline_definition.infometadata to work out whether to cascade parent path down to child, so child can load relative to the parent.- Notably, the parent loader now cascades to the child, so pipeline authors don’t need explicitly to set the same custom loader repeatedly for each child.
- Given the context manager controlling current pipeline scope in
Pipeline.load_and_run_pipelineremove the clumsy side-shuffle for pipeline_name, working_dir to swap out these values as child pipe runs and swap these back when it completes/errors.
- Remove global PipelineCache. Replace with distinct pipeline cache per loader.
- This resolves a long standing limitation where pypyr assumed unique pipeline names across all loaders.
- The per-loader pipeline cache stores
PipelineDefinitionobjects. - Introduce
Loaderclass, which wraps loader & its pipeline cache.Loaderis what theLoaderCachecaches. - Thus
LoaderCache->Loader->_pipelineCache->PipelineDefinition
- File loader has a private file cache keyed on absolute path of file
- This is to prevent >1 load+parse where the same underlying pipeline.yaml file has different names in the loader’s pipeline cache
- e.g
(name=‘dir/mypipe’, parent=None)and(name=‘mypipe’, parent=‘dir’)both resolve todir/mypipe.yaml
- e.g
- Caching a reference to the
PipelineDefinitionobject, so not duplicating memory
- This is to prevent >1 load+parse where the same underlying pipeline.yaml file has different names in the loader’s pipeline cache
- Improve handling of absolute paths in file loader only to search path X1, rather than unecessarilly go through the same relative path lookup sequence with the same path.
- File loader
get_pipeline_definitionnow returns aPipelineDefinitionwithPipelineFileInfoto store file-system specific metadata for the loader pipeline - Remove
working_dirglobal. Thepy_dirinput onrun()now refers ONLY to module paths, NOT pipeline locations.- The CLI
—dirflag, orpy_dirinput onrun()basically adds the specified directory tosys.path.
- The CLI
- Add current pipeline’s parent directory to
sys.pathon load. This allows child pipelines to resolve custom modules relative to itself. pypyr.dsl.Stepdoes not needStepsRunneranymore, because it can get it from thecontext.current_pipelineinstead.- Recode (some) integration tests to take advantage of list
pypyr.steps.appendstep and checking that for output on return context rather than intercepting logger.NOTIFY. - Rename master branch to main in CI/CD GitHub actions
- Add typing annotations to the public
run()function and thePipelineclass public accessors. The idea is NOT to type pypyr exhaustively, just to provide annotations for the sensible/likely entrypoint to enhance API user experience. Includepy.typedinpypyrpackage. - Remedy packaging snafu where
tests.commonwas deploying alongside pypyr because exclude condition infind_packagesdidn’t include wildcard for sub packages.
what’s changed
- Relative pipelines & API run() replaces main/main_with_context by @yaythomas in https://github.com/pypyr/pypyr/pull/243
Full Changelog: https://github.com/pypyr/pypyr/compare/v4.6.0...v5.0.0
how to install upgrade
If you want to upgrade (and you totally should!):
$ pip install --upgrade pypyrsource
You can find pypyr release v5.0.0 on github, where you can click through to associated Issues, Pull Requests and Users.
Released by yaythomas.