pypyr release v5.0.0
Release Date: 2021-11-20T13:17:39Z
Implement adr2 relative pipelines + api changes.
In brief, this release lets pipelines reference custom modules & child pipelines relative to the pipeline itself, rather than the current directory. This lets you create portable, re-usable & composable pipeline libraries.
breaking changes
This is a major version increment because it comes with BREAKING CHANGES:
- API: pipelinerunner.run() replaces both pipelinerunner.main() and pipelinerunner.main_with_context()
- API: def get_pipeline_definition(pipeline_name, working_directory) signature for custom pype loaders changes to def get_pipeline_definition(pipeline_name, parent)
- CLI: the --dir flag now only sets the directory for ad hoc custom Python modules, it does NOT also set the directory for pipelines anymore
- Final removal of deprecated get_formatted_iterable, get_formatted_string #195 & pypyr.steps.contextset #184. Where previously these would just give deprecation warnings, they are now completely removed.
- pypyr.pypeloaders.fileloader renamed to pypyr.loaders.file
non-breaking changes
- You can now access the current pipeline’s metadata & loader information from within a pipeline with context.current_pipeline
- Improve handling of absolute paths in the file loader: it now searches an absolute path only once, rather than unnecessarily going through the same relative path look-up sequence with the same path.
- Typing support added for the pypyr API entrypoint.
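A custom step could read this metadata roughly as follows. This is a minimal sketch, not pypyr's own code: it assumes the usual run_step(context) entry point for custom steps and the attribute path context.current_pipeline.pipeline_definition.info named in the technical breakdown further down. The file name mystep.py and the output key pipelineInfo are hypothetical names chosen for illustration.

```python
"""mystep.py: hypothetical custom step that inspects the running pipeline."""


def run_step(context):
    """Save a printable summary of the current pipeline's metadata to context."""
    # current_pipeline is the pipeline that is running this step.
    pipeline = context.current_pipeline
    # info holds the metadata that the loader set for this pipeline.
    info = pipeline.pipeline_definition.info
    # 'pipelineInfo' is an arbitrary key name used for this example only.
    context['pipelineInfo'] = repr(info)
```

Later steps in the same pipeline could then read the pipelineInfo key from context like any other value.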
upgrade guide
pipeline authors
Your pipelines can now reference custom modules (e.g. custom steps) and child pipelines that you call with pypyr.steps.pype relative to the pipeline itself.
Assume a file structure like this:
/Users
  |- captainhook/
    |- my-pipelines/
      |- subdir/
        |- my-pipe.yaml
        |- mystep.py
        |- sub-pipe.yaml
That you used in a pipeline like this
# ~/my-pipelines/subdir/my-pipe.yaml
steps:
  - name: pypyr.steps.pype
    comment: old way. reference child relative to working dir.
    in:
      pype:
        name: subdir/sub-pipe # Looks for subdir/sub-pipe relative to $PWD
  - subdir.mystep # looks for subdir/mystep.py relative to $PWD
This old way of doing things meant that you could ONLY run this pipeline from the ~/my-pipelines directory, because all the references in the pipeline are relative to the working directory.
$ cd ~/my-pipelines
$ pypyr my-pipe # this will work
$ cd ~
$ pypyr my-pipelines/my-pipe # this will not work
# you would've had to set --dir to get this to work
$ pypyr my-pipe --dir my-pipelines # this will work
So forget about all that nonsense. As of now you can put your references relative to the pipeline itself, like this:
# ~/shared-pipelines/subdir/my-pipe.yaml
steps:
  - name: pypyr.steps.pype
    comment: new better way. resolves relative to current pipeline dir.
    in:
      pype:
        name: sub-pipe # sub-pipe.yaml in same dir as current pipeline
  - mystep # looks for mystep.py relative to current pipeline
You can now run your pipeline from anywhere:
$ cd ~/my-pipelines
$ pypyr my-pipe # this will work
$ cd ~
$ pypyr my-pipelines/my-pipe # this will also work
Woohoo! 🎉 This means you can now make re-usable, composable shared pipeline libraries that you can call from anywhere on your system!
See here for more details on the pipeline name resolution look-up order.
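To make the idea concrete, here is a simplified sketch of resolving a pipeline name relative to the calling pipeline before falling back to the working directory. It is an illustration only and not pypyr's implementation: the function name resolve_pipeline_path and its arguments are hypothetical, and it checks just two locations, so treat the documented look-up order as the authority.

```python
from pathlib import Path
from typing import Optional


def resolve_pipeline_path(name: str, parent: Optional[Path], cwd: Path) -> Path:
    """Return the first existing {name}.yaml, trying parent dir before cwd.

    Simplified illustration: only two locations are checked here.
    """
    candidates = []
    if parent is not None:
        # Child pipelines: look next to the calling pipeline first.
        candidates.append(parent / f'{name}.yaml')
    # Fall back to the current working directory.
    candidates.append(cwd / f'{name}.yaml')

    for candidate in candidates:
        if candidate.is_file():
            return candidate

    raise FileNotFoundError(f'{name}.yaml not found in: {candidates}')
```

With the file structure above, resolving sub-pipe with parent set to the subdir directory finds subdir/sub-pipe.yaml no matter what the working directory is.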
cli
If you were using the --dir flag, now you only need to set --dir if your pipeline references custom modules or child pipelines that are NOT in the pipeline’s direct parent directory and NOT in the current working directory.
# to run pipeline mydir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir mypipe
# the new way
$ pypyr mydir/mypipe
# to run pipeline mydir/subdir/mypipe.yaml
# custom modules & child pipelines in mydir/
# the old way
$ pypyr --dir mydir subdir/mypipe
# the new way
$ pypyr mydir/subdir/mypipe --dir mydir
In short, you very probably shouldn’t be using the --dir flag anymore. Instead, change your pipelines to resolve child pipelines & custom modules relative to the pipeline itself, as described in the previous section for pipeline authors. The only likely reason to use --dir is if your custom modules live in a completely separate filesystem location from the pipelines.
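According to the technical breakdown later in these notes, --dir basically adds the given directory to sys.path. The effect can be pictured with the rough sketch below. It is not pypyr's actual code, and the helper name add_py_dir is hypothetical.

```python
import sys
from pathlib import Path


def add_py_dir(py_dir: Path) -> None:
    """Put py_dir on sys.path so Python can import modules that live there."""
    resolved = str(py_dir.resolve())
    # Avoid adding the same directory twice on repeated calls.
    if resolved not in sys.path:
        sys.path.append(resolved)
```

After a call like add_py_dir(Path('mydir')), a plain import of mystep would find mydir/mystep.py, which is why a step in a separate location can be referenced by its bare module name.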
api entrypoint
A single run() function replaces both main() and main_with_context().
The old:
pipelinerunner.main(pipeline_name='pipeline-dir/my-pipe',
                    pipeline_context_input=['arbkey=pipe', 'anotherkey=song'])
becomes:
pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
                   args_in=['arbkey=pipe', 'anotherkey=song'])
The old:
context_out = pipelinerunner.main_with_context(
    pipeline_name='pipeline-dir/my-pipe',
    dict_in={'arbkey': 'pipe',
             'anotherkey': 'song'})
becomes:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
                                 dict_in={'arbkey': 'pipe',
                                          'anotherkey': 'song'})
See here for full details on how to use the new pipeline runner api.
py_dir replaces working_dir
Be aware that the working_dir input doesn’t exist anymore. The new py_dir input ONLY refers to the directory from which to load custom Python modules. The pipeline name does NOT resolve relative to py_dir as it used to for working_dir:
The old:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
                                 dict_in={'arbkey': 'pipe',
                                          'anotherkey': 'song'},
                                 working_dir=Path.cwd())
becomes:
context_out = pipelinerunner.run(pipeline_name='pipeline-dir/my-pipe',
                                 dict_in={'arbkey': 'pipe',
                                          'anotherkey': 'song'},
                                 py_dir=Path.cwd())
You do NOT need to set py_dir if your pipeline only references custom modules that are in the pipeline’s directory itself.
You do NOT need to set py_dir if you are not using any custom modules, or if your custom modules are installed in the current python environment.
Only use py_dir if your custom modules are NOT installed in the current python environment and they are NOT relative to the pipeline’s directory.
See here for a full description of the pypyr custom module resolution order.
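If you have many call sites to migrate, the argument renames described in the API sections above can be summarised as a small mapping. This is a convenience sketch under stated assumptions rather than anything that ships with pypyr: RENAMED_ARGS and migrate_kwargs are hypothetical names, and only the two renames mentioned in these notes are covered.

```python
# Old argument name -> new argument name, as described in these notes.
# Caution: py_dir is NOT a like-for-like replacement for working_dir.
# It only affects where custom Python modules load from, not pipelines.
RENAMED_ARGS = {
    'pipeline_context_input': 'args_in',
    'working_dir': 'py_dir',
}


def migrate_kwargs(old_kwargs: dict) -> dict:
    """Return a copy of old_kwargs with renamed keys switched to v5 names."""
    return {RENAMED_ARGS.get(key, key): value
            for key, value in old_kwargs.items()}
```

The result is intended to be passed on as keyword arguments to run(). Review each converted py_dir value by hand, since in many cases you can now drop it entirely.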
custom pype loader
If you have a custom pype loader with a signature like:
def get_pipeline_definition(pipeline_name, working_dir):
Replace it with the function signature:
def get_pipeline_definition(pipeline_name, parent):
Just this change should be sufficient for most cases. Be aware that unlike the old working_dir, parent will always be None for the 1st/root pipeline in a call-chain.
The full new function signature is:
get_pipeline_definition(name: str, parent: Any) -> pypyr.pipedef.PipelineDefinition | Mapping
If you want to, you can now add extra metadata properties from your custom loader by setting a pypyr.pipedef.PipelineInfo object and returning it in a pypyr.pipedef.PipelineDefinition from your custom get_pipeline_definition function. This is entirely optional - if you want to keep on returning just the bare yaml dict-like payload, feel free to keep on doing so.
See here for full details on how to use the new custom pipeline loader.
Oh, and BONUS! If you are using a custom pype loader, your pipeline authors do NOT explicitly need to set the custom loader on every child pipeline anymore on pypyr.steps.pype - the parent’s loader will automatically cascade to the child!
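As an illustration of the new signature, here is a minimal sketch of a custom loader that returns the bare dict-like payload. The module name myloader.py, the PIPELINES dict and the pipeline names are hypothetical, and raising KeyError for an unknown name is an assumption of this sketch rather than documented pypyr behaviour.

```python
"""myloader.py: hypothetical in-memory custom pype loader."""

# Pipelines keyed by name. A real loader might read these from a database,
# an object store or anywhere else.
PIPELINES = {
    'parent-pipe': {
        'steps': [
            {'name': 'pypyr.steps.pype',
             'in': {'pype': {'name': 'child-pipe'}}},
        ],
    },
    'child-pipe': {
        'steps': [
            {'name': 'pypyr.steps.echo',
             'in': {'echoMe': 'hello from child'}},
        ],
    },
}


def get_pipeline_definition(pipeline_name, parent):
    """Return the bare dict-like pipeline payload for pipeline_name.

    parent is None for the root pipeline in a call-chain. This loader does
    not need it, because every pipeline lives in the same flat dict.
    """
    try:
        return PIPELINES[pipeline_name]
    except KeyError:
        raise KeyError(f'pipeline not found: {pipeline_name}') from None
```

Note that parent-pipe calls child-pipe without naming a loader on the pype step, relying on the loader cascade described above.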
detailed technical breakdown
- Introduce new classes to model pipeline payload, rather than just using the bare dict-like yaml directly.
  - PipelineInfo - pipeline metadata set by loader. This maintains a pipeline’s parent/path info so that child pipelines can load relative to the parent.
  - PipelineDefinition - this wraps the pipeline payload and its metadata (PipelineInfo) to allow pypyr to cache it all with one reference
- Add new Pipeline class for the run-time properties of a single run.
  - The Pipeline references the shared cached PipelineDefinition.
  - Move run + load_and_run logic from pypyr.pipelinerunner to the new Pipeline class. This massively streamlines the pipeline invocation process, since run/load_and_run can just operate on the shared Pipeline state rather than sling a bunch of args between different functions as before.
- Add a call-stack of running Pipeline instances on Context. i.e Parent -> child1 -> child2 where the root pipeline calls child pipelines via pype
  - Add current_pipeline attribute to Context, controlled with a context manager to scope itself to an individual pipeline run’s lifespan.
    - This means that steps can access current pipeline’s properties.
    - This allows pypyr.steps.pype to find the current (i.e parent) pipeline’s metadata such as path, to load child pipeline relative to the calling pipeline’s location.
    - When a child pipeline completes, the calling pipeline (i.e the previous Pipeline in the call-stack) becomes the current pipeline
- Amend pypyr.steps.pype to instantiate a Pipeline object to load_and_run() the child.
  - pype now uses the context.current_pipeline.pipeline_definition.info metadata to work out whether to cascade parent path down to child, so child can load relative to the parent.
    - Notably, the parent loader now cascades to the child, so pipeline authors don’t need to set the same custom loader explicitly for each child.
  - Given the context manager controlling current pipeline scope in Pipeline.load_and_run_pipeline, remove the clumsy side-shuffle for pipeline_name, working_dir to swap out these values as child pipe runs and swap these back when it completes/errors.
- Remove global PipelineCache. Replace with distinct pipeline cache per loader.
  - This resolves a long standing limitation where pypyr assumed unique pipeline names across all loaders.
  - The per-loader pipeline cache stores PipelineDefinition objects.
  - Introduce Loader class, which wraps loader & its pipeline cache. Loader is what the LoaderCache caches.
  - Thus LoaderCache -> Loader -> _pipelineCache -> PipelineDefinition
- File loader has a private file cache keyed on absolute path of file
  - This is to prevent >1 load+parse where the same underlying pipeline.yaml file has different names in the loader’s pipeline cache
    - e.g (name='dir/mypipe', parent=None) and (name='mypipe', parent='dir') both resolve to dir/mypipe.yaml
  - Caching a reference to the PipelineDefinition object, so not duplicating memory
- Improve handling of absolute paths in the file loader: it now searches an absolute path only once, rather than unnecessarily going through the same relative path look-up sequence with the same path.
- File loader get_pipeline_definition now returns a PipelineDefinition with PipelineFileInfo to store file-system specific metadata for the loader pipeline
- Remove working_dir global. The py_dir input on run() now refers ONLY to module paths, NOT pipeline locations.
  - The CLI --dir flag, or py_dir input on run(), basically adds the specified directory to sys.path.
- Add current pipeline’s parent directory to sys.path on load. This allows child pipelines to resolve custom modules relative to themselves.
- pypyr.dsl.Step does not need StepsRunner anymore, because it can get it from the context.current_pipeline instead.
- Recode (some) integration tests to take advantage of the list pypyr.steps.append step, checking that for output on the returned context rather than intercepting logger.NOTIFY.
- Rename master branch to main in CI/CD GitHub actions
- Add typing annotations to the public run() function and the Pipeline class public accessors. The idea is NOT to type pypyr exhaustively, just to provide annotations for the sensible/likely entrypoint to enhance API user experience. Include py.typed in pypyr package.
- Remedy packaging snafu where tests.common was deploying alongside pypyr because exclude condition in find_packages didn’t include wildcard for sub packages.
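The call-stack and current_pipeline scoping described in the breakdown above can be pictured with a small stand-alone sketch. It is an illustration of the technique, not pypyr's code: the Context class here is a stand-in, and the method name pipeline_scope and the _stack attribute are hypothetical.

```python
from contextlib import contextmanager


class Context(dict):
    """Stand-in for a context object that tracks the running pipelines."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._stack = []  # call-stack: root pipeline first, innermost last
        self.current_pipeline = None

    @contextmanager
    def pipeline_scope(self, pipeline):
        """Make pipeline current for the duration of the with block."""
        self._stack.append(pipeline)
        self.current_pipeline = pipeline
        try:
            yield pipeline
        finally:
            # Runs on success and on error, so the caller always becomes
            # the current pipeline again when the child is done.
            self._stack.pop()
            self.current_pipeline = self._stack[-1] if self._stack else None
```

Because the context manager restores the previous pipeline in its finally clause, there is no need to swap values out and back by hand when a child pipeline completes or raises.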
what’s changed
- Relative pipelines & API run() replaces main/main_with_context by @yaythomas in https://github.com/pypyr/pypyr/pull/243
Full Changelog: https://github.com/pypyr/pypyr/compare/v4.6.0...v5.0.0
how to install upgrade
If you want to upgrade (and you totally should!):
$ pip install --upgrade pypyr
source
You can find pypyr release v5.0.0 on github, where you can click through to associated Issues, Pull Requests and Users.
Released by yaythomas.