@<1523701070390366208:profile|CostlyOstrich36> I added None btw
Thanks! I'll wait for the release note/docs update
Happens with the latest version indeed.
I can't share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
In any case @<1537605940121964544:profile|EnthusiasticShrimp49> this seems like a good approach, but it's not quite there yet. For example, even if I'd provide a simple def run_step(…) function, I'd still need to pass the instance to the function. Passing it along in the kwargs for create_function_task does not seem to work, so now I need to also upload the inputs myself, etc. I'm bringing this up because the pipelines already do this for you.
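Something along these lines is the manual workaround I mean (a rough sketch only; DemoStep, run_step, the artifact name, and the project/task names are all just for illustration):

from clearml import Task

# hypothetical stand-in for one of the step classes discussed in this thread
class DemoStep:
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value * 2

step = DemoStep(21)

parent = Task.init(project_name="examples", task_name="controller")
# upload the instance ourselves; upload_artifact pickles generic Python objects
parent.upload_artifact(name="step_instance", artifact_object=step)

def run_step(controller_task_id, artifact_name):
    # executed remotely: pull the instance back out of the parent task and run it
    # (the class definition still has to be importable on the worker)
    instance = Task.get_task(task_id=controller_task_id).artifacts[artifact_name].get()
    return instance.run()

remote_task = parent.create_function_task(
    run_step, controller_task_id=parent.id, artifact_name="step_instance"
)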
So maybe summarizing (sorry for the spam):
- Pipelines:
  - Pros: Automatic upload and serialization of input arguments
  - Cons: Clutter, does not support classes, cannot inject code, does not recognize the environment when run from e.g. IPython
- Tasks:
  - Pros: Tidier and matches the original idea, recognizes the environment even when run from IPython
  - Cons: Does not support classes, cannot inject code, does not automatically upload input arguments
More experiments @<1537605940121964544:profile|EnthusiasticShrimp49> - the core issue with the create_function_step seems to be that the chosen executable will be e.g. IPython or some notebook, and not e.g. python3.10, so it fails running it as a task…
Interesting, why won't it be possible? Quite easy to get the source code using e.g. dill.
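Something like this is what I had in mind (the function is just a toy example):

from dill.source import getsource

def my_step(x):
    # defined interactively, e.g. in a notebook cell
    return x * 2

# dill can often recover the source even for interactively defined functions,
# where inspect.getsource tends to fail
print(getsource(my_step))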
Thanks David! I appreciate that, it would be very nice to have a consistent pattern in this!
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifacts. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
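For example (the project/task names and the artifact payload are arbitrary):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact naming")
# the slash ends up verbatim in the artifact name rather than creating any grouping
task.upload_artifact(name="foo/bar", artifact_object={"answer": 42})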
Consider e.g.:
# steps.py
class DataFetchingStep:
    def __init__(self, source, query, locations, timestamps):
        # ...
        pass

    def run(self, queue=None, **kwargs):
        # ...
        pass


class DataTransformationStep:
    def __init__(self, inputs, transformations):
        # inputs can include instances of DataFetchingStep, or local files, for example
        # ...
        pass

    def run(self, queue=None, **kwargs):
        # ...
        pass
And then the following SDK usage in a notebook:
from steps imp...
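Roughly, the notebook side would look something like this (the arguments and queue name are placeholders):

from steps import DataFetchingStep, DataTransformationStep

fetch = DataFetchingStep(source="db", query="select ...", locations=["eu"], timestamps=[])
transform = DataTransformationStep(inputs=[fetch, "local_file.csv"], transformations=["normalize"])

fetch.run(queue="on premise")
transform.run(queue="on premise")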
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I've been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
I think the script path in the created task will also cause some issues, but let's see…
I'll see if we can do that still (as the queue name suggests, this was a POC, so I'm trying to fix things before they give up).
Any other thoughts? The original thread https://clearml.slack.com/archives/CTK20V944/p1641490355015400 suggests this PR solved the issue
Hey @<1537605940121964544:profile|EnthusiasticShrimp49> ! You're mostly correct. The Step classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the DataTransformationStep, there may be user-defined functions specified. That's not a problem though, I can provide these functions with the helper_functions argument.
- The .add_function_step is indeed a failing point. I can't really create a task from the notebook because calling `Ta...
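For the helper_functions route, this is roughly what I have in mind (the step name, helper function, and queue are illustrative):

from clearml import PipelineController

def normalize(values):
    return [v / max(values) for v in values]

def transform_step(raw_values):
    # helper_functions are packaged alongside the generated step,
    # so normalize() should be resolvable when this runs remotely
    return normalize(raw_values)

pipe = PipelineController(name="demo", project="examples", version="0.0.1")
pipe.add_function_step(
    name="transform",
    function=transform_step,
    function_kwargs=dict(raw_values=[1, 2, 3]),
    function_return=["normalized"],
    helper_functions=[normalize],
)
pipe.set_default_execution_queue("default")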
I'll give the create_function_task one more try
See e None @<1523701087100473344:profile|SuccessfulKoala55>
Dynamic pipelines in a notebook, so I don't have to recreate a pipeline every time a step is changed
Then I wonder:
- How to achieve this? The pipeline controller seems to only work with functions, not classes, so running smaller steps remotely seems more difficult than I imagined. I was already prepared to upload artifacts myself etc, but now I'm not sure?
- Do I really need to recreate the pipeline every time from scratch? Or can I remove/edit steps? It's mostly used as a… controller for notebook-based executions and experimentations, before the actual pipeline is known. That is, it will ...
Thanks @<1537605940121964544:profile|EnthusiasticShrimp49> ! That's definitely the route I was hoping to take, but the create_function_task is still a bit of a mystery, as I'd like to use an entire class with relevant logic and proper serialization for inputs, and potentially I'll need to add more "helper functions" (as in the case of DataTransformationStep, for example). Any thoughts on that?
We're wondering how many on-premise machines we'd like to deprecate. For that, we want to see how often our "on premise" queue is used (how often a task is submitted and run), for how long, how many resources it consumes (on average), etc.
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
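Something like this is what I was thinking of, going through the API instead of Mongo (the queue name is a placeholder, pagination is ignored, and the exact task fields may need adjusting):

from clearml.backend_api.session.client import APIClient

client = APIClient()

# look up the queue id by name
queue = client.queues.get_all(name="on premise")[0]

# fetch finished tasks with just the fields we need, then filter by queue
tasks = client.tasks.get_all(
    status=["completed", "failed"],
    only_fields=["id", "started", "completed", "execution.queue"],
)
on_prem = [t for t in tasks if getattr(t.execution, "queue", None) == queue.id]
print(f"{len(on_prem)} tasks ran on the on-premise queue")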
An internal project I've accidentally made with a hidden tag while playing around with the ClearML internal code.
Where do I import this APIClient from, AgitatedDove14? In the meantime I edited it directly in Mongo, but editing a DB directly on a Friday is a big no-no
I'd like to remove the hidden system tag from a project
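For reference, this is roughly what I was hoping to do via the APIClient instead (a sketch only; the project name is a placeholder and it assumes projects.update accepts a system_tags field):

from clearml.backend_api.session.client import APIClient

client = APIClient()

# find the accidentally-hidden project by name
project = client.projects.get_all(
    name="internal-playground", only_fields=["id", "name", "system_tags"]
)[0]

# rewrite the system tags without the "hidden" entry
remaining = [t for t in (project.system_tags or []) if t != "hidden"]
client.projects.update(project=project.id, system_tags=remaining)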
One way to circumvent this btw would be to also add/use the --python flag for virtualenv
Still failing with 1.2.0rc3. AgitatedDove14, any thoughts on your end?
So basically I'm wondering if it's possible to add some kind of small hierarchy in the artifacts, be it sections, groupings, tabs, folders, whatever.
I think -
- Creating a pipeline from tasks is useful when you already ran some of these tasks in a given format, and you want to replicate the exact behaviour (ignoring any new code changes for example), while potentially changing some parameters.
- From decorators - when the pipeline logic is very straightforward and you'd like to mostly leverage pipelines for parallel execution of computation graphs
- From functions - as I described earlier :)
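To illustrate the decorator flavour, a minimal sketch (names, project, and return values are arbitrary):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["doubled"], cache=True)
def double(x):
    # becomes its own task when the pipeline runs remotely
    return x * 2

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.0.1")
def run_pipeline(x=1):
    print(double(x))

if __name__ == "__main__":
    # run the whole graph locally for quick iteration
    PipelineDecorator.run_locally()
    run_pipeline(x=3)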