Created this for follow up, SuccessfulKoala55; I'm really stumped. Spent the entire day on this 🥹
https://github.com/allegroai/clearml-agent/issues/134
Heh, well, John wrote that in the first reply in this thread 🙂
And in the Task.init main documentation page (nowhere near the code), it says the following -
@<1523701070390366208:profile|CostlyOstrich36> I added None btw
Thanks! I'll wait for the release note/docs update 😁
In any case @<1537605940121964544:profile|EnthusiasticShrimp49> this seems like a good approach, but it’s not quite there yet. For example, even if I’d provide a simple def run_step(…) function, I’d still need to pass the instance to the function. Passing it along in the kwargs for create_function_task does not seem to work, so now I need to also upload the inputs, etc. I’m bringing this up because the pipelines already do this for you.
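Roughly what I mean, as a minimal sketch (the class and names here are invented, and I’m assuming create_function_task forwards its kwargs as the wrapped function’s arguments):

from clearml import Task

class DataFetchingStep:
    def __init__(self, source):
        self.source = source

    def run(self, **kwargs):
        print(f"fetching from {self.source}")

def run_step(step):
    # `step` should be the already-constructed instance; passing it through
    # the kwargs of create_function_task is exactly where this breaks down.
    return step.run()

task = Task.init(project_name="poc", task_name="controller")
# kwargs become the wrapped function's arguments, but a full class instance
# does not appear to survive the trip, so the inputs would have to be
# uploaded separately (e.g. as artifacts).
step_task = task.create_function_task(run_step, step=DataFetchingStep("db"))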
So maybe summarizing (sorry for the spam):
- Pipelines:
  - Pros: Automatic upload and serialization of input arguments
  - Cons: Clutter, does not support classes, cannot inject code, does not recognize the environment when run from e.g. IPython
- Tasks:
  - Pros: Tidier and matches the original idea, recognizes the environment even when run from IPython
  - Cons: Does not support classes, cannot inject code, does not automatically upload input arguments
More experiments @<1537605940121964544:profile|EnthusiasticShrimp49> - the core issue with create_function_step seems to be that the chosen executable will be e.g. IPython or some notebook, and not e.g. python3.10, so it fails running it as a task… 🤔
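A small diagnostic along these lines shows what gets recorded (assuming the backend script section is reachable via task.data.script, which may vary between SDK versions):

from clearml import Task

task = Task.init(project_name="poc", task_name="inspect-entry-point")
script = task.data.script
print(script.binary)       # the interpreter the agent will try to use
print(script.entry_point)  # often the notebook/IPython entry rather than a .py file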
Interesting, why won’t it be possible? Quite easy to get the source code using e.g. dill.
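For instance, dill can pull the source of a function that was defined interactively, which plain inspect often can’t do for notebook-defined objects; a quick sketch:

import dill.source

def transform(x):
    return x * 2

print(dill.source.getsource(transform))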
Thanks David! I appreciate that, it would be very nice to have a consistent pattern in this!
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
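What I tried, roughly (project/task names are placeholders); the slash simply stays part of the artifact name instead of creating any grouping:

from clearml import Task

task = Task.init(project_name="poc", task_name="artifact-naming")
task.upload_artifact(name="foo/bar", artifact_object={"some": "payload"})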
Consider e.g:
# steps.py
class DataFetchingStep:
    def __init__(self, source, query, locations, timestamps):
        ...

    def run(self, queue=None, **kwargs):
        ...


class DataTransformationStep:
    def __init__(self, inputs, transformations):
        # inputs can include instances of DataFetchingStep, or local files, for example
        ...

    def run(self, queue=None, **kwargs):
        ...
And then the following SDK usage in a notebook:
from steps imp...
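The snippet got cut off above; the intended usage looks roughly like this (arguments are illustrative, matching the class signatures from steps.py):

from steps import DataFetchingStep, DataTransformationStep

fetch = DataFetchingStep(source="warehouse", query="SELECT ...",
                         locations=["eu-west"], timestamps=None)
transform = DataTransformationStep(inputs=[fetch, "local_file.csv"],
                                   transformations=["normalize"])

# each step should run either locally or remotely on a queue as its own task
fetch.run(queue="on-premise")
transform.run(queue="on-premise")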
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I’ve been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
I think also the script path in the created task will cause some issues, but let’s see…
I'll see if we can do that still (as the queue name suggests, this was a POC, so I'm trying to fix things before they give up 😛 ).
Any other thoughts? The original thread https://clearml.slack.com/archives/CTK20V944/p1641490355015400 suggests this PR solved the issue
Hey @<1537605940121964544:profile|EnthusiasticShrimp49> ! You’re mostly correct. The Step classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the DataTransformationStep, there may be user-defined functions specified. That’s not a problem though, I can provide these functions with the helper_functions argument.
- The .add_function_step is indeed a failing point. I can’t really create a task from the notebook because calling `Ta...
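For the user-defined functions, the helper_functions route mentioned above would look roughly like this (step and function names invented; I’m assuming add_function_step packs the listed helpers into the generated step):

from clearml import PipelineController

def custom_transform(data):
    return data

def run_transformation(inputs):
    return custom_transform(inputs)

pipe = PipelineController(name="poc-pipeline", project="poc", version="0.0.1")
pipe.add_function_step(
    name="transform",
    function=run_transformation,
    function_kwargs={"inputs": "local_file.csv"},
    helper_functions=[custom_transform],
)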
I’ll give the create_function_task one more try 🤔
See e None @<1523701087100473344:profile|SuccessfulKoala55>
Dynamic pipelines in a notebook, so I don’t have to recreate a pipeline every time a step is changed 🤔
Then I wonder:
- How to achieve this? The pipeline controller seems to only work with functions, not classes, so running smaller steps remotely seems more difficult than I imagined. I was already prepared to upload artifacts myself etc, but now I’m not sure?
- Do I really need to recreate the pipeline every time from scratch? Or can I remove/edit steps? It’s mostly used as a… controller for notebook-based executions and experimentations, before the actual pipeline is known. That is, it will ...
Thanks @<1537605940121964544:profile|EnthusiasticShrimp49> ! That’s definitely the route I was hoping to go, but the create_function_task is still a bit of a mystery, as I’d like to use an entire class with relevant logic and proper serialization for inputs, and potentially I’ll need to add more “helper functions” (as in the case of DataTransformationStep, for example). Any thoughts on that? 🤔
We're wondering how many on-premise machines we'd like to deprecate. For that, we want to see how often our "on premise" queue is used (how often a task is submitted and run), for how long, how many resources it consumes (on average), etc.
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
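Roughly what I hoped to do via the API instead of Mongo (the queue name is ours; the exact fields/filters here are an assumption on my side):

from clearml.backend_api.session.client import APIClient

client = APIClient()
queues = client.queues.get_all(name="on-premise")
tasks = client.tasks.get_all(
    status=["completed", "failed", "stopped"],
    only_fields=["id", "started", "completed", "execution.queue"],
)
# aggregate started/completed durations per queue from here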
An internal project I've accidentally made with a hidden tag while playing around with the ClearML internal code.
Where do I import this APIClient from, AgitatedDove14? I meanwhile edited it directly in Mongo, but editing a DB directly on a Friday is a big no-no
I'd like to remove the hidden system tag from a project
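This is what I edited in Mongo; the same thing through the APIClient would presumably look something like this (project id is a placeholder, and I’m assuming projects.update accepts system_tags):

from clearml.backend_api.session.client import APIClient

client = APIClient()
project_id = "<project-id>"
project = client.projects.get_all(id=[project_id])[0]
new_tags = [t for t in (project.system_tags or []) if t != "hidden"]
client.projects.update(project=project_id, system_tags=new_tags)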
One way to circumvent this btw would be to also add/use the --python flag for virtualenv
Still failing with 1.2.0rc3 😞 AgitatedDove14 any thoughts on your end?