So, for example, if I have a machine with 64 CPU cores, I will be able to run up to 64 agents (in case my system consists of only this machine), right?
Thanks for helping. You and your team are doing a great job for the ML community.
Hi AgitatedDove14, great, glad it was fixed quickly!
By the way, before releasing version 1.1.3 you might want to take a look at this mock example. I'm trying to run the same pipeline (with different configurations) in a single for loop, as you can see below:
` from clearml import Task
from clearml.automation.controller import PipelineDecorator
@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    re...
Sure, converting pipelines into components also works for me (apart from still having to fix the problem with LazyEvalWrapper return values). But this way some interesting features of the pipeline are lost, such as displaying the step execution DAG in the PLOTS tab.
Then shouldn't ClearML also detect the dependencies of the imported scripts? In this case, shouldn't it detect that I am going to use tensorflow and install it as well? Because it is not actually recognizing it.
Well, I tried several things, but none of them worked. I'm a bit lost.
Mmm what would be the implications of not being part of the DAG? I mean, how could that step be launched if it is not part of the execution graph?
Yes, I like it! I had already gotten used to the 'execute_steps_as_functions' argument of PipelineDecorator.debug_pipeline(), but I find your proposal more intuitive.
Indeed it does! But what still puzzles me is why I get the path below when running dataset.get_local_copy() on one of the machines of my cluster:
/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_61ff8d4335dd4b74bd78c3576fa44131.clearml
Why is it pointing to a .lock file?
BTW, how can I run 'execute_orchestrator' concurrently? That is, launch it for several configurations at the same time? The way it's implemented now, it doesn't start the next configuration until the current one is finished.
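One possible way to get that concurrency is to start each configuration in its own Python process. The following is only a minimal sketch: `launch_concurrently`, the `configs` names, and the inline `-c` command are hypothetical stand-ins, and it assumes that each pipeline run can live in a separate process (that assumption has not been checked against the ClearML documentation).

```python
import subprocess
import sys


def launch_concurrently(commands):
    """Start every command as its own process, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    # Popen returns immediately, so all processes run side by side.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until each process has finished.
    return [proc.wait() for proc in procs]


# Hypothetical configurations; in a real setup each command would run the
# script that calls execute_orchestrator for one configuration.
configs = ["config_a", "config_b", "config_c"]
commands = [
    [sys.executable, "-c", f"print('running pipeline for {name}')"]
    for name in configs
]

exit_codes = launch_concurrently(commands)
```

A non-zero entry in `exit_codes` would indicate that the corresponding configuration failed.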
Exactly, at first I was trying to call a component from another component, but it didn't work. Then I thought it would be more natural to do this using a pipeline, but it didn't recognize the user_config_creation function even though I imported it as I would under PipelineDecorator.component. I really like the idea of adding an argument to specify the components you are going to use in the pipeline so they are in the step's context! I will be eagerly waiting for that feature :D
Sure, but I mean, apart from labeling it as a local path, what's the point of renaming the original path if my goal is to access it later using the name I gave it?
Mmm that's weird. Because I can see the type hints in the function's arguments of the automatically generated script. So, maybe I'm doing something wrong or it's a bug, since they have been passed to the created step (I'm using clearml version 1.1.2 and clearml-agent version 1.1.0).
Hi AgitatedDove14,
Any updates on the new ClearML release that fixes the bugs we mentioned in this thread? :)
Well, I need to write boilerplate code to do parsing stuff if I want to use the original values after I connect the dictionary to the task, so it's a bit messy.
Currently I'm using clearml v1.0.5 and clearml-agent v1.0.0
I see, but I don't understand the part where you talk about passing the task ID to the child processes. Sorry if it's something trivial. I recently started working with ClearML.
SuccessfulKoala55 I have not tried with argparse yet, but I may run into the same problem.
I don't know if you remember the need I had some time ago to launch the same pipeline through configuration. I've been thinking about it and I think PipelineController fits my needs better than PipelineDecorator in that respect.
For some reason I can't get the values in their original types. Only the dictionary keys are returned as the raw nested dictionary, but the values remain cast to strings.
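The parsing boilerplate mentioned in this thread could look roughly like the sketch below. It assumes the values come back as strings holding Python literals; `restore_types` is a hypothetical helper, not part of ClearML, and the sample dictionary is made up for illustration.

```python
import ast


def restore_types(params: dict) -> dict:
    """Recursively convert string values back to Python literals.

    Hypothetical helper: strings that are not valid literals
    (for example plain names) are left untouched.
    """
    restored = {}
    for key, value in params.items():
        if isinstance(value, dict):
            # Walk into nested dictionaries.
            restored[key] = restore_types(value)
        elif isinstance(value, str):
            try:
                restored[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Not a literal, so keep the original string.
                restored[key] = value
        else:
            restored[key] = value
    return restored


# Made-up example of a dictionary whose values were cast to strings.
casted = {
    "model": {"lr": "0.001", "layers": "[64, 32]", "name": "resnet"},
    "debug": "True",
}
restored = restore_types(casted)
```

One caveat of this approach: a value that was originally a string but happens to look like a literal (for example "123") would also be converted.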
Sure, it's already enabled. I noticed another parameter related to environment caching in the ClearML agent configuration, named venv_update (I believe it's still in beta). Do you think enabling this parameter helps build environments significantly faster?
Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.component. Anyway, could another task be initialized in the same scr...
Since there is still time, I would like to report another minor bug related to the 'add_pipeline_tags' parameter of PipelineDecorator.pipeline. It turns out that when the pipeline consists of components that in turn use other components (via 'helper_functions'), these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this is not the intended behavior, right?
Are you suggesting just taking the read_and_process_file function out of the read_dataset method, or maybe decoupling the read_dataset method from the NetCDFReader class so it is not pickled along with the class instance itself?
As for the second option, do you mean creating the task in the __init__ method of the NetCDFReader class?
It would be a great idea to make the Task picklable, since at the moment what are the most frequently used options for integrating ...
Well, this is just a mock example 🙂. In the real application I'm working on there will be more than one configuration file (in principle one for the data and one for the DL model). Regarding the fix, I am not in a hurry at the moment. I'll happily wait for tomorrow (or the day after) when the commit is pushed!
How can I tell clearml that I will use the same virtual environment in all steps, so there is no need to waste time re-installing all packages for each step?
Mmm I see. So the agent is taking the parameters from the base task registered in the server. Then if I call task.get_parameters_as_dict for a task that has not been executed by an agent, should I get the original types of the values?
Thanks AgitatedDove14! Wow, I was definitely not expecting that behavior 🤣 I will check it out tomorrow. Just one more thing, what do you mean by "my_task_id_that_i_generated_before_here"?
Hi AgitatedDove14, so isn't it a ClearML best practice to create a draft pipeline so the task is on the server and can be cloned, modified and executed at any time?
Exactly!! That's what I was looking for: creating the pipeline without launching it. Thanks again AgitatedDove14