
I called task.wait_for_status() to make sure the task is done
This is the issue; I will make sure wait_for_status() calls reload() at the end, so that when the function returns you have the updated object.
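In the meantime, a minimal workaround sketch (the task id is a placeholder):
```python
from clearml import Task

# a sketch: explicitly reload after waiting, until wait_for_status() does it for you
task = Task.get_task(task_id="<your-task-id>")  # placeholder task id
task.wait_for_status()  # blocks until the task reaches a final status
task.reload()           # refresh the local object with the server-side state
print(task.status)
```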
What about calling Task.init without the agent?
Is it ClearML best practice to create a draft pipeline, so the task is on the server and can be cloned, modified and executed at any time?
Well it is, we just assume that you executed the pipeline somewhere (i.e. made sure it works) 🙂
Correction:
What you are actually looking for (and I will make sure we have it in the docs) is pipeline.start(queue=None)
It will just leave it as is, so you can manually enqueue / clone it 🙂
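For example, a sketch (project, pipeline name and step are assumptions):
```python
from clearml import PipelineController

# build the pipeline, then register it as a draft instead of enqueuing it
pipe = PipelineController(name="my-pipeline", project="examples", version="0.1")
pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="step1 task",  # assumed existing task to clone
)
pipe.start(queue=None)  # leaves the pipeline as a draft on the server
```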
Hi ObedientDolphin41
I keep bumping against the ModuleNotFoundError: No module named ... exception.
Import the package inside the component function (the one you decorated); that will make sure it is automatically listed in the requirements section.
You can also set it manually by passing it as the "packages" argument on the decorator function:
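For example (a sketch; the component and package names are assumptions):
```python
from clearml.automation.controller import PipelineDecorator

# list the required packages explicitly on the component decorator
@PipelineDecorator.component(return_values=["n_rows"], packages=["pandas>=1.0"])
def preprocess(csv_path):
    # importing inside the function body is also picked up automatically
    import pandas as pd
    return len(pd.read_csv(csv_path))
```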
If I directly edit the OmegaConf in the UI, then the port changes correctly
This will only work if you set Hydra/allow_omegaconf_edit to True in the UI. Did you?
By default the remote task (i.e. the Task you are creating with Task.create) will have all the auto-logging turned on.
For finer control we kind of assume you have Task.init inside your remote script, and then you can just pass add_task_init_call=False
Does that make sense?
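For example, a sketch (the repo and script are placeholders):
```python
from clearml import Task

# create a remote task whose script already calls Task.init itself
task = Task.create(
    project_name="examples",
    task_name="remote-run",
    repo="https://github.com/<org>/<repo>.git",  # placeholder repository
    script="train.py",                           # placeholder entry point
    add_task_init_call=False,  # do not inject an extra Task.init call
)
```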
Do you think we should have a way to configure those auto_connect args when creating the Task?
Hi ScrawnyCrocodile51
Will the docker container / disk space (really I am more interested in the dataset downloaded by the task) get automatically cleaned up?
Yes, the agent is running the container with --rm 🙂
Hi ScaryJellyfish75
These hyperparameters are now in the "Args" section of my ClearML task
Sure, that would probably mean:
```python
UniformParameterRange(
    "Args/training/optimizer/lr",
    min_value=0.00025,
    max_value=0.01,
    step_size=0.00025,
),
```
assuming your Task has training/optimizer/lr in its Args section (under the Configuration tab). Make sense?
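For context, a minimal sketch of where that range plugs in (the task id and metric names are assumptions):
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<your-training-task-id>",  # placeholder id
    hyper_parameters=[
        UniformParameterRange(
            "Args/training/optimizer/lr",
            min_value=0.00025, max_value=0.01, step_size=0.00025,
        ),
    ],
    objective_metric_title="validation",  # assumed metric title
    objective_metric_series="loss",       # assumed metric series
    objective_metric_sign="min",
)
optimizer.start()  # clones the base task and enqueues the trials
```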
Is this a common case? Maybe we should change the run_pipeline_steps_locally argument to default to False?
(The idea of run_pipeline_steps_locally=True is that it makes it easier to debug the entire pipeline on the same machine)
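For reference, a sketch of the flag in use (pipeline name and step are assumptions):
```python
from clearml import PipelineController

pipe = PipelineController(name="debug-pipeline", project="examples", version="0.1")
pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="step1 task",  # assumed existing task
)
# run the controller on this machine; with run_pipeline_steps_locally=True the
# steps also run locally, which makes debugging the whole pipeline easier
pipe.start_locally(run_pipeline_steps_locally=True)
```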
corporate firewall... let's start with http 🙂
Okay, here is standalone code that should be close enough (if I missed anything let me know):
```python
import tempfile
from datetime import datetime
from pathlib import Path

import tensorflow as tf
import tensorflow_datasets as tfds
from clearml import Task

task = Task.init(project_name="debug", task_name="test")

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    # scale uint8 pixel values [0, 255] to float32 [0, 1]
    return tf.cast(image, tf.float32) / 255., label
```
Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json on every checkpoint:
```python
Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json')
```
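A runnable sketch of doing that inside a (simulated) training loop:
```python
import json
from clearml import Task

task = Task.init(project_name="debug", task_name="upload-state")

# simulate a loop that rewrites trainer_state.json on every "checkpoint"
for step in range(3):
    with open('trainer_state.json', 'w') as f:
        json.dump({'global_step': step}, f)
    # re-uploading under the same name replaces the previous artifact
    task.upload_artifact(name='state', artifact_object='trainer_state.json')
```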
Thanks MagnificentSeaurchin79, yes that makes it clear.
If that is the case, I think building a container is the easiest solution 🙂
(BTW: you could also build a wheel; if you have a setup.py, running python setup.py bdist_wheel once will build the wheel, which you can then install)
Yes, no reason to attach the second one (imho)
Right, DepressedChimpanzee34, what's the clearml version you are using?
(since you are using venv mode, if CUDA is not detected at startup time the agent will not install the GPU version, as it assumes there is no CUDA support)
Hi JitteryCoyote63
What do you have in agent.cuda_version?
(you can see it printed at the beginning of the log)
HealthyStarfish45
Is there a way to tell a worker that it should not take new tasks? If there is such a feature, one could avoid the race condition
Still undocumented, but yes, you can tag it as disabled.
Let me check exactly how.
HealthyStarfish45 if I understand correctly, the trains-agent is running as a daemon (i.e. automatically pulling jobs and executing them); the only caveat is that cancelling the daemon will cause the Task currently executed by it to be cancelled as well.
Other than that, sounds great!
Hi OutrageousSheep60
Is there a way to instantiate a clearml-task while providing it a Dockerfile that it needs to build prior to executing the task?
Currently not really, as in the end the agent does need to pull a pre-built container.
But you can achieve basically the same thing by passing the "dockerfile" content as a --docker_bash_setup_script.
Notice of course that this is an actual bash script, not a Dockerfile, so there is no need for the "RUN" prefix.
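For example (a sketch; package names, file names and queue are assumptions):
```bash
# setup.sh - the Dockerfile's RUN lines rewritten as plain bash (hypothetical packages)
apt-get update && apt-get install -y libsndfile1
pip install soundfile
```
```bash
clearml-task --project examples --name docker-setup-demo \
    --script train.py --docker python:3.9 \
    --docker_bash_setup_script setup.sh --queue default
```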
wdyt?
oh dear ...
ScrawnyLion96 let me check with the front-end guys 😞