My apologies, let me rephrase:
If you are using pip as the package manager and not running in docker mode, trains-agent
cannot touch the CUDA/cuDNN driver libraries (the actual .so files).
If you want to verify you can check echo $LD_LIBRARY_PATH
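As a rough sketch of that check, the snippet below walks the folders listed in LD_LIBRARY_PATH and reports any CUDA/cuDNN shared objects it finds. The helper name find_cuda_libs and the file name patterns (libcudart*, libcudnn*) are assumptions for illustration, not something the agent itself uses.

```python
import glob
import os


def find_cuda_libs(ld_library_path=None):
    """Return CUDA/cuDNN shared objects found in the LD_LIBRARY_PATH folders."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for folder in ld_library_path.split(os.pathsep):
        if not folder:
            continue  # skip empty entries such as a trailing ":"
        # Assumed naming patterns for the CUDA runtime and cuDNN libraries
        for pattern in ("libcudart*.so*", "libcudnn*.so*"):
            found.extend(sorted(glob.glob(os.path.join(folder, pattern))))
    return found


if __name__ == "__main__":
    for path in find_cuda_libs():
        print(path)
```

An empty result only means nothing matched in those folders; the libraries may still be installed in a default system location.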
GiddyTurkey39 I think I need some more details, what exactly is the scenario here?
Ohh, the controller task itself holds the artifacts ?
I mean the Python package, not the trains-server version
Yes, that makes sense. But did you see the callback being executed? It seems it was supposed to run, and then the next call would have been 2:30 hours later. Am I missing something?
GrievingTurkey78 where do you see this message? Can you send the full server log?
EnviousStarfish54
and the 8 charts are actually identical
Are you plotting the same plot 8 times?
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline will launch a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step function or are you decorating a function with PipelineDecorator?
Yes, it's a bit confusing. The gist of it is that we wanted the ability to have different configurations for different buckets
(you can find it in the pipeline component page)
EnviousStarfish54 Notice that you can configure it on the agent machine only, so in development you are not "wasting" storage when uploading debug checkpoints/models
The problem is that even when I mount the SSH key into the root home directory (e.g., /root/.ssh/id_rsa with the correct permissions set to 400) I still encounter the same error.
The agent automatically mounts the .ssh folder from the host into the container, making sure all the permissions are set.
how can I run pip install -e .
In general the agent will add the "working" dir to the PYTHONPATH, so you should not have to manually do "-e ."
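To see why a folder on PYTHONPATH removes the need for an editable install, here is a small sketch that tries to import a module in a fresh interpreter with a given working directory prepended to PYTHONPATH. The helper name importable_with_pythonpath is hypothetical, and this only mimics the effect described above; it is not the agent's own code.

```python
import os
import subprocess
import sys


def importable_with_pythonpath(module_name, working_dir):
    """Check whether `module_name` imports when `working_dir` is on PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend the working dir, keeping whatever was already configured
    env["PYTHONPATH"] = (
        working_dir if not existing else working_dir + os.pathsep + existing
    )
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0
```

If the package sits directly under the working dir, the import succeeds without any "pip install -e ." step.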
Tha...
MagnificentSeaurchin79
Do notice that the pipeline controller assumes you have an agent running
Hi WickedGoat98 ,
I think you are correct
I would guess it is something with the ingress configuration (i.e. ConfigMap)
Hi JitteryCoyote63 ,
These properties are usually not available in the UI and are used internally, hence the lack of documentation. Regarding the parent property, it will hold a parent Task.id (str). That said, it has no real effect on the Task itself. You can, however, search for Tasks with a specific parent ID (for example, this is how the hyperparameter class is using this property)
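A minimal sketch of such a search, assuming the clearml package is installed and configured, and assuming the task_filter argument of Task.get_tasks accepts a "parent" field (this mirrors the backend query fields; treat it as an assumption). The helper names are hypothetical.

```python
def parent_filter(parent_task_id):
    """Build a task filter selecting Tasks whose parent is `parent_task_id`."""
    return {"parent": parent_task_id}


def find_child_tasks(parent_task_id):
    """Query the server for Tasks with the given parent ID.

    Needs the clearml package and valid credentials, so the import is done
    lazily and the filter helper above stays usable without them.
    """
    from clearml import Task

    return Task.get_tasks(task_filter=parent_filter(parent_task_id))
```

Calling find_child_tasks("<parent_task_id>") would then return the Task objects that list that ID as their parent.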
But this is the clearml Python package; it is not really related to the server. Could it be you also updated the clearml package?
Why can't it be updated after creation?
You can but then you have to rerun it again. I mean technically this is obviously solvable, but the idea was to make it simple to use, and since we "assume" in most cases there is a single Task per execution, it made sense. wdyt?
Hi EnthusiasticCoyote38
Does clearml-agent has option
Fully supported
Should work out of the box, it will always clone with --recursive and will bring all submodules
I want to optimize hyperparameters with trains.automation but: ...
Yes, you are correct: in the case of the example code it should be "General/...", and if you have ArgParser it should be "Args/...". Yes, it looks like the metric is wrong; it should be "epoch_accuracy" & "epoch_accuracy"
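The naming rule can be sketched as a tiny helper. The function name qualified_name and the parameter name batch_size are made up for illustration; the objective_metric_title / objective_metric_series keys are assumed to be the optimizer's title and series arguments.

```python
def qualified_name(name, uses_argparse):
    """Prefix a hyperparameter name with the section it is stored under."""
    section = "Args" if uses_argparse else "General"
    return section + "/" + name


# Title and series of the objective metric, both "epoch_accuracy" here
objective = {
    "objective_metric_title": "epoch_accuracy",
    "objective_metric_series": "epoch_accuracy",
}

if __name__ == "__main__":
    print(qualified_name("batch_size", uses_argparse=True))
    print(qualified_name("batch_size", uses_argparse=False))
```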
Can you verify by adding the following to your extra_docker_shell_script:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L152
extra_docker_shell_script: ["echo machine example.com > ~/.netrc", "echo login MY_USERNAME >> ~/.netrc", "echo password MY_PASSWORD >> ~/.netrc"]
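For reference, here is a small sketch that builds those three shell lines from variables and checks that the resulting file content parses as a valid .netrc entry with Python's standard netrc module. The helper names are hypothetical, and values containing spaces or shell metacharacters would need quoting, which this sketch does not handle.

```python
import netrc
import os
import tempfile


def netrc_shell_lines(machine, login, password):
    """Build the echo commands that write a one-entry ~/.netrc."""
    return [
        "echo machine {} > ~/.netrc".format(machine),
        "echo login {} >> ~/.netrc".format(login),
        "echo password {} >> ~/.netrc".format(password),
    ]


def parse_netrc_entry(machine, login, password):
    """Write the same three lines to a temp file and parse them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as handle:
            handle.write(
                "machine {}\nlogin {}\npassword {}\n".format(machine, login, password)
            )
        # Returns a (login, account, password) tuple for the machine
        return netrc.netrc(path).authenticators(machine)
```

The list returned by netrc_shell_lines("example.com", "MY_USERNAME", "MY_PASSWORD") matches the value shown in the configuration line above.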
@<1523710674990010368:profile|GreasyPenguin14> If I understand correctly you can use tokens as user/pass (it's basically the same interface from the git client's perspective, and so from ClearML's):
git_user = gitlab-ci-token
git_pass = <the_actual_token>
WDYT?
Yes (Mine isn't and it is working)
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
If you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a filename does the same thing, but this is more readable)
I see the problem now: conda is failing to install the package from the git, then it reverts to pip install, and pip just fails... " //github.com/ajliu/pytorch_baselines "
I think that what you need is the triggers, check this one:
https://clear.ml/docs/latest/docs/references/sdk/trigger
Thanks ShortElephant92! PR looks good, I'll ask the guys to take a look
Okay, make sure that in your trains.conf on all the trains-agent machines you add the following:
agent.extra_docker_arguments: ["-v", "/etc/hosts:/etc/hosts",]