The latest TAO doesn't use Python for fine-tuning; it uses the CLI entirely.
It's a good question, but I think the CLI actually just runs Python code (the CLI is their interface). Generally speaking, I'm pretty sure it won't be complicated to convert the TLT integration to support TAO (Nvidia helps with that, and I think we had a similar process with Nvidia Clara/MONAI).
BTW, how are you using Nvidia TAO?
And if you add --skip-task-init?
I think what happens is that clearml-task adds a Task.init call without the output_uri, and that call runs before "your" Task.init, which is why your output_uri is ignored. Could that be the case?
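For reference, a minimal sketch of building that launch command from Python. `build_clearml_task_cmd` is a hypothetical helper; apart from `--skip-task-init` (mentioned above), it only uses the commonly documented `--project`/`--name`/`--repo`/`--script`/`--queue` options, so double-check `clearml-task --help` for your version. The assumption is that with the flag set no extra Task.init is injected, so the script's own `Task.init(..., output_uri=...)` is the one that takes effect.

```python
import shlex
from typing import List, Optional


def build_clearml_task_cmd(
    project: str,
    name: str,
    repo: str,
    script: str,
    queue: Optional[str] = None,
    skip_task_init: bool = True,
) -> List[str]:
    """Build a clearml-task argv list (hypothetical helper, not part of ClearML)."""
    cmd = [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--repo", repo,
        "--script", script,
    ]
    if queue:
        cmd += ["--queue", queue]
    if skip_task_init:
        # Ask clearml-task not to inject its own Task.init(); the script is
        # expected to call Task.init(..., output_uri=...) itself.
        cmd.append("--skip-task-init")
    return cmd


if __name__ == "__main__":
    cmd = build_clearml_task_cmd(
        project="examples",
        name="train",
        repo="https://github.com/<org>/<repo>.git",  # placeholder repo URL
        script="train.py",
        queue="default",
    )
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch it; keeping it as a list avoids shell-quoting problems.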
SmallAnt76
See https://clear.ml/pricing/, under "What plan should I choose?"
What you are looking for is the first column, "open-source". Makes sense?
I located the issue; I'm assuming the fix will be in the next RC
(probably tomorrow or before the weekend)
Hi ShallowArcticwolf27
from the command line to a remote machine while loading a local .env file as a configuration object?
Where would the .env file go? Are we trying to pass it to the remote machine somehow?
MistakenBee55, how about a Task that does the model quantization, and then triggering it with TriggerScheduler?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
Hi @<1570220858075516928:profile|SlipperySheep79>
Is there a way to specify the working dir from the decorator?
Not directly, but why would that change anything? I mean, the component code will be created in the git root, and you can still access files inside the subfolders:
from .subfolder import something
what am I missing?
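If the component needs a data file from a subfolder (rather than a module import), one option is to resolve paths against the repository root instead of relying on the working directory. A minimal sketch under that assumption; `find_repo_root` and `resolve_in_repo` are hypothetical helpers, not ClearML API, and they simply walk up the directory tree looking for a `.git` folder.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk up from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None  # not inside a git checkout


def resolve_in_repo(relative: str, start: Optional[Path] = None) -> Path:
    """Resolve a repo-relative path such as 'subfolder/config.yaml'."""
    root = find_repo_root(start or Path.cwd())
    if root is None:
        raise FileNotFoundError("not inside a git repository")
    return root / relative
```

This keeps file access independent of whichever directory the component happens to be executed from.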
I would like to put a table with URL links and image thumbnails.
StraightParrot3, links will work inside a table (your code sample looks like the correct way to add them), but I think plotly (the UI package that displays the table) does not support embedding images into tables.
When they add it, the support will be transparent and it will work as you expect.
HugeArcticwolf77, from the CLI you cannot control it (but we could probably add that); from code you can:
https://github.com/allegroai/clearml/blob/d17903d4e9f404593ffc1bdb7b4e710baae54662/clearml/datasets/dataset.py#L646
pass compression=ZIP_STORED
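ZIP_STORED is the standard-library `zipfile` constant for "no compression, bytes copied as-is". A small self-contained sketch showing the difference from ZIP_DEFLATED; the ClearML call at the end is left as a comment and assumes the linked line is the `compression` argument of `Dataset.upload()`.

```python
import io
import zipfile


def make_zip(payload: bytes, compression: int) -> bytes:
    """Write a single-member archive in memory using the given compression mode."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr("data.bin", payload)
    return buf.getvalue()


payload = b"0" * 100_000  # highly compressible sample data

stored = make_zip(payload, zipfile.ZIP_STORED)      # no compression at all
deflated = make_zip(payload, zipfile.ZIP_DEFLATED)  # DEFLATE (zlib) compression

print(len(stored), len(deflated))

# In ClearML the same constant would be passed when uploading a dataset
# (not executed here):
#   from zipfile import ZIP_STORED
#   dataset.upload(compression=ZIP_STORED)
```

Skipping compression mostly pays off for data that is already compressed (images, video, archives), where DEFLATE costs CPU time without shrinking much.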
Yes, the container level (when these docker shell scripts run).
I think this is the tricky part: in code you can access the user ID of the Task, download the .env and apply it, but I can't really think of a way to do that before the process starts...
That said, I think the paid version has "vault" support, which lets you store the .env file on the clearml-server; the agent then automatically applies it at the beginning of the container execution.
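For the in-code route (download the .env, then apply it), a minimal sketch of the "apply" step; how the file is fetched per user is left out. `parse_dotenv` and `apply_dotenv` are hypothetical helpers, not ClearML API, and the parser only handles simple KEY=VALUE lines (no variable expansion, inline comments, or multi-line values). Since it runs inside the process, it cannot affect anything that happens before the process starts.

```python
import os
from typing import Dict, MutableMapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_dotenv(text: str, env: Optional[MutableMapping[str, str]] = None,
                 override: bool = False) -> Dict[str, str]:
    """Apply parsed values to `env` (os.environ by default); return what was set."""
    target = os.environ if env is None else env
    applied: Dict[str, str] = {}
    for key, value in parse_dotenv(text).items():
        if override or key not in target:
            target[key] = value
            applied[key] = value
    return applied
```

By default, existing variables win, so values already set in the container are not silently replaced.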
Okay, I'll dig into it
EnviousStarfish54, a fix is already available in the latest RC.
Could you verify it solves your issue as well?
pip install trains==0.16.2rc0
JitteryCoyote63, maybe this is an old example of the PyTorch DDP code? It is basically copy-pasted from the PyTorch website:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
PompousParrot44, unfortunately not yet.
But the gist is:
MongoDB stores experiment data (e.g. execution parameters, git ref, etc.)
ElasticSearch stores results (e.g. metrics, console logs, debug image links, etc.)
Does that help?
for example train.py & eval.py under the same repo
Hi JitteryCoyote63
Or even better: would it be possible to have support for HTML files as artifacts?
If you report HTML files as debug media, they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
In the artifacts, I think HTML files are also supported (maybe not previewed as nicely, but clickable).
Regarding the s3 link, I think you are supposed to get a popup window as...
Hi @<1569496075083976704:profile|SweetShells3>
Try to do:
import torch.distributed as dist
from clearml import Task

if dist.get_rank() == 0:  # requires dist.init_process_group() to have been called
    task = Task.init(...)
This will make sure only the "master" process is logged
or
import os
if int(os.environ.get('RANK', 0)) == 0:  # RANK is unset in single-process runs
    task = Task.init(...)
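The two checks can also be combined into one helper that works whether or not the process group is initialized (or torch is even installed). A sketch; `is_main_process` is a hypothetical name, and it assumes the launcher (e.g. torchrun) exports RANK for each worker.

```python
import os
from typing import Mapping, Optional


def is_main_process(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only for global rank 0 (hypothetical helper).

    `env` is only consulted when no torch process group is initialized.
    """
    try:
        import torch.distributed as dist

        # Prefer the rank of an already-initialized process group.
        if dist.is_available() and dist.is_initialized():
            return dist.get_rank() == 0
    except ImportError:
        pass  # torch not installed: fall back to the environment
    env = os.environ if env is None else env
    # torchrun sets RANK for every worker; a plain single-process run has no
    # RANK, so treat that case as the main process.
    return int(env.get("RANK", "0")) == 0


if __name__ == "__main__":
    if is_main_process():
        print("rank 0: this is where Task.init(...) would go")
```

In the training script this becomes `if is_main_process(): task = Task.init(...)`, so only one experiment is logged per run.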
Hi IrritableOwl63
Yes, this seems like a Docker setup issue.
Either run the agent with sudo (not really recommended) or add your user to the docker group:
https://docs.docker.com/engine/install/linux-postinstall/
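To confirm the fix took effect before restarting the agent, a quick check of whether the current process already runs with the `docker` group. A sketch assuming a Linux host (the `grp` module is Unix-only); `user_in_group` is a hypothetical helper.

```python
import grp
import os


def user_in_group(group_name: str) -> bool:
    """Return True if the current process already runs with `group_name`'s GID."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # no such group on this machine
    # os.getgroups() reflects the current login session, so a freshly added
    # membership only shows up after logging out and back in (or `newgrp docker`).
    return gid == os.getgid() or gid in os.getgroups()


if __name__ == "__main__":
    print("docker group active:", user_in_group("docker"))
```

If this prints False right after adding the user, log out and back in (as the post-install page describes) before starting the agent again.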
1e876021bbef49a291d66ac9a2270705
Just make sure you reset it.
Ohh, yes, we need to map the correct clearml.conf, sorry. Try this (I fixed both the clearml.conf mapping and the .ssh folder mapping):
` docker run -t --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /home/dwhitena/clearml.conf:/root/clearml.conf -v /home/dwhitena/.ssh:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/arc...
Also, in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
where the Task ID is one of the failed executions (reset it first).
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically, you can just run the following, and it will do everything automatically (replace <TASK_ID_HERE> with the actual one):
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
GiganticTurtle0, can you please open a GitHub issue with a feature request on clearml-agent? I think this is a great use case!
I'm not sure if it matters, but 'kwcoco' is being imported inside one of the repo's functions and not in the script's header.
Should work.
When you run pip freeze inside the same env, what do you get?
Also, is there any other import that is missing? (Basically, 'clearml' tries to be smart: it checks whether the script itself, even though it is inside a repo, is actually importing anything from the repo, and if it is not, it will only analyze the original script. Basically...
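To illustrate why an import placed inside a function can still be picked up, here is a minimal sketch of static import scanning with Python's `ast` module. This is the general technique, not ClearML's actual implementation; `find_imported_modules` is a hypothetical helper.

```python
import ast
from typing import Set


def find_imported_modules(source: str) -> Set[str]:
    """Collect top-level module names from every import statement in `source`,
    including imports nested inside function bodies."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):  # walks the whole tree, not just the header
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import (from . import x), i.e. repo code
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


example = '''
import numpy as np

def load(path):
    import kwcoco  # imported lazily, inside the function
    return kwcoco.CocoDataset(path)
'''

print(sorted(find_imported_modules(example)))
```

Because the walk covers every node in the file, a function-level `import kwcoco` is found just like one at the top of the script.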
Could you give an example of such configurations?
(e.g. what would differ from one to another)