Hi GrotesqueOctopus42
Despite having reuse_last_task_id=True on Task.init, it always creates a new task ID. Has anyone ever had this issue?
So the way reuse_last_task_id=True works is: if there are no artifacts on the Task, it will be reused; but when running inside Jupyter there are always artifacts (the notebook itself), so a new Task is started.
You can, however, pass a specific Task ID and it will reuse it: reuse_last_task_id="aabb11". Would that help?
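As a sketch (assuming clearml is installed; the project/task names below are placeholders, not from the original thread), passing the specific ID could look like:

```python
def init_reusing_task(task_id: str):
    """Sketch: initialize a Task that reuses a specific Task ID
    instead of relying on the 'last task' lookup.
    project_name/task_name are placeholders."""
    from clearml import Task  # assumes clearml is installed
    return Task.init(
        project_name="examples",
        task_name="notebook-run",
        reuse_last_task_id=task_id,  # e.g. "aabb11"
    )
```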
Hi SmallDeer34
I need some help: what is the difference between the manual one and the automatic one?
From your previous log, this is the bash command executed inside the container. Can you try, step by step, to catch who/what is messing it up?
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/...
Hi DepressedChimpanzee34
How do I reproduce the issue ?
What are we expecting to get there ?
Is that a Colab issue or hyper-parameter encoding issue ?
But it does make me think: what if, instead of changing the optimizer, I launch a few workers that "pull" enqueued tasks, and then report values for them in such a way that the optimizer is triggered to collect the results? Would that be possible?
But this is Exactly how the optimizer works.
Regardless of the optimizer (OptimizerOptuna or OptimizerBOHB) both set the next step based on the scalars reported by the tasks executed by agents (on remote machines), then decide on the next set of para...
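A minimal sketch of that mechanism, assuming clearml (with the optuna extra) is installed; the hyper-parameter name, ranges, and metric title/series below are made up and must match whatever the base task actually reports:

```python
def build_optimizer(base_task_id: str):
    """Sketch: an optimizer whose next step is driven by the scalars
    that agent-executed tasks report back (names/ranges are placeholders)."""
    from clearml.automation import HyperParameterOptimizer, UniformParameterRange
    from clearml.automation.optuna import OptimizerOptuna

    return HyperParameterOptimizer(
        base_task_id=base_task_id,            # template Task the workers will clone
        hyper_parameters=[
            UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        ],
        objective_metric_title="validation",  # scalar title reported by the tasks
        objective_metric_series="loss",       # scalar series reported by the tasks
        objective_metric_sign="min",          # minimize the objective
        optimizer_class=OptimizerOptuna,      # or OptimizerBOHB
    )
```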
Hi RipeGoose2
When I'm using the set_credentials approach, does it mean the trains.conf is redundant?
Yes, this means there is no need for trains.conf; all the important stuff (i.e. server + credentials) you provide from code.
BTW: when you execute the same code (i.e. code with the set_credentials call) with an agent, the agent's configuration will override what you have there, so you will be able to run the Task later either on-prem or in the cloud without needing to change the code itself 🙂
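A sketch of the from-code approach (assuming the clearml package; in older trains-era versions the import is `from trains import Task`). The host URLs below are placeholders for your own deployment:

```python
def configure_from_code(key: str, secret: str):
    """Sketch: provide server + credentials from code so no conf file
    is needed. All host URLs are placeholders."""
    from clearml import Task  # older versions: from trains import Task
    Task.set_credentials(
        api_host="https://api.example.com",
        web_host="https://app.example.com",
        files_host="https://files.example.com",
        key=key,
        secret=secret,
    )
```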
Is there a way to capture uncommitted changes with Task.create just like Task.init does? Actually, I would like to populate the repo, branch and packages automatically...
You can pass a local repo path to Task.create; I "think" it will also store the uncommitted changes.
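A sketch of that suggestion (assuming clearml is installed; project/task names and the entry-point script are placeholders, not from the thread):

```python
def create_task_from_local_repo(repo_path: str):
    """Sketch: create a Task from a local repo path so repo/branch
    (and possibly uncommitted changes) are picked up automatically."""
    from clearml import Task  # assumes clearml is installed
    return Task.create(
        project_name="examples",      # placeholder
        task_name="from-local-repo",  # placeholder
        repo=repo_path,               # local path, e.g. "."
        script="train.py",            # placeholder entry point inside the repo
    )
```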
I start my main task like this: python my_script.py --myarg "myargs". How are the arguments captured?
At runtime when argparse is called.
You can use ` clea...
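To illustrate the timing with plain argparse (no ClearML needed; the argument name mirrors the question above): nothing is visible until parse_args() actually runs, which is the call ClearML hooks to record the values:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--myarg", type=str, default="")

# Up to this point nothing has been captured; the values only become
# visible when parse_args() is called (here we simulate the CLI input):
args = parser.parse_args(["--myarg", "myargs"])
print(args.myarg)  # -> myargs
```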
Hi UnsightlySeagull42
does anyone know how this works with git ssh credentials?
These will be taken from the host ~/.ssh folder
Hi @<1560798754280312832:profile|AntsyPenguin90>
The image itself is uploaded in a background process; flush just triggers the start of that process.
Could it be that it is showing a few seconds after?
BTW: how is it missing torch in the listing? Do you have "import torch" in the code?
It might be that the worker was killed before it unregistered; you will see it there, but the last update will be stuck (after 10 min it will be automatically removed).
JitteryCoyote63 how can I reproduce it? (obviously when I tested it was okay)
The Overview panel would be extremely well suited for the task of selecting a number of projects for comparing them.
Could you elaborate ?
Another useful feature would be to allow adding information (e.g. metrics or metadata) to the tooltip.
You mean are we still talking about the "Overview" Tab?
What are user properties?
Think of them as parameters you can add post execution, that you can also add to the Task table (i.e. customize columns)
how can I add parameters
task.set_user_properties([{"name": "backbone", "description": "network type", "value": "great"}])
If possible, I would like to bypass the fileserver altogether and write everything to S3 (without needing every user to change their config).
There is no current way to "globally" change the default files server (I think this is part of the enterprise version, alongside vault etc.).
What you can do is use an OS environment variable to override the conf file: CLEARML_FILES_HOST="..."
PricklyRaven28 wdyt?
Will using Model.remove completely delete from storage as well?
Correct, see the argument delete_weights_file=True.
I solved the issue by implementing my own ClearML logger
This is awesome! any chance you want to PR it to transformers ?
does this work for multiple levels?
Yep 🙂
GreasyPenguin14 you mean the artifacts/models ?
odd message though ... it should have said something about boto3
Please go ahead with the PR 🙂
1.
One reason I don't like using the configuration section is that it makes debugging much much harder.
Debugging? Please explain how it relates to the configuration and presentation (i.e. preview).
2.
Yes in theory, but in your case it will not change things, unless these "configurations" are copied on any Task (which is just storage, otherwise no real harm)
3.
I was thinking "zip" file that the Task creates and uploads, and a new configuration type, say "external/zip" , and in the c...
Any chance you can test with the latest RC ? 1.8.4rc2
Hmm do you host it somewhere? Is it pre-installed on the container?
Hi GreasyPenguin14
However the cleanup service is also running in a docker container. How is it possible that the cleanup service has access and can remove these model checkpoints?
The easiest solution is to launch the cleanup script with a mount point from the storage directory to inside the container ( -v <host_folder>:<container_folder> ).
The other option, which clearml version 1.0 and above supports, is using the Task.delete, that now supports deleting the artifacts and mod...
Hit Ctrl-F5 (reload the page). Do you still get the same error? Is it limited to a specific experiment?
SlipperyDove40 I just installed a fresh copy of py3.6 and plotly on Ubuntu; the entire venv dir is ~86MB.
ProudMosquito87 I think this is what you are looking for: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L101