Fully automatic: just have them defined and call Task.init, and everything else will work out of the box.
Note that the environment variables will override clearml.conf, so you can have a clearml.conf with other default values inside the container and have the environment override those definitions.
(Not to worry, it is not a must to have a clearml.conf, it's just a nice way to add default values.)
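A minimal sketch of the setup described above, assuming "them" refers to the standard CLEARML_* environment variables for the server address and credentials. The helper names `missing_clearml_env` and `start_task` are hypothetical, and the project/task names you pass in are placeholders.

```python
import os

# Environment variables assumed to carry the server address and credentials.
CLEARML_ENV_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_env(environ=None):
    """Return the names from CLEARML_ENV_VARS that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in CLEARML_ENV_VARS if not environ.get(name)]


def start_task(project_name, task_name):
    """Call Task.init; values set in the environment win over clearml.conf."""
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import Task

    missing = missing_clearml_env()
    if missing:
        # Not an error: anything missing falls back to clearml.conf, if one exists.
        print("Falling back to clearml.conf for:", ", ".join(missing))
    return Task.init(project_name=project_name, task_name=task_name)
```

Inside a container you would typically export the variables (for example with `docker run -e ...`) and then call `start_task(...)` at the top of the script.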
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case, you can actually remove them on your git branch; a submodule is just a file with a soft link. Wdyt?
Because submodules inside a git repo are basically a requirement for the repo to run, skipping over a few or selecting them manually will break the agent. That said, a shallow clone might be easier or faster. Regardless, it should be an environment passed per Task. Feel free to add a GitHub issue request; if this is not a unique edge case, we will add it.
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW, the git repo itself is cached, so the second time there is no actual pull. Lastly, it's not clear where one could pass a git argument per Task. Wdyt?
Omg that's a lot of submodules!
It has nothing to do with what the Task sees: if you are inside a git repo, you will have to clone it on the remote machine. Let me check in the code, maybe there is a workaround.
Let me check... I think you might need to docker exec
Anyhow, I would start by upgrading the server itself.
Sounds good?
I double-checked the code, it's always being passed 😞
It manages the scheduling process, so there is no need to package your code or worry about building dockers, etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs you have in the execution queue and the limit of your budget (obviously spinning down machines that are idle).
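To make the autoscaling idea concrete, here is a rough sketch of the decision logic. This is not the ClearML autoscaler's code: the function names are hypothetical, the budget is modeled simply as a maximum number of instances, and each queued job is assumed to need one instance.

```python
def instances_to_launch(queued_jobs, running_instances, max_instances):
    """How many new instances to spin up for the jobs waiting in the queue."""
    # Never exceed the budget, modeled here as a cap on the instance count.
    budget_left = max(max_instances - running_instances, 0)
    return min(queued_jobs, budget_left)


def instances_to_terminate(idle_minutes_by_instance, max_idle_minutes):
    """Ids of instances that have been idle longer than the allowed time."""
    return [
        instance_id
        for instance_id, idle_minutes in idle_minutes_by_instance.items()
        if idle_minutes > max_idle_minutes
    ]
```

A real autoscaler would run both checks periodically, launching for pending jobs first and then spinning down whatever has been idle too long.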
I can definitely feel you!
(I think the implementation is not trivial: metrics data size is collected and stored as a cumulative value on the account, and going over it per Task is actually quite taxing for the backend. Maybe it should be an async request, like "get me a list of the X largest Tasks"? How would the UI present it? FYI, keeping some sort of bookkeeping per Task is not trivial either, hence the main issue.)
These instructions should create the exact chart:
None
What am I missing ?
Hi UnevenOstrich23
If --docker is enabled, does that mean every new experiment will be executed in a dedicated agent worker container?
Correct
I think the missing part is how to specify the docker for the experiment?
If this is the case, in the web UI, clone your experiment (which will create a draft copy, that you can edit), then in the Execution tab, scroll down to the "base docker image" and specify the docker image to use.
Notice that you can also add flags after the docker im...
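As a small illustration of what goes into the "base docker image" field, here is a sketch that assumes the field takes the image name followed by space-separated docker flags. The helper name `base_docker_value` is hypothetical, and `--ipc=host` is just an example flag.

```python
def base_docker_value(image, flags=()):
    """Compose the text for the "base docker image" field: image first, flags after."""
    if not image or " " in image:
        raise ValueError("image must be a single, non-empty docker image name")
    # Assumption: flags are simply appended after the image, separated by spaces.
    return " ".join([image, *flags])


value = base_docker_value("nvidia/cuda:10.1-base-ubuntu18.04", ["--ipc=host"])
print(value)
```

The resulting string is what you would paste into the Execution tab of the cloned (draft) experiment.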
DepressedChimpanzee34 what would be easier, curl or python?
CourageousLizard33 column order / specific selection is stored per user. If you press the share button, you will get a link with all the definitions embedded in it.
Column resizing and ordering are in the next version release :)
Hi PunyGoose16 ,
next release includes it (eta after this weekend 😉 )
DeliciousBluewhale87
Upon ssh-ing into the folders in both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there..
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (Do they point to the correct clearml-server config?)
Hi SmarmyDolphin68
Maybe the plot_report can help?
See here:
https://github.com/allegroai/trains/blob/a28a97b16067fd5c548ec73b061badde2515aa9f/examples/reporting/pandas_reporting.py#L32
for a TPU with more than 16GB GRAM and less than 40GB, so sometimes we need to provision an A100 to get the training speed we want, but we don't use all the GRAM
Oh that makes sense...
Just saw this one, this might help?
https://www.globenewswire.com/news-release/2022/10/24/2539924/0/en/ClearML-and-Genesis-Cloud-Announce-New-MLOps-Partnership-Delivering-100-Green-Energy-Compute-Solution-for-Machine-Learning.html
I am running from a notebook and the cell has returned
Well the Task will close when you shut down the notebook 🙂
Tried context provider for Task?
I guess that would only make sense inside notebooks ?!
Yeah. Curious: are a lot of ClearML use cases not geared for notebooks?
That is somewhat correct; notebooks are not actually used in a lot of deep-learning projects, as those projects require an entire repository to support them.
I guess generally speaking the workflow is, "test your code" (i.e. small scale with limited data), then clone and enqueue for remote execution.
That said, I think it will be great to expand the support.
TrickySheep9 I like the idea of context for Tasks, can you expand on how...
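One way the "context for Tasks" idea could look is sketched below. This is not an existing ClearML API: `task_context` is a hypothetical helper that takes any task factory (for example `Task.init`) and only relies on the returned object having a `close()` method.

```python
from contextlib import contextmanager


@contextmanager
def task_context(init_task, **init_kwargs):
    """Create a task on entry and always close it on exit."""
    task = init_task(**init_kwargs)
    try:
        yield task
    finally:
        # Close explicitly instead of waiting for the notebook kernel to shut down.
        task.close()
```

With clearml installed, a notebook cell could then use `with task_context(Task.init, project_name="examples", task_name="notebook run") as task:` so the Task is closed when the cell finishes, even if the cell raises.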
MysteriousBee56 Okay, let's try this one:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
Okay, now let's try (EDIT):
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && python3 -m trains-agent --help"
Okay that might explain the issue...
MysteriousBee56 so what you are saying is python3 -m trains-agent --help
does NOT work
but trains-agent --help
does work?
MysteriousBee56 that is very strange, but it definitely explains it. Kudos on debugging it!
MysteriousBee56 and please this one: "when you run the trains-agent
with --foreground, before it starts the docker it prints the full command line"
BTW, we figured out that
'
belongs to the echo
yep, when seeing the full command it is apparent
I think the ClearmlLogger is kind of deprecated ...
Basically all you need is Task.init at the beginning , the default tensorboard logger will be caught by clearml