Makes sense to add it to docker run by default if GPUs are mentioned in agent.
I think this is an arch thing, --privileged is not needed on ubuntu flavor, that said you can always have it if you add it here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
clearml-agent daemon --gpus 0 --queue default --docker
But docker still sees all GPUs.
Yes, --gpus should be enough. Are you sure regarding the --privileged flag?
Hmm, maybe the original Task was executed with older versions? (before the section names were introduced)
Let's try: DiscreteParameterRange('epochs', values=[30]),
Does that give a warning?
SarcasticSparrow10 sure see "execute_remotely" it does exactly that:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
It will stop the current process (after syncing everything) and launch itself remotely (i.e. enqueue itself)
When the same code is run by the "trains-agent", the execute_remotely call becomes a no-op and is simply skipped
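A minimal sketch of that pattern, assuming a task object that exposes `execute_remotely(queue_name=...)` as described in the linked docs. The function name `run_experiment`, the queue name, and the return value are placeholders for illustration, not part of the library:

```python
def run_experiment(task, queue_name="default"):
    """Enqueue the task when run manually; continue when run by the agent.

    `task` is expected to be the object returned by Task.init(...), e.g.:
        from clearml import Task
        task = Task.init(project_name="examples", task_name="remote sketch")
    (project and task names above are placeholders).
    """
    # Run manually: syncs everything, stops this process, enqueues the task.
    # Run by the agent: the call does nothing and execution continues below.
    task.execute_remotely(queue_name=queue_name)

    # Anything from here on only runs on the agent's machine.
    return "training started"
```

Placing the call right after the task is created means the heavy part of the script never runs on the local machine.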
We are working hard on release 1.7; once that is out we will push an RC for review (I hope) 🙂
Hmm that is odd, can you send an email to support@clear.ml ?
at the end of the manual execution
Wait even without the pipeline decorator this function creates the warning?
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics, and a pandas table that describe the data. I wish to have these associated with the dataset and viewable from the clearml web page for the dataset.
Oh sure, use https://clear.ml/docs/latest/docs/references/sdk/dataset#get_logger they will be visible on the Dataset page on the version in question
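A rough sketch of how that could look, assuming the logger returned by `get_logger()` offers the usual `report_matplotlib_figure`, `report_scalar`, and `report_table` calls. The helper name `report_dataset_summary` and all titles and series names are made up for illustration:

```python
def report_dataset_summary(dataset, figure, stats, table):
    """Attach a figure, numeric statistics, and a table to a dataset version.

    dataset: the Dataset object being created
    figure:  a matplotlib figure
    stats:   dict mapping a statistic name to a number
    table:   a pandas DataFrame describing the data
    """
    logger = dataset.get_logger()

    # The figure shows up under the dataset version's plots.
    logger.report_matplotlib_figure(
        title="Data overview", series="figure", figure=figure, iteration=0
    )

    # One scalar per statistic, grouped under a single title.
    for name, value in stats.items():
        logger.report_scalar(
            title="Statistics", series=name, value=value, iteration=0
        )

    # The pandas table is reported as a table plot.
    logger.report_table(
        title="Data summary", series="table", iteration=0, table_plot=table
    )
```

Calling it before the dataset is finalized keeps the reports on the same version as the files.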
MagnificentSeaurchin79 do you have the "." package listed under "installed packages" after you reset the Task ?
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts? or is it autologged & uploaded ?
Also, why not reuse and overwrite older checkpoints?
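One way to reuse checkpoint files, sketched under the assumption that checkpoints are written to a local directory by the training loop. The helper name `checkpoint_path`, the filename pattern, and the `keep` parameter are hypothetical:

```python
from pathlib import Path


def checkpoint_path(directory, step, keep=3):
    """Return a checkpoint path that cycles through `keep` fixed filenames.

    Saving to the returned path overwrites the checkpoint written `keep`
    saves earlier, so at most `keep` files exist at any time.
    """
    return Path(directory) / f"checkpoint_{step % keep}.pt"
```

With `keep=1` every save overwrites the same file, which is the simplest form of the suggestion above.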
PompousParrot44 did you manage to get it working ?
I think the reason is that the "original" task is already the right type. I'll make sure we fix it, and always set the system tag
Hi @<1524560082761682944:profile|MammothParrot39>
By default you have the last 100 iterations there (not sure why you are only seeing the last 3), but this is configurable.
I can definitely feel you!
(I think the implementation is not trivial: metrics data size is collected and stored as a cumulative value on the account, and going over it per Task is actually quite taxing for the backend. Maybe it should be an async request, like "get me a list of the X largest Tasks"? How would the UI present it? FYI, keeping some sort of bookkeeping per task is not trivial either, hence the main issue.)
Hi BurlyPig26
I think you can easily change the Web port, but not the API (8008) or files (8081) port
How are you deploying it?
Can you let me know if i can override the docker image using template.yaml?
No, you cannot.
But you can set the OS environment variable "CLEARML_DOCKER_IMAGE" to choose a different default one
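A small sketch of passing that variable to a child process from Python. Only the variable name comes from the message above; the helper name `agent_environment` and the image name in the usage note are placeholders:

```python
import os


def agent_environment(docker_image, base_env=None):
    """Return a copy of the environment with the default docker image set."""
    # Copy so the current process environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_DOCKER_IMAGE"] = docker_image
    return env
```

The result can be handed to `subprocess.Popen(..., env=agent_environment("python:3.10"))` when launching the agent process.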
And the agent section on this machine is:
api_server:
web_server:
files_server:
Is that correct?
but instead, they cannot be run if the files they produce were not committed.
The thing with git, if you have new files and you did not add them, they will not appear in the git diff, hence missing when running from the agent. Does that sound like your case?
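A quick way to check for this before enqueuing, sketched as a parser for the output of `git status --porcelain` (untracked entries are prefixed with `??`). The helper name `untracked_files` is made up, and paths that git quotes because of special characters are not handled:

```python
def untracked_files(porcelain_output):
    """Return the paths that `git status --porcelain` marks as untracked."""
    files = []
    for line in porcelain_output.splitlines():
        # Untracked entries look like "?? path/to/file".
        if line.startswith("?? "):
            files.append(line[3:])
    return files
```

The input can be obtained with `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`; anything this returns is invisible to `git diff` until it is added with `git add`.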
BTW: any specific reason for going the RestAPI way and not using the python SDK ?
Hi JitteryCoyote63
The NVIDIA_VISIBLE_DEVICES environment variable is set automatically for the process the trains-agent spins up, so from your code it is transparent: you can only "see" GPU 0.
(Obviously not using docker you can forcefully change the OS environment in runtime, but you should avoid that ;))
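To inspect what the process was given, a read-only sketch like the following could be used. The helper name `visible_gpu_ids` is hypothetical, and the values are returned as strings because the variable may also hold entries such as `all` or device UUIDs:

```python
import os


def visible_gpu_ids(env=None):
    """Return the entries listed in NVIDIA_VISIBLE_DEVICES, if any."""
    env = os.environ if env is None else env
    value = env.get("NVIDIA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list, e.g. "0" or "0,1".
    return [item.strip() for item in value.split(",") if item.strip()]
```

It only reads the variable and never modifies it, in line with the advice above.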
are you referring to the same line? 47 in cache.py?
TenseOstrich47 this looks like Elasticsearch is out of space...
Lol yeah Hydra is great. Notice you still have the ability to override Hydra from the UI so you really have the best of the two worlds
Hi ResponsiveCamel97
Let me explain how it works: essentially, it creates a new venv inside the docker, inheriting all the packages from the main system packages.
This allows it to use the installed packages if the versions match, and to upgrade/change them if you need, all without having to rebuild the container. Make sense?
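The same idea can be reproduced with the standard library's `venv` module. This is only an illustration of a venv that inherits system packages, not the agent's actual implementation, and the helper name `create_inheriting_venv` is made up:

```python
import venv
from pathlib import Path


def create_inheriting_venv(target_dir):
    """Create a venv that can import the system site-packages."""
    # system_site_packages=True makes already-installed packages visible;
    # packages installed into the venv later take precedence over them.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target_dir)
    # pyvenv.cfg records the include-system-site-packages setting.
    return Path(target_dir) / "pyvenv.cfg"
```

Because packages installed inside the venv shadow the system ones, a version can be changed per task while the container image stays the same.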
What do you mean by a custom queue ?
In the queues page you have a plus button, this will just create a new queue
I'm hoping we are ready to release
seems like the server returned 400 error, verify that you are working with your trains-server and not the demoserver :)
@<1523701868901961728:profile|ReassuredTiger98> how did you install the nightly locally ?
Can you also provide the full log?