This will disable storing the uncommitted changes
This is inside the container on the machine running the agent. It uses the local clearml.conf as the basis for its configuration.
Hi @<1790190274475986944:profile|UpsetPanda50> , Optuna has an internal mechanism for early stopping
FiercePenguin76 , we'll try to reproduce. Thanks for the input!
What happens if you remove the repository information?
Regarding connect_configuration() , reading through the docs I see that this method needs to be called before reading the config file
https://clear.ml/docs/latest/docs/references/sdk/task#connect_configuration
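Something along these lines (a minimal sketch; the project/task names and config file name are just placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# Register the config file with the task *before* reading it;
# connect_configuration() returns the local path to actually read from
# (which may point to an overridden config when running remotely).
config_path = task.connect_configuration("config.yaml", name="my config")

with open(config_path) as f:
    config_text = f.read()
```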
From the looks of this example, this should actually be connected automatically
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
VexedCat68 , you can iterate through all 'running' tasks in a project and abort them through the api. The endpoint is tasks.stop
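Something like this (a rough sketch using the APIClient; the project id is a placeholder, and I'm assuming status 'in_progress' covers what you mean by 'running'):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Fetch all still-running tasks in the project ("<project_id>" is a placeholder)
running = client.tasks.get_all(project=["<project_id>"], status=["in_progress"])

for t in running:
    client.tasks.stop(task=t.id)  # this is the tasks.stop endpoint
```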
Hi @<1529633475710160896:profile|ThickChicken87> , I would suggest opening developer tools (F12) and observing what api calls go out when going over the experiment object. This way you can replicate the api calls to pull all the relevant data. I'd suggest reading more here - None
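For example, once you see a call in the network tab you can replicate it with the APIClient - a quick sketch (the field names here are just illustrative):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Pull only the fields you actually need for each experiment in the project
tasks = client.tasks.get_all(
    project=["<project_id>"],  # placeholder project id
    only_fields=["id", "name", "status", "last_metrics"],
)
for t in tasks:
    print(t.id, t.name, t.status)
```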
@<1544853721739956224:profile|QuizzicalFox36> , yes 🙂
Hi @<1829328217773707264:profile|DiminutiveButterfly84> , how are you building the pipeline? Is it from tasks or from decorators?
Hi @<1829328217773707264:profile|DiminutiveButterfly84> , is the code part of some repository or is it just a folder with some script files?
Can you add the full log of the pipeline + step? So you mean it's not cloning the utils properly or not importing them properly?
Hi SubstantialElk6 ,
That's an interesting idea. If you want to preprocess a lot of data, I think the best approach would be using multiple datasets (one per process) or different versions of a dataset. You might also be able to pull specific chunks of a dataset and use just the one - I'm not sure about that last point (see the sketch below).
What do you think?
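Very rough sketch of the chunk idea, assuming the part / num_parts arguments of get_local_copy() fit your use case (names are placeholders):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# Each preprocessing worker pulls only its own slice of the dataset.
# worker_index would come from your process orchestration (hypothetical here).
worker_index = 0
local_path = ds.get_local_copy(part=worker_index, num_parts=4)
```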
You're still using both n1-standard-1 and nvidia/cuda:10.2-runtime-ubuntu18.04
Hi @<1582179661935284224:profile|AbruptJellyfish92> , connectivity issues should not affect training; everything should be cached until the connection is restored and then sent to the server. Did you encounter different behavior?
So when you do torch.save() it doesn't save the model?
Hi @<1769534171857817600:profile|DepressedSeaurchin77> , can you please provide the full screenshot for context?
TenseOstrich47 , you can specify a docker image with task.set_base_docker(docker_image="<DOCKER_IMAGE>") . You will of course need to log in to ECR on that machine so it will be able to download the docker image.
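For example (a sketch; the ECR image URI is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="ecr docker example")

# The agent will run this task inside the specified image when executing in docker mode
task.set_base_docker(
    docker_image="<account_id>.dkr.ecr.<region>.amazonaws.com/my-image:latest",
    docker_arguments="--ipc=host",  # optional extra arguments passed to docker run
)
```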
Then add a screenshot of the info section
Aren't you getting logs from the docker via ClearML? I think you can build that capability fairly easily with ClearML, maybe add a PR?
Hi @<1572395190897872896:profile|ShortWhale75> , this capability exists as part of the HyperDatasets feature which is present in the Scale/Enterprise licenses.
Hi TeenyHamster79 ,
I think the API you're looking for is tasks.get_by_id and the fields you're looking for are: data.tasks.0.execution.queue.name and data.tasks.0.execution.queue.id
Tell me if it helps 🙂
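Rough sketch with the APIClient (assuming it unwraps the response into a task object; "<task_id>" is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

t = client.tasks.get_by_id(task="<task_id>")
# Queue info should be under execution.queue (may be just the id, or an object with id/name)
print(t.execution.queue)
```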
and also take a look into development.apply_environment
Hi ExcitedSeaurchin87 ,
How are you trying to run the agents? Also, are you trying to run multiple agents on the same GPU?
this doesn't explain why the env variables didn't work though
Maybe you defined the env variables outside the container? Maybe incorrect usage on your end? The env variables work when properly configured.
My guess would be something related to your environments.
Hi @<1529271085315395584:profile|AmusedCat74> , thanks for reporting this, I'll ask the ClearML team to look into this
I am not very familiar with Kubeflow but as far as I know it is mainly for orchestration whereas ClearML offers a full E2E solution 🙂