Can you also try specifying the branch/commit?
Hi @<1669152726245707776:profile|ManiacalParrot65> , you can find the full documentation here - None
I'm not sure. Maybe @<1523703436166565888:profile|DeterminedCrab71> might have some input
Hi @<1531807732334596096:profile|ObliviousClams17> , are you self-deployed? Can you please provide the full log?
What happens if you clear the commit and just run with latest master?
Hi @<1750689997440159744:profile|ShinyPanda97> , I think you can simply move the model to a different project as part of the pipeline
Hi @<1547028074090991616:profile|ShaggySwan64> , you can try this. However, Elastic takes up space according to the volume of metrics you're saving, so clearing out some older experiments would free up space. What do you think?
You need the agents to run the various pipeline steps + controllers
Hi @<1533619734988197888:profile|DistressedSquid12> , what errors are you getting? How are you trying to connect it?
Please try it like this: model.tags = ['Test']
and not with append
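For context, a likely reason append doesn't stick (this is an illustrative plain-Python sketch, not the ClearML SDK itself): if tags is exposed as a property that returns a copy of the backing list, append mutates that throwaway copy, while assignment goes through the setter and persists:

```python
# Plain-Python illustration of copy-returning property semantics.
class Model:
    def __init__(self):
        self._tags = []

    @property
    def tags(self):
        return list(self._tags)  # returns a copy of the backing list

    @tags.setter
    def tags(self, value):
        self._tags = list(value)  # assignment persists the new value

model = Model()
model.tags.append('Test')   # mutates a throwaway copy, change is lost
print(model.tags)           # []
model.tags = ['Test']       # goes through the setter
print(model.tags)           # ['Test']
```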
Hi @<1726047624538099712:profile|WorriedSwan6> , to answer your questions:
Would you recommend to:
- Host the dataset on clearml-server ?
- Host on S3 \ R2 ?
- Host in our K8S with minio \ on specific NFS path?
I personally like using AWS S3 if available, or MinIO when running locally. It really depends on your infrastructure - I would suggest testing which setup works best for you.
Also, is it true unless streaming is explicitly enabled, ClearML Agent downloads the entire dataset befo...
I think in venv mode you cannot limit memory usage, but with containers you can.
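For container mode, one option is passing Docker's memory flags through the agent's default container arguments in clearml.conf - this is a sketch, and the exact keys are an assumption to verify against your agent's configuration reference:

```
# clearml.conf on the agent machine (sketch - verify key names against
# your clearml-agent configuration reference). Extra arguments here are
# forwarded to `docker run`, so standard Docker limits apply.
agent {
    default_docker {
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
        arguments: ["--memory=8g", "--memory-swap=8g"]
    }
}
```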
My bad - if you set auto_connect_streams to False, you basically disable console logging. Please see the documentation:
auto_connect_streams (Union[bool, Mapping[str, bool]]) – Control the automatic logging of stdout and stderr.
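For example, a mapping gives per-stream control instead of disabling everything - a sketch where the Task.init call is commented out so it doesn't require a running server, and the mapping keys are an assumption based on the docstring quoted above:

```python
# auto_connect_streams accepts a bool or a mapping, per the Task.init
# docstring. False disables all console capture; a mapping lets you keep
# stdout/stderr while skipping the `logging` module handler (keys are an
# assumption - check your SDK's Task.init reference).
stream_config = {"stdout": True, "stderr": True, "logging": False}

# With a configured ClearML setup you would pass it like this:
# from clearml import Task
# task = Task.init(project_name="examples", task_name="console demo",
#                  auto_connect_streams=stream_config)
```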
Hi @<1714813627506102272:profile|CheekyDolphin49> , how are you setting the parameter in the HPO?
Hi SpotlessPenguin79 , can you please elaborate on this?
for non-aws cloud providers?
What exactly are you trying to do?
BoredPigeon26 , when you copy and paste the link provided to you by the UI into the browser, can you see the image?
Please see the relevant docs. Also, for the update failures, can you open Developer Tools (F12) and see what you get in the Network tab for the failed API calls?
Hi @<1533619716533260288:profile|SmallPigeon24> , can you please elaborate on your use case?
@<1543766544847212544:profile|SorePelican79> , what version of the SDK are you using? Also, what version of the backend are you running?
Hi GrotesqueDog77 ,
Can you please add a small code snippet of this behavior? Is there a reason you're not reporting this from within the task itself or to the controller?
Hi OddShrimp85 ,
Please note that Datasets in this context are part of the HyperDatasets feature. This is an advanced feature for managing unstructured data. So, in the context of HyperDatasets, a Dataset is a collection of DatasetVersions that are structured by some internal logic.
Hi @<1577468611524562944:profile|MagnificentBear85> , are you running the services agent in docker mode or venv?
TenseOstrich47 , you can specify a docker image with task.set_base_docker(docker_image="<DOCKER_IMAGE>")
You will of course need to log in to ECR on that machine so it will be able to download the docker image.
Also a small clarification:
ClearML doesn't build the docker image itself. You need to have a docker image already built to be used by ClearML
They need to switch to your workspace, create credentials on your workspace and then use them instead of their own. Makes sense?
Hi @<1547028031053238272:profile|MassiveGoldfish6> , the expected behavior would be pulling only the 100 files 🙂
Hi @<1561885921379356672:profile|GorgeousPuppy74> , ClearML does support running with multiple GPUs
A slow network, or a network that died all of a sudden - you wouldn't want your experiments to fail just because of a network issue. Once the network is restored, everything is sent to the backend as it should be. This is the resilience I was talking about.
Regarding a mistake in the URL - I guess it is your responsibility to keep track of and avoid wrong URLs. As I said before - how would you suggest the SDK know what is the 'right' URL and what is the 'wrong'...
GiganticTurtle0 , I think this would be up your alley - there is even a 'parent' parameter 🙂
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
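A sketch of how that might look - the actual call is commented out since it needs a configured ClearML setup, and the 'parent' filter key is an assumption based on the linked get_tasks reference, so double-check it against your SDK version:

```python
# Build a Task.get_tasks filter that selects children of a parent task.
# ('parent' key per the linked get_tasks reference - verify for your SDK.)
def child_task_filter(parent_task_id):
    return {"parent": parent_task_id}

# With a configured ClearML setup:
# from clearml import Task
# children = Task.get_tasks(project_name="examples",
#                           task_filter=child_task_filter("<PARENT_TASK_ID>"))
# print([t.name for t in children])
```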
So even if you abort it on the start of the experiment it will keep running and reporting logs?
Try setting it outside of any section - basically, set an environment section by itself.