You mean you want the new task created by add_step
to take in certain parameters? Provided where/by who?
Hi @<1523701601770934272:profile|GiganticMole91> , when experiments are deleted, their associated scalars are deleted along with them.
I'd check the ES container for logs. Additionally, you can always beef up the machine with more RAM to give elastic more to work with.
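If you're on the default server docker-compose, something like this should show the Elasticsearch logs (the container name is an assumption - check docker ps if yours differs):
```bash
docker logs --tail 200 clearml-elastic
```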
and just making sure - by pipeline we're talking about the ClearML pipelines, correct?
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller
Do you need to pull it later somewhere? Is there a specific use case?
Because I think you can get the params dict back via code as the same dict
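For example, a minimal sketch assuming the parameters were connected to the task (the task ID is a placeholder):
```python
from clearml import Task

# Fetch the task that holds the parameters (ID is a placeholder)
task = Task.get_task(task_id="<task_id>")

# Returns the parameters as a nested dict, e.g. {"General": {"batch_size": "32"}}
params = task.get_parameters_as_dict()
print(params)
```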
You must perform Task.init()
to have something reported 🙂
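Something like this minimal sketch is enough to get reporting going (project/task names are placeholders):
```python
from clearml import Task

# Creates the task and turns on automatic logging
task = Task.init(project_name="examples", task_name="reporting test")

# Explicit reporting also works once the task exists
task.get_logger().report_scalar(title="loss", series="train", value=0.1, iteration=0)
```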
AbruptWorm50 , the guys tell me it's in progress and there should be an update in the next few minutes 🙂
The idea is that the server is not something that should be moved on a whim. You can move it around while testing your setup, but once everything is set up, the server is expected to stay at the same address.
Also, this only matters when you're using the files server. If you had been using MinIO from the start, this would be a non-issue.
This will disable storing the uncommitted changes
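For reference, a clearml.conf sketch - assuming the setting we're talking about is store_uncommitted_code_diff:
```
sdk {
  development {
    # assumption: this is the flag that controls storing the uncommitted diff with the task
    store_uncommitted_code_diff: false
  }
}
```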
I meant that maybe you ran it with a newer version of the SDK
Hi @<1523701504827985920:profile|SubstantialElk6> , I don't think there is such an option currently. Maybe open a GitHub feature request?
DistressedKoala73 , can you send me a code snippet to try and reproduce the issue please?
Hi @<1584716355783888896:profile|CornyHedgehog13> , you can only see a list of files inside a dataset/version. I'm afraid you can't really pull individual files since everything is compressed and chunked. You can download individual chunks.
Regarding the second point - there is nothing out of the box but you can get a list of files in all datasets and then compare if some file exists in others.
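A rough sketch of what I mean (dataset IDs are placeholders):
```python
from clearml import Dataset

# List the files registered in each dataset version
files_a = set(Dataset.get(dataset_id="<dataset_a_id>").list_files())
files_b = set(Dataset.get(dataset_id="<dataset_b_id>").list_files())

# Files present in A but missing from B
print(files_a - files_b)
```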
Does that make sense?
Hi @<1857232027015712768:profile|PompousCrow47> , the resource requirements are basically abstracted via queues - you connect agents to the queues according to the resources you want to expose to users.
@<1707203455203938304:profile|FoolishRobin23> , the agent in the docker compose is a services agent and it's not for running GPU jobs. I'd suggest running the clearml-agent with the GPU manually.
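For example, something along these lines (queue name and GPU index are placeholders):
```bash
# Run an agent in docker mode, exposing GPU 0 and pulling jobs from the "gpu" queue
clearml-agent daemon --queue gpu --gpus 0 --docker
```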
MagnificentWorm7 , I'm taking a look if it's possible 🙂
As a workaround - I think you could split the dataset into different versions and then use Dataset.squash
to merge into a single dataset
https://clear.ml/docs/latest/docs/references/sdk/dataset#datasetsquash
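A minimal sketch of the workaround (names and IDs are placeholders):
```python
from clearml import Dataset

# Squash several dataset versions into one new dataset
merged = Dataset.squash(
    dataset_name="merged-dataset",
    dataset_ids=["<version_1_id>", "<version_2_id>"],
)
print(merged.id)
```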
Hi @<1632551554206666752:profile|DelightfulBear99> , maybe you're logging a lot of metrics or very large log files? Is it possible you have large configurations or artifact previews?
Is there an example script you can provide that creates a lot of metrics storage for you?
Hmmmm I think you would need to change some configurations in the docker-compose to use https
Hi @<1523704674534821888:profile|SourLion48> , I'd suggest connecting your batch size as a configuration parameter of the experiment, for example using argparse. Then, regardless of committed or uncommitted code, you will be able to control this value through the configuration section.
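Something like this minimal sketch (project/task names are placeholders; ClearML auto-logs the argparse values once Task.init is called):
```python
import argparse
from clearml import Task

# Init first so the argparse arguments are captured into the task's configuration section
task = Task.init(project_name="examples", task_name="batch size via argparse")

parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()

# When the task is cloned and enqueued, this value can be edited in the UI
print(args.batch_size)
```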
What do you think?
How did the tasks fail?
Hi ExuberantParrot61 , that's a good question. This is a bit hacky but what if you try to catch the task with Task.current_task()
from inside the step and try to change the output_uri
attribute there?
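Something along these lines - a sketch only, I haven't verified it inside a running pipeline step:
```python
from clearml import Task

def my_step():
    # Grab the task backing this pipeline step
    task = Task.current_task()
    # Redirect where its outputs are stored (URI is a placeholder)
    task.output_uri = "s3://my-bucket/outputs"
```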
ClumsyElephant70 , I'm not sure. There's usually a roadmap presented in our community talks, so it'd be great if you joined the next one to see what's coming 🙂
It's a bit confusing, please add the full list of actions you took + whatever the console printed.
Preferably using the following format for logs
This is a very convenient way to post logs
Hi @<1577468638728818688:profile|DelightfulArcticwolf22> , what email did you use? Can you try again now?
Hi @<1573119955400921088:profile|CloudyPelican46> , you can certainly do this. You can find all the related API calls here - None
I suggest opening developer tools (F12) and seeing what is sent in the UI to fetch the various metrics you're looking for
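If you'd rather stay in the SDK instead of raw API calls, a hedged sketch (assuming a reasonably recent SDK version):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")

# Returns the scalars reported to this task, keyed by metric title and series
scalars = task.get_reported_scalars()
print(list(scalars.keys()))
```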
You mentioned you are self deployed. When you deploy the server, one of the containers deployed is the ES (Elasticsearch) container. Did you not deploy the server via docker compose?
SuperficialDolphin93 , looks like a strange issue. Can you maybe open a GitHub issue for better tracking?