@<1523701137134325760:profile|CharmingStarfish14> , interesting, so what are you suggesting? Creating Jira tasks from special tags on ClearML?
Hi @<1523701295830011904:profile|CluelessFlamingo93> , I think you would need to expose those configurations through the pipeline controller and then the tasks would take those configurations and override them with what you inserted into the controller.
Makes sense?
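For example, a rough sketch of what I mean (project/task names and the parameter are just placeholders):
```python
from clearml import PipelineController

# Controller with a pipeline-level parameter that the steps will consume
pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_parameter(name="learning_rate", default=0.001, description="passed down to the steps")

# The step's own hyperparameter gets overridden with the controller's value
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train base task",
    parameter_override={"General/learning_rate": "${pipeline.learning_rate}"},
)

pipe.start(queue="default")
```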
I think something like that exists; it appears to be part of the paid Hyper-Datasets feature. The documentation is open for all, apparently 🙂
Do you mean you don't have a files server running? You can technically circumvent this by overriding api.files_server in clearml.conf and setting it to your default storage.
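Something along these lines in clearml.conf (the bucket path is just a placeholder):
```
api {
    # point the default files server at your own storage
    files_server: "s3://my-bucket/clearml"
}
```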
Hi RoundMosquito25 , where is this error coming from? API server?
Hi @<1664079296102141952:profile|DangerousStarfish38> , you can control it via the agent.default_docker.image setting in the clearml.conf where the agent is running. You can also control it via the CLI with the --docker flag, and finally via the webUI in the execution tab -> container -> image section.
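For reference, a minimal clearml.conf sketch (the image name is only an example):
```
agent {
    default_docker {
        # container image the agent uses when the task doesn't specify one
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
}
```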
You need to point the SDK to the different clearml.conf file.
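One way to do that is the CLEARML_CONFIG_FILE environment variable, set before the SDK is imported (the path here is a placeholder):
```python
import os

# Must be set before importing clearml so the SDK picks up the alternate config
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/other/clearml.conf"

from clearml import Task

task = Task.init(project_name="examples", task_name="uses alternate clearml.conf")
```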
I see. Sounds like a good idea! Please open a GitHub feature request 🙂
Hi @<1523707653782507520:profile|MelancholyElk85> , in Task.init() you have the auto_connect_frameworks parameter.
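For example, to disable only some of the automatic bindings (project/task names are placeholders):
```python
from clearml import Task

# Disable PyTorch auto-logging while keeping Matplotlib auto-logging on;
# passing auto_connect_frameworks=False disables all of them
task = Task.init(
    project_name="examples",
    task_name="selective framework binding",
    auto_connect_frameworks={"pytorch": False, "matplotlib": True},
)
```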
You're always running a single task at a time. The whole point is that everything is reported to the task (auto-magic bindings, console logs etc.), so there cannot be any ambiguity. You can close the current task (task.close()) and init a new one if you'd like, but you can't init several at the same time.
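A minimal sketch of that flow (names are placeholders):
```python
from clearml import Task

# First task: everything auto-logged goes here
task_one = Task.init(project_name="examples", task_name="first task")
# ... training / reporting for the first task ...
task_one.close()

# Only after closing can a new task be initialized in the same process
task_two = Task.init(project_name="examples", task_name="second task")
# ... work reported to the second task ...
task_two.close()
```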
SparklingElephant70 , a full log would be the best. It can be downloaded from the webapp 🙂
Hi @<1554638166823014400:profile|ExuberantBat24> , you mean dynamic GPU allocation on the same machine?
Hi @<1523701868901961728:profile|ReassuredTiger98> , you can select multiple experiments and compare between the experiments, this way you can see all the scalars at once.
You can also utilize the Reports feature to create really cool-looking dashboards.
SubstantialElk6 , either that or the one mounted outside 🙂
Hi @<1774245260931633152:profile|GloriousGoldfish63> , it's a progress circle 🙂
Hi @<1533620191232004096:profile|NuttyLobster9> , thank you for the update. Can you please point out what were the changes that were done?
Hi @<1710827340621156352:profile|HungryFrog27> , I'd suggest running the agent with the --debug flag for more information. Can you provide a full log of both the HPO task and one of the children?
Hi @<1717350332247314432:profile|WittySeal70> , that sounds like a neat idea! Maybe open a GitHub feature request for this?
Hi @<1603198163143888896:profile|LonelyKangaroo55> , you certainly can. I think you need to enable editing these configurations, but it is definitely possible with some tinkering 🙂
Hi @<1523701504827985920:profile|SubstantialElk6> , I think you have models.get_by_id, for example, to fetch a model object. Inside that object you can find the URI where the model is saved.
You can do that via the SDK, either by the project IDs or project name
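models.get_by_id is the raw API call; with the Python SDK the rough equivalent would be something like this (the ID and project name are placeholders):
```python
from clearml import Model

# Fetch a single model object by ID and read the URI it was saved to
model = Model(model_id="<model_id>")
print(model.url)

# Or query models by project name (or project ID)
for m in Model.query_models(project_name="examples"):
    print(m.id, m.name, m.url)
```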
SmallDeer34 Hi 🙂
I don't think there is a way out of the box to see GPU hours per project, but it could be a pretty cool feature! Maybe open a GitHub feature request for this.
Regarding how to calculate this, I think an easier solution for you would be to sum up the runtime of all experiments in a certain project rather than going by the GPU utilization graphs.
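Roughly something like this (a sketch; assuming the started/completed timestamps are populated on finished experiments, and the project name is a placeholder):
```python
from clearml import Task

# Sum the wall-clock runtime of all completed experiments in a project
tasks = Task.get_tasks(project_name="examples", task_filter={"status": ["completed"]})

total_seconds = 0.0
for t in tasks:
    data = t.data  # backend task object with started/completed timestamps
    if data.started and data.completed:
        total_seconds += (data.completed - data.started).total_seconds()

print(f"Total runtime: {total_seconds / 3600:.1f} hours")
```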
I might not be able to get to that but if you create an issue I'd be happy to link or post what I came up with, wdyt?
Taking a look at your snippet, I wouldn't mind submitting a PR for such a cool feature 🙂
ClumsyElephant70 , I'm not sure. There's usually a roadmap presented in our community talks, so it'd be great if you joined next time to see what's next 🙂
Please see the error:
2024-08-18 12:55:25,030 - clearml.automation.job - WARNING - Error enqueuing Task <clearml.task.Task object at 0x723c45320610> to 1xGPU: Could not find queue named "1xGPU"
You don't have a queue called 1xGPU.
I think there should be a way to run it locally as well - https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L123
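For context, a rough sketch of where the queue name comes into the optimizer setup (the task ID, metric names and parameter range are placeholders), including the local option:
```python
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",
    hyper_parameters=[
        UniformIntegerParameterRange("General/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    # The children are enqueued here, so this must be an existing queue
    execution_queue="default",
)

# Run everything on the local machine (the linked example shows this option);
# optimizer.start() would enqueue the children to execution_queue instead
optimizer.start_locally()
optimizer.wait()
optimizer.stop()
```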
Hi @<1529271098653282304:profile|WorriedRabbit94> , do you maybe have autoscalers that ran for a very long time? The easiest fix is simply deleting all projects and applications and waiting a few hours.
Hi @<1523701601770934272:profile|GiganticMole91> , as long as experiments are deleted, their associated scalars are deleted as well.
I'd check the ES container for logs. Additionally, you can always beef up the machine with more RAM to give elastic more to work with.
If you want to access them as artifacts via code (or via the UI), you'll have to register them via code and fetch them back the same way.
Use the following:
https://clear.ml/docs/latest/docs/references/sdk/task#register_artifact
https://clear.ml/docs/latest/docs/references/sdk/task#get_registered_artifacts
Also please note the difference between reporting those tables as data via the logger and as artifacts since the logger saves things as events (plots, scalars, debug samples).
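A small sketch of the difference (the DataFrame is dummy data):
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="tables demo")
df = pd.DataFrame({"epoch": [1, 2, 3], "accuracy": [0.71, 0.78, 0.83]})

# Registered artifact: stored with the task and retrievable later via code
task.register_artifact(name="metrics_table", artifact=df)
same_df = task.get_registered_artifacts()["metrics_table"]

# Logger table: reported as a plot event, viewable in the UI
task.get_logger().report_table(title="Metrics", series="per epoch", iteration=0, table_plot=df)
```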