Thanks, that did solve the problem, the tasks are running again.
Yes, previously run experiments. I will just kill the clearml-elastic container if that might solve the problem.
I'm not familiar with elastic. What role does elastic play in ClearML?
Ok, that worked. So every time I have code changes, I will have to rerun the experiment on my own machine that doesn't have any GPUs?
Kinda defeats the purpose of using ClearML Agent.
Running `git diff` on my terminal in this repo gave nothing. Nothing at all.
Yes! I definitely think this is important, and hopefully we will see something there (or at least in the docs).
Hi AgitatedDove14, any updates in the docs demonstrating this yet?
Nice, what are the names of the talks?
Hi, this is the setup.
client:
```
from clearml import Task, Logger

task = Task.init(project_name='DETECTRON2', task_name='Train', task_type='training')
task.set_base_docker("quay.io/fb/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser")
task.execute_remotely(queue_name="single_gpu", exit_process=True)
```
k8s_glue_example.py spawned a pod and the task started running.
ClearML UI -> Experiment -> Results -> Console.
At the top it will pri...
Hi, thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
I have yet to go through ClearML Agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push afterwards, when I clone the task, isn't the ClearML agent unable to pull that script from the git repo?
I see, I understand better now. Thanks.
Yeah... the issue is ClearML is unable to talk to the nodes because PyTorch distributed needs to know their IPs. There is some sort of integration missing that would enable this.
This would be solved if `--env GIT_SSL_NO_VERIFY=true` were passed to the k8s pod that's spawned to run the job. Currently it's not.
The first is probably done using pipeline controllers, the second using Datasets or HyperDatasets. It's not very clear how the last one is achieved, especially the searchable data catalogs.
Yeah, that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
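For context, a minimal sketch of the Dataset side being discussed (assuming the standard clearml Dataset API; the project and dataset names here are illustrative):
```
from clearml import Dataset

# Create a new versioned dataset entry and attach local files to it.
ds = Dataset.create(dataset_name="detectron2-train", dataset_project="DETECTRON2/datasets")
ds.add_files(path="./data/train")
ds.upload()    # push the files to the configured storage
ds.finalize()  # lock this version so experiments can reference it

# Later, a training task can fetch a read-only local copy by project/name.
local_path = Dataset.get(
    dataset_project="DETECTRON2/datasets", dataset_name="detectron2-train"
).get_local_copy()
```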
I see. Is there a more elaborate code example that describes the above interactions?
Likely network. Can you run a curl against the ClearML API server from the Jenkins stage and see if that gets through?
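Something along these lines works as a quick connectivity check (a minimal sketch; the server URL is hypothetical and the debug.ping endpoint assumes a default self-hosted ClearML API server on port 8008):
```
# Minimal connectivity check against a ClearML API server.
import requests

API_SERVER = "http://clearml-server.example.com:8008"  # hypothetical host, adjust for your deployment

try:
    # debug.ping is the lightweight health-check endpoint on the API server
    resp = requests.get(f"{API_SERVER}/debug.ping", timeout=10)
    print(resp.status_code, resp.text)
except requests.RequestException as exc:
    print(f"Could not reach the API server: {exc}")
```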
I'm using this feature. In this case I would create 2 agents, one serving a CPU-only queue and the other a GPU queue, and then at the code level decide which queue to send to.
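A minimal sketch of what that could look like (the needs_gpu flag and the cpu_only queue name are illustrative; Task.init and execute_remotely are the same calls used in the setup above):
```
from clearml import Task

task = Task.init(project_name='DETECTRON2', task_name='Train', task_type='training')

# Decide at the code level which queue the task should be enqueued to.
needs_gpu = True  # e.g. derived from the model config or CLI args
queue = "single_gpu" if needs_gpu else "cpu_only"

# Stop local execution and enqueue the task for the matching agent.
task.execute_remotely(queue_name=queue, exit_process=True)
```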
ah... thanks!
For example, it would be useful to integrate https://github.com/whylabs/whylogs#features into ClearML as part of data and model monitoring. WhyLogs would have their own static page, which would preferably be displayed as a new custom tab (besides logs, scalars and plots).
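In the meantime, a rough sketch of how a profile summary could be attached to a task with the existing SDK (the summary DataFrame is a stand-in for whatever whylogs produces; report_table and upload_artifact are standard Logger/Task calls):
```
import pandas as pd
from clearml import Task

task = Task.init(project_name='DETECTRON2', task_name='data-profiling')

# Stand-in for a whylogs profile summary; in practice this would come from
# profiling the actual dataset with whylogs.
profile_summary = pd.DataFrame(
    {"column": ["age", "income"], "null_ratio": [0.01, 0.05], "distinct": [72, 1038]}
)

# Show the summary as a table under the task's plots ...
task.get_logger().report_table(
    title="whylogs profile", series="summary", iteration=0, table_plot=profile_summary
)
# ... and keep the raw summary as an artifact for later inspection.
task.upload_artifact(name="whylogs_profile_summary", artifact_object=profile_summary)
```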
Hi SuccessfulKoala55, just wondering how I can follow up on this.
Thanks AgitatedDove14 , will take a look.
Is there enterprise support for k8s glue on OpenShift?
Alright, thanks. It's important we clarify it works before we migrate the infra.
Hi, self-hosted using docker-compose.
I think the default action of the clearml-agent k8s glue when running a task is to create a virtual env and install the dependencies. So I'm just checking how to change that behaviour to use the global (system) packages instead.
Ok sure. Thanks.