Thanks for the answer - just thinking about how to build an easy solution for new feature requests.
Combining pre-existing pipes in the described way would make things "simpler" the way I see it, but could also lead to other problems. Will have to think about it a little more.
Now working around it by creating a kube job that POSTs to the appropriate apiserver address and creates what I need
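For reference, roughly what runs inside that job - a minimal sketch, assuming the thing being created is the missing queue(s), that the apiserver is reachable in-cluster at something like `clearml-apiserver:8008`, that the `queues.create` endpoint accepts basic auth with the API key/secret, and that the env var names below are placeholders for whatever secret the job mounts:
` import os
import requests

# In-cluster apiserver address (placeholder; adjust to your service name/namespace)
apiserver = os.environ.get("CLEARML_API_HOST", "http://clearml-apiserver:8008")

# API credentials from the secret mounted into the job (env var names are assumptions)
auth = (os.environ["CLEARML_API_ACCESS_KEY"], os.environ["CLEARML_API_SECRET_KEY"])

# Create the queue(s) the agents will need before they come up
for queue_name in ["feature_pipelines"]:
    resp = requests.post(f"{apiserver}/queues.create", json={"name": queue_name}, auth=auth)
    resp.raise_for_status()
    print(queue_name, resp.json()) `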
On AWS EKS with:
Image: allegroai/clearml-agent-k8s-base
clearml-agent version: 1.2.4rc3
python: 3.6.9
We have CloudWatch also configured, so I could probably do some searches there if I knew what to look for
Without the joblib dump I do not get my models registered as models, even though the experiment runs fine and logs everything else : )
Edit: Note that I also want ClearML to store these into my predefined artifact store, which it does with the aforementioned "hacky" solution.
Will do, if the solution is migrated away from terraform and back into the clearml helm charts!
It would be a nice feature 😛
In the deployment, if I add agents for queues that are not present (using k8sglue), they fail to launch until the queues become available. I'd like to avoid manually setting up queues that I know I'll be using.
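In the meantime, a possible alternative to the kube job is creating the queues from the SDK before the agents start - a rough sketch (I have not verified the exact `APIClient` call signature, so treat it as an assumption):
` from clearml.backend_api.session.client import APIClient

client = APIClient()  # picks up credentials from clearml.conf / env vars

# Create the queues the k8sglue agents will listen on; ignore "already exists" style errors
for queue_name in ["feature_pipelines"]:
    try:
        client.queues.create(name=queue_name)
    except Exception as err:
        print(f"Queue '{queue_name}' not created (may already exist): {err}") `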
It is not out yet indeed, which is why I am wondering. Also a good question about the mongodb stuff, and quite relevant, because we just went ham switching from AWS EBS => EFS and also deleting (or supposedly deleting?) all of our old PVCs + PVs. Somehow after redeployment, ClearML still had access to the old data, which should have been destroyed.
Note: we are still in dev, so this is ok 😅
Yes, thank you from me too! Wrapping up a project where we ended up deploying a self-hosted version on EKS and leveraging its autoscaling abilities. Ticked a lot of boxes for our team, from model deployment to running pipelines, tracking experiments, storing artifacts, and even allowing the deployment of some R code/models by making the use of custom docker images a breeze 😅
Given that once your pipes become sufficiently complex and start to veer further outside the ML domain, you might op...
Ok, was able to get a crash and log some output from the apiserver:
` [2022-08-11 09:21:13,727] [11] [INFO] [clearml.service_repo] Returned 200 for tasks.stopped in 17ms
[2022-08-11 09:21:13,829] [11] [INFO] [clearml.service_repo] Returned 200 for queues.get_next_task in 11ms
[2022-08-11 09:21:13,871] [11] [INFO] [clearml.service_repo] Returned 200 for queues.get_next_task in 8ms
[2022-08-11 09:21:13,986] [11] [WARNING] [clearml.service_repo] Returned 400 for queues.get_by_id in 4ms, msg=Inv...
Basically I've defined some extended sklearn models, which I import in my ClearML task file and set them up with some initial parameters.
Some pseudocode:
` from clearml import Task, OutputModel
import joblib

mdl = SomeExtendedSklearnModel(**params)

# Load data
X = load_data(...)

# Run
task = Task.init(...)
output_models = OutputModel(task=task, ..., framework="ScikitLearn")
preds = mdl.fit_predict(X)
joblib.dump(mdl, "mdl.pkl") `
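For context on the artifact store part: as far as I understand, ClearML hooks joblib, so the `joblib.dump` call is what triggers the model registration, and pointing `Task.init` at the predefined store should be what makes the weights land there - a sketch, with the bucket URL as a placeholder:
` # Placeholder output_uri; in practice this points at our predefined artifact store
task = Task.init(project_name="...", task_name="...", output_uri="s3://my-artifact-store/") `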
The queue 'feature_pipelines" should exist and the latter queue is something that the agents sometimes want to create for some reason (though it should not be required?)
Latter warning is ok I guess.
The API server does not restart during the process. I'll try to see if I can catch something in its logs, or where should I be monitoring the networking? I.e., what is the flow 😅