We have CloudWatch also configured, so I could probably do some searches there if I knew what to look for.
Ok, was able to get a crash and log some output from the apiserver:
` [2022-08-11 09:21:13,727] [11] [INFO] [clearml.service_repo] Returned 200 for tasks.stopped in 17ms
[2022-08-11 09:21:13,829] [11] [INFO] [clearml.service_repo] Returned 200 for queues.get_next_task in 11ms
[2022-08-11 09:21:13,871] [11] [INFO] [clearml.service_repo] Returned 200 for queues.get_next_task in 8ms
[2022-08-11 09:21:13,986] [11] [WARNING] [clearml.service_repo] Returned 400 for queues.get_by_id in 4ms, msg=Inv... `
The queue 'feature_pipelines' should exist, and the latter queue is something the agents sometimes want to create for some reason (though it should not be required?)
The latter warning is OK, I guess.
Basically I've defined some extended sklearn models, which I import in my ClearML task file and set them up with some initial parameters.
Some pseudocode:
` import joblib
from clearml import Task, OutputModel

# Set up the extended sklearn model with some initial parameters
mdl = SomeExtendedSklearnModel(**params)

# Load data
X = load_data(...)

# Run
task = Task.init(...)
output_model = OutputModel(task=task, ..., framework="ScikitLearn")
preds = mdl.fit_predict(X)
joblib.dump(mdl, "mdl.pkl") `
Without the joblib dump, my models do not get registered as models, even though the experiment runs fine and logs everything else : )
Edit: Note that I also want ClearML to store these into my predefined artifact store, which it does with the aforementioned "hacky" solution.
In the deployment, if I add agents to queues that are not present (using k8sglue), they fail to launch until the queues become available. I would like to avoid manually setting up queues that I know I'll be using.
Now working around it by creating a kube job that POSTs to the appropriate apiserver address and creates what I need.
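The workaround above can be sketched with a small stdlib-only Python helper that targets the ClearML apiserver's `queues.create` endpoint. The server URL and queue name here are placeholder assumptions for illustration, and a real deployment would also need the usual authentication (e.g. key/secret credentials) on the request:

```python
import json
import urllib.request

# Hypothetical in-cluster apiserver address -- adjust for your deployment
APISERVER_URL = "http://clearml-apiserver:8008"


def build_queue_request(queue_name: str) -> urllib.request.Request:
    """Build a POST request for the ClearML `queues.create` endpoint."""
    payload = json.dumps({"name": queue_name}).encode("utf-8")
    return urllib.request.Request(
        f"{APISERVER_URL}/queues.create",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it (from the kube job, once credentials are attached) would be:
#   urllib.request.urlopen(build_queue_request("feature_pipelines"))
```

The same call could equally be a one-line `curl` in the job's container command; the point is just to create the missing queues before the agents start polling them.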
It would be a nice feature 😛
Will do, if the solution is migrated away from terraform and back into the clearml helm charts!
Yes, thank you from me too! Wrapping up a project where we ended up deploying a self-hosted version on EKS and leveraging its autoscaling abilities. Ticked a lot of boxes for our team, from model deployment to running pipelines, tracking experiments, storing artifacts, and even allowing the deployment of some R code/models by making the use of custom Docker images a breeze 😅
Given once your pipes become sufficiently complex and start to veer more outside the ML domain, you might op...
Thanks for the answer - just thinking about how to build an easy solution for new feature requests.
Combining pre-existing pipes in the described way would make things "simpler" the way I see it, but could also lead to other problems. Will have to think about it a little more.
The API server does not restart during the process. I'll try to see if I can catch something in its logs, or where should I be monitoring the networking? I.e., what is the flow? 😅
On AWS EKS with:
Image: allegroai/clearml-agent-k8s-base
clearml-agent version: 1.2.4rc3
python: 3.6.9
It is not out yet indeed, which is why I am wondering. Also a good question about the MongoDB stuff, and quite relevant, because we just went ham switching from AWS EBS => EFS and also deleting (or supposedly deleting?) all of our old PVCs + PVs. Somehow, after redeployment, ClearML still had access to the old data, which should've been destroyed.
Note: we are still in dev, so this is ok 😅