Hi CostlyOstrich36, that's correct.
Thanks. Which brings me to my question: how does ClearML deal with all the CVEs? What is your response process?
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be handled by clearml-serving, and maybe Seldon as well.
Thanks, it's attached.
I also noted that the status on ClearML is always 'Pending', unlike others which say 'Running'. Is this a side effect of using the k8s glue?
It's 0.17-63.
It doesn't appear on the profile page.
```
[root@2c7498711bef elasticsearch]# curl
{
  "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2021-05-22T11:33:38.932Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisi...
```
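(For context, a minimal sketch of a possible follow-up, assuming Elasticsearch is reachable on localhost:9200: after diagnosing with /_cluster/allocation/explain, one common way to re-trigger allocation of shards whose earlier attempts failed is the reroute API's retry_failed flag. Other causes, e.g. disk watermarks or allocation being disabled, need their own fixes.)

```python
# Sketch only: host/port are assumptions, and retry_failed helps only
# when previous allocation attempts actually failed.
import requests

ES = "http://localhost:9200"

# Same diagnostic as the curl output above
print(requests.get(f"{ES}/_cluster/allocation/explain").json())

# Ask Elasticsearch to retry previously failed shard allocations
print(requests.post(f"{ES}/_cluster/reroute?retry_failed=true").json())
```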
From a ClearML perspective, how would we enable this, considering we don't have direct control over, or even the IPs of, the agents?
They don't have the same version. I've noticed that if the client is using Python 3.8, the remote execution will try to use that same version even though the Docker image doesn't have it installed.
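(As a hedged aside: if the goal is to pin which interpreter the agent uses inside the container, clearml.conf exposes agent.python_binary; a sketch, with an assumed path:)

```
agent {
    # assumption: point the agent at the interpreter that actually
    # exists in the docker image, instead of the client's version
    python_binary: "/usr/bin/python3.6"
}
```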
Ok sure. Thanks.
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
Ok, thanks. That explains a lot. We've been doing this wrong the whole time, thinking that the clearml.conf on the client side would be honoured during remote agent execution. In reality, only the api section is used.
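(For illustration, a minimal sketch of the client-side clearml.conf api section that remote execution actually consumes; addresses and keys are placeholders:)

```
api {
    web_server: "http://clearml-server:8080"
    api_server: "http://clearml-server:8008"
    files_server: "http://clearml-server:8081"
    credentials {
        access_key: "CLIENT_ACCESS_KEY"
        secret_key: "CLIENT_SECRET_KEY"
    }
}
```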
Does the glue write any error logs anywhere? I only see `CLEARML_AGENT_UPDATE_VERSION =` and nothing else.
Ok. That brings me back to the spawned pod. At this point, clearml-agent and its config would be a contributing factor. Is the absence of /tmp/.clearml_agent.xxxxxx.cfg an issue?
Ok, I get the logic now: extra_docker_shell_script executes before clearml-agent talks to the ClearML server.
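(A sketch of that setting in the agent's clearml.conf, with assumed commands:)

```
agent {
    # executed inside the spawned container before the agent
    # contacts the ClearML server and sets up the task environment
    extra_docker_shell_script: ["apt-get update", "apt-get install -y git"]
}
```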
Do you have more info on the vault?
Actually, it only makes sense if the entire department or organisation is saving their models in a common repo. In our case this is not possible due to client security (e.g. training data from clients could potentially be 'reverse engineered' from trained models in the future), so each department, and even each project, will need its own repo.
For context: I realise I'll need to catalogue all the dataset IDs created by people separately in a spreadsheet, and for each experiment I'll need to go into the code commit to see which ID is being used. On the other hand, I thought I'd seen advertised use cases where an experiment can be directly linked to the dataset ID it uses. My brain's a bit rusty on how it was done.
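(For what it's worth, a hedged sketch of the usual linking pattern: store the dataset ID as a task parameter and fetch the dataset through the SDK, so the experiment itself records which dataset it used; project name and ID are placeholders:)

```python
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="train")

# record the dataset id as a hyperparameter so it shows on the experiment
params = {"dataset_id": "12314jhg42342j4j"}  # placeholder id
task.connect(params)

# fetch a local copy of the dataset by id
dataset_path = Dataset.get(dataset_id=params["dataset_id"]).get_local_copy()
```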
can you please verify that you have all the required packages installed locally?
It's not installed on the image that runs the experiment, but it's reflected in the requirements.txt.
what is the setting of agent.package_manager.system_site_packages?
True.
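(That setting lives in the agent's clearml.conf; a minimal sketch:)

```
agent {
    package_manager {
        # reuse packages already present in the docker image /
        # system python instead of reinstalling them in the venv
        system_site_packages: true
    }
}
```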
Hi AgitatedDove14. I'm trying out passing the env vars via the code instead:

```
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
```

So the strange thing is, when my k8s glue pulls a task, this happens:
```
Pulling task xxxxxxxxxx launching on kubernetes cluster
Pushing task xxxxxxxxxx into temporary pending queue
Kubernetes scheduling task id=xxxxxxxxxxxx
skipping docker argument TRAINS_AGENT_GIT_USE...
```
Hi FriendlySquid61, AgitatedDove14, the issue and a possible fix are described in this GitHub issue: https://github.com/allegroai/clearml-agent/issues/51
Hi, any idea if I can achieve this? I just need a list of usernames.
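(One hedged possibility, assuming the server's users.get_all endpoint is reachable with your credentials:)

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# list all users known to the server (requires adequate permissions)
users = client.users.get_all()
print([user.name for user in users])
```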
I thought of another potential way, but I'm not sure if the SDK supports it: we would manually save and upload the model using vanilla boto3, with credentials passed in as env vars, then use the ClearML SDK to update the model repo with the location of the model, without ClearML uploading it explicitly. Would the above work?
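(A sketch of that flow, assuming an S3-compatible endpoint and placeholder bucket/paths; register_uri records an already-uploaded location without ClearML performing the upload itself:)

```python
import boto3
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual-model-upload")

# 1) manual upload with vanilla boto3, credentials taken from env vars
s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.ai",  # assumption: on-prem ECS endpoint
    verify=False,                   # self-signed cert
)
s3.upload_file("model.pt", "clearml-models", "manual/model.pt")

# 2) register the already-uploaded weights with the model repo
OutputModel(task=task).update_weights(
    register_uri="s3://ecs.ai/clearml-models/manual/model.pt"
)
```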
Is this an env var?
CLEARML_CONFIG_FILE
clearml=1.0.3
python=3.8.10

```
clearml-data upload --id 12314jhg42342j4j --storage http://ecs.ai
```

http://ecs.ai is an on-prem DELL EMC ECS that serves as our S3 storage, configured with a self-signed cert.
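(For the self-signed cert, the relevant knobs should sit in the sdk.aws.s3 section of clearml.conf; a sketch with placeholder credentials. Note the verify option is an assumption worth checking against the conf shipped with your clearml version:)

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "ecs.ai:443"   # non-AWS, S3-compatible endpoint
                    key: "ACCESS_KEY"
                    secret: "SECRET_KEY"
                    multipart: false
                    secure: true
                    # assumption: newer clearml versions accept a verify
                    # option here for self-signed certs
                    verify: false
                }
            ]
        }
    }
}
```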
Hi AgitatedDove14, I dug a bit deeper. I saw this in the installed packages of the original completed task. When the task is cloned, this list is copied over, hence the problem. Can I ask how ClearML creates the list of installed packages? Why are some of them (e.g. attrs) being pulled from `@ file:///tmp/build/80754af9/attrs_1604765588209/work`?
```
absl-py==0.11.0
alabaster==0.7.12
antlr4-python3-runtime==4.8
apex==0.1
appdirs==1.4.4
argon2-cffi==20.1.0
ascii-graph==1.5.1
async-gener...
```
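(If it helps: the `@ file:///...` entries look like what pip freeze emits for packages installed from local build artifacts, e.g. inside conda environments. A hedged sketch of overriding the auto-detected list with a requirements file, so clones don't inherit those entries:)

```python
from clearml import Task

# must be called before Task.init: use requirements.txt verbatim
# instead of the environment/import-based package detection
Task.force_requirements_env_freeze(
    force=True, requirements_file="requirements.txt"
)

task = Task.init(project_name="examples", task_name="train")
```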
I can't seem to find the version number on the ClearML web app. Is there a specific place to look?
I would like to run the ClearML agent on Kubernetes. Basically, I need to run the image in a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run tasks.
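(My rough understanding of the glue pattern, sketched from clearml-agent's k8s_glue_example.py; constructor arguments and method names are assumptions that may differ between versions:)

```python
from clearml_agent.glue.k8s import K8sIntegration

# one long-lived "glue" pod runs this daemon; it polls a ClearML queue
# and creates a fresh pod per task instead of executing tasks itself
k8s = K8sIntegration(ports_mode=False)  # assumption: default ctor args suffice
k8s.k8s_daemon(queue="default")
```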
Then you pass the tolerations definition through a different pod template?
Yup.
Hi TimelyPenguin76 ,
If you notice in the last screenshot, it states the bucket name as http://ecs.ai . It then tries to open http://s3.amazonaws.com/ecs.ai/clearml-models/artifact/uploading_file?X-Amz-Algorithm= ....