Hi, when I tried ip:port, it references the right host and bucket... BUT... the file is not found on the ECS S3 even though I can see from the logs that it states Completed model upload to s3://ecs.ai:80/clearml-models/artifacts/ ...
I didn't track the version where this behaviour changed. But the last time I tried, it was able to download the content after I provided the credentials.
This is strange then. Is it possible for ClearML logs to report a successful save to S3 storage when it actually didn't happen? For example, I've seen in past experience a certain S3 client that saved onto a local folder called 's3:/' instead of putting the data on the S3 storage itself.
Hi. Yup, the model was not physically uploaded into the bucket with the ip:port, although ClearML does indicate that it's there, except that I can't download it. I also verified this with another S3 client; the model was not there either.
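For anyone repeating that check, a sketch of listing the target prefix with a generic S3 client; the endpoint and bucket are taken from the log line above, and the credentials are assumed to already be configured for the CLI:

    # List what actually landed under the model prefix on the ECS S3 endpoint
    aws --endpoint-url http://ecs.ai:80 s3 ls s3://clearml-models/artifacts/ --recursive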
Can you please verify that you have all the required packages installed locally?
It's not installed on the image that runs the experiment, but it's reflected in the requirements.txt.
What is the setting of agent.package_manager.system_site_packages?
True.
Ok. I noted this is due to the venv_update setting. It needs to be disabled as it has a dependency on an internet URL. We can close this.
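For reference, a minimal clearml.conf sketch of the change being described; the exact location of the venv_update key can differ between agent versions, so treat the layout as an assumption:

    agent {
        # Disable the experimental venv-update path, which reaches out to an internet URL
        venv_update: false
    }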
Do you mean this?
Removing containers section: [{'image': 'clearml-agent:latest', 'env': [{'name': 'PIP_INDEX_URL', 'value': '
'},
I'm also noticing a lot of this while the k8s glue is running. Ex:
Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Thanks. This appears to be solely for the web UI and API. What if I want to orchestrate on K8s?
Hi, it makes sense to automate this part just like how you automate the rest of the MLOps flow, especially when you already support Data Versioning/Lineage. Data Provenance (how a dataset relates to the experiment and serves as a model's source) should be in too. Although I agree that technically it's probably not possible to tell whether users actually used the indicated datasets after they do a datasets.get_copy().
Okay, this part I missed: why would you need to add an additional "catalog" when you have the UI?
Yeah, this is the part I am trying to reconcile. I don't see any UI for datasets. Or is this a feature of Hyper-Datasets and I just mixed them up?
From an efficiency perspective, we should be pulling data as we feed it into training. That said, it's always a good idea to uncompress large zip files and store them as smaller ones that allow you to batch-pull for training.
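A small sketch of the pull pattern being discussed; the dataset id is a placeholder, and get_local_copy() is assumed to be the SDK call behind the get_copy() mentioned earlier:

    from clearml import Dataset

    # Placeholder id; in practice this comes from the UI or the pipeline configuration
    dataset = Dataset.get(dataset_id="<dataset_id>")

    # Downloads (and caches) the dataset content locally and returns the folder path;
    # storing many smaller archives instead of one huge zip lets each job pull only what it needs
    local_path = dataset.get_local_copy()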
Hi, I changed it, but it still points to https://files.pythonhosted.org/packages .
Hi, any advice on this? Thanks.
[root@2c7498711bef elasticsearch]# curl
{
  "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2021-05-22T11:33:38.932Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisi...
Hi, please correct me if I am wrong; to use the glue, I need the following:
- A k8s cluster
- A kubectl that is connected to the k8s cluster
- A pip install of clearml-agent 0.17.1
So I did all of the above; I'm not sure what is meant by running the entire thing on my own machine.
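For completeness, a sketch of how the glue is typically launched once those prerequisites are in place; the script name and flags come from the clearml-agent examples and may differ by version, and the queue name is a placeholder:

    pip install clearml-agent==0.17.1
    # k8s_glue_example.py ships with the clearml-agent repository (under examples/)
    python k8s_glue_example.py --queue myqueue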
I meant the dataset id.
Hi SuccessfulKoala55, would they need the fileserver to route to MinIO then? E.g.
This will ensure that any actions by clearml-data and models are saved into the S3 object store.
api {
    files_server: s3://ecs.ai:80/clearml-data/default
}
aws {
    s3 {
        credentials {
            host: http://ecs.ai:80
            ## Insert the iam credentials provided by your SAs here.
        }
    }
}
But if the user forgets to do the above, they will be saved on the ClearML server. If I switch off f...
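For reference, a sketch of how a per-endpoint credentials entry is usually structured in clearml.conf for a non-AWS, plain-HTTP endpoint like the one above; the key/secret values are placeholders and the multipart/secure flags are assumptions for such an endpoint:

    aws {
        s3 {
            credentials: [
                {
                    host: "ecs.ai:80"
                    key: "<ACCESS_KEY>"
                    secret: "<SECRET_KEY>"
                    multipart: false
                    secure: false
                }
            ]
        }
    }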
Thanks. Which brings me to the question: how does ClearML deal with all the CVEs? What is your process for responding to them?
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be done by clearml-serving, and maybe Seldon as well.
It's running as a long-running pod on K8s. I'm using kubectl logs -f to track its stdout.
Is there a way for the k8s glue to pass self-signed cert information on to the agent pods?
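One possible approach, sketched as an assumption rather than a documented glue feature: mount the internal CA into the pods the glue spawns and point the Python/requests stack at it via environment variables in the base pod template. The ConfigMap name, mount path, and the CLEARML_API_HOST_VERIFY_CERT variable are assumptions here; REQUESTS_CA_BUNDLE is the standard requests override.

    # Hypothetical fragment of the pod template used for glue-spawned agent pods
    spec:
      containers:
        - name: clearml-agent
          env:
            - name: REQUESTS_CA_BUNDLE            # CA bundle used by the SDK's HTTP client
              value: /certs/ca.crt
            - name: CLEARML_API_HOST_VERIFY_CERT  # assumption: keeps server-cert verification on
              value: "true"
          volumeMounts:
            - name: internal-ca
              mountPath: /certs
              readOnly: true
      volumes:
        - name: internal-ca
          configMap:
            name: internal-ca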
If we run all the rank 0 and rank n tasks individually, it defeats the purpose of using ClearML.
Thanks, could you share the URL to this full API documentation?
Hi, the scenario is as follows:
client.py runs task.execute_remotely(queue_name='myqueue', exit_process=True)
The API section of clearml.conf on the client side is read in.
The client side calls the ClearML server and inserts the task into the queue.
The K8S glue retrieves the task from the queue and spawns a K8S pod.
The K8S pod performs git clone.
Error: ssh keys not found.
Each individual has their own key in their GitLab profile, and GitLab is configured to only work via SSH.
We can't place the key in the image as this is as good as ...
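For context, a minimal sketch of the client.py side of the scenario above; the project and task names are placeholders, and only the queue name comes from the scenario:

    from clearml import Task

    # The api section of clearml.conf on this machine supplies the server credentials
    task = Task.init(project_name="demo", task_name="remote run")

    # Stop local execution and enqueue the task for the K8s glue to pick up;
    # everything after this call only runs inside the spawned pod
    task.execute_remotely(queue_name="myqueue", exit_process=True)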
I want to rule out the glue being the problem. Is the glue significant in initialising clearml-agent after the pod is spawned?
I used an nvcr PyTorch image and instructed ClearML to inherit global dependencies. No need to install torch, and it works well.
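A sketch of the agent-side settings this describes; the image tag is illustrative, and the key names follow the usual clearml.conf layout:

    agent {
        default_docker {
            # Any NGC PyTorch image with torch pre-installed
            image: "nvcr.io/nvidia/pytorch:22.12-py3"
        }
        package_manager {
            # Let the task venv inherit the packages already baked into the image
            system_site_packages: true
        }
    }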