
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be handled by clearml-serving, and maybe Seldon as well.
It would make sense on a very large resource cluster. Unfortunately, we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and would not help with efficiency. Or would you have a suggestion?
Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
I passed it through the YAML as follows.
apiVersion: v1
kind: Pod
spec:
  containers:
  - image: clearml-agent:latest
    env:
    - name: PIP_INDEX_URL
      value: ""
    - name: PIP_TRUSTED_HOST
      value: "192.168.56.253"
    - name: PIP_FIND_LINKS
      value: ""
    - name: GIT_SSL_NO_VERIFY
      value: true
    resources:
      requests:
        cpu: "2"
...
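One thing worth checking in a pod template like this: Kubernetes expects every `env` value to be a string, and an unquoted `true` is parsed by YAML as a boolean. The sketch below is only an illustration and not part of ClearML. `non_string_env_values` is a hypothetical helper, and the dictionary mirrors the template above as a YAML parser would load it (the two URL values are not shown above, so they are left empty here).

```python
from typing import Any, Dict, List


def non_string_env_values(pod: Dict[str, Any]) -> List[str]:
    """Return the names of env entries whose value is not a string."""
    offenders = []
    for container in pod.get("spec", {}).get("containers", []):
        for entry in container.get("env", []):
            if "value" in entry and not isinstance(entry["value"], str):
                offenders.append(entry["name"])
    return offenders


# Hypothetical in-memory copy of the pod template from the post.
pod_template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "image": "clearml-agent:latest",
                "env": [
                    {"name": "PIP_INDEX_URL", "value": ""},
                    {"name": "PIP_TRUSTED_HOST", "value": "192.168.56.253"},
                    {"name": "PIP_FIND_LINKS", "value": ""},
                    # Unquoted `true` in YAML loads as a Python bool.
                    {"name": "GIT_SSL_NO_VERIFY", "value": True},
                ],
            }
        ],
    },
}

print(non_string_env_values(pod_template))
```

If the helper reports a name, quoting the value in the YAML (for example `value: "true"`) is the usual fix.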
I see. Is there a more elaborate codeset that describes the above interactions?
I would say yes, otherwise the vscode feature is only available on internet-connected premises, due to the hard-coded URL used to download vscode.
After some churning, this is the answer. Change it in the clearml.conf generated by clearml-agent init.
` default_docker: {
    # default docker image to use when running in docker mode
    image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
In the Kube logs of the pod, I see 'Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease Temporary failure resolving http://security.ubuntu.com '. My guess is it's trying to run an apt update.
As we are on a disconnected network, we have a server hosting the repo, but under a different name.
The agent is running on a disconnected server in docker mode. I have a client that runs clearml-session, and I saw from the agent's logs that the installation of vscode fails.
No, I can't see the files. But I can see them if I don't use ':port' in the URL when uploading. I can't access the machine today, I'll try to check the S3 logs when I'm back.
Does the enterprise version support this natively?
Hi, is this currently not working? http://app.community.clear.ml ? I noticed that the ClearML UI is cached in the browser, and if the backend is not running, it's not clear to the user that something is wrong (except for broken pages).
Executing task id [228caa5d25d94ac5aa10fa7e1d02f03c]:
repository = https://192.168.50.88:18443/tkahsion/pytorchmnist
branch = master
version_num = cfb833bcc70f3e10d3b6a96cfad3225ed682382b
tag =
docker_cmd = nvidia/cuda:10.1-runtime-ubuntu18.04
entry_point = pytorch_mnist.py
working_dir = .
Warning: could not locate requested Python version 3.9, reverting to version 3.6
Using base prefix '/usr'
New python executable in /root/.clearml/venvs-builds/3.6/bin/python3.6
Also creating executable i...
Hi Jake, thanks for the suggestion, let me try it out.
Hi, for both of them, args.lastiter is the exact same value. But when plotted out, they are actually 2 iterations apart.
Thanks. We set this configuration and the client ran and submitted the job for remote execution (agent running k8s glue). However, when the job ran and tried to save into the model repo, this error came up.
ClearML.storage - ERROR - Failed creating storage object S3://ecs.ai Reason; Missing key and secret for S3 storage access ( S3://ECS.ai ).
I remember being told that the ClearML.conf on the client will not be used in a remote execution like the above so I think this was the problem. I also...
AgitatedDove14 , I'm Jax, not Manoj! lol. 😅 😅
Hi, nice read. Your permalink is wrong though, here's the right one.
https://cpatrickalves.com/mlops-what-it-is-and-why-does-it-matter
Ok, I guess I will have to kill the whole thing and refresh it.
Do you mean by this that you want to be able to seamlessly deploy models that were tracked using ClearML experiment manager with ClearML serving?
Ideally that's best. Imagine that I used spaCy (among other frameworks) and I just need to add one or two lines of ClearML code to my Python scripts to track the experiments. Then when it comes to deployment, I don't have to worry about spaCy having a model format that Triton doesn't recognise.
Do you want clearml serving ...
Hi CostlyOstrich36 , That's correct.
It's hard to tell, but the agent change was a significant one. Unless Python versions have something to do with it.
[root@2c7498711bef elasticsearch]# curl
`
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...
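For readers less familiar with this output, here is a minimal sketch of how the health fields above could be summarized. `summarize_health` is a hypothetical helper, and the dictionary copies only the fields visible in the output above; it does not query Elasticsearch. The status meanings in the comments follow the standard Elasticsearch convention (red means at least one primary shard is unassigned).

```python
from typing import Dict, Union


def summarize_health(health: Dict[str, Union[str, int, bool]]) -> str:
    """Build a one-line summary from a cluster health response."""
    # Shards counted here: assigned, still initializing, or unassigned.
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    meaning = {
        "green": "all shards assigned",
        "yellow": "all primaries assigned, some replicas unassigned",
        "red": "at least one primary shard unassigned",
    }.get(health["status"], "unknown status")
    return (
        f"{health['cluster_name']}: {health['status']} ({meaning}), "
        f"{health['unassigned_shards']} of {total} shards unassigned"
    )


# Values copied from the curl output in the post.
health = {
    "cluster_name": "clearml",
    "status": "red",
    "active_shards": 4,
    "initializing_shards": 0,
    "unassigned_shards": 8,
}

print(summarize_health(health))
```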
Hi TimelyPenguin76 , I am adding a debug sample to an existing task using the above method. What should I put for the iteration? I do not want to overwrite existing ones, but I do not know what the last count is. This is for both scalar and media reporting.
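One possible approach, sketched under assumptions: the ClearML `Task` object exposes `get_last_iteration()`, so reporting at one past that value should avoid overwriting existing points. `next_free_iteration` and `FakeTask` are hypothetical names used only for this illustration; `FakeTask` stands in for a real task so the snippet runs without a ClearML server. Whether `get_last_iteration()` covers both scalars and debug samples in a given setup is not confirmed here and is worth verifying.

```python
def next_free_iteration(task) -> int:
    """Return an iteration index one past the task's last reported iteration."""
    last = task.get_last_iteration() or 0
    return last + 1


class FakeTask:
    """Stand-in for a ClearML task, used only to run this sketch offline."""

    def __init__(self, last_iteration: int) -> None:
        self._last_iteration = last_iteration

    def get_last_iteration(self) -> int:
        return self._last_iteration


print(next_free_iteration(FakeTask(41)))

# With a real task (assumed usage, not executed here):
#   from clearml import Task
#   task = Task.get_task(task_id="<existing task id>")
#   iteration = next_free_iteration(task)
#   task.get_logger().report_scalar("title", "series", value=0.5, iteration=iteration)
```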
I'm not familiar with Elasticsearch. What role does it play in ClearML?
What feature on this paid roadmap are you referring to? I am indeed communicating with Noem on paid features.
Thanks SuccessfulKoala55 . Just pm'ed him.