Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
They don't have the same version. I do seem to notice that if the client is using version 3.8, the remote execution will try to use that same version, even though the docker image doesn't have it installed.
Hi, so you meant I need to install virtualenv in my base image?
Hi, it looks like the entire http://clear.ml domain has been offline for more than 12 hours. The main pages and documentation are inaccessible as well.
Hi, this is the log. I didn't see any attempt from the agent to install virtualenv on the base image.
` 1618369068169 clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3 DEBUG Current configuration (clearml_agent v0.17.2, location: /tmp/.clearml_agent.wgsmv2t9.cfg):
agent.worker_id = clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3
agent.worker_name = clearml-gpu-id-b926b4b809f544c49e99625...
Congrats on v1.0. 🎉
I see. Is there a more elaborate code example that describes the above interactions?
The first is probably done using pipeline controllers, the second using Datasets or HyperDatasets. It's not very clear how the last one is achieved, especially the searchable data catalogs.
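As a rough sketch of the Datasets side (the project and dataset names below are made up for illustration, not taken from this thread), registering data and looking it up again would go something like:
` from clearml import Dataset

# Register a local folder as a versioned dataset (illustrative names).
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("/path/to/local/data")
ds.upload()
ds.finalize()

# Later, look the dataset up by project/name and fetch a local copy.
ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
local_path = ds.get_local_copy() `
How that scales into a searchable catalog is exactly the part that's still unclear to me.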
Hi, Self-hosted using docker-compose.
Ok. The problem was resolved with the latest versions of clearml-agent and clearml.
In the ClearML config that's being run by the ClearML container?
Ok sure. Thanks.
ah... thanks!
Thanks. The challenge we encountered is that we only expose our Devs to the ClearML queues, so users have no idea what's beyond the queue except that it will offer them the resources associated with the queue. In the backend, each queue is associated with more than one host.
So what we tried is as follows.
We create a train.py script much like what Tobias shared above. In this script, we use the socket library to pull the ipaddr.
import socket
hostname=socket.gethostname()
ipaddr=dock...
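For completeness, here is a minimal runnable version of that idea (the original snippet is truncated, so the last assignment here is only an assumed completion using plain hostname resolution):
` import socket

# Resolve this node's hostname and an IP address for it.
# Assumption: gethostbyname() returns an address reachable by the other nodes.
hostname = socket.gethostname()
ipaddr = socket.gethostbyname(hostname)
print(hostname, ipaddr) `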
From a ClearML perspective, how would we enable this, considering we don't have direct control over the agents, or even their IPs?
ok, i'll wait till i get my hands on vault then. thanks.
thanks SuccessfulKoala55 . I verified your last comment and it works.
Yeah.. the issue is ClearML is unable to talk to the nodes because PyTorch distributed needs to know their IPs. There is some sort of integration missing that would enable this.
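To make that concrete: with the usual env:// rendezvous, every worker has to know the master node's address before initialization, e.g. (the address, port and ranks below are illustrative):
` import os
import torch.distributed as dist

# Illustrative values; in practice MASTER_ADDR must be a reachable node IP,
# which is exactly the piece of information we can't get from ClearML here.
os.environ["MASTER_ADDR"] = "192.168.50.10"
os.environ["MASTER_PORT"] = "29500"
dist.init_process_group(backend="nccl", init_method="env://", rank=0, world_size=2) `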
Sorry, by dev end I was referring to my developers.
I didn't think Horovod needs to be as complicated as you described. It can also work by running on multiple known nodes. How would I add a glue for multi-node?
Horovod does also work with other similar products such as yours (e.g. Polyaxon).
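For reference, outside ClearML, Horovod's own launcher can target known nodes directly, e.g. (host names and slot counts here are illustrative):
` # Run 8 processes across two known hosts, 4 GPU slots each.
horovodrun -np 8 -H node1:4,node2:4 python train.py `
The missing glue is getting those host addresses when the nodes only appear behind a ClearML queue.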
I think a related question is: ClearML relies heavily on Triton (a good thing), but Triton only supports a few frameworks out of the box. So this 'engine' needs to make sure it can work with Triton and use all its wonderful features such as request batching, GPU reuse, etc.
Executing task id [228caa5d25d94ac5aa10fa7e1d02f03c]:
repository = https://192.168.50.88:18443/tkahsion/pytorchmnist
branch = master
version_num = cfb833bcc70f3e10d3b6a96cfad3225ed682382b
tag =
docker_cmd = nvidia/cuda:10.1-runtime-ubuntu18.04
entry_point = pytorch_mnist.py
working_dir = .
Warning: could not locate requested Python version 3.9, reverting to version 3.6
Using base prefix '/usr'
New python executable in /root/.clearml/venvs-builds/3.6/bin/python3.6
Also creating executable i...
yeah, someone should call them out.
I used the nvcr PyTorch image and instructed clearml to inherit the global dependencies. No need to install torch, and it works well.
Yes! I definitely think this is important, and hopefully we will see something there
(or at least in the docs)
Hi AgitatedDove14, any updates in the docs to demonstrate this yet?
Next step is to figure out if I can do all that in the Python code instead of the UI.
After some churning, this is the answer. Change it in the clearml.conf generated by clearml-agent init.
` default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
Hi,
It did, nvidia/cuda:10.1-runtime-ubuntu18.04.
So if I need to set this every time, what is the following config for? And how do I pass in new env parameters?
` default_docker: {
# default docker image to use when running in docker mode
image: "dockerrepo/mydocker:custom"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
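If the goal is to do this from the Python code instead of clearml.conf or the UI, one option (as I understand the v1.0-era API, so take it as a sketch) is to set the docker image plus extra arguments on the task itself:
` from clearml import Task

task = Task.init(project_name="examples", task_name="docker image from code")
# One string: the image followed by extra docker arguments;
# the env flag mirrors the GIT_SSL_NO_VERIFY example above.
task.set_base_docker("dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true") `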
For example, it would be useful to integrate https://github.com/whylabs/whylogs#features into ClearML as part of data and model monitoring. WhyLogs would have its own static page that would preferably be displayed as a new custom tab (besides logs, scalars and plots).
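As a rough illustration of how such a report could be attached to a task with today's API (the file name and the whylogs step are placeholders, not an existing integration):
` from clearml import Task

task = Task.init(project_name="examples", task_name="data monitoring")

# Placeholder: assume whylogs (or any profiler) already wrote an HTML report here.
report_path = "whylogs_report.html"

# Attach it as an artifact and as a debug sample so it is visible in the UI;
# a dedicated custom tab, as suggested above, would still need UI support.
task.upload_artifact(name="whylogs report", artifact_object=report_path)
task.get_logger().report_media(
    title="whylogs", series="profile", iteration=0, local_path=report_path
) `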