Reputation
Badges 1
282 × Eureka!Share data across R&D teams with searchable data catalogs available on any environment.
Hi, i'm gonna hijack this thread a bit. My community uses ClearML and is looking at various model deployment strategies. We are looking at a seamless integration with Triton but noted they Triton does not support deployment strategies. ClearML-Serving seems to but the strategies are rather limited. Is there a roadmap to expand Clearml-serving?
No, i can't see the files. But i can see if i don't use ':port' in the URL when uploading. I can't access the machine today, i'll try to check the S3 logs when i'm back.
The first stage is a rank0 pytorch script. The downstream stages are rankN scripts, they are waiting for the IP address of the first stage. But the first stage doesn’t return, it simply waits for the rankN scripts to connect to it. But in this case, the rankN scripts doesn’t start. So its probably necessary to have just a single stage.
If i were to start a single rank0, and subsequent rankN tasks, it would be rather messy on ClearML Dashboard. Best to have either a single clearml application...
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from children. Technically workable but not sure if its best approach since different ppl have different batch sizes in mind.
Hi CostlyOstrich36 , What you described is task. I was referring to the pipeline controller.
Does the enterprise version support natively?
Thanks 👍 . Should i create an issue on Github?
Hi SuccessfulKoala55 , can i confirm the following comments in the docker-compose.yml ?
And after that to run docker-compose commands without loss of data?
docker-compose down docker-compose up
docker-compose.yml
`
version: "3.6"
services:
apiserver:
command:
- apiserver
container_name: clearml-apiserver
image: allegroai/clearml:latest
restart: unless-stopped
volumes:
- /opt/clearml/logs:/var/log/clearml
- /opt/clearml/config:/opt/clearml/config
#...
Hi AgitatedDove14 , do you mean the configuration tab in the UI? No, i don't see it.
Hi. If we disable the API service, how will it affect the system? How do we disable?
Hi SuccessfulKoala55 , would they need the fileserver to route to minio then? E.g.
This will ensure that any actions by clearml-data and models are saved into the S3 object store.
api {
files_server: s3://ecs.ai:80/clearml-data/default
}
aws {
s3 {
credentials {
host: http://ecs.ai:80
## Insert the iam credentials provided by your SAs here.
}
}
}
But if user forgot to do above, they will be saved on ClearML server. If I switch off f...
Hi erez, i think i would want to reference the code that transformed the data. Take for example, i received 10k images, i performed some transformation and save it as a next version before i split it up for my ML training. Some time later, i receive a new set of 10k images and wants to apply the same transformation and then append it to the previous 10k as another version. Clearml-data does well for the data-versioning part, but in terms of data provenance, its not clear how i can associate t...
Sorry, in case i misunderstood you. Are you refering to the extra_docker_shell_script .
From ClearML perspective, how would we enable this, considering we don't have direct control or even IP of the agents
yah i got that too. This happens when i run the client code on the same machine as the clearml-agent. So i'm wondering if sharing the same clearml.conf cause that problem. Is there a way to specify the clearml.conf instead of defaulting to ~/clearml.conf?
Do you mean by this that you want to be able to seamlessly deploy models that were tracked using ClearML experiment manager with ClearML serving?
Ideally that's best. Imagine that i used Spacy (Among other frameworks) and i just need to add the one or two lines of clearml codes in my python scripts and i get to track the experiments. Then when it comes to deployment, i don't have to worry about Spacy having a model format that Triton doesn't recognise.
Do you want clearml serving ...
Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944
Hi. The upgrade seems to go well but i'm seeing one wierd output. When i ran a task and observe the software installed under the execution tab , i still see clearml=0.17 . Is this expected?
Ok, let me check this out first thing on Monday, thanks AgitatedDove14 .
Thanks. We set this configuration and the client ran and submitted the job for remote execution (agent running k8s glue). However when the job runs, and tries to save into model repo, this error came up.
ClearML.storage - ERROR - Failed creating storage object S3://ecs.ai Reason; Missing key and secret for S3 storage access ( S3://ECS.ai ).
I remember being told that the ClearML.conf on the client will not be used in a remote execution like the above so I think this was the problem. I also...
I can't seem to find the version number on the clearml web app. Is there a specific way?
Thanks. The challenge we encountered is that we only expose our Devs to the ClearML queues, so users have no idea what's beyond the queue except that it will offer them the resources associated with the queue. In the backend, each queue is associated with more than one host.
So what we tried is as followed.
We create a train.py script much like what Tobias shared above. In this script, we use the socket library to pull the ipaddr.
import socket
hostname=socket.gethostname()
ipaddr=dock...
Unfortunately due to security, clients can't have direct access to the nodes. Is there any possible workarounds at the moment?
AgitatedDove14 , will these be fixed?
Passing env via the code Passing env via template yaml