Reputation
Badges 1
282 × Eureka!Hi, this is what i got. No mention of the env variables.
` Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
the default for base_pod_num is 1.
I think a related question is, ClearML replies heavily on Triton (Good thing) but Triton only support a few frameworks out of the box. So this 'engine' need to make sure its can work with Triton and use all its wonderful features such as request batching, GPU reuse...etc.
unfortunately, our security posture is so strict that we cannot have an agent git user that have unfettered read access to all repos.
Can i dig into the mongodb or ES to pull these data?
Hi SuccessfulKoala55 , thanks, tested the patch and its working as expected now.
Hi, it make sense to automate this part just like how you automate the rest of the MLOps flow, especially when you already support Data Versioning/Lineage, Data Provenance (How it works with the experiment and as a model source) should be in too. Although i agree technically it's probably not possible to tell if the users actually used the indicated datasets after they do a datasets.get_copy() .
Hi HelpfulDeer76 , I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
Hi AgitatedDove14 , what version i should change it to? I'm currently on v0.17.2rc3.
I would like to run ClearML agent on kubernetes. So basically I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.
We are deploying ClearML Server via the docker-compose.
For ClearML-Agent. We have the choice of Docker or K8S preferred (Using the Glue).
For K8S, we can't get the glue to work ( https://clearml.slack.com/archives/CTK20V944/p1614525898114200?thread_ts=1613923591.002100&cid=CTK20V944 ) so we can't make an assessment of whether it actually works for us.
yes, previously run experiments. I will just kill clearml-elastic container if that may solve the problem.
Share data across R&D teams with searchable data catalogs available on any environment.
Hi, i'm gonna hijack this thread a bit. My community uses ClearML and is looking at various model deployment strategies. We are looking at a seamless integration with Triton but noted they Triton does not support deployment strategies. ClearML-Serving seems to but the strategies are rather limited. Is there a roadmap to expand Clearml-serving?
No, i can't see the files. But i can see if i don't use ':port' in the URL when uploading. I can't access the machine today, i'll try to check the S3 logs when i'm back.
The first stage is a rank0 pytorch script. The downstream stages are rankN scripts, they are waiting for the IP address of the first stage. But the first stage doesn’t return, it simply waits for the rankN scripts to connect to it. But in this case, the rankN scripts doesn’t start. So its probably necessary to have just a single stage.
If i were to start a single rank0, and subsequent rankN tasks, it would be rather messy on ClearML Dashboard. Best to have either a single clearml application...
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from children. Technically workable but not sure if its best approach since different ppl have different batch sizes in mind.
Hi CostlyOstrich36 , What you described is task. I was referring to the pipeline controller.
Does the enterprise version support natively?
Thanks 👍 . Should i create an issue on Github?
Hi SuccessfulKoala55 , can i confirm the following comments in the docker-compose.yml ?
And after that to run docker-compose commands without loss of data?
docker-compose down docker-compose up
docker-compose.yml
`
version: "3.6"
services:
apiserver:
command:
- apiserver
container_name: clearml-apiserver
image: allegroai/clearml:latest
restart: unless-stopped
volumes:
- /opt/clearml/logs:/var/log/clearml
- /opt/clearml/config:/opt/clearml/config
#...
Hi AgitatedDove14 , do you mean the configuration tab in the UI? No, i don't see it.
Hi. If we disable the API service, how will it affect the system? How do we disable?
Hi SuccessfulKoala55 , would they need the fileserver to route to minio then? E.g.
This will ensure that any actions by clearml-data and models are saved into the S3 object store.
api {
files_server: s3://ecs.ai:80/clearml-data/default
}
aws {
s3 {
credentials {
host: http://ecs.ai:80
## Insert the iam credentials provided by your SAs here.
}
}
}
But if user forgot to do above, they will be saved on ClearML server. If I switch off f...
Hi erez, i think i would want to reference the code that transformed the data. Take for example, i received 10k images, i performed some transformation and save it as a next version before i split it up for my ML training. Some time later, i receive a new set of 10k images and wants to apply the same transformation and then append it to the previous 10k as another version. Clearml-data does well for the data-versioning part, but in terms of data provenance, its not clear how i can associate t...