Hi, I'm gonna hijack this thread a bit. My community uses ClearML and is looking at various model deployment strategies. We are looking for a seamless integration with Triton, but noted that Triton does not support deployment strategies. ClearML-Serving seems to, but the strategies are rather limited. Is there a roadmap to expand ClearML-Serving?
I see. Is there a more elaborate codeset that describes the above interactions?
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Referencing your suggestion, we can configure output_uri via task.set_base_docker(), but how should we do this for the credentials?
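Something like the sketch below is what I have in mind, if it helps clarify the question (the bucket, docker image, and credential values are placeholders, and I'm only assuming docker_arguments is a reasonable way to inject per-user env vars):

from clearml import Task

task = Task.init(project_name="demo", task_name="per-user-output-uri")

# Per-task destination for models/artifacts (placeholder bucket)
task.output_uri = "s3://my-user-bucket/models"

# One possible way to hand per-user credentials to the container the k8s glue
# agent launches: pass them as docker environment variables (placeholder values)
task.set_base_docker(
    docker_image="nvidia/cuda:11.4.3-runtime-ubuntu20.04",
    docker_arguments=[
        "-e AWS_ACCESS_KEY_ID=<user-access-key>",
        "-e AWS_SECRET_ACCESS_KEY=<user-secret-key>",
    ],
)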
ok thanks.
Yes, previously run experiments. I will just kill the clearml-elastic container to see if that solves the problem.
This is strange then. Is it possible for the ClearML logs to report a successful save to S3 storage when it actually isn't saved there? For example, I've seen in the past a certain S3 client that saved to a local folder called 's3:/' instead of putting the file on S3 storage itself.
Hi, any advice on this? thanks.
Hi thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
I have yet to go through the ClearML agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push afterwards, when I clone the task, isn't the ClearML agent unable to pull that script from the git repo?
Thought this looked familiar.
https://clearml.slack.com/archives/CTK20V944/p1635323823155700?thread_ts=1635323823.155700&cid=CTK20V944
Hi, I can't seem to find the source. What are the kinds of situations where it would try to install torch outside of the user requirements?
I thought of another potential way but not sure if the SDK supports it.
We will perform a manual save and upload of the model using vanilla boto3, with credentials passed in as env vars, then use the ClearML SDK to update the Model Repository with the location of the model, without ClearML uploading it explicitly.
Would the above work?
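For context, here's a rough sketch of the flow we have in mind (the bucket/key and project/task names are placeholders, and I'm assuming OutputModel.update_weights(register_uri=...) is the right way to register an already-uploaded location without re-uploading):

import os
import boto3
from clearml import Task, OutputModel

bucket = "my-models-bucket"   # placeholder
key = "resnet50/model.pt"     # placeholder

# 1. Save/upload the model ourselves with vanilla boto3,
#    credentials supplied as environment variables
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.upload_file("model.pt", bucket, key)

# 2. Point the ClearML model repository at the uploaded file,
#    without ClearML performing the upload itself
task = Task.init(project_name="demo", task_name="manual-model-registration")
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(register_uri=f"s3://{bucket}/{key}")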
Thanks, its attached.
I also noted that the status on ClearML is always 'Pending', unlike others which say 'Running'. Is this a side effect of using k8s glue?
Is there enterprise support for k8s glue on OpenShift?
Ok that works. thanks.
Yes of course, it's a long one.
Running git diff in my terminal in this repo gave nothing. Nothing at all.
Okay, this part I missed: why would you need to add an additional "catalog" when you have the UI?
Yeah, this is the part I am trying to reconcile. I don't see any UI for datasets. Or is this a feature of hyperdatasets and I just mixed them up?
Previously we had similar issues when we switched the images used in the agent. You might want to check on that.
Hi, Self-hosted using docker-compose.
Hi, so this means that if I want to use Kubernetes, I would have to 'manually' install multiple agents on all the worker nodes?
python k8s_glue_example.py --queue gpu --namespace default
Traceback (most recent call last):
  File "k8s_glue_example.py", line 86, in <module>
    main()
  File "k8s_glue_example.py", line 80, in main
    namespace=args.namespace,
  File "/home/administrator/clearml-agent-k8s/venv/lib/python3.6/site-packages/clearml_agent/helper/base.py", line 239, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'base_pod...
Here's my two cents worth.
I thought it was really nice to start off the topic highlighting 'pipelines'; it's unfortunately one of the most missed components when people start off with ML work. Your article mentioned drifts and how the MLOps process covers them. I thought there are two more components that are important and deserve some mention. Retraining pipelines: ML engineers tend not to give much thought to how they want to transition a training pipeline in development to an automated retraining pipe...
In the ClearML config that's being run by the ClearML container?
What type of pipeline steps are you running? From task, decorator or function?
We are trying 'from task' at the moment, but the question applies to all methods.
If they're all running on the same container why not make them the same task and do things in parallel?
The tasks were created by different teams and their task content is rather independent and modular. Using them is usually optional. For example, task1 performs 'image whitening' and task2 performs 'image resize'.
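To be concrete, the 'from task' setup is roughly like this sketch (the pipeline, project, task, and queue names are made up for illustration):

from clearml import PipelineController

pipe = PipelineController(
    name="preprocessing-pipeline",  # placeholder names throughout
    project="demo",
    version="0.0.1",
)

# Each step clones an existing task authored by a different team
pipe.add_step(
    name="whitening",
    base_task_project="team-a",
    base_task_name="image whitening",
)
pipe.add_step(
    name="resize",
    base_task_project="team-b",
    base_task_name="image resize",
    parents=["whitening"],  # run after the whitening step finishes
)

pipe.start(queue="default")  # or pipe.start_locally() while debugging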
Hi, we did a check. Only 7.16.1, and 6.8.21 and above, mitigate the attack. What's the current version that ClearML is using?
Yes, it's on purpose; each user would have their own AWS credentials for default_output_uri.
Yes, as listed in the snippet. The torch library is torchvision.