Just to put a ping for those on this side of the timezone to look at. Thanks.
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be done by clearml-serving, and maybe Seldon as well.
Try setting docker_force_pull: true under the agent section of your agent's clearml.conf.
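Something like this, if it helps (a minimal sketch of just the relevant section; the rest of the agent block is omitted):
```
agent {
    # always pull the docker image before starting the container,
    # even if a copy of the image already exists locally
    docker_force_pull: true
}
```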
Hi AgitatedDove14, I've got the same error. It would appear that the code references clearml_agent/helper/base.py, which I believe is part of clearml-agent v0.17.1. Could that be the issue?
Thanks, that seems to work. I've got a question: does it save the best model or the model from the last epoch?
I've been reading the documentation for a while and I'm not understanding the following very well.
Given an open-source codebase, say Hugging Face: I want to do some training and track my experiments using ClearML. The obvious choice would be to use Explicit Reporting in ClearML. But the part on sending my training job and letting ClearML orchestrate it is vague. I would appreciate being pointed to the right documentation on this.
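For context, this is roughly what I have in mind for the Explicit Reporting part (a minimal sketch; the project/task/queue names are made up):
```
from clearml import Task, Logger

# create the experiment so hyperparameters, console output and scalars get tracked
task = Task.init(project_name="huggingface-demo", task_name="bert finetune")

# explicit reporting: log scalars manually from the training loop
logger = Logger.current_logger()
for epoch in range(3):
    train_loss = 0.1 / (epoch + 1)  # placeholder for the real loss
    logger.report_scalar(title="loss", series="train", value=train_loss, iteration=epoch)

# for the orchestration part: stop local execution and enqueue
# the task for an agent to pick up instead
# task.execute_remotely(queue_name="default")
```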
ah... thanks!
Hi Jake, thanks for the suggestion, let me try it out.
Ok. The problem was resolved with the latest versions of clearml-agent and clearml.
This one can be solved with a shared cache plus a pipeline step that refreshes the cache on the shared-cache machine.
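Roughly what I have in mind (a rough sketch, assuming a recent clearml version with PipelineController and pre-existing template tasks; all names are made up):
```
from clearml import PipelineController

# step 1 refreshes the cache on the shared-cache machine,
# step 2 trains against the refreshed cache
pipe = PipelineController(name="train-with-shared-cache", project="examples", version="1.0.0")
pipe.add_step(name="refresh_cache",
              base_task_project="examples",
              base_task_name="refresh shared cache")
pipe.add_step(name="train",
              parents=["refresh_cache"],
              base_task_project="examples",
              base_task_name="train model")
pipe.start(queue="services")
```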
Would you have an example of this in your code blogs to demonstrate this utilisation?
I'm using this feature. In this case I would create 2 agents, one with a cpu-only queue and the other with a gpu queue, and then decide at the code level which queue to send to.
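At the code level it looks something like this (a minimal sketch; the queue names are whatever the two agents were registered with):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="train")

needs_gpu = True  # decided from model size / config, etc.
# enqueue for whichever agent serves the matching queue
task.execute_remotely(queue_name="gpu" if needs_gpu else "cpu")
```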
I did notice that in the tmp folder, .clearml_agent.xxxxx.cfg does not exist.
Space is way above nominal. What created this folder that it's trying to process? What processing is this?
Processing /tmp/build/80754af9/attrs_1604765588209/work
Are there any paths on the agent machine that I can clear out to remove any possible issues from previous versions?
They don't have the same version. I do notice that if the client is using Python 3.8, remote execution will try to use that same version even though the docker image doesn't have it installed.
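If it helps, I believe the interpreter can be pinned on the agent side via clearml.conf, assuming the docker image actually ships that binary (the path below is an assumption):
```
agent {
    # build the venv with the image's interpreter instead of
    # matching the client's python version (path is an assumption)
    python_binary: "/usr/bin/python3.6"
}
```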
the hackathon is 3 days.
This is probably the whole script.
```
kubectl get nodes
pip install clearml-agent
python k8s_glue_example.py
```
Can you please verify that you have all the required packages installed locally?
It's not installed on the image that runs the experiment, but it's reflected in the requirements.txt.
What is the setting of agent.package_manager.system_site_packages?
True.
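For reference, the relevant bit of the agent's clearml.conf (a minimal sketch):
```
agent {
    package_manager {
        # let the venv inherit packages already installed in the
        # docker image / system python
        system_site_packages: true
    }
}
```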
The apply.yaml template is not working (e.g. the env arguments are not passed to the container), which is why I tried the code approach instead.
Hi, I was reading this thread and wondered: which versions of clearml-server and clearml-agent did this take effect with?
In the ClearML config that's being run by the ClearML container?
Hi AgitatedDove14, I changed everything to cuda 10.1 and tried again, with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo, but the same error still came up.
```
# Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
# Detailed import analysis
# **************************
# IMPORT PACKAGE boto3
# clearml.storage: 0
# IMPORT PACKAG...
```
I can't seem to find the fix to this. Ended up using an image that comes with torch installed.
I would say yes; otherwise the vscode feature is only available on internet-connected premises due to the hard-coded URL to download vscode.
Here's my two cents worth.
I thought it was really nice to start off the topic highlighting 'pipelines'; it's unfortunately one of the most overlooked components when people start off with ML work. Your article mentioned drifts and how the MLOps process covers them. I thought there are 2 more components that are important and deserve some mention. Retraining pipelines: ML engineers tend not to give much thought to how they want to transition a training pipeline in development to an automated retraining pipe...
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
To note, the latest code has been pushed to the Gitlab repo.
The doc also mentioned preconfigured services with selectors in the form of "ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022. Would you have any examples of how to do this?
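From the doc's description I'd imagine something along these lines (my own sketch, not a verified example; the service name is an assumption):
```
apiVersion: v1
kind: Service
metadata:
  name: clearml-agent-ssh-0
spec:
  # match the pod carrying the serial label from the doc
  selector:
    ai.allegro.agent.serial: pod-0
  ports:
    - protocol: TCP
      port: 10022
      targetPort: 10022
```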
No issues. I know it's hard to track open threads with Slack. I wish there were a plugin for this too. 🙂
Any idea where i can find the relevant API calls for this?
Alright, thanks. It's important we verify it works before we migrate the infra.