Hi TimelyPenguin76, I am adding a debug sample to an existing task using the above method. What should I put for the iteration? I do not want to overwrite existing ones, but I do not know the last count. This applies to both scalar and media reporting.
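My current guess, just so it's concrete (a rough sketch; I'm assuming get_last_iteration() gives the last reported count, and the task ID is a placeholder):
```python
from clearml import Task

# Re-attach to the existing task (the ID here is a placeholder)
task = Task.get_task(task_id="<existing_task_id>")
logger = task.get_logger()

# My guess: continue from the last reported iteration so nothing gets overwritten
next_iter = (task.get_last_iteration() or 0) + 1

logger.report_scalar(title="debug", series="metric", value=0.42, iteration=next_iter)
logger.report_media(title="debug", series="sample", iteration=next_iter,
                    local_path="sample.jpg")
```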
Hi, it's a preference from my developers. They prefer to install the Python libraries into the images and load them up into the registry. In other words, they prefer to have the libraries installed at image build time.
Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when ClearML does it.
How do we configure ClearML to correspond to the following boto3 code?
```python
S3 = boto3.resource(
    's3',
    endpoint_url='https://ecs.ai',
    aws_access_key_id='mykey',
    aws_secret_access_key='mysecret',
    config=Config(signature_version='s3v4'),
    region_name='us-east-1',
    ve...
```
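For reference, this is how I understood the corresponding clearml.conf S3 section would look (a sketch based on my reading of the config; the keys and values are my assumptions and the credentials are placeholders):
```
sdk {
    aws {
        s3 {
            # region matching the boto3 Config above
            region: "us-east-1"
            credentials: [
                {
                    # endpoint without the scheme; secure=true for the https endpoint
                    host: "ecs.ai:443"
                    key: "mykey"
                    secret: "mysecret"
                    secure: true
                    multipart: false
                }
            ]
        }
    }
}
```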
So for context: I realise I'll need to catalogue all the dataset IDs created by people separately in a spreadsheet, and for each experiment I'll need to go into the code commit to see which ID is being used. On the other hand, I thought I've seen advertised use cases where the experiment can be directly linked to the dataset ID being used; my brain is a bit rusty on how it was done.
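The sort of thing I vaguely remember being advertised is roughly this (a sketch; I'm assuming that fetching the dataset inside the task records the link, and the explicit connect is just my own extra step):
```python
from clearml import Task, Dataset

task = Task.init(project_name="my_project", task_name="train")

# Fetching the dataset inside the task should tie its ID to this experiment
dataset = Dataset.get(dataset_id="<dataset_id>")
data_dir = dataset.get_local_copy()

# Extra safety: log the ID as a parameter so it shows up in the UI
task.connect({"dataset_id": dataset.id})
```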
My assumption is that the agent will have pulled that off the client's clearml.conf.
Unfortunately, our security posture is so strict that we cannot have an agent git user that has unfettered read access to all repos.
Congrats on v1.0. 🎉
Yeah, that sounds good. But from a user's perspective, especially untrained users, they wouldn't know what to point to. For example, some may think it's an exe, some a zip bundle, and others any GitHub repo with the word vscode.
I'm having the same problem. Are you using the latest clearml-agent? Does your Docker image run as root by default?
We are deploying ClearML Server via the docker-compose.
For ClearML-Agent, we have the choice of Docker, or preferably K8s (using the glue).
For K8s, we can't get the glue to work ( https://clearml.slack.com/archives/CTK20V944/p1614525898114200?thread_ts=1613923591.002100&cid=CTK20V944 ), so we can't assess whether it actually works for us.
Hi SuccessfulKoala55, thanks, I tested the patch and it's working as expected now.
Ok thanks, that worked.
Although I think you can also pull specific chunks of a dataset.
How do you do that with clearml-data?
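Something like this is what I was hoping for (a sketch; I'm assuming the part/num_parts arguments on get_local_copy do what I think they do):
```python
from clearml import Dataset

dataset = Dataset.get(dataset_id="<dataset_id>")

# Pull only one part out of four instead of the whole dataset
local_path = dataset.get_local_copy(part=0, num_parts=4)
print(local_path)
```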
Thanks, this would be a good alternative before the enterprise version comes in. How is this different from argparse, btw?
Yeah... the issue is ClearML is unable to talk to the nodes because PyTorch distributed needs to know their IPs. There is some sort of integration missing that would enable this.
Hi, thanks for the examples! I will look into them. Quite a few of my teams use tf.data datasets to pull data directly from object stores, so TFRecords and the like are heavily involved. I'm trying to figure out whether they should version the raw data or the TFRecords with ClearML, and whether downloading the entire dataset locally can be avoided, since tf.data handles batch downloading quite well.
Hi. The upgrade seems to have gone well, but I'm seeing one weird output. When I run a task and look at the installed packages under the Execution tab, I still see clearml=0.17. Is this expected?
It also stopped taking in tasks from the queue after that.
Hi, this is the log. I didn't see any attempt by the agent to install virtualenv on the base image.
```
1618369068169 clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3 DEBUG Current configuration (clearml_agent v0.17.2, location: /tmp/.clearml_agent.wgsmv2t9.cfg):
agent.worker_id = clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3
agent.worker_name = clearml-gpu-id-b926b4b809f544c49e99625...
```
Thanks AgitatedDove14, unfortunately it didn't take effect.
Next step is to figure out if I can do all that in Python code instead of the UI.
Ok thanks, looking forward to it. Could you share details of the bug you encountered?
Hi, I tried the k8s glue on my K8s setup and need some clarification on some of the arguments.
--queue: Does this only refer to default and services? How can I create a new queue for the glue to sync with the ClearML server?
--ports-mode: I'm not sure what ports mode does. The doc says "add a label to the pod which can be used as service". Which pod is it referring to in the first place?
All args pertaining to --ports-mode (e.g. base-pod-num, gateway-address, etc.)
--overrides-yaml: What is the ...
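For context, this is roughly how I'm invoking it (a sketch based on my reading of the example script in the clearml-agent repo; the queue name, pod number, address, and yaml path are placeholders):
```bash
python k8s_glue_example.py \
    --queue gpu_queue \
    --ports-mode \
    --base-pod-num 100 \
    --gateway-address 10.0.0.1 \
    --overrides-yaml overrides.yaml
```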
It's 1.0.0, as printed at the top of the logs in the ClearML Server UI.
Does the bash script need clearml-agent to be able to communicate with the HTTPS clearml-server first? If yes, there's a chicken-and-egg problem here.
It didn't work as expected.
```
task init
task report iter 10
task init
task report iter 10
```
The second task pushed the reporting iteration to 20 instead.
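In code terms, roughly what I ran (a sketch with placeholder names):
```python
from clearml import Task

# First run: report at iteration 10, then close
task = Task.init(project_name="demo", task_name="run")
task.get_logger().report_scalar("title", "series", value=1.0, iteration=10)
task.close()

# Second run: a fresh Task.init, again reporting at iteration 10
task = Task.init(project_name="demo", task_name="run")
task.get_logger().report_scalar("title", "series", value=2.0, iteration=10)
task.close()
# -> the second report ended up at iteration 20 instead of 10
```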
Thanks CostlyOstrich36, how do I know how the parts are indexed in the first place? Or rather, how are chunks and parts defined? Say in the context of images, videos, text documents, etc.