Hi, the problem is the same.
I noticed that it's not checking out the latest version in GitLab. This latest version would contain the requirements.txt.
Using cached repository in "/root/.clearml/vcs-cache/pytorchmnist.f220373e7227ec760b28c7f4cd99b534/pytorchmnist" warning: redirecting to
Note: checking out 'cfb833bcc70f3e10d3b6a96cfad3225ed682382b'.
But I'm guessing the block below applied the diff. Does it include the requirements.txt, though?
` HEAD is now at cfb833b Upload New Fil...
Hi, this is the log. I didn't see any attempt by the agent to install virtualenv on the base image.
` 1618369068169 clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3 DEBUG Current configuration (clearml_agent v0.17.2, location: /tmp/.clearml_agent.wgsmv2t9.cfg):
agent.worker_id = clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3
agent.worker_name = clearml-gpu-id-b926b4b809f544c49e99625...
Hi, so you mean I need to install virtualenv in my base image?
Thought this looked familiar.
https://clearml.slack.com/archives/CTK20V944/p1635323823155700?thread_ts=1635323823.155700&cid=CTK20V944
I think in general, the 'published' action can be considered an 'approval'. The question is, how do we control who has the authority to 'publish'? The Web UI today does not support any uploads outside of the coding environment; it would be nice if that were supported. For now, the only workaround is to include parameters that store document URLs in the user properties.
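The workaround mentioned above could look roughly like the sketch below. It assumes the task object exposes `set_user_properties(**properties)`, as the ClearML SDK's `Task` does; the helper names, the `doc_` prefix and the example URL are made up for illustration.

```python
def build_document_properties(document_urls):
    """Map {label: url} to user-property names and values.

    The "doc_" prefix and the name sanitising are illustrative choices,
    not something ClearML requires.
    """
    properties = {}
    for label, url in document_urls.items():
        safe = "".join(c if c.isalnum() else "_" for c in label.lower())
        properties["doc_" + safe] = url
    return properties


def attach_documents(task, document_urls):
    """Store document URLs on a task as user properties.

    `task` is expected to behave like clearml.Task, i.e. to offer
    set_user_properties(**properties).
    """
    properties = build_document_properties(document_urls)
    task.set_user_properties(**properties)
    return properties


# Possible usage with a real task (not executed here, needs a ClearML server):
# from clearml import Task
# task = Task.init(project_name="examples", task_name="approval demo")
# attach_documents(task, {"Approval form": "https://example.com/approval.pdf"})
```

The properties then show up with the task in the Web UI, so a reviewer can follow the links before publishing. This does not by itself restrict who is allowed to publish.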
Hi, I have the same question. Why would this be ignored if called remotely?
https://clear.ml/docs/latest/docs/references/sdk/task/#set_base_docker
Hi CostlyOstrich36, what you described is a task. I was referring to the pipeline controller.
Hi, currently the ClearML SDK only supports Python. If I want to run my ML in other languages, can I use an SDK in that language? Or are there other means, such as Web API calls, that do the same as the SDK?
Thanks, could you share the URL to this full API documentation?
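For what it's worth, here is a rough sketch of what talking to the server over plain HTTP might look like, written in Python only to show the request shapes, which could be reproduced in any language. It assumes the API server exposes an `auth.login` endpoint that accepts the access key and secret key via HTTP basic auth and returns a token, and that other endpoints (for example `tasks.get_all`) accept that token as a bearer token with a JSON body. The host, credentials and payload fields below are placeholders and should be checked against the API reference.

```python
import base64
import json
import urllib.request


def build_login_request(api_server, access_key, secret_key):
    """Build (without sending) a request to the auth.login endpoint."""
    raw = f"{access_key}:{secret_key}".encode("utf-8")
    basic = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url=f"{api_server.rstrip('/')}/auth.login",
        headers={"Authorization": f"Basic {basic}"},
        method="POST",
    )


def build_api_request(api_server, token, endpoint, payload):
    """Build (without sending) an authenticated JSON request to an endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_server.rstrip('/')}/{endpoint}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Possible usage against a running server (not executed here):
# login = send(build_login_request("http://localhost:8008", "KEY", "SECRET"))
# token = login["data"]["token"]  # assumed response layout
# tasks = send(build_api_request("http://localhost:8008", token,
#                                "tasks.get_all", {"page_size": 10}))
```

Building the requests separately from sending them keeps the sketch easy to inspect without a server.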
My assumption is that the agent will have pulled that off the client's clearml.conf.
OK, thanks. That explains a lot. We have been doing this wrongly the whole time, thinking that the clearml.conf on the client side would be acknowledged by the remote agent execution. In reality, only the API section is utilised.
Thanks. We set this configuration, and the client ran and submitted the job for remote execution (agent running k8s glue). However, when the job ran and tried to save into the model repo, this error came up:
ClearML.storage - ERROR - Failed creating storage object S3://ecs.ai Reason; Missing key and secret for S3 storage access ( S3://ECS.ai ).
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem. I also...
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Referencing your suggestion, we can configure output_uri on task.set_base_docker() but how should we do this for the credentials?
Hi CostlyOstrich36, that's correct.
We are using k8s glue to spawn the job. Would you be able to explain in detail the steps that take place when the above code executes?
The server is running only the ClearML components. Could you advise on the ELB part? How should we optimise it?
Hi, I will have to get back to you again. I need to check every client's repo to test your hypothesis.
I'm having the same problem. Are you using the latest clearml-agent? Does your docker image run as the root user by default?
It's running as a long-running pod on K8s. I'm using log -f
to track its stdout.
Hi, how may I call Task.init() within these subprocesses without write access to the 3rd-party scripts and Python executables?
Hi SuccessfulKoala55, I managed to install clearml-agent==1.0.1rc5. However, the same issues occur.
Hi SuccessfulKoala55, just to add, my clearml.conf (client) and clearml.agent.conf (agent) can have differing values. I'm not sure which one takes precedence and if this could be the cause.
I see. Can I take it that when the client uses task.execute_remotely(queue_name="1gpu", exit_process=True), none of the content in its clearml.conf will be used, except for the API part, and ClearML simply uses whatever is on the agent side?
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
    # Credentials are generated using the webapp,
    # Override with os environment: ...
Yes, it's on purpose; each user would have their own AWS credentials for default_output_uri.
OK, let me check this out first thing on Monday. Thanks, AgitatedDove14.