Hi TimelyPenguin76,
If you notice in the last screenshot, it states the bucket name to be http://ecs.ai . It then tries to open http://s3.amazonaws.com/ecs.ai/clearml-models/artifact/uploading_file?X-Amz-Algorithm= ....
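For reference, this is roughly how I'd expect the non-AWS endpoint to be declared in clearml.conf so uploads don't fall back to s3.amazonaws.com — a sketch only, with placeholder values; key names are per the ClearML SDK configuration reference:

```
sdk.aws.s3 {
    credentials: [
        {
            # Non-AWS S3-compatible endpoint (e.g. on-prem ECS):
            # setting `host` keeps the SDK from defaulting to s3.amazonaws.com
            host: "ecs.ai:443"
            bucket: "clearml-models"
            key: "ACCESS_KEY"      # placeholder
            secret: "SECRET_KEY"   # placeholder
            secure: true
        }
    ]
}
```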
Hi, thanks. As for the Agent, does its docker mode or k8s mode require docker.sock to be exposed?
Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944
Hi CostlyOstrich36 , What you described is a task. I was referring to the pipeline controller.
Hi AgitatedDove14 , I dug a bit deeper. I saw this in the installed packages
in the original completed task. When the task is cloned, this is copied over, and thus the problem. Can I ask how ClearML creates the list of installed packages? Why are some of them (e.g. attrs) being pulled from @ file:///tmp/build/80754af9/attrs_1604765588209/work ?
absl-py==0.11.0
alabaster==0.7.12
antlr4-python3-runtime==4.8
apex==0.1
appdirs==1.4.4
argon2-cffi==20.1.0
ascii-graph==1.5.1
async-gener...
Yeah, that sounds good. But from a user's perspective, especially an untrained one, they wouldn't know what to point to. For example, some may think it's an exe, some think it's a zip bundle, and others think it's any GitHub repo with the word vscode.
Hi, building a container with vscode is not possible. If I have an alternative location for vscode, where should I indicate it in the configuration?
Any idea?
The agent is running on a disconnected server in docker mode. I have a client that runs clearml-session, and I saw from the agent's logs that the installation of vscode fails.
I also see this in my logs. Note that the config is read in, but it's still printing the supposedly hidden keys in the logs and UI:
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.extra_keys.0='TRAINS_AGENT_GIT_USER'
.....
docker_cmd=harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser
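For context, this is how the masking section would be written in clearml.conf — a sketch, with section names as given in the agent configuration docs and a placeholder key list:

```
agent {
    hide_docker_command_env_vars {
        # mask known credential-like env vars in printed docker commands
        enabled: true
        # additional variable names to mask beyond the built-in list
        extra_keys: ["TRAINS_AGENT_GIT_USER"]
    }
}
```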
I would say it's intermittent.
Thanks. The challenge we encountered is that we only expose our Devs to the ClearML queues, so users have no idea what's beyond the queue except that it will offer them the resources associated with the queue. In the backend, each queue is associated with more than one host.
So what we tried is as follows.
We create a train.py script much like what Tobias shared above. In this script, we use the socket library to pull the IP address.
import socket
hostname=socket.gethostname()
ipaddr=dock...
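For completeness, a self-contained sketch of what that snippet does. The resolution call is my assumption about the truncated part, and on some hosts `gethostbyname` can fail to resolve the local hostname, hence the fallback:

```python
import socket

# Hostname of the machine/container the script runs on
hostname = socket.gethostname()

# Resolve it to an IPv4 address; fall back to loopback if DNS can't resolve it
try:
    ipaddr = socket.gethostbyname(hostname)
except socket.gaierror:
    ipaddr = "127.0.0.1"

print(hostname, ipaddr)
```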
Unfortunately it's not. The problem previously encountered with the docker method surfaced again. In this case, the BASE DOCKER IMAGE
nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true
is not taking effect with the k8s glue.
Ok, I'll wait till I get my hands on Vault then. Thanks.
This one can be solved with shared cache + pipeline step, refreshing the cache in the shared cache machine.
Would you have an example of this in your code blogs to demonstrate this usage?
Ok thanks, looking forward to it. Would you advise on the bug you encountered?
Hi SuccessfulKoala55 , thanks, tested the patch and it's working as expected now.
Hi, it makes sense if I only had to change hyperparameters, but it's not so when I am still changing the model architecture (training code), then training and repeating.
So these (PIP_INDEX_URL) weren't used when ClearML started running pip.
I did another test by running
kubectl exec pod-name -- echo $PIP_INDEX_URL
and it returned nothing. So the env vars are not passed to the container at all.
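One caveat worth noting with that test (a sketch, assuming a POSIX shell on the client side): in `kubectl exec pod-name -- echo $PIP_INDEX_URL`, the variable is expanded by the *local* shell before kubectl ever runs, so it prints an empty line whenever it's unset locally, even if it is set inside the container. Wrapping it in `sh -c '...'` with single quotes defers expansion to the shell inside the pod:

```shell
# Correct form (needs sh in the container): single quotes keep $PIP_INDEX_URL
# from expanding locally, so the shell inside the pod expands it instead.
#   kubectl exec pod-name -- sh -c 'echo "$PIP_INDEX_URL"'

# Local demonstration of the difference (no cluster needed):
unset PIP_INDEX_URL
outer=$(sh -c "echo $PIP_INDEX_URL")   # double quotes: expanded locally -> empty
inner=$(env PIP_INDEX_URL=https://mirror.local/simple sh -c 'echo $PIP_INDEX_URL')
echo "outer='${outer}' inner='${inner}'"
```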
Then you pass the tolerations definition through a different pod template?
Yup.
Hi, yes, still getting the SSL errors. It looks like some incompatibility with the OS SSL libraries.
What type of pipeline steps are you running? From task, decorator or function?
We were trying with 'from task' at the moment. But the question applies to all methods.
If they're all running on the same container why not make them the same task and do things in parallel?
The tasks were created by different teams, and their content is rather independent and modular. Using them is usually optional. For example, task1 performs 'image whitening' and task2 performs 'image resize'.
No issues. I know it's hard to track open threads with Slack. I wish there were a plugin for this too. 🙂
Can I dig into MongoDB or ES to pull this data?
Any idea where I can find the relevant API calls for this?