 
clearml-agent daemon --docker --gpus all --queue Q_NAME --log-level DEBUG --detached
Okay, I found your twitter profile pic to be adequate after upsampling. Thank you and sorry 😅
Hey AgitatedDove14 ,
This sort of works, but not quite. The process runs successfully (I can attach to the container, ping the JPH, etc.), however the port forwarding fails.
When I do the port forwarding on my own using  ssh -L , it also seems to fail for JupyterLab and VS Code, which I find odd
CostlyOstrich36 JupyterHub is a multi-user server that allows many users to log in and spawn their own JupyterLab instances (with custom dependencies, data, etc.) for running notebooks
AgitatedDove14 no errors, because I don't know how to start 😅 I'm just exploring whether anyone has done this before I get my hands dirty
We have deployed clearml-agents as systemd services. This allows you to tell systemd to restart an agent whenever it crashes, and it automatically starts the agents up when the server boots!
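For reference, a rough unit-file sketch of what we mean (user, paths, and queue name are placeholders, not our exact setup):
```
# /etc/systemd/system/clearml-agent.service -- illustrative sketch only
[Unit]
Description=ClearML Agent
After=network-online.target

[Service]
# Assumption: clearml-agent is installed for this user and ~/clearml.conf exists
User=clearml
ExecStart=/usr/local/bin/clearml-agent daemon --queue default --docker
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Then  systemctl enable --now clearml-agent  starts it and keeps it enabled across reboots.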
Mostly the configurability of  clearml-session  and how it was designed. JupyterHub spawns a process at :8000 which we had to port-forward by hand, but spawning new docker containers using  jupyterhub.Dockerspawner  and connecting them to the correct network (the hub should talk to them without  --network host ) seemed too difficult or even impossible.
Oh, and there was no JupyterHub stdout in the console output on the ClearML server; it shows JupyterLab's output by default
I think you're right, the default Elasticsearch values do not seem to work for us
No errors in logs, but that's because I restarted the deployment :(
You are not missing anything; it is what we would like to have, to allow multiple people to have their own notebook servers. We have multiple people doing different experiments, and JupyterHub would be their "playground" environment
We didn't change a thing from the defaults that are in your GitHub 😄 so it's 500M?
This was actually a reset (of one experiment), not a delete
When installing locally, you told pip to look for packages at that page, but you don't tell that to the remote pip
SOLVED: It was an expired service account key in a clearml config
For now,  docker compose down && docker compose up -d  helps
Okay, thank you for the suggestions, we'll try it out
By language, I meant the syntax. What is  Args  and what is  batch  in  Args/batch , and what other values exist? 😀
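To illustrate what I think the mapping is (a hedged sketch, the names are just examples): ClearML auto-logs argparse arguments under the  Args  section, so an argument named  batch  shows up as  Args/batch :
```python
# Hedged sketch: where "Args/batch" comes from, assuming argparse auto-logging.
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='args-demo')  # placeholder names

parser = argparse.ArgumentParser()
parser.add_argument('--batch', type=int, default=32)
args = parser.parse_args()

# ClearML connects argparse automatically, so this parameter is listed as "Args/batch";
# other sections (e.g. "General" for dictionaries passed to task.connect) follow the
# same "<section>/<parameter>" convention.
print(args.batch)
```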
By commit hash, I mean the hash of the commit a task was run from. I wish to refer to that commit hash in another task (started with a TriggerScheduler) in code
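A hedged sketch of what I mean, assuming the commit is stored on the task's script section (the field name is my assumption):
```python
# Hedged sketch: read the commit hash another task was run from.
# Assumption: the commit is stored under the task's script info as version_num.
from clearml import Task

source_task = Task.get_task(task_id='<SOURCE_TASK_ID>')  # placeholder id
commit_hash = source_task.data.script.version_num
print(commit_hash)
```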
Yup, absolutely. Otherwise it cannot run your code haha
Nothing at all. There are only 2 logs from this day, and both were at 2am
Yes, thank you. That's exactly what I'm referring to.
The agent is deployed on our on-premise machines
Thank you, I understand now :D
Ok great. We were writing clearml triggers and they didn't work with "aborted". 😅
I would kindly suggest adding the set of all valid statuses to the docs
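In case it helps others, here is roughly what we ended up with, a sketch under the assumption that the UI label "Aborted" corresponds to the backend status string 'stopped' rather than 'aborted':
```python
# Hedged sketch: trigger on tasks aborted from the UI.
# Assumption: UI "Aborted" maps to the backend status string 'stopped'.
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name='on-abort',
    schedule_task_id='<TEMPLATE_TASK_ID>',   # placeholder template task
    schedule_queue='default',                # placeholder queue
    trigger_project='examples',              # placeholder project to watch
    trigger_on_status=['stopped'],           # not 'aborted'
)
trigger.start_remotely(queue='services')
```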
Is the trigger controller running on the services queue?
Yes, yes it is
I haven't looked, I'll let you know next time it happens
I think I know why though.
ClearML tries to install a package using pip, and pip cannot find the package because it's not on PyPI but is listed on the PyTorch download page
The log also suggests there is no cu113 build:
Warning, could not locate PyTorch torch==1.12.1 matching CUDA version 113
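If it helps, a hedged sketch of what I think the fix is on the agent side, assuming the agent's  clearml.conf  lets you add an extra index URL for its package manager (the cu113 URL is the one we point local pip at with  --extra-index-url ):
```
# Sketch of the relevant clearml.conf section on the agent machine.
# Assumption: agent.package_manager.extra_index_url is the right knob.
agent {
    package_manager {
        extra_index_url: ["https://download.pytorch.org/whl/cu113"]
    }
}
```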
Yeah, sorry, I typoed 😅 "newer than 18.04" is what I was supposed to say
What I meant was that we rebuilt them with 22.04
Yes, that's right. We deployed it on a GCP instance
trigger.add_task_trigger(name='export', schedule_task_id=SCHEDULE_ID, task_overrides={...})
I would like to override the commit hash of the  SCHEDULE_ID  task with  task_overrides
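Something like this is what I have in mind (a sketch; the  script.version_num  field path is my assumption for where the commit hash lives on the task):
```python
# Hedged sketch: override the template task's commit hash when the trigger fires.
# Assumption: 'script.version_num' is the task field holding the commit hash.
from clearml.automation import TriggerScheduler

SCHEDULE_ID = '<template task id>'        # placeholder
NEW_COMMIT_HASH = '<commit sha to run>'   # placeholder

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name='export',
    schedule_task_id=SCHEDULE_ID,
    schedule_queue='default',             # placeholder queue
    task_overrides={'script.version_num': NEW_COMMIT_HASH},
)
trigger.start_remotely(queue='services')
```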
That's only a part of a solution.
You'd also have to allow specifying  jupyterhub_config.py , mounting it inside the container in the right place, mounting the docker socket in a secure manner to allow spawning user containers, connecting them to the correct network ( --network host  won't work), persisting the user database and user data...
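To make that list concrete, here is a rough  jupyterhub_config.py  sketch covering a few of those points (network name, container name, volumes, and db path are placeholders; this is an illustration, not a tested config):
```python
# jupyterhub_config.py -- illustrative sketch, not a tested configuration.
# Assumes the hub runs in its own container with the docker socket mounted
# (e.g. -v /var/run/docker.sock:/var/run/docker.sock) on a user-defined network.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

# The hub and the spawned user containers must share a docker network;
# --network host is not needed when they are all attached to it.
c.DockerSpawner.network_name = 'jupyterhub-net'   # placeholder network
c.JupyterHub.hub_ip = '0.0.0.0'                   # listen inside the hub container
c.JupyterHub.hub_connect_ip = 'jupyterhub'        # hub container name on that network

# Persist per-user data and the hub's user database across restarts
c.DockerSpawner.volumes = {'jupyterhub-user-{username}': '/home/jovyan/work'}
c.JupyterHub.db_url = 'sqlite:////srv/jupyterhub/jupyterhub.sqlite'
```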