I guess I'll let you know the next time this happens haha
You're not missing anything; it's what we would like to have, to allow multiple people to have their own notebook servers. We have multiple people doing different experiments, and JupyterHub would be their "playground" environment
We didn't change a thing from the defaults that are in your GitHub 😄 so it's 500M?
CostlyOstrich36 JupyterHub is a multi-user server, which allows many users to log in and spawn their own JupyterLab instances (with custom dependencies, data, etc.) for running notebooks
AgitatedDove14 no errors, because I don't know how to start 😅 I am just exploring whether anyone has done this before I get my hands dirty
Ok great. We were writing ClearML triggers and they didn't work with "aborted". 😅
I would kindly suggest adding a list of all possible statuses to the docs
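For anyone else who hits this, here is a minimal sketch of the kind of trigger we were writing, assuming ClearML's TriggerScheduler API; the project name, trigger name, and callback are placeholders. If I remember correctly, the UI's "Aborted" maps to the API status "stopped", which would explain why "aborted" never fired:

```python
from clearml.automation import TriggerScheduler

# Minimal sketch (names are placeholders). Poll for status changes and
# fire a callback; note the API status for the UI's "Aborted" is "stopped".
trigger = TriggerScheduler(pooling_frequency_minutes=3.0)
trigger.add_task_trigger(
    name="on-abort",                      # hypothetical trigger name
    trigger_project="my-project",         # hypothetical project
    trigger_on_status=["stopped"],        # not "aborted"
    schedule_function=lambda task_id: print(f"task {task_id} was aborted"),
)
trigger.start_remotely(queue="services")  # run the controller on the services queue
```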
I succeeded with your instructions, so thank you!
However, we concluded that we don't want to run it through ClearML after all, so we ran it standalone.
But I'll update you if we ever run it with ClearML, so you could also provide it
Unfortunately, no, I can't paste the whole code. In a nutshell, the trigger spawns a new GCE instance with a clearml-agent running on it to schedule the experiments in the cloud.
This is an excerpt:
```python
from clearml import Task

def gcp_start_trigger(task_id: str):
    # Look up the task that fired the trigger
    curr_task = Task.get_task(task_id)
    # curr_task.reset(force=True)
    config = extract_config(curr_task)      # helper from our codebase
    machine_type = config.get('machine-type')
    queue_name = f"gcp/{machine_type}"
    ensure_queue(queue_name)  # creates a new queue if it doesn't...
```
Yeah, you are right.
We use an empty queue to enqueue our tasks in, just to trigger the scheduler 😅 Its only importance is that the experiment is not enqueued anywhere else; the trigger then enqueues it
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Hello, a similar thing happened today. In the developer console there was this line:
https://server/api/v2.19/tasks.reset_many 504 (Gateway time-out)
For now, `docker compose down && docker compose up -d` helps.
This means that an agent only ever spins up one particular image? I'd like to define different container images for different tasks, possibly even build them in the process of starting a task. Is such a thing possible?
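For reference, a sketch of what I mean, assuming a task can declare its own image via set_base_docker when the agent runs in docker mode; the project, task, and image names are just examples:

```python
from clearml import Task

# Sketch of the idea: each task declares its own container image.
# Assumes the agent was started in docker mode, e.g.:
#   clearml-agent daemon --queue default --docker
task = Task.init(project_name="examples", task_name="custom-image-demo")
task.set_base_docker(docker_image="python:3.9-slim")  # example image only
```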
Hey AgitatedDove14 ,
This sort of works, but not quite. The process runs successfully (I can attach the container, ping the JPH, etc.), however the port forwarding fails.
When I do the port forward on my own using ssh -L, it also seems to fail for JupyterLab and VS Code too, which I find odd.
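(For context, the manual forward I mean is the standard one; the user, host, and ports below are placeholders:)

```
# forward local port 8888 to port 8888 on the remote machine
ssh -L 8888:localhost:8888 user@remote-host
```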
We have deployed clearml-agents as systemd services. This lets you tell systemd to restart an agent whenever it crashes, and the agents start up automatically when the server boots!
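A minimal sketch of such a unit file; the path, user, and queue name are illustrative from our setup, not something ClearML ships:

```ini
# /etc/systemd/system/clearml-agent.service (illustrative)
[Unit]
Description=ClearML Agent
After=network-online.target

[Service]
User=clearml
ExecStart=/usr/local/bin/clearml-agent daemon --queue default
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now clearml-agent` covers both the crash-restart and the boot-time start.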
AgitatedDove14 Well, we have gotten relatively close to the goal; I suppose you wouldn't have to do a lot of work to support it natively
I tried to build allegroai/clearml-agent-services on my laptop with ubuntu:22.04 and it failed
It's not because of the remote machine, it's the requirements 😅 As I said, the package is not on PyPI. Try adding this at the top of your requirements.txt:
```
-f
torch==1.12.1+cu113 ...other deps...
```
I think you're right, the default Elasticsearch values do not seem to work for us
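For reference, a hedged sketch of raising the Elasticsearch heap, assuming the stock clearml-server docker-compose sets it via ES_JAVA_OPTS; the service name must match your compose file and the 2g values are just examples:

```yaml
# docker-compose.override.yml (illustrative)
services:
  elasticsearch:
    environment:
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
```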
This was actually a reset (of one experiment), not a delete
Is the trigger controller running on the services queue?
Yes, yes it is
Yes, thank you. That's exactly what I'm referring to.
The agent is deployed on our on-premise machines
To answer myself on the first part: task.get_parameters()
retrieves a list of all the arguments that can be set. The syntax seems to be Args/{argparse destination}.
However, this does not return the commit hash :((
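A small sketch of what I mean (the task ID is a placeholder); if I remember correctly, the commit hash lives under the task's script info rather than its parameters:

```python
from clearml import Task

task = Task.get_task(task_id="abc123")  # placeholder ID
params = task.get_parameters()          # e.g. {"Args/epochs": "10", ...}
for name, value in params.items():
    print(name, value)

# The commit hash is in the script metadata, not in the parameters:
print(task.data.script.version_num)     # commit hash
print(task.data.script.branch)
```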
It could work, but Slack demands a minimum of 512x512
SOLVED: It was an expired service account key in a clearml config
CostlyOstrich36 this sounds great. How do I accomplish that?
Haha we manage our own deployment without k8s, so no dice there
But it turns out we are using nginx as a reverse proxy, so putting a client_max_body_size
inside nginx.conf solved it for us. Thanks :))
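For anyone hitting the same thing, the relevant bit looks roughly like this; 100M is just an example value, and 0 disables the limit entirely:

```nginx
# nginx.conf — inside the http (or server) block
http {
    client_max_body_size 100M;
}
```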