Ok great. We were writing ClearML triggers and they didn't work with "aborted". 😅
I would kindly suggest adding the set of all statuses to the docs.
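For reference, a minimal sketch of a trigger on aborted tasks, assuming the fix is using the backend status name (the UI label "Aborted" corresponds to the backend status "stopped"; SCHEDULE_ID is a placeholder):

from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
# Backend task statuses: created, queued, in_progress, stopped, closed,
# failed, completed, published, publishing, unknown.
trigger.add_task_trigger(
    name='on-abort',
    trigger_on_status=['stopped'],  # the UI's "Aborted", not 'aborted'
    schedule_task_id=SCHEDULE_ID,
    schedule_queue='default',
)
trigger.start()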
I succeeded with your instructions, so thank you!
However, we concluded that we don't want to run it through ClearML after all, so we ran it standalone.
But, I'll update you if we ever run it with ClearML so you could also provide it
Hey AgitatedDove14 ,
This sort of works, but not quite. The process runs successfully (I can attach to the container, ping the JPH, etc.), however the port forwarding fails.
When I do the port forwarding on my own using ssh -L, it also seems to fail for JupyterLab and VS Code, which I find odd.
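(For context, the manual forward was along these lines; host and port are placeholders, :8000 being where JupyterHub listens:)

ssh -L 8000:localhost:8000 user@remote-host  # local :8000 -> remote JupyterHub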
We've successfully deployed it without Helm, with a custom-made docker-compose and Makefiles 😄
Errors pop up occasionally in the Web UI. All we see is a dialog with the text "Error".
This was actually a reset (of one experiment), not a delete.
Okay, I found your Twitter profile pic to be adequate after upsampling. Thank you, and sorry 😅
By language, I meant the syntax. What is Args, and what is batch in Args/batch? And what other values exist? 😀
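For later readers, a sketch of how these namespaced parameters look through the SDK (the task ID and the batch value are placeholders; "Args" is the section ClearML uses for argparse-style arguments, "General" is another common one):

from clearml import Task

task = Task.get_task(task_id='<task-id>')
# Parameters are namespaced "<section>/<name>": "Args/batch" is the
# argument named "batch" inside the "Args" section.
print(task.get_parameters())             # e.g. {'Args/batch': '32', ...}
task.set_parameters({'Args/batch': 64})  # override a single value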
By commit hash, I mean the hash of the commit a task was run from. I wish to refer to that commit hash in another task (started with a TriggerScheduler) in code.
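A sketch of reading it from the task object (the task ID is a placeholder; script.version_num is the field ClearML stores the commit hash in):

from clearml import Task

source = Task.get_task(task_id='<source-task-id>')
commit_hash = source.data.script.version_num  # hash of the commit the task ran from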
Haha we manage our own deployment without k8s, so no dice there
But, it turns out we are using nginx as a reverse proxy, so adding a client_max_body_size directive to nginx.conf solved it for us. Thanks :))
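The change itself is a one-line directive (the size is whatever fits your uploads; 0 disables the limit entirely):

# nginx.conf, inside the http (or server) block
client_max_body_size 512M;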
Yes, thank you. That's exactly what I'm referring to.
The agent is deployed on our on-premise machines
CostlyOstrich36 this sounds great. How do I accomplish that?
Nothing at all. There are only two log entries from that day, and both were at 2 AM.
Does this mean an agent only ever spins up one particular image? I'd like to define different container images for different tasks, possibly even build them as part of starting a task. Is such a thing possible?
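A sketch of the per-task image part, assuming the agent runs in docker mode (the image name is just an example):

from clearml import Task

task = Task.init(project_name='examples', task_name='per-task-image')
# The agent picks this image up when it runs the task in docker mode,
# e.g.: clearml-agent daemon --queue default --docker
task.set_base_docker('nvidia/cuda:11.3.1-runtime-ubuntu20.04')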
I haven't looked, I'll let you know next time it happens
trigger.add_task_trigger(name='export', schedule_task_id=SCHEDULE_ID, task_overrides={...})
I would like to override the commit hash of the SCHEDULE_ID task with task_overrides.
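Something like this sketch — task_overrides takes dot-separated paths into the task document, and script.version_num is the commit field (SCHEDULE_ID and the hash are placeholders):

from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name='export',
    schedule_task_id=SCHEDULE_ID,
    schedule_queue='default',
    # override the commit the scheduled task will run from
    task_overrides={'script.version_num': '<commit-hash>'},
)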
Thank you, I understand now :D
Yeah, you are right.
We use an empty queue to enqueue our tasks in, just to trigger the scheduler 😅 Its only importance is that the experiment isn't enqueued anywhere else; the trigger then enqueues it.
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Is the trigger controller running on the services queue?
Yes, yes it is
Unfortunately, no, I can't paste the whole code. In a nutshell, the trigger spawns a new GCE instance with a clearml-agent running, to schedule the experiments in the cloud.
This is an excerpt:
from clearml import Task

def gcp_start_trigger(task_id: str):
    curr_task = Task.get_task(task_id=task_id)
    # curr_task.reset(force=True)
    config = extract_config(curr_task)  # our helper: reads the config from the task
    machine_type = config.get('machine-type')
    queue_name = f"gcp/{machine_type}"
    ensure_queue(queue_name)  # creates a new queue if it doesn't exist
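ensure_queue is roughly this, simplified (a sketch via the API client):

from clearml.backend_api.session.client import APIClient

def ensure_queue(queue_name: str):
    # create the queue only if no queue with that exact name exists
    client = APIClient()
    if not client.queues.get_all(name=f"^{queue_name}$"):  # name is matched as a regex
        client.queues.create(name=queue_name)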
Mostly the configurability of clearml-session and how it was designed. JupyterHub spawns a process at :8000 which we had to port-forward by hand, and spawning new docker containers using JupyterHub's DockerSpawner and connecting them to the correct network (the hub should talk to them without --network host) seemed too difficult or even impossible.
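For reference, the kind of wiring we were attempting, in jupyterhub_config.py terms (the network name is a placeholder):

# jupyterhub_config.py
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.network_name = 'jupyterhub-net'  # spawned containers join this docker network
c.DockerSpawner.use_internal_ip = True           # hub reaches containers via that network
c.JupyterHub.hub_ip = '0.0.0.0'                  # so containers can call back to the hub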
Oh, and there was no JupyterHub stdout in the console output on the ClearML server; it shows JupyterLab's output by default.
The log suggests there is no cu113 installation either:
Warning, could not locate PyTorch torch==1.12.1 matching CUDA version 113
It is likely you have mismatched CUDA versions. I presume you have cu113 locally but cu114 remotely. Have you run any updates lately?
I think I know why, though. ClearML tries to install the package using pip, and pip cannot find it because it's not on PyPI; it's only listed on the PyTorch download page.
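The usual workaround is pointing pip (and the agent) at the PyTorch wheel index, something like:

pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

For the agent, the equivalent setting lives in clearml.conf: agent.package_manager.extra_index_url: ["https://download.pytorch.org/whl/cu113"]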