
No errors in logs, but that's because I restarted the deployment :(
I haven't looked, I'll let you know next time it happens
Nothing at all. There are only 2 logs from this day, and both were at 2am
For now, docker compose down && docker compose up -d helps
Is the trigger controller running on the services queue?
Yes, yes it is
It could work but slack demands a minimum of 512x512
Okay, I found your Twitter profile pic to be adequate after upsampling. Thank you and sorry 😅
I think you're right, the default elastic values do not seem to work for us
I tried to build allegroai/clearml-agent-services on my laptop with ubuntu:22.04 and it failed
Yeah, you are right.
We use an empty queue to enqueue our tasks in, just to trigger the scheduler 😅 Its only importance is that the experiment is not enqueued anywhere else; the trigger then enqueues it
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Ok great. We were writing clearml triggers and they didn't work with "aborted". 😅
I would kindly suggest perhaps adding a set of all statuses in the docs
SOLVED: It was an expired service account key in a clearml config
Haha we manage our own deployment without k8s, so no dice there
But it turns out we are using nginx as a reverse proxy, so putting a client_max_body_size inside an nginx.conf solved it for us. Thanks :))
I guess I'll let you know the next time this happens haha
It is likely you have mismatched CUDA versions. I presume you have cu113 locally but cu114 remotely. Have you run any updates lately?
Okay, thank you for the suggestions, we'll try it out
This was actually a reset (of one experiment), not a delete
By language, I meant the syntax. What is Args and what is batch in Args/batch, and what other values exist 😀
By commit hash, I mean the hash of the commit a task was run from. I wish to refer to that commit hash in code, in another task (started with a TriggerScheduler)
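A rough sketch of what I mean, in case it helps. It assumes (not confirmed anywhere in this thread) that a ClearML task object exposes the commit under task.data.script.version_num; the helper names get_commit_hash and fetch_commit_hash are made up for illustration, and the fake task below only stands in for a real one so the snippet runs without a server.

```python
from types import SimpleNamespace


def get_commit_hash(task):
    """Return the commit id recorded on a task-like object, or None.

    Assumption: the commit lives at task.data.script.version_num.
    """
    script = getattr(getattr(task, "data", None), "script", None)
    return getattr(script, "version_num", None) or None


def fetch_commit_hash(task_id):
    """Look up a task by id and return its recorded commit (needs a ClearML server)."""
    from clearml import Task  # imported lazily so the sketch runs without clearml

    return get_commit_hash(Task.get_task(task_id=task_id))


# Stand-in object with a hypothetical commit id, only for demonstration.
fake_task = SimpleNamespace(
    data=SimpleNamespace(script=SimpleNamespace(version_num="0123abc"))
)
print(get_commit_hash(fake_task))
```

In the triggered task, the id of the original task would be passed in somehow (for example as a parameter) and handed to fetch_commit_hash.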
Errors pop up occasionally in the Web UI. All we see is a dialog with the text "Error"
To answer myself, the first part: task.get_parameters() retrieves all the arguments which can be set. The syntax seems to be Args/{argparse destination}
However, this does not return the commit hash :((
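For anyone else wondering about the syntax, here is a minimal sketch of how I read those keys: section name, a slash, then the parameter name. The dictionary below is hand-written (Args/lr and General/seed are made-up examples, only Args/batch comes from my case), and group_by_section is just an illustrative helper, not part of clearml.

```python
from collections import defaultdict


def group_by_section(flat_params):
    """Group flat 'Section/name' keys into {section: {name: value}}."""
    grouped = defaultdict(dict)
    for key, value in flat_params.items():
        section, sep, name = key.partition("/")
        if not sep:
            # No slash in the key: file it under an empty section name.
            section, name = "", key
        grouped[section][name] = value
    return dict(grouped)


def load_task_parameters(task_id):
    """Fetch the flat parameter dict from a real task (needs a ClearML server)."""
    from clearml import Task  # imported lazily so the sketch runs without clearml

    return Task.get_task(task_id=task_id).get_parameters()


# Hand-written stand-in for what get_parameters() might return.
example = {"Args/batch": "32", "Args/lr": "0.001", "General/seed": "1"}
print(group_by_section(example))
```

So Args would be the section that argparse arguments land in, and batch the argparse destination.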
This means that an agent only ever spins up one particular image? I'd like to define different container images for different tasks, possibly even build them in the process of starting a task. Is such a thing possible?
We didn't change a thing from the defaults that are in your GitHub 😄 so it's 500M?
But do consider a sort of a designer's press kit on your page haha