I was thinking such limitations will exist only for published Tasks
A published Task could not be "marked started" even with the force flag
task = Task.get_task('task_id_here')
task.mark_started(force=True)  # force re-starting the Task
task.upload_artifact(..., wait_on_upload=True)
task.mark_completed()
I think you can force it to be started, let me check (I'm pretty sure you can on an aborted Task).
(no objection to adding an argument, but I just wonder what the value is)
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
Hi AbruptWorm50
I am currently using the repo cache,
What do you mean by "using the repo cache"? This is transparent, the agent does that; users should not need to access that folder.
I also looked at the log you sent, why do you think it is re-downloading the repo?
I think EmbarrassedSpider34 is correct.
When you pass the requirements to clearml-task, the agent will actually do the installation, depending on how it was configured (conda / pip).
That said, maybe it is worth adding support for providing the env.yml in the CLI?
(Notice that adding specific channels needs to be configured on the agent, they are not stored per Task)
AlertCamel57 wdyt?
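For reference, something along these lines is the Python equivalent of what clearml-task sets up (the project/repo/script/queue names below are just placeholders; the actual pip/conda install is done by the agent, not the SDK):

from clearml import Task

# Create a Task for remote execution; the listed packages are only recorded here,
# the agent installs them according to its own pip/conda configuration.
task = Task.create(
    project_name="examples",                            # placeholder project
    task_name="remote run",                             # placeholder name
    repo="https://github.com/your-org/your-repo.git",   # placeholder repo
    script="train.py",                                  # placeholder entry point
    packages=["numpy>=1.21", "pandas"],
)
Task.enqueue(task, queue_name="default")                 # placeholder queue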
Was wondering how it can handle 10s, 100s of models.
Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under diff endpoints, the load balancer can be configured to route accordingly)
Think I will have to fork and play around with it
NICE! (BTW: if you manage to get it working I'll be more than happy to help push the PR)
Maybe the quickest win is to store just the .py as the model?
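Something like this should work, a minimal sketch that registers the .py file itself as the Task's model (the file name is hypothetical):

from clearml import Task, OutputModel

# Register the .py file as the Task's output "model" so it can be fetched
# later like any other model file.
task = Task.current_task()
model = OutputModel(task=task, framework="custom")
model.update_weights(weights_filename="my_model.py")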
Have you tried a context provider for Task?
I guess that would only make sense inside notebooks?!
This is odd, it says 1.0.0, but then it was updated t weeks ago ...
could it be the polling on the Task (can't remember what the interval is), but it will update its state once every X minutes/seconds
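If you want to check it yourself rather than wait for the internal polling, a small sketch (the sleep interval and task id below are arbitrary placeholders, not the internal values):

import time
from clearml import Task

# Poll the Task state from the client side until it reaches a final status.
task = Task.get_task(task_id="task_id_here")
while task.get_status() not in ("completed", "failed", "stopped"):
    time.sleep(30)
    task.reload()  # refresh the cached Task object from the server
print(task.get_status())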
I just called exit(0)
in a notebook and it closed it (the kernel) with no exception
Ohh then YES!
the Task will be closed by the process, and since the process is inside Jupyter and the notebook kernel is still running, the Task is still running
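In other words, inside a notebook you can shut the Task down explicitly instead of killing the kernel with exit(0); a minimal sketch (project/task names are placeholders):

from clearml import Task

# Close the Task explicitly; the notebook kernel keeps running.
task = Task.init(project_name="examples", task_name="notebook run")
# ... notebook cells run here ...
task.close()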
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"?
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
basically use the template, we will deprecate the override option soon
overrides -> "kubectl run --overrides "
template -> "kubectl apply -f template.yaml"
another option is that the download fails (i.e. missing credentials on the client side, in clearml.conf)
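A quick way to verify the client-side credentials in clearml.conf is to try fetching the file directly (the URL below is just a placeholder):

from clearml import StorageManager

# If this raises or returns None, the problem is the download / credentials,
# not a corrupted file on the server.
local_path = StorageManager.get_local_copy(remote_url="s3://bucket/path/to/artifact.json")
print(local_path)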
Hi AverageBee39
It seems the JSON is corrupted, could that be?
Does it say it is running something?
(on the Workers tab of the agents table it should say which Task it is running)
You can just spin another agent on the same machine
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow, you can also pass repo="."
which will load + detect the repo in the execution environment and automatically fill it in
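For example, a sketch assuming a recent clearml version where pipeline components accept a repo argument:

from clearml import PipelineDecorator

# repo="." tells the component to detect the repository of the current
# execution environment and fill it in automatically.
@PipelineDecorator.component(return_values=["result"], repo=".")
def preprocess(value):
    return value * 2

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def run_pipeline():
    preprocess(value=21)

if __name__ == "__main__":
    run_pipeline()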
Can you post here the docker-compose.yml you are spinning? Maybe it is the wrong one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase