I ended up using
```
task = Task.init(continue_last_task=task_id)
```
to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument it will just create a new Task and auto-log everything to it 🙂
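For reference, a minimal sketch of both behaviors (project/task names are placeholders):
```
from clearml import Task

task_id = "<task-id>"  # placeholder: the id of the Task to continue

# Resume logging into an existing task
task = Task.init(
    project_name="my_project",   # placeholder
    task_name="my_experiment",   # placeholder
    continue_last_task=task_id,  # continue this specific Task instead of creating a new one
)

# Without continue_last_task a brand-new Task is created and auto-logged:
# task = Task.init(project_name="my_project", task_name="my_experiment")
```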
I mean test with:
```
pipe.start_locally(run_pipeline_steps_locally=False)
```
This actually creates the steps as Tasks and launches them on remote machines
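For context, a minimal sketch of how that call fits into a pipeline (project/step/queue names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="my_pipeline", project="my_project", version="1.0")  # placeholders
pipe.add_step(
    name="step_1",
    base_task_project="my_project",    # placeholder: project of the template Task
    base_task_name="step 1 template",  # placeholder: name of the template Task
    execution_queue="default",         # placeholder: queue the agents listen on
)

# Build and run the pipeline logic locally, but launch every step as a Task on remote agents
pipe.start_locally(run_pipeline_steps_locally=False)
```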
The main issue is that applying the patch requires a git clone, and that would fail on local (not pushed) commits.
What's the use case itself?
(btw, if you copy the uncommitted changes into a file and git apply it, it will work)
AgitatedTurtle16 could you check with the latest clearml RC? (I remember a similar issue was fixed.)
```
pip install clearml==0.17.5rc3
```
Then run again:
```
clearml-task ...
```
this is the code for the task scheduler
So it makes sense the first "scheduled" job is epoch time 0 (1970): "executes_immediately" basically means it sets a date that has already passed, so the job triggers right away. Does that make sense?
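A minimal sketch of that pattern, assuming the TaskScheduler API from clearml.automation (the flag is spelled execute_immediately there; task id and queue name are placeholders):
```
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<task-id>",  # placeholder: the Task to clone and enqueue
    queue="default",               # placeholder queue name
    minute=0, hour=3,              # e.g. run every day at 03:00
    execute_immediately=True,      # assumption: back-dates the first run so it fires at once
)
scheduler.start()  # blocks and keeps the schedule running
```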
This looks like a 'feast' error, could it be a missing configuration?
Could not locate channel name 'gg_clearml'
CheerfulGorilla72 these are the permissions:
https://github.com/allegroai/clearml/blob/427b98270cc846b5d7e4af49f9732e3eb8d7d3ae/examples/services/monitoring/slack_alerts.py#L13
channels:join
channels:read
chat:write
My understanding is that on remote execution Task.init is supposed to be a no-op right?
Not really a no-op: it still syncs the Argparser and the like, starts the background reporting services, etc.
This is so odd! literally nothing printed
Can you tell me something about the node "mrl-plswh100:0" ?
is this like a SageMaker node? We have seen similar cases where Python threads / subprocesses are not supported, and instead of Python crashing it just hangs there
Hi ShinyPuppy47 ,
Yes, that is correct. Use Task.init for automagic logging.
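For reference, the minimal form (project/task names are placeholders):
```
from clearml import Task

# Creates a new Task and auto-logs frameworks, argparse, stdout, plots, etc.
task = Task.init(project_name="my_project", task_name="my_experiment")
```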
Hi JitteryCoyote63
If you want to refresh the task object, call task.reload(). It will also refresh the artifacts.
The reason we don't always do this when accessing the .artifacts object is speed optimization (a reload might be slow compared to dict access, and we assume users expect it to behave like a dict).
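A minimal sketch, assuming you already have the task id and an artifact name (both placeholders):
```
from clearml import Task

task = Task.get_task(task_id="<task-id>")   # placeholder id
task.reload()                               # pull the latest state, artifacts included
print(list(task.artifacts.keys()))          # dict-like access, no implicit server round-trip
obj = task.artifacts["my_artifact"].get()   # placeholder artifact name
```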
Hi @<1523701260895653888:profile|QuaintJellyfish58>
Is there a way or a trigger to detect when the number of workers in a queue reaches zero?
You mean to spin them down? What's the rationale?
I'd like to implement a notification system that alerts me when there are no workers left in the queue.
How are they "dropping" ?
Specifically to your question, let me check, I'm sure there is an API that gets that data because you can see it in the UI 🙂
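Something along these lines might work, assuming the workers.get_all endpoint exposes each worker's queue list (untested sketch; the queue name is a placeholder):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
workers = client.workers.get_all()

queue_name = "default"  # placeholder
serving = [
    w for w in workers
    if any(getattr(q, "name", None) == queue_name for q in (w.queues or []))
]
if not serving:
    print(f"No workers left on queue '{queue_name}', send the alert here")
```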
So you have two options:
- Build the container from your docker file and push it to your container registry. Notice that if you built it on the machine running the agent, that machine can use it as the Task's base container
- Use the FROM container as the Task's base container and put the rest as a docker startup bash script (see the sketch after this list). Wdyt?
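A sketch of option 2, assuming a recent clearml version where set_base_docker accepts docker_image and docker_setup_bash_script (image name and commands are placeholders):
```
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_experiment")  # placeholders
task.set_base_docker(
    docker_image="python:3.10",  # placeholder: the FROM image of your Dockerfile
    docker_setup_bash_script=[
        # placeholders: whatever the rest of your Dockerfile did
        "apt-get update && apt-get install -y libsm6",
        "pip install -r extra_requirements.txt",
    ],
)
```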
Questions
I want to trigger a retrain task when F1
That means that in inference you are reporting the F1 score, correct?
As part of the retraining I have to train all the models and then choose the best one and deploy it
Are you passing output_uri to Task.init? Are you storing the model as an artifact?
You can tag your model/task with a "best" tag (and untag the previous one). Then in production, look for the "best" task and get its model, something like the sketch below.
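A rough sketch of that flow, assuming the tag lives on the Task (project and tag names are placeholders):
```
from clearml import Task

def promote_best(winner: Task, project_name: str = "my_project") -> None:
    # Move the "best" tag from the previous holder(s) to the winning Task
    for t in Task.get_tasks(project_name=project_name, tags=["best"]):
        t.set_tags([tag for tag in t.get_tags() if tag != "best"])
    winner.add_tags(["best"])

def get_best_model_weights(project_name: str = "my_project") -> str:
    # In production: fetch the output model of the currently "best" Task
    best = Task.get_tasks(project_name=project_name, tags=["best"])[0]
    model = best.models["output"][-1]  # last output model of that Task
    return model.get_local_copy()      # download the weights locally
```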
Thoughts?
Back to the feature request: if this is taken care of (both adding a missed package, and the S3 upload), do you still believe there is room for this kind of feature?
Why can I only call import_model
import_model actually creates a new Model object in the system
InputModel(id) will "load" an existing model based on the model id
Make sense?
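A short sketch contrasting the two (URL, name and id are placeholders):
```
from clearml import InputModel

# Registers a NEW Model entity in the system, pointing at external weights
imported = InputModel.import_model(
    weights_url="s3://my-bucket/models/model.pt",  # placeholder URL
    name="my imported model",                      # placeholder name
)

# Loads an EXISTING Model entity by its id
existing = InputModel(model_id="<model-id>")       # placeholder id
```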
Hi GiddyTurkey39
Is the config file connected to the Task via Task.connect_configuration?
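For reference, a minimal sketch (file and config names are placeholders):
```
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_experiment")  # placeholders
# Registers the file as the Task's configuration; on remote execution the
# returned path points at the configuration fetched from the server
config_path = task.connect_configuration(name="config", configuration="config.yaml")
```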
Hi UptightMouse31
First, thank you 🙂
And to your question:
variable in the project is the kpi,
You mean like adding it to the experiment table and getting a kind of leaderboard?
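If so, something like this might work as a starting point, assuming the KPI is reported as a scalar (project, metric and series names are placeholders):
```
from clearml import Task

tasks = Task.get_tasks(project_name="my_project")  # placeholder project

def kpi(task):
    # get_last_scalar_metrics() returns {metric: {series: {"last": ...}}}
    metrics = task.get_last_scalar_metrics()
    return metrics.get("kpi", {}).get("kpi", {}).get("last", float("-inf"))

for t in sorted(tasks, key=kpi, reverse=True)[:5]:  # top-5 leaderboard
    print(t.name, kpi(t))
```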
Damn, okay, I'll make sure we fix the order.
Could you verify the ~= works as intended (if the order is correct)?
still it is a ChatGPT interface, correct?
Actually, no. And we will change the wording on the website so it is more intuitive to understand.
The idea is you actually train your own model (not chatgpt/openai) and use that model internally, which means everything is done inside your organisation, from data through training and ending with deployment. Does that make sense?
MagnificentSeaurchin79
Can this be solved by using a docker image with the preinstalled packages at a user level?
Yes 🙂
BTW: I think I missed how you managed to install the object_detection API in the first place?
Is it the git repo of the Task? did you fork it? is it a submodule of your git repo?
p.s.
Yes, Slack is quite good at reminding you, but generally speaking always prefer @, it will send me an email if I miss the message :)
hmm DeliciousKoala34
what are you getting if you put this at the top of your code (the one you are running in the remote docker)?
```
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith("CLEARML_")])
```
Why? The task should have completed successfully, how is this aborting?
Early stopping by the HPO process, like Hyperband, e.g. "this training run is going nowhere, let's stop it".
That makes total sense, this is exactly the kind of scenario where OS signal 9 (SIGKILL) shows up 🙂
Why is it using an OutputModel and an InputModel?
So calling OutputModel will create the new Model entity and upload the data; InputModel will store it as the required input Model.
Basically on the Task you have input & output sections. When you clone the Task you are copying the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here when you clone the Task you will be cloning the reference to the InputModel (i...
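A minimal sketch of the two sides, assuming the weights file already exists on disk (ids and file names are placeholders):
```
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name="my_project", task_name="my_experiment")  # placeholders

# Input section: reference an existing model; cloning the Task copies this reference
input_model = InputModel(model_id="<model-id>")  # placeholder id
input_model.connect(task)

# Output section: created at execution time by your code
output_model = OutputModel(task=task)
output_model.update_weights(weights_filename="model.pt")  # placeholder file
```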
I'm sorry, wrong line reference:
I'm assuming the error is due to a missing ulimit:
try setting both the soft and hard ulimit values to 16777216
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58