
And it is not working? What's the Working Dir you have under the Execution Tab?
TrickySheep9
Is there a way to see a roadmap on such things?
Hmm I think we have some internal one, I have to admit these things change priority all the time (so it is hard to put an actual date on them).
Generally speaking, pipelines with functions should be out in a week or so, TaskScheduler + Task Triggers should be out at about the same time.
UI for creating pipelines directly from the web app is in the works, but I do not have a specific ETA on that
Hi LovelyHamster1
That is a good point: since the Pipeline kind of assumes the tasks are already in the system, it clones them (leaving you with the original Draft Task).
I think we should add a flag to the pipeline so that if the Task is in draft mode it will use it (instead of cloning it). Since it seems your pipeline is quite straightforward, I'm not sure you actually need the pipeline controller class; you can perform the entire thing manually, see example here: https://github.com/allegroai/clea...
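For reference, a minimal sketch of the manual approach (the project/task/queue names here are placeholders): grab the draft Task and enqueue it directly, so nothing gets cloned:
from clearml import Task
# hypothetical names - fetch the existing draft Task
draft = Task.get_task(project_name='examples', task_name='my_step')
# push the draft itself into the execution queue, no clone involved
Task.enqueue(draft, queue_name='default')
# block until the step finishes before moving on to the next one
draft.wait_for_status()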
RobustSnake79 let's assume the trace figure above is probably too much to get into the WebUI; which simpler figures might still have value in your scenario?
Hi EnthusiasticCow4, let me know if this one solves the issue:
pip install clearml==1.14.2rc0
for example train.py & eval.py under the same repo
Yes that makes total sense to me. How about a GitHub issue on the clearml-docs ?
but I'd prefer to have a new instance deployed for each new experiment and that it also terminates when no new experiments are queued
I'm not objecting, just wondering about the rationale behind the decision 🙂
Back to the AWS autoscaler:
Basically if you have the services-agent running on your cluster, it will just run the aws-autoscaler for you 🙂
The idea of the services-agent is to run logic/monitoring Tasks such as the aws autoscaler. Notice that services-mode means multiple jobs per...
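For example, spinning one up usually looks something like this (the queue name is just the common convention):
clearml-agent daemon --services-mode --queue services --docker --detached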
Hi RipeGoose2
Are you continuing the Task, i.e. passing Task.init(..., continue_last_task=True)?
Hi ShaggySwan64
I'm guessing just copying the data folder with rsync is not the most robust way to do that since there can be writes into mongodb etc.
Yep
Does anyone have experience with something like that?
Basically you should just back up the 3 DBs (mongo, redis, elastic), each one based on its own backup workflow. Then just rsync the files server & configuration.
You are correct, the agent will clone the git and install the requirements, as written in the task's installed packages section. Regarding the git branch, notice it will pull the specific commit id as stated in the execution section, and it will apply any uncommitted changes. You can edit the execution section and change the commit to the latest in a specific version (you should probably also clear the uncommitted changes if you do that)
Nice 🙂
GreasyPenguin14 for future reference, the agent part in the clearml.conf is only created when you call clearml-agent init (no need for it for the python SDK). The full default configuration can be found in the clearml-agent repository.
Hi FiercePenguin76
Maybe it makes sense to use schedule_function
I think you are correct. This means the easiest would be to schedule a function, and have that function do the Task cloning/enqueuing. wdyt?
As a side note, maybe we should add the ability to pass a custom function that returns a Task ID. The main difference is that the created Task ID will be better logged/visible (as opposed to schedule_function, where the fact there was a Task that was created / ...
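To make the idea concrete, a rough sketch of scheduling such a function (the template Task ID and queue are placeholders, and the exact timing arguments follow the TaskScheduler docs):
from clearml import Task
from clearml.automation import TaskScheduler

def clone_and_enqueue():
    # clone the template Task and push the clone into the execution queue
    cloned = Task.clone(source_task='template_task_id_here')
    Task.enqueue(cloned, queue_name='default')

scheduler = TaskScheduler()
# run the function periodically (e.g. hourly; semantics per the TaskScheduler API)
scheduler.add_task(schedule_function=clone_and_enqueue, hour=1.0)
scheduler.start()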
BTW: StickyMonkey98 if you feel like writing a few examples I think it will be easy to push into the docs, so that at least we improve iteratively...
I see... We could definitely add an argument to control it. I'll update here once there is an RC
AbruptHedgehog21 what exactly do you store as a Model file? is this a python object pickled?
MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+
Hi WorriedParrot51
Take a look at the Experiment execution section:
there is a script path and a working directory
working directory is the base of the git repository (which is cloned into the docker container)
So if for some reason trains did not properly detect the current working dir, here is what should solve the issue, without changing the PYTHONPATH:
script path: ./sub_folder/script.py
working directory: .
What do you think?
The release was supposed to be out this week, but it got delayed by a py2 support issue. Anyhow, the release will be almost exactly like the latest we now have on the GitHub repo (and I'm assuming it will be out just after the weekend)
Hi StickyMonkey98
a very large number of running and pending tasks, and doing that kind of thing via the web-interface by clicking away one-by-one is not a viable solution.
Bulk operations are now supported; upgrade the clearml-server to 1.0.2 🙂
Is it possible to fetch a list of tasks via Task.get_tasks,
Sure:
Task.get_tasks(project_name='example', task_filter=dict(system_tags=['-archived']))
Now I suspect what happened is it stayed on another node, and your k8s never took care of that
Shouldn't this be a real value and not a template
you mean the value being pulled into the pod that failed?
Also I would suggest using Task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
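For example (the project/task/queue names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# everything up to this call runs locally; the call enqueues the Task,
# stops the local process, and the agent continues from this exact point
task.execute_remotely(queue_name='default', clone=False, exit_process=True)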
but I can't seem to figure out a way to do something similar using a task in add_step
VexedCat68 with "add_step" it is assumed the Task you are adding is self-contained (i.e. there is no "return object" to serialize); this means you can only pass arguments, or use the artifacts the Task (i.e. step) will create, which obviously requires knowing in advance what the step creates. Make sense?
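For example, a sketch of what add_step expects (assuming the step Tasks already exist in the system; the names and the artifact reference are placeholders):
from clearml.automation import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
pipe.add_step(name='stage_data', base_task_project='examples', base_task_name='data step')
# the second step can only consume what the first step stored as an artifact
pipe.add_step(
    name='stage_train',
    parents=['stage_data'],
    base_task_project='examples',
    base_task_name='train step',
    parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'},
)
pipe.start()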
Is the clearml-agent queue not available in the open source?
Fully available in the open source; what is missing is the SLURM connection. In the open source, the daemon is installed per machine (node) and spins containers/venvs on the machine. The enterprise version adds support so it uses SLURM to provision the node. I hope it helps 🙂
so do you think it would be possible to spin up another daemon, which listens to this daemon, which then runs a slurm job?
This is exactly what the ...
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
Hi MinuteCamel2
Can I disable it from automatically uploading model checkpoints to ClearML servers?
Maybe this one can help :)
https://www.youtube.com/watch?v=etGjxOKG9lo
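If the video does not cover your exact case: the usual way is to disable the relevant framework binding at Task.init (a sketch assuming PyTorch checkpoints; swap in the framework you actually use):
from clearml import Task

# disable automatic model/checkpoint upload for PyTorch only;
# everything else is still auto-logged
task = Task.init(
    project_name='examples',
    task_name='no auto models',
    auto_connect_frameworks={'pytorch': False},
)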
deleted all of the models from my ClearML project but I still receive this message. Do you know why?
It might take a few hours to update... 🙂
That somehow the PV never worked and it was all local inside the pod
SoggyBeetle95 is this secret a per Task secret, or is it for the agent itself (I.e. for all Tasks the agent will spin)?