You mean one machine with multiple clearml-agents ?
(worker is a unique ID of an agent, so you cannot have two agents with the exact same worker name)
Or do you mean two agents pulling from the same queue ? (that is supported)
SmallDeer34 the function Task.get_models() incorrectly returned the input model "name" instead of the object itself. I'll make sure we push a fix.
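For reference, once the fix lands this is roughly what you should get back, a minimal sketch (the task ID is a placeholder):
```python
from clearml import Task

# minimal sketch: fetch the input/output model objects of an existing Task
task = Task.get_task(task_id="<your_task_id>")   # placeholder ID
models = task.get_models()                       # dict-like: {'input': [...], 'output': [...]}
for model in models["output"]:
    print(model.name, model.url)                 # Model objects, not just their names
```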
I found a different solution (hardcoding the parent tasks by hand),
I have to wonder, how does that solve the issue ?
OddAlligator72 FYI you can also import / export an entire Task (basically allowing you to create it from scratch/json, even without calling Task.create):
Task.import_task(...) / Task.export_task(...)
Should not be complicated, it's basically here
https://github.com/allegroai/clearml/blob/1eee271f01a141e41542296ef4649eeead2e7284/clearml/task.py#L2763
wdyt?
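If it helps, a rough sketch of the export/import round trip (the task ID is a placeholder):
```python
from clearml import Task

# minimal sketch: serialize an existing Task and re-create it from the exported data
source = Task.get_task(task_id="<source_task_id>")   # placeholder ID
task_data = source.export_task()                      # plain dict, JSON-serializable
new_task = Task.import_task(task_data)                # creates a new Task from that dict
print(new_task.id)
```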
this sounds like docker build issue on macos M1
https://pythonspeed.com/articles/docker-build-problems-mac/
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue.
What's the size of the mongo DB?
Oh if this is the case you can probably do
```python
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
    # ... (snippet truncated here in the original; presumably it launches the
    # task's script with subprocess using this env)
```
Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company and is applied at execution time.
What would be the value you need to override ? and what is the use case?
Hi GrittyKangaroo27
Maybe check the TriggerScheduler , and have a function trigger something on k8s every time you "publish" a model?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
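Something along these lines, a minimal sketch based on the trigger example (the project name and the callback body are placeholders):
```python
from clearml.automation import TriggerScheduler

def on_model_published(model_id):
    # placeholder callback: e.g. kick off your k8s deployment job here
    print("model published:", model_id)

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_model_trigger(
    name="deploy-on-publish",
    schedule_function=on_model_published,
    trigger_project="my_project",      # placeholder project
    trigger_on_publish=True,
)
trigger.start()  # blocks and polls; use start_remotely() to run it on an agent
```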
Hi @<1523715429694967808:profile|ThickCrow29>
Is there a way to specify a callback upon an abort action from the user
You mean abort of the entire pipeline?
Just verifying the Pod does get allocated 2 gpus, correct ?
What do you have under the "script path" in the Task?
JitteryCoyote63 not yet 😞
I actually wonder how popular https://github.com/pallets/click is ?
Hi @<1523708901155934208:profile|SubstantialBaldeagle49>
If you report on the same iteration with the same title/series you are essentially overwriting the data (as expected)
Regarding the plotly report size.
Two options:
- Round down the numbers (by default it will store all the digits, and usually anything after the fourth is quite useless; rounding will drastically decrease the plot size)
- Use logger.report_scatter2d , it is more efficient and has a mechanism to subsample extremely large graphs (see the sketch below)
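For example, a quick sketch of the second option (project/plot names are just placeholders):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="scatter demo")  # placeholder names
logger = task.get_logger()

# round the values and report them as a 2D scatter;
# report_scatter2d subsamples extremely large graphs for you
points = np.round(np.random.randn(100000, 2), 4)
logger.report_scatter2d(
    title="my plot",
    series="series A",
    scatter=points,
    iteration=0,
    xaxis="x",
    yaxis="y",
    mode="markers",
)
```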
p.s. StraightCoral86 I might be missing something here, please feel free to describe the entire execution scenario and what you are trying to achieve 🙂
My current experience is there is only print out in the console but no training graph
Yes Nvidia TLT needs to actually use tensorboard for clearml to catch it and display it.
I think that in the latest version they added that. TimelyPenguin76 might know more
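For reference, this is the mechanism clearml relies on: once Task.init() is called, anything written through tensorboard is picked up automatically. A minimal sketch assuming PyTorch's tensorboard writer (project/task names are placeholders):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tensorboard demo")  # placeholder names

# any scalar written via tensorboard after Task.init() shows up under the Task's Scalars tab
writer = SummaryWriter(log_dir="./tb_logs")
for step in range(10):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```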
My question is what happens if I launch in parallel multiple doit commands that create new Tasks.
Should work out of the box.
I would like to confirm that current_task ...
Correct.
so it would be better just to use the original code files and the same conda env. if possible…
Hmm you can actually run your code in "agent mode", assuming you have everything else set up.
This basically means you set a few environment variables prior to launching the code:
Basically:
```
export CLEARML_TASK_ID=<The_task_id_to_run>
export CLEARML_LOG_TASK_TO_BACKEND=1
export CLEARML_SIMULATE_REMOTE_TASK=1
python my_script_here.py
```
My goal is to automatically run the AWS Autoscaler task on a clearml-agent pod when I deploy
LovelyHamster1 this is very cool!
quick question, if you are running on EKS, why not use the EKS autoscaling instead of the ClearML aws EC2 autoscaling ?
Is trains-agent using docker-mode or virtual-env ?
It can be a different agent.
If inside a docker then:
clearml-agent execute --id <task_id here> --docker
If you need venv do:
clearml-agent execute --id <task_id here>
You can run that on any machine and it will respin and continue your Task
(obviously your code needs to be aware of that and be able to pull its own last model checkpoint from the Task artifacts / models)
Is this what you are after?
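For the "pull its own last checkpoint" part, something along these lines inside your code should do it (a minimal sketch, assuming the checkpoint was registered as an output model on the Task):
```python
from clearml import Task

task = Task.current_task()

# minimal sketch: grab the latest output model registered on this Task and resume from it
output_models = task.models["output"]
if output_models:
    checkpoint_path = output_models[-1].get_local_copy()
    # load the checkpoint here (framework specific), e.g. torch.load(checkpoint_path)
```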
GrumpySeaurchin29 you can pass s3 credential for the autoscaler, but all the tasks will have them. Are you saying two diff sets of credentials is the issue, or is it the visibility?
StickyBlackbird93 the agent is supposed to solve for the correct version of pytorch based on the Cuda in the container. Sounds like for some reason it fails? Can you provide the log of the Task that failed? Are you running the agent in docker-mode , or inside a docker?
@<1562610699555835904:profile|VirtuousHedgehong97>
source_url="s3:...",
This means your data is already on an S3 bucket; it will not "upload" it, it will just register it.
If you want to upload files, they should be local; then, when you call upload, you can specify the target S3 bucket, and the data will be stored in a unique folder in the bucket
Does that make sense ?
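Assuming this is the Dataset interface (the source_url argument suggests add_external_files), a minimal sketch of both flows (bucket/dataset/project names are placeholders):
```python
from clearml import Dataset

# case 1: data already sits in S3 -- just register it, nothing is uploaded
ds = Dataset.create(dataset_project="examples", dataset_name="registered-data")  # placeholder names
ds.add_external_files(source_url="s3://my-bucket/existing/data/")                # placeholder bucket
ds.finalize()

# case 2: local files -- add them and upload to a target S3 bucket
ds2 = Dataset.create(dataset_project="examples", dataset_name="uploaded-data")
ds2.add_files(path="./local_data")                        # local folder
ds2.upload(output_url="s3://my-bucket/clearml-datasets")  # stored in a unique folder there
ds2.finalize()
```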
Hi ElegantCoyote26 , yes I did 🙂
It seems cometml adds their default callback logger for you, that's it.
Thanks ElegantCoyote26 I'll look into it. Seems like someone liked our automagical approach 🙂
Ohh, sure then editing git config will solve it.
btw: why would you need to do that, the agent knows how to do this conversion on the fly