
Hi ElegantCoyote26
but I can't see any documentation or examples about the updates done in version 1.0.0
So actually the docs are only for 1.0... https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving
Hi there, are there any plans to add better documentation/example
Yes, this is work in progress, the first Item on the list is custom model serving example (kind of like this one https://github.com/allegroai/clearml-serving/tree/main/examples/pipeline )
about...
AstonishingSeaturtle47 How would the code run without the sub-modules? And what is the problem we are trying to solve? (Because unfortunately there is no switch to disable it)
Hi @<1523702868694011904:profile|AbruptCow41>
Check what you get when running git status inside the working directory; this is essentially how it works. Are you expecting to later run it with an agent?
Then the dynamic gpu allocation is exactly what you need, I suggest talking to the sales ppl, I'm sure they can help. https://clear.ml/contact-us/
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine
HandsomeCrow5
client.events.debug_images(metrics=[dict(task='6adb929f66d14731bc76e3493ab89d80', metric='image')])
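For context, a minimal sketch of how that call is typically set up; the task ID comes from the message above, and the response field name is an assumption:
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# query the debug image events logged under the "image" metric of this task
res = client.events.debug_images(
    metrics=[dict(task='6adb929f66d14731bc76e3493ab89d80', metric='image')]
)
print(res.metrics)  # assumed: response attribute holding the per-metric image events
```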
I want to keep the above setup; the remote branch that will track my local will be on fork, so it needs to pull from there. Currently it recognizes origin, so it doesn't work because the agent then can't find the commit.
So you do not want to push the change set?
You can basically include the entire change set (uncommitted changes) since the last pushed commit.
In your clearml.conf, set store_code_diff_from_remote: true
https://github.com/allegroai...
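For reference, a minimal sketch of where that flag lives in clearml.conf; the exact section layout here is assumed from the default config:
```
sdk {
    development {
        # include the diff against the last *pushed* commit, so local
        # commits that were never pushed are part of the stored change set
        store_code_diff_from_remote: true
    }
}
```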
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
Any chance they try to store the TensorBoard logs in this folder? This could lead to "No such file or directory: 'runs'" if one process deletes the folder while the other is trying to access it, or similar scenarios
You mean one machine with multiple clearml-agents ?
(worker is a unique ID of an agent, so you cannot have two agents with the exact same worker name)
Or do you mean two agents pulling from the same queue ? (that is supported)
SmallDeer34 the function Task.get_models() incorrectly returned the input model "name" instead of the object itself. I'll make sure we push a fix.
I found a different solution (hardcoding the parent tasks by hand),
I have to wonder, how does that solve the issue ?
OddAlligator72 FYI you can also import / export an entire Task (basically allowing you to create it from scratch/json, even without calling Task.create)Task.import_task(...) Task.export_task(...)
Should not be complicated, it's basically here
https://github.com/allegroai/clearml/blob/1eee271f01a141e41542296ef4649eeead2e7284/clearml/task.py#L2763
wdyt?
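A minimal sketch of that flow, assuming the current Task.export_task / Task.import_task signatures (the task ID is a placeholder):
```python
from clearml import Task

# export the full task definition as a plain, JSON-serializable dict
task_data = Task.export_task(task="<source_task_id>")  # placeholder task ID

# tweak anything you like in the dict, then re-create a task from it,
# without ever calling Task.create()
new_task = Task.import_task(task_data)
print(new_task.id)
```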
this sounds like docker build issue on macos M1
https://pythonspeed.com/articles/docker-build-problems-mac/
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue
What's the size of the MongoDB?
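If it helps, a hypothetical way to check that from the machine running the server, assuming pymongo is installed and the default ClearML server database names and port:
```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed default port
for name in ("backend", "auth"):  # assumed ClearML server database names
    stats = client[name].command("dbstats")
    print(name, round(stats["dataSize"] / 1e6, 1), "MB")
```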
Oh if this is the case you can probably do
```python
import os
import subprocess
from time import sleep  # this import was missing in the original snippet

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
    # the original snippet is truncated here; presumably it launches the
    # task process with this environment, e.g. via subprocess.Popen(...)
```
Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company at execution time.
What value would you need to override, and what is the use case?
Hi GrittyKangaroo27
Maybe check the TriggerScheduler, and have a function trigger something on k8s every time you "publish" a model?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
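Roughly, following the linked example (the trigger name, project filter, and callback body here are placeholders):
```python
from clearml.automation import TriggerScheduler

def on_model_published(model_id):
    # placeholder callback: this is where you would kick off the k8s action
    print("model published:", model_id)

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_model_trigger(
    name="k8s on publish",          # placeholder trigger name
    schedule_function=on_model_published,
    trigger_project="my project",   # placeholder project filter
    trigger_on_publish=True,        # fire only when a model is published
)
trigger.start()
```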
Hi @<1523715429694967808:profile|ThickCrow29>
Is there a way to specify a callback upon an abort action from the user
You mean abort of the entire pipeline?
Just verifying: the Pod does get allocated 2 GPUs, correct?
What do you have under the "script path" in the Task?
JitteryCoyote63 not yet 🙂
I actually wonder how popular https://github.com/pallets/click is?
Hi @<1523708901155934208:profile|SubstantialBaldeagle49>
If you report on the same iteration with the same title/series you are essentially overwriting the data (as expected)
Regarding the Plotly report size.
Two options:
- Round down numbers (by default it will store all the digits; usually anything after the fourth is quite useless, and rounding will drastically decrease the plot size)
- Use logger.report_scatter2d, which is more efficient and has a mechanism to subsample extremely large graphs (see the sketch below)
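A minimal sketch of option 2, with made-up project/task/series names:
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="scatter demo")  # placeholder names
logger = task.get_logger()

# Nx2 array of (x, y) points; rounding the values also shrinks the stored plot
points = np.random.rand(100000, 2).round(4)
logger.report_scatter2d(
    title="large scatter", series="points",
    iteration=0, scatter=points, mode="markers",
)
```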
p.s. StraightCoral86 I might be missing something here, please feel free to describe the entire execution scenario and what you are trying to achieve 🙂
My current experience is there is only print out in the console but no training graph
Yes, Nvidia TLT needs to actually use TensorBoard for ClearML to catch the graphs and display them.
I think that in the latest version they added that. TimelyPenguin76 might know more
My question is what happens if I launch in parallel multiple doit commands that create new Tasks.
Should work out of the box.
I would like to confirm that current_task ...
Correct.
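A minimal sketch of that confirmation (project/task names are placeholders); each process that calls Task.init gets its own current task:
```python
from clearml import Task

# run in each parallel process:
task = Task.init(project_name="examples", task_name="parallel run")  # placeholder names
assert Task.current_task().id == task.id  # current_task() is per-process
```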
so it would be better just to use the original code files and the same conda env, if possible…
Hmm, you can actually run your code in "agent mode", assuming you have everything else set up.
This basically means you set a few environment variables prior to launching the code:
Basically:
```bash
export CLEARML_TASK_ID=<The_task_id_to_run>
export CLEARML_LOG_TASK_TO_BACKEND=1
export CLEARML_SIMULATE_REMOTE_TASK=1
python my_script_here.py
```
My goal is to automatically run the AWS Autoscaler task on a clearml-agent pod when I deploy
LovelyHamster1 this is very cool!
quick question, if you are running on EKS, why not use the EKS autoscaling instead of the ClearML AWS EC2 autoscaling?