Hmm, are you getting the warning on the client side, or in the clearml-server?
The use case I have is to allow people on my team to run their workloads on a set of servers without stepping on each other.
So does that mean CPU-only workloads?
Also, are we worried about fairness? (i.e. someone "taking" all the CPU for themselves)
Yes it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a subprocess). Let me see if I can reproduce this behavior.
ChubbyLouse32 could it be that the configuration file is not passed to the agent machine itself?
(Were you able to run anything against this internal server? I mean, to connect to it from code, clearml/clearml-agent?)
Is there a way to assign a job to a specific worker, or does it only work at the queue level?
Only on a queue level, but you can have as many queues as you like and spin up an agent on each (notice you can also have multiple queues on the same agent, pulled based on priority/order).
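To illustrate the priority/order point, here is a toy model in plain Python of an agent listening on several queues and always pulling from the first non-empty one. This is only a sketch of the idea, not how clearml-agent is implemented; the queue names, task IDs and the `pull_next` helper are all made up.

```python
from collections import deque


def pull_next(queues, order):
    """Return (queue_name, task_id) from the first non-empty queue in `order`, or None."""
    for name in order:
        if queues[name]:
            return name, queues[name].popleft()
    return None


# Hypothetical queues; the task IDs are placeholders.
queues = {
    "high": deque(["task-a"]),
    "low": deque(["task-b", "task-c"]),
}
order = ["high", "low"]  # earlier in the list = higher priority

pulled = []
while (item := pull_next(queues, order)) is not None:
    pulled.append(item)
print(pulled)
```

With a real agent, the order in which the queues are listed on the command line (e.g. `clearml-agent daemon --queue high low`) plays the role of `order` here.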
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Actually I am as well. This is Kubernetes doing the resource scheduling, and Kubernetes actually decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and memory protection:
Notice you can control ...
Hi JitteryCoyote63
Wait a few hours, there is a new fix, I'll make sure we upload it later today (scheduled to be there anyhow, I'll push it forward)
Ohh yes, if the execution script is not on git and git exists, it will not add it (it will add it if it is in a tracked file via the uncommitted changes section)
ZanyPig66 in order to expand the support to your case, can you explain exactly which files are on git and which are not?
Are you asking regarding the k8s integration?
(This is not a must, you can run the clearml-agent bare-metal on any OS)
In the UI, find the task (just search for the Task ID, it will find it), then right click it, and select "reset"
well, it's only when adding a `- name` to the template
Nonetheless it should not break it 🙂
Hi FiercePenguin76
Artifacts are as you mentioned: you can create as many as you like, but in the end there is no "versioning" on top. It can easily be used this way with name+counter.
Contrary to that, Models do offer to create multiple entries with the same name and version is implied by order. Wdyt?
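A minimal sketch of the name+counter idea, assuming artifacts are named `<base>_<N>`. The `next_artifact_name` helper and the example names are hypothetical, not part of clearml.

```python
import re


def next_artifact_name(existing_names, base):
    """Return `<base>_<N>`, where N is one more than the highest counter already used."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    counters = [int(m.group(1)) for n in existing_names if (m := pattern.match(n))]
    return f"{base}_{max(counters, default=0) + 1}"


# Placeholder names; with a real Task they could come from the keys of `task.artifacts`.
existing = ["dataset_1", "dataset_2", "labels_1"]
name = next_artifact_name(existing, "dataset")
print(name)

# The upload itself would then be something like:
#   task.upload_artifact(name=name, artifact_object=my_object)
```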
Hi DeliciousBluewhale87
You can achieve the same results programmatically with Task.create
https://github.com/allegroai/clearml/blob/d531b508cbe4f460fac71b4a9a1701086e7b6329/clearml/task.py#L619
But the git apply failed, the error message is the "xxx already exists in working directory" (xxx is the name of the untracked file)
DefeatedOstrich93 what's the clearml-agent version?
Hi GrotesqueOctopus42
Despite having reuse_last_task_id=True on Task.init, it always creates a new task id. Anyone ever had this issue?
So the way "reuse_last_task_id=True" works is that if there are no artifacts on the Task it will reuse it, but when running inside jupyter it always has artifacts (the notebook itself), so it starts a new Task.
You can however pass a specific Task ID and it will reuse it "reuse_last_task_id=aabb11", would that help?
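Roughly, passing a Task ID instead of `True` could look like the sketch below. `init_kwargs` and `start_task` are illustrative helpers, not part of clearml, and "aabb11" is just the placeholder ID from above. The clearml import is done lazily so the argument-building part can be tried without a server.

```python
def init_kwargs(project, name, reuse=None):
    """Build keyword arguments for Task.init; `reuse` may be None, a bool, or a Task ID string."""
    kwargs = {"project_name": project, "task_name": name}
    if reuse is not None:
        kwargs["reuse_last_task_id"] = reuse
    return kwargs


def start_task(**kwargs):
    # Imported here so the helper above works even where clearml is not installed.
    from clearml import Task
    return Task.init(**kwargs)


kwargs = init_kwargs("debug", "notebook run", reuse="aabb11")
print(kwargs)
# task = start_task(**kwargs)  # needs clearml installed and configured
```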
The new parameter `abort_on_failed_steps` could be a list containing the name of the
I like that, we can also have it as an argument per step (i.e. the decorator can say, abort_pipeline_on_fail or continue_pipeline_processing)
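To make the per-step idea concrete, here is a toy runner in plain Python where each step carries its own abort flag. This is not ClearML pipeline code; `run_pipeline`, the `abort_on_fail` flag and the step names are all hypothetical and only illustrate the proposed semantics.

```python
def run_pipeline(steps):
    """Run steps in order. `steps` is a list of (name, func, abort_on_fail).

    Returns (completed_names, failed_names).
    """
    completed, failed = [], []
    for name, func, abort_on_fail in steps:
        try:
            func()
            completed.append(name)
        except Exception:
            failed.append(name)
            if abort_on_fail:
                break  # this step is marked as critical: stop the whole pipeline
    return completed, failed


def ok():
    pass


def boom():
    raise RuntimeError("step failed")


# The failing step is marked as non-critical, so the pipeline keeps going.
result = run_pipeline([
    ("load", ok, True),
    ("optional_report", boom, False),
    ("train", ok, True),
])
print(result)
```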
Hi SmallDeer34
I need some help: what is the difference between the manual one and the automatic one?
From your previous log, this is the bash command executed inside the container. Can you go through it step by step and try to catch who/what is messing it up?
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/...
Notice that you can embed links to a specific view of an experiment by copying the full address bar when viewing it.
Is there a way for me to get a link to the task execution? I want to write a message to Slack containing the URL, so collaborators can click and see the progress
WackyRabbit7 Nice!
basically you can use this one: task.get_output_log_web_page()
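For the Slack part, a small sketch of building the message body, assuming a Slack incoming webhook (which takes a JSON body with a `text` field). `build_slack_payload` is a made-up helper and the URL is a placeholder; actually POSTing the payload to the webhook is left out.

```python
import json


def build_slack_payload(task_name, task_url):
    """Return the JSON body for a Slack incoming webhook, linking to the task page."""
    text = f"Task *{task_name}* is running: <{task_url}|view progress>"
    return json.dumps({"text": text})


# Placeholder URL; with a live task it would be: url = task.get_output_log_web_page()
url = "https://example.com/projects/p1/experiments/t1/output/log"
payload = build_slack_payload("offline test", url)
print(payload)
```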
WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify 🙂
If I try to connect a dictionary of type `dict[str, list]` with `task.connect`, when retrieving this dictionary with
Wait, this should work out of the box, do you have any specific example?
Hmm, so the SaaS service? And when you delete (not archive) a Task, it does not ask for S3 credentials when you select delete artifacts?
BTW, this one seems to work ....
` from time import sleep
from clearml import Task
Task.set_offline(True)
task = Task.init(project_name="debug", task_name="offline test")
print("starting")
for i in range(300):
    print(f"{i}")
    sleep(1)
print("done") `
(I think the GCP is already up, I'll double check)
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
That's the only explanation ...
But the weird thing is, it did not work on my linux box?!
Sounds good, let's work on it after the weekend 🙂