Okay, could you try to run again with the latest clearml package from GitHub?
pip install -U git+
@<1595587997728772096:profile|MuddyRobin9> are you sure it was able to spin up the EC2 instance? Which version of the clearml autoscaler are you running?
Hi GentleSwallow91
I am very much concerned with docker container spin up time.
To accelerate spin-up time (mostly the pip install), use venv caching (basically it will store a cache of the entire installed venv so it does not need to reinstall it)
Uncomment this line:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L116
The problem above could be that I used a non-root user to train a model and all packages are installed for ...
Hi @<1661180197757521920:profile|GiddyShrimp15>
I think there is a better channel for this kind of question
(they will be able to help with that)
Pretty confusing that neither
services
StickyLizard47 basically this is how a services queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it on k8s
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
Hi PanickyFish98
It verifies it has access to the output destination when actually creating the Task; maybe it should be a warning?!
fyi: you can also change the value from the UI (under Execution output) or have a default one set in the clearml.conf used by the agent
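Something along these lines (just a sketch; the bucket path and project/task names are illustrative):
```python
from clearml import Task

# output_uri is verified for access when the Task is actually created;
# it can later be overridden from the UI or defaulted in clearml.conf
task = Task.init(
    project_name="examples",
    task_name="output-destination-example",
    output_uri="s3://my-bucket/models",  # illustrative destination
)
```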
BTW,
has this at the bottom:
Yes, it is the company legal entity name. But I think that for referencing it makes more sense to mention the product name ClearML
I think this looks good 🙂
An upload of 11GB took around 20 hours which cannot be right.
That is very, very slow; this is about 152KB/s ...
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user wide using the clearml Vault feature (again enterprise, sorry 😞 )
Yes, I find myself trying to select "points" on the overview tab. And I find myself wanting to see more interesting info in the tooltip.
Yep that's a very good point.
The Overview panel would be extremely well suited for the task of selecting a number of projects for comparing them.
So what you are saying is that this could be a way to multi-select experiments for detailed comparison (i.e. selecting the "dots" on the overview graph). Is this what you had in mind?
Correct, but do notice that (1) task names are not unique and you can change them after the Task was executed, and (2) when you clone the Task you can actually rename it. When an agent is running the Task, the Task.init call is basically ignored because the Task already exists. Make sense?
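For example (just a sketch; project/task names are illustrative):
```python
from clearml import Task

# clone an existing Task and give the clone its own name;
# when an agent later runs it, the Task.init call inside the code is
# effectively ignored because the Task already exists
original = Task.get_task(project_name="examples", task_name="train_model")
cloned = Task.clone(source_task=original, name="train_model_v2")
```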
SoggyFrog26 there is a full pythonic interface, why don't you use this one instead, much cleaner 🙂
Hi HandsomeGiraffe70
First:
# During pipeline initialisation pipeline_params is empty and we need to use default values.
# When the pipeline starts the run, params are launched again, and then pipeline_params can be used.
Hmm that should probably be fixed, maybe a function on the pipeline to deal with it ?
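In the meantime, something along these lines could work around it (just a sketch, assuming a decorator-based pipeline; names and defaults are illustrative):
```python
from clearml import PipelineDecorator

DEFAULT_TUNE_OPTIME = ["recall", "precision"]

@PipelineDecorator.pipeline(name="tuning_pipeline", project="examples", version="0.1")
def tuning_pipeline(tune_optime=None):
    # during pipeline initialisation the argument may still be empty,
    # so fall back to the defaults before building the steps
    metrics = tune_optime or DEFAULT_TUNE_OPTIME
    if isinstance(metrics, str):
        metrics = [m.strip() for m in metrics.split(",") if m.strip()]
    for metric in metrics:
        # ... create the tuning step for each metric ...
        pass
```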
When I reduce the tune_optime value to just 'recall', pipeline execution failed with msg:
ValueError: Node 'tune_et_for_Precision', base_task_id is empty
I would...
What are you seeing?
Hi CooperativeFox72
I think the upload reporting (files over 5MB) was added after version 0.17, hence the log.
The default upload chunk reporting threshold is 5MB, but it is not configurable; maybe we should add it to the clearml.conf? wdyt?
Okay let me see if I can think of something...
Basically crashing on the assertion here ?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L495
Could it be you are passing "Args/resume" True, but not specifying the checkpoint?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L452
I think I know what's going on:
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train...
Yes my bad 😞
Let's try again:
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip ...
Hi GleamingGrasshopper63
How well can the ML Ops component handle job queuing on a multi-GPU server
This is fully supported 🙂
You can think of queues as a way to simplify resources for users (you can do more than that, but let's start simple)
Basically you can create a queue per type of GPU, for example a list of queues could be: on_prem_1gpu, on_prem_2gpus, ..., ec2_t4, ec2_v100
Then when you spin up the agents, you attach each agent to the "correct" queue for its machine type.
Int...
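On the submitting side it could look something like this (just a sketch; queue/project/task names follow the example above):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train_model")

# stop the local run here and enqueue this Task to the queue that matches
# the resources it needs; an agent listening on that queue will pick it up
task.execute_remotely(queue_name="ec2_t4")
```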
ShakyOstrich31
I am reusing an old task ...
Which means that the old Task stores the requirements on the Task itself (see the "Installed Packages" section). Notice it also stores the exact git commit to use.
When you are cloning the Task (i.e. in the pipeline), you should probably:
set the commit / branch to the latest in the branch
clear the "Installed Packages" section, which would cause the agent to use the "requirements.txt" stored in the git repo itself.
As far as I understand this s...
Hi ScaryLeopard77
I think the error message you are getting is actually "passed" from Triton. Basically someone needs to tell it what the model input/output look like (matrix size/type); this is essentially the content of the "config.pbtxt", and it has to be set when spinning up the model endpoint. Does that make sense to you?
I'm assuming the reason it fails is that the docker network is only available inside that specific docker compose. This means when you spin up another docker compose they do not share the same names. Just replace it with the host name or IP and it should work. Notice this has nothing to do with clearml or clearml-serving; these are docker network configurations.
Hi @<1578555761724755968:profile|GrievingKoala83>
Is it possible to override the parameters through the configuration file when restarting the pipeline from the UI?
The parameters of the Pipeline are overridden from the UI, not the pipeline components' parameters;
you can use the pipeline parameters as-is as the pipeline components' parameters.
Is your pipeline built from Tasks, or decorators over functions ?
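If it is decorators, something like this is what I mean (just a sketch, names are illustrative): the pipeline argument is what the UI overrides, and you pass it straight into the component:
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["model_path"])
def train(learning_rate):
    # ... training logic ...
    return "model.pkl"

@PipelineDecorator.pipeline(name="training_pipeline", project="examples", version="0.1")
def training_pipeline(learning_rate=0.1):
    # "learning_rate" is a pipeline parameter (editable in the UI on restart);
    # pass it explicitly so the component gets the overridden value
    model_path = train(learning_rate=learning_rate)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    training_pipeline()
```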
Task.add_requirements('.')
Should work
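Just note it has to be called before Task.init so the requirement is registered on the Task (project/task names here are illustrative):
```python
from clearml import Task

Task.add_requirements('.')  # must come before Task.init
task = Task.init(project_name="examples", task_name="local-package-requirements")
```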
But this is the clearml python package, it is not really related to the server. Could it be you also updated the clearml package?
Now I’m just wondering if I could remove the PIP install at the very beginning, so it starts straightaway
AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
does exactly that 🙂 BTW, I would just set the venv cache and this means it will just be able to restore the entire thing (even if you have changed the requirements)
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
Hmm that is odd, can you send an email to support@clear.ml ?
LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then does the agent take charge. I'm assuming this is not the case, hence the agent spins up the docker, then the docker just ends. Could that be?