So dynamic and static artifacts are basically the same thing, except that with dynamic I can edit the artifact while the experiment is running?
Correct
Second, why would it be overwritten if I run a different run of the same experiment?
Sorry, I meant within the same run: if you reuse the artifact name you will be overwriting it. Obviously different runs, different artifacts :)
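For illustration, a minimal sketch of that behaviour (project/task names here are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact overwrite demo")

# first upload registers the artifact under the name "stats"
task.upload_artifact("stats", artifact_object={"epoch": 1})

# re-using the same name later in the same run replaces the previous object
task.upload_artifact("stats", artifact_object={"epoch": 2})
```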
CLI? Code ?
In order to work with SSH cloning, one has to manually install openssh-client in the docker image, it looks like.
Correct, you have to have SSH inside the container so that git can use it.
You can always install it automatically with the following setting inside your agent's clearml.conf (see the sketch below): extra_docker_shell_script: ["apt-get install -y openssh-client"]
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L145
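For reference, a minimal sketch of the relevant agent section in clearml.conf (assuming a Debian/Ubuntu based docker image):

```
agent {
    # shell commands executed inside the docker before the task starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y openssh-client"]
}
```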
at the end it's just another env var
It should work; GIT_SSH_COMMAND is also respected by pip when it clones git-based packages.
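For example, something along these lines on the machine running the task (the key path is an assumption):

```
export GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no"
```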
store_code_diff_from_remote
doesn't seem to change anything with regard to this issue
Correct, it is always from remote
i'll be using the update_task, that worked just fine, thanks
Sure thing.
ShakyJellyfish91, I took a quick look at the diff between the versions. Can you hack a non-working version (preferably the latest) and verify the issue for me?
- In a notebook, create a method and decorate it with fastai.script's @call_parse.
Any chance you have a very simple code/notebook to reference (this will really help in fixing the issue)?
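Even something as minimal as this would do (a hypothetical sketch; note that in recent versions @call_parse lives in fastcore.script):

```python
from fastcore.script import call_parse, Param

@call_parse
def train(lr: Param("learning rate", float) = 0.01,
          epochs: Param("number of epochs", int) = 1):
    # dummy body, just to reproduce the decorator + notebook combination
    print(f"training with lr={lr} for {epochs} epochs")
```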
have a CI/CD (e.g. GitHub Actions) that updates my “production” pipeline on the ClearML UI,
I think this is the easiest way: basically the CI/CD launches a pipeline (which under the hood is just another type of Task) by querying the latest "Published" pipeline that is also not archived, then cloning it and pushing it to an execution queue (sketch below).
In the UI when you want to "upgrade" the production pipeline you just right click "Publish" on the pipeline you want to launch. Another way is to do the same with Tags...
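A rough sketch of what the CI/CD job could run (project and queue names are assumptions):

```python
from clearml import Task

# find the latest Published (and not archived) pipeline controller
pipelines = Task.get_tasks(
    project_name="my_project/.pipelines/my_pipeline",   # assumed pipeline project
    allow_archived=False,
    task_filter={"status": ["published"], "order_by": ["-last_update"]},
)

# clone the newest one and push the clone to the execution queue
cloned = Task.clone(source_task=pipelines[0], name="production pipeline run")
Task.enqueue(cloned, queue_name="services")
```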
WickedGoat98 the agent itself can be executed on bare metal, no need to set up docker for it (although that is fully supported)
Specifically, the docker-compose has the agent running in services mode, i.e. for lightweight CPU tasks such as running pipelines.
If the agent is running on a GPU, the easiest way is to run it on bare metal.
It will always log its own environment, either via static analysis or with "pip freeze" / "conda freeze".
It needs to log the exact setup that was actually installed.
When you later launch it on a remote machine, it can either use this to recreate the environment (using pip or conda), or you can clear the entire section, in which case it will fall back to "requirements.txt"
Any reason for specifically using the "environment.yaml" ?
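Side note: if you would rather have the task log a requirements.txt instead of the full freeze, something along these lines should work (a sketch assuming the SDK's Task.force_requirements_env_freeze helper):

```python
from clearml import Task

# assumption: must be called *before* Task.init();
# when given a requirements file, it is logged instead of the auto-generated freeze
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="env logging demo")
```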
Guys, any chance you can verify the RC solves the issue?
pip install clearml==1.0.2rc0
A few examples here:
None
Grafana model performance example:
- browse to
- login with: admin/admin
- create a new dashboard
- select Prometheus as data source
- Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
- Change type to heatmap, and select on the right hand-side under "Data Format" s...
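If you also want a plain average-latency panel, a query along these lines should work (assuming the usual Prometheus histogram convention, i.e. that a test_model_sklearn:_latency_count series exists alongside _sum and _bucket):

```
increase(test_model_sklearn:_latency_sum[1m]) / increase(test_model_sklearn:_latency_count[1m])
```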
These instructions should create the exact chart:
None
What am I missing ?
Why? The task should have completed successfully, how is this aborting?
Early stopping by the HPO process, like HyperBand, e.g. "this model's training is going nowhere, let's stop it".
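For context, a rough sketch of how that looks with the clearml HPO module (the task ID, metric names and budgets are made up; OptimizerBOHB needs the hpbandster package and gives you HyperBand-style early stopping):

```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.hpbandster import OptimizerBOHB

optimizer = HyperParameterOptimizer(
    base_task_id="<TEMPLATE_TASK_ID>",            # the training task to clone
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerBOHB,                # Bayesian optimization + HyperBand pruning
    execution_queue="default",
    max_iteration_per_job=100,                    # budget after which a job may be stopped early
    total_max_jobs=20,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```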
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2-line integration, see the sketch after this list). At least you will have visibility into everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
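The "2 lines" being the usual (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="sagemaker training job")
```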
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
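A rough sketch of the trigger part (IDs, project and queue names are made up, and the exact arguments may differ slightly between versions):

```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# when a new dataset version is published in the "datasets" project,
# clone the batch-inference task and push it to the execution queue
trigger.add_dataset_trigger(
    name="batch inference on new data",
    schedule_task_id="<BATCH_INFERENCE_TASK_ID>",
    schedule_queue="default",
    trigger_project="datasets",
    trigger_on_publish=True,
)

trigger.start()
```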
2, 3) the question is whether the serving changes from one tenant to another, does it?
Does it work if you remove the Task.init call?
Hi @<1541954607595393024:profile|BattyCrocodile47>
But the files API is still open to the world, right?
No, of course not 🙂 (i.e. API is authenticated with JWT header, this is why you need to generate the secret/key in the UI)
That said, the login process itself is user/pass stored on the server, but other than that the web/api are secured. The file server on the other hand is plain http storage and does not verify the connection like the API does. So if you are going the self-ho...
Now I'm curious what's the workaround ?
@<1541954607595393024:profile|BattyCrocodile47>
Is that instance only able to handle one task at a time?
You could have multiple agents on the same machine, each one with its own dedicated GPU, but you will not be able to change the allocation (i.e. now I want 2 GPUs on one agent) without restarting the agents on the instance. In either case, this is for a "bare-metal" machine, and in the AWS autoscaler case, this goes under "dynamic" GPUs (see above)
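For example, on a single two-GPU machine (the queue name is made up):

```
clearml-agent daemon --queue single_gpu --gpus 0 --detached
clearml-agent daemon --queue single_gpu --gpus 1 --detached
```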
Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer?
Hmm in theory, but not in practice 😞
if ClearML is following OAuth 2.0, t
This is for the SSO part, not for the API, API is only using JWT for verification, the login process itself is with external SSO (OAuth 2.0). But the open-source version does not support SSO 😞
Why are you trying to add another ELB with JWT verification on it ? ...
I don't want a new task every 5 minutes as that will create a lot of tasks over a day. It would be better if I had just one task.
Oh you mean the Task that will be launched will override the previous "instance", correct ?
If the load balancer or Gateway can do the computation and leverage caching,
Oh that's true. But unfortunately out of scope for the open source version (well, at the end of the day someone needs to pay our salaries 🙂)
I’d prefer not to have our EC2 instance directly exposed to the public Internet.
Yep, I tend to agree 🙂
. Is it possible for two agents to be utilizing the same GPU?
It is, as long as memory wise they do not limit one another.
(If you are using k8s and clearml enterprise, then it supports GPU slicing and dynamic memory allocation)
Hmm that sounds like the agent needs to access a vault with credentials per user, unfortunately this is not covered in the open-source 😞 I "think" this is supported in the enterprise version as part of the permission management
The agent is using bash (but when you add a command line to the docker run, .bashrc is not executed, hence no conda in PATH).
Maybe add the full path to the conda executable, e.g.: docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH", "export LOCAL_PYTHON=/workspace/miniconda/bin/python3", "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE"]
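In context that would look something like this (a sketch assuming it is passed as the docker_setup_bash_script argument of Task.set_base_docker; the image name and task ID are placeholders, the paths are the ones from your setup):

```python
from clearml import Task

task = Task.get_task(task_id="<TASK_ID>")
task.set_base_docker(
    docker_image="your/docker:image",  # whatever image you are already using
    docker_setup_bash_script=[
        "export PATH=/workspace/miniconda/bin:$PATH",
        "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
        "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE",
    ],
)
```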