I failed to update the "STARTED AT" and the "COMPLETED AT" attributes in the "INFO" tab.
I'm not sure this can actually be overridden...
Hi LazyTurkey38
The ability to configure these folders will be pushed later today 🙂
Basically you'll have in your clearml.conf
agent {
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
        ssh_folder: "/root/.ssh"
        pip_cache: "/root/.cache/pip"
        poetry_cache: "/root/.cache/pypoetry"
        vcs_cache: "/root/.clearml/vcs-cache"
        venv_build: "/root/.clearml/venvs-builds"
        pip_download: "/root/.clearml/p...
I have to admit, mounting it to a different drive is a good reason to bring this feature back. The reasoning for removing it was that the agent needs to make sure it manages these folders (e.g. multiple agents running on the same machine)
Hmm, not a bad idea 🙂
Could you please open a Git Issue, so it will not get forgotten ?
(btw: I'm not sure how trivial it is to implement, nonetheless it's obviously possible 😉)
😞 anything that can be done?
Thanks MinuteGiraffe30 , fix will be pushed later today
Hi @<1628565287957696512:profile|AloofBat92>
Yeah the name is confusing, we should probably change that. The idea is that it's a low-code / high-code way to train your own LLM and deploy it. Not really a 1:1 comparison to ChatGPT, more like GenAI for enterprises. Make sense?
Hmm yes this is exactly what should not happen 🙂
Let me check it
Are these experiments logged too (with the train-valid curves, etc)?
Yes, every run is logged as a new experiment (with its own set of HPs). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with a new set of HPs and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agents on as many machines as you like with specific GPUs etc. each one ...
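For reference, this is roughly what the HP process looks like with the SDK (a minimal sketch; the task id, metric names, ranges and queue below are placeholders):
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange, RandomSearch
)

# the template experiment that gets cloned with a new set of HPs each time
optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",  # placeholder
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",  # placeholder metric
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",  # the queue the agent(s) listen on
    max_number_of_concurrent_tasks=2,
)
optimizer.start()  # clones the template, sets the HPs, enqueues for the agent(s)
optimizer.wait()
optimizer.stop()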
@<1539780258050347008:profile|CheerfulKoala77> make sure the AMI id matches the zone of the EC2 machine
JitteryCoyote63 next week is the Trains next release with upgrade to ES 7, do you want to wait or sort a solution for this one ?
(BTW: I think that you can mount a license file or delete one, and it should be okay, I'll ask the backend guys regardless)
i'm Jax, not Manoj! lol.
I know 😄 I just mentioned that this issue is being actively discussed
so that one app I am using inside the Task can use the python packages installed by the agent and I can control the packages using clearml easily
That's the missing part for me. You have all the requirements on the Task (that you can fully control), the agent is setting up a brand new venv for each Task inside a container (the venv is cached, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why would you need the path to t...
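For example (a sketch; the package names and versions here are just placeholders), the requirements the agent installs can be controlled from code:
from clearml import Task

# call before Task.init; adds/overrides entries in the Task's "installed packages"
Task.add_requirements("pandas", "1.5.3")   # pin a specific version
Task.add_requirements("scikit-learn")      # or just the package name

task = Task.init(project_name="examples", task_name="requirements demo")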
My plan is to have an AWS Step Functions state machine (DAG) that treats running a ClearML job as one step (task) in the DAG.
...
Yep, that should work
That said, after you have that working, I would actually check pipelines + clearml aws autoscaler, easier setup, and possibly cheaper on the cloud (Lambda vs EC2 instance)
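As a rough sketch of that one step (the project/task names and queue here are placeholders), running a ClearML job from an external orchestrator is basically clone + enqueue:
from clearml import Task

# clone a template task and push it to a queue an agent is listening on
template = Task.get_task(project_name="examples", task_name="train model")
cloned = Task.clone(source_task=template, name="train model (step-fn run)")
Task.enqueue(cloned, queue_name="default")

# block until the task finishes (a Step Functions state could poll instead)
cloned.wait_for_status()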
If this works, we might be able to fully replace Metaflow with ClearML!
Can't wait for your blog post on it 😉
Hi NonsensicalSeaanemone47
I'm assuming you mean k8s as compute cluster?
If so, then yes, clearml adds priority scheduling on top of your existing k8s cluster. It also allows you to reuse images, as k8s spins up the base container image and then, inside the container, the agent sets up the environment of the experiment (clones the code, applies the diff, installs missing python packages, etc.)
It also gives visibility into the executed pods.
Make sense ?
Hi TenderCoyote78
I'm trying to install clearml-agent in my dockerfile,
I'm not sure I'm following, are you trying to create a docker container containing the agent inside? For what purpose?
(notice that the agent can spin up any off-the-shelf container, there is no need to add the agent into the container, it will take care of that itself when running it)
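For example (a sketch; the queue name and base image are placeholders), you can just run the agent in docker mode on the machine and let it spin the container for you:
clearml-agent daemon --queue default --docker python:3.9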
Specifically to your docker file:
RUN curl -sSL ... | sh
No need for this line
COPY clearml.conf ~/clearml.conf
Try the ab...
or can I directly open a PR?
Open a direct PR and link to this thread, I will make sure it is passed along 🙂
Hi SmallDeer34
Hmm I'm not sure you can, the code will by default use rglob with the last part of the path as wildcard selection
😞
You can of course manually create a zip file...
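For example (a sketch; the paths, patterns and names below are placeholders), zip exactly the files you want and upload the zip as an artifact:
import zipfile
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="manual zip upload")  # placeholder names

# zip only the files you want, instead of relying on the default wildcard selection
with zipfile.ZipFile("my_files.zip", "w") as zf:
    for f in Path("data/nested/dir").rglob("*.csv"):
        zf.write(f, arcname=f.relative_to("data"))

task.upload_artifact(name="selected-files", artifact_object="my_files.zip")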
How would you change the interface to support it ?
And having a pdf is easier/better than sharing a link to the results page ?
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
Hi JitteryCoyote63
I change the project.default_output_destination? I tried setting it to None but it is not updated
How did you try to change it? And where do you see the effect?
JealousParrot68 yes this seems like a correct description.
The main diff between 1 & 2 is what the actual data is: if it's training/testing data, then a Dataset makes sense; if it's part of a preprocessing pipeline, then artifacts make more sense (notice we added pipeline step caching on the artifacts, so you can reuse steps if they have the same parameters/code, which means you can clone a pipeline and rerun it without repeating unnecessary data processing).
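With the current SDK the step caching looks roughly like this (a sketch; project/task names, the parameter override and the queue are placeholders):
from clearml import PipelineController

# placeholder project / pipeline names
pipe = PipelineController(name="preprocess-and-train", project="examples", version="1.0")

pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess data",
    cache_executed_step=True,  # reuse the previous run if code + parameters are identical
)
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="examples",
    base_task_name="train model",
    parameter_override={"General/preprocess_task_id": "${preprocess.id}"},
)
pipe.start(queue="services")  # placeholder queue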
no, i just commented it and it worked fine
Yeah, we should add a comment saying "optional" because it looks as if you need to have it there if you are using Azure.
Hi JealousParrot68
You mean by artifact names ?
I see, if this is the case, try to set
'output_uri="file:///full/path/to/dir"'
Notice it has to have the full path there and the file:// prefix
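For example (a minimal sketch; the project/task names are placeholders):
from clearml import Task

# note the full path and the file:// prefix
task = Task.init(
    project_name="examples",
    task_name="local output destination",
    output_uri="file:///full/path/to/dir",
)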
Edit the cloned version and enqueue it?
I think the only way is using the API, with task.query_tasks and filter, would that have helped?
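Something along these lines (a sketch; the project name and filter values are placeholders):
from clearml import Task

# returns a list of matching task IDs
task_ids = Task.query_tasks(
    project_name="examples",
    task_name="train",  # partial-name match
    task_filter={"status": ["completed"]},
)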
Hi ColossalAnt7 , I think we ran into it on a few dockers, I believe the bug was fixed in the latest trains-agent RC. Could you verify please?