SuperiorPanda77 I have to admit, not sure what would cause the slowness only on GCP ... (if anything I would expect the network infrastructure would be faster)
Yes thanks, but if I do this, the packages will be installed again for each step. Is it possible to use a single venv?
Notice that the venv is cached on the clearml-agent host machine (if this is the k8s glue, make sure to set up the cache as a PV to achieve the same).
This means there is no need to worry about that, and this is stable.
That said, if you have an existing venv inside the container, just add `docker_args="-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/bin/python"`
Se...
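As an illustrative sketch only (the project, task name, image, and interpreter path below are placeholders, not from the original message), the environment variable can be passed through the `clearml-task` CLI's `--docker_args` flag:

```shell
# Hypothetical invocation: point the agent at the container's existing
# Python interpreter so it skips creating a new venv inside the container.
clearml-task \
  --project examples --name my-task \
  --script train.py \
  --docker my-image:latest \
  --docker_args "-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/usr/local/bin/python3"
```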
Hi LazyTurkey38
Configuring these folders will be pushed later today 🙂
Basically you'll have in your clearml.conf
```
agent {
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
        ssh_folder: "/root/.ssh"
        pip_cache: "/root/.cache/pip"
        poetry_cache: "/root/.cache/pypoetry"
        vcs_cache: "/root/.clearml/vcs-cache"
        venv_build: "/root/.clearml/venvs-builds"
        pip_download: "/root/.clearml/p...
```
I think CostlyOstrich36 managed to reproduce?!
No worries 🙂
Is this what you were looking for ?
Ohh then you do docker sibling:
Basically you map the docker socket into the agent's docker; that lets the agent launch another docker on the host machine.
You can see an example here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L144
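For reference, the key part of that linked docker-compose is the socket mount (a minimal fragment; the service name here is illustrative):

```yaml
# Fragment of a docker-compose service definition: mounting the host's
# docker socket lets the agent inside this container start sibling
# containers directly on the host machine.
services:
  agent-services:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```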
So you are saying it ignored everything after the bucket's "/" ?
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
It's my bad; after that, inside the container it does `cp -Rf /.ssh ~/.ssh`
The reason is that you cannot know the user's home folder before spinning up the container.
Anyhow, the point is: are you sure you have `~/.ssh` configured on the host machine?
And if you do, are you saying this is part of your AMI? If not, how did you put it there?
JitteryCoyote63 how can I reproduce it? (obviously when I tested it was okay)
No worries, you should probably change it to `pipe.start(queue='queue')` rather than starting locally.
Is it working when you are calling it with start locally?
Hi MammothGoat53
Do you mean working with RestAPI directly?
https://clear.ml/docs/latest/docs/references/api/events
Hi UnsightlySeagull42
Could you test with the latest RC? `pip install clearml==1.0.4rc0` Also, could you provide some logs?
Hi @<1570220858075516928:profile|SlipperySheep79>
Is there a way to specify the working dir from the decorator
not directly, but why would that change anything? I mean the component code will be created in the git root, and you can still access files inside the subfolders
from .subfolder import something
what am I missing?
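To make the point concrete, here is a tiny self-contained demonstration (the folder, module, and function names are made up for illustration; it uses an absolute import rather than the relative form, so it can run standalone): code executing at the repository root can import directly from a subfolder package.

```python
import os
import sys
import tempfile

# Build a throwaway "repo" with a subfolder package, mimicking the layout
# where the component code runs at the git root and helpers live in a subfolder.
repo = tempfile.mkdtemp()
os.makedirs(os.path.join(repo, "subfolder"))
with open(os.path.join(repo, "subfolder", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(repo, "subfolder", "helpers.py"), "w") as f:
    f.write("def something():\n    return 'hello from subfolder'\n")

# Code running at the repo root can import the subfolder directly
sys.path.insert(0, repo)
from subfolder.helpers import something

result = something()
print(result)
```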
Hi @<1575656665519230976:profile|SkinnyBat30>
Streamlit apps are backend run (i.e. the python code drives the actual web app)
This means running your Task's code and exposing the web app (i.e. HTTP) that Streamlit serves.
This is fully supported with ClearML, but unfortunately only in the paid tiers 😞
You can however run your Task with an agent, make sure the agent's machine is accessible and report the full IP+URL as a hyper-parameter or property, and then use that to access your streaml...
Can the host server's service agent be used?
In theory yes, just make sure you expose the container's network (check the docker compose).
What exactly do you mean by docker run permissions?
Thanks GentleSwallow91
That's a good tip, where in the docs would you add it?
time.sleep(time_sleep)
You should not call `time.sleep` in async functions; it should be `asyncio.sleep`.
None
See if that makes a difference
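To illustrate the difference with a minimal standalone sketch (not your code): `time.sleep` blocks the whole event loop, while `await asyncio.sleep` yields control so other coroutines keep running.

```python
import asyncio
import time

async def worker():
    # time.sleep(0.1) here would block the entire event loop;
    # asyncio.sleep suspends only this coroutine
    await asyncio.sleep(0.1)
    return "done"

async def main():
    start = time.monotonic()
    # the two sleeps overlap, so the total is ~0.1s rather than ~0.2s
    results = await asyncio.gather(worker(), worker())
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
```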
WackyRabbit7
I do 'pkill -f trains' but it's the same...
If you need to debug and test, run with --foreground and just hit ctrl-c to end the process (it will never switch to background...). Helps?
Please do, just so it won't be forgotten (it won't, but for the sake of transparency).
And do you need to run your code inside a docker, or is venv enough ?
A "regular" worker will run one job at a time; a services worker will spin multiple tasks at the same time. But their setup (i.e. before running the actual task) happens one at a time.
WickedGoat98
Put the `agent.docker_preprocess_bash_script` in the root of the file (i.e. you can just add the entire thing at the top of the trains.conf).
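For example, a hedged sketch of what that could look like at the top of trains.conf (the commands themselves are placeholders, not from the original message):

```
agent {
    # bash commands executed inside the docker container before the
    # task environment is set up (commands here are illustrative)
    docker_preprocess_bash_script: [
        "echo 'starting preprocess'",
        "apt-get update",
    ]
}
```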
Might it be possible that I can place a trains.conf in the mapped local folder containing the filesystem and mongodb data etc e.g.
I'm assuming you are referring to the trains-agent services; if this is the case, sure you can.
Edit your docker-compose.yml, under line https://github.com/allegroai/trains-server/blob/b93591ec3226...
As far as I understand, clearml tracks each library called from scripts and saves the list of these libraries somewhere (I assume this list is saved as a requirements.txt file, which is later loaded into the venv when the pipeline is running).
Correct
Can I edit this file (just to comment out the row with "object-detection==0.1")?
BTW, regarding the object-detection library. My training scripts have calls like:
Yes, in the UI you can right-click on the Task, select "reset", then it...
Hi SubstantialElk6
Yes you are correct the glue only needs to change the yaml and it will work.
When you say "Dev end", what do you mean? I was thinking adding additional glue for multi-node and just adding queues, for example add a 4-node queue and attach a glue to it, wdyt?
Regarding horovod: horovod is spinning up its own nodes, so integration with k8s is not trivial (regardless of ClearML). That said, I know that they do have support for horovod in the Enterprise edition, but I'm not sure ...
it will constantly try to resend logs
Notice this happens in the background; in theory you will just get stderr messages when it fails to send, but the training should continue.
Hi @<1687643893996195840:profile|RoundCat60>
You mean the clearml-server AMI ?
I also wonder is there any specific reason to store previous versions ?
Sure, run `clearml-agent init`. It is a CLI wizard to configure the initial configuration file.