Understood, then I would use Task.remote_execution()
Basically :
task = Task.init(...)
# config some stuff
task.remote_execute(quque_name_here)
# this line will be executed on the remote machine only
This will both automatically log your code / repo with Task.init, and the call to Task.remote_execute will stop the local process (on your machine that runs the hydra sweep) and continue on the remote machine.
This will both allow you to use Hydra sweet & schedule / run on remote ...
A few more details on the New RCΒ (1.1.2rc0) change set:
Upload dataset now supports chunksize, for multi-part upload/download (useful with large datasets)
backwards compatibility, i.e. parent datasets do not have to support multi-part datasets
Notice multi-part datasets should be accessed with latest RCcleaml-data upload --chunk-size Dataset().upload(..., chunk_size=None)
Get Dataset support partial download (i.e. for debugging, or for more efficient multi-node support)
Notice total n...
Does it work if I launch the clearml-agent on a docker and pip doesn't know the packages to install
Not sure I follow... the "detect_with_pip_freeze" flag (when set) will tell clearml (at runtime) to create the "installed packages" directly from pip freeze (instead of analyzing the code)
MysteriousBee56 what do you mean "delete a worker"
stop the agent running remotely ?
Hi RipeGoose2
Yes the slide feature is definelty on the do do list (a lot of users asked for it).
Unfortunately other than actually PR-ing to the UI repo, there is no easy way to add customization (If you have an idea on how we could have an easy interface, that would be great.)
I'll check what's the status with the slider, maybe we will be lucky enough to see it in he next update π
Hmm, in the credentials popup there should be a "secure connect" checkbox, it tells it to use https instead of http. Can you verify?
Hi ZippyAlligator65
You can configure it in the clearml.conf: see here:
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/clearml_agent/backend_api/config/default/agent.conf#L202
Hi @<1658281093108862976:profile|EncouragingPenguin15>
Should work, I'm assuming multiple nodes are running agents ? or are you saying Ray spins the jobs and clearml logs them ?
If you cannot change the "TrainerState" (i.e. inherit and pass it into the code)
you cloud also monkey-patch it, something like
` class OurTrainerState(TrainerState):
def init(...)
...
def load_from_json(cls, json_path: str):
super().load_from_json(json_path))
Task.current_task().upload_artifact(...)
trainer.state = OurTrainerState(trainer.state) `
agent.package_manager.system_site_packages
Β can be used to inherit packages
Correct, it is basically venv with --system-site-packages
I do not think virtualenv nesting is support, if it was then in theory you could have executed the clearml-agent from virtual environment with system_site_packages
turned on and then it would inherit from it. But again I'm not sure virtualenv supports it.
BTW: the latest clearml-agent RC already have venv caching (both pip/conda) π
Wait, @<1686547375457308672:profile|VastLobster56> per your config clearml-fileserver
who sets this domain name? could it be that it is only on our host machine? you can quickly test by running any docker on your machine and running ping clearml-fileserver
from the docker itself.
also your log showed "could not download None ..." , I would expect it to be None ...
, no?
CurvedHedgehog15 is it plots or scalars you are after ?
So sharing with the agent is also not possible.
But they can see each others experiments, so why wouldn't the agent be able to have a read-only access ?
BTW:
ReassuredTiger98 you can put your user/pass into the git URL link, but I'm not sure this will solve the privacy issue π
Is Task.current_task() creating a task?
Hmm it should not, it should return a Task instance if one was already created.
That said, I remember there was a bug (not sure if it was in a released version or an RC) that caused it to create a new Task if there isn't an existing one. Could that be the case ?
but somewhere along the way, the request actually remove the header
Where are you seeing the returned value?
Metadata might be expensive, it's a RestAPI call, and we have found users putting hundreds of artifacts, with preview entries ...
I guess that was never the intention of the function, it just returns the internal representation. Actually my question would be, how do you use it, and why? :)
Hi DullCamel78
Hi everyone! Has anyone tried running
aws_autoscaler.py without docker?
Well generally since this is a remote machine the easiest way to control environment is with containers, hence the default use case. In theory you can change it to use venv, but then of course your a somewhat limited with the diff drivers/cuda/python environement.
performance under docker is 10% lower than on bare metal
add to your extra docker args
` extra_docker_arguments: ["...
i would like to have it also save on the bucket
oh if this is the casse, you can just change the clearml file server to point to GS bucket, everything will be stored there.
Just change your clearml.conf:files_server: "
"
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L10
Hi RipeAnt6
What would be the best way to add another model from another project say C to the same triton server serving the previous model?
You can add multiple call to cleaml-serving
, each one with a new endpoint and a new project/model to watch, then when you launch it it will setup all endpoints on a single Triton server (the model optimization loading is taken care by Triton anyhow)
Using agent v1.01r1 in k8s glue.
I think a fix was recently committed, let me check it
Mmm well, I can think of a pipeline that could save its state in the instant before the error occurred.
This is already the case, if you clone the pipeline Task change the Args/_continue_pipeline_
to True and enqueue
and the inet of the same card ?
Hi WorriedParrot51
Let me shed some light on this complicated mechanism, because this is not very straight forward.
Basically the agent signals the trains package it should ignore the code calls, and use a specific Task in the backend (i.e. if in manual mode, the trains package logs the data into the trains-server, in agent mode (remote mode), it does the opposite and takes the data from the trains-server "into" the code)
Specifically, just like in manual mode, calling argparse.parse is be...
, is the team open to PRs from external people?
Yes please do! PRs are welcomed! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do π
@<1535793988726951936:profile|YummyElephant76> oh you mean like jupyter server was running, then inside the notebook you would start a new venv, in that venv "notebook" package was missing, hence it failed detecting the notebook ?
You can query the system and get all the experiments based on date, then grab the machine GPU metrics.
DefeatedCrab47 check the cleanup service, it queries the system with the Apiclient.
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/examples/services/cleanup/cleanup_service.py#L72