VivaciousWalrus99
Yes, this is odd: 1608392232071 spectralab:gpu0 DEBUG New python executable in /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/python2
So it thinks it has Python 3.7, but it is using Python 2 in the venv...
In your trains.conf file, set agent.python_binary to the python3.7 binary. It should be something like: agent.python_binary=/path/to/python/python3.7
Let me check, it was supposed to be automatically aborted
Are these experiments logged too (with the train-valid curves, etc)?
Yes, every run is logged as a new experiment (with its own set of HP). Do notice that the execution itself is done by the trains-agent. Meaning the HP process creates experiments with a new set of HP and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agent instances on as many machines as you like, with specific GPUs etc. for each one ...
Hmm, so the way the configuration works is: it loads the default configuration (equivalent to the example in the docs), then it adds the ~/clearml.conf on top. That means you can tell your users to just copy-paste the credentials from the UI into a template you make. How is that?
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
I would like to be able to send a request to unload the model (because I cannot load all the models in GPU memory, only 7-8) ...
Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
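For reference, Triton's model repository extension does expose explicit load/unload calls, so in principle this is doable. A minimal sketch of the HTTP variant (assuming the default port 8000, that Triton runs with explicit model control mode, and a placeholder model name):

import requests

# ask a running Triton server to unload a model via the model repository extension
# "my_model" and localhost:8000 are placeholders
resp = requests.post("http://localhost:8000/v2/repository/models/my_model/unload")
resp.raise_for_status()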
Well, it should work. Make sure the Task "holds" all the information needed (under the Execution tab): repo / uncommitted changes / python packages etc.
Then configure your agent (choose pip/conda/poetry as the package manager), and spin it up (by default in venv/conda mode, or in docker mode)
Should work 🙂
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
What is the proper way to change a clearml.conf ?
Inside a container you can mount an external clearml.conf, or override everything with OS environment variables:
https://clear.ml/docs/latest/docs/configs/env_vars#server-connection
Hi WickedBee96
How can I do that?
clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task#what-is-clearml-task-for
I know this way to run it in the agent only by enqueuing the draft after running it on my local machine, so is there another way?
Or maybe you are looking for task.execute_remotely?
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
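Something along these lines (a minimal sketch; project, task and queue names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# stop the local run here and enqueue this Task for an agent to execute
task.execute_remotely(queue_name="default", exit_process=True)

# everything below this point only runs on the agent
print("running on the agent")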
Hi UnsightlySeagull42
does anyone know how this works with git ssh credentials?
These will be taken from the host ~/.ssh folder
Hi AstonishingWorm64
Is this the same ?
https://github.com/allegroai/clearml-serving/issues/1
(I think it was fixed on the later branch, we are releasing 0.3.2 later today with a fix)
Can you try: pip install git+
MoodyCentipede68 could it be that the model is on one account (workspace) and your credentials (the ones provided to the docker compose) are from another workspace?
The error itself points to the triton helper failing to get the model ID from the backend. The models are uploaded to a specific workspace, and it looks like a mismatch (i.e. the model ID is nowhere to be found). wdyt?
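One quick way to verify would be to try resolving the model ID with the same credentials you gave the docker compose (a rough sketch; the model ID is a placeholder, and I'm assuming the clearml Model class here):

from clearml import Model

# if this fails or returns nothing, the model ID is not visible from this workspace,
# which would explain the triton helper not finding it
m = Model(model_id="<model_id_from_the_error>")
print(m.name)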
Would you have an example of this in your code blogs to demonstrate this utilisation?
Yes! I definitely think this is important, and hopefully we will see something there 🙂 (or at least in the docs)
However, that would mean passing back the hostname to the Autoscaler class.
Sorry, my bad, the agent does that automatically in real-time when it starts; no need to pass the hostname, it takes it from the VM (usually they have some random number/id)
The import process actually creates a new Task on every import. That said, if you take a look here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1733
you can pass a pre-existing Task ID to "import_task" https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1653
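Roughly like this (a sketch only, going by the linked source; the task IDs are placeholders):

from trains import Task

# export an existing task into a plain dict
exported = Task.export_task(task="<source_task_id>")

# import it into a pre-existing Task instead of creating a new one
Task.import_task(task_data=exported, target_task="<existing_task_id>", update=True)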
Hi @<1674588542971416576:profile|SmarmyGorilla62>
You mean on your elastic / mongo local disk storage ?
Hi @<1523706645840924672:profile|VirtuousFish83>
Hello, is it possible to disable lazy loading ?
You mean in the UI for loading the console ?
The logs can be huge, 10s and 100s of MB...
We have the same issue for hyperparameters even with only ~100 keys,
100+ parameters, that is quite a lot.
So are you saying the search in the UI only filters the lazily loaded elements and not the entire param list?
Hi LazyTurkey38
What do you mean the git repo is not recognized? When execute_remotely leaves, you should see on the Task a reference to the git repo with the exact commit ID you have locally pulled. Do you see it under the Execution tab?
It should preserve the order, as the order of the update back (i.e. when executed by the agent) is the same as the order of the keys (obviously py3.7+, because it creates a dict, not an OrderedDict)
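i.e. something like this (a minimal sketch; parameter names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="ordered params")

# a plain dict keeps insertion order on py3.7+, so the agent writes values back in the same key order
params = {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}
params = task.connect(params)
print(params)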
I think you are correct 😞 Let me make sure we add that (docstring and documentation)
Maybe we should do that automatically ? wdyt?
UpsetTurkey67 are you saying there is a symlink in the original repository, and when it copies it, it breaks the symlink?
Hi GiddyTurkey39 ,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon
on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more here: https://github.com/allegroai/trains-agent and here: https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
Switching to process Pool might be a bit of an overkill here (I think)
wdyt?
Can you put the task.connect line here? (btw: I would assume there is no need for an additional connect if using hydra+fire, no?)
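(For context, an explicit connect usually looks something like the sketch below; the names are placeholders. With hydra the composed config is normally picked up automatically, which is why the extra call may be redundant.)

from clearml import Task

task = Task.init(project_name="examples", task_name="hydra run")

# explicitly connecting an arguments dict; with hydra auto-logging this may not be needed at all
args = {"lr": 0.01, "epochs": 5}
args = task.connect(args)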