You may want to share your config (with credentials redacted) and the full docker compose startup log?
Task.export_task() will contain what you are looking for.
In this case, ['script']['diff']
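A minimal sketch of pulling that field out, assuming the exported task is a plain dict shaped as described above. The helper names and the task ID placeholder are hypothetical, and `clearml` is imported lazily so the dict helper works on its own:

```python
def extract_uncommitted_diff(exported):
    """Return the uncommitted git diff stored in an exported task dict.

    `exported` is assumed to be what task.export_task() returns; a missing
    'script' or 'diff' entry yields an empty string instead of a KeyError.
    """
    script = exported.get("script") or {}
    return script.get("diff") or ""


def diff_for_task(task_id):
    """Fetch a task by ID and return its stored diff (needs a reachable server)."""
    from clearml import Task  # lazy import: only needed for the live lookup

    task = Task.get_task(task_id=task_id)
    return extract_uncommitted_diff(task.export_task())


# Usage (hypothetical ID): print(diff_for_task("<your-task-id>"))
```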
Normally, you should have an agent running behind a "services" queue as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the server (i.e. set the right environment variables for the docker services).
That agent will run as long as your self-hosted server is running.
You are forcing SSH with force_git_ssh_protocol: true
Have you set up SSH keys?
If you are using SSH keys, why set enable_git_ask_pass: true?
What is the difference between VS Code via clearml-session and VS Code via the Remote SSH extension?
You can either set your user permissions to allow group write by default,
or create a dedicated user with group write permission and run the agent as that user.
--gpus 0,1: I believe this basically says that the code launched by the agent has access to both GPUs, and that is all. It is then up to your code to choose which GPU to use and how.
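As a rough sketch of what "your code chooses" could look like: the snippet below assumes the visible GPUs are exposed to the process through the CUDA_VISIBLE_DEVICES environment variable and that the framework accepts PyTorch-style device strings such as "cuda:0". Both are assumptions for illustration, and the function names are hypothetical:

```python
import os


def visible_gpu_ids(env=None):
    """List the GPU entries the process is allowed to see.

    Assumes they are published via CUDA_VISIBLE_DEVICES (e.g. "0,1").
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def pick_local_device(worker_index, env=None):
    """Spread workers round-robin over the visible GPUs.

    Frameworks index visible devices from 0, so the result is a local
    index ("cuda:0", "cuda:1", ...), or "cpu" when no GPU is visible.
    """
    gpus = visible_gpu_ids(env)
    if not gpus:
        return "cpu"
    return f"cuda:{worker_index % len(gpus)}"
```

With two visible GPUs, workers 0 and 1 land on different devices and worker 2 wraps around to the first one again.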
I will try it. But it's a bit random when this happens, so we will see.
@<1523701087100473344:profile|SuccessfulKoala55> I can confirm that v1.8.1rc2 fixed the issue in our case. I managed to reproduce it:
- Do a local commit without pushing
- Create a task and queue it
- The queued task fails as expected, as the commit is only local
- Push your local commit
- Requeue the task
- Expect the task to succeed now that the commit is available, but it fails because the VCS seems to be in a weird state from the previous failure
- Now with v1.8.1rc2 the issue is solved
From the logs, it feels like after git clone it spends minutes without outputting anything. @<1523701205467926528:profile|AgitatedDove14> Do you know what the agent is supposed to do after git clone?
I guess it checks that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing?
So the issue is that, for some reason, the pip install run by the agent doesn't behave the same way as your local pip install?
Have you tried to manually install your module_b with pip install on the machine that is running clearml-agent? From your example, it looks like you are even running inside docker?
Or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv.
Once you have manually installed your package inside the docker container, check that your file module_b/templates/my_template.yml is where it should be.
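One way to run that check from Python, as a sketch using only the standard library. It assumes module_b is importable in the environment being inspected, and the helper name is hypothetical:

```python
import importlib.util
from pathlib import Path


def find_package_file(package, relative_path):
    """Return the path of a data file inside an installed package, or None.

    Locates the package with importlib without running its code, then
    checks each of its directories for the file.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_package_file("module_b", "templates/my_template.yml")
    print(found if found else "my_template.yml was not deployed with module_b")
```

Running it once in the fresh venv and once inside the container shows whether the template was shipped in either install.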
What do you mean by a different script?
all good. Just wanted to know in case I missed it
You should be able to test your credentials first using something like rclone or azure-cli.
You should be able to explicitly upload a file of your choice as an artifact.
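A minimal sketch of such an explicit upload, assuming `task` is a ClearML Task object (for example the one returned by Task.init) and relying on its upload_artifact(name, artifact_object) method. The wrapper function is hypothetical, and the file name in the usage comment is only an example:

```python
from pathlib import Path


def upload_file_as_artifact(task, file_path, name=None):
    """Upload one local file as a named artifact on the given task.

    Fails early if the file is missing, so a typo in the path does not
    turn into an artifact that was never uploaded.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    artifact_name = name or path.name
    # Passing a path string makes ClearML upload the file itself.
    task.upload_artifact(name=artifact_name, artifact_object=str(path))
    return artifact_name


# Usage (needs a configured ClearML server):
#   from clearml import Task
#   task = Task.init(project_name="hieutest", task_name="foo")
#   upload_file_as_artifact(task, "results/metrics.csv")
```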
Nice ! That is handy !!
thanks !
Ok. Found the solution.
The important thing is to call Task.add_requirements before Task.init, like this:
Task.add_requirements("requirements.txt")
task = Task.init(project_name='hieutest', task_name='foo', reuse_last_task_id=False)
And not:
task = Task.init(project_name='hieutest', task_name='foo', reuse_last_task_id=False)
task.add_requirements("requirements.txt")
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
Solved @<1533620191232004096:profile|NuttyLobster9> . In my case:
I need to do from clearml import Task very early in the code (like the first line), before importing argparse,
and not call task.connect(parser).
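The import order that worked in my case can be sketched as below. The argument names are hypothetical, and the try/except around the clearml import is there only so the sketch also runs where clearml is not installed; in a real script the import would be unconditional:

```python
# clearml is imported first, before argparse, as described above.
try:
    from clearml import Task
except ImportError:  # only so this sketch runs without clearml installed
    Task = None

import argparse


def build_parser():
    """Ordinary argparse setup; note there is no task.connect(parser) call."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)  # hypothetical argument
    parser.add_argument("--epochs", type=int, default=10)  # hypothetical argument
    return parser


def main(argv=None):
    """Create the task, then parse the command line as usual."""
    if Task is not None:
        Task.init(project_name="hieutest", task_name="foo")
    return build_parser().parse_args(argv)
```

Calling main() is what actually creates the task, so it needs a configured ClearML server.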
What about having 2 agents, one per GPU, on the same machine, serving the same queue? That way, when you enqueue, whichever agent (and thus GPU) is available will take the new task.
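A small sketch of that setup, building one clearml-agent daemon command per GPU. The queue name "default" is an assumption, and the commands are only printed here rather than executed:

```python
import shlex


def agent_commands(queue, gpu_ids):
    """Build one `clearml-agent daemon` command per GPU, all on one queue."""
    return [
        ["clearml-agent", "daemon", "--queue", queue, "--gpus", str(gpu), "--detached"]
        for gpu in gpu_ids
    ]


if __name__ == "__main__":
    # Two agents on the same machine, each pinned to a single GPU.
    for command in agent_commands("default", [0, 1]):
        print(shlex.join(command))
```

Each printed line can be run in a shell, or passed to subprocess.run, to start the corresponding agent.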
So if I spin up a new ClearML server in the cloud and use the same file server mount point, will I see all the tasks and experiments that I had on the on-prem server in the cloud server?
something like this: None ?
What about migrating existing experiments from the on-prem server?
Inside the script that launches the agent, I set all the environment variables needed (i.e. disable installation with the variable above).
There is a tricky thing: clearml-agent should not be run from a venv itself... I don't remember where I read that in the docs.
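The launch script that sets the environment could look roughly like this. It is a sketch that assumes the variable in question is CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL; the function names and the queue argument are hypothetical:

```python
import os
import subprocess


def agent_environment(base_env=None):
    """Copy the environment and add the flag that skips package installation."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed to be the variable meant above.
    env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    return env


def launch_agent(queue):
    """Start a clearml-agent daemon on `queue` with the prepared environment."""
    return subprocess.Popen(
        ["clearml-agent", "daemon", "--queue", queue],
        env=agent_environment(),
    )
```

Nothing is started until launch_agent is called, so the environment can be inspected first with agent_environment().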