@<1523701087100473344:profile|SuccessfulKoala55> I managed to make this work by concatenating the existing OS CA bundle and the Zscaler certificate, and setting REQUESTS_CA_BUNDLE to that combined bundle file.
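Roughly, it looked like this (the paths here are placeholders for my actual ones):
```
# Concatenate the OS CA bundle and the Zscaler root cert, then point
# REQUESTS_CA_BUNDLE at the combined file. All paths are illustrative.
import os
import shutil

os_bundle = "/etc/ssl/certs/ca-certificates.crt"  # OS bundle on Debian/Ubuntu
zscaler_cert = "/path/to/zscaler_root_ca.pem"     # exported Zscaler certificate
combined = "/opt/certs/combined_ca_bundle.pem"

with open(combined, "wb") as out:
    for src in (os_bundle, zscaler_cert):
        with open(src, "rb") as f:
            shutil.copyfileobj(f, out)

# set this before anything makes HTTPS calls through the proxy
os.environ["REQUESTS_CA_BUNDLE"] = combined
```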
--gpus 0,1: I believe this basically says that the code launched by the agent has access to both GPUs, and that is it. It is then up to your code to choose which GPU(s) to use and how.
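For example, with PyTorch (just a sketch; it assumes both GPUs are visible to the process):
```
# with --gpus 0,1 the process sees two devices; the code picks which one to use
import torch

print(torch.cuda.device_count())           # -> 2 when both GPUs are visible
device = torch.device("cuda:1")            # explicitly pick the second GPU
model = torch.nn.Linear(16, 4).to(device)  # this model now lives on GPU 1
```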
What do you mean by a different script?
I understand for clearml-agent
What I mean is that I have 2 self-deployed servers. I want to switch between the 2 configs when running the code locally, not inside the agent
What about having 2 agents, one on each GPU, on the same machine, serving the same queue? That way, when you enqueue a task, whichever agent (and thus GPU) is available will take it.
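On the GPU machine that would be something like this (the queue name is just an example):
```
# one agent pinned to each GPU, both serving the same queue
clearml-agent daemon --gpus 0 --queue gpu_queue --detached
clearml-agent daemon --gpus 1 --queue gpu_queue --detached
```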
Are you sure all the files needed are pushed to your git repo?
Go to another folder, git clone that exact branch/commit, and check that the files are there.
I need to do a git clone?
You need to do it to test if it works. clearml-agent will run the clone itself when it takes in a task
In my case, I set everything up inside the container, including the agent, and don't use docker mode at all.
When my container starts, it starts the agent inside it in "normal" mode
OK, so if the git commit or the uncommitted changes differ from the previous run, then the cache is "invalidated" and the step will be run again?
What about the log around the point where it tries to actually clone your repo?
For #2: it's a pull rather than a push system: you need a script that polls at regular intervals and keeps track of what is new and what is not.
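A minimal sketch of such a poller with the ClearML SDK (the project name, filter, and interval are assumptions):
```
# poll the server at a fixed interval and remember what was already seen
import time
from clearml import Task

seen = set()
while True:
    # Task.query_tasks returns matching task IDs; the filter here is illustrative
    for task_id in Task.query_tasks(project_name="my_project",
                                    task_filter={"status": ["completed"]}):
        if task_id not in seen:
            seen.add(task_id)
            print(f"new completed task: {task_id}")  # react to the new task here
    time.sleep(60)
```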
An artifact can be anything that you can upload to storage with the ClearML SDK. Which storage is used is defined by your clearml.conf (with its credentials); the ClearML web and API servers do not store those files.
A model is a special kind of artifact.
For example, you have the lineage feature: if you train model B using model A as a starting point (aka pre-trained), and model C from model B, the lineage will track that model C was built on model B, which was built on model A.
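In SDK terms it looks roughly like this (the project, names, and file are made up):
```
from clearml import Task, OutputModel

task = Task.init(project_name="demo", task_name="artifact-vs-model")

# an artifact can be any object or file; it is uploaded to the storage
# configured in clearml.conf, not to the web/api servers
task.upload_artifact(name="eval_stats", artifact_object={"accuracy": 0.93})

# a model is a special artifact carrying extra metadata (framework, lineage, ...)
model = OutputModel(task=task, framework="PyTorch")
model.update_weights(weights_filename="model_b.pt")  # assumes the file exists
```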
Do I need to make changes to clearml.conf so that it doesn't ask for my credentials, or is there another way around it?
You have 2 options:
- set the credentials inside clearml.conf: I am not familiar with this and never tested it (see the sketch after this list).
- or set up passwordless SSH with a public key
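For the first option, I believe the relevant section of clearml.conf looks like this (untested on my side, so double-check the exact keys in the docs):
```
agent {
    # credentials the agent uses for git clone
    # (a personal access token works as the password)
    git_user: "your_git_username"
    git_pass: "your_personal_access_token"
}
```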
Python libraries don't always use the OS certificates... typically, we have to set REQUESTS_CA_BUNDLE=/path/to/custom_ca_bundle.crt because requests ignores the OS certificates
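For example, requests takes its trust store from its own bundled certificates (or from REQUESTS_CA_BUNDLE / the verify argument), not from the OS:
```
import requests

# per-call override; the URL and bundle path are placeholders
resp = requests.get("https://internal.example.com",
                    verify="/opt/certs/combined_ca_bundle.pem")
print(resp.status_code)
```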
You don't need an agent on your local machine.
You want an agent running on the GPU machine.
The local code will create an experiment in the ClearML server, then run up to the execute_remotely() line,
then stop.
Once the local code stops, the ClearML server takes over and enqueues the experiment to the prescribed queue.
The agent on the GPU machine sees there is an experiment in its queue, pulls it, and executes it. This time, the ClearML lib magic will make the code on the GPU machine, launched by the agent, run past the execute_remotely() line and do the actual work.
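The local side of that flow is just (the queue name is an assumption):
```
from clearml import Task

task = Task.init(project_name="demo", task_name="train")

# everything above this line runs locally; this call registers the experiment,
# enqueues it, and exits the local process
task.execute_remotely(queue_name="gpu_queue", exit_process=True)

# from here on, the code only runs on the machine where the agent executes the task
print("running on the GPU machine now")
```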
nice !! That is exactly what I am looking for !!
If you are using a self-hosted ClearML server spun up with docker-compose, then you can just mount your NAS to /opt/clearml/fileserver
on the host machine, prior to starting the ClearML server with docker-compose up
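e.g., for an NFS share (the host and export path are placeholders):
```
# on the host, before docker-compose up
sudo mount -t nfs my-nas:/export/clearml /opt/clearml/fileserver
```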
all good. Just wanted to know in case I missed it
Have you tried a different browser?
Please share your .service file content too, as there are a lot of ways to "spawn" processes in systemd
@<1523701087100473344:profile|SuccessfulKoala55> Yes, I am aware of that one. It builds a docker container... I wanted to build without docker. When clearml-agent runs in non-docker mode, it already builds the running env inside its cache folder structure. I was wondering if there was a way to stop that process just before it executes the task's .py file
What does your clearml.conf look like?
I am not familiar with the autoscaler... are you using the paid version of ClearML?
Oh... did not know about that...
What is the command you use to run clearml-agent?
While creating the autoscaler instance I did provide my git credentials, i.e. my username and a Personal Access Token.
How exactly did you do that?
This looks like the agent running inside your docker did not have any username/password to do the git clone, so the default behavior is to wait for keyboard input, which looks like hanging...
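If that is the case, one fix is to pass the credentials to the agent through its environment (the values are placeholders; a personal access token works as the password):
```
# picked up by clearml-agent for git clone instead of prompting
export CLEARML_AGENT_GIT_USER="your_git_username"
export CLEARML_AGENT_GIT_PASS="your_personal_access_token"
```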
Very hard to diagnose with this tiny bit of log...