
Just saw that repo: who are coder? That's not the VS Code developer team, is it?
If the agent is the one running the experiment, it is very likely that your task will be killed.
And when the agent comes back, immediately or later, probably nothing will happen. It won't resume ...
Python libraries don't always use the OS certificates ... typically, we have to set REQUESTS_CA_BUNDLE=/path/to/custom_ca_bundle_crt because requests ignores the OS certificates.
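A minimal sketch of the same idea from inside Python, assuming you would rather set the variable in code than in the shell. The helper name use_custom_ca_bundle is hypothetical, and the path is the placeholder from the message above, not a real file.

```python
import os

# Placeholder path taken from the message above; replace with your real bundle.
CUSTOM_CA_BUNDLE = "/path/to/custom_ca_bundle_crt"


def use_custom_ca_bundle(path=CUSTOM_CA_BUNDLE):
    """Point requests (and libraries built on it) at a custom CA bundle.

    Hypothetical helper: requests looks at REQUESTS_CA_BUNDLE when it sends
    a request, so the variable has to be set before the first HTTP call.
    """
    os.environ["REQUESTS_CA_BUNDLE"] = path
    return os.environ["REQUESTS_CA_BUNDLE"]


use_custom_ca_bundle()
# Later HTTP calls made through requests in this process will verify TLS
# against that bundle instead of the default certifi store.
```

Setting it in the environment of the agent or container works just as well; the point is only that the variable must exist before the first request is made.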
If you have 2 agents serving the same queue and then send 2 tasks to that queue, each agent should take one task.
But if you queue one task, wait until that task finishes, and then queue the next, it will be random which agent takes the task. It can be the same one as for the previous task.
Are you saying that you have 1 agent running a task and 1 agent sitting idle while there is a task waiting in the queue and no one is processing it??
You don't need an agent on your local machine.
You want an agent running on the GPU machine.
Local code will create an experiment in the ClearML Server, then run up to the line execute_remotely()
then stop.
Once the local code stops, the ClearML Server takes over and enqueues the experiment to the prescribed queue.
The agent on the GPU machine sees there is an experiment in its queue, then pulls it and executes it. This time, clearml lib magic will make the code on the GPU machine, launched by the agent, run...
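The flow above can be sketched roughly like this. In the clearml SDK the call is spelled Task.execute_remotely. The project name, task name, queue name and the train function are all made up for illustration, and clearml is imported inside main() only so the sketch can be read without a configured server.

```python
# Hypothetical queue served by the agent on the GPU machine.
QUEUE_NAME = "gpu_queue"


def train(epochs):
    """Stand-in for the real training loop; returns a fake loss per epoch."""
    return [1.0 / (epoch + 1) for epoch in range(epochs)]


def main():
    from clearml import Task  # requires clearml installed and configured

    # Runs locally: registers the experiment on the ClearML Server.
    task = Task.init(project_name="examples", task_name="remote training")

    # Locally: enqueue the task to QUEUE_NAME and stop the process here.
    # Under the agent: this call does nothing and execution continues.
    task.execute_remotely(queue_name=QUEUE_NAME, exit_process=True)

    # Only reached on the GPU machine, when launched by the agent.
    print(train(epochs=3))
```

Call main() from your entry point; everything after execute_remotely is what ends up running on the GPU box.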
I also have the same issue. Default arguments are fine, but every argument supplied on the command line becomes duplicated!
My code looks like this:
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                    help='train config file')
parser.add_argument('-t', '--train-times', type=int, default=1,
                    help='train the same model several times')
parser.add_argument('--dataset_dir', help='path to folder containing the preped dataset.', required=True)
parser.add_argument('--backup', action='s...
Most people probably won't even know what that does.
While creating the autoscaler instance I did provide my git credentials, i.e. my username and Personal Access Token.
How exactly did you do that?
How are you using the function update_output_model?
Is this MongoDB-style filtering?
Normally, you should have an agent running behind a "services" queue, as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the Server (i.e. set the right environment variables for the docker services).
That agent will run as long as your self-hosted server is running.
Do you have a video showing the use case for clearml-session? I struggle a bit to understand what it is used for.
Found the issue: my bad import practice 😛
You need to import clearml before setting up the argument parser. Bad way:
import argparse
def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    args = parser....
something like this: None ?
You are forcing SSH with force_git_ssh_protocol: true
Have you set up SSH keys?
If you are using SSH keys, why enable_git_ask_pass: true?
Do you want to use https or ssh to do the git clone? Setting up both at the same time is confusing.
Feels like Docker or Kubernetes is a better fit for that purpose ...
This looks like the agent running inside your docker did not have any username/password to do the git clone, so the default behavior is to wait for keyboard input, which looks like hanging ....
OK, I think I found the issue. I had to point the files server to Azure storage:
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
files_server: "
"
credentials {"access_key": "REDACTED", "secret_key": "REDACTED"}
}
Just a +1 here. When we use the same name for 3 different images, the thumbnails show 3 different images, but when clicking on any of them, only one is displayed. There is no way to display the others.
If you want plots, you can simply generate them with matplotlib and clearml can upload them to the Plots or Debug Samples section.
You should be able to explicitly upload a file of your choice as an artifact using something like this: None
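A rough sketch of such an explicit upload, assuming the file is something you wrote yourself during the run. Task.upload_artifact is the SDK call; the helper names, file name, project name and artifact name are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_report(directory, metrics):
    """Write a small JSON file that we want to keep as an artifact."""
    path = Path(directory) / "metrics.json"
    path.write_text(json.dumps(metrics))
    return path


def upload_file_as_artifact(path, artifact_name="metrics"):
    """Attach an existing file to the current ClearML task (hypothetical helper)."""
    from clearml import Task  # requires clearml installed and configured

    # Reuse the running task if there is one, otherwise create a new one.
    task = Task.current_task() or Task.init(
        project_name="examples", task_name="artifact upload"
    )
    # Passing a path to an existing file uploads that file as the artifact.
    task.upload_artifact(name=artifact_name, artifact_object=str(path))


report_path = write_report(tempfile.mkdtemp(), {"accuracy": 0.9})
# upload_file_as_artifact(report_path)  # uncomment once clearml is configured
```

The file then shows up under the task's Artifacts tab and is stored wherever your files server / output URI points.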
Solved, @NuttyLobster9. In my case:
I need to do from clearml import Task
very early in the code (like on the first line), before importing argparse,
and not call task.connect(parser).
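Roughly, the working layout looks like the sketch below. The arguments are the ones from my earlier snippet; the project and task names are made up, and the try/except around the clearml import is only there so the sketch still runs on a machine without clearml installed.

```python
# clearml is imported first, before argparse, as described above.
try:
    from clearml import Task
except ImportError:  # only so this sketch runs without clearml installed
    Task = None

import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    return parser


def main(argv=None):
    if Task is not None:
        # Hypothetical project/task names.
        Task.init(project_name="examples", task_name="argparse import order")
    # Note: no task.connect(parser) here; the arguments are parsed normally.
    return build_parser().parse_args(argv)
```

With this layout, main(['-c', 'other.yaml']) gives one value per argument instead of the duplicated ones I was seeing.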
Found it: None
And the credentials are set with:
sdk {
    azure.storage {
        containers: [
            {
                account_name: "account"
                account_key: "xxxx"
                container_name: "clearml"
            }
        ]
    }
}
You may want to share your config (with credentials redacted) and the full docker compose startup log?
You can either set your user permissions to allow group write by default,
or maybe create a dedicated user with group write permission and run the agent as that user?