Error
Failed to get Scalar Charts
agent.hide_docker_command_env_vars.extra_keys: ["DB_PASSWORD=password"]
like this? or ["DB_PASSWORD", "password"]
fantastic, everything is working perfectly
thanks guys
maybe I should use explicit reporting instead of Tensorboard
yeah, I am aware of trains-agent, we are planning to start using it soon, but still, copying original training command would be useful
I guess I could manually explore different containers and their content. as far as I remember, I had to update Elastic records when we moved to the new cloud provider in order to update model URLs
task = Task.get_task(task_id=args.task_id)
task.mark_started()
task.set_parameters_as_dict(
    {
        "General": {
            "checkpoint_file": model.url,
            "restart_optimizer": False,
        }
    }
)
task.set_initial_iteration(0)
task.mark_stopped()
Task.enqueue(task=task, queue_name=task.data.execution.queue)
yes. we upload artifacts to Yandex.Cloud S3 using ClearML. we set "s3://storage.yandexcloud.net/clearml-models" as output uri parameter and add this section to the config:
{
    host: "http://storage.yandexcloud.net"
    key: "KEY"
    secret: "SECRET_KEY",
    secure: true
}
this works like a charm. but download button in UI is not working
not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
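to illustrate what I mean, here is a minimal sketch, assuming the offset passed to set_initial_iteration is simply added to every step the writer reports. the helper reported_iteration and the numbers are hypothetical, made up for illustration, not a ClearML API:

```python
def reported_iteration(initial_iteration: int, global_step: int) -> int:
    """Hypothetical model: the initial-iteration offset is added to each reported step."""
    return initial_iteration + global_step


last_iteration = 10  # assumed: where the previous run stopped
resumed_epoch = 11   # the training code passes global_step=epoch (an absolute value)

# Offset plus an absolute step: the point lands further right than intended.
shifted = reported_iteration(last_iteration, resumed_epoch)

# With the offset reset to 0, the absolute epoch is used unchanged.
aligned = reported_iteration(0, resumed_epoch)

print(shifted, aligned)
```

if that assumption holds, combining a non-zero initial iteration with an absolute global_step would count the earlier epochs twice.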
yeah, we've used pipelines in other scenarios. might be a good fit here. thanks!
I added the link just in case anyway
also, is there any way to install a repo that we clone as a package? we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)
yeah, server (1.0.0) and client (1.0.1)
do you have any idea why the cleanup task keeps failing then? (it used to work before the update)
for me, increasing shm-size usually helps. what does this RC fix?
does this mean that setting initial iteration to 0 should help?
okay, what do I do if it IS installed?
on a side note, is there any way to automatically give more meaningful names to the running docker containers?