Unanswered
Hi All
Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag.
I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The
queued with:
task = Task.create(
project_name="name",
task_name="training",
repo="repo",
branch="branch",
script="training_script",
packages=package_list,
docker="docker_gpu_image",
docker_args=["--network=host"],
)
task.output_uri = "filer_server"
task.enqueue(task, "training")
54 Views
0
Answers
4 months ago
4 months ago