Unanswered
Hi All
Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag.
I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The
queued with:
task = Task.create(
project_name="name",
task_name="training",
repo="repo",
branch="branch",
script="training_script",
packages=package_list,
docker="docker_gpu_image",
docker_args=["--network=host"],
)
task.output_uri = "filer_server"
task.enqueue(task, "training")
34 Views
0
Answers
2 months ago
2 months ago