Hey Guys,
noob question/problem ahead, so please forgive me in advance.
Context: I was able to host the ClearML server (web server, file server, API server) on an AWS EC2 instance via Terraform IaC and GitHub Actions, following this documentation page. The web server is accessible at http://<Public-IP of the EC2 instance>:8080/
I was also able to run experiments from my local machine and make API calls to the ClearML server hosted on EC2. The experiments ran fine in this setup and were updated in the web UI as well.
Problem: Then I tried exploring ClearML Agent for remote execution of tasks. I created a queue and ran clearml-agent daemon --queue <queue_name>
, which assigned a worker to it. I then enqueued a task to that queue and the worker picked it up for execution, but it fails with this error: ERROR! Failed applying git diff, see diff above
I dug into this a bit and found that adding the settings below to clearml.conf might fix it, but I wanted an opinion or any suggestions on this:
# Store uncommitted git/hg source code diff in experiment manifest when training in development mode
# This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
store_uncommitted_code_diff: true
store_code_diff_from_remote: false
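
In case it helps narrow things down, here is a minimal sketch of a check that could be run in the project folder before enqueueing the task. It rests on an assumption I have not confirmed for this setup: that the agent clones the repository at the recorded commit and then applies the stored diff, so uncommitted changes or commits that were never pushed could be why the diff fails to apply. The helper names (run_git, summarize_repo_state, check_current_repo) are made up for illustration and are not part of ClearML.

```python
import subprocess


def run_git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def summarize_repo_state(porcelain: str, unpushed: str) -> dict:
    """Summarize raw git output (hypothetical helper, not a ClearML API).

    porcelain: output of `git status --porcelain`
    unpushed:  output of `git rev-list @{u}..HEAD`
    """
    # Each porcelain line is "XY <path>": two status characters, a space, the path.
    changed = [line[3:] for line in porcelain.splitlines() if line.strip()]
    # Each rev-list line is one commit hash that exists locally but not upstream.
    commits = [line.strip() for line in unpushed.splitlines() if line.strip()]
    return {
        "changed_files": changed,
        "unpushed_commits": commits,
        "clean": not changed and not commits,
    }


def check_current_repo() -> dict:
    """Collect the state of the repository in the current working directory.

    Raises subprocess.CalledProcessError if this is not a git repository
    or the current branch has no upstream configured.
    """
    return summarize_repo_state(
        run_git("status", "--porcelain"),
        run_git("rev-list", "@{u}..HEAD"),
    )
```

Calling check_current_repo() from the project root returns a dict. If "clean" is False, the listed files or commits are what the remote worker would not find in the cloned repository, which might be worth ruling out before changing clearml.conf.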
I would really appreciate some help or any opinion on this (I am happy to discuss it further and learn more).
Thanks!