
perhaps I need to use localhost
well, I did something on my end, it's magically working now
I think if I use the local service URL, this problem will be fixed
yep, that fixed it, using references like clearml-webserver.clearml.svc.cluster.local:80
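For anyone hitting the same thing, a rough sketch of one way to point the SDK at the in-cluster services instead of the external hostnames; only the webserver address above is from my setup, the apiserver/fileserver service names and ports are assumptions based on the default chart:
```python
# Sketch: use cluster-local service DNS names instead of external (Route53) hostnames.
# Only the webserver reference comes from the thread; the apiserver/fileserver names
# and ports are assumptions, so adjust them to your deployment.
import os

os.environ["CLEARML_WEB_HOST"] = "http://clearml-webserver.clearml.svc.cluster.local:80"
os.environ["CLEARML_API_HOST"] = "http://clearml-apiserver.clearml.svc.cluster.local:8008"
os.environ["CLEARML_FILES_HOST"] = "http://clearml-fileserver.clearml.svc.cluster.local:8081"

from clearml import Task

# placeholder project/task names, just to verify connectivity
task = Task.init(project_name="my-project", task_name="connectivity-check")
```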
thank you for the help!
So in summary: subprocess calls appear to break ClearML tracking, even if I do Task.init() in both main.py and train.py. However, the script does run end to end successfully. If I remove the subprocess calls, I only need Task.init() in main.py for everything to work (scalars, reporting, etc.).
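To make the setup concrete, this is roughly the shape of it (project/task names are placeholders):
```python
# main.py -- rough shape of the setup described above
import subprocess
import sys

from clearml import Task

# Tracking works when everything stays in this process...
task = Task.init(project_name="my-project", task_name="main")

# ...but seems to break once training runs in a child process,
# even though train.py also calls Task.init()
subprocess.run([sys.executable, "train.py"], check=True)
```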
do I have to fetch it via code? I was hoping to not modify my scripts
SuccessfulKoala55 Darn, so I can only scale vertically?
for example, if my GitHub repo is project.git and my structure is project/utils/tool.py
Essentially I'm mirroring the example here
the worker is now in the dashboard
perhaps the 405 is from nginx
the API URL works fine, returns 200
yea let me unwind some changes so I can pinpoint the issue
These are the logs from the fileserver pod
For instance, quotes are used
I don't know how to do that
I learned Helm a few days ago
Would using Ubuntu 22.04 still work in the task execution?
This is to address the PYTHONPATH issues
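Something like this is what I have in mind, using the layout from my earlier example (project/utils/tool.py); it assumes the entry script sits at the repo root, everything else is a placeholder:
```python
# Sketch: make project/utils/tool.py importable by putting the repo root on the
# import path. Exporting PYTHONPATH=/path/to/project before running works the same way.
import os
import sys

repo_root = os.path.dirname(os.path.abspath(__file__))  # assumes this script is at the repo root
sys.path.insert(0, repo_root)

from utils import tool  # resolves to project/utils/tool.py
```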
I think the quotes don't affect the YAML
Not yet AgitatedDove14. Perhaps we can pair on this Monday.
I think the issue is that pod-to-pod comms can't resolve my Route53 DNS records
Yep I updated those as well
In addition to an EFS mount
ok yes, this is the problem