
You guys are the maintainers of this repo
SuccessfulKoala55 It looks like it should eval to True?
No, I'm not tracking. I'm pretty new to k8s, so this might be beyond my current knowledge. Maybe if I rephrase my goals it will make more sense. Essentially, I want to enqueue an experiment, pick a queue (gpu), and have a GPU EC2 node provisioned in response; the experiment is then initialized and executed on that new GPU EC2 node. When the work is completed, I want the GPU EC2 node to terminate after x amount of time.
That is the problem, the `if` condition is not evaluating to True
ok yes, this is the problem
Yep I updated those as well
perhaps I need to use localhost
When I deployed the webserver, I changed the value https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml/values.yaml#L36 to be the public file server URL. Then in the UI, I copied the blob from Settings / API keys, which had the public URLs. After that I did my data uploads, which worked fine because they used the public URLs. The problem is that, due to tight security on this k8s cluster, the k8s pod cannot reach the public file server URL that is associated with the dataset.
Yeah, does the enterprise version have more functionality like this?
Gotcha, and the agent's default runtime mode is Docker, correct? So I could install all my system dependencies in my own Docker image?
Yes, makes sense. So I wouldn't be able to set up the `PYTHONPATH` via the setup script?
IMO, the dataset shouldn't be tied to the clearml.conf URLs that it was uploaded with, as that URL could change. It should respect the file server URL the agent has.
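To make that idea concrete, here is a minimal sketch of what "respect the file server URL the agent has" could look like: keep the path of the stored URL and swap in the scheme and host of the currently configured file server. `rebase_url` is a hypothetical helper and both hostnames are made up for illustration; this is not a ClearML API.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_url(stored_url, current_file_server):
    """Return stored_url with its scheme and host replaced by those of
    current_file_server, keeping the path, query, and fragment."""
    stored = urlsplit(stored_url)
    target = urlsplit(current_file_server)
    return urlunsplit(
        (target.scheme, target.netloc, stored.path, stored.query, stored.fragment)
    )


# Hypothetical URLs: a public upload URL and an in-cluster file server address.
print(rebase_url(
    "https://files.example.com/my_project/dataset/artifact.zip",
    "http://clearml-fileserver:8081",
))
# prints "http://clearml-fileserver:8081/my_project/dataset/artifact.zip"
```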
{"asctime": "2022-09-28 18:45:55,353", "levelname": "INFO", "name": "root", "module": "ldc_train_end_to_end", "threadName": "MainThread", "message": "Training classifier with command:\npython -m sfi.imagery.models.bbox_predictorv2.train ./sfi/imagery/models/training/train_config.json", "filename": "ldc_train_end_to_end.py", "funcName": "train_model"}
  File "/usr/lib64/python3.7/site.py", line 177
    file=sys.stderr)
        ^
SyntaxError: invalid syntax
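For what it's worth, `file=sys.stderr)` raising `SyntaxError: invalid syntax` inside `/usr/lib64/python3.7/site.py` is what a Python 2 interpreter does when it parses Python 3 source. So one possible cause (an assumption, nothing in the log confirms it) is that the bare `python` in the training command resolves to Python 2 while `PYTHONPATH` points at the 3.7 libraries. A minimal sketch of building the same command with the interpreter that is already running; `build_train_command` is a hypothetical helper, not something from the repo:

```python
import sys


def build_train_command(module, config_path):
    """Build the training command around the currently running interpreter.

    Using sys.executable avoids depending on whatever `python` happens to
    resolve to on PATH inside the agent's environment.
    """
    return [sys.executable, "-m", module, config_path]


cmd = build_train_command(
    "sfi.imagery.models.bbox_predictorv2.train",
    "./sfi/imagery/models/training/train_config.json",
)
print(" ".join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```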
However, the template does not render https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml/templates/configmap-apiserver.yaml#L1
If you look lower, it is there '/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py'
When I run this line locally, it works fine: `from sfi.imagery.models.chip_classifier.eval import eval_chip_classifier`
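A quick way to compare the local run with the agent run is to print the search path and ask Python where it would load the package from. This is only a debugging sketch using the standard library; `locate` is a hypothetical helper name, and `sfi` is the top-level package from the import above.

```python
import importlib.util
import sys


def locate(module_name):
    """Return where module_name would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    if spec is None:
        return None
    # Regular modules have an origin file; namespace packages only expose
    # their search locations.
    return spec.origin or str(list(spec.submodule_search_locations))


# Run this both locally and inside the agent's environment, then diff the output.
for entry in sys.path:
    print(entry)
print("sfi resolves to:", locate("sfi"))
```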
Seems like it has everything I would need
` PYTHONPATH: /home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi:/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py:/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi/imagery/models/training::/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi:/usr/lib64/python37.zip:/usr/lib64/python3.7:/usr/lib64/python3.7/lib-dynload:/home/npuser/.clearml/venvs-builds/3.7/lib6...
Figured this out, the value is parsed from my local clearml.conf file
Could I simply reference the files by name and pass in a string such as `~/.clearml/my_file.json`?
I suppose a short-term hack would be to just edit the `/etc/hosts` file and redirect the public URL to the k8s DNS URL?
I think this is VPN related now
so it caches to ~/.clearml/ any files that are under the same project name?
I don't know how to get past this. My k8s pods shouldn't need to reach out to the public file server URL.
I just opened a shell with the api and tried to curl my files URL, and the curl just hangs with no response.
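To tell a hang (traffic silently dropped) apart from a refusal or an HTTP error, the same check can be run with an explicit timeout. A minimal sketch with the standard library; `can_reach` is a hypothetical helper and the URL in the usage line is a placeholder, not my real file server address.

```python
import socket
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return True if url yields any HTTP response within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example 401 or 404), so the network path works.
        return True
    except (urllib.error.URLError, socket.timeout, OSError):
        # DNS failure, refused connection, or no answer before the timeout.
        return False


# Placeholder URL; substitute the public file server URL stored with the dataset.
# print(can_reach("https://files.example.com/", timeout=5.0))
```

If this returns False only after the full timeout, the request is most likely being dropped somewhere between the pod and the public endpoint rather than rejected by the server.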
This is to address the PYTHONPATH issues