@<1576381444509405184:profile|ManiacalLizard2>
They are actually from tracked files. I do see the uncommitted changes under the Execution tab, though.
You are right. I think my colleague wrote it, starting from the AWS autoscaler.
It doesn't work when I insert the credentials individually either. I am using an EC2 instance as the ClearML server.
@<1523701070390366208:profile|CostlyOstrich36> I have been exploring. The problem seems to occur when the Docker container uses the cached repository directory:
Using cached repository in "/root/.clearml/vcs-cache/****.git.0081a6bc4d7afe6adde369e6aeab9406/****.git"
When it is inside that directory and tries to fetch, it asks for credentials. When it clones, it doesn't:
cloning: git@github.com:****/****.git
Using user/pass credentials - replacing ssh url 'git@github.com:****/****.git' with https ...
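The log line above shows the SSH URL being rewritten to HTTPS because user/pass credentials are configured. A minimal sketch of that kind of rewrite is below, assuming the scp-like `git@host:owner/repo.git` form. `ssh_to_https` is a hypothetical helper, not ClearML's or clearml-agent's API. One possible (unconfirmed) explanation for the fetch/clone difference is that the credentials only travel with the fresh clone URL, while a `git fetch` inside the cached copy uses whatever remote URL and credential setup that cached repository already has.

```python
import re
from typing import Optional
from urllib.parse import quote

# Matches scp-like SSH URLs such as git@github.com:owner/repo.git
SSH_URL = re.compile(r"^(?P<user>[^@]+)@(?P<host>[^:]+):(?P<path>.+)$")


def ssh_to_https(url: str, username: Optional[str] = None,
                 password: Optional[str] = None) -> str:
    """Hypothetical helper: rewrite an scp-like SSH git URL to HTTPS.

    If both username and password are given, they are percent-encoded and
    embedded in the URL so git can authenticate without prompting.
    Anything that is not an scp-like SSH URL is returned unchanged.
    """
    match = SSH_URL.match(url)
    if not match:
        return url
    auth = ""
    if username and password:
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"https://{auth}{match['host']}/{match['path']}"


print(ssh_to_https("git@github.com:acme/repo.git"))
print(ssh_to_https("git@github.com:acme/repo.git", "bob", "p@ss"))
```

Keep in mind that if a URL with embedded credentials becomes a repository's `origin`, git stores it in plain text in `.git/config`; a credential helper avoids that.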
At least I can do that along with matplotlib
@<1523705004920147968:profile|CloudySwallow27>
I committed the uncommitted changes and tried again. It works: the batchsize becomes 4.
created virtual environment CPython3.10.13.final.0-64 in 511ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=True)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
...
They are different tasks. I start a new task each time, but sometimes it is the same commit.
- batchsize: 22
+ batchsize: 4
I see that under Execution. But the batch size for me is 22 under Configuration/General.
Thank you, Jake. Is that the same for Bitbucket and other repos as well? And is there a specific part of the docs that talks about it?
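In this thread the uncommitted diff sets `batchsize` to 4 while Configuration/General still shows 22. A minimal sketch for spotting that kind of mismatch, assuming the uncommitted changes are available as unified-diff text (as in the Execution tab) and the logged value as a string. `changed_values`, the `config.yaml` file name, and the comparison are illustrative only, not part of ClearML.

```python
from typing import Optional, Tuple


def changed_values(diff_text: str, key: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (old, new) values of a `key: value` line in a unified diff.

    Removed lines start with '-', added lines with '+'; the '---'/'+++'
    file headers are skipped. A side that is not found is returned as None.
    """
    old: Optional[str] = None
    new: Optional[str] = None
    prefix = f"{key}:"
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")) or not line.startswith(("-", "+")):
            continue
        body = line[1:].strip()
        if body.startswith(prefix):
            value = body[len(prefix):].strip()
            if line.startswith("-"):
                old = value
            else:
                new = value
    return old, new


# Example diff modelled on the thread (the file name is hypothetical)
diff = """--- a/config.yaml
+++ b/config.yaml
@@ -1 +1 @@
- batchsize: 22
+ batchsize: 4
"""
logged = "22"  # value shown under Configuration/General
old, new = changed_values(diff, "batchsize")
if new is not None and new != logged:
    print(f"uncommitted change sets batchsize={new}, but the task logged {logged}")
```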
The worker machines are on GCP.
I didn't write this conf, but it works.
The DevOps team changed the URL, and I had to go through some steps to find out what the problem was.
configurations:
  extra_clearml_conf: 'sdk.aws.s3.region="us-west-2"
    agent.extra_docker_arguments=["--shm-size=90g"]
    agent.extra_docker_shell_script=["git config --global credential.helper cache --timeout=604800",]'
  extra_trains_conf: ''
  extra_vm_bash_script: ''
  queues:
    gcp-v100:
    - - gcp-v100
      - 4
    gcp-l4:
    - - gcp-l4
      - 4
    gcp-cpu:
    - - gcp-cpu
      - 4
  resource_configurations:
    gcp-v100:
      ...
That is the configuration yaml.
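As far as this configuration and the AWS autoscaler example it was based on suggest, each entry under `queues` maps a queue name to a list of `[resource_name, max_instances]` pairs, and each resource name should be a key under `resource_configurations`. A small sketch that checks this on the dict returned by `yaml.load`. `queue_limits` is a hypothetical helper, and the empty resource dicts stand in for the elided (`...`) resource settings.

```python
from typing import Dict, List, Tuple


def queue_limits(conf: dict) -> Dict[str, List[Tuple[str, int]]]:
    """Hypothetical helper: flatten configurations.queues into
    {queue: [(resource_name, max_instances), ...]} and check that every
    resource name is defined under configurations.resource_configurations."""
    section = conf["configurations"]
    resources = section.get("resource_configurations") or {}
    limits: Dict[str, List[Tuple[str, int]]] = {}
    for queue, pairs in section["queues"].items():
        limits[queue] = []
        for resource_name, max_instances in pairs:
            if resource_name not in resources:
                raise KeyError(f"queue {queue!r} uses undefined resource {resource_name!r}")
            limits[queue].append((resource_name, int(max_instances)))
    return limits


# Dict shaped like the YAML above; resource settings are placeholders
conf = {
    "configurations": {
        "queues": {
            "gcp-v100": [["gcp-v100", 4]],
            "gcp-l4": [["gcp-l4", 4]],
            "gcp-cpu": [["gcp-cpu", 4]],
        },
        "resource_configurations": {"gcp-v100": {}, "gcp-l4": {}, "gcp-cpu": {}},
    }
}
limits = queue_limits(conf)
total = sum(n for pairs in limits.values() for _, n in pairs)
print(limits, total)
```

Validating the resource names up front turns a typo in a queue entry into an immediate `KeyError` instead of a queue that never gets a machine.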
You are right. But I would have to start it from a draft in the UI to do that, right? I mean, clone it and restart it.
@<1523701070390366208:profile|CostlyOstrich36>
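import yaml
from clearml.automation.auto_scaler import AutoScaler, ScalerConfig
from gcp_driver import GCPDriver  # GCP driver written in-house, based on the AWS autoscaler

# Load the autoscaler configuration (queues, resources, extra ClearML conf)
with open('gcp_autoscaler.yaml') as f:
    conf = yaml.load(f, Loader=yaml.SafeLoader)

# Build the cloud driver and the scaler settings from the same config dict
driver = GCPDriver.from_config(conf)
scaler_conf = ScalerConfig.from_config(conf)

# Start the autoscaler loop
autoscaler = AutoScaler(scaler_conf, driver)
autoscaler.start()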
That is the python code.
clearml==1.14.1
That is the version.
I am passing my credentials as well. So if it tries with the credentials and that doesn't work, maybe it could retry two or three times and then raise an error. I don't know much about the library internals, but the error I am seeing now is a bit confusing.
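The behaviour suggested here, retrying an authenticated operation a few times and then failing with an explicit message, could look roughly like the sketch below. This is not how ClearML or clearml-agent behave as far as this thread shows; `run_with_retries`, `CredentialError`, and the message wording are hypothetical, and `operation` is any callable (for example a wrapper around `git fetch`).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CredentialError(RuntimeError):
    """Raised when an operation still fails after all retries."""


def run_with_retries(operation: Callable[[], T], attempts: int = 3,
                     delay: float = 1.0) -> T:
    """Call `operation` up to `attempts` times, sleeping `delay` seconds
    between tries, and raise a CredentialError that wraps the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose: report whatever went wrong
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise CredentialError(
        f"operation failed after {attempts} attempts with the configured "
        f"credentials; last error: {last_error}"
    ) from last_error
```

For a git call, something like `run_with_retries(lambda: subprocess.run(["git", "fetch"], check=True))` would work, with `GIT_TERMINAL_PROMPT=0` set in the environment so git fails instead of waiting for a password prompt.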