So we may use a more specific lib/tool, or just add an if-statement for the “ssh://git@” case.
CostlyOstrich36 I’m running a task that doesn’t need a GPU with this command: `clearml-task … --docker python:3.7.13-bullseye`
This is the AutoML development case: I have a lot of datasets, and the model should perform well on all of them on average. So the projects here are different versions of the AutoML system, and I need to compare them.
Oh, should I use the `--cpu-only` flag?
@CostlyOstrich36 yes, WebApp: 1.12.1-397 • Server: 1.12.1-397 • API: 2.26.
Docker version 28.0.1, build 068a01e (updated to this version a few weeks ago).
Sorry, just found it :)
Hi CostlyOstrich36
How are you mounting the credentials?
Is this also mounted into the docker itself?
As I wrote above, it is mounted automatically: `'-v', '/tmp/clearml_agent.ssh.kqzj9sky:/root/.ssh'`
What version of ClearML-Agent are you using?
1.3.0
CostlyOstrich36 that works if I use the agent in docker mode, but what should I use in other cases?
AgitatedDove14
Specifically, `/tmp/clearml_agent.ssh.rbw8o0t7` is the copy of the `.ssh` folder that the agent created, and it is now mounting it into the container.
But why is it mounted only once? The second and subsequent containers do not mount the folder.
I want to aggregate only the final metrics from the model, for example "Metric HO" (holdout) here:
AgitatedDove14 we can read `/sys/fs/cgroup/memory/memory.limit_in_bytes` to get the limit.
https://faun.pub/understanding-docker-container-memory-limit-behavior-41add155236c
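A minimal sketch of reading that limit from inside a Linux container. It uses the cgroup v1 file mentioned above, also tries the cgroup v2 file `/sys/fs/cgroup/memory.max` (an assumption about the host setup, not something from this thread), and falls back to the host's RAM when no limit is set. All function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Memory-limit files as seen from inside a container (cgroup v2, then cgroup v1)
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(text: str, host_bytes: int) -> Optional[int]:
    """Parse the contents of a cgroup memory-limit file.

    Returns None when there is no effective limit: cgroup v2 writes "max",
    and cgroup v1 reports a huge sentinel value that exceeds the host RAM.
    """
    text = text.strip()
    if text == "max":
        return None
    value = int(text)
    return None if value >= host_bytes else value


def host_memory_bytes() -> int:
    # Total physical RAM reported by the OS; inside Docker this is still the host's RAM
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def effective_memory_limit() -> int:
    """Memory really available to this process: the cgroup limit if set, else host RAM."""
    host = host_memory_bytes()
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        if path.exists():
            limit = parse_limit(path.read_text(), host)
            if limit is not None:
                return limit
    return host


if __name__ == "__main__":
    print(f"effective memory limit: {effective_memory_limit() / 2**30:.1f} GiB")
```

Taking the cgroup value over the host value matters because, as noted below, tools that only look at total RAM will see the host's memory, not the container's.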
python:3.7.13-bullseye
Docker will not actually limit the “view of the memory”; it will just kill the container if you exceed the memory limit. This is a limitation of the Docker runtime.
It will only do that if the OOM killer is enabled.
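Whether the OOM killer is enabled can be checked from inside the container. A minimal sketch, assuming cgroup v1, where Docker's `--oom-kill-disable` option is reflected in the `oom_kill_disable` field of `memory.oom_control` (this file does not exist under cgroup v2). The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# cgroup v1 file describing OOM-killer behaviour for this container's memory cgroup
OOM_CONTROL = Path("/sys/fs/cgroup/memory/memory.oom_control")


def parse_oom_control(text: str) -> Dict[str, int]:
    """Turn lines like 'oom_kill_disable 0' into {'oom_kill_disable': 0}."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = int(parts[1])
    return fields


def oom_killer_enabled(text: str) -> bool:
    # 0 means processes exceeding the limit are killed;
    # 1 means they are paused (hang) at the limit instead of being killed
    return parse_oom_control(text).get("oom_kill_disable", 0) == 0


if __name__ == "__main__":
    if OOM_CONTROL.exists():
        print("OOM killer enabled:", oom_killer_enabled(OOM_CONTROL.read_text()))
    else:
        print("no cgroup v1 memory.oom_control file (cgroup v2 or not Linux)")
```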
Task log:
` task ca3ab0ce39aa436f9e656fff378a2c25 pulled from c39519fcfb3f4353808fd266d6100795 by worker v012-0:gpuGPU-0929fd0f-eff1-91f1-854e-9874599660c3
2022-12-12 16:32:21
Current configuration (clearml_agent v1.5.1, location: /tmp/.clearml_agent.guezjnez.cfg):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retri...
I think docker mode is what you need if you want packages pre-installed in the environment.
To use the newest version, I have to install the library on every run. I don’t think building a Docker image on every run is a good solution here, so the only option is to install it from Python code.
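A minimal sketch of installing the library from Python at the start of a run, assuming plain `pip` is available in the task's environment. This is generic pip usage, not a ClearML feature; the helper names and the package name `my-lib` are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def pip_install_command(package: str, index_url: Optional[str] = None) -> List[str]:
    """Build a pip command that upgrades `package` in the current interpreter."""
    # `sys.executable -m pip` targets the same environment the task is running in
    cmd = [sys.executable, "-m", "pip", "install", "--upgrade", package]
    if index_url:
        cmd += ["--extra-index-url", index_url]
    return cmd


def install_latest(package: str, index_url: Optional[str] = None) -> None:
    # Raises CalledProcessError if pip fails, so a broken install stops the run early
    subprocess.check_call(pip_install_command(package, index_url))


if __name__ == "__main__":
    # Only prints the command; call install_latest("my-lib") to actually install
    print(" ".join(pip_install_command("my-lib")))
```

Import the library only after `install_latest` returns: if it was already imported earlier in the same process, the old version stays loaded.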
@CostlyOstrich36 I need to compare aggregated values: the mean metric value of N experiments from project 1 vs. the mean metric value of N experiments from project 2.
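A minimal sketch of that aggregation, assuming each experiment's scalars have already been fetched as a nested dict `{title: {series: {"last": ..., "min": ..., "max": ...}}}`. I believe this is the shape returned by ClearML's `Task.get_last_scalar_metrics()`, but verify it against your SDK version. Only the `"last"` value is used, so only the final metric (e.g. "Metric HO") is aggregated. Function names, project names and numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

# Assumed per-task shape: {title: {series: {"last": float, "min": float, "max": float}}}
Scalars = Dict[str, Dict[str, Dict[str, float]]]


def final_value(scalars: Scalars, title: str, series: str) -> Optional[float]:
    """Last reported value of one metric series, or None if it was never reported."""
    return scalars.get(title, {}).get(series, {}).get("last")


def project_summary(tasks: List[Scalars], title: str, series: str) -> Tuple[float, float, int]:
    """Mean, sample std and count of the final metric over one project's experiments."""
    values = [v for v in (final_value(s, title, series) for s in tasks) if v is not None]
    if not values:
        raise ValueError(f"no experiment reported {title}/{series}")
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread, len(values)


def compare_projects(projects: Dict[str, List[Scalars]], title: str,
                     series: str) -> Dict[str, Tuple[float, float, int]]:
    # One (mean, std, n) entry per project, so AutoML versions can be compared side by side
    return {name: project_summary(tasks, title, series) for name, tasks in projects.items()}


if __name__ == "__main__":
    # Hypothetical data: final "Metric HO"/"holdout" values per experiment
    def task(v: float) -> Scalars:
        return {"Metric HO": {"holdout": {"last": v, "min": v, "max": v}}}

    projects = {
        "automl-v1": [task(0.80), task(0.82), task(0.78)],
        "automl-v2": [task(0.84), task(0.83), task(0.85)],
    }
    for name, (m, s, n) in compare_projects(projects, "Metric HO", "holdout").items():
        print(f"{name}: mean={m:.3f} std={s:.3f} n={n}")
```

Reporting the std and n next to each mean helps judge whether a gap between two AutoML versions is larger than run-to-run noise.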
AgitatedDove14 the best option would be custom charts in the Web UI, like in wandb: https://docs.wandb.ai/ref/app/features/custom-charts
But a PDF is acceptable too.
Hi AgitatedDove14, I’m using `clearml-task` to queue a task on a remote agent. The git remote URL is “ssh://git@0.0.0.0:1234/path/to/repo.git”, but ClearML removes the username from it (https://github.com/allegroai/clearml/blob/aad01056b548660bb271c4f98447b715b8ba4c7d/clearml/backend_interface/task/repo/scriptinfo.py#L909) to cover cases like https://username@github.com/username/repository.git, so the final URL is ssh://0.0.0.0:1234/path/to/repo.git, not ssh://git@0.0.0.0:1234/path/to/repo.g...
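A minimal sketch of the "just add an if-statement" idea: strip credentials only from HTTP(S) URLs and leave `ssh://` URLs untouched. This is a hypothetical helper, not the actual code in ClearML's `scriptinfo.py`.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Drop user[:password] from HTTP(S) git URLs, but keep the user for ssh URLs.

    For ssh the user (usually "git") selects the account on the git server;
    without it, ssh falls back to the local user name (e.g. root in a container).
    """
    parts = urlsplit(url)
    if parts.scheme in ("ssh", "git+ssh") or "@" not in parts.netloc:
        return url
    # Keep everything after the last "@": the host and optional port
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=host))


if __name__ == "__main__":
    print(strip_credentials("https://username@github.com/username/repository.git"))
    print(strip_credentials("ssh://git@0.0.0.0:1234/path/to/repo.git"))
```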
AgitatedDove14 done :) By the way, could you show me where in the code the scalars are written? I want to make a hotfix.
AgitatedDove14
Are you saying the second time this line is missing?
Yes.
Can you send the full Task log?
I will send the log in direct messages.
AgitatedDove14 sorry, no; my configuration actually looks like this:
` ...
agent.git_user=""
agent.git_pass=""
agent.git_host=""
agent.package_manager.extra_index_url= [
]
agent {
worker_id: ""
worker_name: ""
force_git_ssh_protocol: true
... `
@SuccessfulKoala55 yes, Elasticsearch failed. I don’t understand why.
AgitatedDove14, do you know the answer?
AgitatedDove14 for example, let’s add a second CatBoost model training to https://github.com/allegroai/clearml/blob/master/examples/frameworks/catboost/catboost_example.py:
` ...
catboost_model = CatBoostRegressor(iterations=iterations, verbose=False)
catboost_model2 = CatBoostRegressor(iterations=iterations+200, verbose=False)
...
catboost_model.fit(train_pool, eval_set=test_pool, verbose=True, plot=False, save_snapshot=True)
catboost_model2.fit(train_pool, eval_set=test_pool, verbose=True,...
When I updated the URL of the remote repository in my git client
SuperiorPanda77 did you just replace “remote” for the client?
My remote in git client is ok:
`ssh://git@<address>:5109<repo_path>.git`
So I don’t understand why, or where, it changes :(
