AgitatedDove14 we can read /sys/fs/cgroup/memory/memory.limit_in_bytes to get the limit
https://faun.pub/understanding-docker-container-memory-limit-behavior-41add155236c
Docker will not actually limit the “view of the memory”; it will just kill the container if it exceeds the memory limit. This is a limitation of the Docker runtime.
It will only do so if the OOM killer is enabled.
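A minimal sketch of reading that limit from inside the container, assuming the standard cgroup mount at /sys/fs/cgroup. The cgroup v1 path is the one mentioned above; the cgroup v2 fallback (memory.max, which reads "max" when unlimited), the "unlimited" threshold, and the helper name container_memory_limit are assumptions added for illustration.

```python
from pathlib import Path
from typing import Optional

# cgroup v1 file (the path discussed above) and the cgroup v2 equivalent.
_CGROUP_V1 = "memory/memory.limit_in_bytes"
_CGROUP_V2 = "memory.max"


def container_memory_limit(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the container memory limit in bytes, or None if no limit is found."""
    root = Path(cgroup_root)

    v1 = root / _CGROUP_V1
    if v1.is_file():
        value = int(v1.read_text().strip())
        # With no limit set, cgroup v1 reports a huge sentinel (about 2**63);
        # this heuristic treats anything >= 2**60 bytes as "unlimited".
        return None if value >= 2**60 else value

    v2 = root / _CGROUP_V2
    if v2.is_file():
        raw = v2.read_text().strip()
        # cgroup v2 writes the literal string "max" when there is no limit.
        return None if raw == "max" else int(raw)

    return None
```

Inside a container started with --memory=16g, container_memory_limit() should return 17179869184 on either cgroup version; where no limit is set it returns None.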
Hi AgitatedDove14, I’m using clearml-task to queue a task on a remote agent. The git remote URL is “ssh://git@0.0.0.0:1234/path/to/repo.git”, but clearml https://github.com/allegroai/clearml/blob/aad01056b548660bb271c4f98447b715b8ba4c7d/clearml/backend_interface/task/repo/scriptinfo.py#L909 strips the username from it (to cover cases like https://username@github.com/username/repository.git), so the final URL is ssh://0.0.0.0:1234/path/to/repo.git, not ssh://git@0.0.0.0:1234/path/to/repo.g...
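A minimal sketch (not ClearML's actual code) of the difference between blanket username stripping and keeping the SSH user, using only urllib.parse. The helper name strip_user and its keep_ssh_user flag are hypothetical; the idea is that HTTPS URLs lose the embedded username, while ssh:// remotes keep the user they authenticate as (usually "git").

```python
from urllib.parse import urlsplit, urlunsplit


def strip_user(url: str, keep_ssh_user: bool = True) -> str:
    """Drop 'user@' from a git remote URL, optionally keeping it for ssh:// URLs.

    Hypothetical helper for illustration; it is not ClearML's implementation.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url
    if keep_ssh_user and parts.scheme == "ssh":
        # SSH remotes log in as this user, so removing it changes who connects.
        return url
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

With keep_ssh_user=False this reproduces the behaviour described above (ssh://0.0.0.0:1234/path/to/repo.git); with the default, the ssh:// URL is returned unchanged.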
AgitatedDove14 the best option would be custom charts in Web UI, like in wandb: https://docs.wandb.ai/ref/app/features/custom-charts
But pdf is acceptable too.
SuccessfulKoala55 yes, I have /usr/bin/python3.8, but setting it in agent.python_binary doesn’t help. python3.8 is set as alternative #1 for python, but for some reason conda creates the env with python3.6...
Executing Conda: /home/user/conda/bin/conda env remove -p /home/jovyan/.clearml/venvs-builds/3.6 --quiet --json
AgitatedDove14 hm, I don’t know what the correct expected behaviour is; I expected 2 plots. If my assumption looks right, should I open an issue on GitHub?
AgitatedDove14 done) btw, could you show me the place in the code where scalars are written? I want to make a hotfix
```
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
# lightautoml's Task (the ML problem type), not clearml.Task
from lightautoml.tasks import Task
from lightautoml.automl.presets.tabular_presets import TabularAutoML
import clearml

# Get the ClearML task this script is executing under on the remote agent
cml_task = clearml.Task.get_task(clearml.config.get_remote_task_id())
logger = cml_task.get_logger()
data = pd.read_csv("./examples/data/sampled_app_train.csv")
....
automl = TabularAutoML(task=Task('binary'))
cml_task.connect(automl)
o...
```
AgitatedDove14 for example, let’s add a second CatBoost model training to https://github.com/allegroai/clearml/blob/master/examples/frameworks/catboost/catboost_example.py:
```
...
catboost_model = CatBoostRegressor(iterations=iterations, verbose=False)
catboost_model2 = CatBoostRegressor(iterations=iterations+200, verbose=False)
...
catboost_model.fit(train_pool, eval_set=test_pool, verbose=True, plot=False, save_snapshot=True)
catboost_model2.fit(train_pool, eval_set=test_pool, verbose=True,...
```
AgitatedDove14 no, it’s not a request.
I have a custom Python class that uses many models from frameworks ClearML already supports. I want to enable automatic reporting for all of these models by calling clearml_task.connect(my_custom_class_instance), but it doesn’t work the way I need: there is only one loss curve, because the graph is redrawn every time a new instance starts training.
Is there any way to report all instances inside my custom class without ...
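One possible workaround, sketched under assumptions: report each model's curves manually under its own series name, so a second model adds a series instead of redrawing the first one. The logger only needs a report_scalar(title, series, value, iteration) method, which is the shape of ClearML's Logger.report_scalar; the evals dict is assumed to look like CatBoost's get_evals_result() output. The helper name report_evals and the "model/split" naming scheme are hypothetical.

```python
from typing import Dict, List, Protocol


class ScalarLogger(Protocol):
    # Same shape as ClearML's Logger.report_scalar(title, series, value, iteration).
    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None: ...


def report_evals(logger: ScalarLogger, model_name: str,
                 evals: Dict[str, Dict[str, List[float]]]) -> int:
    """Report every metric curve under a per-model series name.

    `evals` is assumed to look like {"learn": {"RMSE": [...]}, "validation": {"RMSE": [...]}}.
    Returns the number of points reported.
    """
    count = 0
    for split, metrics in evals.items():
        for metric, values in metrics.items():
            for iteration, value in enumerate(values):
                # One graph per metric; each model/split pair gets its own series.
                logger.report_scalar(title=metric, series=f"{model_name}/{split}",
                                     value=float(value), iteration=iteration)
                count += 1
    return count
```

With ClearML this would be called as, e.g., report_evals(cml_task.get_logger(), "catboost_model2", catboost_model2.get_evals_result()) after each fit. It reports after training rather than live, and the framework's own auto-logged graph may still be drawn alongside it.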
ExasperatedCrocodile76 hi, try passing “--network=host” to --docker_args.
Example:
clearml-task --project project --name name --script run.py --queue queue --requirements requirements.txt --docker python:3.7.13-bullseye --docker_args "--cpus=8 --memory=16g --network=host"
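If the same command is launched from a Python script instead of a shell, the docker flags still have to arrive as a single --docker_args value, like the quoted string above. A minimal sketch using only the flags shown in the example; the helper name clearml_task_cmd is hypothetical.

```python
import shlex
from typing import List


def clearml_task_cmd(project: str, name: str, script: str, queue: str,
                     requirements: str, docker_image: str,
                     docker_args: List[str]) -> List[str]:
    """Build the clearml-task command from the example above as an argv list."""
    return [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--script", script,
        "--queue", queue,
        "--requirements", requirements,
        "--docker", docker_image,
        # All docker flags joined into ONE argument, matching the quoted shell string.
        "--docker_args", shlex.join(docker_args),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine with clearml-task on PATH; shlex.join(cmd) gives the shell-equivalent string for logging.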
AgitatedDove14
Are you saying the second time this line is missing?
Yes.
Can you send the full Task log?
I will send the log in direct messages.
When I restart the agent, it works fine, but on the second launch Docker does not mount the SSH keys folder: '-v', '/tmp/clearml_agent.ssh.rbw8o0t7:/root/.ssh',
I don’t understand why. AgitatedDove14 JitteryCoyote63 could you explain the logic behind that? The CLEARML_AGENT_DISABLE_SSH_MOUNT variable is not set.
So it fails with this log message:
```
...
Using cached repository in "/root/.clearml/vcs-cache/<MY_REPO>.git.893c8c47c9813c27eb1fe8d0aeb77a11/<MY_REPO>.git"
fatal: Could not read f...
```
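To compare the first and second launch, a small hypothetical helper can pull the /root/.ssh bind mount out of the docker arguments the agent logs. It assumes the logged arguments look like the Python-style quoted list shown above ('-v', '...'), and it only handles the separate `-v PATH` / `--volume PATH` form, not `--volume=PATH`. Both function names are illustrative.

```python
import ast
from typing import List, Optional


def parse_logged_args(fragment: str) -> List[str]:
    """Turn a logged fragment like "'-v', '/tmp/x:/root/.ssh'," into a list of strings."""
    text = fragment.strip().rstrip(",")
    if not text.startswith("["):
        text = f"[{text}]"
    return [str(item) for item in ast.literal_eval(text)]


def ssh_mount_source(docker_argv: List[str], target: str = "/root/.ssh") -> Optional[str]:
    """Return the host path bind-mounted at `target` via -v/--volume, or None."""
    for flag, value in zip(docker_argv, docker_argv[1:]):
        if flag in ("-v", "--volume"):
            host, _, rest = value.partition(":")
            container = rest.split(":", 1)[0]  # ignore mount options such as ":ro"
            if container == target:
                return host
    return None
```

Running ssh_mount_source(parse_logged_args(...)) on the docker argument list from each run's log shows directly whether the second launch dropped the mount.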
Hi CostlyOstrich36
How are you mounting the credentials?
Is this also mounted into the docker itself?
As I wrote above, it is mounted automatically: '-v', '/tmp/clearml_agent.ssh.kqzj9sky:/root/.ssh'
What version of ClearML-Agent are you using?
1.3.0
CostlyOstrich36 thank you! appreciate the quick response!
CostlyOstrich36 no, the response contains only the task id and name
I think docker mode is what you need to use if you want to pre-install packages in an environment
To use the newest version, I have to install the library on every run. I don’t think building a Docker image on every run is a good solution here, so the only option is to add it from Python.
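One generic way to do it from Python is to call pip from the script itself before the library is imported. This is a plain pip sketch, not a ClearML feature; the package name my-library is a placeholder. ClearML also has Task.add_requirements for declaring packages for the agent to install; check its documentation for how it chooses the version.

```python
import subprocess
import sys
from typing import List


def pip_upgrade_cmd(package: str) -> List[str]:
    # Use the running interpreter's pip so the package is installed into the
    # same environment (e.g. the agent-created venv) that the task runs in.
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def install_latest(package: str) -> None:
    """Install or upgrade `package` to the newest version available on the index."""
    subprocess.check_call(pip_upgrade_cmd(package))
```

Call install_latest("my-library") at the top of the script, before the corresponding import; a module that is already imported is not reloaded by the upgrade.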
sure:
```
from clearml.backend_api.session.client import APIClient
print(APIClient().tasks.get_all(["95db561a08304a1faac3aabcb117412e"]))
```
{'id': '95db561a08304a1faac3aabcb117412e', 'name': 'task'}
CostlyOstrich36 that’s fine if I use the agent in docker mode, but what should I use in other cases?
ContemplativeGoat37 hi, any updates? I have a similar issue when executing the clearml-data create command; the status is also stuck in “uploading”.
And when I try to add a file to the dataset, this happens:
```
Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20d7231430>: Failed to establish a new connection: [Errno 111] Connection refused')': /
Retrying (Retry(total=1, conn...
```
SuccessfulKoala55 yes, Elasticsearch has failed. I don’t understand why.
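The "[Errno 111] Connection refused" in the clearml-data log means nothing accepted the TCP connection at that address, which fits a server component being down. A quick sketch for checking reachability from the machine running clearml-data; the host and ports are assumptions to adjust, e.g. the ClearML server's default API (8008), web (8080) and file server (8081) ports. The helper name port_open is hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError ([Errno 111]) and timeouts are both OSError subclasses.
        return False
```

For example, `for port in (8008, 8080, 8081): print(port, port_open("localhost", port))` on the server host; a refused file server port would match the upload getting stuck.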