We use task.export_task() and a hacked version to get the console log:
def save_console_log(task: clearml.Task, fs, remote_path, number_of_reports=10000):
    from clearml.backend_api.services import events
    from clearml.backend_api import Session

    # Stolen from Task.get_reported_console_output()
    if Session.check_min_api_version('2.9'):
        request = events.GetTaskLogRequest(
            task=task.id,
            order='asc',
            navigate_earlier=True,
            ...
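The rest roughly follows what Task.get_reported_console_output() does internally. A minimal sketch of how it could be finished (batch_size is a real field of GetTaskLogRequest; the 'msg' field name and the fs/remote_path handling are assumptions about our own wrapper, not ClearML API):

            batch_size=number_of_reports,
        )
        # Same internal send() helper used by get_reported_console_output()
        res = task.send(request)
        log_events = res.response.events
        # Dump the raw console lines to the remote path via the provided filesystem object
        with fs.open(remote_path, "w") as f:
            for ev in log_events:
                f.write(str(ev.get("msg", "")) + "\n")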
@<1523701087100473344:profile|SuccessfulKoala55> I can confirm that v1.8.1rc2 fixed the issue in our case. I managed to reproduce it:
- Do a local commit without pushing
- Create a task and queue it
- The queued task fails as expected, since the commit is only local
- Push your local commit
- Requeue the task
- Expect the task to succeed now that the commit is available: but it fails, as the vcs seems to be in a weird state from the previous failure
- With v1.8.1rc2 the issue is solved
you can either:
- Build an image from your Dockerfile and, when running the task/experiment, tell it to use that docker image
- If the steps to install the dependencies required by your repository are not too complicated, you can use agent.extra_docker_shell_script in the clearml.conf in order to install all the dependencies inside the docker container launched by clearml in docker mode (see the sketch after this list)
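For reference, a minimal clearml.conf sketch of that option; the apt/pip packages are just placeholders for whatever your repository needs:

agent {
    # Shell commands executed inside the docker container before the task starts
    extra_docker_shell_script: [
        "apt-get update",
        "apt-get install -y libgl1",
        "pip install -r /tmp/extra_requirements.txt"
    ]
}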
I understand that, from the agent's point of view, I just need to update the conf file to use the new credentials and the new server address.
If the agent is the one running the experiment, it is very likely that your task will be killed.
And when the agent comes back, immediately or later, probably nothing will happen. It won't resume ...
@<1523701205467926528:profile|AgitatedDove14>
What is the env var name for Azure Blob storage? That is the one we use for our artifacts.
Also, is there a function call rather than an env var?
It would be simpler in our case to call a function to set the credentials for clearml rather than fetch a secret and set an env var prior to running the python code.
If an env var is the only option, I am thinking of fetching the secrets and setting the env vars from python, e.g.: os.environ["MY_VARIABLE"] = "hello" ... (see the sketch below)
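A minimal sketch of that idea; get_secret() is a hypothetical helper standing in for your secret manager, and the Azure env var names are placeholders (check the ClearML docs for the exact variables it reads):

import os

def get_secret(name: str) -> str:
    # Hypothetical helper: replace with your secret manager client (Vault, Azure Key Vault, ...)
    # Here it just falls back to an already-exported env var so the sketch stays runnable.
    return os.environ[name]

def set_clearml_storage_credentials() -> None:
    # Placeholder env var names: substitute whatever ClearML expects for Azure Blob storage
    os.environ["AZURE_STORAGE_ACCOUNT"] = get_secret("MY_SECRET_AZURE_ACCOUNT")
    os.environ["AZURE_STORAGE_KEY"] = get_secret("MY_SECRET_AZURE_KEY")

# Call this before clearml needs to touch Azure storage
set_clearml_storage_credentials()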
I use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 and it works for me (see the example below)
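For example, when launching the agent (the queue name is just an example):

CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 clearml-agent daemon --queue default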
so in your case, the clearml-agent conf contains multiple credentials, one for each cloud storage that you potentially use?
I use an SSH public key to access our repo ... I never tried to provide credentials to clearml itself (via clearml.conf), so I cannot help much here ...
we are using mmsegmentation by the way
what is the command you use to run clearml-agent?
Are you talking about this: None
It seems to not do anything about the database data ...
that format is correct as I can run pip install -r requirements.txt
using the exact same file
there is a whole discussion about it here: None
What about migrating existing experiments in the on-prem server?
Please refer here: None
The doc needs to be a bit clearer: it requires a path and not just true/false
1.12.2 because of some bug that makes fastai lag 2x
1.8.1rc2 because it fixes an annoying git clone bug
python libraries don't always use OS certificates ... typically, we have to set REQUESTS_CA_BUNDLE=/path/to/custom_ca_bundle_crt because requests ignores OS certificates
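E.g., setting it from python before the request is made is enough; the bundle path is whatever your custom CA bundle is:

import os
# Point requests (and libraries built on top of it) at the custom CA bundle
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/custom_ca_bundle.crt"
# after this, e.g. requests.get("https://my-internal-server.example/") verifies against that bundle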
you can use a docker image that already has those packages and dependencies, then have clearml-agent running inside it or launching the docker container
I also use this: None, which can give more control
or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv (see the example below)
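Something along these lines; the repo URL and package name are placeholders for your own project:

python -m venv /tmp/check-venv
/tmp/check-venv/bin/pip install "git+https://github.com/your-org/your-repo.git"
# then inspect the installed files
ls /tmp/check-venv/lib/python3.*/site-packages/your_package/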
Not a solution, but just curious: why would you need that many "debug" images?
Those are images automatically generated by your training code that ClearML automatically uploads. Maybe disable auto image upload during Task init? (see the sketch below)
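A minimal sketch, assuming the images come from a framework ClearML auto-binds (e.g. matplotlib or tensorboard); which keys you need to switch off depends on your training code, and the project/task names are placeholders:

from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="my_experiment",
    # disable the automatic framework bindings that report/upload images
    auto_connect_frameworks={"matplotlib": False, "tensorboard": False},
)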
Can you paste here what is inside "Installed packages" to double check?
I did.
I am now redeploying to a new container to be sure.