In my GitHub action, I should just spin up a dummy clearml server and run the task against it, connecting to that dummy clearml server
is there a command / file for that?
What I put in the clearml.conf is the following:
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [" "]
agent.python_binary = python3.8
Hi SuccessfulKoala55, will I be able to update all references to the old s3 bucket using this command?
BTW, is there any specific reason for not upgrading to clearml?
I just haven't had time so far 🙂
very cool, good to know, thanks SuccessfulKoala55 🙂
Ha I see, it is not supported by the autoscaler > https://github.com/allegroai/clearml/blob/282513ac33096197f82e8f5ed654948d97584c35/trains/automation/aws_auto_scaler.py#L120-L125
Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest, all the packages of the local environment also end up in the “installed packages”, while in reality I only need the dependencies of the local package. That’s why I use _update_requirements; with this approach only the required package will be installed by the agent
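Roughly what I do, as a sketch: _update_requirements is a private API, so I'm assuming here that it accepts a list of pip-style requirement strings, and the package name/version are placeholders:
```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_task")

# Replace the auto-detected environment freeze with only the package I actually need
# (private API -- the exact signature may differ between clearml versions)
task._update_requirements(["my_local_package==0.1.0"])
```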
I also tried setting ebs_device_name = "/dev/sdf" - didn't work
my agents are all on 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
How exactly is the clearml-agent killing the task?
I have two controller tasks running in parallel in the trains-agent services queue
Hi SoggyFrog26, https://github.com/allegroai/clearml/blob/master/docs/datasets.md
I am using clearml_agent v1.0.0 and clearml 0.17.5 btw
This is no coincidence - any data versioning tool you will find is somewhat close to how git works (dvc, etc.), since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API that lets you register/retrieve datasets very easily
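Something like this, for example (dataset/project names are just placeholders):
```python
from clearml import Dataset

# Register a new dataset version
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files("data/raw/")   # add a local folder
ds.upload()                 # push the files to the configured storage
ds.finalize()               # close the version so it can be retrieved later

# Retrieve it anywhere else
local_copy = Dataset.get(
    dataset_name="my_dataset", dataset_project="my_project"
).get_local_copy()
```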
I also discovered https://h2oai.github.io/wave/ last week, would be awesome to be able to deploy it in the same manner
If the reporting is done in a subprocess, I can imagine that the task.set_initial_iteration(0) call is only effective in the main process, not in the subprocess used for reporting. Could that be the case?
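To illustrate what I mean (project/task names are placeholders, and the assumption that the offset only propagates when it is set before the reporting subprocess starts is exactly what I'm asking about):
```python
import multiprocessing as mp
from clearml import Task

def report():
    # reporting happens in a child process
    Task.current_task().get_logger().report_scalar(
        "loss", "train", value=0.5, iteration=0
    )

if __name__ == "__main__":
    task = Task.init(project_name="my_project", task_name="my_task")
    # my assumption: this only takes effect in the main process
    task.set_initial_iteration(0)

    p = mp.Process(target=report)
    p.start()
    p.join()
```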
After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the aws creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187, linked to boto3
Thanks! With this I’ll probably be able to reduce the cluster size to be on the safe side for a couple of months at least :)
I am trying to upload an artifact during the execution
awesome! Unfortunately, calling artifact["foo"].get() gave me:
Could not retrieve a local copy of artifact foo, failed downloading file:///checkpoints/test_task/test_2.fgjeo3b9f5b44ca193a68011c62841bf/artifacts/foo/foo.json
It tries to get it from local storage, but the json is stored in s3 (it does exist), and I did create both tasks specifying the correct output_uri (to s3)
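For reference, this is roughly my setup (project/task/bucket names here are placeholders):
```python
from clearml import Task

# Producer task: output_uri sends artifacts to S3
task = Task.init(
    project_name="test_project",
    task_name="test_task",
    output_uri="s3://my-bucket/checkpoints",
)
task.upload_artifact("foo", artifact_object={"a": 1})

# Consumer task: fetch the artifact back from the producer
producer = Task.get_task(project_name="test_project", task_name="test_task")
foo = producer.artifacts["foo"].get()
```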
When installed with http://get.docker.com, it works
I think the best-case scenario would be for ClearML to maintain a GitHub action that sets up a dummy clearml-server, so that anyone can use it as a basis to run their tests: they would just have to point the server URL to the local one started by the action, and could then test all their code seamlessly, wdyt?
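Something like this is what I picture on the test side, assuming the action exposes the default clearml-server ports on localhost (hosts and keys below are placeholders):
```python
import os
from clearml import Task

# Point the SDK at the dummy server started inside the CI job
os.environ["CLEARML_API_HOST"] = "http://localhost:8008"    # assumed default API port
os.environ["CLEARML_WEB_HOST"] = "http://localhost:8080"    # assumed default web UI port
os.environ["CLEARML_FILES_HOST"] = "http://localhost:8081"  # assumed default fileserver port
os.environ["CLEARML_API_ACCESS_KEY"] = "ci-test-key"        # placeholder credentials
os.environ["CLEARML_API_SECRET_KEY"] = "ci-test-secret"

# From here the test code runs unchanged against the local server
task = Task.init(project_name="ci-tests", task_name="smoke-test")
```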
So there will be no concurrent access to cached files in the cache dir?
ok, so even if that guy is attached, it doesn’t report the scalars
Is there any logic on the server side that could change the iteration number?