Eureka! That's why I suspected trains was installing a different version than the one I expected
correct, you could also use
Task.create
that creates a Task but does not do any automagic.
Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
In my GitHub action, I should just have a dummy clearml server and run the task there, connecting to it
is there a command / file for that?
What I put in the clearml.conf is the following:
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [" "]
agent.python_binary = python3.8
Hi SuccessfulKoala55 , will I be able to update all references to the old s3 bucket using this command?
BTW, is there any specific reason for not upgrading to clearml?
I just didn't have time so far
very cool, good to know, thanks SuccessfulKoala55
Ah, I see, it is not supported by the autoscaler: https://github.com/allegroai/clearml/blob/282513ac33096197f82e8f5ed654948d97584c35/trains/automation/aws_auto_scaler.py#L120-L125
Hi AgitatedDove14 , initially I was doing this, but then I realised that with the approach you suggest all the packages of the local environment also end up in the "installed packages", while in reality I only need the dependencies of the local package. That's why I use _update_requirements , with this approach only the packages required will be installed in the agent
I also tried setting ebs_device_name = "/dev/sdf" - didn't work
my agents are all .16 and I install trains 0.16rc2 in each Task being executed by the agent
How exactly is the clearml-agent killing the task?
I have two controller tasks running in parallel in the trains-agent services queue
Hi SoggyFrog26 , https://github.com/allegroai/clearml/blob/master/docs/datasets.md
I am using clearml_agent v1.0.0 and clearml 0.17.5 btw
This is no coincidence - any data versioning tool you will find is somewhat close to how git works (dvc, etc.), since they aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API that allows you to register/retrieve datasets very easily
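For anyone reading along, the basic CLI flow looks roughly like this (project/dataset names and the path are placeholders; requires clearml installed and configured):

```shell
# Register a new dataset version
clearml-data create --project "examples" --name "my-dataset"

# Stage files into the current dataset, then upload and finalize it
clearml-data add --files ./data
clearml-data close

# Later, from anywhere, fetch a local copy by dataset id
clearml-data get --id <dataset_id>
```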
I also discovered https://h2oai.github.io/wave/ last week, would be awesome to be able to deploy it in the same manner
If the reporting is done on a subprocess, I can imagine that the task.set_initial_iteration(0) call is only effective in the main process, not in the subprocess used for reporting. Could it be the case?
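A toy illustration of that hypothesis (plain Python, nothing ClearML-specific, all names made up): state set in the parent process is invisible to a freshly started child process unless it is passed explicitly, which is why a call made only in the main process could have no effect on a reporting subprocess.

```python
import subprocess
import sys

# The child process only sees what it inherits (e.g. environment variables)
# or what is explicitly passed to it; it never sees plain Python variables
# assigned in the parent after (or before) it starts.
child_code = (
    "import os\n"
    "print(os.environ.get('INITIAL_ITERATION', 'unset'))\n"
)

initial_iteration = 0  # set only in the parent's Python state, not the env

out = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True,
).stdout.strip()

print(out)  # the child never saw the parent's variable
```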
After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the aws creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187 , linked to boto3
Thanks! With this I'll probably be able to reduce the cluster size to be on the safe side for a couple of months at least :)
I am trying to upload an artifact during the execution
awesome! Unfortunately, calling artifact["foo"].get() gave me: Could not retrieve a local copy of artifact foo, failed downloading file:///checkpoints/test_task/test_2.fgjeo3b9f5b44ca193a68011c62841bf/artifacts/foo/foo.json
It tries to get it from the local storage, but the json is stored in s3 (it does exist) and I did create both tasks specifying the correct output_uri (to s3)
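For context, the upload/retrieve pattern I'm describing is roughly the following sketch (project/task/bucket names are placeholders, and this needs a configured server plus valid S3 credentials):

```python
from clearml import Task

# Producer task: upload an artifact during execution, with outputs on S3
task = Task.init(project_name="test_task", task_name="producer",
                 output_uri="s3://my-bucket/checkpoints")  # placeholder bucket
task.upload_artifact(name="foo", artifact_object={"bar": 1})

# Consumer side: look the producer task up and pull the artifact back
producer = Task.get_task(project_name="test_task", task_name="producer")
local_path = producer.artifacts["foo"].get_local_copy()
```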
When installed with http://get.docker.com, it works