Interesting! Do they happen to have the same machine name in UI?
Hi @<1717350310768283648:profile|SplendidFlamingo62> , are you using a self hosted server or the community?
And if you clone the same experiment and run it on the same machine, will it again download all the packages?
1.7.0 is the latest release for ClearML self hosted. The issue does not reproduce there. Can you try upgrading your server?
https://github.com/allegroai/clearml-server
I think clearml-web is just the source code for the UI.
How did you name the alternative clearml.conf file?
I'm assuming this is 8 gigs of ram?
I think it should suffice. To be entirely sure, you can run one of the AMIs listed below and check its specifications:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_aws_ec2_ami/#latest-version
Hi CurvedHedgehog15 ,
Can you please provide a short snippet to reproduce this?
Which section of the comparison are we currently looking at?
I'm accessing both using SSH tunneling & the same domain
I guess we found the culprit 🙂
Hi @<1546303277010784256:profile|LivelyBadger26> I'm afraid that in the free version everyone is an admin. In the scale & Enterprise licenses you have full role based access controls on all elements in the system (from experiments to which workers can be provisioned to whom)
Hi @<1792726992181792768:profile|CloudyWalrus66> , from a short read of the docs it seems to be simply a way to spin up many machines with many different configurations with very few actions.
The autoscaler spins up and down regular ec2 instances and spot instances automatically by predetermined templates. Basically making the fleet 'feature' redundant.
Or am I missing something?
What is the address of your server?
Can you add a log?
How did you call the steps in the pipeline?
@<1581454875005292544:profile|SuccessfulOtter28> , I don't think there is such a capability currently. I'd suggest opening a GitHub feature request for this.
Hi PanickyMoth78 ,
What is the step trying to do when you hit the exception?
Are you running a self deployed server? What is the version if that is the case?
Hi JumpyPig73 ,
It appears that only the AWS autoscaler is in the open version; the other autoscalers are only in the advanced tiers (Pro and onwards):
https://clear.ml/pricing/
DeterminedOwl36 , what version of ClearML are you using? Also, does it happen if you run the script standalone and not through a Jupyter notebook?
We're looking into this currently 🙂
AdventurousButterfly15 please try upgrading to 1.4.0 - this should solve the issue:
pip uninstall clearml-agent -y && pip install -U clearml-agent
Hi @<1784754456546512896:profile|ConfusedSealion46> , in that case you can simply use add_external_files to the files that are already in your storage. Or am I missing something?
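Something like this minimal sketch (the dataset names and bucket path are hypothetical; it assumes the files already sit in your object storage):
```python
from clearml import Dataset

# Create a new dataset version that only references files already in your storage
ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")

# Register the remote files without copying them (hypothetical bucket/prefix)
ds.add_external_files(source_url="s3://my-bucket/raw-data/")

# No heavy upload happens here since the data stays in place; only metadata is stored
ds.upload()
ds.finalize()
```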
Hi ElegantCoyote26 ,
What happens if you delete ~/.clearml (this is the default cache folder for ClearML) and rerun?
I think you can either add the requirement manually through code ( https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements ) or force the agent to use the requirements.txt when running remotely
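A rough sketch of the first option (the package name and version spec below are just placeholders):
```python
from clearml import Task

# Must be called before Task.init() so the requirement is recorded on the task
Task.add_requirements("pandas", ">=1.5")  # placeholder package/version
# Or point the agent at an explicit requirements file instead:
# Task.add_requirements("/path/to/requirements.txt")

task = Task.init(project_name="examples", task_name="pinned requirements run")
```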
Hi @<1748153283605696512:profile|GreasyPenguin24> , you certainly can. CLEARML_CONFIG_FILE is the environment variable that allows you to use different configuration files
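For example, a minimal sketch (the config path is hypothetical); the variable just has to be set before ClearML loads its configuration:
```python
import os

# Point ClearML at an alternative configuration file (hypothetical path);
# set this before importing/initializing clearml so the file is picked up
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/alternative_clearml.conf"

from clearml import Task

task = Task.init(project_name="examples", task_name="alt config run")
```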
You need to use a docker image that already has the CUDA package installed. Also don't forget to run the agent in --docker mode 🙂
@<1539417873305309184:profile|DangerousMole43> , I think for this specific ability you would need to re-write your pipeline code with pipelines from decorators
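Something along these lines (a minimal sketch; the step logic, names and the run_locally() call are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def load_data():
    # placeholder step: produce some data
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["total"])
def process(data):
    # placeholder step: consume the previous step's output
    return sum(data)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.0.1")
def run_pipeline():
    data = load_data()
    return process(data)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # drop this line to enqueue the steps for remote execution
    run_pipeline()
```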
Hi @<1717350332247314432:profile|WittySeal70> , I think that task.get_reported_plots() is indeed what you're looking for. You might have to do some filtering there
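Roughly something like this (the task id and the "metric" key used for filtering are assumptions on my side):
```python
from clearml import Task

task = Task.get_task(task_id="<your-task-id>")  # hypothetical task id
plots = task.get_reported_plots()

# Each entry is a dict describing a reported plot; filter as needed,
# e.g. by title (assuming the title is stored under the "metric" key)
confusion_plots = [p for p in plots if p.get("metric") == "Confusion Matrix"]
```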