Hi @<1853608151669018624:profile|ColossalSquid53> , if there is no connectivity to the ClearML server, your Python script will still run. ClearML will cache all logs/events and flush them once connectivity to the server is restored.
Are you running the agent on the same machine as the server?
Hi @<1523701260895653888:profile|QuaintJellyfish58> , can you elaborate on what uv is?
Also, how are you trying to download them?
Hi @<1664079296102141952:profile|DangerousStarfish38> , you would need to download the dataset to your local machine using get_local_copy
I suggest going through the docs.
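For the dataset download mentioned above, a minimal sketch could look like the following. The helper name `fetch_dataset_copy` is hypothetical, and the project and dataset names you pass in are your own. Only `Dataset.get` and `get_local_copy` come from the clearml SDK.

```python
def fetch_dataset_copy(dataset_project: str, dataset_name: str) -> str:
    """Return the path of a local, read-only copy of a ClearML dataset."""
    # Imported inside the function so this module still loads on machines
    # where the clearml package is not installed.
    from clearml import Dataset

    # Look up the dataset by project and name (both supplied by the caller).
    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)

    # Download (or reuse the cached copy) and return the local folder path.
    return dataset.get_local_copy()
```

Calling `fetch_dataset_copy("my_project", "my_dataset")` (placeholder names) should return a folder path that your code can then read from.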
Can you add the full log + a snippet to reproduce this?
Meaning that you should configure your host as follows: host: "somehost.com:9000"
Please open developer tools (F12) and see if you're getting any console errors when loading a 'stuck' experiment
Hi DefiantSpider5 , please use the following contact form for business related questions 🙂
https://clear.ml/contact-us/
Hi @<1582179661935284224:profile|AbruptJellyfish92> , how do the histograms look when you're not in comparison mode?
Can you provide a self contained snippet that creates such histograms that reproduce this behavior please?
Hi @<1594863230964994048:profile|DangerousBee35> , it sounds like some sort of network lag. I assume you are using app.clear.ml?
I'd check network latency from the instances starting in GCP to the server.
In the webUI, when you go to the dataset, where do you see it is saved? You can click on 'full details' in any version of a dataset and see that in the artifacts section
It's already implemented in the GCP autoscaler. You can use preemptible instances with GPUs
MuddySquid7 , we're having a look and testing it. Thanks!
Yes. Run all the pipelines examples and see how the parameters are added via code to the controller.
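As a rough illustration of adding parameters to the controller via code, here is a sketch. It assumes the `PipelineController.add_parameter` call from the clearml SDK. The parameter names, defaults and the `build_controller` helper are made up for the example and are not taken from the official pipeline examples.

```python
# Hypothetical pipeline parameters; replace with the ones your pipeline needs.
PIPELINE_PARAMETERS = [
    {"name": "url", "default": "https://example.com/data.csv",
     "description": "Location of the source data"},
    {"name": "test_size", "default": 0.2,
     "description": "Fraction of the data held out for testing"},
]


def build_controller(name: str, project: str, version: str = "1.0.0"):
    """Create a pipeline controller and register the parameters above."""
    # Imported lazily so the module loads without the clearml package.
    from clearml import PipelineController

    pipe = PipelineController(name=name, project=project, version=version)
    for param in PIPELINE_PARAMETERS:
        # Each entry becomes a pipeline-level parameter, editable in the UI.
        pipe.add_parameter(**param)
    return pipe
```

Steps can then reference such a parameter in their `parameter_override` using the `${pipeline.url}` form.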
@<1533619725983027200:profile|BattyHedgehong22> , it appears from the log that it is failing to clone the repository. You need to provide credentials in clearml.conf
Hi @<1837300695921856512:profile|NastyBear13> , can you provide logs from the machine itself? Are you certain it's the same VM? Can you also provide logs from the tasks themselves?
You can do it by comparing experiments, what is your use case? I think I might be missing something. Can you please elaborate?
Hi @<1523701122311655424:profile|VexedElephant56> , do you get the same response when you try to run a script with Task.init() without agent on that machine?
Hi @<1523703397830627328:profile|CrookedMonkey33> , you can also set the credentials with an env variable. Would that work?
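If the credentials in question are the ClearML API credentials (an assumption, the thread does not say which ones), a sketch of the env variable route could be the following. The helper name and the placeholder values are hypothetical. The variable names are the ones the clearml SDK reads.

```python
import os


def set_clearml_credentials(api_host: str, access_key: str, secret_key: str) -> None:
    """Export ClearML API credentials as environment variables."""
    os.environ["CLEARML_API_HOST"] = api_host
    os.environ["CLEARML_API_ACCESS_KEY"] = access_key
    os.environ["CLEARML_API_SECRET_KEY"] = secret_key


# Placeholder values: substitute your own server address and key pair.
set_clearml_credentials("https://api.clear.ml", "<access key>", "<secret key>")
```

To be safe, set these before clearml is imported in the process, or export the same variables in the shell that launches the script.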
Can you verify that your ~/.clearml.conf has a proper configuration? If you run from clearml import Task; t = Task.init(), does this work?
Hi @<1552101458927685632:profile|FreshGoldfish34> , the Scale & Enterprise versions indeed also have different features from what is in the self hosted.
You can see a more detailed comparison here, especially if you scroll down.
AgitatedDove41 , forgot to add the link for the docs, here it is:
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler/
Oh, I understand. I'm guessing the next 1-2 months would be a timeframe for a new release of the server.
Hi @<1523708920831414272:profile|SuperficialDolphin93> , simply set output_uri="/mnt/nfs/shared" in Task.init
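A short sketch of the output_uri setting, assuming the NFS share is mounted at /mnt/nfs/shared on every machine that runs the task. The wrapper function and its argument names are only illustrative.

```python
def init_task(project_name: str, task_name: str, output_uri: str = "/mnt/nfs/shared"):
    """Start a ClearML task that stores its outputs on the shared mount."""
    # Imported lazily so the module loads without the clearml package.
    from clearml import Task

    # output_uri sets the destination for the task's models and artifacts.
    return Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=output_uri,
    )
```

Models and artifacts logged by the task should then land under that folder instead of the default files server.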
Hi @<1576381444509405184:profile|ManiacalLizard2> , it looks like the default setting is still false
Hi @<1523701295830011904:profile|CluelessFlamingo93> , I think you can also control the agent sampling rate (for example, to sample the queue every 10 or 20 seconds instead of every 5)
Hi @<1789465500154073088:profile|ScaryShrimp8> , can you provide the full log of the run? How did you set up & run the agent? What version and what OS are you on?
Hi @<1635088270469632000:profile|LividReindeer58> , you should separate the two. The pipeline controller should run on the services queue. Pipeline steps should run on different queues. This is why they are sitting in pending - there is no free worker to pick them up.
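The queue separation described above could be sketched roughly like this. The step names, base task names and the cpu_queue / gpu_queue queue names are hypothetical, and each queue needs an agent listening on it. `add_step` and `start` are the clearml `PipelineController` calls.

```python
CONTROLLER_QUEUE = "services"  # queue that runs the controller logic only

# Hypothetical steps; each one is sent to a queue other than the controller's.
STEPS = [
    {"name": "prepare_data", "base_task_name": "prepare data",
     "execution_queue": "cpu_queue", "parents": []},
    {"name": "train_model", "base_task_name": "train model",
     "execution_queue": "gpu_queue", "parents": ["prepare_data"]},
]


def run_pipeline(project: str) -> None:
    """Enqueue the controller on the services queue and steps elsewhere."""
    # Imported lazily so the module loads without the clearml package.
    from clearml import PipelineController

    pipe = PipelineController(name="queue separation demo", project=project,
                              version="1.0.0")
    for step in STEPS:
        pipe.add_step(
            name=step["name"],
            parents=step["parents"],
            base_task_project=project,
            base_task_name=step["base_task_name"],
            execution_queue=step["execution_queue"],
        )
    # The controller itself is enqueued here; steps go to their own queues.
    pipe.start(queue=CONTROLLER_QUEUE)
```

With this layout the worker serving the services queue is not blocked waiting on steps that it would otherwise have to run itself.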