Hi @<1798162812862730240:profile|PreciousCentipede43> , can you add logs from the apiserver pod?
To test the agent you don't need to use the k8s docker image for it. Is there a reason you're not testing the agent as a package?
You can simply run the script from inside the repo once; you can also use execute_remotely to avoid actually running the entire thing
VexedCat68 , can you try accessing it as
192.168.15.118:8080/login first?
Can you please add here what you're sending + what is received?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , can you provide a snippet that reproduces this and also the flow of how you registered the data?
But I believe you can use your user/group if you prefer 🙂
Does it save the code in the uncommitted changes?
Hopefully will have updates soon
Yeah, I understand the logic of wanting this separation of iteration vs epoch since they sometimes correlate to different 'events'. I don't think there is an elegant way out of the box to do it currently.
Maybe open a GitHub feature request to follow up on this 🙂
I'd suggest using the agent in --docker mode
Does the Autoscaler try to spin new instances?
SparklingHedgehong28 , have you tried upgrading to pro? That is the easiest way to evaluate 🙂
I think if you inject exactly the same data then it will be copied. I suggest taking the databases down during this operation.
YummyLion54 hi!
Are you referring to the PARAMETERS section or to the CONFIGURATION OBJECTS?
RoughTiger69 Hi 🙂
Regarding your questions:
Moving certain tasks/datasets from server to server would require a data merge. This process basically means merging the databases (MongoDB, Elasticsearch, files, etc.). I think it's something that can be done as a service in the paid version, but not in the free one. If you'd like to 'promote' tasks to production, you can either work in a special project for that OR upload the models to S3 and then re-run the experiment and point it to the...
Tasks are usually timed out by default after 2 hours of inactivity. I guess you could keep the task alive as a process on your machine by printing something once every 30-60 minutes
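A minimal stdlib-only sketch of that keep-alive idea (the 30-minute interval is just the suggestion above; tune it to your timeout):

```python
import threading


def keep_alive(interval_seconds: float = 1800.0) -> threading.Event:
    """Print a heartbeat every interval so the task never looks idle.

    Returns an Event; call .set() on it to stop the heartbeat thread.
    """
    stop = threading.Event()

    def beat() -> None:
        # wait() returns False on timeout, so we print until stop is set
        while not stop.wait(interval_seconds):
            print("heartbeat: task still alive", flush=True)

    threading.Thread(target=beat, daemon=True).start()
    return stop
```

Start it once at the top of your long-running process and set the returned event when you're done.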
Hi AttractiveCockroach17 , in the first question - clearml captures the packages used during the run. What does your script use and what does clearml capture when running locally on your machine?
You can configure clearml to capture your entire environment as well.
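For reference, I believe the relevant switch lives in clearml.conf (a sketch, key names assumed from the open-source clearml SDK):

```
# ~/clearml.conf
sdk {
  development {
    # capture the full environment via `pip freeze` instead of
    # analyzing only the packages your script actually imports
    detect_with_pip_freeze: true
  }
}
```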
Regarding 2:
Can you please expand on the entire process?
I would ask the IT people managing your server to check server uptime and look for any errors in the apiserver log. Whoever is managing the server will know what to do.
I think this is an enterprise only feature
Can you please add logs of all related runs, including the pipeline controller?
LethalCentipede31 , it appears we had an internal issue with a load balancer; it was fixed a couple of minutes after your comment 🙂
I think you need to pass some pythonic object to torch.save() - as shown in their documentation:
https://pytorch.org/docs/stable/generated/torch.save.html
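For example, a minimal sketch (file name and dict contents are just placeholders): torch.save() accepts any picklable Python object, and the documented pattern is to save a state_dict, i.e. a plain dict mapping names to tensors:

```python
import torch

# torch.save() serializes any picklable Python object;
# here we save a small dict of tensors plus a scalar.
state = {"weight": torch.zeros(2, 3), "step": 7}
torch.save(state, "checkpoint.pt")

# round-trip it back
restored = torch.load("checkpoint.pt")
assert torch.equal(restored["weight"], state["weight"])
```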
Just making sure we cover all bases - you updated the optimizer to use a base task with _allow_omegaconf_edit_ : True ?
Hi @<1524922424720625664:profile|TartLeopard58> , you mean the side bar on the left with projects/datasets/etc... ?