Hi @<1552101458927685632:profile|FreshGoldfish34> , the Scale & Enterprise versions indeed include additional features beyond what's in the self-hosted version.
You can see a more detailed comparison here, especially if you scroll down.
Can you maybe add a video of what you're doing?
I'm afraid there is no such capability at the moment. However, I'd suggest opening a GitHub feature request for this 🙂
I think if you copy all the data from the original server over to the new server, it should transfer everything. Otherwise, I think you would need to extract the data through the API or copy the Mongo documents.
What are the Elasticsearch, Mongo, and apiserver versions in the docker compose? Backup will only work in this scenario when they are exactly the same between the two systems.
@<1526734383564722176:profile|BoredBat47> , that could indeed be an issue. If the server is still running, data could still be written to the databases, creating conflicts.
Hi @<1526734383564722176:profile|BoredBat47> , do you see any errors in the elastic container?
Hi @<1857232032669634560:profile|ConvolutedRaven86> , what if you manually run the same code? Did you verify that the code itself will utilize the GPU?
You're sharing the same workspace, so it makes sense that you'd encounter the same issue being on the same network 🙂
@<1808672054950498304:profile|ElatedRaven55> , If you manually spin up the machines, does the issue reproduce? Did you try running the same exact VM setup manually?
@<1594863230964994048:profile|DangerousBee35> , I'd ask the DevOps to check if there might be something slowing communication from your new network in GCP to the app.clear.ml server
@<1855782485208600576:profile|CourageousCoyote72> , do you see crashes or machines going up and down? Or is it just that the EC2 machines are not being allocated?
Thank you for the detailed explanation. Can you please add a log of the ec2 instance itself? You can find it in the artifacts section of the autoscaler task. Is it the same autoscaler setup that used to work without issue or were there some changes introduced into the configuration?
Hi @<1855782492460552192:profile|IdealCamel90> , is it possible you're over the limit of application instances?
Hi @<1523701949617147904:profile|PricklyRaven28> , I assume this is happening on the same instance? What if you put in a ~20 sec sleep before or after the init call, does this behaviour reproduce?
SubstantialElk6 ,
We were trying with 'from task' at the moment. But the question applies to all methods.
You can specify this using add_function_step(..., execution_queue="<QUEUE>")
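For example, here's a minimal sketch (the queue name and the `train` step function are hypothetical placeholders, substitute your own):

```python
# Sketch assuming the ClearML SDK is installed; the queue name and the
# `train` step below are hypothetical placeholders.
def train(n: int) -> int:
    return n * 2  # placeholder step logic


def build_pipeline():
    from clearml import PipelineController

    pipe = PipelineController(name="demo-pipeline", project="examples")
    pipe.add_function_step(
        name="train",
        function=train,
        function_kwargs={"n": 21},
        execution_queue="gpu_queue",  # <QUEUE> goes here, per step
    )
    return pipe
# call build_pipeline().start() to actually run it
```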
Make certain tasks in the pipeline run in the same container session, instead of spawning new container sessions? (To improve efficiency)
I'm not sure this is possible currently. This could be a nice feature request. Maybe open a GitHub issue?
Hmmm, I couldn't find anything in the SDK; however, you can use the API to do it.
Are the cloned tasks running? Can you add logs from the HPO and one of the child tasks?
Can you try hitting F12 and seeing if there are any errors in console?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , not sure I understand this line
Is the order of --ids the same as the order of the returned rows?
Also, regarding the hash, I'd suggest opening a github feature request for this.
Hi @<1572395190897872896:profile|ShortWhale75> , is it possible you're using a very old version of the clearml package?
Hi @<1534344462161940480:profile|QuaintSeal61> , have you upgraded to a new version? Are you self hosted or using the community server? Also, can you elaborate on which part of it is slow? 🙂
Hi @<1534344465790013440:profile|UnsightlyAnt34> , what country are you running the code from? Maybe @<1523701087100473344:profile|SuccessfulKoala55> might have some insight?
@<1556812486840160256:profile|SuccessfulRaven86> , to make things easier to debug, can you try running the agent locally?
GrittyCormorant73 , a K8s deployment will have an easier time spinning up agent instances to run the tasks 🙂
I think AnxiousSeal95 updates us when there is a new version or release 🙂
Hi PunyWoodpecker71 ,
Regarding your questions:
We have an existing EKS cluster. So I'm wondering if I should deploy ClearML on the EKS cluster, or deploy on EC2 using AMI. Is there an advantage of one over the other?
I think it's a matter of personal preference. Maybe SuccessfulKoala55 can add some information.
We have a pipeline that needs to run once a month to train a model. Is there a scheduler option we can configure to enqueue the pipeline once a month? (It looks like the Pro plan has ...
@<1556812486840160256:profile|SuccessfulRaven86> , did you install poetry inside the EC2 instance or inside the docker? Basically, where do you put the poetry installation bash script: in the 'init script' section of the autoscaler, or in the task's 'setup shell script' in the execution tab? (The latter is basically the script that runs inside the docker.)
It sounds like you're installing poetry on the EC2 instance itself, but the experiment runs inside a Docker container.
WackyRabbit7 , isn't this what you need?
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#get_running_nodes
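A quick sketch of how you might use it, assuming `pipe` is your `PipelineController` instance (the helper name and poll interval here are made up):

```python
import time


def wait_until_idle(pipe, poll_seconds=10):
    """Block until the pipeline has no running nodes.

    Sketch only: `pipe` is assumed to be a clearml PipelineController,
    and get_running_nodes() (linked above) is assumed to return the
    steps currently executing, so an empty result means the pipeline
    is idle.
    """
    while pipe.get_running_nodes():
        time.sleep(poll_seconds)
```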
I think you can just send empty payload for users.get_all
like this {}
and it will return all the users in your database 🙂
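Something like this should work with the SDK's `APIClient` (a sketch, assuming your `clearml.conf` credentials are already set up):

```python
def list_all_users():
    # Sketch: APIClient wraps the REST API; calling users.get_all with
    # no arguments sends an empty payload, which returns all users.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    return client.users.get_all()


payload = {}  # the raw request body: no filters means "everything"
```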
SuccessfulKoala55 , is this hack applicable for most API calls in ClearML?