
AgitatedDove14 Maybe I need to change something here: apiserver.conf, to increase the number of workers?
Hi SuccessfulKoala55 and AgitatedDove14,
Thanks for the quick reply.
I'm not sure I understand your use-case - do you always want to change the contents of the file in your code? Why not change it before connecting?
Changing the file before connecting only makes sense when I am running locally and the file exists. When running remotely, I must fetch the file with connect_configuration(path, name=name) before reading it.
"local_path" is ignored, path is a temp file, and the c...
Thanks, I will upgrade the server for now and will let you know.
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser
I am sure you added this timeout for a reason, probably because increasing it could affect other functionality.
Am I wrong?
Thanks, I will upgrade my instance type and then add more workers. Where do I need to configure that?
OK, thanks for the answer. I will use task.set_resource_monitor_iteration_timeout(seconds_from_start=1800) as you suggested for now.
If you add something like I suggested, could you notify me?
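For reference, this is how I plan to call it (a minimal sketch; I am assuming it can be set around Task.init, and the project/task names are placeholders):

from clearml import Task

# If I understood the flag correctly, this keeps resource monitoring reported
# by seconds-from-start for the first 30 minutes instead of switching to
# per-iteration reporting.
Task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)

task = Task.init(project_name="examples", task_name="long warmup training")  # placeholder names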
From the UI it will, since it gets the temp file from there.
I mean from the code (let's say remotely).
I am running trains-server on AWS with your AMI (instance type t3.large).
The server runs very well and works amazingly!
Until we start to run more trainings in parallel (around 20).
Then the UI becomes very slow and often times out.
Can upgrading the instance type help here, or is there some limit on parallel runs?
Hi SuccessfulKoala55,
Does running_remotely() return True even if the task was enqueued from the UI and not via execute_remotely()?
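Just to be clear about what I am after, roughly this (a sketch only; I am assuming running_remotely is the helper from clearml.config / trains.config, correct me if I should use Task.running_locally() instead):

from clearml import Task
from clearml.config import running_remotely  # assumption: not sure this is the right import path

task = Task.init(project_name="examples", task_name="remote check")  # placeholder names

if running_remotely():
    # launched by an agent, either via execute_remotely() or enqueued from the UI (?)
    print("running remotely")
else:
    print("running locally")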
Hi DeterminedCrab71 .
Thanks 🙂
Thanks for the reply,
I saw that it is preferable to change the fileserver in trains.conf to s3://XXX,
so I changed this as I wrote before.
Hi AgitatedDove14, thanks for the answer.
I think the upload reporting (files over 5MB) was added after version 0.17.
That's what I thought...
I think it would be helpful to add it to the conf, since 5MB is really small and my files are ~300MB, meaning 60 messages for each upload.
Another option might be to make it configurable as a Task.init() parameter.
I think both are OK 🙂
Ohh nice, I thought it was just some kind of job queue on already up-and-running machines.
Thanks, I just want to avoid giving the credentials to every user.
If it isn't possible, I will do it.
SuccessfulKoala55 Thanks 🙏 I will give it a try tomorrow 🙂
If I mount the S3 bucket on the trains-server and link the mount to /opt/trains/data/fileserver, will that work?
I updated to the new version 0.16.1 a few weeks ago and it worked using elastic_upgrade.py.
SuccessfulKoala55 and AppetizingMouse58 Thank you very much!!
I have a question about the future:
Could this fix cause problems with a future clearml-server upgrade?
Or what is the best practice for upgrading after doing it?
Regarding moving the fileserver to S3, what is the best way to migrate the old data to S3?
I think that if I just move everything from /opt/trains/data/fileserver to S3,
the trains-server will not know about it, right?
I have another question.
Now that I am using the root user it looks better,
but my Docker image already has all my code and all the packages it needs. I don't understand why the agent needs to install all of those again?
I tried your solution, but since my path points to a YAML file,
and task.set_configuration_object(name=name, config_text=my_params) uploads it in a different format than task.connect_configuration(path, name=name),
it is not working for me 😞
(even when I use config_type='yaml')
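In case it helps, this is roughly what I tried (a sketch; the file name is a placeholder and I am assuming config_text / config_type are the right parameter names):

from clearml import Task

task = Task.init(project_name="examples", task_name="yaml config")  # placeholder names

with open("params.yaml") as f:  # placeholder path to my YAML file
    my_params = f.read()        # raw YAML text

# What I tried: store the raw YAML as a configuration object...
task.set_configuration_object(name="my_config", config_text=my_params, config_type="yaml")

# ...but the stored format does not match what I later get back with
# task.connect_configuration("params.yaml", name="my_config")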
My Docker image already has my project in it, so I know where to mount. Maybe the agent moves it or creates a copy of my project somewhere else?