I'm really confused. I'm not sure what is wrong, or what the relationship is between the templates, the agent, and all of those things
In the meantime I'm giving up on the pipeline thing and I'll write a bash script to orchestrate the execution, because I need to deliver and I don't feel this is going anywhere
On a final note, I'd love for this to work as expected; I'm just not sure what you need from me. A fully reproducible example will be hard, because obviously this is proprietary code. What ...
AgitatedDove14
So nope, this doesn't solve my case, I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is, once all the tasks are over, to collect all those "my_dataframe" arti...
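Something like this sketch is what I have in mind (the project name and status filter are placeholders for however the child tasks are found):
```python
import pandas as pd
from clearml import Task

# Find the finished child tasks (placeholder filter; adjust to your setup)
child_tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={"status": ["completed"]},
)

# Collect the "my_dataframe" artifact from each child task
frames = []
for t in child_tasks:
    artifact = t.artifacts.get("my_dataframe")
    if artifact is not None:
        frames.append(artifact.get())  # deserializes the stored DataFrame

combined = pd.concat(frames, ignore_index=True)
```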
Anyway, my ultimate goal is to create templates for other tasks... Is that possible in any other way, through the CLI?
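(If there's no CLI command for it, I could live with something like this through the SDK; this is just my sketch, assuming Task.clone works the way I think it does, and the source task ID is a placeholder:)
```python
from clearml import Task

# Clone an existing task; the clone stays in draft mode,
# so it can be reused as a template for future runs
template = Task.clone(
    source_task="<source_task_id>",  # placeholder
    name="my-task-template",
)
print(template.id)
```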
CostlyOstrich36 so why 1000:1000? My user and group are not that, and neither are all the other files I have under /opt/clearml
few minutes and I'll look at it
Okay, looks interesting, but actually there is no final task; this is the pipeline layout
Maybe something similar to Docker, where I could name each one of my trains-agents and then refer to them by name, something like
trains-agent daemon --name agent_1 ...
Then trains-agent stop/start
I dealt with this earlier today: I set up 2 agents, one for each GPU on a machine. After editing configurations I wanted to restart only one of them (because the other was busy working), and then I realized I didn't know which one to kill
it seems apiserver_conf doesn't even change
In the larger context, I'd look at how other object stores treat similar problems; I'm not that advanced in these topics.
But adding a simple force_download flag to the get_local_copy method could solve many cases I can think of. For example, I'd set it to true in my case: I don't mind the times it would re-download unnecessarily, since the file is quite small. (Currently I always delete the local file first, but that looks pretty ugly.)
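For reference, this is roughly the ugly workaround I mean (force_get_local_copy is just a name I made up; it assumes that deleting the cached file makes the next call re-download):
```python
import os

def force_get_local_copy(artifact):
    """Hypothetical helper: always re-download an artifact.

    Deletes the cached local copy first, so the following
    get_local_copy() call has to fetch the file again.
    """
    local_path = artifact.get_local_copy()
    if local_path and os.path.exists(local_path):
        os.remove(local_path)
    return artifact.get_local_copy()
```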
I was here, but I can't find info for the questions I mentioned
Is this already available, or only on GitHub?
Yep what 😄
alabaster==0.7.12
appdirs==1.4.4
apturl==0.5.2
attrs==21.2.0
Babel==2.9.1
bcrypt==3.1.7
blinker==1.4
Brlapi==0.7.0
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
chrome-gnome-shell==0.0.0
clearml==1.0.5
click==8.0.1
cloud-sptheme==1.10.1.post20200504175005
cloudpickle==1.6.0
colorama==0.4.3
command-not-found==0.3
This just keeps getting better and better... 🤩
TimelyPenguin76 I think our problem is that the agent is not using this environment, and I'm not sure which one it does use... Is there a way to hard-code the agent's environment?
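For what it's worth, what I'd imagine (and this is just my assumption about the config) is something like pinning the interpreter in trains.conf / clearml.conf:
```
agent {
    # Point the agent at a specific interpreter instead of whatever it auto-detects
    python_binary: "/path/to/venv/bin/python3"
}
```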
this is the full one TimelyPenguin76
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the docker-compose.yml file.
If you followed the instructions in the docs, you should find that file at /opt/trains/docker-compose.yml. You will see that there are multiple services (apiserver, elasticsearch, redis, etc.), and in each there might be a section called ports which states the mapping of the ports.
The number on the left, is ...
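For example, a sketch of what such a ports section might look like (the service name and numbers here are just illustrative, check your own file):
```yaml
services:
  apiserver:
    ports:
      - "8008:8008"  # host port (left) is mapped to the container port (right)
```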
AgitatedDove14 I really don't know how this is possible... I tried upgrading the server, tried whatever I could
As for small toy code to reproduce: I just don't have the time for that, but I will paste the callback I am using along with this explanation. This is the overall logic, so you can replicate it and use my callback
From the pipeline task, launch some sub-tasks, and put the .collect_description_tables method from my callback class (attached below) in their post_execute_callback. Run t...
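For context, a minimal sketch of the wiring, assuming a recent clearml PipelineController (this callback class is a simplified stand-in for mine, and the step/project names are placeholders):
```python
from clearml import PipelineController

class PipelineCallbacks:
    """Simplified stand-in for my callback class."""
    def __init__(self):
        self.tables = []

    def collect_description_tables(self, pipeline, node):
        # post_execute_callback signature is (pipeline, node); runs after the step ends
        task = node.job.task if node.job else None
        if task and "my_dataframe" in task.artifacts:
            self.tables.append(task.artifacts["my_dataframe"].get())

callbacks = PipelineCallbacks()
pipe = PipelineController(name="my-pipeline", project="my_project")
pipe.add_step(
    name="step_1",                     # placeholder step
    base_task_project="my_project",    # placeholder
    base_task_name="step 1 template",  # placeholder
    post_execute_callback=callbacks.collect_description_tables,
)
pipe.start()
```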
Okay, so let me get this straight
The autoscaling is basically an ever-running task (let's say on the services queue). Now, the actual auto-scaling, and which queues exist, have nothing to do with that task, and are configured in the autoscale task itself?
Now, I remind you that, using exactly the same credentials, the autoscaler task could launch instances before
AgitatedDove14 sorry for the delayed reply. Where do I read which version the Cleanup Service is using?