So once I enqueue it, it's up? The docs say I can configure the queues the autoscaler listens to (in order to spin up instances) inside the autoscaler task - I wanted to make sure that this config has nothing to do with where the autoscaler task itself was enqueued.
I guess what I want is a way to define environment variables in agents
Okay, let's go
to fix it, I excluded this var entirely from the docker-compose
Good, so if I'm templating something using clearml-task
(without a queue, so the task stays in draft mode), it will use this task? Even though it never executed?
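To make sure we're describing the same flow, this is roughly what I mean (just a sketch - the project, task and parameter names are made up):

```
from clearml import Task

# Grab the draft task that clearml-task created (it never ran, it's just a template)
template = Task.get_task(project_name="my_project", task_name="my_template")

# Clone it - the clone carries the recorded repo, packages and hyperparameters
cloned = Task.clone(source_task=template, name="my_template clone")

# Optionally override parameters on the clone before sending it off
cloned.set_parameters({"General/learning_rate": 0.01})

# Enqueue the clone for execution; the original template stays in draft mode
Task.enqueue(cloned, queue_name="default")
```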
By the way, just inspecting it: the CUDA version in the output of nvidia-smi matches the driver installed on the host, not the container - look at the image below
Thanks Martin, code runs as expected
Depending on where the agent is, the value of DATA_DIR
might change
Saving part from task A:
pipeline = trials.trials[index]['result']['pipeline']
output_prefix = 'best_iter_' if i == 0 else 'iter_'
task.upload_artifact(name=output_prefix + str(index), artifact_object=pipeline)
I couldn't find any pattern in which tasks fail or why... all the lines are exactly the same, only the parameters are different
Let's take a step back. Let's remove the clearml-services from the docker-compose for a second and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back into the docker-compose. Makes sense?
I'd prefer not to docker-compose down
as researchers are actively working on it. What do you say I manually kill the services agent and launch one myself?
192.168.1.71?
Gotcha, I didn't think of an external server since Service Containers are part of GitHub's offering. I'll consider that
why does it deplete so fast?
I'm saying that because this is what appears in the task under "INSTALLED PACKAGES"
so in my code, I'll use this environment variable to read from disk
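Something like this (rough sketch - the fallback path and file name are just placeholders):

```
import os
from pathlib import Path

import pandas as pd

# DATA_DIR is expected to be defined in whatever environment the task ends up
# running in (my machine or an agent); the fallback here is only illustrative
data_dir = Path(os.environ.get("DATA_DIR", "/mnt/data"))

# Example read from disk - the file name is hypothetical
df = pd.read_csv(data_dir / "my_table.csv")
```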
Any news on this? This is kind of creepy; it's something so basic, and I can't trust my prediction pipeline because sometimes it fails randomly for no reason
AgitatedDove14
So nope, this doesn't solve my case. I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is, once all the tasks are over, to collect all those "my_dataframe" artifacts...
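Roughly what I have in mind for the collection step (sketch only - the 'parent' filter and the placeholder ID/names are my assumptions, not something I've verified):

```
from clearml import Task

# The pipeline controller task (placeholder ID)
controller = Task.get_task(task_id="<controller_task_id>")

# Fetch the child tasks spawned by the controller; filtering on 'parent'
# is an assumption - adjust to however the tasks are actually linked
children = Task.get_tasks(
    project_name="my_project",
    task_filter={"parent": controller.id},
)

dataframes = []
for child in children:
    artifact = child.artifacts.get("my_dataframe")
    if artifact is not None:
        # .get() downloads and deserializes the stored object (the dataframe)
        dataframes.append(artifact.get())

print(f"Collected {len(dataframes)} dataframes from {len(children)} tasks")
```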