AgitatedDove14, I followed the instructions for updating the ClearML server, and the visualization stays the same
Bottom line: I want to edit the cleanup service code to only delete tasks under a specific project - how do I do that?
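For what it's worth, a minimal sketch of that idea (the project name and the status filter are placeholders, not the actual cleanup service code): query only the tasks of a single project with Task.get_tasks and delete those, instead of looping over every project on the server.

from clearml import Task

# Sketch only: "my-project" and the completed-status filter are assumptions -
# adjust them to the project the cleanup should be limited to.
tasks = Task.get_tasks(
    project_name="my-project",
    task_filter={"status": ["completed"]},
)
for task in tasks:
    print(f"Deleting {task.id} ({task.name})")
    task.delete()  # removes the task (and by default its artifacts/models)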
I was sure you were on Israel time as well, sorry about the night-time thing 😄
Continuing on this line of thought... Is it possible to call task.execute_remotely on a CPU-only machine (a data scientist's laptop, for example) and have the agent that fetches this task run it using a GPU? I'm asking because it is mentioned that it replicates the running environment of the task creator... which is exactly what I'm not trying to do 😄
The scenario I'm going for is to never run on the dev machine, so once the server + agents are up, all I'll need to do is add task.execute_remotely... after the Task.init line - and then, when the script is executed on the dev machine, it won't actually run but will instead enqueue itself for the agent to run?
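For reference, a minimal sketch of that flow (the queue and project names are assumptions): everything before execute_remotely() runs locally only to register the task and capture the environment; the call then enqueues the task and exits the local process, so the actual work happens only on the agent.

from clearml import Task

task = Task.init(project_name="examples", task_name="remote-train")

# Up to here the script runs locally just to create the task and record its environment.
# execute_remotely() enqueues the task on the given queue and exits this process,
# so the code below never runs on the dev machine - only on the agent (e.g. a GPU box).
task.execute_remotely(queue_name="gpu-queue", exit_process=True)

# ... training code that should only run on the agent ...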
the path to the JSON file
Let's take a step back. Let's remove the clearml-services from the docker compose for a second, and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back to the docker compose, make sense?
I'd prefer not to docker-compose down as researchers are actively working on it - what do you say I manually kill the services agent and launch one myself?
This is the pip freeze of the environment. I don't know why it differs from what the agent has... the agent only has a subset of these google libs
Okay so regarding the version - we are using 1.1.1
The thing with this error is that it happens sometimes, and when it happens it never goes away...
I don't know what causes it, but we have one host where it works okay; then someone else checks out the repo, tries it, and it fails with this error, while another guy can do the same and it will work for him
That is not very informative
I really don't know, as you can see in my last screenshot, I've configured my base image to be 10.1
Yes, I have a metric I want to monitor so I will be able to sort my experiments by it. It is logged like this:
logger.report_scalar(title='Mean Top 4 Accuracy', series=ARGS.model, iteration=0, value=results['top_4_acc'].mean())
When looking at my dashboard this is how it looks
I'll check if this works tomorrow
Or should I change all three of them?
In my use case I'm running an agent on the same machine I'm developing on, so pointing this env var to the same venv I'm using for development will skip the venv creation process from the task requirements?
By the way, just inspecting: the CUDA version in the output of nvidia-smi matches the driver installed on the host, and not the container - look at the image below
I guess not many TensorFlow users are running agents around here if this wasn't brought up already
I was trying out the pipeline controller for the first time and it felt like a bit of a burden that, just for the sake of trying it, I had to launch an agent
Will try this out and report
If this includes scheduling through pipelines, in my opinion there should be an option to execute a pipeline without an agent. Sometimes for development I just want to execute a pipeline on my local machine just as I would a task...
First of all, I wasn't aware that was an option - but I think it's preferable to be able to do it through the command line, because I'm developing the pipeline to be executed remotely, but for debugging I run it locally.
Using what you showed I can obviously write it, delete it once it is ready, and rewrite it when I'm debugging or adding features - but DX-wise I think it would be nicer to be able to trigger this functionality through the command line
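As a side note, newer clearml releases expose something close to this programmatically; a minimal sketch, assuming PipelineController.start_locally() is available in the installed version (all names below are placeholders):

from clearml import PipelineController

pipe = PipelineController(name="debug-pipeline", project="examples", version="0.0.1")
pipe.add_step(name="stage_data",
              base_task_project="examples", base_task_name="data step")
pipe.add_step(name="stage_train", parents=["stage_data"],
              base_task_project="examples", base_task_name="train step")

# Run the controller (and the steps themselves) in the current process,
# so no agent or queue is needed while debugging the pipeline locally.
pipe.start_locally(run_pipeline_steps_locally=True)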
Let's see if this is really the issue
This just keeps getting better and better.... 🤩