TimelyPenguin76 this fixed it: setting detect_with_pip_freeze to true solves the issue
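For reference, a sketch of the relevant clearml.conf setting (assuming the standard `sdk.development` section layout):

```
sdk {
    development {
        # use the output of `pip freeze` for requirements detection
        # instead of analyzing the script's imports
        detect_with_pip_freeze: true
    }
}
```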
That is not very informative
I'm using iteration = 0 at the moment, and I "choose" the max and it shows as a column... But the column header is not the scalar name (because it truncates it and adds the > sign to signal max).
For the sake of comparing and sorting, it makes sense to log a scalar with a given name without the iteration dimension
Yes, I have a metric I want to monitor so I will be able to sort my experiments by it. It is logged in this manner
logger.report_scalar(title='Mean Top 4 Accuracy', series=ARGS.model, iteration=0, value=results['top_4_acc'].mean())
When looking at my dashboard this is how it looks
I'm using pipe.start_locally, so I imagine I don't have to .wait(), right?
From the examples I figured this would appear as a scatter plot with X and Y axes and one point only... Does it avoid that?
Thanks a lot, that clarifies things
This just keeps getting better and better... 🤩
Okay, but I still want to take only a row from each artifact
Could be. My point is that, in general, the ability to attach a named scalar (without an iteration/series dimension) to an experiment is valuable and basic when you want to track a metric across different experiments
Committing that notebook with changes solved it, but I wonder why it failed
So just to be clear - the file server has nothing to do with the storage?
AgitatedDove14 I still can't get it to work... I couldn't figure out how I can change the clearml version in the runtime of the Cleanup Service, as I'm not in control of the agent that executes it
Okay Jake, so that basically means I don't have to touch any server configuration regarding the file-server on the trains server. It will simply get ignored, and all I/O initiated by clients with the right configuration will cover for that?
And then how would I register the final artifact to the pipeline? AgitatedDove14 ⬆
Is there a more elegant way to find the process to kill? Right now I'm doing pgrep -af trains, but if I have multiple agents, I will never be able to tell them apart
Maybe something similar to Docker, where I could name each of my trains agents and then refer to them by name, something like
trains-agent daemon --name agent_1 ...
Then trains-agent stop/start
I dealt with this earlier today: I set up two agents, one per GPU on a machine, and after editing configurations I wanted to restart only one of them (the other was busy working), but then I realized I didn't know which one to kill
Actually I was thinking about models that weren't trained using ClearML, like pretrained models etc.
cluster.routing.allocation.disk.watermark.low:
Or is it the same place in the config file where the docker-mode agent base image is configured?
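For context, a sketch of where the docker-mode base image is set in the agent section of clearml.conf (the image tag below is just an example, not a recommendation):

```
agent {
    default_docker {
        # default docker image used when the agent runs tasks in docker mode
        image: "nvidia/cuda:11.0-runtime-ubuntu20.04"
    }
}
```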
How do I get all children tasks given a parent?