Hi UptightBeetle98
The hyperparameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (where they are now, aka pending), set up the environment for each job (venv or docker+venv), and execute the job with the specific arguments the optimizer chose.
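For context, this is roughly what the optimizer side looks like (a sketch; the base task ID, parameter names, and queue name are placeholders):
```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange, HyperParameterOptimizer, UniformParameterRange,
)

# the controller itself is a Task, so it can also be enqueued and run by an agent
task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the experiment to clone for every trial
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    execution_queue="default",  # the queue your agents are listening on
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```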
Make sense?
GiganticTurtle0 is it just --stop that throws this error?
btw: if you add --queue default to the command line I assume it will work. The thing is, without --queue it will look for any queue with the "default" tag on it, and since there are none, we get the error.
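For example, stopping the daemon serving the default queue (assuming that is the one you started):
```
clearml-agent daemon --queue default --stop
```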
Regardless, that should not happen with --stop; I will make sure we fix it.
Just so we do not forget, can you please open an issue on clearml-agent github ?
I get what you're saying. The only problem is that in the case of auto-logging, I don't have the model ID for the model being saved.
Task.models['output'] should return all the model objects the autologging created
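Something along these lines (a sketch; the task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")
# 'output' holds the Model objects created by the auto-logging
for model in task.models["output"]:
    print(model.id, model.name, model.url)
```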
why not let the user start with an empty comparison page and add them from "Add Experiment" button as well?
Apologies, I was not clear. Yes I'm with you, this is a great idea 🙂
I'm assuming you are building for x86
CleanWhale17 nice ... 🙂
So the answer is: Trains supports the pipeline/automation part of it, but lacks the dataset integration (that is basically up to you to manage, with either artifacts or any other method).
Allegro Enterprise allows you to rerun the code on a new version of the dataset, from the UI (or automation), without changing a single line of code 🙂
The second run prints out the same (non) "random" numbers as the first run
ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do:
```python
import random
import time

random.seed(time.time())
```
Woo, what a doozy.
yeah those "broken" pip versions are making our life hard ...
so for example, if there is an idle GPU and Q3 takes it, and then a task comes to Q2, for which we specified 3 GPUs, but Q3 has already taken some of those GPUs, what will happen?
This is a standard "race": the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure the enterprise edition has preemption support, but this is not currently part of the open source version (btw: the dynamic GPU allocation is also, I think, part of the enterprise tier; in the open source ...
Hi BrightGoat74
So merging general-purpose Plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter(...) the UI will merge the ROC curves into the same graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
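Roughly like this (a sketch; the fpr/tpr arrays stand in for your real ROC data):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="ROC reporting")
logger = task.get_logger()

fpr = np.linspace(0, 1, 50)  # placeholder false-positive rates
tpr = np.sqrt(fpr)           # placeholder true-positive rates
logger.report_scatter2d(
    title="ROC", series="model_a", iteration=0,
    scatter=np.vstack([fpr, tpr]).T,  # (N, 2) array of x,y points
    xaxis="FPR", yaxis="TPR", mode="lines",
)
# a second report_scatter2d with the same title and a different series
# lands on the same graph
```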
Run clearml-agent and enqueue the pipeline? What am I missing?
BTW: CloudyHamster42 I think this issue was discussed on GitHub, and the final "verdict" was we should have an option to split/combine graphs on the UI side (i.e. similar to the "smoothing" or wall-time axis etc.)
Is it possible to make a connection to a S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machines?
Hi IrritableJellyfish76
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
task_name (str) – The full name or partial name of the Tasks to match within the specified project_name (or all projects if project_name is None). This method supports regular expressions for name matching. (Optional)
You are right, this is a bit confusing; I will make sure that we add an example to the docstring.
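Something like this (a sketch; the project name and name pattern are placeholders):
```python
from clearml import Task

tasks = Task.get_tasks(
    project_name="examples",
    task_name="^train_.*",  # full/partial names and regular expressions match
)
for t in tasks:
    print(t.id, t.name)
```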
SoggyFrog26 you'll have it in the next RC 🙂
Not sure what the plan is; I know one should be out today/tomorrow, worst case on the next one 🙂
VexedCat68 actually a few users already suggested we auto log the dataset ID used as an additional configuration section, wdyt?
the separate experiments are not starting back at iteration 0
What do you mean by that?
The Docker container crashes and I want to be able to debug it exactly as it is run by the agent
On your machine (any machine)
```
pip install clearml-agent
clearml-agent build --id <taskID> --docker "local_mydocker_name"
docker run -it local_mydocker_name bash
```
Actually it is better to leave it as is; it will just automatically mount the .ssh folder into the container. I will make sure the docs point to this option first.
RoundMosquito25 how is that possible? Could it be they are connected to a different server?
CloudyHamster42 you mean that when you set sdk.metrics.tensorboard_single_series_per_graph to True and you rerun the experiment, you are still getting multiple series on the same graph?
What's your Trains version?
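For reference, the setting lives in clearml.conf (a sketch, assuming the standard file layout):
```
sdk {
  metrics {
    tensorboard_single_series_per_graph: true
  }
}
```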
Also, on the ClearML dashboard, I can see the clearml-agent log:
Is the clearml-agent running in docker mode?
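(i.e. the daemon was started with the --docker flag, something like:)
```
clearml-agent daemon --queue default --docker
```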
Ohh I see, so basically the ASG should check whether the agent is idle, rather than whether the Task is running?
CooperativeFox72 could you expand on "not working"?
If you have a yaml file, I would do:
```python
import yaml

# local_path = './my_config.yaml'
path = task.connect_configuration(local_path, name=name)
if task.running_locally():
    with open(local_path, "r") as config_file:
        my_params_dict = yaml.load(config_file, Loader=yaml.FullLoader)
    my_params_dict['change_me'] = 'new value'
    my_params_text = yaml.dump(my_params_dict)
    # store back the change, my_params assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params_text)
```
The default cleanup service should work with S3, given a correctly configured clearml services agent, if I understand the workings correctly.
Yes I think you are correct
I am referring to the UI.
In that case, no. This is actually a backend server change (from the UI it should be relatively simple). Is this somehow a showstopper?
HighOtter69
Could you test with the latest RC? I think this fixed it:
https://github.com/allegroai/clearml/issues/306
Hi CheerfulGorilla72
the "installed packages" section is used as "requirements.txt for the agent.
Are you saying the autodetection fails to detect all packages? You can specify in "manual execution" (i.e not when the agent is running the code), to just take the requirements.txt locally:` Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
notice the above call should be executed Before Task.init
task = Task.init(...) `3. If you clear all the "installed packages" se...
Hi ThickDove42 ,
Yes, but by the time you will be able to access it, it will be in display form (plotly), not very convenient.
If this is something you need to re-use, I would argue that it is an artifact and should be stored as an artifact (then accessing it is transparent); obviously you can both report it as a table and upload it as an artifact, no harm in that.
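For example (a sketch; the DataFrame is a stand-in for your table):
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="table demo")
df = pd.DataFrame({"epoch": [1, 2], "acc": [0.80, 0.90]})

# display form in the UI (plotly table)
task.get_logger().report_table(title="results", series="val",
                               iteration=0, table_plot=df)
# re-usable form: stored as an artifact, transparent to load back
task.upload_artifact(name="results", artifact_object=df)
```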
what do you think?
yup! that's what I was wondering if you'd help me find a way to change the timings of. Is there an option I can override to make the retry more aggressive?
you mean wait for less?
add to your clearml.conf:
```
api.http.retries.backoff_factor = 0.1
```