try `Hydra/trainer.params.batch_size`
Hydra separates nesting with "."
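For example, overriding that nested Hydra parameter when cloning and enqueuing a task could look something like this (a minimal sketch; the task id, queue name and value are placeholders):
```
from clearml import Task

# placeholders: use a real task id and queue name
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="batch-size override")

# Hydra configs are exposed under the "Hydra" section, with nested keys joined by "."
cloned.set_parameter("Hydra/trainer.params.batch_size", 64)

Task.enqueue(cloned, queue_name="default")
```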
AdventurousRabbit79 you are correct, caching was introduced in v1.0. Also notice the default is no caching; you have to specify that you want caching per step.
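For example, enabling caching for a single step could look roughly like this (a sketch; project/task names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess template",
    cache_executed_step=True,  # opt in per step; caching is off by default
)

pipe.start_locally(run_pipeline_steps_locally=True)
```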
Hi DisgustedDove53
Is redis used as permanent data storage or just cache?
Mostly cache (I think)
Would there be any problems if it is restarted and comes up clean?
Pretty sure it should be fine, why do you ask ?
`Logger.current_logger()` will return the logger for the "main" Task.
The "main" Task is the task of this process, a singleton for the process.
All other instances create a Task object. You can have multiple Task objects and log different things to them, but you can only have a single "main" Task (the one created with `Task.init`).
All the auto-magic stuff is logged automatically to the "main" task.
Make sense ?
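A small sketch of the difference (project/task names are placeholders):
```
from clearml import Task, Logger

# the "main" Task - a per-process singleton created by Task.init()
main_task = Task.init(project_name="examples", task_name="main run")

# Logger.current_logger() returns the main Task's logger,
# which is also where all the auto-magic logging goes
Logger.current_logger().report_text("logged to the main Task")

# any other Task object is logged to explicitly via its own logger
other_task = Task.get_task(project_name="examples", task_name="some other run")
other_task.get_logger().report_scalar(title="loss", series="val", value=0.1, iteration=0)
```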
Hi ReassuredTiger98
However, the clearml-agent also stops working then.
you mean the clearml-agent daemon (the one that spun up the container) is crashing as well?
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the nvidia docker runtime component.
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
So sorry I missed this thread 🙂
Basically your issue is the load balancer that blocks the POST command. You can change that; just add the following line to any clearml.conf:
api.http.default_method: "put"
Since my deps are listed in the dependencies of my setup.py, I don't want clearml to list the dependencies of the current environment
Make sense 🙂
Okay, let me check regarding the "." in the venv cache.
Oh...
try to add to your config file:
sdk.http.timeout.total = 300
But do consider a sort of a designer's press kit on your page haha
That is a great idea!
Also you can use:
https://2928env351k1ylhds3wjks41-wpengine.netdna-ssl.com/wp-content/uploads/2019/11/Clear_ml_white_logo.svg
Actually this is the default for any multi-node training framework, torch DDP / openmpi, etc.
Hi @<1546665666675740672:profile|AttractiveFrog67>
- Make sure you stored the model's checkpoint (either pass `output_uri=True` in `Task.init`, or manually upload)
- When you call `Task.init`, pass `continue_last_task=True`
- Now you can do `last_checkpoint = task.models["output"][-1].get_local_copy()` and all you need is to load `last_checkpoint`
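Putting the steps together, a rough sketch (project/task names are placeholders, and the actual checkpoint loading depends on your framework):
```
from clearml import Task

# resume the previous run of this task and keep uploading checkpoints
task = Task.init(
    project_name="examples",
    task_name="training",
    continue_last_task=True,
    output_uri=True,  # store checkpoints on the files server / object storage
)

# grab the latest stored output model and get a local copy of the weights file
last_checkpoint = task.models["output"][-1].get_local_copy()

# load it with your framework of choice, e.g. (PyTorch shown as an example):
# model.load_state_dict(torch.load(last_checkpoint))
```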
this is very odd, can you post the log?
it fails but with COMPLETED status
Which Task is marked "completed" the pipeline Task or the Step ?
2021-07-11 19:17:32,822 - clearml.Task - INFO - Waiting to finish uploads
I'm assuming very large uncommitted changes 🙂
Ad 1. yes, I think this is kind of a bug. Using _task to get pipeline input values is a little bit ugly
Good point, let's fix it 🙂
a new pipeline is built from scratch (all steps etc.), but clicking "NEW RUN" in the GUI just reuses the existing pipeline. Is that correct?
Oh, I think I understand what happens: the way the pipeline logic is built is that the "DAG" is created the first time the code runs; then when you re-run the pipeline step, it serializes the DAG from the Task/backend.
Th...
Hi JitteryCoyote63
The new pipeline is almost ready for release (0.16.2),
It actually contains this exact scenario support.
Check out the example, and let me know if it fits what you are looking for:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
VexedCat68
a Dataset is published; that activates a Dataset trigger. So if I publish one dataset every day, I activate a Dataset Trigger that day once it's published.
From this description it sounds like you created a trigger cycle, am I missing something ?
Basically you can break the cycle by saying: trigger only on a new Dataset with a specific tag (or create the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "orig...
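Something along these lines (a sketch only; project names are placeholders and the TriggerScheduler argument names are from memory, so double-check them against the clearml.automation docs):
```
from clearml import Dataset
from clearml.automation import TriggerScheduler


def build_derived_dataset(dataset_id):
    # create the derived dataset in a different project and without the
    # trigger tag, so it will not re-trigger this function
    derived = Dataset.create(
        dataset_name="daily-derived",
        dataset_project="data/derived",
        parent_datasets=[dataset_id],
    )
    derived.upload()
    derived.finalize()


trigger = TriggerScheduler(pooling_frequency_minutes=5)
trigger.add_dataset_trigger(
    name="daily-dataset-trigger",
    schedule_function=build_derived_dataset,
    trigger_project="data/raw",  # only watch the original datasets' project
    trigger_on_tags=["ready"],   # and only datasets carrying this tag
)
trigger.start()
```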
Hi SteadyFox10
I'll use your version instead and put any comment if I find something.
Feel free to join the discussion 🙂 https://github.com/pytorch/ignite/issues/892
Thanks for the `output_uri`, can I put it in the `~/trains.conf` file?
Sure you can 🙂
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L152
You can add it to the trains-agent machine's conf file, and/or on your development machine. Notice that once you run ...
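For reference, a sketch of what that conf entry looks like (the bucket URI is only an example; on newer versions the file is `~/clearml.conf`):
```
sdk {
    development {
        # default destination for uploaded models / artifacts
        default_output_uri: "s3://my-bucket/models"
    }
}
```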
Hi MiniatureCrocodile39
I would personally recommend the ClearML show 🙂
https://www.youtube.com/watch?v=XpXLMKhnV5k
https://www.youtube.com/watch?v=qz9x7fTQZZ8
I do it to get project name
you can still get it from the task object (even after closing it)
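For example (names are placeholders), the project name stays readable from the Task object even after closing it:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="demo")
print(task.get_project_name())  # available while the task is running

task.close()
print(task.get_project_name())  # still available from the object after close()
```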
another place I was using it was to see if I am in a pipeline task
Yes, that makes sense, this is one of the use cases (to get access to the Task that is currently running). The bug itself will only happen after closing the Task (it needs to clear the OS variable).
You can either upgrade to 1.0.6rc2 or you can hack/fix it with:
` os.environ.pop('CLEARML_PROC_MASTER_ID', None)
os.envi...
GiganticTurtle0 BTW, this mock example worked out of the box (python 3.6 on Ubuntu):
` from typing import Any, Dict, List, Tuple, Union
from clearml import Task
from dask.distributed import Client, LocalCluster
def start_dask_client(
n_workers: int = None, threads_per_worker: int = None, memory_limit: str = "2Gb"
) -> Client:
cluster = LocalCluster(
n_workers=n_workers,
threads_per_worker=threads_per_worker,
memory_limit=memory_limit,
)
client = Cli...
the unclear part is how do I sample another point in the optimization space from the optimizer
Just so I'm clear on the issue: do you want multiple machines to access the internals of the optimizer class? Or do you just want a way to understand what the optimizer's sampling space is (i.e. the parameters and the options per parameter)?
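If it is the latter, the sampling space is simply the list of parameter ranges you hand to the optimizer. A rough sketch (task id, metric names and parameter names are placeholders):
```
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    UniformParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",  # placeholder
    hyper_parameters=[
        # the sampling space: each parameter with its range / allowed values
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
)
```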
Thank you, I would love to make sure we fix it
Is this reproducible? I tried to run the same example code on my machine, and it started training ...
Do you have issues with other PyTorch examples? Could you try the simple reporting example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py