I mean, the python package, not the trains-server version
Logger.current_logger()
Will return the logger for the "main" Task.
The "Main" task is the task of this process, a singleton for the process.
All other instances create a Task object. You can have multiple Task objects and log different things to them, but you can only have a single "main" Task (the one created with Task.init).
All the auto-magic stuff is logged automatically to the "main" task.
Make sense ?
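For illustration, a minimal sketch of the "main" Task vs. additional Task objects (project/task names are placeholders, and Task.create is just one way to get a second Task):
```python
from clearml import Task, Logger

# The "main" Task: a per-process singleton created by Task.init()
task = Task.init(project_name='examples', task_name='main task')

# Logger.current_logger() always returns the main Task's logger
Logger.current_logger().report_text('logged to the main task')

# Any other Task object has its own logger you report to explicitly
other = Task.create(project_name='examples', task_name='side task')
other.mark_started()  # reporting requires the task to be started
other.get_logger().report_text('logged to the side task')
```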
Should work out of the box, as long as the task was started. You can forcefully start the task with: `task.mark_started()`
maybe you can also check `--version`, which returns the help menu
What do you mean? `--version` on clearml-task?
Ex:
```
K8S Glue pods monitor: Failed parsing kubectl output:
Expecting value: line 1 column 1 (char 0)
```
Run with --debug as the first parameter
Are you running the latest from the git repo ?
I have a client that runs clearml-session and I saw from the agent's logs that the installation of vscode fails.
That makes sense, it downloads vscode at runtime. Do you have an alternative location? Or maybe it is easier to build a container with vscode pre-installed?
Notice this is only when:
- using Conda as the package manager in the agent
- the requested python version is already installed (multiple python version installations on the same machine/container are supported)
Otherwise you can specify the python version to be used and conda will install it
clearml does that automatically (albeit it is not shown in the UI, which should be fixed)
We use NIfTI images; besides the 3D array, the image also contains voxel spacing, origin and direction in a world frame
Yep, makes sense ... you can just upload them as debug samples from local files.
I guess the main difference is the context, debug samples (used for debugging) vs artifacts (might be useful from other Tasks / context)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L36
BTW: GreasyPenguin14 you can also upload them as debug samples (when setting the output_uri, the debug samples will be uploaded to the same destination)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L21
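For example, a minimal sketch of reporting a locally saved image as a debug sample (the output_uri destination and file names are placeholders):
```python
from clearml import Task, Logger

# with output_uri set, debug samples are uploaded to the same destination
task = Task.init(project_name='examples', task_name='nifti debug samples',
                 output_uri='s3://my-bucket/clearml')  # placeholder destination

# report a local image file as a debug sample
Logger.current_logger().report_image(
    title='predictions',
    series='case_001',          # placeholder series name
    iteration=0,
    local_path='case_001.png')  # placeholder local file
```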
Is it possible to get the folder with the artifacts/models?
You can directly get the artifact/model URL and deduce the folder:
```python
task = Task.get_task('my_task_id')
print(task.artifacts['my artifact'].url)
```
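As a hedged sketch of deducing the folder from that URL (task id and artifact name are placeholders):
```python
import os
from clearml import Task

task = Task.get_task('my_task_id')           # placeholder task id
url = task.artifacts['my artifact'].url      # remote location of the artifact
print(os.path.dirname(url))                  # strip the file name -> folder

# or download a local copy and take its parent folder
local_path = task.artifacts['my artifact'].get_local_copy()
print(os.path.dirname(local_path))
```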
We are using k8s glue to spawn the job. ...
I think this is actual network latency, nothing to do with the jobs, could it be the server is very far away?
What happens when you manually start a Task from your machine ?
Is the latency fixed? Is it just when starting a new Task?
Hi SubstantialElk6
32 CPU cores, 64GB ram
Should be plenty, this sounds like a network bottleneck issue, I can't imagine the server is actually CPU bound
We have tried to manually restart tasks, reloading all the scalars from a dead task and loading the latest saved torch model.
Hi ThickKitten19
How did you try to restart them? How are you monitoring dying instances? Where / how are they running?
How does ClearML select the reference branch? Could it be that ClearML only checks the "origin" branch?
Yes 😞 I think we can quickly fix that, I'm just trying to figure out if there are downsides to running `git ls-remote --get-url` without "origin"
ElegantCoyote26 could be, if the Task run is under 30sec?!
Hi FancyWhale93
`pipe.start()` should actually stop the local pipeline logic execution and fire it on the "services" queue.
The idea is that you can launch the pipeline locally, but the actual execution of the entire logic is remote.
You can have the pipeline logic running locally if you call `pipe.start_locally()`,
or also run the steps locally (as sub-processes) with `pipe.start_locally(run_pipeline_steps_locally=True)`
BTW: based on your example, a more intuitive code might be the pi...
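For reference, a hedged sketch of the three launch modes (pipeline name/project are placeholders, steps omitted):
```python
from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
# ... add steps with pipe.add_step(...) or pipe.add_function_step(...)

# 1. fire the whole pipeline logic on the "services" queue:
# pipe.start()

# 2. run the pipeline logic locally, steps still executed by agents:
# pipe.start_locally()

# 3. run everything locally, steps as local sub-processes:
pipe.start_locally(run_pipeline_steps_locally=True)
```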
Hi GiddyTurkey39,
When you say trains agent, are you referring to the trains agent command ...
I mean running the `trains-agent daemon` on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in a virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
EnviousStarfish54 you can use `Task.set_credentials`
Notice that OS environment or trains.conf will override the programmatic credentials
https://allegro.ai/docs/task.html#trains.task.Task.set_credentials
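A minimal sketch, with placeholder server addresses and keys:
```python
from trains import Task

# must be called before Task.init(); note that OS environment variables
# or trains.conf values will override these programmatic credentials
Task.set_credentials(
    api_host='https://api.trains.example.com',    # placeholder hosts
    web_host='https://app.trains.example.com',
    files_host='https://files.trains.example.com',
    key='YOUR_ACCESS_KEY',                        # placeholder credentials
    secret='YOUR_SECRET_KEY',
)
task = Task.init(project_name='examples', task_name='credentials demo')
```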
Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model
BTW:
To manually register any model:
```python
from trains import Task, OutputModel

task = Task.init('examples', 'my model')
OutputModel().update_weights('my_best_model.h5')
```
EnviousStarfish54 Sure, see scatter2d
https://allegro.ai/docs/examples/reporting/scatter_hist_confusion_mat_reporting/#2d-scatter-plots
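Along the lines of that example, a minimal sketch (random data as a stand-in):
```python
import numpy as np
from trains import Task

task = Task.init(project_name='examples', task_name='2d scatter')

# report 10 random (x, y) points as a 2D scatter plot
scatter = np.random.randint(10, size=(10, 2))
task.get_logger().report_scatter2d(
    title='example_scatter', series='series_xy', iteration=0,
    scatter=scatter, xaxis='x', yaxis='y', mode='markers')
```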
WackyRabbit7
we did execute locally
Sure, instead of `pipe.start()` use `pipe.start_locally(run_pipeline_steps_locally=False)`, this is it 🙂
It is for storing the predictions a trained model makes, so two different models do create slightly different images
That actually makes sense.
So how would you create exactly the same file (i.e. why do you need to manually control the upload folder, wouldn't creating a new unique folder suffice ?)
It is not possible to specify the full output destination, right?
Correct 😞
Hi ScatteredClams84
Is there any parameter that adjusts the "number of files that can be stored in the cache"? I am using clearml python version 1.0.3 to upload artifacts and get the artifacts back from a task.
Yes you are correct, the default value is 100 entries.
You can configure it in the clearml.conf, just add:
```
sdk.storage.cache.default_cache_manager_size = 1000
```
or from code:
```python
from clearml.storage.cache import CacheManager

CacheManager.get_cache_manager(cache_file_limit=1000)
```
Hi SmallDeer34
Generally, any torch.save(...) is logged/uploaded by clearml automatically. Specifically in your case I think the only missing one is the trainer_state.json, which I assume is a general json file, and I imagine is part of the huggingface framework. You can easily upload it as an additional artifact with `Task.upload_artifact`
wdyt?
Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json every checkpoint:
```python
Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json')
```
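As a hedged sketch, one way to do that on every checkpoint (the hook itself is hypothetical; huggingface's Trainer writes trainer_state.json into each checkpoint folder):
```python
import os
from clearml import Task

def upload_trainer_state(checkpoint_dir):
    # hypothetical helper: call whenever a checkpoint is written
    state_file = os.path.join(checkpoint_dir, 'trainer_state.json')
    if os.path.isfile(state_file):
        Task.current_task().upload_artifact(
            name='state', artifact_object=state_file)
```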