on the host machine or inside the containers that are spinning on the host machine ?
Hi RipeGoose2
when creating a task the default path is still there
What do you mean by "PATH"? Do you want to provide a path for the config file? Is it for trains manual execution or the agent?
Hi @<1523701132025663488:profile|SlimyElephant79>
I would like to save only the last & best checkpoints and not all of them if possible.
Basically it will mimic the local file system, so if you overwrite the local files it will overwrite the remote model.
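A minimal sketch of the "last & best only" idea, assuming the training loop calls one helper per epoch. Because the same two file names are overwritten every time, only two models should end up remotely. The helper name, the file names and the `save_fn` hook are hypothetical; with PyTorch you would pass `torch.save` as `save_fn`.

```python
import pickle
from pathlib import Path


def _pickle_save(state, path):
    # Stand-in for torch.save so the sketch runs without PyTorch installed.
    with open(path, "wb") as f:
        pickle.dump(state, f)


def save_last_and_best(state, metric, best_so_far, out_dir, save_fn=_pickle_save):
    """Overwrite 'last.pt' on every call and 'best.pt' only when metric improves.

    Returns the updated best metric (higher is assumed to be better).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Always the same file name, so the previous "last" checkpoint is replaced.
    save_fn(state, str(out_dir / "last.pt"))
    if best_so_far is None or metric > best_so_far:
        save_fn(state, str(out_dir / "best.pt"))
        return metric
    return best_so_far
```

Usage would be something like `best = save_last_and_best(model_state, val_acc, best, "./checkpoints", save_fn=torch.save)` inside the epoch loop.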
You can also disable auto logging, and manually upload the models
In Task.init, pass auto_connect_frameworks with False for the specific framework
see:
[None](https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-lo...
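Roughly, the two steps could look like the sketch below. The classes are passed in as arguments only so the snippet can be read and tested without a server; in real use they would be `Task` and `OutputModel` from the `clearml` package. The project name, task name and the `"pytorch"` key are assumptions for illustration.

```python
def init_without_auto_model_logging(task_cls, project, name, framework="pytorch"):
    """Call Task.init with automatic logging switched off for one framework."""
    return task_cls.init(
        project_name=project,
        task_name=name,
        # Only the listed framework is disabled; the value False turns it off.
        auto_connect_frameworks={framework: False},
    )


def upload_model_manually(output_model_cls, task, weights_path):
    """Create an output model on the task and upload one weights file explicitly."""
    model = output_model_cls(task=task)
    model.update_weights(weights_filename=weights_path)
    return model
```

With the SDK installed this would be used as `task = init_without_auto_model_logging(Task, "examples", "manual upload")` followed by `upload_model_manually(OutputModel, task, "best.pt")`, after `from clearml import Task, OutputModel`.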
Hmm that is odd, could it be you are changing the sys.path ?
(What I'm assuming is happening is that it detects the packages in the PYTHONPATH and for some reason the order is different so it finds the "system" package before the "venv" package, hence the incorrect version)
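One quick way to check that assumption is to ask Python where it would import a package from, and compare that with the venv location. A small sketch using only the standard library (the helper name is made up):

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file a top-level package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def print_resolution(names):
    """Print the resolved location of each package, then the sys.path search order."""
    for name in names:
        print(f"{name}: {locate_package(name)}")
    for index, entry in enumerate(sys.path):
        print(index, entry)
```

If the printed location points at the system site-packages instead of the venv, the search order in sys.path is the likely culprit.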
It's working as expected for me...
That said I tested on Linux & pip,
Any specific req to test with? from the log I see this is conda on windows, are you using the base conda env or a venv inside conda?
is everything on the same network?
Are you using tensorboard or do you want to log directly to trains ?
Hi UpsetBlackbird87
This is an Optuna decision on how many concurrent tests to run simultaneously.
You limited it to 100, but remember Optuna does a Bayesian optimization process, where it decides on the best set of arguments based on the performance of the previous set, this means it will first try X trials, then decide on the next batch.
That said, you can add a pruner to Optuna specifying how it should start
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
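As a sketch, a median pruner with an explicit startup phase could be built like this. `n_startup_trials` and `n_warmup_steps` are real `MedianPruner` arguments; the values and helper names are only examples. Passing the result to the ClearML optimizer as `optuna_pruner=...` is my assumption about how the Optuna kwargs are forwarded, so please verify it against your SDK version.

```python
def median_pruner_kwargs(n_startup_trials=5, n_warmup_steps=0):
    """Collect the settings that control when pruning is allowed to begin."""
    return {
        # Pruning stays disabled until this many trials have finished.
        "n_startup_trials": n_startup_trials,
        # Pruning stays disabled inside a trial until this many steps were reported.
        "n_warmup_steps": n_warmup_steps,
    }


def build_median_pruner(**overrides):
    """Create the pruner; optuna is imported lazily so the helper above works without it."""
    import optuna

    return optuna.pruners.MedianPruner(**median_pruner_kwargs(**overrides))
```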
I look forward to your response on Github.
Great, I would like to make this discussion a bit more open and accessible so GitHub is probably better
I'd like to start contributing to the project...
That will be awesome!
Okay, I think I lost you...
DilapidatedDucks58 you mean detect at which "iteration" the max value was reported, and then extract all the other metrics for that iteration ?
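If that is the goal, the logic itself is short once the scalars are in memory. A minimal sketch, assuming a hypothetical layout of `{metric_name: {iteration: value}}` (this is not the SDK's own data format):

```python
def metrics_at_best_iteration(scalars, key):
    """Find the iteration where `key` peaked and return every metric reported at that iteration."""
    series = scalars[key]
    # Iteration whose reported value for `key` is the largest.
    best_iter = max(series, key=series.get)
    snapshot = {
        name: values[best_iter]
        for name, values in scalars.items()
        if best_iter in values  # skip metrics not reported at that iteration
    }
    return best_iter, snapshot
```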
Sadly no
(I mean you could quickly write a reader for TB and report it, but it is not built into the SDK)
I mean, can you install it with something like pip install git+ ?
Basically the agent will install main repository, and any git submodules. But it cannot install multiple repositories, as the directory structure might be too much.
wdyt?
ThickFox50 I also have to point out that there is a free hosted server here: https://app.community.clear.ml
Just one more question, do you have any idea about how I could change the x-axis label from "Iterations" to "Epochs"
You mean in the UI (i.e. just the title) ? or are you actually reporting iterations instead of epochs? and if so is this auto connected to tensorboard or is it reported manually ?
okay but still I want to take only a row of each artifact
What do you mean?
How do I get from the node to the task object?
pipeline_task = Task.get_task(task_id=Task.current_task().parent)
MuddySquid7 you mean you are creating them with TB ? or are you uploading them as debug images ?
Specifically in the ClearML UI, do you have it under "plots" tab or "debug samples" tab ?
GreasyPenguin14
In the process MyProcess other processes are created via a ProcessPoolExecutor.
Hmm that is interesting, the sub-process has an additional ProcessPoolExecutor inside it ?
GrittyKangaroo27 if you can help with reproducible code that will be great (or any insight on reproducing the issue)
quick video of the search not working
Thank you! this is very helpful, passing along to front-end guys
and ctrl-f (of the browser) doesn't work as the lines below are not loaded (even when you scroll it will remove the other lines that are not visible, so you can't ctrl-f them)
Yeah, that's because they are added lazily
TenseOstrich47 this looks like elasticsearch is out of space...
Hi PompousBeetle71 , Trains will log all the torch.save calls, I'm assuming they do not actually use it for the rest of the files in that folder.
If you like to share a code snippet we could see if we could auto-magically log it.
You could use artifacts and store the entire folder. It will zip it and upload it. Then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
` task.upload_artifact('transformer', './my_...
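To show both halves (store the folder, then reuse it from another experiment), here is a hedged sketch. The task object and the Task class are passed in so the snippet does not depend on a running server; the function names, artifact name and task id are placeholders.

```python
def store_folder(task, name, folder):
    """Upload a whole folder as a single artifact on the given task."""
    task.upload_artifact(name=name, artifact_object=folder)


def fetch_folder(task_cls, task_id, name):
    """From another experiment: look up the source task and download its artifact locally."""
    source_task = task_cls.get_task(task_id=task_id)
    return source_task.artifacts[name].get_local_copy()
```

In real use `task_cls` would be `Task` (imported from `trains`, or from `clearml` in newer versions), and `fetch_folder` returns a local path to the downloaded copy.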
Hi SkinnyPanda43
Every "commit" is a new version, so to sync changes you need to create a new version (with the previous version as its parent), and then either sync the local folder or manually add/remove files.
If you do not need to actually store the "current" version, you can just reset the Task, and sync it again.
wdyt?
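The "new version with a parent, then sync" flow could be sketched as below. `dataset_cls` would be `Dataset` from the `clearml` package; it is a parameter here only so the snippet can run without a server. The function name and argument values are illustrative.

```python
def create_child_version(dataset_cls, name, project, parent_id, local_folder):
    """Create a new dataset version on top of a parent and sync a local folder into it."""
    dataset = dataset_cls.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id],  # the previous version becomes the parent
    )
    # Registers files that were added, modified or removed relative to the parent.
    dataset.sync_folder(local_path=local_folder)
    dataset.upload()
    dataset.finalize()
    return dataset
```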
GrievingTurkey78 did you open the 8008 / 8080 / 8081 ports on your GCP instance (I have to admit I can't remember where exactly in the admin panel you do that, but I can assure you it is there :)
SarcasticSquirrel56 when the process dies (i.e. killed) it does not have time to update the state, then the server watchdog will set the state to aborted after X amount of time of inactivity (default is 2 hours)
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
You can however change the prefix, and you can always have access to these links.
Any reason for controlling the exact output destination ?
(BTW: You can manually upload via StorageManager, and then register the uploaded link)
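A rough sketch of that manual route, assuming the file is a model: upload it to the exact destination you want, then register the resulting link on the task without uploading again. `storage_cls` and `output_model_cls` stand for `StorageManager` and `OutputModel` from `clearml`; the helper name and the URL are placeholders.

```python
def upload_and_register(storage_cls, output_model_cls, task, local_file, remote_url):
    """Upload a file to an explicit destination and register that link as the task's output model."""
    # The caller controls the exact remote location, e.g. a bucket path.
    uploaded_url = storage_cls.upload_file(local_file=local_file, remote_url=remote_url)
    model = output_model_cls(task=task)
    # register_uri only records the link; nothing is uploaded a second time.
    model.update_weights(register_uri=uploaded_url)
    return uploaded_url
```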