Hi SillySealion58
"keep N best checkpoints" logic in my training loop.
If this is the use case, may I suggest overwriting them locally? (the same will happen on the remote storage) This is exactly how the Lightning / Ignite feature is implemented
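e.g. a rough sketch of the overwrite approach (assuming PyTorch; function and file names are just placeholders):
import torch

# keep k "best" checkpoints by always writing to the same k filenames,
# so both the local and the remote copies get overwritten in place
def save_if_top_k(model, score, best_scores, k=3):
    # best_scores: list of (score, slot) tuples, sorted best-first
    if len(best_scores) < k:
        slot = len(best_scores)
    elif score > best_scores[-1][0]:
        slot = best_scores.pop()[1]  # reuse the worst checkpoint's slot
    else:
        return
    torch.save(model.state_dict(), f"best_checkpoint_{slot}.pt")  # same path -> overwrite
    best_scores.append((score, slot))
    best_scores.sort(key=lambda s: s[0], reverse=True)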
BTW: you should probably update the server, you're missing out on a lot of cool features 🙂
DilapidatedDucks58 use a full link, without the package name, i.e. just the git+ URL
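e.g. (repository URL here is hypothetical):
git+https://github.com/<org>/<repo>.git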
I was wondering what is the use of PipelineController.create_draft if you can't use it to clone and run tasks, as we have seen
I think the initial thought was to allow creating a pipeline from a pipeline programmatically. Then once you have the "pipeline" you can manually enqueue it and modify it. Think of a pipeline constructing other pipelines in flight based on some logic, then launching them in parallel.
make sense ?
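Something along these lines (a rough sketch; project/task names are placeholders, and the way you fetch the draft's Task may differ between SDK versions):
from clearml import Task
from clearml.automation import PipelineController

# build a pipeline programmatically, but only store it as a draft (don't run it)
pipe = PipelineController(name="generated-pipeline", project="examples", version="1.0")
pipe.add_step(
    name="step_one",
    base_task_project="examples",    # placeholder
    base_task_name="my base task",   # placeholder: existing Task to clone
)
pipe.create_draft()  # creates the draft pipeline Task instead of executing it

# later, enqueue the draft manually (lookup is illustrative; adjust to your project layout)
draft = Task.get_task(project_name="examples", task_name="generated-pipeline")
Task.enqueue(draft, queue_name="services")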
Hi SmugTurtle78
Unfortunately there is no actual filtering for these logs, because they are so important for debugging and visibility. I have to ask, what's the use case to remove some of them ?
VirtuousFish83 I can confirm clearml-server 1.3 solves the issue.
And maybe adding idle time spent without a job to API is not that a bad idea 😉
yes, adding that to the feature list 🙂
What if I write the last active state in an instance tag? This could be a solution…
I love this hack, yes this should just work.
BTW: if your lambda is a for loop that is constantly checking, there is no need to actually store the "last idle timestamp check" as a tag, no?
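Just in case it helps, setting the tag from the lambda is a one-liner with boto3 (instance id and tag key are placeholders):
import time
import boto3

ec2 = boto3.client("ec2")
# store the last-idle timestamp on the instance itself
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],  # placeholder instance id
    Tags=[{"Key": "last_idle_check", "Value": str(int(time.time()))}],
)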
I found the issue, on the first run it jumps over the first day (let me check if we can quickly fix that)
JitteryCoyote63 could you test the latest RC 😉
pip install clearml-agent==0.17.2rc4
Hi PanickyMoth78
dataset name is ignored if use_current_task=True
Kind of. It stores the Dataset on the Task itself (then dataset.name becomes the Task name). Actually we should probably deprecate this feature, I think it is too confusing?!
What was the use case for using it ?
What if I register the artifact manually?
task.upload_artifact('local folder', artifact_object='...')
This one should be quite quick, it's updating the experiment
ElegantKangaroo44 my bad 😞 I missed the nuance in the description
There seems to be an issue in the web ui -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the full-view selection. It's on the to-do list to add this ability, but it is still not implemented...
I ended up using task = Task.init(continue_last_task=task_id) to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument it will just create a new Task and auto-log everything to it 🙂
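i.e. (this is basically what you wrote):
from clearml import Task

task_id = "<existing task id>"
# continue auto-logging into the existing Task
task = Task.init(continue_last_task=task_id)
# without continue_last_task, Task.init(...) would create a new Task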
when you are running the n+1 epoch you get the 2*n+1 reported
RipeGoose2 like twice the gap, i.e. internally it adds an offset of the last iteration... is this easily reproducible ?
I'd prefer to use config_dict, I think it's cleaner
I'm definitely with you
Good news:
new best_model is saved, add a tag best,
Already supported (you just can't see the tag, but it is there :))
My question is, what do you think would be the easiest interface to tell (post/pre) store, tag/mark this model as best so far (btw, obviously if we know it's not good, why do we bother to store it in the first place...)
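For reference, something like this should already work for the manual case (a rough sketch; the model lookup is illustrative):
from clearml import Task

task = Task.current_task()
# grab the most recently stored output model and mark it
latest_model = task.models["output"][-1]
latest_model.tags = (latest_model.tags or []) + ["best"]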
OutrageousSheep60
I found the task in the UI - and in the UNCOMMITTED CHANGES execution section there is "No changes logged"
This is the issue.
and then run the session via docker
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "tensorflow>=2.2" "keras" \
  --queue MY_QUEUE \
  --verbose
Are you running the "clearml-session" from your machine (i.e. not from inside a docker)?...
SubstantialElk6 I know they have full permission control in the enterprise edition, if this is something you need I suggest you contact http://allegro.ai 🙂
Hi @<1533257278776414208:profile|SuperiorCockroach75>
ModuleNotFoundError: No module named 'statsmodels'
seems like this package is missing from the Task
Either import it manually, i.e. import statsmodels (so the automagic logs it)
Or add before task init:
Task.add_requirements("statsmodels")
task = Task.init(...)
ps: no need to @ so many people ...
SmarmyDolphin68 What's the matplotlib version ? and python version?
JitteryCoyote63 Not sure how/why the X-Pack feature was on (it is not used by the system), but you can disable it with an environment variable in the docker-compose:
xpack.security.enabled=false
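i.e. something like this fragment in the docker-compose (assuming the standard clearml-server layout with an elasticsearch service):
elasticsearch:
  environment:
    - xpack.security.enabled=false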
Should solve the problem ...
And when exactly are you getting the "user aborted" message?
How do you start the process (are you manually running it, or is it an agent, or maybe pycharm?)
Can you provide the full log ?
Hi @<1523701304709353472:profile|OddShrimp85>
Do you mean Dataset.get_local_copy()?
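If so, something like (project/dataset names are placeholders):
from clearml import Dataset

# fetch a cached, read-only copy of the dataset on the local machine
ds = Dataset.get(dataset_project="examples", dataset_name="my dataset")
local_path = ds.get_local_copy()
print(local_path)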
Hi @<1635088270469632000:profile|LividReindeer58>
You mean the clearml.conf?
You can do:
from clearml.config import config_obj
you should have the entire configuration file as an object (dict interface)
fyi: under the hood it uses pyHOCON
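e.g. (the key is just an example; pyHOCON supports dotted-path lookups):
from clearml.config import config_obj

# dotted-path lookup into the parsed clearml.conf
web_server = config_obj.get("api.web_server", None)
print(web_server)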
Check the examples on the github page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
Hmm GreasyLeopard35 can you specify the range you are passing to the HPO, as well as the type of optimization class ? (grid/random/optuna etc.)
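For reference, this is roughly where both come in (a sketch; ids and metric names are placeholders):
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna  # or GridSearch / RandomSearch

optimizer = HyperParameterOptimizer(
    base_task_id="<template task id>",    # placeholder
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",  # placeholder metric
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",
)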
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
trains-agent should be deployed to GPU instances, not the trains-server.
The trains-agent's purpose is to let you send jobs to a GPU instance (at least in most cases).
The "trains-server" is the control plane, basically telling the agent what to run (by storing the execution queue and tasks). Make sense ?
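e.g. on the GPU machine (roughly as in the readme; the queue name is just an example):
pip install trains-agent
trains-agent daemon --queue default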
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console ?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
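If the process exits before everything is sent, you can also force a flush manually (a sketch, not necessarily your issue):
from clearml import Task

task = Task.current_task()
# block until all outstanding reports/uploads are sent
task.flush(wait_for_uploads=True)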
Hi FunnyTurkey96
what's the clearml server you are using ?