If you have the checkpoint (see output_uri for automatically uploading it) then you can always load it. Do you mean whether you can change the iteration/step counter? Or do you mean with trains-agent?
Actually, this is the default for any multi-node training framework (torch DDP / openmpi etc.)
to setup ClearML agent in kubernetes with the SSH keys?
You can add an env variable: `CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL="true"`
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#dynamic-environment-variables
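To make the naming of that variable less magical, here is a minimal sketch of how a dynamic agent environment variable name maps to a configuration path. It assumes the convention described in the linked docs (a `CLEARML_AGENT__` prefix, with `__` separating the path elements); the helper `env_var_to_config_path` is a hypothetical name used for illustration only, not part of clearml-agent.

```python
# Assumption: dynamic agent env vars are "CLEARML_AGENT__" + the config path,
# with "__" standing in for the "." between path elements.
AGENT_ENV_PREFIX = "CLEARML_AGENT__"


def env_var_to_config_path(env_name: str) -> str:
    """Translate a dynamic env var name into the config path it overrides."""
    if not env_name.startswith(AGENT_ENV_PREFIX):
        raise ValueError(f"{env_name!r} does not start with {AGENT_ENV_PREFIX!r}")
    # Drop the prefix, split on the double underscore, lowercase each element.
    parts = env_name[len(AGENT_ENV_PREFIX):].split("__")
    return ".".join(part.lower() for part in parts)


print(env_var_to_config_path("CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL"))
# → agent.force_git_ssh_protocol
```

So the variable above should be equivalent to setting `agent.force_git_ssh_protocol: true` in the agent's configuration file.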
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually 3D histograms (i.e. 2D histograms over time, which would be the default for kernel/bias etc.)
is there a way to ungroup the result by iteration, and, is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side).
Can you provide a toy example...
Hi JumpyPig73
import data from old experiments into the dashboard.
what do you mean by "old experiments" ?
Hmm EmbarrassedPeacock82
Let's try with `--input-size -1 60 1 --aux-config input.format=FORMAT_NCHW`
BTW: this seems like a Triton LSTM configuration issue, we might want to move the discussion to the Triton server issue, wdyt?
`Optional[Sequence[Union[str, Dataset]]]`: None, a list of strings, or a list of Dataset objects (each one is a parent; multiple parents are supported)
No, Task.create is for creating an external Task, not for logging your own process.
That said you can probably override the git repo with env vars:
None
Okay we have located the issue, thanks guys! We will push a patch release hopefully later today
Here you go:
```
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def pipeline(...):
    return

if __name__ == '__main__':
    # Use the given requirements file as the pipeline's package list
    Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
    pipeline(...)
```
If you need anything for the pipeline component you can do:
```
@PipelineDecorator.component(packages="./requirements.txt")
def step(data):
    ...  # some stuff
```
I cannot reproduce, tested with the same matplotlib version and python against the community server
Shouldn't this be a real value and not a template
you mean value being pulled to the pod that failed ?
Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company at execution time.
What would be the value you need to override ? and what is the use case?
QuaintJellyfish58 this is very odd, and the "undefined" is always marked as example?
If the same happens in venv mode, see if the pip process actually finished (you can find it with `ps -Af | grep pip`)
Hi FiercePenguin76
Artifacts are as you mentioned: you can create as many as you like, but in the end there is no "versioning" on top. It can easily be used this way with name+counter.
In contrast, Models do let you create multiple entries with the same name, and the version is implied by order. Wdyt?
Hmm this is odd indeed, let me verify (thanks! @<1643060801088524288:profile|HarebrainedOstrich43> )
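A minimal sketch of the name+counter idea for artifacts, in case it helps. The helper `next_artifact_name` and the `<base>_<counter>` naming scheme are assumptions made up for illustration, not something ClearML provides or enforces.

```python
import re
from typing import Iterable


def next_artifact_name(base: str, existing_names: Iterable[str]) -> str:
    """Return "<base>_<n>" where n is one past the highest counter already used."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    counters = [
        int(match.group(1))
        for name in existing_names
        if (match := pattern.match(name))
    ]
    next_counter = max(counters) + 1 if counters else 0
    return f"{base}_{next_counter}"


print(next_artifact_name("dataset", ["dataset_0", "dataset_1", "other_5"]))
# → dataset_2
```

With a live Task you would presumably pass the names of the artifacts already registered on it and then call something like `task.upload_artifact(name=next_artifact_name("dataset", task.artifacts.keys()), artifact_object=my_object)`, where `my_object` is a placeholder for whatever you want to store.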
One more question, in the second log, trains agent is configured with Conda, on the first it is configured with pip, or at least this is what it looks like, can you confirm?
WickedGoat98 Basically you have two options:
1. Build a docker image with wget installed, then in the UI specify this image as "Base Docker Image".
2. Configure the trains.conf file on the machine running the trains-agent, with the above script. This will cause trains-agent to install wget on any container it is running, so it is available for you to use (saving you the trouble of building your own container).

With any of these two, by the time your code is executed, wget is installed an...
how I can turn off git diff uploading?
Sure, see here
None
yep, that's the reason it is failing, how did you train the model itself ?
CurvedHedgehog15 is it plots or scalars you are after ?
Thank you so much @<1572395184505753600:profile|GleamingSeagull15> !
looks like your faq.clear.ml site is missing from your main sites sitemap files,
Thank you for noticing! I'll check with the webdevs
Also missing the robots meta tag on that site,
🙏
Last tip is to add a link on the faq.clear.ml site back to clear.ml for search index relevancy ( connects the two sites as being related in content...
Firstly, thank you for your efforts and your support.
Thanks SmugOx94 !
Are you running trains-agent in docker mode? The aforementioned scripts are executed before the experiment is cloned; they are meant to be a part of the docker setup, not a per-experiment script.
You could try to edit the experiment and have:
Working Directory: "."
(that means the root of the repository)
Script Path: "experiments_that_uses_library/train.py"
This will make sure you can do "import l...
there is almost zero overhead if your docker container already has everything (including the agent) preinstalled and you set it with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it then should basically just run the code.
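As a rough sketch of what that setup could look like when launching the agent from Python: the snippet below only builds the command and environment, it does not start anything. The queue name and image name are placeholders, and `build_agent_launch` is a hypothetical helper, so treat this as one possible arrangement rather than the required one.

```python
import os
from typing import Dict, List, Tuple


def build_agent_launch(queue: str, docker_image: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the command line and environment for an agent that skips env setup."""
    env = dict(os.environ)
    # Tell the agent to reuse the Python environment already inside the container.
    env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--docker", docker_image]
    return cmd, env


# Placeholder queue and image names, for illustration only.
cmd, env = build_agent_launch("default", "my-registry/prebuilt-image:latest")
print(" ".join(cmd))
# To actually start the agent: subprocess.run(cmd, env=env, check=True)
```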
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
If you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a filename does the same thing, but this is more readable)