Sorry, I mean a vault on the clearml-server holding the credentials per user, then agent pulls it based on the user, and it is transparent from the user perspective
…every user in the server has the same credentials, and they don't need to know them... makes sense?
Makes sense, single credentials for everyone, without the need to distribute them
Is that correct?
and I found our lab seems to only have a shared user file, because I installed trains on one node but it doesn't appear on the others
Do you mean there is no shared filesystem among the different machines ?
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
Tensorboard data is stored wherever you point your file-writer to 🙂
What trains is doing is: while tensorboard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
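For example, a minimal sketch of that flow (assuming PyTorch's SummaryWriter; the project/task names are placeholders):
```
from trains import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb-demo")

# TB events are written locally to ./runs, while the same values are
# captured in-flight and stored on the trains-server
writer = SummaryWriter(log_dir="./runs")
writer.add_scalar("loss", 0.5, 0)
writer.close()
```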
and you have clearml v0.17.2 installed on the "system" packages level, and 0.17.5rc6 installed inside the pyenv venv ?
Here you go:
```
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def pipeline(...):
    return

if __name__ == '__main__':
    Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
    pipeline(...)
```
If you need anything for the pipeline component you can do:
```
@PipelineDecorator.component(packages="./requirements.txt")
def step(data):
    # some stuff
```
@<1545216070686609408:profile|EnthusiasticCow4> git+ssh:// will be converted automatically to git+https
if you have user/pass configured in your clearml.conf on the agent machine.
Moreover, git packages are always installed after all other packages are installed (because pip cannot resolve the requirements inside the git repo in time)
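For reference, a minimal sketch of the relevant section in the agent's clearml.conf (the credential values are placeholders):
```
agent {
    # used to convert git+ssh:// links to git+https:// at install time
    git_user: "my_git_user"
    git_pass: "my_git_token"
}
```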
ahh, because task_id is the "real" id of a task
Yes the ID is a global system wide unique ID (regardless of the project etc.)
Maybe we will call tasks as
slug_yyyymmdd
Notice that you can just copy-paste the link in the address bar, it will bring you to the exact same view, meaning it is easily shared among users 🙂 You can, but I would actually use the Task ID. This also means that programmatically you can do task = Task.get_task(task_id_here)
and interact and query a...
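For example, a minimal sketch of querying a task programmatically (the ID is a placeholder):
```
from clearml import Task

# fetch an existing task by its globally unique ID
task = Task.get_task(task_id="aabbccddeeff00112233445566778899")
print(task.name, task.get_parameters())    # inspect name and hyper-parameters
print(task.get_last_scalar_metrics())      # query the last reported scalars
```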
Hi JuicyDog96
The easiest way at the moment (apologies for the still-lacking RestAPI documentation, it is coming 🙂)
is actually the code (full docstring doc)
https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
You can access it all with an easy Pythonic interface, for example:
```
from trains.backend_api.session.client import APIClient

client = APIClient()
tasks = client.tasks.get_all()
```
I mean you can run it with kubeflow, but it kind of ruins the auto detection there
You can however clone and manually edit it back to your code, that would work
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
I'm not sure I follow, could you expand ?
Yes, but does add_external_files make chunked zips like add_files does?
No, it references them (i.e. meta-data, not actually doing anything with the files themselves)
I need the zipping, chunking to manage millions of files
That makes sense, if that's the case you will have to download those files anyway, and then add them with add_files
you can use the StorageManager to download them, and then add them from the local copy (this will zip/chunk them)
[None](https://clear.ml/docs/la...
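A minimal sketch of that flow (the remote URL and dataset names are placeholders):
```
from clearml import Dataset, StorageManager

# download a local copy of the remote files
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/my-data.zip")

# add the local copy to a dataset so the files get zipped/chunked and versioned
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
dataset.add_files(path=local_path)
dataset.upload()
dataset.finalize()
```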
Hi @<1523702786867335168:profile|AdventurousButterfly15>
Make sure you pass output_uri=True in Task.init
It will automatically upload your model to the file server. You can also configure it in the clearml.conf, look for default_output_uri
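For example, a minimal sketch (project/task names are placeholders):
```
from clearml import Task

# output_uri=True uploads saved models/checkpoints to the default file server
task = Task.init(project_name="examples", task_name="train-model", output_uri=True)
```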
Hi SlimyElephant79
As you can imagine, wandb's tracking code would be present across the code modules and I was hoping for a structured approach that would help me transition to ClearMLs experiment tracking.
Do you guys have a layer in between that does the reporting, or is the codebase riddled with direct reporting calls? If the latter, then I guess search and replace? Or maybe a module that "converts" wandb calls to clearml calls? wdyt?
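If the converter-module route fits, a minimal sketch of such a thin layer (names are placeholders, scalars only):
```
from clearml import Task

_task = Task.init(project_name="examples", task_name="wandb-migration")

def log(metrics, step=0):
    """Drop-in stand-in for wandb.log(): report each scalar to ClearML."""
    logger = _task.get_logger()
    for name, value in metrics.items():
        logger.report_scalar(title=name, series=name, value=value, iteration=step)
```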
it is shown in the recording above
It was so odd, I had to ask 🙂 okay let me see if we can reproduce
I don’t have any error message in the browser console - Just an empty array returned on events.get_task_logs. This bug didn’t exist on version 1.1.0 and is quite annoying…
meaning the RestAPI returns nothing, is that correct ?
I think the real issue is that I am not able to specify a platform for the model,
None
there is no need to specify it, remove it from the config.pbtxt - clearml-serving will automatically add it in the background
. That speed depends on model sizes, right?
in general yes
Hope that makes sense. This would not work under heavy loads, but eg we have models used once a week only. They would just stay unloaded until use - and could be offloaded afterwards.
but then you still might encounter a timeout the first time you access them, no?
Notice you have to configure the shared driver for the docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
Hi PungentLouse55 ,
I think I can see how these magic lines solved it, and I think you are onto something.
Any chance what happened is multiple workers were trying to simultaneously save/load the same Model ?
Any chance your code needs more than the main script, but it is Not in a git repo? Because the agent supports either single script file, or a git repo with multiple files
BTW: for future reference, if you set the ulimit in the bash, all processes created after that should have the new ulimit
DistressedGoat23 check this example:
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
aSearchStrategy = RandomSearch
It will collect everything on the main Task
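A minimal sketch of setting up the optimizer programmatically (the base task ID, metric names, and queue are placeholders):
```
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformIntegerParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="template_task_id_here",   # the experiment to optimize
    hyper_parameters=[
        UniformIntegerParameterRange("General/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()   # all sub-experiments report back to the main Task
optimizer.wait()
optimizer.stop()
```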
This is a crucial point for using clearml HPO since comparing dozens of experiments in the UI and searching for the best is just not manageable.
You can of course do that (notice you can actually order them by scalars they report, and even do ...
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! a pipeline is at the end a Task, you can take the pipeline ID and clone and enqueue it
```
pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
```
You can also monitor the pipeline with the same Task interface.
wdyt?
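A minimal sketch of that monitoring part, building on the snippet above:
```
# poll the cloned pipeline task's status through the regular Task interface
print(pipeline_task.get_status())   # e.g. "queued", "in_progress", "completed"
```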
but this would be still part of the clearml.conf right?
You can pass it per Task; you can also configure the agent to always add this env.
https://github.com/allegroai/clearml-agent/blob/5a080798cb4292e198948fbe16cba70136cb6bdf/docs/clearml.conf#L137
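For reference, a minimal sketch of the agent-side setting, assuming the goal is to always inject an environment variable into the task container (the variable name is a placeholder):
```
agent {
    # extra arguments passed to docker run for every task
    extra_docker_arguments: ["-e", "MY_ENV_VAR=value"]
}
```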