Hi HelpfulHare30
I mean situations where training is long and its parts can be parallelized in some way, like in Spark or Dask
Yes, that makes sense. The function we are parallelizing is usually bottlenecked on both data & CPU, and both frameworks try to split & stream the data.
ClearML does not do data split & stream, but what you can do is launch multiple Tasks from a single "controller" and collect the results. I think that one of the main differences is that a ClearML Task is ...
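For illustration only, a minimal sketch of such a controller, assuming a "train template" Task, an "examples" project and a "default" queue already exist (all names here are made up):
```python
from clearml import Task
import time

# Controller task: clone a template training Task per data shard,
# enqueue the clones for agents to run, then collect their scalars.
controller = Task.init(project_name="examples", task_name="controller")
template = Task.get_task(project_name="examples", task_name="train template")

children = []
for shard in range(4):
    child = Task.clone(source_task=template, name=f"train shard {shard}")
    child.set_parameter("General/shard", shard)   # consumed by the training script
    Task.enqueue(child, queue_name="default")     # an agent listening on "default" picks it up
    children.append(child)

# Naive polling until all children finish, then collect the reported scalars
for child in children:
    while child.get_status() not in ("completed", "failed", "stopped"):
        time.sleep(30)
        child.reload()
    print(child.name, child.get_reported_scalars())
```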
after generating a fresh set of keys
when you have a new set, copy-paste them directly into the `clearml.conf` (should be at the top, can't miss it)
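For reference, this is roughly what the top of `~/clearml.conf` looks like after pasting the new keys (the server URLs below assume the hosted app.clear.ml, adjust them for a self-hosted server):
```
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "PASTE NEW ACCESS KEY HERE"
        "secret_key" = "PASTE NEW SECRET KEY HERE"
    }
}
```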
maybe you can also check `--version`, that returns the help menu
What do you mean? `--version` on clearml-task ?
Hi StrangePelican34 , you mean poetry as the package manager of the agent? The venvs cache will only work for pip and conda; poetry handles everything internally :(
Then by default this is the free space in the home folder (`~/.clearml`) that is running out
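If the home folder can't be freed up, the cache location can be moved in `clearml.conf`; a sketch (the target path below is just an example):
```
sdk {
    storage {
        cache {
            # move the cache off the home folder, e.g. to a bigger disk
            default_base_dir: "/mnt/big_disk/clearml_cache"
        }
    }
}
```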
quick update, still trying to reproduce ...
HealthyStarfish45 you mean as in RestAPI ?
orchestration module
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?
I also found that you should have a deterministic ordering before you apply a fixed seed
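A tiny sketch of what I mean (the file pattern is just an example):
```python
import glob
import random

random.seed(42)

# Sort first so the base ordering is deterministic across machines/filesystems,
# otherwise the seeded shuffle starts from a different order on every run.
files = sorted(glob.glob("data/*.png"))
random.shuffle(files)
```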
Not sure I follow ?
Hi PompousBeetle71 , this actually fits with other feedback we received.
And for that reason it is already being worked on! 🙂
I have a few questions as we are designing the new interface.
I think our biggest question was, are projects like folders?
That is: I can have experiments in a project, but also sub-projects?
Or are parent projects a way to introduce hierarchy into the mess, meaning a project has either experiments in it, or sub-projects, but not both?
(obviously in both cases...
Yes you can drag it in the UI :) it's a new feature in v1
*Actually looking at the code, when you call Task.create(...) it will always store the diff from the remote server.
Could that be the issue?
To edit the Task's diff: `task.update_task(dict(script=dict(diff='DIFF TEXT HERE')))`
`Task.create` will create a new Task (and return an object), but it does not do any auto-magic (like logging the console, tensorboard, etc.)
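Putting the two together, a hedged sketch (the project name and repo URL are placeholders):
```python
from clearml import Task

# Task.create just registers a Task (no auto-logging), then we overwrite
# the git diff that was stored from the remote server.
task = Task.create(
    project_name="examples",
    task_name="manual task",
    repo="https://github.com/some-org/some-repo.git",  # placeholder
)
task.update_task(dict(script=dict(diff="DIFF TEXT HERE")))
```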
oh dear ...
ScrawnyLion96 let me check with the front-end guys 🙂
and this
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/{server_info['base_url']}/"
Hmm you will have to set up the trains-server on a machine somewhere, it can be any machine, Win / Mac / Linux
SweetGiraffe8
That might be it, could you test with the Demo server ?
Hi RipeGoose2
when creating a task the default path is still there
What do you mean by "PATH"? Do you want to provide a path for the config file? Is it for trains manual execution or the agent?
I guess I got confused since the color choices in
One of the most beloved features we added 🙂
but we run everything in docker containers. Will it still help?
As long as you are running with clearml-agent (in docker mode), all the cache folders (this one included) are mounted on the host machine for persistence
How does ClearML select reference branch? Could it be that ClearML only checks "origin" branch?
Yes 🙂 I think we can quickly fix that, I'm just trying to figure out if there are downsides to running `git ls-remote --get-url` without origin
I tested and I have no more warning messages
`if self._active_gpus and i not in self._active_gpus: continue`
This solved it?
If so, PR pretty please 🙂
Is there a way to do this all elegantly?
Oh yes there is, this is how TaskB's code will look:
```python
task = Task.init(..., 'task b')
param = {'TaskA': "TaskA's ID HERE"}
task.connect(param)
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
model_a = torch.load(taska_model.get_local_copy())
# ... train ...
torch.save(model_b, 'model_b.pt')  # model_b produced by the training step above
```
I might have missed something there, but generally speaking this will let you:
Select TaskA as a parameter of TaskB's training process. It will automagically register Task A's...
Yes this is a misleading title
And I think the default is 100 entries, so it should not get cleaned.
and then they are all removed and for a particular task it even happens before my task is done
Is this reproducible ? Who is cleaning it and when?
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8)
That is a good question. Usually the CUDA version is automatically detected, unless you override it with the conf file or an OS env variable. What's the setup? Are you using conda as the package manager? (conda actually installs CUDA drivers; if the original Task was executed on a machine with conda, it will take the CUDA version automatically, the reason is to match CUDA/Torch/TF)
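For completeness, a sketch of what the override can look like, assuming the agent.cuda_version / agent.cudnn_version keys in `clearml.conf` (or the matching OS environment variables); the values are examples only:
```
agent {
    # force a specific CUDA / cuDNN version instead of auto-detection
    cuda_version: "11.2"
    cudnn_version: "8.0"
}
# or, before starting the agent:
#   export CUDA_VERSION=11.2
#   export CUDNN_VERSION=8.0
```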