Okay, I'll pass to front-end, see what they can do about it.
Hmm, yes, we should probably provide metrics:
client.workers.get_stats(..., items=[dict(key='cpu_usage'), dict(key='gpu_usage')])
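If it helps, here is a rough sketch of pulling those worker metrics through the Python APIClient; the from/to dates, the interval and the item keys are assumptions on my side:

# Sketch only: query worker CPU/GPU usage via the backend API client
import time
from clearml.backend_api.session.client import APIClient

client = APIClient()
now = time.time()
stats = client.workers.get_stats(
    from_date=now - 3600,  # last hour
    to_date=now,
    interval=60,  # seconds between data points
    items=[dict(key='cpu_usage'), dict(key='gpu_usage')],
)
print(stats)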
It is recommended to create a git TOKEN with read-only permissions and use it (more secure) 🙂
My current experience is that there is only print output in the console, but no training graph
Yes, NVIDIA TLT needs to actually use TensorBoard for ClearML to catch it and display it.
I think that in the latest version they added that. TimelyPenguin76 might know more
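For reference, ClearML only picks these up once the training code actually writes TensorBoard events; a minimal sketch (the SummaryWriter usage here is just an illustration, not TLT-specific):

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name='examples', task_name='tb-capture')  # placeholder names
writer = SummaryWriter(log_dir='./runs')
for step in range(100):
    # Any scalar written to TensorBoard is automatically captured and shown in the ClearML UI
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)
writer.close()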
@NastySeahorse61 / @AbruptCow41
Is there a way to avoid each task creating a new environment?
You can just define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 and the agent will just use whatever you have there (notice it will totally ignore requirements.txt and "installed packages" on the Task)
BTW I would recommend turning on venv caching; this is per docker/python/packages caching, so the next time you are using the exact same requirements...
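Roughly something like this on the agent machine (the venv cache values below are what I believe are the defaults, adjust as needed):

# Reuse the existing python environment, skip package installation entirely:
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent daemon --queue default

# And/or enable venv caching in clearml.conf:
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
}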
Oh, so is it a bug and you should have seen two series on each graph? (I think it is... not sure how to actually name the second instance other than running number)
Is there a built-in programmatic way to adjust development.default_output_uri ?
How about: In your Task.init(output_uri='...')
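For example (project/task names and the bucket are placeholders):

from clearml import Task

# output_uri sets the default upload destination for this Task's models/artifacts,
# overriding sdk.development.default_output_uri from clearml.conf
task = Task.init(
    project_name='examples',
    task_name='train',
    output_uri='s3://my-bucket/clearml-artifacts',
)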
Is it across the board for any Task ?
What would you expect to happen if you clone a Task that used the requirements.txt? Would you ignore the full "pip freeze" and use the requirements.txt again, or is this the time we want to use the "installed packages" ?
Just call the Task.init before you create the subprocess, that's it 🙂 they will all automatically log to the same Task. You can also call Task.init again from within the subprocess; it will not create a new experiment but use the main process experiment.
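A minimal sketch of what that looks like (names are placeholders):

from multiprocessing import Process
from clearml import Task


def worker():
    # Returns the main process experiment, it does not create a new one
    task = Task.init(project_name='examples', task_name='multi-process')
    task.get_logger().report_text('reporting from the subprocess')


if __name__ == '__main__':
    # Call Task.init once in the main process, before spawning subprocesses
    Task.init(project_name='examples', task_name='multi-process')
    p = Process(target=worker)
    p.start()
    p.join()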
what's the error/reply ?
PleasantGiraffe85 can you send examples of the different git repo links (one internal one public) ?
kubectl get pods -n {namespace} -o=json
What are you getting when running the above on your cluster ?
Can you get the agent to execute the task in the current conda env, without setting up a new environment?
Wouldn't that break easily ? Is this a way to avoid dockers, or a specific use case ?
Is there any other way to get a task from the queue running locally in the current conda env?
You mean including cloning the code etc. but not installing any python packages ?
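If that is the case, something along these lines should work (the task id is a placeholder):

# Run a specific Task in the currently activated conda env,
# skipping the agent's python environment setup entirely
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent execute --id <task_id>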
... grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through and archive everything.
Nice. Wouldn't it make more sense to "store" a link to the "winning" experiment? So you know how to reproduce it, and the set of HP that were chosen?
Not that the model is bad, but how would I know how to reproduce it, or retrain it when I have more data, etc.
# Wait for the child task to finish, then pull its artifact
task.wait_for_status()
task.reload()
task.artifacts["output"].get()
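And if you go with the "store a link to the winning experiment" approach, a rough sketch (assuming `optimizer` is the HyperParameterOptimizer instance and `parent_task` is the controller Task):

# Keep a reference to the best experiment (id + its parameters) instead of copying all of its artifacts
top_task = optimizer.get_top_experiments(top_k=1)[0]
parent_task.upload_artifact(
    name='winning_experiment',
    artifact_object={
        'task_id': top_task.id,
        'parameters': top_task.get_parameters(),
    },
)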
Thanks RipeGoose2 !
clearml logging starts from n+n (that's how it seems) for non-explicit
I have to say it looks like the expected behavior, I think.
Basically matching the TB, no?
I think the limit is a few GB, I'm not sure, I'll have to check
And yes the oldest experiments will be deleted first (with the exception of published experiments, they will be deleted last)
Hi RipeGoose2
Yes, the "services-mode" of an agent will take multiple Tasks, that said, these are "service" i.e. light CPU tasks, think pipeline controllers etc.
It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly.
Both use the exact same mechanism for uploading artifacts (i.e. including caching for downloaded artifacts). In terms of caching pipeline components, this is on a component level (i.e. same code/task, same arguments, equals a cache hit)
What exactly are you getting ? how is it that the "PipelineDecorator.upload_artifact" uploads to a different storage ? is that reproducible ?
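Just to make the cache-hit condition concrete, enabling caching on a component looks roughly like this (the function body is a placeholder):

from clearml.automation.controller import PipelineDecorator


# cache=True: same component code + same input arguments => reuse the previous run's outputs
@PipelineDecorator.component(return_values=['result'], cache=True)
def preprocess(data_path: str):
    result = data_path.upper()  # placeholder work
    return result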
These are both specific cases of the glue, and yes both need to be fixed.
I think (1) is actually a feature; nonetheless we should support it.
FriendlySquid61 could you verify specifically on (2)
Hi RotundHedgehog76
Notice that the "queued" is on the state of the Task, as well as the tag
We tried to enqueue the stopped task at the particular queue and we added the particular tag
What do you mean by specific queue ? this will trigger on any Queued Task with the 'particular-tag' ?
This means it will always authenticate with SSH (force_git_ssh_protocol) ...
But it seems you need mixed behavior ?
Are you using github as git provider ?
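For reference, the relevant clearml.conf switch on the agent side (it is per agent, so mixed behavior would mean separate agents with separate configurations):

agent {
    # Always convert https:// repo links to SSH when cloning (uses your SSH keys instead of tokens)
    force_git_ssh_protocol: true
}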
How about this one:
None
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artifacts, metrics, frames...
Yes that's exactly how it works.
None
This line adds a reference to the raw file (local/remote)
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_wit...
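If memory serves, the registration in that example boils down to something like the following; it needs the allegroai package (Hyperdatasets), and the exact call names should be double-checked against the linked script:

from allegroai import DatasetVersion, SingleFrame

# The frame only references the raw file sitting in your blob storage, nothing is copied
frame = SingleFrame(
    source='s3://my-bucket/raw/image_0001.jpg',  # placeholder path
    preview_uri='s3://my-bucket/raw/image_0001.jpg',
)

version = DatasetVersion.get_current(dataset_name='my-hyperdataset')  # placeholder name
version.add_frames([frame])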
It has to be alive so all the "child nodes" can report to it...
Very lacking with regard to how things interact with one another
If I'm reading it correctly, what you are saying is that some of the "big picture" / holistic approach on how different parts interact with one another is missing, is that correct?
I think ClearML would benefit itself a lot if it adopted a documentation structure similar to numpy ecosystem
Interesting thought, what exactly would you suggest we "borrow" in terms of approach?