Hi MistakenDragonfly51
Is it possible to use it without using the clearml agent system?
Yes it is, which would mean everything is executed locally
basically: `an_optimizer.start_locally()` instead of this line
https://github.com/allegroai/clearml/blob/51af6e833ddc5a8ba1efaaf75980f58616b25e85/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L121
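A minimal sketch of that local mode (the base task id, parameter range, and metric names below are placeholders for your own setup):
```python
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformParameterRange

# All ids/names here are placeholders
an_optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
)

# Run every trial in the local process instead of enqueuing to agents
an_optimizer.start_locally()
an_optimizer.wait()
an_optimizer.stop()
```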
You can make reports on experiments with interactive graphs
Yes, I can totally see how this is a selling point. The closest is the Project Overview (full markdown capabilities, with the ability to embed links to specific experiments). You can also add a "leader metric", so you can track the project performance/progress over time.
I have to admit that creating a better reporting tool is always pushed down in priority as I think this is a good selling point to management but the actual ...
The agent installs whatever is listed in the "Installed Packages" section of the Task (think of it as a requirements.txt)
And again, what do you have there? Is it the outcome of Task.init auto-populating it?
Can you please elaborate on the latter point? My jupyterhub is fully containerized and allows users to select their own containers (from a list I built) at launch, and to launch multiple containers at the same time; not sure I follow how toes are stepped on.
Definitely a great start; usually it breaks on memory / GPU memory, where too many containers on the same machine eat each other's GPU RAM (which cannot be virtualized)
GiddyTurkey39 Okay, can I assume "Installed packages" contains the packages you need?
If so, you can set up trains-agent on a machine (see instructions on the GitHub)
And then clone the experiment, and enqueue it into the "default" queue (or any other queue your agent is connected to)
https://github.com/allegroai/trains-agent
No, by definition the agent will only execute one Task at a time; you can spin up a second agent on the same GPU :)
I would ideally just want to have NVIDIA drivers and Docker on the on-prem nodes (along with the clearML agents). Would that allow me to get by with basic job scheduling/queues through clearML?
Yes, this is fully supported and very easy to set up.
Regarding limiting users' usage. This is doable; I think the easiest solution, both for users and for management of the cluster, is introducing priority into the queues: a user can push a job into low priority, and only some users can push into high...
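A rough sketch of that pattern, assuming two queues named high_priority and low_priority (queue names are placeholders):
```bash
# An agent polls queues in the order listed, so jobs in high_priority
# are always picked up before anything in low_priority
clearml-agent daemon --queue high_priority low_priority --docker
```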
DistressedGoat23 you are correct; since in the end this becomes a plotly object, the extra_layout is for general-purpose layout, but this specific entry sits next to the data. Bottom line, can you open a GitHub issue, so we do not forget to fix it? In the meantime you can use the general plotly reporting, as SweetBadger76 suggested
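A minimal sketch of the general plotly reporting route (project/task/plot names are placeholders):
```python
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="plotly-report")  # placeholders

# Build the figure yourself, with full control over every layout entry
fig = go.Figure(data=go.Scatter(y=[1, 3, 2]))
fig.update_layout(title="custom layout, including entries next to the data")

# Report the figure as-is to the experiment's Plots section
task.get_logger().report_plotly(
    title="my plot", series="series A", iteration=0, figure=fig
)
```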
can I run it on an agent that doesn't have a GPU?
Sure, this is fully supported
when I run clearml-serving it throws me an error: "please provide specific config.pbtxt definition"
Yes, this is a small file that tells the Triton server how to load the model:
Here is an example:
https://github.com/triton-inference-server/server/blob/main/docs/examples/model_repository/inception_graphdef/config.pbtxt
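For reference, a minimal config.pbtxt along those lines; the model name, platform, tensor names, and dims are all placeholders you would replace with your model's actual values:
```
name: "my_model"
platform: "tensorflow_graphdef"
max_batch_size: 64
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 299, 299, 3 ]
  }
]
output [
  {
    name: "softmax_output"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]
```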
Is it vanilla PyTorch?
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip", so basically you have to download both folders anyhow. But I guess if this saves space we could add this functionality, wdyt?
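In the meantime, a sketch of the current behavior (dataset project/name are placeholders): fetch the whole dataset once, then point at the sub-folder you need:
```python
import os

from clearml import Dataset

# Downloads (and caches) the full dataset locally
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_root = dataset.get_local_copy()

# Then just use the sub-folder you care about
train_dir = os.path.join(local_root, "train")
```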
It manages the scheduling process, so there is no need to package your code or worry about building dockers, etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs in the execution queue and the limit of your budget (obviously spinning down machines that are idle)
Awesome! any way to hear the talk w/o registering for the whole conference?
CloudySwallow27 Anyway, we will make sure we upload the talk to the ClearML YouTube channel after the talk
Hi @<1653207659978952704:profile|LovelyStork78>
I have a docker container with all the dependencies.
Well, I think the main question is: are you using the clearml-agent to launch jobs/experiments? If you do, it makes sense to specify your docker as the "base docker image" (in the UI, look under the Execution tab, Container section).
This means the agent will use the pre-installed environment and will add anything that your Task needs on top of it, this of course includes pushing your codebase i...
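You can also set it from code instead of the UI; a sketch, with the project/task/image names as placeholders:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="my-experiment")  # placeholders

# Equivalent to filling in Execution tab -> Container in the UI;
# the agent will run this Task inside the given image
task.set_base_docker(docker_image="my-registry/my-image:latest")
```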
Hi HarebrainedBear62
What's the type of data?
In the case of scalars it is easy to see (the maximum number of iterations is a good starting point).
DilapidatedDucks58 by default, if you continue the execution, it will automatically continue reporting from the last iteration. I think this is what you are seeing.
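If you are resuming via code, a sketch of the relevant flag (project/task names are placeholders):
```python
from clearml import Task

# continue_last_task resumes the previous run, so reported scalars
# keep counting from the last recorded iteration
task = Task.init(
    project_name="examples",
    task_name="training",
    continue_last_task=True,
)
```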
Hi
It works if I don't specify the project name and just give the task name
But now it searches for it globally, which is not very stable:
Let me check why it fails to find the project...
now, I need to pass a variable to the Preprocess class
you mean for the constructor?
For model upload and registration, should I pass something like `'xgboost': False` or `'xgboost': False, 'scikit': False`?
Exactly! which framework are you using ?
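For reference, a sketch of where that dict goes (project/task names are placeholders):
```python
from clearml import Task

# Disable automatic model upload/registration per framework;
# any framework not listed keeps the default (enabled) behavior
task = Task.init(
    project_name="examples",
    task_name="manual-model-logging",
    auto_connect_frameworks={"xgboost": False, "scikit": False},
)
```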
about 2, I refer to the names of the models.
Hmm, that is a good point to test; usually this is based on the Task name (I think), so if the Task name contains the HPO params in the name, it should be the same on the model name. Do you see the HPO params on the Task name? Should we open a Gi...
No, I mean actually compare using the UI, maybe the arguments are different or the "installed packages"
pip cache & git cache & venvs cache are all supported; you just need to map the folders.
If you do not want to spin up a PVC with an NFS mount, you can just mount an S3 bucket with s3fs as part of the container extra bash script,
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140
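Roughly, the relevant agent-side cache settings look like this (paths below are the defaults; adjust them and mount the folders into your containers):
```
agent {
    # venv cache, reused across runs
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }

    # git clone cache
    vcs_cache: {
        enabled: true
        path: ~/.clearml/vcs-cache
    }

    # pip/apt download caches mapped into the docker containers
    docker_pip_cache = ~/.clearml/pip-cache
    docker_apt_cache = ~/.clearml/apt-cache
}
```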
S3 FUSE filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
WDYT?
Why can't it be updated after creation?
You can, but then you have to rerun it again. I mean, technically this is obviously solvable, but the idea was to make it simple to use, and since we "assume" in most cases there is a single Task per execution, it made sense. wdyt?
I double-checked with the guys; this issue was fixed in 1.14 (of the clearml server). It should be released tomorrow / over the weekend
HelplessCrocodile8
Basically the file URI might be different on a different machine (out of my control) but they point to the same artifact storage location
We might have thought of that...
in your clearml.conf file:
```
sdk {
    storage {
        path_substitution = [
            # Replace registered links with local prefixes,
            # solve mapping issues, and allow for external resource caching.
            {
                registered_prefix = file:///mnt/data/...
```
maybe you can also check
--version
that returns the help menu
What do you mean? --version on clearml-task?