The driver script (the one that initializes the models and starts the training sequence) was not in the git repo; besides that one, everything is.
Yes, there is an issue when you have both a git repo and a totally uncommitted file. Since ClearML can store either a standalone script or a git repository, mixing the two is not actually supported. Does that make sense?
The image is
allegroai/clearml:1.0.2-108
Yep, that makes sense, seems like a backwards compatibility issue
ShaggyHare67 in the HPO the learning rate parameter should be (based on the above):
General/training_config/optimizer_params/Adam/learning_rate
Notice the "General" prefix (notice it is case sensitive)
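A minimal sketch of how that fully qualified name is put together (the nested keys mirror the config from the question; the `UniformParameterRange` usage in the comment is an assumption about how it would be referenced in the HPO):

```python
# Parameters connected from a plain config dict land under the "General"
# section, so the full (case-sensitive) name is the section name plus the
# nested config keys joined by "/":
sections = ["General", "training_config", "optimizer_params", "Adam", "learning_rate"]
param_name = "/".join(sections)
print(param_name)  # General/training_config/optimizer_params/Adam/learning_rate

# In the HPO this name would then be referenced, e.g. (assumed usage):
#   UniformParameterRange(param_name, min_value=1e-5, max_value=1e-1)
```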
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Hmm, this seems odd and definitely looks like a bug, please report it on GH 🙂
Could you give an example of such configurations ?
(e.g. what would be diff from one to another)
SourOx12
Hmmm. So if last iteration was 75, the next iteration (after we continue) will be 150 ?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
I could take a look and figure that out.
This will greatly accelerate integration 🙂
BTW: StickyMonkey98 if you feel like writing a few examples I think it will be easy to push into the docs, so that at least we improve iteratively...
Hi GrievingTurkey78 yes, /opt/clearml should contain everything.
That said, back up only after you spin down the DBs, so they serialize everything to disk.
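A hedged sketch of that backup sequence (paths and the compose-file location are assumptions; the actual spin-down/archive/spin-up commands are left commented out so nothing destructive runs):

```shell
CLEARML_DIR=/opt/clearml
BACKUP_FILE="clearml-backup-$(date +%Y%m%d).tar.gz"

# 1. Spin down the server so the DBs serialize everything to disk:
# docker-compose -f "$CLEARML_DIR/docker-compose.yml" down

# 2. Archive the whole data directory:
# tar -czf "$BACKUP_FILE" -C / opt/clearml

# 3. Bring the server back up:
# docker-compose -f "$CLEARML_DIR/docker-compose.yml" up -d

echo "$BACKUP_FILE"
```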
I would like to be able to send a request to unload the model (because I cannot load all the models in GPU memory, only 7-8).
Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub? (I just tested your code and it seemed to work on my machine.)
pip install git+
If nothing specific comes to mind i can try to create some reproducible demo code (after holiday vacation)
Yes please! 🙂
In the meantime, see if the workaround is a valid one.
About .get_local_copy... would that then work in the agent though?
Yes it would work both locally (i.e. without agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
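A small sketch of the pattern; the artifact name ("data") and the task handle are placeholders. Per the behaviour described above, get_local_copy() downloads and caches the file when it is not already local, so the same code runs with or without an agent:

```python
def artifact_local_path(task, name):
    """Return a usable local path for the named artifact.

    get_local_copy() is expected to download + cache the file when it is
    not already present locally, so this works inside an agent as well.
    """
    return task.artifacts[name].get_local_copy()
```

In a real run `task` would be a clearml Task object (e.g. obtained via Task.get_task()).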
Hi @<1704304350400090112:profile|UpsetOctopus60>
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm
Just use the helm charts. It's the easiest
Hmm, is there a way to do this via code?
Yes, clone the Task with Task.clone
Then do data = task.export_task()
and edit the data object (see the execution section)
Then update it back with task.update_task(data)
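The clone → export → edit → update flow above, sketched as a helper. `task_cls` stands in for clearml.Task, and the "execution"/"branch" field names are assumptions about the exported dict layout:

```python
def clone_and_edit(task_cls, source_task_id, new_branch):
    """Clone a task, edit the exported data object, and push it back."""
    new_task = task_cls.clone(source_task=source_task_id)  # Task.clone
    data = new_task.export_task()                          # full task dict
    # Edit the execution section (field names assumed):
    data["execution"]["branch"] = new_branch
    new_task.update_task(data)                             # write it back
    return new_task
```

In real code this would be called as `clone_and_edit(Task, 'TASK_ID', 'my-branch')` after `from clearml import Task`.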
The issue I want to avoid is aborting of the dataset task that these regular tasks update.
HelpfulHare30 could you post a pseudo code of the dataset update ?
(My point is, I'm not sure the Dataset actually supports updating, as it needs to re-upload the previous delta snapshot.) Wouldn't it be easier to add another child dataset and then use dataset.squash (like one would do in git)?
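A sketch of that child-dataset + squash idea; `dataset_cls` stands in for clearml.Dataset, and the project/name strings are placeholders:

```python
def add_delta_and_squash(dataset_cls, parent_id, files_dir):
    """Add new files as a child dataset, then squash the chain (git-style)."""
    child = dataset_cls.create(
        dataset_project="my_project", dataset_name="delta",
        parent_datasets=[parent_id],
    )
    child.add_files(files_dir)   # only the delta is uploaded
    child.upload()
    child.finalize()
    # Squash merges the parent + child chain into a single new dataset:
    return dataset_cls.squash(dataset_name="squashed", dataset_ids=[child.id])
```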
Hi @<1610808279263350784:profile|FriendlyShrimp96>
Is there a way to get a list of variants given a metric, or even just a full list of metrics and variants for a given task id?
Try this
from clearml.backend_api.session.client import APIClient
c = APIClient()
metrics = c.events.get_task_metrics(tasks=["TASK_ID_HERE"], event_type="training_debug_image")
print(metrics)
I think API ...
They don't give an in-app notification.
Oh I see, I assume this is because the github account is not connected with any email, so no invite is sent.
Basically they should just be able to re-login and then they could switch to your workspace (with the link you generated)
Hi DefeatedCrab47
You should be able to change the Web server port, but the API port (8008) cannot be changed. If you can log in to the web app and create a project, it means everything is okay. Notice that when you configure trains (trains-init) the port numbers are correct 🙂
Hi @<1569858449813016576:profile|JumpyRaven4> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
I assume the task is being launched sequentially. I'm going to prepare a more elaborate example to see what happens.
Let me know if you can produce a mock test, I would love to make sure we support the use case, this is a great example of using pipeline logic 🙂
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay empty? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded, add the following:
task = Task.init('example', 'experiment', output_uri=' ')
(You can obviously point it to any other http/S3/GS/Azure storage)
Hi PungentLouse55
Hope you are not tired of me
Lol 🙂 No worries
I am using trains 0.16.1
Are you referring to the trains-server version or the python package ? (they are not the same and can be of totally different versions)
OHH nice, I thought that it was just some kind of job queue on up-and-running machines
It's much more than that, it's a way of life 🙂
But seriously now, it allows you to use any machine as part of your cluster and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, or a mix of the two together 🙂)
Maybe I need to change something here:
apiserver.conf
Not sure, I'm still waiting on answer... It...
This is odd, Can you send the full Task log? (remove any pass/user/repo that you think is sensitive)
Regarding the agent - No particular reason. Can you point me on how to do it?
This is a good place to start
https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps
We need the automagic...
This is one of the great benefits of using clearml 🙂
Sure, try this one:
Task.debug_simulate_remote_task('reused_task_id')
task = Task.init(...)
Notice it will take the arguments from the clearml-task itself (e.g. override argparse arguments with what ...
Ohh, I see now, yes that should be fixed as well 🙂
Nested in the UI is not possible I think?
Yes, but the next version will have nested projects, that's something 🙂
I mean that it is possible to start the subtask while the main task is still active.
You cannot call another Task.init while a main one is running.
But you can call Task.create and log into it; that said, autologging is not supported on the newly created Task.
Maybe the easiest solution is just to do the "sub-tasks" and close them. That means the main Task i...
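A sketch of that "create, log, close" pattern; `task_cls` stands in for clearml.Task, and the metric names are placeholders. Because Task.create does not auto-log, everything is reported explicitly:

```python
def run_subtask(task_cls, project, name, value):
    """Create a secondary task while the main one is running and log to it."""
    sub = task_cls.create(project_name=project, task_name=name)
    # No automagic on created tasks -> report scalars manually:
    sub.get_logger().report_scalar("metric", "series", value=value, iteration=0)
    sub.close()  # close it so the main task stays the "current" one
    return sub
```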
I think that what you need is to create an OutputModel, then call update_weights when you have the better model; this will also allow you to tag the model object. Would that help? Or would it make sense to use Task.models and count on the auto-logging?
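A sketch of the OutputModel route; in real code `output_model` would be created as OutputModel(task=task), and the tag value and weights path are placeholders:

```python
def register_best(output_model, weights_path):
    """Upload the better checkpoint and tag the model object."""
    output_model.update_weights(weights_filename=weights_path)  # upload weights
    output_model.tags = ["best"]                                # tag the model
    return output_model
```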